Opened 14 years ago

Closed 14 years ago

Last modified 14 years ago

#536 closed defect (fixed)

Get rid of the STRICT on geography ST_Intersects

Reported by: robe Owned by: pramsey
Priority: high Milestone: PostGIS 1.5.2
Component: postgis Version: 1.5.X
Keywords: Cc:

Description

The STRICT clause on the ST_Intersects geography is preventing the planner from dissolving the function into the index + non-index constituent parts.

This means it is never using the index. Removing the STRICT clause seems to fix that.

See thread http://postgis.refractions.net/pipermail/postgis-users/2010-May/026831.html

Haven't checked the other geography functions to see if they suffer the same issue.

Change History (25)

comment:1 by mcayland, 14 years ago

I'm not 100% convinced that this is the right thing to be doing here, since we only have one reported use case and we don't have a test data set to play with. In any case, STRICT means that if any input parameter is NULL then the function can be short-circuited to return NULL which should help the planner.

In short - without test data and configuration we can work with, I don't feel we have enough evidence to make this change.

Mark.

comment:2 by robe, 14 years ago

I've tested it it doesn't work with STRICT in there. I can provide test data.

But the other point is is that we don't have it in place for geometry ST_Intersects, so why are we treating geography any differently?

comment:3 by robe, 14 years ago

Mark,

Here is your test

CREATE TABLE geogtest(gid SERIAL primary key, geog geography(POLYGON,4326));
CREATE INDEX idx_geogtest_geog
  ON geogtest
  USING gist
  (geog);
  
INSERT INTO geogtest(geog)
SELECT ST_Buffer(geog,random()*10) As geog
FROM (SELECT ST_GeogFromText('POINT(' || i*0.5 || ' ' || j*0.5 || ')') As geog
	FROM generate_series(-350,350) As i 
		CROSS JOIN generate_series(-175,175) As j
		) As foo
LIMIT 1000;

vacuum analyze geogtest;

SELECT f.gid as gid1, count(f2.gid) As tot
FROM geogtest As f INNER JOIN geogtest As f2
	ON ST_Intersects(f.geog, f2.geog)
GROUP BY f.gid;

With STRICT in there -- this query is unbearably slow. So slow I can't wait for it to finish.

Take strict out and it completes in 78 ms.

This is running on 8.4, PostGIS 1.5.1

I think what the STRICT is doing, even though not documented I think that its killing the inlining behavior of the SQL function that we've been relying on to break out the index piece from rest of the function. Its a non-issue for plpgsql functions since they are treated as blackboxes anyway, but a killer for SQL functions where we are counting on the planner to see thru it to see there is a piece that can benefit from index.

I suppose we could say this could be a bug in PostgreSQL and maybe worth noting.

comment:4 by pramsey, 14 years ago

Why aren't *all* our inlined functions broken then?

comment:5 by robe, 14 years ago

Because we don't have STRICT on them. The geometry one looks like this

CREATE OR REPLACE FUNCTION st_intersects(geometry, geometry)
  RETURNS boolean AS
'SELECT $1 && $2 AND _ST_Intersects($1,$2)'
  LANGUAGE 'sql' IMMUTABLE

This new affection for STRICT started recently -- perhaps becuase of our whole NULL IS NULL = NULL NULL NULL = false philosophizing :)

Actually geography ST_Dwithin is fine -- doesn't have strict. I suspect that is why I never noticed it because I rarely use ST_Intersects for my proximity queries since I usually need buffer regions.

comment:6 by robe, 14 years ago

Well actually ST_Covers and ST_Coveredby for geography have STRICT on them too, but the geometry counterparts do not have STRICT.

I suspect those ones no one noticed since they only support POINT/POLYGON

comment:7 by mcayland, 14 years ago

I think the first thing to do would be to try this on 8.3 and see if the behaviour there is the same. The key thing is to find out whether this is a PostgreSQL bug, or whether we are relying on some kind of unspecified behaviour.

Mark.

comment:8 by robe, 14 years ago

Just tried on my 8.3.7 install. Same issue. Although I've just been testing on windows.

comment:9 by mcayland, 14 years ago

I don't think the platform is an issue here. Marking a function as IMMUTABLE means that for a constant input, it returns a constant output which does seem to make sense in this in-line context. IIRC PostgreSQL changed a while back so that by default all functions are marked as VOLATILE by default which is perhaps why it's taken us a while to notice. So I'm leaning towards thinking Regina is not too far off the mark with this fix.

Incidentally, does the following variation also work?

CREATE OR REPLACE FUNCTION st_intersects(geography, geography)

RETURNS boolean AS

'SELECT $1 && $2 AND _ST_Distance($1, $2, 0.0, false) < 0.00001'

LANGUAGE 'sql' IMMUTABLE STRICT COST 100;

Mark.

comment:10 by robe, 14 years ago

Nope that's still slow. How is that different from what we have in place currently. We have all the sql functions as marked IMMUTABLE already.

comment:11 by mcayland, 14 years ago

Even on 8.3? That's weird. The IMMUTABLE and STRICT should be orthogonal concepts since the first indicates whether a function can result can be inlined at plan time, while the second indicates that if any NULL argument is passed in then the function execution can be short-circuited. I think we may need some more PostgreSQL-fu here to understand how they interact with each other.

Mark.

comment:12 by anarazel, 14 years ago

I read this bug via Reginas Blog and can explain the issue at least a bit. There are several conditions causing strict functions to fail:

  • it uses other, non-strict functions. If it would get the strictness would get lost. Imageine a unstrict function add(a, b) := coalesce(a, 0) + coalesce(b, 0) and strict strict_add(a, b) := add(a,b). If strict_add() would get inlined the promise of it returning NULL if any of its parameters are NULL does not hold true anymore.
  • some of the parameters are not used at all. Example: strict func_a(a, b) := a. If that would get inlined the definition would change.

Those conditions are *completely* orthogonal to those imposed by volatility - which imposes a different set of constraints.

comment:13 by anarazel, 14 years ago

Btw, volatile functions or non-volatile functions containing volatile functions do get inlined as well if some conditions are met:

  • the function itself is not less volatile than its contents (i.e. a STABLE function containing a VOLATILE one
  • no parameter *containing* a voltile function is used more than once. (actually I think theres a bug here... It should also not inline if the volatile parameter is not used at all)

comment:14 by pramsey, 14 years ago

Sounds like this is decided now? Regina, you should commit this change.

comment:15 by anarazel, 14 years ago

Actually it might just be some missing STRICT at another place or an unused parameter which causes the problem. So it maybe not necessary to remove those STRICTs.

comment:16 by mcayland, 14 years ago

Yeah, this is the bit I don't get since marking a function STRICT should indicate that it is possible to short-circuit the function as a potential optimisation, not make it slower.

I'm inclined to take Regina's test case up onto pgsql-hackers and get a definitive answer.

comment:17 by robe, 14 years ago

I looked at all the functions used in these and they all have STRICT on them, so Mark go ahead and take it up on pgsql-hackers. It is my understanding that if anarazel is right about the whole STRICT thing, then this should be working with the strict in there since all the functions it uses have STRICT.

comment:18 by anarazel, 14 years ago

Looking in the code the problem lies in the 'AND' in

 'SELECT $1 && $2 AND _ST_Intersects($1,$2)'

"contain_nonstrict_functions" (in clauses.c) considers *all* boolean expressions as being fundamentally nonstrict.....

	if (IsA(node, BoolExpr))
	{
		BoolExpr   *expr = (BoolExpr *) node;

		switch (expr->boolop)
		{
			case AND_EXPR:
			case OR_EXPR:
				/* AND, OR are inherently non-strict */
				return true;
			default:
				break;
		}
	}

Why OR is a good qualification for that is pretty clear (SELECT NULL OR TRUE), but why AND as well I cannot see right now.

comment:19 by anarazel, 14 years ago

Ah, yes: SELECT NULL AND FALSE => 'f' and not NULL...

comment:20 by robe, 14 years ago

So we are in agreement this should be removed. Anarazel's argument makes sense to me.

comment:21 by pramsey, 14 years ago

I am. Yee hawr.

comment:22 by mcayland, 14 years ago

Yup, you learn something new everyday. Let's do it :)

comment:23 by robe, 14 years ago

Done - r5704, r5705, r5706 (ST_Intersects, ST_Covers, ST_CoveredBy,ST_Dwithin)

comment:24 by robe, 14 years ago

Resolution: fixed
Status: newclosed

comment:25 by nicklas, 14 years ago

The same fix for ST_Equals branch 1.5 r5876 trunk r5877

/Nicklas

Note: See TracTickets for help on using tickets.