#536 closed defect (fixed)
Get rid of the STRICT on geography ST_Intersects
Reported by: | robe | Owned by: | pramsey |
---|---|---|---|
Priority: | high | Milestone: | PostGIS 1.5.2 |
Component: | postgis | Version: | 1.5.X |
Keywords: | Cc: |
Description
The STRICT clause on the ST_Intersects geography is preventing the planner from dissolving the function into the index + non-index constituent parts.
This means it is never using the index. Removing the STRICT clause seems to fix that.
See thread http://postgis.refractions.net/pipermail/postgis-users/2010-May/026831.html
Haven't checked the other geography functions to see if they suffer the same issue.
Change History (25)
comment:1 by , 14 years ago
comment:2 by , 14 years ago
I've tested it it doesn't work with STRICT in there. I can provide test data.
But the other point is is that we don't have it in place for geometry ST_Intersects, so why are we treating geography any differently?
comment:3 by , 14 years ago
Mark,
Here is your test
CREATE TABLE geogtest(gid SERIAL primary key, geog geography(POLYGON,4326)); CREATE INDEX idx_geogtest_geog ON geogtest USING gist (geog); INSERT INTO geogtest(geog) SELECT ST_Buffer(geog,random()*10) As geog FROM (SELECT ST_GeogFromText('POINT(' || i*0.5 || ' ' || j*0.5 || ')') As geog FROM generate_series(-350,350) As i CROSS JOIN generate_series(-175,175) As j ) As foo LIMIT 1000; vacuum analyze geogtest; SELECT f.gid as gid1, count(f2.gid) As tot FROM geogtest As f INNER JOIN geogtest As f2 ON ST_Intersects(f.geog, f2.geog) GROUP BY f.gid;
With STRICT in there -- this query is unbearably slow. So slow I can't wait for it to finish.
Take strict out and it completes in 78 ms.
This is running on 8.4, PostGIS 1.5.1
I think what the STRICT is doing, even though not documented I think that its killing the inlining behavior of the SQL function that we've been relying on to break out the index piece from rest of the function. Its a non-issue for plpgsql functions since they are treated as blackboxes anyway, but a killer for SQL functions where we are counting on the planner to see thru it to see there is a piece that can benefit from index.
I suppose we could say this could be a bug in PostgreSQL and maybe worth noting.
comment:5 by , 14 years ago
Because we don't have STRICT on them. The geometry one looks like this
CREATE OR REPLACE FUNCTION st_intersects(geometry, geometry) RETURNS boolean AS 'SELECT $1 && $2 AND _ST_Intersects($1,$2)' LANGUAGE 'sql' IMMUTABLE
This new affection for STRICT started recently -- perhaps becuase of our whole NULL IS NULL = NULL NULL NULL = false philosophizing :)
Actually geography ST_Dwithin is fine -- doesn't have strict. I suspect that is why I never noticed it because I rarely use ST_Intersects for my proximity queries since I usually need buffer regions.
comment:6 by , 14 years ago
Well actually ST_Covers and ST_Coveredby for geography have STRICT on them too, but the geometry counterparts do not have STRICT.
I suspect those ones no one noticed since they only support POINT/POLYGON
comment:7 by , 14 years ago
I think the first thing to do would be to try this on 8.3 and see if the behaviour there is the same. The key thing is to find out whether this is a PostgreSQL bug, or whether we are relying on some kind of unspecified behaviour.
Mark.
comment:8 by , 14 years ago
Just tried on my 8.3.7 install. Same issue. Although I've just been testing on windows.
comment:9 by , 14 years ago
I don't think the platform is an issue here. Marking a function as IMMUTABLE means that for a constant input, it returns a constant output which does seem to make sense in this in-line context. IIRC PostgreSQL changed a while back so that by default all functions are marked as VOLATILE by default which is perhaps why it's taken us a while to notice. So I'm leaning towards thinking Regina is not too far off the mark with this fix.
Incidentally, does the following variation also work?
CREATE OR REPLACE FUNCTION st_intersects(geography, geography)
RETURNS boolean AS
'SELECT $1 && $2 AND _ST_Distance($1, $2, 0.0, false) < 0.00001'
LANGUAGE 'sql' IMMUTABLE STRICT COST 100;
Mark.
comment:10 by , 14 years ago
Nope that's still slow. How is that different from what we have in place currently. We have all the sql functions as marked IMMUTABLE already.
comment:11 by , 14 years ago
Even on 8.3? That's weird. The IMMUTABLE and STRICT should be orthogonal concepts since the first indicates whether a function can result can be inlined at plan time, while the second indicates that if any NULL argument is passed in then the function execution can be short-circuited. I think we may need some more PostgreSQL-fu here to understand how they interact with each other.
Mark.
comment:12 by , 14 years ago
I read this bug via Reginas Blog and can explain the issue at least a bit. There are several conditions causing strict functions to fail:
- it uses other, non-strict functions. If it would get the strictness would get lost. Imageine a unstrict function add(a, b) := coalesce(a, 0) + coalesce(b, 0) and strict strict_add(a, b) := add(a,b). If strict_add() would get inlined the promise of it returning NULL if any of its parameters are NULL does not hold true anymore.
- some of the parameters are not used at all. Example: strict func_a(a, b) := a. If that would get inlined the definition would change.
Those conditions are *completely* orthogonal to those imposed by volatility - which imposes a different set of constraints.
comment:13 by , 14 years ago
Btw, volatile functions or non-volatile functions containing volatile functions do get inlined as well if some conditions are met:
- the function itself is not less volatile than its contents (i.e. a STABLE function containing a VOLATILE one
- no parameter *containing* a voltile function is used more than once. (actually I think theres a bug here... It should also not inline if the volatile parameter is not used at all)
comment:14 by , 14 years ago
Sounds like this is decided now? Regina, you should commit this change.
comment:15 by , 14 years ago
Actually it might just be some missing STRICT at another place or an unused parameter which causes the problem. So it maybe not necessary to remove those STRICTs.
comment:16 by , 14 years ago
Yeah, this is the bit I don't get since marking a function STRICT should indicate that it is possible to short-circuit the function as a potential optimisation, not make it slower.
I'm inclined to take Regina's test case up onto pgsql-hackers and get a definitive answer.
comment:17 by , 14 years ago
I looked at all the functions used in these and they all have STRICT on them, so Mark go ahead and take it up on pgsql-hackers. It is my understanding that if anarazel is right about the whole STRICT thing, then this should be working with the strict in there since all the functions it uses have STRICT.
comment:18 by , 14 years ago
Looking in the code the problem lies in the 'AND' in
'SELECT $1 && $2 AND _ST_Intersects($1,$2)'
"contain_nonstrict_functions" (in clauses.c) considers *all* boolean expressions as being fundamentally nonstrict.....
if (IsA(node, BoolExpr)) { BoolExpr *expr = (BoolExpr *) node; switch (expr->boolop) { case AND_EXPR: case OR_EXPR: /* AND, OR are inherently non-strict */ return true; default: break; } }
Why OR is a good qualification for that is pretty clear (SELECT NULL OR TRUE), but why AND as well I cannot see right now.
comment:20 by , 14 years ago
So we are in agreement this should be removed. Anarazel's argument makes sense to me.
comment:23 by , 14 years ago
comment:24 by , 14 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
I'm not 100% convinced that this is the right thing to be doing here, since we only have one reported use case and we don't have a test data set to play with. In any case, STRICT means that if any input parameter is NULL then the function can be short-circuited to return NULL which should help the planner.
In short - without test data and configuration we can work with, I don't feel we have enough evidence to make this change.
Mark.