Opened 6 years ago

Closed 6 years ago

#4269 closed defect (invalid)

Inconsistent result for ST_Intersects depending on the plan

Reported by: Algunenano Owned by: pramsey
Priority: critical Milestone: PostGIS 2.4.7
Component: postgis Version: 2.5.x -- EOL
Keywords: Cc:

Description

Reported by Tom van Tilburg in the maillist: https://lists.osgeo.org/pipermail/postgis-devel/2018-December/027568.html

Same query outputs different values for st_intersects depending on the plan.

crap=# Select ST_Intersects(ag, bg), acid, bcid FROM         
(
WITH data AS (
SELECT 1 as cid,720::bigint as blockid, ST_GeomFromText('POLYGON Z ((122427.759 489209.498 14.5066901703871,122395.957 489222.943 14.3997094723791,122399.649 489231.676 4.98556790825181,122431.452 489218.231 5.09216481886026,122427.759 489209.498 14.5066901703871))') geom
UNION ALL
SELECT 2, 720, ST_GeomFromText('POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))')
UNION ALL
SELECT 3, 720, ST_GeomFromText('POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))')
)
SELECT
a.cid as acid, b.cid as bcid,  
a.geom as ag, b.geom as bg
FROM data a, data b
WHERE a.cid != b.cid  
AND a.blockid = b.blockid
AND a.blockid::int = 720
) _A;
st_intersects | acid | bcid  
---------------+------+------
f             |    3 |    1
t             |    2 |    1
f             |    3 |    2
t             |    1 |    2
f             |    2 |    3
t             |    1 |    3
(6 rows)

crap=# set enable_hashjoin=off;                              
SET
crap=# Select ST_Intersects(ag, bg), acid, bcid FROM
(
WITH data AS (
SELECT 1 as cid,720::bigint as blockid, ST_GeomFromText('POLYGON Z ((122427.759 489209.498 14.5066901703871,122395.957 489222.943 14.3997094723791,122399.649 489231.676 4.98556790825181,122431.452 489218.231 5.09216481886026,122427.759 489209.498 14.5066901703871))') geom
UNION ALL
SELECT 2, 720, ST_GeomFromText('POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))')
UNION ALL
SELECT 3, 720, ST_GeomFromText('POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))')
)
SELECT
a.cid as acid, b.cid as bcid,  
a.geom as ag, b.geom as bg
FROM data a, data b
WHERE a.cid != b.cid  
AND a.blockid = b.blockid
AND a.blockid::int = 720
) _A;
st_intersects | acid | bcid  
---------------+------+------
t             |    1 |    2
t             |    1 |    3
t             |    2 |    1
t             |    2 |    3
f             |    3 |    1
t             |    3 |    2
(6 rows)

I've tested it with PG11 - Postgis trunk and with PG10 - Postgis 2.4.

Change History (9)

comment:1 by Algunenano, 6 years ago

Plan with hashjoin on:

                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on _a  (cost=0.17..0.51 rows=1 width=9) (actual time=0.610..2.006 rows=6 loops=1)
   ->  Hash Join  (cost=0.17..0.25 rows=1 width=72) (actual time=0.135..0.191 rows=6 loops=1)
         Hash Cond: (b.blockid = a.blockid)
         Join Filter: (a.cid <> b.cid)
         Rows Removed by Join Filter: 3
         CTE data
           ->  Append  (cost=0.00..0.08 rows=3 width=44) (actual time=0.008..0.021 rows=3 loops=1)
                 ->  Result  (cost=0.00..0.01 rows=1 width=44) (actual time=0.006..0.007 rows=1 loops=1)
                 ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..0.02 rows=1 width=44) (actual time=0.005..0.006 rows=1 loops=1)
                       ->  Result  (cost=0.00..0.01 rows=1 width=40) (actual time=0.002..0.003 rows=1 loops=1)
                 ->  Subquery Scan on "*SELECT* 3"  (cost=0.00..0.02 rows=1 width=44) (actual time=0.003..0.004 rows=1 loops=1)
                       ->  Result  (cost=0.00..0.01 rows=1 width=40) (actual time=0.001..0.002 rows=1 loops=1)
         ->  CTE Scan on data b  (cost=0.00..0.06 rows=3 width=44) (actual time=0.023..0.032 rows=3 loops=1)
         ->  Hash  (cost=0.08..0.08 rows=1 width=44) (actual time=0.071..0.071 rows=3 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  CTE Scan on data a  (cost=0.00..0.08 rows=1 width=44) (actual time=0.012..0.040 rows=3 loops=1)
                     Filter: ((blockid)::integer = 720)
 Planning Time: 2.196 ms
 Execution Time: 2.412 ms
(19 rows)

Plan with hashjoin off:

crap=# set enable_hashjoin=off;
SET
crap=# explain analyze Select ST_Intersects(ag, bg), acid, bcid FROM         
(
WITH data AS (
SELECT 1 as cid,720::bigint as blockid, ST_GeomFromText('POLYGON Z ((122427.759 489209.498 14.5066901703871,122395.957 489222.943 14.3997094723791,122399.649 489231.676 4.98556790825181,122431.452 489218.231 5.09216481886026,122427.759 489209.498 14.5066901703871))') geom
UNION ALL
SELECT 2, 720, ST_GeomFromText('POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))')
UNION ALL
SELECT 3, 720, ST_GeomFromText('POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))')
)
SELECT
a.cid as acid, b.cid as bcid,  
a.geom as ag, b.geom as bg
FROM data a, data b
WHERE a.cid != b.cid  
AND a.blockid = b.blockid
AND a.blockid::int = 720
) _A;
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on _a  (cost=0.08..0.52 rows=1 width=9) (actual time=0.726..2.140 rows=6 loops=1)
   ->  Nested Loop  (cost=0.08..0.26 rows=1 width=72) (actual time=0.058..0.139 rows=6 loops=1)
         Join Filter: ((a.cid <> b.cid) AND (a.blockid = b.blockid))
         Rows Removed by Join Filter: 3
         CTE data
           ->  Append  (cost=0.00..0.08 rows=3 width=44) (actual time=0.008..0.026 rows=3 loops=1)
                 ->  Result  (cost=0.00..0.01 rows=1 width=44) (actual time=0.006..0.007 rows=1 loops=1)
                 ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..0.02 rows=1 width=44) (actual time=0.005..0.006 rows=1 loops=1)
                       ->  Result  (cost=0.00..0.01 rows=1 width=40) (actual time=0.002..0.003 rows=1 loops=1)
                 ->  Subquery Scan on "*SELECT* 3"  (cost=0.00..0.02 rows=1 width=44) (actual time=0.005..0.007 rows=1 loops=1)
                       ->  Result  (cost=0.00..0.01 rows=1 width=40) (actual time=0.003..0.004 rows=1 loops=1)
         ->  CTE Scan on data a  (cost=0.00..0.08 rows=1 width=44) (actual time=0.029..0.039 rows=3 loops=1)
               Filter: ((blockid)::integer = 720)
         ->  CTE Scan on data b  (cost=0.00..0.06 rows=3 width=44) (actual time=0.002..0.017 rows=3 loops=3)
 Planning Time: 2.194 ms
 Execution Time: 2.492 ms
(16 rows)

comment:2 by komzpa, 6 years ago

suspected: geometry cache can cache prepared geometry for invalid wrongly and differ on prepared and non-prepared case.

around here: https://codecov.io/gh/postgis/postgis/src/df7a938e3de65cdfc2ab91aa3db9909600f22566/postgis/lwgeom_geos.c#L2200

Version 0, edited 6 years ago by komzpa (next)

comment:3 by pramsey, 6 years ago

Yes, order dependent issues are almost always rooted in the cache code line returning different results from the uncached code line. See if using _ST_Intersects() instead (uncached) results in predictable results (probable does). Confirm by using just the && operator as another test case (probably predictable, but worth trying... if there's an issue in the backend it's more likely to manifest with just the pure operator test). Finally, is one or more of the inputs invalid, as komzpa is alluding?

comment:4 by komzpa, 6 years ago

original mail says there is invalid:

Plane with cid 10890 is on purpose vertical and therefore seen as
'invalid' by GEOS

however ST_IsValid tells "true" on all the records.

comment:5 by tilt, 6 years ago

Using _ST_Intersects() instead of ST_Intersects does not change the issue with me.

It is correct that one of the geometries is (on purpose) invalid since it is a vertical plane. It's the third one in the example.

comment:6 by komzpa, 6 years ago

root=# select ST_IsValid('POLYGON Z ((122422.267 489220.741
root'#  0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741
root'#  30,122422.267 489220.741 0))');
NOTICE:  Too few points in geometry component at or near point 122422.26700000001 489220.74099999998 0
 st_isvalid
------------
 f
(1 row)

comment:7 by robe, 6 years ago

I thought when ST_IsValid is false the intersection is really unknown? Anyway it does seem like the caching is at fault here. ST_DWithin returns all true as expected, but the ST_Intersects flip answers depending on the order the data is traversed

WITH data AS (
SELECT 134 as cid,720::bigint as blockid, ST_GeomFromText('POLYGON Z ((122427.759 489209.498 14.5066901703871,122395.957 489222.943 14.3997094723791,122399.649 489231.676 4.98556790825181,122431.452 489218.231 5.09216481886026,122427.759 489209.498 14.5066901703871))') geom
UNION ALL
SELECT 316, 720, ST_GeomFromText('POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))')
UNION ALL
SELECT 10890, 720, ST_GeomFromText('POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))')
)
SELECT ST_DWithin(a.geom, b.geom,0), ST_Intersects(a.geom,b.geom),
ST_Distance(a.geom, b.geom), ST_ISValid(a.geom) AS isvalid_a, ST_IsValid(b.geom) AS isvalid_b,
a.cid AS acid, b.cid As bcid, 
a.blockid,b.blockid
FROM data a, data b
WHERE a.cid != b.cid 
AND a.blockid = b.blockid
AND a.blockid::int = 720
AND ST_DWithin(a.geom, b.geom,0);

Gives me true for all 6 for ST_DWithin but for ST_Intersects the 3 that have invalid geoms would be filtered out

 st_dwithin | st_intersects | st_distance | isvalid_a | isvalid_b | acid  | bcid  | blockid | blockid
------------+---------------+-------------+-----------+-----------+-------+-------+---------+---------
 t          | f             |           0 | f         | t         | 10890 |   134 |     720 |     720
 t          | t             |           0 | t         | t         |   316 |   134 |     720 |     720
 t          | f             |           0 | f         | t         | 10890 |   316 |     720 |     720
 t          | t             |           0 | t         | t         |   134 |   316 |     720 |     720
 t          | f             |           0 | t         | f         |   316 | 10890 |     720 |     720
 t          | t             |           0 | t         | f         |   134 | 10890 |     720 |     720
(6 rows)


WITH data AS (
SELECT 134 as cid,720::bigint as blockid, ST_GeomFromText('POLYGON Z ((122427.759 489209.498 14.5066901703871,122395.957 489222.943 14.3997094723791,122399.649 489231.676 4.98556790825181,122431.452 489218.231 5.09216481886026,122427.759 489209.498 14.5066901703871))') geom
UNION ALL
SELECT 316, 720, ST_GeomFromText('POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))')
UNION ALL
SELECT 10890, 720, ST_GeomFromText('POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))')
)
SELECT ST_DWithin(a.geom, b.geom,0), ST_Intersects(a.geom,b.geom),
ST_Distance(a.geom, b.geom), ST_IsValid(a.geom) AS isvalid_a, ST_IsValid(b.geom) AS isvalid_b,
a.cid AS acid, b.cid AS bcid, 
a.blockid,b.blockid
FROM data a, data b
WHERE a.cid != b.cid 
AND a.blockid = b.blockid
AND a.blockid = 720
AND ST_DWithin(a.geom, b.geom,0);

Gives me true for all 6 for ST_DWithin and filters out 1 with invalid

 st_dwithin | st_intersects | st_distance | isvalid_a | isvalid_b | acid  | bcid  | blockid | blockid
------------+---------------+-------------+-----------+-----------+-------+-------+---------+---------
 t          | t             |           0 | t         | t         |   134 |   316 |     720 |     720
 t          | t             |           0 | t         | f         |   134 | 10890 |     720 |     720
 t          | t             |           0 | t         | t         |   316 |   134 |     720 |     720
 t          | t             |           0 | t         | f         |   316 | 10890 |     720 |     720
 t          | f             |           0 | f         | t         | 10890 |   134 |     720 |     720
 t          | t             |           0 | f         | t         | 10890 |   316 |     720 |     720
(6 rows)


I thought maybe the issue was because prepared geometry is assuming ST_Intersects is a commutative operation. In the case of invalid geometries maybe it's not.

though doing a standalone and flipping the order of the geometries, I get false in both cases So this:

SELECT ST_Intersects('POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))'::geometry,
             'POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))'::geometry        
                     );

returns: 

false
SELECT ST_Intersects('POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))'::geometry   , 'POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))'::geometry
                  
                     );

returns: false

So not sure how it's getting a true for:

134 | 10890

and when I explicitly ask for the ids, I get the expected false answer for both regardless if I cast or don't cast the integer

WITH data AS (
SELECT 134 as cid,720::bigint as blockid, ST_GeomFromText('POLYGON Z ((122427.759 489209.498 14.5066901703871,122395.957 489222.943 14.3997094723791,122399.649 489231.676 4.98556790825181,122431.452 489218.231 5.09216481886026,122427.759 489209.498 14.5066901703871))') geom
UNION ALL
SELECT 316, 720, ST_GeomFromText('POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))')
UNION ALL
SELECT 10890, 720, ST_GeomFromText('POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))')
)
SELECT ST_DWithin(a.geom, b.geom,0), ST_Intersects(a.geom,b.geom),
ST_Distance(a.geom, b.geom), ST_ISValid(a.geom) AS isvalid_a, ST_IsValid(b.geom) AS isvalid_b,
a.cid AS acid, b.cid As bcid, 
a.blockid,b.blockid
FROM data a, data b
WHERE a.cid != b.cid 
AND a.blockid = b.blockid
AND a.blockid::int = 720
AND ( (a.cid = 134 AND b.cid =10890)
     OR (a.cid = 10890 and b.cid = 134) );

 st_dwithin | st_intersects | st_distance | isvalid_a | isvalid_b | acid  | bcid  | blockid | blockid
------------+---------------+-------------+-----------+-----------+-------+-------+---------+---------
 t          | f             |           0 | t         | f         |   134 | 10890 |     720 |     720
 t          | f             |           0 | f         | t         | 10890 |   134 |     720 |     720
(2 rows)

WITH data AS (
SELECT 134 as cid,720::bigint as blockid, ST_GeomFromText('POLYGON Z ((122427.759 489209.498 14.5066901703871,122395.957 489222.943 14.3997094723791,122399.649 489231.676 4.98556790825181,122431.452 489218.231 5.09216481886026,122427.759 489209.498 14.5066901703871))') geom
UNION ALL
SELECT 316, 720, ST_GeomFromText('POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))')
UNION ALL
SELECT 10890, 720, ST_GeomFromText('POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))')
)
SELECT ST_DWithin(a.geom, b.geom,0), ST_Intersects(a.geom,b.geom),
ST_Distance(a.geom, b.geom), ST_ISValid(a.geom) AS isvalid_a, ST_IsValid(b.geom) AS isvalid_b,
a.cid AS acid, b.cid As bcid, 
a.blockid,b.blockid
FROM data a, data b
WHERE a.cid != b.cid 
AND a.blockid = b.blockid
AND a.blockid = 720
AND ( (a.cid = 134 AND b.cid =10890)
     OR (a.cid = 10890 and b.cid = 134) );

 st_dwithin | st_intersects | st_distance | isvalid_a | isvalid_b | acid  | bcid  | blockid | blockid
------------+---------------+-------------+-----------+-----------+-------+-------+---------+---------
 t          | f             |           0 | t         | f         |   134 | 10890 |     720 |     720
 t          | f             |           0 | f         | t         | 10890 |   134 |     720 |     720
(2 rows)


comment:8 by pramsey, 6 years ago

Just skipping the whole join thing, which I feel is a red herring, you can see fun stuff happening in this example:

WITH data AS (                                                                                                             SELECT 1 as cid, 'POLYGON Z ((122427.759 489209.498 14.5066901703871,122395.957 489222.943 14.3997094723791,122399.649 489231.676 4.98556790825181,122431.452 489218.231 5.09216481886026,122427.759 489209.498 14.5066901703871))'::geometry geom
UNION ALL
SELECT 2, 'POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))'::geometry
UNION ALL
SELECT 3, 'POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))'::geometry
ORDER BY cid 
)
SELECT a.cid, b.cid, ST_Relate(a.geom, b.geom) AS relate, _ST_Intersects(a.geom, b.geom)
FROM data a, data b
ORDER BY a.cid, b.cid;

 cid | cid |  relate   | _st_intersects 
-----+-----+-----------+----------------
   1 |   1 | 2FFF1FFF2 | t
   1 |   2 | 212101212 | t
   1 |   3 | FF2FF1FF2 | t
   2 |   1 | 212101212 | t
   2 |   2 | 2FFF1FFF2 | t
   2 |   3 | FF2FF1FF2 | t
   3 |   1 | FFFFFF212 | f
   3 |   2 | FFFFFF212 | t
   3 |   3 | FFFFFFFF2 | t
(9 rows)

Note how the relate matrix of polygon 3 vs itself is not the standard pattern of the other two. The invalid third polygon is problematic. I don't see how this isn't just a known issue... invalid inputs lead to odd results, like non-commutative tests on intersects.

comment:9 by pramsey, 6 years ago

Resolution: invalid
Status: newclosed

And to address the join thing, we're almost certainly seeing an effect of the cache, where one join form pushes the data through in a different order from the other, so you get a full intersects once and a prepared geometry call the next time, and they give different answers when handed that invalid.

Note: See TracTickets for help on using tickets.