#5025 closed defect (fixed)
Inconsistent behavior with ST_Within across versions
Reported by: | byrman | Owned by: | pramsey |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 3.2.3 |
Component: | postgis | Version: | 3.1.x |
Keywords: | Cc: |
Description
I noticed inconsistent behavior with ST_Within across different PostgreSQL-PostGIS versions. To demonstrate the issue, I used postgis/postgis Docker images and a minimal example.
Steps to reproduce the problem
-- Create table "a" having both a 2D and 3D spatial index on its geometry field. CREATE TABLE a (id serial, geom geometry(PointZ, 4326) NOT NULL); CREATE INDEX a_2d_idx ON a USING GIST(geom); CREATE INDEX a_3d_idx ON a USING GIST(geom gist_geometry_ops_nd); -- Create table "b" having both a 2D and 3D spatial index on its geometry field. CREATE TABLE b (id serial, geom geometry(PolygonZ, 4326) NOT NULL); CREATE INDEX b_2d_idx ON b USING GIST (geom); CREATE INDEX b_3d_idx ON b USING GIST (geom gist_geometry_ops_nd); -- Insert a polygon (z=0) into table "b". INSERT INTO b (geom) SELECT ST_Force3D(ST_MakeEnvelope(-1, -1, 1, 1, 4326)); -- Generate 100 pseudo-random points (z=0) which lie within the polygon and insert them into table "a". INSERT INTO a (geom) SELECT ST_Force3D((ST_Dump(ST_GeneratePoints((SELECT geom FROM b), 100))).geom); -- Change the z coordinate of half of the points from 0 into 1. UPDATE a SET geom=ST_SetSRID(ST_MakePoint(ST_x(geom), ST_y(geom), 1), 4326) WHERE MOD(id, 2)=0; -- Ensure up-to-date statistics. VACUUM ANALYZE; -- Finally, run the query below. SELECT COUNT(*) FROM a, b WHERE ST_Within(a.geom, b.geom);
Results
postgis/postgis:11-2.5-alpine: 100 postgis/postgis:11-3.0-alpine: 100 postgis/postgis:11-3.1-alpine: 100 postgis/postgis:12-2.5-alpine: 100 postgis/postgis:12-3.0-alpine: 50 postgis/postgis:12-3.1-alpine: 50 postgis/postgis:13-3.0-alpine: 50 postgis/postgis:13-3.1-alpine: 50 postgis/postgis:14-3.1-alpine: 50
- Clearly, 50 and 100 is the result of a 3D and 2D calculation, respectively. For a user, however, it's unclear what calculation to expect. Documentation on ST_Within does not help.
- Even more confusing is that the result also depends on the indexes present. For example, dropping index a_3d_idx in 14-3.1-alpine, changes the result from 50 into 100. I expect the presence of an index to affect the performance of a query, not its result.
Change History (16)
comment:1 by , 3 years ago
comment:2 by , 3 years ago
Milestone: | PostGIS PostgreSQL → PostGIS 3.0.6 |
---|
comment:3 by , 3 years ago
Imre Samu (docker-postgis contributor) recommended me on the postgis-users mailing list to remove the older docker images from my test cases and to add postgres / postgis / alpine versions:
TAG: 11-2.5-alpine IMAGE ID: 10c7c19a572c CREATED: 4 days ago VERSION: PostgreSQL 11.14 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit POSTGIS_FULL_VERSION: POSTGIS="2.5.5" [EXTENSION] PGSQL="110" GEOS="3.8.1-CAPI-1.13.3" PROJ="Rel. 7.2.1, January 1st, 2021" GDAL="GDAL 3.2.3, released 2021/04/27" LIBXML="2.9.12" LIBJSON="0.15" LIBPROTOBUF="1.4.0" TOPOLOGY RASTER OS VERSION_ID: 3.15.0 COUNT(*) BEFORE DROPPING INDEX a_3d_idx: 100 COUNT(*) AFTER DROPPING INDEX a_3d_idx: 100 TAG: 11-3.2-alpine IMAGES ID: 0303403af49f CREATED: 4 days ago VERSION: PostgreSQL 11.14 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit POSTGIS_FULL_VERSION: POSTGIS="3.2.0 0" [EXTENSION] PGSQL="110" GEOS="3.10.2-CAPI-1.16.0" PROJ="8.2.0" LIBXML="2.9.12" LIBJSON="0.15" LIBPROTOBUF="1.4.0" WAGYU="0.5.0 (Internal)" TOPOLOGY OS VERSION_ID: 3.15.0 COUNT(*) BEFORE DROPPING INDEX a_3d_idx: 100 COUNT(*) AFTER DROPPING INDEX a_3d_idx: 100 TAG: 12-3.2-alpine IMAGE ID: 964255f0cf36 CREATED: 4 days ago VERSION: PostgreSQL 12.9 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit POSTGIS_FULL_VERSION: POSTGIS="3.2.0 0" [EXTENSION] PGSQL="120" GEOS="3.10.2-CAPI-1.16.0" PROJ="8.2.0" LIBXML="2.9.12" LIBJSON="0.15" LIBPROTOBUF="1.4.0" WAGYU="0.5.0 (Internal)" TOPOLOGY OS VERSION_ID: 3.15.0 COUNT(*) BEFORE DROPPING INDEX a_3d_idx: 50 COUNT(*) AFTER DROPPING INDEX a_3d_idx: 100 TAG: 13-3.2-alpine IMAGE ID: 5136446fc2a7 CREATED: 4 days ago VERSION: PostgreSQL 13.5 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit POSTGIS_FULL_VERSION: POSTGIS="3.2.0 0" [EXTENSION] PGSQL="130" GEOS="3.10.2-CAPI-1.16.0" PROJ="8.2.0" LIBXML="2.9.12" LIBJSON="0.15" LIBPROTOBUF="1.4.0" WAGYU="0.5.0 (Internal)" TOPOLOGY OS VERSION_ID: 3.15.0 COUNT(*) BEFORE DROPPING INDEX a_3d_idx: 50 COUNT(*) AFTER DROPPING INDEX a_3d_idx: 100 TAG: 14-3.2-alpine IMAGE ID: 171454c0721a CREATED: 4 days ago VERSION: PostgreSQL 14.1 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit POSTGIS_FULL_VERSION: POSTGIS="3.2.0 0" [EXTENSION] PGSQL="140" GEOS="3.10.2-CAPI-1.16.0" PROJ="8.2.0" LIBXML="2.9.12" LIBJSON="0.15" LIBPROTOBUF="1.4.0" WAGYU="0.5.0 (Internal)" TOPOLOGY OS VERSION_ID: 3.15.0 COUNT(*) BEFORE DROPPING INDEX a_3d_idx: 50 COUNT(*) AFTER DROPPING INDEX a_3d_idx: 100
comment:4 by , 2 years ago
For POSTGIS="3.2.1 3.2.1" [EXTENSION] PGSQL="140" GEOS="3.10.2-CAPI-1.16.0" PROJ="7.2.1" GDAL="GDAL 3.4.2, released 2022/03/08" LIBXML="2.9.9" LIBJSON="0.12" LIBPROTOBUF="1.2.1" WAGYU="0.5.0 (Internal)" RASTER PostgreSQL 14.1, compiled by Visual C++ build 1914, 64-bit
My answers agree with yours, 50 with an index and 100 if I drop the a_3d_idx. I also get 100 if I keep the a_3d_idx and drop the b_2d_idx.
For
POSTGIS="2.5.7dev r17978" [EXTENSION] PGSQL="120" GEOS="3.8.3-CAPI-1.13.4" PROJ="Rel. 7.2.1, January 1st, 2021" GDAL="GDAL 3.3.3, released 2021/10/25" LIBXML="2.7.8" LIBJSON="0.12" LIBPROTOBUF="1.2.1" RASTER PostgreSQL 12.2, compiled by Visual C++ build 1914, 64-bit
I am getting 100 for both cases.
I suspect this has to do with the instrumentation use for PostgreSQL 12 and above introduced in PostGIS 3.0.
comment:5 by , 2 years ago
Milestone: | PostGIS 3.0.6 → PostGIS 3.0.7 |
---|
comment:6 by , 2 years ago
Dispensing with all the index creation above simplifies things a little and just on my latest pg14/postgis3.2 installation we can at least get a workable issue that doesn't involve juggling docker images.
pramsey=# explain SELECT COUNT(*) FROM a, b WHERE ST_Within(a.geom, b.geom); QUERY PLAN ---------------------------------------------------------------- Aggregate (cost=2505.26..2505.27 rows=1 width=8) -> Nested Loop (cost=0.00..2505.01 rows=100 width=0) Join Filter: st_within(a.geom, b.geom) -> Seq Scan on b (cost=0.00..1.01 rows=1 width=168) -> Seq Scan on a (cost=0.00..3.00 rows=100 width=40) (5 rows) pramsey=# CREATE INDEX a_3d_idx ON a USING GIST(geom gist_geometry_ops_nd); CREATE INDEX pramsey=# explain SELECT COUNT(*) FROM a, b WHERE ST_Within(a.geom, b.geom); QUERY PLAN -------------------------------------------------------------------------------- Aggregate (cost=34.42..34.43 rows=1 width=8) -> Nested Loop (cost=0.14..34.17 rows=100 width=0) -> Seq Scan on b (cost=0.00..1.01 rows=1 width=168) -> Index Scan using a_3d_idx on a (cost=0.14..33.16 rows=1 width=40) Index Cond: (geom @@ b.geom) Filter: st_within(geom, b.geom) (6 rows)
There was some changes to the way 3D indexes worked in the last few revisions, from komzpa, I cannot recall when though, or even why. The question of "is something within in 3d" is a little open, particularly if the two operands are of different dimensionality.
comment:7 by , 2 years ago
The operator and functions clearly disagree.
WITH g AS ( SELECT 'POLYGON Z ((-1 -1 0,-1 1 0,1 1 0,1 -1 0,-1 -1 0))'::geometry AS ply0, 'POINT Z (0 0 1)'::geometry AS pt1, 'POINT Z (0 0 0)'::geometry AS pt0 ) SELECT ST_Contains(ply0, pt0) AS contains_00, ST_Within(pt0, ply0) AS within_00, ST_Contains(ply0, pt1) AS contains_01, ST_Within(pt1, ply0) AS within_10, pt0 @@ ply0 AS op_within_00, pt1 @@ ply0 AS op_within_10, ply0 @@ pt1 AS op_within_01 FROM g;
comment:8 by , 2 years ago
Ran this through the debugger, and it's definitely the support function, and also definitely really hard to fix. Higher up the line, PostgreSQL is (somehow) identifying the index of interest and sending it into the support function, and then the support function chooses an appropriate operator to append to the query based on that. However, when the index is a gist_geometry_ops_nd the operator chosen is the 3d operator (@@) which has strict 3d semantics. Meanwhile the function ST_Within has 2d semantics. And thus we get the "add an index, get fewer things" behaviour.
The support function is trying to be really general, so it doesn't care what kind of index it gets and generally that works, in that it can add index clauses for SPGIST, BRIN and GIST indexes without caring much what kind of opfamily is coming in. Recognizing that ST_Within requires a 2d operator is a little hard...
comment:14 by , 2 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:16 by , 2 years ago
Milestone: | PostGIS 3.0.7 → PostGIS 3.2.3 |
---|
@pramsey, I'm switching this to 3.2.3 since I assume you are not planning to back port to 3.1 or 3.0
PostGIS 3.2 has been released. I have updated my results accordingly.