#4139 closed defect (fixed)
Unstable behavior of gist nd indexes
Reported by: | ezimanyi | Owned by: | pramsey |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 2.4.6 |
Component: | postgis | Version: | 2.4.x |
Keywords: | Cc: | me@…, pramsey@… |
Description
I have created a table containing random geometries of mixed dimensions
select count(*) from tbl_geomcollection_nd -- 2400 select distinct substring(st_astext(g) for position('(' in st_astext(g)) - 1) from tbl_geomcollection_nd order by 1 "LINESTRING" "LINESTRING M " "LINESTRING Z " "LINESTRING ZM " "MULTILINESTRING" "MULTILINESTRING M " "MULTILINESTRING Z " "MULTILINESTRING ZM " "MULTIPOINT" "MULTIPOINT M " "MULTIPOINT Z " "MULTIPOINT ZM " "MULTIPOLYGON" "MULTIPOLYGON M " "MULTIPOLYGON Z " "MULTIPOLYGON ZM " "POINT" "POINT M " "POINT Z " "POINT ZM " "POLYGON" "POLYGON M " "POLYGON Z " "POLYGON ZM "
Then I create an n-dimensional index with the gist_geometry_ops_nd
operator class
drop index if exists tbl_geomcollection_nd_gist_nd_idx; create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd);
However, the index shows an unstable behavior obtaining different number of answers each time I drop/create a new index.
set enable_indexscan = off; set enable_bitmapscan = off; set enable_seqscan = on; select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g -- 4470359 set enable_indexscan = on; set enable_bitmapscan = off; set enable_seqscan = off; select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g -- 4470258 -- 4470594 -- 4470658 -- 4470303 -- 4469945
The 5 numbers above are five different answers to the same query after droping/creating the index as done in the two lines above.
I can make available the file containing the test data I used if needed.
Attachments (2)
Change History (14)
comment:1 by , 6 years ago
by , 6 years ago
Attachment: | random_nd.sql added |
---|
SQL scripts for generating random geometries of mixed dimensions (2D, 3Z, 3M, 4D)
by , 6 years ago
Attachment: | create_test_tables_nd.sql added |
---|
SQL scripts for creating tables with random geometries of mixed dimensions (2D, 3Z, 3M, 4D)
comment:2 by , 6 years ago
I have made available the SQL scripts for generating the tables.
The problem does not appear when creating the table tbl_geomcollection_nd by setting limit to 10 or 20. It started to appear when limit is set to 30
drop table if exists tbl_geomcollection_nd; create table tbl_geomcollection_nd ( k serial, g geometry ); insert into tbl_geomcollection_nd (g) (select g from tbl_geompoint limit 30) union (select g from tbl_geompointz limit 30) union (select g from tbl_geompointm limit 30) union (select g from tbl_geompointzm limit 30) union (select g from tbl_geomlinestring limit 30) union (select g from tbl_geomlinestringz limit 30) union (select g from tbl_geomlinestringm limit 30) union (select g from tbl_geomlinestringzm limit 30) union (select g from tbl_geompolygon limit 30) union (select g from tbl_geompolygonz limit 30) union (select g from tbl_geompolygonm limit 30) union (select g from tbl_geompolygonzm limit 30) union (select g from tbl_geommultipoint limit 30) union (select g from tbl_geommultipointz limit 30) union (select g from tbl_geommultipointm limit 30) union (select g from tbl_geommultipointzm limit 30) union (select g from tbl_geommultilinestring limit 30) union (select g from tbl_geommultilinestringz limit 30) union (select g from tbl_geommultilinestringm limit 30) union (select g from tbl_geommultilinestringzm limit 30) union (select g from tbl_geommultipolygon limit 30) union (select g from tbl_geommultipolygonz limit 30) union (select g from tbl_geommultipolygonm limit 30) union (select g from tbl_geommultipolygonzm limit 30) ; select count(*) from tbl_geomcollection_nd -- 720 set enable_indexscan = off; set enable_bitmapscan = off; set enable_seqscan = on; select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g -- 396140 drop index if exists tbl_geomcollection_nd_gist_nd_idx; create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd); set enable_indexscan = on; set enable_bitmapscan = off; set enable_seqscan = off; select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g -- 396111 -- 396140 -- 396119 -- 396072 -- 396129
comment:3 by , 6 years ago
As an aside comment, shouldn't the function geometrytype report both the Z and M dimensions instead of being forced to write an elaborate query such as the following one for obtaining the type ?
select distinct substring(st_astext(g) for position('(' in st_astext(g)) - 1) from tbl_geomcollection_nd order by 1 "LINESTRING" "LINESTRING M " "LINESTRING Z " "LINESTRING ZM " "MULTILINESTRING" "MULTILINESTRING M " "MULTILINESTRING Z " "MULTILINESTRING ZM " "MULTIPOINT" "MULTIPOINT M " "MULTIPOINT Z " "MULTIPOINT ZM " "MULTIPOLYGON" "MULTIPOLYGON M " "MULTIPOLYGON Z " "MULTIPOLYGON ZM " "POINT" "POINT M " "POINT Z " "POINT ZM " "POLYGON" "POLYGON M " "POLYGON Z " "POLYGON ZM " select distinct geometrytype(g) from tbl_geomcollection_nd order by 1 "LINESTRING" "LINESTRINGM" "MULTILINESTRING" "MULTILINESTRINGM" "MULTIPOINT" "MULTIPOINTM" "MULTIPOLYGON" "MULTIPOLYGONM" "POINT" "POINTM" "POLYGON" "POLYGONM"
comment:4 by , 6 years ago
You are expected to use ST_HasM / ST_HasZ to get the type. "LINESRTINGM" is only required as it reinterprets third coordinate as M instead of Z.
LINESTRING has X Y [Z [M]].
LINESTRINGM has X Y [M].
comment:5 by , 6 years ago
I'm not able to reproduce it with trunk:
index_test=# select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g; count --------- 4451734 (1 row) index_test=# select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g; count --------- 4451734 (1 row) index_test=# EXPLAIN ANALYZE select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------- Aggregate (cost=10000001376.15..10000001376.16 rows=1 width=8) (actual time=1889.556..1889.556 rows=1 loops=1) -> Nested Loop (cost=10000000000.15..10000001364.00 rows=4860 width=0) (actual time=0.085..1705.876 rows=4451734 loops=1) -> Seq Scan on tbl_geomcollection_nd t1 (cost=10000000000.00..10000000186.00 rows=2400 width=596) (actual time=0.014..0.338 rows =2400 loops=1) -> Index Scan using tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd t2 (cost=0.14..0.48 rows=1 width=596) (actual tim e=0.019..0.603 rows=1855 loops=2400) Index Cond: (t1.g &&& g) Planning Time: 0.424 ms Execution Time: 1889.612 ms (7 rows) index_test=#
Can you report what Postgis and Postgresql versions are you using? Also an `EXPLAIN ANALYZE` of the query.
comment:6 by , 6 years ago
I downloaded and installed yesterday the last bundle available here http://download.osgeo.org/postgis/windows/pg10/
SELECT version() || ' ' || postgis_full_version(); -- "PostgreSQL 10.1, compiled by Visual C++ build 1800, 64-bit POSTGIS="2.4.4 r16526" PGSQL="100" GEOS="3.6.2-CAPI-1.10.2 4d2925d" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 2.2.4, released 2018/03/19" LIBXML="2.7.8" LIBJSON="0.12" LIBPROTOBUF="1.2.1" RASTER" drop table if exists tbl_geomcollection_nd; create table tbl_geomcollection_nd ( k serial, g geometry ); insert into tbl_geomcollection_nd (g) (select g from tbl_geompoint limit 30) union (select g from tbl_geompointz limit 30) union (select g from tbl_geompointm limit 30) union (select g from tbl_geompointzm limit 30) union (select g from tbl_geomlinestring limit 30) union (select g from tbl_geomlinestringz limit 30) union (select g from tbl_geomlinestringm limit 30) union (select g from tbl_geomlinestringzm limit 30) union (select g from tbl_geompolygon limit 30) union (select g from tbl_geompolygonz limit 30) union (select g from tbl_geompolygonm limit 30) union (select g from tbl_geompolygonzm limit 30) union (select g from tbl_geommultipoint limit 30) union (select g from tbl_geommultipointz limit 30) union (select g from tbl_geommultipointm limit 30) union (select g from tbl_geommultipointzm limit 30) union (select g from tbl_geommultilinestring limit 30) union (select g from tbl_geommultilinestringz limit 30) union (select g from tbl_geommultilinestringm limit 30) union (select g from tbl_geommultilinestringzm limit 30) union (select g from tbl_geommultipolygon limit 30) union (select g from tbl_geommultipolygonz limit 30) union (select g from tbl_geommultipolygonm limit 30) union (select g from tbl_geommultipolygonzm limit 30) ; -- Query returned successfully: 720 rows affected, 662 msec execution time. drop index if exists tbl_geomcollection_nd_gist_nd_idx; create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd); set enable_indexscan = off; set enable_bitmapscan = off; set enable_seqscan = on; select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g -- 396140 explain analyze select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g "Aggregate (cost=7897.56..7897.57 rows=1 width=8) (actual time=1684.293..1684.293 rows=1 loops=1)" " -> Nested Loop (cost=0.00..7892.20 rows=2145 width=0) (actual time=0.078..1609.226 rows=396140 loops=1)" " Join Filter: (t1.g &&& t2.g)" " Rows Removed by Join Filter: 122260" " -> Seq Scan on tbl_geomcollection_nd t1 (cost=0.00..57.20 rows=720 width=625) (actual time=0.029..0.697 rows=720 loops=1)" " -> Materialize (cost=0.00..60.80 rows=720 width=625) (actual time=0.000..0.098 rows=720 loops=720)" " -> Seq Scan on tbl_geomcollection_nd t2 (cost=0.00..57.20 rows=720 width=625) (actual time=0.016..0.354 rows=720 loops=1)" "Planning time: 0.403 ms" "Execution time: 1684.731 ms" set enable_indexscan = on; set enable_bitmapscan = off; set enable_seqscan = off; select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g -- 396100 explain analyze select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g "Aggregate (cost=10000000420.96..10000000420.97 rows=1 width=8) (actual time=737.204..737.205 rows=1 loops=1)" " -> Nested Loop (cost=10000000000.14..10000000415.60 rows=2145 width=0) (actual time=0.329..657.469 rows=396100 loops=1)" " -> Seq Scan on tbl_geomcollection_nd t1 (cost=10000000000.00..10000000057.20 rows=720 width=625) (actual time=0.034..0.339 rows=720 loops=1)" " -> Index Scan using tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd t2 (cost=0.14..0.49 rows=1 width=625) (actual time=0.107..0.796 rows=550 loops=720)" " Index Cond: (t1.g &&& g)" "Planning time: 1.061 ms" "Execution time: 737.288 ms" drop index if exists tbl_geomcollection_nd_gist_nd_idx; create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd); select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g -- 396140 drop index if exists tbl_geomcollection_nd_gist_nd_idx; create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd); select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g -- 396124 drop index if exists tbl_geomcollection_nd_gist_nd_idx; create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd); select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g -- 396128
In our development machine I needed to increase the size of the tables to reproduce the problem
SELECT version() || ' ' || postgis_full_version(); "PostgreSQL 11beta1 on x86_64-pc-linux-gnu, compiled by clang version 5.0.0 (tags/RELEASE_500/final), 64-bit POSTGIS="2.5.0beta1dev r16611" [EXTENSION] PGSQL="110" GEOS="3.6.2-CAPI-1.10.2 4d2925d6" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 2.2.3, release (...)" drop table if exists tbl_geomcollection_nd; create table tbl_geomcollection_nd ( k serial, g geometry ); insert into tbl_geomcollection_nd (g) (select g from tbl_geompoint limit 100) union (select g from tbl_geompointz limit 100) union (select g from tbl_geompointm limit 100) union (select g from tbl_geompointzm limit 100) union (select g from tbl_geomlinestring limit 100) union (select g from tbl_geomlinestringz limit 100) union (select g from tbl_geomlinestringm limit 100) union (select g from tbl_geomlinestringzm limit 100) union (select g from tbl_geompolygon limit 100) union (select g from tbl_geompolygonz limit 100) union (select g from tbl_geompolygonm limit 100) union (select g from tbl_geompolygonzm limit 100) union (select g from tbl_geommultipoint limit 100) union (select g from tbl_geommultipointz limit 100) union (select g from tbl_geommultipointm limit 100) union (select g from tbl_geommultipointzm limit 100) union (select g from tbl_geommultilinestring limit 100) union (select g from tbl_geommultilinestringz limit 100) union (select g from tbl_geommultilinestringm limit 100) union (select g from tbl_geommultilinestringzm limit 100) union (select g from tbl_geommultipolygon limit 100) union (select g from tbl_geommultipolygonz limit 100) union (select g from tbl_geommultipolygonm limit 100) union (select g from tbl_geommultipolygonzm limit 100) ; -- Query returned successfully: 2400 rows affected, 94 msec execution time. drop index if exists tbl_geomcollection_nd_gist_nd_idx; create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd); set enable_indexscan = off; set enable_bitmapscan = off; set enable_seqscan = on; select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g -- 4455622 explain analyze select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g "Aggregate (cost=86804.40..86804.41 rows=1 width=8) (actual time=10509.189..10509.189 rows=1 loops=1)" " -> Nested Loop (cost=0.00..86790.00 rows=5760 width=0) (actual time=0.019..10117.923 rows=4455622 loops=1)" " Join Filter: (t1.g &&& t2.g)" " Rows Removed by Join Filter: 1304378" " -> Seq Scan on tbl_geomcollection_nd t1 (cost=0.00..192.00 rows=2400 width=32) (actual time=0.009..0.987 rows=2400 loops=1)" " -> Materialize (cost=0.00..204.00 rows=2400 width=32) (actual time=0.000..0.136 rows=2400 loops=2400)" " -> Seq Scan on tbl_geomcollection_nd t2 (cost=0.00..192.00 rows=2400 width=32) (actual time=0.002..0.742 rows=2400 loops=1)" "Planning Time: 0.096 ms" "Execution Time: 10509.285 ms" set enable_indexscan = on; set enable_bitmapscan = off; set enable_seqscan = off; select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g -- 4455551 explain analyze select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g "Aggregate (cost=10000001408.47..10000001408.48 rows=1 width=8) (actual time=3889.217..3889.217 rows=1 loops=1)" " -> Nested Loop (cost=10000000000.15..10000001394.00 rows=5788 width=0) (actual time=0.082..3470.596 rows=4455551 loops=1)" " -> Seq Scan on tbl_geomcollection_nd t1 (cost=10000000000.00..10000000192.00 rows=2400 width=626) (actual time=0.018..0.427 rows=2400 loops=1)" " -> Index Scan using tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd t2 (cost=0.14..0.49 rows=1 width=626) (actual time=0.045..1.265 rows=1856 loops=2400)" " Index Cond: (t1.g &&& g)" "Planning Time: 0.268 ms" "Execution Time: 3889.255 ms" drop index if exists tbl_geomcollection_nd_gist_nd_idx; create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd); select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g -- 4455562 drop index if exists tbl_geomcollection_nd_gist_nd_idx; create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd); select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g -- 4455504 drop index if exists tbl_geomcollection_nd_gist_nd_idx; create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd); select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g -- 4455537
comment:7 by , 6 years ago
Milestone: | PostGIS 2.4.5 → PostGIS 2.4.6 |
---|
comment:8 by , 6 years ago
I'm experiencing same unstable behaviour for st_coveredby function on 2D GIST indexed geometry column. Seems like 2D is also affected.
Please share the file. Is the query minimizable? Can you limit the table to only the differing geometries and still catch the issue?