Opened 4 years ago
Closed 3 years ago
#4711 closed defect (fixed)
ST_Union loses precision on complex multilinestring geometries
Reported by: | dannytoone | Owned by: | mdavis |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS GEOS |
Component: | postgis | Version: | 3.0.x |
Keywords: | overlay | Cc: | dannytoone |
Description
Version info:
PostgreSQL 12.3, compiled by Visual C++ build 1914, 64-bit POSTGIS="3.0.1 3.0.1" [EXTENSION] PGSQL="120" GEOS="3.8.0-CAPI-1.13.1 " PROJ="Rel. 5.2.0, September 15th, 2018" LIBXML="2.9.9" LIBJSON="0.12" LIBPROTOBUF="1.2.1" WAGYU="0.4.3 (Internal)"
I have a process where I am trying to analyze FCC license spectrum geographic boundaries across multiple frequency ranges. I am trying to follow a process very similar to this blog post:
http://blog.cleverelephant.ca/2019/07/postgis-overlays.html
The license boundaries, when viewed across several overlapping licenses that are broadcasting on different frequencies, is quite complex. When unioning the various linestrings via ST_Union, I get very odd results. I have found that somehow ST_Union is losing precision on the underlying datapoints.
create temp table all_rings AS select distinct ST_ExteriorRing((ST_Dump(geom)).geom) as geom from call_sign_polys where geom && ST_MakeEnvelope(-79.762152,40.496103,-71.856214,45.01585) ;
The underlying rows have several decimal places of precision.
select left(ST_AsText(geom),100) from all_rings; ======== LINESTRING(-74.911677 39.463256,-74.903864 39.457263,-74.897414 39.452315,-74.893314 39.449815,-74.8 LINESTRING(-79.293682 40.040413,-79.29359 40.040398,-79.293681 40.040272,-79.294509 40.039114,-79.29 LINESTRING(-79.2999533093807 40.4383075755913,-79.315177512369 40.3395369154545,-79.3552093057337 40 LINESTRING(-79.2999533093807 40.4383075755913,-79.315177512369 40.3395369154545,-79.3552093057337 40
However, all of that precision is lost when ST_Union is called:
select left(ST_AsText(ST_Union(geom)),200) from all_rings; ================== MULTILINESTRING((-75 39,-76 39,-76 40),(-76 40,-75 40),(-75 40,-75 41),(-75 40,-74 40),(-75 40,-75 39),(-79 40,-80 40),(-80 40,-81 40),(-81 40,-81 41),(-81 41,-80 41),(-80 41,-79 41),(-79 41,-79 40),(
Attachments (2)
Change History (15)
by , 4 years ago
Attachment: | Annotation 2020-06-29 091307.png added |
---|
comment:1 by , 4 years ago
Since this is an operation which relies on complex data, it is difficult to get the underlying data to you. I can try to work on a more generic generated dataset which can reproduce the condition, which I will add when I have found it.
Until then, please let me know of any other information that you feel would be valuable for understanding this.
comment:2 by , 4 years ago
I believe I have a reproducible example using generated data. Please let me know if you have any problems with running or generating the dataset.
Unfortunately, the geometries are not simple, and simple geometries did not reproduce this bug, so creating a dataset that reproduces the bug is not simple.
For background, the data I'm working on, that I'm trying to emulate with this generated data set, is roughly analogous to the license geometries in the MDS/ITFS frequency bands (2.5-2.7ghz), as regulated by the FCC. They do not overlap when viewing the same frequencies, but across frequencies they may overlap. The following temp tables set up a data structure that is somewhat similar to these license boundaries.
DROP TABLE IF EXISTS areas; CREATE TEMP TABLE areas AS WITH centerpoints AS ( SELECT row_number() over () as id, floor(random() * 10 + 1) as lic_group, ST_SetSRID(ST_MakePoint(random() * 10 + 1, random() * 10 + 1),4326) as centerpoint FROM generate_series(1,10000) ) SELECT id, lic_group, centerpoint, ST_Buffer(centerpoint::geography,10000)::geometry as geom FROM centerpoints ; CREATE INDEX idx_geom ON areas USING GIST (geom); CREATE INDEX idx_group ON areas (lic_group); DROP TABLE IF EXISTS area_exclusions; CREATE TEMP TABLE area_exclusions AS WITH split_lines AS ( SELECT a.id, b.id as b_id, a.lic_group, a.centerpoint, a.geom as a_geom, b.geom as b_geom, ST_LineFromMultiPoint(ST_Intersection(ST_ExteriorRing(a.geom),ST_ExteriorRing(b.geom))) as split_line FROM areas a INNER JOIN areas b ON ST_Intersects(a.geom,b.geom) AND a.lic_group = b.lic_group AND a.id != b.id ), split_areas AS ( SELECT id, b_id, lic_group, centerpoint, ST_Split(ST_Union(a_geom,b_geom),split_line) as split_geom FROM split_lines ) SELECT id, b_id, lic_group, CASE ST_ContainsProperly(ST_GeometryN(split_geom,1),centerpoint) WHEN false THEN ST_GeometryN(split_geom,1) WHEN true THEN ST_GeometryN(split_geom,2) END as geom FROM split_areas ; CREATE INDEX idx_e_id ON area_exclusions (id,lic_group); DROP TABLE IF EXISTS processed_areas; CREATE TEMP TABLE processed_areas AS WITH exclusions AS ( SELECT id, lic_group, ST_Union(geom) as geom FROM area_exclusions GROUP BY id, lic_group ) SELECT id, lic_group, ST_Difference(i.geom,COALESCE(e.geom,ST_GeomFromEWKT('SRID=4326; POLYGON EMPTY'))) as geom FROM areas i LEFT JOIN exclusions e USING (id,lic_group) ;
If I am to query a single lic_group
I can see that they do not have overlapping geometries:
SELECT geom FROM processed_areas WHERE lic_group = 1;
However, across several lic_group
s, there is plenty of overlap:
SELECT geom FROM processed_areas;
.
=======================
Okay, that is the underlying data. Now onto the actual bug. This table is merely the processed geometries as an exterior ring linestring.
DROP TABLE IF EXISTS rings; CREATE TEMP TABLE rings AS SELECT id, lic_group, ST_ExteriorRing(geom) as geom FROM processed_areas ;
Viewing the rings themselves, we can see there is plenty of precision in the underlying points contributing to the linestrings:
SELECT Left(ST_AsText(geom),200) FROM rings; ===================== LINESTRING(10.8643881174031 3.77734188849076,10.8626209909011 3.75970533281878,10.8574629500733 3.74275385983207,10.8491123276953 3.72713876384977,10.8378900600966 3.71345998467201,10.8242273509016 3. LINESTRING(7.57876514569725 2.81432363450241,7.57705816723587 2.7966769867886,7.5719601905669 2.77970431048747,7.56366729736813 2.76405788049345,7.55249832730074 2.75033896339531,7.53888260377069 2.73 LINESTRING(8.47788820510008 6.81387172768232,8.46467044867379 6.8125559523933,8.44701315022753 6.81427415953191,8.43003023487508 6.81940188537274,8.41437401669197 6.82774214951577,8.40064591887797 6.8 ...
However, once unioned, that precision is lost:
SELECT Left(ST_AsText(ST_Union(geom)),200) FROM rings; ================== MULTILINESTRING((8 3,7 3),(8 7,9 7),(2 10,2 9),(10 4,9 4),(4 8,3 8),(2 1,2 2),(5 10,5 11),(9 11,8 11),(9 9,8 9),(3 9,2 9),(11 1,10 1),(10 1,11 2),(11 2,11 1),(3 5,3 4),(5 2,5 1),(10 6,9 6),(9 10,8 10)...
Again, please let me know if this example works for you to be able to reproduce.
comment:3 by , 4 years ago
Cc: | added |
---|---|
Version: | 2.5.x → 3.0.x |
comment:4 by , 4 years ago
It reproduces.
You've managed to produce a data set that has a lot of cases in it where the line intersection routines produce numerical precision problems (leading to the dreaded "TopologyException" error).
In an attempt to recover from TopologyExceptions, GEOS has fallback code that tries to perturb the geometry in hopes of side-stepping the precision issues. That code includes a routine that progressively strips precision from the inputs. This is almost 100% for sure the problem. Amazingly, it does eventually succeed in getting to an answer without an exception. Unfortunately, that answer is garbage.
The alternative, unfortunately, would be turning off the fallbacks and having the calculation simply stop when it hits the exception.
My colleague Martin Davis and I are working on bringing a more robust algorithm for overlay (union/difference/intersection) from JTS into GEOS, I'll point him to your data so he can ensure that the new algorithm does in fact correctly handle it. If we are fortunate it will be in GEOS 3.9 in the fall.
comment:5 by , 4 years ago
To follow on from what Paul has said, I tried the union using the new Overlay algorithm in JTS, and it works perfectly (and is pretty fast too).
The new Overlay uses a more effective snapping approach to improve robustness. It doesn't require the aggressive precision reduction that PostGIS/GEOS is using now. So we're very hopeful this will solve most or all of these kinds of issues.
As a side note, your synthetic data generation process seems like a great way to produce stress-testing datasets for overlay. This is actually tricky to do with synthetic data, so well done!
comment:6 by , 4 years ago
A further note on the cause of the overlay errors. It's not the overlap of the lic_group
s that causes the problem, it's the geometry in each group itself. The group polygons are made non-overlapping by splitting them along the lines that split overlapping circles (which is a clever technique, by the way). However, the splitting and differencing involved introduces some numeric "jitter". The result is polygons which contain nearly-coincident lines along their boundaries. This is a classic cause of overlay robustness issues. And it's something that snapping takes care of very effectively.
follow-ups: 9 10 comment:7 by , 4 years ago
Thanks pramsey and mdavis. I've crossposted this into the GEOS bug tracker:
https://trac.osgeo.org/geos/ticket/1034
Feel free to reclassify as necessary, as it is primarily to have a reference to the dataset.
I wish I could take credit for the circle splitting technique, but it is just an implementation of the FCC's required rules for splitting p35 license boundaries when transmitting on the same frequencies as a neighboring licensee in the MDS/ITFS spectrum. So in my best snarky tone of voice, all thanks go to the FCC!
On the subject of snapping, how would you go about doing that? Snapping everything to a grid before unioning seems to work to some degree, but some of the geometries still end up jagged, and some still end up squared.
comment:8 by , 4 years ago
Milestone: | PostGIS 3.1.0 → PostGIS GEOS |
---|
comment:9 by , 4 years ago
Replying to dannytoone:
Thanks pramsey and mdavis. I've crossposted this into the GEOS bug tracker:
Great, thanks.
comment:10 by , 4 years ago
Replying to dannytoone:
On the subject of snapping, how would you go about doing that? Snapping everything to a grid before unioning seems to work to some degree, but some of the geometries still end up jagged, and some still end up squared.
The snapping mentioned above is a new technique developed for the forthcoming OverlayNG. The algorithm is surprisingly simple: (i) scan all vertices of the input geometries and snap each one to previously scanned ones; (ii) scan all segments and snap them at intersections and to nearby anchor points. This is liable to introduce topology collapses, but these are removed by the overlay algorithm when it reforms the topology.
This is a heuristic algorithm - i.e. the snapping can produce situations with topology so invalid that it causes the overlay to fail. But these are very rare in practice, and so far can all be handled by just snapping more aggresively.
comment:11 by , 4 years ago
Keywords: | overlay added |
---|---|
Owner: | changed from | to
Status: | new → assigned |
comment:12 by , 4 years ago
Based on the GEOS ticket it looks like this should be fixed by OverlayNG. So closing.
comment:13 by , 3 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Before ST_Union call