Opened 4 years ago

Closed 3 years ago

#4711 closed defect (fixed)

ST_Union loses precision on complex multilinestring geometries

Reported by: dannytoone Owned by: mdavis
Priority: medium Milestone: PostGIS GEOS
Component: postgis Version: 3.0.x
Keywords: overlay Cc: dannytoone

Description

Version info:

PostgreSQL 12.3, compiled by Visual C++ build 1914, 64-bit POSTGIS="3.0.1 3.0.1" [EXTENSION] PGSQL="120" GEOS="3.8.0-CAPI-1.13.1 " PROJ="Rel. 5.2.0, September 15th, 2018" LIBXML="2.9.9" LIBJSON="0.12" LIBPROTOBUF="1.2.1" WAGYU="0.4.3 (Internal)"

I have a process where I am trying to analyze FCC license spectrum geographic boundaries across multiple frequency ranges. I am trying to follow a process very similar to this blog post:

http://blog.cleverelephant.ca/2019/07/postgis-overlays.html

The license boundaries, when viewed across several overlapping licenses that are broadcasting on different frequencies, is quite complex. When unioning the various linestrings via ST_Union, I get very odd results. I have found that somehow ST_Union is losing precision on the underlying datapoints.

create temp table all_rings AS
select distinct ST_ExteriorRing((ST_Dump(geom)).geom) as geom
from call_sign_polys
where geom && ST_MakeEnvelope(-79.762152,40.496103,-71.856214,45.01585)
;

The underlying rows have several decimal places of precision.

select left(ST_AsText(geom),100)
from all_rings;
========
LINESTRING(-74.911677 39.463256,-74.903864 39.457263,-74.897414 39.452315,-74.893314 39.449815,-74.8
LINESTRING(-79.293682 40.040413,-79.29359 40.040398,-79.293681 40.040272,-79.294509 40.039114,-79.29
LINESTRING(-79.2999533093807 40.4383075755913,-79.315177512369 40.3395369154545,-79.3552093057337 40
LINESTRING(-79.2999533093807 40.4383075755913,-79.315177512369 40.3395369154545,-79.3552093057337 40

However, all of that precision is lost when ST_Union is called:

select left(ST_AsText(ST_Union(geom)),200)
from all_rings;
==================
MULTILINESTRING((-75 39,-76 39,-76 40),(-76 40,-75 40),(-75 40,-75 41),(-75 40,-74 40),(-75 40,-75 39),(-79 40,-80 40),(-80 40,-81 40),(-81 40,-81 41),(-81 41,-80 41),(-80 41,-79 41),(-79 41,-79 40),(

Attachments (2)

Annotation 2020-06-29 091307.png (988.8 KB ) - added by dannytoone 4 years ago.
Before ST_Union call
Annotation 2020-06-29 091352.png (452.5 KB ) - added by dannytoone 4 years ago.
After ST_Union call

Download all attachments as: .zip

Change History (15)

by dannytoone, 4 years ago

Before ST_Union call

by dannytoone, 4 years ago

After ST_Union call

comment:1 by dannytoone, 4 years ago

Since this is an operation which relies on complex data, it is difficult to get the underlying data to you. I can try to work on a more generic generated dataset which can reproduce the condition, which I will add when I have found it.

Until then, please let me know of any other information that you feel would be valuable for understanding this.

comment:2 by dannytoone, 4 years ago

I believe I have a reproducible example using generated data. Please let me know if you have any problems with running or generating the dataset.

Unfortunately, the geometries are not simple, and simple geometries did not reproduce this bug, so creating a dataset that reproduces the bug is not simple.

For background, the data I'm working on, that I'm trying to emulate with this generated data set, is roughly analogous to the license geometries in the MDS/ITFS frequency bands (2.5-2.7ghz), as regulated by the FCC. They do not overlap when viewing the same frequencies, but across frequencies they may overlap. The following temp tables set up a data structure that is somewhat similar to these license boundaries.

DROP TABLE IF EXISTS areas;
CREATE TEMP TABLE areas AS 
WITH centerpoints AS (
    SELECT 
        row_number() over () as id, 
        floor(random() * 10 + 1) as lic_group,
        ST_SetSRID(ST_MakePoint(random() * 10 + 1, random() * 10 + 1),4326) as centerpoint
    FROM generate_series(1,10000)
)
SELECT
    id,
    lic_group,
    centerpoint,
    ST_Buffer(centerpoint::geography,10000)::geometry as geom
FROM centerpoints
;
CREATE INDEX idx_geom ON areas USING GIST (geom);
CREATE INDEX idx_group ON areas (lic_group);

DROP TABLE IF EXISTS area_exclusions;
CREATE TEMP TABLE area_exclusions AS 
WITH split_lines AS (
    SELECT 
        a.id,
        b.id as b_id,
        a.lic_group,
        a.centerpoint,
        a.geom as a_geom,
        b.geom as b_geom,
        ST_LineFromMultiPoint(ST_Intersection(ST_ExteriorRing(a.geom),ST_ExteriorRing(b.geom))) as split_line
    FROM areas a
    INNER JOIN areas b
        ON ST_Intersects(a.geom,b.geom)
        AND a.lic_group = b.lic_group
        AND a.id != b.id
), split_areas AS (
    SELECT
        id,
        b_id,
        lic_group,
        centerpoint,
        ST_Split(ST_Union(a_geom,b_geom),split_line) as split_geom
    FROM split_lines
)
SELECT
    id,
    b_id,
    lic_group,
    CASE ST_ContainsProperly(ST_GeometryN(split_geom,1),centerpoint)
        WHEN false THEN ST_GeometryN(split_geom,1)
        WHEN true THEN ST_GeometryN(split_geom,2)
    END as geom
FROM split_areas
;
CREATE INDEX idx_e_id ON area_exclusions (id,lic_group);

DROP TABLE IF EXISTS processed_areas;
CREATE TEMP TABLE processed_areas AS 
WITH exclusions AS (
    SELECT 
        id,
        lic_group,
        ST_Union(geom) as geom
    FROM area_exclusions
    GROUP BY
        id,
        lic_group
)
SELECT     
    id,
    lic_group,
    ST_Difference(i.geom,COALESCE(e.geom,ST_GeomFromEWKT('SRID=4326; POLYGON EMPTY'))) as geom
FROM areas i
LEFT JOIN exclusions e
    USING (id,lic_group)
;

If I am to query a single lic_group I can see that they do not have overlapping geometries:

SELECT geom
FROM processed_areas
WHERE lic_group = 1;

However, across several lic_groups, there is plenty of overlap:

SELECT geom
FROM processed_areas;

.

=======================

Okay, that is the underlying data. Now onto the actual bug. This table is merely the processed geometries as an exterior ring linestring.

DROP TABLE IF EXISTS rings;
CREATE TEMP TABLE rings AS
SELECT id, lic_group, ST_ExteriorRing(geom) as geom
FROM processed_areas
;

Viewing the rings themselves, we can see there is plenty of precision in the underlying points contributing to the linestrings:

SELECT Left(ST_AsText(geom),200)
FROM rings;
=====================
LINESTRING(10.8643881174031 3.77734188849076,10.8626209909011 3.75970533281878,10.8574629500733 3.74275385983207,10.8491123276953 3.72713876384977,10.8378900600966 3.71345998467201,10.8242273509016 3.
LINESTRING(7.57876514569725 2.81432363450241,7.57705816723587 2.7966769867886,7.5719601905669 2.77970431048747,7.56366729736813 2.76405788049345,7.55249832730074 2.75033896339531,7.53888260377069 2.73
LINESTRING(8.47788820510008 6.81387172768232,8.46467044867379 6.8125559523933,8.44701315022753 6.81427415953191,8.43003023487508 6.81940188537274,8.41437401669197 6.82774214951577,8.40064591887797 6.8
...

However, once unioned, that precision is lost:

SELECT Left(ST_AsText(ST_Union(geom)),200)
FROM rings;
==================
MULTILINESTRING((8 3,7 3),(8 7,9 7),(2 10,2 9),(10 4,9 4),(4 8,3 8),(2 1,2 2),(5 10,5 11),(9 11,8 11),(9 9,8 9),(3 9,2 9),(11 1,10 1),(10 1,11 2),(11 2,11 1),(3 5,3 4),(5 2,5 1),(10 6,9 6),(9 10,8 10)...

Again, please let me know if this example works for you to be able to reproduce.

Last edited 4 years ago by dannytoone (previous) (diff)

comment:3 by dannytoone, 4 years ago

Cc: dannytoone added
Version: 2.5.x3.0.x

comment:4 by pramsey, 4 years ago

It reproduces.

You've managed to produce a data set that has a lot of cases in it where the line intersection routines produce numerical precision problems (leading to the dreaded "TopologyException" error).

In an attempt to recover from TopologyExceptions, GEOS has fallback code that tries to perturb the geometry in hopes of side-stepping the precision issues. That code includes a routine that progressively strips precision from the inputs. This is almost 100% for sure the problem. Amazingly, it does eventually succeed in getting to an answer without an exception. Unfortunately, that answer is garbage.

The alternative, unfortunately, would be turning off the fallbacks and having the calculation simply stop when it hits the exception.

My colleague Martin Davis and I are working on bringing a more robust algorithm for overlay (union/difference/intersection) from JTS into GEOS, I'll point him to your data so he can ensure that the new algorithm does in fact correctly handle it. If we are fortunate it will be in GEOS 3.9 in the fall.

comment:5 by mdavis, 4 years ago

To follow on from what Paul has said, I tried the union using the new Overlay algorithm in JTS, and it works perfectly (and is pretty fast too).

The new Overlay uses a more effective snapping approach to improve robustness. It doesn't require the aggressive precision reduction that PostGIS/GEOS is using now. So we're very hopeful this will solve most or all of these kinds of issues.

As a side note, your synthetic data generation process seems like a great way to produce stress-testing datasets for overlay. This is actually tricky to do with synthetic data, so well done!

comment:6 by mdavis, 4 years ago

A further note on the cause of the overlay errors. It's not the overlap of the lic_groups that causes the problem, it's the geometry in each group itself. The group polygons are made non-overlapping by splitting them along the lines that split overlapping circles (which is a clever technique, by the way). However, the splitting and differencing involved introduces some numeric "jitter". The result is polygons which contain nearly-coincident lines along their boundaries. This is a classic cause of overlay robustness issues. And it's something that snapping takes care of very effectively.

comment:7 by dannytoone, 4 years ago

Thanks pramsey and mdavis. I've crossposted this into the GEOS bug tracker:

https://trac.osgeo.org/geos/ticket/1034

Feel free to reclassify as necessary, as it is primarily to have a reference to the dataset.

I wish I could take credit for the circle splitting technique, but it is just an implementation of the FCC's required rules for splitting p35 license boundaries when transmitting on the same frequencies as a neighboring licensee in the MDS/ITFS spectrum. So in my best snarky tone of voice, all thanks go to the FCC!

On the subject of snapping, how would you go about doing that? Snapping everything to a grid before unioning seems to work to some degree, but some of the geometries still end up jagged, and some still end up squared.

comment:8 by robe, 4 years ago

Milestone: PostGIS 3.1.0PostGIS GEOS

in reply to:  7 comment:9 by mdavis, 4 years ago

Replying to dannytoone:

Thanks pramsey and mdavis. I've crossposted this into the GEOS bug tracker:

https://trac.osgeo.org/geos/ticket/1034

Great, thanks.

in reply to:  7 comment:10 by mdavis, 4 years ago

Replying to dannytoone:

On the subject of snapping, how would you go about doing that? Snapping everything to a grid before unioning seems to work to some degree, but some of the geometries still end up jagged, and some still end up squared.

The snapping mentioned above is a new technique developed for the forthcoming OverlayNG. The algorithm is surprisingly simple: (i) scan all vertices of the input geometries and snap each one to previously scanned ones; (ii) scan all segments and snap them at intersections and to nearby anchor points. This is liable to introduce topology collapses, but these are removed by the overlay algorithm when it reforms the topology.

This is a heuristic algorithm - i.e. the snapping can produce situations with topology so invalid that it causes the overlay to fail. But these are very rare in practice, and so far can all be handled by just snapping more aggresively.

comment:11 by mdavis, 4 years ago

Keywords: overlay added
Owner: changed from pramsey to mdavis
Status: newassigned

comment:12 by mdavis, 4 years ago

Based on the GEOS ticket it looks like this should be fixed by OverlayNG. So closing.

comment:13 by dannytoone, 3 years ago

Resolution: fixed
Status: assignedclosed
Note: See TracTickets for help on using tickets.