Opened 4 years ago

Closed 4 years ago

#4706 closed defect (fixed)

Backend crash with TopoGeo_addLinestring against an invalid topology

Reported by: strk Owned by: strk
Priority: blocker Milestone: PostGIS 2.4.9
Component: topology Version: master
Keywords: Cc:

Description

As reported in https://github.com/larsop/resolve-overlap-and-gap/issues/1 there are cases in which an invalid topology triggers a segfault in PostGIS, which assumes some form of validity (existing MBR for faces).

This is probably true since we switched topology from plpgsql to C, and fixes should be backported to all those versions (2.2.0 onward)

Change History (20)

comment:1 by Sandro Santilli <strk@…>, 4 years ago

In ab4f540/git:

Handle corrupted topology in ST_ChangeEdgeGeom

Rather than crashing the backend.
References #4706 in 3.0 branch
Include tests.

comment:2 by Sandro Santilli <strk@…>, 4 years ago

In 7b9dd285/git:

Handle corrupted topology in ST_ChangeEdgeGeom

Rather than crashing the backend.
References #4706 in 2.5 branch.
Include tests.

comment:3 by Sandro Santilli <strk@…>, 4 years ago

In a2557be/git:

Handle corrupted topology in ST_ChangeEdgeGeom

Rather than crashing the backend.
References #4706 in master branch.
Include tests.

comment:4 by Sandro Santilli <strk@…>, 4 years ago

In 5b809e2/git:

Handle corrupted topology in ST_ChangeEdgeGeom

Rather than crashing the backend.
References #4706 in 2.4 branch.
Include tests.

comment:5 by strk, 4 years ago

Milestone: PostGIS 3.0.2PostGIS 2.4.9
Resolution: fixed
Status: newclosed

I've pushed a fix in branches 2.4, 2.5, 3.0 and master. I realized 2.3 and older are EOLed so won't push there.

comment:6 by strk, 4 years ago

For the record: the crash was really only in master branch, due to the change introduced in #3221 to allow for GetFaceGeometry to return EMPTY instead of NULL. The test was still good to backport

comment:7 by robe, 4 years ago

Resolution: fixed
Status: closedreopened

Looks like this broke something

Getting this on berrie (64-bit rasberry pie arm running 32-bit postgresql) and bessie32 (32-bit freebsd)

14:12:55 -----------------------------------------------------------------------------
14:12:55 --- regress/st_changeedgegeom_expected	2020-06-22 10:40:12.551514646 -0700
14:12:55 +++ /tmp/pgis_reg/test_16_out	2020-06-22 11:12:55.631758039 -0700
14:12:55 @@ -39,5 +39,5 @@
14:12:55  T13.2|Edge 29 changed
14:12:55  T13.3|26
14:12:55  ERROR:  Edge motion collision at POINT(-1.1697 47.7825)
14:12:55 -ERROR:  Corrupted topology: face 4, right of edge 7, has no bbox
14:12:55 +ERROR:  Corrupted topology: face 24, right of edge 4, has no bbox
14:12:55  Topology 'city_data' dropped
14:12:55 -----------------------------------------------------------------------------

comment:8 by strk, 4 years ago

Regina does this only happen in ONE bot ? If so, I guess it depends on some different sorting from queries, resulting in observing one edge instead of another as the first one (edge 7 or edge 4 in this case)

comment:9 by robe, 4 years ago

It happens on both bessie32 and berrie. Both are 32-bit bots so might be 32-bit related I guess.

comment:10 by Regina Obe <lr@…>, 4 years ago

In ceca2a8/git:

Take out topology tests that fail on 32-bit references #4706 for Trunk 3.1.0alpha2

comment:11 by Regina Obe <lr@…>, 4 years ago

In b6f415d3/git:

Took out too much in last commit references #4706 for Trunk 3.1.0alpha2

comment:12 by strk, 4 years ago

I don't think it is a good idea to keep removing failing tests. It defeats the purpose of tests...

comment:13 by Algunenano, 4 years ago

I don't think it is a good idea to keep removing failing tests. It defeats the purpose of tests...

I totally agree, but it's also not ok to have some bots broken for 4 weeks. Should we have reverted the change instead until everything could be properly addressed?

comment:14 by strk, 4 years ago

Rather, the ERROR should be intercepted and references to faces removed (we just want to make sure an exception is thrown, with enough detail, but we don't care about the actual details of face ids)

comment:15 by robe, 4 years ago

strk feel free to reenable the tests once you've fixed them so that they behave consistently on all platforms and don't give false negatives.

comment:16 by strk, 4 years ago

I'd be happy to re-enable tests but I would need a way to reproduce the problem. In your report you don't provide a link, can I have a link ? Even better, can I have access to the 32bit machine ? I'd like to look at the built topology via qgis as I really don't even expect the _existance_ of a face with id 24 (highest numbered face has id 11).

I see other tests being disabled in the past, so I'm not sure what's the current state of constructed topology (see [73e5f0092b987b8d65d68d0513cb7a1948c62c25/git])

comment:17 by Sandro Santilli <strk@…>, 4 years ago

In dcfb125/git:

Re-enable tests for corrupted topology handling

References #4706

comment:18 by strk, 4 years ago

Regina I've pushed a fix but bessie32 is reported to be offline for pipeline 2​11​0 as I write this comment, from page https://debbie.postgis.net/view/PostGIS/job/PostGIS_Worker_Run/label=bessie32/

comment:19 by robe, 4 years ago

just turned her back on - I was backing up her image in prep for upgrading. I'll trigger a rerun.

Note: See TracTickets for help on using tickets.