#1806 closed defect (fixed)
Extremely slow and CPU-intensive ST_MakeValid (ST_BuildArea) case
Reported by: | strk | Owned by: | strk |
---|---|---|---|
Priority: | high | Milestone: | PostGIS 2.0.1 |
Component: | postgis | Version: | 2.0.x |
Keywords: | Cc: |
Description (last modified by )
The attached shapefile contains a single geometry. A multipolygon composed by a single polygon containing 4213 rings with a total of 82304 vertices.
Passing it as an argument to ST_MakeValid takes forever.
Attachments (1)
Change History (24)
comment:1 by , 13 years ago
comment:3 by , 13 years ago
Summary: | Extremely slow and CPU-intensive ST_MakeValid case → Extremely slow and CPU-intensive ST_MakeValid (ST_BuildArea) case |
---|
Yep, confirmed you can hook things forever by calling ST_BuildArea(ST_Boundary(geom))
comment:4 by , 13 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Also memory usage keeps increasing.
Plan: rewrite BuildArea to avoid all those SymDifference calls and simply skip polygons fully contained in other polygons (assuming those ones are holes of the other). Should work pretty good and fast...
comment:5 by , 13 years ago
polygonize finds 3536 polygons, so ST_BuildArea is stuck computing the symdifference between each of them and the aggregated result. While it goes on the time it takes to compute the symdifference grows. It starts with roughly 20 items per second. At iteration 1700 computes ~5 symdifference per second... Every iteration has to rebuild the topology which was already built in the previous iteration.
comment:6 by , 13 years ago
Step back: I was completely wrong about how the geometry was rendered. At least, I can't handle to render it that way again. The information about composition is correct.
comment:7 by , 13 years ago
Description: | modified (diff) |
---|
comment:8 by , 13 years ago
Another observation: the output of ST_BuildArea is plain _wrong_ in this case ! I'll work on reducing the case to address correctness first.
comment:9 by , 13 years ago
Another false alarm: ST_BuildArea was just wrong because I was passing it non-noded input. ST_MakeValid performs the noding step itself.
comment:10 by , 13 years ago
r9729 sorts polygonized output items by number of points, slighly improving performances. A reduction of the offending input, with number of holes cut from 4000+ to only 500 showed a speedup from 12 seconds to 9 seconds. Not such a big improvement.
Another thing to look at is short-circuits in GEOSSymDifference (none seen as of 3.3.3)
comment:11 by , 13 years ago
See http://trac.osgeo.org/geos/ticket/542 and http://trac.osgeo.org/geos/ticket/543 for further speedup possibilities.
comment:12 by , 13 years ago
After speeding up GEOS more, this case didn't gain that much speed. It's obviously a killer to perform 4000+ overlay operations... need a different approach!
comment:13 by , 13 years ago
Alright I've an implementation speeding a reduced case up by a factor of 100. The reduced case gets "only" 418 polygons from polygonize. I'm testing more and will finally commit.
comment:14 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
follow-up: 17 comment:15 by , 13 years ago
We have designed and implemented a different algorithm to automatically repair polygons, it avoids the 4000+ overlay operations mentioned for this polygon (behaviour is not quadratic like ST_MakeValid). The code is not in PostGIS though, but can be obtained there: http://tudelft-gist.github.com/prepair/ For this polygon, it takes ~3s on my laptop.
comment:16 by , 13 years ago
@hugoledoux: thank you very much for the pointer, looking forward to see that code get into PostGIS!
Meanwhile I shall point out that as of r9731 the complexity of ST_MakeValid is not quadratic anymore, doing a single overlay during the BuildArea phase.
The current hot-spot for performance impromvements in ST_MakeValid are the polygonizer (finding 5561 faces in about ~15 seconds) and a Difference call used to find which edges are missing from the output of BuildArea (finding no boundary missing in ~12 seconds). Plus there are ~3 seconds used to link 5561 polygonized faces to their ancestors (finding which face is an hole of which other face, which can be probably improved).
comment:17 by , 12 years ago
Replying to hugoledoux:
We have designed and implemented a different algorithm to automatically repair polygons, it avoids the 4000+ overlay operations mentioned for this polygon (behaviour is not quadratic like ST_MakeValid). The code is not in PostGIS though, but can be obtained there: http://tudelft-gist.github.com/prepair/ For this polygon, it takes ~3s on my laptop.
Hi, I see the sample in your link: http://tudelft-gist.github.com/prepair/
I don't understand why the starting polygon is invalid. However starting from a hypotetical invalid polygon. I guess it's solution should don't break it's typology if possible.
So if the starting geometry is a polygon with a hole, the result after the makevalid should still be (if possible) a polygon with a hole it should try to don't be another kind of geometry (a multipolygon) neither it should be an increase on the record number so it should not return two polygons (2 records) to substitute one polygon (one record).
Otherwise the MakeValid could not be usable really because it need to see one by one all the change to do to every geometry before apply it to the starting table.
comment:18 by , 12 years ago
@aperi2007: the polygon is actually invalid acc. to the OGC rules because the inner ring (grey ring) separate the outer ring into 2 separate polygons. The rules of the OGC state, among others, that the inteterior of a polygon must be connected. The only solution is to split the polygon into 2 polygons (which means that the inner ring disappears, indeed).
ST_MakeValid does not return per se a POLYGON, but a GEOMETRYCOLLECTION which can contain a MULTIPOLYGON, to cope with these situations.
comment:19 by , 12 years ago
@aperi2007: the polygon is actually invalid acc. to the OGC rules because the inner ring (grey ring) separate the outer ring into 2 separate polygons. The rules of the OGC state, among others, that the inteterior of a polygon must be connected.
Sorry, I forgot the connected rule. The polygon is invalid.
The only solution is to split the polygon into 2 polygons (which means that the >inner ring disappears, indeed).
Of course this is the only valid solution.
ST_MakeValid does not return per se a POLYGON, but a GEOMETRYCOLLECTION which can >contain a MULTIPOLYGON, to cope with these situations.
Yes , the ST_MakeValid must try to maintain the original type when this is possible. In this sample this is not possible , so it will return a more complex type.
Just as puntualize question: the st_makevalid never return a set of records, it return always one only record. If the geometry invalid is more complex, the MakeValid will produce an unique geometrycollection with as many polygon or multipolygons and/or lines and/or points to cover all the vertexs of the original geometry. The important is that is wll return always one only record. Is this the strategy you follow also ?
comment:20 by , 12 years ago
By design we chose not to return "unused" geometries. So if an inner rings collapses to a line we do not return it to the user.
We always return one MULTIPOLYGON; if the input polygon is only a line for instance then we return an empty MULTIPOLYGON.
comment:21 by , 12 years ago
One of the main goal of ST_MakeValid is to do not lost any vertex.
This is a strong choice of cocurse, but is the only strategy to allow an user to have all the vertex in a tabel and allow a posst processing phase to repair better the geometries.
Infact in a geometry like this:
----------------- | | | ------- | | | | | | ------- | | | -----------------
For some impredictable situations could became like this:
----------------- | | | | | -------- | | | | | -----------------
That is surely an invalid Polygon.
A strategy to transform this in a collection with this two parts:
----------------- | | | | | | + -------- | | | | -----------------
Will allow to load this two geometri parts in two tables and working because this are both two valid geometries.
After this, a power user could surely analyze the two parts (polygons and lines ) and rearrange the lines restoring the original size (the hole in the polygon).
Instead a strategy where the line is lost returning only the external polygon like this:
----------------- | | | | | | | | | | -----------------
Will deny any possibility to restoring the original hole.
A strategy like this don't allow to use the DB as a startup for an editing of any kind of geometry.
comment:22 by , 12 years ago
I agree that keeping all the geometries can be useful, but we decided to have an *automatic* method where the user presses one button and gets a valid geometry. Our method is predictable since it can be explained in 3 simple steps, so the user knows what to expect.
I believe that if the user wants to manually edit a polygon (with a GUI for instance), she can always go back to the original geometry.
Our design choice makes our algorithm very fast and scalable to very large polygon, modifying it to return all the geometries would make it use more memory and would slow it down...
comment:23 by , 12 years ago
I believe that if the user wants to manually edit a polygon (with a GUI for instance), she can always go back to the original geometry.
This is true only in theory.
If you have an invalid geometry and you would edit it. You need to transform it in a valid geometry otherwise every operation will end with an exception.
So , you need to transform the geometry in a valid geometry to edit it with successfull. But if the ST_MakeValid lost some vertex you cannot never recreate the really geometry but only a more simply not always really true geometry.
Think for example at this sample:
----------------- | | | ------- | | | | | | ------- | | | -----------------
Ad a garden with an hous in the middle. :)
You you see instead this:
----------------- | | | | | | | | | | -----------------
You see a gently lone and isolated garden. who say you that there was originally an house in the middle ? :)
So ok, your algorithm is more rapid, but rapid is not always a giood strategy if it mean lost something that could has a cost to retrieve.
Instead if you see
----------------- | | | | | | + -------- | | | | -----------------
You can understand that perhaps there was something in the middle.
However . I understand this is a question that has different point of view.
If I like to use the system only for low scales, perhaps lost something is not so important.
Ok, but this mean that postgis is not so important when an user need to do in the bigger details. And so it is a gently system only for rapid and simply internet presentation. Not for big and sophisticated elaboration of data.
Who need load something geometry on postgis before It need to work that geometry with something else product to repair , with some method that give me the real geometry don't a simplification of reality. (Perhaps oracle ?, Perhaps ArcGIS ?, perhaps AV3 ?. I don't know)
But why the user cannot user postgis ?
Response: "because postgis like to be the faster." And never is more fast than "throw away the vertex". :)
How does it look in qgis: