Opened 14 years ago
Closed 12 years ago
#547 closed defect (fixed)
ST_contains memory problem when used between polygon and mixed geometry
Reported by: | twain | Owned by: | strk |
---|---|---|---|
Priority: | blocker | Milestone: | PostGIS 1.5.6 |
Component: | postgis | Version: | 1.5.X |
Keywords: | ST_contains memory | Cc: |
Description
Tested with postgis 1.5.1:
The following query uses large amounts of memory (multiple GB depending on the number of rows)
select ST_Contains(source.geometry, placex.geometry) from placex, placex as source where source.place_id = 59001169 and source.geometry && placex.geometry;
where source.geometry is a polygon or multipolygon.
if the query is restricted to:
select ST_Contains(source.geometry, placex.geometry) from placex, placex as source where source.place_id = 59001169 and source.geometry && placex.geometry and ST_geometrytype(placex.geometry) = 'ST_Point';
OR
select ST_Contains(source.geometry, placex.geometry) from placex, placex as source where source.place_id = 59001169 and source.geometry && placex.geometry and ST_geometrytype(placex.geometry) != 'ST_Point';
neither of these queries has high memory usage.
In the original case all memory is freed when the query completed but given a large number of rows (10000+) it can use up all the memory in the machine causing it to swap and eventually kill the process.
As far as I can tell there is nothing special about the source polygons / multi polygon - any polygon shows this behaviour.
Attachments (1)
Change History (27)
comment:1 by , 14 years ago
comment:2 by , 14 years ago
If you're still there I'd be interested if the attached patch resolves the problem.
comment:3 by , 14 years ago
I don't have a test case for this, but I think it's clear that this patch should be applied. Without it, we're just leaking the cache objects whenever we encounter a different type of query.
Ideally, we could be even smarter, and either fit both cache objects into fn_extra (perhaps with a super-cache struct) or, make our processing dependent on what object with have cached. eg. If we are doing a point in poly, and we already have a geos struct, use geos instead of the postgis PIP code. So in the case of mixed geometry types, we would "upgrade" from the rtree cache to the prepared geos cache, but not downgrade (as this patch would do) - this only makes sense to do if the geos PIP using a prepared geom is comparable in speed to the builtin postgis PIP code (which I would think it should be?). Holding on to both cache objects should be pretty easy to do though.
comment:4 by , 13 years ago
Owner: | changed from | to
---|
A quick test - ESRI's standard world shapefiles, tested running PIP on cities in countries, ~2500 cities and ~250 countries.
select st_intersects(co.the_geom, ci.the_geom) from country co, cities ci;
This takes 8 seconds with the rtree code in place. If I removed the rtree shortcut in the intersection function by replacing the "if (one is a point and the other is poly)" with "if(0)" - it takes 28 seconds. That is the second run timings and it doesn't get any different with more runs. This is enough of a difference that I think it is worth keeping both implementations around for now.
So I think I'd like to do the "super-cache" struct to hold both the geos prepared_geom and the rtree index in fn_extra.
comment:5 by , 13 years ago
Milestone: | PostGIS 1.5.3 → PostGIS 1.5.4 |
---|
comment:6 by , 13 years ago
chodgson, did this get done? I have memories of you committing work on a unified cache...
comment:7 by , 13 years ago
IIRC, I tried to apply Paul's original patch, and it didn't seem to fix my test case entirely. I didn't have enough time to try the super-cache struct approach, but I still think it is the "best" way.
comment:8 by , 13 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
I'm working on the grand unified cache (for 2.0)
comment:9 by , 13 years ago
r8877 introduced an unified cache. I _belive_ it is safe to backport, but please give a word of encouragement for that :)
comment:10 by , 13 years ago
Strk, this looks about like what I was expecting, thanks for doing it. I can't see why it shouldn't be backported to 1.5 as that's where I was originally planning to fix it.
comment:11 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:12 by , 13 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
comment:13 by , 13 years ago
Priority: | high → blocker |
---|
There is a big memory leak which has been bisected back to r8877.
http://lists.refractions.net/pipermail/postgis-users/2012-February/032306.html
comment:14 by , 13 years ago
Milestone: | PostGIS 1.5.4 → PostGIS 2.0.0 |
---|
If the grand unified cache was introduced into 1.5 branch it'll need to be fixed there too (reading the comments I guess it was not).
comment:15 by , 13 years ago
Do you have a valgrind report for the leak ? By reading the rtree cache preparer it doesnt' look like having issues, but the GetPrepGeomCache function is pretty confusing to me.
comment:16 by , 13 years ago
Ok, I could reproduce by running ST_Intersects(A, A) OR ST_Intersects(A, P).
Interesting enough the leak remains in the A-A case, doesn't matter if A-P is encountered, so the problem is fully with the prepared geometry cache, not the RTREE one. Which is the confusing function, btw. This is with ST_Intersects though (as the reporter for r8877 regression uses), not ST_Contains (which does _not_ seem to leak for me).
comment:17 by , 13 years ago
The PrepGeom stuff is confusing because it has to account for the memory being allocated by GEOS outside the PgSQL memory pools and clean it up on shutdown. It does this with the crazy-making memory context stuff you're squinting at. The proj cache has the same stuff. The memory leak might be because your unified cache work has not taken those issues into account?
comment:18 by , 13 years ago
Could be. I'm working on making it easier to enable debugging in a single .c file, then I'll know more about what's going on. It strikes me that with that commit I was really careful about not changing things much.
comment:19 by , 13 years ago
Alright, here's what happens:
NOTICE: [lwgeom_geos_prepared.c:GetPrepGeomCache:329] GetPrepGeomCache: creating cache: 0x7fc5735eddb0 NOTICE: [lwgeom_geos_prepared.c:GetPrepGeomCache:338] GetPrepGeomCache: adding context to hash: 0x7fc5735eddb0 NOTICE: [lwgeom_geos_prepared.c:GetPrepGeomCache:436] GetPrepGeomCache: copying pg_geom1 into cache
Over and over again.
comment:20 by , 13 years ago
Uhm, fn_extra keeps getting back as NULL during operations. Can't get a cache hit. Both with prepared and rtree caches
comment:21 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Silly one, fixed by r9136
comment:22 by , 12 years ago
Milestone: | PostGIS 2.0.0 → PostGIS 1.5.5 |
---|---|
Resolution: | fixed |
Status: | closed → reopened |
I had this problem using postgis 1.5.5 (postgresql 9.1.4) with st_intersects in the where clause.
Can we get this fix backported to 1.5 please.
Thanks Darren
comment:23 by , 12 years ago
The 1.5 branch is in horrible conditions at this very moment, see ticket #1954 -- need a succeeding testsuite before adding anything can be committed (any commit in 1.5 needs careful checking)
comment:25 by , 12 years ago
Milestone: | PostGIS 1.5.5 → PostGIS 1.5.6 |
---|
comment:26 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
I think I see where this could come from, as the prepared geometry cache system expects a query to be *either* a p-i-p situation or a p-and-g situation, but not both at the same time. One fix could be just ripping out the postgis-native implementation (for the p-i-p case) of the caching system.