#3220 closed defect (fixed)
Mingw failure on cluster test
Reported by: | robe | Owned by: | pramsey |
---|---|---|---|
Priority: | blocker | Milestone: | PostGIS 2.2.0 |
Component: | postgis | Version: | master |
Keywords: | Cc: | pramsey, nicklas, dbaston |
Description
I'm getting this failure only under mingw, for some reason regress testing against VC++ doesn't error out (well last I tried it didn't anyway, I'll retry to confirm), which explains why winnie is probably not complaining about this.
It looks like it might be just some sort of ordering difference since all the values are there, but in my mingw it comes in as a single geometry collection and regress expected looks like two geometry collections with polygon:
GEOMETRYCOLLECTION(LINESTRING(6 6,7 7))
coming in via a separate geometry collection. Anyway I have to upgrade to 9.4.4 anyway. Though my 9.5dev fails in same spot I think. I also notice I am running a slightly older GEOS 3.5 than winnie so could be caused by that too. Will close this out if that is the case.
PostgreSQL 9.4.1 on x86_64-w64-mingw32, compiled by gcc.exe (x86_64-win32-seh-rev1, Built by MinGW-W64 project) 4.8.3, 64-bit Postgis 2.2.0dev - r13860 - 2015-07-30 23:48:27 scripts 2.2.0dev r13860 GEOS: 3.5.0dev-CAPI-1.9.0 r4034 PROJ: Rel. 4.9.1, 04 March 2015 SFCGAL: 1.1.0 --- cluster_expected 2015-07-30 14:50:49 -0400 +++ /projects/postgis/tmp/2.2_pg9.4w64/test_31_out 2015-07-30 18:37:21 -0400 @@ -4,8 +4,7 @@ t2|GEOMETRYCOLLECTION(LINESTRING(0 0,1 1),LINESTRING(5 5,4 4),LINESTRING(0 0,-1 -1),POLYGON((0 0,4 0,4 4,0 4,0 0))) t2|GEOMETRYCOLLECTION(LINESTRING(6 6,7 7)) t2|GEOMETRYCOLLECTION(POLYGON EMPTY) -t3|GEOMETRYCOLLECTION(LINESTRING(0 0,1 1),LINESTRING(5 5,4 4),LINESTRING(0 0,-1 -1),POLYGON((0 0,4 0,4 4,0 4,0 0))) -t3|GEOMETRYCOLLECTION(LINESTRING(6 6,7 7)) +t3|GEOMETRYCOLLECTION(LINESTRING(0 0,1 1),LINESTRING(5 5,4 4),LINESTRING(0 0,-1 -1),LINESTRING(6 6,7 7),POLYGON((0 0,4 0,4 4,0 4,0 0))) t3|GEOMETRYCOLLECTION(POLYGON EMPTY) t4|GEOMETRYCOLLECTION(LINESTRING(0 0,1 1),LINESTRING(5 5,4 4),LINESTRING(0 0,-1 -1),LINESTRING(6 6,7 7),POLYGON((0 0,4 0,4 4,0 4,0 0))) t4|GEOMETRYCOLLECTION(POLYGON EMPTY)
Change History (24)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
could it be due to unpredictable sorting order in the aggregate ? does the test contain an ORDER BY predicate for it ?
comment:3 by , 9 years ago
there is an order by. Test looks like this:
CREATE TEMPORARY TABLE cluster_inputs (id int, geom geometry); INSERT INTO cluster_inputs VALUES (1, 'LINESTRING (0 0, 1 1)'), (2, 'LINESTRING (5 5, 4 4)'), (3, NULL), (4, 'LINESTRING (0 0, -1 -1)'), (5, 'LINESTRING (6 6, 7 7)'), (6, 'POLYGON EMPTY'), (7, 'POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))'); SELECT 't1', ST_AsText(unnest(ST_ClusterIntersecting(geom ORDER BY id))) FROM cluster_inputs; SELECT 't2', ST_AsText(unnest(ST_ClusterIntersecting(ST_Accum(geom ORDER BY id)))) FROM cluster_inputs; SELECT 't3', ST_AsText(unnest(ST_ClusterWithin(geom, 1.4 ORDER BY id))) FROM cluster_inputs; SELECT 't4', ST_AsText(unnest(ST_ClusterWithin(ST_Accum(geom ORDER BY id), 1.5))) FROM cluster_inputs;
comment:4 by , 9 years ago
I'm going to upgrade to 9.4.4 and see if the issue goes away. Might be because my mingw 9.4 is older version than my EDB. If it is I'm not going to worry about it.
comment:5 by , 9 years ago
nope same issue with
PostgreSQL 9.4.4 on x86_64-w64-mingw32, compiled by gcc.exe (x86_64-win32-seh-rev1, Built by MinGW-W64 project) 4.8.3, 64-bit Postgis 2.2.0dev - r13860 - 2015-07-31 14:53:41 scripts 2.2.0dev r13860 GEOS: 3.5.0dev-CAPI-1.9.0 r4064 PROJ: Rel. 4.9.1, 04 March 2015
comment:6 by , 9 years ago
No, your answer is wrong, it's not an ordering problem, it's a "getting it wrong" problem, but why your platform, that I dunno.
select st_distance( 'GEOMETRYCOLLECTION(LINESTRING(0 0,1 1),LINESTRING(5 5,4 4),LINESTRING(0 0,-1 -1),POLYGON((0 0,4 0,4 4,0 4,0 0)))', 'GEOMETRYCOLLECTION(LINESTRING(6 6,7 7))' ); st_distance ----------------- 1.4142135623731 (1 row)
The failing test is a "cluster within" with a tolerance of 1.4, so the two elements fall outside the tolerance and should be in different clusters, as the platforms that succeed (all of them but one) return.
comment:7 by , 9 years ago
Nope that one returns the same answer for me on my mingw64 that is failing this test:
test_address=# select version() || ' ' || postgis_full_version(); NOTICE: Function postgis_topology_scripts_installed() not found. Is topology support enabled and topology.sql installed? ?column? ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- PostgreSQL 9.4.4 on x86_64-w64-mingw32, compiled by gcc.exe (x86_64-win32-seh-rev1, Built by MinGW-W64 project) 4.8.3, 64-bit POSTGIS="2.2.0dev r13905" GEOS="3.5.0dev (1 row) test_address=# test_address=# select st_distance( test_address(# 'GEOMETRYCOLLECTION(LINESTRING(0 0,1 1),LINESTRING(5 5,4 4),LINESTRING(0 0,-1 -1),POLYGON((0 0,4 0,4 4,0 4,0 0)))', test_address(# 'GEOMETRYCOLLECTION(LINESTRING(6 6,7 7))' test_address(# ); st_distance ----------------- 1.4142135623731 (1 row)
BTW I still get this annoying topology script not installed notice every once in a while. I think that happens if I had postgis_topology installed and then later uninstall it. I'll ticket that one separately if I can replicate.
Reran the tests and still fails on cluster:
Loading PostGIS into 'postgis_reg' PostgreSQL 9.4.4 on x86_64-w64-mingw32, compiled by gcc.exe (x86_64-win32-seh-rev1, Built by MinGW-W64 project) 4.8.3, 64-bit Postgis 2.2.0dev - r13906 - 2015-08-15 06:08:55 scripts 2.2.0dev r13906 GEOS: 3.5.0dev-CAPI-1.9.0 r4072 PROJ: Rel. 4.9.1, 04 March 2015
Checked still screwed:
SELECT 't3', ST_AsText(unnest(ST_ClusterWithin(geom, 1.39 ORDER BY id))) FROM cluster_inputs; test_address=# SELECT 't3', ST_AsText(unnest(ST_ClusterWithin(geom, 1.39 ORDER BY id))) FROM cluster_inputs; ?column? | st_astext ----------+-------------------------------------------------------------------------------------------------------------------------------------- t3 | GEOMETRYCOLLECTION(LINESTRING(0 0,1 1),LINESTRING(5 5,4 4),LINESTRING(0 0,-1 -1),LINESTRING(6 6,7 7),POLYGON((0 0,4 0,4 4,0 4,0 0))) t3 | GEOMETRYCOLLECTION(POLYGON EMPTY) (2 rows)
If I'm the only one experiencing this issue, and it's not a symptom of a larger issue, I'm okay with closing this as won't fix. This behavior only exhibits when I am testing my 64-bit mingw binaries against my 64-bit mingw compiled PostgreSQL. It doesn't happen if I use the binaries in a VC++ compiled EDB PostgreSQL, which is my production use and what I target.
pramsey maybe I can see what it's doing if I inject some debug clauses. Where is the best place to put that?
comment:8 by , 9 years ago
Sorry robe, I just saw this.... The first thing I would check is whether the correct tolerance distance is making it to where it's needed at line 164 of lwgeom_geos_cluster.c.
I would work on this myself if I had a Windows build environment set up....you don't have anything I could RDP into, do you?
comment:9 by , 9 years ago
dbaston,
I'll check this out. Sadly no this issue only happens on windows desktop when testing against mingw PostgreSQL. On winnie I have her testing against VC++ builds, where this issue doesn't show, which is why I'm not so concerned if others aren't experiencing it. I'll checkout that line you mentioned and see what's happening there (after I get other tickets squared away).
comment:10 by , 9 years ago
dbaston,
Okay I put in a notice just before that line to show the mindist and cxt->tolerance values and your suspicions seem correct, it's not getting into the union loop.
For the test that is failing t3, this is what I see when I run against my mingw64 9.4 postgresql
NOTICE: mindist: 0, cxt tolerance: 1.38242e+306 NOTICE: mindist: 0, cxt tolerance: 1.38242e+306 NOTICE: mindist: 4.24264, cxt tolerance: 1.38242e+306 NOTICE: mindist: 7.07107, cxt tolerance: 1.38242e+306
So it does look like the mindist is always higher than the tolerance, thus wouldn't go into the UF_Union routine.
Now for comparison, If I run the same binaries against my 64-bit VC++ 9.4 build, here is what I get
NOTICE: mindist: 0, cxt tolerance: 1.4 NOTICE: mindist: 0, cxt tolerance: 1.4 NOTICE: mindist: 0, cxt tolerance: 1.4 NOTICE: mindist: 1.41421, cxt tolerance: 1.4 NOTICE: mindist: 1.41421, cxt tolerance: 1.4
So it seems the mindist <= cxt->tolerance is being tripped up by the scientific notation representation of 1.38....
comment:11 by , 9 years ago
Cc: | added |
---|
Okay for completeness, I put in a notice to trap whenever the union operation is being called:
Test:
CREATE TEMPORARY TABLE cluster_inputs (id int, geom geometry); INSERT INTO cluster_inputs VALUES (1, 'LINESTRING (0 0, 1 1)'), (2, 'LINESTRING (5 5, 4 4)'), (3, NULL), (4, 'LINESTRING (0 0, -1 -1)'), (5, 'LINESTRING (6 6, 7 7)'), (6, 'POLYGON EMPTY'), (7, 'POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))'); SELECT 't3', ST_AsText(unnest(ST_ClusterWithin(geom, 1.4 ORDER BY id))) FROM cluster_inputs;
-- failing mingw64 PostgreSQL using postgis mingw64 compiled binaries gives these notices
NOTICE: mindist: 0, cxt tolerance: 1.38242e+306 NOTICE: Performaing union NOTICE: mindist: 0, cxt tolerance: 1.38242e+306 NOTICE: Performaing union NOTICE: mindist: 4.24264, cxt tolerance: 1.38242e+306 NOTICE: Performaing union NOTICE: mindist: 7.07107, cxt tolerance: 1.38242e+306 NOTICE: Performaing union
-- passing VC++ build (against mingw64 compiled postgis binaries gives this
NOTICE: mindist: 0, cxt tolerance: 1.4 NOTICE: Performaing union NOTICE: mindist: 0, cxt tolerance: 1.4 NOTICE: Performaing union NOTICE: mindist: 0, cxt tolerance: 1.4 NOTICE: Performaing union NOTICE: mindist: 1.41421, cxt tolerance: 1.4 NOTICE: mindist: 1.41421, cxt tolerance: 1.4
So the failing one always thinks 1.38242..+306 is bigger than mindist. I vaguely remember we having this problem before and pramsey or nicklas came up with a fix for it (some sort of macro as I recall). Maybe one of them can remember (added them to cc). I'll hunt in code base to see if I can bring that up.
comment:12 by , 9 years ago
Cc: | added |
---|
OK, working backwards, next place I would check to see if the tolerance is correct is lwgeom_accum.c, line 133. If the value is correct here, then I would wonder if the Datum holding the tolerance is being freed before the value is actually read from it. Does this Datum need to be copied into a different MemoryContext?
comment:13 by , 9 years ago
Instead of passing p->data on lwgeom_accum.c, line 349, do we want to call datumCopy first?
comment:14 by , 9 years ago
I suppose trying is no harm... I wouldn't expect that memory to disappear or be over-written, as the execution isn't complete and the stack isn't going to be unwound on it... hm.
comment:15 by , 9 years ago
dbaston, pramsey - not sure how to get a read out from line 133. I tried doing this:
POSTGIS_DEBUGF(2, "Datum value %g", PG_GETARG_DATUM(2));
and turning my postgresql clinet_min_messages to DEBUG2, but not seeing anything.
comment:16 by , 9 years ago
robe - I wonder if that's the problem - line 133 is never reached? Is PG_NARGS() not returning "3" ?
comment:17 by , 9 years ago
Hmm I thought I actually tried the binaries also on the VC++ instance which does give the right answer. Let me verify that (sometimes I get confused which instance I launched :) ). I'll also chech what PG_NARGS is returning.
comment:18 by , 9 years ago
finally got it - I think I just had to compile with enable-debug on.
this is what I get for the mingw postgres (which returns the wrong answer)
DEBUG: [lwgeom_accum.c:pgis_geometry_accum_transfn:125] Number of args 3 DEBUG: [lwgeom_accum.c:pgis_geometry_accum_transfn:134] Datum value 4.02286e-316
This is for the vc++ postgresql (which returns the right answer)
DEBUG: [lwgeom_accum.c:pgis_geometry_accum_transfn:125] Number of args 3 DEBUG: [lwgeom_accum.c:pgis_geometry_accum_transfn:134] Datum value 2.47697e-316
comment:19 by , 9 years ago
Okay on mingw after I change the debugf to
POSTGIS_DEBUGF(2, "Datum value %g", PG_GETARG_FLOAT8(2));
I get this:
DEBUG: [lwgeom_accum.c:pgis_geometry_accum_transfn:134] Datum value 1.4
on my vc++ postgres compiled I get this which is really odd since this is the one giving the right answer (and yet I get a tolerance error)
DEBUG: [lwgeom_accum.c:pgis_geometry_accum_transfn:134] Datum value 1.4 ERROR: Tolerance not defined ********** Error ********** ERROR: Tolerance not defined SQL state: XX000
comment:20 by , 9 years ago
Okeedoke - switching to:
POSTGIS_DEBUGF(2, "Datum value %g", DatumGetFloat8(p->data));
and putting that after the
p->data = PG_GETARG_DATUM(2);
Both give the 1.4 answer and VC++ no longer throws an error.
comment:21 by , 9 years ago
Looking through this more, I think that we need to copy the tolerance value into aggcontext. I'll put together a patch to do this.
comment:22 by , 9 years ago
OK robe, here is a patch that I hope may fix the issue:
https://patch-diff.githubusercontent.com/raw/postgis/postgis/pull/63.patch
comment:23 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Yap this one does the trick. Committed at r14035. Though I forgot to retest on VC++, but winnie will do that.
nope same failure with
So older GEOS does not seem to be the issue.