Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#3840 closed defect (fixed)

winnie 32-bit pg10 test failing on protobuf geobuf regress tests

Reported by: robe Owned by: Björn Harrtell
Priority: high Milestone: PostGIS 2.4.0
Component: postgis Version: master
Keywords: Cc:

Description (last modified by robe)

I turned on 32-bit testing on winnie. I had it off for a long time because of the shp2pgsql cunit failures.

To my disappointment I'm getting errors on ST_AsGeobuf. I recall we had a similar issue with 64-bit.

PostgreSQL 10beta2 on i686-w64-mingw32, compiled by i686-w64-mingw32-gcc.exe (rev2, Built by MinGW-W64 project) 4.8.1, 32-bit
  Postgis 2.4.0dev - r15675 - 2017-09-10 03:54:10
  scripts 2.4.0dev r15675
  GEOS: 3.6.2-CAPI-1.10.2 4d2925d
  PROJ: Rel. 4.9.1, 04 March 2015
  SFCGAL: 1.3.1


 geobuf .. failed (diff expected obtained: /projects/postgis/tmp/2.4.0dev_pg10_geos3.6.2_gdal2.2.1w32/test_120_diff)
-----------------------------------------------------------------------------
--- geobuf_expected	2017-09-08 18:24:34.086202800 -0400
+++ /projects/postgis/tmp/2.4.0dev_pg10_geos3.6.2_gdal2.2.1w32/test_120_out	2017-09-10 00:07:00.251494400 -0400
@@ -1,4 +1,4 @@
-T1|GAEiCgoICgYIABoCFio=
+T1|GAIiDAoKCggIABoE3gGkAw==
 T2|Cgh0ZXN0X3N0cgoMdGVzdF9wb3NfaW50Cgx0ZXN0X25lZ19pbnQKDHRlc3RfbnVtZXJpYwoKdGVz
 dF9mbG9hdBgAIjoKOAoICAIaBAICAgJqBgoEdGVzdGoCGAFqAiABagUKAzEuMWoJEZqZmZmZmfE/
 cgoAAAEBAgIDAwQE
-----------------------------------------------------------------------------

I know I've got to upgrade her 10 to beta4, but that shouldn't matter.

Change History (17)

comment:1 by robe, 7 years ago

this reminds me of this ticket - #3742

comment:2 by robe, 7 years ago

Description: modified (diff)
Summary: winnie 32-bit test failing on protobuf → winnie 32-bit test failing on protobuf geobuf regress tests

comment:3 by robe, 7 years ago

Summary: winnie 32-bit test failing on protobuf geobuf regress tests → winnie 32-bit pg10 test failing on protobuf geobuf regress tests

Well, this is interesting: the PostgreSQL 9.6.5 32-bit run passed this test.

PostgreSQL 9.6.5, compiled by Visual C++ build 1800, 32-bit
  Postgis 2.4.0dev - r15675 - 2017-09-10 04:36:59
  scripts 2.4.0dev r15675
  GEOS: 3.6.2-CAPI-1.10.2 4d2925d
  PROJ: Rel. 4.9.1, 04 March 2015
  SFCGAL: 1.3.1


mvt .. ok 
 geobuf .. ok 

 mvt_jsonb .. ok 
 regress_sfcgal .. ok 

The only difference I can think of is that for 9.6 I test against a VC++-built PostgreSQL, while 10 is mingw64-w32 since EDB hasn't released their 32-bit PostgreSQL 10 yet.

comment:4 by Björn Harrtell, 7 years ago

Interesting indeed. Cannot quickly spot anything. Agree it has similarities with #3742 but in that case it was about attribute value encoding and in this case there are no attributes at all. :S

comment:5 by Björn Harrtell, 7 years ago

That said, the geobuf code is older and has seen less love than the mvt code. I will look into it in more detail when I can. Could be similar type size issues but in another place.

comment:6 by robe, 7 years ago

Priority: blocker → high

Well, the annoying thing is I tried it on my Windows 7 desktop (actually using the copy of PostgreSQL 10 32-bit from winnie that was failing) and it doesn't fail regression. So I think whatever this is, it's another one of those Heisenberg things that rears its ugly head only once in a while. I'm going to downgrade this to high since I can't replicate it at all on 64-bit and only sometimes on 32-bit.

comment:7 by Björn Harrtell, 7 years ago

Not sure why https://github.com/postgis/postgis/commit/1a693b6c3b0d3273bf3a1c26c469bf8329bc7ad4 wasn't referenced here, but I'm curious if that had any effect on this issue.

comment:8 by robe, 7 years ago

You have to use the phrase References #ticket_num.

Not sure. I had the 10 32-bit turned off. I'll turn it back on and close this out if it fixes the issue.

comment:9 by robe, 7 years ago

Sadly still a problem. What's annoying is I can't replicate it on my desktop. So it's not even a simple matter of pg10 32-bit Windows, since using the same compiled cluster I can't replicate it. On winnie it only happens on her pg10 32-bit runs. I thought that might be because the others are tested under VC++ and this one is pure mingw, since EDB hasn't shipped a pg10 VC++ build I can test against.

Though even on my machine, where I have a pure mingw64-w32 test, it doesn't fail. Can't imagine it's something as stupid as her running Windows 2012 and me running Windows 7. Hmm, or maybe it is something in the msvcrt implementations, because mingw64-w32 would use what is native to the OS and I think the Windows 7 / Windows 2012 msvcrt might be a little different.

I probably don't see it with the VC++ builds because the msvcrt for VC++ is different and pegged to the compiling version of VC++ rather than the OS. I'm going to test 9.6w32mingw on winnie to prove out that theory. If so, it might just be a bug in the 32-bit msvcrt that ships with the Windows 2012 R2 64-bit that winnie's running. I'm not so concerned about it because in the field only EDB ships the Windows 32-bit versions, and since they compile with VC++ something or other, they aren't relying on the msvcrt that ships with the OS.


comment:10 by robe, 7 years ago

okay I embarrassingly discovered that it fails on my desktop pg10x32 and pg96x32 mingw64-w32 installs as well. So the good news is it's not some weird spooky thing happening on winnie or with windows 2012r2. It's also not specific to PostgreSQL 10.

I wasn't seeing the failure before because duh I wasn't compiling with protobuf support.

Since it doesn't fail when testing against VC++ 32-bit builds (9.6), my guess is it might be only something that shows with a cassert / debug compiled PostgreSQL 32-bit and may not even have anything to do with vc++ vs. mingw64-w32 differences.

comment:11 by Björn Harrtell, 7 years ago

This was interesting enough for me to set up a virtual Debian 9 32-bit machine to see how it behaves, and I get the same failure on a normal build (no cassert / debug). So while this is now more than a fluke, it should also be easier for me to hunt down.

Though, I also see failures for regress_brin_index and regress_brin_index_3d, which was unexpected. Any ideas?

On this setup I got GEOS 3.5.1, PostgreSQL 9.6.4 and PROJ4 49.

comment:12 by robe, 7 years ago

wow great you were able to reproduce it. Don't feel quite so alone.

hmm not seeing any issue with regress_brin_index or regress_brin_index_3d. How much memory do you have allocated for your VM?


comment:13 by Björn Harrtell, 7 years ago

Decoding the geobuf data gives for passing output (GAEiCgoICgYIABoCFio=):

[decoded JSON not rendered on this page: "Failed to load processor json"]

Failing output (GAIiDAoKCggIABoE3gGkAw==):

[decoded JSON not rendered on this page: "Failed to load processor json"]

So something is happening to the floating point...
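
For anyone wanting to reproduce the decode, here is a minimal sketch that walks the protobuf wire format of the two T1 payloads from the diff, using only the Python standard library. The field numbers (3 = precision, 4 = feature collection, then field 1 for features and geometry, and field 3 for packed zigzag coords) are taken from the Geobuf schema as understood here and should be treated as assumptions; the helper names are hypothetical.

```python
import base64


def read_varint(buf, pos):
    """Read one base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def fields(buf):
    """Return {field_number: value} for varint and length-delimited fields."""
    out, pos = {}, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:  # varint
            out[number], pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (sub-message or packed data)
            length, pos = read_varint(buf, pos)
            out[number] = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unhandled wire type {wire_type}")
    return out


def unzigzag(n):
    """Undo protobuf zigzag encoding of a signed integer."""
    return (n >> 1) ^ -(n & 1)


def decode_point(b64):
    """Return (precision, scaled integer coords) for a single-point payload."""
    data = fields(base64.b64decode(b64))
    # Assumed layout: Data.feature_collection(4) -> features(1) -> geometry(1)
    geometry = fields(fields(fields(data[4])[1])[1])
    packed, coords, pos = geometry[3], [], 0
    while pos < len(packed):
        raw, pos = read_varint(packed, pos)
        coords.append(unzigzag(raw))
    return data[3], coords


for label, payload in [("expected", "GAEiCgoICgYIABoCFio="),
                       ("obtained", "GAIiDAoKCggIABoE3gGkAw==")]:
    precision, coords = decode_point(payload)
    print(label, precision, coords, [c / 10 ** precision for c in coords])
```

Under those assumptions the expected payload carries precision 1 with the integers 11 and 21, while the obtained one carries precision 2 with 111 and 210, so both the detected precision and the scaled x value differ.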

comment:14 by Björn Harrtell, 7 years ago

In 15802:

ST_AsGeobuf fix double comparison
References #3840
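
As an illustration of why an exact comparison of doubles is fragile in this kind of code, the sketch below searches for the smallest number of decimal digits that reproduces a value, once with `==` and once with a tolerance. It is not the code from r15802; the function name, the `max_precision` cap of 6, and the tolerance are assumptions made for the example. The value `0.1 + 0.2` stands in for a coordinate carrying a tiny rounding error, similar in spirit to what excess floating-point precision on some 32-bit builds can introduce.

```python
def digits_needed(value, exact, max_precision=6, tol=1e-9):
    """Smallest number of decimal digits that reproduces value (hypothetical helper).

    With exact=True the round-tripped value must compare equal with ==;
    otherwise it only has to lie within tol of the original.
    """
    scale = 1.0
    for digits in range(max_precision + 1):
        candidate = round(value * scale) / scale
        if exact:
            same = candidate == value
        else:
            same = abs(candidate - value) < tol
        if same:
            return digits
        scale *= 10.0
    # Nothing matched: fall back to the cap.
    return max_precision


noisy = 0.1 + 0.2  # not exactly 0.3 as a double
print(digits_needed(noisy, exact=True))   # runs up to the cap
print(digits_needed(noisy, exact=False))  # stops at one digit
```

With the exact comparison the search never finds a match and falls through to the cap, whereas the tolerant comparison settles on one digit.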

comment:15 by Björn Harrtell, 7 years ago

Verified that this fixes the issue on Debian 9 32-bit.

comment:16 by Björn Harrtell, 7 years ago

Resolution: fixed
Status: assigned → closed

comment:17 by robe, 7 years ago

Great. That fixed winnie's 32-bit issue as well.
