Opened 8 years ago
Closed 7 years ago
#3700 closed defect (fixed)
test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs, and travis sometimes
Reported by: | robe | Owned by: | komzpa |
---|---|---|---|
Priority: | high | Milestone: | PostGIS 2.4.4 |
Component: | postgis | Version: | 2.3.x |
Keywords: | Cc: |
Description
This is beginning to annoy me. I thought I had this in a ticket already but couldn't find it.
On occasion, especially during high load, winnie's 32-bit runs fail on this test:
Test: test_kmeans ...Makefile:85: recipe for target `check' failed
It's always that test, and I think I've only seen the 32-bit runs fail. They fail about once every 3-5 runs.
Could be Windows, or there is something wrong with kmeans that shows up more often on 32-bit systems.
Change History (23)
comment:1 by , 8 years ago
Component: | buildbots → postgis |
---|---|
Owner: | changed from | to
comment:2 by , 8 years ago
comment:3 by , 8 years ago
Yes, it looks like she runs with RUNTESTFLAGS=-v:
https://git.osgeo.org/gogs/postgis/postgis/src/svn-trunk/ci/winnie/regress_postgis.sh#L143
But that doesn't explain why it only fails on 32-bit and not 64-bit, does it?
comment:4 by , 8 years ago
No, but the lack of diff output suggests to me that there's no difference between expected and obtained output, so the error must be in the run_test script itself. Can you try to run it in isolation on that machine, against the specific offending test case?
comment:6 by , 8 years ago
Summary: | test_kmeans fails on winnie often on 32-bit runs → test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs |
---|
Okay, this just happened on winnie's 64-bit trunk run, so I guess it's not limited to 32-bit. This is the first time I recall it happening on 64-bit.
Test: test_kmeans ...Makefile:85: recipe for target `check' failed
make[2]: *** [check] Error 255
make[2]: Leaving directory `/projects/postgis/branches/2.4/liblwgeom/cunit'
Makefile:205: recipe for target `check' failed
make[1]: *** [check] Error 2
make[1]: Leaving directory `/projects/postgis/branches/2.4/liblwgeom'
GNUmakefile:16: recipe for target `check' failed
make: *** [check] Error 1
comment:7 by , 7 years ago
Milestone: | PostGIS 2.4.0 → PostGIS 2.5.0 |
---|---|
Priority: | medium → high |
comment:8 by , 7 years ago
Resolution: | → worksforme |
---|---|
Status: | new → closed |
I just ran the cunit tests on 4 cores simultaneously in a big loop looking for this failure, but didn't get it. Maybe it's gone? ha ha.
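For an intermittent failure like this one, a repeat loop that tallies return codes makes the failure rate visible. The following is a minimal Python sketch of such a loop, not the script that was actually used here; `count_failures` is a hypothetical helper name, and the stand-in command at the bottom only simulates a failing run.

```python
import subprocess
import sys


def count_failures(cmd, runs):
    """Run cmd `runs` times and return the list of non-zero return codes seen."""
    failures = []
    for _ in range(runs):
        # Output is discarded; only the exit status matters for the tally.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures.append(result.returncode)
    return failures


# Stand-in command that always exits with status 3 (not the real test binary).
print(count_failures([sys.executable, "-c", "import sys; sys.exit(3)"], 2))
```

To use it against the real tests, the stand-in would be replaced by the command that runs the cunit suite on the machine in question (assumed here, not taken from the ticket), with a run count large enough to catch a failure that shows up roughly once every 3-5 runs.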
comment:9 by , 7 years ago
Resolution: | worksforme |
---|---|
Status: | closed → reopened |
Nice try, buddy. Keeping this for 2.5. I actually haven't been testing 32-bit for a while because I have a more pressing issue with it failing on shp2pgsql-gui that I haven't figured out. So I turned off testing on 32-bit until I've squared that away.
Anyway, like I said, I think I've only seen this on Windows, so it might be that, because I compile with mingw and test against a VC++ build, it's seeing something you aren't. I'll reassign to myself and try to nail down the issue in 2.5.
comment:10 by , 7 years ago
Owner: | changed from | to
---|---|
Status: | reopened → new |
comment:11 by , 7 years ago
Hah, guess it's still a problem. It just happened to me when testing r15671 on my mingw gcc 4.8.3 64-bit, though the error is a little different, so perhaps it's not quite the same thing.
CUnit - A unit testing framework for C - Version 2.1-2
http://cunit.sourceforge.net/
Suite: computational_geometry
Test: test_lw_segment_side ...passed
Test: test_lw_segment_intersects ...passed
Test: test_lwline_crossing_short_lines ...passed
Test: test_lwline_crossing_long_lines ...passed
Test: test_lwline_crossing_bugs ...passed
Test: test_lwpoint_set_ordinate ...passed
Test: test_lwpoint_get_ordinate ...passed
Test: test_point_interpolate ...passed
Test: test_lwline_clip ...passed
Test: test_lwline_clip_big ...passed
Test: test_lwmline_clip ...passed
Test: test_geohash_point ...passed
Test: test_geohash_precision ...passed
Test: test_geohash ...passed
Test: test_geohash_point_as_int ...passed
Test: test_isclosed ...passed
Test: test_lwgeom_simplify ...passed
Test: test_lw_arc_center ...passed
Test: test_point_density ...passed
Test: test_kmeans ...Makefile:86: recipe for target 'check' failed
make[2]: *** [check] Segmentation fault
make[2]: Leaving directory '/projects/postgis/branches/2.4/liblwgeom/cunit'
Makefile:205: recipe for target 'check' failed
make[1]: *** [check] Error 2
make[1]: Leaving directory '/projects/postgis/branches/2.4/liblwgeom'
GNUmakefile:16: recipe for target 'check' failed
make: *** [check] Error 1
I should add it's not repeatable. I did another make check exactly the same way and it was fine this time around. This is pure mingw gcc 4.8.3 64-bit compiled PostgreSQL 10 with cassert on. No VC++ in the mix, since EDB hasn't come out with PostgreSQL 10 for me to test with. At any rate, when it fails it's in the cunit layer, so that shouldn't have anything to do with it anyway.
comment:12 by , 7 years ago
Damn, I wish this happened consistently. I got the error again but then couldn't repeat it in 4 more tries. A true heisenbug. I'll try throwing in some debug notices to see if I can at least catch where it's happening.
comment:13 by , 7 years ago
Milestone: | PostGIS 2.5.0 → PostGIS 2.4.1 |
---|
Still failing randomly, usually on 32-bit runs.
comment:14 by , 7 years ago
Milestone: | PostGIS 2.4.1 → PostGIS 2.4.2 |
---|
comment:15 by , 7 years ago
Milestone: | PostGIS 2.4.2 → PostGIS 2.4.3 |
---|
comment:16 by , 7 years ago
Summary: | test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs → test_kmeans fails on winnie often on 32-bit and sometimes on 64-bit runs, and travis sometimes |
---|
Yeah, travis crashed on the kmeans test as well recently (so it's not all in winnie's head; something is fishy in these mean waters):
https://travis-ci.org/postgis/postgis/jobs/322217076
This was run against trunk r16189
PostGIS is now configured for x86_64-unknown-linux-gnu
-------------- Compiler Info -------------
C compiler: gcc -O3 -march=native -mtune=native
SQL preprocessor: /usr/bin/cpp -traditional-cpp -w -P
-------------- Additional Info -------------
Interrupt Tests: DISABLED use: --with-interrupt-tests to enable
-------------- Dependencies --------------
GEOS config: /usr/bin/geos-config
GEOS version: 3.5.0
GDAL config: /usr/bin/gdal-config
GDAL version: 2.2.2
SFCGAL config: /usr/bin/sfcgal-config
SFCGAL version: 1.2.2
PostgreSQL config: /usr/lib/postgresql/9.6/bin/pg_config
PostgreSQL version: PostgreSQL 9.6.6
PROJ4 version: 49
Libxml2 config: /usr/bin/xml2-config
Libxml2 version: 2.9.1
JSON-C support: yes
protobuf-c support: no
PCRE support: yes
Perl: /usr/bin/perl
--------------- Extensions ---------------
PostGIS Raster: enabled
PostGIS Topology: enabled
SFCGAL support: enabled
Address Standardizer support: enabled
-------- Documentation Generation --------
xsltproc: /usr/bin/xsltproc
xsl style sheets: /usr/share/xml/docbook/stylesheet/docbook-xsl
dblatex: /usr/bin/dblatex
convert: /usr/bin/convert
mathml2.dtd: /usr/share/xml/schema/w3c/mathml/dtd/mathml2.dtd
Test: test_kmeans ...make[2]: *** [check] Illegal instruction (core dumped)
make[2]: Leaving directory `/home/travis/build/postgis/postgis/liblwgeom/cunit'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/travis/build/postgis/postgis/liblwgeom'
make: *** [check] Error 1
comment:17 by , 7 years ago
How about we adopt logbt for cunit as a non-temporary measure?
It will just print a backtrace for anything running under it that dumps core.
I've used it like this (full path to cunit was also needed): https://github.com/postgis/postgis/pull/176/commits/f2f06a11572bb25168fe375c9236d3b351f4607e
Likely much more templating is needed to detect presence of logbt and run under it if it's there.
comment:18 by , 7 years ago
Yeah, that would be great. Not sure how to move forward with that.
BTW winnie's 64-bit on 2.5.0 failed:
Test: test_kmeans ...Makefile:86: recipe for target 'check' failed
make[2]: *** [check] Segmentation fault
make[2]: Leaving directory '/projects/postgis/branches/2.4/liblwgeom/cunit'
Makefile:207: recipe for target 'check' failed
make[1]: *** [check] Error 2
make[1]: Leaving directory '/projects/postgis/branches/2.4/liblwgeom'
GNUmakefile:16: recipe for target 'check' failed
make: *** [check] Error 1
comment:19 by , 7 years ago
logbt is now enabled on travis. If the failure ever reproduces there it will be logged, although the likely reason for the Illegal Instruction failure was -march=native combined with travis faking the CPU ID.
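The failures in this ticket surface in three different ways: a plain non-zero exit ("Error 255"), "Segmentation fault", and "Illegal instruction". When the test binary is launched from a wrapper script, the return code alone is enough to tell these apart on POSIX systems. The sketch below is one hedged way to do that in Python; `describe_status` is a hypothetical helper, not part of PostGIS or logbt.

```python
import signal


def describe_status(returncode):
    """Turn a subprocess return code into a short human-readable label."""
    if returncode == 0:
        return "passed"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal as the negated signal number.
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            # Signal number with no name on this platform.
            return f"killed by signal {-returncode}"
    return f"exit code {returncode}"


# A few representative statuses: success, ordinary failure, two crash signals.
for code in (0, 255, -signal.SIGSEGV, -signal.SIGILL):
    print(code, "->", describe_status(code))
```

This only applies where `subprocess` reports negative return codes, which is the POSIX behaviour. On Windows, crashes typically show up as large positive exit codes instead, which this sketch does not decode.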
comment:20 by , 7 years ago
Milestone: | PostGIS 2.4.3 → PostGIS 2.4.4 |
---|
After all your changes this might not be an issue anymore, but I'll keep it open until we confirm.
comment:22 by , 7 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:23 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Do you run with RUNTESTFLAGS=-v? If there's no output but a non-success return code, then maybe it's a missing "return" somewhere, leaving the return code to phase-of-the-moon matters.