Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#2267 closed defect (fixed)

Server crash from analyze table

Reported by: peters Owned by: pramsey
Priority: blocker Milestone: PostGIS 2.0.4
Component: postgis Version: 2.0.x
Keywords: centos rhel5 geography history point analyze crash Cc:

Description

Below is simple test case which replicates problem on (2) different PostGIS systems running same version. It seems have to have something to do with a table containing a geographical point in which x values are different but y values are same. May be specific to certain value ranges.

SELECT version()
' ' postgis_full_version();

PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-54), 64-bit POSTGIS="2.0.3 r11128" GEOS="3.3.8-CAPI-1.7.8" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.9.2, released 2012/10/08" LIBXML="2.6.26" LIBJSON="UNKNOWN" RASTER

(1 row)

create table foo(g geography(point,4326));

insert into foo values(st_setsrid(st_makepoint(1.2, 19.8), 4326));

insert into foo values(st_setsrid(st_makepoint(1.3, 19.8), 4326));

analyze verbose foo;

INFO: analyzing "public.foo" INFO: "foo": scanned 1 of 1 pages, containing 2 live rows and 0 dead rows; 2 rows in sample, 2 estimated total rows

server closed the connection unexpectedly

This probably means the server terminated abnormally before or while processing the request.

The connection to the server was lost. Attempting reset: Failed. !> \q

Attachments (2)

2267.log (16.3 KB ) - added by Bborie Park 12 years ago.
valgrind output from running analyze
analyze1.patch (2.0 KB ) - added by pramsey 12 years ago.
Avoid case 3, and condition if things are bad

Download all attachments as: .zip

Change History (26)

comment:1 by peters, 12 years ago

Priority: mediumblocker

comment:2 by robe, 12 years ago

Milestone: PostGIS 2.0.4

comment:3 by strk, 12 years ago

I can't reproduce on PostgreSQL 9.1.8, it could be a 9.2 issue only. Peters, did you have any warning at build time ?

in reply to:  3 comment:4 by peters, 12 years ago

I reproduced on my dev system, running 9.2.4, 2.0.3 under CENTOS-5.

Only really special thing about this build is use of larger block size (./configure --with-blocksize=32 --with-wal-blocksize=32 --enable-debug).

Rebuilt postgres and postgis for debugging, ran posted test case using psql client:

Psql client output:

analyze verbose foo;

INFO: analyzing "public.foo"

NOTICE: [geography_estimate.c:geography_analyze:1490] geography_analyze called

NOTICE: [geography_estimate.c:geography_analyze:1497] attribute stat target: 500

NOTICE: [geography_estimate.c:geography_analyze:1503] minrows: 150000

INFO: "foo": scanned 1 of 1 pages, containing 2 live rows and 0 dead rows; 2 rows in sample, 2 estimated total rows

NOTICE: [geography_estimate.c:compute_geography_stats:826] compute_geography_stats called

NOTICE: [geography_estimate.c:compute_geography_stats:932] End of 1st scan:

NOTICE: [geography_estimate.c:compute_geography_stats:934] Sample extent (min, max): (0.940639 0.0197043 0.338738), (0.940674 0.0213461 0.338738)

NOTICE: [geography_estimate.c:compute_geography_stats:935] No. of geometries sampled: 2

NOTICE: [geography_estimate.c:compute_geography_stats:936] No. of non-null geometries sampled: 2

NOTICE: [geography_estimate.c:compute_geography_stats:947] Standard deviation filter enabled

NOTICE: [geography_estimate.c:compute_geography_stats:979] standard deviations:

NOTICE: [geography_estimate.c:compute_geography_stats:980] LOWx - avg:0.940656 sd:0.000018

NOTICE: [geography_estimate.c:compute_geography_stats:981] LOWy - avg:0.020525 sd:0.000821

NOTICE: [geography_estimate.c:compute_geography_stats:982] LOWz - avg:0.338738 sd:0.000000

NOTICE: [geography_estimate.c:compute_geography_stats:983] HIGx - avg:0.940657 sd:0.000018

NOTICE: [geography_estimate.c:compute_geography_stats:984] HIGy - avg:0.020525 sd:0.000821

NOTICE: [geography_estimate.c:compute_geography_stats:985] HIGz - avg:0.338738 sd:0.000000

NOTICE: [geography_estimate.c:compute_geography_stats:995] sd_extent: xmin, ymin, zmin: 0.940639, 0.019704, 0.338738

NOTICE: [geography_estimate.c:compute_geography_stats:997] sd_extent: xmax, ymax, zmax: 0.940674, 0.021346, 0.338738

NOTICE: [geography_estimate.c:compute_geography_stats:1083] histogram_extent: xmin, ymin, zmin: 0.940639, 0.019704, 0.338738

NOTICE: [geography_estimate.c:compute_geography_stats:1085] histogram_extent: xmax, ymax, zmax: 0.940674, 0.021346, 0.338738

NOTICE: [geography_estimate.c:compute_geography_stats:1129] Number of dimensions in sample set: 3

NOTICE: [geography_estimate.c:compute_geography_stats:1247] computed histogram grid size (X,Y,Z): 128 x 5863 x 0 (0 out of 80000 cells)

NOTICE: [geography_estimate.c:compute_geography_stats:1285] Beginning histogram intersection calculations

NOTICE: [geography_estimate.c:compute_geography_stats:1304] feat 0 box is 0.940674 0.019704 0.338738, 0.940674 0.019704 0.338738

NOTICE: [geography_estimate.c:compute_geography_stats:1337] feat 0 overlaps unitsx 127-127, unitsy 0-0, unitsz -1--1

Gdb server side:

Program received signal SIGSEGV, Segmentation fault.

0x00002b803a870407 in compute_geography_stats (stats=0x1321db98, fetchfunc=0x569d3f <std_fetch_func>, samplerows=2, totalrows=2) at geography_estimate.c:1367 1367 geogstats->value[x + y * unitsx + z * unitsx * unitsy] += 1;

(gdb) where

#0 0x00002b803a870407 in compute_geography_stats (stats=0x1321db98, fetchfunc=0x569d3f <std_fetch_func>, samplerows=2, totalrows=2) at geography_estimate.c:1367

#1 0x00000000005677df in do_analyze_rel (onerel=0x2b803bfbab00, vacstmt=0x131d51f0, acquirefunc=0x5687e2 <acquire_sample_rows>, relpages=1, inh=0 '\000', elevel=17) at analyze.c:525

#2 0x0000000000566e13 in analyze_rel (relid=57344, vacstmt=0x131d51f0, bstrategy=0x1321b300) at analyze.c:265

#3 0x00000000005d2c86 in vacuum (vacstmt=0x131d51f0, relid=0, do_toast=1 '\001', bstrategy=0x1321b300, for_wraparound=0 '\000', isTopLevel=1 '\001') at vacuum.c:247

#4 0x000000000070004e in standard_ProcessUtility (parsetree=0x131d51f0, queryString=0x131d47c0 "analyze verbose foo;", params=0x0, isTopLevel=1 '\001', dest=0x131d5530,

completionTag=0x7fff713d3370 "") at utility.c:1027

#5 0x00002b80194c917c in pgss_ProcessUtility () from /usr/local/pgsql/lib/pg_stat_statements.so

#6 0x00000000006fef11 in ProcessUtility (parsetree=0x131d51f0, queryString=0x131d47c0 "analyze verbose foo;", params=0x0, isTopLevel=1 '\001', dest=0x131d5530, completionTag=0x7fff713d3370 "")

at utility.c:332

#7 0x00000000006fdff8 in PortalRunUtility (portal=0x13146360, utilityStmt=0x131d51f0, isTopLevel=1 '\001', dest=0x131d5530, completionTag=0x7fff713d3370 "") at pquery.c:1185

#8 0x00000000006fe18f in PortalRunMulti (portal=0x13146360, isTopLevel=1 '\001', dest=0x131d5530, altdest=0x131d5530, completionTag=0x7fff713d3370 "") at pquery.c:1316

#9 0x00000000006fd7ff in PortalRun (portal=0x13146360, count=9223372036854775807, isTopLevel=1 '\001', dest=0x131d5530, altdest=0x131d5530, completionTag=0x7fff713d3370 "") at pquery.c:814

#10 0x00000000006f7d16 in exec_simple_query (query_string=0x131d47c0 "analyze verbose foo;") at postgres.c:1046

#11 0x00000000006fbc3a in PostgresMain (argc=1, argv=0x131266b0, dbname=0x13126698 "gsw", username=0x13126680 "gsw") at postgres.c:3959

#12 0x00000000006a9cfb in BackendRun (port=0x1314b450) at postmaster.c:3614

#13 0x00000000006a9412 in BackendStartup (port=0x1314b450) at postmaster.c:3304

#14 0x00000000006a632a in ServerLoop () at postmaster.c:1367

#15 0x00000000006a5cb8 in PostmasterMain (argc=1, argv=0x131247e0) at postmaster.c:1127

#16 0x000000000061d98c in main (argc=1, argv=0x131247e0) at main.c:199

So problem appears to be in postgis compute_geography_stats ().

comment:5 by peters, 12 years ago

Here is some additional gdb info at the crash point:

(gdb) print x $1 = 127

(gdb) print y $2 = 0

(gdb) print z $3 = -1

(gdb) print unitsx $4 = 128

(gdb) print x + y * unitsx + z * unitsx * unitsy $5 = -750337

(gdb) print ndims $6 = 3

(gdb) print z_idx_max $7 = -1

(gdb) print total_cells_coverage $8 = 0

(gdb) print z_idx_min $9 = -1

The negative numbers entering into the calculation for an index into geogstats->value looks suspicious to me, but this is the first time I've seen this code.

--Peter

comment:6 by peters, 12 years ago

Also suspicious is that the code seems to be attempting to process z values when I originally entered 2D geographical points into the table. I am not sure where the z values are coming from.

--Peter

comment:7 by pramsey, 12 years ago

Geography coordinates are indexed in geocentric (x/y/z) space, so that's not an issue. I'll see if this test case exercises my own environment.

comment:8 by pramsey, 12 years ago

I'm on 9.2.3 and I'm not seeing a crash on 2.0 branch.

comment:9 by pramsey, 12 years ago

Er, on OS/X, I should note.

comment:10 by peters, 12 years ago

Ive tried a few things, and I consistently get crash at the same point. Note that I am running CENTOS-5 (RHEL-5 look-alike).

Database 9.2.3 with postgis 2.0.3 crashes.

Database 9.2.3 with postgis 2.0.2 crashes.

I went back to database 9.2.4 and postgis 2.0.3 and created brand new database using default block sizing and still crashes exactly the same.

Can you try this on a CENTOS, RHEL, or Fedora Os?

thanks, --Peter

comment:11 by robe, 12 years ago

Keywords: centos rhel5 added

FWIW:

I have PostGIS compiled under mingw64 using GCC 4.5.4 but deployed on VC++ compiled PostgreSQL.

I tried on my 9.2.2 Windows 7

SELECT version() || ' ' || postgis_full_version(); 

PostgreSQL 9.2.2, compiled by Visual C++ build 1600, 64-bit POSTGIS="2.0.3 r11132" GEOS="3.4.0dev-CAPI-1.8.0 r0" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.9.2, released 2012/10/08" LIBXML="2.7.8" LIBJSON="UNKNOWN" RASTER

and my windows 2008 64-bit

PostgreSQL 9.2.4, compiled by Visual C++ build 1600, 64-bit POSTGIS="2.0.2SVN r10643" GEOS="3.4.0dev-CAPI-1.8.0 r0" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.9.2, released 2012/10/08" LIBXML="2.7.8" LIBJSON="UNKNOWN" (core procs from "2.0.1 r9979" need upgrade) RASTER (raster procs from "2.0.1 r9979" need upgrade)

and doesn't seem to crash on either of them even after running multiple times. So might be specific to Centos or Redhat I guess.

Tried my 2.1 build on windows 7 64-bit and didn't see any crashes.

Bborie, You've got every VM under the sun :) You wouldn't happen to have a CentOS 5 64-bit one too would you?

comment:12 by peters, 12 years ago

Another data point. Tried on my Windows 7 machine, using prebuilt versions:

PostgreSQL 9.2.2, compiled by Visual C++ build 1600, 64-bit POSTGIS="2.0.2 r10789" GEOS="3.3.6-CAPI-1.7.6" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.9.2, released 2012/10/08 GDAL_DATA not found" LIBXML="2.7.8" LIBJSON="UNKNOWN" RASTER

Can't tell you the line number, as I don't have windows dev tools, but it crashed on analyze. Took out the whole service.

2013-04-10 15:54:47 EDT:::@: [11048]: LOG: server process (PID 6388) was terminated by exception 0xC0000005 2013-04-10 15:54:47 EDT:::@: [11048]: DETAIL: Failed process was running: analyze verbose foo; 2013-04-10 15:54:47 EDT:::@: [11048]: HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.

Note that you should just do those 2 insert statements and analyze. You can drop the table and recreate as needed and it crashes every time.

comment:13 by robe, 12 years ago

Can you try the 2.0.3 one? Do you run all the steps at once. I tried individually and then together and couldn't get it to crash.

comment:14 by robe, 12 years ago

I'll try swapping out my GEOS to see if that is the culprit -- it shouldn't be involved at all, but you never know. I'm running with a dev version of GEOS.

by Bborie Park, 12 years ago

Attachment: 2267.log added

valgrind output from running analyze

comment:15 by robe, 12 years ago

dustymugs,

What is your postgis full, version, and OS you are running on?

comment:16 by Bborie Park, 12 years ago

For the valgrind log, the env is...

Slackware 14.0 (32-bit) gcc 4.7.1

POSTGIS="2.0.4SVN r11275" GEOS="3.4.0dev-CAPI-1.8.0 r3795" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.10dev, released 2011/12/29" LIBXML="2.8.0" LIBJSON=" UNKNOWN" RASTER

comment:17 by Bborie Park, 12 years ago

Forgot PostgreSQL version: 9.2.3

comment:18 by Bborie Park, 12 years ago

For the same env but with -trunk r11281, there is no crash.

comment:19 by peters, 12 years ago

I updated my Win7 install using the prebuilt release:

PostgreSQL 9.2.2, compiled by Visual C++ build 1600, 64-bit POSTGIS="2.0.3 r11132" GEOS="3.3.8-CAPI-1.7.8" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.9.2, released 2012/10/08 GDAL_DATA not found" LIBXML="2.7.8" LIBJSON="UNKNOWN" (core procs from "2.0.2 r10789" need upgrade) TOPOLOGY (topology procs from "2.0.2 r10789" need upgrade) RASTER (raster procs from "2.0.2 r10789" need upgrade)

Note that there is bug in makepostgisdb_using_extensions.bat:

xcopy /I plugins.d "%PGADMIN%\plugins.d"

should be changed to read:

xcopy /I "pgAdmin III\plugins.d" "%PGADMIN%\plugins.d"

My test case still crashes on this version too! I am just copy/paste into psql, nothing special here.

--Peter

comment:20 by robe, 12 years ago

Peter, Thanks I'll fix the packaging. Regarding crashing, I still can't get to crash on my windows 7 64-bit even replacing all my binaries and cutting and pasting the lines in psql.

PostgreSQL 9.2.2, compiled by Visual C++ build 1600, 64-bit POSTGIS="2.0.3 r11132" GEOS="3.3.8-CAPI-1.7.8" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.9.2, released 2012/10/08" LIBXML="2.7.8" LIBJSON="UNKNOWN" RASTER

Bborie was able to replicate on his slackware 32-bit but not his 64-bit.

Are you running in a VM? or real equipment? I have real windows servers and workstation. Anyrate they are busy dissecting this issue on IRC if you want to take part in the conversation.

http://irclogs.geoapt.com/postgis/%23postgis.2013-04-10.log

by pramsey, 12 years ago

Attachment: analyze1.patch added

Avoid case 3, and condition if things are bad

comment:21 by peters, 12 years ago

I was able to patch my version (9.2.4 + 2.0.3) and was unable to crash using my test case. I also ran analyze on a number of tables with geography and it ran fine.

I looked at pg_stats and it seems that for point columns you always have n_distinct = -1, avg_width = 56, everything else looks to be null. For polygons, it looks like you get more useful avg_width values. Hopefully that is ok.

Thanks for all of your efforts on this and the quick turnaround time. The real test for me will be next week when I attempt to load large amounts of data (over 1TB).

Thanks again!

--Peter

comment:22 by Bborie Park, 12 years ago

Yep. No more crashes with that patch.

comment:23 by pramsey, 12 years ago

Resolution: fixed
Status: newclosed

Committed to 2.0 at r11287. The 2d index in 2.0 has some ceil() wrappings that likely prevent this problem. The histogram allocator in trunk was completely re-written, so this code line is gone there.

comment:24 by robe, 12 years ago

Keywords: history added
Note: See TracTickets for help on using tickets.