Opened 6 years ago

Closed 4 years ago

Last modified 4 years ago

#4129 closed defect (fixed)

Cannot update SecondarySnapshot during a parallel operation

Reported by: davidp Owned by: Bborie Park
Priority: high Milestone: PostGIS 2.4.9
Component: raster Version: 2.4.x
Keywords: SecondarySnapshot parallel Cc:

Description

Hi, since we upgraded our Postgres/Postgis stack to version 10/2.4 this error starts to appear in our log:

ERROR: cannot update SecondarySnapshot during a parallel operation

Most often in this context:

SQL statement: "SELECT proj4text FROM public.spatial_ref_sys WHERE srid = 5514 LIMIT 1"

or

SQL statement "SELECT TRUE" PL/pgSQL function _st_intersects(geometry,raster,integer)

There are many diferent sources of this error (PHP, Python, Mapnik), so i thing, there is marginal chance to be caused by our code (SQL statements). When i wrap problematic statements into PL/pgSQL function (which are parallel restricted by default), then this error disappear. So my suspicion falls on PostGIS functions and how they behave in PostgreSQL parallel mode.

For now i thing there is issue in these PostGIS functions:

  • ST_Transform(geometry, int) which executes "select proj4text from..."
  • ST_Intersects(geometry, raster)

Our system info is:

  • PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit
  • POSTGIS="2.4.4 r16526" PGSQL="100" GEOS="3.5.1-CAPI-1.9.1 r4246" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 2.1.2, released 2016/10/24" LIBXML="2.9.4" LIBJSON="0.12.1" LIBPROTOBUF="1.2.1" RASTER

And PostreSQL cluster resides behind PGBouncer in session pooling mode.

Best Regards, David

Attachments (2)

pgbadger-error-report-1.png (23.9 KB ) - added by davidp 6 years ago.
pgbadger-error-report-2.png (46.5 KB ) - added by davidp 6 years ago.

Download all attachments as: .zip

Change History (15)

comment:1 by Algunenano, 6 years ago

Component: postgisraster
Owner: changed from pramsey to Bborie Park

This looks like there is some Postgis function marked as PARALLEL SAFE when it isn't.

ST_Transform might execute a query like the one you've posted above, but it's read only and doesn't do anything strange so it should be ok.

On the other hand ST_Intersects(geometry, raster) is defined as:

CREATE OR REPLACE FUNCTION _st_intersects(geom geometry, rast raster, nband integer DEFAULT NULL)
	RETURNS boolean AS $$
	DECLARE
		hasnodata boolean := TRUE;
		_geom geometry;
	BEGIN
		IF ST_SRID(rast) != ST_SRID(geom) THEN
			RAISE EXCEPTION 'Raster and geometry do not have the same SRID';
		END IF;

		_geom := ST_ConvexHull(rast);
...

And if we look at ST_ConverHull(raster) it is missing the parallel declaration:

-- Availability: 2.0.0
-- Changed: 2.1.4 raised cost
-- used to cast raster -> geometry
CREATE OR REPLACE FUNCTION st_convexhull(raster)
    RETURNS geometry
    AS 'MODULE_PATHNAME','RASTER_convex_hull'
    LANGUAGE 'c' IMMUTABLE STRICT
    COST 300;

This means that, by default rules, it's parallel unsafe, which makes anyone using it also unsafe. Now, I don't know if the parallelism declaration is missing or it really is unsafe, but that's where I would start looking into.

in reply to:  1 comment:2 by davidp, 6 years ago

Replying to Algunenano:

This means that, by default rules, it's parallel unsafe, which makes anyone using it also unsafe. Now, I don't know if the parallelism declaration is missing or it really is unsafe, but that's where I would start looking into.

I didn't thing so... PARALLEL UNSAFE apply only downstream (by my observations). So, function marked as parallel safe is executed in parallel mode, even if internally calls parallel unsafe function. Contrary, parallel unsafe function never executes in parallel mode, no matter how internally called functions are labeled.

Ind in our DB, functions are declared as follows:

CREATE OR REPLACE FUNCTION public._st_intersects(geom geometry, rast raster, nband integer DEFAULT NULL::integer)
 RETURNS boolean
 LANGUAGE plpgsql
 IMMUTABLE PARALLEL SAFE COST 1000
 ...

and

CREATE OR REPLACE FUNCTION public.st_intersects(geom geometry, rast raster, nband integer DEFAULT NULL::integer)
 RETURNS boolean
 LANGUAGE sql
 IMMUTABLE PARALLEL SAFE COST 1000
AS $function$ SELECT $1 OPERATOR(public.&&) $2::geometry AND public._st_intersects($1, $2, $3); $function$

And one correction to my original post... function default is PARALLEL UNSAFE, not RESTRICTED. Sorry, my fault.

Last edited 6 years ago by davidp (previous) (diff)

comment:3 by Algunenano, 6 years ago

So, function marked as parallel safe is executed in parallel mode, even if internally calls parallel unsafe function.

Yes, Postgresql trust the programmer when he declares a function parallel but, by definition, any function that calls another one that is parallel unsafe should be marked as unsafe too. If you don't, then your unsafe function might get called in parallel which leads to errors like this one or worse.

What I meant is, that if ST_ConverHull is unsafe (should be reviewed and tagged appropriately) then st_intersects(geom, raster, int) should be marked as unsafe too, but I haven't investigated any further.

comment:4 by pramsey, 6 years ago

Milestone: PostGIS 2.4.5PostGIS 2.4.6

comment:5 by robe, 6 years ago

In 16988:

Add missing schema qualifications for raster _ST_Intersects
references #4129 for PostGIS 3.0

comment:6 by pramsey, 6 years ago

So, that particular error message arises from GetLatestSnapshot which is not called at all in the PostGIS code base and pretty infrequently in the PgSQL code base. The only places that I can see that might happen during run time is some replication code paths? Are you running with replication on? I'm looking particularly at the st_transform() code line in PostGIS, and it all seems like it should be fine WRT parallelism. Any hints on replicating this?

comment:7 by davidp, 6 years ago

Yes, we are using synchronous streaming replication. Pretty simle scenario with one Master and one Hot Stanby.

There is SQL query that causes errors:

with adrm as( select *, ST_Distance( ST_TRANSFORM( ST_SETSRID( ST_MAKEPOINT( $1, $2 ), $4 ), 5514 ), definicni_bod) distance
	from ruian.rn_adresni_misto
	where ST_DWithin( ST_TRANSFORM( ST_SETSRID( ST_MAKEPOINT( $1, $2 ), $4 ), 5514 ), definicni_bod, $5 )
	order by distance asc limit $3 ) 
select json_agg( json_build_object( 'address', address, 'coords', coords, 'distance', distance )) as result
	from(select json_build_object( 'adrm_kod', adrm.kod, 'kraj', ruian.rn_vusc.nazev, 'kraj_kod', ruian.rn_okres.vusc_kod, 'okres', ruian.rn_okres.nazev, 'okres_kod', ruian.rn_obec.okres_kod, 'obec', ruian.rn_obec.nazev, 'obec_kod', ruian.rn_obec.kod, 'cast_obce', ruian.rn_cast_obce.nazev, 'cast_obce_kod', ruian.rn_cast_obce.kod, 'ulice', ruian.rn_ulice.nazev, 'ulice_kod', ruian.rn_ulice.kod, 'stavebni_objekt', ruian.rn_stavebni_objekt.kod, 'co', cislo_orientacni_hodnota, 'co_pism', cislo_orientacni_pismeno, 'cd', cislo_domovni, 'psc', adrp_psc ) address,
			st_asgeojson( ST_Transform( adrm.definicni_bod, $4 ))::json -> 'coordinates' coords, 
			round( adrm.distance::numeric, 2 ) distance
			from adrm
			left join ruian.rn_ulice on
			adrm.ulice_kod = ruian.rn_ulice.kod
			left join ruian.rn_stavebni_objekt on
			ruian.rn_stavebni_objekt.kod = adrm.stavobj_kod
			left join ruian.rn_cast_obce on
			ruian.rn_cast_obce.kod = ruian.rn_stavebni_objekt.cobce_kod
			left join ruian.rn_obec on
			ruian.rn_ulice.obec_kod = ruian.rn_obec.kod
			or ruian.rn_cast_obce.obec_kod = ruian.rn_obec.kod
			left join ruian.rn_okres on
			ruian.rn_obec.okres_kod = ruian.rn_okres.kod
			left join ruian.rn_vusc on
			ruian.rn_okres.vusc_kod = ruian.rn_vusc.kod) a;

where: $1 = X coordinate, $2 = Y coordinate, $3 = row limit, $4 = coordinates SRID, $5 = search radius

Error appears randomly when this query is executed in batch mode - eg 20k of executions... When i transformed this query into function (parallel unsafe), errors disappears.

Hope that helps...

by davidp, 6 years ago

Attachment: pgbadger-error-report-1.png added

by davidp, 6 years ago

Attachment: pgbadger-error-report-2.png added

comment:8 by robe, 5 years ago

Milestone: PostGIS 2.4.6PostGIS 2.4.7

comment:9 by pramsey, 5 years ago

Milestone: PostGIS 2.4.7PostGIS 2.4.9

comment:10 by robe, 4 years ago

Is this still an issue. I thought it might have had to do with our calls to spatial_ref_sys in C code not being schema qualified, I think we fixed it in newer versions of PostGIS , but I forget if we went as far back as 2.4

comment:11 by robe, 4 years ago

Resolution: fixed
Status: newclosed

I suspect this is a similar issue to #4661 and is fixed by

40015c93/git

comment:12 by Regina Obe <lr@…>, 4 years ago

In 5e2440d/git:

Revise ST_ConcaveHull to run for 0.99 as well, ammend NEWs for #4129. Closes #4541 for PostGIS 2.5.5. references #4129 for PostGIS 2.5.5

comment:13 by Regina Obe <lr@…>, 4 years ago

In ac3d025/git:

Revise ST_ConcaveHull to run for 0.99 as well, ammend NEWs for #4129. Closes #4541 for PostGIS 3.0.2. references #4129 for PostGIS 3.0.2

Note: See TracTickets for help on using tickets.