#5387 closed defect (wontfix)

ST_Transform(geom, str, str) is very slow

Reported by: cdestigter Owned by: pramsey
Priority: medium Milestone: PostGIS 3.4.0
Component: postgis Version: 3.3.x
Keywords: Cc: cdestigter, rcoup

Description

Using the string arguments to ST_Transform() causes a massive slowdown:

# SELECT st_extent(foo.geom) from (select ST_Transform("GEOMETRY", 'EPSG:7844', 4326) AS geom FROM mytable LIMIT 200) foo;
Time: 3852.815 ms (00:03.853)

# SELECT st_extent(foo.geom) from (select ST_Transform("GEOMETRY", 'EPSG:7844', 'EPSG:4326') AS geom FROM mytable LIMIT 200) foo;
Time: 527.896 ms

# SELECT st_extent(foo.geom) from (select ST_Transform("GEOMETRY",  4326) AS geom FROM mytable LIMIT 200) foo;
Time: 22.951 ms

We're using the string argument form to force postGIS to use proj.db instead of spatial_ref_sys for CRS definitions.

  1. I had assumed that postGIS would cache the transform object, but from a quick look at the code it looks like it's probably just instantiating a new transform for every geometry, hence the slowness.
  1. I don't understand why the (str, int) form and the (str, str) form are so different in terms of timing.

Could postgis maintain a cache of transform objects for (str, str) pairs, to achieve performance similar to the int argument form?

Change History (10)

comment:1 by rcoup, 18 months ago

Cc: rcoup added

comment:2 by robe, 18 months ago

Can you output your

` SELECT postgis_full_version(); `

I don't see using ST_Transform(geom, srid) would yield a different answer from ST_Transform(geom, text, srid)

In both cases it should be using the proj.db since the proj.db takes precedence over waht is in spatial_ref_sys table.

Can you output an example where

ST_AsText(ST_Transform(ST_SetSRID("GEOMETRY", 7844), 4326))

is different from

ST_AsText(ST_Transform("GEOMETRY", 'EPSG:7844' , 4326))

comment:3 by robe, 18 months ago

Milestone: PostGIS 3.3.3PostGIS 3.4.0

comment:4 by cdestigter, 17 months ago

Apologies, I had mis-remembered why we did this.

It's actually because

(a) we wanted to be able to use CRSes that Postgis doesn't know about. Just using an integer SRID requires that that CRS be in spatial_ref_sys

(b) We wanted to ensure we have a single source of truth for CRS information (proj.db) - and not have to sync patches into PostGIS if we change proj.db by adding custom CRSes etc.

comment:5 by cdestigter, 17 months ago

 POSTGIS="3.3.3 2355e8e" [EXTENSION] PGSQL="130" GEOS="3.10.2-CAPI-1.16.0" PROJ="8.2.1" LIBXML="2.9.13" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)" (core procs from "3.3.2 4975da8" need upgrade)

comment:6 by cdestigter, 17 months ago

I also find this a bit surprising (I'm not sure if it's related):

dbname=# select st_asewkt(st_transform('point(0 0)'::geometry, 'EPSG:7844', 'EPSG:4283'));
 POINT(-0.000009040658614 0.000009514196768)

dbname=# select st_asewkt(st_transform('srid=7844;point(0 0)'::geometry, 'EPSG:4283'));
 POINT(0 0)

(our proj.db has an overridden custom transformation for these CRSes, which might explain the differences here - maybe one of these calls is using the custom pipeline and the other is not.) I'll try and dig into that behaviour a bit more tomorrow.

comment:7 by robe, 17 months ago

@cdestigter,

PostGIS delegates to proj.db if the srid matches an entry in your proj.db. It does need to exist in spatial_ref_sys (the srid, epsg code), but that's it.

If you are not finding this to be the case, then that is an issue

comment:8 by cdestigter, 17 months ago

Yes - but it doesn't make sense to have to sync up the SRIDs in that table with the systems in proj.db. Apart from any other concerns, we'd have to find and use some form of integer IDs for them - and not all of them are EPSG CRSes. In addition the Postgres database is not deployed concurrently with the proj.db in our app images and so trying to keep them in sync would require complex migrations, which can be avoided if we skip using spatial_ref_sys at all.

I feel like that concern is tangential - the issue I had in this ticket is that some forms of the ST_Transform() function are very slow - presumably they are not caching their transforms across multiple input rows. Can this be improved?

Version 0, edited 17 months ago by cdestigter (next)

comment:9 by pramsey, 17 months ago

We cache based on srid, which in turn maps to auth/auth_srid. If an auth/auth_srid pair exists in the proj.db, then that is the definition that is used, not the proj4text or srtext. If you need to add non-EPSG systems, yes you'll have to add an entry to spatial_ref_sys with as detailed an SRTEXT as you can find. In a just world, the transform(geom, str, str) wouldn't even exist, as the SFSQL is pretty clear about the centrality of SRID and spatial_ref_sys as the mechanisms for CRS definitions. It is a holdover from the days when proj strings were the only way of representing CRSs. An good proj person could explain in even more detail, but basically dereferencing to an auth/auth_srid doesn't just get you a look-up to an SRTEXT. It gets you a lookup into the full EPSG database, which also referentially gives you all the best transforms between different systems, a rich extra layer of information that doesn't exist inside the SRTEXT.

Anyways, no, we won't be changing the proj cache, so str/str transforms are going to remain slow. You can always add systems you want to formalize into the spatial_ref_sys table.

comment:10 by pramsey, 17 months ago

Resolution: wontfix
Status: newclosed
Note: See TracTickets for help on using tickets.