Opened 18 months ago
Closed 17 months ago
#5387 closed defect (wontfix)
ST_Transform(geom, str, str) is very slow
Reported by: | cdestigter | Owned by: | pramsey |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 3.4.0 |
Component: | postgis | Version: | 3.3.x |
Keywords: | Cc: | cdestigter, rcoup |
Description
Using the string arguments to ST_Transform()
causes a massive slowdown:
# SELECT st_extent(foo.geom) from (select ST_Transform("GEOMETRY", 'EPSG:7844', 4326) AS geom FROM mytable LIMIT 200) foo; Time: 3852.815 ms (00:03.853) # SELECT st_extent(foo.geom) from (select ST_Transform("GEOMETRY", 'EPSG:7844', 'EPSG:4326') AS geom FROM mytable LIMIT 200) foo; Time: 527.896 ms # SELECT st_extent(foo.geom) from (select ST_Transform("GEOMETRY", 4326) AS geom FROM mytable LIMIT 200) foo; Time: 22.951 ms
We're using the string argument form to force postGIS to use proj.db instead of spatial_ref_sys
for CRS definitions.
- I had assumed that postGIS would cache the transform object, but from a quick look at the code it looks like it's probably just instantiating a new transform for every geometry, hence the slowness.
- I don't understand why the (str, int) form and the (str, str) form are so different in terms of timing.
Could postgis maintain a cache of transform objects for (str, str) pairs, to achieve performance similar to the int argument form?
Change History (10)
comment:1 by , 18 months ago
Cc: | added |
---|
comment:2 by , 18 months ago
comment:3 by , 18 months ago
Milestone: | PostGIS 3.3.3 → PostGIS 3.4.0 |
---|
comment:4 by , 17 months ago
Apologies, I had mis-remembered why we did this.
It's actually because
(a) we wanted to be able to use CRSes that Postgis doesn't know about. Just using an integer SRID requires that that CRS be in spatial_ref_sys
(b) We wanted to ensure we have a single source of truth for CRS information (proj.db) - and not have to sync patches into PostGIS if we change proj.db by adding custom CRSes etc.
comment:5 by , 17 months ago
POSTGIS="3.3.3 2355e8e" [EXTENSION] PGSQL="130" GEOS="3.10.2-CAPI-1.16.0" PROJ="8.2.1" LIBXML="2.9.13" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)" (core procs from "3.3.2 4975da8" need upgrade)
comment:6 by , 17 months ago
I also find this a bit surprising (I'm not sure if it's related):
dbname=# select st_asewkt(st_transform('point(0 0)'::geometry, 'EPSG:7844', 'EPSG:4283')); POINT(-0.000009040658614 0.000009514196768) dbname=# select st_asewkt(st_transform('srid=7844;point(0 0)'::geometry, 'EPSG:4283')); POINT(0 0)
(our proj.db has an overridden custom transformation for these CRSes, which might explain the differences here - maybe one of these calls is using the custom pipeline and the other is not.) I'll try and dig into that behaviour a bit more tomorrow.
comment:7 by , 17 months ago
@cdestigter,
PostGIS delegates to proj.db if the srid matches an entry in your proj.db. It does need to exist in spatial_ref_sys (the srid, epsg code), but that's it.
If you are not finding this to be the case, then that is an issue
comment:8 by , 17 months ago
Yes - but it doesn't make sense to have to sync up the SRIDs in that table with the systems in proj.db. Apart from any other concerns, we'd have to find and use some form of integer IDs for them - and not all of them are EPSG CRSes. In addition the Postgres database is not deployed concurrently with the proj.db in our app images and so trying to keep them in sync would require complex migrations, which can be avoided if we skip using spatial_ref_sys at all.
I feel like that concern is tangential - the issue I had in this ticket is that some forms of the ST_Transform() function are very slow - presumably they are not caching their transforms across multiple input rows. Can this be improved?
comment:9 by , 17 months ago
We cache based on srid, which in turn maps to auth/auth_srid. If an auth/auth_srid pair exists in the proj.db, then that is the definition that is used, not the proj4text or srtext. If you need to add non-EPSG systems, yes you'll have to add an entry to spatial_ref_sys with as detailed an SRTEXT as you can find. In a just world, the transform(geom, str, str) wouldn't even exist, as the SFSQL is pretty clear about the centrality of SRID and spatial_ref_sys as the mechanisms for CRS definitions. It is a holdover from the days when proj strings were the only way of representing CRSs. An good proj person could explain in even more detail, but basically dereferencing to an auth/auth_srid doesn't just get you a look-up to an SRTEXT. It gets you a lookup into the full EPSG database, which also referentially gives you all the best transforms between different systems, a rich extra layer of information that doesn't exist inside the SRTEXT.
Anyways, no, we won't be changing the proj cache, so str/str transforms are going to remain slow. You can always add systems you want to formalize into the spatial_ref_sys table.
comment:10 by , 17 months ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Can you output your
` SELECT postgis_full_version();
`
I don't see using ST_Transform(geom, srid) would yield a different answer from ST_Transform(geom, text, srid)
In both cases it should be using the proj.db since the proj.db takes precedence over waht is in spatial_ref_sys table.
Can you output an example where
is different from