Opened 3 years ago
Last modified 3 years ago
#5176 new defect
no data check in raster2pgsql and also ST_BandIsNoData are slow
Reported by: | robe | Owned by: | robe |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS Fund Me |
Component: | raster | Version: | master |
Keywords: | Cc: |
Description
I suspect it's too late to do anything about this in 3.3 and maybe nothing can be done.
In PostGIS 3.2 we added the -k to raster2pgsql to allow skipping no data check. From PostGIS 3.2 on, the no data check is automatically done unless -k is specified.
I am finding that this check is taking quite a long time for large files. By large I mean like a tiff of 300MB or more.
For example
I've been trying to output to disk a raster2pgsql for a 500 MB file which gdalinfo shows this:
Size is 637200, 270000 Coordinate System is: : ID["EPSG",4326]] Data axis to CRS axis mapping: 2,1 Origin = (-125.000000000000000,49.000000000000000) Pixel Size = (0.000092592592593,-0.000092592592593) Metadata: AREA_OR_POINT=Area Image Structure Metadata: COMPRESSION=LZW INTERLEAVE=BAND Corner Coordinates: Upper Left (-125.0000000, 49.0000000) (125d 0' 0.00"W, 49d 0' 0.00"N) Lower Left (-125.0000000, 24.0000000) (125d 0' 0.00"W, 24d 0' 0.00"N) Upper Right ( -66.0000000, 49.0000000) ( 66d 0' 0.00"W, 49d 0' 0.00"N) Lower Right ( -66.0000000, 24.0000000) ( 66d 0' 0.00"W, 24d 0' 0.00"N) Center ( -95.5000000, 36.5000000) ( 95d30' 0.00"W, 36d30' 0.00"N) Band 1 Block=637200x1 Type=Byte, ColorInterp=Gray NoData Value=0
my call looks something like this: My original test started 10 hrs ago hasn't spit out anything yet. I assume it is still stuck in the no data check loop.
raster2pgsql -I -t auto -Y 15000 -e file.tif >> test2.sql
# this one with the -k option started spitting out to disk within 5 minutes, after 2 hrs it's at 8GB
raster2pgsql -k -I -t auto -Y 15000 -e file.tif >> test2.sql
I was thinking that for these kinds of cases, perhaps loading all the junk in and purging the no data rows would be faster, but sadly
http://postgis.net/docs/RT_ST_BandIsNoData.html Doing something like below to force the check is equally slow.
SELECT ST_BandIsNoData(rast, true) FROM sometable
Change History (2)
comment:1 by , 3 years ago
comment:2 by , 3 years ago
For this particular file, it was like 95% no data and 5% data. After retiling, the time for the whole process for this took about 41 minutes
# this took 40 minutes raster2pgsql -I -t 128x128 -Y 15000 -e file_retiled.tif t_128_128 >> test.sql # this took 75.56 s psql -f test.sql
I think going direct to PostgreSQL took about 25 minutes, but have to recheck
raster2pgsql -I -t 128x128 -Y 15000 -e file_retiled.tif t_128_128 | psql
So much of the slowness was just the tiff blocksize of Block=637200x1. It never completed so I killed it after 48 hrs.
Still would be nice to get the no data checking timing down a bit, but it does end up much faster doing the no data check than trying to load like 30GB of useless no data data in this case.
This isn't quite as bad of an issue as I thought. The slowness of no data checking seemed to be attributed to the block size.
After converting with:
which gives default of 256x256 blocks,
then the no data checking finished within about 20 minutes. Still waiting for the load though.