Version 43 (modified by jorgearevalo, 13 years ago)

Specification simplified and updated after a discussion by mail between the members of the raster team

GDAL Driver for PostGIS Raster Working Specifications

Current status of the driver (April 2012)

The driver is:

  • Able to read in-db evenly blocked rasters (all blocks with the same size)
  • Able to generate two kinds of raster objects, based on two modes:
    • ONE_RASTER_PER_ROW ('mode = 1' in connection string, or nothing): The default mode. Each table row is considered an independent raster. If the requested table has more than one row and no -where clause has been specified in the connection string, all the table rows are reported as subdatasets, unless you specify the other working mode.
    • ONE_RASTER_PER_TABLE ('mode = 2' in connection string): Each table is considered as a raster coverage, and each row is a raster tile.
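
As a rough illustration of how the mode could be picked up from a connection string, the sketch below returns ONE_RASTER_PER_ROW when no mode is given. The helper name `parse_working_mode`, the regular expression and the sample connection strings are assumptions made for this example; they are not the driver's actual parser.

```python
import re

ONE_RASTER_PER_ROW = 1    # default mode: each row is an independent raster
ONE_RASTER_PER_TABLE = 2  # the whole table is one raster, each row is a tile


def parse_working_mode(conn_str):
    """Return the working mode named in a connection string (hypothetical helper)."""
    # Accept both 'mode=2' and 'mode = 2', with or without quotes.
    match = re.search(r"\bmode\s*=\s*'?(\d+)'?", conn_str)
    if match is None:
        return ONE_RASTER_PER_ROW  # nothing specified: default mode
    mode = int(match.group(1))
    if mode not in (ONE_RASTER_PER_ROW, ONE_RASTER_PER_TABLE):
        raise ValueError("unsupported mode: %d" % mode)
    return mode
```

For example, `parse_working_mode("PG:dbname='gis' table='tiles' mode=2")` selects ONE_RASTER_PER_TABLE, while the same string without `mode` falls back to ONE_RASTER_PER_ROW.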

The driver is not:

  • Able to read out-db rasters
  • Able to create new rasters
  • Able to manage all the PostGIS Raster arrangements
  • Able to provide a color interpretation for bands

Design principles

Topic: The basis

The main class of a GDAL driver is GDALDataset: A set of associated raster bands. So, 1 GDALDataset must be able to contain:

  • An untiled image stored in a raster table's row
  • A tiled image stored in a raster table (regular or irregular, rectangular or not, with or without missing tiles, with or without overlapping between tiles)
  • A raster object coverage from the rasterization of a vector coverage stored in a raster table (regular or irregular, rectangular or not, with or without missing tiles, with or without overlapping between tiles)

UPDATE: As Pierre suggested, there are only 2 arrangements:

  • Regularly tiled raster
  • Irregularly tiled raster

Keep in mind that a raster can contain only 1 tile. In that case, 1 GDALDataset = 1 PostGIS Raster object (= 1 PostGIS Raster table row). Otherwise, 1 GDALDataset = several PostGIS Raster objects (= several PostGIS Raster rows). For this reason, the GDAL PostGIS Raster driver has 2 working modes: ONE_RASTER_PER_TABLE and ONE_RASTER_PER_ROW.

However, the driver currently only deals with continuous tiled raster layers, where all the raster tiles are the same size, snap to the same grid and do not overlap (the ideal case).
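
To make the "ideal case" concrete, here is a minimal sketch of a check for it, working on tile metadata already fetched from the table. The `Tile` tuple, the function name `is_regularly_tiled` and the choice to anchor the block grid at the first tile are assumptions for illustration only; rotation is ignored and scales are assumed to be non-zero.

```python
from collections import namedtuple

# Hypothetical per-tile metadata (world coordinates of the upper left corner,
# size in pixels, pixel size in world units).
Tile = namedtuple("Tile", "upperleftx upperlefty width height scalex scaley")


def is_regularly_tiled(tiles, tol=1e-9):
    """True if all tiles share one size, sit on one block grid and do not overlap."""
    if not tiles:
        return True
    ref = tiles[0]
    block_w = ref.width * ref.scalex   # block width in world units
    block_h = ref.height * ref.scaley  # block height in world units (often negative)
    seen = set()
    for t in tiles:
        if (t.width, t.height, t.scalex, t.scaley) != (
                ref.width, ref.height, ref.scalex, ref.scaley):
            return False  # different block size or pixel size
        col = (t.upperleftx - ref.upperleftx) / block_w
        row = (t.upperlefty - ref.upperlefty) / block_h
        if abs(col - round(col)) > tol or abs(row - round(row)) > tol:
            return False  # not snapped to the block grid
        cell = (round(col), round(row))
        if cell in seen:
            return False  # two tiles on the same grid cell overlap
        seen.add(cell)
    return True
```

Missing tiles (gaps) are not rejected by this sketch; it only tests the three conditions named above.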

Open question: Are 2 working modes enough to manage all the raster arrangements? [SOLVED]: YES

Pierre: I think yes. We have to distinguish "what we want to produce" from "what we have to deal with". The two modes answer "what we want to produce" and the different table arrangements are "what we have to deal with".

From a GDAL user's point of view, I know there is a bunch of raster rows in the DB and there are only two things I want to do: extract those raster rows one by one, creating one raster per row, or treat them all as a single raster and blend them all together. Furthermore, I want to be able to SELECT those rows using a WHERE statement. If I want a single raster from the db, I have to build my WHERE clause accordingly. There is no need for an extra mode for this. Besides, I don't want to know, or have to know, what the raster table arrangement is. I expect the driver to be able to deal with them all.

Then, the driver has to deal with all the possible arrangements of those selected rows in both modes (overlaps, gaps, missing tiles, etc.). You tried to enumerate the possible arrangements above, but I think there are only two cases: the tiles are regularly tiled or they are not, whatever the number of tiles (1 or more). To me the irregular case is a generalization of the first one.

Jorge: I think we have 3 cases: untiled raster, regularly tiled raster and irregularly tiled raster.

Jorge: ok, updated

If, and only if, you can optimize the regularly tiled case, then you write it as an exception. The problem is to make sure the table is REALLY regularly tiled without relying on the user's knowledge. Just the introduction of the -a option to raster2pgsql.py, which allows tiles to be appended to an existing table, makes the "regularly blocked" flag untrustworthy. If we really want to maintain this flag, we will have to create something like a ST_ValidateRegularBlocking aggregate function.

Jorge: fully agree. The only way to ensure a raster is regularly tiled is a checking function. To be used carefully.

Pierre: Then if we cannot rely on the raster_columns flag and if a ST_ValidateRegularBlocking() would be too slow, we have to treat "regularly tiled" and "irregularly tiled" as one single case, hoping that the "regular" one will be faster because it involves less processing when merging the tiles together.

Jorge: Agree.


Topic: Constructing the GDALDataset object

To construct a GDALDataset object, the driver must:

  • Open the dataset (create db connection)
  • Determine, in a 1st, very fast query to the db, by looking in the raster_overview view, what lower resolution tables are available for the requested raster table
  • Determine, in a 2nd, fast enough query to the db, the extent and the maximum number of bands of the requested raster by aggregating the extents of all the rasters. This takes about 1 second on 360000 tiles even if there is no index.
  • Determine, in a 3rd, very fast query to the db, the pixel size & rotation, the band types and the nodata value for each band of ONLY ONE raster (LIMIT 1). The driver should assume those values will be the same for every other raster in the table. If, when fetching the other tiles, it realizes that one does not match, we must say that we do not support this arrangement. (I'm still a bit perplexed about the nodata value though.)
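
The logic behind the 2nd and 3rd steps can be sketched as follows. In the driver the aggregation would happen inside the database query; here it is done in Python on already-fetched metadata, purely to show the intent. The `RasterMeta` tuple and both function names are hypothetical.

```python
from collections import namedtuple

# Hypothetical per-row metadata, as the 2nd and 3rd queries might return it.
RasterMeta = namedtuple(
    "RasterMeta", "minx miny maxx maxy nbands scalex scaley skewx skewy")


def dataset_extent(rows):
    """Union of all row extents plus the maximum band count (2nd step)."""
    return (min(r.minx for r in rows), min(r.miny for r in rows),
            max(r.maxx for r in rows), max(r.maxy for r in rows),
            max(r.nbands for r in rows))


def check_same_grid(reference, row):
    """Raise if a fetched row disagrees with the LIMIT 1 reference (3rd step)."""
    ref_geo = (reference.scalex, reference.scaley,
               reference.skewx, reference.skewy)
    row_geo = (row.scalex, row.scaley, row.skewx, row.skewy)
    if ref_geo != row_geo:
        raise ValueError(
            "unsupported arrangement: pixel size or rotation differs")
```

Band types and nodata values could be compared the same way; they are left out here to keep the sketch short.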

Open Question: If in the first query we find a lower resolution table, must the rest of the work be performed with this lower resolution table? At least these 3 queries, until we want to read the actual raster data to burn it into the buffer. The queries should be faster on an overview table, but the pixel size will not be the same when using an overview table instead of the normal resolution table. And you don't read from overviews unless you want to implement decimation because your buffer size is different from your raster size. Am I right?
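
One possible reading of the decimation remark, offered only as a sketch and not as an answer to the open question: use an overview only when the buffer is smaller than the requested area, and pick the coarsest overview that still has at least as many pixels as the buffer. The function name and its parameters are assumptions.

```python
def pick_overview_factor(request_size, buffer_size, available_factors):
    """Largest overview factor that still gives at least buffer_size pixels.

    Returns 1 (the full resolution table) when no overview is usable.
    """
    if buffer_size <= 0 or request_size <= 0:
        raise ValueError("sizes must be positive")
    decimation = request_size / buffer_size  # source pixels per buffer pixel
    usable = [f for f in available_factors if 1 < f <= decimation]
    return max(usable) if usable else 1
```

With overviews of factors 2, 4 and 8, a 1000 pixel wide request read into a 250 pixel wide buffer would use factor 4, while a request read at full size would stay on the normal resolution table.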


Topic: Reading/Writing raster data

Once the basic structure is constructed (GDALDataset object and related GDALRasterBand objects), we can read/write the data, following this general method: Fetch, in a long query, all the rasters along with their world georeferences (upperleftx and upperlefty, width and height) and burn them in the GDAL buffer by converting their world coordinates to the raster coordinates of the buffer.
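
The conversion from world coordinates to buffer coordinates could look like this minimal sketch, which assumes no rotation (skew of 0). The function name and parameter names are hypothetical.

```python
import math


def world_to_buffer(x, y, buf_ulx, buf_uly, scalex, scaley):
    """Map a world coordinate to (column, row) in the buffer, assuming no rotation."""
    # scaley is usually negative, so rows grow as y decreases.
    col = math.floor((x - buf_ulx) / scalex)
    row = math.floor((y - buf_uly) / scaley)
    return col, row
```

A tile whose upper left corner is at (25, 80), burned into a buffer whose upper left corner is at (0, 100) with 5 unit pixels, would start at column 5, row 4.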

More specifically:

GDALRasterBand::IRasterIO(required tile metadata) {

  Deduce a world coordinate rectangle whose four corners are the centers of the upper left, lower left, lower right and upper right pixels of the requested area.

  Fetch, in a single SQL query, every raster intersecting this area along with its upper left X & Y, width, height, scaleX, scaleY, skewX and skewY

  If there is only one row fetched and the metadata of this raster fits the required tile metadata (meaning we are querying based on the natural block size)

     just copy the values as a block (memcpy) (is this possible? Should we have/isn't there a ST_Bytea() SQL function?)

  If not, iterate over the required tile

     copy pixel values one by one from each raster fetched (that means that if there are overlaps, only the pixels of the last raster fetched are burned. We can enhance this later by providing a BLENDING_MODE with the driver)

}

This algorithm must be developed in the implementation of the IRasterIO method of the rasterband class. In the best case the required blocks fit what is in the table and everything is optimized. If not, it is slower. We don't have to know in advance whether the table is regularly tiled or not.
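
The algorithm above can be sketched in a few lines, working on plain lists instead of a real GDAL buffer. Everything here (the function name, the dictionary keys, the `nodata` fill) is illustrative; the sketch assumes the tiles have already been fetched, share the pixel size of the request and are not rotated.

```python
def raster_io(request, tiles, nodata=0):
    """Fill a request window from fetched tiles (sketch of the method above).

    request: dict with ulx, uly, width, height, scalex, scaley.
    tiles: list of dicts with ulx, uly and 'pixels' (list of rows),
    in the order they were fetched.
    """
    w, h = request["width"], request["height"]
    if len(tiles) == 1:
        t = tiles[0]
        same_origin = (t["ulx"], t["uly"]) == (request["ulx"], request["uly"])
        same_size = len(t["pixels"]) == h and len(t["pixels"][0]) == w
        if same_origin and same_size:
            # Natural block size: copy the whole tile at once.
            return [row[:] for row in t["pixels"]]
    buf = [[nodata] * w for _ in range(h)]
    for t in tiles:
        # Offset of the tile's upper left pixel inside the buffer.
        col0 = round((t["ulx"] - request["ulx"]) / request["scalex"])
        row0 = round((t["uly"] - request["uly"]) / request["scaley"])
        for r, line in enumerate(t["pixels"]):
            for c, value in enumerate(line):
                br, bc = row0 + r, col0 + c
                if 0 <= br < h and 0 <= bc < w:
                    buf[br][bc] = value  # last tile fetched wins on overlaps
    return buf
```

The same loop handles regular and irregular arrangements; the single-tile branch is the only place where the regular case is treated specially.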

About the IReadBlock method (to be implemented in the rasterband class):

GDALRasterBand::IReadBlock(block x & y) {

  Deduce IRasterIO required tile metadata

  call IRasterIO(required tile metadata)
}
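
The "deduce IRasterIO required tile metadata" step could be as simple as turning block indexes into a pixel window, as in this sketch. The function name is hypothetical, and clipping the edge blocks to the raster size is a choice made for this example.

```python
def block_window(block_x, block_y, block_w, block_h, raster_w, raster_h):
    """Pixel window (xoff, yoff, xsize, ysize) covered by a block.

    Blocks on the right and bottom edges are clipped to the raster size.
    """
    xoff = block_x * block_w
    yoff = block_y * block_h
    if not (0 <= xoff < raster_w and 0 <= yoff < raster_h):
        raise IndexError("block outside the raster")
    return (xoff, yoff,
            min(block_w, raster_w - xoff),
            min(block_h, raster_h - yoff))
```

The resulting window, combined with the dataset's upper left corner and pixel size, gives the world rectangle to pass on to IRasterIO.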