Opened 12 years ago

Closed 10 years ago

#2247 closed enhancement (fixed)

[raster] Create ST_CreateOverviews()

Reported by: dzwarg Owned by: strk
Priority: medium Milestone: PostGIS 2.2.0
Component: raster Version: master
Keywords: raster, overviews Cc:

Description

The overview creation is bundled into the loader.

Refactor the overview creation into an SQL function, so overviews can be created consistently, and in one place.

Change History (33)

comment:1 by dzwarg, 12 years ago

After discussion with Pierre and Bborie, this is waiting for ST_CreateEmptyCoverage() and ST_SetValues(raster).

comment:3 by Bborie Park, 12 years ago

Milestone: PostGIS Future

comment:4 by strk, 10 years ago

Type: defectenhancement

ST_CreateOverviews is specified here: http://trac.osgeo.org/postgis/wiki/WKTRaster/SpecificationWorking03

I could not find ST_CreateEmptyCoverage specification, where is it ?

comment:5 by dzwarg, 10 years ago

Hello strk,

I updated the wiki doc to include the discussion from this ticket, and referenced the patch for st_makeemptycoverage.

z

comment:6 by strk, 10 years ago

Thanks, the ticket for ST_MakeEmptyCoverage is #2249

comment:7 by strk, 10 years ago

Keywords: overviews added

comment:8 by strk, 10 years ago

@dzwarg, given your pseudo-code for ST_CreateOverviews in the wiki, the function seems to be not dependent on ST_MakeEmptyCoverage, nor on ST_SetValues. Am I missing anything ?

comment:9 by strk, 10 years ago

I can't stop thinking that I'd like a function like this to help me with building a regular coverage from an irregular one. For example:

  1. A raster is imported tiled in the database
  2. The raster tiles are reprojected within the database

In step 2, the rasters stop having the same scale, so the concept of "factor" (the proposed argument to the function) becomes impossible to enforce.

On the other hand, if we specified directly an output scale and tile size, we could rescale and union all source tiles intersecting each of the output tiles no matter regularity of input.

At that point we'd also need to take an extent, to know when to stop (and maybe some flags to decide if we'd want to pad on the right and on the bottom if tilesize*scale != extsize.

What do you think @pracine, @dustymugs, @robe ? Maybe what I'm talking about should be a different signature completely...

comment:10 by strk, 10 years ago

I've a work in progress here: https://github.com/strk/postgis/tree/svn-trunk-createoverviews

currently fighting with correctly select 2factor tiles from input (I seem to be always getting more). Will be instructive when it comes to think about the API.

comment:11 by strk, 10 years ago

Milestone: PostGIS FuturePostGIS 2.2.0

I've got the selection code fixed. It takes 1 minute to produce a x2 overview of a 3655 tiles coverage (256x256 pixels x 1 band). No clipping is performed so time would get worst once that is introduced (to cleanly handle irregular tiles).

Also, it keeps only working for aligned coverage in input, but maybe I'll keep this function simple and then eventually we'll do a C implemented generalized version turning this one into a wrapper.

comment:12 by strk, 10 years ago

Should the function create the overview constraints ? The spec do not say \cc @pracine

in reply to:  8 ; comment:13 by pracine, 10 years ago

Replying to strk:

@dzwarg, given your pseudo-code for ST_CreateOverviews in the wiki, the function seems to be not dependent on ST_MakeEmptyCoverage, nor on ST_SetValues. Am I missing anything ?

ST_MakeEmptyCoverage() is useful in the irregular case (the last ELSE specifying Resample(all the tiles) in the pseudo code). The only way to create overview from a messy coverage (with irregular and/or overlapping tiles) would be to regenerate each tiles at a lower resolution. However a good overview has to merge neighbouring tiles in order to be able to lower their resolution deeper and deeper (got the point?).

So before resampling, if the base coverage is messy, we first need to resample it to a clean regular coverage. Thus the need for the ST_MakeEmptyCoverage() function. Then you can resample the messy coverage using tiles from the newly empty coverage as reference raster tiles.

It is not an easy job to go from a messy coverage to a clean one and it is a very desirable feature. Then the question is "How to resample to the regular tiles?" You can use ST_Resample() and union tile's fragments to complete tiles but this does not answer the problem of overlapping tiles (you might want the mean of two overlapping pixel value as the new value). A better (but slow) resampler is to use ST_MapAlgebra the same way I use it to rasterize a vector coverage in ST_ExtractToRaster (see Method 2 in http://geospatialelucubrations.blogspot.ca/2014/05/a-guide-to-rasterization-of-vector.html) but using the already implemented pixel extraction methods from the ST_ExtractPixelCentroidValue4ma call back: FIRST_RASTER_VALUE_AT_PIXEL_CENTROID, MIN_OF_RASTER_VALUES_AT_PIXEL_CENTROID, MAX_OF_RASTER_VALUES_AT_PIXEL_CENTROID, MEAN_OF_RASTER_VALUES_AT_PIXEL_CENTROID. There is also COUNT_OF_RASTER_VALUES_AT_PIXEL_CENTROID, SUM_OF_RASTER_VALUES_AT_PIXEL_CENTROID, STDDEVP_OF_RASTER_VALUES_AT_PIXEL_CENTROID, RANGE_OF_RASTER_VALUES_AT_PIXEL_CENTROID but they are likely for other applications. Using the ST_ExtractPixelCentroidValue4ma callback function in a way similar to ST_ExtractToRaster you can resample overlapping coverage :-) See Objective FV.27 from the specs. Most of it is in ST_ExtractToRaster(). I guess we just need a new signature to extract from a raster coverage instead of a vector coverage. See also ST_GlobalRasterUnion() just after ST_ExtractToRaster() which is an alternative to ST_Union using the raster callback methods.

in reply to:  13 comment:14 by pracine, 10 years ago

Replying to pracine:

Replying to strk:

@dzwarg, given your pseudo-code for ST_CreateOverviews in the wiki, the function seems to be not dependent on ST_MakeEmptyCoverage, nor on ST_SetValues. Am I missing anything ?

ST_MakeEmptyCoverage() is useful in the irregular case (the last ELSE specifying Resample(all the tiles) in the pseudo code). The only way to create overview from a messy coverage (with irregular and/or overlapping tiles) would be to regenerate each tiles at a lower resolution. However a good overview has to merge neighbouring tiles in order to be able to lower their resolution deeper and deeper (got the point?).

So before resampling, if the base coverage is messy, we first need to resample it to a clean regular coverage. Thus the need for the ST_MakeEmptyCoverage() function. Then you can resample the messy coverage using tiles from the newly empty coverage as reference raster tiles.

It is not an easy job to go from a messy coverage to a clean one and it is a very desirable feature. Then the question is "How to resample to the regular tiles?" You can use ST_Resample() and union tile's fragments to complete tiles but this does not answer the problem of overlapping tiles (you might want the mean of two overlapping pixel value as the new value). A better (but slow) resampler is to use ST_MapAlgebra the same way I use it to rasterize a vector coverage in ST_ExtractToRaster (see Method 2 in http://geospatialelucubrations.blogspot.ca/2014/05/a-guide-to-rasterization-of-vector.html) but using the already implemented pixel extraction methods from the ST_ExtractPixelCentroidValue4ma call back: FIRST_RASTER_VALUE_AT_PIXEL_CENTROID, MIN_OF_RASTER_VALUES_AT_PIXEL_CENTROID, MAX_OF_RASTER_VALUES_AT_PIXEL_CENTROID, MEAN_OF_RASTER_VALUES_AT_PIXEL_CENTROID. There is also COUNT_OF_RASTER_VALUES_AT_PIXEL_CENTROID, SUM_OF_RASTER_VALUES_AT_PIXEL_CENTROID, STDDEVP_OF_RASTER_VALUES_AT_PIXEL_CENTROID, RANGE_OF_RASTER_VALUES_AT_PIXEL_CENTROID but they are likely for other applications. Using the ST_ExtractPixelCentroidValue4ma callback function in a way similar to ST_ExtractToRaster you can resample overlapping coverage :-) See Objective FV.27 from the specs. Most of it is in ST_ExtractToRaster(). I guess we just need a new signature to extract from a raster coverage instead of a vector coverage. See also ST_GlobalRasterUnion() just after ST_ExtractToRaster() which is an alternative to ST_Union using the raster callback methods.

If you make ST_CreateOverview dependent on something from the Addons then it should also be part of the Addons (until we get ST_ExtractToRaster or something better in the core)...

comment:15 by strk, 10 years ago

Pierre, the current version of the function I wrote produces the so-called "virtual grid" internally, so doesn't need any ST_MakeEmptyCoverage. Take a look here: https://github.com/strk/postgis/tree/svn-trunk-createoverviews

The problem is in the interface itself. If the function takes table name, field name and scale factor, it won't be able to compute the target grid configuration unles the input coverage has a constant scale (for example).

Maybe we should move the discussion on the list...

comment:16 by pracine, 10 years ago

Hum... The original idea was to be able to compute overviews on table even if they don't have constraints. We wanted the function to work also on irregular coverage and to provide ST_MakeEmptyCoverage() as an extra function since it can be useful for other applications.

It is true that scale should be the same for all tiles in the table. Could it be otherwise?

comment:17 by strk, 10 years ago

Consider this scenario:

1) you import a raster tiled 2) you reproject each tile to a new SRS

As a result your tiles might end up having different scales.

Now you want to create a new regular tiled coverage. The output scales you want to be all the same, but the input scales are not. With the currently suggested signature of the ST_CreateOverviews function you cannot do that as you cannot specify the output scale, if not as a factor of the input scale, which in this case is not a fixed value.

It is true that given the current definition of an overview (raster_overviews view structure) you can not even express an "overview" for a table which does not have a fixed scale, so the operation described above might be something different from an "overview" creation.

comment:18 by strk, 10 years ago

So I have the basic version of ST_CreateOverview() working good now. It internally loops over every cell of the target tileset and for each picks overlapping source tiles, resample them, clip them to target cell, snaps to the source grid (to workaround #2920) and finally unions them.

The function returns a set of rasters, so cannot create constraints (raster or overview) because it doesn't deal with creating the table at all.

Now I could split it to have a lower-level companion explicitly taking the configuration of the target coverage (origin, scale) and an extent to work on. Such a function could be used to parallelize population of the target grid from multiple processes/threads.

At that point the higher-level could also be made into doing more, like creating the target table (for which it'd need a name) and all the constraints [ as it would not need to scan the target table again to find out some of the characteristics ].

What do you think about the plan ? Ideas about the names of the lower-level function (or should it be an overload...) ?

Current code, as usual, is in the github fork, with a testcase added: https://github.com/strk/postgis/tree/svn-trunk-createoverviews

comment:19 by strk, 10 years ago

Owner: changed from dzwarg to strk
Status: newassigned

comment:20 by strk, 10 years ago

I ended up making an overload: ST_CreateOverview(table, col, ext, scalex, scaley, tilewidth, tileheight);

But as mentioned above it may be unfair to call the result an "overview" given that overviews are bound to have a scale being a multiple of the source scale. Maybe it should be called ST_Retile ?

comment:21 by strk, 10 years ago

I've renamed the generic one to ST_Retile, and added a resampling algorithm optional parameter to both ST_Retile and ST_CreateOverview.

Testcases for both functions are in place now. This is a good time for an interface review.

For example, do we want any function also taking care of _creating_ the overview table (given the name contains the "create" word...)

in reply to:  20 comment:22 by pracine, 10 years ago

Replying to strk:

I ended up making an overload: ST_CreateOverview(table, col, ext, scalex, scaley, tilewidth, tileheight);

But as mentioned above it may be unfair to call the result an "overview" given that overviews are bound to have a scale being a multiple of the source scale. Maybe it should be called ST_Retile ?

Sounds good.

comment:23 by strk, 10 years ago

So finally I pushed everything as r12941 in trunk. We have:

  • ST_Retile returns SETOF raster
  • ST_CreateOverview returns REGCLASS

The former is more flexible and doesn't write to the db. The latter is more easy-to-use and goes on to create the tables and add the constraints.

comment:24 by strk, 10 years ago

Today I tweaked the function to protect against specifying a factor < 2 and to recommend creating an index on the source table if missing (it's really only needed if the source table has many tiles).

I was wondering if the function should also create a spatial index on the overview table, or do we leave that to the caller to decide upon ? Or we accept flags to determine that ?

Also, should the created overview table inherit permissions of the parent ?

in reply to:  24 comment:25 by pracine, 10 years ago

Replying to strk:

I was wondering if the function should also create a spatial index on the overview table, or do we leave that to the caller to decide upon ? Or we accept flags to determine that ?

If your function is a meta function creating new table and constraints, it should have an option to create indexes.

in reply to:  24 comment:26 by pracine, 10 years ago

Replying to strk:

Also, should the created overview table inherit permissions of the parent ?

Certainly.

comment:27 by pracine, 10 years ago

About documentation:

I would put those two new functions in the 9.3. Raster Constructors section of the doc instead.

I would explain what is the difference between ST_Tile and ST_Retile and between ST_Retile and ST_CreateOverview and ST_Resample.

I would also explain what happen when there is gaps and/or overlaps and/or rotation in the source coverage.

I would also add a section in 5.2.2. Raster Overviews explaining how to create overviews with SQL.

comment:28 by pracine, 10 years ago

I personnaly don't like your extent parameter. How should I define such an extent? Why such an extent would be different from the extent of the source if the main goal of the function is mainly to retile an entire coverage, not to clip it? A better approch is just to let users define a grid by defining any arbitrary corner of the grid (gridx and gridy as in ST_AsRaster) and to let the function compute the right extent. Then you make a signature taking a reference raster from which to borrow those parameters.

comment:29 by pracine, 10 years ago

About the parameters names:

-"ext" should be named "extent" -"sfx" and "sfy" should be named "scalex" and "scaley" like other raster functions -"tw" and "t"h should be named "tilewidth" and "tileheight"

In general I prefer long self explaining parameter's names so sometimes I just have to look at the signature to understand what is what.

What is the advantage of having a "regclass" parameter over a "name" parameter for the table and schema names like in AddRasterConstraints() and DropRasterConstraints() and others? I personnally have to search further in the doc to know how to specify this "regclass" argument. Annoying...

A general revision of parameter names would be welcome. This is essential for easier use of the API. For example, we now have four different ways to refer to schema, table and column names:

name rasttable, name rastcolumn (AddRasterConstraints, DropRasterConstraints, ) name schema_name, name table_name, name column_name (UpdateRasterSRID) regclass tab, name col (ST_Retile) text rastertable, text rastercolumn (ST_Histogram, ST_SummaryStats, ST_ValueCount)

Two different names for algorithm ("algorithm" and "algo").

Four different names for band numbers ("band", "nband", "bandnum" and "band_num")

Sometimes we name long arguments using underscores ("exclude_nodata_value") and sometimes we don't ("padwithnodata").

If we don't want to change that many signatures, we migth at least not introduce new ones...

comment:30 by strk, 10 years ago

About the extent parameter of the ST_Retile function: the use case for specifying one is to use different processes to retile different portions of the table. Especially useful for big coverages. If you want a single process you can either find the extent in the raster_columns view or compute it using st_extent(rast).

Allowing for a null (or omitted) extent could be an added facility, but as ST_CreateOverview is an easier-to-use wrapper to it already I'm not sure we need a more user-oriented signature in ST_Retile.

I agree about reducing parameter names (besides, those name are big pain in general as any name change requires a DROP of the old function...).

About regclass, the advantage is that you get a single value to univoquely identify a table. There's an implicit cast from 'unknown' (inputted text) and 'regclass' so you usually just name your table, with or without schema qualification, and PostgreSQL does the rest for you, making use of the search_path exactly as done when using normal sql. Very useful. I love that type. Learning new things is annoying, I know, but once you look it up, you're going to like it. It's not going anywhere. From a usage point of view you may consider it just a text representing your table, with values like 'mytab' or '"MyMixedCaseSchema"."MyMixedCaseTable"'. Maybe the PostGIS manual could contain a page that talks about this "regclass" type...

comment:31 by strk, 10 years ago

I noticed argument names are written manually in the manual source files. Robe: would writing consistent names there despite the actual signature help or make things worst ?

comment:32 by strk, 10 years ago

r12972 moves ST_Retile under "raster constructors" and adds a note about difference from ST_Tile, plus links ST_Resample. But then r12973 removes the references again as I realized the documented difference wasn't clear at all.

I don't plan to work on improved documentation or changed signatures for this before one or two weeks from now so if anyone feels like expressing better the various differences, please go ahead

comment:33 by strk, 10 years ago

Resolution: fixed
Status: assignedclosed

8 months with no feedback means this is ok for everyone, ST_CreateOverview is available, so closing.

Note: See TracTickets for help on using tickets.