wiki:DevWikiSpatialCollectionTutorial

Introduction

This page is intended to quickly demonstrate the major features of the spatial collection concept...a concept designed to erase the distinction between raster and vector data types. Using the spatial collection concept, developers can write code which performs useful spatial functions without regard for the nature of the data on which they are operating.

The fundamental premise of the spatial collection design is that all spatial data, no matter the form it takes, can provide answers to these two questions:

  1. Is point P included?
  2. What is the value at point P?

In its current form, the implementation of spatial collection always provides an answer to question 2, although strictly speaking, it is optional. A purely geometric object may neglect to provide any value information. Currently, the wrapper for purely geometric objects will return one of two user specified values: an "inside" value if point P is included, and an "outside" value if it is not. The wrapper for a raster object will actually lookup the value given the provided point.

The value returned by a spatial collection is restricted to numeric types. An array of numbers (e.g., a vector) is returned every time a point is queried. A given spatial collection will return the same length vector no matter what point is queried, and the ordinal position in the vector is significant (e.g., the item at index zero always represents the same thing.) When wrapping a raster type, the vector could represent values from one or more selected bands at point P. While an individual spatial collection will always return a vector of the same length, a different spatial collection may have a vector of a different length, where the elements represent different quantities. The salient point is that all locations within a single spatial collection are consistently reported, but a different collection may report them differently.

The general strategy for using the spatial collection framework is to wrap the actual data item with a spatial collection early, then write the majority of the working code against the spatial collection interface.

Wrapping rasters

Here we demonstrate a typical workflow where a function accepts two rasters. The first order of business is to retrieve the rasters and some "helper" data from postgresql and wrap them. The first step is retrieval.

        /* r1 is null, return null */
        if (PG_ARGISNULL(0)) PG_RETURN_NULL();
        r1_pg = (rt_pgraster *) PG_DETOAST_DATUM_SLICE(PG_GETARG_DATUM(0), 0, sizeof(struct rt_raster_serialized_t));

        /* r2 is null, return null */
        if (PG_ARGISNULL(1)) PG_RETURN_NULL();
        r2_pg = (rt_pgraster *) PG_DETOAST_DATUM_SLICE(PG_GETARG_DATUM(1), 0, sizeof(struct rt_raster_serialized_t));

....

In the interim, we load a list of desired bands from each raster as well as a (possibly null) index to the band we should use as a nodata value. We then wrap the rasters as follows:

        /* wrap r1 in a spatial collection */
        if (r1_hasnodata) {
                r1_sc = sc_create_pgraster_wrapper_nodata(r1_pg,
                                r1_bands, r1_bandnum, r1_nodata) ;
        } else {
                r1_sc = sc_create_pgraster_wrapper(r1_pg, r1_bands, r1_bandnum) ;
        }

        /* wrap r2 in a spatial collection */
        if (r2_hasnodata) {
                r2_sc = sc_create_pgraster_wrapper_nodata(r2_pg,
                                r2_bands, r2_bandnum, r2_nodata) ;
        } else {
                r2_sc = sc_create_pgraster_wrapper(r2_pg, r2_bands, r2_bandnum) ;
        }

Note that this version of the raster wrapper accepts a pg_raster, or the serialized version. The wrapper actually takes care of deserializing it and making the spatial collection object ready to answer the questions listed above. There is also a version of the raster wrapper which takes a deserialized rt_raster if you have one.

Note also that there are two versions of the raster wrapper: one which supports a "nodata" lookup and the other which assumes that all pixels contain data. The latter version is a little simpler to set up, and may offer a performance advantage. Both versions offer the opportunity to specify a list of bands which should be returned when a point is queried. If the list of bands (r1_bands or r2_bands, above) is not provided, then all of the raster's bands are returned.

Spatial Relationships as Spatial Collections

At this point, we have two spatial collections, not two rasters. As long as we only write code which uses the spatial collection interface, we do not have to change the code to handle different data types. Observe how the set of spatial relationships (intersection, union, difference and symdifference) is implemented against the spatial collection interface:

        /* create the evaluator */
        eval = sc_create_first_value_evaluator(r1_sc, r2_sc) ;
        if (eval != NULL) {
                /* create the "relation" operator */
                relation_op = sc_create_sync_relation_op(SPATIAL_PLUS_VALUE,
                                r1_sc, r2_sc, relation, eval) ;

Our intent here is to create a single spatial collection (relation_op), which answers the two questions above based on a spatial operation applied to two input collections. The answer to "Is point P included?" is straightforward. However, we are left with a choice as to how to generate the value associated with point P. In this case, we instantiate a "first value" evaluator, which will return the value from r1_sc if possible, and otherwise will return the value of r2_sc.

In the above, sc_create_first_value_evaluator instantiates an evaluator (one of the two interfaces related to a spatial collection) to provide answers to the value question. Although this is hardcoded in this instance, it does not have to be. This evaluator is then used when sc_create_sync_relation_op instantiates relation_op, which affects how relation_op generates and returns values when points are queried.

The salient point here is that value logic is separated from spatial logic. Different methods of determining values may be implemented and passed to the relation operator. This allows the relation operator code to be reused in other contexts (perhaps as a different function exposed to the user, perhaps as a utility object performing some supporting computation.) Likewise, the first-value evaluator may be reused as a component of any spatial collection where that behavior is appropriate.

In addition, relation_op is defined entirely in terms of the spatial collection interface. Therefore, the exact same code may be reused for other combinations of spatial data types: vector/raster; raster/vector; or vector/vector. The only difference is the very first step when the source data is initially wrapped into the spatial collection interface. Instead of four variants on roughly the same theme, there is one implementation to write and maintain.

Each of these spatial collections carries with it a GBOX named extent. When relation_op was instantiated, an approximate (or expected) extent was calculated based on the specified relation (intersection, union...) and the extents of the two inputs. It is important to point out that at this point, we have set up an operator, but we have not performed any actual computations. The extent of relation_op may be too big. It is best to take this with a grain of salt.

Creating an empty result raster

When creating a "raster result", one is typically talking about evaluating an expression at each grid cell in the raster which is to be returned. In the previous section, the creation of the relation_op operator set up the desired expression, but we have not yet performed any computations. We now need to define a raster which will contain the result, and then ask relation_op for a value over each of the grid cells.

                        /* make an empty raster to store the result */
                        result = rt_raster_new(res_width, res_height) ;

                        /* copy srid and geo transform (except for offsets) */
                        rt_raster_set_srid(result, r1_pg->srid) ;
                        rt_raster_set_scales(result, r1_pg->scaleX, r1_pg->scaleY) ;
                        rt_raster_set_skews(result, r1_pg->skewX, r1_pg->skewY) ;

                        /* compute new offsets because raster may be rotated
                         * w.r.t. the extent.
                         */
                        fit_raster_to_extent(res_extent, result) ;
                        
                        /* finally, wrap the relation operator to allow access by 
                         * indices of the result raster.
                         */
                        relation_op_aligned = 
                                        sc_create_raster_aligned_collection(relation_op, result) ;

                        if (relation_op_aligned != NULL) { 
                                /* sample the relation operator into the raster */
                                sc_sampling_engine(relation_op_aligned, result, NULL) ;
                        }
                        

Here we create a new raster with a width and height computed to maintain the same x and y resolution as r1_pg, fitting within the expected extent of the result (relation_op->extent). We then copy all of the geolocation information except for the offsets into the result from pg_r1. This means that the pixels will have the same x and y scale and the same rotation as pg_r1. There is no particular reason that this must be the case: pg_r1 is merely a convenient source of resolution and rotation information which is likely to be of interest to the user. Any geotransform parameters will do, as long as result is big enough to cover the expected result area and is in the same projection/srid. The function could even be defined to accept an empty raster from the user, if they want control over the output.

The last step, fit_raster_to_extent, calculates and sets appropriate values for x and y offsets which position result in the extent of relation_op. With the location and alignment of result finalized, we then make it possible to query relation_op by the indices (pixel locations) of result. The constructor sc_create_raster_aligned_collection will add the evaluateIndex and includesIndex methods to any spatial collection. Again, this could be any spatial collection, including one which is derived from a purely geometric data type. The affect of this is to align the spatial collection to the grid defined by the provided raster.

Finally, we get to the meat of any and all operations which return a raster filled with values: generating a value for every pixel with sc_sampling_engine. This is the topic of the next section. Note that in this section, we defined the geometry, but the raster itself is empty (meaning that it has no bands defined.)

Sampling a spatial collection

Now we have a way to generate values and we have a grid of points. These two come together in sc_sampling_engine. This function will create the appropriate number of new bands if necessary, but it can use existing bands if they are present. The only constraint is that the number of bands in the result raster must be the same as the number of elements returned by the spatial collection when it is asked for a value.

Again, this code is written entirely against the spatial collection interface. This same sampling engine could be re-used to "rasterize" any spatial collection, including a wrapper around a single geometry which only returns two values: inside or outside.

In the preliminaries, we have checked to ensure that the proper number of bands exist to receive data (or we have created them). We have also initialized a nodata vector (not a single nodata value). This is necessary because if the collection does not include a particular point, each band of the raster must still have a value set. The nodata vector allows for the possibility that different bands might have different nodata values. After these preliminaries, we have :

        /* now sample the raster */
        sample_pt = lwpoint_make2d(SRID_UNKNOWN, 0,0) ;
        for (j=0; j<height; j++) {
                for (i=0; i<width; i++) {
                        int band_num ;
                        VALUE *store_val ;

                        /* set up the sample point */
                        sample_pt_p4d.x = i ;
                        sample_pt_p4d.y = j ;
                        ptarray_set_point4d(sample_pt->point,0,&sample_pt_p4d) ;

                        /* evaluate the collection */
                        collection_val = sc_evaluateIndex(source, sample_pt) ;

                        /* store the returned value or the nodata vector */
                        store_val = collection_val ;
                        if (collection_val == NULL) {
                                store_val = nodata_val ;
                        }

                        /* copy the values to the raster */
                        for (band_num=0; band_num < coll_bands; band_num++) {
                                rt_band band ;

                                band = rt_raster_get_band(result, band_num) ;
                                rt_band_set_pixel(band, i, j, store_val->data[band_num]) ;
                        }
                }
        }

This function, sc_sampling_engine may be reused whenever it is necessary to populate a raster with values from a spatial collection. It may be used with any spatial collection. In this case, the spatial collection is a "two input" spatial collection which performs an operation on the inputs. It could also be used on a spatial collection which represents only a single input. It is also possible to use sc_sampling_engine to populate a raster with the result of an equation (e.g., something which does not ultimately based on either a vector or raster data type.

The salient point here is that sc_sampling_engine does not care how the values for each pixel are generated, nor does it care what those values are. It merely asks "What is the value at point P?" for every grid cell. It doesn't even care what algorithm was used to determine the size, shape, resolution or rotation of the raster.

Projecting

Not included in this example is the following implementation of the spatial collection interface:

SPATIAL_COLLECTION *
sc_create_projection_wrapper(SPATIAL_COLLECTION *wrapped,
                                     int32_t desired_srid,
                                     projPJ wrapped_proj, projPJ desired_proj );
void sc_destroy_projection_wrapper(SPATIAL_COLLECTION *dead);

Note again that we write and maintain only one version of the projection code. Because it is the query point which is projected (and not the base data from which the collection is derived), we need not be concerned with the nature of the base data. Wrap your spatial object with a collection, project the collection, then move on.

The constructor shown requires that you pass in a projPJ object because it is included in the liblwgeom library which does not have access to the spatial_ref_sys table in the database. It would be possible (and desirable) to put another constructor in the libpgcommon library which could generate a projPJ object based on the srid.

Potential directions

This section lists some ideas about potentially useful spatial collection implementations:

  • Implement a spatial collection which evaluates an expression, plugging in values from a single wrapped collection. (one input mapalgebra)
  • Implement a spatial collection which evaluates an expression, plugging in values from two wrapped collections. (two input mapalgebra)
  • Currently, single geometries and single rasters are the only spatial objects which can be wrapped. Implement "overviews" which can select a set of geometries or rasters from a table based on some criteria from the other columns in the table.
  • The current implementation is very much geared towards evaluating data at a point, which is exactly what is needed to populate a raster with values. To generate "geometry" results, it will be necessary to add an iterator over all the geometric objects in the collection to the interface. Both vector and raster data can also answer the question "What spatial objects do you contain?"

Summary

We have gone a long way to erasing the distinction between raster and vector data by articulating two questions which both data types can answer. From these two questions, a set of interfaces have been designed. Implementing code against these interfaces instead of against the base data types allows the same algorithms to be applied to different data. This in turn promotes reuse and reduces the number of combinations which must be explicitly coded and maintained.

Last modified 13 years ago Last modified on 09/16/11 21:07:38
Note: See TracWiki for help on using the wiki.