Changes between Initial Version and Version 1 of WKTRasterTutorial01


Ignore:
Timestamp:
06/10/10 13:09:13 (14 years ago)
Author:
pracine
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • WKTRasterTutorial01

    v1 v1  
     1= Intersecting vector buffers with large raster coverage using PostGIS WKT Raster =
     2
     3This exercise will show you how to do a very classical raster/vector analysis with PostGIS WKT Raster. Basically the problem is to compute the mean elevation for buffers surrounding a series of point representing caribou observations. You can do this kind of analysis pretty easily in any GIS package but what is special here and is not easy to do in ANY GIS package is the size of the datasets used (900 MB or raster data), the simplicity of the queries and the speed at which we will get the results.
     4
     5We will describe the steps mainly on Windows, but Linux gurus will have no problem writing the equivalent commands in their favourite OS. We assume that PostgreSQL, PostGIS and PostGIS WKT Raster are well installed. Refer to the readme file of each software to install them properly. The version used are:
     6
     7 * PostgreSQL 8.4
     8 * PostGIS 1.5.1
     9 * WKT Raster: We compiled the last code available.
     10 * GDAL 1.6
     11 * Python 2.6
     12 * GDAL 1.6 for Python 2.6
     13
     14We will use two different datasets:
     15
     16 * The first dataset is a point shapefile of 814 caribou observations done by airplane in the northern part of the province of Quebec by the provincial ministry of natural ressources between 1999 and 2005. We will see later in the tutorial how you can easily produce a similar point cover.
     17
     18 * The second dataset is a series of 13 SRTM elevation rasters covering most of the province of Quebec. These free rasters, available on the web, are 6000x6000 pixels each, have a pixel size of about 60 x 90 meters and weight a total of 900 MB once unzipped. There are many versions of SRTM. We choose the one at ftp://srtm.csi.cgiar.org/. You can find a pretty good description of this dataset in Wikipedia (http://en.wikipedia.org/wiki/SRTM).
     19
     20We will import the shapefile and the raster coverage into PostGIS, create buffers around each point and then intersect those buffers with the elevation coverage in order to get a weighted mean elevation for each buffer. We will have a graphical look at the different results along the way using OpenJUMP.
     21
     22== Loading the caribou points shapefile into PostGIS ==
     23
     24This operation is very straightforward for any casual PostGIS user. There are many ways to import a shapefile into PostGIS. You can use the classical loader provided with PostGIS (shp2pgsql.exe) or the GUI one integrated into pgAdmin III. You can also use GDAL/OGR, OpenJUMP or QGIS. The classical loader and GDAL/ORG are command line program and the PostGIS GUI one, OpenJUMP and QGIS are graphical user interfaces (GUI). PostGIS WKT Raster does not have a GUI loader yet. It comes with a Python command line loader (gdal2wktraster.py, thanks to Mateusz Loskot) very similar to PostGIS shp2pgsql.exe. We will import both datasets using those loaders in order to appreciate the similitudes.
     25
     26Here is the procedure to import the caribou point shapefile:
     27
     28Launch a command prompt (or a shell under Linux) and make sure you can execute "shp2pgsql". If this works, you should get the help for the shp2pgsql command. If not, make sure the folder where it is located ("PostgreSQL_Installation_Path/bin) is in you PATH environement variable or simply type the complete path to shp2pgsql like this:
     29
     30        >"c:/Program Files/PostgreSQL/8.X/bin/shp2pgsql"
     31       
     32shp2pgsql works in two steps: The first step is to create a .sql file that you then have to execute using psql, an application which comes with PostgreSQL. To convert the caribou point shapefile in cariboupoint.sql, we do the following:
     33
     34        >shp2pgsql.exe -s 32198 -I C:\Temp\Pierre\Data\CaribouPoints\cariboupoints.shp > C:\Temp\
     35Pierre\Data\CaribouPoints\cariboupoints.sql
     36
     37The -s option tells the loader that the shapefile points are located in the Quebec Lambert spatial reference system which has SRID number 32198 in the PostGIS spatial_ref_sys table.
     38
     39The -I tell the loader to add a line to create a spatial index on the table.
     40
     41The resulting sql commands are copied to the .sql file using the ">" pipe command.
     42
     43You can then execute the sql file with psql like this:
     44
     45        >psql -f C:\Temp\Pierre\Data\CaribouPoints\cariboupoints.sql demodb     
     46       
     47== Visualizing the caribou point in OpenJUMP ==
     48
     49The favorite tool of the PostGIS community to display geometries loaded into PostGIS is OpenJUMP. This java open source software is very easy to install and work very well with PostGIS. Once OpenJUMP have been downloaded and installed, you can view the caribou points by following these steps:
     50
     511) Start OpenJUMP and select "Layer->Run Datastore Query" from the menu.
     52
     532) Create a connection to PostGIS by clicking the icon on the right of the connection field and  adding a new connection with the name of your choice, the "localhost" server, at port "5432", on your database instance (here "demodb") with your PostgreSQL username and password. Click OK to establish the connection.
     54
     553) In the query field, type the following command:
     56
     57        SELECT id, ST_AsBinary(the_geom)
     58        FROM cariboupoints
     59       
     60You should see your caribou observation point appear in the OpenJUMP map window.
     61       
     62== Loading the SRTM elevation rasters into PostGIS WKT Raster ==
     63
     64The process to load the elevation raster data is very similar. The name of the loader is gdal2wktraster.py. It is a Python program dependent on the Python package and the GDAL Python package.
     65
     66To display gdal2wktraster.py help, simply do:
     67
     68        >gdal2wktraster.py -h
     69
     70We downloaded 8 zipped files. If you have 7-Zip installed, you can simply select all the zip files, right-click on them and select "7-Zip->Extract files..." 7-Zip will extract each TIF file in the same folder. You can then create another folder and move them there. Each file is 6000 pixels wide by 6000 pixel high. You can quickly preview the images if you have IrfanView installed. Efficient raster/vector analysis with PostGIS WKT Raster requires the raster files to be split into tiles in the database. Fortunately gdal2wktraster.py has an option to tile the images the size we like. We will tile them into 100x100 pixels resulting in 60 * 60 * 13 = 46800 tiles. gdal2wktraster.py also accepts wildcard so we will be able to load our 8 big raster in one sole command.
     71
     72To load the raster files in PostGIS WKT Raster, do something like:
     73
     74        >gdal2wktraster.py -r C:\Temp\Pierre\Data\SRTM\tif\*.tif -t srtm_tiled -s 4326 -k 100x100 -I > C:\Temp\Pierre\Data\SRTM\srtm.sql
     75       
     76The -r option indicate the series of raster we want to load in the database.
     77
     78The -t option specify the table in which we want to load them.
     79
     80Similar to shp2pgsql.exe, the -s option is required to specify the spatial reference system ID. In this case the raster are in WGS 84 having the SRID number 4326. Unlike some GIS software, PostGIS does not support on the fly reprojection so that we cannot do operations on table stored in different spatial reference systems. As we could see, the caribou point layer was in the projected spatial reference system NAD 83/Quebec Lambert", when the SRTM images are in the geographic spatial reference system "WGS 84". We will have to deal with this later.
     81
     82The last -k option specify the size of the tiles we want to load in PostGIS.
     83
     84The -I option tells the loader to create a spatial index on the raster tile table. If you forget to add this option you can always add the index by executing a SQL command similar to this one:
     85
     86        CREATE INDEX srtm_tiled_gist_idx ON srtm_tiled USING GIST (ST_ConvexHull(rast));
     87
     88The result of the gdal2wktraster.py command is a 1.8 GB .sql file produced in 1 minute on my brand new Lenovo X201 labtop (Intel Core i5, 1.17 GHz, 3 GB or RAM :-).
     89
     903) The same way we loaded the caribou point sql command file, we will load this sql file using psql:
     91
     92        >psql -f C:\Temp\Pierre\Data\SRTM\srtm.sql demodb
     93
     94This process took less than 4 minutes. You can quickly verify the success of the loading operation by looking at the number of row present in the srtm_tiled table. There should be 46800 rows.
     95
     96You can then visualise the extent of each of those 46800 rasters by typing the following command in the OpenJUMP "Run Datastore Query" dialog:
     97
     98        SELECT rid, ST_AsBinary(rast::geometry) FROM srtm_tiled
     99       
     100You can also view the complete area covered by the raster coverage as one geometry by doing this:
     101
     102        SELECT ST_AsBinary(ST_Buffer(ST_Union(rast::geometry), 0.000001)) FROM srtm_tiled
     103       
     104As you can see, unlike most raster coverage loaded into a GIS, the area covered by this single table raster coverage is not stricly rectangular and the two big missing square areas are not filled with nodata values. Here you have only one table for the whole coverage which is build from many raster files loaded in a single step. You don't need 13 tables to store 13 rasters. We did this with one band SRTM files in TIF format, but it could have been 3 bands BMP files of JPEG and the total size of the coverage does not really matter (PostgreSQL has a limit of 32 Terabytes)...
     105
     106If you want to go to the pixel level and verify the integrity of the values associated to a sample of the raster, (thanks to Jorge Arevalo and GDAL) you can view a vectorization of one of the tile (vectorizing all the table would be way too long) in OpenJUMP. Just type the following SQL query in the same query dialog:
     107
     108        SELECT ST_AsBinary((ST_DumpAsPolygons(rast)).geom), (ST_DumpAsPolygons(rast)).val
     109        FROM srtm_tiled
     110        WHERE rid=34
     111       
     112You will notice that a small area in the upper left corner of the tile is missing. This is the nodata value area. Any analysis function in PostGIS WKT Raster takes the nodata value into account. That means those area will be ommited from the operation. For example in the upcoming intersection operation, if one buffer falls in a nodata value area, st_intersection() and st_intersects() will ignore any part of the raster with nodata values possibly resulting in no intersection at all.
     113
     114You can verify that your raster has a nodatavalues set and the effective value of the nodata value by doing this query:
     115
     116        SELECT ST_BandHasNodataValue(rast), ST_BandNodataValue(rast)
     117        FROM srtm_tiled
     118        LIMIT 1
     119
     120I you want a query to ignore the nodata value set and treat it like any other value, you can always replace any rast argument with ST_SetBandHasNodataValue(rast, FALSE) like in this query:
     121
     122        SELECT ST_AsBinary((ST_DumpAsPolygons(ST_SetBandHasNodataValue(rast, FALSE) )).geom), (ST_DumpAsPolygons(ST_SetBandHasNodataValue(rast, FALSE))).val
     123        FROM srtm_tiled
     124        WHERE rid=XXX
     125
     126== Producing a random point table similar to the caribou shapefile ==
     127
     128Now that we know the extent of our elevation data, we can produce a table similar to the caribou_point one. For this we will use a function that generate a certain number of point geometry inside a polygon. Copy this function in the pgAdmin III query window and execute it:
     129
     130CREATE OR REPLACE FUNCTION ST_RandomPoints(geom geometry, nb int)
     131    RETURNS SETOF geometry AS
     132    $$
     133    DECLARE
     134                pt geometry;
     135                xmin float8;
     136                xmax float8;
     137                ymin float8;
     138                ymax float8;
     139                xrange float8;
     140                yrange float8;
     141                srid int;
     142                count integer := 0;
     143                bcontains boolean := FALSE;
     144                gtype text;
     145    BEGIN
     146        SELECT ST_GeometryType(geom)
     147        INTO gtype;
     148        IF ( gtype != 'ST_Polygon' ) AND ( gtype != 'ST_MultiPolygon' ) THEN
     149            RAISE EXCEPTION 'Attempting to get random point in a non polygon geometry';
     150        END IF;
     151                SELECT ST_XMin(geom), ST_XMax(geom), ST_YMin(geom), ST_YMax(geom), ST_SRID(geom)
     152                INTO xmin, xmax, ymin, ymax, srid;
     153                SELECT xmax - xmin, ymax - ymin
     154                INTO xrange, yrange;
     155
     156                WHILE count < nb LOOP
     157                        SELECT ST_SetSRID(ST_MakePoint(xmin + xrange * random(), ymin + yrange * random()), srid)
     158                        INTO pt;
     159                        SELECT ST_Contains(geom, pt)
     160                        INTO bcontains;
     161                        IF bcontains THEN
     162                                count := count + 1;
     163                                RETURN NEXT pt;
     164                        END IF;
     165                END LOOP;
     166                RETURN;
     167        END;
     168    $$
     169    LANGUAGE 'plpgsql' IMMUTABLE STRICT;
     170
     171You can then generate a fake cariboupoints table with 814 points doing this query:
     172
     173        CREATE TABLE cariboupoints AS
     174        SELECT generate_series(1,1000) id,
     175               ST_RandomPoint(the_geom, 1000) the_geom
     176        FROM (SELECT ST_Transform(ST_Extent(rast::geometry), 32198) FROM srtm_tiled) foo
     177
     178Similarly you can visualise your own caribou point layer with this OpenJUMP query:
     179
     180        SELECT id, ST_AsBinary(the_geom)
     181        FROM cariboupoints
     182
     183Now that we have our two coverages loaded in the database and that we are confident about their values and spatial reference, let's start doing some analysis queries...
     184
     185== Making buffers around the caribou points ==
     186
     187This operation is also very straightforward for any casual PostGIS user. The caribou point table is in a spatial reference system suitable for measurements in meters so we can build our buffers in this system and then convert them to the raster one later in order to intersects them. We will make 1 km buffers around our points and save them as a new table. We do this operation in a pgAdmin III query window (not in OpenJUMP):
     188
     189        CREATE TABLE cariboupoint_buffers AS
     190        SELECT id, ST_Buffer(the_geom, 1000) the_geom
     191        FROM cariboupoints
     192       
     193Again you can display your buffers in OpenJUMP with this query:
     194
     195        SELECT id, ST_AsBinary(the_geom)
     196        FROM cariboupoint_buffers
     197       
     198We then reproject our buffers to WGS 84 so we can intersect them with our raster coverage:
     199
     200        CREATE TABLE cariboupoint_buffers_wgs AS
     201        SELECT id, ST_Transform(the_geom, 4326) the_geom
     202        FROM cariboupoint_buffers
     203       
     204Again the same kind of query to display them in OpenJUMP and confirm that they overlap with the raster coverage and see that now they are a bit squashed:
     205
     206        SELECT id, ST_AsBinary(the_geom)
     207        FROM cariboupoint_buffers_wgs
     208       
     209We could have done those two queries in one:
     210
     211        CREATE TABLE cariboupoint_buffers_wgs AS
     212        SELECT id, ST_Transform(ST_Buffer(the_geom, 1000), 4326) the_geom
     213        FROM cariboupoints
     214       
     215We then create a spatial index on this last table to make sure the next intersection operation will be done as quickly as possible:
     216
     217        CREATE INDEX cariboupoint_buffers_wgs_geom_idx ON cariboupoint_buffers_wgs USING GIST (the_geom);
     218       
     219== Intersecting the caribou buffers with the elevation rasters ==
     220
     221Our two coverages are now ready to be intersected. We will use the brand new st_intersection() function and st_intersects() operator which operates directly on raster and geometries by vectorizing only the necessary part of the raster before doing the intersection and is one of the main feature of PostGIS WKT Raster: 
     222
     223        CREATE TABLE caribou_srtm_inter AS
     224        SELECT id,
     225              (st_intersection(rast, the_geom)).geom the_geom,
     226              (st_intersection(rast, the_geom)).val
     227        FROM cariboupoint_buffers_wgs,
     228             srtm_tiled
     229        WHERE st_intersects(rast, the_geom)
     230       
     231This query takes about 30 minutes to complete. It can greatly be optimized by invoquing the st_intersection only once with the help of a subquery producing record values that we split later in the main query like this:
     232
     233        CREATE TABLE caribou_srtm_inter AS
     234        SELECT id,
     235               (gv).geom the_geom,
     236               (gv).val
     237        FROM (SELECT id,
     238                     st_intersection(rast, the_geom) gv
     239              FROM srtm_tiled,
     240                   cariboupoint_buffers_wgs
     241              WHERE st_intersects(rast, the_geom)
     242             ) foo
     243       
     244This version takes only 17 minutes. Almost half the time spent by the preceding one... This little increase in complexity certainly worths the gain in performance.
     245
     246Once again you can visualise the result of the query by doing this query in OpenJUMP (you might have to increase RAM allowed to OpenJUMP by changing "-Xmx256M" to "-Xmx1024M" in "OpenJUMPInstallPath/bin/openjump.bat"):
     247
     248        SELECT id, val, ST_AsBinary(the_geom)
     249        FROM caribou_srtm_inter
     250
     251One approach to make this operation without PostGIS WKT Raster would have been to first convert our 13 rasters to 13 shapefiles, import them in PostGIS and do a traditionnal intersection query between geometries only. The problem is that converting only one of those SRTM file is very difficult... The vector version of only one of those files exceed the limit of 2 GB imposed on shapefiles and the GML equivalent file is more than 10 GB... The WKT Raster approach goes like this: "since it is mostly impossible to convert those raster to vector, load them all into PostGIS as is and only the smallest part of them raster will be vectorized to do the requested operation", this using the very efficient GiST index, the versatility of GDAL and of power of the postGIS vector engine. XXXXXWe will later examine alternative solutions to this particular vectorization problem as well as how we could do the complete raster/vector operation using other software.
     252
     253
     254== Summarizing the elevation values for each buffer ==
     255
     256The last step of our analysis is to summarize the elevation for each buffer by computing 814 weighted mean elevations. "Weighted mean elevation" means that if the area occupied by pixels with an elevation equal to 234 m is greater than the area occupy pixels with an elevation equal to 56 m, then 234 has a higher weight in the equation (proportional to it relative area).This query should do the work:
     257
     258        CREATE TABLE result01 AS
     259        SELECT id,
     260               sum(st_area(ST_Transform(the_geom, 32198)) * val) / sum(st_area(ST_Transform(the_geom, 32198))) as meanelev
     261        FROM caribou_srtm_inter1
     262        GROUP BY id
     263        ORDER BY id