Context Navigation

← Previous Ticket
Next Ticket →

#931 closed task (fixed)

[raster] ST_Mean

Reported by:	Bborie Park	Owned by:	Bborie Park
Priority:	medium	Milestone:	PostGIS 2.0.0
Component:	raster	Version:	master
Keywords:	history	Cc:

Description

A function to get the arithmetic mean of a raster's band.

ST_Mean(rast raster, nband int, ignore_nodata boolean) -> double

returns the mean

nband: index of band to process on

ignore_nodata: if TRUE, any pixel who's value is nodata is ignored.

ST_Mean(rast, 2, TRUE)

ST_Mean(rast raster, nband int) -> double

assumes ignore_nodata = TRUE

ST_Mean(rast, 2)

ST_Mean(rast raster, ignore_nodata boolean) -> double

assumes band index = 1

ST_Mean(rast, FALSE)

ST_Mean(rast raster) -> double

assumes band index = 1 and ignore_nodata = TRUE

ST_Mean(rast)

Four approximation functions are also proposed sacrificing some accuracy for speed, especially on large rasters (10000 x 10000).

ST_ApproxMean(rast raster, nband int, ignore_nodata boolean, sample_percent double precision) -> double

sample_percent: a value between 0 and 1 indicating the percentage of the raster band's pixels to consider when determining the min/max pair.

ST_ApproxMean(rast, 3, FALSE, 0.1)

ST_ApproxMean(rast, 1, TRUE, 0.5)

ST_ApproxMean(rast raster, ignore_nodata boolean, sample_percent double precision) -> double

assumes that nband = 1

ST_ApproxMean(rast, FALSE, 0.01)

ST_ApproxMean(rast, TRUE, 0.025)

ST_ApproxMean(rast raster, sample_percent double precision) -> double

assumes that nband = 1 and ignore_nodata = TRUE

ST_ApproxMean(rast, 0.25)

ST_ApproxMean(rast raster) -> double

assumes that nband = 1, ignore_nodata = TRUE and sample_percent = 0.1

ST_ApproxMean(rast)

Attachments (1)

st_mean.patch (7.9 KB ) - added by Bborie Park 14 years ago.: Incremental patch for ST_Mean. ST_SummaryStats patch is required for this patch.

Download all attachments as: .zip

Change History (10)

follow-up: 2 comment:1 by pracine, 14 years ago

How will this work on a tiled raster coverage?

in reply to: 1 comment:2 by Bborie Park, 14 years ago

Replying to pracine:

How will this work on a tiled raster coverage?

Would there be any issues? If the tiles were brought together using ST_Union or ST_Accum, it would be no different as passing a single tile. I may not understanding something...

follow-up: 4 comment:3 by pracine, 14 years ago

Using ST_Union on a 1 TB coverage would mean creating a one row, 1 TB objects... I don't think we want to do that. ST_Union should be used to generate bigger raster from a mimited number of tiles. Or in a SET function returning itself only tiles.

I guess the preferred general technique to make a global raster coverage function is to create a ST_Mean function accepting a table name and a raster column name and have this function to build a EXECUTE query over all the raster of that table. Something like:

SELECT ST_Mean('myrastercoveragetable', 'myrastercolumn')

The code of this function would look very much like the one I prototyped at the end of raster\scripts\plpgsql\st_histogram.sql. In the case of ST_Mean, example 7.

in reply to: 3 comment:4 by Bborie Park, 14 years ago

Good point on the absolutely obscene coverage. I think your ST_Mean prototype would need to be expanded out to encompass all the other summary stat functions, ST_SummaryStats, ST_MinMax and ST_StdDev.

SELECT ST_Mean('myrastercoveragetable', 'myrastercolumn')

One thing that does concern me about the prototype above is the lack of flexibility in filtering the coverage table on the fly. Maybe something like

CREATE TYPE summarystatsarg AS (
	rast raster,
	nband integer,
	hasnodata boolean,
	sample_percent double precision
);

CREATE OR REPLACE FUNCTION st_mean(coveragestats[])
	RETURNS int
	AS $$
	DECLARE
	BEGIN
		-- some code
	END;
	$$ LANGUAGE 'plpgsql';

SELECT st_mean((
	SELECT
		array_agg(stats)
	FROM (
		SELECT
			ROW(rast, 1, FALSE, 1)::coveragestats AS stats
		FROM tmax_2011
		WHERE observation_date = '2011-05-01'
	) AS t
));

The above would give the user the most amount of flexibility and control over what is evaluated in the function.

I took a look at example 7 of st_histogram.sql and have only one comment/note...

In the implementation I'm thinking of, a weighed mean may work better plus it goes hand-in-hand with a weighed standard deviation.

http://en.wikipedia.org/wiki/Weighted_mean

As I've been looking at the ST_MinMax function and refactoring it for the summary stats, I envision that the summary stats will be the base stats function call. The histogram and quantile functions will depend upon the base stats as the histogram needs the min/max values to set the bounds for the number of bins and the quantile function needs the ordered series of values to determine a percentile's value. The reason I believe this is the right way to go is that the summary stats can be computed in one pass. Histograms and quantiles usually require a second pass so I'd rather keep them as separate functions that are called if needed.

comment:5 by Bborie Park, 14 years ago

Status:	new → assigned

I've decided to go with your suggestion Pierre.

ST_Mean(rastertable text, rastercolumn text)

I realized as I was playing around with various possibilities that your way was the only way that a tile would be loaded one at a time and not overtly tax memory.

So, a set of variations of ST_Mean and ST_ApproxMean for handling large sets of raster tiles:

ST_Mean(rastertable text, rastercolumn text, nband int, hasnodata boolean) -> double precision

ST_Mean('tmax_2010', 'rast', 1, FALSE)

ST_Mean('precip_2011', 'rast', 1, TRUE)

ST_Mean(rastertable text, rastercolumn text, nband int) -> double precision

hasnodata is set to FALSE

ST_Mean('tmax_2010', 'rast', 1)

ST_Mean(rastertable text, rastercolumn text, hasnodata boolean) -> double precision

nband is set to 1

ST_Mean('precip_2011', 'rast', TRUE)

ST_Mean(rastertable text, rastercolumn text) -> double precision

nband is set to 1 and hasnodata is set to FALSE

ST_Mean('tmin_2009', 'rast')

Variations for ST_ApproxMean are:

ST_ApproxMean(rastertable text, rastercolumn text, nband int, hasnodata boolean, sample_percent double precision) -> double precision

ST_ApproxMean('tmax_2010', 'rast', 1, FALSE, 0.5)

ST_ApproxMean('precip_2011', 'rast', 1, TRUE, 0.2)

ST_ApproxMean(rastertable text, rastercolumn text, nband int, sample_percent double precision) -> double precision

hasnodata is set to FALSE

ST_ApproxMean('tmax_2010', 'rast', 1, 0.5)

ST_ApproxMean('precip_2011', 'rast', 1, 0.2)

ST_ApproxMean(rastertable text, rastercolumn text, hasnodata boolean, sample_percent double precision) -> double precision

nband is set to 1

ST_ApproxMean('tmax_2010', 'rast', FALSE, 0.5)

ST_ApproxMean('precip_2011', 'rast', TRUE, 0.2)

ST_ApproxMean(rastertable text, rastercolumn text, sample_percent double precision) -> double precision

nband is set to 1 and hasnodata is set to FALSE

ST_ApproxMean('tmax_2010', 'rast', 0.5)

ST_ApproxMean('precip_2011', 'rast', 0.2)

ST_ApproxMean(rastertable text, rastercolumn text) -> double precision

nband is set to 1, hasnodata is set to FALSE and sample_percent is set to 0.1

ST_ApproxMean('tmax_2010', 'rast')

ST_ApproxMean('precip_2011', 'rast')

Similar variations will be provided for ST_SummaryStats, ST_StdDev and ST_MinMax and corresponding approximation functions.

comment:6 by Bborie Park, 14 years ago

The mean returned in the coverage functions is a weighted mean of the means from the raster tile.

by Bborie Park, 14 years ago

Attachment:	st_mean.patch added

Incremental patch for ST_Mean. ST_SummaryStats patch is required for this patch.

comment:7 by Bborie Park, 14 years ago

Adds ST_Mean function, which builds upon ST_SummaryStats. Merges cleanly against r7145.

The following patches must be merged first for this patch:

ST_Band

ST_SummaryStats

comment:8 by Bborie Park, 14 years ago

Keywords:	history added
Resolution:	→ fixed
Status:	assigned → closed

Added in r7149

comment:9 by Bborie Park, 14 years ago

Milestone:	PostGIS Raster Future → PostGIS 2.0.0

Note: See TracTickets for help on using tickets.

Download in other formats: