Opened 11 years ago

Last modified 7 years ago

#2617 new enhancement

[raster] Enhanced mask object for raster map algebra

Reported by: Bborie Park
Owned by: Bborie Park
Priority: medium
Milestone: PostGIS Fund Me
Component: raster
Version: master
Keywords:
Cc:

Description

With the introduction of the mask parameter for ST_MapAlgebra, we can now enhance the mask for additional flexibility.

For a mask, be able to specify the POI pixel in the mask. This will permit truly user-defined masks, including masks with even-valued dimensions, where the POI does not need to be the center value of the mask.

So in addition to:

0, 0, 0
0, 1, 0
0, 0, 0

We want to be able to do:

1, 0, 0
0, 0, 0
0, 0, 0

The "1" above just indicates the POI. Indicating the POI could be done with another (!!!) function parameter consisting of a 2-element array for the POI.

mask =[
1, 1, 1, 1, 1
1, 1, 1, 1, 1
1, 1, 1, 1, 1
]

poi = [3, 2] // X, Y
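
As a rough illustration of what an explicit POI would mean, the sketch below cuts a mask-sized window out of a raster so that the POI cell of the mask lands on a chosen raster cell. The function name `window_for`, the 1-based (X, Y) reading of `poi`, and the use of None for cells outside the raster are assumptions made for this example, not part of ST_MapAlgebra.

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[float]]]


def window_for(raster: Grid, mask_width: int, mask_height: int,
               poi: Tuple[int, int], col: int, row: int) -> Grid:
    """Return the mask-sized window of `raster` whose POI cell lies on (col, row).

    Assumptions for this sketch: `poi` is 1-based (X, Y) inside the mask,
    (col, row) is 0-based in the raster, and cells that fall outside the
    raster are reported as None.
    """
    poi_x, poi_y = poi
    window: Grid = []
    for my in range(mask_height):
        r = row + my - (poi_y - 1)      # raster row under mask row `my`
        out_row: List[Optional[float]] = []
        for mx in range(mask_width):
            c = col + mx - (poi_x - 1)  # raster column under mask column `mx`
            inside = 0 <= r < len(raster) and 0 <= c < len(raster[0])
            out_row.append(raster[r][c] if inside else None)
        window.append(out_row)
    return window


# 4x4 raster holding the values 0..15 in row-major order
raster = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
# Even-dimensioned 2x2 mask: POI in the top-left vs. the bottom-right corner
top_left = window_for(raster, 2, 2, poi=(1, 1), col=1, row=1)
bottom_right = window_for(raster, 2, 2, poi=(2, 2), col=1, row=1)
```

With the same 2x2 mask and the same raster cell, moving the POI from one corner to the other selects a different set of neighbors, which is the flexibility the ticket asks for.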

Change History (9)

comment:1 by nclay, 11 years ago

My thought so far is to introduce a new array in the mask object, no-op, which would be used to pad out "Invalid" arrays to "Valid" ones, so that a mask:

mask = [
1,1,1,1
1,1,1,1
1,1,1,1
]

poi = [2,2]

would translate to:

mask = [
-,-,-,-,-,-,-
-,-,1,1,1,1,-
-,-,1,1,1,1,-
-,-,1,1,1,1,-
-,-,-,-,-,-,-
]

The cells with just a dash in them would be entered into the no-op array as true, allowing us to check only the no-op array to see whether a read is necessary and to skip writing to the returning array. Does anyone have objections to this approach, or another idea?
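
The padding idea can be sketched as follows. This is only an illustration under assumptions: `pad_to_center` is a hypothetical helper, `poi` is read as 1-based (X, Y), and no-op cells are represented by None. It adds the minimum padding needed to make the POI the center cell, so it does not reproduce the wider border shown in the example above.

```python
def pad_to_center(mask, poi, noop=None):
    """Pad `mask` with `noop` cells until `poi` is the center cell.

    Assumption: `poi` is 1-based (X, Y). Only the minimum padding is added.
    """
    poi_x, poi_y = poi
    height, width = len(mask), len(mask[0])
    # Number of cells on each side of the POI in the original mask
    left, right = poi_x - 1, width - poi_x
    top, bottom = poi_y - 1, height - poi_y
    # Pad the shorter side so both sides end up equal
    pad_left, pad_right = max(right - left, 0), max(left - right, 0)
    pad_top, pad_bottom = max(bottom - top, 0), max(top - bottom, 0)
    new_width = pad_left + width + pad_right
    padded = [[noop] * new_width for _ in range(pad_top)]
    for row in mask:
        padded.append([noop] * pad_left + list(row) + [noop] * pad_right)
    padded += [[noop] * new_width for _ in range(pad_bottom)]
    return padded


# 4-wide, 3-high mask of ones with the POI at X=2, Y=2
padded = pad_to_center([[1] * 4 for _ in range(3)], poi=(2, 2))
```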

Thanks,

Nathaniel Hunter Clay

comment:2 by Bborie Park, 11 years ago

Assuming that mixed array is what's being passed at the SQL level, you can't do that. PostgreSQL arrays must have all elements of one datatype. The programmatic padding is also undesirable.

I was thinking that the use of distancex and distancey would only be present at the SQL level. Everything passed to the C side (rtpg_ and rt_) and within the C side would only be of a mask object.

comment:3 by nclay, 11 years ago

Dustymugs,

My first thought was to do the padding calculation at the C level or even doing offsets, although No-Op padding at the SQL level does intrigue me and may have use-cases.

One useful example is the calculation of FCC HAAT (Height Above Average Terrain), as it would generate a very sparse mask array with the minimum radials, and even with the maximum radials, depending on your data-set resolution. This, however, would also require a center value or poi value to be returned, as that value would not be readily locatable in the returning array.

Another use case would be a wedge annulus with operations such as sum(), avg(), etc. An updated ST_TPI with a user-defined focal radius would benefit as well. Again, this would require the poi value to be returned separately. No-Op in both cases would greatly reduce the size of the array returned; I also believe that this is one of the bottlenecks in translating arrays. Does the size of arrays affect the performance of translating arrays, and would returning an extra value negate the performance gain?

Maybe we should consider this at the SQL level, as it would give a good option for problems with sparse and spatially irrelevant mask calculations, mainly aggregate functions.

As for the mixed array problem, when we currently use a non-weighted array we really are wanting to use an Integer but are forced for the same reason to cast it to a double precision, why could we not have the mask array be text and at the C level cast it to the appropriate type or set the appropriate bits? All we would be adding is another lookup on top of the ones we already have to do, plus the overhead of casting mask elements from text to double once per MapAlgebra call. Am I missing something significant that would outweigh the possible benefits to our users and justify not adding this feature/option?

Thanks,

Nathaniel Hunter Clay

comment:4 by Bborie Park, 11 years ago

Hmmm... Maybe I should explain what I'm thinking...

  1. I don't consider the following mask to be invalid.
mask = [
1,1,1,1
1,1,1,1
1,1,1,1
]

poi = [2,2]

Matter of fact, I think it's as valid as an odd-dimensioned array (both dimensions have odd-valued length). It's just that we have been making an assumption about the POI with odd-dimensioned arrays.

  2. I don't think we should be making an even-dimensioned array (at least one dimension has even-valued length) valid by adding additional elements. The mask provided as input is the mask carried throughout the code to the callback function. I expect that users providing even-dimensioned masks expect their callback to receive value arrays with matching dimensions.
  3. To make it easier for mask generating functions (such as the wedge) and ease the number of input parameters to ST_MapAlgebra, we should consider creating a composite type...

mapalgebra_mask = (

mask = ARRAY[]::double precision, weighted = BOOLEAN, poi = ARRAY[]::integer

)

  4. Ensuring the passage of any dimension array (matching the dimensions of the mask) to the callback would require passing the POI coordinates of the array. This could be done by overloading the pos array provided to each call of the callback function.
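
The composite type proposed above could be modelled roughly like this. It is a sketch under assumptions rather than the PostGIS definition: the class name `MapAlgebraMask`, the 1-based (X, Y) POI, and the rule that an omitted POI defaults to the center of an odd-dimensioned mask are all illustrative choices.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MapAlgebraMask:
    """Hypothetical stand-in for the proposed mapalgebra_mask composite type."""
    mask: List[List[Optional[float]]]
    weighted: bool = False
    poi: Optional[Tuple[int, int]] = None  # 1-based (X, Y); None means "center"

    def __post_init__(self) -> None:
        if not self.mask or not self.mask[0]:
            raise ValueError("mask must have at least one cell")
        width, height = len(self.mask[0]), len(self.mask)
        # Ragged (jagged) masks are the only shape treated as invalid
        if any(len(row) != width for row in self.mask):
            raise ValueError("mask must be rectangular")
        if self.poi is None:
            # Only odd-dimensioned masks have an unambiguous center cell
            if width % 2 == 0 or height % 2 == 0:
                raise ValueError("even-dimensioned mask needs an explicit poi")
            self.poi = (width // 2 + 1, height // 2 + 1)
        x, y = self.poi
        if not (1 <= x <= width and 1 <= y <= height):
            raise ValueError("poi must lie inside the mask")


odd = MapAlgebraMask(mask=[[1.0] * 5 for _ in range(3)])
even = MapAlgebraMask(mask=[[1.0] * 4 for _ in range(3)], poi=(2, 2))
```

Here the 5x3 mask falls back to the center POI (3, 2), matching the example in the ticket description, while the 4x3 mask is accepted unchanged because its POI is given explicitly.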

I hope that explains my thoughts and thought processes.

Also... one comment...

"As for the mixed array problem, when we currently use a non-weighted array we really are wanting to use an Integer but are forced for the same reason to cast it to a double precision, why could we not have the mask array be text and at the C level cast it to the appropriate type or set the appropriate bits?"

The non-weighted integer array is forced to be double precision because that is what the function signature expects. You can specify the parameter as being of anyarray and then filter out permitted types in the switch at line 646 of rtpg_mapalgebra.c.

http://www.postgresql.org/docs/9.3/static/datatype-pseudo.html

comment:5 by nclay, 11 years ago

  1. We are on the same page: ALL masks are valid except for irregular (jagged) ones.
    mask = [
    1,1,1,1
    1,1
    ]
    

Would be Invalid.

  2. Yes, adding padding was a kluge of an idea. However, what came out of it is the idea that it would be nice to have a way of expressing that you don't want a cell to be considered or returned at all. This has its limitations, mainly to sparse masks and spatially irrelevant operations. A concrete, though not very useful, example would be:
    mask = [
    3.14,-,-,-,3.14
     -,-,-,-,-
    3.14,-,-,-,3.14
    ]
    poi = [3,2]
    

This would return a 1D array containing the values from the four corners of the mask multiplied by their respective weights. The value at the poi would also be returned, for the common case where you would like to combine the poi with the aggregate result using -, +, * or /.

This would be applicable to HAAT, where you are only really concerned with the values along 8 radials (every 45 degrees) from 2 to 16 km. Yes, currently you could pass in a mask array with nulls and receive an array with nulls in the respective positions, but that is a lot of nulls only to be ignored when passed to an aggregate such as avg(). Why not throw them out early on if the user explicitly asks for that? I would value your opinion on this.

  3. Again, we're on the same page, except for having the option explained in 2.
  4. Same as 3.
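
To give a feel for how sparse the HAAT-style mask described in item 2 would be, here is a small sketch. It assumes, purely for illustration, that distances are counted in whole cells along each of the 8 directions (so diagonal steps are longer in true distance), and `radial_mask` is a hypothetical helper, not an existing PostGIS function.

```python
def radial_mask(inner, outer):
    """Square mask with 1.0 along 8 radials between `inner` and `outer`
    steps from the center, and None (ignored) everywhere else.
    """
    size = 2 * outer + 1
    mask = [[None] * size for _ in range(size)]
    # Unit steps for N, NE, E, SE, S, SW, W, NW (every 45 degrees)
    directions = [(0, -1), (1, -1), (1, 0), (1, 1),
                  (0, 1), (-1, 1), (-1, 0), (-1, -1)]
    for dx, dy in directions:
        for step in range(inner, outer + 1):
            mask[outer + dy * step][outer + dx * step] = 1.0
    return mask


# Radials from 2 to 16 cells, e.g. one cell per km (an assumed resolution)
haat = radial_mask(2, 16)
used = sum(cell is not None for row in haat for cell in row)
total = len(haat) * len(haat[0])
```

Under these assumptions only 120 of the 1089 mask cells carry a value, which is the kind of ratio that motivates skipping the remaining cells.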

Comment.... I realized that wasn't right after I had sent that.... too late in the evening and not enough coffee. ;)

Thanks,

Nathaniel Hunter Clay

in reply to:  5 comment:6 by Bborie Park, 11 years ago

My comments are interspersed...

Replying to nclay:

  1. We are on the same page: ALL masks are valid except for irregular (jagged) ones.
    mask = [
    1,1,1,1
    1,1
    ]
    

Would be Invalid.

Correct. Not to mention that PostgreSQL does not permit ragged arrays.

  2. Yes, adding padding was a kluge of an idea. However, what came out of it is the idea that it would be nice to have a way of expressing that you don't want a cell to be considered or returned at all. This has its limitations, mainly to sparse masks and spatially irrelevant operations. A concrete, though not very useful, example would be:
    mask = [
    3.14,-,-,-,3.14
     -,-,-,-,-
    3.14,-,-,-,3.14
    ]
    poi = [3,2]
    

This would return a 1D array containing the values from the four corners of the mask multiplied by their respective weights. The value at the poi would also be returned, for the common case where you would like to combine the poi with the aggregate result using -, +, * or /.

This would be applicable to HAAT, where you are only really concerned with the values along 8 radials (every 45 degrees) from 2 to 16 km. Yes, currently you could pass in a mask array with nulls and receive an array with nulls in the respective positions, but that is a lot of nulls only to be ignored when passed to an aggregate such as avg(). Why not throw them out early on if the user explicitly asks for that? I would value your opinion on this.

Isn't the usage of dash (-) the same as using NULL to indicate that that pixel should be ignored?

Convert that 2D array to a 1D array? How do we specify placement (e.g. left to right, top to bottom) and why (e.g. why left to right, top to bottom instead of bottom to top, right to left?)? I think there will be too many assumptions made here and this would be non-obvious to end-users.

As for "Why not throw them out early on if the user explicitly asks for that?", my response would be: What did the user explicitly ask for? (Yes, answering a question with a question. But I have an answer! ;-)

From what the user has provided as input (assuming the current proposed mask composite type), we only know that the user wants to act upon those pixels indicated in the mask. We copy the mask structure for creating the value array to prevent confusion. One way to let the user decide to have a compact value array would be to add another boolean parameter to the proposed mask composite type...

compact = BOOLEAN

I think we're on the same page on this.
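
The difference a compact flag would make can be sketched as below. The helper `masked_values` and the row-major (left to right, top to bottom) ordering of the compact result are assumptions for this example; the ordering in particular is exactly the kind of choice questioned above.

```python
def masked_values(values, mask, compact=False):
    """Multiply each neighborhood value by its mask weight.

    None in the mask means "ignore this cell". With compact=False the result
    keeps the mask's 2D shape, with None in ignored positions. With
    compact=True ignored cells are dropped and the rest is flattened in
    row-major order (an assumed ordering).
    """
    if compact:
        return [v * m
                for v_row, m_row in zip(values, mask)
                for v, m in zip(v_row, m_row)
                if m is not None]
    return [[None if m is None else v * m for v, m in zip(v_row, m_row)]
            for v_row, m_row in zip(values, mask)]


# Four-corner weighted mask, using a weight of 2.0 for readability
w = 2.0
mask = [[w, None, None, None, w],
        [None] * 5,
        [w, None, None, None, w]]
values = [[1, 2, 3, 4, 5],
          [6, 7, 8, 9, 10],
          [11, 12, 13, 14, 15]]
full = masked_values(values, mask)
packed = masked_values(values, mask, compact=True)
```

The compact result is much shorter, but the caller can no longer tell which position each value came from without knowing the ordering rule.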

comment:7 by nclay, 11 years ago

"Isn't the usage of dash (-) the same as using NULL to indicate that that pixel should be ignored?" No, a NULL indicates that a NULL should be returned. A (-) indicates an effective skip, i.e. do not return anything at all. However, with the proposed compact option a NULL could be used here to indicate the same.

Convert 2D array to 1D array, placement and why: the reason, in my mind, to convert the 2D array to a 1D array is to make it abundantly clear that when you compact a mask, you lose effective ways (other than counting cells, and that would be determined by implementation) of predetermining the placement of values within the returning array. This would allow for easy building of an array or list of POIs to be returned (at the C level) with little regard for order, as some rows would have x values, others may have y values and another may have m values... The only predetermined placement in the 1D array would be the last value, which would be the specified poi value. In my mind.

This is a new topic:

poiList = [ [0,0,3.14],[0,4,3.14],[2,0,3.14],[2,4,3.14]]
e.g.
poiList = [[x,y,mValue],...,[poix,poiy,1]]

Furthermore, I think we should move toward a list of POIs, so that we can do optimizations for sparse masks, such as presetting dominant determinable values like 0 or NULL using memory setting, and thus only iterating over mask values that are non-deterministic.

However, this idea should be carefully weighed and planned, so as not to detrimentally affect the performance of dense masks. This may come down to a user-definable optional hint:

sparse = BOOLEAN

When not specified (NULL), MapAlgebra will try its best to determine whether to preset values or not and compress the poi list (I am not saying to dynamically compress the output array, just the internal representation of the mask), e.g. dropping the deterministic values from the poi list and setting the dominant deterministic value in the output array prior to iterating over the poiList.

An example would be a weighted mask with a large proportion of its cells set to 0, let's say greater than 50%. This analysis could be done while building the poiList from the mask.

Also, if returning a 1D array when compacted is a sticking point for you, then I can live with returning a 2D array too. However, you would have the issues stated above.
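
A minimal sketch of the poiList idea, under stated assumptions: `build_poi_list` is a hypothetical helper, entries are (row, column, value) with 0-based indices to match the numbers in the example above, the trailing entry for the POI itself is omitted, and the 50% threshold is the figure suggested in this comment.

```python
from collections import Counter


def build_poi_list(mask, sparse=None, threshold=0.5):
    """Return (poi_list, is_sparse, preset) for a 2D mask.

    If `sparse` is None the function decides for itself: the mask counts as
    sparse when its most common value covers more than `threshold` of the
    cells. Only 0 or None are treated as deterministic values that can be
    preset, so any other dominant value falls back to the dense form.
    """
    cells = [(r, c, v) for r, row in enumerate(mask) for c, v in enumerate(row)]
    dominant, count = Counter(v for _, _, v in cells).most_common(1)[0]
    if sparse is None:
        sparse = count / len(cells) > threshold
    if not (sparse and dominant in (0, None)):
        return cells, False, None
    # Drop the dominant value; the output can be preset to it in one pass
    kept = [(r, c, v) for r, c, v in cells if v != dominant]
    return kept, True, dominant


corners = [[3.14, None, None, None, 3.14],
           [None] * 5,
           [3.14, None, None, None, 3.14]]
poi_list, is_sparse, preset = build_poi_list(corners)
dense_list, dense_sparse, _ = build_poi_list([[1.0] * 3 for _ in range(3)])
```

For the four-corner mask only four entries remain to iterate over, while the fully populated 3x3 mask keeps all nine cells and is handled as dense.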

in reply to:  7 comment:8 by Bborie Park, 11 years ago

Replying to nclay:

"Isn't the usage of dash (-) the same as using NULL to indicate that that pixel should be ignored?" No, a NULL indicates that a NULL should be returned. A (-) indicates an effective skip, i.e. do not return anything at all. However, with the proposed compact option a NULL could be used here to indicate the same.

Effective skip = no value = NODATA? So what would the callback function receive? From what I can see in the code, NULL values in the mask set the nodata flag to true for that position in the mask.

Convert 2D array to 1D array, placement and why: the reason, in my mind, to convert the 2D array to a 1D array is to make it abundantly clear that when you compact a mask, you lose effective ways (other than counting cells, and that would be determined by implementation) of predetermining the placement of values within the returning array. This would allow for easy building of an array or list of POIs to be returned (at the C level) with little regard for order, as some rows would have x values, others may have y values and another may have m values... The only predetermined placement in the 1D array would be the last value, which would be the specified poi value. In my mind.

I find the lack of obviousness disconcerting, and it leaves me with significant reservations about the compact mode.

This is a new topic:

poiList = [ [0,0,3.14],[0,4,3.14],[2,0,3.14],[2,4,3.14]]
e.g.
poiList = [[x,y,mValue],...,[poix,poiy,1]]

Furthermore, I think we should move toward a list of POIs, so that we can do optimizations for sparse masks, such as presetting dominant determinable values like 0 or NULL using memory setting, and thus only iterating over mask values that are non-deterministic.

An interesting idea that should require a significant amount of thinking. It does sound like it is ideal for the sparse situation.

However, this idea should be carefully weighed and planned, so as not to detrimentally affect the performance of dense masks. This may come down to a user-definable optional hint:

sparse = BOOLEAN

When not specified (NULL), MapAlgebra will try its best to determine whether to preset values or not and compress the poi list (I am not saying to dynamically compress the output array, just the internal representation of the mask), e.g. dropping the deterministic values from the poi list and setting the dominant deterministic value in the output array prior to iterating over the poiList.

An example would be a weighted mask with a large proportion of its cells set to 0, let's say greater than 50%. This analysis could be done while building the poiList from the mask.

Also, if returning a 1D array when compacted is a sticking point for you, then I can live with returning a 2D array too. However, you would have the issues stated above.

Traversing an array is always O(N) where N is the number of elements. A smaller array can be traversed faster than a larger array.

I'd suggest creating a new ticket for discussing the list-of-POIs approach, as that is a considerably more radical and more appropriate alternative to having compacted masks.

comment:9 by robe, 7 years ago

Milestone: PostGIS Future → PostGIS Fund Me

Milestone renamed
