#2676 closed enhancement (wontfix)
r.neighbors on large rasters
Reported by: | dnewcomb | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 7.4.1 |
Component: | Default | Version: | svn-trunk |
Keywords: | r.neighbors large rasters | Cc: | |
CPU: | x86-64 | Platform: | Linux |
Description
running r.neighbors --o --verbose input=naip_change output=naip_change_average method=average size=29
The input raster is ( r.info output)
Total Cells: 365659744068 Starts and shows 0% complete, then crashes with segmentation fault.
linux top program shows process using 1.5 GB RAM before crash
Using grass-7.0.svn_src_snapshot_2015_05_02
In main.c , line 139 I see : int i , n;
I assume that this means that r.neighbors is limited to rasters with the number of cells in the 32 bit integer range?
Change History (15)
follow-up: 2 comment:1 by , 10 years ago
follow-up: 4 comment:2 by , 10 years ago
Replying to glynn:
Replying to dnewcomb:
In main.c , line 139 I see : int i , n;
I assume that this means that r.neighbors is limited to rasters with the number of cells in the 32 bit integer range?
If it is, it has nothing to do with above declaration. "i" will never exceed the number of outputs, while "n" will never exceed the number of cells in the neighbourhood.
r.neighbours operates row-by-row, only holding as many rows as are in the neighbourhood, and for method=average it isn't maintaining category data, so it shouldn't have any problems with rasters with more than 2^31 cells. At least, there's no more reason for r.neighbors to have such issues than any other raster module.
What are the actual dimensions (rows x columns) of the current region? (The dimensions of the input map aren't directly relevant).
Sorry, rookie mistake.
The region stats are these.
rows: 1969532 cols: 1875705 cells: 3694261020060
I had used r.external -e to expand the extent of the region to include the linked raster.
Setting the region to match the raster gives me this size region.
rows: 440046 cols: 830958 cells: 365659744068
This seems to be working so far.
Sorry for the noise.
Kind of curious what is the upper limit for the command, though.
comment:3 by , 10 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
follow-up: 5 comment:4 by , 10 years ago
Resolution: | invalid |
---|---|
Status: | closed → reopened |
Replying to dnewcomb:
What are the actual dimensions (rows x columns) of the current region? (The dimensions of the input map aren't directly relevant).
Sorry, that's incorrect. If you don't use the -a flag, r.neighbors sets the region to match the input map.
In any case, the cause is the use of G_alloca() for allocating temporary row buffers within the raster library. This is a macro which expands to alloca(), which allocates memory on the stack.
While this is vastly more efficient than malloc() etc (alloca() may compile to just 2 CPU instructions), it doesn't report allocation failure; the next instruction to push something onto the stack will result in a segfault.
1875705 columns at 8 bytes per cell corresponds to 15 MB for a row. On my system, the default maximum stack size (ulimit -s) is 8192 KiB (8 MiB). Changing this to 50 MiB avoids the segfault (although I have neither the patience nor the free disk space to see whether it runs to completion).
While I doubt that this will be a genuine issue for many people, it can result in the upper limit on map dimensions being smaller than it could be.
Consequently, we may want to think about whether the use of alloca() should be optional. Currently, it's used on any system which is believed to provide it (those which lack it use malloc/free). See the top of include/defs/gis.h for the details.
If most users can live with the existing behaviour (an 8 MiB default stack allows for just short of a million columns with DCELL data) and most of the rest can live with explicitly increasing the stack size, it may suffice to just add e.g. "#ifndef DONT_USE_ALLOCA" to allow the remainder to override the behaviour at compile time.
comment:5 by , 10 years ago
Replying to glynn:
Replying to dnewcomb:
What are the actual dimensions (rows x columns) of the current region? (The dimensions of the input map aren't directly relevant).
Sorry, that's incorrect. If you don't use the -a flag, r.neighbors sets the region to match the input map.
In any case, the cause is the use of G_alloca() for allocating temporary row buffers within the raster library. This is a macro which expands to alloca(), which allocates memory on the stack.
While this is vastly more efficient than malloc() etc (alloca() may compile to just 2 CPU instructions), it doesn't report allocation failure; the next instruction to push something onto the stack will result in a segfault.
1875705 columns at 8 bytes per cell corresponds to 15 MB for a row. On my system, the default maximum stack size (ulimit -s) is 8192 KiB (8 MiB). Changing this to 50 MiB avoids the segfault (although I have neither the patience nor the free disk space to see whether it runs to completion).
While I doubt that this will be a genuine issue for many people, it can result in the upper limit on map dimensions being smaller than it could be.
Consequently, we may want to think about whether the use of alloca() should be optional. Currently, it's used on any system which is believed to provide it (those which lack it use malloc/free). See the top of include/defs/gis.h for the details.
If most users can live with the existing behaviour (an 8 MiB default stack allows for just short of a million columns with DCELL data) and most of the rest can live with explicitly increasing the stack size, it may suffice to just add e.g. "#ifndef DONT_USE_ALLOCA" to allow the remainder to override the behaviour at compile time.
Just to give an idea of the use case, I have done a land cover change ndvi analysis based off of 4 band 1m resolution digital aerial photography for the state of NC that I'm averaging to a coarser resolution. I can see down the road a couple of years, as image data gets denser, the possibility of more folks running into the limits you describe.
Of course, one could always chop the data set into overlapping blocks, process, and patch together later.
comment:10 by , 9 years ago
Milestone: | 7.0.4 → 7.0.5 |
---|
comment:11 by , 8 years ago
Milestone: | 7.0.5 → 7.3.0 |
---|
comment:14 by , 7 years ago
Resolution: | → wontfix |
---|---|
Status: | reopened → closed |
No activity last 3 years, closing. Feel free to reopen if desired.
Replying to dnewcomb:
If it is, it has nothing to do with above declaration. "i" will never exceed the number of outputs, while "n" will never exceed the number of cells in the neighbourhood.
r.neighbours operates row-by-row, only holding as many rows as are in the neighbourhood, and for method=average it isn't maintaining category data, so it shouldn't have any problems with rasters with more than 2^31 cells. At least, there's no more reason for r.neighbors to have such issues than any other raster module.
What are the actual dimensions (rows x columns) of the current region? (The dimensions of the input map aren't directly relevant).