Opened 9 years ago
Last modified 6 years ago
#3062 new defect
Segmentation fault with r.buffer
Reported by: | escheper | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 7.6.2 |
Component: | Raster | Version: | 7.0.4 |
Keywords: | r.buffer | Cc: | |
CPU: | Other | Platform: | Linux |
Description
I'm trying to buffer raster regions at various resolutions. The following commands went well:
g.region w=-180.0 s=-90.0 e=180.0 n=90.0 res=0.00833333 r.buffer input=rasset output=rasbuf distances=10.0 units="kilometers" --o
The next commands give a segmentation fault at "Finding buffer zones..." (at around 37%).
g.region w=-180.0 s=-90.0 e=180.0 n=90.0 res=0.002777777 r.buffer input=rasset output=rasbuf distances=10.0 units="kilometers" --o
I'm using Ubuntu 16.04 LTS on an Intel Xeon E5-1650 server with >64GB memory.
Does anyone have a workaround or an idea what the cause is?
Attachments (1)
Change History (19)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
Added the file settlements.pack which is a small version of the original input raster. The size of the original raster is 16MB which cannot be added as an attachment. I could send this file by WeTransfer or is there another way?
comment:3 by , 9 years ago
Did some further tests. Using the small version did not resulted in a segmentation fault. So the original raster is needed.
comment:5 by , 9 years ago
Keywords: | r.buffer added; buffer removed |
---|
comment:6 by , 9 years ago
CMD='r.buffer input=r120749038 output=rasbuf distances=10.0 units=kilometers --o' valgrind --tool=memcheck --leak-check=yes --show-reachable=yes $CMD --o ==11148== Memcheck, a memory error detector ==11148== ==11148== 933,120,000 bytes in 1 blocks are still reachable in loss record 73 of 73 ==11148== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==11148== by 0x5076A4A: G__malloc (alloc.c:39) ==11148== by 0x40333B: read_input_map (read_map.c:39) ==11148== by 0x402814: main (main.c:141) ...
There is no G_free(map) it seems:
raster/r.buffer $ grep 'alloc\|free' *.c | grep -v GNU parse_dist.c: distances = (struct Distance *)G_calloc(count, sizeof(struct Distance)); read_map.c: map = (MAPTYPE *) G_malloc((size_t) window.rows * window.cols * sizeof(MAPTYPE)); read_map.c: cell = Rast_allocate_c_buf(); read_map.c: G_free(cell); support.c: Rast_free_cats(&pcats); write_map.c: cell = Rast_allocate_c_buf(); write_map.c: G_free(cell);
follow-up: 8 comment:7 by , 9 years ago
Thanks for your support!
Reading the map didn't cause the problem. The method read_input_map() finishes without a fault. The segmentation fault occurs in execute_distance() in execute.c after the message "Finding buffer zones..." is shown.
After adding "G_free(map);" in read_input_map() the r.buffer command crashes within that method and execute_distances() isn't executed. So I removed the "G_free(map)" again.
After adding some debug statements I have noticed the program crashes at the following point in process_left().
/* convert 1,2,3,4 to -1,0,1,2 etc. 0 becomes ndist */ G_message(_("PL6")); G_message(_("cur_zone: %i ZONE_INCR: %i ndist: %i to_ptr: %i"),cur_zone,ZONE_INCR,ndist,to_ptr); if ((cur_zone = *--to_ptr)) cur_zone -= ZONE_INCR; else cur_zone = ndist;
The last lines of the debug output was:
PL6
cur_zone: 16538 ZONE_INCR: 2 ndist: 1 to_ptr: -2116621104
Segmentation fault
I do not have much experience with C so maybe you can give me a clue.
comment:8 by , 9 years ago
Replying to escheper:
G_message(_("cur_zone: %i ZONE_INCR: %i ndist: %i to_ptr: %i"),cur_zone,ZONE_INCR,ndist,to_ptr); if ((cur_zone = *--to_ptr)) cur_zone -= ZONE_INCR; else cur_zone = ndist;
...
cur_zone: 16538 ZONE_INCR: 2 ndist: 1 to_ptr: -2116621104
Good analysis, this number looks like an integer overflow to me (it is negative). A C expert wanted here...
comment:9 by , 9 years ago
Changed the %i to %u. Now the numbers/pointers are positive. Also changed
if ((cur_zone = *--to_ptr))
to
if ((cur_zone == *--to_ptr))
but still get the segmentation fault :(.
follow-up: 11 comment:10 by , 9 years ago
I would say it is related to the fact that to_ptr
is unsigned char, so it can overflow fast, although I don't understand the algorithm there. Note the comment in the code:
/* if MAPTYPE is changed to unsigned short, MAX_DIST can be set to 2^16-2 * (if short is 2 bytes) */
So I would try to change it to unsigned short. Unfortunately I can't easily test the testcase, I have little memory right now.
follow-ups: 12 13 comment:11 by , 9 years ago
Replying to annakrat:
I would say it is related to the fact that
to_ptr
is unsigned char, so it can overflow fast, although I don't understand the algorithm there. Note the comment in the code:/* if MAPTYPE is changed to unsigned short, MAX_DIST can be set to 2^16-2 * (if short is 2 bytes) */So I would try to change it to unsigned short. Unfortunately I can't easily test the testcase, I have little memory right now.
Thanks to your tip I found the solution.
Changing the MAPTYPE didn't solve the problem. But the problem was another overflow.
I'm using a 64,800 x 129,600 grid which results in 8,398,080,000 cells (or bytes). This is far beyond the maxint (2,147,483,647) and causes the overflow.
After changing most int variables in the r.buffer code in long types the segmentation fault disappeared :).
I didn't check the output visually, that's what I need to do next. Having no segmentation fault is very promising.
Except from changing the integer types to long, I have changed some "=" to "==" in "if" statements in process_left.c, process_at.c and process_rite.c. This didn't solve the segmentation fault but in my opinion the original code seems not to be correct.
I found some other lines that may not be correct too. But I'm not sure about it. These lines are:
process_rite.c(90): *to_ptr = (first_zone = i) + ZONE_INCR; process_row.c(34): for (r = row; r >= 0 && (first_zone = find_distances(r)) >= 0; r--) process_row.c(47): r < window.rows && (first_zone = find_distances(r)) >= 0; r++) { read_map.c(65): if ((*ptr++ = (*cell++ != 0))) { read_map.c(81): if ((*ptr++ = !Rast_is_c_null_value(cell++))) {
Can anyone confirm this?
Remains the following questions:
- How are other GRASS modules dealing with large grids? Do they have the same problem?
- What will be the impact on the output raster after changing some "=" to "=="? Do we get different result buffers?
- In what way should I do a "pull request" of my changes in de code.
comment:12 by , 9 years ago
Replying to escheper:
Remains the following questions:
- How are other GRASS modules dealing with large grids? Do they have the same problem?
- What will be the impact on the output raster after changing some "=" to "=="? Do we get different result buffers?
- In what way should I do a "pull request" of my changes in de code?
Create a diff and attach it here. You can make it simpler for us if you first identify the exact source of the error and then upload the patch which fixes without any extra changes. Changing int variables to long int might result in spending too much memory, so you shouldn't change it unless you consider the consequences.
If you think there are other changes which need to be done to run r.buffer correctly, please explain them. For example, it's not clear why "==" should replace "=" and where. And yes, very probably we get then different results.
comment:13 by , 9 years ago
Replying to neteler:
CMD='r.buffer input=r120749038 output=rasbuf distances=10.0 units=kilometers --o' valgrind --tool=memcheck --leak-check=yes --show-reachable=yes $CMD --o ==11148== Memcheck, a memory error detector ==11148== ==11148== 933,120,000 bytes in 1 blocks are still reachable in loss record 73 of 73 ==11148== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==11148== by 0x5076A4A: G__malloc (alloc.c:39) ==11148== by 0x40333B: read_input_map (read_map.c:39) ==11148== by 0x402814: main (main.c:141) ...
A segfault is reflected in valgrind as "invalid read of size ..." or "invalid write of size ...". Still reachable blocks are not a cause for segfault.
Replying to escheper:
[...] I found the solution.
Changing the MAPTYPE didn't solve the problem. But the problem was another overflow.
I'm using a 64,800 x 129,600 grid which results in 8,398,080,000 cells (or bytes). This is far beyond the maxint (2,147,483,647) and causes the overflow.
After changing most int variables in the r.buffer code in long types the segmentation fault disappeared :).
I didn't check the output visually, that's what I need to do next. Having no segmentation fault is very promising.
Note that long is a signed 32 bit integer except for POSIX 64 bit systems. On Windows, long is always 32 bit and would not solve the problem. Better use long long, or even better, ask for a platform-independent 64 bit integer type in GRASS.
comment:15 by , 9 years ago
Priority: | blocker → major |
---|
No, it seems to be only problem with huge data.
comment:16 by , 8 years ago
Milestone: | 7.0.5 → 7.0.6 |
---|
comment:17 by , 7 years ago
Milestone: | 7.0.6 → 7.0.7 |
---|
comment:18 by , 6 years ago
Milestone: | 7.0.7 → 7.6.2 |
---|
Please make the map "rasset" available (via r.pack) or tell us how to generate it in order to enable us to reproduce the problem.