Opened 16 years ago
Closed 8 years ago
#475 closed defect (wontfix)
r.stats: last bin always has a single cell
Reported by: | hamish | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 6.4.6 |
Component: | Raster | Version: | svn-develbranch6 |
Keywords: | r.stats | Cc: | |
CPU: | All | Platform: | All |
Description
Hi, r.stats seems to be doing something funny: the last bin always contains only one cell. Same result with GRASS 5.4.1 - 6.5svn on Linux, OSX, Solaris.
speafish:
G65> g.region rast=elevation.10m G65> r.stats elevation.10m nsteps=10 -c | nl 100% 1 1061.064087-1139.632019 243490 2 1139.632019-1218.199951 732066 3 1218.199951-1296.767883 399213 4 1296.767883-1375.335815 351601 5 1375.335815-1453.903748 315246 6 1453.903748-1532.47168 265487 7 1532.47168-1611.039612 216796 8 1611.039612-1689.607544 115175 9 1689.607544-1768.175476 15727 10 1768.175476-1846.743408 1 G65> r.mapcalc 'bin10 = if(elevation.10m > 1768.175476, 1, null() )' G65> r.univar bin10 | grep '^n' n: 12344
other bin counts don't line up either:
G65> r.mapcalc 'bin2 = if(elevation.10m > 1139.632019 && elevation.10m < 1218.199951, 1, null() )' G65> r.univar bin2 | grep '^n' n: 645823 G65> r.mapcalc 'bin9 = if(elevation.10m > 1689.607544 && elevation.10m < 1768.175476, 1, null() )' G65> r.univar bin9 | grep '^n' n: 87599
all-in-one bin gives correct result:
r.stats elevation.10m nsteps=1 -c 100% 1061.064087-1846.743408 2654802 r.univar elevation.10m | grep '^n' n: 2654802 # but, r.stats elevation.10m nsteps=2 -c 100% 1061.064087-1453.903748 2654801 1453.903748-1846.743408 1
sum total seems to be ok, just distribution does not match reported range:
r.stats elevation.10m nsteps=5 -c 100% 1061.064087-1218.199951 1090464 1218.199951-1375.335815 817890 1375.335815-1532.47168 573165 1532.47168-1689.607544 173282 1689.607544-1846.743408 1 echo $((1090464 + 817890 + 573165 + 173282 + 1)) 2654802
thanks to Doc Robinson for reporting this.
(I have a vague memory that this was also reported years ago in the mailing list archives??)
thanks, Hamish
Change History (10)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
I see the same bug in 5.0.0, so it probably dates back to the initial FP conversion.
H
follow-up: 4 comment:3 by , 16 years ago
request for help solving this critical bug: we are reporting bad statistical data from our core stats module. It is used by r.report, d.histogram, etc and on to users' work output.
I'm horrified by this bug but unfamiliar with hash tables and don't know how to proceed with debugging.
thanks, Hamish
comment:4 by , 16 years ago
Replying to hamish:
request for help solving this critical bug: we are reporting bad statistical data from our core stats module. It is used by r.report, d.histogram, etc and on to users' work output.
I'm horrified by this bug but unfamiliar with hash tables and don't know how to proceed with debugging.
I think that you're looking in the wrong place.
This appears to be a problem with the quantisation semantics.
In your test case, r.stats calls
G_quant_add_rule(&q, 1061.064087, 1846.743408, 1, 10)
It then assigns the quantisation rules to the input map.
Cell values are obtained using G_get_c_raster_row(), which converts the map's FP values to integer CELL values via G_quant_get_cell_value(). Given the above quantisation rule, it converts FP values as follows:
> print G_quant_get_cell_value(&q, 1846.743408) $15 = 10 > print G_quant_get_cell_value(&q, 1846.743407) $17 = 9 > print G_quant_get_cell_value(&q, 1846.743409) $18 = -2147483648
Essentially, G_quant_add_rule() treats the ranges as half-open, i.e. the values range from low (inclusive) to high (exclusive). While half-open ranges are a common concept (e.g. floor() behaves the same way), the range of a GRASS raster is closed, i.e. both the low and high values are inclusive.
I suggest changing r.stats to use cHigh=nsteps+1, and truncate the quantised values into range (fudging dHigh may seem simpler, but the maximum value may be positive, negative or zero, and the range may be 1e+300 or 1e-300, so choosing a suitable fudge factor isn't straightforward).
comment:5 by , 16 years ago
I've applied a fix for this bug in devbr6 r37699, and trunk r37700.
please review/test.
One thing I notice is that for multiple input maps of mixed types the numbers for the * second map's ranges are a bit weird + change with the patch.
e.g.:
#spearfish # elevation.dem is CELL, elevation.10m is DCELL g.region rast=elevation.10m g.region n=n+10 # same before and after patch G65> r.stats elevation.dem nsteps=5 -c | tail -n 11 100% 1831 108 1832 117 1833 108 1834 90 1835 72 1836 63 1837 36 1838 9 1839 27 1840 36 * 25848
# before patch G65> r.stats elevation.10m nsteps=5 -c 100% 1061.064087-1218.199951 1090464 1218.199951-1375.335815 817890 1375.335815-1532.47168 573165 1532.47168-1689.607544 173282 1689.607544-1846.743408 1 G65> r.stats elevation.dem,elevation.10m nsteps=5 -c | tail -n 11 100% 1836 1532.47168-1689.607544 63 1837 1532.47168-1689.607544 35 1837 1689.607544-1846.743408 1 1838 1532.47168-1689.607544 9 1839 1532.47168-1689.607544 27 1840 1532.47168-1689.607544 36 * 1061.064087-1218.199951 10968 * 1218.199951-1375.335815 2496 * 1375.335815-1532.47168 10395 * 1532.47168-1689.607544 90 * * 1899
note 2x cat 1837 in above..(?) [they add up ok]
# after patch G65> r.stats elevation.10m nsteps=5 -c 100% 1061.064087-1218.199951 847076 1218.199951-1375.335815 730296 1375.335815-1532.47168 565685 1532.47168-1689.607544 411802 1689.607544-1846.743408 99943 * 1899 G65> r.stats elevation.dem,elevation.10m nsteps=5 -c | tail -n 11 100% 1835 1689.607544-1846.743408 72 1836 1689.607544-1846.743408 63 1837 1689.607544-1846.743408 36 1838 1689.607544-1846.743408 9 1839 1689.607544-1846.743408 27 1840 1689.607544-1846.743408 36 * 1061.064087-1218.199951 10090 * 1218.199951-1375.335815 881 * 1375.335815-1532.47168 8634 * 1532.47168-1689.607544 4344 * * 1899
Hamish
comment:6 by , 16 years ago
Index: stats.c =================================================================== --- stats.c (revision 37700) +++ stats.c (working copy) @@ -98,8 +98,8 @@ void fix_max_fp_val(CELL *cell, int ncols) { while (ncols-- > 0) { - if (cell[ncols] > nsteps) - cell[ncols] = nsteps; + if (cell[ncols] > (CELL)nsteps) + cell[ncols] = (CELL)nsteps; /* { G_debug(5, ". resetting %d to %d\n", cell[ncols], nsteps); } */ } return;
?
comment:7 by , 16 years ago
Priority: | critical → major |
---|
downgrading severity because the main reported bug is fixed.
the weird output for multiple input maps of mixed types still remains so not closing it.
Hamish
comment:8 by , 9 years ago
Milestone: | 6.4.0 → 6.4.6 |
---|
comment:9 by , 8 years ago
To me, the logic after the path looks just fine. I don't know what's weird about them.
Suggesting to close this bug.
comment:10 by , 8 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Replying to hamish:
apparently in raster/r.stats/stats.c update_cell_stats(), but then I get lost as I'm not really trained to deal with hash tables.
Interestingly if I change the resolution to misalign I only get the first output step:
Hamish