Opened 9 years ago

Closed 6 years ago

#2750 closed enhancement (fixed)

LZ4 when writing raster rows; better than double I/O bound r.mapcalc speed

Reported by: sprice Owned by: grass-dev@…
Priority: normal Milestone: 7.6.0
Component: Raster Version: svn-releasebranch74
Keywords: ZLIB LZ4 ZSTD Cc:
CPU: All Platform: All

Description

I've added the ability for reading/writing raster rows in compression formats LZ4, LZ4HC, & ZSTD (in addition to the existing RLE & ZLIB.) These new algorithms are extremely fast (an order of magnitude faster than ZLIB) and are a better fit for modern, fast, hard drives & SSDs.

I've attached a .tgz file with the necessary added & changed files. The new algorithms can be enabled/disabled with environment vars the same way as ZLIB, as described in the r.compress documentation: https://grass.osgeo.org/grass70/manuals/r.compress.html

Algorithm summary:

LZ4 produces a slightly worse compression ratio than ZLIB, but it compresses about an order of magnitude faster than ZLIB. It decompresses even faster.

LZ4-HC is supposed to produce a compression ratio similar to ZLIB, and at about the same speed as ZLIB. It decompresses as fast as the regular LZ4 (>2 GB/s). Unfortunately, the improved compression ratio doesn't show in my tests, probably due to the fact we're compressing each row individually. This may change against floating point data if someone wants to test it.

ZSTD is a new algorithm by the author of LZ4 that is intended to replace ZLIB. It compresses and decompresses extremely quickly, while maintaining a similar compression ratio as ZLIB. Unfortunately, it's still in beta.

They are all under the BSD license. Links below show more info & performance numbers.

https://github.com/Cyan4973/zstd

https://github.com/Cyan4973/lz4

http://www.lz4.org

I recommend incorporating the attached changes into GRASS, and leaving it as optional for users. At some point in the future, after testing, GRASS should move to using ZSTD as default. Power users who want the best performance now (and have the disk space) can use LZ4 immediately.

It would be trivial to further alter get_row.c & put_row.c to use LZ4 for floating point compression. (And I would recommend someone do that.)

Note: I've decided to use LZ4_decompress_fast() instead of LZ4_decompress_safe(). In my test, it was noticeably faster. According to the documentation, it leaves LZ4 open to a malicious attack. If this is a serious concern in the GRASS GIS internal data structures, change the commenting in get_row.c to use the safer code.

On my computer, I've better than halved (!) the runtime of a r.mapcalc identity operation when using LZ4. Below are my tests while working with a RapidEye scene.

> time r.mapcalc expression="out_test_zlib=out_test_lz4hc" --overwrite
 100%

real	3m33.503s
user	3m25.451s
sys	0m6.750s
> time r.mapcalc expression="out_test_zlib=out_test_lz4hc" --overwrite
 100%

real	3m34.398s
user	3m26.684s
sys	0m6.138s
> export GRASS_INT_LZ4=1
> time r.mapcalc expression="out_test_lz4=out_test_lz4hc" --overwrite
 100%

real	1m31.222s
user	1m25.379s
sys	0m5.035s
> time r.mapcalc expression="out_test_lz4=out_test_lz4hc" --overwrite
 100%

real	1m29.792s
user	1m24.029s
sys	0m4.858s
> unset GRASS_INT_LZ4
> export GRASS_INT_LZ4HC=1
> time r.mapcalc expression="out_test_lz4hc2=out_test_lz4hc" --overwrite
 100%

real	3m5.332s
user	2m58.610s
sys	0m5.603s
> time r.mapcalc expression="out_test_lz4hc2=out_test_lz4hc" --overwrite
 100%

real	3m3.710s
user	2m56.606s
sys	0m5.858s
> unset GRASS_INT_LZ4HC
> export GRASS_INT_ZSTD=1
> time r.mapcalc expression="out_test_zstd=out_test_lz4hc" --overwrite
 100%

real	1m38.322s
user	1m32.654s
sys	0m4.897s
> time r.mapcalc expression="out_test_zstd=out_test_lz4hc" --overwrite
 100%

real	1m42.370s
user	1m35.487s
sys	0m5.282s
> unset GRASS_INT_ZSTD
> ls -l vrt_test/PERMANENT/cell/out_test_*
-rw-r--r--  1 sprice  staff  4080217012 Sep 26 14:01 vrt_test/PERMANENT/cell/out_test_lz4
-rw-r--r--  1 sprice  staff  4069728048 Sep 26 13:34 vrt_test/PERMANENT/cell/out_test_lz4hc
-rw-r--r--  1 sprice  staff  4069728048 Sep 26 14:08 vrt_test/PERMANENT/cell/out_test_lz4hc2
-rw-r--r--  1 sprice  staff  3737100577 Sep 26 13:57 vrt_test/PERMANENT/cell/out_test_zlib
-rw-r--r--  1 sprice  staff  3811356101 Sep 26 14:12 vrt_test/PERMANENT/cell/out_test_zstd
> time r.univar out_test_zlib
 100%
total null and non-null cells: 3526771952
total null cells: 1502448926

Of the non-null cells:
----------------------
n: 2024323026
minimum: 807
maximum: 32767
range: 31960
mean: 9385.79
mean of absolute values: 9385.79
standard deviation: 6620.52
variance: 4.38312e+07
variation coefficient: 70.5377 %
sum: 18999862195879

real	1m29.980s
user	1m27.111s
sys	0m2.589s
> time r.univar out_test_lz4
 100%
total null and non-null cells: 3526771952
total null cells: 1502448926

Of the non-null cells:
----------------------
n: 2024323026
minimum: 807
maximum: 32767
range: 31960
mean: 9385.79
mean of absolute values: 9385.79
standard deviation: 6620.52
variance: 4.38312e+07
variation coefficient: 70.5377 %
sum: 18999862195879

real	1m9.883s
user	1m7.559s
sys	0m2.210s
> time r.univar out_test_lz4hc
 100%
total null and non-null cells: 3526771952
total null cells: 1502448926

Of the non-null cells:
----------------------
n: 2024323026
minimum: 807
maximum: 32767
range: 31960
mean: 9385.79
mean of absolute values: 9385.79
standard deviation: 6620.52
variance: 4.38312e+07
variation coefficient: 70.5377 %
sum: 18999862195879

real	1m10.199s
user	1m7.902s
sys	0m2.173s
> time r.univar out_test_zstd
 100%
total null and non-null cells: 3526771952
total null cells: 1502448926

Of the non-null cells:
----------------------
n: 2024323026
minimum: 807
maximum: 32767
range: 31960
mean: 9385.79
mean of absolute values: 9385.79
standard deviation: 6620.52
variance: 4.38312e+07
variation coefficient: 70.5377 %
sum: 18999862195879

real	1m25.206s
user	1m21.351s
sys	0m2.518s

Attachments (9)

lz4_zstd.tgz (80.3 KB ) - added by sprice 9 years ago.
lz4_zstd2.tgz (80.6 KB ) - added by sprice 9 years ago.
lz4_zstd3.tgz (83.9 KB ) - added by sprice 9 years ago.
lz4_zstd4.tgz (90.1 KB ) - added by sprice 9 years ago.
raster_compress_benchmark.sh (2.5 KB ) - added by wenzeslaus 9 years ago.
Benchmark DCELL
gislib_compressor_benchmark.sh (2.4 KB ) - added by wenzeslaus 9 years ago.
Benchmark DCELL rasters with GRASS_COMPRESSOR
bzipsupport.patch (198.3 KB ) - added by mmetz 9 years ago.
bzip2 support patch
compressors.patch (15.4 KB ) - added by mmetz 9 years ago.
minimal modification to add more compressors
giscompress.tar.gz (21.8 KB ) - added by mmetz 9 years ago.
new files for lib/gis to support different compressors

Download all attachments as: .zip

Change History (82)

by sprice, 9 years ago

Attachment: lz4_zstd.tgz added

comment:1 by sprice, 9 years ago

I've gone ahead and added code to also compress FCELL/DCELL. Same tests at before. Late 2013 Mac Pro, v10.10.5, 64 GB RAM, 3.5 GHz 6-Core. The r.mapcalc commands are intended to test real-world write performance. The ls -l command shows file sizes. r.univar is intended to show read performance.

tl;dr: Write speed with LZ4 & ZSTD continues to be double the current method. Read speed also significantly faster. LZ4 shows the best performance if you have the disk space. ZSTD is very good all around, and doesn't fill the hard drive as fast.

> time r.mapcalc expression="out_fp_orig=float(out_test_lz4hc)" --overwrite
 100%

real	4m57.518s
user	4m48.823s
sys	0m7.118s
> time r.mapcalc expression="out_fp_orig=float(out_test_lz4hc)" --overwrite
 100%

real	4m57.555s
user	4m49.946s
sys	0m6.382s
> export GRASS_INT_LZ4=1
> time r.mapcalc expression="out_fp_lz4=float(out_test_lz4hc)" --overwrite
 100%

real	1m51.136s
user	1m44.766s
sys	0m5.879s
> time r.mapcalc expression="out_fp_lz4=float(out_test_lz4hc)" --overwrite
 100%

real	1m51.898s
user	1m44.991s
sys	0m5.928s
> unset GRASS_INT_LZ4
> export GRASS_INT_LZ4HC=1
> time r.mapcalc expression="out_fp_lz4hc=float(out_test_lz4hc)" --overwrite
 100%

real	4m22.887s
user	4m15.556s
sys	0m6.206s
> time r.mapcalc expression="out_fp_lz4hc=float(out_test_lz4hc)" --overwrite
 100%

real	4m22.143s
user	4m15.164s
sys	0m6.075s
> unset GRASS_INT_LZ4HC
> export GRASS_INT_ZSTD=1
> time r.mapcalc expression="out_fp_zstd=float(out_test_lz4hc)" --overwrite
 100%

real	1m49.065s
user	1m43.401s
sys	0m5.183s
> time r.mapcalc expression="out_fp_zstd=float(out_test_lz4hc)" --overwrite
 100%

real	1m48.771s
user	1m43.238s
sys	0m5.093s
> ls -l vrt_test/PERMANENT/fcell/
total 46206984
-rw-r--r--  1 sprice  staff  7268345798 Sep 27 19:57 out_fp_lz4
-rw-r--r--  1 sprice  staff  5979129773 Sep 27 20:06 out_fp_lz4hc
-rw-r--r--  1 sprice  staff  4969771486 Sep 27 19:53 out_fp_orig
-rw-r--r--  1 sprice  staff  5440720320 Sep 27 20:10 out_fp_zstd
> unset GRASS_INT_ZSTD
> r.compress -p out_test_zlib
<out_test_zlib> is compressed (level 2: DEFLATE). Data type: <CELL>
> r.compress -p out_test_lz4
<out_test_lz4> is compressed (level 3: LZ4). Data type: <CELL>
> r.compress -p out_test_lz4hc
<out_test_lz4hc> is compressed (level 4: LZ4HC). Data type: <CELL>
> r.compress -p out_test_zstd
<out_test_zstd> is compressed (level 5: ZSTD). Data type: <CELL>
> r.compress -p out_fp_zstd
<out_fp_zstd> is compressed (level 5: ZSTD). Data type: <FCELL>
> r.compress -p out_fp_orig
<out_fp_orig> is compressed (level 2: DEFLATE). Data type: <FCELL>
> r.compress -p out_fp_lz4
<out_fp_lz4> is compressed (level 3: LZ4). Data type: <FCELL>
> r.compress -p out_fp_lz4hc
<out_fp_lz4hc> is compressed (level 4: LZ4HC). Data type: <FCELL>
> time r.univar out_fp_orig
 100%
total null and non-null cells: 3526771952
total null cells: 1502448926

Of the non-null cells:
----------------------
n: 2024323026
minimum: 807
maximum: 32767
range: 31960
mean: 9385.79
mean of absolute values: 9385.79
standard deviation: 6620.52
variance: 4.38312e+07
variation coefficient: 70.5377 %
sum: 18999862195879

real	1m49.227s
user	1m46.152s
sys	0m2.843s
> time r.univar out_fp_lz4
 100%
total null and non-null cells: 3526771952
total null cells: 1502448926

Of the non-null cells:
----------------------
n: 2024323026
minimum: 807
maximum: 32767
range: 31960
mean: 9385.79
mean of absolute values: 9385.79
standard deviation: 6620.52
variance: 4.38312e+07
variation coefficient: 70.5377 %
sum: 18999862195879

real	1m8.749s
user	1m4.596s
sys	0m3.564s
> time r.univar out_fp_lz4hc
 100%
total null and non-null cells: 3526771952
total null cells: 1502448926

Of the non-null cells:
----------------------
n: 2024323026
minimum: 807
maximum: 32767
range: 31960
mean: 9385.79
mean of absolute values: 9385.79
standard deviation: 6620.52
variance: 4.38312e+07
variation coefficient: 70.5377 %
sum: 18999862195879

real	1m10.307s
user	1m6.546s
sys	0m3.352s
> time r.univar out_fp_zstd
 100%
total null and non-null cells: 3526771952
total null cells: 1502448926

Of the non-null cells:
----------------------
n: 2024323026
minimum: 807
maximum: 32767
range: 31960
mean: 9385.79
mean of absolute values: 9385.79
standard deviation: 6620.52
variance: 4.38312e+07
variation coefficient: 70.5377 %
sum: 18999862195879

real	1m31.310s
user	1m27.916s
sys	0m3.044s

by sprice, 9 years ago

Attachment: lz4_zstd2.tgz added

comment:2 by wenzeslaus, 9 years ago

I have executed all tests with lz4_zstd2.tgz​ and I got 22 (17%) more failing tests, unfortunately. One example is r.viewshed test where the results are just completely wrong:

...
mismatch values (key, reference, actual):
[('max', 43.15356, -3.44000005722046),
 ('min', -24.98421, -3.44000005722046),
 ('null_cells', 1963758, 0)]
...

I haven't investigated further. Can anybody confirm or disproof?

in reply to:  1 comment:3 by neteler, 9 years ago

Replying to sprice:

I've gone ahead and added code to also compress FCELL/DCELL.

Great!

A question: is the NULL file also compressed (see the related ticket #2349) with the new compression algorithm? This would be important - I could not yet check your patches myself.

comment:4 by sprice, 9 years ago

The null file is compressed when GRASS_COMPRESS_NULLS is set.

by sprice, 9 years ago

Attachment: lz4_zstd3.tgz added

comment:5 by sprice, 9 years ago

I just uploaded a new version of the code that shouldn't fail any additional unit tests, and I've added unit tests to r.compress that test reading/writing the different compression schemes.

in reply to:  5 ; comment:6 by wenzeslaus, 9 years ago

Replying to sprice:

I just uploaded a new version of the code that shouldn't fail any additional unit tests, and I've added unit tests to r.compress that test reading/writing the different compression schemes.

Nice. Runs for me as well. Test is well written, but to be sure, please improve setting of the environmental variables. When you do os.environ[env_var] = '1', env_var stays in the environment, so next time it might be picked up (to be honest, right now I don't understand why it is not picked up). You can run a module in an isolated environment by something like this:

env = os.environ.copy()
env[env_var] = '1'
...Module(..., env_=env)

I haven't tried that but in theory it should work with all *Module functions as well as classes (all is based on PyGRASS Module).

in reply to:  6 comment:7 by wenzeslaus, 9 years ago

Replying to wenzeslaus:

Replying to sprice:

I just uploaded a new version of the code that shouldn't fail any additional unit tests, and I've added unit tests to r.compress that test reading/writing the different compression schemes.

Nice. Runs for me as well.

Bad news. Only the r.compress tests runs well for me. I messed up the environmental variables.

Now (with export GRASS_INT_LZ4=1) I'm getting fails from various tests.

Wrong values in the result (for example):

./raster/r.series.interp/testsuite/interp_test.py
======================================================================
FAIL: test_infile (__main__.InterpolationTest)
----------------------------------------------------------------------
AssertionError: The actual maximum (269) is greater than the reference
one (200) for raster map prec_2 (with minimum 200)

Possible segmentation faults:

./raster/r.gwflow/testsuite/validation_7x7_grid.py
=====================================================================
FAIL: test_transient (__main__.Validation7x7Grid)
---------------------------------------------------------------------
AssertionError: Running <r.gwflow> module ended with non-zero return code (-11)
./temporal/t.rast.univar/testsuite/test.t.rast.univar.sh
...
grass.exceptions.CalledModuleError: Module run ['r.univar', '-ge',
u'map=prec_1@__temporal_t_rast_univar_test.t. rast.univar'] ended with error
Process ended with non-zero return code -11.

With the changes applied but with default settings (unset GRASS_INT_LZ4), I'm getting the same results as at the test server. r.compress test runs well in both cases.

by sprice, 9 years ago

Attachment: lz4_zstd4.tgz added

comment:8 by sprice, 9 years ago

I think I've fixed all issues. And a bad memory read that was in there before that valgrind discovered. And a segfault in n_les_assemble.c (via r.gwflow) that someone should check out. I've uploaded a new set of files.

in reply to:  8 comment:9 by wenzeslaus, 9 years ago

Replying to sprice:

I think I've fixed all issues.

It works for me as well. With GRASS_INT_LZ4=1 I get

Executed 129 test files in 0:25:10.484136.
From them 120 files (93%) were successful and 9 files (7%) failed.

without GRASS_INT_LZ4 it is the same including the time (I don't expect that the improvement should be visible on speed of the tests, at least not much). (7% is little bit more than on the server but the one test which is different is unrelated to this.) Now running tests also with the other compressions.

And a bad memory read that was in there before that valgrind discovered. And a segfault in n_les_assemble.c (via r.gwflow) that someone should check out.

I don't understand you completely. r.gwflow doesn't fail with the current trunk and it doesn't fail even with your changes now for me. Does it fail for you?

by wenzeslaus, 9 years ago

Benchmark DCELL

comment:10 by wenzeslaus, 9 years ago

The tests are running well for me also with GRASS_INT_LZ4HC=1, GRASS_INT_ZSTD=1 and GRASS_INT_ZLIB=0 (RLE). I haven't tried GRASS_COMPRESS_NULLS=1.

Here is a report (polished) created by the attached benchmark script based on what you posted which uses completely random data. I have used GRASS_COMPRESS_NULLS=1 and region with 30,000,000 cells. The disk was SSD, OS was Linux.

ZLIB compression writing
========================

Performance counter stats for 'r.mapcalc expression=test_rast_orig=double(test_rast_z_base)' (10 runs):

::


      10415.903368 task-clock (msec)         #    0.993 CPUs utilized            ( +-  0.36% )
              2798 context-switches          #    0.269 K/sec                    ( +-  2.11% )
                23 cpu-migrations            #    0.002 K/sec                    ( +-  4.20% )
            325702 page-faults               #    0.031 M/sec                    ( +-  0.00% )
       30804357778 cycles                    #    2.957 GHz                      ( +-  0.42% )
        9055572140 stalled-cycles-frontend   #   29.40% frontend cycles idle     ( +-  1.81% )
       47982328290 instructions              #    1.56  insns per cycle        
                                             #    0.19  stalled cycles per insn  ( +-  0.02% )
        7087070642 branches                  #  680.409 M/sec                    ( +-  0.02% )
         341325584 branch-misses             #    4.82% of all branches          ( +-  0.05% )

      10.489354952 seconds time elapsed                                          ( +-  0.41% )

RLE compression writing
=======================

Performance counter stats for 'r.mapcalc expression=test_rast_rle=double(test_rast_z_base)' (10 runs):

::


      10367.674362 task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.53% )
              1642 context-switches          #    0.158 K/sec                    ( +- 18.72% )
                22 cpu-migrations            #    0.002 K/sec                    ( +-  5.32% )
            325702 page-faults               #    0.031 M/sec                    ( +-  0.00% )
       30666690391 cycles                    #    2.958 GHz                      ( +-  0.38% )
        8921313281 stalled-cycles-frontend   #   29.09% frontend cycles idle     ( +-  1.80% )
       47975696799 instructions              #    1.56  insns per cycle        
                                             #    0.19  stalled cycles per insn  ( +-  0.02% )
        7085878436 branches                  #  683.459 M/sec                    ( +-  0.02% )
         340649966 branch-misses             #    4.81% of all branches          ( +-  0.04% )

      10.382500561 seconds time elapsed                                          ( +-  0.53% )

LZ4 compression writing
=======================

Performance counter stats for 'r.mapcalc expression=test_rast_lz4=double(test_rast_z_base)' (10 runs):

::


       2490.815692 task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.23% )
               321 context-switches          #    0.129 K/sec                    ( +- 13.63% )
                20 cpu-migrations            #    0.008 K/sec                    ( +-  5.02% )
               684 page-faults               #    0.274 K/sec                    ( +-  0.12% )
        7259170408 cycles                    #    2.914 GHz                      ( +-  0.12% )
        2305705372 stalled-cycles-frontend   #   31.76% frontend cycles idle     ( +-  0.20% )
       13796117271 instructions              #    1.90  insns per cycle        
                                             #    0.17  stalled cycles per insn  ( +-  0.06% )
        2790495244 branches                  # 1120.314 M/sec                    ( +-  0.05% )
          33371582 branch-misses             #    1.20% of all branches          ( +-  0.41% )

       2.492994675 seconds time elapsed                                          ( +-  0.23% )

LZ4HC compression writing
=========================

Performance counter stats for 'r.mapcalc expression=test_rast_lz4hc=double(test_rast_z_base)' (10 runs):

::


       6867.635439 task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.25% )
               648 context-switches          #    0.094 K/sec                    ( +-  0.29% )
                21 cpu-migrations            #    0.003 K/sec                    ( +-  5.28% )
               745 page-faults               #    0.108 K/sec                    ( +-  0.18% )
       20199681252 cycles                    #    2.941 GHz                      ( +-  0.28% )
        6449729534 stalled-cycles-frontend   #   31.93% frontend cycles idle     ( +-  0.62% )
       31860120047 instructions              #    1.58  insns per cycle        
                                             #    0.20  stalled cycles per insn  ( +-  0.03% )
        5196919230 branches                  #  756.726 M/sec                    ( +-  0.03% )
         184132785 branch-misses             #    3.54% of all branches          ( +-  0.04% )

       6.873512386 seconds time elapsed                                          ( +-  0.25% )

ZSTD compression writing
========================

Performance counter stats for 'r.mapcalc expression=test_rast_zstd=double(test_rast_z_base)' (10 runs):

::


       3540.287381 task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.20% )
               382 context-switches          #    0.108 K/sec                    ( +-  3.67% )
                24 cpu-migrations            #    0.007 K/sec                    ( +-  5.61% )
               776 page-faults               #    0.219 K/sec                    ( +-  0.13% )
       10367186950 cycles                    #    2.928 GHz                      ( +-  0.05% )
        3160263203 stalled-cycles-frontend   #   30.48% frontend cycles idle     ( +-  0.10% )
       19098247069 instructions              #    1.84  insns per cycle        
                                             #    0.17  stalled cycles per insn  ( +-  0.04% )
        3831842251 branches                  # 1082.353 M/sec                    ( +-  0.04% )
          35124859 branch-misses             #    0.92% of all branches          ( +-  0.16% )

       3.543262199 seconds time elapsed                                          ( +-  0.20% )

Original raster map test
========================

Performance counter stats for 'r.univar test_rast_z_base' (10 runs):

::


       2024.195978 task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.29% )
               646 context-switches          #    0.319 K/sec                    ( +-  0.64% )
                 0 cpu-migrations            #    0.000 K/sec                  
               457 page-faults               #    0.226 K/sec                    ( +-  0.04% )
        5934175598 cycles                    #    2.932 GHz                      ( +-  0.05% )
        1712911175 stalled-cycles-frontend   #   28.87% frontend cycles idle     ( +-  0.14% )
       11404604123 instructions              #    1.92  insns per cycle        
                                             #    0.15  stalled cycles per insn  ( +-  0.00% )
        2280049632 branches                  # 1126.398 M/sec                    ( +-  0.00% )
          32906874 branch-misses             #    1.44% of all branches          ( +-  0.37% )

       2.029035083 seconds time elapsed                                          ( +-  0.28% )

ZLIB compression reading
========================

Performance counter stats for 'r.univar test_rast_orig' (10 runs):

::


       2000.246389 task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.42% )
               640 context-switches          #    0.320 K/sec                    ( +-  1.09% )
                 0 cpu-migrations            #    0.000 K/sec                  
               458 page-faults               #    0.229 K/sec                    ( +-  0.02% )
        5930779846 cycles                    #    2.965 GHz                      ( +-  0.08% )
        1716273412 stalled-cycles-frontend   #   28.94% frontend cycles idle     ( +-  0.18% )
       11406021691 instructions              #    1.92  insns per cycle        
                                             #    0.15  stalled cycles per insn  ( +-  0.01% )
        2280208665 branches                  # 1139.964 M/sec                    ( +-  0.01% )
          32553520 branch-misses             #    1.43% of all branches          ( +-  0.24% )

       2.005018871 seconds time elapsed                                          ( +-  0.42% )

RLE compression reading
=======================

Performance counter stats for 'r.univar test_rast_rle' (10 runs):

::


       2016.279711 task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.34% )
               653 context-switches          #    0.324 K/sec                    ( +-  1.34% )
                 0 cpu-migrations            #    0.000 K/sec                    ( +- 50.92% )
               458 page-faults               #    0.227 K/sec                    ( +-  0.04% )
        5931202618 cycles                    #    2.942 GHz                      ( +-  0.07% )
        1711592367 stalled-cycles-frontend   #   28.86% frontend cycles idle     ( +-  0.13% )
       11406103365 instructions              #    1.92  insns per cycle        
                                             #    0.15  stalled cycles per insn  ( +-  0.01% )
        2280223560 branches                  # 1130.906 M/sec                    ( +-  0.01% )
          32763877 branch-misses             #    1.44% of all branches          ( +-  0.41% )

       2.021075900 seconds time elapsed                                          ( +-  0.34% )

LZ4 compression reading
=======================

Performance counter stats for 'r.univar test_rast_lz4' (10 runs):

::


        690.267191 task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.37% )
               235 context-switches          #    0.341 K/sec                    ( +-  1.55% )
                 0 cpu-migrations            #    0.000 K/sec                  
               449 page-faults               #    0.650 K/sec                    ( +-  0.04% )
        2003905382 cycles                    #    2.903 GHz                      ( +-  0.11% )
         586090598 stalled-cycles-frontend   #   29.25% frontend cycles idle     ( +-  0.29% )
        3982189156 instructions              #    1.99  insns per cycle        
                                             #    0.15  stalled cycles per insn  ( +-  0.04% )
         971667430 branches                  # 1407.669 M/sec                    ( +-  0.02% )
             64052 branch-misses             #    0.01% of all branches          ( +-  2.47% )

       0.691904075 seconds time elapsed                                          ( +-  0.37% )

LZ4HC compression reading
=========================

Performance counter stats for 'r.univar test_rast_lz4hc' (10 runs):

::

        692.453563 task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.18% )
               243 context-switches          #    0.351 K/sec                    ( +-  0.96% )
                 0 cpu-migrations            #    0.000 K/sec                  
               449 page-faults               #    0.649 K/sec                    ( +-  0.03% )
        1999520778 cycles                    #    2.888 GHz                      ( +-  0.09% )
         581415687 stalled-cycles-frontend   #   29.08% frontend cycles idle     ( +-  0.23% )
        3982233099 instructions              #    1.99  insns per cycle        
                                             #    0.15  stalled cycles per insn  ( +-  0.04% )
         971675561 branches                  # 1403.236 M/sec                    ( +-  0.02% )
             63306 branch-misses             #    0.01% of all branches          ( +-  1.98% )

       0.694124867 seconds time elapsed                                          ( +-  0.18% )

ZSTD compression reading
========================


Performance counter stats for 'r.univar test_rast_zstd' (10 runs):

::

       1168.682507 task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.41% )
               377 context-switches          #    0.323 K/sec                    ( +-  1.35% )
                 0 cpu-migrations            #    0.000 K/sec                  
               460 page-faults               #    0.394 K/sec                    ( +-  0.06% )
        3397563517 cycles                    #    2.907 GHz                      ( +-  0.06% )
         780090112 stalled-cycles-frontend   #   22.96% frontend cycles idle     ( +-  0.28% )
        8084726269 instructions              #    2.38  insns per cycle        
                                             #    0.10  stalled cycles per insn  ( +-  0.02% )
        1325103816 branches                  # 1133.844 M/sec                    ( +-  0.02% )
            426732 branch-misses             #    0.03% of all branches          ( +-  1.51% )

       1.171226998 seconds time elapsed                                          ( +-  0.41% )

Check of types and compression
==============================

::

    <test_rast_z_base> is compressed (level 2: DEFLATE). Data type: <DCELL>
    <test_rast_orig> is compressed (level 2: DEFLATE). Data type: <DCELL>
    <test_rast_rle> is compressed (level 1: RLE). Data type: <DCELL>
    <test_rast_lz4> is compressed (level 3: LZ4). Data type: <DCELL>
    <test_rast_lz4hc> is compressed (level 4: LZ4HC). Data type: <DCELL>
    <test_rast_zstd> is compressed (level 5: ZSTD). Data type: <DCELL>

File sizes
==========

::

 240045009 Oct  9 23:26 fcell/test_rast_lz4
 240045009 Oct  9 23:27 fcell/test_rast_lz4hc
 229517654 Oct  9 23:24 fcell/test_rast_orig
 229517654 Oct  9 23:26 fcell/test_rast_rle
 229517654 Oct  9 23:22 fcell/test_rast_z_base
 227636390 Oct  9 23:28 fcell/test_rast_zstd
 105009 Oct  9 23:26 cell_misc/test_rast_lz4/null2
 105009 Oct  9 23:27 cell_misc/test_rast_lz4hc/null2
 125009 Oct  9 23:24 cell_misc/test_rast_orig/null2
 125009 Oct  9 23:26 cell_misc/test_rast_rle/null2
 125009 Oct  9 23:22 cell_misc/test_rast_z_base/null2
 175009 Oct  9 23:28 cell_misc/test_rast_zstd/null2

comment:11 by sprice, 9 years ago

Completely random data is not a good test because (in theory) it won't compress. However, it looks like it demonstrated the expected results.

If you do a diff with n_les_assemble.c you'll see that I added a few if statements to ensure that the indexing stays within row/col bounds. I was getting segfaults with corrupt data before I fixed the other bugs. I figure it should be able to handle any data without segfaulting, even if corrupt.

in reply to:  11 comment:12 by wenzeslaus, 9 years ago

Replying to sprice:

Completely random data is not a good test because (in theory) it won't compress. However, it looks like it demonstrated the expected results.

Yes, probably G7:r.surf.fractal or G7:r.random.surface are better, but G7:r.mapcalc with rand() is kind of worst-case scenario and the compression still worked well!

If you do a diff with n_les_assemble.c...

I've created a separate ticket #2754 for this. Let's discuss it there.

I've executed the tests with GRASS_COMPRESS_NULLS=1 in combination with default compression and GRASS_INT_LZ4=1 and they seems to be OK (although there is some issue with r.slope.aspect test but that's unrelated). But still, the test coverage is limited, user testing is needed.

comment:13 by wenzeslaus, 9 years ago

In the diff I see the following pattern (this if from lib/raster/get_row.c, read_data_fp_compressed() function):

-    if ((size_t) G_zlib_read(fcb->data_fd, readamount, data_buf, bufsize) != bufsize)
+    cmp = G_alloca(readamount);
+
+    if (read(fcb->data_fd, cmp, readamount) != readamount) {
+        G_freea(cmp);
        G_fatal_error(_("Error reading raster data for row %d of <%s>"),
                      row, fcb->name);
 }
 
+    if (cmp[0] == '0') // flate.c::G_ZLIB_COMPRESSED_NO
+        // Uncompressed
+        memcpy(data_buf, cmp+1, bufsize);
+    else
+        if (cmp[0] == 3 || cmp[0] == 4)
+//          LZ4_decompress_safe((char *)(cmp+1), (char *)data_buf,
+//                              readamount-1, bufsize);
+        LZ4_decompress_fast((char *)(cmp+1), (char *)data_buf, bufsize);
+    else if (cmp[0] == 5)
+        ZSTD_decompress(data_buf, bufsize, cmp+1, readamount-1);
+    else
+        G_zlib_expand(cmp+1, readamount-1, data_buf, bufsize);
+
+    G_freea(cmp);
+}

In words, G_zlib_read() function is replaced by a block of code. Similar applies to G_zlib_write() function and to certain extent to G_zlib_expand() and zlib_compress() as well.

My question is if it wouldn't be more advantageous to create some wrapper which would take the all necessary inputs including compression type and do the necessary switches and format specific things.

I suppose this would make easier to add the same compression options also to 3D raster library where G_zlib_read() and G_zlib_write() functions are already used (lib/raster3d/fpcompress.c) or perhaps even to some other code such as lib/segment where no compression is used.

in reply to:  13 ; comment:14 by glynn, 9 years ago

Replying to wenzeslaus:

My question is if it wouldn't be more advantageous to create some wrapper which would take the all necessary inputs including compression type and do the necessary switches and format specific things.

Agreed.

In practical terms, there are only two distinct cases: uncompressed (where the size of the data read or written matches the size of the data stored in the file) and compressed (where the sizes differ). Everything else is just options.

comment:15 by sprice, 9 years ago

I considered writing a more general function for compression but I held off for two reasons:

1) There seems to be a lot of very specific code depending on exactly what you're doing. For example, for read_data_fp_compressed() there was functionality hidden in G_zlib_read to save a header for each compressed row. Or the use of G_ZLIB_COMPRESSED_NO ('0') as a flag for FP compression, vs various integers defining CELL compression. To keep backwards compatibility, I delt with the different formats (CELL vs NULL vs FCELL/DCELL) on a case-by-case basis.

2) I'm not familiar with the best practices for the code in GRASS GIS. I'd encourage someone else to create new files (or libraries) as required. I can help as required, but I'm concerned that as each edge case is considered (as mentioned above) there won't be all that much common code between the various compression uses.

in reply to:  15 comment:16 by wenzeslaus, 9 years ago

Replying to sprice:

I considered writing a more general function for compression but I held off for two reasons:

1) There seems to be a lot of very specific code depending on exactly what you're doing.

My point is to put these specific cases into one function. So for example, whatever in the comment:13, should be in one function. What I suggest should be simple to moving of the existing code unless I miss some important part.

For example, for read_data_fp_compressed() there was functionality hidden in G_zlib_read to save a header for each compressed row.

Here I perhaps hit my limited understanding of the code in question. What is doing this job now? Or it is not needed?

Or the use of G_ZLIB_COMPRESSED_NO ('0') as a flag for FP compression, vs various integers defining CELL compression.

This is the kind of things which I would like to have hidden in one place. (In this way, we can keep the strange things at one place and e.g. easily replace, literals by defines (which we should do anyway, BTW).

To keep backwards compatibility, I delt with the different formats (CELL vs NULL vs FCELL/DCELL) on a case-by-case basis.

You mean e.g. read_data_compressed() versus read_data_fp_compressed()? I had no time to actually try to identify all the differences, but the idea would be that these functions should deal with the differences among CELL, NULL and FCELL/DCELL while the proposed "G_compresslib_write()" function would deal with the differences between compression formats.

I'm concerned that as each edge case is considered (as mentioned above) there won't be all that much common code between the various compression uses.

Another way to look at it is to think about, what do I have to replicate in 3D raster library where only G_zlib_* is used now. The things I need to replicate to enable other compressions should be in the "G_compresslib_*()" functions.

I won't be able to give it attention it deserves in the following weeks, so please don't expect any actual code from me now. But feel free to oppose me or try it yourself in the mean time.

comment:17 by sprice, 9 years ago

This makes sense, but requires work that neither of us has time for. My main concern really is to get this merged & working before additional changes such as those discussed in #2762 & #2349 make merging difficult. What do you think about merging these changes now, and then upgrading other I/O functions as you have time?

in reply to:  17 comment:18 by wenzeslaus, 9 years ago

Replying to sprice:

My main concern really is to get this merged & working before additional changes such as those discussed in #2762 & #2349 make merging difficult.

That's right, but I would like to see some more testing before merging it. Any chance somebody can do some? There are roughly three ways how to do that after applying the patch:

  • Set the variables (see above) and run the automatic tests on NC SPM and report the results back.
  • Run the benchmark Bash script attached to this ticket and compare with results already reported.
  • Set the variables and go through some your workflow and check the results.

It seems to be that #2349 has all issues solved and committed, but lacks feedback/testing. Conflicts with #2762 would be probably solvable, but yes, it would be better to avoid them.

What do you think about merging these changes now, and then upgrading other I/O functions as you have time?

This can work. The refactoring I'm suggesting shouldn't be difficult at the end, although we should take some time to do it right.

in reply to:  14 ; comment:19 by mmetz, 9 years ago

Replying to glynn:

Replying to wenzeslaus:

My question is if it wouldn't be more advantageous to create some wrapper which would take the all necessary inputs including compression type and do the necessary switches and format specific things.

Agreed.

In practical terms, there are only two distinct cases: uncompressed (where the size of the data read or written matches the size of the data stored in the file) and compressed (where the sizes differ). Everything else is just options.

Without knowing about this ticket, I have implemented something like this recently and added support for LZ4 (and BZIP2) compression to my local copy of GRASS trunk. The motivation was to have a compressor that is substantially faster than ZLIB but still better than RLE, and another compressor that is substantially better (higher compression) than ZLIB but not exceedingly slow. XZ with lzma2 is 1) too slow, 2) uses too much memory, 3) does not compress binary raster data better than BZIP2.

In particular, I have added support for new compressors to gislib, not rasterlib. As before gislib does the actual compression, not rasterlib. My gislib now also handles LZ4 and BZIP2 compression. The actual change to rasterlib is to replace G_zlib_compress() with G_compress(..., int compressor) and G_zlib_expand() with G_expand(..., int compressor). G_zlib_write() and G_zlib_read() are now G_write_compressed(..., int compressor) and G_read_compressed(..., int compressor). Here, "..." means same arguments as before. The new argument "compressor" is actually "cellhd.compressed" with the same meaning as before. The internal function zlib_compress is no longer needed.

As before, the compressor type is encoded in cellhd.compressed with previously 0: no compression, 1: RLE, 2: ZLIB, now also 3: LZ4, 4: BZIP2.

r.univar results for CELL, FCELL, and DCELL maps are identical, independent of the compressor. The new gislib interface to compress data is generic and it is easy to add any other compressor, e.g. LZ4HC or ZSTD.

Generally, any new compression method should go into gislib and not into rasterlib, just like ZLIB compression has been done by gislib. This keeps changes to the rasterlib to a minimum and makes debugging easier.

For fast storage devices with plenty of space, LZ4 is by far the fastest, at the same time providing some reasonable compression where possible.

For slow storage devices, e.g. accessed over network, BZIP2 compression is the fastest (yes, faster than LZ4) because the amount of data is the least (50% - 70% of ZLIB). That reduces network traffic and saves disk space. For my work, it would be a big advantage to use LZ4 for actual processing on fast local disks and BZIP2 for storing the final results on sometimes very slow network attached storage.

The compressor type for new raster maps could be selected with one other environment variable GRASS_COMPRESSOR, e.g. GRASS_COMPRESSOR=LZ4

I am not sure about the pro's and con's for using compressors other than ZLIB. ZLIB is a good compromise of speed and compression. Adding other compressors to G7.1 means that raster data compressed with a new method can not be opened by G7.0 or earlier. New compression types should, if added to G7.1, be clearly marked as "use it only if you really know what you are doing". I would profit from the choice of other compressors, but on a standard laptop/desktop system the current G7 default of ZLIB is probably the best alround solution.

I am attaching a patch for trunk r66775 and an archive with new files to go to lib/gis

in reply to:  19 ; comment:20 by wenzeslaus, 9 years ago

Replying to mmetz:

I have implemented something like this recently and added support for LZ4 (and BZIP2) compression to my local copy of GRASS trunk.

I am attaching a patch for trunk r66775 and an archive with new files to go to lib/gis

The design in the patch looks really good. I did tests and benchmark but it was not as successful as I hoped for.

The benchmark was the same as before but modified for this patch. It is more for testing than benchmark anyway. It was on 30,000,000 cells but perhaps the previous one was on more and it is not completely precise overall due to some other computations running at the same time (although the result is from 10 runs aggregated by perf).

type write read
NONE 2.58 0.72
ZLIB 1.52 0.93
LZ4 1.56 0.85

RLE and BZIP2 are missing because I got and error when writing using r.mapcalc with RLE:

*** Error in `r.mapcalc': free(): invalid next size (normal): 0x0000000000edacd0 ***
r.mapcalc: Aborted

The the commands are:

export GRASS_COMPRESSOR=RLE
r.mapcalc expression="test_rast_z_base = rand(double(-200.), 900)" seed=100
r.mapcalc expression=test_rast_rle=double(test_rast_z_base)

I guess I'm missing something in configure because with BZIP2 it says "ERROR: GRASS needs to be compiled with BZIP2 for BZIP2 compression".

The report from r.compress:

<test_rast_z_base> is compressed (level 2: DEFLATE). Data type: <DCELL>
<test_rast_none> is uncompressed (level 0: NONE). Data type: <DCELL>
<test_rast_zlib> is compressed (level 2: DEFLATE). Data type: <DCELL>
<test_rast_lz4> is compressed (level 3: DEFLATE). Data type: <DCELL>

When running the tests with

grass71 /grassdata/ncarolina_smp_base/practice1 --exec python -m grass.gunittest.main--location ncarolina_smp_base --location-type nc

I get

ERROR: Error reading raster data for row 0 of <elevation>

from many raster-related tests. Sometimes it is a different row number. The environmental variable GRASS_COMPRESSOR was not set. I didn't get if you already meant it to be backwards compatible with existing maps.

by wenzeslaus, 9 years ago

Benchmark DCELL rasters with GRASS_COMPRESSOR

in reply to:  20 ; comment:21 by mmetz, 9 years ago

Replying to wenzeslaus:

Replying to mmetz:

I have implemented something like this recently and added support for LZ4 (and BZIP2) compression to my local copy of GRASS trunk.

I am attaching a patch for trunk r66775 and an archive with new files to go to lib/gis

The design in the patch looks really good. I did tests and benchmark but it was not as successful as I hoped for.

The benchmark was the same as before but modified for this patch. It is more for testing than benchmark anyway. It was on 30,000,000 cells

I tested on 552,000,000 cells

but perhaps the previous one was on more and it is not completely precise overall due to some other computations running at the same time (although the result is from 10 runs aggregated by perf).

type write read
NONE 2.58 0.72
ZLIB 1.52 0.93
LZ4 1.56 0.85

RLE and BZIP2 are missing because I got and error when writing using r.mapcalc with RLE:

> *** Error in `r.mapcalc': free(): invalid next size (normal): 0x0000000000edacd0 ***
> r.mapcalc: Aborted

Fixed in the updated patch "compressors.patch"

The the commands are:

export GRASS_COMPRESSOR=RLE
r.mapcalc expression="test_rast_z_base = rand(double(-200.), 900)" seed=100
r.mapcalc expression=test_rast_rle=double(test_rast_z_base)

RLE compression is not and was never supported for fp maps. Also, if you use r.mapcalc's rand() function, you test the rand() function and not the compressor because all compressors are much faster than the rand() function. (BTW, there are nice fast random number generators in gsl).

I guess I'm missing something in configure because with BZIP2 it says "ERROR: GRASS needs to be compiled with BZIP2 for BZIP2 compression".

You will need to hack in BZIP2 support, or make distclean, apply the patch "bzipsupport.patch", configure --with-bzlib, make

The report from r.compress:

I have not updated r.compress. For r.compress, and for meaningful warning/error messages, something like char *G_compressor_name(int compressor) that gives the compressor name could be useful.

When running the tests with

grass71 /grassdata/ncarolina_smp_base/practice1 --exec python -m grass.gunittest.main--location ncarolina_smp_base --location-type nc

I get

> ERROR: Error reading raster data for row 0 of <elevation>

from many raster-related tests. Sometimes it is a different row number. The environmental variable GRASS_COMPRESSOR was not set. I didn't get if you already meant it to be backwards compatible with existing maps.

Fixed in the new patch "compressors.patch". I meant it to be backwards compatible, but that one slipped through: old fp map compression is sometimes 1, sometimes 2, but both 1 and 2 mean ZLIB.

Thanks for testing!

in reply to:  21 comment:22 by mmetz, 9 years ago

Replying to mmetz:

Replying to wenzeslaus:

Replying to mmetz:

I have implemented something like this recently and added support for LZ4 (and BZIP2) compression to my local copy of GRASS trunk.

I am attaching a patch for trunk r66775 and an archive with new files to go to lib/gis

I guess I'm missing something in configure because with BZIP2 it says "ERROR: GRASS needs to be compiled with BZIP2 for BZIP2 compression".

You will need to hack in BZIP2 support, or make distclean, apply the patch "bzipsupport.patch", configure --with-bzlib, make

..and use the updated files in "giscompress.tar.gz"

in reply to:  20 ; comment:23 by mmetz, 9 years ago

Replying to wenzeslaus:

Replying to mmetz:

I have implemented something like this recently and added support for LZ4 (and BZIP2) compression to my local copy of GRASS trunk.

I am attaching a patch for trunk r66775 and an archive with new files to go to lib/gis

The design in the patch looks really good. I did tests and benchmark but it was not as successful as I hoped for.

The benchmark was the same as before but modified for this patch. It is more for testing than benchmark anyway. It was on 30,000,000 cells but perhaps the previous one was on more and it is not completely precise overall due to some other computations running at the same time (although the result is from 10 runs aggregated by perf).

type write read
NONE 2.58 0.72
ZLIB 1.52 0.93
LZ4 1.56 0.85

Some more explanation about the proposed new mechanism:

The proposed G_compress() interface provides a generic mechanism to data compression in libgis, not restricted to raster data but generic. Built-in compression methods would be no compression, RLE, ZLIB, ZL4. BZIP2 compression would be available if GRASS is configured --with-bzlib. Other compression methods could be added by cloning lib/gis/flate.c and adding new G_*_compress() and G_*_expand() functions to lib/gis/compress.[h|c]. The raster lib does not need to be modified any more.

As before, the rasterlib makes only partial use of the generic compression methods: no compression and RLE is handled by the rasterlib internally, and RLE is not supported for fp maps. Creating uncompressed raster maps has been and should be only possible with Rast_open_new_uncompressed().

That means that the behaviour of the rasterlib with using GRASS_COMPRESSOR=NONE needs to be defined: really use no compression or use default compression instead? Using GRASS_COMPRESSOR=RLE affects only new CELL maps. For fp maps, compression_type = 1 (RLE) is as before interpreted as ZLIB compression.

If you want to know the original amount of data passed to any compressor, you need to use

GRASS_COMPRESSOR=ZLIB
GRASS_ZLIB_LEVEL=0

ZLIB level = 0 tells ZLIB to copy the data as is from source to destination. With CELL maps, the rasterlib will then still trim high zero bytes with trim_bytes() which can already reduce the data size considerably, but ZLIB will not compress the data.

I modified gislib_compressor_benchmark.sh to use

GRASS_COMPRESSOR=ZLIB
GRASS_ZLIB_LEVEL=0

for no compression and discarded RLE because it is inefficient for CELL maps and not supported for fp maps. I tested also ZLIB levels 1 (fastest) and 6 (ZLIB default).

I used the nc_basic_spm_grass7 location and set the region with

g.region -p rast=elevation res=2.5

resulting in 32,000,000 cells

The test raster was generated with

r.mapcalc expression="test_rast_z_base = rand(double(-200.), 900)" seed=100

random numbers very difficult to compress.

The write and read columns in the tables below have seconds as unit.

compressor size MB size % write read
NONE 259.2 100 5.2 1.4
ZLIB 1 247.9 95.6 14.3 2.5
ZLIB 6 246.9 95.3 16.3 2.4
LZ4 259.2 100 4.5 1.1
BZIP2 249.4 96.2 63.4 19.9

LZ4 is the fastest method, no method is really the best because these random numbers could not be compressed to less than 95% of the original size.

The next test raster was generated with

r.mapcalc expression="test_rast2_z_base = elevation"

which had 4x4 blocks of identical raster values, should be easy to compress

compressor size MB size % write read
NONE 259.2 100 4.2 1.1
ZLIB 1 41.8 16.1 4.7 1.6
ZLIB 6 32.1 12.4 11.1 1.4
LZ4 71.5 27.6 2.2 0.9
BZIP2 49.8 19.2 28.0 7.1

LZ4 was again the fastest, and the best was ZLIB level 6. Here, the performance of BZIP2 was not convincing: by far the slowest and not as good as ZLIB.

Then I tested with MODIS land surface temperature for Europe, a bit more than 400,000,000 cells:

LST as CELL

compressor size MB size % write read
NONE 829 100 28.5 14.8
ZLIB 1 269 32.4 30.5 14.9
ZLIB 6 261 31.5 36.7 16.5
LZ4 366 44.1 20.0 13.0
BZIP2 175 21.1 89.8 29.2

LZ4 was the fastest, BZIP2 was the best.

LST as DCELL

compressor size MB size % write read
NONE 3300 100 85.5 52.9
ZLIB 1 503 15.2 62.7 23.9
ZLIB 6 370 11.2 129.8 21.6
LZ4 629 19.1 29.5 13.9
BZIP2 196 5.9 221 51.1

Again, LZ4 was the fastest, BZIP2 was the best.

I am interested in having BZIP2 because for these LST data it compresses 30 - 50% better than the second best (ZLIB level 6).

comment:24 by wenzeslaus, 9 years ago

I was not able to make BZIP2 work for me. I would say I miss something like addition to GISDEPS but I was not able to fix it (getting undefined reference to some BZ function).

I was able to run the tests successfully with LZ4 (on modified r66770):

GRASS_COMPRESSOR=LZ4 \
    ./bin.../grass71  /grassdata/nc_basic/practice1 --exec \
    python -m grass.gunittest.main --location nc_basic --location-type nc

The benchmarks look really good. I was getting quite good results before even on the random data, perhaps pure luck. But I wrongly recorded the region, it seems, so even with the seed I cannot reproduce; not important anyway.

in reply to:  24 ; comment:25 by mmetz, 9 years ago

Replying to wenzeslaus:

I was not able to make BZIP2 work for me. I would say I miss something like addition to GISDEPS but I was not able to fix it (getting undefined reference to some BZ function).

You need to add $(BZLIB) to GISDEPS in include/Make/Grass.make. I will update bzipsupport.patch accordingly.

by mmetz, 9 years ago

Attachment: bzipsupport.patch added

bzip2 support patch

by mmetz, 9 years ago

Attachment: compressors.patch added

minimal modification to add more compressors

by mmetz, 9 years ago

Attachment: giscompress.tar.gz added

new files for lib/gis to support different compressors

in reply to:  25 ; comment:26 by mmetz, 9 years ago

Replying to mmetz:

Replying to wenzeslaus:

I was not able to make BZIP2 work for me. I would say I miss something like addition to GISDEPS but I was not able to fix it (getting undefined reference to some BZ function).

You need to add $(BZLIB) to GISDEPS in include/Make/Grass.make. I will update bzipsupport.patch accordingly.

The patches compressors.patch and compressors.patch are also updated to properly support NULL compression. Data for the null2 file are now compressed with the same method like actual raster values. Because NULL data usually occur in long sequences of identical values, the new generic RLE algorithm I wrote for lib/gis/compress.c seems to perform best (highest compression, very fast). However, for consistency it would be better to use the same compressor for both the null bitsream and the raster data.

comment:27 by wenzeslaus, 9 years ago

Thanks for the patch for BZIP2, I've tried to do it before but I did something wrong. Now it works and the tests are running:

GRASS_COMPRESSOR=BZIP2 ./bin.x86_64-unknown-linux-gnu/grass71 /grassdata/nc_basic_spm_grass7/user1 --exec python -m grass.gunittest.main --location nc_basic_spm_grass7 --location-type nc

in reply to:  26 comment:28 by wenzeslaus, 9 years ago

Replying to mmetz:

The patches compressors.patch and compressors.patch are also updated to properly support NULL compression. Data for the null2 file are now compressed with the same method like actual raster values.

Tested with GRASS_COMPRESSOR=LZ4, GRASS_COMPRESSOR=BZIP2 using the command above but as I mentioned here or in #2349, the test do not cover NULL files well.

in reply to:  23 comment:29 by glynn, 9 years ago

Replying to mmetz:

As before, the rasterlib makes only partial use of the generic compression methods: no compression and RLE is handled by the rasterlib internally, and RLE is not supported for fp maps. Creating uncompressed raster maps has been and should be only possible with Rast_open_new_uncompressed().

There's no fundamental reason why RLE can't be supported for FP maps; it's just seldom useful. RLE is most useful for category data where you have contiguous areas with the same value, and such maps tend to be integer maps (e.g. you can't create a reclass of a FP map).

No compression used to be a special case because it means that maps don't have to be written sequentially: 6.x has G_put_map_row_random() which allows data to be written to an arbitrary row rather than to the "next" row. 7.x doesn't have an equivalent because nothing actually used it.

In theory, even compressed maps can be written non-sequentially: just append the data to the end of the file then update the row pointer for the row. However, overwriting a row will result in the old data taking up space in the file.

comment:30 by neteler, 9 years ago

"User" comment: I would suggest to always activate the NULL file compression by default, anything else would be confusing.

in reply to:  26 comment:31 by wenzeslaus, 9 years ago

Replying to mmetz:

Data for the null2 file are now compressed with the same method like actual raster values. Because NULL data usually occur in long sequences of identical values, the new generic RLE algorithm I wrote for lib/gis/compress.c seems to perform best (highest compression, very fast). However, for consistency it would be better to use the same compressor for both the null bitsream and the raster data.

For advanced users it seems that different compression for FCELL+DCELL and for NULL+CELL would make sense. If I have a MODIS LST for an island, then I want to use RLE for NULL files but BZIP2 for DCELL.

comment:32 by sprice, 9 years ago

Hey all, just checking in. What is remaining to get this pushed to the GRASS v7.1 code?

in reply to:  32 ; comment:33 by neteler, 9 years ago

Replying to sprice:

Hey all, just checking in. What is remaining to get this pushed to the GRASS v7.1 code?

For me two things are yet open.

First:

Replying to neteler:

"User" comment: I would suggest to always activate the NULL file compression by default, anything else would be confusing.

--> not yet reflected in the patch (so, the meaning of the var GRASS_COMPRESS_NULLS should be inverted or, better, implemented as GRASS_NOCOMPRESS_NULLS=1. I have terabytes of uncompressed NULL files at time and would like to get rid of wasting too much disk space.

Second:

We need a patch for GRASS GIS 7.0 for backward compatibility, at least for the NULL file compression. Which leads to the suggestion that either RLE or BZIP should be used here since these two compressors are already present in releasebranch70.

comment:34 by sprice, 9 years ago

For "backward compatibility" do you mean files written by grass71 can be read by grass70? Or would you like a similar patch to also work with the grass70 branch?

in reply to:  34 ; comment:35 by neteler, 9 years ago

Sorry for the delay.

Replying to sprice:

For "backward compatibility" do you mean files written by grass71 can be read by grass70?

Yes, exactly.

Or would you like a similar patch to also work with the grass70 branch?

Perhaps in future, once we know that the grass71 implementation runs well on all OS.

Last edited 9 years ago by neteler (previous) (diff)

in reply to:  33 ; comment:36 by mmetz, 9 years ago

Replying to neteler:

Replying to sprice:

Hey all, just checking in. What is remaining to get this pushed to the GRASS v7.1 code?

For me two things are yet open.

First:

Replying to neteler:

"User" comment: I would suggest to always activate the NULL file compression by default, anything else would be confusing.

--> not yet reflected in the patch (so, the meaning of the var GRASS_COMPRESS_NULLS should be inverted or, better, implemented as GRASS_NOCOMPRESS_NULLS=1. I have terabytes of uncompressed NULL files at time and would like to get rid of wasting too much disk space.

That concerns #2349.

Second:

We need a patch for GRASS GIS 7.0 for backward compatibility, at least for the NULL file compression. Which leads to the suggestion that either RLE or BZIP should be used here since these two compressors are already present in releasebranch70.

No, neither the generic RLE algorithm nor BZIP support is available in relbr70. Note that CELL RLE compression is rasterlib-internal and tailored for integer numbers with variable byte size. Therefore it does not make sense to use the rasterlib-internal RLE method to compress null files where the informative unit is one bit, not 1-4 bytes. For relbr70, null files could be (un-)compressed with ZLIB's DEFLATE, other methods would not be supported.

in reply to:  36 comment:37 by neteler, 9 years ago

Replying to mmetz:

For relbr70, null files could be (un-)compressed with ZLIB's DEFLATE, other methods would not be supported.

OK, also fine as long as one compliant solution exists.

in reply to:  35 comment:38 by mmetz, 9 years ago

Replying to neteler:

Sorry for the delay.

Replying to sprice:

For "backward compatibility" do you mean files written by grass71 can be read by grass70?

Yes, exactly.

Or would you like a similar patch to also work with the grass70 branch?

Perhaps in future, once we know that the grass71 implementation runs well on all OS.

I have added different compression methods (LZ4, BZIP2) in r67212-16. Please test.

comment:39 by neteler, 9 years ago

(requires: r65272, r65273, r65322, r65323, r65489, r65490, r65491, r65775 from #2349)

For the record, all changesets related to this ticket which I could identify (for a future backport):

r67212, r67213, r67214, r67215, r67216, r67217, r67218, r67254, r67295, r67296

Note: In a local backport, I found a merge issue between r67214 and r67215 which I could not resolve. Any changeset missing from my list?

comment:40 by neteler, 9 years ago

To summarize the current state

We currently have the following compression methods in trunk (according to lib/gis/compress.c; set by export GRASS_COMPRESSOR=XXX):

  • NONE (uncompressed)
  • RLE (generic Run-Length Encoding of single bytes)
  • ZLIB (DEFLATE, good speed and compression)
    • with zlib compression levels (export GRASS_ZLIB_LEVEL=X): -1..9 (-1 is default which is level 6)
    • Notes from #comment:23: ZLIB level = 0 tells ZLIB to copy the data as is from source to destination. With CELL maps, the rasterlib will then still trim high zero bytes with trim_bytes() which can already reduce the data size considerably, but ZLIB will not compress the data.
  • LZ4 (fastest, low compression)
  • BZIP2 (slowest, high compression)

NULL file compression: At time it must be explicitly turned on (IMHO it should become the default) with

  • export GRASS_COMPRESS_NULLS=1

Backward NULL file compression compatibility could be implemented in relbranch70 only by ZLIB's DEFLATE (see #comment:36). Hence ZLIB may qualify for the default NULL compression algorithm.

comment:41 by neteler, 9 years ago

Some MODIS LST benchmark, using a European LST coverage from http://gis.cri.fmach.it/eurolst/

File size tests: trunk, r67403 (build_date=2015-12-28)

# dataset
r.univar -g lst_2002_196_average
n=188341794
null_cells=226948851
cells=415290645
min=-481
max=4570
...


########################
#### RLE
export GRASS_COMPRESSOR=RLE
export GRASS_COMPRESS_NULLS=1

ls -la modis_lst_daily*/cell_misc/lst_2002_196_average/*
-rw-r--r-- 1 neteler gis       10 Jun  9  2014 modis_lst_daily/cell_misc/lst_2002_196_average/range
-rw-r--r-- 1 neteler gis 51923025 Jun  9  2014 modis_lst_daily/cell_misc/lst_2002_196_average/null
-rw-r--r-- 1 neteler gis       44 Sep 22 15:15 modis_lst_daily/cell_misc/lst_2002_196_average/timestamp
-rw-r--r-- 1 neteler gis       10 Dec 28 15:51 modis_lst_daily_compr/cell_misc/lst_2002_196_average/range
-rw-r--r-- 1 neteler gis  1423931 Dec 28 15:51 modis_lst_daily_compr/cell_misc/lst_2002_196_average/null2

ls -la modis_lst_daily*/cell/lst_2002_196_average
-rw-r--r-- 1 neteler gis 257730745 Jun  9  2014 modis_lst_daily/cell/lst_2002_196_average
-rw-r--r-- 1 neteler gis 529008029 Dec 28 15:51 modis_lst_daily_compr/cell/lst_2002_196_average


########################
#### BZIP Level 1
export GRASS_COMPRESSOR=ZLIB
export GRASS_ZLIB_LEVEL=1
export GRASS_COMPRESS_NULLS=1

ls -la modis_lst_daily*/cell_misc/lst_2002_196_average/*
-rw-r--r-- 1 neteler gis       10 Jun  9  2014 modis_lst_daily/cell_misc/lst_2002_196_average/range
-rw-r--r-- 1 neteler gis 51923025 Jun  9  2014 modis_lst_daily/cell_misc/lst_2002_196_average/null
-rw-r--r-- 1 neteler gis       44 Sep 22 15:15 modis_lst_daily/cell_misc/lst_2002_196_average/timestamp
-rw-r--r-- 1 neteler gis       10 Dec 28 16:49 modis_lst_daily_compr/cell_misc/lst_2002_196_average/range
-rw-r--r-- 1 neteler gis  1579674 Dec 28 16:49 modis_lst_daily_compr/cell_misc/lst_2002_196_average/null2

ls -la modis_lst_daily*/cell/lst_2002_196_average
-rw-r--r-- 1 neteler gis 257730745 Jun  9  2014 modis_lst_daily/cell/lst_2002_196_average
-rw-r--r-- 1 neteler gis 263256860 Dec 28 16:49 modis_lst_daily_compr/cell/lst_2002_196_average


########################
#### BZIP Level 6 (default)
export GRASS_COMPRESSOR=ZLIB
export GRASS_ZLIB_LEVEL=6
export GRASS_COMPRESS_NULLS=1

ls -la modis_lst_daily*/cell_misc/lst_2002_196_average/*
-rw-r--r-- 1 neteler gis       10 Jun  9  2014 modis_lst_daily/cell_misc/lst_2002_196_average/range
-rw-r--r-- 1 neteler gis 51923025 Jun  9  2014 modis_lst_daily/cell_misc/lst_2002_196_average/null
-rw-r--r-- 1 neteler gis       44 Sep 22 15:15 modis_lst_daily/cell_misc/lst_2002_196_average/timestamp
-rw-r--r-- 1 neteler gis       10 Dec 28 17:54 modis_lst_daily_compr/cell_misc/lst_2002_196_average/range
-rw-r--r-- 1 neteler gis  1358116 Dec 28 17:54 modis_lst_daily_compr/cell_misc/lst_2002_196_average/null2

ls -la modis_lst_daily*/cell/lst_2002_196_average
-rw-r--r-- 1 neteler gis 257730745 Jun  9  2014 modis_lst_daily/cell/lst_2002_196_average
-rw-r--r-- 1 neteler gis 257730745 Dec 28 17:54 modis_lst_daily_compr/cell/lst_2002_196_average


########################
#### LZ4
export GRASS_COMPRESSOR=LZ4
export GRASS_COMPRESS_NULLS=1

ls -la modis_lst_daily*/cell_misc/lst_2002_196_average/*
-rw-r--r-- 1 neteler gis       10 Jun  9  2014 modis_lst_daily/cell_misc/lst_2002_196_average/range
-rw-r--r-- 1 neteler gis 51923025 Jun  9  2014 modis_lst_daily/cell_misc/lst_2002_196_average/null
-rw-r--r-- 1 neteler gis       44 Sep 22 15:15 modis_lst_daily/cell_misc/lst_2002_196_average/timestamp
-rw-r--r-- 1 neteler gis       10 Dec 28 17:52 modis_lst_daily_compr/cell_misc/lst_2002_196_average/range
-rw-r--r-- 1 neteler gis  1573603 Dec 28 17:52 modis_lst_daily_compr/cell_misc/lst_2002_196_average/null2

ls -la modis_lst_daily*/cell/lst_2002_196_average
-rw-r--r-- 1 neteler gis 257730745 Jun  9  2014 modis_lst_daily/cell/lst_2002_196_average
-rw-r--r-- 1 neteler gis 362932376 Dec 28 17:52 modis_lst_daily_compr/cell/lst_2002_196_average

########################
#### BZIP2
export GRASS_COMPRESSOR=BZIP2
export GRASS_COMPRESS_NULLS=1

ls -la modis_lst_daily*/cell_misc/lst_2002_196_average/*
-rw-r--r-- 1 neteler gis       10 Jun  9  2014 modis_lst_daily/cell_misc/lst_2002_196_average/range
-rw-r--r-- 1 neteler gis 51923025 Jun  9  2014 modis_lst_daily/cell_misc/lst_2002_196_average/null
-rw-r--r-- 1 neteler gis       44 Sep 22 15:15 modis_lst_daily/cell_misc/lst_2002_196_average/timestamp
-rw-r--r-- 1 neteler gis       10 Dec 28 17:49 modis_lst_daily_compr/cell_misc/lst_2002_196_average/range
-rw-r--r-- 1 neteler gis  1824531 Dec 28 17:49 modis_lst_daily_compr/cell_misc/lst_2002_196_average/null2

ls -la modis_lst_daily*/cell/lst_2002_196_average
-rw-r--r-- 1 neteler gis 257730745 Jun  9  2014 modis_lst_daily/cell/lst_2002_196_average
-rw-r--r-- 1 neteler gis 180209506 Dec 28 17:49 modis_lst_daily_compr/cell/lst_2002_196_average

Overall, ZLIB at Level 1 performs well followed by LZ4 (BZIP2 compresses best but was very slow; ZLIB level 6's small size gain over ZLIB level 1 is "paid" by performance).

in reply to:  33 comment:42 by martinl, 9 years ago

Replying to neteler:

--> not yet reflected in the patch (so, the meaning of the var GRASS_COMPRESS_NULLS should be inverted or, better, implemented as GRASS_NOCOMPRESS_NULLS=1. I have terabytes of uncompressed NULL files at time and would like to get rid of wasting too much disk space.

Or just perform NULL compression if environmental variable GRASS_COMPRESS_NULLS is not defined or defined and set to 1, otherwise do not perform compression.

in reply to:  41 ; comment:43 by mmetz, 9 years ago

Replying to neteler:

Overall, ZLIB at Level 1 performs well followed by LZ4 (BZIP2 compresses best but was very slow; ZLIB level 6's small size gain over ZLIB level 1 is "paid" by performance).

Now we have compression options for different use cases. For large raster maps (many columns, e.g. + 20,000), BZIP2 should provide the best compression and should require the least disk space. If disk space consumption and I/O speed do not matter, LZ4 could be used. The best compression method depends on the size of the raster to be compressed, the I/O reading and writing speed, and the available storage capacity. ZLIB level 1 provides a pretty good compromise. Optimizing raster map compression is now possible but requires testing with the current raster map(s) and the available storage infrastructure (I/O speed + capacity).

TODO: documentation + r.compress

in reply to:  43 comment:44 by mmetz, 9 years ago

Replying to mmetz:

TODO: documentation + r.compress

I have updated r.compress to 1) compress a raster map again even if it is already compressed, 2) print the method used to compress a given raster.

GRASS documentation in raster/rasterintro.html and raster/r.compress/r.compress.html now explains the currently available raster compression methods.

comment:45 by neteler, 9 years ago

Thanks. It would be good to now also pave the way to compress the NULL files by default (chose default method to be able to have read-support in relbr70). See also wish above and here:

https://grasswiki.osgeo.org/wiki/GRASS_Community_Sprint_Paris_2016#Raster_compression

in reply to:  45 ; comment:46 by mmetz, 9 years ago

Replying to neteler:

Thanks. It would be good to now also pave the way to compress the NULL files by default (chose default method to be able to have read-support in relbr70).

Why have this also in relbr70? Backporting new features from trunk to a release branch is a bad idea and against GRASS policy, I thought. You could instead release 7.1 if you want these features in a release branch.

ZLIB is not really the best method for null bits compression. With large chunks of NULL values and/or large chunks of non-NULL values, the new RLE method seems to be as fast as no null file compression and provides the best compression. For special cases, LZ4 should compress better than RLE at the same speed.

Some tests with a CELL map (MODIS LST Europe) with 18711 rows and 22195 columns using r.compress (processing time and size of the null2 file):

no null compression
14.8 sec, 51.9 MB
RLE
14.8 sec, 1.2 MB
ZLIB
16.8 sec, 1.4 MB
LZ4
14.8 sec, 1.4 MB
BZIP2
18.1 sec, 1.7 MB

in reply to:  46 ; comment:47 by neteler, 9 years ago

Replying to mmetz:

Why have this also in relbr70? Backporting new features from trunk to a release branch is a bad idea and against GRASS policy,

I wrote only that there must be one method to *read* compressed NULL files in G70. Not much of a big backport if I recall a personal discussion we had on this topic last year. That's all I meant above.

Or we need to release trunk as G8 in order to be allowed to break raster readability compatibility completely.

in reply to:  47 ; comment:48 by mmetz, 9 years ago

Replying to neteler:

Replying to mmetz:

Why have this also in relbr70? Backporting new features from trunk to a release branch is a bad idea and against GRASS policy,

I wrote only that there must be one method to *read* compressed NULL files in G70. Not much of a big backport if I recall a personal discussion we had on this topic last year. That's all I meant above.

Even only read support in G70 will require quite some changes to the rasterlib.

Or we need to release trunk as G8 in order to be allowed to break raster readability compatibility completely.

In your opinion:

  • Adding new compression methods in G71 for cell/fcell files that are not readable by G70 is ok.
  • Compression of null files in G71 not readable by G70 is not ok.

That does not make sense to me. Please explain.

BTW, the new compression methods for cell/fcell files available in trunk have a much larger impact on performance and disk space requirements than null file compression.

in reply to:  48 ; comment:49 by neteler, 9 years ago

Replying to mmetz:

In your opinion:

  • Adding new compression methods in G71 for cell/fcell files that are not readable by G70 is ok.

No. If the format is broken, we need to call it G8 and provide converters.

  • Compression of null files in G71 not readable by G70 is not ok.

Yes (I find double negation sentences confusing, though).

in reply to:  49 ; comment:50 by mmetz, 9 years ago

Replying to neteler:

Replying to mmetz:

In your opinion:

  • Adding new compression methods in G71 for cell/fcell files that are not readable by G70 is ok.

No. If the format is broken, we need to call it G8 and provide converters.

If we call it G8, converters are not needed because trunk as it is now can read G7 and G6 rasters.

Assuming trunk should be released as G7.1:

For full compatibility of relbr70 with trunk, not only read support for compressed null files would need to be backported, but also and more importantly read support for compressed cell/fcell files. These are new features at library level involving quite a few changes.

The current default settings in trunk are compatible with relbr70, and I do not see a reason to change the respective default settings in trunk. What exactly is the reason why null file compression should be enabled by default in trunk?

I will not backport the changes I did to trunk with regard to cell/cell file and null file compression because these are new features which should IMHO not go into an existing release branch. Someone else would need to do the backporting.

IMHO, the new compression methods for cell/fcell files are more important than null file compression because the potential gains in disk space and processing speed are much higher. Recompressing a large cell file from ZLIB to BZIP2 can gain disk space several times larger than the corresponding uncompressed(!) null file. In relation, a compressed null file would only be a small added bonus but not a substantial gain in free disk space. On the other hand, compressing cell/fcell files with LZ4 can substantially speed up processing time.

Short answer: release trunk now as G8, no need for converters because backward compatibility exists.

in reply to:  50 comment:51 by mlennert, 9 years ago

Replying to mmetz:

Replying to neteler:

Replying to mmetz:

In your opinion:

  • Adding new compression methods in G71 for cell/fcell files that are not readable by G70 is ok.

No. If the format is broken, we need to call it G8 and provide converters.

If we call it G8, converters are not needed because trunk as it is now can read G7 and G6 rasters.

Assuming trunk should be released as G7.1:

For full compatibility of relbr70 with trunk, not only read support for compressed null files would need to be backported, but also and more importantly read support for compressed cell/fcell files. These are new features at library level involving quite a few changes.

-1 to backporting. Only bug fixes should go into relbr70.

The current default settings in trunk are compatible with relbr70, and I do not see a reason to change the respective default settings in trunk.

+1, as long as the default settings lead to compatible maps, then I do not think any backporting is needed.

What exactly is the reason why null file compression should be enabled by default in >trunk?

Again +1. I don't understand all the effects and issues involved, but null file compression seems to be something that is mostly needed for specific big data situations. People dealing with such situations can alter specific settings.

I will not backport the changes I did to trunk with regard to cell/cell file and null file compression because these are new features which should IMHO not go into an existing release branch.

+1

Someone else would need to do the backporting.

Please don't.

comment:52 by neteler, 9 years ago

Milestone: 7.1.07.2.0

Milestone renamed

comment:53 by neteler, 9 years ago

Given that these improvements will be released as 7.2.0, the only missing issue seems to be the enabling of NULL file compression by default.

Right?

in reply to:  53 comment:54 by neteler, 8 years ago

For the record - concerning NULL file compression:

At time cell_misc/null is uncompressed by default.

Using the environment variable GRASS_COMPRESS_NULLS=1 NULL compression is activated for the session for newly created raster maps; and

export GRASS_COMPRESS_NULLS=1
r.null -z raster_map

generates a new compressed file cell_misc/nullcmpr and removes the old uncompressed cell_misc/null file for existing maps.

Note: At least GRASS GIS 7.2 is needed to read a raster map with compressed null file.

comment:55 by neteler, 8 years ago

r.compress manual updated in r69403, r69404.

comment:56 by neteler, 8 years ago

Milestone: 7.2.07.2.1

Ticket retargeted after milestone closed

comment:57 by wenzeslaus, 8 years ago

This seems like a great success!

Seth, do you want to add another method?

MarkusM, is there something else remaining?

MarkusN, anything you want to change or document?

If everybody is OK, we can close this.

in reply to:  57 ; comment:58 by mmetz, 8 years ago

Replying to wenzeslaus:

This seems like a great success!

Seth, do you want to add another method?

Other possible methods are LZ4H and lzma from xz, but IMHO the currently implemented methods are sufficient.

MarkusM, is there something else remaining?

I don't think so.

MarkusN, anything you want to change or document?

Maybe some documentation that BZIP2 is not always providing the highest compression ratio, it needs to be evaluated for each dataset separately.

If everybody is OK, we can close this.

in reply to:  58 ; comment:59 by neteler, 8 years ago

Replying to mmetz:

Maybe some documentation that BZIP2 is not always providing the highest compression ratio, it needs to be evaluated for each dataset separately.

Probably the notes on BZIP2 are sufficient here?

https://grass.osgeo.org/grass72/manuals/r.compress.html#compression-algorithm-details

in reply to:  59 comment:60 by mmetz, 8 years ago

Replying to neteler:

Replying to mmetz:

Maybe some documentation that BZIP2 is not always providing the highest compression ratio, it needs to be evaluated for each dataset separately.

Probably the notes on BZIP2 are sufficient here?

https://grass.osgeo.org/grass72/manuals/r.compress.html#compression-algorithm-details

and here

https://grass.osgeo.org/grass72/manuals/rasterintro.html#raster-compression

That should be sufficient.

Notes on NULL file compression could be added to the raster intro.

comment:61 by neteler, 8 years ago

Replying to mmetz:

Notes on NULL file compression could be added to the raster intro.

Done in r70669 and r70670.

Proposal: I would suggest to make NULL file compression the default in trunk.

in reply to:  61 comment:62 by lucadelu, 8 years ago

Replying to mmetz:

Done in r70669 and r70670.

Proposal: I would suggest to make NULL file compression the default in trunk.

+1

in reply to:  61 comment:63 by mmetz, 8 years ago

Replying to neteler:

Replying to mmetz:

Notes on NULL file compression could be added to the raster intro.

Done in r70669 and r70670.

Proposal: I would suggest to make NULL file compression the default in trunk.

Suggested tests:

  • have a raster map with NULL cells without NULL compression

export GRAS_COMPRESS_NULLS=1

test the result of r.compress

test the result of r.null -z

  • have a raster map with NULL cells with NULL compression

unset GRAS_COMPRESS_NULLS

test the result of r.compress

test the result of r.null -z

  • have two raster maps, one with compressed NULL cells, one with uncompressed NULL cells

export GRAS_COMPRESS_NULLS=1

test the result of mapa + mapb

unset GRAS_COMPRESS_NULLS

test the result of mapa + mapb

  • add your tests here
Version 0, edited 8 years ago by mmetz (next)

comment:64 by martinl, 8 years ago

Milestone: 7.2.17.2.2

comment:65 by martinl, 7 years ago

Milestone: 7.2.27.4.0

All enhancement tickets should be assigned to 7.4 milestone.

comment:66 by Nikos Alexandris, 7 years ago

Suggested tests implemented at https://github.com/NikosAlexandris/test_r_compress. Expectedly, tests will work with either GRASS_COMPRESS_NULLS=0 or GRASS_COMPRESS_NULLS=1.

comment:67 by neteler, 7 years ago

Milestone: 7.4.07.4.1

Ticket retargeted after milestone closed

comment:68 by martinl, 7 years ago

What is status of this issue?

comment:69 by neteler, 7 years ago

Version: svn-trunksvn-releasebranch74

Compress NULL file by default since r71674. On 12 Nov 2017: Creation of the GRASS GIS 7.4 release branch (r71701), hence it is default also in 7.4.0.

Last edited 7 years ago by neteler (previous) (diff)

comment:70 by neteler, 7 years ago

Concerning LZ4 compression: Shall this be backported?

  • r71999 libgis: update lz4 to 1.8.0

comment:71 by neteler, 7 years ago

Milestone: 7.4.17.4.2

comment:72 by neteler, 6 years ago

CPU: OSX/IntelAll
Milestone: 7.4.27.6.0
Platform: MacOSXAll

All including ZSTD compression is available and tested in the upcoming 7.6.0 (see relbranch76).

Thanks for much for all the contributions here! Closing.

comment:73 by neteler, 6 years ago

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.