Opened 10 years ago

Closed 9 years ago

Last modified 9 years ago

#2579 closed enhancement (fixed)

Specify command to be executed as parameter of grass command

Reported by: wenzeslaus Owned by: grass-dev@…
Priority: normal Milestone: 7.2.0
Component: Startup Version: svn-trunk
Keywords: batch job, GRASS_BATCH_JOB, init Cc:
CPU: Unspecified Platform: All

Description

To run modules from outside of GRASS, you currently have two options. You can set up the environment yourself, which is hard and error-prone (and you probably won't get it right anyway). Or you can use the grass command in batch mode. For batch mode, you have to set the GRASS_BATCH_JOB environment variable and then call GRASS GIS:

export GRASS_BATCH_JOB=.../test_script.sh
grass7 ~/grassdata/location/mapset

Although this works, it can be quite cumbersome, especially in some languages. For example, Python has a much smoother interface where you just specify the script and its arguments:

python .../test_script.py arg1 arg2 ...

The attached patch introduces an additional interface for the grass command which allows calling scripts like this:

grass7 --mapset ~/grassdata/location/mapset --batch .../test_script.sh

It also allows passing parameters and calling GRASS modules or, generally, any command:

grass7 --mapset ~/grassdata/location/mapset --batch .../test_script.sh some parameters
grass7 --mapset ~/grassdata/location/mapset --batch r.info map=elevation

If you are fine with the mapset stored in the rc file, you can use just:

grass7 --batch r.info map=elevation

But I'm not sure if it is good practice.

I wrote the patch so that you don't get any additional output, just the output from the module, unless something unusual happens (e.g., creation of a new location):

$ grass71 --mapset ~/grassdata/location/mapset --batch r.info map="elevation" -g
north=228500
south=215000
east=645000
west=630000
nsres=10
ewres=10
rows=1350
cols=1500
cells=2025000
datatype=FCELL
ncats=255

I tried to preserve the functionality of GRASS_BATCH_JOB including the GRASS textual output and sanity checks.

When both GRASS_BATCH_JOB and --batch are provided, --batch is used and GRASS_BATCH_JOB is ignored. As the Python documentation says, "it is customary that command-line switches override environmental variables where there is a conflict" (gcc follows the same practice).
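The precedence rule can be sketched in a few lines of Python. This is only an illustration, not the code from the patch. `resolve_batch_job` is a hypothetical helper, and it assumes the words after --batch have already been separated from the grass options:

```python
def resolve_batch_job(batch_command, environ):
    """Pick the command to run in batch mode (hypothetical helper).

    batch_command: list of words given after --batch (empty if not used).
    environ: a mapping such as os.environ.
    Returns the command as a list of words, or None for an interactive session.
    """
    if batch_command:
        # A command-line switch overrides the environment variable.
        return list(batch_command)
    job = environ.get("GRASS_BATCH_JOB")
    if job:
        # The variable holds just the path to a script, without arguments.
        return [job]
    return None
```

For example, `resolve_batch_job(["r.info", "map=elevation"], {"GRASS_BATCH_JOB": "/tmp/job.sh"})` returns `["r.info", "map=elevation"]`, so the variable is silently ignored.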

The names --mapset and --batch seemed to me the best choice, although there are other good options too, such as --run.

To test, try something like:

cat > test_script.sh <<'EOF'
#!/bin/bash
echo "Hello from GRASS GIS (`date`)"
echo "This is what was called: $0 $@"
EOF
chmod +x test_script.sh
grass7 --mapset ~/grassdata/location/mapset --batch test_script.sh some parameters
grass7 --mapset ~/grassdata/location/mapset --batch r.mapcalc "aaa = 5"
grass7 --mapset ~/grassdata/location/mapset --batch r.info aaa

The GUI works too: a module called without parameters opens its dialog. I'm not sure this is useful, though, and it could even be inconvenient for scripting.

grass7 --mapset ~/grassdata/location/mapset --batch r.info

From what I see now, the only issue with calling individual modules is that you cannot (or should not) run several grass commands in parallel in the same mapset.

Additional ideas

This is out of the scope of this ticket, but there is potential to create an even more powerful interface, similar to, let's say, git.

mkdir some_project
cd some_project
# init connects to existing database, location and mapset or creates a new one
# creates .grassrc (.rc or .gisrc) file in the current directory
grass7 init ~/grassdata/location/mapset [-c | -c geofile | -c EPSG:code[:datum_trans]]
grass7 import .../some_image.tiff
grass7 run r.info some_image
grass7 run r.mapcalc "improved_image = 5 * some_image"
grass7 export improved_image .../improved_image.tiff
# next time you can cd into some_project directory and commands will work right away
# because .grassrc file will be already there
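Reading such a per-project file could look like the following sketch. It assumes the hypothetical .grassrc reuses the existing gisrc layout of one `KEY: value` pair per line (GISDBASE, LOCATION_NAME, MAPSET). `parse_rc`, `read_project_rc` and `mapset_path` are illustrative names, not existing GRASS functions:

```python
import os


def parse_rc(lines):
    """Parse gisrc-style lines ("KEY: value") into a dictionary."""
    settings = {}
    for line in lines:
        # Split at the first colon only, so Windows paths like C:\data survive.
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def read_project_rc(directory, name=".grassrc"):
    """Read the (hypothetical) project rc file from a directory."""
    with open(os.path.join(directory, name)) as rc_file:
        return parse_rc(rc_file)


def mapset_path(settings):
    """Join database, location and mapset into a single mapset path."""
    return os.path.join(
        settings["GISDBASE"], settings["LOCATION_NAME"], settings["MAPSET"]
    )
```

Because the file is looked up relative to the project directory, running commands from that directory would pick up the right mapset without any extra parameters.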

Some commands such as grass7 link or grass7 external might be quite useful, although they would be, like grass7 import and grass7 export, just wrappers around the appropriate r.in.gdal, r.in.proj, etc. calls.
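As a rough sketch of such wrappers, the translation from subcommand to module call could be a simple mapping. The choice of r.in.gdal for import, r.external for link/external and r.out.gdal for export is an assumption for illustration, and `subcommand_to_module` is hypothetical. It only builds the argument list and runs nothing:

```python
import os


def subcommand_to_module(subcommand, args):
    """Translate a proposed grass7 subcommand into a GRASS module call.

    Assumed mapping: import -> r.in.gdal, link/external -> r.external,
    export -> r.out.gdal. Map names are not validated here.
    """
    if subcommand in ("import", "link", "external"):
        (path,) = args
        # Derive the map name from the file name, e.g. some_image.tiff -> some_image.
        name = os.path.splitext(os.path.basename(path))[0]
        module = "r.in.gdal" if subcommand == "import" else "r.external"
        return [module, "input=" + path, "output=" + name]
    if subcommand == "export":
        map_name, path = args
        return ["r.out.gdal", "input=" + map_name, "output=" + path]
    raise ValueError("unknown subcommand: " + subcommand)
```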

It would be even more interesting to have:

grass7 run r.slope.aspect elevation=file://.../elevation.tiff aspect=file://...aspect.tiff

The grass command would have to parse the command line, find the parameters that refer to files which should become maps, and link them. And perhaps, if it were not grass7 run but something different such as grass7 runonly, we could even skip the .grassrc, create a location on the fly in /tmp and delete it after execution. If the data were just linked, not imported and exported, it could be pretty fast. (But obviously we could hit issues with projection and topology here, so it is a bit tricky.)
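The first parsing step could be sketched as follows. It assumes file parameters are written as `key=file:///absolute/path` and uses the parameter name as a temporary map name. `split_file_parameters` is hypothetical. It does not decide which files are inputs (to link) and which are outputs (to export afterwards), because that would need the module's interface description. It also does not decode percent-encoded URLs.

```python
FILE_PREFIX = "file://"


def split_file_parameters(params):
    """Replace key=file://path parameters with map names.

    Returns (rewritten_params, files), where files is a list of
    (map_name, path) pairs that still have to be linked or exported.
    """
    rewritten = []
    files = []
    for param in params:
        key, sep, value = param.partition("=")
        if sep and value.startswith(FILE_PREFIX):
            path = value[len(FILE_PREFIX):]
            # Use the parameter name as a temporary map name.
            rewritten.append(key + "=" + key)
            files.append((key, path))
        else:
            rewritten.append(param)
    return rewritten, files
```

For `r.slope.aspect elevation=file:///data/elevation.tiff aspect=file:///data/aspect.tiff`, the module would then run with `elevation=elevation aspect=aspect`.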

Attachments (1)

batch_job_from_cmd_line.diff (8.1 KB) - added by wenzeslaus 10 years ago.
First implementation of batch job which could be specified from command line


Change History (17)

by wenzeslaus, 10 years ago

First implementation of batch job which could be specified from command line

comment:1 by mlennert, 10 years ago

Why do you need the --mapset as in:

grass7 --mapset ~/grassdata/location/mapset --batch r.mapcalc "aaa = 5"

?

The current GRASS startup script already allows defining a mapset at startup, so this seems redundant.

in reply to: 1 comment:2 by wenzeslaus, 10 years ago

Replying to mlennert:

Why do you need the --mapset as in:

grass7 --mapset ~/grassdata/location/mapset --batch r.mapcalc "aaa = 5"

?

The current GRASS startup script already allows defining a mapset at startup, so this seems redundant.

This is a thing I'm not sure about. The current syntax is

grass ~/grassdata/nc_spm_08_grass7/user1
grass -c ~/grassdata/nc_spm_08_grass7/user1

which follows this general pattern:

name options/flags files
name [option]... [file]...
name [option]... [file] [arg]...

where options/flags are distinguished by - or --, and the first thing which is not an option starts the file list. The last row actually describes python and Rscript:

python [option] ... [-c cmd | -m mod | file | -] [arg] ...
Rscript [--options] [-e expr [-e expr2 ...] | file] [args]
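A simplified Python sketch of this "options, then file, then arguments" rule shows why it is ambiguous for grass. The grammar alone gives the same shape for a mapset path and for a module call, so grass would have to inspect the word itself, e.g., check whether it is a directory. `parse_like_python` is hypothetical. For brevity it ignores options which take a value (python -c cmd, grass -c geofile):

```python
def parse_like_python(argv):
    """Split argv following 'name [option]... [file] [arg]...'.

    Words starting with '-' are options until the first other word, which
    is taken as the file; everything after it belongs to the file.
    """
    options = []
    for index, word in enumerate(argv):
        if not word.startswith("-"):
            return options, word, argv[index + 1:]
        options.append(word)
    return options, None, []
```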

So it seems that they actually leave out (the equivalent of) --batch. If we allowed omitting db+l+mapset, one could use:

grass .../test_script.sh
grass r.info
grass r.mapcalc "aaa = 5"

which could be hard to distinguish from (current):

grass ~/grassdata/nc_spm_08_grass7/user1

We could also say that db+l+mapset is always required when passing a module or script. The usual use case is GRASS GIS as a processing backend, and in that case you rarely want to use db+l+mapset from the rc file. This gives us:

grass ~/grassdata/nc_spm_08_grass7/user1 .../test_script.sh
grass ~/grassdata/nc_spm_08_grass7/user1 r.info
grass ~/grassdata/nc_spm_08_grass7/user1 r.mapcalc "aaa = 5"

In this case, I'm not sure how well we can distinguish the different cases (standard/batch) when parsing, or how we can provide good error messages.

A similar, but not identical, case is grep, where you always have to provide the PATTERN parameter:

grep [OPTION]... PATTERN [FILE]...

In our case, all parameters would be optional. The order is really important, and identifying options can become tricky, although hopefully not as tricky as with grep (try searching for a string which starts with -). If we decide on something like this, the command-line parsing in grass.py will have to be reimplemented, at least judging from the current code. For example, you have to recognize where the actual command starts, because nothing after that point may be interpreted by grass itself (trivial with --batch, hard without).
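With an explicit marker, finding where the command starts reduces to a single split. The sketch below is hypothetical (`split_at_marker`). It shows that words after the marker, such as the module's own -g flag, are passed through untouched instead of being parsed as grass options:

```python
def split_at_marker(argv, marker="--batch"):
    """Split grass's own arguments from the command to execute.

    Everything before the marker is for grass; everything after it is
    the command, passed through as-is even if it looks like a grass option.
    """
    if marker not in argv:
        return argv, []
    index = argv.index(marker)
    return argv[:index], argv[index + 1:]
```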

comment:3 by rkrug, 10 years ago

This syntax would open many possibilities and make life much easier for accessing GRASS GIS from other languages (e.g. R).

I guess there is no chance that it can be included into GRASS 7?

in reply to: 3; comment:4 by martinl, 10 years ago

Replying to rkrug:

I guess there is no chance that it can be included into GRASS 7?

I would say no for GRASS 7.0; this issue is aimed at GRASS 7.1.

comment:5 by rkrug, 10 years ago

Pity - so when is 7.1 out? (Just joking...)

comment:6 by wenzeslaus, 10 years ago

See also the older ticket #1660 with the same idea (leaving this one open as the discussion here is more developed).

in reply to: 4; comment:7 by wenzeslaus, 10 years ago

Replying to martinl:

Replying to rkrug:

I guess there is no chance that it can be included into GRASS 7?

I would say no for GRASS 7.0; this issue is aimed at GRASS 7.1.

Right, I've already set the milestone to 7.1.0.

Feedback on the command-line syntax would be appreciated to move this forward. It is also necessary before including the patch in trunk, which should happen before the patch becomes incompatible.

in reply to: 7; comment:8 by rkrug, 10 years ago

Some remarks concerning the syntax:

1) I don't like the idea of having to repeat the mapset each time, as it requires quite a bit of typing and invites errors in longer scripts. So defaulting to the mapset stored in the rc file (i.e., if I am correct, the one used last) would be quite useful.

2) Instead of using the normal grass command, I would suggest introducing a new command (e.g. grassbatch) which takes only one parameter: the command to be executed, including its parameters. So it could be used as:

grassbatch r.info

3) The grassbatch command should accept, in addition to the normal GRASS commands, one more command, e.g. set.mapset, which does only one thing. It sets the mapset in the rc file, and that mapset is then used for all following grassbatch commands until it is changed again.

So a script could look as follows:

grassbatch set.mapset ~/grassdata/nc_spm_08_grass7/user1
grassbatch .../test_script.sh
grassbatch r.info
grassbatch r.mapcalc "aaa = 5"
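A minimal Python sketch of how such a grassbatch wrapper could keep its state between calls. `grassbatch` and the separate state file are hypothetical. The generated call uses the --mapset/--batch syntax from the ticket description, and the function only returns the argument list instead of running it:

```python
def grassbatch(args, state_file):
    """Hypothetical grassbatch wrapper with a set.mapset pseudo-command.

    'set.mapset PATH' stores the mapset path in state_file and returns None;
    any other command is turned into a grass call for the stored mapset.
    """
    if args and args[0] == "set.mapset":
        with open(state_file, "w") as state:
            state.write(args[1] + "\n")
        return None
    with open(state_file) as state:
        mapset = state.read().strip()
    return ["grass7", "--mapset", mapset, "--batch"] + list(args)
```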

comment:9 by wenzeslaus, 10 years ago

Setting up the session manually is really challenging. Setting up the addons path is yet another step, which is needed if you want a fully working session:

+ # add path to GRASS addons
+ home = os.path.expanduser("~")
+ os.environ['PATH'] += os.pathsep + os.path.join(home, '.grass7', 'addons', 'scripts')
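For comparison, here is a rough Python sketch of the whole manual setup. It assumes a Linux-style GRASS 7 installation under a given GISBASE and an already written GISRC file. The variable names (GISBASE, GISRC, PATH, LD_LIBRARY_PATH, PYTHONPATH) follow the usual GRASS 7 conventions, but the exact list may differ by platform and version. `grass_session_env` and `join_paths` are hypothetical helpers. Unlike the patch excerpt, this version puts the addons directory before the existing PATH entries.

```python
import os


def join_paths(*parts):
    """Join non-empty entries; an empty PATH entry would mean the current directory."""
    return os.pathsep.join(part for part in parts if part)


def grass_session_env(gisbase, gisrc, base_env=None):
    """Build an environment for running GRASS modules outside a GRASS session."""
    env = dict(os.environ if base_env is None else base_env)
    addons = os.path.join(os.path.expanduser("~"), ".grass7", "addons", "scripts")
    env["GISBASE"] = gisbase
    # GISRC points to the file holding GISDBASE, LOCATION_NAME and MAPSET.
    env["GISRC"] = gisrc
    env["PATH"] = join_paths(
        os.path.join(gisbase, "bin"),
        os.path.join(gisbase, "scripts"),
        addons,
        env.get("PATH", ""),
    )
    # Shared libraries (Linux-specific; other platforms use other variables).
    env["LD_LIBRARY_PATH"] = join_paths(
        os.path.join(gisbase, "lib"), env.get("LD_LIBRARY_PATH", "")
    )
    # Makes the grass Python package importable.
    env["PYTHONPATH"] = join_paths(
        os.path.join(gisbase, "etc", "python"), env.get("PYTHONPATH", "")
    )
    return env
```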

Any other opinions on what the command-line interface should look like? Use cases would also be appreciated.

in reply to: 8 comment:10 by wenzeslaus, 10 years ago

Replying to rkrug:

Some remarks concerning the syntax:

1) I don't like the idea of having to repeat the mapset each time, as it requires quite a bit of typing and invites errors in longer scripts. So defaulting to the mapset stored in the rc file (i.e., if I am correct, the one used last) would be quite useful.

I have two use cases which were not mentioned. The first is a testing framework. It has no particular requirements on command-line syntax, but you run just one command/script, so you probably want to set Database/Location/Mapset in that one command. The second is Docker. There, setting the Database/Location/Mapset in (or with) the command itself seems to be really important, because you create a new instance with each command:

docker run user/grass-system grass70 --mapset ~/grassdata/nc_spm_08_grass7/user1 .../script.sh with parameters
docker run user/grass-system grass70 --mapset ~/grassdata/nc_spm_08_grass7/user1 g.gisenv
docker run user/grass-system grass70 --mapset ~/grassdata/nc_spm_08_grass7/user1 r.mapcalc "aaa = 5"
docker run user/grass-system grass70 --mapset ~/grassdata/nc_spm_08_grass7/user1 r.info map=aaa

or following my initial suggestion:

docker run user/grass-system grass70 --mapset ~/grassdata/nc_spm_08_grass7/user1 --batch .../script.sh with parameters
docker run user/grass-system grass70 --mapset ~/grassdata/nc_spm_08_grass7/user1 --batch g.gisenv
docker run user/grass-system grass70 --mapset ~/grassdata/nc_spm_08_grass7/user1 --batch r.mapcalc "aaa = 5"
docker run user/grass-system grass70 --mapset ~/grassdata/nc_spm_08_grass7/user1 --batch r.info map=aaa

or similarly to Docker:

docker run user/grass-system grass70 run --mapset=~/grassdata/nc_spm_08_grass7/user1 .../script.sh with parameters
docker run user/grass-system grass70 run --mapset=~/grassdata/nc_spm_08_grass7/user1 g.gisenv
docker run user/grass-system grass70 run --mapset=~/grassdata/nc_spm_08_grass7/user1 r.mapcalc "aaa = 5"
docker run user/grass-system grass70 run --mapset=~/grassdata/nc_spm_08_grass7/user1 r.info map=aaa

where run can be replaced by something different (exec, batch, do, cmd, script) in order to also provide a nice syntax for the alternative usage described below. Note that --mapset does not have to be mandatory; it can just use the last Mapset as stored in $HOME/.grass7/rc.

By the way, the general Docker run syntax is:

docker run [OPTIONS] IMAGE[:TAG] [COMMAND] [ARG...]

Docker has the advantage that IMAGE is mandatory, while in GRASS GIS neither the Mapset nor the command would be required (none, one, or both are possible).

2) Instead of using the normal grass command, I would suggest introducing a new command (e.g. grassbatch) which takes only one parameter: the command to be executed, including its parameters. So it could be used as:

grassbatch r.info

This would be similar to the Rscript command (as opposed to the R CMD syntax). However, nested commands seem to be quite common, although right now I can only think of revision control systems and Docker. Basically, grassbatch and grass batch are very similar.

3) The grassbatch command should accept, in addition to the normal GRASS commands, one more command, e.g. set.mapset, which does only one thing. It sets the mapset in the rc file, and that mapset is then used for all following grassbatch commands until it is changed again.

So a script could look as follows:

grassbatch set.mapset ~/grassdata/nc_spm_08_grass7/user1
grassbatch .../test_script.sh
grassbatch r.info
grassbatch r.mapcalc "aaa = 5"

This goes toward my other suggestion (in the description or in the GSoC ideas), particularly a basic version of it. I think there are good reasons to have both. Some use cases (Docker, testing frameworks, cron jobs) push more for the single-command syntax. User scripts and tools which typically need to import and export data (QGIS, WPS) would benefit more from this multi-command syntax. Where would you expect the current GISRC file to be? The runtime one is now in /tmp (/tmp/grass7-user-number/gisrc). The initial one, from which Database/Location/Mapset is taken if not provided on the command line, is $HOME/.grass7/rc. My idea was to have it in the current directory (or possibly specified on the command line), so that it does not interfere with the one in $HOME; that is the use case I have in mind.
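The lookup order suggested here could be sketched like this. The order is an assumption: an explicit command-line path first, then a `.grassrc` in the current directory, then `$HOME/.grass7/rc`. `find_rc_file` and the `.grassrc` name are hypothetical:

```python
import os


def find_rc_file(cli_path=None, cwd=None, home=None):
    """Pick the rc file for the proposed multi-command interface.

    Assumed order: explicit command-line path, then .grassrc in the
    current directory, then $HOME/.grass7/rc.
    """
    if cli_path:
        return cli_path
    cwd = os.getcwd() if cwd is None else cwd
    local = os.path.join(cwd, ".grassrc")
    if os.path.isfile(local):
        return local
    home = os.path.expanduser("~") if home is None else home
    return os.path.join(home, ".grass7", "rc")
```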

If we spend some time figuring this out, I think we can save a lot of support time in the long run. See, for example, a recent post on the grass-user mailing list (nabble link).

comment:11 by wenzeslaus, 10 years ago

A first version of executing a command specified as a parameter is implemented in r65252. Please try:

grass71 ~/grassdata/nc_spm_08_grass7/user1/ exec g.region -p

If you are interested in more, something like:

grass71 -c EPSG:4545 ~/grassdata/nc_spm_08_grass7/user1/ exec g.region -p

should work too. See the commit message for details (there is no documentation yet). An alternative syntax would be:

grass71 exec ~/grassdata/nc_spm_08_grass7/user1/ g.region -p
grass71 exec -c EPSG:4545 ~/grassdata/nc_spm_08_grass7/user1/ g.region -p

which assumes that both the mapset path and the command are mandatory and appear together (exec always comes first, and the mapset path is the last of the standard parameters). This fits quite nicely with the other potential extensions discussed in this ticket.

However, the option I used seems a little bit better to me when considered alone. Perhaps I should change exec to --exec (I believe we should not use -exec) and leave the "subcommand syntax" for its actual implementation. I used exec rather than batch (or even batch-job) because the word batch is used with far too many different meanings.

It required some refactoring in r65238, r65241, r65246, r65247, r65248, r65250 and r65251. I split it into small portions and tested what I could, but the code is quite tricky and critical, so it would be good to get it tested thoroughly (I would like to see some automated tests too, but I had no idea how to write them).

GRASS_BATCH_JOB is still supported and should be supported at least in the 7.x series (can be removed for 8 if not useful).

in reply to: 11 comment:12 by martinl, 10 years ago

Replying to wenzeslaus:

First version of executing command specified as parameter implemented in r65252. Please try:

> grass71 ~/grassdata/nc_spm_08_grass7/user1/ exec g.region -p

Absolutely cool! Martin

comment:13 by wenzeslaus, 10 years ago

I've updated the documentation in r65269 and r65270. It is not complete (e.g., there are no explanations), but it should be enough to try it and use it. I created an example which uses r.external, r.univar and g.gui.mapswipe.

comment:14 by wenzeslaus, 10 years ago

In r65294 I changed exec to --exec. This follows the general long-flag syntax which is already partially used in grass.py. I think it is better to leave the "subcommand syntax" for its actual implementation. So now we have:

grass71 ~/grassdata/nc_spm_08_grass7/user1/ --exec g.region -p

The only alternative would be to follow the majority there and add a short flag, for example -x (-e is already taken for exiting after creating a mapset). But I'm not adding it, because this interface should be used primarily in scripts and programs, where you should use long flags anyway.

It would be possible to use only short flags, as we do now with -c, -e and -f. But this would go against the general trend toward readability, and I expect we might change it in the future anyway: we already support --version and --config, and all GUI selection flags can also be used with two dashes.
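Since --exec is meant mostly for scripts and programs, here is a sketch of calling it from Python. It assumes the grass71 executable is on PATH and uses the `MAPSET_PATH --exec MODULE PARAMS...` order from r65294. `grass_exec_args` and `run_in_grass` are hypothetical helpers:

```python
import subprocess


def grass_exec_args(mapset_path, module, *params, grass="grass71"):
    """Build the argument list for 'grass71 MAPSET_PATH --exec MODULE PARAMS...'."""
    return [grass, mapset_path, "--exec", module] + list(params)


def run_in_grass(mapset_path, module, *params, grass="grass71"):
    """Run one module in the given mapset and return its standard output."""
    args = grass_exec_args(mapset_path, module, *params, grass=grass)
    # No shell is involved, so a parameter like "aaa = 5" needs no extra quoting.
    result = subprocess.run(
        args, check=True, stdout=subprocess.PIPE, universal_newlines=True
    )
    return result.stdout
```

For example, `run_in_grass("/data/nc_spm_08_grass7/user1", "g.region", "-p")` would correspond to the shell call shown above.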

comment:15 by wenzeslaus, 9 years ago

Resolution: fixed
Status: new → closed

Parsing of parameters should be refactored and improved. The behavior when both GRASS_BATCH_JOB and --exec are used should also be revised, but the goal of this ticket was fulfilled. Closing as fixed. This shouldn't be backported to the 7.0 branch (or at least not any time soon), because it makes a lot of changes to crucial code.

Two other major issues were discussed together with this topic. The first is direct execution with imports (grass run r.slope.aspect elevation=file:///path/to/file.tiff...). The second is a subcommand interface (grass start .../mapset; grass run r.slope.aspect...;). The latter should be easy to implement now. The hardest parts of the implementation are reading the "gisrc" file from the current directory and naming the run subcommand. The name has to take into account the newly added --exec and the potential direct execution with imports, which would probably need another subcommand or parameter. The naming of the individual interfaces (including this one) should be revised as well.

There is an unresolved issue with code duplication between lib/init/grass.py and lib/python/script/setup.py. There is some duplication already, and script.setup(.init) would benefit from some code in grass.py, which would increase it further. The question is whether we can safely import grass.script.setup during startup in grass.py.

Finally, to make this --exec interface really convenient (and scripts portable), it would be nice to ensure that a grass command is available on the system PATH. This is true for Linux distributions but not for MS Windows. Perhaps it also needs more documentation, including the distinction between grass, grass7 and grass71.
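Until that is ensured, a portable script could look for whichever start-up command is installed. The sketch below uses the standard `shutil.which`. The candidate names are an assumption based on the names mentioned here, and `find_grass_executable` is hypothetical:

```python
import shutil


def find_grass_executable(candidates=("grass", "grass7", "grass71", "grass70")):
    """Return the full path of the first GRASS start-up command on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```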

None of the issues above currently has its own ticket.

Direct execution with imports GSoC idea:

Subcommand interface mailing list discussions:

Related tickets:

comment:16 by neteler, 9 years ago

Milestone: 7.1.0 → 7.2.0

Milestone renamed
