Version 49 (modified by wenzeslaus, 5 years ago)

GRASS 8 planning: ideas

For now just a brainstorming zone...

General

Runtime, current location and mapset, execution

  • C libraries and Python package(s) in system paths
    • Problem:
      • GRASS Python packages require things like grass_session package to be set up
        • This means that, in between your imports, you need to set up the path to additional imports
      • GRASS C libraries require dynamic library path settings (and include settings when compiling) to be set up
        • Besides being less convenient, this means that PyGRASS does not work even when both the C and Python runtimes are set up correctly, because an LD_LIBRARY_PATH change does not influence the currently running process (only subprocesses).
    • Expected behavior: Most libraries and packages (and probably libraries of programs in general) are simply on standard system paths in unix-like environments. GRASS GIS should be accessible in the same way.
      • This is about the runtime environment, not about the connection to specific data or initialization of constants in C libraries.
    • Solution:
      • GRASS Python packages, C libraries and includes should install into standard paths thus being readily available for import (Python), execution (C), or includes.
      • On Linux/unix, make install and distributions need to be changed.
      • On Mac and Win, if you care about the above, you probably want to use Anaconda (or WSL) anyway, so this is how it should be distributed. Standalone and OSGeo4W installers should do the closest reasonable thing.
    • Additions:
      • Documented and short full initialization procedures for Python and C
  • Modules as subcommands
    • Given the above, should the GRASS GIS modules be "on path" as well? No, they should be subcommands where subcommand is something like the current grass ... --exec but perhaps with different syntax.
    • Looking at the GRASS GIS modules as subcommands does not change much directly right away, but clarifies how the API should be used and developed.
    • The db/location/mapset is then always expected to be set by the grass command or another API, and the modules are not expected to run by themselves (this is already starting to be the case as people are adopting --exec).
    • The shell which starts with GRASS and the Console in GUI would be special cases where the subcommands are available without providing the actual GRASS command.
    • Tasks such as parallelization with tiling (or vectors) or imports and exports with reprojection require individual module calls to have different computational regions, mapsets or locations. This can be achieved by adding region and "gisrc" parameters to every module (i.e. to the parser) or by providing this through the grass command, which needs to support most (if not all) of it anyway.
    • The subcommand interface would then be used inside GRASS Python code as well, for the things which require it or potentially for everything (relying perhaps on the modules being on path in addition to the libraries). Wrapping in Python functions is expected, with the current session being a special case (this opens a way to all sorts of parallelization and remote execution). Writing a GRASS module would be a typical example of this special case (where perhaps the package internally uses the current session executor object as opposed to a generic one which needs to be initialized with db/location/mapset).
    • For the future, this allows us more flexibility, e.g. to implement additional steps before or after module execution or to implement modules as C functions (and when called in a series, maps would not have to be written to disk or segment library storage could be transferred in between them).
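
As a rough illustration of the "executor" idea above, the following Python sketch builds module calls in the current grass ... --exec form. The SessionExecutor class and its method names are hypothetical (they are not an existing GRASS API), and the paths are placeholders. Nothing is executed; the sketch only assembles the argument list.

```python
class SessionExecutor:
    """Hypothetical helper that builds `grass <mapset path> --exec <module> ...` calls."""

    def __init__(self, database, location, mapset, executable="grass"):
        self.executable = executable
        # The grass command accepts database/location/mapset as one path.
        self.mapset_path = "/".join([database.rstrip("/"), location, mapset])

    def command(self, module, *flags, **options):
        """Return the argument list for one module call (nothing is run here)."""
        args = [self.executable, self.mapset_path, "--exec", module]
        args.extend(flags)
        # Module options use the usual key=value syntax.
        args.extend(f"{key}={value}" for key, value in options.items())
        return args


# Placeholder database, location and mapset names.
executor = SessionExecutor("/data/grassdata", "nc_spm", "user1")
cmd = executor.command(
    "r.slope.aspect", "--overwrite", elevation="elevation", slope="slope"
)
print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run, and a "current session" executor would be the special case where the db/location/mapset are already known.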

Database format

  • Text encoding in the database is not defined. This applies to both text files and database records. It is somewhat defined as the system encoding, as long as the user knows it and converts to it when/before importing their data.
    • One thing which does not work now is moving the database from one computer to another in case the encodings are different.
    • This includes metadata, attributes, and temporal database.
    • There can be encoding defined per location, mapset, map, SQLite file, ...
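
The problem with moving a database between computers can be shown with a small Python sketch. It assumes a hypothetical per-location metadata record that declares the encoding; no such record exists in the current format, and the sample title is made up.

```python
def read_text(raw_bytes, declared_encoding):
    """Decode stored bytes using the encoding declared for the data."""
    return raw_bytes.decode(declared_encoding)


# Text written on a machine whose system encoding is ISO-8859-2.
title = "Přehrada Želivka"
stored = title.encode("iso-8859-2")

# Hypothetical per-location metadata declaring the encoding.
location_metadata = {"encoding": "iso-8859-2"}

# With a declared encoding, the text survives the move to another machine.
recovered = read_text(stored, location_metadata["encoding"])

# Without it, a machine using UTF-8 cannot decode the same bytes.
try:
    stored.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print(recovered, utf8_failed)
```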

Naming of GRASS GIS Database

  • GISDBASE versus grassdata versus GRASS Database versus GRASS Database Directory versus GRASS GIS Database versus GRASS GIS Spatial Database versus GIS Data Directory
    • The Data tab (Data Catalog), avoids it altogether (in 7.4) and says "GRASS locations in /abs/path/here"
  • Problem 1: It is hard for beginners to understand that messing with the data in "grassdata" is not useful and can be harmful.
  • Problem 2: Non-unix users confuse current working directory with "grassdata".
  • Problem 3: The same thing is referred to in different ways (GISDBASE, grassdata, GRASS Database Directory, dbase).
  • Problem 4: Instructions often say "set your location and mapset" while they mean "set 'database directory', then location, and then mapset, and optionally also working directory."
  • Problem 5: When setting up "GRASS session" manually, e.g. in Python, GISBASE and GISDBASE look too much alike.
  • Solution: Make it clear that users are dealing with a database, while also emphasizing that the database is spatial as opposed to just attributes (as in GIS vector/spreadsheet tables).
  • Outcome: Consistent naming of "grassdata" in doc, GUI, API (measurable). Clear way of creating instructions for beginners. Improved understanding of the concept (long-term).
  • Challenges: name length, strong legacy ("GISDBASE" in many places), backwards compatibility (API changes), convention (such as grassdata) versus naming, different internal naming, relation to location and mapset, GRASS versus GRASS GIS, lowercase letters (as in thing, concept) versus uppercase (as in file format)
  • Consistency is perhaps the main thing needed:
    • GISDBASE is an environment variable and an abbreviated form of "GRASS GIS Database",
    • "dbase=" is an argument to change GISDBASE in several modules (e.g. g.mapset, r.reproject, etc),
    • "grassdata" is a commonly used name of a directory that serves as a GRASS GIS Database. However, a person can have more than one GIS Database or may want to give it a different name. That should be OK.
    • The modules that change GISDBASE should have arguments of "gisdbase=", not "dbase=". That would alleviate considerable confusion.
    • When referring to the GIS Database in the interface, we should use the term "GIS Database" (or something similar) consistently. Again, consistency can help with confusion here.
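
To illustrate Problem 5 (GISBASE versus GISDBASE in a manually prepared session), here is a minimal Python sketch. It assumes the usual "KEY: value" layout of the GISRC file and uses placeholder paths. The function names are made up for this example and are not part of the GRASS API.

```python
import os
import tempfile


def write_gisrc(gisdbase, location, mapset):
    """Write a session file pointing to the data: database, location, mapset."""
    fd, path = tempfile.mkstemp(prefix="gisrc_")
    with os.fdopen(fd, "w") as rc_file:
        rc_file.write(f"GISDBASE: {gisdbase}\n")
        rc_file.write(f"LOCATION_NAME: {location}\n")
        rc_file.write(f"MAPSET: {mapset}\n")
    return path


def session_environment(gisbase, gisrc_path):
    """Return an environment where GISBASE points to the installation."""
    env = dict(os.environ)
    env["GISBASE"] = gisbase  # where GRASS GIS is installed (software)
    env["GISRC"] = gisrc_path  # file which says where the data is
    return env


# Placeholder paths: one letter distinguishes the software from the data.
rc_path = write_gisrc("/data/grassdata", "nc_spm", "user1")
env = session_environment("/usr/lib/grass", rc_path)
with open(rc_path) as rc_file:
    content = rc_file.read()
os.remove(rc_path)
print(content)
```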

Naming of Location

grass executable

  • Refer to it as grass in the documentation to avoid rewriting grass6 to grass7 to grass8 (or even with minor versions). Request/require packagers to create grass command (they already often do that).
  • Use the general "Linux" (POSIX+GNU) standard for options/flags/parameters (of the executable).
    • For example, use -c and --create rather than -create, i.e. avoid long flags with one dash and provide both the short and the long alternative.
    • Supporting current/legacy syntax like -text and -gui might be too difficult (definitely remove from manual).
    • Same for module-like syntax which could be supported as well, e.g. supporting both --create=EPSG:3358 and create=EPSG:3358.
    • Require the order of arguments/operands to be correct (now not enforced).
    • Only exception to the rule should be --help.
    • Possibly use one of the Python libraries: argparse (Python >=2.7 and >=3.2); optparse (https://docs.python.org/3.6/library/optparse.html), in 2.6 and 3.8? but deprecated since 2.7; getopt is not recommended for general use.
    • The two-dash long options (such as --text) are now in help, messages, and documentation (see r73100, r73263, r73265, and #1665). The one-dash long ones (such as -text) are still allowed, but not advertised. The one-letter ones (such as -c) still don't have long equivalents, and vice versa.
  • Unify GRASS_BATCH_JOB with --exec (don't use shell=True for it), or possibly deprecate or even remove it. (Goal: Simpler documentation and implementation.)
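
A minimal sketch of how the options could look with argparse follows. It is a simplified assumption, not the actual grass startup code: only -c/--create, --text, --gui and one path operand are covered, and the path operand name is made up.

```python
import argparse


def build_parser():
    """Build a simplified, hypothetical parser for the grass executable."""
    parser = argparse.ArgumentParser(prog="grass", allow_abbrev=False)
    # Short and long alternative for the same option.
    parser.add_argument(
        "-c",
        "--create",
        metavar="CRS",
        help="create a new location with this CRS, e.g. EPSG:3358",
    )
    # Only one user interface can be requested at a time.
    interface = parser.add_mutually_exclusive_group()
    interface.add_argument("--text", action="store_true", help="use text interface")
    interface.add_argument("--gui", action="store_true", help="use graphical interface")
    # Operand: path to database/location/mapset.
    parser.add_argument("path", nargs="?", help="database/location/mapset path")
    return parser


short_form = build_parser().parse_args(["-c", "EPSG:3358", "/data/grassdata/nc"])
long_form = build_parser().parse_args(["--create=EPSG:3358", "/data/grassdata/nc"])
print(short_form.create, long_form.create, short_form.path)
```

With argparse, --create=EPSG:3358 and --create EPSG:3358 are both accepted; the module-like create=EPSG:3358 syntax would need separate handling.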

Color tables

  • There are a lot of color tables. Are all useful? Some are perceptually uniform, but some are quite simple. Are all good enough? For example, Matplotlib has many more color tables. Are we missing some?
  • Make clear and easy how users can store and share color tables in a simple way.
  • Make the GUI clearer, e.g. integrate the custom color table dialog into the r.colors/... dialogs.

Computational region

  • G7:g.region flags -p, -3, -g, and -f are not in sync in various ways: CRS info seems to be much more useful with -p and -3 than with -g or -f (e.g. projection: 99 (NAD83(HARN) / North Carolina); datum: nad83harn versus projection=99). The -f flag can't be used by itself, unlike the other three. Also, -f is actually split over multiple lines (projection and zone are separate). Additionally, a defined string like "xy", "general" and "utm" might be much better than the numbers for the projection value.
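
The idea of strings instead of numbers can be sketched in Python as follows. The sample text only imitates the key=value style of g.region -g and is not real output. The code-to-name table is an assumption for illustration, using the names suggested above and the projection codes as I understand them from the library headers.

```python
# Assumed mapping of numeric projection codes to readable names.
PROJECTION_NAMES = {0: "xy", 1: "utm", 3: "ll", 99: "general"}


def parse_shell_style(text):
    """Parse key=value lines into a dictionary of strings."""
    values = {}
    for line in text.splitlines():
        key, separator, value = line.partition("=")
        if separator:
            values[key.strip()] = value.strip()
    return values


def projection_label(code):
    """Translate a numeric projection code into a readable name."""
    return PROJECTION_NAMES.get(int(code), f"unknown ({code})")


# Illustrative, shortened sample in the style of g.region -g.
sample = "projection=99\nzone=0\nn=228500\ns=215000\n"
region = parse_shell_style(sample)
print(projection_label(region["projection"]))
```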

C code

  • Replace the "4 spaces for level 1 and 1 tab for level 2" indent by "4 spaces for level 1 and 8 spaces for level 2" which is what we are using in Python code.
  • Define max for DCELL and/or general double somewhere in the library (raster, or for both raster and vector). DBL_MAX and others are used in the code base in various ways. Possibly HUGE_VAL and/or custom "undefined value" macros should be defined even if it is just for promoting best practice or consistency.

Modules

  • All modules with (pseudo-)random outputs like G7:r.mapcalc with rand() should use seed= to set the seed and -s to get random, non-deterministic result. (This is a blocker since the point is changing interface in a breaking way.)
  • For parser definitions, make the label of each option really the primary one used in the code (currently the description is usually included, but the label is used as primary if available).
  • Parser format and functionality should be changed to:
    • comply with PEP8 for Python (e.g., replace #% by # %)
    • include the high level standard options into --script and perhaps other places
    • create better strings for the high level standard options for use in scripts (e.g., replace option: G_OPT_R_INPUT by something like option: raster_input)
    • include format/version of the interface description in scripts (e.g., format: grass and version: 2.0)
    • add ability to read older versions
    • use existing standard for the format for scripts, e.g., YAML, to make it readable and even writable by standardized tools (even if pre/post processing, like adding # %, would be needed)
    • swap label and description in the modules to simplify documentation on which one will be used (now description is used in code by default but label is primary when present)
    • add OGC/generic descriptions for module and parameters when the module is used out of context of GRASS GIS (goal is to avoid the need for QGIS Processing (and others) to have its own descriptions because of terminology, e.g., map -> layer/file/data)
    • add short module description in imperative (Compute XY) or as a title (XY computation), i.e., move these from wxGUI XMLs to modules themselves (e.g., for r.slope.aspect, wxGUI has Slope and aspect)
  • Consider including simplified modules. We already have r.mapcalc.simple, which is useful not only for QGIS Processing etc. but also in GRASS itself, specifically with PyGRASS GridModule. For example, r.slope.aspect in wxGUI says Slope and aspect, but in fact it does much more, yet the menu does not indicate that, which is perhaps a reason to have basic and advanced versions.
  • Usage of wildcards and regular expressions should be better documented across the interface.
    • g.list and others have "bash wildcards" (aka glob) as default and then come with flags to turn on basic and extended regular expressions. BRE and ERE (POSIX or GNU) are mysterious and misleading for users not well versed in Unix history.
    • GUI tends to use Python re package.
    • In addition to it, users get in contact with SQL (e.g. LIKE, GLOB and REGEX in SQLite).
    • Examples in the interface seem to be a must regardless of the naming.
  • Default memory limit (option memory=) should be increased from the current 300MB. We may also consider using GB instead or including a unit, and potentially we can also use actual MB, not MiB, or at least be clear about it.
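
For the parser format change from #% to # % and the "ability to read older versions", a reader could accept both prefixes. The following Python sketch is only an assumption about how that could work; the function name and the sample script header are made up.

```python
import re

# Accept both the current "#%" prefix and the PEP8-friendly "# %" prefix.
HEADER_LINE = re.compile(r"^#\s?%\s?(.*)$")


def read_interface_lines(script_text):
    """Return the interface description lines of a script without the prefix."""
    lines = []
    for line in script_text.splitlines():
        match = HEADER_LINE.match(line)
        if match:
            lines.append(match.group(1).strip())
    return lines


old_style = "#%module\n#% description: Demo\n#%end\n# ordinary comment\n"
new_style = "# %module\n# % description: Demo\n# %end\n# ordinary comment\n"
print(read_interface_lines(old_style))
print(read_interface_lines(new_style))
```

Both calls give the same list of lines, so the rest of the parser would not need to know which prefix the script used.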

Raster library

  • organization of raster file storage layout: have one raster folder per map like for vector data or raster3D
    • for fp maps: move fcell to cell, eliminating empty cell file
  • implement GRASS_RASTER_TMPDIR_MAPSET like it exists for vector data (GRASS_VECTOR_TMPDIR_MAPSET), i.e. change all .tmp/ to variable in source code in init/grass.py, gis/open.c, gis/file_name.c, raster library
  • maintain history as it is already for vector maps
  • synchronize metadata between raster and vector maps
  • keep track of open raster maps (already done in the R structure)
  • Storage in tiles instead of by row (See Grass7/RasterLib)
  • Merge NULL file into main data array (See Grass7/RasterLib)
  • save more raster metadata like number of non-null cells, mean and stddev (see GDAL)
  • Mask:
    • Add support for writing raster data (not only reading)
    • Create Python and C functions to test its presence (like for 3D mask in C)
    • Add C define specifying its name (like for 3D mask)
    • Add an environment variable similar to WIND_OVERRIDE or GRASS_REGION to use something else as a mask
  • add tile support for better large map support (Sentinel, global data, ...), supporting massive parallel computations (based on discussions from Aug 2017)
    • it must be clear why this would be better than GDAL vrt combined with r.external
    • tiles could then be even stored on different nodes for speed optimization
    • storage implementation:
      • develop a new virtual raster mapset "VRT" (special like PERMANENT)
      • virtual map: combination between current groups/stds and external maps - map metadata link the other maps, possibly in other mapset (Vaclav: I suggest this (vrt map) rather than vrt mapset or low level (i.e. new format) tiling)
        • something like the segment library opens the appropriate "standard" raster maps
        • different storage nodes possible if maps are in different mapsets which are on different nodes
    • reading:
      • [the above options or] add tile support deeply into raster lib (Rast_get_row())
      • use name scheme? make use of segment library
      • problem: due to row compression always whole row is read even if computation region is smaller
    • writing:
      • it is more complex

Imagery library

  • get rid of subgroups. No real need for that... (but leading to overcomplication of current usage)
    • or strengthen subgroups. One option could be auto-creating some "magic" subgroups, e.g. "_all" - all raster maps in a group (aka if subgroup is not set, use all maps from group); "_initial" - only imported remote sensing layers (equals "_all" if no new maps are added later); "RGB" - if source is RGB or RGB can be defined (I would love to be able to have an option to choose a group in d.rgb and get RGB subgroup selected as a default instead of searching for separate channels); "MLC" subgroup - see #2483.

Vector library

  • keep track of opened vector maps
  • keep track of dblinks to not remove table connected to multiple vector maps
  • get rid of dbf as database backend
  • portability and attribute file management would be much easier if each vector map had its own SQLite database (with the potential for multiple tables) rather than a single database for all vectors and their tables (layers) for the entire mapset
  • ...
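
The per-map SQLite idea can be sketched with the Python sqlite3 module. The file name attributes.db inside the vector map directory and the table name roads_1 are assumptions made for this example; the layout is not decided anywhere.

```python
import os
import sqlite3
import tempfile


def open_map_database(mapset_dir, vector_name):
    """Open (or create) the SQLite database stored with one vector map."""
    map_dir = os.path.join(mapset_dir, "vector", vector_name)
    os.makedirs(map_dir, exist_ok=True)
    # Hypothetical file name; one database per vector map.
    return sqlite3.connect(os.path.join(map_dir, "attributes.db"))


with tempfile.TemporaryDirectory() as mapset_dir:
    connection = open_map_database(mapset_dir, "roads")
    # One table per layer; more tables can live in the same file.
    connection.execute("CREATE TABLE roads_1 (cat INTEGER PRIMARY KEY, name TEXT)")
    connection.execute(
        "INSERT INTO roads_1 (cat, name) VALUES (?, ?)", (1, "Main Street")
    )
    connection.commit()
    rows = connection.execute("SELECT cat, name FROM roads_1").fetchall()
    connection.close()

print(rows)
```

Copying or removing the map directory would then also copy or remove its attributes.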

Python library

  • finalize/stabilize Python 3 support in GRASS GIS
  • simplify the startup from a Python script (i.e. fewer steps to start a session from Python, possibly including a change in distribution/installation)
  • remove deprecated Python functions: https://trac.osgeo.org/grass/changeset/67669
  • Split grass.pygrass into ctypes-dependent C library wrappers and ctypes-independent module handling.
    • The functionality is not related. The reason why it is together is that it was implemented at the same time.
    • Problems with ctypes should not influence the running of the modules.
  • Address the confusion between grass as the GRASS GIS Python library and PyGRASS. pygrass seems to be a name for everything, since a py prefix like this usually means Python, but in pygrass the idea was to mean Pythonic, as opposed to the shell/bash-like grass.script.

GUI

  • review (again) startup window, setting database/location/mapset, initial (default) location (wiki:wxGUIDevelopment/New_Startup)
  • single window interface as an option (first step: moving code to controller classes and panels) (wiki:wxGUIDevelopment/SingleWindow)
  • avoid the need for setting up the path to packages before imports (now we need to mix code and imports), e.g. grassgui package next to grass package
  • integrated Addon and GUI toolboxes
  • consistently inform user about whether or not an operation uses current computational region

Display modules

  • Clarify position of ximgview/wximgview, e.g. integrate them into d.mon. (They are easy to miss and separated from the rest of the functionality.)
  • Consider dropping, improving, or repurposing the HTML driver.

Manual pages

  • generate Sphinx manuals (rst converter is already in man/)
    • or use Markdown

Bug reports

Blockers

defect type tickets:

  • #3023: Change default behavior of d.title to draw instead of output text
  • #3055: Revise monochromatic color tables

task type tickets:

  • #969: move color structs to colors.h?
  • #2681: Remove legacy meaning of LOCATION variable

Critical issues

No results

Further issues
