Version 12 (modified by wenzeslaus, 11 years ago)

Testing framework for GRASS GIS

Title: Testing framework for GRASS GIS
Student: Vaclav Petras, North Carolina State University, Open Source Geospatial Research and Education Laboratory
Organization: OSGeo - Open Source Geospatial Foundation
Mentors: Sören Gebbert, Helena Mitasova
GSoC link: abstract

Abstract

GRASS GIS is one of the core projects in the OSGeo Foundation. GRASS provides a wide range of geospatial analyses including raster and vector analyses and image processing. However, there is no system for regular testing of its algorithms. To ensure software quality and reliability, a standardized way of testing needs to be introduced. This project will implement a testing framework which can be used for writing and running tests of GRASS GIS modules, C/C++ libraries and Python libraries.

Introduction

GRASS GIS is one of the core projects in the OSGeo Foundation and is used by several other free and open source projects to perform geoprocessing tasks. Its software quality and reliability are crucial, so proper testing is needed. So far, testing has been done manually by both developers and users. This is questionable in terms of test coverage and frequency of the tests and, moreover, it is inconvenient. This project will implement a testing framework which can be used for writing and running tests for GRASS GIS. This will benefit not only the quality of GRASS GIS but also its everyday development because it will help to identify problems with new code at the time when the change is made.

Background

There have already been several attempts to establish a testing infrastructure for GRASS GIS, namely the quality assessment and monitoring mailing list, which has been inactive for several years, then an older test suite which was never integrated into GRASS GIS itself, and most recently a test suite proposal which was trying to interpret shell scripts as test cases. Also, experience with using Python doctest in different circumstances shows that this solution is not applicable everywhere.

These previous experiences give us a clear idea of what is not working (e.g. tests outside the main source code), what is overcomplicated (e.g. reimplementing a shell) and what is oversimplified (e.g. shell scripts without clear set up and tear down steps). They point us in the direction of an implementation which will be most efficient (general but simple enough), integrated in the GRASS source code, and accepted by the GRASS development team. The long preceding discussions also showed what is necessary to have in the testing framework and what should be left out.

The idea

The purpose of this project is to develop a general mechanism which would be applicable for testing GRASS modules, libraries or workflows with different data sets. Tests will be part of the GRASS main source code, cross-platform, and as easy to write and run as possible. The testing framework will enable the use of different testing data sets because different test cases might need special data. The testing framework will be implemented in Python and based on testing tools included in the standard Python distribution (most notably unittest), which will not bring a new dependency and will also avoid writing everything from scratch. The usage of the Makefile system will be limited to triggering the test or tests with the right parameters for a particular location in the source tree; everything else will be implemented in Python to ensure maximum re-usability, maintainability, and availability across platforms.

This project will focus on building infrastructure to test modules, C/C++ libraries (using the ctypes interface), and Python libraries. It is expected that testing of Python GUI code will be limited to pure Python parts. The focus will be on the majority of GRASS modules and functionality, while special cases such as rendering, creation of locations, external data sources and databases, and downloading of extensions from GRASS Addons will be left for future work. Moreover, this project will not cover tests of the graphical user interface, server side automatic testing (e.g. commit hooks), using testing shell scripts or C/C++ programs, and testing of internal functions in C/C++ code (e.g. static functions in libraries and functions in modules). Creation of HTML, XML, or other rich outputs will not be completely solved but the implementation will consider the need for a presentation of test results. Finally, writing the tests for particular parts will not be part of this project; however, several sample tests for different parts of the code, especially modules, will be written to test the testing framework.

Project plan

date proposed task
2014-05-19 - 2014-05-23 (week 01) Designing a basic template for the test case and interface of test suite class(es)
2014-05-26 - 2014-05-30 (week 02) Basic implementation
2014-06-02 - 2014-06-06 (week 03) Dealing with evaluation and comparison of textual and numerical outputs
2014-06-09 - 2014-06-13 (week 04) Dealing with evaluation and comparison of map outputs and other outputs
2014-06-16 - 2014-06-20 (week 05) Re-writing some existing tests using testing framework
2014-06-23 - 2014-06-27 (week 06) Testing of what was written so far and evaluating current design and implementation
June 23 Mentors and students can begin submitting mid-term evaluations
June 27 Mid-term evaluations deadline
2014-06-30 - 2014-07-04 (week 07) Integration with GRASS source code, documentation and build system
2014-07-07 - 2014-07-11 (week 08) Implementation of location switching
2014-07-14 - 2014-07-18 (week 09) Dealing with evaluation and comparison of so far unresolved outputs
2014-07-21 - 2014-07-25 (week 10) Implementing the basic test results reports
2014-07-28 - 2014-08-01 (week 11) Re-writing some other existing tests using testing framework
2014-08-04 - 2014-08-08 (week 12) Writing documentation of framework internals and guidelines how to write tests
2014-08-11 - 2014-08-15 (week 13) Polish the code and documentation
August 11 Suggested 'pencils down' date. Take a week to scrub code, write tests, improve documentation, etc.
2014-08-18 - 2014-08-22 (week 14) Submit evaluation and code to Google
August 18 Firm 'pencils down' date. Mentors, students and organization administrators can begin submitting final evaluations to Google.
August 22 Final evaluation deadline
August 22 Students can begin submitting required code samples to Google

Design of testing API

import unittest
import grass.pygrass.modules as gmodules

# alternatively, these can be private to module with setter and getter
# or it can be in a class
USE_VALGRIND = False


class GrassTestCase(unittest.TestCase):
    """Base class for GRASS test cases."""

    def run_module(self, module):
        """Method to run the module. It will probably use some class or instance variables"""
        # get command from pygrass module
        command = module.make_cmd()
        # run command using valgrind if desired and module is not python script
        # see also valgrind notes at the end of this section
        if is_not_python_script(command[0]) and USE_VALGRIND:
            command = ['valgrind', '--tool=...', '--xml=...', '--xml-file=...'] + command
        # run command
        # store valgrind output (memcheck has XML output to a file)
        # store module return code, stdout and stderr, how to distinguish from valgrind?
        # return code, stdout and stderr could be returned in tuple
    
    def assertRasterMap(self, actual, reference, msg=None):
        # e.g. g.compare.md5 from addons
        # uses msg if provided, generates its own if not,
        # or both if self.longMessage is True (unittest.TestCase.longMessage)
        # precision should be considered too (for FCELL and DCELL but perhaps also for CELL)
        # the actual implementation will be in separate module, so it can be reused by doctests or elsewhere
        # this is necessary considering the number and potential complexity of functions
        # and it is better design anyway

        # checksum() is a placeholder for the comparison described above
        if checksum(actual) != checksum(reference):
            self.fail(...)  # unittest.TestCase.fail


class SomeModuleTestCase(GrassTestCase):
    """Example of test case for a module."""
    
    def test_flag_g(self):
        """Test to validate the output of r.info using flag "g"
        """
        # Configure a r.info test 
        module = gmodules.Module("r.info", map="test", flags="g", run_=False)

        self.run_module(module=module)
        # it is not clear where to store stdout and stderr
        self.assertStdout(actual=module.stdout, reference="r_info_g.ref")
        
    def test_something_complicated(self):
        """Test something which has several outputs
        """
        # Configure an r.complex test
        module = gmodules.Module("r.complex", rast="test", vect="test", flags="p", run_=False)
        
        (ret, stdout, stderr) = self.run_module(module=module)
        self.assertEqual(ret, 0, "Module should have succeeded but return code is not 0")
        self.assertStdout(actual=stdout, reference="r_complex_stdout.ref")
        self.assertRasterMap(actual=module.rast, reference="r_complex_rast.ref")
        self.assertVectorMap(actual=module.vect, reference="r_complex_vect.ref")

Compared to the suggestion in ticket:2105#comment:4, it does not solve everything in the test_module (run_module) function but it uses self.assert* similarly to unittest.TestCase, which (syntactically) allows checking more than one thing.
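
An assertion such as assertStdout, which the example uses but does not define, could be built on the assertions unittest already provides. The following is only a minimal sketch, assuming that the reference is a path to a text file. The class name TextOutputTestCase and the signature are illustrative and not part of the design.

```python
import unittest


class TextOutputTestCase(unittest.TestCase):
    """Hypothetical base class with an assertion for textual output."""

    def assertStdout(self, actual, reference, msg=None):
        """Fail if the text `actual` differs from the content of file `reference`."""
        with open(reference) as reference_file:
            expected = reference_file.read()
        # assertMultiLineEqual reports a diff of the two texts on failure
        # and handles msg and longMessage like the standard assertions
        self.assertMultiLineEqual(actual, expected, msg)
```

Delegating to an existing assertion keeps the message handling (msg, longMessage) consistent with the rest of unittest.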

Test scripts must have the character of a module or package (as unittest requires for test discovery). This applies to both unittests and doctests, no exceptions. Doctests (inside normal module code or in a separate doc) will be wrapped as unittest test cases (in the testsuite directory). There is a standard way to do it. To make importing possible, none of the GRASS Python libraries should do anything fancy at import time. For example, doctests currently don't work with grass.script unless you call a set of functions to deal with the function _, because the translation function is installed as the builtin function _ while _ is also used by doctest. (This is fixed for the GUI but still applies to the Python libraries.)
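
The standard way of wrapping doctests is doctest.DocTestSuite together with the load_tests protocol of unittest. A minimal sketch follows. The helper name make_load_tests is hypothetical, and the module passed to it stands for any GRASS Python library module which can be imported without side effects.

```python
import doctest
import unittest


def make_load_tests(module):
    """Create a load_tests function which adds the doctests of `module`.

    The returned function follows the load_tests protocol of unittest,
    so it can be assigned to the name load_tests in a test file.
    """
    def load_tests(loader, tests, pattern):
        # DocTestSuite turns each docstring with examples into a test case
        tests.addTests(doctest.DocTestSuite(module))
        return tests
    return load_tests
```

In a file in the testsuite directory this could be used as `load_tests = make_load_tests(some_module)`, where some_module is the imported module under test.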

Analyzing module run using valgrind or others

Modules (or any tests?) can run with valgrind (probably --tool=memcheck). This could be done on the level of testing classes but the better option is to integrate this functionality (optional running with valgrind) into PyGRASS. An environment variable (GRASS_PYGRASS_VALGRIND) or an additional option valgrind_=True (similarly to overwrite) would invoke the module with valgrind (works for both binaries and scripts). Additional options can be passed to valgrind using valgrind's environment variable $VALGRIND_OPTS. The output would be saved in a file so that it does not interfere with the module output.

We may also want to use some (runtime checking) tools other than valgrind, for example clang/LLVM sanitizers (as for example Python does). However, it is unclear how to handle more than one tool, and it is also unclear how to store the results for any of these (including valgrind) because one test can have multiple module calls (or none), module calls can be indirect (a function in a Python library which calls a module, or a module calling a module) and there is no standard way in unittest to pass additional test details.

PyGRASS or a specialized PyGRASS module runner (function) in the testing framework can have a function, global variable, or environment variable which would specify which tool should run a module (if any) and what the parameters are (besides the possibility to set parameters by an environment variable defined by the tool). The tool output should ideally be separated from the module output and go to a file in the test output directory (and it could be later linked from or included into the main report).
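
The decision whether to prefix a command with valgrind could look roughly like the following sketch. The variable GRASS_PYGRASS_VALGRIND is the name suggested above. The function name, the log_file parameter and the rule that any non-empty value other than "0" enables valgrind are assumptions made for the illustration.

```python
import os


def wrap_with_valgrind(command, log_file, env=None):
    """Return `command` prefixed by valgrind if the environment asks for it.

    Treating any non-empty value other than "0" as enabled is an assumption.
    """
    env = os.environ if env is None else env
    if env.get("GRASS_PYGRASS_VALGRIND", "0") in ("", "0"):
        return list(command)
    # the log goes to a file, so it does not mix with the module output
    valgrind = ["valgrind", "--tool=memcheck", "--log-file=" + log_file]
    return valgrind + list(command)
```

Options set in $VALGRIND_OPTS need no handling here because valgrind reads that variable itself.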

Data types to be checked

We must deal especially with GRASS specific files such as raster maps. We consider that comparison of simple things such as strings and individual numbers is already implemented by unittest.

  • raster map
    • composite? reclassified map?
    • color table included
  • vector map
  • 3D raster map
  • color table
  • SQL table
  • file

Most of the outputs can be checked with different numerical precision.
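
For textual key=value outputs (such as the output of r.info with the g flag), a comparison with a given precision could be sketched as follows. The function names and the default precision are hypothetical. Values which cannot be converted to numbers are compared as strings.

```python
def parse_key_value(text, sep="="):
    """Parse lines in the form key=value into a dictionary of strings."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(sep)
        result[key.strip()] = value.strip()
    return result


def differing_keys(actual, reference, precision=1e-6):
    """Return a sorted list of keys whose values differ.

    Numbers are considered equal if they differ by at most `precision`.
    A key present in only one of the dictionaries counts as a difference.
    """
    differences = []
    for key in sorted(set(actual) | set(reference)):
        if key not in actual or key not in reference:
            differences.append(key)
            continue
        try:
            equal = abs(float(actual[key]) - float(reference[key])) <= precision
        except ValueError:
            # not a number, fall back to exact string comparison
            equal = actual[key] == reference[key]
        if not equal:
            differences.append(key)
    return differences
```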


Naming conventions

The unittest.TestLoader.discover function requires that module names are importable (i.e. are valid Python identifiers). Consequently, names of files with tests should not contain dots (except for the .py suffix).

Methods with tests must start with test_ to be recognized by the unittest framework (the default setting only requires the prefix test, but there is no reason not to keep the test_ convention).
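
Both naming rules can be checked with the standard library, as the following small sketch shows. The function is_importable_test_file and the class NamingExample are hypothetical names used only for the illustration.

```python
import os
import unittest


def is_importable_test_file(filename):
    """Check that a file name can be imported as a module by test discovery."""
    name, extension = os.path.splitext(os.path.basename(filename))
    # str.isidentifier rejects dots, dashes and leading digits
    return extension == ".py" and name.isidentifier()


class NamingExample(unittest.TestCase):
    def test_flag_g(self):
        pass

    def check_helper(self):
        # not collected: the name does not start with the test prefix
        pass


# the loader collects only the methods with the test prefix
collected = unittest.TestLoader().getTestCaseNames(NamingExample)
```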

The name of the directory with tests is "testsuite". It also fits how unittest uses this term (a set of test cases and other test suites). "test" and "tests" are simpler and are used, for example, in Python, but might be too general. "unittest" would be confused with the module unittest.

Layout of directories and files

Test scripts are located in a dedicated directory within module or library directories. All reference files or additionally needed data should be located there.

Placing tests in the same directory as the tested code would work well for one or two Python files but not for a larger number of reference files. In the case of C/C++, this would mean mixing Python and C/C++ files in one directory, which is strange. One directory in the root with a separate tree would not work either because the tests would not be close enough to the actual code to be run often and extended when appropriate.

Invoking tests

Test scripts will work when directly executed from the command line and when executed from the make system. When tests are executed by the make system, they might be executed by a dedicated "test_runner" Python script to set up the environment. However, the environment can also be set up inside the test script, and not setting the environment would be the default (or the other way around since setting up a different environment would be safer).

Tests should be executable by themselves (i.e. they should have a main() function) to encourage running them often. The framework itself can use this rather than imports because it will simplify parallelization. The outputs need to go to files anyway (because of their size) and we will collect everything from the files afterwards, so it does not matter whether we use process calls or imports.

Example run

cd lib/python/temporal

Now there are two options to run the tests. The first is execution by hand in my current location:

cd testsuite
for test in *.py ; do
    python "${test}" >> /tmp/tgis_lib.txt 2>&1
done

The test output will be written to stdout and stderr (piped to a file in this case).

The second option is execution by the make system (still in lib/python/temporal):

make tests

Locations, mapsets and data

The test scripts should not depend on specific mapsets for their run. When run by the make system, every test script will be executed in its own temporary mapset. The location will probably be copied for each test (testsuite) to keep the location clean and allow multiprocessing. When run by hand directly without make, tests will be executed in the current location and mapset, which will allow users to test with their own data and projections.

Tests (suites, cases, or scripts/modules) themselves can define in which location or locations they should be executed as a global variable (both for unittests and doctests, doctests in their unittest wrapper). The global variable will be ignored in case the test script is executed by hand. When executed by the make system, the test can be executed in all locations specified by the global variable, or in one location specified by the make system (or generally forced from the top). In other words, the global variable specifies what the test wants to do, not what it is capable of doing, because we want it to run in any location.
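
The rules for choosing locations can be summarized in a short sketch. All names are hypothetical, None stands for the current location, and the fallback to the current location when the test requests nothing is an assumption not stated above.

```python
def select_locations(requested, forced=None, by_hand=False):
    """Decide in which locations a test should run.

    requested: locations listed in the global variable of the test
    forced: one location imposed by the make system (or from the top)
    by_hand: True if the test script was executed directly

    Returns a list of location names where None is the current location.
    """
    if by_hand:
        # the global variable is ignored when the script is run by hand
        return [None]
    if forced is not None:
        return [forced]
    # assumption: a test without any request runs in the current location
    return list(requested) if requested else [None]
```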

We should have dedicated test locations with different projections, identical map names, and selected data. I wouldn't use the GRASS sample locations (NC, Spearfish) as test locations directly. The test locations can overlap with (let's say) NC, containing less imagery but, on the other hand, some additional strange data. A complication is that doctests are also documentation, so they should use only the intersection of the NC sample location and the testing location. The only difference between the locations would be the projection, so it really makes a difference only for projected, latlon and perhaps XY locations.

If multiple locations are allowed and we expect some maps to be in the location, such as an elevation raster, it is not clear how to actually test a result such as aspect computed from elevation since the result (such as an MD5 sum) will be different for each location/projection. This would mean that the checking/assert functions or the tests themselves would have to handle different locations and, moreover, this type of test would always fail in a user-provided location.

It is expected that any needed (geo-)data is located in the PERMANENT mapset when a specific location is requested. This applies to prepared locations. For running in the user location, it is not a requirement.

The created data will be deleted at the end of the test. The newly created mapset will be deleted (if it was created). If we copy the whole location (advantageous e.g. for temporal things), the whole location will be deleted. The tests will probably not be required to delete all the created maps but they might be required to delete other files (or e.g. tables in a database) if they created some.

The user should be able to disable the removal of created data to be able to inspect it. This would be particularly advantageous for running tests by hand in some user-specified location.
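
Optional removal of created data can be expressed as a context manager. In the following sketch a temporary directory stands in for the temporary mapset or copied location, and the name temporary_workspace and its keep parameter are hypothetical.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def temporary_workspace(keep=False):
    """Provide a temporary directory which is removed unless `keep` is True."""
    path = tempfile.mkdtemp(prefix="grass_test_")
    try:
        yield path
    finally:
        if not keep:
            # clean up even if the test inside the with block failed
            shutil.rmtree(path, ignore_errors=True)
```

With keep=True the directory survives the test, so the created data can be inspected afterwards.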

In case the mapset information is needed, the output of g.mapset -p (prints the current mapset) must be parsed within the test. A special case is a test of g.mapset -p itself, which will create a new mapset, switch to it and test there.

The testing framework can have a function to check whether the current location (the currently accessible mapsets) contains all the required maps (according to their names).
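
Such a check could be sketched as follows. The names are hypothetical, the list of available maps is passed in rather than obtained from GRASS, and skipping the test (rather than failing it) when maps are missing is an assumption.

```python
import unittest


def missing_maps(required, available):
    """Return sorted names of required maps which are not available."""
    return sorted(set(required) - set(available))


class MapsCheckingTestCase(unittest.TestCase):
    """Hypothetical base class which skips tests when maps are missing."""

    required_maps = ()
    # in the framework this would come from the accessible mapsets
    available_maps = ()

    def setUp(self):
        missing = missing_maps(self.required_maps, self.available_maps)
        if missing:
            self.skipTest("missing maps: " + ", ".join(missing))
```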

All reference files (and perhaps also additional data) will be located in the testsuite directory. There can be also one global directory with additional data (e.g. data to import) which will be shared between test suites and exposed by the testing framework.

The design of the testing framework should allow us to make different decisions about how to solve the data and location questions.

Weekly reports

Week 01

  1. What did you get done this week?

I discussed the design and implementation with my mentor during the week. The result of the discussions is on the project wiki page (this page, link to version). I will add more in the next days.

The code will probably be placed in the GRASS sandbox repository (HTML browser); later it will hopefully be moved to GRASS trunk (HTML browser). However, the discussion about where to put GSoC source code is still open (gsoc preferred source location, at nabble).

  2. What do you plan on doing next week?

I plan to implement a basic prototype of (part of) the testing framework to see what the suggested design would look like in practice and whether it needs further refinement. So far, this matches my schedule.

  3. Are you blocked on anything?

It is not clear to me how certain things in the testing framework will work on MS Windows. I will discuss this later on the wiki page and on grass-dev.

report email
