Changes between Version 12 and Version 13 of GSoC/2014/TestingFrameworkForGRASS
- Timestamp: 05/24/14 20:57:03
GSoC/2014/TestingFrameworkForGRASS
|| August 22 || Students can begin submitting required code samples to Google

== Design of testing API and implementation ==

{{{

…

Compared to the suggestion in ticket:2105#comment:4 it does not solve everything in the `test_module` (`run_module`) function, but it uses `self.assert*` similarly to `unittest.TestCase`, which (syntactically) allows checking more than one thing.

=== Test script should be importable ===

Test scripts must have module/package character (as unittest requires for test discovery). This applies to unittests and doctests, no exceptions. Doctests (inside normal module code or in separate documentation) will be wrapped as `unittest` test cases (in the `testsuite` directory). There is a [https://docs.python.org/2/library/doctest.html#unittest-api standard way] to do it. To make import possible, the GRASS Python libraries shouldn't do anything fancy at import time. For example, doctests currently don't work with `grass.script` unless you call [source:grass/trunk/gui/wxpython/core/toolboxes.py?rev=60218#L630 a set of functions] to deal with the function `_`, because the translate function is installed as the builtin `_` function while `_` is also used by `doctest`. (This is fixed for the GUI but still applies to the Python libraries.)

=== Dealing with process termination ===

There is no easy way to test that (GRASS) fatal errors are invoked when appropriate. Even if the test method (`test_*`) itself ran in a separate process (not only the whole script), the process would be ended without proper reporting of the test result (considering we want detailed test results). However, since this applies also to fatal errors invoked by unintentional failures, it seems that it will be necessary to invoke the test methods (`test_*`) in a separate process, to at least finish the other tests and not break the final report. This might be done by a function decorator, so that we don't invoke a new process for each function but only for those that need it (the ones using functionality based on `ctypes`).

=== Analyzing module run using Valgrind or others ===

Modules (or any tests?) can run with `valgrind` (probably `--tool=memcheck`). This could be done on the level of testing classes, but the better option is to integrate this functionality (optional running with `valgrind`). An environment variable (GRASS_PYGRASS_VALGRIND) or an additional option `valgrind_=True` (similarly to overwrite) would invoke the module with `valgrind` (works for both binaries and scripts). Additional options can be passed to `valgrind` using `valgrind`'s environment variable `$VALGRIND_OPTS`. Output would be saved in a file so that it does not interfere with the module output.

…

PyGRASS or a specialized PyGRASS module runner (function) in the testing framework can have a function, global variable, or environment variable which would specify which tool should run a module (if any) and what its parameters are (besides the possibility to set parameters by the environment variable defined by the tool). The tool output should ideally be separated from the module output and go to a file in the test output directory (and it could later be linked from/included in the main report).

Having output from many modules can be confusing (e.g. we run `r.info` before actually running our module). It would be ideal if it were possible to specify which modules called in the test should run with `valgrind` or another tool. The API for this may, however, interfere with the API for global settings of running with these tools. It is not clear if `valgrind` would be applied even for library tests. This would require running the testing process itself with `valgrind`. But since it needs to run separately anyway, this can be done.
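A minimal sketch (not the actual PyGRASS API) of how such an environment-variable switch could work; the `run_module` helper, the log file naming, and the way options are assembled are made up for illustration:

{{{
import os
import subprocess


def run_module(module, **kwargs):
    """Run a GRASS module given as name plus param=value pairs,
    optionally wrapped by valgrind.

    Sketch only: a real implementation would go through PyGRASS Module
    objects. GRASS_PYGRASS_VALGRIND is the switch discussed above and
    valgrind itself picks up additional options from $VALGRIND_OPTS.
    """
    cmd = [module] + ['{0}={1}'.format(key, value)
                      for key, value in kwargs.items()]
    if os.environ.get('GRASS_PYGRASS_VALGRIND'):
        # --log-file keeps the valgrind report separate from module output
        cmd = ['valgrind', '--tool=memcheck',
               '--log-file=valgrind_{0}.log'.format(module)] + cmd
    return subprocess.call(cmd)
}}}

The same switch point could later dispatch to other tools (e.g. a profiler) instead of `valgrind`.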
=== Dependencies ===

==== Dependencies on other tests ====

The test runner needs to know if the dependencies are fulfilled, i.e. if the required modules and library tests were successful. So there must be a database that keeps track of the test process. For example, if the raster library test fails, then all raster tests will fail; such a case should be handled. The tests would need to specify their dependencies (there might be even more test dependencies than dependencies of the tested code).

Alternatively, we can ignore dependency issues. We can just let all the tests fail if a dependency failed (without us checking that dependency) and this would be it. By tracking dependencies you just save time and make the result clearer. A failure of one test in a library, or of one test of a module, does not mean that a test using it was using the broken feature, so it can still be successful (e.g. a failed test of vector library 3D capabilities and a module accessing just 2D geometries). Also, not all tests of dependent code have to use that dependency (e.g. a particular interpolation method).

The simplest way to implement parallel dependency checking would be to have a file lock (e.g., [http://code.activestate.com/recipes/65203/ Cross-platform API for flock-style file locking]), so that only a single test runner has read and write access to the test status text file. Tests can run in parallel and have to wait until the file is unlocked. Consequently, the test runner should not crash, so that the file lock is always removed.

Anyway, dependency checking may be challenging if we allow parallel testing. Not allowing parallel testing makes the test status database really simple: it is a text file that will be parsed by the test runner for each test script execution and extended with a new entry at the end of the test run. Maybe at least the library tests shouldn't be executed in parallel (something might be in the make system already).

Logs about the test state can be used to generate a simple test success/fail overview.


==== Dependencies of tested code ====

Modules such as G7:r.in.lidar (depends on libLAS) or G7:v.buffer (depends on GEOS) are not built if the dependencies are not fulfilled. It might be good to have some special indication that the dependency is missing, but this might also be left as a task for the test author, who can implement a special test function which just checks the presence of the module. Thus the fact that the tests failed because of a missing dependency would be visible in the test report.
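A sketch of this second approach, using the standard `unittest.skipUnless` mechanism; the `module_available` helper, the test case, and the skip message are illustrative only:

{{{
import os
import unittest


def module_available(name):
    """Return True if a GRASS module of the given name is found on PATH.

    Illustrative helper only; a real implementation would also check
    the .exe/.bat variants used on MS Windows.
    """
    for directory in os.environ.get('PATH', '').split(os.pathsep):
        if os.access(os.path.join(directory, name), os.X_OK):
            return True
    return False


class TestLidarImport(unittest.TestCase):

    @unittest.skipUnless(module_available('r.in.lidar'),
                         'r.in.lidar not available (built without libLAS?)')
    def test_import_runs(self):
        # the actual test of the module would go here
        pass
}}}

With this, a missing build dependency shows up in the report as a skipped test rather than a failure.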
== Reports from testing ==

Everything should go (primarily) to files and directories with some defined structure. Something would have to gather the information from the files and build some main pages and summary pages. The advantage of having everything in files is that it might be more robust and that it can easily run in parallel. However, gathering the information afterwards can be challenging. Files are really the only option for integrating valgrind outputs.

There is `TextTestRunner` in `unittest`; the implementation will start from there. For now, the testing framework will focus on HTML output. However, the goal is something like a `GRASSTestRunner` which could do multiple outputs simultaneously (in the future), namely HTML, XML (there might be some reusable XML schemas for testing results) and TXT (possibly enriched by some reStructuredText or Markdown, or really plain). Some (simple) text summary should go to standard output in parallel with the output to the files.

It is not clear if the results should be organized by test functions (`test_*`) or only by test scripts (modules, test cases).

Details for one test (not all have to be implemented):

 * standard output and standard error output of tests
   * it might be hard to split them if more than one module is called (the same applies to functions)
 * Valgrind output or output by another tool used for running modules in the test
   * might be from one or more modules
 * the tested code
   * the code itself, with e.g. [http://pygments.org/ Pygments], or links to Doxygen documentation
   * it might be unclear what code to actually include (you can see the names of modules and functions, and you know in which directory the test suite was)
 * the testing code, to see what exactly was tested and failed
 * pictures generated from maps for tests which were not successful (might be applied also to other types but this is really a bonus)

Generally, the additional data can be linked or included directly (e.g. with some folding in HTML). This needs to be investigated.

Each test (or whatever is generating output) will generate an output file which it will be possible to include directly in the final HTML report (by a link or by including it in some bigger file). A test runner which is not influenced by fatal errors and segmentation faults has to take care of the (HTML) file generation. The summary pages will probably be done by some reporter or finalizer script. One standalone test script (which can be invoked by itself) will produce (nice) usable output (this can or even should be reused in the main report).

…

Test scripts will work when directly executed from the command line and when executed from the make system. When tests are executed by the make system, they might be executed by a dedicated "test_runner" Python script to set up the environment. However, the environment can also be set up inside the test script, and not setting the environment would be the default (or the other way around, since setting up a different environment would be safer).

Having separate processes is necessary in any case, because only this makes the testing framework robust enough to handle (GRASS) fatal error calls and segmentation faults.

Tests should be executable by themselves (i.e. they should have a `main()` function) to encourage running them often. This can be used by the framework itself rather than imports, because it will simplify parallelization, and outputs need to go to files anyway (because of size) and we will collect everything from the files afterwards (so it does not matter whether we use process calls or imports).
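A minimal sketch of such a self-executable test script; the test case and its contents are purely illustrative, and plain `unittest.main()` stands in for whatever `main()` function the framework will eventually provide:

{{{
import unittest


class TestSlopeAspect(unittest.TestCase):
    """Illustrative test case; a real one would run the module
    (e.g. through PyGRASS) and check its outputs with self.assert*."""

    def test_module_runs(self):
        self.assertTrue(True)


if __name__ == '__main__':
    # allows running this file directly as well as through the framework
    unittest.main()
}}}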
=== Example run ===

…

{{{
python ${test} >> /tmp/tgis_lib.txt 2>&1
done
}}}

The test output will be written to stdout and stderr (piped to a file in this case).

…

}}}

or

{{{
make test
}}}

which might be a more standard solution.


== Testing on MS Windows ==

On Linux and all other unix-like systems we expect that testing will be done only when you also compile GRASS yourself. This cannot be expected on MS Windows because of the complexity of compilation and the lack of MS Windows-based GRASS developers. Moreover, because of the experience with different failures on different MS Windows computers (depending not only on the system version but also on system settings), we need to enable tests for as many users (computers) as possible.

Invoking the test script on MS Windows by hand and by the make system should work. Tests will be executed in the source tree in the same way as on Linux.

I hope that we can get to the state where users will be able to test GRASS. It is Python. We can use the make system but also discovery (ours or unittest's). The only problem I currently see is the different layout of directories in the source tree and in the distribution, but it might not be an issue.

Libraries are tested through ctypes, modules as programs, and the rest is mostly Python, so this should work in any case. However, there are several library tests that are executable programs (usually GRASS modules), for example in gmath, gpde, raster3d. These modules will be executed by the testing framework inside testing functions (`test_*`). These modules are not compiled by default and are not part of the distribution; they need to be compiled in order to run the tests. I guess we can compile these additional modules and put them into one separate directory in the distribution, or we can have a debug distribution with the testing framework and these modules, or we can create a similar system as we have for addons (on MS Windows). The modules could be compiled and prepared on a server for download, and they would be downloaded by the testing framework or upon user request.


== Locations, mapsets and data ==

…

We should have dedicated test locations with different projections and identical map names. I wouldn't use the GRASS sample locations (NC, Spearfish) as test locations directly. We should have dedicated test locations with selected data. They can overlap with (let's say) NC but may contain less imagery and, on the other hand, some additional strange data. The complication is doctests, which are documentation; as a consequence they should use the intersection of the NC sample location and the testing location. The only difference between the locations would be the projection, so it really makes a difference only for projected, latlon and perhaps XY.

All data should be in the PERMANENT mapset. The reason is that on-the-fly generated temporary mapsets will have access only to the PERMANENT mapset by default. Access to other mapsets would have to be set explicitly. This might be the case when a user wants to use his or her own mapset. On the other hand, it might be advantageous to have maps in different mapsets and just allow access to all these mapsets. The user would have to do the same and would have to keep the same mapset structure (which might not be so advantageous), which is just slightly more complex than keeping the same map names (which the user must do in any case).

If multiple locations are allowed and we expect some maps to be present in the location, such as the elevation raster, it is not clear how to actually test a result such as aspect computed from elevation, since the result (e.g. its MD5 sum) will be different for each location/projection. This would mean that the checking/assert functions or the tests themselves would have to handle different locations, and moreover, this type of test would always fail in a user-provided location.

…

All reference files (and perhaps also additional data) will be located in the `testsuite` directory. There can also be one global directory with additional data (e.g. data to import) which will be shared between test suites and exposed by the testing framework.

The reference checking in the case of different locations (projections) can only be solved in the test itself. The test author has to implement a conditioned reference check. Alternatively, a function (e.g., `def pick_the_right_reference(general_reference_name, location_name)`) could be implemented to help with getting the right reference file (or perhaps value), because some naming conventions for reference files will be introduced anyway.
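A sketch of what such a helper might look like; the naming convention `<reference>_<location>.ref` and the fallback behaviour are assumptions for illustration, not a decided design:

{{{
import os


def pick_the_right_reference(general_reference_name, location_name):
    """Return the path to the reference file for the given location.

    Sketch only: assumes a hypothetical naming convention
    <general_reference_name>_<location_name>.ref with the files stored
    next to the test script in the testsuite directory.
    """
    specific = '{0}_{1}.ref'.format(general_reference_name, location_name)
    if os.path.exists(specific):
        return specific
    # fall back to a location-independent reference if there is one
    return general_reference_name + '.ref'
}}}

A test would then compare its output against the returned file instead of hard-coding one reference per projection.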
Testing framework design should allow us to make different decisions about how to solve the data and locations questions.

Testing data will be available on a server for download. The testing framework can download them if a test is requested by the user. The data can be saved in the user's home directory and used next time. This may simplify things for users, and it will also be clear to the testing framework where to find the testing data.


== GSoC weekly reports ==

=== Week 01 ===