Opened 16 years ago
Last modified 9 years ago
#516 new enhancement
v.extract slow on large datasets
Reported by: | gisboa | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | 6.4.6 |
Component: | Vector | Version: | unspecified |
Keywords: | | Cc: | |
CPU: | All | Platform: | All |
Description
Using v.extract on large datasets is incredibly slow. From a dataset of 3,000,000 areas I extracted the first 99 (id<100). It took 12 minutes to extract the geometries, and after that it reported 'writing attributes' for another 6 minutes. The pg process is the runner-up in top, consuming about 50% of CPU time; the remaining 50% goes to v.extract. What is going on here? Writing a hundred rows to PostgreSQL should take only a split second. Is this also due to the fact that the geometry index is not in a file? Would this be another reason to implement the file-based geometry index? Maybe a few modules should be rewritten to perform a dedicated task on their own, instead of relying on others, if that is what makes them slow.
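For anyone trying to reproduce the timing, here is a minimal sketch of how the run described above could be invoked and timed from Python. The map names `areas` and `areas_first99` are hypothetical placeholders (the report does not name the maps), the `id` column is taken from the "id<100" condition in the report, and the helper names are made up for illustration. It assumes it is run inside a GRASS session where `v.extract` is on the PATH.

```python
import subprocess
import time


def build_extract_command(input_map, output_map, where):
    """Build the argument list for a v.extract call using its
    input=, output= and where= parameters."""
    return [
        "v.extract",
        f"input={input_map}",
        f"output={output_map}",
        f"where={where}",
    ]


def time_command(cmd):
    """Run a command and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical map names; replace with the actual vector maps.
    cmd = build_extract_command("areas", "areas_first99", "id < 100")
    elapsed = time_command(cmd)
    print(f"v.extract took {elapsed:.1f} s")
```

This measures only the total wall-clock time; it does not separate the geometry extraction from the 'writing attributes' phase.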
Replying to gisboa:
There are probably several reasons for this. The spatial index is built from topology, which can take a while. The category index used to select features is rather inefficient for large numbers of categories. Both aspects are handled by the vector libs. v.extract itself also has potential for speed improvement. Regarding the vector libs, changes to the spatial index and the category index will only be made in grass7. Improving v.extract is possible for grass6; I have some ideas, but I won't get to it soon, and I don't know whether anybody else will rewrite v.extract soon.
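To illustrate in general terms why the way categories are looked up matters when every feature of a large map has to be tested, here is a generic sketch. It is not GRASS code and makes no claim about how the vector libs or v.extract are actually implemented; all function names are hypothetical. It only contrasts a linear scan over the wanted categories with a binary search over a sorted list of them.

```python
from bisect import bisect_left


def contains_linear(cats, cat):
    """Membership test by scanning the list: O(n) per lookup."""
    for c in cats:
        if c == cat:
            return True
    return False


def contains_sorted(sorted_cats, cat):
    """Membership test by binary search: O(log n) per lookup.
    Requires sorted_cats to be sorted in ascending order."""
    i = bisect_left(sorted_cats, cat)
    return i < len(sorted_cats) and sorted_cats[i] == cat


def select_features(feature_cats, wanted, contains):
    """Return the positions of all features whose category is wanted.
    The membership test is called once per feature, so its cost is
    multiplied by the number of features in the map."""
    return [fid for fid, cat in enumerate(feature_cats) if contains(wanted, cat)]


if __name__ == "__main__":
    feature_cats = [5, 1, 7, 3, 1]   # category of each feature, by position
    wanted = sorted([3, 1])          # categories to extract
    print(select_features(feature_cats, wanted, contains_linear))
    print(select_features(feature_cats, wanted, contains_sorted))
```

Both variants select the same features; they differ only in the cost of each lookup, which adds up when the test is repeated for millions of features.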
As for whether this is also due to the geometry index not being in a file: I think Glynn answered that in his comment to #513.
As for whether this is another reason to implement the file-based geometry index: probably yes. But that's not easy. There are "off-the-shelf" solutions for that, but 1) someone needs to evaluate these solutions for their suitability for grass, and 2) someone has to implement one.
As for rewriting modules to perform a dedicated task on their own: AFAICT, v.extract does not rely on other modules; it uses library functions only. IMHO, modules should not bypass core libraries. If a particular task is done inefficiently by the core libraries, these libraries need to be improved. A workaround for a specific module would only create a mess.