Opened 16 years ago

Last modified 9 years ago

#516 new enhancement

v.extract slow on large datasets

Reported by: gisboa
Owned by: grass-dev@…
Priority: minor
Milestone: 6.4.6
Component: Vector
Version: unspecified
Keywords:
Cc:
CPU: All
Platform: All

Description

Using v.extract on large datasets is incredibly slow. From a 3,000,000 areas dataset I extracted the first 99 (id<100). It took 12 minutes to extract the geometries, after that it says 'writing attributes' for another 6 minutes. The pg process is a runner-up in top, consuming about 50% cpu time, the remaining 50% goes to v.extract. What is going on here? Writing a hundred rows to PostgreSQL should take only a split second. Is this also due to the fact that the geometry index is not in a file? Would this be another reason to implement the file based geometry index? Maybe a few modules should be rewritten to perform a dedicated task on their own, instead of relying on others, if that makes it slow.
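For anyone wanting to reproduce the timing, here is a minimal sketch in Python. It assumes an active GRASS session with v.extract on the PATH; the map names `areas` and `areas_first99` and the column name `id` are placeholders, not taken from the report, and the helper names are made up for illustration.

```python
import shlex
import subprocess
import time


def build_extract_cmd(input_map, output_map, where):
    """Return the v.extract argument list for a WHERE-based selection."""
    return [
        "v.extract",
        f"input={input_map}",
        f"output={output_map}",
        f"where={where}",
    ]


def time_command(cmd, runner=subprocess.run):
    """Run cmd through runner and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    runner(cmd, check=True)
    return time.perf_counter() - start


# Placeholder map and column names; adjust to the actual dataset.
cmd = build_extract_cmd("areas", "areas_first99", "id < 100")
print(shlex.join(cmd))

# Inside a GRASS session, the run could then be timed with:
#     elapsed = time_command(cmd)
#     print(f"v.extract took {elapsed:.1f} s")
```

Timing the whole command does not separate the geometry phase from the attribute phase; the module's own progress messages are still needed for that.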

Change History (3)

comment:1 by mmetz, 16 years ago (in reply to: description)

Replying to gisboa:

> Using v.extract on large datasets is incredibly slow. From a 3,000,000 areas dataset I extracted the first 99 (id<100). It took 12 minutes to extract the geometries,

There are probably several reasons for this. The spatial index is built from topology, which can take a while. The category index used to select features is rather inefficient for large numbers of categories. These two aspects are handled by the vector libs. v.extract itself also has potential for speed improvement. Regarding the vector libs, changes to the spatial index and the category index will only be done in grass7. Improving v.extract is possible for grass6. I have some ideas, but I won't get to it soon, and I don't know if anybody else will rewrite v.extract soon.
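To illustrate why the lookup strategy matters once there are millions of categories, here is a small Python sketch. It is not the GRASS vector library code and makes no claim about how the category index is actually implemented; all names are hypothetical. It only contrasts a naive scan over every feature with a lookup in a sorted (category, feature id) list.

```python
from bisect import bisect_left, bisect_right


def select_by_scan(features, wanted):
    """Test every feature against the wanted list: roughly O(n * m)."""
    wanted = list(wanted)
    return [fid for fid, cat in features if cat in wanted]


def build_sorted_index(features):
    """Sort (cat, fid) pairs once so categories can be found by bisection."""
    pairs = sorted((cat, fid) for fid, cat in features)
    cats = [cat for cat, _ in pairs]
    fids = [fid for _, fid in pairs]
    return cats, fids


def select_by_index(index, wanted):
    """Binary-search each wanted category: roughly O(m log n + hits)."""
    cats, fids = index
    hits = []
    for cat in wanted:
        lo = bisect_left(cats, cat)
        hi = bisect_right(cats, cat)
        hits.extend(fids[lo:hi])
    return hits


# Toy data: feature ids 0..999, each with category fid % 200.
features = [(fid, fid % 200) for fid in range(1000)]
index = build_sorted_index(features)
wanted = range(1, 100)

scan_result = sorted(select_by_scan(features, wanted))
index_result = sorted(select_by_index(index, wanted))
print(scan_result == index_result)
```

Both functions return the same features; the difference is only in how much work is done per selected category, which is where a large dataset would hurt.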

> after that it says 'writing attributes' for another 6 minutes. The pg process is a runner-up in top, consuming about 50% cpu time,

I think Glynn answered that in his comment to #513.

> Would this be another reason to implement the file based geometry index?

Probably yes. But that's not easy. There are "off-the-shelf" solutions for that, but 1) someone needs to evaluate these solutions for their suitability for grass, and 2) someone has to implement it.

> Maybe a few modules should be rewritten to perform a dedicated task on their own, instead of relying on others, if that makes it slow.

AFAICT, v.extract does not rely on other modules, it uses library functions only. IMHO, modules should not bypass core libraries. If a particular task is done inefficiently by the core libraries, these libraries need to be improved. A workaround for a specific module would only create a mess.

comment:2 by mlennert, 9 years ago

See also #2587

comment:3 by neteler, 9 years ago

Milestone: 6.4.0 → 6.4.6