Opened 16 years ago

Closed 7 years ago

#542 closed enhancement (fixed)

grass7 vector libraries modifications

Reported by: mmetz Owned by: grass-dev@…
Priority: major Milestone: 8.0.0
Component: Vector Version: svn-trunk
Keywords: Cc: martinl
CPU: All Platform: All

Description

I want to suggest some more profound changes to the vector model for grass7. These changes would affect topology, spatial index and maybe category index, but not the coor file. That means that there will be limited forward/backward compatibility: topology would need to be rebuilt before vectors can be accessed. Vector modules would not need to be rewritten, but more efficient library functions could be made available.

My general idea/complaint is that the current topology layout is not tailored towards vector object types; instead several (very) different types (points, lines, boundaries, centroids, faces, kernels) are stored in the same structure. Working with one particular type is a bit inefficient because the desired type has to be selected out of everything stored in this universal structure every single time. I am sure that a lot of time and space can be safed with a redesigned topology layout and vector libraries that make use of it. As an example, what I want to get rid of is

for (line = 0; line < nlines; line++) {
   if (!Vect_line_alive(Map, line))
     continue;
   type = Vect_read_line(map, points, cats, line);
   if (!(type & otype))
      continue;
   /* process line */
}

The whole coor file is read, in the worst case e.g. just to get the few centroids in it. This can not always be avoided or changed, but could often be replaced with e.g.

for (centroid = 1; centroid < ncentroids; centroid++) {
   /* process centroid */
}

The current implementation has some consequences of which I am not sure if they are actually desired. E.g. when cleaning a vector with tool=snap (snapping vertices of lines and boundaries), lines and boundaries may be snapped together at the same time: a boundary may be snapped to a line and vice versa. Maybe this is sometimes desired, but maybe this should be avoided? Another example is removing duplicates: currently it is possible to do that for points and centroids together, and if there are a point and a centroid with identical coordinates, one of them is deleted (random selection).

With the changes I have in mind, the size of support structures should generally go down, most for point datasets, least for areas. Massive point datasets like LIDAR could be easier processed on level 2 with topology, because support structures for massive point datasets would be reduced in size by about 70% (rough estimates: spatial index reduced down to 25%, topology reduced down to 40%).

There are however some problems with my suggestions: 1) IMHO nobody should decide on that alone, 2) the coding is too much for one person alone, e.g. I can't do all that without help, 3) I'm not really a programmer, 4) I don't know enough about vector geometry algorithms.

Below are more technical details:

Status quo

the coor file holds lines (better: primitives) of types
point
line
boundary
centroid
face (3D boundary, not yet implemented)
kernel (3D centroid, not yet implemented)

structures derived from these types are
nodes
areas
isles
edges (3D areas, not yet implemented)
volumes (3D shapes, not yet implemented)
holes (3D volumes within volumes, like isles in areas, not yet implemented)

topology holds information about
nodes
lines
areas
isles

where lines can be points, lines, boundaries, centroids, faces, or kernels

see http://trac.osgeo.org/grass/browser/grass/trunk/include/vect/dig_structs.h#L440

points, lines, boundaries, centroids, faces, kernels are obviously different things, but the current topology layout squeezes all of them into the same structure with information about: start node (assigned for all types, but not needed for points, centroids, kernels)
end node (used for lines and boundaries, otherwise unused)
area to left (for boundary, area for centroid, unused for all other types)
area to right (for boundary, unused for all other types)
3D bounding box (completely redundant for points, centroids, kernels)
offset (into coor file)
type (point, line, boundary, centroid, face, or kernel)

Proposed new layout

the coor file would hold the same types as before. To avoid confusion, all coordinate strings would be referred to as primitives (like in the output of current v.build), but that's just naming. IMHO anything but line is fine. A line can be a line or boundary or point or ... is too philosophical for my taste.

topology would have a separate data structure for each of
points
lines
boundaries
nodes (only needed for lines, boundaries, and faces)
centroids
areas
isles
faces
edges
volumes
holes

An additional small data structure would be needed that would be a boiled down replacement of current P_Line with information about primitives.

Similarly, a separate spatial index would be created for each type separately, instead of lumping all points, lines, boundaries, centroids, faces, and kernels into the same spatial index. It is more efficient with regard to time and space if separate spatial indices are maintained.

I'm reaching limits on what I can change in the vector libs without breaking compatibility, and I'm sometimes getting frustrated with the waste of time and space for large vectors. IIUR grass7 is an opportunity to introduce changes like these, so I hope to initiate a discussion and for more ideas on how to improve grass vector handling.

Regards,

Markus M

Change History (9)

comment:1 by mlennert, 16 years ago

As no one has ever reacted to this, I will now: I'm not a real programmer, so won't be able to say much about the details, but from my limited experience, what you propose sounds very reasonable and I encourage you to go on in that direction.

Moritz

comment:2 by martinl, 15 years ago

Cc: martinl added
Priority: minormajor

in reply to:  description ; comment:3 by martinl, 15 years ago

Replying to mmetz:

face (3D boundary, not yet implemented)
edges (3D areas, not yet implemented)

Just very little note: shouldn't be

  • face 3D area

and

  • edge 3D boundary

?

Thanks, Martin

in reply to:  3 ; comment:4 by mmetz, 15 years ago

Replying to martinl:

Replying to mmetz:

face (3D boundary, not yet implemented)
edges (3D areas, not yet implemented)

Just very little note: shouldn't be

  • face 3D area

and

  • edge 3D boundary

?

You're right, yes. The documentation is a bit contradicting, e.g. in GRASS7 programmer's manual it says "Face and kernel are 3D equivalents of boundary and centroid...", but edge is commonly used as 2D/3D boundary (Vector network analysis, general graph theory). Thus for 3D we would have edges in the coor file, faces (3D areas) are going to be constructed from edges and volumes are going to be constructed from faces and kernels. Is this nomenclature ok?

Markus M

in reply to:  4 comment:5 by martinl, 15 years ago

You're right, yes. The documentation is a bit contradicting, e.g. in GRASS7 programmer's manual it says "Face and kernel are 3D equivalents of boundary and centroid...", but edge is commonly used as 2D/3D boundary (Vector network analysis, general graph theory). Thus for 3D we would have edges in the coor file, faces (3D areas) are going to be constructed from edges and volumes are going to be constructed from faces and kernels. Is this nomenclature ok?

Seems to be reasonable to me.

Martin

comment:6 by martinl, 9 years ago

I am not sure if it is still relevant to GRASS 7. Probably moving this ticket to milestone GRASS 8 would make better sense?

comment:7 by martinl, 9 years ago

Milestone: 7.0.07.0.5

comment:8 by martinl, 8 years ago

Milestone: 7.0.58.0.0

in reply to:  6 comment:9 by mmetz, 7 years ago

Resolution: fixed
Status: newclosed

Replying to martinl:

I am not sure if it is still relevant to GRASS 7. Probably moving this ticket to milestone GRASS 8 would make better sense?

The proposed changes for 2D topology have been implemented in GRASS 7. A new ticket should be opened for 3D topology if the need arises. Closing ticket.

Note: See TracTickets for help on using tickets.