PostGIS Coding Notes

While going through the PostGIS code base, particularly the ./postgis directory, you will often find yourself asking "what the hell is that?". There is a strange interplay between PostgreSQL memory management and the macros it provides, which can make it unclear what is going on behind the scenes.

PostgreSQL is the Man Behind the Curtain

All memory allocation should go through the allocator that matches the directory you are working in: lwalloc/lwfree/lwrealloc in ./liblwgeom, palloc/pfree/repalloc in ./postgis, rtalloc/rtdealloc/rtrealloc in ./raster/rt_core, and palloc/pfree/repalloc in ./raster/rt_pg. Why so? At the core, memory management in PostGIS does not matter because PostgreSQL actually handles all memory in a "hierarchical memory manager" behind the scenes. That means PostgreSQL is the one calling the actual system malloc, and your palloc calls are managed by PostgreSQL inside larger pages ("contexts") that it mallocs. Each time a SQL function is called, PostgreSQL creates a new memory context and all your palloc/pfree calls happen in that context. When the SQL function call is complete, the whole context is discarded, which means any extra memory you failed to free (whoops!) is discarded too (phew!). Basically all your memory management is for naught, because PostgreSQL will clean up after the end of the function call.
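To make the "discard the whole context" idea concrete, here is a minimal sketch in plain C. It is a toy model and not PostgreSQL's implementation: DemoContext, demo_context_create(), demo_palloc(), and demo_context_delete() are hypothetical names invented for this illustration, and the real memory manager is far more elaborate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical chunk header. The extra union members only pad the
   header so the user data that follows it stays suitably aligned. */
typedef union DemoChunk {
    union DemoChunk *next;
    long double align_ld;
    long long align_ll;
} DemoChunk;

/* Hypothetical context: remembers every chunk it has handed out. */
typedef struct {
    DemoChunk *chunks;
} DemoContext;

DemoContext *demo_context_create(void)
{
    DemoContext *ctx = malloc(sizeof(DemoContext));
    assert(ctx != NULL);
    ctx->chunks = NULL;
    return ctx;
}

/* Allocate inside the context and link the chunk into its list. */
void *demo_palloc(DemoContext *ctx, size_t size)
{
    DemoChunk *chunk = malloc(sizeof(DemoChunk) + size);
    assert(chunk != NULL);
    chunk->next = ctx->chunks;
    ctx->chunks = chunk;
    return chunk + 1; /* user data starts right after the header */
}

/* Discard the context and everything allocated in it.
   Returns the number of chunks that were released. */
size_t demo_context_delete(DemoContext *ctx)
{
    size_t released = 0;
    DemoChunk *chunk = ctx->chunks;
    while (chunk != NULL)
    {
        DemoChunk *next = chunk->next;
        free(chunk);
        chunk = next;
        released++;
    }
    free(ctx);
    return released;
}

/* Stand-in for one SQL function call that never frees anything. */
size_t demo_function_call(void)
{
    DemoContext *ctx = demo_context_create();
    (void) demo_palloc(ctx, 64);
    (void) demo_palloc(ctx, 1024);
    (void) demo_palloc(ctx, 16);
    return demo_context_delete(ctx); /* whoops... phew! */
}
```

In this sketch the three "leaked" allocations are all reclaimed when the context goes away, which is the behaviour the paragraph above describes for a finished SQL function call.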

The lwalloc/lwfree calls in liblwgeom are themselves palloc/pfree calls when made within the context of the PostgreSQL engine. However, they can also be called outside of PostgreSQL, for example in the shp2pgsql and pgsql2shp utilities: in those cases they are direct calls to malloc/free. So memory management matters more in ./liblwgeom than in ./postgis. In ./liblwgeom there might not be someone cleaning up memory behind you.
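One way to picture an allocator that is malloc/free in a command-line tool but something else inside a host process is a pair of function-pointer hooks. The following is only a sketch of that pattern under assumed names (demo_alloc, demo_free, demo_set_handlers, and the demo_host_* functions are all hypothetical); it does not reproduce the liblwgeom source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void *(*demo_allocator)(size_t size);
typedef void (*demo_deallocator)(void *mem);

/* Defaults: plain malloc/free, as a standalone utility would use. */
static demo_allocator demo_alloc_hook = malloc;
static demo_deallocator demo_free_hook = free;

/* A host process can install its own handlers once at start-up. */
void demo_set_handlers(demo_allocator a, demo_deallocator f)
{
    if (a != NULL) demo_alloc_hook = a;
    if (f != NULL) demo_free_hook = f;
}

/* Library code only ever calls these two wrappers. */
void *demo_alloc(size_t size) { return demo_alloc_hook(size); }
void demo_free(void *mem) { demo_free_hook(mem); }

/* Stand-ins for a host-provided allocator; they count their calls. */
static int demo_host_calls = 0;
static void *demo_host_alloc(size_t size) { demo_host_calls++; return malloc(size); }
static void demo_host_free(void *mem) { demo_host_calls++; free(mem); }

/* Allocate once with the defaults, then once with host handlers.
   Returns how many calls reached the host handlers. */
int demo_run(void)
{
    void *p = demo_alloc(32); /* served by malloc */
    assert(p != NULL);
    demo_free(p);             /* served by free */

    demo_set_handlers(demo_host_alloc, demo_host_free);
    p = demo_alloc(32);       /* now served by the host allocator */
    assert(p != NULL);
    demo_free(p);
    return demo_host_calls;
}
```

The point of the design is that library code never needs to know which environment it is running in; only the start-up code that installs the handlers does.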

However, it is important not to abuse the PostgreSQL memory manager, because sometimes our GIS calls use up a lot of memory inside just one function call. Try to clean up as if you were not working inside a memory manager. Below are some PostgreSQL macros and ./liblwgeom functions that are useful for memory management, or that are simply confusing in and of themselves.

PG_FUNCTION_INFO_V1(functionname)

This macro appears in front of every PostgreSQL C function. It ensures the function that follows is properly declared, so that the database can pick it up out of the DLL/SO/DYLIB generated when PostGIS is compiled. Just copy and paste an example.

functionname(PG_FUNCTION_ARGS)

The actual arguments to a PostgreSQL function are various function contexts and database internals that we don't want to care about. All that stuff is hidden in this macro, and we use the PG_GETARG macros later on to retrieve the information we really want.

PG_GETARG

Most of the PG_GETARG calls are self-explanatory. Each one consists of the PG_GETARG prefix, a suffix naming the data type you are retrieving, and the zero-based number of the argument you are retrieving. So, PG_GETARG_INT32(0) gets the first argument as an integer.

 PG_FUNCTION_INFO_V1(multiply);
 Datum multiply(PG_FUNCTION_ARGS)
 {
   /* Argument 0 is a 32-bit integer, argument 1 is a float8 (a C double) */
   int32 count = PG_GETARG_INT32(0);
   float8 factor = PG_GETARG_FLOAT8(1);
   PG_RETURN_FLOAT8(factor * count);
 }

PG_DETOAST_DATUM(PG_GETARG_DATUM())

Datum? Every piece of information in the database is passed around as a Datum, which is a de-natured pointer. For big objects, like geography objects, the datum itself might not point directly to the object. It might instead point to a "TOAST tuple", which in turn points to where the data is stored. In order to access the whole object, we need to "de-TOAST" it: we first get the datum for our argument, then de-toast it into a pointer. The pointer is untyped, so you will usually see this macro call in conjunction with a (TYPE*) cast to the appropriate pointer type.

VARLENA

All PostGIS objects are "varlena": they don't have a fixed size. A polygon can have 4 points, or it can have 400. Similarly, text is variable length. All variable-length objects in PostgreSQL are required to start with 4 bytes of metadata, which are mostly used to declare the size of what follows. The PostgreSQL text type is instructive. Here's an example of turning a C string into a text object suitable for returning to the database.

char *str = "my string";
text *result; /* named "result" so the variable does not shadow the text type */
size_t str_size = strlen(str);
/* We need space for both the string and the metadata header! Note, no space for null terminator! */
result = palloc(str_size + VARHDRSZ);
/* Use macro to write that size information into the header of the text object */
SET_VARSIZE(result, str_size + VARHDRSZ);
/* Copy from str into the data area of the text object, given by VARDATA macro */
memcpy(VARDATA(result), str, str_size);
/* Return using the pointer return type. Can also return with PG_RETURN_POINTER. */
PG_RETURN_TEXT_P(result);
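
The size arithmetic above can be tried outside the database with a toy layout. This sketch assumes a plain 4-byte length header and uses hypothetical names (DemoVarlena, DEMO_VARHDRSZ, demo_text_from_cstring, demo_cstring_from_text). The real PostgreSQL header packs extra bits into those 4 bytes, so inside the database always go through SET_VARSIZE and VARDATA rather than touching the header directly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_VARHDRSZ 4 /* hypothetical: a plain 4-byte size header */

typedef struct {
    uint32_t total_size; /* header: size of header plus payload */
    char data[];         /* payload, NOT null terminated */
} DemoVarlena;

/* Build a variable-length object from a C string. */
DemoVarlena *demo_text_from_cstring(const char *str)
{
    size_t len = strlen(str);
    /* Room for header and characters only; no null terminator */
    DemoVarlena *v = malloc(DEMO_VARHDRSZ + len);
    assert(v != NULL);
    v->total_size = (uint32_t) (DEMO_VARHDRSZ + len);
    memcpy(v->data, str, len);
    return v;
}

/* Reverse direction: payload length is total size minus header,
   and the terminator has to be added back. */
char *demo_cstring_from_text(const DemoVarlena *v)
{
    size_t len = v->total_size - DEMO_VARHDRSZ;
    char *str = malloc(len + 1);
    assert(str != NULL);
    memcpy(str, v->data, len);
    str[len] = '\0';
    return str;
}
```

For the nine-character string "my string", the stored total size in this toy layout is 13: nine payload bytes plus the four-byte header.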

LWGEOM

LWGEOM is the working structure of the PostGIS algorithms. Because PostGIS is written in C, LWGEOM is not quite an abstract superclass of all the geometry variants, but it tries hard to be. Every LWGEOM variant can be cast to LWGEOM, and will retain common references to lwgeom->type, lwgeom->flags, and lwgeom->bbox. The type is an integer type number (POINT, LINE, POLYGON, etc). The flags member is a set of eight binary bits indicating the presence of Z or M dimensions, whether the coordinates are to be interpreted as geodetic, and so on. The bbox is a pointer to a GBOX which defines the extent of the geometry in up to 4 dimensions.

After the standard components, the LWGEOM variants start to differ. LWLINE, LWPOINT, TRIANGLE, and LWCIRCSTRING have a pointer to a POINTARRAY (the coordinates). POLYGON has a ring count, then a pointer to a list of POINTARRAY pointers (the rings). All other variants have a sub-element count, then a pointer to a list of LWGEOM pointers. For homogeneous collections, the sub-geometries may be appropriately typed, but from a memory-layout point of view, they are all identical.
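The "cast any variant to the common type, then dispatch on the type number" pattern can be sketched with toy structs. Everything here is hypothetical (DemoGeom, DemoLine, DemoCollection, the DEMO_* type numbers), and the layout differs from liblwgeom: this sketch embeds the common header as the first member of each variant, which is one way C guarantees the cast is valid, whereas the description above is of variants that share the same leading fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type numbers, for illustration only */
enum { DEMO_LINETYPE = 2, DEMO_COLLECTIONTYPE = 7 };

/* Common header: the part every variant shares */
typedef struct {
    uint8_t type;
    uint8_t flags;
    void *bbox;
} DemoGeom;

/* Line-like variant: header, then the coordinates */
typedef struct {
    DemoGeom base;   /* must stay the first member */
    double *points;  /* stand-in for a POINTARRAY */
    int npoints;
} DemoLine;

/* Collection-like variant: header, count, list of sub-geometries */
typedef struct {
    DemoGeom base;   /* must stay the first member */
    int ngeoms;
    DemoGeom **geoms;
} DemoCollection;

/* Works on any variant: look at the type, then cast back down. */
int demo_count_points(const DemoGeom *geom)
{
    switch (geom->type)
    {
    case DEMO_LINETYPE:
        return ((const DemoLine *) geom)->npoints;
    case DEMO_COLLECTIONTYPE:
    {
        const DemoCollection *col = (const DemoCollection *) geom;
        int total = 0;
        for (int i = 0; i < col->ngeoms; i++)
            total += demo_count_points(col->geoms[i]);
        return total;
    }
    default:
        return 0; /* unknown type: nothing to count */
    }
}
```

A collection holding a 3-point line and a 5-point line reports 8 points, without the caller ever knowing which concrete variants it was handed.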

All geometries are, at their core, collections of coordinates, just with different organizations and different interpretations between the coordinates (multi geometries versus single, or linear versus curved, or cartesian versus geodetic). The coordinates are managed inside the POINTARRAY type, in a "serialized_pointlist". When LWGEOMs are constructed from WKT or WKB (using ptarray_construct_copy_data()), each POINTARRAY has its own serialized_pointlist allocated and the coordinates are copied into that list. When LWGEOMs are constructed from database serializations (see GSERIALIZED and PG_LWGEOM below), the serialized_pointlist is not separately allocated. Instead it is a pointer into a position in the GSERIALIZED or PG_LWGEOM, constructed using ptarray_construct_reference_data().

Figure: Polygon with POINTARRAYs on top of database storage.

Figure: Polygon with independent POINTARRAY storage.

Regardless of whether a POINTARRAY is database-backed, or independently allocated, the lwgeom_free() and ptarray_free() functions should automatically "do the right thing", if the POINTARRAY was built using one of the constructors provided by the POINTARRAY API. When built with ptarray_construct_reference_data(), a POINTARRAY has its FLAGS_READ_ONLY bit set, which tells the ptarray_free() function to not free the underlying serialized_pointlist (because it is not managed by the POINTARRAY, but by the database).
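The ownership rule can be sketched as follows. This is a toy model with hypothetical names (DemoPointArray, DEMO_FLAG_READ_ONLY, demo_construct_copy, demo_construct_reference, demo_ptarray_free), assuming 2D coordinates stored as x,y pairs; it illustrates the idea and is not the liblwgeom code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_FLAG_READ_ONLY 0x01 /* hypothetical flag bit */

typedef struct {
    uint8_t flags;
    uint32_t npoints;
    double *coords; /* x,y pairs */
} DemoPointArray;

/* Owns its coordinates: allocates a private buffer and copies into it. */
DemoPointArray *demo_construct_copy(const double *src, uint32_t npoints)
{
    DemoPointArray *pa = malloc(sizeof(DemoPointArray));
    assert(pa != NULL && npoints > 0);
    pa->flags = 0;
    pa->npoints = npoints;
    pa->coords = malloc(2 * npoints * sizeof(double));
    assert(pa->coords != NULL);
    memcpy(pa->coords, src, 2 * npoints * sizeof(double));
    return pa;
}

/* Borrows its coordinates: points into memory someone else manages. */
DemoPointArray *demo_construct_reference(double *src, uint32_t npoints)
{
    DemoPointArray *pa = malloc(sizeof(DemoPointArray));
    assert(pa != NULL);
    pa->flags = DEMO_FLAG_READ_ONLY;
    pa->npoints = npoints;
    pa->coords = src;
    return pa;
}

/* Frees the struct, and the coordinates only if the array owns them.
   Returns 1 if the coordinate buffer was freed, 0 if it was left alone. */
int demo_ptarray_free(DemoPointArray *pa)
{
    int freed_coords = 0;
    if (!(pa->flags & DEMO_FLAG_READ_ONLY))
    {
        free(pa->coords);
        freed_coords = 1;
    }
    free(pa);
    return freed_coords;
}
```

Because the flag is set by the constructor, callers can free both kinds of array with the same call and never have to remember which kind they were given.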

LWGEOM vs PG_LWGEOM vs GSERIALIZED

PG_LWGEOM and GSERIALIZED are both PostgreSQL varlena types. They have 4 bytes of metadata about size, then a payload. The payload of the PG_LWGEOM is described in the ./postgis/SERIALIZED_FORM document, and the payload in the GSERIALIZED is described in ./liblwgeom/gserialized.txt. Basically they are serializations of a geometry, similar to well-known binary (WKB).

LWGEOM is a struct, with much more internal structure than the varlena byte strings. It has a type number and some flags. Its coordinates are represented by one or more pointers to POINTARRAY structs or, if it is a collection type, its sub-geometries are represented by more pointers to LWGEOM structs. The ./liblwgeom directory is a library for working with LWGEOM structs.

You can get LWGEOM from PG_LWGEOM using pglwgeom_deserialize(), and you can get a PG_LWGEOM from an LWGEOM using pglwgeom_serialize().

You can get LWGEOM from GSERIALIZED using lwgeom_from_gserialized() and you can get a GSERIALIZED from an LWGEOM using gserialized_from_lwgeom().

Once you have instantiated an LWGEOM, you should remember to free it with lwgeom_free(). In the case where you want to free the LWGEOM struct and bounding box, but leave the sub-geometries or POINTARRAYs intact, use lwgeom_release().

PG_FREE_IF_COPY()

As noted above, when you PG_DETOAST_DATUM, you create an in-memory representation of the varlena. Sometimes this is a copy and sometimes it is a direct pointer to the original tuple. You can free the copy with PG_FREE_IF_COPY(varlena_pointer, argumentnumber). So, if you detoasted argument zero into a varlena named foo, you'd call PG_FREE_IF_COPY(foo, 0) before exiting the function. See the example below.
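The "copy or direct pointer" distinction comes down to a pointer comparison, which the following toy model tries to show. All names are hypothetical (DemoValue, demo_detoast, demo_free_if_copy), and the "compressed" flag is just a stand-in for whatever makes the stored value unusable as is.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int compressed;   /* hypothetical: 1 means it must be expanded first */
    char payload[16];
} DemoValue;

/* Returns the argument itself when it is usable as is,
   otherwise a freshly allocated, expanded copy. */
DemoValue *demo_detoast(DemoValue *arg)
{
    DemoValue *copy;
    if (!arg->compressed)
        return arg; /* direct pointer, nothing allocated */
    copy = malloc(sizeof(DemoValue));
    assert(copy != NULL);
    *copy = *arg;
    copy->compressed = 0;
    return copy;
}

/* Free only when detoasting produced a new allocation.
   Returns 1 if memory was freed, 0 if the pointer was the original. */
int demo_free_if_copy(DemoValue *ptr, DemoValue *original)
{
    if (ptr != original)
    {
        free(ptr);
        return 1;
    }
    return 0;
}
```

Freeing unconditionally would be a bug in the direct-pointer case, since that memory belongs to the caller; comparing against the original argument is what makes the conditional free safe.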

An Example

Note the order in which things are allocated and freed.

  1. We get our pointers to the varlena objects
  2. We deserialize them into LWGEOM
  3. We do our work
  4. We serialize our result back to a varlena
  5. We free our LWGEOM
  6. We conditionally free our input varlena
  7. We return

PG_FUNCTION_INFO_V1(shortestline2d);
Datum shortestline2d(PG_FUNCTION_ARGS)
{
       GSERIALIZED *result;
       LWGEOM *theline;

       /* Detoast the actual PgSQL varlena structures, in our case GSERIALIZED */
       GSERIALIZED *geom1 = (GSERIALIZED*)PG_DETOAST_DATUM(PG_GETARG_DATUM(0));
       GSERIALIZED *geom2 = (GSERIALIZED*)PG_DETOAST_DATUM(PG_GETARG_DATUM(1));

       /* Build LWGEOM from the varlena with lwgeom_from_gserialized() */
       LWGEOM *lwgeom1 = lwgeom_from_gserialized(geom1);
       LWGEOM *lwgeom2 = lwgeom_from_gserialized(geom2);

       /* SRID consistency test; elog(ERROR) aborts the call and does not return */
       if (lwgeom1->srid != lwgeom2->srid)
       {
               elog(ERROR,"Operation on two GEOMETRIES with different SRIDs");
               PG_RETURN_NULL(); /* never reached */
       }

       /* Calculate the shortest line, returned as an LWGEOM */
       theline = lw_dist2d_distanceline(lwgeom1, lwgeom2, lwgeom1->srid, DIST_MIN);
       if (lwgeom_is_empty(theline))
               PG_RETURN_NULL();

       /* Finished with the input LWGEOMs */
       lwgeom_free(lwgeom1);
       lwgeom_free(lwgeom2);

       /* Serialize the result back down from LWGEOM, but don't return right away */
       result = geometry_serialize(theline);

       /* Finished with the derived LWGEOM */
       lwgeom_free(theline);

       /* Then call free_if_copy on the *varlena* structures you originally get as arguments */
       PG_FREE_IF_COPY(geom1, 0);
       PG_FREE_IF_COPY(geom2, 1);

       /* And now return */
       PG_RETURN_POINTER(result);
}
Last modified on 05/24/12 10:36:36
