
Version 3 (modified by pramsey, 14 years ago)


PostGIS Coding Notes

While going through the PostGIS code base, particularly in the ./postgis directory, you will often find yourself asking "what the hell is that?". There is a strange interplay between PostgreSQL memory management and the provided macros that can make it unclear what is going on behind the scenes.

PostgreSQL is the Man Behind the Curtain

All memory allocation in the ./liblwgeom directory should be done with lwalloc/lwfree/lwrealloc, and in the ./postgis directory with palloc/pfree/repalloc. Why so? At the core, memory management in PostGIS does not matter much, because PostgreSQL actually handles all memory in a "hierarchical memory manager". That means PostgreSQL is the one calling the actual system malloc, and your palloc calls are managed by PostgreSQL inside larger regions ("contexts") that it mallocs. Each time a function is called, PostgreSQL creates a new memory context and all palloc/pfree calls happen in that context; when the function call is complete, the whole context is discarded, which means any extra memory you failed to free is discarded too. Basically all your memory management is for naught, because PostgreSQL will clean up after the end of the function call.
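The "discard the whole context" idea can be sketched in plain C. This is a toy model only, not PostgreSQL's allocator: ToyContext, toy_palloc, toy_context_reset and toy_leaky_call are hypothetical names invented for illustration, and the real memory contexts are considerably more elaborate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a memory context: it remembers every chunk it hands out
 * so that all of them can be released in one call. (Hypothetical names.) */
typedef struct ToyChunk {
    struct ToyChunk *next;      /* bookkeeping header placed before the payload */
} ToyChunk;

typedef struct ToyContext {
    ToyChunk *chunks;           /* singly linked list of live chunks */
} ToyContext;

/* Rough analogue of palloc(): allocate inside the given context.
 * The payload is only aligned for pointer-sized data in this sketch. */
void *toy_palloc(ToyContext *cxt, size_t size)
{
    ToyChunk *chunk = malloc(sizeof(ToyChunk) + size);
    assert(chunk != NULL);
    chunk->next = cxt->chunks;
    cxt->chunks = chunk;
    return chunk + 1;           /* caller's memory starts after the header */
}

/* Throw away everything allocated in the context; returns the chunk count. */
size_t toy_context_reset(ToyContext *cxt)
{
    size_t freed = 0;
    while (cxt->chunks != NULL) {
        ToyChunk *next = cxt->chunks->next;
        free(cxt->chunks);
        cxt->chunks = next;
        freed++;
    }
    return freed;
}

/* A "function call" that allocates three times and never frees anything. */
size_t toy_leaky_call(void)
{
    ToyContext cxt = { NULL };
    int *a = toy_palloc(&cxt, sizeof(int));
    int *b = toy_palloc(&cxt, sizeof(int));
    char *c = toy_palloc(&cxt, 32);
    *a = 1;
    *b = 2;
    c[0] = '\0';
    /* End of call: the context is reset and the three chunks go with it. */
    return toy_context_reset(&cxt);
}
```

Calling toy_leaky_call() releases all three chunks even though the function body contains no matching free calls, which is the behaviour the paragraph above relies on.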

The lwalloc/lwfree calls in liblwgeom are themselves palloc/pfree calls when made within the context of the PostgreSQL engine. However, they can also be called outside of PostgreSQL, for example in the shp2pgsql and pgsql2shp utilities: in those cases they are direct calls to malloc/free. So memory management matters more in ./liblwgeom than in ./postgis. In ./liblwgeom there might not be someone cleaning up behind you.
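One way to picture this dual behaviour is a pair of function pointers that default to malloc/free and can be redirected by the host program. The sketch below is an assumption about the general shape of such a hook mechanism, not a copy of liblwgeom: every toy_ name is hypothetical, and toy_backend_alloc/toy_backend_free merely stand in for palloc/pfree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void *(*toy_allocator)(size_t size);
typedef void  (*toy_freeor)(void *ptr);

/* Default to the C library, as a standalone utility would. */
static toy_allocator toy_alloc_hook = malloc;
static toy_freeor    toy_free_hook  = free;

/* The host (for example a database backend) installs its own handlers. */
void toy_set_handlers(toy_allocator allocator, toy_freeor freeor)
{
    toy_alloc_hook = allocator;
    toy_free_hook = freeor;
}

/* Library code always goes through these wrappers. */
void *toy_lwalloc(size_t size)
{
    void *ptr = toy_alloc_hook(size);
    assert(ptr != NULL);
    return ptr;
}

void toy_lwfree(void *ptr)
{
    toy_free_hook(ptr);
}

/* Stand-ins for palloc/pfree that count how often they are used. */
int toy_backend_allocs = 0;
int toy_backend_frees = 0;

void *toy_backend_alloc(size_t size)
{
    toy_backend_allocs++;
    return malloc(size);
}

void toy_backend_free(void *ptr)
{
    toy_backend_frees++;
    free(ptr);
}

/* Returns how many allocations were routed to the backend handlers. */
int toy_demo(void)
{
    int before;
    void *ptr;

    /* Standalone mode: plain malloc/free, the backend counters stay put. */
    ptr = toy_lwalloc(16);
    toy_lwfree(ptr);
    before = toy_backend_allocs;

    /* Embedded mode: the same wrappers now reach the backend handlers. */
    toy_set_handlers(toy_backend_alloc, toy_backend_free);
    ptr = toy_lwalloc(16);
    toy_lwfree(ptr);
    return toy_backend_allocs - before;
}
```

In standalone mode nobody resets a context behind you, so every toy_lwalloc needs its toy_lwfree.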

It's important not to abuse the PostgreSQL memory manager though, because sometimes our GIS calls use up a lot of memory inside just one function call. Below are some PostgreSQL macros and ./liblwgeom functions that are useful for memory management and/or just confusing in and of themselves.

PG_FUNCTION_INFO_V1(functionname)

This macro appears in front of every PostgreSQL C function. It ensures the function that follows is properly declared so that the database can pick it up out of the DLL/SO/DYLIB generated when PostGIS is compiled. Just copy and paste an example.

functionname(PG_FUNCTION_ARGS)

The actual arguments to a PostgreSQL function are various function contexts and database internals that we don't want to care about. All that stuff is hidden in this macro, and we use the PG_GETARG macros later on to retrieve the information we really want.

PG_GETARG

Most of the PG_GETARG calls are pretty self-explanatory. Each one consists of the PG_GETARG prefix, a suffix that declares the data type you are retrieving, and the number of the argument you are retrieving, counting from zero. So, PG_GETARG_INT32(0) gets the first argument as a 32-bit integer.
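To see why the function signature looks so odd, it can help to model the macros in miniature. In the real headers PG_FUNCTION_ARGS expands to a single call-info parameter named fcinfo, and the PG_GETARG macros read from it. The sketch below imitates that with hypothetical TOY_ macros and a deliberately simplified ToyCallInfo struct; it is not the PostgreSQL definition and leaves out NULL handling and everything else the real structure carries.

```c
#include <assert.h>
#include <stdint.h>

/* A Datum is a pointer-sized value that can carry an integer or a pointer. */
typedef uintptr_t ToyDatum;

/* Simplified call-info struct: just an argument count and the arguments. */
typedef struct ToyCallInfo {
    int      nargs;
    ToyDatum args[4];
} ToyCallInfo;

/* Hypothetical miniature versions of the PostgreSQL macros. */
#define TOY_FUNCTION_ARGS    ToyCallInfo *fcinfo
#define TOY_GETARG_DATUM(n)  (fcinfo->args[(n)])
#define TOY_GETARG_INT32(n)  ((int32_t) TOY_GETARG_DATUM(n))
#define TOY_RETURN_INT32(x)  return (ToyDatum) (int32_t) (x)

/* Written the way a PostgreSQL C function is written: one hidden parameter,
 * with the real arguments pulled out by position. */
ToyDatum toy_add(TOY_FUNCTION_ARGS)
{
    int32_t a = TOY_GETARG_INT32(0);
    int32_t b = TOY_GETARG_INT32(1);
    assert(fcinfo->nargs == 2);
    TOY_RETURN_INT32(a + b);
}

/* What the "database" side does: pack the arguments and make the call. */
int32_t toy_call_add(int32_t a, int32_t b)
{
    ToyCallInfo call = { 2, { (ToyDatum) a, (ToyDatum) b } };
    return (int32_t) toy_add(&call);
}
```

The point of the indirection is that every function has the same C signature, so the database can call any of them through one function pointer type.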

PG_DETOAST_DATUM(PG_GETARG_DATUM())

Datum? Every piece of information in the database is passed around as a Datum, which is a de-natured pointer. For big objects, like geography objects, the datum itself might not point directly to the object; it might point to a "TOAST tuple" which in turn points to where the data is stored. In order to access the whole object, we need to "de-TOAST" it. Hence we first get the datum for our argument object by its argument number, then de-toast it into a pointer. The pointer is untyped, so you will usually see this macro call in conjunction with a (TYPE*) cast to the appropriate pointer type.
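De-toasting either hands back the pointer it was given (nothing to do) or a freshly allocated copy, and that is why PG_FREE_IF_COPY exists: it frees the value only when it differs from the original argument. The following toy model shows that pointer comparison. ToyStored, toy_detoast, toy_free_if_copy and toy_roundtrip are hypothetical names, and the "toasted" flag is a stand-in for the real out-of-line or compressed storage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stored value: the flag says whether it must be expanded before use. */
typedef struct ToyStored {
    int  is_toasted;
    char payload[16];
} ToyStored;

/* Returns the input itself when no work is needed, otherwise a new copy. */
ToyStored *toy_detoast(ToyStored *datum)
{
    ToyStored *copy;

    if (!datum->is_toasted)
        return datum;

    copy = malloc(sizeof *copy);
    assert(copy != NULL);
    memcpy(copy, datum, sizeof *copy);
    copy->is_toasted = 0;
    return copy;
}

/* Frees ptr only if de-toasting produced a copy; returns 1 if it freed. */
int toy_free_if_copy(ToyStored *ptr, ToyStored *original)
{
    if (ptr == original)
        return 0;
    free(ptr);
    return 1;
}

/* De-toast an argument, use it, then release it the safe way. */
int toy_roundtrip(int is_toasted)
{
    ToyStored arg = { 0, "POINT(0 0)" };
    ToyStored *value;

    arg.is_toasted = is_toasted;
    value = toy_detoast(&arg);
    assert(strcmp(value->payload, "POINT(0 0)") == 0);
    return toy_free_if_copy(value, &arg);
}
```

Freeing unconditionally would be a bug in the first case, because the pointer still belongs to the caller.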

VARLENA

All PostGIS objects are "varlena": they don't have a fixed size. A polygon can have 4 points, or it can have 400. Similarly, text is variable length. All variable-length objects in PostgreSQL are required to start with 4 bytes of metadata, which are mostly used to declare the size of what follows. The PostgreSQL text type is instructive. Here's an example of turning a C string into a text object suitable for returning to the database.

const char *str = "my string";
text *result; /* not named "text", which would hide the type name */
size_t str_size = strlen(str);
/* We need space for both the string and the metadata header! Note, no space for null terminator! */
result = palloc(str_size + VARHDRSZ);
/* Use macro to write that size information into the header of the text object */
SET_VARSIZE(result, str_size + VARHDRSZ);
/* Copy from str into the data area of the text object, given by VARDATA macro */
memcpy(VARDATA(result), str, str_size);
/* Return using the text pointer return type. Can also return with PG_RETURN_POINTER. */
PG_RETURN_TEXT_P(result);
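The same layout can be tried outside the database with a toy header. In this sketch the first 4 bytes hold the total size (header included) as a plain uint32_t; the real PostgreSQL header packs extra flag bits into those bytes, so treat this as an illustration of the idea only. All toy_ names and TOY_VARHDRSZ are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_VARHDRSZ sizeof(uint32_t)

/* Write the total size (header + data) into the first 4 bytes. */
void toy_set_varsize(void *ptr, uint32_t total)
{
    memcpy(ptr, &total, sizeof total);
}

/* Read the total size back out of the header. */
uint32_t toy_varsize(const void *ptr)
{
    uint32_t total;
    memcpy(&total, ptr, sizeof total);
    return total;
}

/* The data area starts right after the header. */
char *toy_vardata(void *ptr)
{
    return (char *) ptr + TOY_VARHDRSZ;
}

/* C string -> toy varlena: header plus bytes, no null terminator stored. */
void *toy_cstring_to_varlena(const char *str)
{
    size_t len = strlen(str);
    void *v = malloc(len + TOY_VARHDRSZ);
    assert(v != NULL);
    toy_set_varsize(v, (uint32_t) (len + TOY_VARHDRSZ));
    memcpy(toy_vardata(v), str, len);
    return v;
}

/* Toy varlena -> C string: the length comes from the header, and the
 * terminator has to be added back by hand. */
char *toy_varlena_to_cstring(void *v)
{
    size_t len = toy_varsize(v) - TOY_VARHDRSZ;
    char *str = malloc(len + 1);
    assert(str != NULL);
    memcpy(str, toy_vardata(v), len);
    str[len] = '\0';
    return str;
}
```

Going the other way is where people usually trip: the data area is not null-terminated, so the length must always be taken from the header.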

An Example

PG_FUNCTION_INFO_V1(LWGEOM_shortestline2d);
Datum LWGEOM_shortestline2d(PG_FUNCTION_ARGS)
{
       PG_LWGEOM *result;
       /* Detoast the actual PgSQL varlena structures, in our case PG_LWGEOM (soon to be GSERIALIZED) */
       PG_LWGEOM *geom1 = (PG_LWGEOM*)PG_DETOAST_DATUM(PG_GETARG_DATUM(0));
       PG_LWGEOM *geom2 = (PG_LWGEOM*)PG_DETOAST_DATUM(PG_GETARG_DATUM(1));
       /* Build LWGEOM from the varlena (soon to be with lwgeom_from_gserialized) */
       LWGEOM *lwgeom1 = pglwgeom_deserialize(geom1);
       LWGEOM *lwgeom2 = pglwgeom_deserialize(geom2);
       LWGEOM *theline;

       if (lwgeom1->srid != lwgeom2->srid)
       {
               /* elog(ERROR) aborts the call and does not return, so nothing is needed after it */
               elog(ERROR, "Operation on two GEOMETRIES with different SRIDs");
       }

       theline = lw_dist2d_distanceline(lwgeom1, lwgeom2, lwgeom1->srid, DIST_MIN);
       if (lwgeom_is_empty(theline))
               PG_RETURN_NULL();

       /* Serialize the result back down from LWGEOM, but don't return right away */
       result = pglwgeom_serialize(theline);
       /* First free the LWGEOMs you used */
       lwgeom_free(lwgeom1);
       lwgeom_free(lwgeom2);

       /* Then call PG_FREE_IF_COPY on the *varlena* structures you originally received as arguments */
       PG_FREE_IF_COPY(geom1, 0);
       PG_FREE_IF_COPY(geom2, 1);

       /* And now return */
       PG_RETURN_POINTER(result);
}
