#944 closed enhancement (fixed)
typmod support for PostGIS geometry - make geometry_columns a view
Reported by: | robe | Owned by: | pramsey |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 2.0.0 |
Component: | postgis | Version: | master |
Keywords: | history | Cc: |
Description
Paul,
You said you are planning to put typmod for geometry, but didn't see a ticket in place for it.
Have you thought about all the side issues. I would be willing to help on that.
Issues I think need to be addressed:
What happens to geometry_columns. Do we keep it as a table and create a view called ST_Geometry_Columns (which is what I believe SQL/MM says it should be called anyway). This seems least invasive, though I see it posing problems for viewing and mapping tools.
What happens with AddGeometryColumns, Populate etc. Do they still register in geometry_columns and create constraints or do they do the typmod thing, register in geometry_columns (assuming its still a table) as well.
I think you asked me once about why typmod wouldn't work in all cases. One thought I have is (which I need to test), it won't work for inheritance models where the geometry column type is different for each child table.
The reason is I don't think geometry(POINT,...) and geometry(POLYGON,...) will be considered the same type. So the what we discussed in chapter 3 of our book won't work with typmod. Would still be better with the old model of adding constraints. I still still need to verify that though. But really don't see how it would work since for example varchar(20) and varchar(50) are not legal in the same inheritance column.
Anyway sorry to make this long and painful and raise all these issues.
Change History (43)
comment:1 by , 14 years ago
comment:2 by , 14 years ago
I was hoping to just do it the same way geography does. A simple geometry_columns view on the system tables. For backwards compatibility, the Add/Remove utility functions can be simplified down to wrappers on ALTER TABLE ADD COLUMN, etc.
comment:3 by , 14 years ago
Normally I'd be the first to say down with the past except in this case I'm not convinced the new way is always superior to the past. Then again I think about my frustrating arguments with Josh Berkus how he thinks people should just use text and put size checks on their fields. Feels funny being on the other side of the argument.
If we are going to make the old functions wrappers around ALTER TABLE, what am I going to do about my case when I want POLYGONS/MULTIPOLYGONS in the same table because in many cases, there isn't any reason to treat them separately. Also the case of inheritance. Now my work flow is so much more complicated. Before I would just add it as a MULTIPOLYGON and manually relax the type constraint to allow POLYGONS. Also the case with the inheritance thing I talked about.
I know Paul is thinking "Why is it always about you, Regina?". Cause I'm the first use case I can think of :).
comment:4 by , 14 years ago
Oh I forgot about the issue of all those people with existing tables -- with your model we would have to force them to alter all their existing tables to typmod explicit types if they wanted to still use geometry_columns. I suspect that would take a long time with large tables. Not to mention we would probably have to create a script to do that to make it less painful.
comment:5 by , 13 years ago
Do we want a painful migration or to be stuck with an ugly hack forever?
comment:6 by , 13 years ago
ugly hack. ugly hack. ugly hack. Ugly is in the eye of the beholder, but pain is felt by everyone :) Besides that's why I'm going for an ST_geometry_columns. That is what SQL/MM says it should be called. So let's call the pure solution that and then perhaps one day we can deprecate geometry_columns and then remove it.
I also don't want an old thing AddGeometryColumns camoflaging a new way. If we are going to push people to do it the easy way CREATE TABLE ... geometry(POINT,..);
Why confuse them with having the old do the new. At some point we might want to deprecate AddgeometryColumns so its best people not rely on it to do a new thing.
comment:8 by , 13 years ago
You mean all the addgeometry stuff works against old_geometry_columns (instead of geometry_columns_legacy). I'm indifferent about the name.
Just that geometry_columns be a view and should be a fuse of old and new.
comment:9 by , 13 years ago
How do you think the view can fuse the old and new? It'll necessarily find all the non-typmod'ed columns in the system tables along with the typmodded ones...
comment:10 by , 13 years ago
SELECT all geometry columns with a geometry type (not like geometry) UNION ALL geometry_columns_old
How about this Paul, you go do your typmod thing since that's a known we want and we have no differences in opinion how that should be done.
I'll do the view, the revision of all the management functions. I'll even write the pre install script that renames peoples existing geometry_columns should they have one and I'll submit this as a patch for you to tear apart and do what you will with. Deal?
BTW: we also need to change the loader. I think as a clean beginning it should check versions of PostGIS and for version of PostGIS 2.0+ it should use the standard CREATE TABLE ...geometry(..) instead of the management functions. Would be nice if we had a switch option to allow people to choose the old addgeometrycolumns or new typmod behavior and default to new behavior, but that should go in as a separate ticket I think.
comment:11 by , 13 years ago
Gosh - this all sounds really really horrible and likely to come back and bite us later. Hard.
It sounds as if we just need a generic "GEOMETRY" type we can put in the typmod that will allow any type of geometry within the column. If you're really foolish enough to be mixing geometry types in a single column then you're definitely not adhering to the OGC standard, so I don't see lack of adherence to the OGC specification being an issue here.
Oh and talking about OGC spec compliance - even if we do get rid of the geometry_columns table, we still need to keep the AddGeometryColumn() and DropGeometryColumn() functions, even if they don't actually end up doing anything.
comment:12 by , 13 years ago
Mark,
I think you missed my two main points 1) If we expect people tomove to PostGIS 2.0, there current tables built with AddGeometryColumn should work as they did before. It would be painful to have them alter their geometry columns to typmod. With that said -- the postgresql catalog will be wrong for these geometries.
2) For inheritance tables -- it is not unusual and I don't think bad practice to have each child have a separate geometry type (though it should have the same srid and dimension). typmod simply doesn't work for this model.
3) There are varying degrees -- e.g. I think for example Oracle doesn't mind POLYGON/MULTIPOLYGON coexisting. Neither do ESRI Shapefiles. When you have a polygon in must cases it is more efficient to keep it that way than wrapping it in a multi wrapper. typmod doesn't work for this either.
comment:13 by , 13 years ago
In these cases, my responses would be:
1) If we expect people tomove to PostGIS 2.0, there current tables built with AddGeometryColumn? should work as they did before. It would be painful to have them alter their geometry columns to typmod. With that said -- the postgresql catalog will be wrong for these geometries.
This is something that could be automated as part of an upgrade. A script can be run to check the uniqueness of the geometry types within a table - if they are unique then we can update the catalog to represent the correct value, otherwise we set the typmod to be a generic GEOMETRY instead.
And also - if we're going to make people do things like a dump/reload (which they will need with the new geometry format in 2.0) we might as well do other horrible things as part of the upgrade.
2) For inheritance tables -- it is not unusual and I don't think bad practice to have each child have a separate geometry type (though it should have the same srid and dimension). typmod simply doesn't work for this model.
Again, set the typmod to a generic GEOMETRY type but then add your own constraints on your child tables. I would say that mixing things in this way is an advanced level feature.
3) There are varying degrees -- e.g. I think for example Oracle doesn't mind POLYGON/MULTIPOLYGON coexisting. Neither do ESRI Shapefiles. When you have a polygon in must cases it is more efficient to keep it that way than wrapping it in a multi wrapper. typmod doesn't work for this either.
This one is more tricky - if it were me, I would just use MULTIPOLYGONS with one polygon for single shapes. When you say more efficient, what exactly do you mean here?
comment:14 by , 13 years ago
1) Automation is not what I was worrying about. I was more worried about speed. For example converting a varchar(50) to say a varchar(1000) takes a painful amount of time for a large table. OF course I may be over estimating this. I think if Paul goes ahead with the typmod, I can test with a 10 million record table converting to typmod to see just how painful it is.
I really think we should avoid updating the catalogs directly. too much has changed betweenn 8.4-9.1 that I wouldn't want to risk screwing it up. Besides the PostgreSQL devs really frown on those changes.
2) Okay I can concede to that. It's a bit inconvenient than using addgeometrycolumn but I agree it's a more advanced use case and rare at that.
3) I think a lot of the speed issues we have addressed for this. In the past it was significantly faster doing checks with polygons than multipolygons even when a multipolygon had only one polygon. This is probalby more an issue with third party tools than PostGIS.
comment:15 by , 13 years ago
Mark,
AddGeometryColumn is an OGC standard but seems to not be mentioned at all in the SQL/MM specs and in the OGC it does call it a table that needs registration. Same for DropGeometryColumn.
The SQL/MM spec (seciton ISO/IEC CD 13249-3:200x() I am looking at doesn't even have geometry_columns but talks about ST_GEOMETRY_COLUMNS (which is a view (living in ST_Information_Schema) "which lists the columns whose declared type is ST_Geometry or one of its subtypes;" ). Thus my suggestion that our pure solution that only looks at the system catalogs be called ST_Geometry_Columns and that geometry_columns be a hybrid.
One additional note. Given the amount of back and forth we have had on this topic, I think this one calls for an RFC of how exactly we propose to change this. I think typmod is great and we should go ahead with that so its really the question of how to manage the metadata. To me this is a fundamental change in the way people operate and the way we will be recommending people to operate so it needs more consideration and documentation.
I also think we should bring this up to discussion in regular postgis users group to get a better population metric of its impact and people's thoughts on the matter (particularly people managing viewers and other tools that use postgis). I'm only guessing based on my personal use cases, but I'm sure there are a lot more we haven't thought about.
strk also mentioned that INSPIRE would like to keep track of additional information beyond what is allowed in geometry_columns and making this a system catalog only query view will make that worse. True its a side issue, but thought I would bring it up as a topic for discussion.
comment:16 by , 13 years ago
Now i remember the use case why I have multipolygon/polygon. For one project I am working on where we have our own flex renderer, its a pain to output multipolygons especially since most of the records in the table are singular polygons. I have 9,000,000 records of polygons/multipolygon for example of which only 1000 are true multipolygons. Admittedly since we are writing our own renderer, the registration issue in geometry_columns isn't terribly important except when my client has to view these things like Safe FME - as I recall he couldn't see the tables in his FME workbench unless they were registered.
comment:17 by , 13 years ago
I think that if we went as far as having a hybrid solution, it would just cause even more chaos in the long run. As Paul mentions up-thread, do we want to be stuck with an ugly hack forever?
For point 1, converting a varchar(50) to a varchar(1000) can be done easily with a single catalog query because the internal representation doesn't change. In the same way that setting a typmod for a mixed column can be done in the same way because by changing from a sub-class to a super-class, we can guarantee that all of the criteria can be met and so we don't need to check every single row. We would have to visit every row if going the other way, but I don't think that's the common use case.
For point 3, I think we would need to see which tools have these issues in order to get a feel for whether or not this would be a real problem.
comment:18 by , 13 years ago
I think both you are Paul overestimate the length of forever. To me forever (in software terms) is at best 3 years. If you believe the doomsday sayers, it will be like a week from now.
A view can be easily changed whenever we want really, heck that's the point of having a view -- to hide the underlying complexity of your system and change it when you don't like it anymore. What you are proposing violates with fiddling with catalogs violates the PostgreSQL development policy and is much harder to backout than what I am proposing.
Anyrate my point is we don't really need to make this decision quite just yet. We want typmod, we are all in agreement with that, so we should go ahead with that piece and test applications and various scenarios to know which way to go with the second part.
comment:19 by , 13 years ago
I don't think changing the catalogs is violating the PostgreSQL development policy - for example, until reasonably recently the operator/operator class definitions were all done by inserting/updating the catalogs directly because the operator and operator class functions did not contain sufficient functionality at the time.
Sure you can shoot yourself in the foot if you don't know what you're doing with the catalogs, but I don't see much elevated risk performing an upgrade that involves running a pre-packaged, test script.
I agree with you that we should probably spin this out into a separate thread on postgis-users. If people are doing interesting things with their geometry columns, then that's where we'll find out about it.
comment:20 by , 13 years ago
I just thought about another issue with this. It won't handle complex views I don't think.
It will handle the simple view case of
SELECT ...geom FROM sometable;
But I don't think it will handle
SELECT ST_Transform(geom,4326) As geom FROM sometable;
Anyrate I'm going to push this to postgis-users since a similar topic came up.
comment:21 by , 13 years ago
Okay as I mentioned in hindsight -- the trick to make views appear correctly in our meta table is probably the same trick I use in other views for having them register correctly in information_schema etc.
SELECT ST_Transform(geom,4326)::geometry(POINT,4326)
etc. So I guess a bit of a false alarm there.
Honestly though -- I'm beginning to like my hybrid idea much better now that I have heard all of your dilbert ideas :)---. In fact I would say my hybrid is not an ugly duckling at all but quite pretty and is the only solution thus for that satifies everybody's needs.
Really what is so bad about having a UNION ALL or even an EXCEPT in a view. I do it all the time and it will work for everyone. So to clarify -- this is what I propose our geometry_columns look like
-- just for upgrading -- ALTER TABLE geometry_column RENAME TO geometry_columns_additional; -- end upgrading -- CREATE VIEW geometry_columns AS SELECT ... FROM pg_catalogs blah blah blah WHERE geometry_typmod_type(a.atttypmod) != 'Geometry' UNION ALL SELECT .. FROM geometry_columns_additional;
BTW Mark -- I don't care what you say -- I am still vehemently against screwing around with pg_catalog tables regardless of how much we think we know about them. All we need is someone coming back yelling at us because we screwed up their data because they had some strange setup we didn't account for.
comment:22 by , 13 years ago
Paul, I thought about another way of doing this that perhaps won't be quite as offensive to you and Mark and won't require a side-line table.
First to summarize, typmod doesn't work in some of my use cases like the example I described where I have an inheritance model like
features (people query from here if they want all types)
features_poly (here for just polys and so forth) features_point features_lines
However for those child tables, I still have srid and geometry type constraints on them which can be read from the constraints catalogs. So my second proposition is:
Forget about the side line table, but have geometry_columns check both typmod and constraints. I think it might be slightly slower, but definitely faster than what I think QGIS does of inspecting all the tables with geometries (when you choose check all tables).
This would solve my other issue of when I restore my data schema over a new version of postgis having to repopulate my geometry_columns table.
Now for views -- sadly I think people would have to rebuild some of their views with a geometry typmod cast so that they are properly registered in the catalogs. I don't think that's bad though because you can be assured your geometry_columns is always consistent with your view without having a second registration step. The number of views people have to put a typmod on is few. The reason being is that unless you are applying a function to the geometry column, PostgreSQL uses the underlying table type for the type of the column. So even view people should be happier.
As far as strk's need -- I think that is best served with something else. Additional columns aren't defined in SQL/MM nor geometry_columns and I don't think we should introduce foreign things like that. Perhaps another table like geometry_columns_more_detail (well can't think of a name) -- that would just have the additional meta data and then a view that joins the geometry_columns with this table will suffice. Heck since we can make view updatable we can easily create such a left join thing and would only require filling in the additional information not present in geometry_columns.
comment:23 by , 13 years ago
Paul,
Do you realistically think you will get to this in a one month time frame? It's a nice to have feature, but given that there are so many loose what-if ends, I'd just assume punt this to the future.
I still have hopes of making the PostgreSQL 9.1 cut or close to it so we are not stuck with a lot of people running 1.5 on 9.1. To me that means we really got to focus on the nindex and underlying structural changes you have even if it means forgoing everything else. I thought about what strk said about what about a 1.6. Honestly with all the raster and 3D stuff we've got going already, I think it would be a farce to call it a 1.6 even if it does mean that people can do a soft upgrade. I also feel that if we release a 1.6 with all these features, much fewer people will care to upgrade to PostGIS 2.0 when we finally release it which would be a real shame.
comment:24 by , 13 years ago
Well, was almost as easy as expected... at r7409. None of the supporting PL/PgSQL functions has been touched, and right now geometry_columns is still a table while geometry_columns_v is the view. The view definition needs a slight fix as it is currently picking up a couple composite types as well as tables and views with geometry. Rubber will hit the road in deciding how to ditch the table.
comment:25 by , 13 years ago
Paul,
hmm f_geography_column -- REALLY? Me thinks you copied too much.
Anyrate once I load up some data -- I'll submit patches so don't worry about it unless you feel the urge to correct.
I was thinking for the maintenance functions, we should introduce a default argument -- call it -- use_typmod which defaults to true.
So new behavior would be to use typmod and if someone explicitly passes in false, it will use the old behavior. That will take care of my esoteric needs and I think keep the behavior that is most suitable for most people.
I'll revise the view too -- trying out my idea on a database with a lot of tables to see if the speed is acceptable.
comment:27 by , 13 years ago
Okay I've committed my uncontroversial change which corrects the f_geometry_column and also filters to only show tables and views.
For my more controversial ones, you want me to just include as patch to this ticket or just commit and you can remove or alter if you disapprove?
comment:29 by , 13 years ago
Paul - which approach did you go for in the end? I can't work out from your comment in this ticket or the commit message which solution you went for :(
comment:30 by , 13 years ago
Mark,
We haven't yet -- that's what he meant by "Rubber will hit the road in deciding how to ditch the table. "
My somewhat controversial solution -- which I will submit soon is.
1) Have no table just a view geometry_columns 2) The view will use basically Paul's geometry_columns_v approach for geometries with type defined via typmod.
3) For older geometries and for people that can't use typmod typed for whatever reason -- it will reads the types from the contraints. I'm creating a function to do that, so basically the view will be like what Paul has except instead
COALESCE(NULLIF(geometry_typmod_type(a.atttypmod), 'Geometry'), read_type_from_constraint_goes_here(schema,table,column), 'Geometry') As ...
The COALESCE(NULLIF will essentially short-circuit so people using the typmod approach should not experience any speed penalty. I haven't written the function to see how much if any of a speed penalty there would be for people with check constraint based type conformance.
comment:31 by , 13 years ago
Okay I committed the change I discussed abovee at r7450
There is one issue that is bugging me. That is our new types do not display in a backwards compatible way. As far as speed goes, I have to test more, but in a database of 30 geometry columns most of which wer defined using contraints, the speed to output geometry_columns_v was under 23ms. I'll do a more straineous test of generating thousands of tables.
Getting back to what concerns me -- here is my basic test:
CREATE TABLE test_geom_typmod(gid serial primary key, geom geometry(POINTZ,26986)); CREATE TABLE test_geom_constraint(gid serial primary key); SELECT AddGeometryColumn ('public','test_geom_constraint','geom',26986,'POINT',3); SELECT AddGeometryColumn ('public','test_geom_constraint','geomm',26986,'POINTM',3); ALTER TABLE test_geom_typmod ADD COLUMN geomm geometry(POINTM,26986); CREATE OR REPLACE VIEW vw_test_geom_typmod AS SELECT * FROM test_geom_typmod; CREATE OR REPLACE VIEW vw_test_geom_constraint AS SELECT * FROM test_geom_constraint; CREATE OR REPLACE VIEW vw_test_geom_constraint_cast AS SELECT gid, geom::geometry(POINTZ,26986) As geom, geomm::geometry(POINTM,26986) As geomm FROM test_geom_constraint;
Of which the results of my query:
SELECT f_table_schema, f_table_name, f_geometry_column, coord_dimension, srid, type FROM geometry_columns_v WHERE f_table_schema = 'public';
yield:
f_table_name | f_geometry_column | coord_dimension | srid | type ------------------------------+-------------------+-----------------+-------+---------- raster_columns | extent | 2 | 0 | Geometry test_geom_constraint | geom | 3 | 26986 | POINT test_geom_constraint | geomm | 3 | 26986 | POINTM test_geom_typmod | geom | 3 | 26986 | PointZ test_geom_typmod | geomm | 3 | 26986 | PointM vw_test_geom_typmod | geom | 3 | 26986 | PointZ vw_test_geom_typmod | geomm | 3 | 26986 | PointM vw_test_geom_constraint | geom | 2 | 0 | Geometry vw_test_geom_constraint | geomm | 2 | 0 | Geometry vw_test_geom_constraint_cast | geom | 3 | 26986 | PointZ vw_test_geom_constraint_cast | geomm | 3 | 26986 | PointM
I fear this change in casing will break applications. I can almost forgive the difference between PointZ and POINT since most tools we work with are currently 2D anyway.
The other change I made is I cast all the columns so the are the same type and size as teh original geometry_columns. Before they were outputting objects of type "name" which I can just see a bunch of things coughing up on.
comment:32 by , 13 years ago
FYI: I think I can fix the view issue with constraint based non-adorned geometries, but it's a given that for more complex views to be registered properly(including those using typmod geometries) some type casting will need to be done in the view definition.
comment:33 by , 13 years ago
BTW: I'm still for having both a geometry_columns and an st_geometry_columns, but my reason for both has changed. As I said the st_geometry_columns is defined in the SQL/MM specs so if our objective is to move in that direction, I think we should have it.
However, I now feel both should be views, but that geometry_columns should return the old form (uppercase -- POINT, POINT,POINT, POINTM), and st_geometry_columns should return the new format Point,PointZ, PointZM, PointM (or should that be ST_Point... I forget).
comment:34 by , 13 years ago
I'm fairly strenuously opposed to st_geometry_columns, in any form. We're hopefully jettisoning old scruft at 2.0, because we know we're going to have to live with 2.0 for a long time. And I don't want to have to explain two metadata tables to everyone I teach PostGIS to for the next 10 years. I also feel that enough software uses and expects an un-prefixed geometry_columns that keeping it is preferable to switching to table names that are not known or used.
comment:35 by , 13 years ago
Fair enough. I had my reservations as well especially since we don't have an ST_Information_Schema either.
One thing that does have to be decided though, is it going to be POINT (4 dimensions in the table), or PointZ, PointZM or Point Z, Point ZM, PointM vs. POINTM (just a casing issue). I think we have to decide on which way we go and stick with that for both typmod constrained as well as constraint based constrained geometry. I don't want to have to explain why one shows PointZ and one shows POINT. I'm fairly indifferent since most people don't use ZM that much in older versions and aside from casing the others are essentially the same name. I would like to keep the uppercase though just for backward compatibility.
Also is my general approach fine with everyone. If it is, was planning to move to change the functions -- which means making populate_geometry not add or delete any records from geometry_columns(just add constraints to tables or convert column to typmod), AddGeometryColumn same deal - will just add a new column. Each I would replace with a new function that takes an optional argument use_typmod which we can default to true.
comment:36 by , 13 years ago
Yes, agree with your general approach, and would like to go with mixed-case types with no ST_ prefix (eg MultiPolygonZ) for type information, because I just think it's prettier.
comment:37 by , 13 years ago
Okay will do that then and then test with QGIS to see if it coughs up blood at the sight of change in casing. This could be more indepth breaking change even for 2D, but it's prettier :)
comment:38 by , 13 years ago
I think we'll have enough of these that we need to separately document them in some detail. See ./MIGRATION
comment:39 by , 13 years ago
Sorry to revisit this, but I just noticed (duh) that geometry_columns as a coord_dimension
column which partially removes the need for the
type
column to include dimensionality signifiers like ZM. So we could retain (backwards compatible) ZM-less uppercase types. Unfortunately there is another side to this which says that the
coord_dimension
only gives number of ordinates, but doesn't distinguish between XYZ and XYM.
comment:40 by , 13 years ago
That's why we have in old model POINTM etc.
So POINT in old was for XY, XYZM and POINTM was for the POINT XYM case.
So you saying I should keep as is. But I thought you wanted things to look pretty so you were going to change the casing which breaks backward compatibility anyway
comment:41 by , 13 years ago
Summary: | typmod support for PostGIS geometry → typmod support for PostGIS geometry - make geometry_columns a view |
---|
comment:43 by , 13 years ago
Keywords: | history added |
---|
My gut feeling says we should keep AddGeometryColumn, populate etc. more or less as is so they still do the add constraint thing and register into geometry_columns or a new table called
geometry_columns_legacy;
So here is my thought for merging the old with the new world.
st_geometry_columns -- a view parallel in concept to geography_columns that just inspects the catalogs -- perhaps just leaves out geometry columns with no typmod info (or includes them I guess marking them as just geometry). this is a true SQL/MM implementation.
geometry_columns -- a view that unions st_geometry_columns (except for those just marked as geometry) with geometry_columns_legacy. So old applications can continue to work as they did before.
The populate, addgeometrycolumns etc. will just populate geometry_columns_legacy. We will expect all people expecting the new behavior to use the typmod behavior for creating geometries. It would also only add constraints to columns that don't use typmod construction.
Any thoughts / issues people see?