#3031 closed defect (fixed)
"POINT EMPTY" as WKB defaults to "MULTIPOINT EMPTY" when restoring with copy and failing validation
Reported by: | jeanchristophesaville | Owned by: | pramsey |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 2.1.6 |
Component: | postgis | Version: | 2.1.x |
Keywords: | POINT EMPTY, MULTIPOINT EMPTY | Cc: |
Description
Using version "POSTGIS="2.1.2 r12389" GEOS="3.4.2-CAPI-1.8.2 r3921" PROJ="Rel. 4.8.0, 6 March 2012" LIBXML="2.6.26" LIBJSON="UNKNOWN""
I have a table with a column definition of: ALTER TABLE "myTable" ADD COLUMN point_as_4326 geometry(Point,4326);
In which I will store 'POINT EMPTY' for invalid point geometries
However when I try and restore a previous dump file to this table I get the following error.
COPY failed for table "position": ERROR: Geometry type (MultiPoint) does not match column type (Point)
Upon further investigation I found that my version of PostGIS seems to produce the same WKB for 'POINT EMPTY' and 'MULTIPOINT EMPTY' which in turn fails the column validation of geometry(Point,4326);
To demonstrate the problem behaviour:
SELECT ST_GeomFromText('POINT EMPTY') --> producess '010400000000000000'
SELECT ST_GeomFromText('MULTIPOINT EMPTY') --> producess '010400000000000000' SELECT ST_GeomFromText('POINT EMPTY') = ST_GeomFromText('MULTIPOINT EMPTY') --> produces true
SELECT ST_AsText('010400000000000000') --> produces 'MULTIPOINT EMPTY' -- I expect 'POINT EMPTY' here
SELECT ST_AsText(ST_GeomFromText('POINT EMPTY')) --> However produces 'POINT EMPTY'
It seems that there are explicit constants for various empty geometry types as per the following SELECT ST_AsText('010200000000000000') --"LINESTRING EMPTY" SELECT ST_AsText('010300000000000000') --"POLYGON EMPTY" SELECT ST_AsText('010400000000000000') --"MULTIPOINT EMPTY" SELECT ST_AsText('010500000000000000') --"MULTILINESTRING EMPTY" SELECT ST_AsText('010600000000000000') --"MULTIPOLYGON EMPTY" SELECT ST_AsText('010700000000000000') --"GEOMETRYCOLLECTION EMPTY" SELECT ST_AsText('010800000000000000') --"CIRCULARSTRING EMPTY" SELECT ST_AsText('010900000000000000') --"COMPOUNDCURVE EMPTY"
However I'm not sure what happened to "POINT EMPTY" which I haven't come accross yet
In version "POSTGIS="1.5.8" GEOS="3.4.2-CAPI-1.8.2 r3921" PROJ="Rel. 4.8.0, 6 March 2012" LIBXML="2.6.26" USE_STATS (procs from 1.5 r5976 need upgrade)" SELECT ST_GeomFromText('POINT EMPTY') -- produces '010700000000000000' = "GEOMETRYCOLLECTION EMPTY"
All of the above to ask why are we not using an explicit POINT EMPTY value and how can I get my previous backup restored without failing geometry type validation.
Change History (30)
comment:1 by , 10 years ago
Milestone: | Management 2.0 → PostGIS 2.1.6 |
---|
comment:2 by , 10 years ago
comment:3 by , 10 years ago
Every type except point as a WKB representation that goes
endian[byte] + typenumber[integer] + numpoints[integer]
those are the patterns you're seeing for the other types. endian, type, and then a zero, indicating the object has zero elements or zero points
trouble is, the wkb for point is
endian[byte] + typenumber[integer] + x[double} + y[double]
there's nowhere to put in the information that the point is actually empty
our internal representation is slightly fluffier, but in order to retain backwards compatibility with postgis 1.x when we moved to 2.x we kept the same canonical representation... this allowed things like apps (mapserver, geoserver, fme, etc) to keep on functioning w/o changes against 2.0 which was great. the only problem we've had is this compromise we had to make on POINT EMPTY
probably the "best" fix would be to stop creating POINT EMPTY in *any* eventuality and always create a MULTIPOINT EMPTY if called upon for an empty pointy thing
comment:4 by , 10 years ago
Hm, other alternative representations for POINT EMPTY:
- POINT(NaN NaN)
- NULL
Neither is ideal, but each at least matches the type of the original, so wouldn't cause the constraint failure we have here.
comment:5 by , 10 years ago
POINT(NaN NaN) is good for me. At least there's another known implementor of it: http://lists.osgeo.org/pipermail/gdal-dev/2013-December/037734.html
I don't think this should go in 2.1 though.
comment:6 by , 10 years ago
On the one hand, it's a "big" change, on the other hand, without it, folks on the current stable branch cannot dump/restore a point table that has an EMPTY in it. That feels like a bug, and worthy of fixing here.
comment:7 by , 10 years ago
Changing the WKB to "POINT NAN NAN" won't help with restoring existing dumps, so we'll still need support from postgis_restore.pl to do that. Or, for those NOT using postgis_restore.pl, we might need to make the constraint looser (add OR ST_IsEmpty() ?)
comment:8 by , 10 years ago
Not for existing dump files, no, but it's not like people aren't out there running backups on their 2.1 databases every day. The sooner we fix this, the sooner it is fixed.
comment:10 by , 10 years ago
I should add why I think this is important to get in NOW -- we can't count on constraints since most people use typmod now (like this guy) and besides there are so many ways people restore their data that we can't expect people to have postgis_restore.pl
comment:11 by , 10 years ago
OK, a patch for this at r13359. The patch is actually a suggestion from strk, to just allow MULTIPOINT EMPTY inputs a special pass through the typmod checker. It turned out to be less invasive code than I feared, so I went for that. The reason I did was to avoid emitting POINT(Nan nan) for our WKB output, which could potentially break downstream applications not expecting such a fun construct.
Question: for trunk, do we go to POINT(nan nan) or keep our funny little hack in place?
comment:12 by , 10 years ago
OK, so that fix actually breaks Regina's carefully constructed all-on-all regression test for typmod, which is so complex I cannot change it myself. I backed it out. Comments?
comment:15 by , 10 years ago
I think you are giving me way too much credit. I don't think I wrote that. Looks more like strk's craftmanship.
comment:17 by , 10 years ago
I looked at the regress and your failures here:
http://debbie.postgis.net:8080/job/PostGIS_Regress/5936/consoleFull
I guess the question is if we need to change the test or just the regress output or your code. I thinkyou need to change your code (as its still not right, but the test output needs to be changed as well since it assumes screwed up behavior which we are trying to fix)
So here is how you read it. ( If I read this right, then I've probably been spending way too much time talking with strk :) )
pattern is: table_to_insert|srid_of_geom|type_of_geom|zmflag_geom|out_status
for out_status: OK means text insert pass KO means text insert fail BOK is binary insert pass (binary is from a COPY proces so the equivalent of our restore) BKO is binary insert failure GOK is text geography pass, GKO: geography plain text failure etc.
and sample of putput from debbie
-point|0|Point|0|KO-BKO-GKO:-BGKO +point|0|Point|0|OK-BKO-GOK-BGKO -pointm4326|0|Point|1|KO-BKO-GKO:-BGKO +pointm4326|0|Point|1|OK-BKO-GOK-BGKO
So from the above assuming I am reading these okay, I'm a little concerned about the fact that you can do a text rep insert but not a binary as I thought this was supposed to fix that issue.
I could very well be misreading this. But I would expect the answers to be for new not the same as the old, but not what you are emitting
--this is what it should be point|0|Point|0|OK-BOK-GOK-BGOK -- this is what the old returns point|0|Point|0|KO-BKO-GKO:-BGKO -- this is still wrong (assume this is with your NAN NAN) point|0|Point|0|OK-BKO-GOK-BGKO
To quickly test this is what I get with current which is wrong behavior (but what current emits)
CREATE SCHEMA tm; CREATE TABLE tm.point(id serial, g geometry(point), gg geography(point); -- note this is mimicking strk's test without resorting to backup and restore (this test is a bit flawed as you really aren't testing POINT EMPTY but POINT EMTPY casted to geomety casted as text to unknown so it's dumb luck the answers look agreeable -- meh KO -- ERROR: Geometry type (MultiPoint) does not match column type (Point) ********** Error ********** INSERT INTO tm.point(g) VALUES('POINT EMPTY'::geometry::text || ''); -- meh GKO INSERT INTO tm.point(gg) VALUES('POINT EMPTY'::geography::text || ''); -- meh KO, BGKO -- ERROR: Geometry type (MultiPoint) does not match column type (Point) INSERT INTO tm.point(g, gg) VALUES('POINT EMPTY'::geometry::bytea, 'POINT EMPTY'::geography::bytea);
comment:18 by , 10 years ago
Hey, Regina, thanks for this in-depth description of the test. I'm the author but I swear it was useful for me to read your documentation. Could you maybe write it (if not already present) in the typmod.sql test file, for future reference ?
I was wondering, reading your description, what is the ":" symbol for ? In
-pointm4326|0|Point|1|KO-BKO-GKO:-BGKO +pointm4326|0|Point|1|OK-BKO-GOK-BGKO
comment:19 by , 10 years ago
btw, I agree about updating the test to STOP expecting the bogus behaviour of being unable to restore point empty
comment:20 by , 10 years ago
strk you have in your code :-BGKO (as signifier of binary geography fail) I wasn't sure why you put a : there maybe its a typo in the code we should remove.
comment:21 by , 10 years ago
looked, I agree it's a typo, to be removed once we change expected results.
comment:22 by , 10 years ago
comment:23 by , 10 years ago
Looking at the currently expected output, I see issues with both point and multipoint.
A sane expectance, for a (polygon) typmoded geometry/geography, is to accept only 2d polygons, of any srid:
polygon|0|Polygon|0|OK-BOK-GOK-BGOK polygon|4326|Polygon|0|OK-BOK-GOK-BGOK polygon||COUNT|4| polygon||GCOUNT|4|
But we're also expecting (insanely, I'd say) that for a (multipoint) typmoded geometry/geography, to accept both 2d multipoints AND points:
multipoint|0|Point|0|OK-BOK-GOK-BGOK multipoint|0|MultiPoint|0|OK-BOK-GOK-BGOK multipoint|4326|Point|0|OK-BOK-GOK-BGOK multipoint|4326|MultiPoint|0|OK-BOK-GOK-BGOK multipoint||COUNT|8| multipoint||GCOUNT|8|
And (again insanely) that for a (point) typmoded geometry/geography, to accept NOTHING:
point||COUNT|0| point||GCOUNT|0|
By "accept" here we mean by stepping trough WKB. The wrong failures in the point case are due to WKB converting the point into a multipoint. The wrong successes I'm not sure about (why is a point accepted in a multipoint?)
comment:24 by , 10 years ago
Oh, now I got why for a multipoint we accept both multipoint and point. It is because the _value_ of POINT EMPTY becomes a multipoint and is thus accepted. So it's still the same exact problem we're trying to address.
So, with the workaround fix in place, we should be accepting both points and multipoints (we only test empties) for the point case and we'd be still accepting both points and multipoints (empty) for the multipoint case. Basically it would be making the point typmod as "tolerant" as the multipoint one (when it comes to empties).
The test should probably be further enhanced to also test non-empties...
comment:25 by , 10 years ago
With r13363 in 2.1 branch and r13364 in trunk I've added the non-empty point and multipoint cases to the test, confirming that a "point" typmod'ed geometry/geography do can accept non-empty points and a "multipoint" typmod'ed geometry/geography cannot accept non-empty points:
point|0|Point|0|KO-BKO-GKO-BGKO point|0|PointNE|0|OK-BOK-GOK-BGOK point|4326|Point|0|KO-BKO-GKO-BGKO point|4326|PointNE|0|OK-BOK-GOK-BGOK point||COUNT|4| point||GCOUNT|4|
multipoint|0|Point|0|OK-BOK-GOK-BGOK multipoint|0|PointNE|0|KO-BKO-GKO-BGKO multipoint|0|MultiPoint|0|OK-BOK-GOK-BGOK multipoint|0|MultiPointNE|0|OK-BOK-GOK-BGOK multipoint|4326|Point|0|OK-BOK-GOK-BGOK multipoint|4326|PointNE|0|KO-BKO-GKO-BGKO multipoint|4326|MultiPoint|0|OK-BOK-GOK-BGOK multipoint|4326|MultiPointNE|0|OK-BOK-GOK-BGOK multipoint||COUNT|12| multipoint||GCOUNT|12|
comment:26 by , 10 years ago
OK, this has been a good process, the patch is now much more sensible (IMO) and committed at r13367. I have *not* changed the regress tests, they are going to fail, but that's on purpose, because I want you to review the failures and make sure they are sane to you. To my eyes they look right.
comment:27 by , 10 years ago
I'm ok with r13367, please update the test (you can add --expect to RUNTESTFLAGS) and forward-port the whole thing.
comment:28 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Committed to trunk at r13372. All up-to-date. Should patch 2.0 also, I suppose? (same problem)
comment:30 by , 10 years ago
Discussion and implementation of POINT EMPTY ==> POINT(NaN NaN) in #3181
It's complicated :). As I recall it being explained to me - its a mismatch between the canonical and the internal representation. Since backup uses the canonical, canonical doesn't know how to represent POINT EMPTY so restore comes back as MULTIPOINT EMPTY.
pramsey can explain better why this limitation. I can't honestly remember why it was so complicated to get the canonical right. This is a known issue but I guess few have stumbled on the issue with restore which makes this a bit more serious.
I thought we had this in our ticket system somewhere as an open issue, but I can't find it.