Opened 4 years ago
Closed 4 years ago
#4705 closed defect (fixed)
pgsql2shp - PG column name in mapping file incorrectly compared with the DBF field name
Reported by: zezzagio
Owned by: robe
Priority: medium
Milestone:
Component: utils/loader-dumper
Version: 3.0.x
Keywords:
Cc:
Description
The first column in the mapping file (the PostgreSQL field name) is incorrectly compared with the name of the DBF field (which defaults to the first 10 characters of the PostgreSQL name).
This makes it impossible to recognize (and therefore rename) a PostgreSQL field name longer than 10 characters.
This surely affects version 3.0.x, but I suspect some previous versions too.
See also: https://stackoverflow.com/questions/61365672
Proposed patch:
https://github.com/zezzagio/postgis
diff --git a/loader/pgsql2shp-core.c b/loader/pgsql2shp-core.c
index 1596dc206..06226e951 100644
--- a/loader/pgsql2shp-core.c
+++ b/loader/pgsql2shp-core.c
@@ -1536,7 +1536,7 @@ ShpDumperOpenTable(SHPDUMPERSTATE *state)
 	 * use this to create the dbf field name from
 	 * the PostgreSQL column name */
 	{
-		const char *mapped = colmap_dbf_by_pg(&state->column_map, dbffieldname);
+		const char *mapped = colmap_dbf_by_pg(&state->column_map, pgfieldname);
 		if (mapped)
 		{
 			strncpy(dbffieldname, mapped, 10);
Change History (23)
comment:1 by , 4 years ago
comment:2 by , 4 years ago
Hi Zez, do you think you could also help with automating a test to secure this bugfix? The run_test.pl script does have support for testing options passed to the loader/dumper round-trip, and a test exists checking it (though possibly not exhaustive enough to include this case).
You can run the test with:
cd regress
make staged-install
cd loader
../run_test.pl -v mfile
The test applies the switches in mfile.opts, which happens to pass -m mfile_mapping.txt with the following mapping:
address add
short_name short
If I understand you correctly, it would be enough to just add a mapping with a different beginning?
comment:3 by , 4 years ago
No, the content of regress/loader/mfile.select.sql has to be changed accordingly too, so that all the tests succeed (or fail... well, succeed at failing):
--- mfile_mapping.txt
address add
some_very_long_name short

--- mfile.select.sql
select address, some_very_long_name, color from loadedshp order by 1;
And this is for the loader (shp2pgsql); I suspect the dumper (pgsql2shp) isn't tested at all for the mapping options.
I'm not entirely sure yet; run_test.pl is Perl, after all (I'm familiar with Perl, but this is really... Perl; with some residual bash idioms, just in case), but I don't seem to see pgsql2shp called with options read from mfile_mapping.txt.
I will try to check better and see what I can do, if necessary.
comment:4 by , 4 years ago
You are probably right that there's no test starting with pgsql2shp, but the so-called "loader tests" (passing a .shp filename or base filename to run_test.pl) use both shp2pgsql and pgsql2shp, doing a round-trip and comparing the original shapefile with the expected output shapefile.
This is really driven by what "expected" files exist. In this case, the shapefile-to-shapefile comparison is only performed if a file called <input>.shp.expected exists; see the subroutine run_dumper_and_check_output.
comment:5 by , 4 years ago
Yes, I saw that; I'm not sure it's enough. The parameter $custom_opts is not passed on, nor is it read directly from mfile_mapping.txt by run_dumper_and_check_output.
I'm checking, anyway.
comment:6 by , 4 years ago
I see no easy way to test this feature with the current test tool.
If an <input>.shp.expected exists, the sub run_dumper_and_check_output is indeed called, without the -m parameter; and it would fail (that's a start) if a .dbf.expected file were compared with the result, which it is not. Only the .shp.expected is considered.
The sub run_dumper_test is more promising, and seems to compare all the shape files. It is called if a ${TEST}.dmp file exists, from which it reads the parameters; but only as the last parameter of pgsql2shp, that is the [table|query] argument. This goes after the [database] argument in the call, so it can't be used to pass the -m option.
Modifying run_dumper_test to take the custom options into account is not a big thing, of course, but I'm not very confident I can do it without breaking something.
I can investigate further, if you want, and propose something. Not sure how much time it will take.
I estimate that this bug has been present at least since version 2.5 (possibly introduced by some refactoring.) Having survived so long, I think, suggests this is not a widely used feature (or maybe the 10-character limit is not that low after all.) The whole shapefile thing, I think, is something we have all been hoping for a long time to get rid of (perhaps the thing to do could be pgsql2spatialite or pgsql2gpkg.)
comment:7 by , 4 years ago
There's a truncated comment above the dumper tester function that seems to say the file should support options on multiple lines. Sounds worth completing that comment and making it work (maybe git history knows more about it).
follow-up: 10 comment:9 by , 4 years ago
I've pushed commit 0a94d7bd49c9ac4f7f9c71217c703ab7f8a991dd to fix support for passing those switches; just add them on subsequent lines of the .dmp file.
comment:10 by , 4 years ago
Replying to strk:
I've pushed commit 0a94d7bd49c9ac4f7f9c71217c703ab7f8a991dd to fix support for passing those switches; just add them on subsequent lines of the .dmp file.
If I correctly understand the comment should be:
# input is ${TEST}.dmp, where first line is considered to be the
# [table|query] argument for pgsql2shp and all the next lines,
# if any, are options.
or so it seems to me; I don't know how it could possibly have worked otherwise.
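Under that reading, a .dmp file exercising the -m option could look like this (the exact contents are an illustration based on this ticket's test names, not taken from the repository):

```
loadedshp
-m mfile_mapping.txt
```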
comment:11 by , 4 years ago
I'm not very familiar with git; I added the correction and the test in my fork:
https://github.com/zezzagio/postgis
Tell me if I have to do a pull request or something.
follow-up: 13 comment:12 by , 4 years ago
No need to create a pull request, your reference to the URL of your git repository is enough.
Please for proper attribution use the correct author for your commits though, with a valid email.
You can use, to amend your already made commit:
git commit --amend --author "Your Name <your@email>"
Then you'll need to force-push it to your remote, as this would effectively be a change in history:
git push --force
Now, suggestions for the patch itself:
- You don't need a SRID for the data (will just skip .prj creation)
- You don't need a table, can use a query (can avoid -pre and -post)
- I doubt you want to exit 0 (success) when the PostGIS object count test fails
comment:13 by , 4 years ago
Replying to strk:
Please for proper attribution use the correct author for your commits though, with a valid email.
I don't much like that my email address is published; is there any way to anonymize it?
- You don't need a SRID for the data (will just skip .prj creation)
GIS data without an SRID is something that I think would never have occurred to me.
- I dubt you want to exit 0 (succes) when PostGIS object count test fails
Yes, it was "return 0", which by chance caused an error state, because "Can't return outside a subroutine"; I translated it blindly into "exit 0", which is the right way to do it wrong.
comment:14 by , 4 years ago
I don't like very much that my email address is published; is there any way to anonymize it?
I guess you can just tweak it to your liking, maybe just use your OSGeo UserID ?
GIS data without an SRID is something that I think would never have occurred to me.
It's just to keep the test smaller, given you didn't provide an expected .prj file.
Yes, it was "return 0", which by chance caused an error state, because "Can't return outside a subroutine"; I translated it blindly into "exit 0", which is the right way to do it wrong.
I see, yes, it should probably be a "next" or "exit 1" there. Or just drop the line, as it's the last thing done, and fail() will count the failure anyway.
comment:15 by , 4 years ago
You can use, to amend your already made commit:
git commit --amend --author "Your Name <your@email>"
Not sure I have done it properly (some rebase and other black-magic git stuff), but the log list seems as desired.
The other suggested changes are here too.
I removed the exit line; some cleaning operations follow, it doesn't seem necessary to skip them.
comment:16 by , 4 years ago
Great that you learned git rebase.
I can also suggest you learn git reset, which keeps the changes but drops the commits, so you can re-do the commits to your liking.
For example, you probably don't want one commit that adds the -pre and -post files and another that removes them :)
The way I see it this could be 2 commits:
- Fix run_test.pl to do the {regdir} substitution and the continuation on install-objects mismatch
- Fix pgsql2shp handling of -m and associated tests
Note: you can add "Closes #4705" in the commit log for commit 2 to see this ticket automatically closed upon acceptance of the changes.
Let me know if you prefer leaving these things to us (but I recommend you learn to juggle with this great tool :)
comment:17 by , 4 years ago
Let me know if you prefer leaving these things to us (but I recommend you learn to juggle with this great tool :)
Being in my sixties, a (not so) little more than overweight and all, "juggling" doesn't seem to convey an appropriate image of myself, but I'll do my best in the next days.
Not a big fan of history rewriting, also (too much to rewrite, in my personal case), but I understand if you need more linearity.
I'll let you know if I need some help.
comment:18 by , 4 years ago
Modified as suggested.
P.S. I found it rather annoying that the working directory has to be readable by the postgres user for the checks to succeed; meaning the user's home directory has to be readable. I know these are the default permissions in most Linux distributions (755), but I don't like it (on my system I set it to 700.) Is this really inevitable? Or is there a workaround (other than creating a new user for this sole purpose, as I did)? An explicit notice in the README file, in any case, would be appreciated.
follow-up: 21 comment:19 by , 4 years ago
Git work is great, but someone will need to double check this because the header file also says that function wants the PG name, not the DBF name:
const char *colmap_dbf_by_pg(colmap *map, const char *pgname);
const char *colmap_pg_by_dbf(colmap *map, const char *dbfname);
Those signatures are in loader/shpcommon.h, lacking a comment.
The help output from shp2pgsql says the format is:
COLUMNNAME DBFFIELD1
AVERYLONGCOLUMNNAME DBFFIELD2
So we have the PostgreSQL column name on the left and the DBF field name on the right.
The _same_ format is expected by pgsql2shp (yes, key on the right in this case).
Is the code doing that ?
You are changing this:
- const char *mapped = colmap_dbf_by_pg(&state->column_map, dbffieldname);
+ const char *mapped = colmap_dbf_by_pg(&state->column_map, pgfieldname);
Does not seem to match with parameter names in signature...
comment:20 by , 4 years ago
For the readable dir... workaround is building postgis outside of your home dir.
Something like:
git worktree add /tmp/build1
comment:21 by , 4 years ago
Replying to strk:
Git work is great, but someone will need to double check
Of course someone will need to double check! It's the "because" that I don't understand.
Git work is great, but someone will need to double check this because the header file also says that function wants the PG name, not the DBF name:
const char *colmap_dbf_by_pg(colmap *map, const char *pgname);
const char *colmap_pg_by_dbf(colmap *map, const char *dbfname);
Exactly so! the header says it wants the PG name, the code called it with the DBF name.
Those signatures are in loader/shpcommon.h, lacking a comment.
The help output from shp2pgsql says the format is:
COLUMNNAME DBFFIELD1 AVERYLONGCOLUMNNAME DBFFIELD2
So we have the PostgreSQL column name on the left and the DBF field name on the right.
The _same_ format is expected by pgsql2shp (yes, key on the right in this case).
No, the "key" (i.e. the searched value) is on the left for pgsql2shp. We are searching for a PG column, in order to get the corresponding DBF column.
The functions colmap_dbf_by_pg and colmap_pg_by_dbf are quite similar (and quite simple): they compare an array of strings (map) against a string (key).
In the case of colmap_dbf_by_pg, the array of PG field names is compared against the PG field name (returning the corresponding DBF field name); in the case of colmap_pg_by_dbf, the array of DBF field names is compared against the DBF field name (returning the corresponding PG field name.)
The mapping file is the same; this seems a good choice to me.
You are changing this:
- const char *mapped = colmap_dbf_by_pg(&state->column_map, dbffieldname);
+ const char *mapped = colmap_dbf_by_pg(&state->column_map, pgfieldname);
Does not seem to match with parameter names in signature...
I don't understand. It didn't match, in fact; now it does.
comment:22 by , 4 years ago
- const char *mapped = colmap_dbf_by_pg(&state->column_map, dbffieldname);
+ const char *mapped = colmap_dbf_by_pg(&state->column_map, pgfieldname);
Does not seem to match with parameter names in signature...
I don't understand. It 'didn't' match, in fact; now it does.
Ah, sorry, yes, now I see. I'll run tests locally and then push the fix. Thanks a lot !
The test can't catch the error, because of the content of mfile_mapping.txt:
"short_name" is too short (exactly 10 characters), so it is the same as the default dbf field name; try "very_long_name"