Opened 4 years ago

Closed 4 years ago

#4705 closed defect (fixed)

pgsql2shp - Pg field column in mapping file incorrectly compared with the name of the dbf field name

Reported by: zezzagio Owned by: robe
Priority: medium Milestone:
Component: utils/loader-dumper Version: 3.0.x
Keywords: Cc:

Description

The first column in the mapping file (postgres field name) is incorrectly compared with the name of dbf field (wich defaults to the first 10 characters of the postgres name).

This makes it impossible to recognize (and therefore rename) a postgres field longer than 10 characters.

This affect surely version 3.0.x, but I suspect some other previous versions too.

Refer even to: https://stackoverflow.com/questions/61365672

Proposed patch:

https://github.com/zezzagio/postgis

diff --git a/loader/pgsql2shp-core.c b/loader/pgsql2shp-core.c
index 1596dc206..06226e951 100644
--- a/loader/pgsql2shp-core.c
+++ b/loader/pgsql2shp-core.c
@@ -1536,7 +1536,7 @@ ShpDumperOpenTable(SHPDUMPERSTATE *state)
                 * use this to create the dbf field name from
                 * the PostgreSQL column name */
                {
-                       const char *mapped = colmap_dbf_by_pg(&state->column_map, dbffieldname);
+                       const char *mapped = colmap_dbf_by_pg(&state->column_map, pgfieldname);
                        if (mapped)
                        {
                                strncpy(dbffieldname, mapped, 10);

Change History (23)

comment:1 by zezzagio, 4 years ago

The test can't catch the error, because of the content of mfile_mapping.txt:

address add
short_name short

"short_name" is too short (exactly 10 characters), so it is the same as the default dbf field name; try "very_long_name"

comment:2 by strk, 4 years ago

Hi Zez, do you think you could also help with automating a test to secure this bugfix ? The run_test.pl script does have support for testing options passed to the loader&dumper roundtrip and a test exists checking it (possibly not exhaustive to include this case).

You can run the test with:

cd regress make staged-install cd loader ../run_test.pl -v mfile

The test will apply switches in mfile.opts which happens to pass -m mfile_mapping.txt having the following mapping:

address add short_name short

If I understand you correctly it would be enough to just add a mapping with a different beginning ?

comment:3 by zezzagio, 4 years ago

No, the content of regress/loader/mfile.select.sql has to be changed accorderly too, in order to all tests to succed (or fail... well, succed to fail):

--- mfile_mapping.txt

address add
some_very_long_name short

--- mfile.select.sql

select address, some_very_long_name, color from loadedshp order by 1;

And this is for the loader (shp2pgsl); I suspect the dumper (pgsql2shp) isn't tested at all for the mapping options.

I'm not entirely sure yet; run_test.pl is perl, after all (I'm familiar with perl, but this is really... perl; with some residual bash idiotisms, just in case), but I don't seem to see pgsql2shp called with options readed from mfile_mapping.txt.

I will try to check better and see what I can do, if necessary.

Last edited 4 years ago by zezzagio (previous) (diff)

comment:4 by strk, 4 years ago

You are probably right that there's no test starting with pgsql2shp, but the so called "loader tests" (passing a .shp filename or base filename to run_test.pl) uses both shp2pgsql and pgsql2shp, doing a round-trip and comparing the original shapefile with the expected output shapefile.

This is really driven by what "expected" files exist. In this case, shapefile-to-shapefile comparison is only performed if a file called <input>.shp.expected exists, see subroutine run_dumper_and_check_output

comment:5 by zezzagio, 4 years ago

Yes, I saw that; not sure it's enough. The parameter $custom_opts is not passed on, nor is readed directly by run_dumper_and_check_output from mfile_mapping.txt.

I'm checking, anyway.

Last edited 4 years ago by zezzagio (previous) (diff)

comment:6 by zezzagio, 4 years ago

I see no easy way to test this feature with the current test tool.

If a <input>.shp.expected exists, the sub run_dumper_and_check_output is indeed called, without the -m parameter; and would fail (that's a start), if a .dbf.expected file was compared with the result, which is not. Only the .shp.expected is considered.

The sub run_dumper_test is more promising, and seems to compare all the shape files. It is called if a ${TEST}.dmp file exists, from which it reads the parameters; but only as the last parameter of pgsql2shp, that is the [table|query] argument. This goes after the [database] argument in the call, thus it can't be used to pass the -m option.

Modifying run_dumper_test in order to take in account the custom options is not a big thing, of course, but I'm not very confident to be able to do it without breaking something.

I can investigate further, if you want, and propose something. Not sure how much time it will take.

I estimate that this bug is present al least from version 2.5 (possibly generated from some refactoring.) Having survived so long, I think, does mean this is not a very used feature (or maybe the 10 characters limit is not that low after all.) The whole shape file thing, I think, is something we all have been hoping for a long time to get rid of (perhaps the thing to do could be pgsl2spatalite or pgsql2gpkg.)

Last edited 4 years ago by zezzagio (previous) (diff)

comment:7 by strk, 4 years ago

There's a truncated comment above the dumper tester function that seems to say the file should support options on multiple lines. Sounds worth completing that comment and making it work (maybe git history knows more about it)

comment:8 by Sandro Santilli <strk@…>, 4 years ago

In 0a94d7bd/git:

Add support for passing options to dumper tests

See #4705

comment:9 by strk, 4 years ago

I've pushed commit 0a94d7bd49c9ac4f7f9c71217c703ab7f8a991dd to fix support for passing those switches, just add to subsequent lines of the .dmp file

in reply to:  9 comment:10 by zezzagio, 4 years ago

Replying to strk:

I've pushed commit 0a94d7bd49c9ac4f7f9c71217c703ab7f8a991dd to fix support for passing those switches, just add to subsequent lines of the .dmp file

If I correctly understand the comment should be:

# input is ${TEST}.dmp, where first line is considered to be the
# [table|query] argument for pgsql2shp and all the next lines,
# if any are options.

or so seems to me it does; I don't know how it could have possibly work otherwise.

Last edited 4 years ago by zezzagio (previous) (diff)

comment:11 by zezzagio, 4 years ago

I'm not very familiar with git; I added the correction and the test in my fork:

https://github.com/zezzagio/postgis

Tell me if I have to do a pull request or something.

comment:12 by strk, 4 years ago

No need to create a pull request, your reference to the URL of your git repository is enough.

Please for proper attribution use the correct author for your commits though, with a valid email.

You can use, to amend your already made commit:

git commit --amend --author "Your Name <your@email>"

Then you'll need to force-push it to your remote, as this would effectively be a change in history:

git push --force

Now, suggestions for the patch itself:

  1. You don't need a SRID for the data (will just skip .prj creation)
  2. You don't need a table, can use a query (can avoid -pre and -post)
  3. I dubt you want to exit 0 (succes) when PostGIS object count test fails

in reply to:  12 comment:13 by zezzagio, 4 years ago

Replying to strk:

Please for proper attribution use the correct author for your commits though, with a valid email.

I don't like very much that my email address is published; is there any way to anonymize it?

  1. You don't need a SRID for the data (will just skip .prj creation)

Gis data without srid is something that I think would never have occurred to me.

  1. I dubt you want to exit 0 (succes) when PostGIS object count test fails

Yes, it was "return 0", wich by chance caused an error state, because "Can't return outside a subroutine"; I translated it blindly into "exit 0", which is the right way to do it wrong.

comment:14 by strk, 4 years ago

I don't like very much that my email address is published; is there any way to anonymize it?

I guess you can just tweak it to your liking, maybe just use your OSGeo UserID ?

Gis data without srid is something that I think would never have occurred to me.

It's just to keep the test smaller, given you didn't provide an expected .prj file..

Yes, it was "return 0", wich by chance caused an error state, because "Can't return outside a subroutine"; I translated it blindly into "exit 0", which is the right way to do it wrong.

I see, yes, it should probably be a "next" or "exit 1" there. Or just drop the line, as it's the last thing done, and fail() will count the failure anyway.

comment:15 by zezzagio, 4 years ago

You can use, to amend your already made commit:

git commit --amend --author "Your Name <your@email>"

Not sure to have done it properly (some rebase and other black magic git stuff), but the log list seems as desired.

The other suggested changes are here too.

I removed the exit line; some cleaning operations follow, it doesn't seem necessary to skip them.

comment:16 by strk, 4 years ago

Great that you learned git rebase. I can suggest you to also learn git reset, which keeps the changes but drops the commits, so you can re-do the commits to your liking.

For exmaple you probably don't want one commit that adds the -pre and -post files and another that removes them :)

The way I see it this could be 2 commits:

  1. Fix run_test.pl to do the {regdir} substitution and the continuation on install-objects mismatch
  1. Fix pgsql2shp handling of -m and associated tests

Note: you can add "Closes #4705" in the commit log for commit 2 to see this ticket automatically closed upon acceptance of the changes.

Let me know if you prefer leaving these things to us (but I recommend you learn to juggle with this great tool :)

comment:17 by zezzagio, 4 years ago

Let me know if you prefer leaving these things to us (but I recommend you learn to juggle with this great tool :)

Being in my sixties, a (not so) little more than overweight and all, "juggling" doesn't seem to convey an appropriate image of myself, but I'll do my best in the next days.

Not a big fan of history rewriting, also (too much to rewrite, in my personal case), but I understand if you need more linearity.

I'll let you know if I need some help.

comment:18 by zezzagio, 4 years ago

Modified as suggested.

P.S. I found it rather annoying that the working directory has to be readable by the postgres user, in order the checks to succed; meaning the user's home directory has to be readable. I know this is the default permissions in most Linux distributions (755), but I don't like it (on my system I set it to 700.) Is this really inevitable? or is there a workaround (other than creating a new user for this only purpose, as I did)? An explicit notice in the README file, in any case, would be appreciated.

comment:19 by strk, 4 years ago

Git work is great, but someone will need to double check this because the header file also says that function wants the PG name, not the DBF name:

const char *colmap_dbf_by_pg(colmap *map, const char *pgname); const char *colmap_pg_by_dbf(colmap *map, const char *dbfname);

Those signatures are in loader/shpcommon.h, lacking a comment.

The help output from shp2pgsql says the format is:

COLUMNNAME DBFFIELD1 AVERYLONGCOLUMNNAME DBFFIELD2

So we have the PostgreSQL column name on the left and the DBF field name on the right.

The _same_ format is expected by pgsql2shp (yes, key on the right in this case).

Is the code doing that ?

You are changing this:

  • const char *mapped = colmap_dbf_by_pg(&state->column_map, dbffieldname);

+ const char *mapped = colmap_dbf_by_pg(&state->column_map, pgfieldname);

Does not seem to match with parameter names in signature...

comment:20 by strk, 4 years ago

For the readable dir... workaround is building postgis outside of your home dir.

Something like:

git worktree add /tmp/build1

in reply to:  19 comment:21 by zezzagio, 4 years ago

Replying to strk:

Git work is great, but someone will need to double check

Of course someone will need to double check! it's the because I don't understand.

Git work is great, but someone will need to double check this because the header file also says that function wants the PG name, not the DBF name:

const char *colmap_dbf_by_pg(colmap *map, const char *pgname); const char *colmap_pg_by_dbf(colmap *map, const char *dbfname);

Exactly so! the header says it wants the PG name, the code called it with the DBF name.

Those signatures are in loader/shpcommon.h, lacking a comment.

The help output from shp2pgsql says the format is:

COLUMNNAME DBFFIELD1 AVERYLONGCOLUMNNAME DBFFIELD2

So we have the PostgreSQL column name on the left and the DBF field name on the right.

The _same_ format is expected by pgsql2shp (yes, key on the right in this case).

No, the "key" (i.e. the searched value) is on the left for pgsql2shp. We are searching for a PG column, in order to get the corresponding DBF column.

The functions colmap_dbf_by_pg and colmap_pg_by_dbf are quite similar (and quite simple): they compare an array of strings (map) against a string (key).

In the case of colmap_dbf_by_pg, the array of PG fields names is compared against the PG field name (returning the corresponding DBF field name); in the case of colmap_pg_by_dbf the array of DBF fields names is compared against the DBF field name (returning the correponding PG field name.)

The mapping file is the same; this seems a good choice to me.

You are changing this:

  • const char *mapped = colmap_dbf_by_pg(&state->column_map, dbffieldname);

+ const char *mapped = colmap_dbf_by_pg(&state->column_map, pgfieldname);

Does not seem to match with parameter names in signature...

I don't understand. It didn't match, in fact; now it does.

Last edited 4 years ago by zezzagio (previous) (diff)

comment:22 by strk, 4 years ago

  • const char *mapped = colmap_dbf_by_pg(&state->column_map, dbffieldname);

+ const char *mapped = colmap_dbf_by_pg(&state->column_map, pgfieldname);

Does not seem to match with parameter names in signature...

I don't understand. It 'didn't' match, in fact; now it does.

Ah, sorry, yes, now I see. I'll run tests locally and then push the fix. Thanks a lot !

comment:23 by Sandro Santilli <strk@…>, 4 years ago

Resolution: fixed
Status: newclosed

In 3334fd06/git:

Corrected pgsql2shp for -m option; added test

Postgres field column in mapping file was incorrectly compared with the name of the dbf field
Closes #4705

Note: See TracTickets for help on using tickets.