Opened 8 years ago
Last modified 5 years ago
#3220 new defect
WinGRASS not recognizing accented utf-8 (nor cp1252) attribute values
Reported by: | hellik | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 7.2.4 |
Component: | Default | Version: | svn-releasebranch72 |
Keywords: | | Cc: | |
CPU: | Unspecified | Platform: | MSWindows 8 |
Description
taken from the user ML:
https://lists.osgeo.org/pipermail/grass-user/2016-December/075682.html
I've got shapefiles with Swedish accented letters (ÄÖÅ) in some of the attribute values. The attributes are shown as they should be in the GUI. SQL statements, however, do not recognize them. They're also garbled in the command output when other (non-accented) values are queried. I first set GRASS_DB_ENCODING to cp1252 and it didn't work. Then I converted the dbf file to utf-8 and set that as the value of the variable, to no avail. I also tried using the 'encoding' parameter in v.in.ogr in both cases; it didn't work. I tried it on Windows 8.1 and Windows 10. The same happens in both stable GRASS 7.0.5 and GRASS 7.2.0RC1. The problem only occurs on Windows. Fedora and Mac OS X don't have this issue with the same shapefiles.
https://lists.osgeo.org/pipermail/grass-user/2016-December/075688.html
confirmed with
GRASS version: 7.3.svn
GRASS SVN revision: r70001
Build date: 2016-12-06
Build platform: x86_64-w64-mingw32
GDAL: 2.1.2
PROJ.4: 4.9.3
GEOS: 3.5.0
SQLite: 3.14.1
Python: 2.7.5
wxPython: 2.8.12.1
Platform: Windows-8-6.2.9200 (OSGeo4W)
and a test vector with following attributes
v.db.select map=test_points@data file=D:\temp\test_point.txt
cat|id|names
1|1|ÄÖÅ
2||Æ
3||Ø
4||Å,å,Æ,æ,Ø,ø
5||ø, Ø
6||Þ
7||Ð
8||Å
9||æ
d.vect map=test_points2@data where="names = 'Å,å,Æ,æ,Ø,ø'" width=1 icon=basic/point size=10
doesn't show the selected point in the map display.
v.report map=test_points@data option=coor
cat|id|names|x|y|z
1|1|ÄÖÅ|1.37409120951759|47.039352838731|0.0
2||Æ|2.62326503635168|28.5515802015863|0.0
3||Ø|44.095836087244|57.2825782187707|0.0
4||Å,å,Æ,æ,Ø,ø|30.8545935228025|49.787535257766|0.0
5||ø, Ø|10.1183079973563|51.0367090846001|0.0
6||Þ|20.361533377396|52.0360481460674|0.0
8||Å|15.1491119517375|60.3621017805262|0.0
9||æ|-1.26290587954035|52.5879880709736|0.0
Traceback (most recent call last):
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\gui_core\goutput.py", line 473, in OnCmdOutput
    self.cmdOutput.AddStyledMessage(message, type)
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\gui_core\goutput.py", line 772, in AddStyledMessage
    self.AddTextWrapped(message, wrap=None)
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\gui_core\goutput.py", line 721, in AddTextWrapped
    txt = EncodeString(txt)
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\core\gcmd.py", line 97, in EncodeString
    return string.encode(_enc)
  File "C:\OSGEO4~1\apps\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
Attachments (2)
Change History (16)
follow-ups: 2 3 4 comment:1 by , 8 years ago
comment:2 by , 8 years ago
Replying to martinl:
Import (v.import/v.in.ogr) with encoding=cp1252 will not help?
v.import encoding=cp1252 input=D:\temp\test_points.shp layer=test_points output=testimportcp1252
WARNING: All available OGR layers will be imported into vector map <test_points>
Check if OGR layer <test_points> contains polygons...
Importing 9 features (OGR layer <test_points>)...
-----------------------------------------------------
Building topology for vector map <testimportcp1252@data2>...
Registering primitives...
9 primitives registered
9 vertices registered
Building areas...
0 areas built
0 isles built
Attaching islands...
Attaching centroids...
Number of nodes: 0
Number of primitives: 9
Number of points: 9
Number of lines: 0
Number of boundaries: 0
Number of centroids: 0
Number of areas: 0
Number of isles: 0
Input <D:\temp\test_points.shp> successfully imported without reprojection
v.report map=testimportcp1252@data2 option=coor
cat|id|names|x|y|z
1|1|ÄÖÅ|1.37409120951759|47.039352838731|0.0
2||Æ|2.62326503635168|28.5515802015863|0.0
3||Ø|44.095836087244|57.2825782187707|0.0
4||Å,å,Æ,æ,Ø,ø|30.8545935228025|49.787535257766|0.0
5||ø, Ø|10.1183079973563|51.0367090846001|0.0
6||Þ|20.361533377396|52.0360481460674|0.0
8||Å|15.1491119517375|60.3621017805262|0.0
9||æ|-1.26290587954035|52.5879880709736|0.0
Traceback (most recent call last):
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\gui_core\goutput.py", line 473, in OnCmdOutput
    self.cmdOutput.AddStyledMessage(message, type)
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\gui_core\goutput.py", line 772, in AddStyledMessage
    self.AddTextWrapped(message, wrap=None)
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\gui_core\goutput.py", line 721, in AddTextWrapped
    txt = EncodeString(txt)
  File "C:\OSGEO4~1\apps\grass\grass-7.3.svn\gui\wxpython\core\gcmd.py", line 97, in EncodeString
    return string.encode(_enc)
  File "C:\OSGEO4~1\apps\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
it doesn't help.
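A note on the traceback above: it appears consistent with how Python 2 handles `.encode()` on a byte string. The bytes are first decoded implicitly with the ASCII codec, and that step fails on the first non-ASCII byte (0xc3 is the lead byte of a two-byte UTF-8 sequence). The following is only a sketch in Python 3 that makes this implicit step explicit. `encode_like_py2` and `safe_encode` are hypothetical helpers, not GRASS functions, and the sample bytes are made up for illustration.

```python
def encode_like_py2(raw: bytes, target: str = "cp1252") -> bytes:
    """Mimic Python 2's str.encode(): implicit ASCII decode, then encode."""
    return raw.decode("ascii").encode(target)


def safe_encode(raw: bytes, source: str = "utf-8", target: str = "cp1252") -> bytes:
    """Decode with the encoding the bytes are assumed to be in, then re-encode."""
    return raw.decode(source).encode(target, errors="replace")


# Illustrative input: three ASCII characters followed by UTF-8 encoded "Ä".
sample = b"abc\xc3\x84"

try:
    encode_like_py2(sample)
except UnicodeDecodeError as err:
    # The ASCII codec rejects byte 0xc3 at position 3.
    print(err)

# Decoding explicitly as UTF-8 first avoids the implicit ASCII step.
print(safe_encode(sample))
```

Whether UTF-8 is the right `source` here is itself an assumption; the point is only that the decode step has to name the real encoding rather than fall back to ASCII.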
by , 8 years ago
Attachment: | test_points_encoding_errors.zip added |
---|
zipped shapefile in wgs84 for testing
comment:3 by , 8 years ago
Replying to martinl:
Import (v.import/v.in.ogr) with encoding=cp1252 will not help?
encoding=cp1252 in v.in.ogr did not help. And I'm getting them garbled even with v.db.select, but without any error output:
v.db.select map=test_points cat|id|names 1|1|ÄÖÅ 2||Æ 3||Ø 4||Å,å,Æ,æ,Ø,ø 5||ø, Ø 6||Þ 7||Р8||Å 9||æ (Wed Dec 07 15:32:51 2016) Command finished (0 sec)
I used the stand-alone installer for both GRASS 7.0.5 and GRASS 7.2.0svn, if it matters.
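The garbled v.db.select output above looks like UTF-8 bytes being displayed as cp1252 (for example "ÄÖÅ" showing up as "Ã„Ã–Ã…"). A minimal sketch that reproduces this pattern in Python 3, assuming that is indeed what happens; both function names are hypothetical and only serve the illustration.

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Encode as UTF-8, then wrongly interpret the bytes as cp1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Reverse the misreading: back to bytes via cp1252, then decode as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


garbled = misread_utf8_as_cp1252("ÄÖÅ")
print(garbled)          # the same pattern as in the first row of the output above
print(repair(garbled))  # round-trips back to the original text
```

Note that a strict cp1252 decode raises an error for the few byte values that cp1252 leaves undefined (0x90, the second UTF-8 byte of "Ð", is one of them in Python's codec), so this round trip does not work for every character.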
follow-ups: 5 6 comment:4 by , 8 years ago
Replying to martinl:
Import (v.import/v.in.ogr) with encoding=cp1252 will not help?
Looking at the file, I do not have the feeling that it is in cp1252, but rather in utf-8, so IIUC the parameter setting for v.in.ogr should be encoding=utf-8.
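One rough way to check the impression that the attribute data is UTF-8 rather than cp1252 is to test whether the raw bytes decode cleanly as UTF-8. This is a heuristic sketch only; `looks_like_utf8` is a hypothetical helper, and the byte strings below are illustrative, not read from the attached shapefile.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes form valid UTF-8 (pure ASCII also passes)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "ÄÖÅ" encoded both ways: only the first is valid UTF-8.
print(looks_like_utf8("ÄÖÅ".encode("utf-8")))
print(looks_like_utf8("ÄÖÅ".encode("cp1252")))
```

Accented text stored as cp1252 is rarely valid UTF-8 by accident, so a clean decode is a fairly strong hint, but it is not proof.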
comment:5 by , 8 years ago
Replying to mlennert:
Replying to martinl:
Import (v.import/v.in.ogr) with encoding=cp1252 will not help?
Looking at the file, I do not have the feeling that it is in cp1252, but rather in utf-8, so IIUC the parameter setting for v.in.ogr should be encoding=utf-8.
I also tried it with UTF-8; it fails here too.
by , 8 years ago
Attachment: | qgis_shapefile_cp1252.zip added |
---|
qgis generated cp1252 example shapefile
comment:6 by , 8 years ago
Replying to mlennert:
Replying to martinl:
Import (v.import/v.in.ogr) with encoding=cp1252 will not help?
Looking at the file, I do not have the feeling that it is in cp1252, but rather in utf-8, so IIUC the parameter setting for v.in.ogr should be encoding=utf-8.
I have now added a QGIS-generated (hopefully) cp1252 example shapefile. This one also fails here on a self-compiled Linux GRASS trunk.
follow-up: 8 comment:7 by , 8 years ago
A comment without looking into actual code. It is necessary to provide clear info on reproducing the issue. Crucial info is:
- Windows locale (will influence assumed encoding);
- The mechanism of executing example command (CMD.exe will have different encoding than other places. Think ANSI vs OEM).
Some related reading:
https://trac.osgeo.org/grass/ticket/2525#comment:1
https://trac.osgeo.org/grass/ticket/2120#comment:10
http://stackoverflow.com/a/17177904
https://bugs.python.org/issue6135
comment:8 by , 8 years ago
Replying to marisn:
... Crucial info is:
- Windows locale (will influence assumed encoding);
- The mechanism of executing example command (CMD.exe will have different encoding than other places. Think ANSI vs OEM).
Here is what I've got:
Windows locale
systeminfo
System Locale: sv;Svenska
Input Locale: sv;Svenska
Originally, I've got
chcp 850
But since it's not working, I tried using
chcp 1252
and the Nordic OEM:
chcp 865
before importing, in the cmd and from within GRASS in the command console. Nothing really changed. The rest of the reading I did was way over my head, sorry. But here's a link to the original shape file I'm having issues with. I can't attach it here because it's a bit over 2MB and I'm afraid that taking a sample from it and exporting it might change the encoding on export: https://www.dropbox.com/s/2ptgaf5owco63f0/stockholm.zip?dl=0 (the link should be valid for a month).
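To illustrate the ANSI vs OEM point with the three code pages tried above (850, 1252, 865): the same byte maps to different characters depending on which code page is used to interpret it. A small sketch, assuming the attribute text was written as cp1252; `decode_under` is a hypothetical helper for illustration only.

```python
def decode_under(raw: bytes, codepages=("cp1252", "cp850", "cp865")) -> dict:
    """Decode the same bytes under several code pages for comparison."""
    return {cp: raw.decode(cp, errors="replace") for cp in codepages}


# "Å" as a single byte in the ANSI code page cp1252.
ansi_bytes = "Å".encode("cp1252")

for codepage, text in decode_under(ansi_bytes).items():
    print(codepage, repr(text))

# The OEM code pages store "Å" under a different byte value.
print("Å".encode("cp1252"), "Å".encode("cp850"), "Å".encode("cp865"))
```

This only shows why the console code page matters for what is displayed; it does not by itself explain why the SQL `where` clause fails to match.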
comment:10 by , 8 years ago
Milestone: | 7.2.1 → 7.2.2 |
---|
comment:13 by , 7 years ago
Milestone: | → 7.2.4 |
---|