Opened 10 years ago
Closed 6 years ago
#2532 closed defect (wontfix)
TypeError: environment can only contain string when launching script on Windows
Reported by: | annakrat | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 7.0.7 |
Component: | Default | Version: | svn-trunk |
Keywords: | encoding | Cc: | |
CPU: | Unspecified | Platform: | MSWindows 8 |
Description
When launching a Python script in the GUI (File - Launch script), I am asked to add the path to GRASS_ADDON_PATH. I did so and the script ran successfully. However, I am not able to run any command afterwards because of a Python error (TypeError: environment can only contain string). The problem is that the script path is of unicode type (although I am using only ASCII letters). The solution is to encode the script path, but with which encoding? And how is it going to be decoded?
A temporary solution is to reject any script whose path contains non-ASCII letters and just use str().
Change History (22)
follow-up: 2 comment:1 by , 10 years ago
comment:2 by , 10 years ago
Replying to glynn:
Replying to annakrat:
wxGUI's core.gcmd module has EncodeString() and DecodeString() methods which use whatever wxGUI considers to be the "system" encoding. Those are used by gcmd.Popen for converting the arguments to strings and by gcmd.RunCommand() for converting the process' output to unicode.
OK, I used EncodeString, but then with non-ascii characters I get (ascii only path works fine now):
Traceback (most recent call last):
  File "C:\Users\akratoc\Programs\GRASS GIS 7.0.0svn\gui\wxpython\lmgr\frame.py", line 842, in OnRunScript
    filename = EncodeString(filename)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.0.0svn\gui\wxpython\core\gcmd.py", line 101, in EncodeString
    return string.encode(_enc)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.0.0svn\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u0165' in position 40: character maps to <undefined>
I have seen this error in several other tickets, is there something we can do about it?
follow-up: 4 comment:3 by , 10 years ago
I see what you were writing in #2525. So should we just catch the exception and tell the user: sorry, don't use non-ASCII characters in the script path (and change your operating system)?
comment:4 by , 10 years ago
Replying to annakrat:
I see what you were writing in #2525. So should we just catch the exception and tell the user: sorry, don't use non-ASCII characters in the script path (and change your operating system)?
It's not "non-ASCII" characters per se, it's characters which aren't representable in your system codepage (configurable on Windows 7 via Control Panel -> Region and Language -> Administrative -> Change system locale ...).
For Western European languages, the system locale's encoding will be cp1252, which is basically ISO-8859-1 but with most of the C1 control codes (\x80-\x9f) remapped to additional graphic characters.
U+0165 is present in cp1250 (Eastern European, similar to ISO-8859-2).
It appears that Windows has a mechanism for approximating accented characters; if I create a directory whose name contains that character, the "dir" command (in a console using cp1252) shows the directory with the character replaced by "t", and I can "cd" into the directory. Unfortunately, this feature doesn't appear to be accessible via Python.
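The codepage point above can be checked directly from Python. The following is only a minimal sketch, written for Python 3 (the ticket itself concerns Python 2.7); the helper name `encodable` is hypothetical and not part of GRASS. It tests whether a character is representable in a given codepage: U+0165 is rejected by cp1252 but accepted by cp1250, while "á" is accepted by cp1252.

```python
def encodable(text, encoding):
    """Return True if every character of text exists in the given codepage."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


t_caron = "\u0165"   # the character reported in the traceback
a_acute = "\u00e1"   # "á", which cp1252 does contain

for char, codepage in [(t_caron, "cp1252"), (t_caron, "cp1250"), (a_acute, "cp1252")]:
    print(repr(char), codepage, encodable(char, codepage))
```

A check like this could be used to decide whether to show an error dialog before attempting the encode, rather than testing for "non-ASCII" as such.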
follow-up: 7 comment:5 by , 10 years ago
I used EncodeString in r63997, r63998. I tested it successfully on Windows (cp1252) with ASCII characters; non-ASCII characters which are not present in cp1252 result in an error dialog with a message explaining how to avoid that. However, I failed to run the script when the name contained non-ASCII characters present in cp1252 (á). I don't get any error, but in the GUI console I get:
Launching script 'C:\Users\akratoc\Desktop\test_workshopá.py'...
(Thu Jan 08 12:04:24 2015)
Description:
 Adds the values of two rasters (A + B)
Keywords:
 raster, algebra, sum
Usage:
 test_workshopá.py araster=name braster=name output=name [--overwrite] [--help] [--verbose] [--quiet] [--ui]
Flags:
 --o Allow output files to overwrite existing files
 --h Print usage summary
 --v Verbose module output
 --q Quiet module output
 --ui Force launching GUI dialog
Parameters:
 araster Name of input raster A in an expression A + B
 braster Name of input raster B in an expression A + B
 output Name for output raster map
ERROR: Required parameter <araster> not set:
 (Name of input raster A in an expression A + B)
ERROR: Required parameter <braster> not set:
 (Name of input raster B in an expression A + B)
ERROR: Required parameter <output> not set:
 (Name for output raster map)
(Thu Jan 08 12:04:25 2015) Command finished (0 sec)
comment:6 by , 10 years ago
Priority: | major → normal |
---|
follow-up: 8 comment:7 by , 10 years ago
Replying to annakrat:
However, I failed to run the script when the name contained non-ascii characters present in cp1252 (á). I don't get any error, but in gui console I get:
Launching script 'C:\Users\akratoc\Desktop\test_workshopá.py'...
Is the "..." literal? I.e. does the GUI omit the arguments, or does it include details which have been omitted from the ticket?
ERROR: Required parameter <araster> not set:
Can you get any more debug output?
follow-ups: 9 10 comment:8 by , 10 years ago
Replying to glynn:
Replying to annakrat:
However, I failed to run the script when the name contained non-ascii characters present in cp1252 (á). I don't get any error, but in gui console I get:
Launching script 'C:\Users\akratoc\Desktop\test_workshopá.py'...
Is the "..." literal? I.e. does the GUI omit the arguments, or does it include details which have been omitted from the ticket?
That comes from here; there are no details, it's run without any arguments.
ERROR: Required parameter <araster> not set:
Can you get any more debug output?
Will try.
comment:9 by , 10 years ago
Replying to annakrat:
Replying to glynn:
Can you get any more debug output?
Will try.
With debug messages on I get in the GUI console:
Launching script 'C:\Users\akratoc\Desktop\test_workshopá.py'...
(Thu Jan 08 12:04:24 2015)
C:\Users\akratoc\Desktop\test_workshopá.py
D2/5: filename = C:\Users\akratoc\Desktop\test_workshopá.py
D1/5: G_set_program_name(): test_workshopá
D2/5: G_file_name(): path = C:\Users\akratoc\grassdata/nc_basic_spm_grass7/user1
Description: ... and the same as above
and in the terminal window:
GUI D5/5: EncodeString(): enc=cp1252
D1/5: grass.script.core.start_command(): g.gisenv -n
D1/5: G_set_program_name(): g.gisenv
D2/5: G_option_to_separator(): key = separator -> sep = ' '
GUI D1/5: gcmd.CommandThread(): C:\Users\akratoc\Desktop\test_workshopá.py
GUI D5/5: EncodeString(): enc=cp1252
GUI D5/5: EncodeString(): enc=cp1252
It doesn't seem particularly helpful but I don't know what else I can do.
follow-up: 11 comment:10 by , 10 years ago
Replying to annakrat:
That comes from here; there are no details, it's run without any arguments.
I see.
It's executing the script, which is executing g.parser, which is reading the option definitions from the script then calling G_parser(). As it's called without arguments, G_parser() should be generating a GUI dialog, but it's not even attempting to do that; it's falling through to the option-checking code.
AFAICT, in order for that error message to occur, either argc would have to be at least 2 or isatty(0) would have to be false. But if argc >= 2, that would result in the value of argv[1] being used as the value for araster= (even if it's an empty string), which would prevent the "Required parameter <araster> not set" error.
Which leaves isatty(0) being false. But that shouldn't have anything to do with whether the script filename contains non-ASCII characters. It might be something to do with wxGUI, or it might be Windows weirdness.
Can you add the following to the script, before the call to grass.parser():
import os
print os.isatty(0)
follow-ups: 12 13 comment:11 by , 10 years ago
comment:12 by , 10 years ago
Replying to annakrat:
It gives me False.
Presumably that's only the case when the script filename has non-ASCII characters?
follow-up: 14 comment:13 by , 10 years ago
Replying to annakrat:
Replying to glynn:
Replying to annakrat:
It gives me False. I will try to see if there is something wrong in the gui part.
I found that an exception is raised and ignored here, and if I remove the try/except block, I get:
Traceback (most recent call last):
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\wxpython\lmgr\frame.py", line 907, in OnRunScript
    self._gconsole.RunCmd([filename])
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\wxpython\core\gconsole.py", line 554, in RunCmd
    task = gtask.parse_interface(command[0])
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\etc\python\grass\script\task.py", line 509, in parse_interface
    tree = etree.fromstring(get_interface_description(name))
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\etc\python\grass\script\task.py", line 465, in get_interface_description
    stderr=PIPE)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\etc\python\grass\script\core.py", line 62, in __init__
    subprocess.Popen.__init__(self, args, **kwargs)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\Python27\lib\subprocess.py", line 711, in __init__
    errread, errwrite)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\Python27\lib\subprocess.py", line 922, in _execute_child
    args = '{} /c "{}"'.format (comspec, args)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 38: ordinal not in range(128)
The command[0] is Unicode. It seems Popen in Python 2.7 can't handle non-ASCII characters. So I tried to encode the command string and I get a different error:
Traceback (most recent call last):
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\wxpython\lmgr\frame.py", line 907, in OnRunScript
    self._gconsole.RunCmd([filename])
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\wxpython\core\gconsole.py", line 555, in RunCmd
    task = gtask.parse_interface(EncodeString(command[0]))
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\etc\python\grass\script\task.py", line 509, in parse_interface
    tree = etree.fromstring(get_interface_description(name))
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\Python27\lib\xml\etree\ElementTree.py", line 1300, in XML
    parser.feed(text)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\Python27\lib\xml\etree\ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\Python27\lib\xml\etree\ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: syntax error: line 1, column 0
It seems that get_interface_description returns empty XML. I didn't have time to look into it further.
follow-up: 15 comment:14 by , 10 years ago
Replying to annakrat:
The command[0] is Unicode. It seems Popen in Python 2.7 can't handle non-ascii characters.
It's more accurate to say that it can't handle unicode. Or, more precisely, unicode which cannot be implicitly converted to a string. Implicit conversions use the default encoding (which is typically ASCII) rather than the locale's encoding. The default encoding is a system or user preference and cannot be changed by scripts.
So I tried to encode the command string and I get different error:
raise err
xml.etree.ElementTree.ParseError
It seems that get_interface_description returns empty xml
Did you confirm that?
Otherwise, my guess is that the XML is invalid due to encoding issues.
The program name is copied verbatim into the XML, in the <task name="..."> tag.
If GRASS was built with iconv support, the declared encoding of the XML will be UTF-8; text nodes will be converted from the locale's encoding to UTF-8 (and <,>,& will be converted to entities), but attribute values aren't converted:
fprintf(stdout, "<task name=\"%s\">\n", st->pgm_name);
So, they need to be restricted to the intersection of the locale's encoding and UTF-8 (which probably means ASCII).
I'm not sure that it's worth trying to support script names which contain non-ASCII characters. However, scripts in directories whose names contain non-ASCII characters need to be supported. The same applies to other files; e.g. we can reasonably restrict map, mapset and location names to ASCII, but we should support the situation where the database path contains non-ASCII characters.
In any case, the GUI should be encoding the arguments which it passes to Popen(); it shouldn't be passing unicode values.
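The "intersection of the locale's encoding and UTF-8" mentioned above can be illustrated with a short check. This is a minimal sketch in Python 3, not GRASS code; the function name `same_bytes_as_utf8` is hypothetical. A name is safe to copy verbatim into UTF-8-declared XML only if its bytes in the locale's encoding are identical to its UTF-8 bytes, which for cp1252 holds for ASCII-only names.

```python
def same_bytes_as_utf8(name, locale_encoding):
    """True if name encodes to identical bytes in the locale encoding and in UTF-8."""
    try:
        return name.encode(locale_encoding) == name.encode("utf-8")
    except UnicodeEncodeError:
        # Not representable in the locale encoding at all.
        return False


for candidate in ["test_workshop.py", "test_workshop\u00e1.py"]:
    print(candidate, same_bytes_as_utf8(candidate, "cp1252"))
```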
follow-up: 16 comment:15 by , 10 years ago
Replying to glynn:
Replying to annakrat:
So I tried to encode the command string and I get different error:
raise err
xml.etree.ElementTree.ParseError
It seems that get_interface_description returns empty xml
Did you confirm that?
No, when I print the string I get xml, seems to be valid:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task SYSTEM "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\xml\grass-interface.dtd">
<task name="test_workshopá.py">
<description>
Adds the values of two rasters (A + B)
</description>
...
I don't understand what's wrong with it.
Otherwise, my guess is that the XML is invalid due to encoding issues.
The program name is copied verbatim into the XML, in the <task name="..."> tag.
If GRASS was built with iconv support, the declared encoding of the XML will be UTF-8; text nodes will be converted from the locale's encoding to UTF-8 (and <,>,& will be converted to entities), but attribute values aren't converted:
fprintf(stdout, "<task name=\"%s\">\n", st->pgm_name);
So, they need to be restricted to the intersection of the locale's encoding and UTF-8 (which probably means ASCII).
I'm not sure that it's worth trying to support script names which contain non-ASCII characters. However, scripts in directories whose names contain non-ASCII characters need to be supported. The same applies to other files; e.g. we can reasonably restrict map, mapset and location names to ASCII, but we should support the situation where the database path contains non-ASCII characters.
In any case, the GUI should be encoding the arguments which it passes to Popen(); it shouldn't be passing unicode values.
Should the encoding be moved to get_interface_description in task.py? The EncodeString function is in the GUI, not in the Python scripting library.
If I try to run the script (this time the script name is only ascii, but the path has some non-ascii characters which are in cp1252), I get the gui dialog and when I run it, I get an error:
Exception in thread Thread-28:
Traceback (most recent call last):
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\Python27\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\wxpython\core\gconsole.py", line 155, in run
    self.resultQ.put((requestId, self.requestCmd.run()))
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\wxpython\core\gcmd.py", line 575, in run
    env = self.env)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\wxpython\core\gcmd.py", line 161, in __init__
    args = map(EncodeString, args)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\wxpython\core\gcmd.py", line 92, in EncodeString
    return string.encode(_enc)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 38: ordinal not in range(128)
because in the Popen class in gcmd.py some of the arguments are of type str and some are unicode. If I encode only the unicode ones, it starts to work:
for i in range(len(args)):
    if type(args[i]) != str:
        args[i] = EncodeString(args[i])
So I am not sure what I should do with these results.
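For readers on Python 3, the same idea (encode only the text arguments, leave already-encoded byte strings alone) might look like the sketch below. Python 3's `str` and `bytes` play the roles of Python 2's `unicode` and `str`. The function name `encode_args` and the sample path are hypothetical, and the encoding is passed in explicitly rather than taken from wxGUI's EncodeString.

```python
def encode_args(args, encoding):
    """Encode text arguments; pass already-encoded byte strings through unchanged."""
    return [arg.encode(encoding) if isinstance(arg, str) else arg
            for arg in args]


# A made-up mix: one argument already encoded in cp1252, one still text.
mixed = ["C:\\d\u00e1ta\\script.py".encode("cp1252"), "araster=elevation"]
encoded = encode_args(mixed, "cp1252")
print(encoded)
```

Using `isinstance` rather than comparing `type(...)` also copes with subclasses, though as discussed in the replies the conversion arguably belongs at the point where the GUI first obtains the text value.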
follow-up: 17 comment:16 by , 10 years ago
Replying to annakrat:
No, when I print the string I get xml, seems to be valid:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task SYSTEM "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\xml\grass-interface.dtd">
<task name="test_workshopá.py">
I don't understand what's wrong with it.
The name= attribute will fail to decode due to not being valid UTF-8. The "á" will be encoded in cp1252 (i.e. '\xe1'); attempting to decode that as UTF-8 will fail (non-ASCII characters are encoded as multi-byte sequences; an isolated byte >= 128 can never occur in UTF-8).
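The failure described above can be reproduced in isolation. The following is a minimal sketch in Python 3 using the standard xml.etree.ElementTree module; the helper names `task_xml` and `parses` are hypothetical, and the document is a cut-down stand-in for the real interface description (no DOCTYPE, empty task element).

```python
import xml.etree.ElementTree as ET

SCRIPT_NAME = "test_workshop\u00e1.py"   # script name from the ticket


def task_xml(name_bytes):
    """Build a minimal document that declares UTF-8 and embeds the name bytes verbatim."""
    return (b'<?xml version="1.0" encoding="UTF-8"?>'
            b'<task name="' + name_bytes + b'"></task>')


def parses(xml_bytes):
    """Return True if ElementTree accepts the document."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return True


# cp1252 bytes inside a document that claims to be UTF-8: rejected by the parser.
print(parses(task_xml(SCRIPT_NAME.encode("cp1252"))))
# The same name encoded as UTF-8: accepted.
print(parses(task_xml(SCRIPT_NAME.encode("utf-8"))))
```

The printed XML looks fine in a console that interprets the bytes as cp1252, which is why the problem is not visible from inspecting the output by eye.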
In any case, the GUI should be encoding the arguments which it passes to Popen(); it shouldn't be passing unicode values.
Should the encoding be moved to get_interface_description in task.py?
No. The GUI shouldn't be passing unicode values to the grass.script library; it should be converting them to strings itself.
The EncodeString function is in gui, not in python scripting library.
grass.script.core has encode() and decode().
If I try to run the script (this time the script name is only ascii, but the path has some non-ascii characters which are in cp1252), I get the gui dialog and when I run it, I get an error:
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\wxpython\core\gcmd.py", line 92, in EncodeString
    return string.encode(_enc)
  File "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 38: ordinal not in range(128)
Ugh. I couldn't figure out what was happening here until I read the next sentence. It appears that str.encode() actually exists; it tries to convert the string to unicode (using the default encoding) so that it can encode it.
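The two-step behaviour described above (an implicit decode with the default encoding, followed by the requested encode) can be emulated explicitly. This is only a sketch in Python 3, where byte strings have no encode() method at all; the function name `py2_style_encode` is hypothetical and simply spells out both steps.

```python
def py2_style_encode(byte_string, target_encoding, default_encoding="ascii"):
    """Emulate Python 2 str.encode(): decode with the default encoding, then encode."""
    return byte_string.decode(default_encoding).encode(target_encoding)


# Pure-ASCII bytes survive the round trip unchanged.
print(py2_style_encode(b"script.py", "cp1252"))

# A byte string already in cp1252 fails in the hidden decode step,
# which is why an *encode* call reports a UnicodeDecodeError.
try:
    py2_style_encode(b"test_workshop\xe1.py", "cp1252")
except UnicodeDecodeError as exc:
    print(type(exc).__name__, exc.reason)
```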
because in the Popen class in gcmd.py some of the arguments are of type str and some are unicode. If I encode only the unicode ones, it starts to work.
That makes sense. But the encoding should ideally be done at a higher level, at the point that wxGUI "knows" that it's dealing with a unicode value.
This is the main reason why I dislike dynamically-typed languages for large-scale projects (I'd never have suggested Python if I'd known that wxGUI was going to turn into such a behemoth). In C/C++, you'd just get a compile error if you pass a wchar_t*/std::wstring where a char*/std::string was expected. In Python, you get something which appears to work until it starts getting decent test coverage.
I'm wondering if sys.setdefaultencoding("EBCDIC-CP-BE") would work ...
follow-up: 18 comment:17 by , 10 years ago
Replying to glynn:
Replying to annakrat:
No, when I print the string I get xml, seems to be valid:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task SYSTEM "C:\Users\akratoc\Programs\GRASS GIS 7.1.svn\gui\xml\grass-interface.dtd">
<task name="test_workshopá.py">
I don't understand what's wrong with it.
The name= attribute will fail to decode due to not being valid UTF-8. The "á" will be encoded in cp1252 (i.e. '\xe1'); attempting to decode that as UTF-8 will fail (non-ASCII characters are encoded as multi-byte sequences; an isolated byte >= 128 can never occur in UTF-8).
I take it that we are supporting only ascii characters in the script name.
In any case, the GUI should be encoding the arguments which it passes to Popen(); it shouldn't be passing unicode values.
Should the encoding be moved to get_interface_description in task.py?
No. The GUI shouldn't be passing unicode values to the grass.script library; it should be converting them to strings itself.
Ok.
The EncodeString function is in gui, not in python scripting library.
grass.script.core has encode() and decode().
If I try to run the script (this time the script name is only ascii, but the path has some non-ascii characters which are in cp1252), I get the gui dialog and when I run it, I get an error:
Ugh. I couldn't figure out what was happening here until I read the next sentence. It appears that str.encode() actually exists; it tries to convert the string to unicode (using the default encoding) so that it can encode it.
because in the Popen class in gcmd.py some of the arguments are of type str and some are unicode. If I encode only the unicode ones, it starts to work.
That makes sense. But the encoding should ideally be done at a higher level, at the point that wxGUI "knows" that it's dealing with a unicode value.
I am not sure where the higher level is and why str and unicode are mixed in this case.
I'm wondering if sys.setdefaultencoding("EBCDIC-CP-BE") would work ...
Why would it? Is it easy to test?
Anyway, I think whatever we do, shouldn't get into the current release. I already fixed the important part (works with ascii path only) and I don't want to make things worse.
comment:18 by , 10 years ago
Replying to annakrat:
That makes sense. But the encoding should ideally be done at a higher level, at the point that wxGUI "knows" that it's dealing with a unicode value.
I am not sure where the higher level is and why str and unicode are mixed in this case.
Unicode values typically come from wxWidgets, e.g. any text retrieved from a text field will be a unicode object.
I'm wondering if sys.setdefaultencoding("EBCDIC-CP-BE") would work ...
Why would it? Is it easy to test?
Sorry, that was really just thinking out loud. It wouldn't fix anything, it would just highlight any remaining implicit conversions.
EBCDIC (used on IBM mainframes) is one of the few encodings which isn't compatible (or even mostly-compatible) with ASCII. Setting the default encoding to EBCDIC would make it obvious when implicit str<->unicode conversions were being performed, because the results would be completely wrong (e.g. even A-Z/a-z don't have the same codepoints as ASCII).
The default encoding can only be set in site.py; site.py deletes the setdefaultencoding() function from the sys module to prevent the default encoding from being changed after start-up.
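The incompatibility with ASCII is easy to see by encoding and decoding explicitly. This is a small sketch in Python 3, which does not change the default encoding (that is not possible from a script, as noted above); it only shows how text and bytes disagree once the EBCDIC-CP-BE codec is involved. The variable names are illustrative.

```python
EBCDIC = "EBCDIC-CP-BE"   # encoding name used in the comment above

text = "Az"
as_ascii = text.encode("ascii")
as_ebcdic = text.encode(EBCDIC)

# The byte values differ even for plain letters.
print(list(as_ascii), list(as_ebcdic))

# Misinterpreting ASCII bytes as EBCDIC yields visibly wrong text,
# which is what would expose any implicit conversion.
print(repr(as_ascii.decode(EBCDIC)))
```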
comment:19 by , 9 years ago
Milestone: | 7.0.0 → 7.0.5 |
---|
comment:20 by , 8 years ago
Milestone: | 7.0.5 → 7.0.6 |
---|
comment:21 by , 7 years ago
Milestone: | 7.0.6 → 7.0.7 |
---|
Replying to annakrat:
wxPython uses Unicode for almost everything. So retrieving the contents of a text field will return a Python unicode value.
It won't be decoded. The byte string will be available to the called program as a char* via getenv() (for C) or os.environ (Python).
wxGUI's core.gcmd module has EncodeString() and DecodeString() methods which use whatever wxGUI considers to be the "system" encoding. Those are used by gcmd.Popen for converting the arguments to strings and by gcmd.RunCommand() for converting the process' output to unicode.