* python-shell-send-region uses wrong encoding?
From: Ernest Adrogué @ 2013-10-29 11:30 UTC
To: help-gnu-emacs

Hi there,

I have got a problem with python-shell-send-region. It seems to use a
wrong encoding. For example, I have this file:

$ cat /tmp/test.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import print_function

a = 'Wörterbuch'.decode('utf8')
b = u'Wörterbuch'

print(repr(a))
print(repr(b))
$

When I open the file with Emacs and do python-shell-send-buffer (C-c C-c
by default) the following lines appear in the shell buffer:

u'W\xf6rterbuch'
u'W\xf6rterbuch'

This is the same output that I get when I run the script in a terminal.
However, if I select the lines after 'from __future__ ...' until the end
and do python-shell-send-region (C-c C-r) I get this output instead:

u'W\xf6rterbuch'
u'W\xc3\xb6rterbuch'

The second line of output seems to indicate that the text was sent in a
different encoding compared to python-shell-send-buffer.

What is going on?

Regards.
* RE: python-shell-send-region uses wrong encoding?
From: Drew Adams @ 2013-10-29 14:24 UTC
To: Ernest Adrogué, help-gnu-emacs

http://stackoverflow.com/questions/19648263/why-does-emacs-get-my-literal-unicode-strings-wrong
* Re: python-shell-send-region uses wrong encoding?
From: Ernest Adrogué @ 2013-10-29 14:37 UTC
To: help-gnu-emacs

29-10-2013, 07:24 (-0700); Drew Adams wrote:
> http://stackoverflow.com/questions/19648263/why-does-emacs-get-my-literal-unicode-strings-wrong

That was me who asked the question on stackoverflow :-) The solutions they
gave me didn't really solve the issue.
* RE: python-shell-send-region uses wrong encoding?
From: Drew Adams @ 2013-10-29 16:00 UTC
To: Ernest Adrogué, help-gnu-emacs

> That was me who asked the question on stackoverflow :-)

Yes, that's clear.

> The solutions they gave me didn't really solve the issue.

I see. Then why did you accept one of them (instead of just
up-voting it, if it provided only partial help)?

And just what was wrong with the solution (from user `wvxvw')
that you accepted?

Here is `wvxvw's answer, in case it helps others understand
what else might need to be added in order to help you:

 "you need the encoding system in your buffer which contains
 the source code to be utf-8 to send two bytes for ö. However,
 if it is a single-byte encoding, and given that you select the
 locale that maps the byte F6 to ö, you will get that byte.

 PS. Make sure you have -*- coding: utf-8 -*- comment."
* Re: python-shell-send-region uses wrong encoding?
From: Ernest Adrogué @ 2013-10-29 16:54 UTC
To: help-gnu-emacs

29-10-2013, 09:00 (-0700); Drew Adams wrote:
> > That was me who asked the question on stackoverflow :-)
>
> Yes, that's clear.
>
> > The solutions they gave me didn't really solve the issue.
>
> I see. Then why did you accept one of them (instead of just
> up-voting it, if it provided only partial help)?

I accepted it temporarily until I or someone else comes up with a better
solution.

> And just what was wrong with the solution (from user `wvxvw')
> that you accepted?
>
> Here is `wvxvw's answer, in case it helps others understand
> what else might need to be added in order to help you:
>
> "you need the encoding system in your buffer which contains
> the source code to be utf-8 to send two bytes for ö. However,
> if it is a single-byte encoding, and given that you select the
> locale that maps the byte F6 to ö, you will get that byte.
>
> PS. Make sure you have -*- coding: utf-8 -*- comment."

What is wrong with this answer is that the encoding in the buffer was
already UTF-8. I double checked every variable and all of them are set to
utf-8-unix.

Regards.
* Re: python-shell-send-region uses wrong encoding?
From: Eli Zaretskii @ 2013-10-29 17:11 UTC
To: help-gnu-emacs

> Date: Tue, 29 Oct 2013 09:00:08 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
>
> "you need the encoding system in your buffer which contains
> the source code to be utf-8 to send two bytes for ö. However,
> if it is a single-byte encoding, and given that you select the
> locale that maps the byte F6 to ö, you will get that byte.
>
> PS. Make sure you have -*- coding: utf-8 -*- comment."

I hope you (and everyone else) understand that the above is profoundly
wrong. There's no relation whatsoever between the buffer's file encoding
and the encoding of the material Emacs sends to an inferior process.
* Re: python-shell-send-region uses wrong encoding?
From: Andreas Röhler @ 2013-10-29 14:26 UTC
To: help-gnu-emacs

On 29.10.2013 12:30, Ernest Adrogué wrote:
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
>
> from __future__ import print_function
>
> a = 'Wörterbuch'.decode('utf8')
> b = u'Wörterbuch'
>
> print(repr(a))
> print(repr(b))

Works here without the `repr':

print(a)
print(b)
* Re: python-shell-send-region uses wrong encoding?
From: Ernest Adrogué @ 2013-10-29 14:55 UTC
To: help-gnu-emacs

29-10-2013, 15:26 (+0100); Andreas Röhler wrote:
> On 29.10.2013 12:30, Ernest Adrogué wrote:
> >#!/usr/bin/env python
> ># -*- coding: utf-8 -*-
> >
> >from __future__ import print_function
> >
> >a = 'Wörterbuch'.decode('utf8')
> >b = u'Wörterbuch'
> >
> >print(repr(a))
> >print(repr(b))
>
> Works here without the `repr':
>
> print(a)
> print(b)

Do you get the same result with C-c C-c as with C-c C-r?

Here it's different, print(b) prints `WÃ¶rterbuch' (C-c C-r) and
`Wörterbuch' (C-c C-c).

Something is wrong.
* Re: python-shell-send-region uses wrong encoding?
From: Andreas Röhler @ 2013-10-29 15:29 UTC
To: help-gnu-emacs

On 29.10.2013 15:55, Ernest Adrogué wrote:
> 29-10-2013, 15:26 (+0100); Andreas Röhler wrote:
>> On 29.10.2013 12:30, Ernest Adrogué wrote:
>>> #!/usr/bin/env python
>>> # -*- coding: utf-8 -*-
>>>
>>> from __future__ import print_function
>>>
>>> a = 'Wörterbuch'.decode('utf8')
>>> b = u'Wörterbuch'
>>>
>>> print(repr(a))
>>> print(repr(b))
>>
>> Works here without the `repr':
>>
>> print(a)
>> print(b)
>
> Do you get the same result with C-c C-c as with C-c C-r?
>
> Here it's different, print(b) prints `WÃ¶rterbuch' (C-c C-r) and
> `Wörterbuch' (C-c C-c).
>
> Something is wrong.

Indeed, I get the same error. IMO a bug.

BTW `py-execute-region' using python-mode.el would work.
The attachment displays results with the `repr' forms first, then without.

[Attachment: py-execute-region.png, image/png, 77299 bytes]
* Re: python-shell-send-region uses wrong encoding?
From: Peter Dyballa @ 2013-10-29 15:34 UTC
To: Ernest Adrogué
Cc: help-gnu-emacs

On 29.10.2013 at 15:55, Ernest Adrogué wrote:

> Here it's different, print(b) prints `WÃ¶rterbuch' (C-c C-r) and
> `Wörterbuch' (C-c C-c).

This obviously happens in an 8-bit environment. `WÃ¶rterbuch' is the
sequence of octets that represent the ISO Latin-x (or ISO 8859) encoded
word `Wörterbuch' in UTF-8 encoding. Here the "ö" is encoded as two
octets: 0xC3 0xB6. The first one is in ISO 8859-15 the character "Ã" and
the latter is in that encoding the character "¶".

So it seems that one function prints exclusively in UTF-8…

--
Greetings

Pete

You can never know too little of what is not worth knowing at all.
				– Anon.
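[Editorial note] The octet-level reading described in this message can be checked with a short sketch. It is written for Python 3 (the thread itself concerns Python 2), the codec names are the standard ones, and the variable names are only illustrative. It encodes the word as UTF-8 and then reads the same octets back with an 8-bit codec:

```python
# Illustrative only: reproduce the two-octet reading of "ö".
word = "Wörterbuch"

# UTF-8 stores "ö" (U+00F6) as the two octets 0xC3 0xB6.
utf8_octets = word.encode("utf-8")

# An 8-bit codec maps every octet to exactly one character,
# so the two octets come back as two separate characters.
misread = utf8_octets.decode("iso-8859-15")

print(utf8_octets)
print(misread, len(word), len(misread))
```

The misread string is one character longer than the original word, which is the visible symptom reported for C-c C-r.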
* Re: python-shell-send-region uses wrong encoding?
From: Ernest Adrogué @ 2013-10-29 16:34 UTC
To: help-gnu-emacs

29-10-2013, 16:34 (+0100); Peter Dyballa wrote:
>
> On 29.10.2013 at 15:55, Ernest Adrogué wrote:
>
> > Here it's different, print(b) prints `WÃ¶rterbuch' (C-c C-r) and
> > `Wörterbuch' (C-c C-c).
>
> This obviously happens in an 8-bit environment. `WÃ¶rterbuch' is the
> sequence of octets that represent the ISO Latin-x (or ISO 8859) encoded
> word `Wörterbuch' in UTF-8 encoding. Here the "ö" is encoded as two
> octets: 0xC3 0xB6. The first one is in ISO 8859-15 the character "Ã" and
> the latter is in that encoding the character "¶".
>
> So it seems that one function prints exclusively in UTF-8…

The "ö" character is stored in the file as 0xC3 0xB6. As you say, this is
the UTF-8 encoding for this character.

The Python interpreter interprets the 2-byte sequence correctly. This can
be seen in a number of ways: if I run the script in a terminal, or if I
paste or yank the line into Python shell buffer, or I do
python-shell-send-buffer, in all these cases the sequence is converted into
0xF6, which is the UTF-16 encoding for "ö" that Python uses internally, as
the output from repr() shows.

However, when the bytes are sent with python-shell-send-region, the
interpreter thinks that 0xC3 0xB6 are 2 characters, which is wrong. In light
of this, I would say that there is a bug in python-shell-send-region.
* Re: python-shell-send-region uses wrong encoding?
From: Eli Zaretskii @ 2013-10-29 17:15 UTC
To: help-gnu-emacs

> Date: Tue, 29 Oct 2013 17:34:26 +0100
> From: Ernest Adrogué <nfdisco@gmail.com>
>
> The "ö" character is stored in the file as 0xC3 0xB6. As you say, this is
> the UTF-8 encoding for this character.
>
> The Python interpreter interprets the 2-byte sequence correctly. This can
> be seen in a number of ways: if I run the script in a terminal, or if I
> paste or yank the line into Python shell buffer, or I do
> python-shell-send-buffer, in all these cases the sequence is converted into
> 0xF6, which is the UTF-16 encoding for "ö" that Python uses internally, as
> the output from repr() shows.
>
> However, when the bytes are sent with python-shell-send-region, the
> interpreter thinks that 0xC3 0xB6 are 2 characters, which is wrong. In light
> of this, I would say that there is a bug in python-shell-send-region.

Why is that a bug, and what would you expect python-shell-send-region
to send instead (and why)?
* Re: python-shell-send-region uses wrong encoding?
From: Ernest Adrogué @ 2013-10-29 17:53 UTC
To: help-gnu-emacs

29-10-2013, 19:15 (+0200); Eli Zaretskii wrote:
> > Date: Tue, 29 Oct 2013 17:34:26 +0100
> > From: Ernest Adrogué <nfdisco@gmail.com>
> >
> > The "ö" character is stored in the file as 0xC3 0xB6. As you say, this is
> > the UTF-8 encoding for this character.
> >
> > The Python interpreter interprets the 2-byte sequence correctly. This can
> > be seen in a number of ways: if I run the script in a terminal, or if I
> > paste or yank the line into Python shell buffer, or I do
> > python-shell-send-buffer, in all these cases the sequence is converted into
> > 0xF6, which is the UTF-16 encoding for "ö" that Python uses internally, as
> > the output from repr() shows.
> >
> > However, when the bytes are sent with python-shell-send-region, the
> > interpreter thinks that 0xC3 0xB6 are 2 characters, which is wrong. In light
> > of this, I would say that there is a bug in python-shell-send-region.
>
> Why is that a bug, and what would you expect python-shell-send-region
> to send instead (and why)?

I would expect python-shell-send-region to be a shortcut for saving the
region, switching to the shell buffer, yanking and hitting RET.
* Re: python-shell-send-region uses wrong encoding?
From: Eli Zaretskii @ 2013-10-29 19:10 UTC
To: help-gnu-emacs

> Date: Tue, 29 Oct 2013 18:53:03 +0100
> From: Ernest Adrogué <nfdisco@gmail.com>
>
> I would expect python-shell-send-region to be a shortcut for saving the
> region, switching to the shell buffer, yanking and hitting RET.

Emacs can send stuff to an inferior program directly, without going
through a file.

But even if going through a file, why is it wrong to send 2 bytes when
a character's UTF-8 encoding takes those same 2 bytes?
* Re: python-shell-send-region uses wrong encoding?
From: Ernest Adrogué @ 2013-10-29 20:48 UTC
To: help-gnu-emacs

29-10-2013, 21:10 (+0200); Eli Zaretskii wrote:
> > Date: Tue, 29 Oct 2013 18:53:03 +0100
> > From: Ernest Adrogué <nfdisco@gmail.com>
> >
> > I would expect python-shell-send-region to be a shortcut for saving the
> > region, switching to the shell buffer, yanking and hitting RET.
>
> Emacs can send stuff to an inferior program directly, without going
> through a file.
>
> But even if going through a file, why is it wrong to send 2 bytes when
> a character's UTF-8 encoding takes those same 2 bytes?

It's not wrong to send 2 bytes, the problem was that you also have to tell
Python the encoding. Anyway, it's fixed now. Thanks Stefan for the patch
and everyone else for the help.
* Re: python-shell-send-region uses wrong encoding?
From: Peter Dyballa @ 2013-10-29 18:07 UTC
To: Ernest Adrogué
Cc: help-gnu-emacs

On 29.10.2013 at 17:34, Ernest Adrogué wrote:

> in all these cases the sequence is converted into
> 0xF6, which is the UTF-16 encoding for "ö" that Python uses internally, as
> the output from repr() shows..

No!

UTF-16 is a text encoding that uses for each character in the BMP (Basic
Multilingual Plane) 2 bytes or 16 bits – therefore the name. In UTF-16 ö
is encoded as 0x00F6. (Above the BMP it uses twice 2 bytes, i.e. 4 bytes,
which are distinct from UTF-32.)

--
Greetings

Pete

Make it simple, as simple as possible but no simpler.
				– Albert Einstein
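[Editorial note] The sizes mentioned in this message can be verified with a small Python 3 sketch. The code point chosen from outside the BMP (U+1F600) is an arbitrary example that does not appear in the thread, and the big-endian codec variants are used only so that no byte order mark is prepended:

```python
# Illustrative only: UTF-16 code units for a BMP and a non-BMP character.
o_umlaut = "\u00f6"        # "ö", inside the BMP
beyond_bmp = "\U0001F600"  # arbitrary example code point outside the BMP

bmp_units = o_umlaut.encode("utf-16-be")      # one 16-bit unit
pair_units = beyond_bmp.encode("utf-16-be")   # two 16-bit units (surrogate pair)
utf32_units = beyond_bmp.encode("utf-32-be")  # one 32-bit unit, different bytes

print(bmp_units.hex(), pair_units.hex(), utf32_units.hex())
```

The 4 bytes produced by UTF-16 for the second character are a surrogate pair, not the 4 bytes UTF-32 would use, which is the distinction the parenthetical remark makes.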
* Re: python-shell-send-region uses wrong encoding?
From: Ernest Adrogué @ 2013-10-29 20:37 UTC
To: help-gnu-emacs

29-10-2013, 19:07 (+0100); Peter Dyballa wrote:
>
> On 29.10.2013 at 17:34, Ernest Adrogué wrote:
>
> > in all these cases the sequence is converted into
> > 0xF6, which is the UTF-16 encoding for "ö" that Python uses internally, as
> > the output from repr() shows..
>
> No!
>
> UTF-16 is a text encoding that uses for each character in the BMP (Basic
> Multilingual Plane) 2 bytes or 16 bits – therefore the name. In UTF-16 ö
> is encoded as 0x00F6. (Above it uses twice 2 bytes, i.e. 4 bytes, which
> are distinct from UTF-32.)

I stand corrected.
* Re: python-shell-send-region uses wrong encoding?
From: Stefan Monnier @ 2013-10-29 17:28 UTC
To: help-gnu-emacs
Cc: Fabián Ezequiel Gallina

> This is the same output that I get when I run the script in a terminal.
> However, if I select the lines after 'from __future__ ...' until the end and
> do python-shell-send-region (C-c C-r) I get this output instead:

> u'W\xf6rterbuch'
> u'W\xc3\xb6rterbuch'

> The second line of output seems to indicate that the text was sent in a
> different encoding compared to python-shell-send-buffer.

No, it indicates that Python interpreted the bytes sent by Emacs in
a different way: the first line (where you explicitly asked for a utf-8
decoding) indicates that the bytes indeed use the right utf-8 encoding,
but the second indicates that Python does not decode the input as utf-8
but as something else (presumably latin-1).

E.g. the patch below (which causes python-shell-send-string to tell
Python that the file sent is using utf-8) should fix your problem (tho
it's not a proper fix, since we shouldn't hardcode utf-8 here, but copy
whichever -*- coding: -*- coding is in the file).

Fabián, could you write a cleaner fix?


        Stefan


=== modified file 'lisp/progmodes/python.el'
--- lisp/progmodes/python.el    2013-10-07 18:51:26 +0000
+++ lisp/progmodes/python.el    2013-10-29 17:25:11 +0000
@@ -2047,6 +2047,7 @@
            (temp-file-name (make-temp-file "py"))
            (file-name (or (buffer-file-name) temp-file-name)))
       (with-temp-file temp-file-name
+        (insert "# -*- coding: utf-8 -*-")
         (insert string)
         (delete-trailing-whitespace))
       (python-shell-send-file file-name process temp-file-name))
* Re: python-shell-send-region uses wrong encoding?
From: Stefan Monnier @ 2013-10-30 3:20 UTC
To: help-gnu-emacs
Cc: Fabián Ezequiel Gallina

> E.g. the patch below (which causes python-shell-send-string to tell
> Python that the file sent is using utf-8) should fix your problem (tho
> it's not a proper fix, since we shouldn't hardcode utf-8 here, but copy
> which ever -*- coding: -*- coding is in the file).

I installed a variant of that patch in Emacs's trunk, which should fix
the problem. The relevant part of the patch is quoted below, so you can
try it out.


        Stefan


=== modified file 'lisp/progmodes/python.el'
--- lisp/progmodes/python.el    2013-10-07 18:51:26 +0000
+++ lisp/progmodes/python.el    2013-10-30 01:28:36 +0000
@@ -2045,7 +2051,9 @@
                               (concat (file-remote-p default-directory) "/tmp")
                             temporary-file-directory))
            (temp-file-name (make-temp-file "py"))
+           (coding-system-for-write 'utf-8)
            (file-name (or (buffer-file-name) temp-file-name)))
       (with-temp-file temp-file-name
+        (insert "# -*- coding: utf-8 -*-\n")
         (insert string)
         (delete-trailing-whitespace))
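[Editorial note] The effect of the added coding line can be illustrated without Emacs. The sketch below is an assumption-laden stand-in, not what python.el does: it uses Python 3 and compiles the bytes in-process, whereas python.el writes a temp file and has the inferior Python 2 interpreter run it. `run_region` is a hypothetical helper name. The same UTF-8 bytes are executed twice, once declared as utf-8 and once as latin-1:

```python
def run_region(body: bytes, declared: str) -> str:
    """Prefix BODY with a coding line, execute it, and return the value of b."""
    cookie = ("# -*- coding: %s -*-\n" % declared).encode("ascii")
    namespace = {}
    # compile() on a bytes object honours the coding declaration
    # found on the first line of the source.
    exec(compile(cookie + body, "<region>", "exec"), namespace)
    return namespace["b"]


# The region as Emacs would write it out: UTF-8 encoded bytes.
region = "b = 'Wörterbuch'\n".encode("utf-8")

as_utf8 = run_region(region, "utf-8")
as_latin1 = run_region(region, "latin-1")

print(ascii(as_utf8))
print(ascii(as_latin1))
```

With the latin-1 declaration the string literal contains the two characters 0xC3 and 0xB6 in place of "ö", matching the second line of output in the original report.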
* Re: python-shell-send-region uses wrong encoding?
From: Andreas Röhler @ 2013-10-30 6:45 UTC
To: help-gnu-emacs

On 30.10.2013 04:20, Stefan Monnier wrote:
>> E.g. the patch below (which causes python-shell-send-string to tell
>> Python that the file sent is using utf-8) should fix your problem (tho
>> it's not a proper fix, since we shouldn't hardcode utf-8 here, but copy
>> which ever -*- coding: -*- coding is in the file).
>
> I installed a variant of that patch in Emacs's trunk, which should fix
> the problem. The relevant part of the patch is quoted below, so you can
> try it out,
>
>
>          Stefan
>
>
> === modified file 'lisp/progmodes/python.el'
> --- lisp/progmodes/python.el    2013-10-07 18:51:26 +0000
> +++ lisp/progmodes/python.el    2013-10-30 01:28:36 +0000
> @@ -2045,7 +2051,9 @@
>                                (concat (file-remote-p default-directory) "/tmp")
>                              temporary-file-directory))
>             (temp-file-name (make-temp-file "py"))
> +           (coding-system-for-write 'utf-8)
>             (file-name (or (buffer-file-name) temp-file-name)))
>        (with-temp-file temp-file-name
> +        (insert "# -*- coding: utf-8 -*-\n")
>          (insert string)
>          (delete-trailing-whitespace))

IIUC the second added line "-*- coding: utf-8 -*-\n" should not be needed,
as that's the default at the Python side anyway.

Also, functions tracing a possible error might see a different line offset
that way - just abstract reasoning so far.
* Re: python-shell-send-region uses wrong encoding?
From: Stefan Monnier @ 2013-10-30 11:37 UTC
To: help-gnu-emacs

> IIUC the second added line "-*- coding: utf-8 -*-\n" should not be needed,
> as that's the default at the Python side anyway.

Based on the OP's experience, it seems this is not true.
Also, the doc I can find indicates that the default is ASCII (and was
latin-1 in the past, which is what the OP seems to be seeing).

But, you're a lot more experienced in Python than I am (I never wrote
a single line of Python, basically), so maybe there's another
explanation for the OP's problem?

> Also functions tracing a possibly error might see a different line-offset
> that way - just an abstract reasoning so far.

This function is used to send a string (typically extracted from the
region) not a file, so the offsets have always been wrong anyway (except
when the region happens to start on the first line).

But yes, as it happens, the patch I installed does include some other
change to try and translate the offsets appropriately.


        Stefan
* Re: python-shell-send-region uses wrong encoding?
From: Yuri Khan @ 2013-10-30 12:08 UTC
To: Stefan Monnier
Cc: help-gnu-emacs@gnu.org

On Wed, Oct 30, 2013 at 6:37 PM, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
>> IIUC the second added line "-*- coding: utf-8 -*-\n" should not be needed,
>> as that's the default at the Python side anyway.
>
> Based on the OP's experience, it seems this is not true.
> Also, the doc I can find indicates that the default is ASCII (and was
> latin-1 in the past, which is what the OP seems to be seeing).

There is Python, and, on the other hand, there is Python.

In some GNU/Linux distributions (e.g. Ubuntu), the default Python is
version 2.x, whose default encoding is ASCII (and has been that way
since at least 2.6).

On the other hand, the current version of Python is 3.x, where a big
Unicode revolution has happened and now the default encoding is UTF-8.

Both of these Pythons recognize Emacs-style and vim-style encoding
declarations, and in addition the UTF-8 byte order mark (which can be
called Notepad-style encoding declaration).
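[Editorial note] On the Python 3 side, the declaration styles listed in this message can be observed through the standard library function `tokenize.detect_encoding`. The sketch below only covers Python 3 (it says nothing about the ASCII default of Python 2), and `declared_encoding` is a hypothetical wrapper name:

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the source encoding Python 3's tokenizer picks for SOURCE."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


emacs_style = b"# -*- coding: latin-1 -*-\nx = 1\n"
vim_style = b"# vim: set fileencoding=latin-1 :\nx = 1\n"
with_bom = b"\xef\xbb\xbfx = 1\n"   # UTF-8 byte order mark, no comment
undeclared = b"x = 1\n"             # falls back to the default

for sample in (emacs_style, vim_style, with_bom, undeclared):
    print(declared_encoding(sample))
```

The names printed are normalized by the tokenizer, so a latin-1 declaration is reported under its canonical ISO name.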
* Re: python-shell-send-region uses wrong encoding?
From: Andreas Röhler @ 2013-10-30 12:45 UTC
To: help-gnu-emacs

On 30.10.2013 13:08, Yuri Khan wrote:
> On Wed, Oct 30, 2013 at 6:37 PM, Stefan Monnier
> <monnier@iro.umontreal.ca> wrote:
>>> IIUC the second added line "-*- coding: utf-8 -*-\n" should not be needed,
>>> as that's the default at the Python side anyway.
>>
>> Based on the OP's experience, it seems this is not true.
>> Also, the doc I can find indicates that the default is ASCII (and was
>> latin-1 in the past, which is what the OP seems to be seeing).
>
> There is Python, and, on the other hand, there is Python.
>
> In some GNU/Linux distributions (e.g. Ubuntu), the default Python is
> version 2.x, whose default encoding is ASCII (and has been that way
> since at least 2.6).
>
> On the other hand, the current version of Python is 3.x, where a big
> Unicode revolution has happened and now the default encoding is UTF-8.
>
> Both of these Pythons recognize Emacs-style and vim-style encoding
> declarations, and in addition the UTF-8 byte order mark (which can be
> called Notepad-style encoding declaration).

Thanks for clarifying this, so I stand corrected.

Andreas
* Re: python-shell-send-region uses wrong encoding?
From: Ernest Adrogué @ 2013-10-31 14:30 UTC
To: help-gnu-emacs

30-10-2013, 07:37 (-0400); Stefan Monnier wrote:
> This function is used to send a string (typically extracted from the
> region) not a file, so the offsets have always been wrong anyway (except
> when the region happens to start on the first line).

The line number is right, because the temp file is filled with empty lines
so that the line numbers match. Because of the extra line, it's one line
off. This fixes it:

--- python.el.orig      2013-10-31 15:21:26.000000000 +0100
+++ python.el   2013-10-31 15:19:03.673891453 +0100
@@ -2154,7 +2146,7 @@
 3. Wraps indented regions under an \"if True:\" block so the
 interpreter evaluates them correctly."
   (let ((substring (buffer-substring-no-properties start end))
-        (fillstr (make-string (1- (line-number-at-pos start)) ?\n))
+        (fillstr (make-string (- (line-number-at-pos start) 2) ?\n))
         (toplevel-block-p (save-excursion
                             (goto-char start)
                             (or (zerop (line-number-at-pos start))
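[Editorial note] The padding idea discussed here (fill the temp file with newlines so reported line numbers match the buffer, then subtract one line for the added coding line) can be sketched in Python 3. `pad_region` is a hypothetical helper, not part of python.el, and clamping the padding at zero for a region starting on line 1 is an assumption of this sketch rather than something the patch above does:

```python
def pad_region(region: str, start_line: int, cookie: bool = True) -> str:
    """Pad REGION so its first line lands on START_LINE of the result."""
    header = "# -*- coding: utf-8 -*-\n" if cookie else ""
    # The coding line already occupies one line, so one fewer
    # blank line is needed when it is present.
    blank_lines = max(start_line - 1 - (1 if cookie else 0), 0)
    return header + "\n" * blank_lines + region


region = "def greet():\n    return 'hi'\n"

namespace = {}
exec(compile(pad_region(region, start_line=7), "<region>", "exec"), namespace)

# Line number the interpreter associates with the definition.
reported_line = namespace["greet"].__code__.co_firstlineno
print(reported_line)
```

Without the adjustment for the coding line, the definition would be reported one line too far down, which is the off-by-one described in this message.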
* Re: python-shell-send-region uses wrong encoding?
From: Ernest Adrogué @ 2013-10-31 14:31 UTC
To: Stefan Monnier
Cc: Fabián Ezequiel Gallina, help-gnu-emacs

29-10-2013, 23:20 (-0400); Stefan Monnier wrote:
> I installed a variant of that patch in Emacs's trunk, which should fix
> the problem. The relevant part of the patch is quoted below, so you can
> try it out,

It works for me.
* Re: python-shell-send-region uses wrong encoding?
From: Ernest Adrogué @ 2013-10-31 17:54 UTC
To: help-gnu-emacs

29-10-2013, 23:20 (-0400); Stefan Monnier wrote:
> I installed a variant of that patch in Emacs's trunk, which should fix
> the problem.

The original problem is fixed, but now there's another problem. Send this
to Python:

class Foo(object):
    pass

and you get IndentationError. The problem seems to be this change:

@@ -2034,26 +2038,32 @@ there for compatibility with CEDET.")
 (defun python-shell-send-string (string &optional process msg)
   "Send STRING to inferior Python PROCESS.
-When MSG is non-nil messages the first line of STRING."
+When MSG is non-nil messages the first line of STRING.
+If a temp file is used, return its name, otherwise return nil."
   (interactive "sPython command: ")
   (let ((process (or process (python-shell-get-or-create-process)))
-        (lines (split-string string "\n" t)))
-    (and msg (message "Sent: %s..." (nth 0 lines)))
-    (if (> (length lines) 1)
+        (_ (string-match "\\`\n*\\(.*\\)\\(\n.\\)?" string))
+        (multiline (match-beginning 2)))
+    (and msg (message "Sent: %s..." (match-string 1 string)))
+    (if multiline
         (let* ((temporary-file-directory
* Re: python-shell-send-region uses wrong encoding?
From: Stefan Monnier @ 2013-10-31 20:35 UTC
To: help-gnu-emacs

>> I installed a variant of that patch in Emacs's trunk, which should fix
>> the problem.
> The original problem is fixed, but now there's another problem. Send this
> to Python:

Ah, thanks for the test case. I see there are more problems, even.
I'm beginning to better understand the code,


        Stefan
* Re: python-shell-send-region uses wrong encoding?
From: Stefan Monnier @ 2013-11-04 19:15 UTC
To: help-gnu-emacs

> The original problem is fixed, but now there's another problem. Send this
> to Python:

> class Foo(object):
>     pass

> and you get IndentationError. The problem seems to be this change:

This part should now be fixed in trunk. If you see more breakage,
please report it via M-x report-emacs-bug so it gets a bug-number.


        Stefan