* python-shell-send-region uses wrong encoding?
@ 2013-10-29 11:30 Ernest Adrogué
2013-10-29 14:24 ` Drew Adams
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: Ernest Adrogué @ 2013-10-29 11:30 UTC (permalink / raw)
To: help-gnu-emacs
Hi there,
I have got a problem with python-shell-send-region. It seems to use a wrong
encoding. For example, I have this file:
$ cat /tmp/test.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function
a = 'Wörterbuch'.decode('utf8')
b = u'Wörterbuch'
print(repr(a))
print(repr(b))
$
When I open the file with Emacs and do python-shell-send-buffer (C-c C-c by
default) the following lines appear in the shell buffer:
u'W\xf6rterbuch'
u'W\xf6rterbuch'
This is the same output that I get when I run the script in a terminal.
However, if I select the lines after 'from __future__ ...' until the end and
do python-shell-send-region (C-c C-r) I get this output instead:
u'W\xf6rterbuch'
u'W\xc3\xb6rterbuch'
The second line of output seems to indicate that the text was sent in a
different encoding compared to python-shell-send-buffer.
What is going on?
Regards.
* RE: python-shell-send-region uses wrong encoding?
2013-10-29 11:30 python-shell-send-region uses wrong encoding? Ernest Adrogué
@ 2013-10-29 14:24 ` Drew Adams
2013-10-29 14:37 ` Ernest Adrogué
2013-10-29 14:26 ` Andreas Röhler
2013-10-29 17:28 ` Stefan Monnier
2 siblings, 1 reply; 28+ messages in thread
From: Drew Adams @ 2013-10-29 14:24 UTC (permalink / raw)
To: Ernest Adrogué, help-gnu-emacs
http://stackoverflow.com/questions/19648263/why-does-emacs-get-my-literal-unicode-strings-wrong
* Re: python-shell-send-region uses wrong encoding?
2013-10-29 11:30 python-shell-send-region uses wrong encoding? Ernest Adrogué
2013-10-29 14:24 ` Drew Adams
@ 2013-10-29 14:26 ` Andreas Röhler
2013-10-29 14:55 ` Ernest Adrogué
2013-10-29 17:28 ` Stefan Monnier
2 siblings, 1 reply; 28+ messages in thread
From: Andreas Röhler @ 2013-10-29 14:26 UTC (permalink / raw)
To: help-gnu-emacs
Am 29.10.2013 12:30, schrieb Ernest Adrogué:
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
>
> from __future__ import print_function
>
> a = 'Wörterbuch'.decode('utf8')
> b = u'Wörterbuch'
>
> print(repr(a))
> print(repr(b))
Works here without the `repr':
print(a)
print(b)
* Re: python-shell-send-region uses wrong encoding?
2013-10-29 14:24 ` Drew Adams
@ 2013-10-29 14:37 ` Ernest Adrogué
2013-10-29 16:00 ` Drew Adams
0 siblings, 1 reply; 28+ messages in thread
From: Ernest Adrogué @ 2013-10-29 14:37 UTC (permalink / raw)
To: help-gnu-emacs
29-10-2013, 07:24 (-0700); Drew Adams escriu:
> http://stackoverflow.com/questions/19648263/why-does-emacs-get-my-literal-unicode-strings-wrong
That was me who asked the question on stackoverflow :-)
The solutions they gave me didn't really solve the issue.
* Re: python-shell-send-region uses wrong encoding?
2013-10-29 14:26 ` Andreas Röhler
@ 2013-10-29 14:55 ` Ernest Adrogué
2013-10-29 15:29 ` Andreas Röhler
2013-10-29 15:34 ` Peter Dyballa
0 siblings, 2 replies; 28+ messages in thread
From: Ernest Adrogué @ 2013-10-29 14:55 UTC (permalink / raw)
To: help-gnu-emacs
29-10-2013, 15:26 (+0100); Andreas Röhler escriu:
> Am 29.10.2013 12:30, schrieb Ernest Adrogué:
> >#!/usr/bin/env python
> ># -*- coding: utf-8 -*-
> >
> >from __future__ import print_function
> >
> >a = 'Wörterbuch'.decode('utf8')
> >b = u'Wörterbuch'
> >
> >print(repr(a))
> >print(repr(b))
>
> Works here without the `repr':
>
> print(a)
> print(b)
Do you get the same result with C-c C-c as with C-c C-r?
Here it's different, print(b) prints `WÃ¶rterbuch' (C-c C-r) and
`Wörterbuch' (C-c C-c).
Something is wrong.
* Re: python-shell-send-region uses wrong encoding?
2013-10-29 14:55 ` Ernest Adrogué
@ 2013-10-29 15:29 ` Andreas Röhler
2013-10-29 15:34 ` Peter Dyballa
1 sibling, 0 replies; 28+ messages in thread
From: Andreas Röhler @ 2013-10-29 15:29 UTC (permalink / raw)
To: help-gnu-emacs
[-- Attachment #1: Type: text/plain, Size: 764 bytes --]
Am 29.10.2013 15:55, schrieb Ernest Adrogué:
> 29-10-2013, 15:26 (+0100); Andreas Röhler escriu:
>> Am 29.10.2013 12:30, schrieb Ernest Adrogué:
>>> #!/usr/bin/env python
>>> # -*- coding: utf-8 -*-
>>>
>>> from __future__ import print_function
>>>
>>> a = 'Wörterbuch'.decode('utf8')
>>> b = u'Wörterbuch'
>>>
>>> print(repr(a))
>>> print(repr(b))
>>
>> Works here without the `repr':
>>
>> print(a)
>> print(b)
>
> Do you get the same result with C-c C-c as with C-c C-r?
>
> Here it's different, print(b) prints `WÃ¶rterbuch' (C-c C-r) and
> `Wörterbuch' (C-c C-c).
>
> Something is wrong.
>
>
Indeed, I get the same error. IMO a bug.
BTW `py-execute-region' using python-mode.el would work.
Attachment displays results with `repr'-forms first, then without.
[-- Attachment #2: py-execute-region.png --]
[-- Type: image/png, Size: 77299 bytes --]
* Re: python-shell-send-region uses wrong encoding?
2013-10-29 14:55 ` Ernest Adrogué
2013-10-29 15:29 ` Andreas Röhler
@ 2013-10-29 15:34 ` Peter Dyballa
2013-10-29 16:34 ` Ernest Adrogué
1 sibling, 1 reply; 28+ messages in thread
From: Peter Dyballa @ 2013-10-29 15:34 UTC (permalink / raw)
To: Ernest Adrogué; +Cc: help-gnu-emacs
Am 29.10.2013 um 15:55 schrieb Ernest Adrogué:
> Here it's different, print(b) prints `WÃ¶rterbuch' (C-c C-r) and
> `Wörterbuch' (C-c C-c).
This obviously happens in an 8-bit environment. `WÃ¶rterbuch' is the sequence of octets that represent the UTF-8 encoded word `Wörterbuch' when displayed as ISO Latin-x (or ISO 8859) text. Here the "ö" is encoded as two octets: 0xC3 0xB6. In ISO 8859-15 the first one is the character "Ã" and the latter is the character "¶".
So it seems that one function prints exclusively in UTF-8…
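Peter's reading of the garbled output can be checked with a few lines of Python 3 (the thread's own code is Python 2). This is only an illustration: it encodes the word as UTF-8 and then reads those octets back as ISO 8859-15, which is one plausible explanation of what ended up in the shell buffer.

```python
import unicodedata

word = "W\u00f6rterbuch"              # "Wörterbuch", written with an escape
utf8_bytes = word.encode("utf-8")
print(utf8_bytes)                     # b'W\xc3\xb6rterbuch'

# Read the same octets back as if they were ISO 8859-15 text.
misread = utf8_bytes.decode("iso-8859-15")
print(ascii(misread))                 # 'W\xc3\xb6rterbuch'

# Name the two characters that the octets 0xC3 0xB6 turn into.
for ch in misread[1:3]:
    print(hex(ord(ch)), unicodedata.name(ch))
# 0xc3 LATIN CAPITAL LETTER A WITH TILDE
# 0xb6 PILCROW SIGN
```

Latin-1 (ISO 8859-1) gives the same two characters here, since the two encodings only differ at a handful of positions that do not include 0xC3 or 0xB6.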
--
Greetings
Pete
You can never know too little of what is not worth knowing at all.
– Anon.
* RE: python-shell-send-region uses wrong encoding?
2013-10-29 14:37 ` Ernest Adrogué
@ 2013-10-29 16:00 ` Drew Adams
2013-10-29 16:54 ` Ernest Adrogué
2013-10-29 17:11 ` Eli Zaretskii
0 siblings, 2 replies; 28+ messages in thread
From: Drew Adams @ 2013-10-29 16:00 UTC (permalink / raw)
To: Ernest Adrogué, help-gnu-emacs
> That was me who asked the question on stackoverflow :-)
Yes, that's clear.
> The solutions they gave me didn't really solve the issue.
I see. Then why did you accept one of them (instead of just
up-voting it, if it provided only partial help)?
And just what was wrong with the solution (from user `wvxvw')
that you accepted?
Here is `wvxvw's answer, in case it helps others understand
what else might need to be added in order to help you:
"you need the encoding system in your buffer which contains
the source code to be utf-8 to send two bytes for ö. However,
if it is a single-byte encoding, and given that you select the
locale that maps the byte F6 to ö, you will get that byte.
PS. Make sure you have -*- coding: utf-8 -*- comment."
* Re: python-shell-send-region uses wrong encoding?
2013-10-29 15:34 ` Peter Dyballa
@ 2013-10-29 16:34 ` Ernest Adrogué
2013-10-29 17:15 ` Eli Zaretskii
2013-10-29 18:07 ` Peter Dyballa
0 siblings, 2 replies; 28+ messages in thread
From: Ernest Adrogué @ 2013-10-29 16:34 UTC (permalink / raw)
To: help-gnu-emacs
29-10-2013, 16:34 (+0100); Peter Dyballa escriu:
>
> Am 29.10.2013 um 15:55 schrieb Ernest Adrogué:
>
> > Here it's different, print(b) prints `WÃ¶rterbuch' (C-c C-r) and
> > `Wörterbuch' (C-c C-c).
>
> This obviously happens in an 8-bit environment. `WÃ¶rterbuch' is the
> sequence of octets that represent the UTF-8 encoded word `Wörterbuch'
> when displayed as ISO Latin-x (or ISO 8859) text. Here the "ö" is encoded
> as two octets: 0xC3 0xB6. In ISO 8859-15 the first one is the character
> "Ã" and the latter is the character "¶".
>
> So it seems that one function prints exclusively in UTF-8…
The "ö" character is stored in the file as 0xC3 0xB6. As you say, this is
the UTF-8 encoding for this character.
The Python interpreter interprets the 2-byte sequence correctly. This can
be seen in a number of ways: if I run the script in a terminal, or if I
paste or yank the line into Python shell buffer, or I do
python-shell-send-buffer, in all these cases the sequence is converted into
0xF6, which is the UTF-16 encoding for "ö" that Python uses internally, as
the output from repr() shows.
However, when the bytes are sent with python-shell-send-region, the
interpreter thinks that 0xC3 0xB6 are 2 characters, which is wrong. In light
of this, I would say that there is a bug in python-shell-send-region.
* Re: python-shell-send-region uses wrong encoding?
2013-10-29 16:00 ` Drew Adams
@ 2013-10-29 16:54 ` Ernest Adrogué
2013-10-29 17:11 ` Eli Zaretskii
1 sibling, 0 replies; 28+ messages in thread
From: Ernest Adrogué @ 2013-10-29 16:54 UTC (permalink / raw)
To: help-gnu-emacs
29-10-2013, 09:00 (-0700); Drew Adams escriu:
> > That was me who asked the question on stackoverflow :-)
>
> Yes, that's clear.
>
> > The solutions they gave me didn't really solve the issue.
>
> I see. Then why did you accept one of them (instead of just
> up-voting it, if it provided only partial help)?
I accepted it temporarily until I or someone else comes up with a better
solution.
> And just what was wrong with the solution (from user `wvxvw')
> that you accepted?
>
> Here is `wvxvw's answer, in case it helps others understand
> what else might need to be added in order to help you:
>
> "you need the encoding system in your buffer which contains
> the source code to be utf-8 to send two bytes for ö. However,
> if it is a single-byte encoding, and given that you select the
> locale that maps the byte F6 to ö, you will get that byte.
>
> PS. Make sure you have -*- coding: utf-8 -*- comment."
What is wrong with this answer is that the encoding in the buffer was
already UTF-8. I double-checked every variable and all of them are set to
utf-8-unix.
Regards.
* Re: python-shell-send-region uses wrong encoding?
2013-10-29 16:00 ` Drew Adams
2013-10-29 16:54 ` Ernest Adrogué
@ 2013-10-29 17:11 ` Eli Zaretskii
1 sibling, 0 replies; 28+ messages in thread
From: Eli Zaretskii @ 2013-10-29 17:11 UTC (permalink / raw)
To: help-gnu-emacs
> Date: Tue, 29 Oct 2013 09:00:08 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
>
> "you need the encoding system in your buffer which contains
> the source code to be utf-8 to send two bytes for ö. However,
> if it is a single-byte encoding, and given that you select the
> locale that maps the byte F6 to ö, you will get that byte.
>
> PS. Make sure you have -*- coding: utf-8 -*- comment."
I hope you (and everyone else) understand that the above is profoundly
wrong. There's no relation whatsoever between the buffer's file
encoding and the encoding of the material Emacs sends to an inferior
process.
* Re: python-shell-send-region uses wrong encoding?
2013-10-29 16:34 ` Ernest Adrogué
@ 2013-10-29 17:15 ` Eli Zaretskii
2013-10-29 17:53 ` Ernest Adrogué
2013-10-29 18:07 ` Peter Dyballa
1 sibling, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2013-10-29 17:15 UTC (permalink / raw)
To: help-gnu-emacs
> Date: Tue, 29 Oct 2013 17:34:26 +0100
> From: Ernest Adrogué <nfdisco@gmail.com>
>
> The "ö" character is stored in the file as 0xC3 0xB6. As you say, this is
> the UTF-8 encoding for this character.
>
> The Python interpreter interprets the 2-byte sequence correctly. This can
> be seen in a number of ways: if I run the script in a terminal, or if I
> paste or yank the line into Python shell buffer, or I do
> python-shell-send-buffer, in all these cases the sequence is converted into
> 0xF6, which is the UTF-16 encoding for "ö" that Python uses internally, as
> the output from repr() shows.
>
> However, when the bytes are sent with python-shell-send-region, the
> interpreter thinks that 0xC3 0xB6 are 2 characters, which is wrong. In light
> of this, I would say that there is a bug in python-shell-send-region.
Why is that a bug, and what would you expect python-shell-send-region
to send instead (and why)?
* Re: python-shell-send-region uses wrong encoding?
2013-10-29 11:30 python-shell-send-region uses wrong encoding? Ernest Adrogué
2013-10-29 14:24 ` Drew Adams
2013-10-29 14:26 ` Andreas Röhler
@ 2013-10-29 17:28 ` Stefan Monnier
2013-10-30 3:20 ` Stefan Monnier
2 siblings, 1 reply; 28+ messages in thread
From: Stefan Monnier @ 2013-10-29 17:28 UTC (permalink / raw)
To: help-gnu-emacs; +Cc: Fabián Ezequiel Gallina
> This is the same output that I get when I run the script in a terminal.
> However, if I select the lines after 'from __future__ ...' until the end and
> do python-shell-send-region (C-c C-r) I get this output instead:
> u'W\xf6rterbuch'
> u'W\xc3\xb6rterbuch'
> The second line of output seems to indicate that the text was sent in a
> different encoding compared to python-shell-send-buffer.
No, it indicates that Python interpreted the bytes sent by Emacs in
a different way: the first line (where you explicitly asked for a utf-8
decoding) indicates that the bytes indeed use the right utf-8 encoding,
but the second indicates that Python does not decode the input as utf-8
but as something else (presumably latin-1).
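The diagnosis can be reproduced without Emacs. The sketch below uses Python 3, where `compile()` honours a PEP 263 coding declaration when it is given bytes; `SNIPPET` and `run_with_cookie` are illustrative names, and the latin-1 case stands in for the "presumably latin-1" guess above. `ascii()` is used so the output matches the Python 2 `repr()` lines from the original report.

```python
# The assignment from the test file, with "ö" stored as the two UTF-8
# octets 0xC3 0xB6, exactly as it is on disk.
SNIPPET = b"b = u'W\xc3\xb6rterbuch'\n"

def run_with_cookie(encoding: str) -> str:
    """Compile SNIPPET under a coding declaration and return the value of b."""
    cookie = "# -*- coding: %s -*-\n" % encoding
    source = cookie.encode("ascii") + SNIPPET
    namespace = {}
    exec(compile(source, "<sent-region>", "exec"), namespace)
    return namespace["b"]

print(ascii(run_with_cookie("utf-8")))    # 'W\xf6rterbuch'
print(ascii(run_with_cookie("latin-1")))  # 'W\xc3\xb6rterbuch'
```

The bytes are identical in both runs; only the declared encoding changes, and with it the string Python builds.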
E.g. the patch below (which causes python-shell-send-string to tell
Python that the file sent is using utf-8) should fix your problem (tho
it's not a proper fix, since we shouldn't hardcode utf-8 here, but copy
whichever -*- coding: -*- cookie is in the file).
Fabián, could you write a cleaner fix?
Stefan
=== modified file 'lisp/progmodes/python.el'
--- lisp/progmodes/python.el 2013-10-07 18:51:26 +0000
+++ lisp/progmodes/python.el 2013-10-29 17:25:11 +0000
@@ -2047,6 +2047,7 @@
(temp-file-name (make-temp-file "py"))
(file-name (or (buffer-file-name) temp-file-name)))
(with-temp-file temp-file-name
+ (insert "# -*- coding: utf-8 -*-\n")
(insert string)
(delete-trailing-whitespace))
(python-shell-send-file file-name process temp-file-name))
* Re: python-shell-send-region uses wrong encoding?
2013-10-29 17:15 ` Eli Zaretskii
@ 2013-10-29 17:53 ` Ernest Adrogué
2013-10-29 19:10 ` Eli Zaretskii
0 siblings, 1 reply; 28+ messages in thread
From: Ernest Adrogué @ 2013-10-29 17:53 UTC (permalink / raw)
To: help-gnu-emacs
29-10-2013, 19:15 (+0200); Eli Zaretskii escriu:
> > Date: Tue, 29 Oct 2013 17:34:26 +0100
> > From: Ernest Adrogué <nfdisco@gmail.com>
> >
> > The "ö" character is stored in the file as 0xC3 0xB6. As you say, this is
> > the UTF-8 encoding for this character.
> >
> > The Python interpreter interprets the 2-byte sequence correctly. This can
> > be seen in a number of ways: if I run the script in a terminal, or if I
> > paste or yank the line into Python shell buffer, or I do
> > python-shell-send-buffer, in all these cases the sequence is converted into
> > 0xF6, which is the UTF-16 encoding for "ö" that Python uses internally, as
> > the output from repr() shows.
> >
> > However, when the bytes are sent with python-shell-send-region, the
> > interpreter thinks that 0xC3 0xB6 are 2 characters, which is wrong. In light
> > of this, I would say that there is a bug in python-shell-send-region.
>
> Why is that a bug, and what would you expect python-shell-send-region
> to send instead (and why)?
I would expect python-shell-send-region to be a shortcut for saving the
region, switching to the shell buffer, yanking and hitting RET.
* Re: python-shell-send-region uses wrong encoding?
2013-10-29 16:34 ` Ernest Adrogué
2013-10-29 17:15 ` Eli Zaretskii
@ 2013-10-29 18:07 ` Peter Dyballa
2013-10-29 20:37 ` Ernest Adrogué
1 sibling, 1 reply; 28+ messages in thread
From: Peter Dyballa @ 2013-10-29 18:07 UTC (permalink / raw)
To: Ernest Adrogué; +Cc: help-gnu-emacs
Am 29.10.2013 um 17:34 schrieb Ernest Adrogué:
> in all these cases the sequence is converted into
> 0xF6, which is the UTF-16 encoding for "ö" that Python uses internally, as
> the output from repr() shows.
No!
UTF-16 is a text encoding that uses 2 bytes, or 16 bits (hence the name), for each character in the BMP (Basic Multilingual Plane). In UTF-16, ö is encoded as 0x00F6. (Above the BMP it uses two 2-byte units, i.e. 4 bytes, which are distinct from the UTF-32 encoding.)
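The distinction is easy to check in Python 3. The helper below (a hypothetical name) reports a character's code point and its bytes in UTF-8, UTF-16 and UTF-32; the big-endian `-be` codecs are used so that no byte order mark is prepended. For "ö", the `\xf6` that `repr()` prints is the code point U+00F6, which happens to equal its single 16-bit UTF-16 unit; a character outside the BMP, such as U+1F600, needs a surrogate pair (two 16-bit units) in UTF-16.

```python
def encodings_of(ch: str) -> dict:
    """Return the code point of CH and its bytes in a few Unicode encodings."""
    return {
        "code point": "U+%04X" % ord(ch),
        "utf-8": ch.encode("utf-8"),
        "utf-16-be": ch.encode("utf-16-be"),
        "utf-32-be": ch.encode("utf-32-be"),
    }

# "ö" lies in the BMP; U+1F600 (an emoji) lies outside it.
for ch in ("\u00f6", "\U0001F600"):
    print(encodings_of(ch))
```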
--
Greetings
Pete
Make it simple, as simple as possible but no simpler.
– Albert Einstein
* Re: python-shell-send-region uses wrong encoding?
2013-10-29 17:53 ` Ernest Adrogué
@ 2013-10-29 19:10 ` Eli Zaretskii
2013-10-29 20:48 ` Ernest Adrogué
0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2013-10-29 19:10 UTC (permalink / raw)
To: help-gnu-emacs
> Date: Tue, 29 Oct 2013 18:53:03 +0100
> From: Ernest Adrogué <nfdisco@gmail.com>
>
> I would expect python-shell-send-region to be a shortcut for saving the
> region, switching to the shell buffer, yanking and hitting RET.
Emacs can send stuff to an inferior program directly, without going
through a file.
But even if going through a file, why is it wrong to send 2 bytes when
a character's UTF-8 encoding takes those same 2 bytes?
* Re: python-shell-send-region uses wrong encoding?
2013-10-29 18:07 ` Peter Dyballa
@ 2013-10-29 20:37 ` Ernest Adrogué
0 siblings, 0 replies; 28+ messages in thread
From: Ernest Adrogué @ 2013-10-29 20:37 UTC (permalink / raw)
To: help-gnu-emacs
29-10-2013, 19:07 (+0100); Peter Dyballa escriu:
>
> Am 29.10.2013 um 17:34 schrieb Ernest Adrogué:
>
> > in all these cases the sequence is converted into
> > 0xF6, which is the UTF-16 encoding for "ö" that Python uses internally, as
> > the output from repr() shows.
>
> No!
>
> UTF-16 is a text encoding that uses 2 bytes, or 16 bits (hence the name),
> for each character in the BMP (Basic Multilingual Plane). In UTF-16, ö is
> encoded as 0x00F6. (Above the BMP it uses two 2-byte units, i.e. 4 bytes,
> which are distinct from the UTF-32 encoding.)
I stand corrected.
* Re: python-shell-send-region uses wrong encoding?
2013-10-29 19:10 ` Eli Zaretskii
@ 2013-10-29 20:48 ` Ernest Adrogué
0 siblings, 0 replies; 28+ messages in thread
From: Ernest Adrogué @ 2013-10-29 20:48 UTC (permalink / raw)
To: help-gnu-emacs
29-10-2013, 21:10 (+0200); Eli Zaretskii escriu:
> > Date: Tue, 29 Oct 2013 18:53:03 +0100
> > From: Ernest Adrogué <nfdisco@gmail.com>
> >
> > I would expect python-shell-send-region to be a shortcut for saving the
> > region, switching to the shell buffer, yanking and hitting RET.
>
> Emacs can send stuff to an inferior program directly, without going
> through a file.
>
> But even if going through a file, why is it wrong to send 2 bytes when
> a character's UTF-8 encoding takes those same 2 bytes?
It's not wrong to send 2 bytes; the problem was that you also have to tell
Python the encoding. Anyway, it's fixed now.
Thanks Stefan for the patch and everyone else for the help.
* Re: python-shell-send-region uses wrong encoding?
2013-10-29 17:28 ` Stefan Monnier
@ 2013-10-30 3:20 ` Stefan Monnier
2013-10-30 6:45 ` Andreas Röhler
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: Stefan Monnier @ 2013-10-30 3:20 UTC (permalink / raw)
To: help-gnu-emacs; +Cc: Fabián Ezequiel Gallina
> E.g. the patch below (which causes python-shell-send-string to tell
> Python that the file sent is using utf-8) should fix your problem (tho
> it's not a proper fix, since we shouldn't hardcode utf-8 here, but copy
> whichever -*- coding: -*- cookie is in the file).
I installed a variant of that patch in Emacs's trunk, which should fix
the problem. The relevant part of the patch is quoted below, so you can
try it out,
Stefan
=== modified file 'lisp/progmodes/python.el'
--- lisp/progmodes/python.el 2013-10-07 18:51:26 +0000
+++ lisp/progmodes/python.el 2013-10-30 01:28:36 +0000
@@ -2045,7 +2051,9 @@
(concat (file-remote-p default-directory) "/tmp")
temporary-file-directory))
(temp-file-name (make-temp-file "py"))
+ (coding-system-for-write 'utf-8)
(file-name (or (buffer-file-name) temp-file-name)))
(with-temp-file temp-file-name
+ (insert "# -*- coding: utf-8 -*-\n")
(insert string)
(delete-trailing-whitespace))
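For readers who don't speak Elisp, here is a rough Python analogue of what the patched function does: encode the text as UTF-8 when writing the temporary file (the job of `coding-system-for-write`) and put a coding cookie, followed by a newline, on the file's first line. `run_via_temp_file` is a hypothetical helper, and it runs the code in-process with `exec` rather than in an inferior Python, so it only illustrates the encoding handling.

```python
import os
import tempfile

UTF8_COOKIE = "# -*- coding: utf-8 -*-\n"  # note the trailing newline

def run_via_temp_file(code: str, header: str = UTF8_COOKIE) -> dict:
    """Write HEADER + CODE to a temp file as UTF-8, then compile and run it."""
    fd, path = tempfile.mkstemp(prefix="py", suffix=".py")
    try:
        # Writing with an explicit encoding plays the role of coding-system-for-write.
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(header + code)
        # Read raw bytes, so the compiler has to rely on the cookie to decode them.
        with open(path, "rb") as f:
            source = f.read()
        namespace = {}
        exec(compile(source, path, "exec"), namespace)
        return namespace
    finally:
        os.remove(path)

ns = run_via_temp_file("b = u'W\u00f6rterbuch'\n")
print(ascii(ns["b"]))  # 'W\xf6rterbuch'

# Without the newline, the first line of code is swallowed by the comment.
ns = run_via_temp_file("b = 1\n", header="# -*- coding: utf-8 -*-")
print("b" in ns)       # False
```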
* Re: python-shell-send-region uses wrong encoding?
2013-10-30 3:20 ` Stefan Monnier
@ 2013-10-30 6:45 ` Andreas Röhler
2013-10-30 11:37 ` Stefan Monnier
2013-10-31 14:31 ` Ernest Adrogué
2013-10-31 17:54 ` Ernest Adrogué
2 siblings, 1 reply; 28+ messages in thread
From: Andreas Röhler @ 2013-10-30 6:45 UTC (permalink / raw)
To: help-gnu-emacs
Am 30.10.2013 04:20, schrieb Stefan Monnier:
>> E.g. the patch below (which causes python-shell-send-string to tell
>> Python that the file sent is using utf-8) should fix your problem (tho
>> it's not a proper fix, since we shouldn't hardcode utf-8 here, but copy
>> whichever -*- coding: -*- cookie is in the file).
>
> I installed a variant of that patch in Emacs's trunk, which should fix
> the problem. The relevant part of the patch is quoted below, so you can
> try it out,
>
>
> Stefan
>
>
> === modified file 'lisp/progmodes/python.el'
> --- lisp/progmodes/python.el 2013-10-07 18:51:26 +0000
> +++ lisp/progmodes/python.el 2013-10-30 01:28:36 +0000
> @@ -2045,7 +2051,9 @@
> (concat (file-remote-p default-directory) "/tmp")
> temporary-file-directory))
> (temp-file-name (make-temp-file "py"))
> + (coding-system-for-write 'utf-8)
> (file-name (or (buffer-file-name) temp-file-name)))
> (with-temp-file temp-file-name
> + (insert "# -*- coding: utf-8 -*-\n")
> (insert string)
> (delete-trailing-whitespace))
>
>
>
IIUC the second added line "-*- coding: utf-8 -*-\n" should not be needed, as that's the default at the Python side anyway.
Also functions tracing a possible error might see a different line offset that way; this is just abstract reasoning so far.
* Re: python-shell-send-region uses wrong encoding?
2013-10-30 6:45 ` Andreas Röhler
@ 2013-10-30 11:37 ` Stefan Monnier
2013-10-30 12:08 ` Yuri Khan
2013-10-31 14:30 ` Ernest Adrogué
0 siblings, 2 replies; 28+ messages in thread
From: Stefan Monnier @ 2013-10-30 11:37 UTC (permalink / raw)
To: help-gnu-emacs
> IIUC the second added line "-*- coding: utf-8 -*-\n" should not be needed,
> as that's the default at the Python side anyway.
Based on the OP's experience, it seems this is not true.
Also, the doc I can find indicates that the default is ASCII (and was
latin-1 in the past, which is what the OP seems to be seeing).
But, you're a lot more experienced in Python than I am (I never wrote
a single line of Python, basically), so maybe there's another
explanation for the OP's problem?
> Also functions tracing a possible error might see a different line offset
> that way; this is just abstract reasoning so far.
This function is used to send a string (typically extracted from the
region) not a file, so the offsets have always been wrong anyway (except
when the region happens to start on the first line).
By, yes, as it happens, the patch I installed does include some other
change to try and translate the offsets appropriately.
Stefan
* Re: python-shell-send-region uses wrong encoding?
2013-10-30 11:37 ` Stefan Monnier
@ 2013-10-30 12:08 ` Yuri Khan
2013-10-30 12:45 ` Andreas Röhler
2013-10-31 14:30 ` Ernest Adrogué
1 sibling, 1 reply; 28+ messages in thread
From: Yuri Khan @ 2013-10-30 12:08 UTC (permalink / raw)
To: Stefan Monnier; +Cc: help-gnu-emacs@gnu.org
On Wed, Oct 30, 2013 at 6:37 PM, Stefan Monnier
<monnier@iro.umontreal.ca> wrote:
>> IIUC the second added line "-*- coding: utf-8 -*-\n" should not be needed,
>> as that's the default at the Python side anyway.
>
> Based on the OP's experience, it seems this is not true.
> Also, the doc I can find indicates that the default is ASCII (and was
> latin-1 in the past, which is what the OP seems to be seeing).
There is Python, and, on the other hand, there is Python.
In some GNU/Linux distributions (e.g. Ubuntu), the default Python is
version 2.x, whose default encoding is ASCII (and has been that way
since at least 2.6).
On the other hand, the current version of Python is 3.x, where a big
Unicode revolution has happened and now the default encoding is UTF-8.
Both of these Pythons recognize Emacs-style and vim-style encoding
declarations, and in addition the UTF-8 byte order mark (which can be
called Notepad-style encoding declaration).
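Python 3's standard library exposes the PEP 263 detection logic as `tokenize.detect_encoding`, so these points can be checked directly (Python 3 only; Python 2's ASCII default can't be shown this way). `declared_encoding` is an illustrative wrapper. Note that the cookie may sit on line 2 after a shebang, as in the original test file.

```python
import io
import tokenize

def declared_encoding(source: bytes) -> str:
    """Return the source encoding Python 3 would pick for SOURCE (PEP 263)."""
    encoding, _consumed = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding

samples = {
    "no declaration": b"print('hi')\n",
    "Emacs style": b"# -*- coding: latin-1 -*-\nx = 1\n",
    "vim style": b"# vim: set fileencoding=latin-1 :\nx = 1\n",
    "cookie on line 2": b"#!/usr/bin/env python\n# -*- coding: utf-8 -*-\n",
    "UTF-8 BOM": b"\xef\xbb\xbfx = 1\n",
}
for label, source in samples.items():
    print(label, "->", declared_encoding(source))
```

`detect_encoding` normalizes names, which is why both latin-1 declarations come back as `iso-8859-1`, and it reports a BOM as `utf-8-sig`.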
* Re: python-shell-send-region uses wrong encoding?
2013-10-30 12:08 ` Yuri Khan
@ 2013-10-30 12:45 ` Andreas Röhler
0 siblings, 0 replies; 28+ messages in thread
From: Andreas Röhler @ 2013-10-30 12:45 UTC (permalink / raw)
To: help-gnu-emacs
Am 30.10.2013 13:08, schrieb Yuri Khan:
> On Wed, Oct 30, 2013 at 6:37 PM, Stefan Monnier
> <monnier@iro.umontreal.ca> wrote:
>>> IIUC the second added line "-*- coding: utf-8 -*-\n" should not be needed,
>>> as that's the default at the Python side anyway.
>>
>> Based on the OP's experience, it seems this is not true.
>> Also, the doc I can find indicates that the default is ASCII (and was
>> latin-1 in the past, which is what the OP seems to be seeing).
>
> There is Python, and, on the other hand, there is Python.
>
> In some GNU/Linux distributions (e.g. Ubuntu), the default Python is
> version 2.x, whose default encoding is ASCII (and has been that way
> since at least 2.6).
>
> On the other hand, the current version of Python is 3.x, where a big
> Unicode revolution has happened and now the default encoding is UTF-8.
>
> Both of these Pythons recognize Emacs-style and vim-style encoding
> declarations, and in addition the UTF-8 byte order mark (which can be
> called Notepad-style encoding declaration).
>
>
Thanks for clarifying this; I stand corrected.
Andreas
* Re: python-shell-send-region uses wrong encoding?
2013-10-30 11:37 ` Stefan Monnier
2013-10-30 12:08 ` Yuri Khan
@ 2013-10-31 14:30 ` Ernest Adrogué
1 sibling, 0 replies; 28+ messages in thread
From: Ernest Adrogué @ 2013-10-31 14:30 UTC (permalink / raw)
To: help-gnu-emacs
30-10-2013, 07:37 (-0400); Stefan Monnier escriu:
> This function is used to send a string (typically extracted from the
> region) not a file, so the offsets have always been wrong anyway (except
> when the region happens to start on the first line).
The line number was right, because the temp file is padded with empty lines
so that the line numbers match. Because of the extra coding line, it is now
one line off. This fixes it:
--- python.el.orig 2013-10-31 15:21:26.000000000 +0100
+++ python.el 2013-10-31 15:19:03.673891453 +0100
@@ -2154,7 +2146,7 @@
3. Wraps indented regions under an \"if True:\" block so the
interpreter evaluates them correctly."
(let ((substring (buffer-substring-no-properties start end))
- (fillstr (make-string (1- (line-number-at-pos start)) ?\n))
+ (fillstr (make-string (max 0 (- (line-number-at-pos start) 2)) ?\n))
(toplevel-block-p (save-excursion
(goto-char start)
(or (zerop (line-number-at-pos start))
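The padding idea can be sketched in Python: prefix the region with enough blank lines that its first line lands on the same line number it has in the buffer, counting the cookie that now occupies line 1. `padded_source` and `failing_line` are hypothetical names, and `max(0, ...)` is an added guard for a region that starts on line 1, which cannot be placed exactly once the cookie takes that line.

```python
import traceback

CODING_COOKIE = "# -*- coding: utf-8 -*-\n"  # occupies line 1 of the temp file

def padded_source(region: str, start_line: int) -> str:
    """Return cookie + blank lines + REGION so REGION starts at START_LINE."""
    # The cookie already takes one line, hence "- 2" rather than "- 1";
    # max(0, ...) keeps a region that starts on line 1 from going negative.
    fill = "\n" * max(0, start_line - 2)
    return CODING_COOKIE + fill + region

def failing_line(region: str, start_line: int) -> int:
    """Run the padded region and report the line number of its exception."""
    source = padded_source(region, start_line).encode("utf-8")
    try:
        exec(compile(source, "<region>", "exec"), {})
    except ZeroDivisionError as exc:
        # The innermost traceback entry is the frame of the compiled region.
        return traceback.extract_tb(exc.__traceback__)[-1].lineno
    raise AssertionError("expected the region to raise ZeroDivisionError")

# A two-line region that starts on buffer line 10 and fails on its second line.
print(failing_line("x = 1\ny = x / 0\n", 10))  # 11
```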
* Re: python-shell-send-region uses wrong encoding?
2013-10-30 3:20 ` Stefan Monnier
2013-10-30 6:45 ` Andreas Röhler
@ 2013-10-31 14:31 ` Ernest Adrogué
2013-10-31 17:54 ` Ernest Adrogué
2 siblings, 0 replies; 28+ messages in thread
From: Ernest Adrogué @ 2013-10-31 14:31 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Fabián Ezequiel Gallina, help-gnu-emacs
29-10-2013, 23:20 (-0400); Stefan Monnier escriu:
> I installed a variant of that patch in Emacs's trunk, which should fix
> the problem. The relevant part of the patch is quoted below, so you can
> try it out,
It works for me.
* Re: python-shell-send-region uses wrong encoding?
2013-10-30 3:20 ` Stefan Monnier
2013-10-30 6:45 ` Andreas Röhler
2013-10-31 14:31 ` Ernest Adrogué
@ 2013-10-31 17:54 ` Ernest Adrogué
2013-10-31 20:35 ` Stefan Monnier
2013-11-04 19:15 ` Stefan Monnier
2 siblings, 2 replies; 28+ messages in thread
From: Ernest Adrogué @ 2013-10-31 17:54 UTC (permalink / raw)
To: help-gnu-emacs
29-10-2013, 23:20 (-0400); Stefan Monnier escriu:
> I installed a variant of that patch in Emacs's trunk, which should fix
> the problem.
The original problem is fixed, but now there's another problem. Send this
to Python:
    class Foo(object):
        pass
and you get IndentationError. The problem seems to be this change:
@@ -2034,26 +2038,32 @@ there for compatibility with CEDET.")
(defun python-shell-send-string (string &optional process msg)
"Send STRING to inferior Python PROCESS.
-When MSG is non-nil messages the first line of STRING."
+When MSG is non-nil messages the first line of STRING.
+If a temp file is used, return its name, otherwise return nil."
(interactive "sPython command: ")
(let ((process (or process (python-shell-get-or-create-process)))
- (lines (split-string string "\n" t)))
- (and msg (message "Sent: %s..." (nth 0 lines)))
- (if (> (length lines) 1)
+ (_ (string-match "\\`\n*\\(.*\\)\\(\n.\\)?" string))
+ (multiline (match-beginning 2)))
+ (and msg (message "Sent: %s..." (match-string 1 string)))
+ (if multiline
(let* ((temporary-file-directory
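To make the change easier to follow, here is a rough Python transliteration of the old and new multiline tests (hypothetical function names). It only compares the two predicates and does not show whether this difference is what triggers the IndentationError. One visible difference: the new regexp only inspects the character right after the first line, so a first line followed by a blank line counts as single-line.

```python
import re

def old_is_multiline(string: str) -> bool:
    # (split-string string "\n" t): split on newlines, dropping empty pieces.
    lines = [piece for piece in string.split("\n") if piece]
    return len(lines) > 1

# Transliteration of "\\`\n*\\(.*\\)\\(\n.\\)?". As in Emacs, Python's "."
# does not match a newline by default, and re.match anchors at the start.
NEW_RE = re.compile(r"\n*(.*)(\n.)?")

def new_is_multiline(string: str) -> bool:
    # Like (match-beginning 2): true only if the first line is followed by
    # a newline and then a non-newline character.
    return NEW_RE.match(string).group(2) is not None

for s in ("print(1)\n", "class Foo(object):\n    pass\n", "x = 1\n\ny = 2\n"):
    print(repr(s), old_is_multiline(s), new_is_multiline(s))
```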
* Re: python-shell-send-region uses wrong encoding?
2013-10-31 17:54 ` Ernest Adrogué
@ 2013-10-31 20:35 ` Stefan Monnier
2013-11-04 19:15 ` Stefan Monnier
1 sibling, 0 replies; 28+ messages in thread
From: Stefan Monnier @ 2013-10-31 20:35 UTC (permalink / raw)
To: help-gnu-emacs
>> I installed a variant of that patch in Emacs's trunk, which should fix
>> the problem.
> The original problem is fixed, but now there's another problem. Send this
> to Python:
Ah, thanks for the test case. I see there are more problems, even.
I'm beginning to better understand the code,
Stefan
* Re: python-shell-send-region uses wrong encoding?
2013-10-31 17:54 ` Ernest Adrogué
2013-10-31 20:35 ` Stefan Monnier
@ 2013-11-04 19:15 ` Stefan Monnier
1 sibling, 0 replies; 28+ messages in thread
From: Stefan Monnier @ 2013-11-04 19:15 UTC (permalink / raw)
To: help-gnu-emacs
> The original problem is fixed, but now there's another problem. Send this
> to Python:
>     class Foo(object):
>         pass
> and you get IndentationError. The problem seems to be this change:
This part should now be fixed in trunk. If you see more breakage,
please report it via M-x report-emacs-bug so it gets a bug-number.
Stefan
end of thread, other threads:[~2013-11-04 19:15 UTC | newest]
Thread overview: 28+ messages
2013-10-29 11:30 python-shell-send-region uses wrong encoding? Ernest Adrogué
2013-10-29 14:24 ` Drew Adams
2013-10-29 14:37 ` Ernest Adrogué
2013-10-29 16:00 ` Drew Adams
2013-10-29 16:54 ` Ernest Adrogué
2013-10-29 17:11 ` Eli Zaretskii
2013-10-29 14:26 ` Andreas Röhler
2013-10-29 14:55 ` Ernest Adrogué
2013-10-29 15:29 ` Andreas Röhler
2013-10-29 15:34 ` Peter Dyballa
2013-10-29 16:34 ` Ernest Adrogué
2013-10-29 17:15 ` Eli Zaretskii
2013-10-29 17:53 ` Ernest Adrogué
2013-10-29 19:10 ` Eli Zaretskii
2013-10-29 20:48 ` Ernest Adrogué
2013-10-29 18:07 ` Peter Dyballa
2013-10-29 20:37 ` Ernest Adrogué
2013-10-29 17:28 ` Stefan Monnier
2013-10-30 3:20 ` Stefan Monnier
2013-10-30 6:45 ` Andreas Röhler
2013-10-30 11:37 ` Stefan Monnier
2013-10-30 12:08 ` Yuri Khan
2013-10-30 12:45 ` Andreas Röhler
2013-10-31 14:30 ` Ernest Adrogué
2013-10-31 14:31 ` Ernest Adrogué
2013-10-31 17:54 ` Ernest Adrogué
2013-10-31 20:35 ` Stefan Monnier
2013-11-04 19:15 ` Stefan Monnier