From: Ernest Adrogué
Newsgroups: gmane.emacs.help
Subject: Re: python-shell-send-region uses wrong encoding?
Date: Tue, 29 Oct 2013 17:34:26 +0100
Message-ID: <20131029163426.GA29055@doriath.local>
References: <20131029113044.GA28039@doriath.local> <526FC58A.6080204@easy-emacs.de> <20131029145554.GB28671@doriath.local>
To: help-gnu-emacs@gnu.org

29-10-2013, 16:34 (+0100); Peter Dyballa wrote:
>
> On 29.10.2013 at 15:55, Ernest Adrogué wrote:
>
> > Here it's different, print(b) prints `WÃ¶rterbuch' (C-c C-r) and
> > `Wörterbuch' (C-c C-c).
>
> This obviously happens in an 8-bit environment. `WÃ¶rterbuch' is the
> sequence of octets that represent the ISO Latin-x (or ISO 8859) encoded
> word `Wörterbuch' in UTF-8 encoding. Here the "ö" is encoded as two
> octets: 0xC3 0xB6. The first one is in ISO 8859-15 the character "Ã" and
> the latter is in that encoding the character "¶".
>
> So it seems that one function prints exclusively in UTF-8…

The "ö" character is stored in the file as 0xC3 0xB6. As you say, this
is the UTF-8 encoding of that character. The Python interpreter decodes
the 2-byte sequence correctly, which can be seen in a number of ways:
if I run the script in a terminal, if I paste or yank the line into the
Python shell buffer, or if I use python-shell-send-buffer. In all these
cases the sequence is converted into the single character 0xF6, i.e.
the Unicode code point U+00F6 for "ö", as the output from repr() shows.

However, when the bytes are sent with python-shell-send-region, the
interpreter treats 0xC3 and 0xB6 as two separate characters, which is
wrong. In light of this, I would say that there is a bug in
python-shell-send-region.
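
To make the symptom concrete, here is a minimal sketch that reproduces both results outside Emacs. It assumes Python 3 and only imitates the effect by decoding the same two octets with two different codecs; it does not show what python-shell-send-region actually does internally, and the variable names are purely illustrative.

```python
# Illustrative only: reproduce the two outputs by decoding the same
# octets with two different codecs (Python 3 assumed).
word = "W\u00f6rterbuch"              # "Wörterbuch"; ö is code point U+00F6

# In UTF-8 the ö is stored as the two octets 0xC3 0xB6.
utf8_bytes = word.encode("utf-8")
assert utf8_bytes == b"W\xc3\xb6rterbuch"

# Decoding as UTF-8 yields one character for the two octets.
correct = utf8_bytes.decode("utf-8")

# Decoding as an 8-bit charset turns each octet into its own character.
wrong = utf8_bytes.decode("iso-8859-15")

print(correct, len(correct))                  # Wörterbuch 10
print(wrong, len(wrong))                      # WÃ¶rterbuch 11
print(hex(ord(correct[1])))                   # 0xf6
print([hex(ord(c)) for c in wrong[1:3]])      # ['0xc3', '0xb6']
```

The second result matches what I get with C-c C-r, which is why I suspect the region is being read by the interpreter as an 8-bit charset rather than as UTF-8.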