From: Joseph Brenner
To: help-gnu-emacs@gnu.org
Newsgroups: gmane.emacs.help
Subject: Re: non-ascii chars in octal in sub-shell windows
Date: Fri, 15 Jan 2010 14:08:11 -0800
Message-ID: <87ska7auno.fsf@kzsu.stanford.edu>

Peter Dyballa writes:

> Joseph Brenner:
>> When running a program that outputs utf-8 characters such as u-umlaut,
>> in a terminal window I'll see the actual character, but in an emacs
>> sub-shell I'm seeing the octal form (which looks like: \374).
>
> No, you're not running such a programme! The LATIN SMALL LETTER U
> WITH DIAERESIS, ü, is encoded in UTF-8 as C3BC. In UTF-16 it is
> 00FC – exactly two bytes! Obviously your programme just outputs
> some ISO Latin dialect or such...

Correct.

If anyone's interested in the details of the screw-up, here's some
off-topic chatter about Perl programming:

A typical Perl test script is based on the Test::More module, which
provides functions for checks such as:

    is_deeply( $some_structure, $expected_structure,
               "Testing whether structure is as expected." );

This routine prints different messages to STDOUT and/or STDERR
depending on whether the check passes or fails. I was seeing octal
junk in those messages, even after adding these commands to the *.t
script:

    binmode STDOUT, ':encoding(utf8)';
    binmode STDERR, ':encoding(utf8)';

Normally that's all it takes to convince Perl to output UTF-8, but
with Test::More routines this approach fails: Test::More prints
through its own duplicates of STDOUT and STDERR, so layers added to
the originals afterwards are never seen.
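Peter's diagnosis, and what those binmode lines are supposed to do, can be checked from Perl itself. The sketch below (only core Perl; the in-memory filehandles are just a stand-in for STDOUT) prints ü once to a handle with no encoding layer and once to a handle with an :encoding(UTF-8) layer, then dumps the bytes that actually went out:

```perl
use strict;
use warnings;

my $u_umlaut = "\x{FC}";   # U+00FC LATIN SMALL LETTER U WITH DIAERESIS

# Print the character to an in-memory handle with no encoding layer...
open my $raw_fh, '>', \my $raw_bytes or die "open: $!";
print {$raw_fh} $u_umlaut;
close $raw_fh;

# ...and to one with an :encoding(UTF-8) layer, as binmode would set up.
open my $utf8_fh, '>', \my $utf8_bytes or die "open: $!";
binmode $utf8_fh, ':encoding(UTF-8)';
print {$utf8_fh} $u_umlaut;
close $utf8_fh;

# Show the bytes that were written, in hex and in octal.
for my $pair ([ 'no layer', $raw_bytes ], [ 'UTF-8', $utf8_bytes ]) {
    my ($name, $bytes) = @$pair;
    my @octets = unpack 'C*', $bytes;
    printf "%-8s hex: %-5s octal: %s\n", $name,
        join(' ', map { sprintf '%02X', $_ } @octets),
        join('',  map { sprintf '\\%03o', $_ } @octets);
}
```

Without a layer the character goes out as the single ISO Latin-1 byte FC, which is octal 374, i.e. the \374 in the shell buffer; with the layer it becomes the two UTF-8 bytes C3 BC.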
Unbeknownst to me, the documentation for Test::More has been
recommending something more like this:

    my $builder = Test::More->builder;
    binmode $builder->output,         ":encoding(utf8)";
    binmode $builder->failure_output, ":encoding(utf8)";

Note that merely doing this sort of thing has no effect:

    use utf8;
    use locale;

(use utf8 only tells Perl that the script's own source code is written
in UTF-8; it doesn't put an encoding layer on any output handle.)
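Putting it together, a minimal *.t file might look like the sketch below. The data and test name are made up for illustration, and the todo_output line is an addition here (the Test::More docs list it alongside the other two handles). It uses :encoding(UTF-8) rather than :encoding(utf8); the hyphenated name is Perl's strict UTF-8 encoder, while utf8 is its more lenient variant.

```perl
use strict;
use warnings;
use Test::More;

# Test::Builder (the engine behind Test::More) prints through its own
# duplicates of STDOUT and STDERR, so the encoding layer has to go on the
# handles it actually uses, not on STDOUT/STDERR themselves.
my $builder = Test::More->builder;
binmode $builder->output,         ':encoding(UTF-8)';
binmode $builder->failure_output, ':encoding(UTF-8)';
binmode $builder->todo_output,    ':encoding(UTF-8)';

# Hypothetical data: a structure holding a name with a u-umlaut (U+00FC).
my $expected_structure = { name => "M\x{FC}ller" };
my $some_structure     = { name => "M\x{FC}ller" };

# The test name contains the u-umlaut too, so it passes through the
# layer above when the "ok 1 - ..." line is printed.
is_deeply( $some_structure, $expected_structure,
           "Testing whether structure is as expected: M\x{FC}ller" );

done_testing();
```

Run it with prove or plain perl. With the three binmode lines removed, the ü in the test name should go out as the single byte FC again, which is what shows up as \374 in an Emacs sub-shell.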