From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: "Herbert Euler"
Newsgroups: gmane.emacs.devel
Subject: Re: Fcall_process: wrong conversion
Date: Mon, 15 May 2006 23:17:06 +0800
Mime-Version: 1.0
Content-Type: text/plain; format=flowed
Cc: emacs-devel@gnu.org
X-Originating-Email: [herberteuler@hotmail.com]
Original-To: monnier@iro.umontreal.ca
List-Id: "Emacs development discussions."
Xref: news.gmane.org gmane.emacs.devel:54508

I followed these steps:

- Create a file containing UTF-16 text; either UTF-16BE or UTF-16LE
  will do.  For example, create a file named "1" whose content is "a"
  encoded in UTF-16LE.

- Visit file "1" with C-x C-f.  A UTF-16 file can be read either as
  UTF-16 text or as ASCII text with some non-ASCII characters.  Read
  as UTF-16LE, the content of file "1" is "a"; read as ASCII, it is
  "\377\376a^@", where "\377\376" is the byte-order mark indicating
  UTF-16LE, and "a" is represented as "a^@" (^@ is \0 here).  If for
  some reason Emacs does not visit the file with the correct encoding,
  type C-x RET r followed by the correct encoding and RET to fix it.

- If the buffer is encoded with raw-text-unix, the content is
  displayed as "\377\376a^@".  Type M-x hexl-mode RET and the correct
  result is displayed (not reproduced here, since it is easy to get).

- If the buffer is encoded with utf-16-le, the content is displayed
  as "a".  Type M-x hexl-mode RET and the result is

      \377?: Invalid argument

  displayed in the buffer.

This is because hexl-mode does its job as follows:

  1. Store the buffer content in a temporary file.

  2. Invoke "hexl" with the argument "-hex" and stdin set to the
     temporary file, and put its output into the same buffer.  This
     is done by calling `call-process-region' (and so `call-process').

  3. Manipulate the output to generate the correct result.

When the buffer is encoded with raw-text-unix, the code of
`Fcall_process' in callproc.c shown in my last mail does not convert
the argument "-hex", so the command actually invoked is "hexl -hex".
But if the buffer is encoded with utf-16-le, "-hex" is converted to
"\377\376-^@h^@e^@x^@", so the command invoked is
"hexl \377\376-^@h^@e^@x^@".  Since "^@" is actually '\0', which
terminates the string, "hexl" sees "\377\376-" as its first argument.
That is why the content displayed in the second case is an error
message.  As a result, the code that follows in hexl-mode cannot
process the (wrong) output correctly.

I hope I have described this clearly.

Regards,
Guanpeng Xu

>From: Stefan Monnier
>To: "Herbert Euler"
>CC: emacs-devel@gnu.org
>Subject: Re: Fcall_process: wrong conversion
>Date: Mon, 15 May 2006 10:25:27 -0400
>
> > Fcall_process in callproc.c, which is correspond to `call-process',
> > cannot handle UTF-16 (both LE or BE) correctly. Take a look at line
>
>Actually, it handles it just fine.  The problem is that call-process and
>start-process both use the same coding system to encode arguments and to
>encode the data sent via stdin to the process, whereas you want them to
>be distinct.
>If you want them to be distinct, then you need to manually encode your
>arguments before passing them to call-process.
>
>I.e. the bug with hexl-mode is in hexl.el.  Please report it separately
>indicating how to reproduce the problem (I don't know how to "applying
>`hexl-mode' to UTF-16 texts").
>
>
>        Stefan
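
The byte-level effect described in the message can be reproduced outside Emacs. The following is only a minimal Python sketch, not the code in callproc.c or hexl.el. It assumes, as the message reports, that the utf-16-le coding system prepends the byte-order mark "\377\376", and it models a C program reading an argument as a NUL-terminated string. The helper names (`encode_utf16le_with_bom`, `as_c_string`, `octal_escape`) are hypothetical and exist only for this illustration.

```python
import codecs


def encode_utf16le_with_bom(text: str) -> bytes:
    """Encode text as UTF-16LE with a leading byte-order mark (assumed behavior)."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def as_c_string(arg: bytes) -> bytes:
    """Model what a C program sees: the bytes up to the first NUL."""
    return arg.split(b"\x00", 1)[0]


def octal_escape(data: bytes) -> str:
    """Render bytes roughly as in the message: ^@ for NUL, \\ooo for non-ASCII."""
    out = []
    for b in data:
        if b == 0:
            out.append("^@")
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)


# Content of file "1": the single character "a" in UTF-16LE.
file_bytes = encode_utf16le_with_bom("a")

# The "-hex" argument after being encoded with the buffer's coding system.
arg_bytes = encode_utf16le_with_bom("-hex")

# What "hexl" would receive as its first argument.
seen_by_hexl = as_c_string(arg_bytes)

print(octal_escape(file_bytes))    # \377\376a^@
print(octal_escape(arg_bytes))     # \377\376-^@h^@e^@x^@
print(octal_escape(seen_by_hexl))  # \377\376-
```

The last line shows why "hexl" no longer recognizes the option: everything after the first NUL byte is cut off, leaving only the byte-order mark and "-".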