From: "Herbert Euler" <herberteuler@hotmail.com>
Cc: emacs-devel@gnu.org
Subject: Re: Fcall_process: wrong conversion
Date: Mon, 15 May 2006 23:17:06 +0800
Message-ID: <BAY112-F23A14F71F826D50ED045A5DAA30@phx.gbl>
In-Reply-To: <k6z64k71ihm.fsf-monnier+emacs@gnu.org>

I followed these steps:

    - Create a file that contains UTF-16 text; either UTF-16BE or
      UTF-16LE is OK.  For example, create a file named "1" whose
      content is "a" in UTF-16LE.

    - Visit file "1" with C-x C-f.

In fact, a file in UTF-16 can be interpreted either as UTF-16 text or
as raw bytes, some of which are not ASCII.  Interpreted as UTF-16LE,
the content of file "1" is "a"; interpreted as raw bytes, it is
"\377\376a^@", where "\377\376" is the byte order mark indicating
UTF-16LE, and "a" is represented as "a^@" (^@ is \0 here).  If for
some reason Emacs doesn't visit the file with the correct encoding,
one can type C-x RET r followed by the correct encoding and RET to
correct it.
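
To make the two readings concrete, here is a minimal sketch in Python
(used only for illustration; it is not part of Emacs).  It builds the
bytes of file "1" and shows them both ways.  The helper name
`octal_escape' is hypothetical, and it only roughly imitates how Emacs
displays raw bytes.

```python
import codecs


def octal_escape(data: bytes) -> str:
    """Render bytes roughly as Emacs shows raw text: printable ASCII
    as-is, NUL as ^@, every other byte as a three-digit octal escape."""
    out = []
    for b in data:
        if b == 0:
            out.append("^@")
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)


# Content of file "1": the UTF-16LE byte order mark followed by "a".
file_bytes = codecs.BOM_UTF16_LE + "a".encode("utf-16-le")

raw_view = octal_escape(file_bytes)      # the bytes read as raw text
text_view = file_bytes.decode("utf-16")  # the bytes read as UTF-16 (BOM consumed)

print(raw_view)
print(text_view)
```

The first line printed is the raw-text view and the second is the
UTF-16 view of the same four bytes.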

    - If the buffer is encoded with raw-text-unix, the content is
      displayed as "\377\376a^@".  Type M-x hexl-mode RET, and the
      correct result is displayed (not shown here, since it's easy
      to reproduce).

    - If the buffer is encoded with utf-16-le, the content is
      displayed as "a".  Type M-x hexl-mode RET, and the error

          \377?: Invalid argument

      is displayed in the buffer.

This is because hexl-mode does its job as follows:

    1. Store the buffer content in a temporary file.

    2. Invoke "hexl" with the argument "-hex" and stdin set to the
       temporary file, and put its output into the same buffer.  This
       is done by calling `call-process-region' (and thus
       `call-process').

    3. Post-process the output to generate the final result.
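
The three steps above can be sketched outside Emacs as follows.  This
is only a model in Python, not the code in hexl.el: the `DUMPER'
command is a stand-in for the real "hexl" program (an assumption, so
that the sketch runs without Emacs installed), and the function name
`dump_via_temp_file' is hypothetical.

```python
import subprocess
import sys
import tempfile

# Stand-in for the real "hexl" program (an assumption): read stdin and
# print every byte as two hex digits.
DUMPER = [sys.executable, "-c",
          "import sys; print(sys.stdin.buffer.read().hex())"]


def dump_via_temp_file(content: bytes, argv=DUMPER) -> str:
    # Step 1: store the buffer content in a temporary file.
    with tempfile.TemporaryFile() as tmp:
        tmp.write(content)
        tmp.seek(0)
        # Step 2: run the program with stdin set to the temporary file
        # and capture its output.
        result = subprocess.run(argv, stdin=tmp,
                                stdout=subprocess.PIPE, check=True)
    # Step 3: post-process the output (here: only strip the newline).
    return result.stdout.decode("ascii").strip()


print(dump_via_temp_file(b"\xff\xfea\x00"))
```

Note that the data sent on stdin and the command-line arguments are
two separate channels; the problem described in this message concerns
how the arguments are encoded.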

When the buffer is encoded with raw-text-unix, the code of
`Fcall_process' in callproc.c shown in my last mail does not convert
the argument "-hex", so the command actually invoked is "hexl -hex".
But if the buffer is encoded with utf-16-le, "-hex" is converted to
"\377\376-^@h^@e^@x^@", so the command invoked is
"hexl \377\376-^@h^@e^@x^@".  Since "^@" is actually '\0', which
terminates a C string, "hexl" sees "\377\376-" as its first argument.
That's why the content displayed in the second case is an error
message.  As a result, the subsequent code in hexl-mode can't
process the (wrong) output correctly.
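
The truncation can be reproduced with a small model in Python.  This
is not the Emacs code; both function names are hypothetical, and the
sketch assumes, as described above, that the argument is encoded as
UTF-16LE with a leading byte order mark.

```python
import codecs


def encode_like_stdin(arg: str) -> bytes:
    """Encode an argument with the same coding system as the buffer
    (utf-16-le with a byte order mark), as described in the message."""
    return codecs.BOM_UTF16_LE + arg.encode("utf-16-le")


def as_c_string(data: bytes) -> bytes:
    """A C program sees an argv entry only up to the first NUL byte."""
    return data.split(b"\0", 1)[0]


encoded = encode_like_stdin("-hex")
seen_by_hexl = as_c_string(encoded)

print(encoded)
print(seen_by_hexl)
```

The second value contains only the byte order mark and "-", which is
the "\377\376-" argument mentioned above.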

I hope I've described this clearly.

Regards,
Guanpeng Xu


>From: Stefan Monnier <monnier@iro.umontreal.ca>
>To: "Herbert Euler" <herberteuler@hotmail.com>
>CC: emacs-devel@gnu.org
>Subject: Re: Fcall_process: wrong conversion
>Date: Mon, 15 May 2006 10:25:27 -0400
>
> > Fcall_process in callproc.c, which is correspond to `call-process',
> > cannot handle UTF-16 (both LE or BE) correctly.  Take a look at line
>
>Actually, it handles it just fine.  The problem is that call-process and
>start-process both use the same coding system to encode arguments and to
>encode the data sent via stdin to the process, whereas you want them to
>be distinct.
>If you want them to be distinct, then you need to manually encode your
>arguments before passing them to call-process.
>
>I.e. the bug with hexl-mode is in hexl.el.  Please report it separately
>indicating how to reproduce the problem (I don't know how to "applying
>`hexl-mode' to UTF-16 texts").
>
>
>         Stefan


