unofficial mirror of bug-gnu-emacs@gnu.org 
From: Ryan Duan <duanpanda@gmail.com>
To: 3616@emacsbugs.donarmstrong.com
Subject: bug#3616: 23.0.94; vc-bzr coding system bug
Date: Mon, 22 Jun 2009 10:01:51 +0800
Message-ID: <30dcab0d0906211901n646f348bu4e6c80bafe5ba780@mail.gmail.com>
In-Reply-To: <30dcab0d0906202300n3f64dac5i54b79932bcfcf4fb@mail.gmail.com>

It works from the Windows XP command line, which uses the Windows
ANSI coding system.  On my machine that seems to be cp936.
The value of buffer-file-coding-system in the *shell* buffer is
chinese-gbk-dos, one of whose aliases is cp936-dos.  Changing it to
cp936 or chinese-iso-8bit does not help.

I observe that the *shell* and *VC-log* buffers pass UTF-8 encoded
strings (is Emacs's internal buffer code UTF-8?) to the Windows
command line, which might be the real cause of this bug and other
related bugs.  Three examples follow.

EXAMPLE 1
--------------------------------
In *shell*,
d:\code>bzr commit -m "第二"
bzr commit -m "ç¬ ºŒ"
Traceback (most recent call last):
 File "bzr", line 130, in <module>
 File "bzrlib\commands.pyo", line 969, in main
bzrlib.errors.BzrError: Parameter ''\xe7\xac\xac\xe4\xba\x8c'' is
unsupported by the current encoding.

Notice ''\xe7\xac\xac\xe4\xba\x8c'', which is the UTF-8 encoding of
the Chinese characters I typed.  This UTF-8 string is what causes the
above error.

Applying C-u C-x = to the Chinese character "第" gives:
       character: 第 (31532, #o75454, #x7b2c)
preferred charset: chinese-gbk (GBK Chinese simplified.)
      code point: 0xB5DA
          syntax: w    which means: word
        category:
                  .:Base, C:2-byte han, c:Chinese, h:Korean,
j:Japanese, |:line breakable
     buffer code: #xE7 #xAC #xAC
       file code: #xB5 #xDA (encoded by coding system chinese-gbk-dos)
         display: by this font (glyph code)
   uniscribe:-outline-新宋体-normal-normal-normal-mono-13-*-*-*-c-*-gb2312.1980-0
(#x3100)

Notice that its buffer code is "\xe7\xac\xac", the first three bytes
of ''\xe7\xac\xac\xe4\xba\x8c''.  The file code "\xb5\xda" is the
chinese-gbk encoding, and that is what I expect Emacs to pass to the
Windows command line, where it would probably work.  Unfortunately,
instead of passing a GBK-encoded string to the shell, Emacs passes a
UTF-8 encoded string.
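
The two byte sequences above can be checked outside Emacs.  The
following is only an illustrative sketch, assuming Python 3 and its
standard "gbk" codec; the variable names are made up for this example
and none of this is Emacs or bzr code.

```python
# Illustrative check only: the string and the byte values come from
# the report above; the variable names are hypothetical.
text = "第二"

utf8_bytes = text.encode("utf-8")  # the bytes bzr complained about
gbk_bytes = text.encode("gbk")     # the bytes a cp936 console expects

print(utf8_bytes)          # b'\xe7\xac\xac\xe4\xba\x8c'
print(gbk_bytes)           # b'\xb5\xda\xb6\xfe'
print(hex(ord(text[0])))   # 0x7b2c, the code point shown by C-u C-x =
```

The first three UTF-8 bytes match the "buffer code" and the first two
GBK bytes match the "file code" reported for "第".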

EXAMPLE 2
--------------------------------
In the *VC-log* buffer, I typed the same two Chinese characters "第二"
as in EXAMPLE 1.
After C-c C-c, the same error occurs: bzrlib.errors.BzrError:
Parameter ''\xe7\xac\xac\xe4\xba\x8c'' is unsupported by the current
encoding.
Applying C-u C-x = to "第" returns the same information as in EXAMPLE 1.

EXAMPLE 3 (Another related bug)
--------------------------------
In Windows, I created a directory (folder) named "第二".
In dired, it works fine.
But in *shell*,
d:\>cd 第二
cd ç¬ ºŒ
系统找不到指定的路径。

It complains that the system cannot find the specified path, because
"\xb5\xda\xb6\xfe" (Chinese GBK) is converted to
''\xe7\xac\xac\xe4\xba\x8c'' (UTF-8) before being passed to the shell,
and the shell can only process Chinese GBK characters.
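
The mismatch can be sketched as follows.  This assumes Python 3 and
treats its "gbk" codec as a stand-in for what a cp936 console does; it
is an illustration, not the actual code path in Emacs or cmd.exe, and
the names are hypothetical.

```python
# Illustrative only: models a consumer that decodes every incoming
# byte string as GBK, the way a cp936 console would.
name = "第二"

as_gbk = name.encode("gbk")     # what the console expects
as_utf8 = name.encode("utf-8")  # what was actually sent

ok = as_gbk.decode("gbk")
# errors="replace" avoids an exception if a byte pair is unmapped.
misread = as_utf8.decode("gbk", errors="replace")

print(ok == name)       # True: the GBK bytes round-trip
print(misread == name)  # False: the UTF-8 bytes name something else
```

Because the misread string differs from the real directory name, the
lookup for that path fails.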

CONCLUSION
--------------------------------
When Emacs is used on Chinese Windows, Chinese text is passed to the
Windows command line encoded as UTF-8 instead of GBK, but the Windows
command line cannot process UTF-8, which causes this bug and other
related bugs.

I feel that this is not a small problem.  Emacs should detect the OS's
locale, then use the matching coding system to interact with the OS.
It seems to do this well on Linux but badly on Windows.  Dired seems
to work well on Windows, but shell.el and vc-bzr.el do not.  I didn't
test other vc-* modes.
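
As a rough sketch of the behaviour suggested above (look up the locale's
encoding, then encode arguments with it before handing them to another
program), here is a small Python 3 example.  The helper
encode_for_console is hypothetical and exists only for illustration; it
is not how Emacs implements process coding systems.

```python
import locale


def encode_for_console(text, encoding=None):
    """Encode text for a child process (hypothetical helper).

    If no encoding is given, fall back to the locale's preferred
    encoding, which on Chinese Windows would typically be cp936.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return text.encode(encoding)


# With the encoding forced to cp936, the result matches the "file
# code" bytes from EXAMPLE 1.
print(encode_for_console("第二", "cp936"))  # b'\xb5\xda\xb6\xfe'
```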

I hope the information above will help solve this problem.  Thank you!
HAPPY HACKING!

2009/6/19 Eli Zaretskii <eliz@gnu.org>:
>> Date: Fri, 19 Jun 2009 16:24:37 +0800
>> From: 端瑞 <duanpanda@gmail.com>
>> Cc:
>> Reply-To: 端瑞 <duanpanda@gmail.com>,
>>       3616@emacsbugs.donarmstrong.com

> Does it work for you from the command line?  If it does, what encoding
> of Chinese do you use in that case?
>
> What is the value of buffer-file-coding-system in the *shell* buffer?
> Does it help to change it to cp936?
