From: Ryan Duan
Newsgroups: gmane.emacs.bugs
Subject: bug#3616: 23.0.94; vc-bzr coding system bug
Date: Mon, 22 Jun 2009 10:01:51 +0800
Message-ID: <30dcab0d0906211901n646f348bu4e6c80bafe5ba780@mail.gmail.com>
To: 3616@emacsbugs.donarmstrong.com

It works from the command line, which is part of Windows XP and uses the Windows ANSI coding system. The Windows command line seems to use cp936 as its coding system.

The value of buffer-file-coding-system in the *shell* buffer is chinese-gbk-dos, one of whose aliases is cp936-dos. Changing it to cp936 or to chinese-iso-8bit does not help.

I observe that the *shell* and *VC-log* buffers pass UTF-8 encoded strings to the Windows command line (is Emacs's internal buffer encoding UTF-8?). This may be the real cause of this bug and of other related bugs. Three examples follow.

EXAMPLE 1
--------------------------------
In *shell*:

d:\code>bzr commit -m "第二"
bzr commit -m "[UTF-8 bytes of 第二, displayed garbled]"
Traceback (most recent call last):
  File "bzr", line 130, in
  File "bzrlib\commands.pyo", line 969, in main
bzrlib.errors.BzrError: Parameter ''\xe7\xac\xac\xe4\xba\x8c'' is unsupported by the current encoding.

Note "\xe7\xac\xac\xe4\xba\x8c", which is the UTF-8 encoding of the Chinese characters I typed. It is this UTF-8 string that causes the error above.

Applying C-u C-x = to the Chinese character "第" gives:

character: 第 (31532, #o75454, #x7b2c)
preferred charset: chinese-gbk (GBK Chinese simplified.)
code point: 0xB5DA
syntax: w which means: word
category: .:Base, C:2-byte han, c:Chinese, h:Korean, j:Japanese, |:line breakable
buffer code: #xE7 #xAC #xAC
file code: #xB5 #xDA (encoded by coding system chinese-gbk-dos)
display: by this font (glyph code)
    uniscribe:-outline-新宋体-normal-normal-normal-mono-13-*-*-*-c-*-gb2312.1980-0 (#x3100)

Note that its buffer code, "\xe7\xac\xac", is the first part of "\xe7\xac\xac\xe4\xba\x8c". The file code, "\xb5\xda", is GBK-encoded (chinese-gbk), and it is what I expect Emacs to pass to the Windows command line, where it would probably work correctly. Unfortunately, instead of passing the GBK-encoded string to the shell, Emacs passes the UTF-8 encoded string.
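The byte sequences in EXAMPLE 1 can be reproduced outside Emacs. The Python 3 sketch below is an illustration, not part of the original report; it uses Python's `gbk` codec as a stand-in for the Windows cp936 console encoding (an assumption) and shows that the parameter bzr rejects is the UTF-8 form of "第二", while the "file code" is its GBK form.

```python
# The two characters typed in the examples.
text = "第二"

# Matches the code point shown by C-u C-x = (#x7b2c).
print(hex(ord("第")))  # 0x7b2c

# UTF-8 form: matches the "buffer code" and the bzr error parameter.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'\xe7\xac\xac\xe4\xba\x8c'

# GBK form: matches the "file code" reported for chinese-gbk-dos.
gbk_bytes = text.encode("gbk")
print(gbk_bytes)  # b'\xb5\xda\xb6\xfe'

# Reading the UTF-8 bytes as GBK, as a cp936 console would, does not give
# the original text back. errors="replace" avoids raising on invalid pairs.
misread = utf8_bytes.decode("gbk", errors="replace")
print(misread == text)  # False
```

Six UTF-8 bytes decode to at least three GBK characters (or replacement marks), so the result can never equal the two-character original.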
EXAMPLE 2
--------------------------------
In the *VC-log* buffer, I typed the same two Chinese characters, "第二", as in EXAMPLE 1. After C-c C-c, the same error occurs:

bzrlib.errors.BzrError: Parameter ''\xe7\xac\xac\xe4\xba\x8c'' is unsupported by the current encoding.

Applying C-u C-x = to "第" returns the same information as in EXAMPLE 1.

EXAMPLE 3 (another related bug)
--------------------------------
In Windows, I created a directory (folder) named "第二". Dired handles it correctly. But in *shell*:

d:\>cd 第二
cd [UTF-8 bytes of 第二, displayed garbled]
系统找不到指定的路径。

It complains that the system cannot find the specified path. This is because the GBK string "\xb5\xda\xb6\xfe" is converted to the UTF-8 string "\xe7\xac\xac\xe4\xba\x8c" before being passed to the shell, but the shell can only process GBK-encoded characters.

CONCLUSION
--------------------------------
When Emacs is used on Chinese Windows, Chinese GBK characters are converted to UTF-8 before being passed to the Windows command line, which cannot process UTF-8. This causes this bug and other related bugs.

I feel that this is not a small problem. Emacs should detect the OS locale and then use the correct encoding to interact with the OS. It seems to do this well on Linux but badly on Windows: Dired seems to work well on Windows, but shell.el and vc-bzr.el do not. I didn't test the other vc-* modes.

I hope the information above helps solve this problem. Thank you!

HAPPY HACKING!

On 2009/6/19, Eli Zaretskii wrote:
> Does it work for you from the command line? If it does, what encoding
> of Chinese do you use in that case?
>
> What is the value of buffer-file-coding-system in the *shell* buffer?
> Does it help to change it to cp936?
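The CONCLUSION asks that text sent to child processes be encoded according to the OS locale. A minimal Python sketch of that idea follows; it is not Emacs code and not the fix adopted for this bug. `encode_args` is a hypothetical helper, and the fallback assumes that `locale.getpreferredencoding(False)` reports cp936 on Simplified Chinese Windows, which is usual but not guaranteed.

```python
import locale


def encode_args(args, encoding=None):
    """Encode each command-line argument as bytes for a child process.

    Hypothetical helper: when no encoding is given, fall back to the
    locale's preferred encoding (assumed to be cp936 on Simplified
    Chinese Windows).
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return [arg.encode(encoding) for arg in args]


# EXAMPLE 3 from the report: changing into the directory named "第二".
command = ["cd", "第二"]

# What a GBK (cp936) console can interpret.
print(encode_args(command, "cp936"))  # [b'cd', b'\xb5\xda\xb6\xfe']

# What the report says Emacs actually sent.
print(encode_args(command, "utf-8"))  # [b'cd', b'\xe7\xac\xac\xe4\xba\x8c']
```

Emacs has related settings such as `locale-coding-system` and `process-coding-system-alist`; this sketch does not establish which of them, if any, resolves bug#3616.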