From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#28180: [w32] Unicode characters in subprocess (git) arguments changed to space
Date: Tue, 22 Aug 2017 17:54:59 +0300
Message-ID: <83y3qbabzw.fsf@gnu.org>
References: <87d17oba8j.fsf@users.sourceforge.net>
To: npostavs@users.sourceforge.net
Cc: 28180@debbugs.gnu.org

> From: npostavs@users.sourceforge.net
> Date: Mon, 21 Aug 2017 22:35:24 -0400
>
> In w32.c there is a comment saying
>
>    . Running subprocesses in non-ASCII directories and with non-ASCII
>      file arguments is limited to the current codepage [...]
>      This should be fixed, but will also require changes in cmdproxy.
>      The current limitation is not terribly bad anyway, since very
>      few, if any, Windows console programs that are likely to be
>      invoked by Emacs support UTF-16 encoded command lines.
>
> I believe we're running into this limitation with git: staging a file
> named 好.txt fails from magit[1] (I tried also with vc, same problem).
> A quick way to see the problem is evaluating the call-process form
> below, the output shows that the Chinese character has been transformed
> into a space.

I'd expect that in a non-Chinese locale (which I believe was what you
did), but the OP of the Magit issue has Windows set up for a Chinese
locale, so there has to be some other explanation, because passing
Chinese characters on the command line ought to work in that case.

> Am I correct that this problem is related the w32.c comment?

The comment is accurate, but it can only explain why command-line
arguments with characters outside of the current Windows locale cannot
be safely passed to sub-processes.  Which AFAIU is not the case with
the OP of that Magit issue.

> It's not clear to me what changes are needed in cmdproxy (and other
> places?) to address it.

cmdproxy is not involved in call-process, but it is involved in
shell-command and its ilk.
As it makes no sense to support Unicode in the former, but not in
the latter, if we want to lift this limitation, we must teach cmdproxy
to use "wide" APIs both for receiving command-line arguments from
Emacs and for passing them to programs it invokes.

As to the "other places", the only problem I'm aware of is that the
encoding of the command-line arguments, when they arrive at w32proc.c,
is not known in advance, so this must be somehow fixed/changed,
otherwise we will be unable to re-encode them in UTF-16.  I believe
the comment in w32.c does mention that.
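
To make the two points above concrete, here is a minimal Python sketch.
It is not Emacs code and does not reproduce what w32proc.c does: Python's
"cp936" and "cp1252" codecs merely stand in for a Chinese and a
non-Chinese Windows codepage, and the helper names representable and
to_utf16 are made up for illustration.

```python
# Illustration only: Python codecs stand in for Windows codepages.
# The helper names below are hypothetical, not part of Emacs.

def representable(arg: str, codepage: str) -> bool:
    """Return True if ARG survives a round trip through CODEPAGE."""
    try:
        return arg.encode(codepage).decode(codepage) == arg
    except UnicodeError:
        # The codepage has no encoding for some character in ARG.
        return False


def to_utf16(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode RAW as UTF-16LE; impossible without SOURCE_ENCODING."""
    return raw.decode(source_encoding).encode("utf-16-le")


name = "好.txt"

# The argument fits a Chinese codepage but not a Western one.
print(representable(name, "cp936"))
print(representable(name, "cp1252"))

# Re-encoding to UTF-16 works when the source encoding is known...
raw = name.encode("cp936")
print(to_utf16(raw, "cp936") == name.encode("utf-16-le"))

# ...but the same bytes decoded under a wrong guess yield another string.
print(raw.decode("cp1252", errors="replace") == name)
```

The last two checks are the reason the encoding of the arguments has to
be known by the time they reach the code that would call the "wide"
APIs: the bytes alone do not say which codepage produced them.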