From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: on eshell's encoding
Date: Wed, 27 Jul 2016 19:22:05 +0300
Message-ID: <83invrrqz6.fsf@gnu.org>
References: <86fuqw5vd8.fsf@toledo.com> <864m7c5oqc.fsf@toledo.com> <86zip3uweo.fsf@toledo.com>
To: help-gnu-emacs@gnu.org
In-reply-to: (message from Yuri Khan on Wed, 27 Jul 2016 19:15:45 +0600)

> From: Yuri Khan
> Date: Wed, 27 Jul 2016 19:15:45 +0600
> Cc: "help-gnu-emacs@gnu.org"
>
> On Wed, Jul 27, 2016 at 6:56 PM, Daniel Bastos wrote:
>
> > I meant not being messed with. I don't know anything about MS-Windows.
> > In UNIX the creation of a new process by a shell is likely to call
> > execve, which won't touch the caller strings passed in through the
> > argv-argument.
>
> Well Windows is a different beast entirely. The basic premise is the
> same, in that the parent invokes CreateProcessW, passing a
> UTF-16-encoded command line, and the child process invokes
> GetCommandLineW and then optionally CommandLineToArgvW to split the
> command line into arguments.

So it isn't a different beast, really. Both on Unix and on Windows,
Emacs encodes the command line before passing it to system APIs. The
details differ, but not the basic idea.

> Problem is, most programs prefer to work internally with 8-bit-based
> encodings, and the Win32 API makes it very easy by providing backward
> compatibility wrapper functions CreateProcessA and GetCommandLineA,
> which unfortunately convert from/to the ANSI or OEM encoding defined
> by the locale.

Nitpicking: always ANSI, never the OEM.

> And there is no Win32 locale for which UTF-8 is either the ANSI or
> the OEM encoding.

It's actually worse than that: the Windows locale implementation
doesn't support variable-length encodings, so UTF-8 cannot be a
locale's encoding, unless MS change their related runtime libraries in
a radical way.

> This one point makes it very difficult to use Windows in the Unix Way:
> you get to worry about encodings on every process boundary.

Same on Unix, unless you are willing to bet on UTF-8 being the
locale's codeset.
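
To make the process-boundary point concrete, here is a minimal Python
sketch. It is not how Emacs or the Win32 wrappers are implemented; the
helper names, the sample file name, and the choice of cp1252 as a
stand-in for an "ANSI" codepage are all assumptions made for
illustration. It only shows that one argument string becomes different
bytes under different codesets, and that a child decoding with a
different codeset than the parent used sees garbage.

```python
# Illustrative sketch only: helper names and codec choices are assumptions,
# not anything taken from Emacs or from the Win32 API.

def encode_for_child(arg: str, codeset: str) -> bytes:
    """Encode one command-line argument in the codeset the parent assumes."""
    return arg.encode(codeset)


def decode_in_child(raw: bytes, codeset: str) -> str:
    """Decode the received bytes the way a child assuming `codeset` would."""
    return raw.decode(codeset, errors="replace")


# Hypothetical argument containing one non-ASCII character (U+00EF).
arg = "na\u00efve.txt"

utf8_bytes = encode_for_child(arg, "utf-8")    # U+00EF becomes two bytes
ansi_bytes = encode_for_child(arg, "cp1252")   # U+00EF becomes one byte

# Parent and child agree on the codeset: the argument survives intact.
same_utf8 = decode_in_child(utf8_bytes, "utf-8")
same_ansi = decode_in_child(ansi_bytes, "cp1252")

# Parent and child disagree: the argument is corrupted in both directions.
utf8_read_as_ansi = decode_in_child(utf8_bytes, "cp1252")
ansi_read_as_utf8 = decode_in_child(ansi_bytes, "utf-8")

if __name__ == "__main__":
    for label, value in [
        ("utf-8 bytes", utf8_bytes),
        ("cp1252 bytes", ansi_bytes),
        ("utf-8 read as cp1252", utf8_read_as_ansi),
        ("cp1252 read as utf-8", ansi_read_as_utf8),
    ]:
        print(f"{label}: {value!r}")
```

The mismatch cases are the ones that bite in practice: nothing fails
loudly, the child simply receives a different file name than the one
the parent meant.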