From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: on eshell's encoding
Date: Tue, 02 Aug 2016 18:12:19 +0300
Message-ID: <83poprmch8.fsf@gnu.org>
References: <86fuqw5vd8.fsf@toledo.com> <864m7c5oqc.fsf@toledo.com>
 <86zip3uweo.fsf@toledo.com> <8637mnwbfz.fsf@toledo.com>
To: help-gnu-emacs@gnu.org
In-reply-to: <8637mnwbfz.fsf@toledo.com> (message from Daniel Bastos on
 Tue, 02 Aug 2016 10:24:32 -0300)
List-Id: Users list for the GNU Emacs text editor
Xref: news.gmane.org gmane.emacs.help:111047

> From: Daniel Bastos
> Date: Tue, 02 Aug 2016 10:24:32 -0300
>
> > Like I said, Eshell is not a shell, it just pretends to be one. It
> > will eventually cause execve, or something like it, to be called, but
> > before it, the command-line arguments will be encoded in the locale's
> > encoding, since that's what execve expects. This is true on Windows
> > and on Unix alike.
>
> That's true of EMACS. You're saying EMACS always encodes the command
> line arguments. But what I said about UNIX is that whatever execve
> receives in argv[] will remain as such, which apparently is not the
> MS-Windows behavior.
>
> Precisely: if on UNIX I use EMACS to call /program/ with argv[] encoded
> in X, then /program/ will definitely receive its argv[] as prepared by
> EMACS. That does not happen on MS-Windows. EMACS encodes the command
> line in utf-8, but /program/ receives it in another encoding.

That's not true. Emacs encodes the command line passed to
subprocesses on Windows and Unix alike. On each OS, it always encodes
them in the locale's codeset. If the Unix locale specified UTF-8 as
its codeset, then the command line will be encoded in UTF-8, but
that's no more than a coincidence. (On Windows, the locale's codeset,
a.k.a. "system codepage", can never be UTF-8, but that's the only
difference between Unix and Windows wrt encoding command lines of
subprocesses by Emacs.)

So, as long as you launch processes from Emacs, the difference between
Windows and Unix in this respect is all but non-existent.

The difference between the 2 OSes comes into play when you put
arbitrary byte sequences into argv[] passed to execve etc. (This
cannot be easily done in Emacs, but you can do that in your own
programs.) If those bytes are not valid for the locale's codeset,
Unix will nevertheless pass them verbatim to the subprogram. By
contrast, Windows will convert those bytes to UTF-16, assuming they
are in the current locale's codeset, then convert back to that codeset
when it invokes the subprogram. This conversion is lossy when the
bytes are not valid for the locale, as Windows will replace the
invalid bytes with either their close equivalents or with blanks or
with question marks. (When these bytes are all valid in the current
locale, this conversion happens as well, but it's not lossy, and
therefore its effect is exactly as on Unix.)

> This surprises me. MS-Windows should not care what a program puts in
> argv[].

It cares, because it attempts to transparently support both Unicode
programs, which expect their arguments in UTF-16, and non-Unicode
programs which expect their arguments in the locale's codeset.

> I think it violates an important principle: an operating system
> should help programs to communicate, but it should not care what they're
> saying to each other. That's an important principle UNIX has given us.

Clearly, Unix and Windows differ in their philosophy in this regard.
Each alternative has its advantages and disadvantages; which one you
like better is up to you.