From: Eli Zaretskii <eliz@gnu.org>
To: help-gnu-emacs@gnu.org
Subject: Re: on eshell's encoding
Date: Tue, 02 Aug 2016 18:12:19 +0300
Message-ID: <83poprmch8.fsf@gnu.org>
In-Reply-To: <8637mnwbfz.fsf@toledo.com> (message from Daniel Bastos on Tue, 02 Aug 2016 10:24:32 -0300)

> From: Daniel Bastos <dbastos@toledo.com>
> Date: Tue, 02 Aug 2016 10:24:32 -0300
> 
> > Like I said, Eshell is not a shell, it just pretends to be one.  It
> > will eventually cause execve, or something like it, to be called, but
> > before it, the command-line arguments will be encoded in the locale's
> > encoding, since that's what execve expects.  This is true on Windows
> > and on Unix alike.  
> 
> That's true of EMACS.  You're saying EMACS always encodes the command
> line arguments.  But what I said about UNIX is that whatever execve
> receives in argv[] will remain as such, which apparently is not the
> MS-Windows behavior.
> 
> Precisely: if on UNIX I use EMACS to call /program/ with argv[] encoded
> in X, then /program/ will definitely receive its argv[] as prepared by
> EMACS.  That does not happen on MS-Windows.  EMACS encodes the command
> line in utf-8, but /program/ receives it in another encoding.

That's not true.  Emacs encodes the command line passed to
subprocesses on Windows and Unix alike.  On each OS, it always encodes
them in the locale's codeset.  If the Unix locale specified UTF-8 as
its codeset, then the command line will be encoded in UTF-8, but
that's no more than a coincidence.  (On Windows, the locale's codeset,
a.k.a. "system codepage", can never be UTF-8, but that's the only
difference between Unix and Windows with respect to how Emacs encodes
the command lines of subprocesses.)
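
To illustrate why the UTF-8 case is only a coincidence, here is a minimal Python sketch.  It is not Emacs's actual code; `encode_command_line` is a hypothetical helper that encodes the same arguments in whatever codeset it is given, defaulting to the locale's.

```python
import locale

def encode_command_line(args, codeset=None):
    """Encode each argument in CODESET (default: the locale's codeset)."""
    if codeset is None:
        # The codeset named by the current locale, e.g. UTF-8 or cp1252.
        codeset = locale.getpreferredencoding(False)
    return [arg.encode(codeset) for arg in args]

if __name__ == "__main__":
    args = ["grep", "café"]
    # Same text, different bytes, depending only on the codeset in effect.
    print(encode_command_line(args, "utf-8"))
    print(encode_command_line(args, "cp1252"))
    print("locale codeset here:", locale.getpreferredencoding(False))
```

The procedure is identical in both calls; only the codeset name differs, and with it the bytes the subprocess receives.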

So, as long as you launch processes from Emacs, the difference between
Windows and Unix in this respect is all but non-existent.

The difference between the two OSes comes into play when you put
arbitrary byte sequences into argv[] passed to execve etc.  (This
cannot be easily done in Emacs, but you can do that in your own
programs.)  If those bytes are not valid for the locale's codeset,
Unix will nevertheless pass them verbatim to the subprogram.  By
contrast, Windows will convert those bytes to UTF-16, assuming they
are in the current locale's codeset, then convert back to that codeset
when it invokes the subprogram.  This conversion is lossy when the
bytes are not valid for the locale, as Windows will replace the
invalid bytes with close equivalents, blanks, or question marks.
(When these bytes are all valid in the current
locale, this conversion happens as well, but it's not lossy, and
therefore its effect is exactly as on Unix.)
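
The two behaviors described above can be sketched in Python.  This is an illustration under stated assumptions, not a description of any OS internals: `unix_passthrough` and `windows_roundtrip_model` are hypothetical names, the first runs only on a POSIX system, and the second merely models the round trip with Python's own codec tables, which may replace invalid bytes differently than Windows does.

```python
import os
import subprocess
import sys

def unix_passthrough(raw):
    """Spawn a child with RAW as argv[1]; return the bytes the child saw.

    POSIX only: the bytes reach the child's argv[] unmodified.
    """
    # The child recovers the raw argv bytes with os.fsencode and echoes them.
    child = "import os, sys; sys.stdout.buffer.write(os.fsencode(sys.argv[1]))"
    result = subprocess.run([sys.executable, "-c", child, raw],
                            stdout=subprocess.PIPE, check=True)
    return result.stdout

def windows_roundtrip_model(raw, codeset):
    """Model only: codeset bytes -> Unicode text -> codeset bytes.

    Bytes that do not map are replaced, which is where information is lost.
    """
    text = raw.decode(codeset, errors="replace")
    return text.encode(codeset, errors="replace")

if __name__ == "__main__":
    bad = b"\x81"  # has no mapping in Python's cp1252 table
    print(windows_roundtrip_model(b"caf\xe9", "cp1252"))  # valid bytes survive
    print(windows_roundtrip_model(bad, "cp1252"))         # invalid byte is replaced
    if os.name == "posix":
        print(unix_passthrough(bad))                      # passed verbatim
```

With valid input the model returns the same bytes it was given, matching the point that the conversion is then not lossy.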

> This surprises me.  MS-Windows should not care what a program puts in
> argv[].

It cares, because it attempts to transparently support both Unicode
programs, which expect their arguments in UTF-16, and non-Unicode
programs which expect their arguments in the locale's codeset.
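
As a rough sketch of what serving both kinds of programs implies, the hypothetical helper below produces the two views of one argument: UTF-16 for a Unicode program and codeset bytes for a non-Unicode one.  It is a model built on Python's codecs; the real conversion on Windows uses its own tables and may choose a close equivalent where this sketch emits a question mark.

```python
def argument_views(arg, codeset):
    """Return (wide, narrow): UTF-16LE bytes and CODESET bytes for ARG."""
    wide = arg.encode("utf-16-le")                   # view for Unicode programs
    narrow = arg.encode(codeset, errors="replace")   # view for non-Unicode programs
    return wide, narrow

if __name__ == "__main__":
    # A character present in cp1252: both views carry it.
    print(argument_views("é", "cp1252"))
    # A character absent from cp1252: only the UTF-16 view keeps it.
    print(argument_views("Ж", "cp1252"))
```

A character outside the codeset survives only in the UTF-16 view, which is why the system cannot treat argv[] as opaque bytes while offering both views.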

> I think it violates an important principle: an operating system
> should help programs to communicate, but it should not care what they're
> saying to each other.  That's an important principle UNIX has given us.

Clearly, Unix and Windows differ in their philosophy in this regard.
Each alternative has its advantages and disadvantages; which one you
like better is up to you.




