From: Eli Zaretskii <eliz@gnu.org>
To: Klaus-Dieter Bauer <bauer.klaus.dieter@gmail.com>
Cc: emacs-devel@gnu.org
Subject: Re: Passing unicode filenames to start-process on Windows?
Date: Wed, 06 Jan 2016 18:13:35 +0200
Message-ID: <83si2a3cuo.fsf@gnu.org>
In-Reply-To: <CANtbJLHOJOsy+CfgsvgCYN-aA6Ur1UYuRENPREpsRW_JaSJpDg@mail.gmail.com> (message from Klaus-Dieter Bauer on Wed, 6 Jan 2016 16:20:29 +0100)

> From: Klaus-Dieter Bauer <bauer.klaus.dieter@gmail.com>
> Date: Wed, 6 Jan 2016 16:20:29 +0100
> 
> Is there a reliable way to pass unicode file names as
> arguments through `start-process'?

No, not at the moment, not in the native Windows build of Emacs.
Arguments to subprocesses are forced to be encoded in the current
system codepage.  This commentary in w32.c gives a few more details:

   . Running subprocesses in non-ASCII directories and with non-ASCII
     file arguments is limited to the current codepage (even though
     Emacs is perfectly capable of finding an executable program file
     in a directory whose name cannot be encoded in the current
     codepage).  This is because the command-line arguments are
     encoded _before_ they get to the w32-specific level, and the
     encoding is not known in advance (it doesn't have to be the
     current ANSI codepage), so w32proc.c functions cannot re-encode
     them in UTF-16.  This should be fixed, but will also require
     changes in cmdproxy.  The current limitation is not terribly bad
     anyway, since very few, if any, Windows console programs that are
     likely to be invoked by Emacs support UTF-16 encoded command
     lines.

   . For similar reasons, server.el and emacsclient are also limited
     to the current ANSI codepage for now.

   . Emacs itself can only handle command-line arguments encoded in
     the current codepage.
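
As a rough way to see which file names fall under this limitation, here
is a minimal sketch in Python (used only because its codecs make the
check easy to run outside Emacs; it is not Emacs code).  The helper
name `encodable_in` is hypothetical, and cp1252 merely stands in for a
Western European system codepage:

```python
def encodable_in(name: str, codepage: str) -> bool:
    """Return True if every character of `name` can be encoded in `codepage`."""
    try:
        # Strict encoding raises on the first character the codepage lacks.
        name.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


# Assumption: cp1252 plays the role of the "current codepage" here.
print(encodable_in("Ö.txt", "cp1252"))       # a name the codepage covers
print(encodable_in("日本語.txt", "cp1252"))   # a name it does not cover
```

A name for which this returns False is the kind that cannot currently
be passed to a subprocess by the native Windows build.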

The main reason for this being a low-priority problem is that the
vast majority of console programs Emacs might invoke don't support
UTF-16 encoded command-line arguments anyway, so the effort to enable
this would yield very little gain.  However, patches to do that will
be welcome.  (Note that, as the comment above says, the changes will
also need to touch cmdproxy, since we invoke all the programs through
it.)

> I realized two limitations:
> 
> 1. Using `prefer-coding-system' with anything other than
> `locale-default-encoding', e.g. 
> (prefer-coding-system 'utf-8), 
> causes a file name "Ö.txt" to be mis-decoded by
> subprocesses -- notably including "emacs.exe", but also
> all other executables I tried (both Windows builtins like
> where.exe and third party executables like ffmpeg.exe or
> GnuWin32 utilities). 
> In my case (German locale, 'utf-8 preferred coding
> system) it is mis-decoded as "Ã–.txt", i.e. emacs encodes
> the process argument as 'utf-8 but the subprocess decodes
> it as 'latin-1 (in my case).
> While this can be fixed by an explicit encoding 
> (start-process ... 
> (encode-coding-string filename locale-coding-system))
> such code will probably not be used in most projects, as
> the issue occurs only on Windows, dependent on the user
> configuration (-> hard-to-find bug?). I have added some
> elisp for demonstration at the end of the mail.
> 
> 2. When a file-name contains characters that cannot be
> encoded in the locale's encoding, e.g. Japanese
> characters in a German locale, I cannot find any way to
> pass the file name through the `start-process' interface; 
> Unlike for characters that are supported by the locale,
> it fails even in a clean "emacs -Q" session. 
> Curiously the file name can still be used in cmd.exe,
> though entering it may require TAB-completion (even
> though the active codepage shouldn't support them).

Does the program which you invoke support UTF-16 encoded command-line
arguments?  It would need to either use '_wmain' instead of 'main', or
access the command-line arguments via GetCommandLineW or a similar
API, and process them as wchar_t strings.

If the program doesn't have these capabilities, it won't help that
Emacs passes it UTF-16 encoded arguments, because Windows will attempt
to convert them to strings encoded in the current codepage, and will
replace any un-encodable characters with question marks or blanks.
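
The lossy conversion can be approximated outside Emacs.  The sketch
below uses Python's "replace" error handler as a stand-in for what
Windows does; that is an assumption made for illustration (the real
conversion may pick different substitutes), and the helper name
`to_codepage_lossy` is hypothetical:

```python
def to_codepage_lossy(arg: str, codepage: str) -> bytes:
    """Encode `arg` in `codepage`, substituting '?' for unencodable characters."""
    # errors="replace" emits one '?' per character the codepage cannot represent.
    return arg.encode(codepage, errors="replace")


# Assumption: cp1252 stands in for the current codepage of a German system.
print(to_codepage_lossy("日本語.txt", "cp1252"))  # b'???.txt'
print(to_codepage_lossy("Ö.txt", "cp1252"))      # b'\xd6.txt'
```

Once the original characters are replaced this way, the invoked program
has no means of recovering the real file name.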

> ;; Set the preferred coding system. 
> (prefer-coding-system 'utf-8)

You cannot use UTF-8 to encode command-line arguments on Windows, not
in general, even if the program you invoke does understand UTF-8
strings as its command-line arguments.  (I can explain if you want.)
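
For illustration, the mis-decoding reported for "Ö.txt" can be
reproduced at the byte level.  This is a minimal Python sketch, not
Emacs code; it assumes the receiving program decodes its arguments
with cp1252 (a typical ANSI codepage for a German locale), and the
helper name `seen_by_program` is hypothetical:

```python
def seen_by_program(arg: str, sent_as: str, read_as: str) -> str:
    """Return how `arg` looks when encoded with `sent_as` but decoded with `read_as`."""
    raw = arg.encode(sent_as)    # bytes placed on the command line
    return raw.decode(read_as)   # what the receiving program reconstructs


# "Ö" is the two bytes C3 96 in UTF-8; cp1252 reads them as two characters.
print("Ö.txt".encode("utf-8"))                      # b'\xc3\x96.txt'
print(seen_by_program("Ö.txt", "utf-8", "cp1252"))  # Ã–.txt
# When both sides use the same codepage, the name survives intact.
print(seen_by_program("Ö.txt", "cp1252", "cp1252"))  # Ö.txt
```

This matches the observation that explicitly encoding the argument in
the locale's coding system avoids the problem for characters that
codepage can represent.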

> ;; On Unix (tested with cygwin), it works fine; Presumably because
> ;; the file name is decoded (in `directory-files') and encoded (in
> ;; `start-process') with the same preferred coding system.

It works with Cygwin because Cygwin does support UTF-8 for passing
strings to subprograms.  That support lives inside the Cygwin DLL,
which replaces most of the Windows runtime with POSIX-compatible
APIs.  The native Windows build of Emacs doesn't have that luxury.


