2016-01-06 17:13 GMT+01:00 Eli Zaretskii:
> > From: Klaus-Dieter Bauer
> > Date: Wed, 6 Jan 2016 16:20:29 +0100
> >
> > Is there a reliable way to pass unicode file names as
> > arguments through `start-process'?
>
> No, not at the moment, not in the native Windows build of Emacs.
> Arguments to subprocesses are forced to be encoded in the current
> system codepage.  This commentary in w32.c tells a few more details:
>
>    . Running subprocesses in non-ASCII directories and with non-ASCII
>      file arguments is limited to the current codepage (even though
>      Emacs is perfectly capable of finding an executable program file
>      in a directory whose name cannot be encoded in the current
>      codepage).  This is because the command-line arguments are
>      encoded _before_ they get to the w32-specific level, and the
>      encoding is not known in advance (it doesn't have to be the
>      current ANSI codepage), so w32proc.c functions cannot re-encode
>      them in UTF-16.  This should be fixed, but will also require
>      changes in cmdproxy.  The current limitation is not terribly bad
>      anyway, since very few, if any, Windows console programs that are
>      likely to be invoked by Emacs support UTF-16 encoded command
>      lines.
>
>    . For similar reasons, server.el and emacsclient are also limited
>      to the current ANSI codepage for now.
>
>    . Emacs itself can only handle command-line arguments encoded in
>      the current codepage.
>
> The main reason for this being a low-priority problem is that the
> absolute majority of console programs Emacs might invoke don't support
> UTF-16 encoded command-line arguments anyway, so the efforts to enable
> this would yield very little gains.  However, patches to do that will
> be welcome.  (Note that, as the comment above says, the changes will
> also need to touch cmdproxy, since we invoke all the programs through
> it.)
>
> > I realized two limitations:
> >
> > 1. Using `prefer-coding-system' with anything other than
> >    `locale-default-encoding', e.g.
> >        (prefer-coding-system 'utf-8),
> >    causes a file name "Ö.txt" to be misdecoded by
> >    subprocesses -- notably including "emacs.exe", but also
> >    all other executables I tried (both Windows builtins like
> >    where.exe and third party executables like ffmpeg.exe or
> >    GnuWin32 utilities).
> >    In my case (German locale, 'utf-8 preferred coding
> >    system) it is mis-decoded as "Ã–.txt", i.e. emacs encodes
> >    the process argument as 'utf-8 but the subprocess decodes
> >    it as 'latin-1 (in my case).
> >    While this can be fixed by an explicit encoding
> >        (start-process ...
> >            (encode-coding-string filename locale-coding-system))
> >    such code will probably not be used in most projects, as
> >    the issue occurs only on Windows, dependent on the user
> >    configuration (-> hard-to-find bug?). I have added some
> >    elisp for demonstration at the end of the mail.
> >
> > 2. When a file-name contains characters that cannot be
> >    encoded in the locale's encoding, e.g. Japanese
> >    characters in a German locale, I cannot find any way to
> >    pass the file name through the `start-process' interface;
> >    Unlike for characters, that are supported by the locale,
> >    it fails even in a clean "emacs -Q" session.
> >    Curiously the file name can still be used in cmd.exe,
> >    though entering it may require TAB-completion (even
> >    though the active codepage shouldn't support them).
>
> Does the program which you invoke support UTF-16 encoded command-line
> arguments?  It would need to either use '_wmain' instead of 'main', or
> access the command-line arguments via GetCommandLineW or such likes,
> and process them as wchar_t strings.
>
> If the program doesn't have these capabilities, it won't help that
> Emacs passes it UTF-16 encoded arguments, because Windows will attempt
> to convert them to strings encoded in the current codepage, and will
> replace any un-encodable characters with question marks or blanks.
>
> > ;; Set the preferred coding system.
> > (prefer-coding-system 'utf-8)
>
> You cannot use UTF-8 to encode command-line arguments on Windows, not
> in general, even if the program you invoke does understand UTF-8
> strings as its command-line arguments.  (I can explain if you want.)
>
> > ;; On Unix (tested with cygwin), it works fine; Presumably because
> > ;; the file name is decoded (in `directory-files') and encoded (in
> > ;; `start-process') with the same preferred coding system.
>
> It works with Cygwin because Cygwin does support UTF-8 for passing
> strings to subprograms.  That support lives inside the Cygwin DLL,
> which replaces most of the Windows runtime with Posix-compatible
> APIs.  The native Windows build of Emacs doesn't have that luxury.
>

I checked again and found that some of the utilities I tested before
(specifically the GnuWin32 tools) indeed can't handle Japanese
characters when called from cmd.exe. ffmpeg, on the other hand, does
support Unicode file names in cmd.exe, but I agree that this is quite
a niche usage.

I thought up some workarounds, but they all run into limitations:

- w32-short-file-name: Doesn't work, because on modern Windows systems
  8.3 file names may not be generated, so it may just return the file
  name unchanged.
- rename-file: Allows working with the file under a temporary name
  that the codepage supports. Sadly, there is no way to guarantee that
  the renaming is undone afterwards.
- copy-file (to a temporary directory): Would work for my current
  application, but is not viable when larger amounts of data are
  involved.

Would you happen to know any other possible workaround?

Thanks for the explanations,

- Klaus
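
For reference, the explicit-encoding workaround from limitation 1 could be wrapped up roughly as below. This is only a minimal sketch: the names `my-locale-encodable-p' and `my-start-process-locale-args' are made up for illustration, and it assumes that `locale-coding-system' is non-nil and matches the codepage the subprocess decodes its arguments with. It does nothing for limitation 2 beyond reporting the problem up front.

```elisp
;; Hypothetical helpers, not part of Emacs.

(defun my-locale-encodable-p (string)
  "Return non-nil if STRING can be encoded in `locale-coding-system'."
  ;; `unencodable-char-position' returns nil when every character
  ;; between the two indices can be encoded.
  (null (unencodable-char-position 0 (length string)
                                   locale-coding-system nil string)))

(defun my-start-process-locale-args (name buffer program &rest args)
  "Like `start-process', but encode ARGS in `locale-coding-system' first.
Signal an error if an argument cannot be represented in that coding
system, instead of passing a garbled name to PROGRAM."
  (dolist (arg args)
    (unless (my-locale-encodable-p arg)
      (error "Argument not encodable in %s: %s"
             locale-coding-system arg)))
  (apply #'start-process name buffer program
         (mapcar (lambda (arg)
                   ;; Assumption: the resulting unibyte string is
                   ;; passed on without being re-encoded.
                   (encode-coding-string arg locale-coding-system))
                 args)))
```

A call would then look like (my-start-process-locale-args "ffmpeg" "*ffmpeg*" "ffmpeg" "-i" filename). With a Japanese file name in a German locale it would signal an error rather than start the process.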