From: Klaus-Dieter Bauer
Newsgroups: gmane.emacs.devel
Subject: Passing unicode filenames to start-process on Windows?
Date: Wed, 6 Jan 2016 16:20:29 +0100
To: emacs-devel@gnu.org
Xref: news.gmane.org gmane.emacs.devel:197699

Hello!

Is there a reliable way to pass Unicode file names as
arguments through `start-process'?

I have run into two limitations:

1. Using `prefer-coding-system' with anything other than
   `locale-coding-system', e.g.

       (prefer-coding-system 'utf-8),

   causes a file name "Ö.txt" to be mis-decoded by
   subprocesses -- notably including "emacs.exe", but also
   all other executables I tried (both Windows builtins like
   where.exe and third-party executables like ffmpeg.exe or
   GnuWin32 utilities).

   In my case (German locale, 'utf-8 preferred coding
   system) it is mis-decoded as "Ã–.txt", i.e. Emacs encodes
   the process argument as 'utf-8 but the subprocess decodes
   it as 'latin-1 (in my case).

   While this can be fixed by an explicit encoding

       (start-process ...
         (encode-coding-string filename locale-coding-system))

   such code will probably not be used in most projects, as
   the issue occurs only on Windows and depends on the user
   configuration (-> hard-to-find bug?). I have added some
   Elisp for demonstration at the end of the mail.

2. When a file name contains characters that cannot be
   encoded in the locale's encoding, e.g. Japanese
   characters in a German locale, I cannot find any way to
   pass the file name through the `start-process' interface.
   Unlike with characters that the locale supports, this
   fails even in a clean "emacs -Q" session.

   Curiously, the file name can still be used in cmd.exe,
   though entering it may require TAB-completion (even
   though the active codepage shouldn't support these
   characters).

- Klaus

---------------- EXAMPLE CODE --------------------

;; Setup: Create a file "unifilebug/Ö.txt" with
;; some arbitrary text. Make sure it is the only file in
;; "unifilebug".
;;
;; Note that for this issue it doesn't matter what coding system
;; is chosen for file names (Unix only; on Windows the coding
;; system for file names is fixed anyway).

;; Set the preferred coding system.
(prefer-coding-system 'utf-8)

;; Try opening it in an Emacs subprocess.
;;
;; On Windows this breaks
;; if `prefer-coding-system' was called with anything other than
;; `locale-coding-system', here 'utf-8.
;;
;; On Unix (tested with Cygwin), it works fine, presumably because
;; the file name is decoded (in `directory-files') and encoded (in
;; `start-process') with the same preferred coding system.
(let ((file-name (car (directory-files "unifilebug" t "txt$"))))
  (start-process "" nil "emacs" "-Q" file-name))

;; It can be fixed by explicitly encoding file names. This
;; thankfully works both in the W32 and the Cygwin version of
;; Emacs.
(let ((file-name (car (directory-files "unifilebug" t "txt$"))))
  (start-process "" nil "emacs" "-Q"
    (encode-coding-string file-name locale-coding-system)))

;; Now we create a file called "ufb2/こんにちは世界.txt".
;; Even in an Emacs session without `prefer-coding-system' it will
;; fail, decoding the file name as "ufb2/ .txt".
(let ((file-name (car (directory-files "ufb2" t "txt$"))))
  (start-process "" nil "emacs" "-Q" file-name))

--------------------------------------------------
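Both limitations can also be illustrated without spawning a subprocess. What follows is only a minimal sketch, not a fix. It assumes the subprocess decodes its arguments as windows-1252, the usual ANSI codepage on a German Windows; the mail says 'latin-1, but the "–" in the garbled name suggests windows-1252. The helper names `ufb-mojibake' and `ufb-encodable-p' are hypothetical and exist only for this illustration.

```elisp
;; Hypothetical helper: what a subprocess would see if Emacs encodes
;; NAME with ENCODER while the subprocess decodes with DECODER.
(defun ufb-mojibake (name encoder decoder)
  "Encode NAME with ENCODER, then decode the resulting bytes with DECODER."
  (decode-coding-string (encode-coding-string name encoder) decoder))

;; Hypothetical helper: can NAME be represented in CODING-SYSTEM at all?
;; `unencodable-char-position' returns nil when every character encodes.
(defun ufb-encodable-p (name coding-system)
  "Return non-nil if every character of NAME is encodable in CODING-SYSTEM."
  (null (unencodable-char-position 0 (length name) coding-system nil name)))

;; Limitation 1: UTF-8 bytes read back as windows-1252 (assumed codepage).
(ufb-mojibake "Ö.txt" 'utf-8 'windows-1252)          ; => "Ã–.txt"

;; Limitation 2: the Japanese name has no windows-1252 representation,
;; so encoding with `locale-coding-system' cannot help in that case.
(ufb-encodable-p "Ö.txt" 'windows-1252)              ; => t
(ufb-encodable-p "こんにちは世界.txt" 'windows-1252)  ; => nil
```

A check like `ufb-encodable-p' could at least detect in advance that a file name will not survive the trip through `start-process' under the current locale.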