unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#28180: [w32] Unicode characters in subprocess (git) arguments changed to space
@ 2017-08-22  2:35 npostavs
  2017-08-22 14:54 ` Eli Zaretskii
  0 siblings, 1 reply; 5+ messages in thread
From: npostavs @ 2017-08-22  2:35 UTC (permalink / raw)
  To: 28180

[-- Attachment #1: Type: text/plain, Size: 1396 bytes --]

In w32.c there is a comment saying

   . Running subprocesses in non-ASCII directories and with non-ASCII
     file arguments is limited to the current codepage [...]
     This should be fixed, but will also require changes in cmdproxy.
     The current limitation is not terribly bad anyway, since very
     few, if any, Windows console programs that are likely to be
     invoked by Emacs support UTF-16 encoded command lines.

I believe we're running into this limitation with git: staging a file
named 好.txt fails from magit[1] (I tried also with vc, same problem).
A quick way to see the problem is evaluating the call-process form
below, the output shows that the Chinese character has been transformed
into a space.  This happens even if I execute 'chcp 65001' before
starting Emacs (a possible workaround I saw suggested in a few places).
The short file name trick doesn't help either.

(call-process "git" nil '(t t) nil
              "-c" "alias.x=!x() { printf '%s' \"$1\" | od -tx1; }; x" "x" "(好)")
0000000 28 20 29
0000003

As far as I can tell, git does support UTF-16 encoded command lines, as
demonstrated by the attached git-args.c, which produces the utf8
encoding of the character (this is also what the call-process form
produces when I run it on GNU/Linux):

C:\Users\npostavs\src\win32-args>.\git-args.exe
0000000 28 e5 a5 bd 29
0000005


[-- Attachment #2: test program, C source --]
[-- Type: text/plain, Size: 883 bytes --]

#include <windows.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    STARTUPINFOW wsi;
    PROCESS_INFORMATION pi;

    WCHAR wcmdline[] = L"git -c \"alias.x=!x() { printf '%s' \\\"$1\\\" | od -tx1; }; x\" x (\x597d)";

    ZeroMemory( &wsi, sizeof(wsi) );
    wsi.cb = sizeof(wsi);
    ZeroMemory( &pi, sizeof(pi) );

    if (!CreateProcessW(NULL, wcmdline,
            NULL, NULL, // proc, thread attrs
            FALSE, // handle inheritance
            0, // creation flags
            NULL, // env
            NULL, // cur dir
            &wsi,
            &pi)) {
        printf( "CreateProcess failed (%d).\n", GetLastError() );
        return 1;
    }

    // Wait until child process exits.
    WaitForSingleObject( pi.hProcess, INFINITE );

    // Close process and thread handles. 
    CloseHandle( pi.hProcess );
    CloseHandle( pi.hThread );
    return 0;
}

[-- Attachment #3: Type: text/plain, Size: 204 bytes --]


Am I correct that this problem is related the w32.c comment?  It's not
clear to me what changes are needed in cmdproxy (and other places?) to
address it.

[1]: https://github.com/magit/magit/issues/3111

^ permalink raw reply	[flat|nested] 5+ messages in thread

* bug#28180: [w32] Unicode characters in subprocess (git) arguments changed to space
  2017-08-22  2:35 bug#28180: [w32] Unicode characters in subprocess (git) arguments changed to space npostavs
@ 2017-08-22 14:54 ` Eli Zaretskii
  2017-08-28 14:42   ` Noam Postavsky
  0 siblings, 1 reply; 5+ messages in thread
From: Eli Zaretskii @ 2017-08-22 14:54 UTC (permalink / raw)
  To: npostavs; +Cc: 28180

> From: npostavs@users.sourceforge.net
> Date: Mon, 21 Aug 2017 22:35:24 -0400
> 
> In w32.c there is a comment saying
> 
>    . Running subprocesses in non-ASCII directories and with non-ASCII
>      file arguments is limited to the current codepage [...]
>      This should be fixed, but will also require changes in cmdproxy.
>      The current limitation is not terribly bad anyway, since very
>      few, if any, Windows console programs that are likely to be
>      invoked by Emacs support UTF-16 encoded command lines.
> 
> I believe we're running into this limitation with git: staging a file
> named 好.txt fails from magit[1] (I tried also with vc, same problem).
> A quick way to see the problem is evaluating the call-process form
> below, the output shows that the Chinese character has been transformed
> into a space.

I'd expect that in a non-Chinese locale (which I believe was what you
did), but the OP of the Magit issue has Windows set up for a Chinese
locale, so there has to be some other explanation, because passing
Chinese characters on the command line ought to work in that case.

> Am I correct that this problem is related the w32.c comment?

The comment is accurate, but it can only explain why command-line
arguments with characters outside of the current Windows locale cannot
be safely passed to sub-processes.  Which AFAIU is not the case with
the OP of that Magit issue.

> It's not clear to me what changes are needed in cmdproxy (and other
> places?) to address it.

cmdproxy is not involved in call-process, but it is involved in
shell-command and its ilk.  As it makes no sense to support Unicode in
the former, but not in the latter, if we want to lift this limitation,
we must teach cmdproxy to use "wide" APIs both for receiving
command-line arguments from Emacs and for passing them to programs it
invokes.

As to the "other places", the only problem I'm aware of is that the
encoding of the command-line arguments, when they arrive at w32proc.c,
is not known in advance, so this must be somehow fixed/changed,
otherwise we will be unable to re-encode them in UTF-16.  I believe
the comment in w32.c does mention that.





^ permalink raw reply	[flat|nested] 5+ messages in thread

* bug#28180: [w32] Unicode characters in subprocess (git) arguments changed to space
  2017-08-22 14:54 ` Eli Zaretskii
@ 2017-08-28 14:42   ` Noam Postavsky
  2017-08-28 17:15     ` Eli Zaretskii
  2017-08-29 22:06     ` Noam Postavsky
  0 siblings, 2 replies; 5+ messages in thread
From: Noam Postavsky @ 2017-08-28 14:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 28180

[-- Attachment #1: Type: text/plain, Size: 962 bytes --]

On Tue, Aug 22, 2017 at 10:54 AM, Eli Zaretskii <eliz@gnu.org> wrote:

> As to the "other places", the only problem I'm aware of is that the
> encoding of the command-line arguments, when they arrive at w32proc.c,
> is not known in advance, so this must be somehow fixed/changed,
> otherwise we will be unable to re-encode them in UTF-16.  I believe
> the comment in w32.c does mention that.

Just to understand the issue better, I applied the attached diff to
use CreateProcessW. It seemed to work, but only when I start emacs
from mingw's msys shell. When running from cmd.exe it still translates
to space.

Furthermore, when I run an unpatched Emacs from the msys shell, the
output of the test I posted above is different:

(call-process "git" nil '(t t) nil
              "-c" "alias.x=!x() { printf '%s' \"$1\" | od -tx1; }; x"
"x" "(好)")
0000000 28 c3 a5 c2 a5 c2 bd 29
0000010

Do you have any idea what setting could cause this?

[-- Attachment #2: CreateProcessW.diff --]
[-- Type: text/plain, Size: 2433 bytes --]

diff --git c/src/process.c i/src/process.c
index e7ee99a..2c17b5a 100644
--- c/src/process.c
+++ i/src/process.c
@@ -1881,8 +1881,7 @@ usage: (make-process &rest ARGS)  */)
 	  if (STRING_MULTIBYTE (arg))
 	    {
 	      if (NILP (arg_encoding))
-		arg_encoding = (complement_process_encoding_system
-				(XPROCESS (proc)->encode_coding_system));
+		arg_encoding = Qutf_8;
 	      arg = code_convert_string_norecord (arg, arg_encoding, 1);
 	    }
 	  tem = Fcons (arg, tem);
diff --git c/src/w32proc.c i/src/w32proc.c
index 76af55f..86aaee2 100644
--- c/src/w32proc.c
+++ i/src/w32proc.c
@@ -1204,14 +1204,13 @@ static BOOL
 create_child (char *exe, char *cmdline, char *env, int is_gui_app,
 	      pid_t * pPid, child_process *cp)
 {
-  STARTUPINFO start;
+  STARTUPINFOW start;
   SECURITY_ATTRIBUTES sec_attrs;
 #if 0
   SECURITY_DESCRIPTOR sec_desc;
 #endif
   DWORD flags;
   char dir[ MAX_PATH ];
-  char *p;
   const char *ext;
 
   if (cp == NULL) emacs_abort ();
@@ -1242,14 +1241,8 @@ create_child (char *exe, char *cmdline, char *env, int is_gui_app,
   sec_attrs.lpSecurityDescriptor = NULL /* &sec_desc */;
   sec_attrs.bInheritHandle = FALSE;
 
-  filename_to_ansi (process_dir, dir);
-  /* Can't use unixtodos_filename here, since that needs its file name
-     argument encoded in UTF-8.  OTOH, process_dir, which _is_ in
-     UTF-8, points, to the directory computed by our caller, and we
-     don't want to modify that, either.  */
-  for (p = dir; *p; p = CharNextA (p))
-    if (*p == '/')
-      *p = '\\';
+  strcpy (dir, process_dir);
+  unixtodos_filename (dir);
 
   /* CreateProcess handles batch files as exe specially.  This special
      handling fails when both the batch file and arguments are quoted.
@@ -1265,8 +1258,13 @@ create_child (char *exe, char *cmdline, char *env, int is_gui_app,
 	   : CREATE_NEW_CONSOLE);
   if (NILP (Vw32_start_process_inherit_error_mode))
     flags |= CREATE_DEFAULT_ERROR_MODE;
-  if (!CreateProcessA (exe, cmdline, &sec_attrs, NULL, TRUE,
-		       flags, env, dir, &start, &cp->procinfo))
+
+  wchar_t exeW[MAX_PATH], cmdlineW[MAX_PATH], dirW[MAX_PATH];
+  filename_to_utf16 (exe, exeW);
+  filename_to_utf16 (cmdline, cmdlineW);
+  filename_to_utf16 (dir, dirW);
+  if (!CreateProcessW (exeW, cmdlineW, &sec_attrs, NULL, TRUE,
+                       flags, env, dirW, &start, &cp->procinfo))
     goto EH_Fail;
 
   cp->pid = (int) cp->procinfo.dwProcessId;

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* bug#28180: [w32] Unicode characters in subprocess (git) arguments changed to space
  2017-08-28 14:42   ` Noam Postavsky
@ 2017-08-28 17:15     ` Eli Zaretskii
  2017-08-29 22:06     ` Noam Postavsky
  1 sibling, 0 replies; 5+ messages in thread
From: Eli Zaretskii @ 2017-08-28 17:15 UTC (permalink / raw)
  To: Noam Postavsky; +Cc: 28180

> From: Noam Postavsky <npostavs@users.sourceforge.net>
> Date: Mon, 28 Aug 2017 10:42:14 -0400
> Cc: 28180@debbugs.gnu.org
> 
> Just to understand the issue better, I applied the attached diff to
> use CreateProcessW.

I hope you realize that this is just a quick hack, which cannot work
in general, yes?  For starters, the command line is not a file name,
in general, so using filename_to_utf16 is inappropriate.  Also, I
think the environment variables need to be converted to UTF-16.

> It seemed to work, but only when I start emacs from mingw's msys
> shell. When running from cmd.exe it still translates to space.

What exactly did you run from cmd.exe?  What command?

> Furthermore, when I run an unpatched Emacs from the msys shell, the
> output of the test I posted above is different:
> 
> (call-process "git" nil '(t t) nil
>               "-c" "alias.x=!x() { printf '%s' \"$1\" | od -tx1; }; x"
> "x" "(好)")
> 0000000 28 c3 a5 c2 a5 c2 bd 29
> 0000010
> 
> Do you have any idea what setting could cause this?

Windows tries to interpret UTF-8 as something else?





^ permalink raw reply	[flat|nested] 5+ messages in thread

* bug#28180: [w32] Unicode characters in subprocess (git) arguments changed to space
  2017-08-28 14:42   ` Noam Postavsky
  2017-08-28 17:15     ` Eli Zaretskii
@ 2017-08-29 22:06     ` Noam Postavsky
  1 sibling, 0 replies; 5+ messages in thread
From: Noam Postavsky @ 2017-08-29 22:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 28180

On Mon, Aug 28, 2017 at 10:42 AM, Noam Postavsky
<npostavs@users.sourceforge.net> wrote:

> Just to understand the issue better, I applied the attached diff to
> use CreateProcessW. It seemed to work, but only when I start emacs
> from mingw's msys shell. When running from cmd.exe it still translates
> to space.

Ugh, it's just a stupid mistake on my part, I changed the argument
encoding to utf8 in make-process instead of call-process. It
*appeared* to work because msys sets LANG to en_US.UTF-8.





^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-08-29 22:06 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-22  2:35 bug#28180: [w32] Unicode characters in subprocess (git) arguments changed to space npostavs
2017-08-22 14:54 ` Eli Zaretskii
2017-08-28 14:42   ` Noam Postavsky
2017-08-28 17:15     ` Eli Zaretskii
2017-08-29 22:06     ` Noam Postavsky

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).