* bug#28180: [w32] Unicode characters in subprocess (git) arguments changed to space @ 2017-08-22 2:35 npostavs 2017-08-22 14:54 ` Eli Zaretskii 0 siblings, 1 reply; 5+ messages in thread From: npostavs @ 2017-08-22 2:35 UTC (permalink / raw) To: 28180 [-- Attachment #1: Type: text/plain, Size: 1396 bytes --] In w32.c there is a comment saying . Running subprocesses in non-ASCII directories and with non-ASCII file arguments is limited to the current codepage [...] This should be fixed, but will also require changes in cmdproxy. The current limitation is not terribly bad anyway, since very few, if any, Windows console programs that are likely to be invoked by Emacs support UTF-16 encoded command lines. I believe we're running into this limitation with git: staging a file named 好.txt fails from magit[1] (I tried also with vc, same problem). A quick way to see the problem is evaluating the call-process form below, the output shows that the Chinese character has been transformed into a space. This happens even if I execute 'chcp 65001' before starting Emacs (a possible workaround I saw suggested in a few places). The short file name trick doesn't help either. (call-process "git" nil '(t t) nil "-c" "alias.x=!x() { printf '%s' \"$1\" | od -tx1; }; x" "x" "(好)") 0000000 28 20 29 0000003 As far as I can tell, git does support UTF-16 encoded command lines, as demonstrated by the attached git-args.c, which produces the utf8 encoding of the character (this is also what the call-process form produces when I run it on GNU/Linux): C:\Users\npostavs\src\win32-args>.\git-args.exe 0000000 28 e5 a5 bd 29 0000005 [-- Attachment #2: test program, C source --] [-- Type: text/plain, Size: 883 bytes --] #include <windows.h> #include <stdio.h> int main(int argc, char **argv) { STARTUPINFOW wsi; PROCESS_INFORMATION pi; WCHAR wcmdline[] = L"git -c \"alias.x=!x() { printf '%s' \\\"$1\\\" | od -tx1; }; x\" x (\x597d)"; ZeroMemory( &wsi, sizeof(wsi) ); wsi.cb = sizeof(wsi); ZeroMemory( &pi, sizeof(pi) ); if (!CreateProcessW(NULL, wcmdline, NULL, NULL, // proc, thread attrs FALSE, // handle inheritance 0, // creation flags NULL, // env NULL, // cur dir &wsi, &pi)) { printf( "CreateProcess failed (%d).\n", GetLastError() ); return 1; } // Wait until child process exits. WaitForSingleObject( pi.hProcess, INFINITE ); // Close process and thread handles. CloseHandle( pi.hProcess ); CloseHandle( pi.hThread ); return 0; } [-- Attachment #3: Type: text/plain, Size: 204 bytes --] Am I correct that this problem is related the w32.c comment? It's not clear to me what changes are needed in cmdproxy (and other places?) to address it. [1]: https://github.com/magit/magit/issues/3111 ^ permalink raw reply [flat|nested] 5+ messages in thread
* bug#28180: [w32] Unicode characters in subprocess (git) arguments changed to space 2017-08-22 2:35 bug#28180: [w32] Unicode characters in subprocess (git) arguments changed to space npostavs @ 2017-08-22 14:54 ` Eli Zaretskii 2017-08-28 14:42 ` Noam Postavsky 0 siblings, 1 reply; 5+ messages in thread From: Eli Zaretskii @ 2017-08-22 14:54 UTC (permalink / raw) To: npostavs; +Cc: 28180 > From: npostavs@users.sourceforge.net > Date: Mon, 21 Aug 2017 22:35:24 -0400 > > In w32.c there is a comment saying > > . Running subprocesses in non-ASCII directories and with non-ASCII > file arguments is limited to the current codepage [...] > This should be fixed, but will also require changes in cmdproxy. > The current limitation is not terribly bad anyway, since very > few, if any, Windows console programs that are likely to be > invoked by Emacs support UTF-16 encoded command lines. > > I believe we're running into this limitation with git: staging a file > named 好.txt fails from magit[1] (I tried also with vc, same problem). > A quick way to see the problem is evaluating the call-process form > below, the output shows that the Chinese character has been transformed > into a space. I'd expect that in a non-Chinese locale (which I believe was what you did), but the OP of the Magit issue has Windows set up for a Chinese locale, so there has to be some other explanation, because passing Chinese characters on the command line ought to work in that case. > Am I correct that this problem is related the w32.c comment? The comment is accurate, but it can only explain why command-line arguments with characters outside of the current Windows locale cannot be safely passed to sub-processes. Which AFAIU is not the case with the OP of that Magit issue. > It's not clear to me what changes are needed in cmdproxy (and other > places?) to address it. cmdproxy is not involved in call-process, but it is involved in shell-command and its ilk. As it makes no sense to support Unicode in the former, but not in the latter, if we want to lift this limitation, we must teach cmdproxy to use "wide" APIs both for receiving command-line arguments from Emacs and for passing them to programs it invokes. As to the "other places", the only problem I'm aware of is that the encoding of the command-line arguments, when they arrive at w32proc.c, is not known in advance, so this must be somehow fixed/changed, otherwise we will be unable to re-encode them in UTF-16. I believe the comment in w32.c does mention that. ^ permalink raw reply [flat|nested] 5+ messages in thread
* bug#28180: [w32] Unicode characters in subprocess (git) arguments changed to space 2017-08-22 14:54 ` Eli Zaretskii @ 2017-08-28 14:42 ` Noam Postavsky 2017-08-28 17:15 ` Eli Zaretskii 2017-08-29 22:06 ` Noam Postavsky 0 siblings, 2 replies; 5+ messages in thread From: Noam Postavsky @ 2017-08-28 14:42 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 28180 [-- Attachment #1: Type: text/plain, Size: 962 bytes --] On Tue, Aug 22, 2017 at 10:54 AM, Eli Zaretskii <eliz@gnu.org> wrote: > As to the "other places", the only problem I'm aware of is that the > encoding of the command-line arguments, when they arrive at w32proc.c, > is not known in advance, so this must be somehow fixed/changed, > otherwise we will be unable to re-encode them in UTF-16. I believe > the comment in w32.c does mention that. Just to understand the issue better, I applied the attached diff to use CreateProcessW. It seemed to work, but only when I start emacs from mingw's msys shell. When running from cmd.exe it still translates to space. Furthermore, when I run an unpatched Emacs from the msys shell, the output of the test I posted above is different: (call-process "git" nil '(t t) nil "-c" "alias.x=!x() { printf '%s' \"$1\" | od -tx1; }; x" "x" "(好)") 0000000 28 c3 a5 c2 a5 c2 bd 29 0000010 Do you have any idea what setting could cause this? [-- Attachment #2: CreateProcessW.diff --] [-- Type: text/plain, Size: 2433 bytes --] diff --git c/src/process.c i/src/process.c index e7ee99a..2c17b5a 100644 --- c/src/process.c +++ i/src/process.c @@ -1881,8 +1881,7 @@ usage: (make-process &rest ARGS) */) if (STRING_MULTIBYTE (arg)) { if (NILP (arg_encoding)) - arg_encoding = (complement_process_encoding_system - (XPROCESS (proc)->encode_coding_system)); + arg_encoding = Qutf_8; arg = code_convert_string_norecord (arg, arg_encoding, 1); } tem = Fcons (arg, tem); diff --git c/src/w32proc.c i/src/w32proc.c index 76af55f..86aaee2 100644 --- c/src/w32proc.c +++ i/src/w32proc.c @@ -1204,14 +1204,13 @@ static BOOL create_child (char *exe, char *cmdline, char *env, int is_gui_app, pid_t * pPid, child_process *cp) { - STARTUPINFO start; + STARTUPINFOW start; SECURITY_ATTRIBUTES sec_attrs; #if 0 SECURITY_DESCRIPTOR sec_desc; #endif DWORD flags; char dir[ MAX_PATH ]; - char *p; const char *ext; if (cp == NULL) emacs_abort (); @@ -1242,14 +1241,8 @@ create_child (char *exe, char *cmdline, char *env, int is_gui_app, sec_attrs.lpSecurityDescriptor = NULL /* &sec_desc */; sec_attrs.bInheritHandle = FALSE; - filename_to_ansi (process_dir, dir); - /* Can't use unixtodos_filename here, since that needs its file name - argument encoded in UTF-8. OTOH, process_dir, which _is_ in - UTF-8, points, to the directory computed by our caller, and we - don't want to modify that, either. */ - for (p = dir; *p; p = CharNextA (p)) - if (*p == '/') - *p = '\\'; + strcpy (dir, process_dir); + unixtodos_filename (dir); /* CreateProcess handles batch files as exe specially. This special handling fails when both the batch file and arguments are quoted. @@ -1265,8 +1258,13 @@ create_child (char *exe, char *cmdline, char *env, int is_gui_app, : CREATE_NEW_CONSOLE); if (NILP (Vw32_start_process_inherit_error_mode)) flags |= CREATE_DEFAULT_ERROR_MODE; - if (!CreateProcessA (exe, cmdline, &sec_attrs, NULL, TRUE, - flags, env, dir, &start, &cp->procinfo)) + + wchar_t exeW[MAX_PATH], cmdlineW[MAX_PATH], dirW[MAX_PATH]; + filename_to_utf16 (exe, exeW); + filename_to_utf16 (cmdline, cmdlineW); + filename_to_utf16 (dir, dirW); + if (!CreateProcessW (exeW, cmdlineW, &sec_attrs, NULL, TRUE, + flags, env, dirW, &start, &cp->procinfo)) goto EH_Fail; cp->pid = (int) cp->procinfo.dwProcessId; ^ permalink raw reply related [flat|nested] 5+ messages in thread
* bug#28180: [w32] Unicode characters in subprocess (git) arguments changed to space 2017-08-28 14:42 ` Noam Postavsky @ 2017-08-28 17:15 ` Eli Zaretskii 2017-08-29 22:06 ` Noam Postavsky 1 sibling, 0 replies; 5+ messages in thread From: Eli Zaretskii @ 2017-08-28 17:15 UTC (permalink / raw) To: Noam Postavsky; +Cc: 28180 > From: Noam Postavsky <npostavs@users.sourceforge.net> > Date: Mon, 28 Aug 2017 10:42:14 -0400 > Cc: 28180@debbugs.gnu.org > > Just to understand the issue better, I applied the attached diff to > use CreateProcessW. I hope you realize that this is just a quick hack, which cannot work in general, yes? For starters, the command line is not a file name, in general, so using filename_to_utf16 is inappropriate. Also, I think the environment variables need to be converted to UTF-16. > It seemed to work, but only when I start emacs from mingw's msys > shell. When running from cmd.exe it still translates to space. What exactly did you run from cmd.exe? What command? > Furthermore, when I run an unpatched Emacs from the msys shell, the > output of the test I posted above is different: > > (call-process "git" nil '(t t) nil > "-c" "alias.x=!x() { printf '%s' \"$1\" | od -tx1; }; x" > "x" "(好)") > 0000000 28 c3 a5 c2 a5 c2 bd 29 > 0000010 > > Do you have any idea what setting could cause this? Windows tries to interpret UTF-8 as something else? ^ permalink raw reply [flat|nested] 5+ messages in thread
* bug#28180: [w32] Unicode characters in subprocess (git) arguments changed to space 2017-08-28 14:42 ` Noam Postavsky 2017-08-28 17:15 ` Eli Zaretskii @ 2017-08-29 22:06 ` Noam Postavsky 1 sibling, 0 replies; 5+ messages in thread From: Noam Postavsky @ 2017-08-29 22:06 UTC (permalink / raw) To: Eli Zaretskii; +Cc: 28180 On Mon, Aug 28, 2017 at 10:42 AM, Noam Postavsky <npostavs@users.sourceforge.net> wrote: > Just to understand the issue better, I applied the attached diff to > use CreateProcessW. It seemed to work, but only when I start emacs > from mingw's msys shell. When running from cmd.exe it still translates > to space. Ugh, it's just a stupid mistake on my part, I changed the argument encoding to utf8 in make-process instead of call-process. It *appeared* to work because msys sets LANG to en_US.UTF-8. ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2017-08-29 22:06 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-08-22 2:35 bug#28180: [w32] Unicode characters in subprocess (git) arguments changed to space npostavs 2017-08-22 14:54 ` Eli Zaretskii 2017-08-28 14:42 ` Noam Postavsky 2017-08-28 17:15 ` Eli Zaretskii 2017-08-29 22:06 ` Noam Postavsky
Code repositories for project(s) associated with this external index https://git.savannah.gnu.org/cgit/emacs.git https://git.savannah.gnu.org/cgit/emacs/org-mode.git This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.