From: Eli Zaretskii <eliz@gnu.org>
To: Ke Wu <ellpih@zohomail.jp>
Cc: 71472@debbugs.gnu.org
Subject: bug#71472: [PATCH] Add pty support by using ConPTY on Windows
Date: Tue, 11 Jun 2024 10:27:04 +0300
Message-ID: <86ed946it3.fsf@gnu.org>
In-Reply-To: <190055cd3c0.5289e49215028.2058921479589116968@zohomail.jp> (message from Ke Wu on Tue, 11 Jun 2024 12:34:48 +0900)

[Please use Reply All to reply, to keep the bug tracker CC'ed.]

> Date: Tue, 11 Jun 2024 12:34:48 +0900
> From: Ke Wu <ellpih@zohomail.jp>
> 
>  > If we must use UTF-8 as the only encoding to talk to sub-processes via 
>  > ConPTY, that makes the number of applications that can be used this 
>  > way very small, since most programs we are used to run as 
>  > subprocesses, in particularly ports of GNU software like GCC, GDB, 
>  > Grep, Find, and many others, cannot reliably talk to Emacs in UTF-8 
>  > encoding on MS-Windows.
> 
> The statement is not so accurate. On Emacs side, UTF-8 is assumed due
> to the limitation of ConPTY (it would communicate with the console only in
> UTF-8). However, on the subprocesses side, ConPTY would respect its
> codepage and translate it into UTF-8 when sending to the console. So
> we can make these subprocesses run in the codepage other than
> 65001(UTF-8).

This is inaccurate: ConPTY always assumes the process running on the
other side of the connection uses the system codepage.  If the
subprocess expects some other encoding, ConPTY will not know that, and
Emacs has no way of telling ConPTY to use a different encoding.  This
is the essence of the issue I filed with them, and they basically told
me that what ConPTY does is "by design".

This is not an academic issue: some very important programs we invoke
from Emacs need us to talk to them in an encoding different from the
system codepage.  A notable example is Git, which wants UTF-8 (it can
support other encodings, but that is not recommended, and Emacs
doesn't really support that well on Windows).

> I am not very familiar with these GNU software ports :(
> Please let me know if there will be problems with ConPTY translating from
> UTF-8 to other codepages.

See above.  There's no way for Emacs to set that up, except when the
"other codepage" is the system codepage.

>  >  https://github.com/microsoft/terminal/issues/9174 
> 
> I think a possible solution to this issue is to use a wrapper program to
> set the codepage for the applications that do not call `SetConsoleOutputCP`.
> As a proof of concept, the following code snippet uses cmdproxy.exe to
> change the codepage to 1255. Please replace the cmdproxy.exe path in the
> snippet.
> 
> (progn
>   (set-buffer
>    (apply #'make-term
>           "terminal"
>           "C:/Users/oracl/Documents/Programs/emacs-master/nt/cmdproxy.exe"
>           nil
>           '("-c" "chcp 1255 && call cmd")))
>   (term-char-mode)
>   (pop-to-buffer-same-window "*terminal*"))
> 
> The codepage can be verified by either using `chcp` in the newly created cmd process.
> Also, the following hack can be applied to make the created conhost.exe visible.
> Therefore, the codepage can be directly verified by viewing the properties of the
> conhost.exe window. 
> 
> --- a/src/w32.c
> +++ b/src/w32.c
> @@ -11208,7 +11208,7 @@ make_console_with_pipe (ptrdiff_t nargs, Lisp_Object * args, const int * fds)
> 
>    command_new = CALLN (Flist,
>                         build_string ("conhost.exe"),
> -                       build_string ("--headless"),
> +                       /* build_string ("--headless"), */
>                         build_string ("--feature"),
>                         build_string ("pty"));
>    if (!NILP (width)) 
> 
> Therefore, we can have subprocesses run in codepage other than 65001 or the OEM default
> codepage.  And as a console program, Emacs talks in UTF-8.  It may be feasible if we add a
> `:coding` to function `term`, which builds up a wrapper to change the code page before the
> real program starts.

cmdproxy is only used when invoking programs via the shell.  But Emacs
also invokes programs directly (call-process etc.), in which case
using cmdproxy (or any other kind of wrapper) will be very problematic
at best, if not impossible.  See below about the complications this
causes with respect to quoting of command-line arguments, for example.

Please keep in mind how Emacs arranges to use the correct encoding
when invoking other programs: we have data structures
(process-coding-system-alist etc.) which define the correct encoding
by program name, and we also have variables (coding-system-for-read
etc.) that can be bound to override those defaults temporarily.  The
encoding is applied separately to the program's command-line arguments
and to the data we write to and read from the process.  How can all
this work reliably with ConPTY, even if the wrapper trick could
sometimes work?  Specifically:

  . how do we control encoding of command-line arguments? most
    programs running on Windows cannot handle UTF-8 encoded command
    lines
  . what if the encoding we need doesn't have a corresponding Windows
    codepage (which means chcp will not work)?
  . how can we handle the eol-conversion part of the encoding (some
    programs _must_ be fed with Unix EOLs)?
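
To make the lookup-plus-override scheme concrete, here is a rough
sketch in C.  It is a model for illustration, not Emacs source: the
struct, the table contents, the substring matching and the function
name decoding_for_program are all assumptions of mine (the real alist
is matched by regexp and lives in Lisp).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not Emacs source: one entry maps a program-name
   substring to the encodings used when talking to that program.  */
struct coding_entry
{
  const char *program;   /* substring matched against the program name */
  const char *decoding;  /* encoding of what we read from the process */
  const char *encoding;  /* encoding of what we write to the process */
};

/* Illustrative defaults only.  */
static const struct coding_entry coding_table[] = {
  { "git",  "utf-8-unix", "utf-8-unix" },
  { "grep", "cp1255-dos", "cp1255-dos" },
};

/* Return the encoding for reading PROGRAM's output.  A non-NULL
   OVERRIDE wins, like a temporary binding of coding-system-for-read;
   otherwise the first matching table entry is used, and FALLBACK
   covers programs that are not listed.  */
const char *
decoding_for_program (const char *program, const char *override,
                      const char *fallback)
{
  assert (program != NULL && fallback != NULL);
  if (override != NULL)
    return override;
  for (size_t i = 0; i < sizeof coding_table / sizeof coding_table[0]; i++)
    if (strstr (program, coding_table[i].program) != NULL)
      return coding_table[i].decoding;
  return fallback;
}
```

The point of the sketch is that the choice is made per program and per
call, and that the eol convention is part of it.  A single codepage
setting on a pseudo-console has no place to express any of this.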

Also please note that using a wrapper adds another layer of
interpreting command-line arguments, which might break some
complicated cases that use fancy quoting of special characters.  Any
wrapper we provide will be compiled with MinGW, so it will use the
MinGW startup code to process quoting.  But the program the wrapper
runs might not be a MinGW program, so it could use different ways of
processing quotes.  The simplest example of such a combination is
cmd.exe itself: its quoting rules are very different from what MinGW
uses.  This will definitely break some cases.  For example, Git uses
the '^' character for special purposes, and some Windows styles of
quoting interpret '^' as a quote character -- this could easily break
Emacs commands that invoke Git.
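
As an illustration of the '^' problem, the sketch below models what an
extra cmd.exe-style layer does to a command line.  It is a deliberate
simplification and an assumption on my part (the real cmd.exe parser
has more rules), and the function name cmd_style_unescape is made up
for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of how cmd.exe treats '^' (an assumption made for
   illustration; the real parser has more rules): outside double
   quotes, '^' is dropped and the next character is taken literally;
   inside double quotes, '^' is an ordinary character.  Writes the
   result to OUT, whose size OUT_SIZE must exceed strlen (IN).  */
void
cmd_style_unescape (const char *in, char *out, size_t out_size)
{
  assert (in != NULL && out != NULL);
  assert (strlen (in) < out_size);
  bool in_quotes = false;
  size_t n = 0;
  for (size_t i = 0; in[i] != '\0'; i++)
    {
      if (in[i] == '"')
        in_quotes = !in_quotes;
      else if (in[i] == '^' && !in_quotes)
        {
          if (in[i + 1] == '\0')
            break;              /* trailing caret: nothing to escape */
          i++;                  /* skip the caret, keep what follows */
        }
      out[n++] = in[i];
    }
  out[n] = '\0';
}
```

Under this model "git log HEAD^" comes out as "git log HEAD", so Git
is asked about a different commit, while a program that parses its
command line with different rules would have received the caret
intact.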

If someone can figure out how to do all this stuff with ConPTY, then
okay, we could use it.  But it is not a trivial problem, not at all.
The way ConPTY was designed is the way Windows works everywhere else:
it doesn't allow applications to exchange raw bytestreams without
interpreting them; instead, Windows _interprets_ the bytestreams as
characters encoded in the encoding it assumes for the source, and then
converts those characters to the encoding of the destination.  This
basic design principle is built into every part of the Windows APIs.  For
example, a program whose 'main' function is declared as accepting
wchar_t (i.e. UTF-16) command-line arguments will magically have the
command-line arguments converted to UTF-16, even if the calling
process uses plain ASCII.  ConPTY uses the same design principles, so
it is inherently unable to pass through raw bytes without interpreting
them.  And without that, we cannot easily implement the way Emacs
expects this stuff to work, because Emacs assumes the encoding to be a
private contract between Emacs and the program it calls, with nothing
in-between interfering.
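
A small sketch of why an interpreting layer is not transparent to
bytes.  This is not what ConPTY actually runs: ISO-8859-1 is used
here only as a stand-in for "the codepage the intermediary assumes",
and latin1_to_utf8 is a name made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Re-encode IN (LEN bytes) to UTF-8 on the assumption that every byte
   is one ISO-8859-1 character.  Latin-1 is only a stand-in for
   whatever codepage an intermediary assumes for its source.  OUT must
   have room for 2 * LEN bytes.  Returns the number of bytes written.  */
size_t
latin1_to_utf8 (const unsigned char *in, size_t len, unsigned char *out)
{
  assert (in != NULL && out != NULL);
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    {
      if (in[i] < 0x80)
        out[n++] = in[i];       /* ASCII passes through unchanged */
      else
        {
          /* Two-byte UTF-8 sequence: lead byte 0xC2 or 0xC3, then
             one continuation byte.  */
          out[n++] = (unsigned char) (0xC0 | (in[i] >> 6));
          out[n++] = (unsigned char) (0x80 | (in[i] & 0x3F));
        }
    }
  return n;
}
```

If the subprocess already wrote UTF-8, say the two bytes C3 A9, the
layer in the middle turns them into the four bytes C3 83 C2 A9.  Only
ASCII survives unchanged, which is why the problem is easy to miss in
simple tests.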

I hope I explained some of the issues with ConPTY, and why we cannot
install its support without some reasonably reliable solutions for
those problematic aspects.

Thanks.





