From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#71472: [PATCH] Add pty support by using ConPTY on Windows
Date: Tue, 11 Jun 2024 10:27:04 +0300
Message-ID: <86ed946it3.fsf@gnu.org>
References: <874ja1m6u1.fsf@zohomail.jp> <86jziw956n.fsf@gnu.org> <190055cd3c0.5289e49215028.2058921479589116968@zohomail.jp>
In-Reply-To: <190055cd3c0.5289e49215028.2058921479589116968@zohomail.jp> (message from Ke Wu on Tue, 11 Jun 2024 12:34:48 +0900)
To: Ke Wu
Cc: 71472@debbugs.gnu.org

[Please use Reply All to reply, to keep the bug tracker CC'ed.]

> Date: Tue, 11 Jun 2024 12:34:48 +0900
> From: Ke Wu
>
> > If we must use UTF-8 as the only encoding to talk to sub-processes via
> > ConPTY, that makes the number of applications that can be used this
> > way very small, since most programs we are used to run as
> > subprocesses, in particular ports of GNU software like GCC, GDB,
> > Grep, Find, and many others, cannot reliably talk to Emacs in UTF-8
> > encoding on MS-Windows.
>
> The statement is not so accurate. On Emacs side, UTF-8 is assumed due
> to the limitation of ConPTY (it would communicate with the console only in
> UTF-8). However, on the subprocesses side, ConPTY would respect its
> codepage and translate it into UTF-8 when sending to the console. So
> we can make these subprocesses run in a codepage other than
> 65001 (UTF-8).

This is inaccurate: ConPTY always assumes the process running on the
other side of the connection uses the system codepage. If the
subprocess expects some other encoding, ConPTY will not know that, and
Emacs has no way of telling ConPTY to use a different encoding. This
is the essence of the issue I filed with them, and they basically told
me that what ConPTY does is "by design".

This is not an academic issue: some very important programs we invoke
from Emacs need us to talk to them in an encoding different from the
system codepage. A notable example is Git, which wants UTF-8 (it can
support other encodings, but that is not recommended, and Emacs
doesn't really support that well on Windows).

> I am not very familiar with these GNU software ports :(
> Please let me know if there will be problems with ConPTY translating from
> UTF-8 to other codepages.

See above.
There's no way for Emacs to set that up, except when the "other
codepage" is the system codepage.

> > https://github.com/microsoft/terminal/issues/9174
>
> I think a possible solution to this issue is to use a wrapper program to
> set the codepage for the applications that do not call `SetConsoleOutputCP`.
> As a proof of concept, the following code snippet uses cmdproxy.exe to
> change the codepage to 1255. Please replace the cmdproxy.exe path in the
> snippet.
>
> (progn
>   (set-buffer
>    (apply #'make-term
>           "terminal"
>           "C:/Users/oracl/Documents/Programs/emacs-master/nt/cmdproxy.exe"
>           nil
>           '("-c" "chcp 1255 && call cmd")))
>   (term-char-mode)
>   (pop-to-buffer-same-window "*terminal*"))
>
> The codepage can be verified by using `chcp` in the newly created cmd process.
> Also, the following hack can be applied to make the created conhost.exe visible.
> Therefore, the codepage can be directly verified by viewing the properties of the
> conhost.exe window.
>
> --- a/src/w32.c
> +++ b/src/w32.c
> @@ -11208,7 +11208,7 @@ make_console_with_pipe (ptrdiff_t nargs, Lisp_Object * args, const int * fds)
>
>    command_new = CALLN (Flist,
>                         build_string ("conhost.exe"),
> -                       build_string ("--headless"),
> +                       /* build_string ("--headless"), */
>                         build_string ("--feature"),
>                         build_string ("pty"));
>    if (!NILP (width))
>
> Therefore, we can have subprocesses run in codepage other than 65001 or the OEM default
> codepage. And as a console program, Emacs talks in UTF-8. It may be feasible if we add a
> `:coding` to function `term`, which builds up a wrapper to change the code page before the
> real program starts.

cmdproxy is only used when invoking programs via the shell. But Emacs
also invokes programs directly (call-process etc.), in which case
using cmdproxy (or any other kind of wrapper) will be very problematic
at best, if not impossible. See below about the complications this
causes wrt quoting of command-line arguments, for example.

Please keep in mind how Emacs arranges to use the correct encoding
when invoking other programs: we have data structures
(process-coding-system-alist etc.) which define the correct encoding
by program name, and we also have variables (coding-system-for-read
etc.) that can be bound to override those defaults temporarily. The
encoding is applied separately to the program's command-line arguments
and to the stuff we write and read to and from the process. How can
all this work reliably with ConPTY, even if the wrapper trick could
sometimes work? Specifically:

  . how do we control encoding of command-line arguments? most
    programs running on Windows cannot handle UTF-8 encoded command
    lines
  . what if the encoding we need doesn't have a corresponding Windows
    codepage (which means chcp will not work)?
  . how can we handle the eol-conversion part of the encoding (some
    programs _must_ be fed with Unix EOLs)?

Also please note that using a wrapper adds another layer of
interpreting command-line arguments, which might break some
complicated cases that use fancy quoting of special characters. Any
wrapper we provide will be compiled with MinGW, so it will use the
MinGW startup code to process quoting. But the program the wrapper
runs might not be a MinGW program, so it could use different ways of
processing quotes. The simplest example of such a combination is
cmd.exe itself: its quoting rules are very different from what MinGW
uses. This will definitely break some cases. For example, Git uses
the '^' character for special purposes, and some Windows styles of
quoting interpret '^' as a quote character -- this could easily break
Emacs commands that invoke Git.

If someone can figure out how to do all this stuff with ConPTY, then
okay, we could use it. But it is not a trivial problem, not at all.
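
To make the '^' concern above concrete, here is a minimal sketch in C
of what one extra cmd.exe-style quoting layer can do to a command
line. It is only an illustration under stated assumptions: the
function name cmd_caret_pass is hypothetical, and it models just one
rule (outside double quotes, '^' is dropped and the next character is
taken literally), not the full cmd.exe parser and not anything that
Emacs or cmdproxy actually contains.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of how cmd.exe treats '^':
   outside double quotes, '^' is dropped and the character after it
   is copied literally; inside double quotes, '^' is an ordinary
   character.  Real cmd.exe parsing has many more rules (%, &, |,
   delayed expansion) that are deliberately ignored here.
   OUT must have room for strlen (IN) + 1 bytes.  */
void
cmd_caret_pass (const char *in, char *out)
{
  int in_quotes = 0;
  size_t o = 0;

  for (size_t i = 0; in[i] != '\0'; i++)
    {
      char c = in[i];

      if (c == '"')
        {
          /* An unescaped double quote toggles the quoted state.  */
          in_quotes = !in_quotes;
          out[o++] = c;
        }
      else if (c == '^' && !in_quotes)
        {
          /* Drop the caret; copy the escaped character, if any.  */
          if (in[i + 1] != '\0')
            out[o++] = in[++i];
        }
      else
        out[o++] = c;
    }
  out[o] = '\0';
}
```

Under this model, the command line `git log HEAD^` comes out as
`git log HEAD`, so Git would see a different revision than the caller
intended, whereas `git log "HEAD^"` passes through unchanged. A caller
that does not know whether such a layer sits in between cannot choose
the right quoting.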

The way ConPTY was designed is the way Windows works everywhere else:
it doesn't allow applications to communicate with raw bytestreams
without interpreting; instead, Windows _interprets_ the bytestreams as
characters encoded in the encoding it assumes for the source, and then
converts those characters to the encoding of the destination. This
basic design principle is built into every part of Windows APIs. For
example, a program whose 'main' function is declared as accepting
wchar_t (i.e. UTF-16) command-line arguments will magically have the
command-line arguments converted to UTF-16, even if the calling
process uses plain ASCII.

ConPTY uses the same design principles, so it is inherently unable to
pass through raw bytes without interpreting them. And without that,
we cannot easily implement the way Emacs expects this stuff to work,
because Emacs assumes the encoding to be a private contract between
Emacs and the program it calls, with nothing in-between interfering.

I hope I explained some of the issues with ConPTY, and why we cannot
install its support without some reasonably reliable solutions for
those problematic aspects.

Thanks.
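
As an illustration of the "interpret, then convert" behavior described
above, the following C sketch shows what happens to a byte stream when
an intermediary decodes it with the wrong source encoding. The
assumptions are loud ones: ISO-8859-1 stands in for "the system
codepage" only because it can be decoded without tables, the helper
name latin1_to_utf8 is hypothetical, and none of this is ConPTY's
actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: treat every input byte as one ISO-8859-1
   character and re-encode it as UTF-8.  This mimics an intermediary
   that assumes a single-byte source encoding.  Returns the number of
   bytes written to OUT, not counting the terminating NUL.  OUTSZ is
   the size of OUT in bytes.  */
size_t
latin1_to_utf8 (const unsigned char *in, size_t n, char *out, size_t outsz)
{
  size_t o = 0;

  for (size_t i = 0; i < n; i++)
    {
      unsigned char c = in[i];

      if (c < 0x80)
        {
          /* ASCII survives unchanged.  */
          assert (o + 1 < outsz);
          out[o++] = (char) c;
        }
      else
        {
          /* U+0080..U+00FF become two UTF-8 bytes each.  */
          assert (o + 2 < outsz);
          out[o++] = (char) (0xC0 | (c >> 6));
          out[o++] = (char) (0x80 | (c & 0x3F));
        }
    }
  out[o] = '\0';
  return o;
}
```

If a subprocess writes U+00E9 as UTF-8 (the two bytes 0xC3 0xA9) and
the intermediary wrongly assumes the single-byte encoding, the reader
receives the four bytes 0xC3 0x83 0xC2 0xA9, which is two unrelated
characters. Pure ASCII text is unaffected, which is why such problems
tend to surface only with non-ASCII data.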