unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#52816: 28.0.90; misspelling of windows-nt system-type in reset-language-environment, enquiring about default-process-coding-systems on MS-Windows
@ 2021-12-27 10:52 Ioannis Kappas
From: Ioannis Kappas @ 2021-12-27 10:52 UTC
  To: 52816; +Cc: rgm

Hi,

there appears to be a misspelled symbol in the recent commit
a4bfb0bc5c14e002c0926fc320aeb4a3fc261447 ("Default Emacs to UTF-8
instead of Latin-1"): `window-nt' is used instead of `windows-nt'
when checking whether `system-type' is a member of a list.

It does not appear to be of much consequence, though, since both
`default-file-name-coding-system' and `default-process-coding-system',
which it affects, appear to be overwritten later at runtime anyway. A
possible fix:

diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index a0a6557c95..2b52d4bf86 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -1873,7 +1873,7 @@ reset-language-environment
   (set-default-coding-systems nil)
   (setq default-sendmail-coding-system 'utf-8)
   (setq default-file-name-coding-system (if (memq system-type
-                                                  '(window-nt ms-dos))
+                                                  '(windows-nt ms-dos))
                                             'iso-latin-1-unix
                                           'utf-8-unix))
   ;; Preserve eol-type from existing default-process-coding-systems.
@@ -1892,9 +1892,9 @@ reset-language-environment
  (condition-case nil
       (coding-system-change-text-conversion
        (cdr default-process-coding-system)
-       (if (memq system-type '(window-nt ms-dos)) 'iso-latin-1 'utf-8))
+       (if (memq system-type '(windows-nt ms-dos)) 'iso-latin-1 'utf-8))
     (coding-system-error
-     (if (memq system-type '(window-nt ms-dos)) 'iso-latin-1 'utf-8)))))
+     (if (memq system-type '(windows-nt ms-dos)) 'iso-latin-1 'utf-8)))))
     (setq default-process-coding-system
    (cons output-coding input-coding)))
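Why the typo matters: on native MS-Windows builds `system-type' is the
symbol `windows-nt', so `memq' against the misspelled list never
matches and the non-Windows branch is always taken. A minimal sketch
that can be evaluated in any Emacs (the value is hard-coded instead of
read from `system-type', so it behaves the same on every platform):

```elisp
;; Simulate the value `system-type' has on native MS-Windows builds.
(let ((sys 'windows-nt))
  (list
   ;; Misspelled list from the original code: no match, returns nil.
   (memq sys '(window-nt ms-dos))
   ;; Corrected list: returns the matching tail, i.e. non-nil.
   (memq sys '(windows-nt ms-dos))))
;; => (nil (windows-nt ms-dos))
```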

I happened to notice this while looking at the default encoding of
data sent to a sub-process with `call-process-region' on MS-Windows,
which I found to differ out of the box from other OSes. When I sent a
UTF-8 region to a sub-process, I expected the data reaching the
sub-process to have that encoding, but that is not the default
behaviour on MS-Windows, which I found confusing: the data are most
likely to arrive encoded as iso-8859-1. On GNU/Linux, by contrast,
they are most likely to arrive encoded as UTF-8.
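To make the difference concrete: the same character reaches the
sub-process as different bytes depending on the coding system that
encodes it. A minimal sketch with `encode-coding-string', using é
(U+00E9), which both encodings can represent; a program expecting
UTF-8 that receives the lone Latin-1 byte sees invalid input:

```elisp
;; é (U+00E9) encoded for a sub-process under two coding systems.
(list
 (encode-coding-string "\u00e9" 'utf-8)        ; => "\303\251" (2 bytes)
 (encode-coding-string "\u00e9" 'iso-latin-1)) ; => "\351"     (1 byte)
```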

At first I expected the encoding of the data sent to the sub-process
to be determined by the buffer's coding system, but it seems to be
determined instead by `default-process-coding-system' (or by
`process-coding-system-alist' for a particular sub-process, when
set). This is all fine, and is documented as such.
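As a per-call workaround (not a change of defaults), the encoding can
also be chosen by let-binding `coding-system-for-write' (and
`coding-system-for-read' for the output) around `call-process-region'.
A minimal sketch; the helper name is hypothetical and "cat" merely
stands in for a program that expects UTF-8 on its standard input:

```elisp
(defun my-call-process-region-utf-8 (start end program &rest args)
  "Hypothetical helper: pipe START..END of the current buffer to PROGRAM.
The region is encoded as UTF-8 and PROGRAM's output is decoded as
UTF-8, overriding `default-process-coding-system' for this call only.
Return the output as a string."
  (let ((coding-system-for-write 'utf-8-unix) ; data sent to PROGRAM
        (coding-system-for-read 'utf-8-unix)  ; data read back from PROGRAM
        (src (current-buffer)))
    (with-temp-buffer
      (let ((out (current-buffer)))
        (with-current-buffer src
          ;; DELETE = nil, BUFFER = OUT, DISPLAY = nil.
          (apply #'call-process-region start end program nil out nil args)))
      (buffer-string))))
```

For example, in a buffer containing "αβγ", evaluating
(my-call-process-region-utf-8 (point-min) (point-max) "cat") should
return "αβγ" unchanged on a system that has a POSIX cat.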

I think the expectation in this day and age is that communication
between processes should be in Unicode by default, so that
multilingual text can be passed between them.

`flycheck' is an example of a utility that uses `call-process' to
marshal buffers to and from checker sub-processes. Sending
multilingual data to the checkers on MS-Windows is likely to fail,
because the default process encoding is `undecided-unix', so the data
are encoded as iso-8859-1 and any Unicode characters outside that
charset are dropped. On GNU/Linux the same operation is most likely
to succeed, because the default encoding is most likely `utf-8-unix',
courtesy of the LANG environment variable usually being set to a UTF-8
locale such as `C.UTF-8', which the locale logic in Emacs picks up.

The default process coding system is forced in
lisp/w32-fns.el:w32-set-default-process-coding-system:

  ;; Most programs on Windows will accept Unix line endings on input
  ;; (and some programs ported from Unix require it) but most will
  ;; produce DOS line endings on output.
  (setq default-process-coding-system
        '(undecided-dos . undecided-unix))
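For users who know their tools speak UTF-8 (for example Cygwin or MSYS
programs), that default can be overridden from an init file. A minimal
sketch, assuming every program started with the default coding system
really reads and writes UTF-8, which native Windows programs often do
not. The car of the pair decodes process output and the cdr encodes
process input, keeping the -dos/-unix split of the original:

```elisp
;; Init-file sketch: replace the MS-Windows default
;; (undecided-dos . undecided-unix) with explicit UTF-8.
;; car = decoding of process output, cdr = encoding of process input.
(when (eq system-type 'windows-nt)
  (setq default-process-coding-system '(utf-8-dos . utf-8-unix)))
```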

Now that UTF-8 adoption is almost universal, is this perhaps a good
time to change the default from undecided to utf-8, and thus align it
(more or less) with the most likely out-of-the-box encoding behaviour
on GNU/Linux?

Of course, a user can set the LANG environment variable on MS-Windows
to a UTF-8 locale, as on GNU/Linux, but it is rather unlikely that a
user would ever do so on Windows.

Also, should the EOL type of the input encoding be set to -dos? The
comment suggests that -unix was chosen because most programs back then
accepted, and some required, Unix EOLs on input, but I don't believe
that is the case any more.

A final note: the documentation under `Default Coding Systems' warns
that `undecided' coding systems do not work reliably with asynchronous
sub-process output. Perhaps this is an additional argument for moving
away from the undecided default above?

https://www.gnu.org/software/emacs/manual/html_node/elisp/Default-Coding-Systems.html

"""
Warning: Coding systems such as undecided, which determine the coding
system from the data, do not work entirely reliably with asynchronous
subprocess output. This is because Emacs handles asynchronous
subprocess output in batches, as it arrives. If the coding system
leaves the character code conversion unspecified, or leaves the
end-of-line conversion unspecified, Emacs must try to detect the
proper conversion from one batch at a time, and this does not always
work.
"""

Thanks!

In GNU Emacs 28.0.90 (build 1, x86_64-w64-mingw32)
 of 2021-12-26
Repository revision: 89a82182cbca0caa19f5b9463629918b7131ef0c
Repository branch: emacs-28
Windowing system distributor 'Microsoft Corp.', version 10
System Description: Microsoft Windows 10


From: Eli Zaretskii @ 2021-12-27 17:26 UTC
  To: Ioannis Kappas; +Cc: rgm, 52816-done

> From: Ioannis Kappas <ioannis.kappas@gmail.com>
> Date: Mon, 27 Dec 2021 10:52:20 +0000
> Cc: rgm@gnu.org
> 
> there appears to be a misspelled symbol in the recent commit
> a4bfb0bc5c14e002c0926fc320aeb4a3fc261447 ("Default Emacs to UTF-8
> instead of Latin-1"): `window-nt' is used instead of `windows-nt'
> when checking whether `system-type' is a member of a list.

Thanks, fixed.

> It does not appear to be of much consequence, though, since both
> `default-file-name-coding-system' and `default-process-coding-system',
> which it affects, appear to be overwritten later at runtime anyway.

Yes, because fortunately those typos are in
reset-language-environment, which is immediately followed by the likes
of set-language-environment.  IOW, those are defaults that are never
seen in real usage.

> I think the expectation in this day and age is that communication
> between processes should be in Unicode by default, so that
> multilingual text can be passed between them.

We still cannot use UTF-8 by default for process I/O on MS-Windows, as
UTF-8 is still not a first-class citizen there.  Latest versions of
Windows support it better, but not as well as other (fixed-length)
encodings, and AFAIK even that incomplete support requires the user to
turn on an optional feature that is meant for developers.

> `flycheck' is an example of a utility that uses `call-process' to
> marshal buffers to and from checker sub-processes. Sending
> multilingual data to the checkers on MS-Windows is likely to fail,
> because the default process encoding is `undecided-unix', so the data
> are encoded as iso-8859-1 and any Unicode characters outside that
> charset are dropped. On GNU/Linux the same operation is most likely
> to succeed, because the default encoding is most likely `utf-8-unix',
> courtesy of the LANG environment variable usually being set to a UTF-8
> locale such as `C.UTF-8', which the locale logic in Emacs picks up.

If the checker sub-processes used with flycheck indeed support UTF-8
I/O (I sincerely doubt that, unless you are using Cygwin or MSYS
programs, not native MS-Windows programs), then your customizations of
flycheck should ensure it uses UTF-8 for communicating with those
programs.  We have process-coding-system-alist for that purpose.
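A minimal sketch of such a customization; "my-checker" is a
hypothetical program name, and the entry only makes sense if that
program really reads and writes UTF-8. Each element of
`process-coding-system-alist' has the form (REGEXP . CODING), where
REGEXP is matched against the program name and CODING is either a
single coding system or a (DECODING . ENCODING) pair:

```elisp
;; Use UTF-8 for I/O with a (hypothetical) checker program only;
;; all other sub-processes keep `default-process-coding-system'.
;; Output from the program is decoded as utf-8-dos (CRLF -> LF),
;; input sent to it is encoded as utf-8-unix.
(add-to-list 'process-coding-system-alist
             '("my-checker" . (utf-8-dos . utf-8-unix)))
```

`find-operation-coding-system' shows which pair Emacs will pick for a
given program: (find-operation-coding-system 'call-process "my-checker")
should then return (utf-8-dos . utf-8-unix). Since the regexp is not
anchored, it also matches a full path such as c:/tools/my-checker.exe.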

> Also, should the EOL type of the input encoding be set to -dos? The
> comment suggests that -unix was chosen because most programs back then
> accepted, and some required, Unix EOLs on input, but I don't believe
> that is the case any more.

What comes _from_ a subprocess on Windows can have DOS-style CRLF
EOLs, so using -dos in that case makes sure we decode the EOLs
correctly, and don't leave ^M characters in the text that ends up in
Emacs buffers.  What goes _to_ a subprocess can have Unix EOLs
because MS-Windows programs don't mind if they get LF without a CR.
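The effect of the EOL part of a coding system name can be seen with
`decode-coding-string'. A minimal sketch, using the undecided-dos and
undecided-unix coding systems from the default above on a literal
string that stands in for CRLF output from a Windows program:

```elisp
;; Text as a native Windows program would typically write it.
(let ((raw "line1\r\nline2\r\n"))
  (list
   ;; -dos: each CRLF is decoded as LF, so no ^M reaches the buffer.
   (decode-coding-string raw 'undecided-dos)
   ;; -unix: the CR is kept and would show up as ^M in the buffer.
   (decode-coding-string raw 'undecided-unix)))
```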

> A final note: the documentation under `Default Coding Systems' warns
> that `undecided' coding systems do not work reliably with asynchronous
> sub-process output. Perhaps this is an additional argument for moving
> away from the undecided default above?

No, because the problems with UTF-8 on Windows are worse.

I'm closing this bug, as the problem you reported is now fixed on the
emacs-28 branch.




Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
