From: Ioannis Kappas
Subject: bug#52816: 28.0.90; misspelling of windows-nt system-type in reset-language-environment, enquiring about default-process-coding-systems on MS-Windows
Date: Mon, 27 Dec 2021 10:52:20 +0000
To: 52816@debbugs.gnu.org
Cc: rgm@gnu.org

Hi,

there appears to have been a symbol spelling mistake in a recent commit
a4bfb0bc5c14e002c0926fc320aeb4a3fc261447 to "Default Emacs to UTF-8
instead of Latin-1". The `window-nt' symbol is used instead of
`windows-nt' when testing `system-type' with `memq'.

It does not appear to be of much consequence though, since both
`default-file-name-coding-system' and `default-process-coding-system',
which are affected by it, appear to be overwritten later on at runtime
anyway.

A fix could be

diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index a0a6557c95..2b52d4bf86 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -1873,7 +1873,7 @@ reset-language-environment
   (set-default-coding-systems nil)
   (setq default-sendmail-coding-system 'utf-8)
   (setq default-file-name-coding-system (if (memq system-type
-                                                  '(window-nt ms-dos))
+                                                  '(windows-nt ms-dos))
                                             'iso-latin-1-unix
                                           'utf-8-unix))
   ;; Preserve eol-type from existing default-process-coding-systems.
@@ -1892,9 +1892,9 @@ reset-language-environment
         (condition-case nil
             (coding-system-change-text-conversion
              (cdr default-process-coding-system)
-             (if (memq system-type '(window-nt ms-dos)) 'iso-latin-1 'utf-8))
+             (if (memq system-type '(windows-nt ms-dos)) 'iso-latin-1 'utf-8))
           (coding-system-error
-           (if (memq system-type '(window-nt ms-dos)) 'iso-latin-1 'utf-8)))))
+           (if (memq system-type '(windows-nt ms-dos)) 'iso-latin-1 'utf-8)))))
     (setq default-process-coding-system
           (cons output-coding input-coding)))

I happened to notice this while looking at the default data encoding
behaviour when sending data to a sub-process using
`call-process-region' on MS-Windows, which I found to differ out of
the box from other OSes. When sending a UTF-8 region to a
sub-process, I expected the data reaching the sub-process to have
that encoding. That is not the default behaviour on MS-Windows, which
I found confusing: there the data are most likely to arrive encoded
as iso-8859-1, whereas on GNU/Linux they are most likely to arrive
encoded as UTF-8.
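
For a single call, the encoding can be pinned without touching the
global default. The following is only a minimal sketch:
`my-call-region-utf-8' is a hypothetical helper name, and the choice of
`utf-8-unix' for the data sent to the sub-process is an assumption
rather than a recommendation. It relies on `coding-system-for-write'
and `coding-system-for-read', which take precedence over
`default-process-coding-system' while they are let-bound.

```elisp
;; Sketch only.  `my-call-region-utf-8' is a hypothetical helper name.
(defun my-call-region-utf-8 (start end program &rest args)
  "Send the region START..END to PROGRAM as UTF-8 and return its output.
ARGS are passed to PROGRAM.  The output is decoded as UTF-8."
  (let ((source (current-buffer))
        ;; Encoding used for data sent from Emacs to the sub-process.
        (coding-system-for-write 'utf-8-unix)
        ;; Decoding used for data read back from the sub-process.
        (coding-system-for-read 'utf-8))
    (with-temp-buffer
      (let ((output (current-buffer)))
        (with-current-buffer source
          ;; DELETE is nil, so the source region is left untouched;
          ;; the program's output is collected in the temporary buffer.
          (apply #'call-process-region
                 start end program nil output nil args)))
      (buffer-string))))
```

It could be called as (my-call-region-utf-8 (point-min) (point-max)
"cat"), where "cat" is just a placeholder for any filter program
available on the system.
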

I was expecting at first that the encoding of the data sent to the
sub-process would be determined by the region's codepage, but it
rather seems to be determined by `default-process-coding-system' (or
by the `process-coding-system-alist' entry for a particular
sub-process, when set). This is all fine, and is documented as such.

I think the expectation in this day and age is that communication
between processes should be in Unicode by default, so as to allow
multilingual data to be passed between them.

`flycheck' is an example of a utility which uses `call-process' to
marshal buffers to/from checker sub-processes. Sending multilingual
data to the checkers on MS-Windows is likely to fail, because the
default process encoding is `undecided-unix' and the data are thus
encoded as iso-8859-1, dropping the Unicode characters. On GNU/Linux
the same operation is most likely to succeed, because the default
encoding is most likely to be `utf-8-unix', courtesy of the LANG
environment variable most likely being set to a UTF-8 locale such as
`C.UTF-8', which is picked up by the locale logic in Emacs.

The default process coding system is forced in
lisp/w32-fns.el:w32-set-default-process-coding-system:

  ;; Most programs on Windows will accept Unix line endings on input
  ;; (and some programs ported from Unix require it) but most will
  ;; produce DOS line endings on output.
  (setq default-process-coding-system
        '(undecided-dos . undecided-unix))

Is now perhaps a good time, given that UTF-8 adoption is almost
universal, to change the default from undecided to utf-8 and thus
align it (more or less) with the most likely out of the box encoding
behaviour on GNU/Linux? Of course, a user can set the LANG environment
variable on MS-Windows to a similar locale as on Linux, but it is
rather unlikely that a user would ever set this on Windows.

Also, should the eol type be set to -dos on the input encoding?
The comment suggests that this was done because most programs back
then required Unix eols, but I don't believe that this is the case any
more.

A final note: the documentation under `Default Coding Systems' gives a
warning that `undecided' coding systems do not work reliably with
asynchronous sub-process output. Perhaps this is an additional
argument why we should move away from the undecided default above?

https://www.gnu.org/software/emacs/manual/html_node/elisp/Default-Coding-Systems.html

"""
Warning: Coding systems such as undecided, which determine the coding
system from the data, do not work entirely reliably with asynchronous
subprocess output. This is because Emacs handles asynchronous
subprocess output in batches, as it arrives. If the coding system
leaves the character code conversion unspecified, or leaves the
end-of-line conversion unspecified, Emacs must try to detect the
proper conversion from one batch at a time, and this does not always
work.
"""

Thanks!

In GNU Emacs 28.0.90 (build 1, x86_64-w64-mingw32)
 of 2021-12-26
Repository revision: 89a82182cbca0caa19f5b9463629918b7131ef0c
Repository branch: emacs-28
Windowing system distributor 'Microsoft Corp.', version 10
System Description: Microsoft Windows 10
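
Regarding the warning quoted above about asynchronous sub-process
output, a caller can already avoid batch-by-batch detection by naming
both conversions explicitly when starting the process. This is only a
sketch under stated assumptions: `my-start-checker' and the program
name "my-checker" are hypothetical, and the -dos/-unix eol split
simply mirrors the current MS-Windows default rather than settling the
eol question raised above.

```elisp
;; Sketch only.  `my-start-checker' and "my-checker" are hypothetical.
(defun my-start-checker (buffer)
  "Start the hypothetical program \"my-checker\" with output sent to BUFFER."
  (make-process
   :name "my-checker"
   :buffer buffer
   :command '("my-checker")
   ;; (DECODING . ENCODING): decode the program's output as UTF-8
   ;; with DOS eols, encode the data sent to it as UTF-8 with Unix
   ;; eols.  Nothing is left for Emacs to detect from the data.
   :coding '(utf-8-dos . utf-8-unix)))
```

For a process that is already running, `set-process-coding-system'
takes the same two coding systems as separate arguments.
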