From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#58281: 27.1; windows mangles encoding on command line
Date: Wed, 12 Oct 2022 19:35:49 +0300
Message-ID: <83h709atju.fsf@gnu.org>
To: Daniel Bastos
Cc: 58281@debbugs.gnu.org

> From: Daniel Bastos
> Date: Wed, 12 Oct 2022 08:49:32 -0300
> Cc: 58281@debbugs.gnu.org
>
> > I think you said at some point that using non-ASCII commit log
> > messages from a shell outside of Emacs did succeed? If so, can you
>
> Not from a shell but from a regular GNU EMACS buffer. I then showed
> an ESHELL session where I don't specify the commit message on the
> command-line and then emacsclientw was invoked. In the buffer that
> opened, I typed an UTF-8 encoded message and that was not mangled.
>
> --8<---------------cut here---------------start------------->8---
> However, if instead of the command-line, I use a regular GNU EMACS
> buffer, it works just fine.
>
> %echo kkk >> encoding.txt
>
> %fs commit
> Pull from https://mer@somewhere.edu/test
> Round-trips: 1 Artifacts sent: 0 received: 0
> Pull done, wire bytes sent: 437 received: 2118 ip: 5.161.138.46
> emacsclientw ./ci-comment-A2803F45F10B.txt
> Waiting for Emacs...
> Pull from https://mer@somewhere.edu/test
> Round-trips: 1 Artifacts sent: 0 received: 0
> Pull done, wire bytes sent: 441 received: 2118 ip: 5.161.138.46
> New_Version: 09ea1b5d5b8d776d61a74bb412cd58bd8b6f82323c2f539a1eb0d915f7026f20
> Sync with https://mer@somewhere.edu/test
> Round-trips: 1 Artifacts sent: 2 received: 0
> Sync done, wire bytes sent: 2496 received: 309 ip: 5.161.138.46
>
> %fs timeline
> === 2022-10-01 ===
> 14:09:39 [09ea1b5d5b] *CURRENT* Naiveté. (user: mer tags: trunk)
> --8<---------------cut here---------------end--------------->8---

I don't understand what that means, sorry. There's a lot of stuff
that isn't relevant to the issue at hand (and I'm not familiar with
fossil, so its detailed output means nothing to me). What's missing
is a plain-English description of what you did, one that I could read
and understand.

I'm guessing that emacsclientw was invoked to edit a file with the
commit log message, and the commit command then used that edited
file. If that is true, then it's no wonder this works: the problem
you experience only happens when the commit log message is passed to
fossil through the command-line arguments, not through a disk file.

> > describe how you do that, i.e. which shell do you use and how you type
> > 'Naiveté' from the shell? Also, what does the command "chcp" report
> > in that shell, if you invoke it with no arguments?
>
> I had not tested with a different shell. I'm testing it with cmd.exe
> below. The encoding is not mangled, but I don't know which encoding
> is applied there because I have no idea how cmd.exe works. The
> command chcp reports code page 850.

If chcp reports codepage 850, then cmd.exe uses that codepage to
encode the characters you type. And my reading of the fossil source
code is that fossil converts the command-line arguments from the
codepage encoding to UTF-8 internally.
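For illustration only, here is a minimal Python sketch of the conversion just described. It assumes, following the reading of fossil above, that the program decodes its arguments from the console codepage and re-encodes them as UTF-8. The helper names are hypothetical and come from neither fossil nor Windows. The last part shows one possible mismatch: UTF-8 bytes decoded as if they were cp850. That mismatch is an assumed scenario showing how a message could get mangled, not a diagnosis of this bug.

```python
# Hypothetical sketch: not fossil's or Windows' actual code.

def console_bytes(text: str, codepage: str = "cp850") -> bytes:
    """Bytes a console using CODEPAGE would produce for TEXT."""
    return text.encode(codepage)


def args_to_utf8(arg: bytes, assumed_codepage: str) -> bytes:
    """Decode ARG from the codepage the program assumes; re-encode as UTF-8."""
    return arg.decode(assumed_codepage).encode("utf-8")


msg = "Naiveté"

# cmd.exe with code page 850: "é" is the single byte 0x82.
raw = console_bytes(msg)
good = args_to_utf8(raw, "cp850")
print(raw)                   # b'Naivet\x82'
print(good.decode("utf-8"))  # Naiveté

# Assumed mismatch: the sender hands over UTF-8 (c3 a9 for "é"), but the
# receiver decodes it as cp850, so each byte becomes its own character.
bad = args_to_utf8(msg.encode("utf-8"), "cp850")
print(bad.decode("utf-8"))   # Naivet├®
```

The point is that sender and receiver must agree on the codepage. The bytes themselves are not damaged; only their interpretation is wrong.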
>
> --8<---------------cut here---------------start------------->8---
> c:\my\path>chcp
> Active code page: 850
>
> c:\my\path>fossil commit -m 'Naiveté'
> Pull from https://mer@somewhere.edu/mer
> Round-trips: 1 Artifacts sent: 0 received: 0
> Pull done, wire bytes sent: 438 received: 3250 ip: 5.161.138.46
> New_Version: 8cce649b5236e507e84ce8114ab273e3b9ea246dd00e42484b47ab86517cf028
> Sync with https://mer@somewhere.edu/mer
> Round-trips: 1 Artifacts sent: 2 received: 0
> Sync done, wire bytes sent: 3615 received: 307 ip: 5.161.138.46
>
> c:\my\path>fossil timeline -n 1
> === 2022-10-12 ===
> 11:31:30 [8cce649b52] *CURRENT* 'Naiveté' (user: mer tags: trunk)
> --- entry limit (1) reached ---
>
> c:\my\path>
> --8<---------------cut here---------------end--------------->8---

So now the question is: why doesn't Eshell use the cp850 encoding
when you tell it to? What happens if you type C-x RET f cp850 RET in
the Eshell buffer before invoking the commit command?

> However, there is some evidence that UTF-8 is the encoding used by
> cmd.exe. I committed again with the message "água aaaaa".
>
> --8<---------------cut here---------------start------------->8---
> c:\my\path>fossil timeline -n 1
> === 2022-10-12 ===
> 11:38:30 [148c174ad3] *CURRENT* água aaaaa (user: mer tags: trunk)
> --- entry limit (1) reached ---
> --8<---------------cut here---------------end--------------->8---
>
> I know "á" encodes to the two-byte c3 a1 in UTF-8. Asking /od/ to
> show me the byte sequence, I see the c3 a1 in there. First notice the
> position of the two-byte sequence of interest --- it's in line 0000060
> at the 4th column.
>
> --8<---------------cut here---------------start------------->8---
> c:\my\path>fossil timeline -n 1 | od -t c
> 0000000 = = = 2 0 2 2 - 1 0 - 1 2 =
> 0000020 = = \n 1 1 : 3 8 : 3 0 [ 1 4 8
> 0000040 c 1 7 4 a d 3 ] * C U R R E N
> 0000060 T * Ã ¡ g u a a a a a a (
> [...]
> --8<---------------cut here---------------end--------------->8---
>
> If we look at which bytes are there, we find c3 a1. I do not
> understand this: I have no idea why my cmd.exe is UTF-8 encoding
> anything.

It doesn't. What you see is the result of fossil's internal
conversion to UTF-8, not what cmd.exe passed to fossil.
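The byte values in that od listing can be checked with a few lines of Python. This is an illustrative sketch, not part of the original exchange. Under UTF-8, "á" is the two bytes c3 a1; a cp850 console would instead pass the single byte a0. The last line reads c3 a1 one byte at a time as Latin-1 and gets the "Ã ¡" pair that od displayed. That od rendered the bytes this way is an inference from the listing, not something either correspondent stated.

```python
# Illustrative check of the bytes discussed above.
msg = "água aaaaa"

utf8 = msg.encode("utf-8")    # UTF-8, as fossil stores it after conversion
cp850 = msg.encode("cp850")   # what a cp850 console would pass instead

print(utf8[:2].hex(" "))      # c3 a1
print(cp850[:1].hex(" "))     # a0

# Each UTF-8 byte read as one Latin-1 character: the "Ã ¡" od displayed.
print(utf8[:2].decode("latin-1"))  # Ã¡
```

Because the stored bytes are c3 a1 rather than a0, they reflect fossil's conversion to UTF-8 and not the console's own encoding.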