From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Bastos
Newsgroups: gmane.emacs.bugs
Subject: bug#58281: 27.1; windows mangles encoding on command line
Date: Wed, 12 Oct 2022 08:49:32 -0300
References: <86sfk4cro4.fsf@zejito.i-did-not-set--mail-host-address--so-tickle-me> <8335c3x5yb.fsf@gnu.org> <83k055ctvz.fsf@gnu.org>
In-Reply-To: <83k055ctvz.fsf@gnu.org>
To: Eli Zaretskii
Cc: 58281@debbugs.gnu.org
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.io gmane.emacs.bugs:245197

On Wed, Oct 12, 2022 at 5:45 AM Eli Zaretskii wrote:
>
> > From: Daniel Bastos
> > Date: Thu, 6 Oct 2022 09:03:50 -0300
> > Cc: Wayne Harris , 58281@debbugs.gnu.org
> >
> > On Tue, Oct 4, 2022 at 7:02 AM Eli Zaretskii wrote:
> > > > From: Wayne Harris
> > > > Date: Mon, 03 Oct 2022 22:18:35 -0300
> > > >
> > > > I run emacs -Q. I open eshell. Then I try to use fossil (which is a
> > > > version control system like git) and try to put accented letters on the
> > > > commit message. No choice of encoding seems to avoid the mangling.
> > > >
> > > > c:/my/path $ alias fs 'fossil $*'
> > > > c:/my/path $ echo kkk >> encoding.txt
> > > > c:/my/path $ fs changes
> > > > EDITED encoding.txt
> > > >
> > > > c:/my/path $ (print default-process-coding-system)
> > > > (undecided-dos . undecided-unix)
> > > >
> > > > c:/my/path $ (or buffer-file-coding-system "it is nil")
> > > > it is nil
> > > >
> > > > c:/my/path $ fs commit -m 'Naiveté'
> > > > [...]
> > > > Sync done, wire bytes sent: 3234 received: 309 ip: 5.161.138.46
> > > >
> > > > c:/my/path $ fs timeline -n 1
> > > > === 2022-10-02 ===
> > > > 13:11:20 [febbbf0441] *CURRENT* NaivetÃ© (user: mer tags: trunk)
> > > > --- entry limit (1) reached ---
> > > > c:/my/path $
> > >
> > > Where did you download Fossil for MS-Windows? Is it a native Windows
> > > program, or a Cygwin program? Is 'fs' a program (i.e. fs.exe) or some
> > > kind of shell script, and if the latter, can you post the script?
> >
> > I went to
> >
> > https://fossil-scm.org/home/uv/download.html
> >
> > and chose the last one --- Windows64 ---, which is the ZIP at
> >
> > https://fossil-scm.org/home/uv/fossil-w64-2.19.zip
> >
> > Inside this ZIP, there's a fossil.exe binary. All evidence points to
> > a native Windows program, not a Cygwin program.
> >
> > %file c:/my/path/fossil.exe
> > c:/my/path/fossil.exe: PE32+ executable (console) x86-64, for MS Windows
> > %
> >
> > There's no fs.exe and no script fs. (Sorry about that.) That's just
> > my alias in ESHELL. You can safely assume that /fs/ just means
> > /fossil/. (I shouldn't have used the alias in this bug report.
> > Sorry.)
> >
> > > Also, do you know whether Fossil expects the message text in some
> > > particular encoding?
> >
> > That I don't know. I've looked into the documentation, but I did not
> > find anything that looked relevant. I did find old commit messages in
> > the repository of fossil itself that little by little the developers
> > have been adding UTF-8 support to it. But I can't say it expects any
> > particular encoding.
>
> I think you said at some point that using non-ASCII commit log
> messages from a shell outside of Emacs did succeed? If so, can you

Not from a shell but from a regular GNU EMACS buffer.
I then showed an ESHELL session in which I did not specify the commit
message on the command line, so emacsclientw was invoked. In the
buffer that opened, I typed a UTF-8 encoded message and that was not
mangled.

--8<---------------cut here---------------start------------->8---
However, if instead of the command-line, I use a regular GNU EMACS
buffer, it works just fine.

%echo kkk >> encoding.txt
%fs commit
Pull from https://mer@somewhere.edu/test
Round-trips: 1 Artifacts sent: 0 received: 0
Pull done, wire bytes sent: 437 received: 2118 ip: 5.161.138.46
emacsclientw ./ci-comment-A2803F45F10B.txt
Waiting for Emacs...
Pull from https://mer@somewhere.edu/test
Round-trips: 1 Artifacts sent: 0 received: 0
Pull done, wire bytes sent: 441 received: 2118 ip: 5.161.138.46
New_Version: 09ea1b5d5b8d776d61a74bb412cd58bd8b6f82323c2f539a1eb0d915f7026f20
Sync with https://mer@somewhere.edu/test
Round-trips: 1 Artifacts sent: 2 received: 0
Sync done, wire bytes sent: 2496 received: 309 ip: 5.161.138.46

%fs timeline
=== 2022-10-01 ===
14:09:39 [09ea1b5d5b] *CURRENT* Naiveté. (user: mer tags: trunk)
--8<---------------cut here---------------end--------------->8---

> describe how you do that, i.e. which shell do you use and how you type
> 'Naiveté' from the shell? Also, what does the command "chcp" report
> in that shell, if you invoke it with no arguments?

I had not tested with a different shell before. I'm testing it with
cmd.exe below. The commit message is not mangled there, but I don't
know which encoding is applied because I have no idea how cmd.exe
works. The command chcp reports code page 850.
--8<---------------cut here---------------start------------->8---
c:\my\path>chcp
Active code page: 850

c:\my\path>fossil commit -m 'Naiveté'
Pull from https://mer@somewhere.edu/mer
Round-trips: 1 Artifacts sent: 0 received: 0
Pull done, wire bytes sent: 438 received: 3250 ip: 5.161.138.46
New_Version: 8cce649b5236e507e84ce8114ab273e3b9ea246dd00e42484b47ab86517cf028
Sync with https://mer@somewhere.edu/mer
Round-trips: 1 Artifacts sent: 2 received: 0
Sync done, wire bytes sent: 3615 received: 307 ip: 5.161.138.46

c:\my\path>fossil timeline -n 1
=== 2022-10-12 ===
11:31:30 [8cce649b52] *CURRENT* 'Naiveté' (user: mer tags: trunk)
--- entry limit (1) reached ---

c:\my\path>
--8<---------------cut here---------------end--------------->8---

However, there is some evidence that UTF-8 is the encoding used by
cmd.exe. I committed again with the message "água aaaaa".

--8<---------------cut here---------------start------------->8---
c:\my\path>fossil timeline -n 1
=== 2022-10-12 ===
11:38:30 [148c174ad3] *CURRENT* água aaaaa (user: mer tags: trunk)
--- entry limit (1) reached ---
--8<---------------cut here---------------end--------------->8---

I know "á" encodes to the two-byte sequence c3 a1 in UTF-8. Asking
/od/ to show me the bytes of the timeline output, I see c3 a1 in
there. First notice the position of the two-byte sequence of interest
--- it's in line 0000060, starting at the 4th column.

--8<---------------cut here---------------start------------->8---
c:\my\path>fossil timeline -n 1 | od -t c
0000000   =   =   =       2   0   2   2   -   1   0   -   1   2       =
0000020   =   =  \n   1   1   :   3   8   :   3   0       [   1   4   8
0000040   c   1   7   4   a   d   3   ]       *   C   U   R   R   E   N
0000060   T   *       Ã   ¡   g   u   a       a   a   a   a   a       (
[...]
--8<---------------cut here---------------end--------------->8---

If we look at which bytes are there, we find c3 a1. I do not
understand this: I have no idea why my cmd.exe is UTF-8 encoding
anything.
--8<---------------cut here---------------start------------->8---
c:\my\path>fossil timeline -n 1 | od -t x1
0000000 3d 3d 3d 20 32 30 32 32 2d 31 30 2d 31 32 20 3d
0000020 3d 3d 0a 31 31 3a 33 38 3a 33 30 20 5b 31 34 38
0000040 63 31 37 34 61 64 33 5d 20 2a 43 55 52 52 45 4e
0000060 54 2a 20 c3 a1 67 75 61 20 61 61 61 61 61 20 28
[...]
--8<---------------cut here---------------end--------------->8---

Feel free to ask me any further questions. Thank you!
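
P.S. For anyone who wants to check the byte arithmetic, here is a
minimal Python sketch. It rests on an assumption that this thread has
not confirmed: that the UTF-8 bytes of the message are being re-read
in a single-byte Windows code page (cp1252 is only a guess) before
fossil stores them. The helper name double_encode is made up for
illustration; it is not part of Emacs or Fossil.

```python
def double_encode(text: str, wrong_codec: str = "cp1252") -> str:
    """Encode text as UTF-8, then misread those bytes as wrong_codec.

    Hypothetical helper: it only imitates the suspected mix-up and says
    nothing about where in the Emacs/Windows/fossil chain it happens.
    """
    return text.encode("utf-8").decode(wrong_codec)


original = "Naiveté"
mangled = double_encode(original)

# The string as it appears in the ESHELL timeline output.
print(mangled)

# The bytes of the mangled string once it is stored as UTF-8 again.
print(mangled.encode("utf-8").hex(" "))

# "á" in UTF-8 versus in code page 850 (what chcp reported).
print("á".encode("utf-8").hex(" "))
print("á".encode("cp850").hex(" "))

# Under the cp1252 assumption the damage is reversible.
print(mangled.encode("cp1252").decode("utf-8"))
```

If the assumption holds, the first line printed matches the mangled
timeline entry, and the "á" lines show that c3 a1 is the UTF-8 form
while code page 850 would store a single byte a0. Misreading the same
bytes as cp850 gives a different string, so code page 850 alone does
not seem to explain the mangling.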