From: Eli Zaretskii <eliz@gnu.org>
To: Daniel Bastos <dbastos@id.uff.br>
Cc: 58281@debbugs.gnu.org
Subject: bug#58281: 27.1; windows mangles encoding on command line
Date: Wed, 12 Oct 2022 19:35:49 +0300
Message-ID: <83h709atju.fsf@gnu.org>
In-Reply-To: <CAEQ-z=+TfC1P=G=4LYPigBc0mX_SAfyML3SguqmNz0jBu219ew@mail.gmail.com> (message from Daniel Bastos on Wed, 12 Oct 2022 08:49:32 -0300)
> From: Daniel Bastos <dbastos@id.uff.br>
> Date: Wed, 12 Oct 2022 08:49:32 -0300
> Cc: 58281@debbugs.gnu.org
>
> > I think you said at some point that using non-ASCII commit log
> > messages from a shell outside of Emacs did succeed? If so, can you
>
> Not from a shell but from a regular GNU EMACS buffer. I then showed
> an ESHELL session where I don't specify the commit message on the
> command-line and then emacsclientw was invoked. In the buffer that
> opened, I typed an UTF-8 encoded message and that was not mangled.
>
> --8<---------------cut here---------------start------------->8---
> However, if instead of the command-line, I use a regular GNU EMACS
> buffer, it works just fine.
>
> %echo kkk >> encoding.txt
>
> %fs commit
> Pull from https://mer@somewhere.edu/test
> Round-trips: 1 Artifacts sent: 0 received: 0
> Pull done, wire bytes sent: 437 received: 2118 ip: 5.161.138.46
> emacsclientw ./ci-comment-A2803F45F10B.txt
> Waiting for Emacs...
> Pull from https://mer@somewhere.edu/test
> Round-trips: 1 Artifacts sent: 0 received: 0
> Pull done, wire bytes sent: 441 received: 2118 ip: 5.161.138.46
> New_Version: 09ea1b5d5b8d776d61a74bb412cd58bd8b6f82323c2f539a1eb0d915f7026f20
> Sync with https://mer@somewhere.edu/test
> Round-trips: 1 Artifacts sent: 2 received: 0
> Sync done, wire bytes sent: 2496 received: 309 ip: 5.161.138.46
>
> %fs timeline
> === 2022-10-01 ===
> 14:09:39 [09ea1b5d5b] *CURRENT* Naiveté. (user: mer tags: trunk)
> --8<---------------cut here---------------end--------------->8---
I don't understand what that means, sorry. There's a lot of stuff
that isn't relevant to the issue at hand (and I'm not familiar with
fossil, so its detailed output makes no difference to me). But
there's no description of what you did in plain English, which I could
read and understand.

I'm guessing that emacsclientw was invoked to edit a file with the
commit log message, and the commit command then used that edited
file. If that is true, then it's no wonder this works: the problem
you experience only happens if the commit log message is passed to
fossil through the command-line arguments, not through a disk file.
> > describe how you do that, i.e. which shell do you use and how you type
> > 'Naiveté' from the shell? Also, what does the command "chcp" report
> > in that shell, if you invoke it with no arguments?
>
> I had not tested with a different shell. I'm testing it with cmd.exe
> below. The encoding is not mangled, but I don't know which encoding
> is applied there because I have no idea how cmd.exe works. The
> command chcp reports code page 850.
If chcp says codepage 850, then cmd.exe uses that codepage to encode
the command-line arguments. And my reading of the fossil source code
is that it converts the command-line arguments from the codepage
encoding to UTF-8 internally.
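
To illustrate the conversion I mean, here is a minimal Python sketch. It is not fossil's code: the helper name codepage_to_utf8 is made up for this example, and the cp1252 case at the end is only an assumed example of a caller using a different encoding than the receiver expects.

```python
# Sketch only: mimics a "codepage -> UTF-8" conversion of an argument.
# codepage_to_utf8 is a hypothetical helper, not part of fossil.
def codepage_to_utf8(raw: bytes, codepage: str = "cp850") -> bytes:
    """Decode bytes in the given codepage and re-encode them as UTF-8."""
    return raw.decode(codepage).encode("utf-8")

# What a shell using codepage 850 would hand over for the message.
raw_args = "Naiveté".encode("cp850")
print(raw_args.hex(" "))                    # 'é' is the single byte 82
print(codepage_to_utf8(raw_args).hex(" "))  # 'é' becomes c3 a9

# Assumed mismatch: the caller encodes in cp1252, the receiver
# still decodes as cp850, so the text no longer round-trips.
mangled = codepage_to_utf8("Naiveté".encode("cp1252")).decode("utf-8")
print(mangled)
```

If the bytes really are in the codepage the receiving side assumes, the message survives; if they are in some other encoding, the non-ASCII characters come out wrong.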
>
> --8<---------------cut here---------------start------------->8---
> c:\my\path>chcp
> Active code page: 850
>
> c:\my\path>fossil commit -m 'Naiveté'
> Pull from https://mer@somewhere.edu/mer
> Round-trips: 1 Artifacts sent: 0 received: 0
> Pull done, wire bytes sent: 438 received: 3250 ip: 5.161.138.46
> New_Version: 8cce649b5236e507e84ce8114ab273e3b9ea246dd00e42484b47ab86517cf028
> Sync with https://mer@somewhere.edu/mer
> Round-trips: 1 Artifacts sent: 2 received: 0
> Sync done, wire bytes sent: 3615 received: 307 ip: 5.161.138.46
>
> c:\my\path>fossil timeline -n 1
> === 2022-10-12 ===
> 11:31:30 [8cce649b52] *CURRENT* 'Naiveté' (user: mer tags: trunk)
> --- entry limit (1) reached ---
>
> c:\my\path>
> --8<---------------cut here---------------end--------------->8---
So now the question is why Eshell doesn't use the cp850 encoding when
you tell it? What happens if you say

  C-x RET f cp850 RET

in the Eshell buffer before invoking the commit command?
> However, there is some evidence that UTF-8 is the encoding used by
> cmd.exe. I committed again with the message "água aaaaa".
>
> --8<---------------cut here---------------start------------->8---
> c:\my\path>fossil timeline -n 1
> === 2022-10-12 ===
> 11:38:30 [148c174ad3] *CURRENT* água aaaaa (user: mer tags: trunk)
> --- entry limit (1) reached ---
> --8<---------------cut here---------------end--------------->8---
>
> I know "á" encodes to the two-byte c3 a1 in UTF-8. Asking /od/ to
> show me the byte sequence, I see the c3 a1 in there. First notice the
> position of the two-byte sequence of interest --- it's in line 0000060
> at the 4th column.
>
> --8<---------------cut here---------------start------------->8---
> c:\my\path>fossil timeline -n 1 | od -t c
> 0000000 = = = 2 0 2 2 - 1 0 - 1 2 =
> 0000020 = = \n 1 1 : 3 8 : 3 0 [ 1 4 8
> 0000040 c 1 7 4 a d 3 ] * C U R R E N
> 0000060 T * Ã ¡ g u a a a a a a (
> [...]
> --8<---------------cut here---------------end--------------->8---
>
> If we look at which bytes are there, we find c3 a1. I do not
> understand this: I have no idea why my cmd.exe is UTF-8 encoding
> anything.
It doesn't. What you see is the result of fossil's internal
conversion to UTF-8, not what cmd.exe passed to fossil.
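
A small Python sketch of why the c3 a1 in the output says nothing about the input encoding. It assumes the conversion described above, and it assumes the "Ã ¡" in the od output is just the two UTF-8 bytes displayed one character per byte in a Latin-1-like charset; the variable names are for illustration only.

```python
# The UTF-8 encoding of 'á' is the two bytes c3 a1.
utf8_of_a_acute = "á".encode("utf-8")
print(utf8_of_a_acute.hex(" "))

# Input that arrived as cp850 (where 'á' is the single byte a0) and
# was converted to UTF-8 ends up identical to input that was UTF-8
# all along, so the stored bytes cannot tell the two cases apart.
from_cp850 = b"\xa0gua".decode("cp850").encode("utf-8")
from_utf8 = "água".encode("utf-8")
print(from_cp850 == from_utf8)

# Showing the bytes c3 a1 one character per byte, as Latin-1,
# yields the two characters seen in the od output.
print(utf8_of_a_acute.decode("latin-1"))
```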