From: Daniel Bastos <dbastos@id.uff.br>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 58281@debbugs.gnu.org
Subject: bug#58281: 27.1; windows mangles encoding on command line
Date: Wed, 12 Oct 2022 08:49:32 -0300
Message-ID: <CAEQ-z=+TfC1P=G=4LYPigBc0mX_SAfyML3SguqmNz0jBu219ew@mail.gmail.com>
In-Reply-To: <83k055ctvz.fsf@gnu.org>

On Wed, Oct 12, 2022 at 5:45 AM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Daniel Bastos <dbastos@id.uff.br>
> > Date: Thu, 6 Oct 2022 09:03:50 -0300
> > Cc: Wayne Harris <dbastos@toledo.com>, 58281@debbugs.gnu.org
> >
> > On Tue, Oct 4, 2022 at 7:02 AM Eli Zaretskii <eliz@gnu.org> wrote:
> > > > From: Wayne Harris <dbastos@toledo.com>
> > > > Date: Mon, 03 Oct 2022 22:18:35 -0300
> > > >
> > > > I run emacs -Q.  I open eshell.  Then I try to use fossil (which is a
> > > > version control system like git) and try to put accented letters on the
> > > > commit message.  No choice of encoding seems to avoid the mangling.
> > > >
> > > > c:/my/path $ alias fs 'fossil $*'
> > > > c:/my/path $ echo kkk >> encoding.txt
> > > > c:/my/path $ fs changes
> > > > EDITED     encoding.txt
> > > >
> > > > c:/my/path $ (print default-process-coding-system)
> > > > (undecided-dos . undecided-unix)
> > > >
> > > > c:/my/path $ (or buffer-file-coding-system "it is nil")
> > > > it is nil
> > > >
> > > > c:/my/path $ fs commit -m 'Naiveté'
> > > > [...]
> > > > Sync done, wire bytes sent: 3234  received: 309  ip: 5.161.138.46
> > > >
> > > > c:/my/path $ fs timeline -n 1
> > > > === 2022-10-02 ===
> > > > 13:11:20 [febbbf0441] *CURRENT* Naiveté (user: mer tags: trunk)
> > > > --- entry limit (1) reached ---
> > > > c:/my/path $
> > >
> > > Where did you download Fossil for MS-Windows?  Is it a native Windows
> > > program, or a Cygwin program?  Is 'fs' a program (i.e. fs.exe) or some
> > > kind of shell script, and if the latter, can you post the script?
> >
> > I went to
> >
> >   https://fossil-scm.org/home/uv/download.html
> >
> > and chose the last one --- Windows64 ---, which is the ZIP at
> >
> >   https://fossil-scm.org/home/uv/fossil-w64-2.19.zip
> >
> > Inside this ZIP, there's a fossil.exe binary.  All evidence points to
> > a native Windows program, not a Cygwin program.
> >
> > %file c:/my/path/fossil.exe
> > c:/my/path/fossil.exe: PE32+ executable (console) x86-64, for MS Windows
> > %
> >
> > There's no fs.exe and no script fs.  (Sorry about that.)  That's just
> > my alias in ESHELL.  You can safely assume that /fs/ just means
> > /fossil/.  (I shouldn't have used the alias in this bug report.
> > Sorry.)
> >
> > > Also, do you know whether Fossil expects the message text in some
> > > particular encoding?
> >
> > That I don't know.  I've looked into the documentation, but I did not
> > find anything that looked relevant.  I did find old commit messages in
> > the repository of fossil itself showing that, little by little, the
> > developers have been adding UTF-8 support to it.  But I can't say it
> > expects any particular encoding.
>
> I think you said at some point that using non-ASCII commit log
> messages from a shell outside of Emacs did succeed?  If so, can you

Not from a shell, but from a regular GNU EMACS buffer.  I showed an
ESHELL session in which I did not specify the commit message on the
command line, so emacsclientw was invoked instead.  In the buffer that
opened, I typed a UTF-8 encoded message, and that one was not mangled.

--8<---------------cut here---------------start------------->8---
However, if instead of the command-line, I use a regular GNU EMACS
buffer, it works just fine.

%echo kkk >> encoding.txt

%fs commit
Pull from https://mer@somewhere.edu/test
Round-trips: 1   Artifacts sent: 0  received: 0
Pull done, wire bytes sent: 437  received: 2118  ip: 5.161.138.46
emacsclientw ./ci-comment-A2803F45F10B.txt
Waiting for Emacs...
Pull from https://mer@somewhere.edu/test
Round-trips: 1   Artifacts sent: 0  received: 0
Pull done, wire bytes sent: 441  received: 2118  ip: 5.161.138.46
New_Version: 09ea1b5d5b8d776d61a74bb412cd58bd8b6f82323c2f539a1eb0d915f7026f20
Sync with https://mer@somewhere.edu/test
Round-trips: 1   Artifacts sent: 2  received: 0
Sync done, wire bytes sent: 2496  received: 309  ip: 5.161.138.46

%fs timeline
=== 2022-10-01 ===
14:09:39 [09ea1b5d5b] *CURRENT* Naiveté. (user: mer tags: trunk)
--8<---------------cut here---------------end--------------->8---

> describe how you do that, i.e. which shell do you use and how you type
> 'Naiveté' from the shell?  Also, what does the command "chcp" report
> in that shell, if you invoke it with no arguments?

I had not tested with a different shell.  I'm testing it with cmd.exe
below.  The message is not mangled, but I don't know which encoding
is applied there because I have no idea how cmd.exe works.  The
command chcp reports code page 850.

--8<---------------cut here---------------start------------->8---
c:\my\path>chcp
Active code page: 850

c:\my\path>fossil commit -m 'Naiveté'
Pull from https://mer@somewhere.edu/mer
Round-trips: 1   Artifacts sent: 0  received: 0
Pull done, wire bytes sent: 438  received: 3250  ip: 5.161.138.46
New_Version: 8cce649b5236e507e84ce8114ab273e3b9ea246dd00e42484b47ab86517cf028
Sync with https://mer@somewhere.edu/mer
Round-trips: 1   Artifacts sent: 2  received: 0
Sync done, wire bytes sent: 3615  received: 307  ip: 5.161.138.46

c:\my\path>fossil timeline -n 1
=== 2022-10-12 ===
11:31:30 [8cce649b52] *CURRENT* 'Naiveté' (user: mer tags: trunk)
--- entry limit (1) reached ---

c:\my\path>
--8<---------------cut here---------------end--------------->8---
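
As an aside, code page 850 and UTF-8 do not agree on the bytes for the
accented letter, which is why it matters which encoding is applied.  A
minimal sketch in Python follows.  Python itself and its codec names
(cp850, cp1252, utf-8) are assumptions made for illustration; none of
this was run in the session above, and cp1252 is only a guess at a
typical ANSI code page.

```python
# Hypothetical illustration: the same commit message encoded three ways.
message = "Naiveté"

# cp850 is what chcp reported; cp1252 and utf-8 are here for comparison only.
encodings = ["cp850", "cp1252", "utf-8"]
encoded = {name: message.encode(name) for name in encodings}

for name, raw in encoded.items():
    # Print each byte as two hex digits, in the style of od -t x1.
    print(f"{name:7s} {raw.hex(' ')}")
```

Only the last letter differs: one byte in each of the two code pages
(with different values), and two bytes in UTF-8.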

However, there is some evidence that UTF-8 is the encoding used by
cmd.exe.  I committed again with the message "água aaaaa".

--8<---------------cut here---------------start------------->8---
c:\my\path>fossil timeline -n 1
=== 2022-10-12 ===
11:38:30 [148c174ad3] *CURRENT* água aaaaa (user: mer tags: trunk)
--- entry limit (1) reached ---
--8<---------------cut here---------------end--------------->8---

I know "á" encodes to the two-byte sequence c3 a1 in UTF-8.  Asking
/od/ to show me the bytes, I see c3 a1 in there.  First, notice the
position of the two-byte sequence of interest --- it's on line 0000060,
in the 4th column.

--8<---------------cut here---------------start------------->8---
c:\my\path>fossil timeline -n 1 | od -t c
0000000   =   =   =       2   0   2   2   -   1   0   -   1   2       =
0000020   =   =  \n   1   1   :   3   8   :   3   0       [   1   4   8
0000040   c   1   7   4   a   d   3   ]       *   C   U   R   R   E   N
0000060   T   *       Ã   ¡   g   u   a       a   a   a   a   a       (
[...]
--8<---------------cut here---------------end--------------->8---
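
The "Ã ¡" pair in that dump is what the two bytes c3 a1 look like when
each byte is displayed as a character of its own.  A small Python sketch
of the effect follows.  It assumes a Latin-1 style display; which
decoding od or the console really applied here is not known, and the
codec names are Python's.

```python
# The UTF-8 encoding of "á" is the two bytes c3 a1.
utf8_bytes = "á".encode("utf-8")

# Reading those same two bytes one at a time yields two characters.
as_latin1 = utf8_bytes.decode("latin-1")  # assumption: Latin-1 style display
as_cp850 = utf8_bytes.decode("cp850")     # code page 850, as reported by chcp

print(utf8_bytes.hex(" "))
print(as_latin1)
print(as_cp850)
```

The Latin-1 reading matches the two characters shown by od -t c, while
code page 850 would render the same bytes differently.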

If we look at which bytes are there, we find c3 a1.  I do not
understand this: I have no idea why my cmd.exe is UTF-8 encoding
anything.

--8<---------------cut here---------------start------------->8---
c:\my\path>fossil timeline -n 1 | od -t x1
0000000 3d 3d 3d 20 32 30 32 32 2d 31 30 2d 31 32 20 3d
0000020 3d 3d 0a 31 31 3a 33 38 3a 33 30 20 5b 31 34 38
0000040 63 31 37 34 61 64 33 5d 20 2a 43 55 52 52 45 4e
0000060 54 2a 20 c3 a1 67 75 61 20 61 61 61 61 61 20 28
[...]
--8<---------------cut here---------------end--------------->8---
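
The hex dump can also be turned back into text to confirm that it is
well-formed UTF-8.  This is a sketch in Python (an assumption; the
session above only used od), with the four dump lines copied from the
output above.

```python
# The four od -t x1 lines shown above; the first field is od's octal offset.
dump = """\
0000000 3d 3d 3d 20 32 30 32 32 2d 31 30 2d 31 32 20 3d
0000020 3d 3d 0a 31 31 3a 33 38 3a 33 30 20 5b 31 34 38
0000040 63 31 37 34 61 64 33 5d 20 2a 43 55 52 52 45 4e
0000060 54 2a 20 c3 a1 67 75 61 20 61 61 61 61 61 20 28
"""

# Drop the offset column and join the remaining hex bytes.
hex_bytes = [tok for line in dump.splitlines() for tok in line.split()[1:]]
raw = bytes.fromhex("".join(hex_bytes))

# decode() raises UnicodeDecodeError if the bytes are not well-formed UTF-8.
text = raw.decode("utf-8")
print(text)

# Byte offset of the c3 a1 pair, printed in octal to match od's offsets.
print(oct(raw.index(b"\xc3\xa1")))
```

The decoded text contains "água", and the offset of the pair falls in
the row that od labels 0000060.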

Feel free to ask me any further questions. Thank you!




