unofficial mirror of help-gnu-emacs@gnu.org
* Cyrillic VC Git commit messages
@ 2014-11-21  4:30 Nikolay Kudryavtsev
  2014-11-21  8:41 ` Eli Zaretskii
From: Nikolay Kudryavtsev @ 2014-11-21  4:30 UTC
  To: help-gnu-emacs@gnu.org

Hi all.

Sometimes I work with projects that have Russian commit messages in the 
git log.

I've found a way to make them work, but it's kind of counter-intuitive. 
First you set
(setq vc-git-commits-coding-system 'windows-1251)
And then in .gitconfig:
[i18n]
logoutputencoding = windows-1251

This works fine inside Emacs, but totally breaks git log in the Windows
cmd.exe console.  For some reason Git always expects windows-1251 (the
system default) for input, but outputs windows-1252 to cmd.exe and UTF-8
to Emacs.

So, did I miss something?  Is there another way?

-- 
Best Regards,
Nikolay Kudryavtsev





* Re: Cyrillic VC Git commit messages
  2014-11-21  4:30 Cyrillic VC Git commit messages Nikolay Kudryavtsev
@ 2014-11-21  8:41 ` Eli Zaretskii
  2014-11-21 13:21   ` Nikolay Kudryavtsev
From: Eli Zaretskii @ 2014-11-21  8:41 UTC
  To: help-gnu-emacs

> From: Nikolay Kudryavtsev <nikolay.kudryavtsev@gmail.com>
> Date: Fri, 21 Nov 2014 07:30:06 +0300
> 
> (setq vc-git-commits-coding-system 'windows-1251)
> And then in .gitconfig:
> [i18n]
> logoutputencoding = windows-1251
> 
> This works fine inside of emacs, but totally breaks git log in windows 
> cmd. For some reason git always expects windows-1251(system default) for 
> input, but outputs windows-1252 to cmd.exe and utf-8 to emacs.

It's a missing feature in vc-git.el, see

  http://lists.gnu.org/archive/html/emacs-devel/2014-11/msg01274.html

and perhaps also a bug in git.

> So, did I miss something?  Is there another way?

To work around, try this in your ~/.emacs:

  (add-to-list 'process-coding-system-alist
               '("[gG][iI][tT]" windows-1251 . utf-8))
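To confirm which pair of coding systems Emacs will actually pick for a program named "git" once such an entry is in place, find-operation-coding-system can be queried (for example with M-:).  A minimal sketch; it only inspects the lookup and does not run Git:

```elisp
;; Ask Emacs which coding systems call-process would use for "git".
;; The result is a cons (DECODING . ENCODING): the car is used to
;; decode Git's output, the cdr to encode data sent to Git.
;; It is nil if nothing in process-coding-system-alist matches.
(find-operation-coding-system 'call-process "git")
```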





* Re: Cyrillic VC Git commit messages
  2014-11-21  8:41 ` Eli Zaretskii
@ 2014-11-21 13:21   ` Nikolay Kudryavtsev
  2014-11-21 14:06     ` Eli Zaretskii
From: Nikolay Kudryavtsev @ 2014-11-21 13:21 UTC
  To: help-gnu-emacs

>    http://lists.gnu.org/archive/html/emacs-devel/2014-11/msg01274.html
I had seen it before sending my question.  Taking the .gitconfig settings
into account would break my workaround.

> and perhaps also a bug in git. 
From what I read on the msysgit wiki, the developers seem to consider
this a feature.

> To work around, try this in your ~/.emacs:
>
>    (add-to-list 'process-coding-system-alist
>                 '("[gG][iI][tT]" windows-1251 . utf-8))
This would not work, because vc-git-commits-coding-system is always used
instead, and vc-git-commits-coding-system accepts only a single coding
system.

-- 
Best Regards,
Nikolay Kudryavtsev





* Re: Cyrillic VC Git commit messages
  2014-11-21 13:21   ` Nikolay Kudryavtsev
@ 2014-11-21 14:06     ` Eli Zaretskii
  2014-11-21 14:38       ` Nikolay Kudryavtsev
From: Eli Zaretskii @ 2014-11-21 14:06 UTC
  To: help-gnu-emacs

> From: Nikolay Kudryavtsev <nikolay.kudryavtsev@gmail.com>
> Date: Fri, 21 Nov 2014 16:21:28 +0300
> 
> >    http://lists.gnu.org/archive/html/emacs-devel/2014-11/msg01274.html
> Seen this before sending my question.  Taking .gitcofig settings into 
> account would break my workaround.
> 
> > and perhaps also a bug in git. 
> From what I read on the msysgit wiki, it seems that the developers
> consider this a feature.

Could you give a pointer to that place?

> > To work around, try this in your ~/.emacs:
> >
> >    (add-to-list 'process-coding-system-alist
> >                 '("[gG][iI][tT]" windows-1251 . utf-8))
> This would not work, because vc-git-commits-coding-system is always used 
> instead. And vc-git-commits-coding-system only accepts a single coding 
> system.

Even if you set vc-git-commits-coding-system to UTF-8?




* Re: Cyrillic VC Git commit messages
  2014-11-21 14:06     ` Eli Zaretskii
@ 2014-11-21 14:38       ` Nikolay Kudryavtsev
  2014-11-21 15:28         ` Eli Zaretskii
From: Nikolay Kudryavtsev @ 2014-11-21 14:38 UTC
  To: help-gnu-emacs

> Could you give a pointer to that place?

It's mentioned here: <https://github.com/msysgit/msysgit/wiki/Git-for-Windows-Unicode-Support#Disable_commit_message_transcoding>.  I couldn't find a more detailed explanation.


> Even if you set vc-git-commits-coding-system to UTF-8?
Yes.  VC does not use process-coding-system-alist at all; vc-git-commits-coding-system is used instead.

-- 
Best Regards,
Nikolay Kudryavtsev




* Re: Cyrillic VC Git commit messages
  2014-11-21 14:38       ` Nikolay Kudryavtsev
@ 2014-11-21 15:28         ` Eli Zaretskii
  2014-11-21 16:48           ` Nikolay Kudryavtsev
From: Eli Zaretskii @ 2014-11-21 15:28 UTC
  To: help-gnu-emacs

> From: Nikolay Kudryavtsev <nikolay.kudryavtsev@gmail.com>
> Date: Fri, 21 Nov 2014 17:38:51 +0300
> 
> > Could you give a pointer to that place?
> 
> It's mentioned here.  <https://github.com/msysgit/msysgit/wiki/Git-for-Windows-Unicode-Support#Disable_commit_message_transcoding>  Couldn't find a more elaborate explanation.

I see nothing there that says it's a feature.  I don't even see there
a confirmation that output is always in UTF-8.  Can you tell how you
decided that, or where you saw that described?

Do I understand correctly that you see Cyrillic text encoded
differently when it is sent to Emacs and to the cmd.exe window?  And
it sends codepage 1252 (not 1251) to the cmd.exe window?

Moreover, you seem to say that Git outputs in UTF-8 even though you
customized i18n.logoutputencoding to be windows-1251?  That'd be a
real bug in Git.  How about asking about that on the msysgit mailing
list?

This message:

  http://osdir.com/ml/msysgit/2009-11/msg00140.html

seems to say that the problem disappears if you use --no-pager, so
maybe the bug is in Less?  There are some suggestions to play with the
value of the environment variable LESSCHARSET.

(This information might be obsolete with the current versions of
msysgit.)

> VC does is not using process-coding-system-alist at all. vc-git-commits-coding-system is used instead.

That's not true.  First, vc-git-commits-coding-system is used only in
2 commands in vc-git; others use process-coding-system-alist.  More
importantly, even those 2 commands bind only one of the coding
systems, the other is determined by process-coding-system-alist.

Not sure this helps you, though.
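The split described above can be illustrated with a small sketch (illustrative only, not code taken from vc-git.el): when only coding-system-for-read is bound around a call, Git's output is decoded with that coding system, while the encoding of anything sent to Git is still looked up in process-coding-system-alist.  It assumes it is evaluated from a buffer whose default-directory is inside a Git repository:

```elisp
;; Illustrative sketch, not vc-git.el code.  Only the read side is
;; bound here; when call-process runs, the write side still comes
;; from process-coding-system-alist (via find-operation-coding-system).
(with-temp-buffer
  (let ((coding-system-for-read 'utf-8))   ; decode Git's output as UTF-8
    (call-process "git" nil t nil "log" "-1" "--format=%s"))
  (buffer-string))
```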

Anyway, if nothing else works for you, modify vc-git.el to use 2
variables instead of just one for input and output of logs, then you
can give each variable the value you need.
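A related way to get two separate settings without editing vc-git.el itself is to wrap the two commands with advice-add (a swapped-in technique, not the modification suggested above) and let-bind vc-git-commits-coding-system differently around each.  A minimal sketch, assuming Emacs 24.4 or later (for advice-add and with-eval-after-load), that the two commands in question are vc-git-print-log and vc-git-checkin, and, following the description earlier in the thread, that the log should be decoded as UTF-8 while commit messages are encoded as windows-1251.  All my-... names are hypothetical:

```elisp
;; Declare the vc-git option special so the let-bindings below are
;; dynamic even if this file uses lexical binding.
(defvar vc-git-commits-coding-system)

;; Hypothetical variables: one coding system per direction.
(defvar my-vc-git-log-coding-system 'utf-8
  "Coding system for decoding Git log output in VC (hypothetical).")
(defvar my-vc-git-commit-coding-system 'windows-1251
  "Coding system for encoding commit messages sent to Git (hypothetical).")

(defun my-vc-git-log-coding-advice (orig-fun &rest args)
  "Call ORIG-FUN with the log coding system in effect."
  (let ((vc-git-commits-coding-system my-vc-git-log-coding-system))
    (apply orig-fun args)))

(defun my-vc-git-commit-coding-advice (orig-fun &rest args)
  "Call ORIG-FUN with the commit-message coding system in effect."
  (let ((vc-git-commits-coding-system my-vc-git-commit-coding-system))
    (apply orig-fun args)))

;; Install the wrappers only once vc-git is loaded.
(with-eval-after-load 'vc-git
  (advice-add 'vc-git-print-log :around #'my-vc-git-log-coding-advice)
  (advice-add 'vc-git-checkin :around #'my-vc-git-commit-coding-advice))
```

This relies on the log process being started, and its decoding chosen, while the advised call is still running; if vc-git reads the option later (for example in a process sentinel), the binding would no longer be in effect.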




* Re: Cyrillic VC Git commit messages
  2014-11-21 15:28         ` Eli Zaretskii
@ 2014-11-21 16:48           ` Nikolay Kudryavtsev
  2014-11-22 13:42             ` Eli Zaretskii
From: Nikolay Kudryavtsev @ 2014-11-21 16:48 UTC
  To: help-gnu-emacs

> Can you tell how you decided that, or where did you see that described?
That part implies that there is some new functionality in msysgit that
does the recoding for the Windows cmd.exe console.

> And it sends codepage 1252 (not 1251) to the cmd.exe window?
It first decodes the message with logoutputencoding, then re-encodes it
in windows-1252.  If you set logoutputencoding to windows-1251, as I
do, this breaks cmd.exe output.

> Moreover, you seem to say that Git outputs in UTF-8 even though you
> customized i18n.logoutputencoding to be windows-1251?
For the VC log, the second re-encoding to windows-1252 does not happen.

For the commit message, Git first recodes it from windows-1251 to UTF-8
and then to i18n.commitencoding.  It behaves the same way whether called
from VC or from cmd.exe.

> First, vc-git-commits-coding-system is used only in 2 commands in vc-git
Yes, but those are exactly the two commands we care about here: it
binds coding-system-for-read for the log and coding-system-for-write for
the commit message.

> modify vc-git.el to use 2 variables
I thought about doing this, but first decided to see whether I could
get any help.  Those Git "hooks" do weird things, to say the least.

-- 
Best Regards,
Nikolay Kudryavtsev





* Re: Cyrillic VC Git commit messages
  2014-11-21 16:48           ` Nikolay Kudryavtsev
@ 2014-11-22 13:42             ` Eli Zaretskii
From: Eli Zaretskii @ 2014-11-22 13:42 UTC
  To: help-gnu-emacs

> From: Nikolay Kudryavtsev <nikolay.kudryavtsev@gmail.com>
> Date: Fri, 21 Nov 2014 19:48:47 +0300
> 
> > Can you tell how you decided that, or where did you see that described?
> That part implies that there is some new functionality in msysgit that 
> does the recoding for windows cmd.exe.
> 
> > And it sends codepage 1252 (not 1251) to the cmd.exe window?
> It first decodes the message with logoutputencoding, then recodes it 
> with windows-1252. If you set logoutputencoding to windows-1251, like I 
> do, it breaks cmd.exe output.
> 
> > Moreover, you seem to say that Git outputs in UTF-8 even though you
> > customized i18n.logoutputencoding to be windows-1251?
> For vc log the second encoding with windows-1252 does not happen.
> 
> For the commit message,  git first recodes from windows-1251 to utf-8 
> and then recodes to commitencoding. This behavior is shared when called 
> from VC and cmd.exe.

I looked into this some more and ran some simple tests, and I'm not
sure I see the same behavior as the one you describe.

First, preliminaries: I tried this with msysGit version
1.9.4.msysgit.2 (the latest binary release) on Windows XP SP3.  I
cannot easily set up a Cyrillic locale on my machine, so I tried the
Latin-1 locale, i.e. codepage 1252, instead.  Also, I only have access
to a Git repository whose commit log messages are encoded in UTF-8, so
that's what I tried.

What I see is this:

  . By default, Git outputs commit log messages in UTF-8 when
    redirected to a file and to Emacs.  When it writes to the console,
    Git seems to use WriteConsoleW API after converting text from
    UTF-8 to UTF-16.  The Windows console then displays that text
    according to the current codepage, converting to the supported
    characters if it can, and displaying '?' characters if not.

  . If I set i18n.logoutputencoding = windows-1252, Git outputs commit
    log messages in that encoding to the console, when redirected to
    a file, and to Emacs (I used the "C-x v L" command to check).

This behavior looks reasonable and expected, given what the
documentation says.  In particular, I see no differences between the
encoding Git outputs to the console and to Emacs.

Please note that there's one more player in this game, when you invoke
Git from cmd.exe prompt: in some versions of msysGit, when you type a
Git command at cmd.exe prompt, what gets invoked is a git.cmd batch
file supplied by msysGit, and that batch file manipulates the console
codepage.  (On my system, I disabled that manipulation, because it
interferes with Git invocations from Emacs.)  So it could be that what
that batch file does is one reason for the unreasonable behavior you
describe.

If git.cmd is not the culprit, or if you run Git not through such a
batch file, then perhaps you could see what encoding Git emits in the
above 3 scenarios: to console, to file, and to Emacs.  Also, please
tell how you determine the encoding in each case.
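For the "to Emacs" case, one way to see what bytes Git actually emits is to capture its output without any decoding and let Emacs guess the encoding.  A minimal sketch; my-git-raw-output-coding is a hypothetical name, and since single-byte codepages such as windows-1251 and windows-1252 cannot be told apart reliably from the bytes alone, the result mainly distinguishes UTF-8 from some 8-bit encoding:

```elisp
(defun my-git-raw-output-coding (&rest args)
  "Run git with ARGS in `default-directory' and guess its output encoding.
Return the list of candidate coding systems from `detect-coding-region'.
Hypothetical diagnostic helper."
  (with-temp-buffer
    ;; Unibyte buffer + no-conversion: keep Git's bytes exactly as sent.
    (set-buffer-multibyte nil)
    (let ((coding-system-for-read 'no-conversion))
      (apply #'call-process "git" nil t nil args))
    (detect-coding-region (point-min) (point-max))))

;; Example, evaluated from a buffer visiting a file in the repository:
;; (my-git-raw-output-coding "log" "-1" "--format=%s")
```

For the "to file" case, the redirected file can be read with insert-file-contents-literally into a unibyte buffer and passed to detect-coding-region in the same way.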

P.S.  I tried to verify my observations by looking at the msysGit
sources, but I cannot find the source distribution that corresponds to
the 1.9.4.msysgit.2 binaries I installed.  The download page provides
a link to "Source code", but what gets downloaded by clicking that
link is binaries without sources, which AFAIU is against the GPL.

HTH




Thread overview: 8 messages
2014-11-21  4:30 Cyrillic VC Git commit messages Nikolay Kudryavtsev
2014-11-21  8:41 ` Eli Zaretskii
2014-11-21 13:21   ` Nikolay Kudryavtsev
2014-11-21 14:06     ` Eli Zaretskii
2014-11-21 14:38       ` Nikolay Kudryavtsev
2014-11-21 15:28         ` Eli Zaretskii
2014-11-21 16:48           ` Nikolay Kudryavtsev
2014-11-22 13:42             ` Eli Zaretskii
