From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: Cyrillic VC Git commit messages
Date: Sat, 22 Nov 2014 15:42:05 +0200
Message-ID: <83r3wvs66q.fsf@gnu.org>
References: <546EBFCE.5010602@gmail.com> <83vbm93pzf.fsf@gnu.org>
 <546F3C58.30603@gmail.com> <831tow4pir.fsf@gnu.org>
 <546F4E7B.4010304@gmail.com> <83y4r4374x.fsf@gnu.org>
 <546F6CEF.1090700@gmail.com>
To: help-gnu-emacs@gnu.org
In-reply-to: <546F6CEF.1090700@gmail.com>
List-Id: Users list for the GNU Emacs text editor
Xref: news.gmane.org gmane.emacs.help:101114

> From: Nikolay Kudryavtsev
> Date: Fri, 21 Nov 2014 19:48:47 +0300
>
> > Can you tell how you decided that, or where did you see that described?
> That part implies that there is some new functionality in msysgit that
> does the recoding for windows cmd.exe.
>
> > And it sends codepage 1252 (not 1251) to the cmd.exe window?
> It first decodes the message with logoutputencoding, then recodes it
> with windows-1252. If you set logoutputencoding to windows-1251, like I
> do, it breaks cmd.exe output.
>
> > Moreover, you seem to say that Git outputs in UTF-8 even though you
> > customized i18n.logoutputencoding to be windows-1251?
> For vc log the second encoding with windows-1252 does not happen.
>
> For the commit message, git first recodes from windows-1251 to utf-8
> and then recodes to commitencoding. This behavior is shared when called
> from VC and cmd.exe.

I looked into this some more and ran some simple tests, and I'm not
sure I see the same behavior as the one you describe.

First, preliminaries: I tried this with msysGit version
1.9.4.msysgit.2 (the latest binary release) on Windows XP SP3.
I cannot easily set up a Cyrillic locale on my machine, so I tried the
Latin-1 locale, i.e. codepage 1252, instead.  Also, I only have access
to a Git repository whose commit log messages are encoded in UTF-8, so
that's what I tried.

What I see is this:

 . By default, Git outputs commit log messages in UTF-8 when
   redirected to a file and to Emacs.  When it writes to the console,
   Git seems to use the WriteConsoleW API after converting text from
   UTF-8 to UTF-16.  The Windows console then displays that text
   according to the current codepage, converting to the supported
   characters if it can, and displaying '?' characters if not.

 . If I set i18n.logoutputencoding = windows-1252, Git outputs commit
   log messages in that encoding in all three cases: to the cmd.exe
   console, when redirected to a file, and to Emacs (I tried the
   "C-x v L" command to see that).

This behavior looks reasonable and expected, given what the
documentation says.  In particular, I see no differences between the
encoding Git outputs to the console and to Emacs.

Please note that there's one more player in this game when you invoke
Git from the cmd.exe prompt: in some versions of msysGit, when you type
a Git command at the cmd.exe prompt, what gets invoked is a git.cmd
batch file supplied by msysGit, and that batch file manipulates the
console codepage.  (On my system, I disabled that manipulation, because
it interferes with Git invocations from Emacs.)  So it could be that
what that batch file does is one reason for the unreasonable behavior
you describe.

If git.cmd is not the culprit, or if you run Git not through such a
batch file, then perhaps you could see what encoding Git emits in the
three scenarios above: to console, to file, and to Emacs.  Also, please
tell how you determine the encoding in each case.

P.S. I tried to verify my observations by looking at the msysGit
sources, but I cannot find the source distribution that corresponds to
the 1.9.4.msysgit.2 binaries I installed.
The download page provides a link to "Source code", but what gets
downloaded by clicking that link is binaries without sources, which
AFAIU is against the GPL.

HTH
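
On the question of how to determine the encoding in each scenario, one possible approach is to capture the raw bytes Git emits and test which encodings decode them. The following is only a minimal sketch under stated assumptions: `guess_encoding` and `capture_git_log` are hypothetical helper names, not part of Git or Emacs, and the candidate list (cp1251, cp1252) is an assumption based on the codepages discussed in this thread. Single-byte codepages accept almost any byte sequence, so the result for non-UTF-8 input is a guess that depends on the order of the candidates, not a proof.

```python
import subprocess


def guess_encoding(data: bytes, candidates=("cp1251", "cp1252")) -> str:
    """Guess how captured bytes are encoded (hypothetical helper).

    Returns 'ascii', 'utf-8', the first candidate codepage that decodes
    the data without error, or 'unknown'.
    """
    if all(b < 0x80 for b in data):
        # Pure ASCII is byte-identical in all the encodings considered here.
        return "ascii"
    try:
        data.decode("utf-8")  # strict decoding; rejects most legacy text
        return "utf-8"
    except UnicodeDecodeError:
        pass
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return "unknown"


def capture_git_log() -> bytes:
    """Return the raw bytes of the latest commit message.

    Assumes 'git' is on PATH and the current directory is a repository.
    """
    result = subprocess.run(
        ["git", "log", "-1", "--format=%B"],
        capture_output=True,
        check=True,
    )
    return result.stdout


# Illustration with a made-up Cyrillic message instead of real Git output.
sample = "Привет"
as_utf8 = sample.encode("utf-8")
as_1251 = sample.encode("cp1251")

# Reading cp1251 bytes as cp1252 yields Latin-1 style mojibake.
mojibake = as_1251.decode("cp1252")

# Python analogue of showing '?' for characters a codepage cannot represent.
question_marks = sample.encode("cp1252", errors="replace")
```

Calling `guess_encoding(capture_git_log())` inside a repository, once with and once without i18n.logoutputencoding set, would give a rough indication of what Git writes to a pipe. It says nothing about the console path, where the text may go through a different API.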