From: Filipp Gunbin
Newsgroups: gmane.emacs.help
Subject: Re: Grep Japanese characters
Date: Mon, 16 Jul 2018 23:11:36 +0300
To: Eli Zaretskii
Cc: help-gnu-emacs@gnu.org

On 13/07/2018 17:36 +0300, Eli Zaretskii wrote:

>> From: Filipp Gunbin
>> Cc: help-gnu-emacs@gnu.org
>> Date: Fri, 13 Jul 2018 17:06:38 +0300
>>
>> > The conclusion is that UTF-8 can be used as a locale's codeset
>> > (good!), but sending UTF-8 text to the console still doesn't work well
>> > (not so good). So if people use this knob in Windows 10, they should
>> > arrange for console input and output to be in some codepage other than
>> > 65001 (a.k.a. UTF-8).
>> [..]
>>
>> But in message <86pnzsbnvu.fsf@misasa.okayama-u.ac.jp> above it was
>> reported that grepping for these non-ASCII chars worked from Emacs, no?
>
> When you grep from Emacs, the results of the search are not displayed
> by the Windows console; they are read by Emacs and displayed by Emacs.
> And (GUI) Emacs can display _any_ character supported by the fonts
> installed on the system, regardless of the codepage. But if people
> run Grep from the shell prompt, they will see unreadable output, even
> on Windows 10 with that setting in effect.
>
>> And what does "using as locale's codeset" mean in your message, then?
>
> A locale's most general specification is ll_CC.ENC, where ll is the
> language, CC is the country, and ENC is the encoding. Example from
> POSIX systems: pt_BR.UTF-8, for the Brazilian variety of Portuguese
> with UTF-8 encoding. Example from Windows: French_Canada.1252 (where
> 1252 is the codepage used for encoding). The ENC part is also known as
> "codeset".
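The ll_CC.ENC scheme described above can be made concrete with a small parser. This is a hypothetical Python sketch, not anything Emacs or the C runtime provides; it assumes the simple language[_country][.codeset] shape and ignores extra parts some systems allow (such as @modifier suffixes). The is_utf8 helper treats both common "UTF-8" spellings and Windows codepage 65001 as UTF-8, following the note that 65001 is a.k.a. UTF-8.

```python
import re
from typing import NamedTuple, Optional


class Locale(NamedTuple):
    language: str            # "ll" part, e.g. "pt" or "French"
    country: Optional[str]   # "CC" part, e.g. "BR" or "Canada"
    codeset: Optional[str]   # "ENC" part, e.g. "UTF-8" or "1252"


# Hypothetical pattern for language[_country][.codeset]; @modifier is not handled.
_LOCALE_RE = re.compile(
    r"^(?P<language>[^_.]+)(?:_(?P<country>[^.]+))?(?:\.(?P<codeset>.+))?$"
)


def parse_locale(name: str) -> Locale:
    """Split a locale name such as 'pt_BR.UTF-8' into its three parts."""
    m = _LOCALE_RE.match(name)
    if m is None:
        raise ValueError(f"not a locale name: {name!r}")
    return Locale(m.group("language"), m.group("country"), m.group("codeset"))


def is_utf8(codeset: Optional[str]) -> bool:
    """True for 'UTF-8'/'utf8' spellings and for Windows codepage 65001."""
    if codeset is None:
        return False
    return codeset.replace("-", "").lower() in ("utf8", "65001")


for name in ("pt_BR.UTF-8", "French_Canada.1252", "C"):
    loc = parse_locale(name)
    print(name, "->", loc, "UTF-8 codeset:", is_utf8(loc.codeset))
```

The same parser handles both the POSIX-style and the Windows-style names, since only the separators matter; what differs is the vocabulary (ISO codes vs. full names, codeset names vs. codepage numbers).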
>
> More about that, for Windows in particular, here:
>
>   https://msdn.microsoft.com/en-us/library/x99tb11d.aspx
>
> You will see that the MS doc still says UTF-8 is not supported as the
> ENC part.

Thanks. I'm familiar with the locale concept, but was not sure what
"codeset" means.

I'm still a bit lost here. It seems that sending text to and receiving
it from subprocesses works with that Win10 setting, which is why
grepping from M-x shell started to work. Output in graphical Emacs will
work if the font is OK.

But the interactions with the console confuse me; I guess I need to
read more on that before I can ask something meaningful. In particular,
it's unclear to me why grep outputs Japanese correctly in the OP (with
LC_ALL=en_US.UTF-8), while you say that sending UTF-8 text to the
console will not work.
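Eli's point quoted above, that output read by Emacs never passes through the Windows console, can be illustrated with a rough analogy in Python. This is a hypothetical sketch, not how Emacs is implemented: a child Python process stands in for grep and writes UTF-8 bytes into a pipe, and the parent decodes them itself, so no console codepage is consulted along the way. Decoding the same bytes as codepage 1252 shows that the decoder's choice, not the bytes, decides whether the text comes out readable.

```python
import subprocess
import sys

# Hypothetical stand-in for grep: a child Python process that writes the
# Japanese word U+65E5 U+672C U+8A9E to stdout as raw UTF-8 bytes.
CHILD_SOURCE = (
    "import sys; "
    "sys.stdout.buffer.write('\\u65e5\\u672c\\u8a9e\\n'.encode('utf-8'))"
)


def run_child_through_pipe(encoding: str = "utf-8") -> str:
    """Capture the child's stdout via a pipe and decode it in the parent."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        stdout=subprocess.PIPE,  # bytes come to us, not to a console window
        check=True,
    )
    # result.stdout is plain bytes; only the reader's choice of encoding
    # decides how they are turned into characters.
    return result.stdout.decode(encoding)


text = run_child_through_pipe()
# ascii() keeps this print safe even on a console that cannot show Japanese.
print(ascii(text))
print(ascii(run_child_through_pipe("cp1252")))  # same bytes, wrong decoder
```

Note that the final prints deliberately avoid sending Japanese text to the console: the decoded string is correct in memory either way, and only rendering it on a console depends on that console's codepage.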