From: Eli Zaretskii
Newsgroups: gmane.emacs.help
Subject: Re: grep-find with Polish letters in Windows
Date: Tue, 14 Sep 2010 21:21:41 +0200
Message-ID: <83r5gw2ksa.fsf@gnu.org>
To: Andrzej Skiba
Cc: help-gnu-emacs@gnu.org

> Date: Tue, 14 Sep 2010 13:02:30 +0200
> From: Andrzej Skiba
>
> (defun as/grep-project (project pattern)
>   (interactive "sProject: \nsPattern: ")
>   (grep-find (concat
>               "/usr/bin/find /cygdrive/c/projects/"
>               project
>               " -type f "
>               " -not -name \"*.svn-base\" "
>               "-and -not -name \"*.tmp\" "
>               "-and -not -name \"*.log\" -print0 "
>               "| xargs -0 -e grep -U -n -s -F \""
>               pattern
>               "\"")))
>
> All works great until I try to search for a word with Polish letters (such
> as ą, ś, ć, ł, ń etc.). The files are all utf-8. When I run the command
> searching for string "Usuń" in project test I get the following output in
> the grep buffer:
>
> /usr/bin/find /cygdrive/c/projects/test -type f -not -name "*.svn-base" -and
> -not -name "*.tmp" -and -not -name "*.log" -print0 | xargs -0 -e grep -U -n
> -s -F "Usuń"
> /usr/bin/bash: /usr/bin/find /cygdrive/c/projects/test -type f -not -name
> "*.svn-base" -and -not -name "*.tmp" -and -not -name "*.log" -print0 | xargs
> -0 -e grep -U -n -s -F "UsuĹ„": No such file or directory
>
> It runs fine with any input without Polish characters.

You seem to be using the native build of Emacs in conjunction with
Cygwin tools (Grep, Bash, etc.).
If so, this is asking for trouble, because there are subtle
incompatibilities between Cygwin programs and native Windows programs.
I/O encoding is one of these areas: whereas the latest Cygwin versions
use UTF-8, native Windows programs use the current Windows codepage.
The native build of Emacs cannot encode command lines it passes to
programs in anything but the current codepage, which is no good for
you if your files are encoded in UTF-8.

I suggest using the Cygwin build of Emacs instead.

> The find command works fine in a regular window shell as well as cygwin
> bash.

What is the "regular window shell"? If it's CMD, then I don't see how
it could work, since CMD does not support UTF-8 keyboard input.
Perhaps the Cygwin port of Grep transparently converts keyboard input
into UTF-8 or something.
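
The garbled pattern in the quoted error output is consistent with this kind of mismatch. A minimal Python sketch, assuming the Windows codepage on that machine is cp1250 (Central European; the thread does not say so, it is a guess based on the output), shows what happens when the UTF-8 bytes of "Usuń" are read as if they were cp1250 text. The helper name misread_utf8 is made up for this illustration and does not describe what Emacs or Bash do internally.

```python
# Sketch only: reproduces the mojibake seen in the quoted grep buffer.
# Assumption: the machine's Windows codepage is cp1250 (not stated in the thread).

def misread_utf8(text, codepage="cp1250"):
    """Encode text as UTF-8, then decode those bytes as a Windows codepage."""
    return text.encode("utf-8").decode(codepage)

garbled = misread_utf8("Usuń")
print(garbled)  # the two UTF-8 bytes of "ń" show up as two unrelated characters
```

Under that assumption the result matches the string in the Bash error message, which suggests that somewhere between Emacs and the Cygwin tools the pattern was encoded with one convention and decoded with the other.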