From: Andrzej Skiba
To: Eli Zaretskii
Cc: help-gnu-emacs@gnu.org
Newsgroups: gmane.emacs.help
Subject: Re: grep-find with Polish letters in Windows
Date: Wed, 15 Sep 2010 09:32:46 +0200
References: <83r5gw2ksa.fsf@gnu.org>

I switched to the Cygwin build of Emacs and everything is working as expected, so thank you for your help. I usually do my work on a Linux box, but they gave me a Windows machine at work recently. It's been such a pain to work with sometimes.

Thanks again,

Andrzej

PS. And yes, CMD is what I meant by "regular windows shell" :)

On Tue, Sep 14, 2010 at 9:21 PM, Eli Zaretskii wrote:

> > Date: Tue, 14 Sep 2010 13:02:30 +0200
> > From: Andrzej Skiba
> >
> > (defun as/grep-project (project pattern)
> >   (interactive "sProject: \nsPattern: ")
> >     (grep-find (concat
> >                 "/usr/bin/find /cygdrive/c/projects/"
> >                 project
> >                 " -type f "
> >                 " -not -name \"*.svn-base\" "
> >                 "-and -not -name \"*.tmp\" "
> >                 "-and -not -name \"*.log\" -print0 "
> >                 "| xargs -0 -e grep -U -n -s -F \""
> >                 pattern
> >                 "\"")))
> >
> > All works great until I try to search for a word with Polish letters (such
> > as ą, ś, ć, ł, ń etc.). The files are all UTF-8. When I run the command
> > searching for the string "Usuń" in project test, I get the following output
> > in the grep buffer:
> >
> > /usr/bin/find /cygdrive/c/projects/test -type f -not -name "*.svn-base" -and
> > -not -name "*.tmp" -and -not -name "*.log" -print0 | xargs -0 -e grep -U -n
> > -s -F "Usuń"
> > /usr/bin/bash: /usr/bin/find /cygdrive/c/projects/test -type f -not -name
> > "*.svn-base" -and -not -name "*.tmp" -and -not -name "*.log" -print0 | xargs
> > -0 -e grep -U -n -s -F "UsuĹ„": No such file or directory
> >
> > It runs fine with any input without Polish characters.
>
> You seem to be using the native build of Emacs in conjunction with
> Cygwin tools (Grep, Bash, etc.).  If so, this is asking for trouble,
> because there are subtle incompatibilities between Cygwin programs and
> native Windows programs.  I/O encoding is one of these areas: whereas
> the latest Cygwin versions use UTF-8, native Windows programs use the
> current Windows codepage.  The native build of Emacs cannot encode
> command lines it passes to programs in anything but the current
> codepage, which is no good for you if your files are encoded in UTF-8.
>
> I suggest using the Cygwin build of Emacs instead.
>
> > The find command works fine in a regular window shell as well as Cygwin
> > bash.
>
> What is the "regular window shell"?  If it's CMD, then I don't see how
> it could work, since CMD does not support UTF-8 keyboard input.
> Perhaps the Cygwin port of Grep transparently converts keyboard input
> into UTF-8 or something.
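The garbling Eli describes can be reproduced inside Emacs itself. In the grep output quoted above, the first line, which appears to be Emacs echoing the command it ran, shows "Usuń" intact, while the line printed by Cygwin Bash shows "UsuĹ„". That is exactly what you get when the UTF-8 bytes of "ń" (C5 84) are decoded with the Central European Windows codepage 1250. The snippet below is a minimal sketch under two assumptions the thread does not state: that the active Windows codepage was 1250, and that the misreading happened when native Emacs decoded Bash's UTF-8 output. The helper name `my/misread-utf8-as' is hypothetical.

```elisp
;; Minimal sketch, assuming the active Windows ANSI codepage is 1250
;; (Central European).  `my/misread-utf8-as' is a hypothetical helper
;; name used only for this illustration.
(defun my/misread-utf8-as (string codepage)
  "Return STRING as a program using CODEPAGE would read its UTF-8 bytes.
A Cygwin program such as Bash emits text as UTF-8; decoding those
bytes with a Windows codepage instead produces mojibake."
  ;; Step 1: produce the UTF-8 bytes a Cygwin program would emit
  ;;         ("ń" becomes the two bytes C5 84).
  ;; Step 2: decode those bytes as if they were in CODEPAGE
  ;;         (in cp1250, C5 is "Ĺ" and 84 is "„").
  (decode-coding-string (encode-coding-string string 'utf-8) codepage))

(my/misread-utf8-as "Usuń" 'cp1250)
;; => "UsuĹ„"
```

Evaluating the last form with C-x C-e in *scratch* should give the same string as the Bash line in the grep output. ASCII-only patterns come back unchanged, because ASCII bytes mean the same thing in UTF-8 and in codepage 1250, which matches the observation that searches without Polish characters work. The sketch covers only this decoding step; it does not model how the native build encodes the command line it hands to Bash.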