unofficial mirror of help-gnu-emacs@gnu.org
* `grep' command on MS Windows with Cygwin, looking for text with Unicode chars
@ 2018-06-13 18:23 Drew Adams
  2018-06-13 18:40 ` Óscar Fuentes
  2018-06-13 19:08 ` Eli Zaretskii
  0 siblings, 2 replies; 15+ messages in thread
From: Drew Adams @ 2018-06-13 18:23 UTC
  To: help-gnu-emacs@gnu.org List

Is there a simple way to use `M-x grep' (e.g., giving it
some switches or escape chars or replacing them with hex
escapes or...) to search for some text that includes
non-ASCII Unicode chars?

[I'm using (an old) Cygwin `grep'.  Dunno whether that
matters.]

I tried to look for "‘%s’" (curly-quote) in the Emacs
source code.

E.g., in `info.el' we now have this:
(format "Index for ‘%s’" string) instead of this:
(format "Index for `%s'" string)

I wanted to see if this kind of change was spread to
other files.

I tried things like "\\x2018%s\\x2019", with no luck.
Is there a simple approach that uses only `M-x grep'
and not, say, piping the result of iconv to grep?

I ended up doing the search using Icicles, but I'd
like to be able to do such a search also using
just `grep' (or `rgrep' etc.).




* Re: `grep' command on MS Windows with Cygwin, looking for text with Unicode chars
  2018-06-13 18:23 Drew Adams
@ 2018-06-13 18:40 ` Óscar Fuentes
  2018-06-13 19:09   ` Drew Adams
  2018-06-13 19:16   ` Noam Postavsky
  2018-06-13 19:08 ` Eli Zaretskii
  1 sibling, 2 replies; 15+ messages in thread
From: Óscar Fuentes @ 2018-06-13 18:40 UTC
  To: help-gnu-emacs

Drew Adams <drew.adams@oracle.com> writes:

> Is there a simple way to use `M-x grep' (e.g., giving it
> some switches or escape chars or replacing them with hex
> escapes or...) to search for some text that includes
> non-ASCII Unicode chars?
>
> [I'm using (an old) Cygwin `grep'.  Dunno whether that
> matters.]
>
> I tried to look for "‘%s’" (curly-quote) in the Emacs
> source code.
>
> E.g., in `info.el' we now have this:
> (format "Index for ‘%s’" string) instead of this:
> (format "Index for `%s'" string)
>
> I wanted to see if this kind of change was spread to
> other files.
>
> I tried things like "\\x2018%s\\x2019", with no luck.
> Is there a simple approach that uses only `M-x grep'
> and not, say, piping the result of iconv to grep?
>
> I ended up doing the search using Icicles, but I'd
> like to be able to do such a search also using
> just `grep' (or `rgrep' etc.).

If there is a method, I'd like to know as well. This is the main reason
why I don't use Unicode in my source files.

(I investigated the matter last year, on this mailing list and on the
Internet. My conclusion was negative, at least for UTF-8).





* Re: `grep' command on MS Windows with Cygwin, looking for text with Unicode chars
  2018-06-13 18:23 Drew Adams
  2018-06-13 18:40 ` Óscar Fuentes
@ 2018-06-13 19:08 ` Eli Zaretskii
  2018-06-13 19:43   ` Tomas Nordin
  1 sibling, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2018-06-13 19:08 UTC
  To: help-gnu-emacs

> Date: Wed, 13 Jun 2018 11:23:47 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
> 
> Is there a simple way to use `M-x grep' (e.g., giving it
> some switches or escape chars or replacing them with hex
> escapes or...) to search for some text that includes
> non-ASCII Unicode chars?

Not on MS-Windows with the native Windows build of Emacs, AFAIK.  I
think you will need a Cygwin build of Emacs for that, and perhaps also
a newer Cygwin Grep.

Emacs on Windows cannot invoke subprograms with command-line arguments
encoded in anything but the system codepage.  And Windows doesn't
support UTF-8 as the system codepage.  Sorry.
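
A possible workaround, not suggested in the thread and untested on the
setup described there: keep the command line itself pure ASCII and let
the shell, or a pattern file, supply the UTF-8 bytes. The sketch below
assumes that a POSIX shell (for example Cygwin's sh or bash) runs the
command and that the files are UTF-8 encoded. The sample directory and
file names are hypothetical. `-F`, `-l` and `-f` are standard grep options.

```shell
#!/bin/sh
# Sketch: search for ‘%s’ (U+2018 ... U+2019) with an ASCII-only command line.
# Assumptions: a POSIX shell runs the command and the files are UTF-8.

# Hypothetical sample files standing in for the Emacs sources.
dir=$(mktemp -d)
printf '(format "Index for \342\200\230%%s\342\200\231" string)\n' > "$dir/new.el"
printf '(format "Index for `%%s'\''" string)\n' > "$dir/old.el"

# UTF-8 encodes U+2018 as bytes E2 80 98 (octal 342 200 230)
# and U+2019 as bytes E2 80 99 (octal 342 200 231).
pattern=$(printf '\342\200\230%%s\342\200\231')

# Variant 1: the shell expands the octal escapes before starting grep;
# -F makes grep treat the pattern as a literal string.
hits_arg=$(cd "$dir" && LC_ALL=C grep -lF -- "$pattern" *.el)

# Variant 2: store the pattern in a file and pass it with -f, so that
# grep's own command line contains ASCII only.
printf '%s\n' "$pattern" > "$dir/pattern.txt"
hits_file=$(cd "$dir" && LC_ALL=C grep -lF -f pattern.txt *.el)

echo "$hits_arg"
echo "$hits_file"
rm -rf "$dir"
```

Whether variant 1 helps under `M-x grep' depends on which shell Emacs
uses to run the command and on how that shell hands arguments to grep.
Variant 2 avoids passing non-ASCII arguments altogether, at the cost of
maintaining a pattern file.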




* RE: `grep' command on MS Windows with Cygwin, looking for text with Unicode chars
  2018-06-13 18:40 ` Óscar Fuentes
@ 2018-06-13 19:09   ` Drew Adams
  2018-06-13 19:16   ` Noam Postavsky
  1 sibling, 0 replies; 15+ messages in thread
From: Drew Adams @ 2018-06-13 19:09 UTC
  To: Óscar Fuentes, help-gnu-emacs

> If there is a method, I'd like to know as well. This is the main reason
> why I don't use Unicode in my source files.
> 
> (I investigated the matter last year, on this mailing list and on the
> Internet. My conclusion was negative, at least for UTF-8).

Thanks, Oscar.  I searched a bit too.




* RE: `grep' command on MS Windows with Cygwin, looking for text with Unicode chars
       [not found] ` <<83y3fi33or.fsf@gnu.org>
@ 2018-06-13 19:16   ` Drew Adams
  2018-06-13 19:42     ` Eli Zaretskii
  2018-06-13 23:09     ` Bob Proulx
  0 siblings, 2 replies; 15+ messages in thread
From: Drew Adams @ 2018-06-13 19:16 UTC
  To: Eli Zaretskii, help-gnu-emacs

> > Is there a simple way to use `M-x grep' (e.g., giving it
> > some switches or escape chars or replacing them with hex
> > escapes or...) to search for some text that includes
> > non-ASCII Unicode chars?
> 
> Not on MS-Windows with the native Windows build of Emacs, AFAIK.  I
> think you will need a Cygwin build of Emacs for that, and perhaps also
> a newer Cygwin Grep.
> 
> Emacs on Windows cannot invoke subprograms with command-line arguments
> encoded in anything but the system codepage.  And Windows doesn't
> support UTF-8 as the system codepage.  Sorry.

Thanks for confirming, Eli.

I wonder if the Emacs doc should mention this, not supposing that
users will figure it out.  Since Emacs otherwise supports Unicode
so well now, it will be natural that some users will mistakenly
expect `M-x grep' to DTRT here.




* Re: `grep' command on MS Windows with Cygwin, looking for text with Unicode chars
  2018-06-13 18:40 ` Óscar Fuentes
  2018-06-13 19:09   ` Drew Adams
@ 2018-06-13 19:16   ` Noam Postavsky
  2018-06-13 19:22     ` Noam Postavsky
  2018-06-13 19:26     ` Drew Adams
  1 sibling, 2 replies; 15+ messages in thread
From: Noam Postavsky @ 2018-06-13 19:16 UTC
  To: Óscar Fuentes; +Cc: Help Gnu Emacs mailing list

On 13 June 2018 at 14:40, Óscar Fuentes <ofv@wanadoo.es> wrote:
> Drew Adams <drew.adams@oracle.com> writes:
>
>> Is there a simple way to use `M-x grep' (e.g., giving it
>> some switches or escape chars or replacing them with hex
>> escapes or...) to search for some text that includes
>> non-ASCII Unicode chars?

> If there is a method, I'd like to know as well. This is the main reason
> why I don't use Unicode in my source files.

This seems to do the right thing with the grep I have installed:

    grep "[^[:cntrl:][:print:]]" *.el

According to the GNU grep manual [:cntrl:][:print:] looks equivalent
to Emacs' [:ascii:], in the C locale.
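
A minimal sketch of that check, assuming a grep that honors `LC_ALL=C`
and treats every byte as a character in that locale. The sample file
and its contents are made up for illustration.

```shell
#!/bin/sh
# Sketch: count the lines that contain any byte outside ASCII.
# In the C locale, bytes 0x80-0xFF are neither [:cntrl:] nor [:print:],
# so the negated bracket expression matches exactly those bytes.
tmp=$(mktemp)
printf 'plain ASCII line\n'                     >  "$tmp"
printf 'curly \342\200\230quote\342\200\231\n'  >> "$tmp"
printf 'tab\tis a control character\n'          >> "$tmp"

non_ascii_lines=$(LC_ALL=C grep -c '[^[:cntrl:][:print:]]' "$tmp")
echo "$non_ascii_lines"
rm -f "$tmp"
```

Only the line with the UTF-8 curly quotes should be counted. The tab
line is skipped because a tab is a control character.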

The grep I have installed doesn't seem to support anything but the C
locale anyway (at least, setting LANG isn't needed). It identifies
itself in the --help output as:

GNU grep version 2.0d
Win32 port with subdirectory search created by Tim Charron
(full source available at http://www.interlog.com/~tcharron/grep.html)

That web page indicates it's from 2001, but works well enough that
I've never bothered to change it. Not sure how Cygwin grep would act.




* Re: `grep' command on MS Windows with Cygwin, looking for text with Unicode chars
  2018-06-13 19:16   ` Noam Postavsky
@ 2018-06-13 19:22     ` Noam Postavsky
  2018-06-13 19:28       ` Drew Adams
  2018-06-13 19:26     ` Drew Adams
  1 sibling, 1 reply; 15+ messages in thread
From: Noam Postavsky @ 2018-06-13 19:22 UTC
  To: Óscar Fuentes; +Cc: Help Gnu Emacs mailing list

On 13 June 2018 at 15:16, Noam Postavsky <npostavs@gmail.com> wrote:
> On 13 June 2018 at 14:40, Óscar Fuentes <ofv@wanadoo.es> wrote:
>> Drew Adams <drew.adams@oracle.com> writes:
>>
>>> Is there a simple way to use `M-x grep' (e.g., giving it
>>> some switches or escape chars or replacing them with hex
>>> escapes or...) to search for some text that includes
>>> non-ASCII Unicode chars?
>
>> If there is a method, I'd like to know as well. This is the main reason
>> why I don't use Unicode in my source files.
>
> This seems to do the right thing with the grep I have installed:
>
>     grep "[^[:cntrl:][:print:]]" *.el

Oh, just realized you probably meant "text which includes particular
non-ASCII characters", not "text which includes any non-ASCII
characters".

Never mind me then.




* RE: `grep' command on MS Windows with Cygwin, looking for text with Unicode chars
  2018-06-13 19:16   ` Noam Postavsky
  2018-06-13 19:22     ` Noam Postavsky
@ 2018-06-13 19:26     ` Drew Adams
  1 sibling, 0 replies; 15+ messages in thread
From: Drew Adams @ 2018-06-13 19:26 UTC
  To: Noam Postavsky, Óscar Fuentes; +Cc: Help Gnu Emacs mailing list

> >> Is there a simple way to use `M-x grep' (e.g., giving it
> >> some switches or escape chars or replacing them with hex
> >> escapes or...) to search for some text that includes
> >> non-ASCII Unicode chars?
> 
> > If there is a method, I'd like to know as well. This is the main
> > reason why I don't use Unicode in my source files.
> 
> This seems to do the right thing with the grep I have installed:
> 
>     grep "[^[:cntrl:][:print:]]" *.el
> 
> According to the GNU grep manual [:cntrl:][:print:] looks equivalent
> to Emacs' [:ascii:], in the C locale.
> 
> The grep I have installed doesn't seem to support anything but the C
> locale anyway (at least, setting LANG isn't needed). It identifies
> itself in the --help output as:
> 
> GNU grep version 2.0d
> Win32 port with subdirectory search created by Tim Charron
> (full source available at http://www.interlog.com/~tcharron/grep.html)
> 
> That web page indicates it's from 2001, but works well enough that
> I've never bothered to change it. Not sure how Cygwin grep would act.

Interesting; thanks.

With my (old) Cygwin grep, in the `lisp' directory, that shows 4 hits,
3 in char-fold.el and one in mpc.el.  The first char-fold.el hit shows
matches for curly quotes, for example.  But I guess that won't help me
find just curly quotes. ;-)

In each case, the grep hits show octal escapes instead of Unicode-char glyphs.
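
For reference, such octal escapes presumably correspond to the UTF-8
bytes of the characters, shown raw because the output is not decoded as
UTF-8. A small sketch that prints those bytes for a given character
follows. The helper name `utf8_octal` is made up for illustration, and
the script is assumed to be saved and run as UTF-8.

```shell
#!/bin/sh
# Print the bytes of a string as octal values separated by single spaces.
utf8_octal() {
    printf '%s' "$1" | od -An -to1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

left=$(utf8_octal '‘')    # U+2018 LEFT SINGLE QUOTATION MARK
right=$(utf8_octal '’')   # U+2019 RIGHT SINGLE QUOTATION MARK
echo "U+2018: $left"
echo "U+2019: $right"
```

A hit displayed as \342\200\230 would therefore be a left curly quote,
and \342\200\231 a right curly quote.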




* RE: `grep' command on MS Windows with Cygwin, looking for text with Unicode chars
  2018-06-13 19:22     ` Noam Postavsky
@ 2018-06-13 19:28       ` Drew Adams
  0 siblings, 0 replies; 15+ messages in thread
From: Drew Adams @ 2018-06-13 19:28 UTC
  To: Noam Postavsky, Óscar Fuentes; +Cc: Help Gnu Emacs mailing list

> >>> Is there a simple way to use `M-x grep' (e.g., giving it
> >>> some switches or escape chars or replacing them with hex
> >>> escapes or...) to search for some text that includes
> >>> non-ASCII Unicode chars?
> >
> >> If there is a method, I'd like to know as well. This is the main
> >> reason why I don't use Unicode in my source files.
> >
> > This seems to do the right thing with the grep I have installed:
> >     grep "[^[:cntrl:][:print:]]" *.el
> 
> Oh, just realized you probably meant "text which includes particular
> non-ASCII characters", not "text which includes any non-ASCII
> characters".

Yes, I did.  But that's OK.  I learned something useful.

> Never mind me then.

Nope, sorry; can't do that. ;-)




* Re: `grep' command on MS Windows with Cygwin, looking for text with Unicode chars
  2018-06-13 19:16   ` `grep' command on MS Windows with Cygwin, looking for text with Unicode chars Drew Adams
@ 2018-06-13 19:42     ` Eli Zaretskii
  2018-06-13 23:09     ` Bob Proulx
  1 sibling, 0 replies; 15+ messages in thread
From: Eli Zaretskii @ 2018-06-13 19:42 UTC
  To: help-gnu-emacs

> Date: Wed, 13 Jun 2018 12:16:09 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
> 
> > Emacs on Windows cannot invoke subprograms with command-line arguments
> > encoded in anything but the system codepage.  And Windows doesn't
> > support UTF-8 as the system codepage.  Sorry.
> 
> Thanks for confirming, Eli.
> 
> I wonder if the Emacs doc should mention this, not supposing that
> users will figure it out.

Mention where?  This is a corner use case, a subtlety, and this kind
of factoid doesn't belong in the docs, as it will never be found
there.  Something for the FAQ or the wiki, perhaps.




* Re: `grep' command on MS Windows with Cygwin, looking for text with Unicode chars
  2018-06-13 19:08 ` Eli Zaretskii
@ 2018-06-13 19:43   ` Tomas Nordin
  2018-06-14  2:33     ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: Tomas Nordin @ 2018-06-13 19:43 UTC
  To: Eli Zaretskii, help-gnu-emacs

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Wed, 13 Jun 2018 11:23:47 -0700 (PDT)
>> From: Drew Adams <drew.adams@oracle.com>
>> 
>> Is there a simple way to use `M-x grep' (e.g., giving it
>> some switches or escape chars or replacing them with hex
>> escapes or...) to search for some text that includes
>> non-ASCII Unicode chars?
>
> Not on MS-Windows with the native Windows build of Emacs, AFAIK.  I
> think you will need a Cygwin build of Emacs for that, and perhaps also
> a newer Cygwin Grep.
>
> Emacs on Windows cannot invoke subprograms with command-line arguments
> encoded in anything but the system codepage.  And Windows doesn't
> support UTF-8 as the system codepage.  Sorry.

Sorry for a naive question, but the "system codepage" or "current system
codepage" wording is used now and then in relation to non-ascii problems
on Windows. If on Windows, what is a good way to figure out the
current system codepage?

--
Tomas




* Re: `grep' command on MS Windows with Cygwin, looking for text with Unicode chars
  2018-06-13 19:16   ` `grep' command on MS Windows with Cygwin, looking for text with Unicode chars Drew Adams
  2018-06-13 19:42     ` Eli Zaretskii
@ 2018-06-13 23:09     ` Bob Proulx
  2018-06-13 23:37       ` Drew Adams
  1 sibling, 1 reply; 15+ messages in thread
From: Bob Proulx @ 2018-06-13 23:09 UTC
  To: help-gnu-emacs

Drew Adams wrote:
> Since Emacs otherwise supports Unicode so well now, it will be
> natural that some users will mistakenly expect `M-x grep' to DTRT
> here.

Depending upon what you are wanting to do I might mention that emacs
dired can be very powerful here.

I am often doing C-x d to start dired.  Then using:

  C-x d RET               ;; open directory dired
  % g REGEXP <RET>        ;; mark all files containing pattern
  Q REGEXP <RET> TO <RET> ;; Perform query-replace-regexp on marked files

If nothing else, % g is a native Emacs way to search the contents of
files, and it should handle Unicode as well as it handles other tasks.

And for work within a single buffer there is M-x occur too.

Bob




* RE: `grep' command on MS Windows with Cygwin, looking for text with Unicode chars
  2018-06-13 23:09     ` Bob Proulx
@ 2018-06-13 23:37       ` Drew Adams
  0 siblings, 0 replies; 15+ messages in thread
From: Drew Adams @ 2018-06-13 23:37 UTC
  To: Bob Proulx, help-gnu-emacs

> Depending upon what you are wanting to do I might mention that emacs
> dired can be very powerful here.
> 
> I am often doing C-x d to start dired.  Then using:
> 
>   C-x d RET               ;; open directory dired
>   % g REGEXP <RET>        ;; mark all files containing pattern
>   Q REGEXP <RET> TO <RET> ;; Perform query-replace-regexp on marked files
> 
> If nothing else, % g is a native Emacs way to search the contents of
> files, and it should handle Unicode as well as it handles other tasks.

Excellent reminder about that, Bob.

That was the same reason I used Icicles search - let Emacs itself
do the searching/matching.  I still would like to be able to also
use `M-x grep'.




* Re: `grep' command on MS Windows with Cygwin, looking for text with Unicode chars
  2018-06-13 19:43   ` Tomas Nordin
@ 2018-06-14  2:33     ` Eli Zaretskii
  2018-06-14  2:40       ` Eli Zaretskii
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Zaretskii @ 2018-06-14  2:33 UTC
  To: help-gnu-emacs

> From: Tomas Nordin <tomasn@posteo.net>
> Date: Wed, 13 Jun 2018 21:43:52 +0200
> 
> Sorry for a naive question, but the "system codepage" or "current system
> codepage" wording is used now and then in relation to non-ascii problems
> on Windows. If on Windows, what is a good way to figure out the
> current system codepage?

w32-system-coding-system is one variable that will tell you that.




* Re: `grep' command on MS Windows with Cygwin, looking for text with Unicode chars
  2018-06-14  2:33     ` Eli Zaretskii
@ 2018-06-14  2:40       ` Eli Zaretskii
  0 siblings, 0 replies; 15+ messages in thread
From: Eli Zaretskii @ 2018-06-14  2:40 UTC
  To: help-gnu-emacs

> Date: Thu, 14 Jun 2018 05:33:48 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> 
> > Sorry for a naive question, but the "system codepage" or "current system
> > codepage" wording is used now and then in relation to non-ascii problems
> > on Windows. If on Windows, what is a good way to figure out the
> > current system codepage?
> 
> w32-system-coding-system is one variable that will tell you that.

And w32-ansi-code-page is another.



