* diacritic-fold-search?
@ 2012-11-29 17:12 Lewis Perin
2012-11-29 18:19 ` diacritic-fold-search? Peter Dyballa
[not found] ` <mailman.14069.1354213153.855.help-gnu-emacs@gnu.org>
0 siblings, 2 replies; 13+ messages in thread
From: Lewis Perin @ 2012-11-29 17:12 UTC (permalink / raw)
To: help-gnu-emacs
Is there a way to search for a text string ignoring any diacritics,
e.g. capturing both “après” and “apres”?
/Lew
---
Lew Perin / perin@acm.org
http://babelcarp.org
^ permalink raw reply [flat|nested] 13+ messages in thread
* diacritic-fold-search?
@ 2012-11-29 17:20 Lewis Perin
2012-11-29 17:39 ` diacritic-fold-search? Drew Adams
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Lewis Perin @ 2012-11-29 17:20 UTC (permalink / raw)
To: help-gnu-emacs
Is there a way to search ignoring diacritics, e.g. capturing "apres"
both with and without an accent grave over the "e"?
/Lew
---
Lew Perin / perin@acm.org
http://babelcarp.org
* RE: diacritic-fold-search?
2012-11-29 17:20 diacritic-fold-search? Lewis Perin
@ 2012-11-29 17:39 ` Drew Adams
[not found] ` <mailman.14059.1354210783.855.help-gnu-emacs@gnu.org>
2012-11-30 14:13 ` diacritic-fold-search? Doug Lewan
2 siblings, 0 replies; 13+ messages in thread
From: Drew Adams @ 2012-11-29 17:39 UTC (permalink / raw)
To: 'Lewis Perin', help-gnu-emacs
> Is there a way to search ignoring diacritics, e.g. capturing "apres"
> both with and without an accent grave over the "e"?
Great question. I don't think so, but I'm guessing that lots of users could
make good use of such a feature!
Unless someone points out here that this is already possible, why don't you
submit an enhancement request for this feature (`M-x report-emacs-bug' is also
for enhancement requests): be able to toggle Isearch distinguishing certain sets
of similar chars (diacritics).
There could be predefined sets of equivalence classes of chars (e.g., the same
letter, modulo diacritical marks). And users could be able to customize these
classes.
Likewise, for punctuation chars that are very similar (in purpose/visually),
such as straight quotes and curly quotes, and no-break hyphen, hyphen, and the
various dashes.
Likewise, for whitespace chars other than the standard SPC, TAB, etc. For
whitespace, I believe there might be some handling of additional chars such as
no-break space, but what's needed, here too, is a simple way to toggle
distinguishing them on/off.
But your use case is the best one: be able to optionally ignore diacritical
marks when searching.
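
The equivalence-class idea can be sketched outside Emacs. The following is a minimal illustration in Python (chosen only because it is easy to run, not because Isearch works this way); `EQUIV_CLASSES`, `CLASS_OF`, and `folding_regex` are hypothetical names, and the classes listed are a small made-up sample rather than a complete set.

```python
import re

# Illustrative, user-editable equivalence classes (an assumption for this
# sketch, not an existing Emacs data structure).
EQUIV_CLASSES = [
    "a\u00e0\u00e1\u00e2\u00e3\u00e4\u00e5",   # a with common accents
    "e\u00e8\u00e9\u00ea\u00eb",               # e with common accents
    "'\u2018\u2019",                           # straight and curly single quotes
    "-\u2010\u2011\u2013\u2014",               # hyphen-minus, hyphen, no-break hyphen, dashes
    " \u00a0",                                 # space and no-break space
]

# Map every member of a class to the whole class it belongs to.
CLASS_OF = {ch: cls for cls in EQUIV_CLASSES for ch in cls}


def folding_regex(query):
    """Build a regex matching QUERY modulo the equivalence classes above."""
    parts = []
    for ch in query:
        cls = CLASS_OF.get(ch)
        if cls:
            # Replace the character by a class listing all its equivalents.
            parts.append("[" + "".join(re.escape(c) for c in cls) + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


rx = folding_regex("apres-midi")
print(bool(rx.search("l'apr\u00e8s\u2011midi")))
```

Because the classes live in a plain list, customizing them amounts to editing that list; a toggle would simply choose between this pattern and the literal query.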
* Re: diacritic-fold-search?
2012-11-29 17:12 diacritic-fold-search? Lewis Perin
@ 2012-11-29 18:19 ` Peter Dyballa
2012-11-29 18:29 ` diacritic-fold-search? Drew Adams
[not found] ` <mailman.14069.1354213153.855.help-gnu-emacs@gnu.org>
1 sibling, 1 reply; 13+ messages in thread
From: Peter Dyballa @ 2012-11-29 18:19 UTC (permalink / raw)
To: Lewis Perin; +Cc: help-gnu-emacs
On 29.11.2012 at 18:12, Lewis Perin wrote:
> Is there a way to search for a text string ignoring any diacritics,
> e.g. capturing both “après” and “apres”?
You can use a regular expression such as [éeè] …
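
As a rough illustration of the character-class approach (written in Python rather than Emacs Lisp, purely so it is easy to run; the sample text is made up):

```python
import re

# Hand-written character class: matches "apres" whether the "e" is plain,
# grave, or acute.
pattern = re.compile("apr[e\u00e8\u00e9]s")

# Made-up sample text containing both spellings.
sample = "apres le d\u00e9luge, apr\u00e8s la pluie"

matches = pattern.findall(sample)
print(matches)
```

Only the letter wrapped in brackets is matched loosely; every other letter in the pattern still has to match exactly.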
--
Greetings
Pete
"What do you think of Western Civilisation?"
"I think it would be a good idea!"
– Mohandas Karamchand Gandhi
* RE: diacritic-fold-search?
2012-11-29 18:19 ` diacritic-fold-search? Peter Dyballa
@ 2012-11-29 18:29 ` Drew Adams
0 siblings, 0 replies; 13+ messages in thread
From: Drew Adams @ 2012-11-29 18:29 UTC (permalink / raw)
To: 'Peter Dyballa', 'Lewis Perin'; +Cc: help-gnu-emacs
> > Is there a way to search for a text string ignoring any diacritics,
> > e.g. capturing “après” and “apres”?
>
> You can use a regular expression such as [éeè] …
Sure, but it would be good to be able to just tell Isearch to ignore all accents
(diacritical marks).
As opposed to having to type a regexp with each of the chars you want to
consider equivalent. Especially if you don't have a keyboard that makes
entering such chars trivial (vs using `insert-char' and providing Unicode names
or char codes, etc.).
Some users who are not in the habit of entering such chars will nevertheless
have a use case for searching text that contains them. Copying text off the Web
is one way that people end up with such text.
* Re: diacritic-fold-search?
[not found] ` <mailman.14069.1354213153.855.help-gnu-emacs@gnu.org>
@ 2012-11-29 18:37 ` Lewis Perin
0 siblings, 0 replies; 13+ messages in thread
From: Lewis Perin @ 2012-11-29 18:37 UTC (permalink / raw)
To: help-gnu-emacs
Peter Dyballa <Peter_Dyballa@Web.DE> writes:
>On 29.11.2012 at 18:12, Lewis Perin wrote:
>
>> Is there a way to search for a text string ignoring any diacritics,
>> e.g. capturing both “après” and “apres”?
>
>You can use a regular expression such as [éeè] …
Yes, I’m aware of that, but with more than one language it can get
tedious to try to think of all the possibilities and then type them,
especially if you’re searching for a text string or regex in which
multiple characters could be accented.
/Lew
---
Lew Perin / perin@acm.org
http://babelcarp.org
* Re: diacritic-fold-search?
[not found] ` <mailman.14059.1354210783.855.help-gnu-emacs@gnu.org>
@ 2012-11-29 18:59 ` Lewis Perin
2012-11-29 19:10 ` diacritic-fold-search? Drew Adams
2012-11-29 21:59 ` diacritic-fold-search? B. T. Raven
2012-11-30 18:31 ` diacritic-fold-search? Lewis Perin
1 sibling, 2 replies; 13+ messages in thread
From: Lewis Perin @ 2012-11-29 18:59 UTC (permalink / raw)
To: help-gnu-emacs
"Drew Adams" <drew.adams@oracle.com> writes:
>> Is there a way to search ignoring diacritics, e.g. capturing "apres"
>> both with and without an accent grave over the "e"?
>
>Great question. I don't think so, but I'm guessing that lots of users could
>make good use of such a feature!
>
>Unless someone points out here that this is already possible, why don't
>you submit an enhancement request for this feature (`M-x
>report-emacs-bug' is also for enhancement requests): be able to toggle
>Isearch distinguishing certain sets of similar chars (diacritics).
>
>There could be predefined sets of equivalence classes of chars (e.g.,
>the same letter, modulo diacritical marks). And users could be able to
>customize these classes.
>
>Likewise, for punctuation chars that are very similar (in
>purpose/visually), such as straight quotes and curly quotes, and
>no-break hyphen, hyphen, and the various dashes.
>
>Likewise, for whitespace chars other than the standard SPC, TAB, etc.
>For whitespace, I believe there might be some handling of additional
>chars such as no-break space, but what's needed, here too, is a simple
>way to toggle distinguishing them on/off.
>
>But your use case is the best one: be able to optionally ignore diacritical
>marks when searching.
It may not be totally irrelevant to note that search engines make
diacritic-agnostic search the default. And some Web browsers (Chrome
but not Firefox) do this for searches of a page they’re displaying.
/Lew
---
Lew Perin / perin@acm.org
http://babelcarp.org
* RE: diacritic-fold-search?
2012-11-29 18:59 ` diacritic-fold-search? Lewis Perin
@ 2012-11-29 19:10 ` Drew Adams
2012-11-29 19:31 ` diacritic-fold-search? Dani Moncayo
2012-11-29 21:59 ` diacritic-fold-search? B. T. Raven
1 sibling, 1 reply; 13+ messages in thread
From: Drew Adams @ 2012-11-29 19:10 UTC (permalink / raw)
To: 'Lewis Perin', help-gnu-emacs
> It may not be totally irrelevant to note that search engines make
> diacritic-agnostic search the default. And some Web browsers (Chrome
> but not Firefox) do this for searches of a page they’re displaying.
Another good point. Emacs has always made search case-insensitive by default
(`case-fold-search' is t), presumably for the same reason.
IMHO, it would be good for Emacs to handle this the same way it handles
case-sensitivity: off by default, with a simple toggle to turn it on.
`a' is no more the same as `A' than `è' is the same as `e'. But for many
(most?) search purposes it is handy to be able to treat them the same.
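
To make the analogy with case folding concrete, here is a small sketch in Python (not Emacs code). The `diacritic_fold` flag is hypothetical and only mirrors how `case-fold-search' behaves; `strip_diacritics`, `fold`, and `contains` are names invented for the example, and the defaults are an assumption.

```python
import unicodedata


def strip_diacritics(text):
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def fold(text, case_fold=True, diacritic_fold=False):
    """Apply whichever folds are switched on; the two toggles are independent."""
    if diacritic_fold:
        text = strip_diacritics(text)
    if case_fold:
        text = text.casefold()
    return text


def contains(needle, haystack, **toggles):
    """True if NEEDLE occurs in HAYSTACK once both are folded the same way."""
    return fold(needle, **toggles) in fold(haystack, **toggles)


print(contains("apres", "Apr\u00e8s la pluie"))                       # diacritics still distinguished
print(contains("apres", "Apr\u00e8s la pluie", diacritic_fold=True))  # diacritics ignored
```

Both folds are applied to the search string and the text alike, so neither toggle changes what the user typed; it only changes which differences count as a mismatch.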
* Re: diacritic-fold-search?
2012-11-29 19:10 ` diacritic-fold-search? Drew Adams
@ 2012-11-29 19:31 ` Dani Moncayo
0 siblings, 0 replies; 13+ messages in thread
From: Dani Moncayo @ 2012-11-29 19:31 UTC (permalink / raw)
To: Drew Adams; +Cc: help-gnu-emacs, Lewis Perin
>> It may not be totally irrelevant to note that search engines make
>> diacritic-agnostic search the default. And some Web browsers (Chrome
>> but not Firefox) do this for searches of a page they’re displaying.
>
> Another good point. Emacs has always made search case-insensitive by default
> (`case-fold-search' is t), presumably for the same reason.
>
> IMHO, it would be good for Emacs to handle this the same way it handles
> case-sensitivity: off by default, with a simple toggle to turn it on.
>
> `a' is no more the same as `A' than `è' is the same as `e'. But for many
> (most?) search purposes it is handy to be able to treat them the same.
+1
Definitely it would be a nice feature. Classify similar characters in
groups, and provide the ability to do a group- or class-based search.
--
Dani Moncayo
* Re: diacritic-fold-search?
2012-11-29 18:59 ` diacritic-fold-search? Lewis Perin
2012-11-29 19:10 ` diacritic-fold-search? Drew Adams
@ 2012-11-29 21:59 ` B. T. Raven
2012-11-30 15:29 ` diacritic-fold-search? Lewis Perin
1 sibling, 1 reply; 13+ messages in thread
From: B. T. Raven @ 2012-11-29 21:59 UTC (permalink / raw)
To: help-gnu-emacs
Here are some accent-folding data in a .js file that could probably be
put into some kind of data structure Emacs supports:
http://hex-machina.com/scripts/yui/3.3.0pr1/api/unicode-data-accentfold.js.html
See especially the link to the Unicode utilities at the last header comment.
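
A table of this kind does not have to be typed in by hand. As a sketch under stated assumptions (Python, its standard `unicodedata` module, and an arbitrarily chosen code-point range covering Latin-1 Supplement through Latin Extended-B), one could derive a base-letter-to-variants map from the Unicode canonical decompositions. `build_accent_table` is a hypothetical helper, and nothing here claims to reproduce the contents of the linked file.

```python
import unicodedata


def build_accent_table(first=0x00C0, last=0x024F):
    """Map each base letter to the precomposed letters that decompose to it."""
    table = {}
    for code in range(first, last + 1):
        ch = chr(code)
        decomposed = unicodedata.normalize("NFD", ch)
        base, marks = decomposed[0], decomposed[1:]
        # Keep only characters that are a base letter plus combining marks;
        # letters with no canonical decomposition are skipped.
        if marks and all(unicodedata.combining(m) for m in marks):
            table.setdefault(base, []).append(ch)
    return table


table = build_accent_table()
print("".join(table["e"]))
```

Letters with stacked marks, such as those used for Hanyu Pinyin tones, land under their base letter as well, because every trailing code point in their decomposition is a combining mark.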
Ed
> "Drew Adams" <drew.adams@oracle.com> writes:
>
>>> Is there a way to search ignoring diacritics, e.g. capturing "apres"
>>> both with and without an accent grave over the "e"?
>>
>> Great question. I don't think so, but I'm guessing that lots of users could
>> make good use of such a feature!
>>
>> Unless someone points out here that this is already possible, why don't
>> you submit an enhancement request for this feature (`M-x
>> report-emacs-bug' is also for enhancement requests): be able to toggle
>> Isearch distinguishing certain sets of similar chars (diacritics).
>>
>> There could be predefined sets of equivalence classes of chars (e.g.,
>> the same letter, modulo diacritical marks). And users could be able to
>> customize these classes.
>>
>> Likewise, for punctuation chars that are very similar (in
>> purpose/visually), such as straight quotes and curly quotes, and
>> no-break hyphen, hyphen, and the various dashes.
>>
>> Likewise, for whitespace chars other than the standard SPC, TAB, etc.
>> For whitespace, I believe there might be some handling of additional
>> chars such as no-break space, but what's needed, here too, is a simple
>> way to toggle distinguishing them on/off.
>>
>> But your use case is the best one: be able to optionally ignore diacritical
>> marks when searching.
>
> It may not be totally irrelevant to note that search engines make
> diacritic-agnostic search the default. And some Web browsers (Chrome
> but not Firefox) do this for searches of a page they’re displaying.
>
> /Lew
> ---
> Lew Perin / perin@acm.org
> http://babelcarp.org
>
* RE: diacritic-fold-search?
2012-11-29 17:20 diacritic-fold-search? Lewis Perin
2012-11-29 17:39 ` diacritic-fold-search? Drew Adams
[not found] ` <mailman.14059.1354210783.855.help-gnu-emacs@gnu.org>
@ 2012-11-30 14:13 ` Doug Lewan
2 siblings, 0 replies; 13+ messages in thread
From: Doug Lewan @ 2012-11-30 14:13 UTC (permalink / raw)
To: Lewis Perin, help-gnu-emacs@gnu.org
I agree this would be a nice feature, and not too hard to implement. (Tedious, possibly, but not hard.)
It's worth noting that searching inherits the input method of the buffer. So, you get the diacriticals more or less for free. (Searching for a word misspelled because of a diacritical, however, wouldn't get you what you want.)
,Doug
> -----Original Message-----
> From: help-gnu-emacs-bounces+dougl=shubertticketing.com@gnu.org
> [mailto:help-gnu-emacs-bounces+dougl=shubertticketing.com@gnu.org] On
> Behalf Of Lewis Perin
> Sent: Thursday, 2012 November 29 12:21
> To: help-gnu-emacs@gnu.org
> Subject: diacritic-fold-search?
>
> Is there a way to search ignoring diacritics, e.g. capturing "apres"
> both with and without an accent grave over the "e"?
>
> /Lew
> ---
> Lew Perin / perin@acm.org
> http://babelcarp.org
* Re: diacritic-fold-search?
2012-11-29 21:59 ` diacritic-fold-search? B. T. Raven
@ 2012-11-30 15:29 ` Lewis Perin
0 siblings, 0 replies; 13+ messages in thread
From: Lewis Perin @ 2012-11-30 15:29 UTC (permalink / raw)
To: help-gnu-emacs
"B. T. Raven" <btraven@nihilo.net> writes:
>Here are some accent-folding data in a .js file that could probably be
>put into some kind of data structure Emacs supports:
>
>http://hex-machina.com/scripts/yui/3.3.0pr1/api/unicode-data-accentfold.js.html
That’s nice. It even has Hanyu Pinyin characters!
>See especially the link to the Unicode utilities at the last header comment.
Indeed. Thanks!
/Lew
---
Lew Perin / perin@acm.org
http://babelcarp.org
* Re: diacritic-fold-search?
[not found] ` <mailman.14059.1354210783.855.help-gnu-emacs@gnu.org>
2012-11-29 18:59 ` diacritic-fold-search? Lewis Perin
@ 2012-11-30 18:31 ` Lewis Perin
1 sibling, 0 replies; 13+ messages in thread
From: Lewis Perin @ 2012-11-30 18:31 UTC (permalink / raw)
To: help-gnu-emacs
"Drew Adams" <drew.adams@oracle.com> writes:
>> Is there a way to search ignoring diacritics, e.g. capturing "apres"
>> both with and without an accent grave over the "e"?
>
>Great question. I don't think so, but I'm guessing that lots of users
>could make good use of such a feature!
>
>Unless someone points out here that this is already possible, why don't
>you submit an enhancement request for this feature (`M-x
>report-emacs-bug' is also for enhancement requests): be able to toggle
>Isearch distinguishing certain sets of similar chars (diacritics).
I’ve just done this. We’ll see what the maintainers think of this.
/Lew
---
Lew Perin / perin@acm.org
http://babelcarp.org