all messages for Emacs-related lists mirrored at yhetil.org
From: Eli Zaretskii <eliz@gnu.org>
To: Juri Linkov <juri@jurta.org>
Cc: perin@panix.com, 13041@debbugs.gnu.org, perin@acm.org
Subject: bug#13041: 24.2; diacritic-fold-search
Date: Sat, 01 Dec 2012 10:32:35 +0200
Message-ID: <83fw3qtboc.fsf@gnu.org>
In-Reply-To: <87hao6zko4.fsf@mail.jurta.org>

> From: Juri Linkov <juri@jurta.org>
> Date: Sat, 01 Dec 2012 02:27:40 +0200
> Cc: 13041@debbugs.gnu.org, perin@acm.org
> 
> > In the last message of that thread, you say “Provided it doesn’t make
> > the search slow, it would be nice to add it to Emacs activating on
> > some user settings.”  Do you remember if that technique turned out to
> > be tolerably speedy?
> 
> Yes, I have no problems with the speed.  The problem is how to
> disable this feature when it is active.  We need a special key
> to toggle it in Isearch.  One variant is M-s ~ where the easy-to-type
> TILDE character represents diacritics.  Also it's unclear whether the
> Isearch prompt should indicate its active state as e.g.

I don't understand why this thread is talking only about Latin
characters with diacritics.  That is a special case of what Unicode
calls "compatibility equivalence" (q.v.).  For example, even in
Latin-script environments, don't you want to find "sniﬀ" when
searching for "sniff", and vice versa?  And there are similar issues
in many non-Latin scripts.

The decomposition of a character such as 'ﬀ' (U+FB00) is given by the
Unicode Character Database (the file UnicodeData.txt), for example:

  FB00;LATIN SMALL LIGATURE FF;Ll;0;L;<compat> 0066 0066;;;;N;;;;;
                                      ^^^^^^^^^^^^^^^^^^

(66 hex, or 102 decimal, is the codepoint of 'f').
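To make the record format concrete, here is a minimal Python sketch
(not Emacs code) that splits a UnicodeData.txt record like the one
above on semicolons and decodes field 5, the decomposition mapping: an
optional <tag> followed by hexadecimal code points.  The function name
parse_decomposition is hypothetical.

```python
def parse_decomposition(record):
    """Return (tag, code_points) for one UnicodeData.txt record.

    Field 5 (0-based) holds the decomposition mapping: an optional
    <tag> such as <compat>, then space-separated hex code points.
    The tag is None for canonical decompositions.
    """
    fields = record.split(";")
    mapping = fields[5].split()
    tag = None
    if mapping and mapping[0].startswith("<"):
        tag = mapping[0].strip("<>")
        mapping = mapping[1:]
    return tag, [int(cp, 16) for cp in mapping]


line = "FB00;LATIN SMALL LIGATURE FF;Ll;0;L;<compat> 0066 0066;;;;N;;;;;"
print(parse_decomposition(line))  # ('compat', [102, 102])
```

The result has the same content as the Emacs value (compat 102 102),
just as a Python tuple instead of a Lisp list.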

Emacs already supports these decomposition properties.  E.g.:

  (get-char-code-property ?ﬀ 'decomposition) => (compat 102 102)

Another example, closer to the issue that triggered this thread:

  (get-char-code-property ?è 'decomposition) => (101 768)

(If you want to understand why the previous example included "compat"
in the result, while this one doesn't, read more about Unicode
normalization forms.  The distinction is irrelevant for the current
discussion.)

Using these properties, every search string can be converted to a
sequence of non-decomposable characters (this process is recursive,
because the 'decomposition' property can use characters that
themselves are decomposable).  If the user wants to ignore diacritics,
then the diacritics should be dropped from the decomposition sequence
before starting the search.  E.g., for the decomposition of è above,
we drop the 768 and are left with 101, which is 'e'.  The search
should then apply the same decomposition transformation to the text
being searched, so that both sides are compared in decomposed form.
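The procedure above can be sketched in Python with the standard
unicodedata module.  This is an illustration under stated assumptions,
not a proposed Emacs implementation: it treats any character with a
nonzero canonical combining class as a diacritic, follows both
canonical and <compat> mappings, ignores case folding, and keeps an
index map so that a match found in the folded text can be reported as
a range in the original text.  All function names are hypothetical.

```python
import unicodedata


def full_decompose(ch):
    """Recursively expand CH using its Unicode decomposition mapping.

    Both canonical and <compat>-tagged mappings are followed, so
    U+FB00 becomes "ff" and U+00E8 becomes "e" + U+0300.
    """
    raw = unicodedata.decomposition(ch)
    if not raw:
        return ch  # not decomposable
    parts = [p for p in raw.split() if not p.startswith("<")]
    return "".join(full_decompose(chr(int(p, 16))) for p in parts)


def is_diacritic(ch):
    # Assumption: any character with a nonzero canonical combining
    # class counts as a diacritic to be dropped.
    return unicodedata.combining(ch) != 0


def fold(text):
    """Return (folded, origin), where origin[i] is the index in TEXT
    of the character that produced folded[i]."""
    folded, origin = [], []
    for i, ch in enumerate(text):
        for d in full_decompose(ch):
            if not is_diacritic(d):
                folded.append(d)
                origin.append(i)
    return "".join(folded), origin


def folding_search(needle, haystack):
    """Return (start, end) of the first diacritic-insensitive match of
    NEEDLE in HAYSTACK, as indices into HAYSTACK, or None."""
    n, _ = fold(needle)
    h, origin = fold(haystack)
    if not n:
        return None
    pos = h.find(n)
    if pos < 0:
        return None
    start = origin[pos]
    end = origin[pos + len(n) - 1] + 1
    # Extend the match over combining marks that follow it in HAYSTACK.
    while end < len(haystack) and all(
            is_diacritic(d) for d in full_decompose(haystack[end])):
        end += 1
    return start, end


print(folding_search("cafe", "Un caf\u00e9 noir"))  # (3, 7)
print(folding_search("sniff", "sni\ufb00"))         # (0, 4)
```

The origin map matters because folding changes lengths: è loses its
accent and ﬀ becomes two characters, so positions found in the folded
text must be translated back before the match can be highlighted.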

This would be the most general way of solving this issue, one that is
limited neither to diacritics nor to Latin scripts.  Doing that would
also move Emacs closer to the goal of being Unicode compatible, since
the Unicode Standard requires support for this.

By contrast, building and using custom databases of equivalences that
are limited to diacritics in Latin scripts does not move Emacs towards
that goal.  It's just a hack, IMO.






