From: Eli Zaretskii <eliz@gnu.org>
To: bruce.connor.am@gmail.com
Cc: emacs-devel@gnu.org
Subject: Re: extending case-fold-search to remove nonspacing marks (diacritics etc.)
Date: Fri, 06 Feb 2015 09:35:24 +0200
Message-ID: <83ioffeblv.fsf@gnu.org>
In-Reply-To: <CAAdUY-+A6Rz=BSbbOfDxVKPznLwGbZmbi0JUi0EY4=MHrQaW6g@mail.gmail.com>

> Date: Thu, 5 Feb 2015 23:17:42 +0000
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> 
> As for answering your questions:
> 
> >> implementing it for users so it works like `case-fold-search' (you just
> >> set something in Customize and all search commands DWYM) seems much
> >> harder.
> 
> Doing it as part of Emacs is not terribly hard, but it has
> disadvantages. Namely, the case-fold-search machinery only relates one
> character to another character (1 to 1). At least for latin this would
> be enough a lot of the time, e.g. you can use it to relate "á" to "a".
> However, there's another way of writing "á" which takes two
> characters, and this situation can't be handled (AFAIK) by the
> case-fold-search machinery.

This just means you cannot implement that without changes to the C
level.  Changing the C code to lift the one-character restriction is
not very hard.
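
To make the one-character restriction concrete, here is a minimal Python
sketch (not Emacs code). The table `ONE_TO_ONE` and the function
`fold_one_to_one` are hypothetical names that only stand in for a
per-character mapping in the spirit of a case table:

```python
import unicodedata

PRECOMPOSED = "\u00e1"   # "á" as one code point: LATIN SMALL LETTER A WITH ACUTE
DECOMPOSED = "a\u0301"   # "á" as two code points: "a" + COMBINING ACUTE ACCENT

# Hypothetical one-to-one folding table (illustration only).
ONE_TO_ONE = {"\u00e1": "a"}


def fold_one_to_one(text):
    """Map each character independently; characters are never dropped or merged."""
    return "".join(ONE_TO_ONE.get(ch, ch) for ch in text)


print(fold_one_to_one(PRECOMPOSED) == "a")                      # True
print(fold_one_to_one(DECOMPOSED) == "a")                       # False: the mark survives
print(unicodedata.normalize("NFD", PRECOMPOSED) == DECOMPOSED)  # True
```

The second comparison fails because a per-character map has no way to
delete the combining mark that follows the base letter.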

> The bright side is that I think this two-char way of writing latin
> accents is much less common (not 100% sure though, it's hard to tell
> the difference). The downside is that I know nothing about other
> languages, so maybe using two chars to represent one char is the
> default behavior in some other languages?

It can be more than 2 characters, e.g. in scripts that use diacritics:
there could be more than one diacritic combined with one base character.

And then there are characters to be ignored, like ZWJ and bidi
directional controls.
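
As an illustration of both points, the following Python sketch lists the
code points behind one visible character and behind two invisible
controls. The helper name `describe` is an assumption made for this
example; the characters chosen (U+1EBF, ZWJ, RLM) are just convenient
samples:

```python
import unicodedata


def describe(text):
    """Return (code point, general category, name) for each character after NFD."""
    decomposed = unicodedata.normalize("NFD", text)
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in decomposed
    ]


# "ế" (U+1EBF) decomposes into a base letter plus two combining marks.
for row in describe("\u1ebf"):
    print(row)

# ZERO WIDTH JOINER and RIGHT-TO-LEFT MARK are format characters (category Cf).
for row in describe("\u200d\u200f"):
    print(row)
```

The first loop prints three rows (one base letter, two marks of category
Mn); the second prints two rows of category Cf.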

So I think ad-hoc rules like the above are not going to cut it.  We
must use the decomposed forms, whatever they are, and we should also
consult the character properties to ignore the ignorables.
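
A rough Python sketch of that approach, under stated assumptions: it is
not the Emacs implementation, `fold_for_search` is a hypothetical name,
and general category "Cf" is used only as an approximation of "ignorable"
because Python's `unicodedata` does not expose the Unicode
Default_Ignorable_Code_Point property:

```python
import unicodedata

# Categories dropped before comparison:
#   Mn = nonspacing marks (combining diacritics)
#   Cf = format characters (ZWJ, bidi controls); an approximation of "ignorable"
DROPPED_CATEGORIES = ("Mn", "Cf")


def fold_for_search(text):
    """Decompose canonically, then drop nonspacing marks and format characters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed if unicodedata.category(ch) not in DROPPED_CATEGORIES
    )


# Both spellings of "á" fold to the same string.
print(fold_for_search("\u00e1") == fold_for_search("a\u0301"))  # True
# A base letter with two diacritics folds to the bare letter.
print(fold_for_search("\u1ebf"))                                # e
# ZWJ and RLM are ignored.
print(fold_for_search("a\u200db\u200f"))                        # ab
```

Folding both the search string and the searched text this way makes the
comparison independent of how many code points encode each character.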

> >> Does anyone have suggestions? Maybe some defadvice magic?
> 
> You can use a defadvice around one of the isearch internal functions
> (check out the branch I mentioned) to implement something in elisp.
> And you can redefine the buffer's case-folding table and use that in
> the advice, but that will require that you generate the entire table.

Please don't kludge around the problem.  If it is important enough for
you to solve it, let's solve it as God intended.



