From: Eli Zaretskii <eliz@gnu.org>
To: Robert Pluim <rpluim@gmail.com>
Cc: emacs-devel@gnu.org
Subject: Re: ignoring combining diacritics in isearch
Date: Wed, 23 Nov 2022 20:02:52 +0200
Message-ID: <83wn7ly2er.fsf@gnu.org>
In-Reply-To: <878rk1fuo2.fsf@gmail.com> (message from Robert Pluim on Wed, 23 Nov 2022 18:27:25 +0100)

> From: Robert Pluim <rpluim@gmail.com>
> Date: Wed, 23 Nov 2022 18:27:25 +0100
> 
> Over on Stack Overflow, someone has been trying to get char-folded
> isearch working for Arabic, and has been having some issues because
> char-folding only works for equivalent characters, not base characters
> followed by combining characters. So, e.g., searching for 'ee' when the
> buffer contains
> 
>     éé
> 
> (thatʼs 'e' followed by COMBINING ACUTE ACCENT) fails.
> 
> The following patch fixes that, but itʼs a bit of a sledgehammer (the
> "\\c^*" bit probably needs to be configurable, because there are
> diacritic-like codepoints in Arabic that are not combining, such as
> U+0640 ARABIC TATWEEL)

Yes, this is definitely not the way.  There are many more "foldings" that
Latin scripts don't know about.  For example, it should be possible to fold
the initial, medial, and final forms of letters that exist in some scripts
(including Arabic).
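
To illustrate the positional-forms case outside Emacs, here is a minimal
Python sketch. It assumes the forms in question are the Unicode Arabic
presentation-form code points, which carry compatibility decompositions to
the base letter, so NFKC normalization folds them. The names BEH_FORMS and
fold_presentation_forms are hypothetical, and this is not how Emacs does or
would do it.

```python
import unicodedata

# Presentation forms of ARABIC LETTER BEH (U+0628) in the
# Arabic Presentation Forms-B block.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def fold_presentation_forms(text):
    """Fold compatibility characters, including positional forms, via NFKC."""
    return unicodedata.normalize("NFKC", text)


for position, form in BEH_FORMS.items():
    folded = fold_presentation_forms(form)
    print(position, unicodedata.name(form), "->", unicodedata.name(folded))
```

Note that NFKC also folds other compatibility characters (ligatures,
fullwidth forms, and so on), which may be broader than a search feature
wants.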

I think we've all but reached the limit to which this quasi-folding via
regexps can be stretched.  Writing regexps by hand or semi-mechanically based
on Unicode properties can only go so far.  _Real_ character folding cannot
work this way.  We should work on infrastructure for folding text for search
purposes, and then we can build features on top of that.
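
As a rough illustration of what "folding text for search purposes" could
mean, here is a minimal Python sketch, not a proposal for the Emacs
implementation.  It assumes folding means canonical decomposition (NFD)
followed by dropping nonspacing marks (general category Mn), and it keeps an
offset map so a match in the folded text can be reported at its position in
the original.  The names fold_for_search and folded_find are hypothetical.

```python
import unicodedata


def fold_for_search(text):
    """Return (folded, offsets).

    folded is text with nonspacing marks removed after canonical
    decomposition; offsets[i] is the index in text that produced folded[i].
    """
    folded = []
    offsets = []
    for index, char in enumerate(text):
        for piece in unicodedata.normalize("NFD", char):
            if unicodedata.category(piece) == "Mn":
                continue  # drop combining diacritics
            folded.append(piece)
            offsets.append(index)
    return "".join(folded), offsets


def folded_find(needle, haystack):
    """Return the index in haystack of the first folded match, or -1."""
    folded_needle, _ = fold_for_search(needle)
    if not folded_needle:
        return -1
    folded_haystack, offsets = fold_for_search(haystack)
    position = folded_haystack.find(folded_needle)
    return offsets[position] if position >= 0 else -1


decomposed = "e\u0301e\u0301"   # 'e' + COMBINING ACUTE ACCENT, twice
precomposed = "\u00e9\u00e9"    # LATIN SMALL LETTER E WITH ACUTE, twice
print("ee" in decomposed)                 # plain substring search
print(folded_find("ee", decomposed))      # folded search
print(folded_find("ee", precomposed))     # folded search
```

Under this particular rule U+0640 ARABIC TATWEEL is left alone, because its
general category is Lm rather than Mn, so a real design would need per-script
rules beyond "strip combining marks".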
