unofficial mirror of emacs-devel@gnu.org 
From: Juri Linkov <juri@linkov.net>
To: Artur Malabarba <bruce.connor.am@gmail.com>
Cc: emacs-devel <emacs-devel@gnu.org>
Subject: Re: Character group folding in searches
Date: Sat, 07 Feb 2015 02:07:15 +0200
Message-ID: <87386iinyk.fsf@mail.linkov.net>
In-Reply-To: <CAAdUY-L8ipk4Aj83hJErinrgODjJab+mhx==59=FjnfmFm_wjw@mail.gmail.com> (Artur Malabarba's message of "Fri, 6 Feb 2015 11:04:03 -0200")

> My question is:
>
> Do any of these options seem good enough? Which would you all like to explore?

This feature, as I see it, has several levels of complexity:

* 1-to-1 char-folding

  ?a <=> ?á

  This is already supported by char-tables, so there is no problem.

* 1-to-1 char-folding in combination with case-folding

  ?a <=> ?Á   (in this example one of them is in lower case
               and another in upper case with acute)

  I'm not sure how your patch handles this case.  We have to consult the
  case-folding information in the case-table.  Otherwise, we would need
  two new tables instead of one: one whose character mappings include
  case-folding and one whose mappings do not.  In either case we have to
  take care of the correct interaction with case-fold-search.

* 1-to-1 char-folding plus a combining character

  ?a <=> "á"

  The simplest solution is just to ignore all combining characters in search.
  This should be easy to implement in the search engine by introducing
  a new list of ignorable characters to skip during the search.

* multi-character translation such as ligatures, etc.

  ?ﬄ <=> "ffl"

  This is the hardest case.  Maybe the existing translation tables
  from ucs-normalize.el could help.  Then configuring would be like

  (set-char-table-extra-slot case-table 3 (get 'ucs-normalize-nfd-table 'translation-table))

  But this requires a significant modification of the search engine,
  which would have to use the same logic as `translate-region-internal'
  in order to support multi-character translation.

  Also it might require adding a new mode such as "lax-decomposition"
  that, like lax-word mode, will match partially, e.g. "f" will match "ﬄ".

  Or maybe it would be better to use an external library such as
  http://userguide.icu-project.org/collation/icu-string-search-service
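
As a rough illustration of the multi-character case, the sketch below
decomposes the text before matching, so that a plain "f" finds the ligature.
It assumes `ucs-normalize-NFKD-string' from ucs-normalize.el: the ligature
has only a compatibility decomposition, so the NFD form leaves it intact.
`my-lax-decomposition-match-p' is a hypothetical name, and it works on
strings rather than inside the search engine, so it is not a proposal for
the actual implementation.

```elisp
;; Sketch only: `my-lax-decomposition-match-p' is a hypothetical helper,
;; not an existing Emacs function.
(require 'ucs-normalize)

(defun my-lax-decomposition-match-p (pattern text)
  "Return t if PATTERN occurs literally in the NFKD form of TEXT."
  (let ((case-fold-search nil))          ; keep case out of this example
    (and (string-match-p (regexp-quote pattern)
                         ;; Compatibility decomposition expands U+FB04
                         ;; (the "ffl" ligature) into the letters f, f, l.
                         (ucs-normalize-NFKD-string text))
         t)))

;; "\ufb04" is the ligature character U+FB04.
(my-lax-decomposition-match-p "f" "\ufb04")
(my-lax-decomposition-match-p "ffl" (concat "wa" "\ufb04" "e"))
```

Decomposing the whole text up front is only workable for short strings; a
real search would need to translate incrementally while scanning the buffer.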

I agree with your attempts to have something instead of having nothing
(as in an all-or-nothing attitude).  So to me it seems that the first
3 items would comprise the useful minimum, and the hardest last case
could be implemented afterwards.
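
To make the first three levels concrete, here is a minimal sketch that
reduces a string to a "folded" form: canonical decomposition, removal of
combining characters, and optional case-folding.  It assumes
`ucs-normalize-NFD-string' from ucs-normalize.el and the
`canonical-combining-class' character property; `my-fold-string' is a
hypothetical name, and a real implementation would live in the search
engine and its tables rather than in a string helper.

```elisp
;; Sketch only: `my-fold-string' is a hypothetical helper, not an Emacs API.
(require 'seq)
(require 'ucs-normalize)

(defun my-fold-string (string &optional case-fold)
  "Return STRING in NFD form with combining characters removed.
If CASE-FOLD is non-nil, also downcase the result."
  (let* ((decomposed (ucs-normalize-NFD-string string))
         (base (concat
                (seq-remove
                 (lambda (ch)
                   ;; A non-zero canonical combining class marks a
                   ;; combining character, which is skipped.
                   (> (or (get-char-code-property
                           ch 'canonical-combining-class)
                          0)
                      0))
                 decomposed))))
    (if case-fold (downcase base) base)))

;; "\u00e1" is a with acute, "\u00c1" is A with acute,
;; and "a\u0301" is a plain a followed by a combining acute.
(my-fold-string "\u00e1")
(my-fold-string "a\u0301")
(my-fold-string "\u00c1" t)
```

Two strings would then match when their folded forms are equal, with the
CASE-FOLD argument taken from `case-fold-search' so that the two kinds of
folding stay consistent.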


