From: martin rudalics <rudalics@gmx.at>
To: Juri Linkov <juri@jurta.org>
Cc: 13041@debbugs.gnu.org, perin@panix.com, perin@acm.org
Subject: bug#13041: 24.2; diacritic-fold-search
Date: Sat, 08 Dec 2012 12:21:48 +0100
Message-ID: <50C322CC.1000806@gmx.at>
In-Reply-To: <87ehj18l9p.fsf@mail.jurta.org>

 >> - leave the text alone but give each string that should be handled
 >>   specially a text property with the normalized form.  In this case
 >>   searching has to pay attention to these properties, if present.
 >>
 >> - normalize the text and give each normalized string a text property
 >>   with the original text.  In this case searching will proceed as usual
 >>   but you have to restore the original text when done.
 >
 > This reminds me of the idea that searching should take into account the
 > text displayed with the `display' property and other display-related
 > properties.  It seems this is more difficult to implement.

... and probably should include searching for overlays too.
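
A minimal sketch of the first option quoted above (leave the text alone
and keep the normalized form on the side), written in Python rather than
Emacs Lisp.  A plain index map stands in for text properties here, and
the names fold_with_map and fold_search are made up for illustration.
It assumes that "normalized" means NFKD decomposition with combining
marks dropped.

```python
import unicodedata

def fold_with_map(text):
    """Return (folded, index_map); index_map[i] is the index in `text`
    of the original character that produced folded[i]."""
    folded = []
    index_map = []
    for i, ch in enumerate(text):
        for d in unicodedata.normalize("NFKD", ch):
            if unicodedata.combining(d):
                continue  # drop combining marks (the diacritics)
            folded.append(d)
            index_map.append(i)
    return "".join(folded), index_map

def fold_search(text, needle):
    """Return the (begin, end) span in the ORIGINAL text of the first
    folded match of `needle`, or None if there is no match."""
    folded, index_map = fold_with_map(text)
    folded_needle, _ = fold_with_map(needle)
    if not folded_needle:
        return None
    pos = folded.find(folded_needle)
    if pos < 0:
        return None
    begin = index_map[pos]
    end = index_map[pos + len(folded_needle) - 1] + 1
    # Keep trailing combining marks with the character they modify.
    while end < len(text) and unicodedata.combining(text[end]):
        end += 1
    return begin, end

print(fold_search("un caf\u00e9 cr\u00e8me", "creme"))
print(fold_search("su\ufb00er", "suf"))
```

The second call shows the return-value problem: the span reported for
"suf" necessarily covers the whole ligature character, because the
original text has no position in the middle of it.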

 >> Also I don't know how to handle the return value and/or highlighting
 >> when, for example, finding a match for "suf" within "suﬀer".  For
 >> example, replacing each occurrence of "suf" with the empty string should
 >> leave us with "fer" here.
 >
 > I believe such ligature characters should be handled as a whole,
 > i.e. "suf" doesn't match "suﬀer", only "suff" should match it.

This means that when you type the second "f" you might get a match
before the present one.  Consider a buffer containing the two lines

suﬀer
suffer

Typing "suf" as search string would go to "suffer".  Adding an "f" to
the search string now would go back to "suﬀer" (or not).  Disconcerting
in any case.
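
The jump can be reproduced with a small Python sketch.  It assumes the
first of the two lines is written with the ligature character U+FB00 and
the second with two plain "f".  The helper names and the whole_chars
flag are hypothetical; NFKD expansion stands in for whatever mapping the
search would really use.

```python
import unicodedata

# Assumed buffer: line 1 uses the ligature U+FB00, line 2 two plain "f".
BUFFER = "su\ufb00er\nsuffer"

def find_in_line(line, needle, whole_chars):
    """Search the NFKD expansion of `line`.  With whole_chars set, a
    match counts only if it begins and ends on a boundary between
    original characters, so a ligature is matched as a whole or not
    at all."""
    expanded = ""
    boundaries = {0}
    for ch in line:
        expanded += unicodedata.normalize("NFKD", ch)
        boundaries.add(len(expanded))
    pos = expanded.find(needle)
    while pos >= 0:
        end = pos + len(needle)
        if not whole_chars or (pos in boundaries and end in boundaries):
            return pos
        pos = expanded.find(needle, pos + 1)
    return -1

def first_matching_line(buffer, needle, whole_chars):
    """Return the 1-based number of the first line that matches."""
    for number, line in enumerate(buffer.split("\n"), start=1):
        if find_in_line(line, needle, whole_chars) >= 0:
            return number
    return None

for needle in ("suf", "suff"):
    print(needle, first_matching_line(BUFFER, needle, whole_chars=True))
```

With whole_chars set, "suf" first matches on line 2 and "suff" first
matches on line 1, so extending the search string moves the match
backwards.  With whole_chars unset, both strings match on line 1.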

 >> I have no idea how many mappings like "ß" -> "ss" exist.  The problem is
 >> that we don't get them from UnicodeData.txt IIUC.
 >
 > I can't find them in UnicodeData.txt either.  Looking at the files in
 > http://www.unicode.org/Public/UNIDATA/ one can find them in the file
 >
 > http://www.unicode.org/Public/UNIDATA/DerivedNormalizationProps.txt
 >
 > that is derived from
 >
 > http://www.unicode.org/Public/UNIDATA/CaseFolding.txt
 > http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt

Case folding "ß" to "SS" (upper case "S") is not what I had in mind.  I
was talking about the (weak?) equivalence of "ß" and "ss" (lower case
"s") which is much more important when searching.  This is particularly
so because many German words that were formerly written with an "ß" are now
written with "ss".
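
The difference between the two mappings can be checked from Python,
whose str.upper applies the special upper-casing of "ß" while
str.casefold applies Unicode full case folding, which takes "ß" to
lower case "ss".  Compatibility normalization (NFKD) leaves "ß" alone.
The helper loose_equal is a made-up name, and whether case folding is
the right notion of equivalence for searching remains an open question
here.

```python
import unicodedata

sharp_s = "\u00df"  # LATIN SMALL LETTER SHARP S

upper = sharp_s.upper()                          # upper-casing
folded = sharp_s.casefold()                      # full case folding
nfkd = unicodedata.normalize("NFKD", sharp_s)    # compatibility form

def loose_equal(a, b):
    """Compare two strings after full case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()

print(repr(upper), repr(folded), nfkd == sharp_s)
print(loose_equal("Stra\u00dfe", "Strasse"))
```

Under this comparison "Straße" and "Strasse" are equal, which is the
kind of equivalence meant above.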

martin






