From: Marcin Borkowski <mbork@mbork.pl>
To: Eli Zaretskii <eliz@gnu.org>
Cc: ofv@wanadoo.es, lokedhs@gmail.com, emacs-devel@gnu.org
Subject: Re: Character folding in the pretest
Date: Mon, 08 Feb 2016 20:18:48 +0100
Message-ID: <87twljqacn.fsf@mbork.pl>
In-Reply-To: <8360xzqejk.fsf@gnu.org>
On 2016-02-08, at 18:48, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Marcin Borkowski <mbork@mbork.pl>
>> Date: Mon, 08 Feb 2016 15:05:05 +0100
>> Cc: ofv@wanadoo.es, lokedhs@gmail.com, emacs-devel@gnu.org
>>
>> Just as another data point in the discussion: for me, searching for "l"
>> and finding "ł" seems a bit weird. (The opposite even more so.)
>
> Which is why neither one happens under character folding.
>
>> BTW, strangely enough, here isearching for "l" does /not/ find "ł", but
>> isearching for "a" (with character folding on) finds "ą". Whatever one
>> thinks about char folding, this is clearly a bug.
>
> It's not a bug, it's the feature working as designed: we only fold
> characters that have suitable decompositions in the Unicode Character
> Database. So:
>
> (get-char-code-property ?ą 'decomposition) => (97 808)
>
> but
>
> (get-char-code-property ?ł 'decomposition) => (322)
>
> IOW, ą is canonically equivalent to the 2-character sequence a ̨ (which
> is why searching for a finds that character), while ł has no canonical
> decomposition (nor any other decomposition).
>
> This means that the Unicode guys decided that ł should not be
> equivalent to any other sequence of characters, and therefore Emacs
> doesn't find it unless you search for it literally.
>
> If you want to know why ł doesn't have any decompositions, I suggest
> asking on the Unicode mailing list. I'm sure they had good reasons,
> most probably reasons that came from people who are experts in the
> Polish language and its intricacies. We just trust the results.
Thanks for the explanation, Eli!
However, given the number of bugs/quirks in Unicode, I'd personally
prefer not to trust them too much. (Though I understand that the Emacs
devs /have/ to trust someone, and choosing the Unicode people is
probably not a bad idea generally.) Funnily, one of the more annoying
bugs in Unicode is connected with quotes, AFAIR. (Why not beat a dead
horse? ;-)) And folding "ą" to "a" but not "ł" to "l" is something
that most Poles (I guess) would treat as a serious, WTF-level bug. And
good luck to all non-Polish people isearching for the name of Jan
Łukasiewicz (just to choose a Lisp-related name ;-)).
Yet another data point suggesting that the issue is really complicated,
and that Drew is right: if this is not configurable by users, it might
end up more annoying than helpful. (I'm not saying it won't be; I trust
Artur here.)
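
For anyone who wants to check the decomposition data quoted above outside
Emacs, here is a minimal sketch in Python. It assumes that the standard
unicodedata module reflects the same Unicode Character Database entries
that get-char-code-property reads; the helper names
decomposition_codepoints and fold_marks are made up for this illustration,
and fold_marks is only a rough approximation of the idea, not how Emacs
implements character folding.

```python
import unicodedata


def decomposition_codepoints(ch):
    """Return the code points of ch's UCD decomposition mapping.

    Falls back to [ord(ch)] when the character has no decomposition,
    which mirrors the shape of the values quoted from Emacs above.
    """
    mapping = unicodedata.decomposition(ch)
    # Compatibility mappings carry a tag such as "<compat>"; skip it.
    parts = [p for p in mapping.split() if not p.startswith("<")]
    return [int(p, 16) for p in parts] or [ord(ch)]


def fold_marks(text):
    """Approximate folding: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.combining(c) == 0)


print(decomposition_codepoints("ą"))  # [97, 808]
print(decomposition_codepoints("ł"))  # [322]
print(fold_marks("ą"))                # a
print(fold_marks("Łukasiewicz"))      # Łukasiewicz (unchanged)
```

Under this approach "ą" folds to "a" because its decomposition contains a
base letter plus a combining ogonek, while "Ł" and "ł" are left alone
because the database gives them no decomposition at all.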
Best,
--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University