From: "Elias Mårtenson" <lokedhs@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org>
Subject: Re: On language-dependent defaults for character-folding
Date: Fri, 19 Feb 2016 21:37:26 +0800
Message-ID: <CADtN0W+93LH5d3=joVj2xe40rramMOcURKw7QKdv_OefYCm8Ug@mail.gmail.com>
In-Reply-To: <837fi0sz29.fsf@gnu.org>
On 19 February 2016 at 19:46, Eli Zaretskii <eliz@gnu.org> wrote:
> > Of course you have to use the decomposition algorithms to ensure that
> > the precomposed and decomposed variations of the same character
> > compare equal.
>
> Then you agree that _some_ form of character-folding should be turned
> on by default?
>
Yes.
> > This is, however, different from using the decomposition to decompose
> > a character and then using the base character as the thing to match
> > against. The latter is what Emacs is doing today, as far as I understand.
>
> Please describe in more detail why do you think what Emacs does today
> is not what you think it should do. It's possible we have a
> miscommunication here.
>
The main issue to me is that it matches things that should not be matched.
A secondary (minor) issue is that some things that should be matched are not
(see my example with U+2C65).
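To make the U+2C65 case concrete, here is a minimal sketch using Python's standard `unicodedata` module. It is not how Emacs implements folding; the helper name `decomposition_base` is made up for illustration. It shows that å has a decomposition whose first character is a, while U+2C65 has none, so any folding driven purely by decomposition data cannot relate it to a.

```python
import unicodedata

def decomposition_base(ch):
    """Return the first character of ch's canonical decomposition.

    If ch has no decomposition, NFD leaves it unchanged and ch itself
    is returned. (Hypothetical helper, for illustration only.)
    """
    return unicodedata.normalize("NFD", ch)[0]

# U+00E5 LATIN SMALL LETTER A WITH RING ABOVE decomposes to a + U+030A.
assert unicodedata.decomposition("\u00e5") == "0061 030A"
assert decomposition_base("\u00e5") == "a"

# U+2C65 LATIN SMALL LETTER A WITH STROKE has no decomposition entry,
# so decomposition-based folding never maps it to "a".
assert unicodedata.decomposition("\u2c65") == ""
assert decomposition_base("\u2c65") == "\u2c65"
```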
> For example, if the buffer includes ñ (2 characters), should "C-s n"
> find the n in it?
>
That depends on the locale of the user. However, from the user's point of
view, there should be no visible difference between the precomposed and the
decomposed variants: they are the exact same character. This is in line with
Unicode recommendations (https://en.wikipedia.org/wiki/Unicode_equivalence).
Note: I know that it's possible that I am wrong about this and that Unicode
actually _has_ said that the equivalence tables can be used for this
purpose (i.e. decompose and only use the primary character). If that is the
case, I'd be interested to see a reference to that, but I will still be of
the same opinion that doing so will result in broken behaviour for a
certain class of user.
Thus, if I am Spanish, I will _not_ want any of those to match "n". If I'm
Swedish I will likely want both of them to match "n".
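The locale dependence described above could be sketched roughly as follows. This is an illustration only, in Python with the standard `unicodedata` module; the `DISTINCT_LETTERS` table and the function names are assumptions invented for the example, not CLDR data and not anything Emacs provides. It handles lowercase letters only.

```python
import unicodedata

# Hypothetical per-language sets of letters that a speaker of that language
# treats as letters in their own right, so they must NOT fold to a base letter.
DISTINCT_LETTERS = {
    "es": {"\u00f1"},                      # ñ
    "sv": {"\u00e5", "\u00e4", "\u00f6"},  # å ä ö
}

def fold(text, lang):
    """Fold accented letters to their base, except those distinct in lang."""
    keep = DISTINCT_LETTERS.get(lang, set())
    out = []
    # NFC first, so precomposed and decomposed input are treated identically.
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)
        else:
            nfd = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in nfd if not unicodedata.combining(c)))
    return "".join(out)

def matches(needle, haystack, lang):
    """Substring search under the folding rules of the given language."""
    return fold(needle, lang) in fold(haystack, lang)

# A Spanish user searching for "n" does not hit either spelling of ñ ...
assert not matches("n", "a\u00f1o", "es")
assert not matches("n", "an\u0303o", "es")
# ... while a Swedish user hits both.
assert matches("n", "a\u00f1o", "sv")
assert matches("n", "an\u0303o", "sv")
```

Normalising to NFC before the lookup is what keeps the two representations of ñ indistinguishable, whichever language is in effect.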
> That equivalence is encoded in the decomposition data that is part of
> UnicodeData.txt which Emacs uses for character-folding.
>
The equivalence tables explain that the precomposed character U+00F1 is
equivalent to the specific sequence U+006E U+0303. That is all they say. They
do not say that ñ is a variation of n. It's an instruction for how to
construct a given character.
The decompositions are used in the normalisation forms to ensure that the
two variants are treated equally (such as the two alternative
representations of ñ that we have been discussing).
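The distinction between the two uses of the decomposition data can be shown with a short sketch in Python's standard `unicodedata` module (an illustration under that assumption, not Emacs code; the function names are made up). Canonical equivalence compares normalised strings, whereas the base-character approach additionally throws away the combining marks.

```python
import unicodedata

def canonically_equal(a, b):
    """True if a and b are canonically equivalent (same NFD form)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def strip_to_base(text):
    """Decompose, then drop combining marks: a lossy fold to base letters."""
    nfd = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in nfd if not unicodedata.combining(ch))

precomposed = "\u00f1"   # ñ as the single code point U+00F1
decomposed = "n\u0303"   # U+006E followed by U+0303 COMBINING TILDE

# What the normalisation forms guarantee: the two spellings are equal.
assert precomposed != decomposed
assert canonically_equal(precomposed, decomposed)

# Normalisation alone does not make ñ equal to n.
assert not canonically_equal(precomposed, "n")

# Only the extra "keep the base character" step does that.
assert strip_to_base(precomposed) == "n"
assert strip_to_base(decomposed) == "n"
```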
> > If you look at the latin collation chart for example
> > (http://unicode.org/charts/collation/chart_Latin.html) you will see
> > that the characters are grouped. These are the equivalences I'm
> > referring to.
>
> Yes. And if you look at the entries of the equivalent characters in
> UnicodeData.txt, you will see there they have decompositions, which is
> what Emacs uses for searching when character-folding is in effect.
>
Yes, and this is where the crux of our disagreement lies, I think. I
previously referred to using the decompositions as a guide to character
equivalence as a "trick". I stand by this, since this is not the purpose of
the decompositions. The best thing that Unicode provides for that purpose
(to my knowledge) is the set of collation charts that I mentioned previously
(http://unicode.org/charts/collation/).
> > Now, I note that on these charts, U+0061 LATIN SMALL LETTER A and U+2C65
> > LATIN SMALL LETTER A WITH STROKE compare as different characters, and the
> > latter does not have a decomposition. Should this also be addressed?
>
> Maybe so, but given the controversy even about what we do now, which
> is a subset, I'd doubt extending what we do now is a wise move.
>
I was just asking to understand your position better.
> > As for the locale-specific parts: using that will only DTRT if we
> > assume that the majority of searches are done in buffers holding text
> > in locale's language. Is that a good assumption?
> >
> > My opinion is that the default search behaviour should depend primarily
> > on the locale of the entire Emacs session, i.e. the locale of the user
> > starting the application. I'm not disagreeing that allowing a buffer-local
> > locale to override this behaviour is a good idea, but as a Swedish speaker
> > I really see å, ä and a as completely separate things, even if the language
> > of the buffer that I am editing happens to be English. The equivalence of
> > these characters is the odd behaviour here, and the one that should be
> > enabled explicitly.
> >
> > Also, if I happen to be editing a Spanish document (I don't speak Spanish)
> > I would find equivalence of ñ and n to be incredibly useful, even though
> > Óscar would grind his teeth at it. :-)
>
> So you are in fact making two contradicting statements here.
Interesting. I have re-read what I wrote and I really don't see myself
making two contradicting statements. Perhaps you think that I am both
for and against folding at the same time. If that's the case, let me try
to rephrase:
I like the idea of character folding. But, if it's incorrectly (by my
standards, of course) implemented I would rather not have it at all since
it will be highly annoying.
> Indeed,
> the locale in which Emacs started says almost nothing about the
> documents being edited, nor even about the user's preferences: it is
> easy to imagine a user whose "native" locale is X starting Emacs in
> another locale.
>
Yes. I am fully aware of this. But so be it. Having applications work
differently depending on the locale of the environment the application was
started in is nothing new.
> > We are talking
> > about a multilingual Emacs, in an age of global communications, where
> > you can have conversations with someone on the other side of the
> > world, or read text that combines several languages in the same
> > buffer. Do we really want to go back to the l10n days, when there was
> > ever only one locale that was interesting -- the current one? I
> > wonder.
> >
> > Actually, I think so. This is because the search equivalence is
> > inherently a local thing.
>
> Being a multi-lingual environment, Emacs has no real notion of the
> locale.
>
Perhaps it should?
> > It is, Unicode provides it. We just didn't import it yet.
> >
> > It does? I was looking for such tables, but didn't find them. Do you
> > have a link?
>
> Look for DUCET and its tailoring data. These should be a good
> starting point:
>
> http://www.unicode.org/Public/UCA/latest/
> http://cldr.unicode.org/
>
Those are the decomposition charts, and don't actually say anything about
equivalence outside of providing a canonical form for precomposed
characters, as was discussed above.
Regards,
Elias