Re: On language-dependent defaults for character-folding

From: "Elias Mårtenson" <lokedhs@gmail.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org>
Subject: Re: On language-dependent defaults for character-folding
Date: Fri, 19 Feb 2016 21:37:26 +0800
Message-ID: <CADtN0W+93LH5d3=joVj2xe40rramMOcURKw7QKdv_OefYCm8Ug@mail.gmail.com>
In-Reply-To: <837fi0sz29.fsf@gnu.org>

On 19 February 2016 at 19:46, Eli Zaretskii <eliz@gnu.org> wrote:

> > Of course you have to use the decomposition algorithms to ensure that
> > the precomposed and decomposed variations of the same character
> > compare equal.
>
> Then you agree that _some_ form of character-folding should be turned
> on by default?
>

Yes.

> > This is, however, different from using the decomposition to decompose
> > a character and then using the base character as the thing to match
> > against. The latter is what Emacs is doing today, as far as I understand.
>
> Please describe in more detail why do you think what Emacs does today
> is not what you think it should do.  It's possible we have a
> miscommunication here.
>

The main issue to me is that it matches things that should not be matched.
A secondary (minor) issue is that some things that should be matched are not
(see my example with U+2C65).
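
To make both issues concrete, here is a minimal Python sketch of the
"decompose and keep only the base character" approach as I understand it.
This is not Emacs's actual implementation; `base_fold` is a name made up
for this illustration, and it assumes that stripping combining marks after
canonical decomposition is a fair approximation of the behaviour.

```python
import unicodedata

def base_fold(text):
    """Decompose canonically (NFD), then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Issue 1: letters that some users consider distinct collapse together.
n_tilde_folds_to_n = base_fold("\u00f1") == "n"   # ñ
a_ring_folds_to_a = base_fold("\u00e5") == "a"    # å

# Issue 2: U+2C65 LATIN SMALL LETTER A WITH STROKE has no decomposition,
# so this approach can never relate it to "a".
a_stroke_unchanged = base_fold("\u2c65") == "\u2c65"
```

With this sketch, searching for "n" finds ñ regardless of who is searching,
while searching for "a" never finds U+2C65.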

> For example, if the buffer includes ñ (2 characters), should "C-s n"
> find the n in it?
>

That depends on the locale of the user. However, from the user's point of
view, there should be no visible difference between the precomposed and the
decomposed variants: they are the exact same character. This is in line with
Unicode recommendations (https://en.wikipedia.org/wiki/Unicode_equivalence).

Note: I know that it's possible that I am wrong about this and that Unicode
actually _has_ said that the equivalence tables can be used for this
purpose (i.e. decompose and only use the primary character). If that is the
case, I'd be interested to see a reference to that, but I will still be of
the same opinion that doing so will result in broken behaviour for a
certain class of users.

Thus, if I am Spanish, I will _not_ want either of those to match "n". If I'm
Swedish I will likely want both of them to match "n".
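
A rough Python sketch of what I mean by locale-dependent matching follows.
Everything in it is an assumption made for illustration: the
`DISTINCT_LETTERS` table is hypothetical and far from complete, and
`fold_for_locale` and `matches` are invented names, not an existing API.
The point is only that the precomposed and decomposed spellings are always
treated alike, while the locale decides which letters keep their identity.

```python
import unicodedata

# Hypothetical, incomplete table: letters each locale treats as distinct.
DISTINCT_LETTERS = {
    "es": {"\u00f1"},                        # ñ
    "sv": {"\u00e5", "\u00e4", "\u00f6"},    # å ä ö
}

def fold_for_locale(text, locale):
    """Fold accents, except on letters the locale considers distinct."""
    keep = DISTINCT_LETTERS.get(locale, set())
    out = []
    # NFC first, so both spellings of a letter become the same code point.
    for ch in unicodedata.normalize("NFC", text):
        if ch.lower() in keep:
            out.append(ch)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed
                               if not unicodedata.combining(c)))
    return "".join(out)

def matches(needle, haystack, locale):
    """Substring search on the locale-folded forms of both strings."""
    return fold_for_locale(needle, locale) in fold_for_locale(haystack, locale)
```

For example, `matches("n", "a\u00f1o", "sv")` and
`matches("n", "an\u0303o", "sv")` are both true, while the same two calls
with `"es"` are both false.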

> That equivalence is encoded in the decomposition data that is part of
> UnicodeData.txt which Emacs uses for character-folding.
>

The equivalence tables explain that the precomposed character U+00F1 is
equivalent to the specific sequence U+006E U+0303. That is all they say. They
do not say that ñ is a variation of n. It's an instruction for how to
construct a given character.

The decompositions are used in the normalisation forms to ensure that the
two variants are treated equally (such as the two alternative
representations of ñ that we have been discussing).
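
As a small illustration, Python's standard `unicodedata` module exposes the
same decomposition data. The helper name `canonically_equal` is made up for
this sketch; it compares two strings after normalising both to NFD.

```python
import unicodedata

precomposed = "\u00f1"        # ñ as a single code point
decomposed = "\u006e\u0303"   # n followed by COMBINING TILDE

def canonically_equal(a, b):
    """True if both strings have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# The decomposition field from UnicodeData.txt for U+00F1.
decomposition_field = unicodedata.decomposition(precomposed)

same_character = canonically_equal(precomposed, decomposed)
same_as_plain_n = canonically_equal(precomposed, "n")
```

Here `decomposition_field` is the string "006E 0303", `same_character` is
true, and `same_as_plain_n` is false: normalisation makes the two spellings
of ñ equal, and says nothing about ñ versus n.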

> > If you look at the Latin collation chart for example
> > (http://unicode.org/charts/collation/chart_Latin.html) you will see
> > that the characters are grouped. These are the equivalences I'm
> > referring to.
>
> Yes.  And if you look at the entries of the equivalent characters in
> UnicodeData.txt, you will see there they have decompositions, which is
> what Emacs uses for searching when character-folding is in effect.
>

Yes, and this is where the crux of our disagreement lies, I think. I
previously referred to using the decompositions as a guide to character
equivalence as a "trick". I stand by this, since this is not the purpose of
the decompositions. The best thing that Unicode provides for that purpose
(to my knowledge) is the set of collation charts that I mentioned previously
(http://unicode.org/charts/collation/).

> > Now, I note that on these charts, U+0061 LATIN SMALL LETTER A and
> > U+2C65 LATIN SMALL LETTER A WITH STROKE compare as different
> > characters, and the latter does not have a decomposition. Should this
> > also be addressed?
>
> Maybe so, but given the controversy even about what we do now, which
> is a subset, I'd doubt extending what we do now is a wise move.
>

I was just asking to understand your position better.

> >  As for the locale-specific parts: using that will only DTRT if we
> >  assume that the majority of searches are done in buffers holding text
> >  in locale's language. Is that a good assumption?
> >
> > My opinion is that the default search behaviour should depend primarily
> > on the locale of the entire Emacs session, i.e. the locale of the user
> > starting the application. I'm not disagreeing that allowing a
> > buffer-local locale to override this behaviour is a good idea, but as a
> > Swedish speaker I really see å, ä and a as completely separate things,
> > even if the language of the buffer that I am editing happens to be
> > English. The equivalence of these characters is the odd behaviour here,
> > and the one that should be enabled explicitly.
> >
> > Also, if I happen to be editing a Spanish document (I don't speak
> > Spanish) I would find equivalence of ñ and n to be incredibly useful,
> > even though Óscar would grind his teeth at it. :-)
>
> So you are in fact making two contradicting statements here.

Interesting. I have re-read what I wrote and I really don't see myself
making two contradictory statements. Perhaps you think that I am both
for and against folding at the same time. If that's the case, let me try
to rephrase:

I like the idea of character folding. But, if it's incorrectly (by my
standards, of course) implemented I would rather not have it at all since
it will be highly annoying.

> Indeed,
> the locale in which Emacs started says almost nothing about the
> documents being edited, nor even about the user's preferences: it is
> easy to imagine a user whose "native" locale is X starting Emacs in
> another locale.
>

Yes. I am fully aware of this. But so be it. Having applications work
differently depending on the locale of the environment the application was
started in is nothing new.

> >  We are talking
> >  about a multilingual Emacs, in an age of global communications, where
> >  you can have conversations with someone on the other side of the
> >  world, or read text that combines several languages in the same
> >  buffer. Do we really want to go back to the l10n days, when there was
> >  ever only one locale that was interesting -- the current one? I
> >  wonder.
> >
> > Actually, I think so. This is because the search equivalence is
> > inherently a local thing.
>
> Being a multi-lingual environment, Emacs has no real notion of the
> locale.
>

Perhaps it should?

> >  It is, Unicode provides it. We just didn't import it yet.
> >
> > It does? I was looking for such tables, but didn't find them. Do you
> > have a link?
>
> Look for DUCET and its tailoring data.  These should be a good
> starting point:
>
>   http://www.unicode.org/Public/UCA/latest/
>   http://cldr.unicode.org/
>

Those are the decomposition charts, and don't actually say anything about
equivalence outside of providing a canonical form for precomposed
characters, as was discussed above.

Regards,
Elias
