all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Eli Zaretskii <eliz@gnu.org>
To: "Elias Mårtenson" <lokedhs@gmail.com>
Cc: larsi@gnus.org, emacs-devel@gnu.org
Subject: Re: On language-dependent defaults for character-folding
Date: Fri, 19 Feb 2016 13:46:06 +0200	[thread overview]
Message-ID: <837fi0sz29.fsf@gnu.org> (raw)
In-Reply-To: <CADtN0W+2CjROLMnuC8N3X3TrwvsZOmidviFjM_-AF0DKN-Wvsg@mail.gmail.com> (message from Elias Mårtenson on Fri, 19 Feb 2016 18:51:47 +0800)

> Date: Fri, 19 Feb 2016 18:51:47 +0800
> From: Elias Mårtenson <lokedhs@gmail.com>
> Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org>
> 
>  > The Unicode character decomposition was never meant to be used to provide a feature such as
>  character
>  > folding in Emacs.
> 
>  That's not true. Canonical equivalence, which is encoded in canonical
>  decompositions, is a must for searching. Otherwise, what looks the
>  same on display will not be found, and will look like a bug. See the
>  example I gave with ñ and ñ (the latter one is 2 characters).
> 
> Of course you have to use the decomposition algorithms to ensure that the precomposed and decomposed
> variations of the same character compares equal.

Then you agree that _some_ form of character-folding should be turned
on by default?

> This is, however, different from using the decomposition to to decompose a character and then using the
> base character as the thing to match against. The latter is what Emacs is doing today, as far as I understand.

Please describe in more detail why do you think what Emacs does today
is not what you think it should do.  It's possible we have a
miscommunication here.

For example, if the buffer includes ñ (2 characters), should "C-s n"
find the n in it?

>  2 and 3 are the same as we do already, AFAICT. (Collation charts
>  describe ordering, which is irrelevant for searching; other than that,
>  you will see that Emacs already implements the data shown in
>  http://unicode.org/charts/collation/.)
> 
> The collation charts also describe equivalence.

That equivalence is encoded in the decomposition data that is part of
UnicodeData.txt which Emacs uses for character-folding.

> If you look at the latin collation chart for example
> (http://unicode.org/charts/collation/chart_Latin.html) you will see that the characters are grouped. These are
> the equivalences I'm referring to.

Yes.  And if you look at the entries of the equivalent characters in
UnicodeData.txt, you will see there they have decompositions, which is
what Emacs uses for searching when character-folding is in effect.

> Now, I note that on these charts, U+0061 LATIN SMALL LETTER A and U+2C65 LATIN SMALL LETTER A
> WITH STROKE compares as different characters, and the latter does not have a decomposition. Should this
> also be addressed?

Maybe so, but given the controversy even about what we do now, which
is a subset, I'd doubt extending what we do now is a wise move.

>  As for the locale-specific parts: using that will only DTRT if we
>  assume that the majority of searches are done in buffers holding text
>  in locale's language. Is that a good assumption? 
> 
> My opinion is that the default search behaviour should depend primarily on the locale of the entire Emacs
> session. I.e. the locale of the user starting the application. I'm not disagreeing that allowing a buffer-local locale
> override this behaviour is a good idea, but as a Swedish speaker I really see å, ä and a as completely
> separate things, even if the language of the buffer that I am editing happens to be English. The equivalence of
> these characters is the odd behaviour here, and the one that should be enabled explicitly.
> 
> Also, if I happen to be editing a Spanish document (I don't speak Spanish) I would find equivalence of ñ and n
> to be incredibly useful, even though Óscar would grind his teeth at it. :-)

So you are in fact making two contradicting statements here.  Indeed,
the locale in which Emacs started says almost nothing about the
documents being edited, nor even about the user's preferences: it is
easy to imagine a user whose "native" locale is X starting Emacs in
another locale.

>  We are talking
>  about a multilingual Emacs, in an age of global communications, where
>  you can have conversations with someone on the other side of the
>  world, or read text that combines several languages in the same
>  buffer. Do we really want to go back to the l10n days, when there was
>  ever only one locale that was interesting -- the current one? I
>  wonder.
> 
> Actually, I think so. This is because the search equivalence is inherently a local thing.

Being a multi-lingual environment, Emacs has no real notion of the
locale.

>  It is, Unicode provides it. We just didn't import it yet.
> 
> It does? I was looking for such tables, but didn't find it. Do you have a link?

Look for DUCET and its tailoring data.  These should be a good
starting point:

  http://www.unicode.org/Public/UCA/latest/
  http://cldr.unicode.org/
  
>  Note that the prerequisite for anything more complicated and elaborate
>  than what we have now is to re-implement character-folding on the C
>  level, inside search.c functions. The current implementation is at
>  its limits already. I tried to convince the interested people to do
>  this in C to be gin with, but couldn't, and the feature was important
>  enough to have even in its current implementation.
> 
> I'm not going to offer to do this until I'm sure that I can have the copyright assignment done. But I am
> interested in it.

Thanks.



  reply	other threads:[~2016-02-19 11:46 UTC|newest]

Thread overview: 263+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-09 17:26 On language-dependent defaults for character-folding Artur Malabarba
2016-02-09 17:39 ` Pierpaolo Bernardi
2016-02-09 17:54   ` Paul Eggert
2016-02-10  0:49     ` Pierpaolo Bernardi
2016-02-10  2:20       ` Artur Malabarba
2016-02-10  3:01         ` Pierpaolo Bernardi
2016-02-10  9:55           ` Artur Malabarba
2016-02-10 18:12             ` Óscar Fuentes
2016-02-10 19:23               ` Artur Malabarba
2016-02-09 17:48 ` Drew Adams
2016-02-09 16:43   ` Artur Malabarba
2016-02-09 17:58 ` Eli Zaretskii
2016-02-09 17:10   ` Artur Malabarba
2016-02-09 18:21 ` Óscar Fuentes
2016-02-09 19:54   ` Artur Malabarba
2016-02-09 20:08     ` Eli Zaretskii
2016-02-10  1:58       ` Artur Malabarba
2016-02-09 21:07     ` Óscar Fuentes
2016-02-10  2:18       ` Artur Malabarba
2016-02-10  2:52         ` Óscar Fuentes
2016-02-10  2:56         ` Mark Oteiza
2016-02-10 15:25         ` Eli Zaretskii
2016-02-10 21:17           ` Artur Malabarba
2016-02-11  3:39             ` Eli Zaretskii
2016-02-12 22:36           ` Per Starbäck
2016-02-13  8:33             ` Eli Zaretskii
2016-02-13 10:10               ` Markus Triska
2016-02-13 10:21                 ` Eli Zaretskii
2016-02-13 16:46           ` joakim
2016-02-11  0:54         ` Juri Linkov
2016-02-11  1:37           ` Óscar Fuentes
2016-02-12  0:50             ` Juri Linkov
2016-02-12  1:50               ` Óscar Fuentes
2016-02-12  7:10                 ` Eli Zaretskii
2016-02-12  7:32                   ` Óscar Fuentes
2016-02-12  8:44                     ` Eli Zaretskii
2016-02-12 10:03                       ` Óscar Fuentes
2016-02-12 11:11                         ` Joost Kremers
2016-02-12 18:21                           ` Óscar Fuentes
2016-02-12 12:00                         ` Eli Zaretskii
2016-02-12 18:42                           ` Óscar Fuentes
2016-02-12 19:06                             ` Eli Zaretskii
2016-02-12 19:28                               ` Óscar Fuentes
2016-02-12 23:57                               ` Juri Linkov
2016-02-13  0:06                                 ` Drew Adams
2016-02-13  8:49                                 ` Eli Zaretskii
2016-02-13 17:20                                   ` Drew Adams
2016-02-13 17:58                                     ` Eli Zaretskii
2016-02-18 19:15                                       ` John Wiegley
2016-02-18 20:12                                         ` Eli Zaretskii
2016-02-19  5:11                                           ` Lars Ingebrigtsen
2016-02-19  8:20                                             ` Eli Zaretskii
2016-02-19  9:22                                               ` Elias Mårtenson
2016-02-19 10:09                                                 ` Eli Zaretskii
2016-02-19 10:51                                                   ` Elias Mårtenson
2016-02-19 11:46                                                     ` Eli Zaretskii [this message]
2016-02-19 13:37                                                       ` Elias Mårtenson
2016-02-19 19:18                                                         ` Eli Zaretskii
2016-02-20  5:22                                                           ` Elias Mårtenson
2016-02-20  6:31                                                             ` Lars Ingebrigtsen
2016-02-20  9:18                                                               ` Elias Mårtenson
2016-02-20 10:34                                                               ` Eli Zaretskii
2016-02-21  2:51                                                                 ` Lars Ingebrigtsen
2016-02-21  6:28                                                                   ` Elias Mårtenson
2016-02-21  8:14                                                                     ` Achim Gratz
2016-02-23 16:56                                                                       ` Eli Zaretskii
2016-02-21 10:05                                                                     ` Lars Ingebrigtsen
2016-02-21 11:01                                                                       ` Elias Mårtenson
2016-02-21 16:02                                                                         ` Eli Zaretskii
2016-02-22  1:58                                                                         ` Lars Ingebrigtsen
2016-02-22  2:34                                                                           ` Elias Mårtenson
2016-02-22  2:48                                                                             ` Lars Ingebrigtsen
2016-02-22  6:13                                                                               ` Werner LEMBERG
2016-02-22 18:03                                                                                 ` Richard Stallman
2016-02-22 18:27                                                                                   ` Werner LEMBERG
2016-02-22 18:01                                                                               ` Richard Stallman
2016-02-22 19:06                                                                                 ` Eli Zaretskii
2016-02-23 17:43                                                                                   ` Richard Stallman
2016-02-23 18:14                                                                                     ` Eli Zaretskii
2016-02-23 20:24                                                                                       ` Yuri Khan
2016-02-25 12:11                                                                                         ` Richard Stallman
2016-02-25 14:57                                                                                           ` Yuri Khan
2016-02-26 20:21                                                                                             ` Richard Stallman
2016-02-27  5:47                                                                                               ` Yuri Khan
2016-02-27 19:54                                                                                                 ` Richard Stallman
2016-02-27 20:02                                                                                                   ` Eli Zaretskii
2016-02-27 20:05                                                                                                   ` Eli Zaretskii
2016-02-28 10:25                                                                                                     ` Richard Stallman
2016-02-28  6:06                                                                                                   ` Yuri Khan
2016-02-24 13:41                                                                                       ` Richard Stallman
2016-02-24 17:54                                                                                         ` Eli Zaretskii
2016-02-25 12:15                                                                                           ` Richard Stallman
2016-02-25 12:38                                                                                             ` Joost Kremers
2016-02-25 22:43                                                                                               ` John Wiegley
2016-02-25 22:48                                                                                                 ` John Wiegley
2016-02-26 18:13                                                                                                 ` Eli Zaretskii
2016-02-27  0:48                                                                                                   ` John Wiegley
2016-02-27  8:38                                                                                                     ` Eli Zaretskii
2016-02-27  8:58                                                                                                       ` John Wiegley
2016-02-27  9:30                                                                                                         ` Eli Zaretskii
2016-02-27 16:22                                                                                                           ` Ken Brown
2016-02-27 22:48                                                                                                           ` John Wiegley
2016-02-28 15:57                                                                                                             ` Eli Zaretskii
2016-02-28 16:59                                                                                                               ` Drew Adams
2016-02-28 22:59                                                                                                                 ` John Wiegley
2016-02-29  0:22                                                                                                                   ` Drew Adams
2016-02-29  0:31                                                                                                                   ` Juri Linkov
2016-02-29  3:45                                                                                                                     ` Eli Zaretskii
2016-02-27 19:53                                                                                                       ` Richard Stallman
2016-02-27 20:01                                                                                                         ` Eli Zaretskii
2016-02-28 10:24                                                                                                           ` Richard Stallman
2016-02-28 16:01                                                                                                             ` Eli Zaretskii
     [not found]                                                                                                           ` <<E1aZyX5-0007bU-Mu@fencepost.gnu.org>
     [not found]                                                                                                             ` <<83oab0ako0.fsf@gnu.org>
2016-02-28 17:00                                                                                                               ` Drew Adams
2016-02-28 17:59                                                                                                                 ` Clément Pit--Claudel
2016-02-28 18:04                                                                                                                   ` Eli Zaretskii
2016-02-28 18:15                                                                                                                     ` Clément Pit--Claudel
2016-02-28 18:23                                                                                                                     ` Drew Adams
2016-02-28 18:46                                                                                                                       ` Eli Zaretskii
2016-02-28 18:22                                                                                                                   ` Drew Adams
2016-02-28 18:58                                                                                                                     ` Clément Pit--Claudel
2016-02-24 13:41                                                                                       ` Richard Stallman
2016-02-24 17:56                                                                                         ` Eli Zaretskii
2016-02-25 12:15                                                                                           ` Richard Stallman
2016-02-23 20:21                                                                                     ` Yuri Khan
2016-02-23 21:15                                                                                       ` Marcin Borkowski
2016-02-22 18:01                                                                             ` Richard Stallman
2016-02-22 18:58                                                                               ` Eli Zaretskii
2016-02-23  1:30                                                                               ` Lars Ingebrigtsen
2016-02-23 17:46                                                                                 ` Richard Stallman
2016-02-24  1:50                                                                                   ` Lars Ingebrigtsen
2016-02-24  6:40                                                                                     ` Lars Brinkhoff
2016-02-24 13:43                                                                                     ` Richard Stallman
2016-02-23  2:03                                                                               ` Elias Mårtenson
2016-02-23 17:46                                                                                 ` Richard Stallman
2016-02-22  3:38                                                                           ` Eli Zaretskii
2016-02-22  3:57                                                                             ` Lars Ingebrigtsen
2016-02-22 16:10                                                                               ` Eli Zaretskii
2016-02-22 18:58                                                                               ` John Wiegley
2016-02-23  7:50                                                                                 ` Per Starbäck
2016-02-23 16:29                                                                                   ` John Wiegley
2016-02-21 16:31                                                                     ` Eli Zaretskii
2016-02-21 16:58                                                                       ` Elias Mårtenson
2016-02-21 17:23                                                                         ` Eli Zaretskii
2016-02-21 18:48                                                                           ` Ivan Andrus
2016-02-22 15:58                                                                           ` Wolfgang Jenkner
2016-02-22 16:35                                                                             ` Eli Zaretskii
2016-02-22 16:56                                                                               ` Wolfgang Jenkner
2016-02-22 17:24                                                                                 ` Eli Zaretskii
2016-02-22 17:59                                                                           ` Richard Stallman
2016-02-22 18:57                                                                             ` Eli Zaretskii
2016-02-23 17:43                                                                               ` Richard Stallman
2016-02-23 18:03                                                                                 ` Eli Zaretskii
2016-02-24 13:41                                                                                   ` Richard Stallman
2016-02-23 17:43                                                                               ` Richard Stallman
     [not found]                                                                               ` <<E1aYGze-000655-RM@fencepost.gnu.org>
2016-02-23 18:00                                                                                 ` Drew Adams
2016-02-22 17:59                                                                         ` Richard Stallman
2016-02-22 18:51                                                                           ` Eli Zaretskii
2016-02-23  0:14                                                                             ` Juri Linkov
2016-02-23 17:11                                                                               ` Eli Zaretskii
2016-02-24  0:16                                                                                 ` Juri Linkov
2016-02-24 18:39                                                                                   ` Eli Zaretskii
2016-02-25  0:29                                                                                     ` Juri Linkov
2016-02-25 16:24                                                                                       ` Eli Zaretskii
2016-02-29  0:22                                                                                         ` Juri Linkov
2016-02-29 16:27                                                                                           ` Eli Zaretskii
2016-02-29 23:40                                                                                             ` Juri Linkov
2016-03-01 16:44                                                                                               ` Eli Zaretskii
2016-02-26 20:23                                                                             ` Richard Stallman
2016-02-21 16:25                                                                   ` Eli Zaretskii
2016-02-22  1:56                                                                     ` Lars Ingebrigtsen
2016-02-22  9:20                                                                       ` Andreas Schwab
2016-02-23  1:46                                                                         ` Lars Ingebrigtsen
2016-02-23  3:38                                                                           ` Eli Zaretskii
2016-02-21 12:44                                                                 ` Richard Stallman
2016-02-21 16:05                                                                   ` Eli Zaretskii
2016-02-22 17:57                                                                     ` Richard Stallman
2016-02-22 18:34                                                                       ` Eli Zaretskii
2016-02-20  9:21                                                             ` Eli Zaretskii
2016-02-20 10:08                                                               ` Elias Mårtenson
2016-02-20 10:44                                                                 ` Eli Zaretskii
2016-02-19 20:38                                                 ` Marcin Borkowski
2016-02-19 22:44                                               ` Lars Ingebrigtsen
2016-02-19 22:54                                                 ` Clément Pit--Claudel
2016-02-20  5:25                                                   ` Elias Mårtenson
2016-02-20 14:32                                                     ` Richard Stallman
2016-02-20 15:50                                                       ` Elias Mårtenson
2016-02-21 12:45                                                         ` Richard Stallman
2016-02-20  8:09                                                 ` Eli Zaretskii
2016-02-20 14:32                                                   ` Richard Stallman
2016-02-24 23:27                                                     ` Rasmus
2016-02-25 20:46                                                       ` Richard Stallman
2016-02-13 18:15                                     ` Artur Malabarba
2016-02-13 18:26                                       ` Drew Adams
2016-02-12 19:09                             ` Clément Pit--Claudel
2016-02-12 19:39                               ` Óscar Fuentes
2016-02-13 15:32                       ` Richard Stallman
2016-02-13 15:40                         ` Eli Zaretskii
2016-02-13 16:58                           ` Andreas Schwab
2016-02-13 17:44                             ` Eli Zaretskii
2016-02-13 16:37                       ` Marcin Borkowski
2016-02-13 16:50                         ` Eli Zaretskii
2016-02-13 17:15                           ` Marcin Borkowski
2016-02-13 17:45                             ` Eli Zaretskii
2016-02-13 17:52                               ` Marcin Borkowski
2016-02-13 17:46                             ` andres.ramirez
2016-02-14 13:59                           ` Richard Stallman
2016-02-12 23:50                 ` Juri Linkov
2016-02-13  0:33                   ` Óscar Fuentes
2016-02-14 13:57                     ` Richard Stallman
2016-02-14 14:27                       ` Óscar Fuentes
2016-02-15 10:28                         ` Richard Stallman
2016-02-15 12:31                           ` Óscar Fuentes
2016-02-15 17:45                             ` Richard Stallman
2016-02-16 13:54                               ` Elias Mårtenson
2016-02-16 14:30                               ` Per Starbäck
2016-02-16 19:32                                 ` Ken Brown
2016-02-16 23:49                                   ` Lars Ingebrigtsen
2016-02-17 16:03                                     ` Richard Stallman
2016-02-18  8:57                                   ` Alan Mackenzie
2016-02-18 17:27                                     ` Eli Zaretskii
2016-02-19 12:37                                       ` Richard Stallman
2016-02-19 18:31                                         ` John Wiegley
2016-02-17  8:00                                 ` Joost Kremers
2016-02-17 15:34                                   ` Eli Zaretskii
2016-02-17 18:30                                     ` Achim Gratz
2016-02-17 19:30                                       ` Eli Zaretskii
2016-02-17 20:26                                       ` Marcin Borkowski
2016-02-17 20:06                                     ` Joost Kremers
2016-02-17 20:15                                       ` Eli Zaretskii
2016-02-17 22:58                                         ` Ken Brown
2016-02-18  0:03                                           ` Vinicius Latorre
2016-02-18 17:29                                             ` Eli Zaretskii
2016-02-18  4:55                                           ` Marcin Borkowski
2016-02-18 11:26                                           ` Filipp Gunbin
2016-02-18 17:26                                             ` Eli Zaretskii
2016-02-19 12:30                                               ` Filipp Gunbin
2016-02-19 15:22                                                 ` Eli Zaretskii
2016-02-18 17:30                                           ` Eli Zaretskii
2016-02-17 22:53                                     ` Mark Oteiza
2016-02-18  0:11                                       ` Juri Linkov
2016-02-18  0:20                                         ` Mark Oteiza
2016-02-18 17:28                                           ` Eli Zaretskii
2016-02-18  4:53                                         ` Marcin Borkowski
2016-02-18 17:07                                           ` Elias Mårtenson
2016-02-18 17:21                                             ` Eli Zaretskii
2016-02-19  7:40                                               ` Elias Mårtenson
2016-02-19 19:24                                                 ` Achim Gratz
2016-02-20  5:05                                                   ` Elias Mårtenson
2016-02-20 13:59                                                     ` Achim Gratz
2016-02-19 20:47                                             ` Marcin Borkowski
2016-02-20 14:31                                               ` Richard Stallman
2016-02-18 17:46                                       ` Eli Zaretskii
2016-02-18 18:18                                         ` Mark Oteiza
2016-02-18 18:24                                           ` Eli Zaretskii
2016-02-18 16:30                                     ` Richard Stallman
2016-02-18 17:07                                       ` Eli Zaretskii
2016-02-13 16:38                 ` Marcin Borkowski
2016-02-13 17:58                   ` Content navigation (was: On language-dependent defaults for character-folding) Óscar Fuentes
2016-02-13 16:32       ` On language-dependent defaults for character-folding Marcin Borkowski
2016-02-13 16:47         ` Eli Zaretskii
2016-02-13 17:03           ` Marcin Borkowski
2016-02-10 13:52 ` Adrian.B.Robert
2016-02-24  9:58 ` Marcin Borkowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=837fi0sz29.fsf@gnu.org \
    --to=eliz@gnu.org \
    --cc=emacs-devel@gnu.org \
    --cc=larsi@gnus.org \
    --cc=lokedhs@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.