Re: On language-dependent defaults for character-folding

unofficial mirror of emacs-devel@gnu.org 

* Re: On language-dependent defaults for character-folding
  2016-02-09 17:48 ` Drew Adams
@ 2016-02-09 16:43   ` Artur Malabarba
  0 siblings, 0 replies; 263+ messages in thread
From: Artur Malabarba @ 2016-02-09 16:43 UTC (permalink / raw)
  To: Drew Adams; +Cc: emacs-devel

Drew Adams <drew.adams@oracle.com> writes:

> I would say that it is primarily about searching for *any of a
> given set of characters*. [...]
> It's simply about wanting to treat a given set of chars as
> equivalent for search purposes.  How you input a search pattern
> (typing, pasting) is only one consideration, for operation.

Fair enough. It's good to know how others think of this feature.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-09 17:58 ` Eli Zaretskii
@ 2016-02-09 17:10   ` Artur Malabarba
  0 siblings, 0 replies; 263+ messages in thread
From: Artur Malabarba @ 2016-02-09 17:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:
>> I don't know if it's possible to figure out the language of the user's
>> keyboard layout.
>
> It's possible on some systems (maybe on all of them).  But it isn't
> TRT, IMO, because one can use input methods external to Emacs, which
> makes this problem unsolvable, AFAIU.
>
> I think our energy will be much better spent by preparing a data base
> of preferences by various groups of users, including (but not limited
> to) something that can be vaguely called "typical user of language X",
> for several values of X.

I disagree that it's not TRT. Most problems are technically unsolvable
if you take into account the infinity of ways that the user could have
customized Emacs or their OS; that doesn't prevent us from solving the
“typical” case.
But I'm also fine with your proposed alternative.

Having a separate setting that governs multiple features and might allow
us to identify a user's “main” language (or something like that) sounds
useful too. While I'd prefer to rely on “the language that the user
types in”, relying on “the user's language” is a fine compromise. As
long as “the buffer's language” doesn't factor in.


* On language-dependent defaults for character-folding
@ 2016-02-09 17:26 Artur Malabarba
  2016-02-09 17:39 ` Pierpaolo Bernardi
                   ` (5 more replies)
  0 siblings, 6 replies; 263+ messages in thread
From: Artur Malabarba @ 2016-02-09 17:26 UTC (permalink / raw)
  To: emacs-devel

Hi everyone,

Firstly, let me say that character folding will be more easily
configurable soon. The current message is not about that, it's about
the default behaviour. It's important that the default be helpful,
without appearing to be "buggy" to unsuspecting users.

== Context ==
A lot of people have raised concerns with the default behaviour of
character folding. The argument usually goes like this:
“as a Spanish user, n and ñ are different letters, and if searching
for n will find instances of ñ, then that is a false positive. This
folding should be disabled for Spanish users.” (and so on).

One of the solutions suggested is that the set of foldings used by
default should depend on some buffer-local notion of current language.

== My Point ==
I agree that the default behaviour should be a little smarter (i.e., I
agree with the argument), but I disagree that the **buffer's**
language has anything to do with that.

Char folding is primarily about being able to easily search for
characters that you can't easily type. It also has secondary uses,
like searching when you're not even sure which character you want to
search for, but I'm focusing on the first.

The set of characters that I can easily type is defined by 3 things:
1. My keyboard layout.
2. The input method in the current Emacs buffer.
3. Any special commands/keybinds that I have specifically set up.

Note how the language of the text in the buffer does not show up
there. It does not matter whether the current buffer is in English,
Portuguese, or Spanish, I simply cannot type ñ without at least 4
keystrokes.
As long as my keyboard layout is not Spanish, I want to be able to
find ñ when searching for n. The language of the text is irrelevant.
(I'm using Spanish as the example here, obviously this holds for most
languages).

That's why the default set of char foldings should depend on item 1
above. (It might eventually be nice to take item 2 into account too,
and it's simply impossible to account for item 3).

Note that it also doesn't matter whether or not I'm proficient in
Spanish. I still can't type ñ in less than 4 keystrokes.

== Bottomline ==
I don't know if it's possible to figure out the language of the user's
keyboard layout. But the point is that we should care about the
language that the user can _type_ in, NOT the language that they
happen to be _reading_ now nor the language that they happen to
_know_.

Cheers everyone,
Artur


* Re: On language-dependent defaults for character-folding
  2016-02-09 17:26 On language-dependent defaults for character-folding Artur Malabarba
@ 2016-02-09 17:39 ` Pierpaolo Bernardi
  2016-02-09 17:54   ` Paul Eggert
  2016-02-09 17:48 ` Drew Adams
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 263+ messages in thread
From: Pierpaolo Bernardi @ 2016-02-09 17:39 UTC (permalink / raw)
  To: Artur Malabarba; +Cc: emacs-devel

On Tue, Feb 9, 2016 at 6:26 PM, Artur Malabarba
<bruce.connor.am@gmail.com> wrote:

> == Bottomline ==
> I don't know if it's possible to figure out the language of the user's
> keyboard layout. But the point is that we should care about the
> language that the user can _type_ in, NOT the language that they
> happen to be _reading_ now nor the language that they happen to
> _know_.

So, if I'm using my laptop on which I use a US-international layout I
will get no folding for any character in Latin-1, if I use a nearby
machine with an Italian keyboard layout I get a different behavior, if
I use another machine with a US layout I get another different
behavior.  That will be the time that I revert to Emacs 19.

FWIW, my preference would be for a different function altogether,
disjoint from the non-folding version.


* RE: On language-dependent defaults for character-folding
  2016-02-09 17:26 On language-dependent defaults for character-folding Artur Malabarba
  2016-02-09 17:39 ` Pierpaolo Bernardi
@ 2016-02-09 17:48 ` Drew Adams
  2016-02-09 16:43   ` Artur Malabarba
  2016-02-09 17:58 ` Eli Zaretskii
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 263+ messages in thread
From: Drew Adams @ 2016-02-09 17:48 UTC (permalink / raw)
  To: bruce.connor.am, emacs-devel

> Char folding is primarily about being able to easily search for
> characters that you can't easily type. It also has secondary uses,
> like searching when you're not even sure which character you want to
> search for, but I'm focusing on the first.

I would say that it is primarily about searching for *any of a
given set of characters*.  It has nothing to do, necessarily, with
the difficulty of typing certain characters, and it has nothing to
do, necessarily, with not knowing which characters you want to
search for.

It's simply about wanting to treat a given set of chars as
equivalent for search purposes.  How you input a search pattern
(typing, pasting) is only one consideration, for operation.

> the point is that we should care about the
> language that the user can _type_ in, NOT the language that they
> happen to be _reading_ now nor the language that they happen to
> _know_.

Typing is only one consideration when defining default behavior.
It is of course a reasonable thing to consider.


* Re: On language-dependent defaults for character-folding
  2016-02-09 17:39 ` Pierpaolo Bernardi
@ 2016-02-09 17:54   ` Paul Eggert
  2016-02-10  0:49     ` Pierpaolo Bernardi
  0 siblings, 1 reply; 263+ messages in thread
From: Paul Eggert @ 2016-02-09 17:54 UTC (permalink / raw)
  To: Pierpaolo Bernardi, Artur Malabarba; +Cc: emacs-devel

On 02/09/2016 09:39 AM, Pierpaolo Bernardi wrote:
> So, if I'm using my laptop on which I use a US-international layout I
> will get no folding for any character in Latin-1

That's not what Artur's saying. The layout of the keyboard hardware is 
not the same thing as the language that the user can easily type in.

I agree with Artur's point: typically, searching convenience depends 
more on the language of the user doing the searching than on the 
language of the document being searched.


* Re: On language-dependent defaults for character-folding
  2016-02-09 17:26 On language-dependent defaults for character-folding Artur Malabarba
  2016-02-09 17:39 ` Pierpaolo Bernardi
  2016-02-09 17:48 ` Drew Adams
@ 2016-02-09 17:58 ` Eli Zaretskii
  2016-02-09 17:10   ` Artur Malabarba
  2016-02-09 18:21 ` Óscar Fuentes
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-09 17:58 UTC (permalink / raw)
  To: bruce.connor.am; +Cc: emacs-devel

> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Date: Tue, 9 Feb 2016 17:26:32 +0000
> 
> I don't know if it's possible to figure out the language of the user's
> keyboard layout.

It's possible on some systems (maybe on all of them).  But it isn't
TRT, IMO, because one can use input methods external to Emacs, which
makes this problem unsolvable, AFAIU.

I think our energy will be much better spent by preparing a data base
of preferences by various groups of users, including (but not limited
to) something that can be vaguely called "typical user of language X",
for several values of X.  I think we can come up with other types of
groups as well.

Thanks.




* Re: On language-dependent defaults for character-folding
  2016-02-09 17:26 On language-dependent defaults for character-folding Artur Malabarba
                   ` (2 preceding siblings ...)
  2016-02-09 17:58 ` Eli Zaretskii
@ 2016-02-09 18:21 ` Óscar Fuentes
  2016-02-09 19:54   ` Artur Malabarba
  2016-02-10 13:52 ` Adrian.B.Robert
  2016-02-24  9:58 ` Marcin Borkowski
  5 siblings, 1 reply; 263+ messages in thread
From: Óscar Fuentes @ 2016-02-09 18:21 UTC (permalink / raw)
  To: emacs-devel

Artur Malabarba <bruce.connor.am@gmail.com> writes:

[snip]

> == Bottomline ==
> I don't know if it's possible to figure out the language of the user's
> keyboard layout. But the point is that we should care about the
> language that the user can _type_ in,

Figuring out this (and acting upon that knowledge) looks like a quite
complex task to me. In practice, letting the user tell Emacs about how
the char folding should happen is more reasonable.

> NOT the language that they
> happen to be _reading_ now nor the language that they happen to
> _know_.

What I get from all this saga is that character folding is about
allowing users to search for weird characters used by those
funny-looking aliens who are harassed by the guards when they try
to cross our borders :-) You don't care about what the character really
is, you just notice that it is "that character I know with some
decoration added" and then use the character you know for searching for
the funny one.

I hope you all realize that the users who can benefit from this feature
are those who are ill-equipped to *search* for certain characters,
related to the Latin alphabet, and need to do that only occasionally. OTOH
we have the people who actually write those characters, hence they don't
need help for searching for them, and who will be pissed to discover
that Isearch is broken.

We don't need a smarter feature, we need a sane default, which is
"disabled". When activated, act as Unicode says, which seems to be
clearly defined. That's it.

Much of the confusion on this topic originated in the expectation that
the feature could be used for searching for equivalent characters within
a language (*), but as that is not what it is about, the need for
language-dependent customizations vanishes, and with it the complexity
goes away too.

* Some languages (French) may benefit from the feature anyways, because
  the "equivalence classes" of theirs happen to coincide with what the
  character folding feature does.


* Re: On language-dependent defaults for character-folding
  2016-02-09 18:21 ` Óscar Fuentes
@ 2016-02-09 19:54   ` Artur Malabarba
  2016-02-09 20:08     ` Eli Zaretskii
  2016-02-09 21:07     ` Óscar Fuentes
  0 siblings, 2 replies; 263+ messages in thread
From: Artur Malabarba @ 2016-02-09 19:54 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

On 9 February 2016 at 18:21, Óscar Fuentes <ofv@wanadoo.es> wrote:
>> I don't know if it's possible to figure out the language of the user's
>> keyboard layout. But the point is that we should care about the
>> language that the user can type in,
>
> Figuring out this (and acting upon that knowledge) looks like a quite
> complex task to me. In practice, letting the user tell Emacs about how
> the char folding should happen is more reasonable.

1. Take the set of all characters in the language that the user types in;
2. Don't fold these characters.

That's all the complexity. If we have a database of characters in a
language, this could even be done automatically. If we don't have such
a database, then all we need is some quick input from a user of that
language (this doesn't need to happen all at once, there's no rush).
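
As a rough illustration, the two steps above could be sketched like this.
It is Python rather than Emacs Lisp, and it is not how Emacs builds its
folding table: `base_letter`, `build_fold_table` and the tiny
`SPANISH_LETTERS` set are hypothetical names made up for the example, and
the "database of characters in a language" is just a hand-written set.

```python
import unicodedata


def base_letter(ch):
    """Return the first character of ch's canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", ch)[0]


def build_fold_table(candidates, user_alphabet):
    """Map each base character to the extra characters it should also match.

    Step 1: user_alphabet is the set of characters of the user's language.
    Step 2: characters in that set are never folded.
    """
    table = {}
    for ch in candidates:
        if ch in user_alphabet:
            continue  # the user can type this one, so leave it alone
        base = base_letter(ch)
        if base != ch:
            table.setdefault(base, set()).add(ch)
    return table


# Hypothetical, deliberately incomplete stand-in for a "Spanish" entry
# in a language database: n-tilde, a/e/i/o/u with acute, u-diaeresis.
SPANISH_LETTERS = set("\u00f1\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc")

# Characters found in some text: n-tilde, a-tilde, a-grave, a-acute,
# a-circumflex.
candidates = "\u00f1\u00e3\u00e0\u00e1\u00e2"

table = build_fold_table(candidates, SPANISH_LETTERS)
```

With this assumed alphabet, searching for a would still find ã, à and â,
while n would no longer find ñ, and a would no longer find á.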

> I hope you all realize that the users who can benefit from this feature
> are those who are ill-equipped to search for certain characters,

I could be wrong, but I think you just defined all users. In the
Unicode standard used by Emacs, there are 5721 characters with a
“decomposition” property. Is there a user who is well-equipped to type
all of those characters?

> OTOH
> we have the people who actually write those characters, hence they don't
> need help for searching for them, and who will be pissed to discover
> that Isearch is broken.

The whole point here is to find defaults that won't fold characters of
the user's language.


* Re: On language-dependent defaults for character-folding
  2016-02-09 19:54   ` Artur Malabarba
@ 2016-02-09 20:08     ` Eli Zaretskii
  2016-02-10  1:58       ` Artur Malabarba
  2016-02-09 21:07     ` Óscar Fuentes
  1 sibling, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-09 20:08 UTC (permalink / raw)
  To: bruce.connor.am; +Cc: ofv, emacs-devel

> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Date: Tue, 9 Feb 2016 19:54:57 +0000
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
> 1. Take the set of all characters in the language that the user types in;
> 2. Don't fold these characters.

I think we should make an exception to rule 2 for character sequences
that are displayed as some character in the user's language: those
must be folded, otherwise the result will be very confusing.  For
example, searching for ñ (one character) should also find a sequence
of 2 characters ñ, and vice versa, even for languages where ñ can be
typed on the keyboard.
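
To make the exception concrete, here is a minimal sketch in Python (not
Emacs Lisp, and not the actual Isearch code) of matching the one-character
and two-character spellings of ñ against each other by comparing canonical
decompositions. The helper names `canonically_equal` and `find_canonical`
are assumptions made for this example only.

```python
import unicodedata


def canonically_equal(a, b):
    """True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def find_canonical(needle, haystack):
    """Find needle in haystack after normalising both to NFD.

    Returns an index into the *normalised* haystack, or -1 if not found.
    """
    nfd_needle = unicodedata.normalize("NFD", needle)
    nfd_haystack = unicodedata.normalize("NFD", haystack)
    return nfd_haystack.find(nfd_needle)


precomposed = "\u00f1"   # n-tilde as a single character
decomposed = "n\u0303"   # n followed by COMBINING TILDE

same = canonically_equal(precomposed, decomposed)
hit_a = find_canonical(precomposed, "man\u0303ana")  # text uses 2 chars
hit_b = find_canonical(decomposed, "ma\u00f1ana")    # text uses 1 char
```

Both searches succeed even though the two strings differ as raw character
sequences, which is the behaviour asked for above.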


* Re: On language-dependent defaults for character-folding
  2016-02-09 19:54   ` Artur Malabarba
  2016-02-09 20:08     ` Eli Zaretskii
@ 2016-02-09 21:07     ` Óscar Fuentes
  2016-02-10  2:18       ` Artur Malabarba
  2016-02-13 16:32       ` On language-dependent defaults for character-folding Marcin Borkowski
  1 sibling, 2 replies; 263+ messages in thread
From: Óscar Fuentes @ 2016-02-09 21:07 UTC (permalink / raw)
  To: emacs-devel

Artur Malabarba <bruce.connor.am@gmail.com> writes:

> On 9 February 2016 at 18:21, Óscar Fuentes <ofv@wanadoo.es> wrote:
>>> I don't know if it's possible to figure out the language of the user's
>>> keyboard layout. But the point is that we should care about the
>>> language that the user can type in,
>>
>> Figuring out this (and acting upon that knowledge) looks like a quite
>> complex task to me. In practice, letting the user tell Emacs about how
>> the char folding should happen is more reasonable.
>
> 1. Take the set of all characters in the language that the user types in;
> 2. Don't fold these characters.

Today I read your blog post about this feature:

http://endlessparentheses.com/new-in-emacs-25-1-easily-search-non-ascii-characters.html

where you say

"As any Brazilian, I am a daily user of diacritical marks (ó, ã, ê, and
the likes), and even though my keyboard can type these characters, I
still enjoy the simplicity of not having to."

And now I'm utterly confused. Your example is about using the feature
within your language, which you admit you have no problem with writing,
and now you talk about not folding the characters of the user's
language?

When at first I looked at the feature I thought that it was precisely
about what you mention on the blog entry and deemed it as something I
would use for the same reasons you mention on your example, until I
noticed the issue with n/ñ, when I was told that the feature was about
something else.

> That's all the complexity. If we have a database of characters in a
> language, this could even be done automatically. If we don't have such
> a database, then all we need is some quick input from a user of that
> language (this doesn't need to happen all at once, there's no rush).
>
>> I hope you all realize that the users who can benefit from this feature
>> are those who are ill-equipped to search for certain characters,
>
> I could be wrong, but I think you just defined all users. In the
> Unicode standard used by Emacs, there are 5721 characters with a
> “decomposition” property. Is there a user who is well-equipped to type
> all of those characters?

(And how many of those 5721 characters can be matched from a latin
letter?)

How typical is it for an Emacs user to have to *search* (not write) for a
composed character that he can not type with his input setup? Sure,
people like Eli may have to do that quite often, because he has a
heterogeneous cultural background and also works on tasks related to
internationalization, but it is reasonable to assume that most users
will not need the feature often, if at all.

From my POV, if you see the feature as an aid for searching composed
characters by people without the adequate input method, there is no
problem at all. Just make it optional, perhaps togglable while inside
Isearch. This way the people who need it can use it, and Isearch will
not break for the rest.

[snip]


* Re: On language-dependent defaults for character-folding
  2016-02-09 17:54   ` Paul Eggert
@ 2016-02-10  0:49     ` Pierpaolo Bernardi
  2016-02-10  2:20       ` Artur Malabarba
  0 siblings, 1 reply; 263+ messages in thread
From: Pierpaolo Bernardi @ 2016-02-10  0:49 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Artur Malabarba, emacs-devel

On Tue, Feb 9, 2016 at 6:54 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 02/09/2016 09:39 AM, Pierpaolo Bernardi wrote:
>>
>> So, if I'm using my laptop on which I use a US-international layout I
>> will get no folding for any character in Latin-1
>
> That's not what Artur's saying. The layout of the keyboard hardware is not
> the same thing as the language that the user can easily type in.

How so?  The layout of the keyboard hardware and its driver are
fundamental parts of what one can easily type in.

The point is that he proposes to have the default behavior of Emacs be
different depending on random environmental features of the computer
it's running on.




* Re: On language-dependent defaults for character-folding
  2016-02-09 20:08     ` Eli Zaretskii
@ 2016-02-10  1:58       ` Artur Malabarba
  0 siblings, 0 replies; 263+ messages in thread
From: Artur Malabarba @ 2016-02-10  1:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel


On 9 Feb 2016 6:08 pm, "Eli Zaretskii" <eliz@gnu.org> wrote:
> > 1. Take the set of all characters in the language that the user types in;
> > 2. Don't fold these characters.
>
> I think we should make an exception to rule 2 for character sequences
> that are displayed as some character in the user's language: those
> must be folded, otherwise the result will be very confusing.  For
> example, searching for ñ (one character) should also find a sequence
> of 2 characters ñ, and vice versa, even for languages where ñ can be
> typed on the keyboard.

I agree.


* Re: On language-dependent defaults for character-folding
  2016-02-09 21:07     ` Óscar Fuentes
@ 2016-02-10  2:18       ` Artur Malabarba
  2016-02-10  2:52         ` Óscar Fuentes
                           ` (3 more replies)
  2016-02-13 16:32       ` On language-dependent defaults for character-folding Marcin Borkowski
  1 sibling, 4 replies; 263+ messages in thread
From: Artur Malabarba @ 2016-02-10  2:18 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel


On 9 Feb 2016 7:07 pm, "Óscar Fuentes" <ofv@wanadoo.es> wrote:
> >
> > 1. Take the set of all characters in the language that the user types in;
> > 2. Don't fold these characters.
>
> Today I read your blog post about this feature:  [...]
>
> And now I'm utterly confused. Your example is about using the feature
> within your language, which you admit you have no problem with writing,
> and now you talk about not folding the characters of the user's
> language?

I'm sorry that post confused you. That post states my personal preference
(I like the "fold all unicode decompositions" behaviour). That post does
NOT reflect what I think should be the default. What I've written here on
this thread is what I think should be the default.

Although currently Emacs does fold all decompositions by default, this is
just temporary. We've said we would turn that off before release (and in
fact I'll do that tomorrow (and amend my post too)).

> (And how many of those 5721 characters can be matched from a latin
> letter?)

OK, I see what you meant.

> How typical is it for an Emacs user to have to *search* (not write) for a
> composed character that he can not type with his input setup?

I have no idea, which is why this feature will be off by default until I
feel confident it won't get in anyone's way.


* Re: On language-dependent defaults for character-folding
  2016-02-10  0:49     ` Pierpaolo Bernardi
@ 2016-02-10  2:20       ` Artur Malabarba
  2016-02-10  3:01         ` Pierpaolo Bernardi
  0 siblings, 1 reply; 263+ messages in thread
From: Artur Malabarba @ 2016-02-10  2:20 UTC (permalink / raw)
  To: Pierpaolo Bernardi; +Cc: Paul Eggert, emacs-devel


On 9 Feb 2016 10:49 pm, "Pierpaolo Bernardi" <olopierpa@gmail.com> wrote:
> The point is that he proposes to have the default behavior of Emacs be
> different depending on random environmental features of the computer
> it's running on.

Except for the word "random", yes, that was the proposal. Why do you feel
that's bad?


* Re: On language-dependent defaults for character-folding
  2016-02-10  2:18       ` Artur Malabarba
@ 2016-02-10  2:52         ` Óscar Fuentes
  2016-02-10  2:56         ` Mark Oteiza
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 263+ messages in thread
From: Óscar Fuentes @ 2016-02-10  2:52 UTC (permalink / raw)
  To: emacs-devel

Artur Malabarba <bruce.connor.am@gmail.com> writes:

> I'm sorry that post confused you. That post states my personal preference
> (I like the "fold all unicode decompositions" behaviour).

Possibly in Portuguese there is no problem with folding matching
unrelated characters. If it wasn't for the n/ñ case in Spanish, most
likely I would turn on the feature on my setup.

>> How typical is it for an Emacs user to have to *search* (not write) for a
>> composed character that he can not type with his input setup?
>
> I have no idea, which is why this feature will be off by default until I
> feel confident it won't get in anyone's way.

That's very reasonable. Thank you.





* Re: On language-dependent defaults for character-folding
  2016-02-10  2:18       ` Artur Malabarba
  2016-02-10  2:52         ` Óscar Fuentes
@ 2016-02-10  2:56         ` Mark Oteiza
  2016-02-10 15:25         ` Eli Zaretskii
  2016-02-11  0:54         ` Juri Linkov
  3 siblings, 0 replies; 263+ messages in thread
From: Mark Oteiza @ 2016-02-10  2:56 UTC (permalink / raw)
  To: emacs-devel


Artur Malabarba <bruce.connor.am@gmail.com> writes:
> Although currently Emacs does fold all decompositions by default, this
> is just temporary. We've said we would turn that off before release
> (and in fact I'll do that tomorrow (and amend my post too)).

Thank you.




* Re: On language-dependent defaults for character-folding
  2016-02-10  2:20       ` Artur Malabarba
@ 2016-02-10  3:01         ` Pierpaolo Bernardi
  2016-02-10  9:55           ` Artur Malabarba
  0 siblings, 1 reply; 263+ messages in thread
From: Pierpaolo Bernardi @ 2016-02-10  3:01 UTC (permalink / raw)
  To: Artur Malabarba; +Cc: Paul Eggert, emacs-devel

On Wed, Feb 10, 2016 at 3:20 AM, Artur Malabarba
<bruce.connor.am@gmail.com> wrote:
> On 9 Feb 2016 10:49 pm, "Pierpaolo Bernardi" <olopierpa@gmail.com> wrote:
>> The point is that he proposes to have the default behavior of Emacs be
>> different depending on random environmental features of the computer
>> it's running on.
>
> Except for the word "random", yes, that was the proposal. Why do you feel
> that's bad?

Because I want consistent behavior.  The example I gave is not
invented; I regularly use more than one machine. These machines have
different keyboard layouts and drivers, because not all of them are
under my control, and I cannot make their hardware and system
software uniform, even if I wished to do so.




* Re: On language-dependent defaults for character-folding
  2016-02-10  3:01         ` Pierpaolo Bernardi
@ 2016-02-10  9:55           ` Artur Malabarba
  2016-02-10 18:12             ` Óscar Fuentes
  0 siblings, 1 reply; 263+ messages in thread
From: Artur Malabarba @ 2016-02-10  9:55 UTC (permalink / raw)
  To: Pierpaolo Bernardi; +Cc: Paul Eggert, emacs-devel

On 10 February 2016 at 03:01, Pierpaolo Bernardi <olopierpa@gmail.com> wrote:
>> Except for the word "random", yes, that was the proposal. Why do you feel
>> that's bad?
>
> Because I want consistent behavior.  The example I gave is not
> invented; I regularly use more than one machine. These machines have
> different keyboard layouts and drivers, because not all of them are
> under my control, and I cannot make their hardware and system
> software uniform, even if I wished to do so.

That's my situation too. Half the time I'm on an English keyboard,
where I would be glad if Emacs helped me out with Portuguese
diacritics.
Of course, that's just my opinion. I'd like to understand other
people's opinions too.




* Re: On language-dependent defaults for character-folding
  2016-02-09 17:26 On language-dependent defaults for character-folding Artur Malabarba
                   ` (3 preceding siblings ...)
  2016-02-09 18:21 ` Óscar Fuentes
@ 2016-02-10 13:52 ` Adrian.B.Robert
  2016-02-24  9:58 ` Marcin Borkowski
  5 siblings, 0 replies; 263+ messages in thread
From: Adrian.B.Robert @ 2016-02-10 13:52 UTC (permalink / raw)
  To: emacs-devel

Artur Malabarba <bruce.connor.am@gmail.com> writes:

> Char folding is primarily about being able to easily search for
> characters that you can't easily type. It also has secondary uses,
> like searching when you're not even sure which character you want to
> search for, but I'm focusing on the first.

Thank you.  I wish there were more posting of actual use cases like this in
the present discussion.  I feel like a lot of the posts so far are along the
lines of "Because X, I don't want this to be the *default*", which it isn't
going to be anyway, and very few are about "I want character folding so I can
*do* Y."  So far I've seen:

1) Easily search for not-easily typable characters, by casting a wide net.

2) Search for composed and decomposed variants of the same character.

Note that these would be best served by two *different* features.  #2 by true
unicode-composition folding, and #1 by broader "optical" classes that are
roughly but not exactly captured by searching for any character whose
decomposition contains the template.
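
The difference between the two features can be sketched in a few lines of
Python (chosen only for illustration; this is not Emacs code). The names
`canonical_match` and `broad_match` are hypothetical, and treating "broad"
as "the template occurs in the compatibility decomposition (NFKD)" is an
assumption that only roughly approximates the "optical" classes meant in #1.

```python
import unicodedata


def canonical_match(a, b):
    """Feature #2: a and b are composed/decomposed variants of one text."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def broad_match(template, ch):
    """Feature #1 (approximation): template occurs in ch's decomposition."""
    return template in unicodedata.normalize("NFKD", ch)


n_tilde = "\u00f1"        # precomposed n-tilde
n_plus_tilde = "n\u0303"  # n followed by COMBINING TILDE
fi_ligature = "\ufb01"    # LATIN SMALL LIGATURE FI

results = {
    "canonical: n-tilde vs n+tilde": canonical_match(n_tilde, n_plus_tilde),
    "canonical: n vs n-tilde": canonical_match("n", n_tilde),
    "broad: n vs n-tilde": broad_match("n", n_tilde),
    "broad: i vs fi ligature": broad_match("i", fi_ligature),
}
```

Only the first kind of match is safe for every language; the second is
the wide net, and is where the n/ñ complaints come from.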

Are there any other things that people *would like* to do with character
folding (besides turn it off if it got in their way)?


* Re: On language-dependent defaults for character-folding
  2016-02-10  2:18       ` Artur Malabarba
  2016-02-10  2:52         ` Óscar Fuentes
  2016-02-10  2:56         ` Mark Oteiza
@ 2016-02-10 15:25         ` Eli Zaretskii
  2016-02-10 21:17           ` Artur Malabarba
                             ` (2 more replies)
  2016-02-11  0:54         ` Juri Linkov
  3 siblings, 3 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-10 15:25 UTC (permalink / raw)
  To: bruce.connor.am; +Cc: ofv, emacs-devel

> Date: Wed, 10 Feb 2016 02:18:03 +0000
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
> > > I could be wrong, but I think you just defined all users. In the
> > > Unicode standard used by Emacs, there are 5721 characters with a
> > > “decomposition” property. Is there a user who is well-equipped to type
> > > all of those characters?
> > 
> > (And how many of those 5721 characters can be matched from a latin
> > letter?)
> 
> OK, I see what you meant.

You do?  I don't, because the answer to Óscar's question is: 376 if we
count only canonical decompositions (which we must support, or users
will hate us), and a whopping 1449 if we count compatibility
decompositions as well.  That's quite a few, I'd say, although AFAIR
we don't find all of the compatibility decompositions under character
folding, only some.
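
For anyone who wants to reproduce counts of this kind, a minimal sketch in
Python follows. It rests on assumptions: "can be matched from a Latin
letter" is taken to mean "the decomposition starts with an ASCII letter",
`count_latin_decompositions` is a name invented here, and the result
depends on the Unicode version bundled with the Python in use, so it need
not agree with the figures quoted above.

```python
import string
import sys
import unicodedata

ASCII_LETTERS = frozenset(string.ascii_letters)


def count_latin_decompositions():
    """Count characters whose decomposition starts with an ASCII letter.

    Returns (canonical, total): canonical counts NFD decompositions only,
    total also includes compatibility (NFKD) decompositions.
    """
    canonical = 0
    total = 0
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogate code points
        ch = chr(cp)
        if ch in ASCII_LETTERS:
            continue  # plain letters are not "decomposed" characters
        if unicodedata.normalize("NFD", ch)[0] in ASCII_LETTERS:
            canonical += 1
        if unicodedata.normalize("NFKD", ch)[0] in ASCII_LETTERS:
            total += 1
    return canonical, total


if __name__ == "__main__":
    canonical, total = count_latin_decompositions()
    print("canonical:", canonical, "canonical + compatibility:", total)
```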

Btw, from my POV, the ease of searching for characters not on my
keyboard is not the main point of this feature.  The main feature is
to search for similar characters.  (Of course, I don't mind if someone
likes this for other reasons.)

> Although currently Emacs does fold all decompositions by default, this is
> just temporary. We've said we would turn that off before release (and in
> fact I'll do that tomorrow (and amend my post too)).

We didn't say we will turn it off, we said we will _decide_ whether to
turn it off.  So please don't turn it off just yet, we are still
collecting feedback.  If anything, for now I counted more people who
said they liked it than those who didn't (9 vs 5, by my count).  I'm
not saying we should already decide to leave it on, but turning it off
is certainly premature.  Less than two weeks have passed since the
pretest began, there's no rush.

Thanks.



* Re: On language-dependent defaults for character-folding
  2016-02-10  9:55           ` Artur Malabarba
@ 2016-02-10 18:12             ` Óscar Fuentes
  2016-02-10 19:23               ` Artur Malabarba
  0 siblings, 1 reply; 263+ messages in thread
From: Óscar Fuentes @ 2016-02-10 18:12 UTC (permalink / raw)
  To: emacs-devel

Artur Malabarba <bruce.connor.am@gmail.com> writes:

> That's my situation too. Half the time I'm on an english keyboard,
> where I would be glad if Emacs helped me out with Portuguese
> diacritics.

Why don't you configure your input method? Almost all the time I use a
US keyboard and have no problem entering diacritics, thanks to the
US-International input method of the OS. Emacs has its own input method
mechanism too, which works in an almost identical way.

[snip]

* Re: On language-dependent defaults for character-folding
  2016-02-10 18:12             ` Óscar Fuentes
@ 2016-02-10 19:23               ` Artur Malabarba
  0 siblings, 0 replies; 263+ messages in thread
From: Artur Malabarba @ 2016-02-10 19:23 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

On 10 Feb 2016 4:12 pm, "Óscar Fuentes" <ofv@wanadoo.es> wrote:
>
> > That's my situation too. Half the time I'm on an english keyboard,
> > where I would be glad if Emacs helped me out with Portuguese
> > diacritics.
>
> Why don't you configure your input method?

Yes, I usually turn on an input method. Char folding is just more
convenient (WRT searching).

* Re: On language-dependent defaults for character-folding
  2016-02-10 15:25         ` Eli Zaretskii
@ 2016-02-10 21:17           ` Artur Malabarba
  2016-02-11  3:39             ` Eli Zaretskii
  2016-02-12 22:36           ` Per Starbäck
  2016-02-13 16:46           ` joakim
  2 siblings, 1 reply; 263+ messages in thread
From: Artur Malabarba @ 2016-02-10 21:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

On 10 Feb 2016 1:25 pm, "Eli Zaretskii" <eliz@gnu.org> wrote:
> > > (And how many of those 5721 characters can be matched from a latin
> > > letter?)
> >
> > OK, I see what you meant.
>
> You do?

I think so. But I don't want to prolong that line of thought, because it
wasn't a useful argument anyway.

> Btw, from my POV, the ease of searching for characters not on my
> keyboard is not the main point of this feature.  The main feature is
> to search for similar characters.  (Of course, I don't mind if someone
> likes this for other reasons.)

That's actually my personal preference too. I like that I can search for
"o" and hit "õ" (both are used in Portuguese text).
However, this would not be a good _default_ for Brazilian users.  Because
once in a while you might not want it, and if the user didn't enable this
behaviour himself he probably won't know that it can be disabled. (at
least, this is what I think right now).

> > Although currently Emacs does fold all decompositions by default, this
> > is just temporary. We've said we would turn that off before release (and in
> > fact I'll do that tomorrow (and amend my post too)).
>
> We didn't say we will turn it off, we said we will _decide_ whether to
> turn it off.  So please don't turn it off just yet, we are still
> collecting feedback.

Sorry, I already did earlier today. Seems I was under the wrong impression.
Feel free to turn it back on for now.

FTR, my feedback is that I'd like to give the implementation a little more
time before enabling it by default on a stable release.

* Re: On language-dependent defaults for character-folding
  2016-02-10  2:18       ` Artur Malabarba
                           ` (2 preceding siblings ...)
  2016-02-10 15:25         ` Eli Zaretskii
@ 2016-02-11  0:54         ` Juri Linkov
  2016-02-11  1:37           ` Óscar Fuentes
  3 siblings, 1 reply; 263+ messages in thread
From: Juri Linkov @ 2016-02-11  0:54 UTC (permalink / raw)
  To: Artur Malabarba; +Cc: Óscar Fuentes, emacs-devel

> I have no idea, which is why this feature will be off by default until I
> feel confident it won't get in anyone's way.

How regrettable it would be to disable such a useful feature.  I'm using
char-folding every day a dozen times on multiple languages/scripts in
Chromium, and it's a major inconvenience not to be able to use the same
in Emacs.  Let's not hide/postpone this feature due to an inability to
reach a consensus on the default values - we could use the same defaults
as in Chromium.  These are sane defaults based on Unicode standards and
used by millions of users.  I haven't noticed any annoying matching by the
default rules despite not being able to change hard-coded rules or disable
char-folding.  Unlike Chromium, Emacs is more extensible and customizable,
thus we urgently need to provide customization, so everyone could easily
add/remove char-folding rules to/from the default set.

* Re: On language-dependent defaults for character-folding
  2016-02-11  0:54         ` Juri Linkov
@ 2016-02-11  1:37           ` Óscar Fuentes
  2016-02-12  0:50             ` Juri Linkov
  0 siblings, 1 reply; 263+ messages in thread
From: Óscar Fuentes @ 2016-02-11  1:37 UTC (permalink / raw)
  To: emacs-devel

Juri Linkov <juri@linkov.net> writes:

>> I have no idea, which is why this feature will be off by default until I
>> feel confident it won't get in anyone's way.
>
> How regrettable it would be to disable such a useful feature.  I'm using
> char-folding every day a dozen times on multiple languages/scripts in
> Chromium, and it's a major inconvenience not to be able to use the same
> in Emacs.

Is there something that prevents you from enabling the feature on your
setup?

> Let's not hide/postpone this feature due to an inability to
> reach a consensus on the default values - we could use the same defaults
> as in Chromium.

Just checked. Chromium has the n/ñ bug. Chrome doesn't.

> These are sane defaults based on Unicode standards

Unicode doesn't have a say on what is correct in any given language.

> and used by millions of users.

Do you have statistics about Chromium users who take advantage of
character folding?

> I haven't noticed any annoying matching by the
> default rules despite not being able to change hard-coded rules or disable
> char-folding.

Possibly the languages you use do not collide with naïve character
composition rules, or you ignore them or simply don't care about such
rules.

> Unlike Chromium, Emacs is more extensible and customizable,
> thus we urgently need to provide customization, so everyone could easily
> add/remove char-folding rules to/from the default set.

It is reasonable to expect from a serious text editor that when you
search for a letter it finds that letter, not unrelated letters. With
the default configuration, of course.

* Re: On language-dependent defaults for character-folding
  2016-02-10 21:17           ` Artur Malabarba
@ 2016-02-11  3:39             ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-11  3:39 UTC (permalink / raw)
  To: bruce.connor.am; +Cc: emacs-devel

> Date: Wed, 10 Feb 2016 21:17:49 +0000
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
> > We didn't say we will turn it off, we said we will _decide_ whether to
> > turn it off. So please don't turn it off just yet, we are still
> > collecting feedback.
> 
> Sorry, I already did earlier today. Seems I was under the wrong impression. Feel free to turn it back on for
> now.

Done.

> FTR, my feedback is that I'd like to give the implementation a little more time before enabling it by default on a
> stable release. 

Thanks.



* Re: On language-dependent defaults for character-folding
  2016-02-11  1:37           ` Óscar Fuentes
@ 2016-02-12  0:50             ` Juri Linkov
  2016-02-12  1:50               ` Óscar Fuentes
  0 siblings, 1 reply; 263+ messages in thread
From: Juri Linkov @ 2016-02-12  0:50 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

> Possibly the languages you use do not collide with naïve character
> composition rules, or you ignore them or simply don't care about such
> rules.

Isearch shines in navigation.  For example, to move point quickly to the
part of your message that contains the word “naïve”, I could simply type
‘C-s naive’.  Otherwise, it would take a lot of time entering the char
“LATIN SMALL LETTER I WITH DIAERESIS” to the search string.  This is the
reason why char-folding search is so enormously useful, even though
“naïve” and “naive” are different words from the formal grammatical
point of view.
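
The behaviour described here can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the isearch implementation: the hypothetical helper `fold` strips every combining mark after canonical decomposition, and `folded_contains` folds both the search string and the text, so it is symmetric (an accented search string also matches unaccented text).

```python
import unicodedata


def fold(text):
    """Decompose canonically (NFD) and drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def folded_contains(needle, haystack):
    """Return True if needle occurs in haystack once both are folded."""
    return fold(needle) in fold(haystack)
```

With these definitions, `folded_contains("naive", "a naïve rule")` is true, while an exact substring test for "naive" in the same text fails.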

>> Unlike Chromium, Emacs is more extensible and customizable,
>> thus we urgently need to provide customization, so everyone could easily
>> add/remove char-folding rules to/from the default set.
>
> It is reasonable to expect from a serious text editor that when you
> search for a letter it finds that letter, not unrelated letters. With
> the default configuration, of course.

It's much safer to have a default where you are not in danger of missing
important things.  When a strict non-char-folding search skips a match,
you don't know about this loss until you later discover the damage.
With the char-folding search, you're visiting all possible matches,
and when you think it finds too much, you can narrow the results
by disabling this feature.  This is why its counterpart case-fold-search
is opt-out as well.

* Re: On language-dependent defaults for character-folding
  2016-02-12  0:50             ` Juri Linkov
@ 2016-02-12  1:50               ` Óscar Fuentes
  2016-02-12  7:10                 ` Eli Zaretskii
                                   ` (2 more replies)
  0 siblings, 3 replies; 263+ messages in thread
From: Óscar Fuentes @ 2016-02-12  1:50 UTC (permalink / raw)
  To: emacs-devel

Juri Linkov <juri@linkov.net> writes:

>> Possibly the languages you use do not collide with naïve character
>> composition rules, or you ignore them or simply don't care about such
>> rules.
>
> Isearch shines in navigation.

My opinion is that Isearch is terrible for navigation. You may be
interested in ace-jump or avy, for jumping to a point that is visible,
or a plethora of terrific packages for jumping to a point that is not
visible.

[snip]

> It's much safer to have a default where you are not in danger to miss
> important things.

A search that matches unrelated text is broken. Full stop. It is
possible that, for whatever reason, the brokenness can be convenient
for you, but enabling a feature which is convenient for some users and
plain wrong for others is not reasonable.

> When a strict non-case-folding search skips a match,
> you don't know about this loss until you discover later the damage.
> With the case-folding search, you're visiting all possible matches,

ñ is not a match for n, as long as you follow the rules of the Spanish
language. That's the crux of the matter. It is the same as if an English
speaker searched "vow" and matched "wow".

> and when you think it finds too much, you can narrow the results
> by disabling this feature. This is why its counterpart case-fold-search
> is opt-out as well.

case-fold-search is in another category. character-folding *could* be ok
as a default if it were governed by the linguistic rules expected by the
user. That's not easy to implement, though, as it seems that there is
controversy on some languages. Spanish is very easy on that aspect.
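
To make the idea of language-governed folding concrete, here is a minimal Python sketch. The table `PROTECTED` and the function `fold_for_language` are hypothetical names invented for this illustration, and the single Spanish entry is an assumption rather than a vetted list: the listed letters are kept intact, while every other character loses its combining marks.

```python
import unicodedata

# Hypothetical per-language table of letters that must never be folded.
# Only Spanish is filled in, and only with the letter discussed in this thread.
PROTECTED = {
    "es": {"\u00f1", "\u00d1"},  # LATIN SMALL/CAPITAL LETTER N WITH TILDE
}


def strip_marks(ch):
    """Decompose canonically and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def fold_for_language(text, lang):
    """Fold text for searching, leaving the letters protected for lang untouched."""
    keep = PROTECTED.get(lang, set())
    # Compose first, so that "n" + U+0303 is recognised as the protected letter.
    composed = unicodedata.normalize("NFC", text)
    return "".join(ch if ch in keep else strip_marks(ch) for ch in composed)
```

Under this sketch, folding "año público" for "es" keeps the ñ and only drops the acute accent, whereas folding it for a language without an entry also turns ñ into n.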

* Re: On language-dependent defaults for character-folding
  2016-02-12  1:50               ` Óscar Fuentes
@ 2016-02-12  7:10                 ` Eli Zaretskii
  2016-02-12  7:32                   ` Óscar Fuentes
  2016-02-12 23:50                 ` Juri Linkov
  2016-02-13 16:38                 ` Marcin Borkowski
  2 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-12  7:10 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

> From: Óscar Fuentes <ofv@wanadoo.es>
> Date: Fri, 12 Feb 2016 02:50:20 +0100
> 
> ñ is not a match for n, as long as you follow the rules of the Spanish
> language.

Actually, it should be when ñ is in fact ñ (two characters: n followed by U+0303 COMBINING TILDE).



* Re: On language-dependent defaults for character-folding
  2016-02-12  7:10                 ` Eli Zaretskii
@ 2016-02-12  7:32                   ` Óscar Fuentes
  2016-02-12  8:44                     ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Óscar Fuentes @ 2016-02-12  7:32 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> ñ is not a match for n, as long as you follow the rules of the
>> Spanish language.
>
> Actually, it should be when ñ is in fact ñ (two characters).

If ñ is meant to be read as ñ, as when it is found on a Spanish word,
then ñ and ñ are the same to all effects, so no match should happen.

Again, composition rules are irrelevant for a knowledgeable reader of a
given language. What matters is the meaning of the characters (composed
or not).

* Re: On language-dependent defaults for character-folding
  2016-02-12  7:32                   ` Óscar Fuentes
@ 2016-02-12  8:44                     ` Eli Zaretskii
  2016-02-12 10:03                       ` Óscar Fuentes
                                         ` (2 more replies)
  0 siblings, 3 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-12  8:44 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

> From: Óscar Fuentes <ofv@wanadoo.es>
> Date: Fri, 12 Feb 2016 08:32:25 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> ñ is not a match for n, as long as you follow the rules of the
> >> Spanish language.
> >
> > Actually, it should be when ñ is in fact ñ (two characters).
> 
> If ñ is meant to be read as ñ

Don't you see them displayed identically in Emacs (and in any other
program that correctly implements display of combining accents)?
Maybe I don't really understand that "if" part.

> as when it is found on a Spanish word,

Display of combining accents is not language-specific.  It should
always happen in human-readable text.

> then ñ and ñ are the same to all effects, so no match should happen.

You mean, a match should happen, right?  Otherwise, I'm afraid I see
no sense in this logic: IMO identically looking text should match, or
else users will kill us.

If you agree that a match is TRT in these (and other similar) cases,
then you should agree that _some_ form of character folding should be
turned on by default.
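
The two spellings under discussion can be compared with Python's standard `unicodedata` module. This is only a sketch of the general Unicode mechanism (canonical equivalence via normalization), not a description of what Emacs does internally; `canonically_equal` is a name chosen for the example.

```python
import unicodedata

PRECOMPOSED = "\u00f1"   # one character: LATIN SMALL LETTER N WITH TILDE
DECOMPOSED = "n\u0303"   # two characters: "n" + COMBINING TILDE


def canonically_equal(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

A plain comparison treats the two strings as different, and a literal "n" is found only in the decomposed one; after normalization they compare equal, which is the minimal kind of folding argued for above.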

> Again, composition rules are irrelevant for a knowledgeable reader of a
> given language. What matters is the meaning of the characters (composed
> or not).

What is "the meaning of the characters"?  Can pieces of text that are
displayed identically have different meaning?

* Re: On language-dependent defaults for character-folding
  2016-02-12  8:44                     ` Eli Zaretskii
@ 2016-02-12 10:03                       ` Óscar Fuentes
  2016-02-12 11:11                         ` Joost Kremers
  2016-02-12 12:00                         ` Eli Zaretskii
  2016-02-13 15:32                       ` Richard Stallman
  2016-02-13 16:37                       ` Marcin Borkowski
  2 siblings, 2 replies; 263+ messages in thread
From: Óscar Fuentes @ 2016-02-12 10:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> If ñ is meant to be read as ñ
>
> Don't you see them displayed identically in Emacs (and in any other
> program that correctly implements display of combining accents)?
> Maybe I don't really understand that "if" part.

They look a bit different here.

>> as when it is found on a Spanish word,
>
> Display of combining accents is not language-specific.  It should
> always happen in human-readable text.
>
>> then ñ and ñ are the same to all effects, so no match should happen.
>
> You mean, a match should happen, right?

ñ shall match ñ, but n shall not match either, from an Spaniard POV.

> Otherwise, I'm afraid I see
> no sense in this logic: IMO identically looking text should match, or
> else users will kill us.

Agreed, although in practice your example is not a big issue since I do
expect to rarely see ñ (the composed variant) used in Spanish text. And
probably not easy to implement at all for the general case (all
identical-looking combinations for all languages).

> If you agree that a match is TRT in these (and other similar) cases,
> then you should agree that _some_ form of character folding should be
> turned on by default.

I see where you are coming from ;-) On my first message on this thread I
said that I was ambivalent wrt the default status of this feature,
before finding the n/ñ issue. Not so after. A Spaniard could also deem
it useful to match ú and ü while searching for u. See, the problem here is
not character-folding itself, but how it works: a non-Spaniard could
expect matching ñ while searching for n, because for him ñ is a `n' with
a tilde, which is essentially the same case as the `u' example mentioned
above but from the POV of someone who doesn't know Spanish. (*)

[snip]

* My English dictionary says:

1. tilde -- (a diacritical mark (~) placed over the letter n in Spanish
to indicate a palatal nasal sound or over a vowel in Portuguese to
indicate nasalization)

No wonder that so many people seem to have a hard time recognizing that
ñ is a letter like any other in Spanish, not just an `n' with a tilde.

* Re: On language-dependent defaults for character-folding
  2016-02-12 10:03                       ` Óscar Fuentes
@ 2016-02-12 11:11                         ` Joost Kremers
  2016-02-12 18:21                           ` Óscar Fuentes
  2016-02-12 12:00                         ` Eli Zaretskii
  1 sibling, 1 reply; 263+ messages in thread
From: Joost Kremers @ 2016-02-12 11:11 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: Eli Zaretskii, emacs-devel

On Fri, Feb 12 2016, Óscar Fuentes <ofv@wanadoo.es> wrote:
> No wonder that so many people seem to have a hard time recognizing that
> ñ is a letter like any other in Spanish, not just an `n' with a tilde.

Actually, without wanting to be pedantic, but ⟨ñ⟩ (the grapheme) *is*
just an ⟨n⟩ with a tilde, regardless of the language one is talking
about. The reason why a native speaker of Spanish considers n and ñ to
be two different letters is because they represent two different
*phonemes* of the Spanish language: /n/ vs. /ɲ/.

The term `letter' (as an alphabetic character) is notoriously imprecise,
which is the cause of much confusion.

-- 
Joost Kremers
Life has its moments

* Re: On language-dependent defaults for character-folding
  2016-02-12 10:03                       ` Óscar Fuentes
  2016-02-12 11:11                         ` Joost Kremers
@ 2016-02-12 12:00                         ` Eli Zaretskii
  2016-02-12 18:42                           ` Óscar Fuentes
  1 sibling, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-12 12:00 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

> From: Óscar Fuentes <ofv@wanadoo.es>
> Cc: emacs-devel@gnu.org
> Date: Fri, 12 Feb 2016 11:03:09 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> If ñ is meant to be read as ñ
> >
> > Don't you see them displayed identically in Emacs (and in any other
> > program that correctly implements display of combining accents)?
> > Maybe I don't really understand that "if" part.
> 
> They look a bit different here.

It could be an issue with your default font.  Perhaps it doesn't have
the precomposed glyph.

> ñ shall match ñ, but n shall not match either, from an Spaniard POV.

But in the case of 2 characters, a literal n is present in the buffer,
so not finding it would be a miss, don't you think?

> > Otherwise, I'm afraid I see
> > no sense in this logic: IMO identically looking text should match, or
> > else users will kill us.
> 
> Agreed, although in practice your example is not a big issue since I do
> expect to rarely see ñ (the composed variant) used in Spanish text. And
> probably not easy to implement at all for the general case (all
> identical-looking combinations for all languages).

We do that by using the Unicode database, because then we are free
from the need to decide whether a given diacritic can or cannot combine
with a given base character.

> > If you agree that a match is TRT in these (and other similar) cases,
> > then you should agree that _some_ form of character folding should be
> > turned on by default.
> 
> I see where are you coming from ;-) On my first message on this thread I
> said that I was ambivalent wrt the default status of this feature,
> before finding the n/ñ issue. Not so after. A Spaniard could also deem
> useful to match ú and ü while searching for u. See, the problem here is
> not character-folding itself, but how it works: a non-Spaniard could
> expect matching ñ while searching for n, because for him ñ is a `n' with
> a tilde, which is essentially the same case as the `u' example mentioned
> above but from the POV of someone who doesn't know Spanish. (*)

What about finding ⒜ when searching for a, don't you want to find
that?  This is not specific to any language.
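
The distinction between canonical and compatibility decompositions that this example relies on can be checked with Python's `unicodedata` module. The sketch below uses made-up helper names and only shows what the Unicode data says about the character; whether a search should actually use the compatibility mapping is the policy question being debated.

```python
import unicodedata

PARENTHESIZED_A = "\u249c"  # U+249C PARENTHESIZED LATIN SMALL LETTER A


def canonical(text):
    """Apply canonical decomposition only (NFD)."""
    return unicodedata.normalize("NFD", text)


def compatibility(text):
    """Apply compatibility decomposition (NFKD), which includes the canonical one."""
    return unicodedata.normalize("NFKD", text)
```

Canonical decomposition leaves U+249C unchanged, while compatibility decomposition turns it into the three ASCII characters "(a)", so only a fold based on compatibility mappings would let a search for "a" reach it.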



* Re: On language-dependent defaults for character-folding
  2016-02-12 11:11                         ` Joost Kremers
@ 2016-02-12 18:21                           ` Óscar Fuentes
  0 siblings, 0 replies; 263+ messages in thread
From: Óscar Fuentes @ 2016-02-12 18:21 UTC (permalink / raw)
  To: emacs-devel

Joost Kremers <joostkremers@fastmail.fm> writes:

> Actually, without wanting to be pedantic, but ⟨ñ⟩ (the grapheme) *is*
> just an ⟨n⟩ with a tilde, regardless of the language one is talking
> about. The reason why a native speaker of Spanish considers n and ñ to
> be two different letters is because they represent two different
> *phonemes* of the Spanish language: /n/ vs. /ɲ/.

Actually, Spaniards consider ñ to be a letter because that is what we
are taught at school. That's what sets our expectations when we use text
editors.

> The term `letter' (as an alphabetic character) is notoriously imprecise,
> which is the cause of much confusion.

In Spanish, "letter" is precisely defined. We have 27 of them. `ch' and
`ll' were letters in Spanish until 2010, when the Academies decided to
demote them, following widespread public opinion. That will not happen
to ñ anytime soon.

* Re: On language-dependent defaults for character-folding
  2016-02-12 12:00                         ` Eli Zaretskii
@ 2016-02-12 18:42                           ` Óscar Fuentes
  2016-02-12 19:06                             ` Eli Zaretskii
  2016-02-12 19:09                             ` Clément Pit--Claudel
  0 siblings, 2 replies; 263+ messages in thread
From: Óscar Fuentes @ 2016-02-12 18:42 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> ñ shall match ñ, but n shall not match either, from an Spaniard POV.
>
> But in the case of 2 characters, a literal n is present in the buffer,
> so not finding it would be a miss, don't you think?

Then you are not thinking as a Spaniard, but as someone who is versed
on character representations by computers.

In practice, n matching ñ (the composed one) will not be a big issue,
since it will happen rarely. Same for the rest of compositions that
looks like ñ but are not "the" ñ. If someone complains, we can explain
what the problem is and that we opted for handling such compositions as
groups of characters.

> What about finding ⒜ when searching for a, don't you want to find
> that?  This is not specific to any language.

That would be nice, sometimes. If I search for (a), should it match ⒜?
What if I wish to replace all occurrences of (a) by [1]? Do you really
want to go down that route?

But we are digressing. Eli, you are missing the point. If you wish to
set Emacs defaults as per the convenience of people who think of text as
a series of codes at the expense of breaking basic expectations of those
who see text as... text, well, frankly, I don't think it is a good
decision.

* Re: On language-dependent defaults for character-folding
  2016-02-12 18:42                           ` Óscar Fuentes
@ 2016-02-12 19:06                             ` Eli Zaretskii
  2016-02-12 19:28                               ` Óscar Fuentes
  2016-02-12 23:57                               ` Juri Linkov
  2016-02-12 19:09                             ` Clément Pit--Claudel
  1 sibling, 2 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-12 19:06 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

> From: Óscar Fuentes <ofv@wanadoo.es>
> Date: Fri, 12 Feb 2016 19:42:50 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> ñ shall match ñ, but n shall not match either, from an Spaniard POV.
> >
> > But in the case of 2 characters, a literal n is present in the buffer,
> > so not finding it would be a miss, don't you think?
> 
> Then you are not thinking as a Spaniard, but as someone who is versed
> on character representations by computers.

Aren't there Spaniards who are also versed on character
representations by computers?

> In practice, n matching ñ (the composed one) will not be a big issue,
> since it will happen rarely. Same for the rest of compositions that
> looks like ñ but are not "the" ñ. If someone complains, we can explain
> what the problem is and that we opted for handling such compositions as
> groups of characters.

So you do think this, too, is not a problem?

> > What about finding ⒜ when searching for a, don't you want to find
> > that?  This is not specific to any language.
> 
> That would be nice, sometimes. If I search for (a), should it match ⒜?

I don't know.  What do you think?

> What if I wish to replace all occurrences of (a) by [1]? Do you really
> want to go down that route?

I don't think so, no.

> But we are digressing. Eli, you are missing the point. If you wish to
> set Emacs defaults as per the convenience of people who think of text as
> a series of codes at the expense of breaking basic expectations of those
> who see text as... text, well, frankly, I don't think it is a good
> decision.

I was trying to develop a dialogue which will help me and you
understand where your resistance begins and where it ends.  I think
it's important to do that to better understand the issues, but if you
don't want that, we can stop any moment.



* Re: On language-dependent defaults for character-folding
  2016-02-12 18:42                           ` Óscar Fuentes
  2016-02-12 19:06                             ` Eli Zaretskii
@ 2016-02-12 19:09                             ` Clément Pit--Claudel
  2016-02-12 19:39                               ` Óscar Fuentes
  1 sibling, 1 reply; 263+ messages in thread
From: Clément Pit--Claudel @ 2016-02-12 19:09 UTC (permalink / raw)
  To: emacs-devel

Hey Óscar,

On 02/12/2016 01:42 PM, Óscar Fuentes wrote:
> But we are digressing. Eli, you are missing the point. If you wish to
> set Emacs defaults as per the convenience of people who think of text as
> a series of codes at the expense of breaking basic expectations of those
> who see text as... text, well, frankly, I don't think it is a good
> decision.

I think your opinion is clear; so is that of other people in this thread. Don't generalize excessively, however: I don't think of text as a series of codes, but I do love the current default, and it meets many of my expectations.

Clément.


* Re: On language-dependent defaults for character-folding
  2016-02-12 19:06                             ` Eli Zaretskii
@ 2016-02-12 19:28                               ` Óscar Fuentes
  2016-02-12 23:57                               ` Juri Linkov
  1 sibling, 0 replies; 263+ messages in thread
From: Óscar Fuentes @ 2016-02-12 19:28 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> > But in the case of 2 characters, a literal n is present in the buffer,
>> > so not finding it would be a miss, don't you think?
>> 
>> Then you are not thinking as a Spaniard, but as someone who is versed
>> on character representations by computers.
>
> Aren't there Spaniards who are also versed on character
> representations by computers?

Maybe less than 0.1% of the population, but yes. Even those may
prefer a default that works for them as Spaniards rather than a default
that works for them as users familiarised with text encoding.

>> In practice, n matching ñ (the composed one) will not be a big issue,
>> since it will happen rarely. Same for the rest of compositions that
>> looks like ñ but are not "the" ñ. If someone complains, we can explain
>> what the problem is and that we opted for handling such compositions as
>> groups of characters.
>
> So you do think this, too, is not a problem?

Do we have resources for setting a default that works as expected by
each and every user all the time? (If possible at all)

>> > What about finding ⒜ when searching for a, don't you want to find
>> > that?  This is not specific to any language.
>> 
>> That would be nice, sometimes. If I search for (a), should it match ⒜?
>
> I don't know.  What do you think?

It depends. It's like `a' matching `á' but on steroids. Sometimes I'll
find it convenient and sometimes inconvenient. Those are different cases
than doing something that is plain wrong for a set of users and
convenient for others.

>> But we are digressing. Eli, you are missing the point. If you wish to
>> set Emacs defaults as per the convenience of people who think of text as
>> a series of codes at the expense of breaking basic expectations of those
>> who see text as... text, well, frankly, I don't think it is a good
>> decision.
>
> I was trying to develop a dialogue which will help me and you
> understand where your resistance begins and where it ends.  I think
> it's important to do that to better understand the issues, but if you
> don't want that, we can stop any moment.

I think that I explained it many times, but here it goes again:
character folding, as implemented today, might be convenient for some
users, but a glaring bug for others, so its default status (on the
release) should be chosen accordingly. What's so difficult to
understand about that?

* Re: On language-dependent defaults for character-folding
  2016-02-12 19:09                             ` Clément Pit--Claudel
@ 2016-02-12 19:39                               ` Óscar Fuentes
  0 siblings, 0 replies; 263+ messages in thread
From: Óscar Fuentes @ 2016-02-12 19:39 UTC (permalink / raw)
  To: emacs-devel

Clément Pit--Claudel <clement.pit@gmail.com> writes:

> I think your opinion is clear; so is that of other people in this
> thread. Don't generalize excessively, however: I don't think of text
> as a series of codes, but I do love the current default, and it meets
> many of my expectations.

Clément, as mentioned in my first message, I thought that
character-folding *could* be a good default until I found the n/ñ issue
and read what other people wrote about similar cases in their languages.
And even in its current state character-folding is something that can be
useful from time to time to me, so I'm glad that it exists.

But this is not about me. I can enable or disable any feature, at any
time, in my config. It's about developing Emacs, and that requires
thinking about what's good for our users (current and future).

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-10 15:25         ` Eli Zaretskii
  2016-02-10 21:17           ` Artur Malabarba
@ 2016-02-12 22:36           ` Per Starbäck
  2016-02-13  8:33             ` Eli Zaretskii
  2016-02-13 16:46           ` joakim
  2 siblings, 1 reply; 263+ messages in thread
From: Per Starbäck @ 2016-02-12 22:36 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ofv, Artur Malabarba, emacs-devel@gnu.org

Eli wrote:
> If anything, for now I counted more people who
> said they liked it than those who didn't (5 vs 9, by my count).  I'm
> not saying we should already decide to leave it on, but turning it off
> is certainly premature.  Less than two weeks have passed since the
> pretest began, there's no rush.

Collecting feedback is good, but that counting seems pointless to me
if you are counting one person mentioning that people in locale X will
see that behaviour as buggy, dumb or completely oblivious to their
culture as offset by one person saying they like the feature. It's not
about liking the feature or not. We have to listen to what the
feedback says instead of just counting it.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-12  1:50               ` Óscar Fuentes
  2016-02-12  7:10                 ` Eli Zaretskii
@ 2016-02-12 23:50                 ` Juri Linkov
  2016-02-13  0:33                   ` Óscar Fuentes
  2016-02-13 16:38                 ` Marcin Borkowski
  2 siblings, 1 reply; 263+ messages in thread
From: Juri Linkov @ 2016-02-12 23:50 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

> That's not easy to implement, though, as it seems that there is
> controversy on some languages.

Don't you agree that it is very convenient to type just ‘C-s naive’
to find “naïve”?  What about https://en.wikipedia.org/wiki/%C3%8F
that brings an example in French of maïs (maize) vs. mais (but)?
And what to do with Spanish loanwords in English where the letter ñ
is kept intact as you can see in:
https://en.wikipedia.org/wiki/English_terms_with_diacritical_marks

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-12 19:06                             ` Eli Zaretskii
  2016-02-12 19:28                               ` Óscar Fuentes
@ 2016-02-12 23:57                               ` Juri Linkov
  2016-02-13  0:06                                 ` Drew Adams
  2016-02-13  8:49                                 ` Eli Zaretskii
  1 sibling, 2 replies; 263+ messages in thread
From: Juri Linkov @ 2016-02-12 23:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Óscar Fuentes, emacs-devel

> I was trying to develop a dialogue which will help me and you
> understand where your resistance begins and where it ends.  I think
> it's important to do that to better understand the issues, but if you
> don't want that, we can stop any moment.

Can't we somehow use the same char-folding as is implemented in
ICU String Search Service (this is also used for search in Chromium):
http://userguide.icu-project.org/collation/icu-string-search-service
that supports matching of accented letters, conjoined letters,
and ignorable punctuation?

As is described in http://userguide.icu-project.org/collation/concepts
there are several levels of character matching:

1. Primary Level: differences between base characters

2. Secondary Level: Accents in the characters

3. Tertiary Level: Upper and lower case differences in characters

4. Quaternary Level: Punctuation is ignored (where e.g. snake-cased
   “black_bird” matches camel-cased “blackBird”)

5. Identical Level

Maybe our customization could provide options to choose
between all these levels?
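
To make the idea of selectable levels concrete, here is a rough sketch
in Python (standard library only).  It is not ICU's algorithm and not
how Emacs implements folding; the function names and the exact
assignment of "what is ignored at which level" are assumptions made for
illustration.  In particular, this sketch ignores punctuation at levels
1-3 and starts distinguishing it at level 4.

```python
import unicodedata


def search_key(text, level):
    """Return a comparison key for TEXT at strength LEVEL (1-5).

    Hypothetical helper: lower levels ignore more differences.
    """
    if level >= 5:
        # Identical level: compare the raw code points.
        return text
    # Levels 1-4 treat canonically equivalent sequences as equal.
    key = unicodedata.normalize("NFD", text)
    if level < 4:
        # Ignore punctuation (general categories starting with "P").
        key = "".join(c for c in key
                      if not unicodedata.category(c).startswith("P"))
    if level < 3:
        # Ignore upper/lower case differences.
        key = key.casefold()
    if level < 2:
        # Ignore accents: drop the combining marks left by NFD.
        key = "".join(c for c in key if not unicodedata.combining(c))
    return unicodedata.normalize("NFC", key)


def matches(a, b, level):
    """True if A and B are equal at the given strength LEVEL."""
    return search_key(a, level) == search_key(b, level)
```

With this sketch, matches("naive", "na\u00efve", 1) is true but the
same call at level 2 is false, and "black_bird" matches "blackBird" at
levels 1 and 2 but not at level 3, where case starts to count.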

^ permalink raw reply	[flat|nested] 263+ messages in thread

* RE: On language-dependent defaults for character-folding
  2016-02-12 23:57                               ` Juri Linkov
@ 2016-02-13  0:06                                 ` Drew Adams
  2016-02-13  8:49                                 ` Eli Zaretskii
  1 sibling, 0 replies; 263+ messages in thread
From: Drew Adams @ 2016-02-13  0:06 UTC (permalink / raw)
  To: Juri Linkov, Eli Zaretskii; +Cc: Óscar Fuentes, emacs-devel

> As is described in http://userguide.icu-project.org/collation/concepts
> there are several levels of character matching:
> 
> 1. Primary Level: differences between base characters
> 
> 2. Secondary Level: Accents in the characters
> 
> 3. Tertiary Level: Upper and lower case differences in characters
> 
> 4. Quaternary Level: Punctuation is ignored (where e.g. snake-cased
>    “black_bird” matches camel-cased “blackBird”)
> 
> 5. Identical Level
> 
> Maybe our customization could provide options to choose
> between all these levels?

+1

And not just options but also toggle commands. 

Thanks for guiding us to consider such groups (in addition to
other groupings that have been mentioned).



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-12 23:50                 ` Juri Linkov
@ 2016-02-13  0:33                   ` Óscar Fuentes
  2016-02-14 13:57                     ` Richard Stallman
  0 siblings, 1 reply; 263+ messages in thread
From: Óscar Fuentes @ 2016-02-13  0:33 UTC (permalink / raw)
  To: emacs-devel

Juri Linkov <juri@linkov.net> writes:

>> That's not easy to implement, though, as it seems that there is
>> controversy on some languages.
>
> Don't you agree that it is very convenient to type just ‘C-s naive’
> to find “naïve”?

Oh, yes, it is convenient, no doubt. As it is convenient to ask for `a'
and be given `á'. That is convenient to me at least as much as to
anybody else.

What I find flabbergasting is the insistence on ignoring the "some
cases will be regarded as glaring bugs" part.

This is beginning to turn into a study on psychological bias :-)

[snip]

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-12 22:36           ` Per Starbäck
@ 2016-02-13  8:33             ` Eli Zaretskii
  2016-02-13 10:10               ` Markus Triska
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-13  8:33 UTC (permalink / raw)
  To: Per Starbäck; +Cc: ofv, bruce.connor.am, emacs-devel

> Date: Fri, 12 Feb 2016 23:36:46 +0100
> From: Per Starbäck <per.starback@gmail.com>
> Cc: Artur Malabarba <bruce.connor.am@gmail.com>, ofv@wanadoo.es, 
> 	"emacs-devel@gnu.org" <emacs-devel@gnu.org>
> 
> Eli wrote:
> > If anything, for now I counted more people who
> > said they liked it than those who didn't (5 vs 9, by my count).  I'm
> > not saying we should already decide to leave it on, but turning it off
> > is certainly premature.  Less than two weeks have passed since the
> > pretest began, there's no rush.
> 
> Collecting feedback is good, but that counting seems pointless to me
> if you are counting one person mentioning that people in locale X will
> see that behaviour as buggy, dumb or completely oblivious to their
> culture as offset by one person saying they like the feature. It's not
> about liking the feature or not. We have to listen to what the
> feedback says instead of just counting it.

The issue is whether this should stay on by default, and those are the
only opinions I count (after carefully reading everything people write
about the subject).

The strength of the opinion is not something that IMO can be reliably
taken into account, because of different writing styles different
people use, and because for most of us English is not our first
language.  The nuances of the wording can therefore be entirely
random.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-12 23:57                               ` Juri Linkov
  2016-02-13  0:06                                 ` Drew Adams
@ 2016-02-13  8:49                                 ` Eli Zaretskii
  2016-02-13 17:20                                   ` Drew Adams
  1 sibling, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-13  8:49 UTC (permalink / raw)
  To: Juri Linkov; +Cc: ofv, emacs-devel

> From: Juri Linkov <juri@linkov.net>
> Cc: Óscar Fuentes <ofv@wanadoo.es>,  emacs-devel@gnu.org
> Date: Sat, 13 Feb 2016 01:57:33 +0200
> 
> Can't we somehow use the same char-folding as is implemented in
> ICU String Search Service (this is also used for search in Chromium):
> http://userguide.icu-project.org/collation/icu-string-search-service
> that supports matching of accented letters, conjoined letters,
> and ignorable punctuation?
> 
> As is described in http://userguide.icu-project.org/collation/concepts
> there are several levels of character matching:
> 
> 1. Primary Level: differences between base characters
> 
> 2. Secondary Level: Accents in the characters
> 
> 3. Tertiary Level: Upper and lower case differences in characters
> 
> 4. Quaternary Level: Punctuation is ignored (where e.g. snake-cased
>    “black_bird” matches camel-cased “blackBird”)
> 
> 5. Identical Level
> 
> Maybe our customization could provide options to choose
> between all these levels?

That's the final goal, yes.  The current implementation is just the
initial step, and it basically does just item #1.  (The list above is
about collation, not about searching, so the wording does not really
fit the searching use case.  Also, they just reiterate what the
Unicode TR#10, http://unicode.org/reports/tr10/, specifies.)

The implementation should really be on the C level, like the
case-folding support.  The current implementation isn't, and therefore
has several disadvantages some of which were already pointed out
(e.g., the regexp it uses that gets exposed in some situations and
causes users to be surprised).  For these and other reasons, I think
we should replace the current implementation with one that's in
search_buffer, driven by tables generated from the Unicode database.
I also think we will be unable to move to the higher levels mentioned
above without first moving the implementation into search_buffer.

Volunteers are welcome to work on that.  Doing this will eventually
require using the data in DUCET (Default Unicode Collation Element
Table) and CLDR (Common Locale Data Repository), I think, to support
both the language-independent and language-dependent folding.  But
this is only needed for the next levels, the current level that
basically only looks at the base character doesn't need fancy
databases apart from what we already have.

At the time, no one stepped forward to do this on the C level, and the
current implementation was considered to be good-enough for the first
step.
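
As an illustration of what "tables generated from the Unicode database"
could look like for the base-character level, here is a small Python
sketch.  It is not the Emacs implementation and says nothing about
search_buffer; the helper names are hypothetical.  It assumes that only
canonical decompositions are folded, so compatibility decompositions
are skipped, and the resulting table depends on the Unicode version
that ships with the Python in use.

```python
import sys
import unicodedata
from collections import defaultdict


def build_fold_table():
    """Map each base character to the characters that canonically
    decompose to a sequence starting with it."""
    table = defaultdict(set)
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch)
        # Skip characters without a decomposition mapping, and skip
        # compatibility mappings, which are tagged like "<compat> ...".
        if not decomp or decomp.startswith("<"):
            continue
        # First character of the full canonical decomposition.
        base = unicodedata.normalize("NFD", ch)[0]
        if base != ch:
            table[base].add(ch)
    return table


def fold_matches(pattern_char, text_char, table):
    """True if TEXT_CHAR should match when searching for PATTERN_CHAR."""
    return (text_char == pattern_char
            or text_char in table.get(pattern_char, ()))
```

Under these assumptions a search for "n" finds U+00F1 and a search for
"a" finds U+00E1, while U+249C (the parenthesized a) is not found,
because its decomposition is a compatibility one.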

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-13  8:33             ` Eli Zaretskii
@ 2016-02-13 10:10               ` Markus Triska
  2016-02-13 10:21                 ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Markus Triska @ 2016-02-13 10:10 UTC (permalink / raw)
  To: emacs-devel

Hi Eli,

Eli Zaretskii <eliz@gnu.org> writes:

> The issue is whether this should stay on by default, and those are the
> only opinions I count (after carefully reading everything people write
> about the subject).

Please count me in the "default should be off" category.

Thank you and all the best,
Markus




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-13 10:10               ` Markus Triska
@ 2016-02-13 10:21                 ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-13 10:21 UTC (permalink / raw)
  To: Markus Triska; +Cc: emacs-devel

> From: Markus Triska <triska@metalevel.at>
> Date: Sat, 13 Feb 2016 11:10:07 +0100
> 
> Please count me in the "default should be off" category.

Done.  Thanks.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-12  8:44                     ` Eli Zaretskii
  2016-02-12 10:03                       ` Óscar Fuentes
@ 2016-02-13 15:32                       ` Richard Stallman
  2016-02-13 15:40                         ` Eli Zaretskii
  2016-02-13 16:37                       ` Marcin Borkowski
  2 siblings, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-13 15:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ofv, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > If ñ is meant to be read as ñ

  > Don't you see them displayed identically in Emacs (and in any other
  > program that correctly implements display of combining accents)?
  > Maybe I don't really understand that "if" part.

I am using Emacs on a Linux console.  I see them as two characters.
The first is n, and the second displays as a diamond.

I get the impression Emacs expects them to display as a single
character, though, because it messes up cursor positioning.
(Someone told me a variable to set to prevent that messing up,
but I failed to set it in .emacs and I don't remember its name now.
Does anyone recall?)
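
For reference, a small Python sketch of the two spellings in question,
assuming they are the precomposed U+00F1 and the sequence n + U+0303
(COMBINING TILDE).  The variable names are only illustrative.  It shows
why a terminal that does not combine characters has two separate things
to draw for the second form.

```python
import unicodedata

precomposed = "\u00f1"   # LATIN SMALL LETTER N WITH TILDE
decomposed = "n\u0303"   # n followed by COMBINING TILDE

# The two spellings are different code point sequences of different length.
lengths = (len(precomposed), len(decomposed))

# Canonical normalization converts one form into the other.
composed_again = unicodedata.normalize("NFC", decomposed)
decomposed_again = unicodedata.normalize("NFD", precomposed)

# A nonzero combining class marks U+0303 as a combining character.
tilde_class = unicodedata.combining("\u0303")
```

A display engine that implements composition renders both forms as one
glyph; one that does not will show the n and then whatever it has for
U+0303.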

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-13 15:32                       ` Richard Stallman
@ 2016-02-13 15:40                         ` Eli Zaretskii
  2016-02-13 16:58                           ` Andreas Schwab
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-13 15:40 UTC (permalink / raw)
  To: rms, Kenichi Handa; +Cc: ofv, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: ofv@wanadoo.es, emacs-devel@gnu.org
> Date: Sat, 13 Feb 2016 10:32:22 -0500
> 
>   > > If ñ is meant to be read as ñ
> 
>   > Don't you see them displayed identically in Emacs (and in any other
>   > program that correctly implements display of combining accents)?
>   > Maybe I don't really understand that "if" part.
> 
> I am using Emacs on a Linux console.  I see them as two characters.
> The first is n, and the second displays as a diamond.

Your console doesn't combine them into one.

> I get the impression Emacs expects them to display as a single
> character, though, because it messes up cursor positioning.
> (Someone told me a variable to set to prevent that messing up,
> but I failed to set it in .emacs and I don't remember its name now.
> Does anyone recall?)

It's auto-composition-mode.

I asked Handa-san (CC'ed) earlier whether we should turn off
auto-composition-mode on a TTY, but didn't get any responses.  Maybe I
will have better luck now.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-09 21:07     ` Óscar Fuentes
  2016-02-10  2:18       ` Artur Malabarba
@ 2016-02-13 16:32       ` Marcin Borkowski
  2016-02-13 16:47         ` Eli Zaretskii
  1 sibling, 1 reply; 263+ messages in thread
From: Marcin Borkowski @ 2016-02-13 16:32 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

On 2016-02-09, at 22:07, Óscar Fuentes <ofv@wanadoo.es> wrote:

> How typical for an Emacs user is to have to *search* (not write) for a
> composed character that he can not type with his input setup?

Please do not forget about use cases like mine.  I work for a journal,
and I do copyediting (among other things).  Situations where sloppy
authors write sometimes "Poincaré" and sometimes "Poincare" are not
rare.  Character folding is a blessing in such cases.

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-12  8:44                     ` Eli Zaretskii
  2016-02-12 10:03                       ` Óscar Fuentes
  2016-02-13 15:32                       ` Richard Stallman
@ 2016-02-13 16:37                       ` Marcin Borkowski
  2016-02-13 16:50                         ` Eli Zaretskii
  2 siblings, 1 reply; 263+ messages in thread
From: Marcin Borkowski @ 2016-02-13 16:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Óscar Fuentes, emacs-devel


On 2016-02-12, at 09:44, Eli Zaretskii <eliz@gnu.org> wrote:

> You mean, a match should happen, right?  Otherwise, I'm afraid I see
> no sense in this logic: IMO identically looking text should match, or
> else users will kill us.

What about, say "a" and "а"? ;-)

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-12  1:50               ` Óscar Fuentes
  2016-02-12  7:10                 ` Eli Zaretskii
  2016-02-12 23:50                 ` Juri Linkov
@ 2016-02-13 16:38                 ` Marcin Borkowski
  2016-02-13 17:58                   ` Content navigation (was: On language-dependent defaults for character-folding) Óscar Fuentes
  2 siblings, 1 reply; 263+ messages in thread
From: Marcin Borkowski @ 2016-02-13 16:38 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel


On 2016-02-12, at 02:50, Óscar Fuentes <ofv@wanadoo.es> wrote:

>> Isearch shines in navigation.
>
> My opinion is that Isearch is terrible for navigation. You may be
> interested on ace-jump or avy, for jumping to a point that is visible,
> or a plethora of terrific packages for jumping to a point that is not
> visible.

I know this is a bit OT, but could you enumerate some of those packages?
I use avy, but I'd be interested in navigating to places I don't see, too.

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-10 15:25         ` Eli Zaretskii
  2016-02-10 21:17           ` Artur Malabarba
  2016-02-12 22:36           ` Per Starbäck
@ 2016-02-13 16:46           ` joakim
  2 siblings, 0 replies; 263+ messages in thread
From: joakim @ 2016-02-13 16:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ofv, bruce.connor.am, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Wed, 10 Feb 2016 02:18:03 +0000
>> From: Artur Malabarba <bruce.connor.am@gmail.com>
>> Cc: emacs-devel <emacs-devel@gnu.org>
>> 
>> > > I could be wrong, but I think you just defined all users. In the
>> > > Unicode standard used by Emacs, there are 5721 characters with a
>> > > “decomposition” property. Is there a user who is well-equipped to type
>> > > all of those characters?
>> > 
>> > (And how many of those 5721 characters can be matched from a latin
>> > letter?)
>> 
>> OK, I see what you meant.
>
> You do?  I don't, because the answer to Óscar's question is: 376 if we
> count only canonical decompositions (which we must support, or users
> will hate us), and a whopping 1449 if we count compatibility
> decompositions as well.  That's quite a few, I'd say, although AFAIR
> we don't find all of the compatibility decompositions under character
> folding, only some.
>
> Btw, from my POV, the ease of searching for characters not on my
> keyboard is not the main point of this feature.  The main feature is
> to search for similar characters.  (Of course, I don't mind if someone
> likes this for other reasons.)
>
>> Although currently Emacs does fold all decompositions by default, this is just temporary. We've said we would turn that off before release (and in fact I'll do that tomorrow (and amend my post too)).
>
> We didn't say we will turn it off, we said we will _decide_ whether to
> turn it off.  So please don't turn it off just yet, we are still
> collecting feedback.  If anything, for now I counted more people who
> said they liked it than those who didn't (5 vs 9, by my count).  I'm
> not saying we should already decide to leave it on, but turning it off
> is certainly premature.  Less than two weeks have passed since the
> pretest began, there's no rush.

I like character folding, I write mainly in Swedish and English.

The mix of Swedish and English usually winds up being horrible, so
character folding helps finding things in source code where you are not sure if
Swedish characters have been guillotined or not (ÅÄÖ becomes AAO)

That said, I think the question of whether something should be the default
or not generates way too much hot air. I think ELPA should carry a number of
installable themes that present a coherent set of defaults.

So you could just install 'emacs-xtra-everything' from ELPA and get many
interesting features suitable for a fast machine. Or you could go with
'emacs-orthodoxy' which disables certain new settings.

(like for instance 'C-x M-o runs the command dired-omit-mode'. I didn't
like the newfangled C-x prefix. Otherwise I'm mostly positive to
newfangledness)


> Thanks.
>

-- 
Joakim Verona



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-13 16:32       ` On language-dependent defaults for character-folding Marcin Borkowski
@ 2016-02-13 16:47         ` Eli Zaretskii
  2016-02-13 17:03           ` Marcin Borkowski
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-13 16:47 UTC (permalink / raw)
  To: Marcin Borkowski; +Cc: ofv, emacs-devel

> From: Marcin Borkowski <mbork@mbork.pl>
> Date: Sat, 13 Feb 2016 17:32:36 +0100
> Cc: emacs-devel@gnu.org
> 
> Please do not forget about use cases like mine.  I work for a journal,
> and I do copyediting (among other things).  Situations where sloppy
> authors write sometimes "Poincaré" and sometimes "Poincare" are not
> rare.  Character folding is a blessing in such cases.

But in a previous message you said:

  For Polish texts, I would rather turn char folding off.

How to reconcile that with what you say above?



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-13 16:37                       ` Marcin Borkowski
@ 2016-02-13 16:50                         ` Eli Zaretskii
  2016-02-13 17:15                           ` Marcin Borkowski
  2016-02-14 13:59                           ` Richard Stallman
  0 siblings, 2 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-13 16:50 UTC (permalink / raw)
  To: Marcin Borkowski; +Cc: ofv, emacs-devel

> From: Marcin Borkowski <mbork@mbork.pl>
> Cc: Óscar Fuentes <ofv@wanadoo.es>, emacs-devel@gnu.org
> Date: Sat, 13 Feb 2016 17:37:48 +0100
> 
> 
> On 2016-02-12, at 09:44, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > You mean, a match should happen, right?  Otherwise, I'm afraid I see
> > no sense in this logic: IMO identically looking text should match, or
> > else users will kill us.
> 
> What about, say "a" and "а"? ;-)

They don't look identical, and in any case, it should be clear they
should never match, except when specifically searching for so-called
"confusables".



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-13 15:40                         ` Eli Zaretskii
@ 2016-02-13 16:58                           ` Andreas Schwab
  2016-02-13 17:44                             ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Andreas Schwab @ 2016-02-13 16:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ofv, Kenichi Handa, rms, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> I asked Handa-san (CC'ed) earlier whether we should turn off
> auto-composition-mode on a TTY, but didn't get any responses.

It depends on the terminal emulator.  Some implement composition (xterm,
konsole), others don't (linux console).

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-13 16:47         ` Eli Zaretskii
@ 2016-02-13 17:03           ` Marcin Borkowski
  0 siblings, 0 replies; 263+ messages in thread
From: Marcin Borkowski @ 2016-02-13 17:03 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ofv, emacs-devel


On 2016-02-13, at 17:47, Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Marcin Borkowski <mbork@mbork.pl>
>> Date: Sat, 13 Feb 2016 17:32:36 +0100
>> Cc: emacs-devel@gnu.org
>> 
>> Please do not forget about use cases like mine.  I work for a journal,
>> and I do copyediting (among other things).  Situations where sloppy
>> authors write sometimes "Poincaré" and sometimes "Poincare" are not
>> rare.  Character folding is a blessing in such cases.
>
> But in a previous message you said:
>
>   For Polish texts, I would rather turn char folding off.
>
> How to reconcile that with what you say above?

Easily.  Different use-cases.  Usually I want to have it off, but when
I work in an article containing lots of foreign names *written not by
me*, I'd turn it on instantly.

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-13 16:50                         ` Eli Zaretskii
@ 2016-02-13 17:15                           ` Marcin Borkowski
  2016-02-13 17:45                             ` Eli Zaretskii
  2016-02-13 17:46                             ` andres.ramirez
  2016-02-14 13:59                           ` Richard Stallman
  1 sibling, 2 replies; 263+ messages in thread
From: Marcin Borkowski @ 2016-02-13 17:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ofv, emacs-devel


On 2016-02-13, at 17:50, Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Marcin Borkowski <mbork@mbork.pl>
>> Cc: Óscar Fuentes <ofv@wanadoo.es>, emacs-devel@gnu.org
>> Date: Sat, 13 Feb 2016 17:37:48 +0100
>> 
>> 
>> On 2016-02-12, at 09:44, Eli Zaretskii <eliz@gnu.org> wrote:
>> 
>> > You mean, a match should happen, right?  Otherwise, I'm afraid I see
>> > no sense in this logic: IMO identically looking text should match, or
>> > else users will kill us.
>> 
>> What about, say "a" and "а"? ;-)
>
> They don't look identical, and in any case, it should be clear they
> should never match, except when specifically searching for so-called
> "confusables".

Well, they look *exactly* identical on my Emacs.  I even C-x C-='d a few
times - still no difference.  And there are more pairs like this.

All this means it is way more complex than most people imagine.
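
A quick way to see that the pair above is two different characters,
sketched in Python and assuming the second one is U+0430 (CYRILLIC
SMALL LETTER A).  The helper name is hypothetical.  The point is that
neither canonical nor compatibility normalization relates the two, so
folding based on decompositions will not match them; that would need
separate "confusables" data.

```python
import unicodedata

latin_a = "\u0061"
cyrillic_a = "\u0430"

# Character names as recorded in the Unicode database.
names = [unicodedata.name(ch) for ch in (latin_a, cyrillic_a)]


def same_after_normalization(x, y, form="NFKD"):
    """True if X and Y become identical under the given normalization."""
    return unicodedata.normalize(form, x) == unicodedata.normalize(form, y)


# False for every normalization form: the two letters are unrelated
# as far as decomposition data is concerned.
related = any(same_after_normalization(latin_a, cyrillic_a, form)
              for form in ("NFC", "NFD", "NFKC", "NFKD"))
```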

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University



^ permalink raw reply	[flat|nested] 263+ messages in thread

* RE: On language-dependent defaults for character-folding
  2016-02-13  8:49                                 ` Eli Zaretskii
@ 2016-02-13 17:20                                   ` Drew Adams
  2016-02-13 17:58                                     ` Eli Zaretskii
  2016-02-13 18:15                                     ` Artur Malabarba
  0 siblings, 2 replies; 263+ messages in thread
From: Drew Adams @ 2016-02-13 17:20 UTC (permalink / raw)
  To: Eli Zaretskii, Juri Linkov; +Cc: ofv, emacs-devel

> The implementation should really be on the C level, like the
> case-folding support.  The current implementation isn't, and
> therefore has several disadvantages some of which were already
> pointed out (e.g., the regexp it uses that gets exposed in some
> situations and causes users to be surprised).

I would like to see a list of the disadvantages laid out clearly.

In general, I prefer that things be implemented in Lisp.
That leaves them far more open to Emacs users, and hence to
imagination and enhancement - which can often help Emacs
farther down the road.

Implementation in C makes great sense in some cases, but it
would help to see the detailed arguments (cases).

The argument that a complex, not-user-friendly, under-the-covers
regexp might sometimes get exposed to users is OK, but it is not
really compelling (for me).  Some users, in some case, might well
want to make use of such a regexp (e.g. tweaking it).  And we
might be able to find ways to not expose it for most uses.

(I don't reject the messy-regexp argument.  I just don't find it
sufficiently compelling on its own.)

> For these and other reasons, 

Can we see them, please?

> I also think we will be unable to move to the higher levels
> mentioned above without first moving the implementation into 
> search_buffer.

How so?  (Reasons.)

If there are important, e.g., performance reasons for coding
some functionality in C, can we at least try to limit it - do
that in component pieces rather than as a monolithic
take-it-or-leave-it whole?

I'm interested in maximizing what Lisp users can do with this,
other things being equal (IOW, use C only for what is absolutely
necessary).

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-13 16:58                           ` Andreas Schwab
@ 2016-02-13 17:44                             ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-13 17:44 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: ofv, handa, rms, emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: rms@gnu.org,  Kenichi Handa <handa@gnu.org>,  ofv@wanadoo.es,  emacs-devel@gnu.org
> Date: Sat, 13 Feb 2016 17:58:37 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > I asked Handa-san (CC'ed) earlier whether we should turn off
> > auto-composition-mode on a TTY, but didn't get any responses.
> 
> It depends on the terminal emulator.  Some implement composition (xterm,
> konsole), others don't (linux console).

So I guess we need to be more selective.

But I think a more serious problem is that auto-composition-mode is
buffer-local, whereas we want it to be terminal-local.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-13 17:15                           ` Marcin Borkowski
@ 2016-02-13 17:45                             ` Eli Zaretskii
  2016-02-13 17:52                               ` Marcin Borkowski
  2016-02-13 17:46                             ` andres.ramirez
  1 sibling, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-13 17:45 UTC (permalink / raw)
  To: Marcin Borkowski; +Cc: ofv, emacs-devel

> From: Marcin Borkowski <mbork@mbork.pl>
> Cc: ofv@wanadoo.es, emacs-devel@gnu.org
> Date: Sat, 13 Feb 2016 18:15:35 +0100
> 
> >> What about, say "a" and "а"? ;-)
> >
> > They don't look identical, and in any case, it should be clear they
> > should never match, except when specifically searching for so-called
> > "confusables".
> 
> Well, they look *exactly* identical on my Emacs.  I even C-x C-='d a few
> times - still no difference.  And there are more pairs like this.
> 
> All this means it is way more complex than most people imagine.

Of course it is.  But the important thing is Emacs does TRT with this
(and other) aspects of this complexity.
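
For reference, the two characters can be told apart programmatically even when a font renders them identically. A minimal Python sketch, assuming the second character in the example above is CYRILLIC SMALL LETTER A (U+0430); the helper name `describe` is made up for illustration:

```python
import unicodedata

LATIN_A = "\u0061"     # "a"
CYRILLIC_A = "\u0430"  # assumed to be the second character in the example

def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))

print(describe(LATIN_A))
print(describe(CYRILLIC_A))

# Neither character has a decomposition, so normalization alone
# does not turn one into the other.
print(unicodedata.normalize("NFKD", CYRILLIC_A) == LATIN_A)
```

Matching such pairs would therefore need a separate table of confusables; decomposition-based folding alone would not cover them.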



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-13 17:15                           ` Marcin Borkowski
  2016-02-13 17:45                             ` Eli Zaretskii
@ 2016-02-13 17:46                             ` andres.ramirez
  1 sibling, 0 replies; 263+ messages in thread
From: andres.ramirez @ 2016-02-13 17:46 UTC (permalink / raw)
  To: Marcin Borkowski; +Cc: ofv, Eli Zaretskii, emacs-devel

They do not look the same on a Linux virtual console. (Perhaps they look the same only in X.)

BR
On Sat, 13 Feb 2016 12:15:35 -0500,
Marcin Borkowski wrote:
> >> What about, say "a" and "а"? ;-)
> >
> > They don't look identical, and in any case, it should be clear they
> > should never match, except when specifically searching for so-called
> > "confusables".
> 
> Well, they look *exactly* identical on my Emacs.  I even C-x C-='d a few
> times - still no difference.  And there are more pairs like this.
> 
> All this means it is way more complex than most people imagine.
> 



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-13 17:45                             ` Eli Zaretskii
@ 2016-02-13 17:52                               ` Marcin Borkowski
  0 siblings, 0 replies; 263+ messages in thread
From: Marcin Borkowski @ 2016-02-13 17:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ofv, emacs-devel


On 2016-02-13, at 18:45, Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Marcin Borkowski <mbork@mbork.pl>
>> Cc: ofv@wanadoo.es, emacs-devel@gnu.org
>> Date: Sat, 13 Feb 2016 18:15:35 +0100
>> 
>> >> What about, say "a" and "а"? ;-)
>> >
>> > They don't look identical, and in any case, it should be clear they
>> > should never match, except when specifically searching for so-called
>> > "confusables".
>> 
>> Well, they look *exactly* identical on my Emacs.  I even C-x C-='d a few
>> times - still no difference.  And there are more pairs like this.
>> 
>> All this means it is way more complex than most people imagine.
>
> Of course it is.  But the important thing is Emacs does TRT with this
> (and other) aspects of this complexity.

Of course you're right.  (Though there exist rare cases where looking
for one /should/ find the other one.)  What I wanted to say is that this
is a counterexample to this sentence:

> identically looking text should match, or else users will kill us.

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-13 17:20                                   ` Drew Adams
@ 2016-02-13 17:58                                     ` Eli Zaretskii
  2016-02-18 19:15                                       ` John Wiegley
  2016-02-13 18:15                                     ` Artur Malabarba
  1 sibling, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-13 17:58 UTC (permalink / raw)
  To: Drew Adams; +Cc: ofv, emacs-devel, juri

> Date: Sat, 13 Feb 2016 09:20:39 -0800 (PST)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: ofv@wanadoo.es, emacs-devel@gnu.org
> 
> > The implementation should really be on the C level, like the
> > case-folding support.  The current implementation isn't, and
> > therefore has several disadvantages some of which were already
> > pointed out (e.g., the regexp it uses that gets exposed in some
> > situations and causes users to be surprised).
> 
> I would like to see a list of the disadvantages laid out clearly.

They have been mentioned in the discussions ever since this feature was
designed.  I'm sorry, but I have no time for searching and
summarizing them.  Doing so is no easier for me than for anyone else, and
it doesn't require any specialized knowledge.

> In general, I prefer that things be implemented in Lisp.
> That leaves them far more open to Emacs users, and hence to
> imagination and enhancement - which can often help Emacs
> farther down the road.

Not in this case.  Search must be fast, it must support regular
expressions and complex character transformations, all of which cannot
be done well in Lisp, even if we expose buffer text to Lisp, something
we don't have today.

> Implementation in C makes great sense in some cases, but it
> would help to see the detailed arguments (cases).

These arguments were already given, you will find them in the
archives.

> The argument that a complex, not-user-friendly, under-the-covers
> regexp might sometimes get exposed to users is OK, but it is not
> really compelling (for me).  Some users, in some case, might well
> want to make use of such a regexp (e.g. tweaking it).

Users should tweak tables that tell Emacs how to fold characters, they
should not tweak the results of folding.  Like they do (if they do)
with case-tables today.
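
To illustrate the table-driven idea, here is a rough sketch in Python (not Emacs code; the table contents and the names `FOLD_TABLE`, `fold` and `folded_find` are assumptions made up for this example). The search folds both the pattern and the text through a table, so the only thing a user would ever edit is the table itself:

```python
# Hypothetical fold table: each key is folded to its value before comparing.
FOLD_TABLE = {"ñ": "n", "á": "a", "é": "e"}

def fold(text, table=FOLD_TABLE):
    """Fold text character by character; the mapping is one-to-one,
    so positions in the folded text equal positions in the original."""
    return "".join(table.get(ch, ch) for ch in text)

def folded_find(needle, haystack, table=FOLD_TABLE):
    """Return the index of the first folded match, or -1 if there is none."""
    return fold(haystack, table).find(fold(needle, table))

print(folded_find("ano", "El año"))

# A user who does not want n to match ñ edits the table,
# not the result of the folding.
strict = {k: v for k, v in FOLD_TABLE.items() if k != "ñ"}
print(folded_find("ano", "El año", strict))
```

With the default table the first call finds the match at index 3; with the `ñ` entry removed the second call returns -1.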



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Content navigation (was: On language-dependent defaults for character-folding)
  2016-02-13 16:38                 ` Marcin Borkowski
@ 2016-02-13 17:58                   ` Óscar Fuentes
  0 siblings, 0 replies; 263+ messages in thread
From: Óscar Fuentes @ 2016-02-13 17:58 UTC (permalink / raw)
  To: emacs-devel; +Cc: help-gnu-emacs

Marcin Borkowski <mbork@mbork.pl> writes:

> On 2016-02-12, at 02:50, Óscar Fuentes <ofv@wanadoo.es> wrote:
>
>>> Isearch shines in navigation.
>>
>> My opinion is that Isearch is terrible for navigation. You may be
>> interested in ace-jump or avy, for jumping to a point that is visible,
>> or a plethora of terrific packages for jumping to a point that is not
>> visible.
>
> I know this is a bit OT, but could you enumerate some of those packages?
> I use avy, but I'd be interested in navigating to places I don't see, too.

It all depends on personal preferences, willingness to get accustomed to
new ways of doing things, etc. It also depends on the type of content you
work with (code, plain text, org files...)

You can start by looking at what Emacs provides out of the box: registers,
the mark ring, imenu... There are also modes and commands that hide the
content you don't care about: hide-show mode, narrow to region... Smaller
content means easier navigation.

A direct replacement for Isearch that is much better suited to
navigation (and searching in general) is Swiper.

There are packages for quickly visiting special places, such as
goto-change for jumping to the edited sites.

There are packages that depend on more or less specialized info provided
by ctags and similar analyzers, and more sophisticated ones such as
Semantic, Clang...

In the end, the key parts are how the information is managed by the
search mechanism (from simple character sequences to tokens with
attached meaning), the match system that links your input to candidate
targets and the UI that shows those candidates and allows you to jump to
them.

Personally, I use registers, goto-change, TAGS tables plus etags and
probably something more that I can't remember right now. For the
completion system and UI, ido with the flx matching algorithm.
flx-isearch is much more convenient than Isearch for searching for
identifiers in my code.

Instead of ido other people use helm or ivy as completion systems (ivy
comes with swiper.)

This is just scratching the surface. I'm sure that I'm omitting many
interesting packages. Others can chime in with their favourite packages.

Follow-ups set to emacs-help.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* RE: On language-dependent defaults for character-folding
  2016-02-13 17:20                                   ` Drew Adams
  2016-02-13 17:58                                     ` Eli Zaretskii
@ 2016-02-13 18:15                                     ` Artur Malabarba
  2016-02-13 18:26                                       ` Drew Adams
  1 sibling, 1 reply; 263+ messages in thread
From: Artur Malabarba @ 2016-02-13 18:15 UTC (permalink / raw)
  To: Drew Adams; +Cc: Óscar Fuentes, Eli Zaretskii, Juri Linkov, emacs-devel

On 13 Feb 2016 3:20 pm, "Drew Adams" <drew.adams@oracle.com> wrote:
>
> > The implementation should really be on the C level, like the
> > case-folding support.  The current implementation isn't, and
> > therefore has several disadvantages some of which were already
> > pointed out...
>
> I would like to see a list of the disadvantages laid out clearly.

See a thread here called “Char-folding: how can we implement matching
multiple characters as a single "thing"?”.

In summary, char folding was generating regexps that were too long for
Emacs to handle.

The best solution we reached was to make char folding dumber, so that the
resulting regexps wouldn't grow exponentially.

The C-level implementations of char folding that have been discussed
wouldn't have this problem because they wouldn't need regexps.

Even with the current solution, char folding can still produce too long
regexps if the input string is very long (which it handles by falling back
on regular search).

A second disadvantage is that you can't do char folding for regexp searches
(though I can't tell how common that would be).
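
As a rough illustration of the regexp approach and its fallback (a Python sketch, not the actual Emacs implementation; the equivalence classes, the names `fold_regexp` and `fold_search`, and the length limit are all assumptions made up for this example):

```python
import re

# Hypothetical equivalence classes; real tables would be far larger.
EQUIVALENTS = {"a": "aáàâä", "n": "nñ", "o": "oóòôö"}

def fold_regexp(text):
    """Translate a literal search string into a regexp that matches
    its folded variants."""
    parts = []
    for ch in text:
        variants = EQUIVALENTS.get(ch)
        parts.append("[" + variants + "]" if variants else re.escape(ch))
    return "".join(parts)

def fold_search(needle, haystack, max_regexp_len=1000):
    """Search with folding; fall back to a plain literal search
    if the generated regexp is too long."""
    pattern = fold_regexp(needle)
    if len(pattern) > max_regexp_len:
        pattern = re.escape(needle)
    return re.search(pattern, haystack)

print(fold_regexp("ano"))
print(fold_search("ano", "El año") is not None)
print(fold_search("ano", "El año", max_regexp_len=5) is not None)
```

Even in this simple form the pattern is several times longer than the input, and once the limit is exceeded the search silently stops folding. The translation also treats its input as a literal string, which is why it cannot be applied to a regexp search as-is.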

^ permalink raw reply	[flat|nested] 263+ messages in thread

* RE: On language-dependent defaults for character-folding
  2016-02-13 18:15                                     ` Artur Malabarba
@ 2016-02-13 18:26                                       ` Drew Adams
  0 siblings, 0 replies; 263+ messages in thread
From: Drew Adams @ 2016-02-13 18:26 UTC (permalink / raw)
  To: bruce.connor.am
  Cc: Óscar Fuentes, Eli Zaretskii, Juri Linkov, emacs-devel

> > > The implementation should really be on the C level, like the
> > > case-folding support.  The current implementation isn't, and
> > > therefore has several disadvantages some of which were already
> > > pointed out...
> >
> > I would like to see a list of the disadvantages laid out clearly.
>
> See a thread here called “Char-folding: how can we implement matching
> multiple characters as a single "thing"?”.
>
> In summary, char folding was generating regexps that were too long for
> Emacs to handle.
>
> The best solution we reached was to make char folding dumber, so that the
> resulting regexps wouldn't grow exponentially.
>
> The C-level implementations of char folding that have been discussed
> wouldn't have this problem because they wouldn't need regexps.
>
> Even with the current solution, char folding can still produce too long
> regexps if the input string is very long (which it handles by falling back
> on regular search).
>
> A second disadvantage is that you can't do char folding for regexp searches
> (though I can't tell how common that would be).

Yes, I read that part of the thread. But thanks for the reminder.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-13  0:33                   ` Óscar Fuentes
@ 2016-02-14 13:57                     ` Richard Stallman
  2016-02-14 14:27                       ` Óscar Fuentes
  0 siblings, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-14 13:57 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > What I find flabbergasting is the insistence on ignoring the "some
  > cases will be regarded as glaring bugs" part.

Might that depend on what we say the feature is supposed to do?

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-13 16:50                         ` Eli Zaretskii
  2016-02-13 17:15                           ` Marcin Borkowski
@ 2016-02-14 13:59                           ` Richard Stallman
  1 sibling, 0 replies; 263+ messages in thread
From: Richard Stallman @ 2016-02-14 13:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ofv, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > What about, say "a" and "а"? ;-)

  > They don't look identical,

They look identical on my tty.
-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-14 13:57                     ` Richard Stallman
@ 2016-02-14 14:27                       ` Óscar Fuentes
  2016-02-15 10:28                         ` Richard Stallman
  0 siblings, 1 reply; 263+ messages in thread
From: Óscar Fuentes @ 2016-02-14 14:27 UTC (permalink / raw)
  To: Richard Stallman; +Cc: emacs-devel

Richard Stallman <rms@gnu.org> writes:

>   > What I find flabbergasting is the insistence on ignoring the "some
>   > cases will be regarded as glaring bugs" part.
>
> Might that depend on what we say the feature is supposed to do?

I'm disputing its default status, not the feature itself. Apparently,
some people here think that the feature should be enabled by default. A
search mechanism that matches unrelated letters, no less!



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-14 14:27                       ` Óscar Fuentes
@ 2016-02-15 10:28                         ` Richard Stallman
  2016-02-15 12:31                           ` Óscar Fuentes
  0 siblings, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-15 10:28 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I'm disputing its default status, not the feature itself. Apparently,
  > some people here think that the feature should be enabled by default. A
  > search mechanism that matches unrelated letters, no less!

There is no a priori reason why it should be on by default, or why it
should be off by default.  It is just a matter of what most users
prefer.

You've made it clear you prefer off by default.  Maybe most users
agree with you.  I don't know.

But there is no a priori reason why searching for n should not find ñ.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-15 10:28                         ` Richard Stallman
@ 2016-02-15 12:31                           ` Óscar Fuentes
  2016-02-15 17:45                             ` Richard Stallman
  0 siblings, 1 reply; 263+ messages in thread
From: Óscar Fuentes @ 2016-02-15 12:31 UTC (permalink / raw)
  To: Richard Stallman; +Cc: emacs-devel

Richard Stallman <rms@gnu.org> writes:

>   > I'm disputing its default status, not the feature itself. Apparently,
>   > some people here think that the feature should be enabled by default. A
>   > search mechanism that matches unrelated letters, no less!
>
> There is no a priori reason why it should be on by default, or why it
> should be off by default.  It is just a matter of what most users
> prefer.
>
> You've made it clear you prefer off by default.  Maybe most users
> agree with you.  I don't know.
>
> But there is no a priori reason why searching for n should not find ñ.

I've already mentioned several times that, for someone who was educated
in Spanish, searching for n and finding ñ is no different than searching
for v and finding w. There are similar cases in other languages.

This looks like a strong a priori reason to me.

I have repeated my point ad nauseam. I'll stop arguing about this issue
now.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-15 12:31                           ` Óscar Fuentes
@ 2016-02-15 17:45                             ` Richard Stallman
  2016-02-16 13:54                               ` Elias Mårtenson
  2016-02-16 14:30                               ` Per Starbäck
  0 siblings, 2 replies; 263+ messages in thread
From: Richard Stallman @ 2016-02-15 17:45 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I've already mentioned several times that, for someone who was educated
  > in Spanish, searching for n and finding ñ is no different than searching
  > for v and finding w.

Whether searching for v should also find w is not a question of principle.
It's a question of what is convenient for users.
-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-15 17:45                             ` Richard Stallman
@ 2016-02-16 13:54                               ` Elias Mårtenson
  2016-02-16 14:30                               ` Per Starbäck
  1 sibling, 0 replies; 263+ messages in thread
From: Elias Mårtenson @ 2016-02-16 13:54 UTC (permalink / raw)
  To: rms; +Cc: Óscar Fuentes, emacs-devel

On 16 Feb 2016 1:46 a.m., "Richard Stallman" <rms@gnu.org> wrote:
>
>   > I've already mentioned several times that, for someone who was educated
>   > in Spanish, searching for n and finding ñ is no different than searching
>   > for v and finding w.
>
> Whether searching for v should also find w is not a question of principle.
> It's a question of what is convenient for users.

In Swedish, this would be useful indeed. Up until recently V and W even
sorted together in the dictionary.

Anyway, I will also follow Óscar's lead and not post anything more on this
subject.

Regards,
Elias

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-15 17:45                             ` Richard Stallman
  2016-02-16 13:54                               ` Elias Mårtenson
@ 2016-02-16 14:30                               ` Per Starbäck
  2016-02-16 19:32                                 ` Ken Brown
  2016-02-17  8:00                                 ` Joost Kremers
  1 sibling, 2 replies; 263+ messages in thread
From: Per Starbäck @ 2016-02-16 14:30 UTC (permalink / raw)
  To: rms; +Cc: Óscar Fuentes, emacs-devel@gnu.org

>   > I've already mentioned several times that, for someone who was educated
>   > in Spanish, searching for n and finding ñ is no different than searching
>   > for v and finding w.
>
> Whether searching for v should also find w is not a question of principle.
> It's a question of what is convenient for users.

Sure, we can avoid formulating principles that explain the
regularities in what is convenient or not, but then it's also a
question of *how* inconvenient it is. Having a search for "i"
also find "j" would for most users be very inconvenient, to the point
that it would be seen as a bug. But for someone using classical Latin
it could be convenient.

Even *if* classical Latin were really big today that search behaviour
would still be seen as a bug by those using for example English. If
60% used classical Latin and 40% used English we shouldn't just count
the numbers and conclude that the i/j search thing would be a good
thing to activate by default. Something seen as a glaring bug must
weigh more than just convenience.

###   Searching for "n" and finding "ñ" in Spanish, or searching for
"a" and finding "ä" in Swedish
###   are just as strange as searching for "i" and finding "j" in English.

It's as if many people on this list just won't believe that statement,
which is very frustrating.  It feels a bit like reporting that a
particular feature that is useful otherwise makes the computer explode
if you use it in Utah or Nevada and getting the answer that a recent
count concluded that the feature is convenient for most users.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-16 14:30                               ` Per Starbäck
@ 2016-02-16 19:32                                 ` Ken Brown
  2016-02-16 23:49                                   ` Lars Ingebrigtsen
  2016-02-18  8:57                                   ` Alan Mackenzie
  2016-02-17  8:00                                 ` Joost Kremers
  1 sibling, 2 replies; 263+ messages in thread
From: Ken Brown @ 2016-02-16 19:32 UTC (permalink / raw)
  To: Per Starbäck, rms; +Cc: Óscar Fuentes, emacs-devel@gnu.org

On 2/16/2016 9:30 AM, Per Starbäck wrote:
> ###   Searching for "n" and finding "ñ" in Spanish, or searching for
> "a" and finding "ä" in Swedish
> ###   are just as strange as searching for "i" and finding "j" in English.
>
> It's as if many people on this list just won't believe that statement,
> which is very frustrating.

I've been following this discussion, and I haven't seen any indication 
that people don't believe that statement.  What I have seen is 
disagreement about its importance.  I've also seen several people say 
that we should wait for more feedback from pretesters before deciding 
what the default should be.  What's the harm in that?

Ken



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-16 19:32                                 ` Ken Brown
@ 2016-02-16 23:49                                   ` Lars Ingebrigtsen
  2016-02-17 16:03                                     ` Richard Stallman
  2016-02-18  8:57                                   ` Alan Mackenzie
  1 sibling, 1 reply; 263+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-16 23:49 UTC (permalink / raw)
  To: Ken Brown; +Cc: Óscar Fuentes, Per Starbäck, rms, emacs-devel@gnu.org

Ken Brown <kbrown@cornell.edu> writes:

> On 2/16/2016 9:30 AM, Per Starbäck wrote:
>> ###   Searching for "n" and finding "ñ" in Spanish, or searching for
>> "a" and finding "ä" in Swedish
>> ###   are just as strange as searching for "i" and finding "j" in English.
>>
>> It's as if many people on this list just won't believe that statement,
>> which is very frustrating.
>
> I've been following this discussion, and I haven't seen any indication
> that people don't believe that statement.  What I have seen is
> disagreement about its importance.

Yeah, it seems that people think it's unimportant that if you search for
"i", Emacs will find "j" instead.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-16 14:30                               ` Per Starbäck
  2016-02-16 19:32                                 ` Ken Brown
@ 2016-02-17  8:00                                 ` Joost Kremers
  2016-02-17 15:34                                   ` Eli Zaretskii
  1 sibling, 1 reply; 263+ messages in thread
From: Joost Kremers @ 2016-02-17  8:00 UTC (permalink / raw)
  To: Per Starbäck; +Cc: Óscar Fuentes, rms, emacs-devel@gnu.org

On Tue, Feb 16 2016, Per Starbäck <per.starback@gmail.com> wrote:
> ###   Searching for "n" and finding "ñ" in Spanish, or searching for
> "a" and finding "ä" in Swedish
> ###   are just as strange as searching for "i" and finding "j" in English.
>
> It's as if many people on this list just won't believe that statement,
> which is very frustrating.

My impression of this thread is that people *do* understand the
importance of making char-folding language-dependent and that the
maintainers hope to implement this at some point in the future. For
various reasons, however, it is not possible to do so now.

The general opinion is also that char-folding is nonetheless useful to
many users, despite the fact that it will generate incorrect results in
some languages. The only question that needs to be answered right now is
whether the feature will be turned on or off by default. And on that
point, the tendency seems to be to have it off by default, with the
ability to toggle it within an i-search.
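
For illustration only, language-dependent folding could look roughly like the following Python sketch. Nothing here reflects an actual Emacs design; the base table, the language codes, and the names `fold_table_for` and `matches` are assumptions made up for this example:

```python
# Hypothetical base table: fold accented letters to their base letters.
BASE_FOLDS = {"ñ": "n", "ä": "a", "å": "a", "ö": "o", "é": "e"}

# Hypothetical per-language exceptions: letters that count as
# distinct letters in that language and so must not be folded.
DISTINCT_LETTERS = {"es": {"ñ"}, "sv": {"å", "ä", "ö"}}

def fold_table_for(language):
    """Return the base table minus the letters the language treats as distinct."""
    keep = DISTINCT_LETTERS.get(language, set())
    return {src: dst for src, dst in BASE_FOLDS.items() if src not in keep}

def matches(needle, haystack, language):
    """Return True if needle occurs in haystack after language-aware folding."""
    table = str.maketrans(fold_table_for(language))
    return needle.translate(table) in haystack.translate(table)

print(matches("ano", "año", "en"))
print(matches("ano", "año", "es"))
print(matches("ar", "är", "sv"))
```

Under these assumptions an English-language search for "ano" finds "año", while the Spanish and Swedish settings keep ñ and ä distinct from n and a.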

-- 
Joost Kremers
Life has its moments

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-17  8:00                                 ` Joost Kremers
@ 2016-02-17 15:34                                   ` Eli Zaretskii
  2016-02-17 18:30                                     ` Achim Gratz
                                                       ` (3 more replies)
  0 siblings, 4 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-17 15:34 UTC (permalink / raw)
  To: Joost Kremers; +Cc: ofv, per.starback, rms, emacs-devel

> From: Joost Kremers <joostkremers@fastmail.fm>
> Date: Wed, 17 Feb 2016 09:00:02 +0100
> Cc: Óscar Fuentes <ofv@wanadoo.es>, rms@gnu.org,
> 	"emacs-devel@gnu.org" <emacs-devel@gnu.org>
> 
> The general opinion is also that char-folding is nonetheless useful to
> many users, despite the fact that it will generate incorrect results in
> some languages. The only question that needs to be answered right now is
> whether the feature will be turned on or off by default. And on that
> point, the tendency seems to be to have it off by default, with the
> ability to toggle it within an i-search.

Actually, my counts indicate that more people want it on by default
than off.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-16 23:49                                   ` Lars Ingebrigtsen
@ 2016-02-17 16:03                                     ` Richard Stallman
  0 siblings, 0 replies; 263+ messages in thread
From: Richard Stallman @ 2016-02-17 16:03 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: ofv, per.starback, kbrown, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Yeah, it seems that people think it's unimportant that if you search for
  > "i", Emacs will find "j" instead.

My point is that this isn't a matter of principle.
It is a practical question.  

I would consider that a misfeature, for my actual editing;
but I might like it if I were editing Latin.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-17 15:34                                   ` Eli Zaretskii
@ 2016-02-17 18:30                                     ` Achim Gratz
  2016-02-17 19:30                                       ` Eli Zaretskii
  2016-02-17 20:26                                       ` Marcin Borkowski
  2016-02-17 20:06                                     ` Joost Kremers
                                                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 263+ messages in thread
From: Achim Gratz @ 2016-02-17 18:30 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii writes:
>> The general opinion is also that char-folding is nonetheless useful to
>> many users, despite the fact that it will generate incorrect results in
>> some languages. The only question that needs to be answered right now is
>> whether the feature will be turned on or off by default. And on that
>> point, the tendency seems to be to have it off by default, with the
>> ability to toggle it within an i-search.
>
> Actually, my counts indicate that more people want it on by default
> than off.

Well, if you're already counting, I don't want it on by default.

I do have potential uses for the feature, but it must be switchable on
the spot, when and where I need it.  Even a mode-based customization
seems too heavy-handed to me, at least in the modes I envision to work
most of the time.

Allow me to make a general remark towards the trend lately to "let's
switch on every newfangled feature by default because it can be switched
off via customization".  I'm quite sure I can't be the only one who
regularly has to work on new systems or accounts that only offer a stock
Emacs.  It is simply not possible to always figure out which Emacs
version is installed and then spend the next half hour customizing it
(if that's even allowed).  So please keep the stock Emacs settings
stable.

Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptation for Waldorf rackAttack V1.04R1:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-17 18:30                                     ` Achim Gratz
@ 2016-02-17 19:30                                       ` Eli Zaretskii
  2016-02-17 20:26                                       ` Marcin Borkowski
  1 sibling, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-17 19:30 UTC (permalink / raw)
  To: Achim Gratz; +Cc: emacs-devel

> From: Achim Gratz <Stromeko@nexgo.de>
> Date: Wed, 17 Feb 2016 19:30:09 +0100
> 
> Eli Zaretskii writes:
> >> The general opinion is also that char-folding is nonetheless useful to
> >> many users, despite the fact that it will generate incorrect results in
> >> some languages. The only question that needs to be answered right now is
> >> whether the feature will be turned on or off by default. And on that
> >> point, the tendency seems to be to have it off by default, with the
> >> ability to toggle it within an i-search.
> >
> > Actually, my counts indicate that more people want it on by default
> > than off.
> 
> Well, if you're already counting, I don't want it on by default.

I'm counting because that's what we all wanted: a poll, or some
approximation of it.  How else can a poll be summarized, if the
numbers of those for and against are not known?

> it must be switchable on the spot, when and where I need it.

It is, please see the documentation.  You can turn it on and off for a
particular search (during the search), and you can do that for the
next searches.

> Allow me to make a general remark towards the trend lately to "let's
> switch on every newfangled feature by default because it can be switched
> off via customization".

There's no such trend, AFAIK.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-17 15:34                                   ` Eli Zaretskii
  2016-02-17 18:30                                     ` Achim Gratz
@ 2016-02-17 20:06                                     ` Joost Kremers
  2016-02-17 20:15                                       ` Eli Zaretskii
  2016-02-17 22:53                                     ` Mark Oteiza
  2016-02-18 16:30                                     ` Richard Stallman
  3 siblings, 1 reply; 263+ messages in thread
From: Joost Kremers @ 2016-02-17 20:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ofv, per.starback, rms, emacs-devel


On Wed, Feb 17 2016, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Joost Kremers <joostkremers@fastmail.fm>
>> Date: Wed, 17 Feb 2016 09:00:02 +0100
>> Cc: Óscar Fuentes <ofv@wanadoo.es>, rms@gnu.org,
>> 	"emacs-devel@gnu.org" <emacs-devel@gnu.org>
>> 
>> The general opinion is also that char-folding is nonetheless useful to
>> many users, despite the fact that it will generate incorrect results in
>> some languages. The only question that needs to be answered right now is
>> whether the feature will be turned on or off by default. And on that
>> point, the tendency seems to be to have it off by default, with the
>> ability to toggle it within an i-search.
>
> Actually, my counts indicate that more people want it on by default
> than off.

Then put me down for an "off". :-)

(Is this a binding referendum?)

-- 
Joost Kremers
Life has its moments



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-17 20:06                                     ` Joost Kremers
@ 2016-02-17 20:15                                       ` Eli Zaretskii
  2016-02-17 22:58                                         ` Ken Brown
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-17 20:15 UTC (permalink / raw)
  To: Joost Kremers; +Cc: ofv, per.starback, rms, emacs-devel

> From: Joost Kremers <joostkremers@fastmail.fm>
> Cc: ofv@wanadoo.es, per.starback@gmail.com, rms@gnu.org, emacs-devel@gnu.org
> Date: Wed, 17 Feb 2016 21:06:11 +0100
> 
> > Actually, my counts indicate that more people want it on by default
> > than off.
> 
> Then put me down for an "off". :-)

Done.

> (Is this a binding referendum?)

Yes, of course.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-17 18:30                                     ` Achim Gratz
  2016-02-17 19:30                                       ` Eli Zaretskii
@ 2016-02-17 20:26                                       ` Marcin Borkowski
  1 sibling, 0 replies; 263+ messages in thread
From: Marcin Borkowski @ 2016-02-17 20:26 UTC (permalink / raw)
  To: Achim Gratz; +Cc: emacs-devel


On 2016-02-17, at 19:30, Achim Gratz <Stromeko@nexgo.de> wrote:

> I do have potential uses for the feature, but it must be switchable on
> the spot, when and where I need it.  Even a mode-based customization
> seems too heavy-handed to me, at least in the modes I expect to work in
> most of the time.

+1.

Actually, this was /very/ useful for me just a few hours ago.  I would
like to thank everyone involved for this feature!

Still, I think the default should be "off".

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-17 15:34                                   ` Eli Zaretskii
  2016-02-17 18:30                                     ` Achim Gratz
  2016-02-17 20:06                                     ` Joost Kremers
@ 2016-02-17 22:53                                     ` Mark Oteiza
  2016-02-18  0:11                                       ` Juri Linkov
  2016-02-18 17:46                                       ` Eli Zaretskii
  2016-02-18 16:30                                     ` Richard Stallman
  3 siblings, 2 replies; 263+ messages in thread
From: Mark Oteiza @ 2016-02-17 22:53 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Joost Kremers <joostkremers@fastmail.fm>
>> Date: Wed, 17 Feb 2016 09:00:02 +0100
>> Cc: Óscar Fuentes <ofv@wanadoo.es>, rms@gnu.org,
>> 	"emacs-devel@gnu.org" <emacs-devel@gnu.org>
>> 
>> The general opinion is also that char-folding is nonetheless useful to
>> many users, despite the fact that it will generate incorrect results in
>> some languages. The only question that needs to be answered right now is
>> whether the feature will be turned on or off by default. And on that
>> point, the tendency seems to be to have it off by default, with the
>> ability to toggle it within an i-search.
>
> Actually, my counts indicate that more people want it on by default
> than off.

I didn't know what character folding was before this was implemented in
Emacs, and AFAICT the only other thing I happen to have installed that
does this is Chromium.

While it's a neat feature, it should default to off. I hope it becomes
more customizable w.r.t. the arguments against char-folding's current
behavior. It appears that char-folding's dependence on elisp
regex is a crutch.
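
For readers unfamiliar with the regexp approach referred to here, the
idea can be sketched outside Emacs.  The Python below is only an
illustration under stated assumptions, not the Emacs implementation:
`build_fold_table` and `fold_to_regex` are hypothetical names, the table
covers only U+00C0..U+017F, and "equivalent" is taken to mean "has a
canonical Unicode decomposition starting with the same base letter".
Each base letter typed in the search string expands to a character class
of its accented variants.

```python
import re
import unicodedata


def build_fold_table(candidates):
    """Map each base character to the candidates that decompose to it."""
    table = {}
    for ch in candidates:
        # First code point of the canonical decomposition, e.g. 'ä' -> 'a'.
        base = unicodedata.normalize("NFD", ch)[0]
        if base != ch:
            table.setdefault(base, set()).add(ch)
    return table


# Assumption: Latin-1 Supplement and Latin Extended-A suffice for a demo.
FOLD_TABLE = build_fold_table(chr(cp) for cp in range(0x00C0, 0x0180))


def fold_to_regex(text):
    """Turn a plain search string into a regex matching accented variants."""
    parts = []
    for ch in text:
        variants = FOLD_TABLE.get(ch)
        if variants:
            parts.append("[" + re.escape(ch) + "".join(sorted(variants)) + "]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


if __name__ == "__main__":
    # Normalize the haystack so accented letters are single code points.
    haystack = unicodedata.normalize("NFC", "ratt, r\u00e4tt, rott")
    pattern = re.compile(fold_to_regex("ratt"))
    print([m.group(0) for m in pattern.finditer(haystack)])
```

In this sketch the folding is one-directional: typing "n" finds "ñ", but
typing "ñ" finds only "ñ".  The pattern also grows with every character
typed, which gives a feel for why a regexp-based implementation can be
seen as heavy.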

Long PS: I think the news items in "** Search and Replace" need to be
clearer. In particular:

- *** New user option ... should perhaps mention character-fold-to-regexp if
  that ends up being the default
- *** `isearch' and ... should mention how to disable/enable character
  folding for isearch, whatever the default ends up being
- *** New function ... should mention that it is to be added to
  `search-default-regexp-mode'

To me, these appear to be completely disjoint despite having everything
to do with char-folding.  I think one would have to know how isearch
actually works in order to put it together from reading the NEWS as it
is currently.

I'd be happy to make the changes, but that requires knowing what the
default will be.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-17 20:15                                       ` Eli Zaretskii
@ 2016-02-17 22:58                                         ` Ken Brown
  2016-02-18  0:03                                           ` Vinicius Latorre
                                                             ` (3 more replies)
  0 siblings, 4 replies; 263+ messages in thread
From: Ken Brown @ 2016-02-17 22:58 UTC (permalink / raw)
  To: Eli Zaretskii, Joost Kremers; +Cc: ofv, per.starback, rms, emacs-devel

On 2/17/2016 3:15 PM, Eli Zaretskii wrote:
>> From: Joost Kremers <joostkremers@fastmail.fm>
>> Cc: ofv@wanadoo.es, per.starback@gmail.com, rms@gnu.org, emacs-devel@gnu.org
>> Date: Wed, 17 Feb 2016 21:06:11 +0100
>>
>>> Actually, my counts indicate that more people want it on by default
>>> than off.
>>
>> Then put me down for an "off". :-)
>
> Done.

I wrote earlier with positive feedback about character folding, but I 
didn't express an opinion about what the default should be.  I'm now 
ready to cast my vote for having it on by default.

My reason is that I think many users are likely to find character 
folding useful, but they are unlikely to discover that it exists if it 
is not on by default.  I have read the claims that character folding in 
its present form will be viewed as a bug by speakers of certain 
languages.  But I think the possible benefits to others outweigh the 
possible harm done to those who initially think it's a bug.

Ken

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-17 22:58                                         ` Ken Brown
@ 2016-02-18  0:03                                           ` Vinicius Latorre
  2016-02-18 17:29                                             ` Eli Zaretskii
  2016-02-18  4:55                                           ` Marcin Borkowski
                                                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 263+ messages in thread
From: Vinicius Latorre @ 2016-02-18  0:03 UTC (permalink / raw)
  To: Ken Brown
  Cc: ofv, rms, Joost Kremers, per.starback, emacs-devel, Eli Zaretskii

My vote is off by default.



On Wed, Feb 17, 2016 at 8:58 PM, Ken Brown <kbrown@cornell.edu> wrote:

> On 2/17/2016 3:15 PM, Eli Zaretskii wrote:
>
>>> From: Joost Kremers <joostkremers@fastmail.fm>
>>> Cc: ofv@wanadoo.es, per.starback@gmail.com, rms@gnu.org,
>>> emacs-devel@gnu.org
>>> Date: Wed, 17 Feb 2016 21:06:11 +0100
>>>
>>>> Actually, my counts indicate that more people want it on by default
>>>> than off.
>>>
>>> Then put me down for an "off". :-)
>>
>> Done.
>
> I wrote earlier with positive feedback about character folding, but I
> didn't express an opinion about what the default should be.  I'm now ready
> to cast my vote for having it on by default.
>
> My reason is that I think many users are likely to find character folding
> useful, but they are unlikely to discover that it exists if it is not on by
> default.  I have read the claims that character folding in its present form
> will be viewed as a bug by speakers of certain languages.  But I think the
> possible benefits to others outweigh the possible harm done to those who
> initially think it's a bug.
>
> Ken
>
>
>

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-17 22:53                                     ` Mark Oteiza
@ 2016-02-18  0:11                                       ` Juri Linkov
  2016-02-18  0:20                                         ` Mark Oteiza
  2016-02-18  4:53                                         ` Marcin Borkowski
  2016-02-18 17:46                                       ` Eli Zaretskii
  1 sibling, 2 replies; 263+ messages in thread
From: Juri Linkov @ 2016-02-18  0:11 UTC (permalink / raw)
  To: Mark Oteiza; +Cc: emacs-devel

> I didn't know what character folding was before this was implemented in
> Emacs, and AFAICT the only other thing I happen to have installed that
> does this is Chromium.

How come char-folding is on by default in Chromium,
and yet nobody has a problem with that?



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-18  0:11                                       ` Juri Linkov
@ 2016-02-18  0:20                                         ` Mark Oteiza
  2016-02-18 17:28                                           ` Eli Zaretskii
  2016-02-18  4:53                                         ` Marcin Borkowski
  1 sibling, 1 reply; 263+ messages in thread
From: Mark Oteiza @ 2016-02-18  0:20 UTC (permalink / raw)
  To: Juri Linkov; +Cc: emacs-devel

On 18/02/16 at 02:11am, Juri Linkov wrote:
> > I didn't know what character folding was before this was implemented in
> > Emacs, and AFAICT the only other thing I happen to have installed that
> > does this is Chromium.
> 
> How come char-folding is on by default in Chromium,
> and yet nobody has a problem with that?

Apparently it has just been an open issue for six years.
https://code.google.com/p/chromium/issues/detail?id=31609




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-18  0:11                                       ` Juri Linkov
  2016-02-18  0:20                                         ` Mark Oteiza
@ 2016-02-18  4:53                                         ` Marcin Borkowski
  2016-02-18 17:07                                           ` Elias Mårtenson
  1 sibling, 1 reply; 263+ messages in thread
From: Marcin Borkowski @ 2016-02-18  4:53 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Mark Oteiza, emacs-devel


On 2016-02-18, at 01:11, Juri Linkov <juri@linkov.net> wrote:

>> I didn't know what character folding was before this was implemented in
>> Emacs, and AFAICT the only other thing I happen to have installed that
>> does this is Chromium.
>
> How come char-folding is on by default in Chromium,
> and yet nobody has a problem with that?

Well, nobody has a problem with the fact that Chromium does not have
anything like query-replace, either.

;-)

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-17 22:58                                         ` Ken Brown
  2016-02-18  0:03                                           ` Vinicius Latorre
@ 2016-02-18  4:55                                           ` Marcin Borkowski
  2016-02-18 11:26                                           ` Filipp Gunbin
  2016-02-18 17:30                                           ` Eli Zaretskii
  3 siblings, 0 replies; 263+ messages in thread
From: Marcin Borkowski @ 2016-02-18  4:55 UTC (permalink / raw)
  To: Ken Brown
  Cc: ofv, rms, Joost Kremers, per.starback, emacs-devel, Eli Zaretskii


On 2016-02-17, at 23:58, Ken Brown <kbrown@cornell.edu> wrote:

> My reason is that I think many users are likely to find character 
> folding useful, but they are unlikely to discover that it exists if it 
> is not on by default.  [...]

Wait, you mean that you suspect they will not read the manual
cover to cover (or NEWS, if they are already Emacs users)‽

> Ken

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-16 19:32                                 ` Ken Brown
  2016-02-16 23:49                                   ` Lars Ingebrigtsen
@ 2016-02-18  8:57                                   ` Alan Mackenzie
  2016-02-18 17:27                                     ` Eli Zaretskii
  1 sibling, 1 reply; 263+ messages in thread
From: Alan Mackenzie @ 2016-02-18  8:57 UTC (permalink / raw)
  To: Ken Brown
  Cc: Óscar Fuentes, Per Starbäck, Eli Zaretskii, rms,
	emacs-devel

Hello, Ken.

Sorry if I'm piggy-backing on your post, a bit.

On Tue, Feb 16, 2016 at 02:32:44PM -0500, Ken Brown wrote:
> On 2/16/2016 9:30 AM, Per Starbäck wrote:
> > ###   Searching for "n" and finding "ñ" in Spanish, or searching for
> > "a" and finding "ä" in Swedish
> > ###   are just as strange as searching for "i" and finding "j" in English.
> >
> > It's as if many people on this list just won't believe that statement,
> > which is very frustrating.

> I've been following this discussion, and I haven't seen any indication 
> that people don't believe that statement.  What I have seen is 
> disagreement about its importance.  I've also seen several people say 
> that we should wait for more feedback from pretesters before deciding 
> what the default should be.  What's the harm in that?

What I see is a feature that, while important, is not yet ready for
prime time.  It irritates, at the very least, native speakers of Swedish
and Spanish; it is now clear it needs to be configurable for the user's
language.  It also makes clumsy and inappropriate use of regular
expressions; I think it's generally acknowledged we need to move much of
the implementation from Lisp to C.
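
What "configurable for the user's language" could mean can be sketched
in a few lines.  This is a hedged Python illustration, not a proposal
for the Emacs code: the `DISTINCT_LETTERS` table, its language keys, and
the functions `fold` and `matches` are all hypothetical, and both the
search string and the text are folded (symmetric folding), which is an
assumption of the sketch.

```python
import unicodedata

# Hypothetical per-language sets of letters that must NOT be folded,
# because speakers treat them as letters in their own right.
DISTINCT_LETTERS = {
    "sv": set(unicodedata.normalize("NFC", "\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6")),  # å ä ö Å Ä Ö
    "es": set(unicodedata.normalize("NFC", "\u00f1\u00d1")),  # ñ Ñ
}


def fold(text, lang=None):
    """Strip combining marks, except from letters distinct in `lang`."""
    keep = DISTINCT_LETTERS.get(lang, set())
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed
                               if not unicodedata.combining(c)))
    return "".join(out)


def matches(needle, haystack, lang=None):
    """Substring search on the folded forms of both strings."""
    return fold(needle, lang) in fold(haystack, lang)


if __name__ == "__main__":
    print(matches("ratt", "r\u00e4tt"))              # no language set
    print(matches("ratt", "r\u00e4tt", lang="sv"))   # Swedish keeps ä distinct
    print(matches("cafe", "caf\u00e9", lang="sv"))   # é is still folded
```

With no language set, "ratt" finds "rätt"; with the Swedish table it
does not, while "cafe" still finds "café", since é is not in the
Swedish exception set.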

In short, character folding as it currently stands is really in an
experimental stage.  I therefore vote for it to be disabled by default.

> Ken

-- 
Alan Mackenzie (Nuremberg, Germany).



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-17 22:58                                         ` Ken Brown
  2016-02-18  0:03                                           ` Vinicius Latorre
  2016-02-18  4:55                                           ` Marcin Borkowski
@ 2016-02-18 11:26                                           ` Filipp Gunbin
  2016-02-18 17:26                                             ` Eli Zaretskii
  2016-02-18 17:30                                           ` Eli Zaretskii
  3 siblings, 1 reply; 263+ messages in thread
From: Filipp Gunbin @ 2016-02-18 11:26 UTC (permalink / raw)
  To: emacs-devel

I think the default should be "on" only when we have documented and
stable logic (even if the implementation has bugs) that is not going to
change much from version to version.

Otherwise, people who switch versions often (as Achim wrote earlier)
will be confused.

Probably that will not be just "on", but some default strategy chosen
from several alternatives, maybe similar to the default set of
minibuffer completion strategies, with additional strategies available.

So I'm for "off" now.

Filipp

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-17 15:34                                   ` Eli Zaretskii
                                                       ` (2 preceding siblings ...)
  2016-02-17 22:53                                     ` Mark Oteiza
@ 2016-02-18 16:30                                     ` Richard Stallman
  2016-02-18 17:07                                       ` Eli Zaretskii
  3 siblings, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-18 16:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: joostkremers, per.starback, ofv, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > And on that
  > > point, the tendency seems to be to have it off by default, with the
  > > ability to toggle it within an i-search.

  > Actually, my counts indicate that more people want it on by default
  > than off.

We should think about why people want what they want, and how much
the feature helps or hurts them -- not just count people.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-18  4:53                                         ` Marcin Borkowski
@ 2016-02-18 17:07                                           ` Elias Mårtenson
  2016-02-18 17:21                                             ` Eli Zaretskii
  2016-02-19 20:47                                             ` Marcin Borkowski
  0 siblings, 2 replies; 263+ messages in thread
From: Elias Mårtenson @ 2016-02-18 17:07 UTC (permalink / raw)
  To: Marcin Borkowski; +Cc: Mark Oteiza, emacs-devel, Juri Linkov

On 18 February 2016 at 12:53, Marcin Borkowski <mbork@mbork.pl> wrote:

>
> On 2016-02-18, at 01:11, Juri Linkov <juri@linkov.net> wrote:
>
> > How come char-folding is on by default in Chromium,
> > and yet nobody has a problem with that?
>
> Well, nobody has a problem with the fact that Chromium does not have
> anything like query-replace, either.
>

If this impacts replace-string as well, then it moves from being a mere
irritant to a disaster when applied to Swedish. Imagine trying to replace
the word "correct" and you end up having the word "steering wheel" be
silently replaced as well (the former is "rätt" in Swedish, while the
latter is "ratt").
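
The hazard can be made concrete with a small Python sketch.  It models
only the hypothetical case described above, namely folding applied to
replacement, and it assumes symmetric folding (accents stripped from
both the search word and the text); `strip_marks` and `replace_word` are
illustrative names, not Emacs functions.

```python
import re
import unicodedata


def strip_marks(text):
    """Remove combining marks, e.g. 'rätt' -> 'ratt'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def replace_word(text, old, new, fold=False):
    """Replace whole words equal to `old`, optionally ignoring accents."""
    text = unicodedata.normalize("NFC", text)
    old = unicodedata.normalize("NFC", old)

    def swap(match):
        word = match.group(0)
        if fold:
            same = strip_marks(word) == strip_marks(old)
        else:
            same = word == old
        return new if same else word

    return re.sub(r"\w+", swap, text)


if __name__ == "__main__":
    text = "r\u00e4tt svar, ny ratt"   # "correct answer, new steering wheel"
    print(replace_word(text, "r\u00e4tt", "korrekt"))
    # prints: korrekt svar, ny ratt
    print(replace_word(text, "r\u00e4tt", "korrekt", fold=True))
    # prints: korrekt svar, ny korrekt
```

The second call silently rewrites the steering wheel as well, which is
the kind of damage that is hard to notice in a long document.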

If my vote counts, it's obviously "off".

Regards,
Elias

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-18 16:30                                     ` Richard Stallman
@ 2016-02-18 17:07                                       ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-18 17:07 UTC (permalink / raw)
  To: rms; +Cc: joostkremers, per.starback, ofv, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: joostkremers@fastmail.fm, per.starback@gmail.com, ofv@wanadoo.es,
> 	emacs-devel@gnu.org
> Date: Thu, 18 Feb 2016 11:30:33 -0500
> 
>   > Actually, my counts indicate that more people want it on by default
>   > than off.
> 
> We should think about why people want what they want, and how much
> the feature helps or hurts them -- not just count people.

I do both, although counting is easier, since people don't necessarily
explain their desires clearly enough.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-18 17:07                                           ` Elias Mårtenson
@ 2016-02-18 17:21                                             ` Eli Zaretskii
  2016-02-19  7:40                                               ` Elias Mårtenson
  2016-02-19 20:47                                             ` Marcin Borkowski
  1 sibling, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-18 17:21 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: mvoteiza, juri, emacs-devel

> Date: Fri, 19 Feb 2016 01:07:31 +0800
> From: Elias Mårtenson <lokedhs@gmail.com>
> Cc: Mark Oteiza <mvoteiza@udel.edu>, emacs-devel <emacs-devel@gnu.org>,
> 	Juri Linkov <juri@linkov.net>
> 
> If this impacts replace-string as well, then it moves from being a mere irritant to a disaster when applied to
> Swedish. Imagine trying to replace the word "correct" and you end up having the word "steering wheel" be
> silently replaced as well (the former is "rätt" in Swedish, while the latter is "ratt").

There's no reason to assume Emacs development is that stupid.  From
the Emacs manual:

     The replacement commands by default do not use character folding
  (*note character folding: Lax Search.) when looking for the text to
  replace.  To enable character folding for matching in ‘query-replace’
  and ‘replace-string’, set the variable ‘replace-character-fold’ to a
  non-‘nil’ value.  (This setting does not affect the replacement text,
  only how Emacs finds the text to replace.  It also doesn’t affect
  ‘replace-regexp’.)

> If my vote counts, it's obviously "off".

In general, or because you thought replacement commands fold
characters?

In this message:

  http://lists.gnu.org/archive/html/emacs-devel/2016-02/msg00245.html

you expressed a different opinion:

  I'm not even suggesting that this kind of comparison should not be
  the default. Especially given the fact that locale-dependent
  comparators are not very well supported in Emacs at the moment.

This seems to be mildly in favor of the feature being on by default,
or maybe I misunderstand what you wanted to say here.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-18 11:26                                           ` Filipp Gunbin
@ 2016-02-18 17:26                                             ` Eli Zaretskii
  2016-02-19 12:30                                               ` Filipp Gunbin
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-18 17:26 UTC (permalink / raw)
  To: Filipp Gunbin; +Cc: emacs-devel

> From: Filipp Gunbin <fgunbin@fastmail.fm>
> Date: Thu, 18 Feb 2016 14:26:02 +0300
> 
> I think the default should be "on" only when we have documented and
> stable logic (even if the implementation has bugs) that is not going to
> change much from version to version.
> 
> Otherwise, people who switch versions often (as Achim wrote earlier)
> will be confused.

I'm not sure I understand what you mean by this.  If we decide to
leave the option on by default, it will stay on for a substantial amount
of time.  And the same if we decide to turn it off by default.
Defaults don't change frequently in Emacs, as a matter of policy,
precisely for the reasons you mention.  Why should you think this
option will be any different?

> So I'm for "off" now.

Thanks.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-18  8:57                                   ` Alan Mackenzie
@ 2016-02-18 17:27                                     ` Eli Zaretskii
  2016-02-19 12:37                                       ` Richard Stallman
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-18 17:27 UTC (permalink / raw)
  To: Alan Mackenzie; +Cc: ofv, per.starback, rms, kbrown, emacs-devel

> Date: Thu, 18 Feb 2016 08:57:30 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>,
>   Per Starbäck <per.starback@gmail.com>, rms@gnu.org,
>   Óscar Fuentes <ofv@wanadoo.es>, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
> 
> In short, character folding as it currently stands is really in an
> experimental stage.  I therefore vote for it to be disabled by default.

Thanks, noted.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-18  0:20                                         ` Mark Oteiza
@ 2016-02-18 17:28                                           ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-18 17:28 UTC (permalink / raw)
  To: Mark Oteiza; +Cc: emacs-devel, juri

> Date: Wed, 17 Feb 2016 19:20:27 -0500
> From: Mark Oteiza <mvoteiza@udel.edu>
> Cc: emacs-devel@gnu.org
> 
> > How come char-folding is on by default in Chromium,
> > and yet nobody has a problem with that?
> 
> Apparently it has just been an open issue for six years.
> https://code.google.com/p/chromium/issues/detail?id=31609

Most of the complaints there are because Chromium doesn't provide any
way to turn the folding off.  There's no such problem in Emacs, of
course.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-18  0:03                                           ` Vinicius Latorre
@ 2016-02-18 17:29                                             ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-18 17:29 UTC (permalink / raw)
  To: Vinicius Latorre
  Cc: ofv, rms, kbrown, joostkremers, per.starback, emacs-devel

> Date: Wed, 17 Feb 2016 22:03:53 -0200
> From: Vinicius Latorre <viniciusjl.gnu@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, Joost Kremers <joostkremers@fastmail.fm>, ofv@wanadoo.es, 
> 	per.starback@gmail.com, rms@gnu.org, emacs-devel <emacs-devel@gnu.org>
> 
> My vote is off by default.

Thanks.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-17 22:58                                         ` Ken Brown
                                                             ` (2 preceding siblings ...)
  2016-02-18 11:26                                           ` Filipp Gunbin
@ 2016-02-18 17:30                                           ` Eli Zaretskii
  3 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-18 17:30 UTC (permalink / raw)
  To: Ken Brown; +Cc: joostkremers, ofv, emacs-devel, rms, per.starback

> Cc: ofv@wanadoo.es, per.starback@gmail.com, rms@gnu.org, emacs-devel@gnu.org
> From: Ken Brown <kbrown@cornell.edu>
> Date: Wed, 17 Feb 2016 17:58:45 -0500
> 
> I wrote earlier with positive feedback about character folding, but I 
> didn't express an opinion about what the default should be.  I'm now 
> ready to cast my vote for having it on by default.
> 
> My reason is that I think many users are likely to find character 
> folding useful, but they are unlikely to discover that it exists if it 
> is not on by default.  I have read the claims that character folding in 
> its present form will be viewed as a bug by speakers of certain 
> languages.  But I think the possible benefits to others outweigh the 
> possible harm done to those who initially think it's a bug.

Thanks.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-17 22:53                                     ` Mark Oteiza
  2016-02-18  0:11                                       ` Juri Linkov
@ 2016-02-18 17:46                                       ` Eli Zaretskii
  2016-02-18 18:18                                         ` Mark Oteiza
  1 sibling, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-18 17:46 UTC (permalink / raw)
  To: Mark Oteiza; +Cc: emacs-devel

> From: Mark Oteiza <mvoteiza@udel.edu>
> Date: Wed, 17 Feb 2016 17:53:27 -0500
> 
> I didn't know what character folding was before this was implemented in
> Emacs, and AFAICT the only other thing I happen to have installed that
> does this is Chromium.

We don't have to always be the Nth application on the block to
implement something useful.  When Emacs was first introduced, it
pioneered many features that nowadays are taken for granted.  There's
no reason why this trend should stop, IMO.

> While it's a neat feature, it should default to off.

Thanks for providing feedback.

> It appears that char-folding's dependence on elisp regex is a
> crutch.

You (or anyone else) are welcome to work on re-implementing this in
search.c similarly to case-folding we already have there.  The current
implementation was accepted because the feature was deemed important,
and no one stepped forward to do it in C.

> Long PS: I think the news items in "** Search and Replace" need to be
> clearer. In particular:
> 
> - *** New user option ... should perhaps mention character-fold-to-regexp if
>   that ends up being the default

Done.

> - *** `isearch' and ... should mention how to disable/enable character
>   folding for isearch, whatever the default ends up being

I added that.

> - *** New function ... should mention that it is to be added to
>   `search-default-regexp-mode'

The first item above already does (after the changes you proposed
above), so this sounds redundant.

> To me, these appear to be completely disjoint despite having everything
> to do with char-folding.  I think one would have to know how isearch
> actually works in order to put it together from reading the NEWS as it
> is currently.

The grouping in NEWS is not meant to facilitate putting it all
together, in the sense of creating some overall picture of the
underlying implementation.  The grouping is there to make it easier to
grasp changes related to the same feature or group of features, that's
all.

Thanks.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-18 17:46                                       ` Eli Zaretskii
@ 2016-02-18 18:18                                         ` Mark Oteiza
  2016-02-18 18:24                                           ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Mark Oteiza @ 2016-02-18 18:18 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Mark Oteiza <mvoteiza@udel.edu>
>> Date: Wed, 17 Feb 2016 17:53:27 -0500
>> 
>> I didn't know what character folding was before this was implemented in
>> Emacs, and AFAICT the only other thing I happen to have installed that
>> does this is Chromium.
>
> We don't have to always be the Nth application on the block to
> implement something useful.  When Emacs was first introduced, it
> pioneered many features that nowadays are taken for granted.  There's
> no reason why this trend should stop, IMO.

If Emacs does become the first application to implement char-folding and
provide a means to overcome the language issues associated with the
current implementation, that will be impressive.

>> It appears that char-folding's dependence on elisp regex is a
>> crutch.
>
> You (or anyone else) are welcome to work on re-implementing this in
> search.c similarly to case-folding we already have there.  The current
> implementation was accepted because the feature was deemed important,
> and no one stepped forward to do it in C.

Good to know that patches are welcome.

>> Long PS: I think the news items in "** Search and Replace" need to be
>> clearer. In particular:
>> 
>> - *** New user option ... should perhaps mention character-fold-to-regexp if
>>   that ends up being the default
>
> Done.
>
>> - *** `isearch' and ... should mention how to disable/enable character
>>   folding for isearch, whatever the default ends up being
>
> I added that.
>
>> - *** New function ... should mention that it is to be added to
>>   `search-default-regexp-mode'
>
> The first item above already does (after the changes you proposed
> above), so this sounds redundant.

Indeed, thanks



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-18 18:18                                         ` Mark Oteiza
@ 2016-02-18 18:24                                           ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-18 18:24 UTC (permalink / raw)
  To: Mark Oteiza; +Cc: emacs-devel

> From: Mark Oteiza <mvoteiza@udel.edu>
> Date: Thu, 18 Feb 2016 13:18:12 -0500
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Mark Oteiza <mvoteiza@udel.edu>
> >> Date: Wed, 17 Feb 2016 17:53:27 -0500
> >> 
> >> I didn't know what character folding was before this was implemented in
> >> Emacs, and AFAICT the only other thing I happen to have installed that
> >> does this is Chromium.
> >
> > We don't have to always be the Nth application on the block to
> > implement something useful.  When Emacs was first introduced, it
> > pioneered many features that nowadays are taken for granted.  There's
> > no reason why this trend should stop, IMO.
> 
> If Emacs does become the first application to implement char-folding and
> provide a means to overcome the language issues associated with the
> current implementation, that will be impressive.

"A journey of a thousand miles begins with a single step."



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-13 17:58                                     ` Eli Zaretskii
@ 2016-02-18 19:15                                       ` John Wiegley
  2016-02-18 20:12                                         ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: John Wiegley @ 2016-02-18 19:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ofv, juri, Drew Adams, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 832 bytes --]

Hi Eli,

I see you've kept a running tally of votes for the default nature of this
feature. Do you have a summary yet?

Given the sheer volume of concerned response, both for and against, my
inclination is to vote OFF by default, until we have more experience and
understanding. However, if the tally shows a distinct majority (at least 2/3)
wanting it on by default, I'll take that into account.

We can always turn it back on in a later release -- and users can always
configure it at any time -- so this isn't a cliff we're driving off of. It's
more a question of how much use (and thus, feedback) the feature will receive
during 25.x if we turn it off by default.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 629 bytes --]

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-18 19:15                                       ` John Wiegley
@ 2016-02-18 20:12                                         ` Eli Zaretskii
  2016-02-19  5:11                                           ` Lars Ingebrigtsen
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-18 20:12 UTC (permalink / raw)
  To: John Wiegley; +Cc: ofv, juri, drew.adams, emacs-devel

> From: John Wiegley <jwiegley@gmail.com>
> Cc: Drew Adams <drew.adams@oracle.com>,  ofv@wanadoo.es,  emacs-devel@gnu.org,  juri@linkov.net
> Date: Thu, 18 Feb 2016 11:15:22 -0800
> 
> I see you've kept a running tally of votes for the default nature of this
> feature. Do you have a summary yet?

I can count ;-)

> Given the sheer volume of concerned response, both for and against, my
> inclination is to vote OFF by default, until we have more experience and
> understanding. However, if the tally shows a distinct majority (at least 2/3)
> wanting it on by default, I'll take that into account.

I think it's too early to make the decision.  The feedback only
started to accumulate, and we are nowhere near a release.  What's the
rush?



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-18 20:12                                         ` Eli Zaretskii
@ 2016-02-19  5:11                                           ` Lars Ingebrigtsen
  2016-02-19  8:20                                             ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-19  5:11 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> I can count ;-)

Here's my vote: I think character folding is a good idea, and that it
should be turned on by default if it respects the locale.  If not, it
should be off by default.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-18 17:21                                             ` Eli Zaretskii
@ 2016-02-19  7:40                                               ` Elias Mårtenson
  2016-02-19 19:24                                                 ` Achim Gratz
  0 siblings, 1 reply; 263+ messages in thread
From: Elias Mårtenson @ 2016-02-19  7:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Mark Oteiza, emacs-devel, Juri Linkov

[-- Attachment #1: Type: text/plain, Size: 1345 bytes --]

On 19 February 2016 at 01:21, Eli Zaretskii <eliz@gnu.org> wrote:

>
> > If my vote counts, it's obviously "off".
>
> In general, or because you thought replacement commands fold
> characters?
>

Because I thought replacement was affected.

> In this message:
>
>   http://lists.gnu.org/archive/html/emacs-devel/2016-02/msg00245.html
>
> you expressed a different opinion:
>
>   I'm not even suggesting that this kind of comparisons should not be
>   the default, even. Especially given the fact that locale-dependent
>   comparators are not very well supported in Emacs at the moment.
>
> This seems to be mildly in favor of the feature being on by default,
> or maybe I misunderstand what you wanted to say here.

That is correct. I was mildly in favour, as long as it's limited to
interactive search commands.

As I have read the rest of the discussion, I have shifted slightly toward
the negative end of the spectrum to the point that my opinion is "off" by
default right now, until locale-aware searching is available in Emacs.

I'm a firm believer in putting one's money where one's mouth is, and I'm
willing to work on it myself. However, right now I'm limited by the fact
that I have no copyright assignment on file, so you can't merge anything I
do. I have to bring this up with my employer's legal department again.

Regards,
Elias

[-- Attachment #2: Type: text/html, Size: 2110 bytes --]

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-19  5:11                                           ` Lars Ingebrigtsen
@ 2016-02-19  8:20                                             ` Eli Zaretskii
  2016-02-19  9:22                                               ` Elias Mårtenson
  2016-02-19 22:44                                               ` Lars Ingebrigtsen
  0 siblings, 2 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-19  8:20 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Fri, 19 Feb 2016 16:11:41 +1100
> 
> Here's my vote: I think character folding is a good idea, and that it
> should be turned on by default if it respects the locale.  If not, it
> should be off by default.

Thanks.  But what does "respect the locale" mean, in practical terms?
A large portion of the characters that have some decomposition, and
thus will be folded when searching, belong to scripts that are not
related to any language or other locale-specific attribute.  What do
you think should be done with them in the context of this feature?



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-19  8:20                                             ` Eli Zaretskii
@ 2016-02-19  9:22                                               ` Elias Mårtenson
  2016-02-19 10:09                                                 ` Eli Zaretskii
  2016-02-19 20:38                                                 ` Marcin Borkowski
  2016-02-19 22:44                                               ` Lars Ingebrigtsen
  1 sibling, 2 replies; 263+ messages in thread
From: Elias Mårtenson @ 2016-02-19  9:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2029 bytes --]

On 19 February 2016 at 16:20, Eli Zaretskii <eliz@gnu.org> wrote:

> > From: Lars Ingebrigtsen <larsi@gnus.org>
> > Date: Fri, 19 Feb 2016 16:11:41 +1100
> >
> > Here's my vote: I think character folding is a good idea, and that it
> > should be turned on by default if it respects the locale.  If not, it
> > should be off by default.
>
> Thanks.  But what does "respect the locale" mean, in practical terms?
> A large portion of the characters that have some decomposition, and
> thus will be folded when searching, belong to scripts that are not
> related to any language or other locale-specific attribute.  What do
> you think should be done with them in the context of this feature?
>

The Unicode character decomposition was never meant to be used to provide a
feature such as character folding in Emacs. But, Unicode really doesn't
provide a good alternative. The standard itself states that this belongs to
the realm of localisation (IIRC, it even goes as far as mentioning Swedish
as a counterexample).

I readily agree that using the decomposition is a clever way to get the
functionality quite a long way, but in the cases where it breaks down, it
does so quite spectacularly, and that's what I (and others) have been opposing.

My suggestion would be to apply several levels of comparisons:

  1. Check if the characters have locale-specific folding rules (for
Swedish, this would be no more than 3-5 characters or so). If not:
  2. Check the equivalence according to the Unicode collation charts:
http://unicode.org/charts/collation/
  3. (maybe) Use the decomposition trick

As for the per-locale exception tables mentioned in point 1, I don't know
if such information is easily available. It may be possible to extract it
from the localedata files from Glibc. But even if it isn't, creating one
for a language should be trivial since we only need a list of character
groups that should _not_ be folded, which for most languages should be a
very small list (in fact, for most(?) it's probably empty).
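
The layering suggested above can be sketched roughly as follows. This is a minimal Python illustration, not Emacs code: the `NO_FOLD` table, its contents, and the locale keys are hypothetical placeholders, and step 2 (the collation charts) is left out entirely, so only the exception check (step 1) and the decomposition fallback (step 3) are shown.

```python
import unicodedata

# Hypothetical per-locale exception tables: characters that must NOT be
# folded onto their base letter for users of that locale.
NO_FOLD = {
    "sv": set("\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6"),  # å ä ö Å Ä Ö
    "es": set("\u00f1\u00d1"),                          # ñ Ñ
}

def fold_char(ch, locale):
    """Fold one character, honouring the locale's exception list."""
    if ch in NO_FOLD.get(locale, set()):
        return ch                      # step 1: locale says "do not fold"
    decomposed = unicodedata.normalize("NFD", ch)
    # step 3: drop combining marks, keep the base character(s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base or ch

def fold(text, locale):
    """Fold a whole string; NFC first so exceptions see precomposed forms."""
    text = unicodedata.normalize("NFC", text)
    return "".join(fold_char(c, locale) for c in text)

def matches(pattern, text, locale):
    """True if PATTERN occurs in TEXT under the locale's folding rules."""
    return fold(pattern, locale) in fold(text, locale)
```

With these assumed tables, `matches("a", "\u00e5", "sv")` is false while the same call with `"en"` is true, which is the behaviour argued for here.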

Regards,
Elias

[-- Attachment #2: Type: text/html, Size: 2752 bytes --]

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-19  9:22                                               ` Elias Mårtenson
@ 2016-02-19 10:09                                                 ` Eli Zaretskii
  2016-02-19 10:51                                                   ` Elias Mårtenson
  2016-02-19 20:38                                                 ` Marcin Borkowski
  1 sibling, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-19 10:09 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: larsi, emacs-devel

> Date: Fri, 19 Feb 2016 17:22:18 +0800
> From: Elias Mårtenson <lokedhs@gmail.com>
> Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org>
> 
> The Unicode character decomposition was never meant to be used to provide a feature such as character
> folding in Emacs.

That's not true.  Canonical equivalence, which is encoded in canonical
decompositions, is a must for searching.  Otherwise, what looks the
same on display will not be found, and will look like a bug.  See the
example I gave with ñ and ñ (the latter one is 2 characters).

So using decomposition is not a trick, it simply uses the same data
that determines equivalence of character sequences.
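
A small Python sketch of the canonical-equivalence point, using the standard `unicodedata` module (the helper names are made up for illustration; this says nothing about how search.c or the Lisp code does it):

```python
import unicodedata

precomposed = "\u00f1"   # ñ as a single code point
decomposed = "n\u0303"   # n followed by COMBINING TILDE (2 characters)

def canonically_equal(a, b):
    """Compare two strings after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def naive_find(haystack, needle):
    """Plain code-point search: misses canonically equivalent text."""
    return haystack.find(needle)

def normalized_find(haystack, needle):
    """Search after NFD normalization of both sides.

    The returned index refers to the normalized haystack.
    """
    return unicodedata.normalize("NFD", haystack).find(
        unicodedata.normalize("NFD", needle))
```

The two spellings render identically, yet only the normalized search finds one when the user types the other.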

> My suggestion would be to apply several levels of comparisons:
> 
> 1. Check if the characters have locale-specific folding rules (for Swedish, this would be no more than 3-5
> characters or so). If not:
> 2. Check the equivalence according to the Unicode collation charts: http://unicode.org/charts/collation/
> 3. (maybe) Use the decomposition trick

2 and 3 are the same as we do already, AFAICT.  (Collation charts
describe ordering, which is irrelevant for searching; other than that,
you will see that Emacs already implements the data shown in
http://unicode.org/charts/collation/.)

As for the locale-specific parts: using that will only DTRT if we
assume that the majority of searches are done in buffers holding text
in locale's language.  Is that a good assumption?  We are talking
about a multilingual Emacs, in an age of global communications, where
you can have conversations with someone on the other side of the
world, or read text that combines several languages in the same
buffer.  Do we really want to go back to the l10n days, when there was
ever only one locale that was interesting -- the current one?  I
wonder.

> As for the per-locale exception tables mentioned in point 1, I don't know if such information is easily available.

It is, Unicode provides it.  We just didn't import it yet.

> It may be possible to extract it from the localedata files from Glibc. But even if it isn't, creating one for a
> language should be trivial since we only need a list of character groups that should _not_ be folded, which for
> most languages should be a very small list (in fact, for most(?) it's probably empty).

It's more complex than that, but patches are welcome, of course.

Note that the prerequisite for anything more complicated and elaborate
than what we have now is to re-implement character-folding on the C
level, inside search.c functions.  The current implementation is at
its limits already.  I tried to convince the interested people to do
this in C to begin with, but couldn't, and the feature was important
enough to have even in its current implementation.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-19 10:09                                                 ` Eli Zaretskii
@ 2016-02-19 10:51                                                   ` Elias Mårtenson
  2016-02-19 11:46                                                     ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Elias Mårtenson @ 2016-02-19 10:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 4151 bytes --]

On 19 February 2016 at 18:09, Eli Zaretskii <eliz@gnu.org> wrote:

> > The Unicode character decomposition was never meant to be used to
> > provide a feature such as character folding in Emacs.
>
> That's not true.  Canonical equivalence, which is encoded in canonical
> decompositions, is a must for searching.  Otherwise, what looks the
> same on display will not be found, and will look like a bug.  See the
> example I gave with ñ and ñ (the latter one is 2 characters).
>

Of course you have to use the decomposition algorithms to ensure that the
precomposed and decomposed variations of the same character compare equal.

This is, however, different from using the decomposition to decompose a
character and then using the base character as the thing to match against.
The latter is what Emacs is doing today, as far as I understand.

> 2 and 3 are the same as we do already, AFAICT.  (Collation charts
> describe ordering, which is irrelevant for searching; other than that,
> you will see that Emacs already implements the data shown in
> http://unicode.org/charts/collation/.)
>

The collation charts also describe equivalence. If you look at the latin
collation chart for example (
http://unicode.org/charts/collation/chart_Latin.html) you will see that the
characters are grouped. These are the equivalences I'm referring to.

Now, I note that on these charts, U+0061 LATIN SMALL LETTER A and
U+2C65 LATIN SMALL LETTER A WITH STROKE compares as different characters,
and the latter does not have a decomposition. Should this also be addressed?

> As for the locale-specific parts: using that will only DTRT if we
> assume that the majority of searches are done in buffers holding text
> in locale's language.  Is that a good assumption?

My opinion is that the default search behaviour should depend primarily on
the locale of the entire Emacs session. I.e. the locale of the user
starting the application. I'm not disagreeing that allowing a buffer-local
locale override this behaviour is a good idea, but as a Swedish speaker I
really see å, ä and a as completely separate things, even if the language
of the buffer that I am editing happens to be English. The equivalence of
these characters is the odd behaviour here, and the one that should be
enabled explicitly.

Also, if I happen to be editing a Spanish document (I don't speak Spanish)
I would find equivalence of ñ and n to be incredibly useful, even though
Óscar would grind his teeth at it. :-)

> We are talking
> about a multilingual Emacs, in an age of global communications, where
> you can have conversations with someone on the other side of the
> world, or read text that combines several languages in the same
> buffer.  Do we really want to go back to the l10n days, when there was
> ever only one locale that was interesting -- the current one?  I
> wonder.
>

Actually, I think so. This is because the search equivalence is inherently
a local thing. The behaviour of search is more tied to a user's preference
than the locale of the given buffer, in most cases.

At least that's my opinion. The bike shed can have many colours.

> It is, Unicode provides it.  We just didn't import it yet.
>

It does? I was looking for such tables, but didn't find it. Do you have a
link?

> It's more complex than that, but patches are welcome, of course.
>

Having spent the better part of the day trying to solve a C++ design
problem that I had originally hand-waved as being trivial, I know what you
mean…

> Note that the prerequisite for anything more complicated and elaborate
> than what we have now is to re-implement character-folding on the C
> level, inside search.c functions.  The current implementation is at
> its limits already.  I tried to convince the interested people to do
> this in C to begin with, but couldn't, and the feature was important
> enough to have even in its current implementation.
>

I'm not going to offer to do this until I'm sure that I can have the
copyright assignment done. But I am interested in it.

Regards,
Elias

[-- Attachment #2: Type: text/html, Size: 6206 bytes --]

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-19 10:51                                                   ` Elias Mårtenson
@ 2016-02-19 11:46                                                     ` Eli Zaretskii
  2016-02-19 13:37                                                       ` Elias Mårtenson
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-19 11:46 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: larsi, emacs-devel

> Date: Fri, 19 Feb 2016 18:51:47 +0800
> From: Elias Mårtenson <lokedhs@gmail.com>
> Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org>
> 
>  > The Unicode character decomposition was never meant to be used to provide a feature such as
>  character
>  > folding in Emacs.
> 
>  That's not true. Canonical equivalence, which is encoded in canonical
>  decompositions, is a must for searching. Otherwise, what looks the
>  same on display will not be found, and will look like a bug. See the
>  example I gave with ñ and ñ (the latter one is 2 characters).
> 
> Of course you have to use the decomposition algorithms to ensure that the precomposed and decomposed
> variations of the same character compares equal.

Then you agree that _some_ form of character-folding should be turned
on by default?

> This is, however, different from using the decomposition to decompose a character and then using the
> base character as the thing to match against. The latter is what Emacs is doing today, as far as I understand.

Please describe in more detail why you think what Emacs does today
is not what you think it should do.  It's possible we have a
miscommunication here.

For example, if the buffer includes ñ (2 characters), should "C-s n"
find the n in it?

>  2 and 3 are the same as we do already, AFAICT. (Collation charts
>  describe ordering, which is irrelevant for searching; other than that,
>  you will see that Emacs already implements the data shown in
>  http://unicode.org/charts/collation/.)
> 
> The collation charts also describe equivalence.

That equivalence is encoded in the decomposition data that is part of
UnicodeData.txt which Emacs uses for character-folding.

> If you look at the latin collation chart for example
> (http://unicode.org/charts/collation/chart_Latin.html) you will see that the characters are grouped. These are
> the equivalences I'm referring to.

Yes.  And if you look at the entries of the equivalent characters in
UnicodeData.txt, you will see there they have decompositions, which is
what Emacs uses for searching when character-folding is in effect.
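
To make the idea concrete, here is a rough Python sketch of turning a search string into a folding regexp from the decomposition data. It is only an illustration of the general technique, not the `character-fold-to-regexp` code: the function names, the BMP-only scan, and the fixed U+0300..U+036F mark range are all simplifying assumptions.

```python
import re
import unicodedata

# Zero or more combining diacritical marks (assumed range, for brevity).
COMBINING_MARKS = "[\u0300-\u036f]*"

def build_fold_table(limit=0x10000):
    """Map each base character to the characters that canonically
    decompose to that base followed only by combining marks."""
    table = {}
    for cp in range(limit):
        ch = chr(cp)
        field = unicodedata.decomposition(ch)
        if not field or field.startswith("<"):
            continue                   # no decomposition, or compatibility-only
        nfd = unicodedata.normalize("NFD", ch)
        base, marks = nfd[0], nfd[1:]
        if unicodedata.combining(base):
            continue                   # skip precomposed combining marks
        if all(unicodedata.combining(m) for m in marks):
            table.setdefault(base, set()).add(ch)
    return table

def fold_to_regex(pattern, table):
    """Replace each character by an alternation of its equivalents."""
    parts = []
    for ch in pattern:
        variants = [ch] + sorted(table.get(ch, ()))
        alternatives = "|".join(re.escape(v) for v in variants)
        parts.append("(?:" + alternatives + ")" + COMBINING_MARKS)
    return "".join(parts)
```

A regexp built this way for "manana" matches both the precomposed and the decomposed spelling of "mañana", and it grows quickly with the length of the search string.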

> Now, I note that on these charts, U+0061 LATIN SMALL LETTER A and U+2C65 LATIN SMALL LETTER A
> WITH STROKE compares as different characters, and the latter does not have a decomposition. Should this
> also be addressed?

Maybe so, but given the controversy even about what we do now, which
is a subset, I'd doubt extending what we do now is a wise move.

>  As for the locale-specific parts: using that will only DTRT if we
>  assume that the majority of searches are done in buffers holding text
>  in locale's language. Is that a good assumption? 
> 
> My opinion is that the default search behaviour should depend primarily on the locale of the entire Emacs
> session. I.e. the locale of the user starting the application. I'm not disagreeing that allowing a buffer-local locale
> override this behaviour is a good idea, but as a Swedish speaker I really see å, ä and a as completely
> separate things, even if the language of the buffer that I am editing happens to be English. The equivalence of
> these characters is the odd behaviour here, and the one that should be enabled explicitly.
> 
> Also, if I happen to be editing a Spanish document (I don't speak Spanish) I would find equivalence of ñ and n
> to be incredibly useful, even though Óscar would grind his teeth at it. :-)

So you are in fact making two contradicting statements here.  Indeed,
the locale in which Emacs started says almost nothing about the
documents being edited, nor even about the user's preferences: it is
easy to imagine a user whose "native" locale is X starting Emacs in
another locale.

>  We are talking
>  about a multilingual Emacs, in an age of global communications, where
>  you can have conversations with someone on the other side of the
>  world, or read text that combines several languages in the same
>  buffer. Do we really want to go back to the l10n days, when there was
>  ever only one locale that was interesting -- the current one? I
>  wonder.
> 
> Actually, I think so. This is because the search equivalence is inherently a local thing.

Being a multi-lingual environment, Emacs has no real notion of the
locale.

>  It is, Unicode provides it. We just didn't import it yet.
> 
> It does? I was looking for such tables, but didn't find it. Do you have a link?

Look for DUCET and its tailoring data.  These should be a good
starting point:

  http://www.unicode.org/Public/UCA/latest/
  http://cldr.unicode.org/
  
>  Note that the prerequisite for anything more complicated and elaborate
>  than what we have now is to re-implement character-folding on the C
>  level, inside search.c functions. The current implementation is at
>  its limits already. I tried to convince the interested people to do
>  this in C to begin with, but couldn't, and the feature was important
>  enough to have even in its current implementation.
> 
> I'm not going to offer to do this until I'm sure that I can have the copyright assignment done. But I am
> interested in it.

Thanks.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-18 17:26                                             ` Eli Zaretskii
@ 2016-02-19 12:30                                               ` Filipp Gunbin
  2016-02-19 15:22                                                 ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Filipp Gunbin @ 2016-02-19 12:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Hi Eli,

On 18/02/2016 19:26 +0200, Eli Zaretskii wrote:

>> From: Filipp Gunbin <fgunbin@fastmail.fm>
>> Date: Thu, 18 Feb 2016 14:26:02 +0300
>> 
>> I think the default should be "on" only when we have documented and
>> stable logic (even if the implementation has bugs) that is not going to
>> change much from version to version.
>> 
>> Otherwise, people who switch versions often (as Achim wrote earlier)
>> will be confused.
>
> I'm not sure I understand what you mean by this.  If we decide to
> leave the option on by default, it will stay on for a substantial amount
> of time.  And the same if we decide to turn it off by default.
> Defaults don't change frequently in Emacs, as a matter of policy,
> precisely for the reasons you mention.  Why should you think this
> option will be any different?

I wrote about the logic of folding.  There is ongoing discussion about
it and I wanted to stress that if we make the feature "on" by default
and then change algorithm, there will be radically different behavior in
different versions (besides bugfix).  Maybe that's so obvious that it's
not even worth saying.

Filipp



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-18 17:27                                     ` Eli Zaretskii
@ 2016-02-19 12:37                                       ` Richard Stallman
  2016-02-19 18:31                                         ` John Wiegley
  0 siblings, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-19 12:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ofv, acm, emacs-devel, kbrown, per.starback

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

We're not looking for "votes" on this list.  That would be the wrong
way to make a decision.  We need to poll the users -- but that too
will not be a simple matter of counting votes.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-19 11:46                                                     ` Eli Zaretskii
@ 2016-02-19 13:37                                                       ` Elias Mårtenson
  2016-02-19 19:18                                                         ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Elias Mårtenson @ 2016-02-19 13:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 6782 bytes --]

On 19 February 2016 at 19:46, Eli Zaretskii <eliz@gnu.org> wrote:

> > Of course you have to use the decomposition algorithms to ensure that
> > the precomposed and decomposed variations of the same character compare
> > equal.
>
> Then you agree that _some_ form of character-folding should be turned
> on by default?
>

Yes.

> > This is, however, different from using the decomposition to decompose
> > a character and then using the base character as the thing to match
> > against. The latter is what Emacs is doing today, as far as I understand.
>
> Please describe in more detail why you think what Emacs does today
> is not what you think it should do.  It's possible we have a
> miscommunication here.
>

The main issue to me is that it matches things that should not be matched.
A secondary (minor) issue is that some things that should be matched are not
(see my example with U+2C65).

> For example, if the buffer includes ñ (2 characters), should "C-s n"
> find the n in it?
>

That depends on the locale of the user. However, from the point of view of a
user, there should not be a visible difference between the precomposed and
the decomposed variants: they are the exact same character. This is in line
with Unicode recommendations (https://en.wikipedia.org/wiki/Unicode_equivalence)

Note: I know that it's possible that I am wrong about this and that Unicode
actually _has_ said that the equivalence tables can be used for this
purpose (I.e. decompose and only use the primary character). If that is the
case, I'd be interested to see a reference to that, but I will still be of
the same opinion that doing so will result in broken behaviour for a
certain class of user.

Thus, if I am Spanish, I will _not_ want any of those to match "n". If I'm
Swedish I will likely want both of them to match "n".

> That equivalence is encoded in the decomposition data that is part of
> UnicodeData.txt which Emacs uses for character-folding.
>

The equivalence tables explain that the precomposed character U+00F1 is
equivalent to the specific sequence U+006E U+0303. That is all it says. It
does not say that ñ is a variation of n. It's an instruction how to
construct a given character.

The decompositions are used in the normalisation forms to ensure that the
two variants are treated equally (such as the two alternative
representations of ñ that we have been discussing).
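To make the distinction concrete, here is a minimal sketch in Python using the
standard `unicodedata` module. Python is used purely for illustration; this is
not how Emacs implements anything. The helper name `canonically_equal` is my
own. It shows that normalisation makes the two representations of ñ compare
equal, and that it says nothing about ñ versus plain n.

```python
import unicodedata

PRECOMPOSED = "\u00f1"        # ñ as a single code point (U+00F1)
COMBINING = "\u006e\u0303"    # n followed by COMBINING TILDE (U+006E U+0303)


def canonically_equal(a: str, b: str) -> bool:
    """Return True if a and b are equal after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The raw strings differ code point by code point...
raw_equal = PRECOMPOSED == COMBINING
# ...but they are canonically equivalent once normalised.
equivalent = canonically_equal(PRECOMPOSED, COMBINING)
# Normalisation alone does not make ñ equal to n.
n_equivalent = canonically_equal(PRECOMPOSED, "n")
```

Under this sketch `raw_equal` and `n_equivalent` are false and `equivalent` is
true, which is the behaviour I am arguing normalisation was designed for.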

> > If you look at the latin collation chart for example
> > (http://unicode.org/charts/collation/chart_Latin.html) you will see
> that the characters are grouped. These are
> > the equivalences I'm referring to.
>
> Yes.  And if you look at the entries of the equivalent characters in
> UnicodeData.txt, you will see there they have decompositions, which is
> what Emacs uses for searching when character-folding is in effect.
>

Yes, and this is where the crux of our disagreement lies, I think. I
previously referred to using the decompositions as a guide to character
equivalence as a "trick". I stand by this, since this is not the purpose of
the decompositions. The best thing that Unicode provides for that purpose
(to my knowledge) are the collation charts that I mentioned previously (
http://unicode.org/charts/collation/)

> > Now, I note that on these charts, U+0061 LATIN SMALL LETTER A and U+2C65
> LATIN SMALL LETTER A
> > WITH STROKE compares as different characters, and the latter does not
> have a decomposition. Should this
> > also be addressed?
>
> Maybe so, but given the controversy even about what we do now, which
> is a subset, I'd doubt extending what we do now is a wise move.
>

I was just asking to understand your position better.

> >  As for the locale-specific parts: using that will only DTRT if we
> >  assume that the majority of searches are done in buffers holding text
> >  in locale's language. Is that a good assumption?
> >
> > My opinion is that the default search behaviour should depend primarily
> on the locale of the entire Emacs
> > session. I.e. the locale of the user starting the application. I'm not
> disagreeing that allowing a buffer-local locale
> > override this behaviour is a good idea, but as a Swedish speaker I
> really see å, ä and a as completely
> > separate things, even if the language of the buffer that I am editing
> happens to be English. The equivalence of
> > these characters is the odd behaviour here, and the one that should be
> enabled explicitly.
> >
> > Also, if I happen to be editing a Spanish document (I don't speak
> Spanish) I would find equivalence of ñ and n
> > to be incredibly useful, even though Óscar would grind his teeth at it.
> :-)
>
> So you are in fact making two contradicting statements here.

Interesting. I have re-read what I wrote and I really don't see myself
holding two contradicting statements. Perhaps you think that I am both
against folding and not, at the same time. If that's the case, let me try
to rephrase:

I like the idea of character folding. But, if it's incorrectly (by my
standards, of course) implemented I would rather not have it at all since
it will be highly annoying.

> Indeed,
> the locale in which Emacs started says almost nothing about the
> documents being edited, nor even about the user's preferences: it is
> easy to imagine a user whose "native" locale is X starting Emacs in
> another locale.
>

Yes. I am fully aware of this. But so be it. Having applications work
differently depending on the locale of the environment the application was
started in is nothing new.

> >  We are talking
> >  about a multilingual Emacs, in an age of global communications, where
> >  you can have conversations with someone on the other side of the
> >  world, or read text that combines several languages in the same
> >  buffer. Do we really want to go back to the l10n days, when there was
> >  ever only one locale that was interesting -- the current one? I
> >  wonder.
> >
> > Actually, I think so. This is because the search equivalence is
> inherently a local thing.
>
> Being a multi-lingual environment, Emacs has no real notion of the
> locale.
>

Perhaps it should?

> >  It is, Unicode provides it. We just didn't import it yet.
> >
> > It does? I was looking for such tables, but didn't find it. Do you have
> a link?
>
> Look for DUCET and its tailoring data.  These should be a good
> starting point:
>
>   http://www.unicode.org/Public/UCA/latest/
>   http://cldr.unicode.org/
>

Those are the decomposition charts, and don't actually say anything about
equivalence outside of providing a canonical form for precomposed
characters, as was discussed above.

Regards,
Elias


* Re: On language-dependent defaults for character-folding
  2016-02-19 12:30                                               ` Filipp Gunbin
@ 2016-02-19 15:22                                                 ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-19 15:22 UTC (permalink / raw)
  To: Filipp Gunbin; +Cc: emacs-devel

> From: Filipp Gunbin <fgunbin@fastmail.fm>
> Cc: emacs-devel@gnu.org
> Date: Fri, 19 Feb 2016 15:30:21 +0300
> 
> if we make the feature "on" by default and then change algorithm,
> there will be radically different behavior in different versions
> (besides bugfix).

This is unlikely to happen, precisely for the reasons you said it
shouldn't.




* Re: On language-dependent defaults for character-folding
  2016-02-19 12:37                                       ` Richard Stallman
@ 2016-02-19 18:31                                         ` John Wiegley
  0 siblings, 0 replies; 263+ messages in thread
From: John Wiegley @ 2016-02-19 18:31 UTC (permalink / raw)
  To: Richard Stallman
  Cc: ofv, kbrown, per.starback, emacs-devel, acm, Eli Zaretskii

>>>>> Richard Stallman <rms@gnu.org> writes:

> We're not looking for "votes" on this list. That would be the wrong way to
> make a decision. We need to poll the users -- but that too will not be a
> simple matter of counting votes.

That's understood. We're looking to gauge the sentiment of the developers here
on this list, but the final decision will take every factor into account.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2




* Re: On language-dependent defaults for character-folding
  2016-02-19 13:37                                                       ` Elias Mårtenson
@ 2016-02-19 19:18                                                         ` Eli Zaretskii
  2016-02-20  5:22                                                           ` Elias Mårtenson
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-19 19:18 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: larsi, emacs-devel

> Date: Fri, 19 Feb 2016 21:37:26 +0800
> From: Elias Mårtenson <lokedhs@gmail.com>
> Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org>
> 
>  For example, if the buffer includes ñ (2 characters), should "C-s n"
>  find the n in it?
> 
> That depends on the locale of the user.

There are use cases that are independent of the locale.  For example,
imagine that you need to find all the literal n characters in a buffer
because you are investigating a bug in the program that produced that
buffer.  As an Emacs user, I need to do such jobs almost every day.  I
don't want the results affected by the locale.

> However, from the point of view of a user, there should not be a visible
> difference between the precomposed and the composed variants: they are the
> exact same character.

What if the user wants to find all those places where what looks like
ñ is actually ñ?  Wouldn't that be a valid use case?

> Note: I know that it's possible that I am wrong about this and that Unicode actually _has_ said that the
> equivalence tables can be used for this purpose (I.e. decompose and only use the primary character). If that is
> the case, I'd be interested to see a reference to that, but I will still be of the same opinion that doing so will
> result in broken behaviour for a certain class of user.

The reference you are looking for is the Unicode Standard itself.  It
says to use the normalization forms, see for example section 5.16
there.

> The equivalence tables explain that the precomposed character U+00F1 is equivalent to the specific
> sequence U+006E U+0303. That is all they say. They do not say that ñ is a variation of n. They are an
> instruction for how to construct a given character.

Every character-folding search implementation decomposes characters
before matching them.  So does Emacs.  We didn't invent this, and we
certainly didn't use the decompositions where they weren't supposed to
be used.  It's not a trick, it's what everyone else does to do the
job.  See the ICU library, for example.
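As an illustration only, decomposition-based folding can be sketched in a few
lines of Python with the standard `unicodedata` module. This is a simplified
model, not the Emacs or ICU implementation; the names `fold` and `folded_find`
are hypothetical, and the sketch ignores case and any locale tailoring.

```python
import unicodedata


def fold(text: str) -> str:
    """Decompose text (NFD) and drop combining marks, keeping base characters."""
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns the canonical combining class; it is non-zero for
    # marks such as U+0303 COMBINING TILDE.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def folded_find(haystack: str, needle: str) -> int:
    """Return the index of needle within the folded haystack, or -1 if absent."""
    return fold(haystack).find(fold(needle))
```

With this model, searching for "n" finds ñ whether the buffer holds U+00F1 or
the sequence U+006E U+0303, because both fold to the same base character.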

> The decompositions are used in the normalisation forms to ensure that the two variants are treated equally
> (such as the two alternative representations of ñ that we have been discussing).

Yes, and any character-folding search uses normalization forms as
well.

>  Indeed,
>  the locale in which Emacs started says almost nothing about the
>  documents being edited, nor even about the user's preferences: it is
>  easy to imagine a user whose "native" locale is X starting Emacs in
>  another locale.
> 
> Yes. I am fully aware of this. But so be it. Having applications work differently depending on the locale of the
> environment the application was started in is nothing new.

It's not new.  It's old.  We should move on to more general
environments that support multiple languages.  Emacs is such an
environment.  The old l10n paradigms are fundamentally incompatible
with that.

>  Being a multi-lingual environment, Emacs has no real notion of the
>  locale.
> 
> Perhaps it should?

That'd be a step backward, IMO.

>  > It is, Unicode provides it. We just didn't import it yet.
>  >
>  > It does? I was looking for such tables, but didn't find it. Do you have a link?
> 
>  Look for DUCET and its tailoring data. These should be a good
>  starting point:
> 
>  http://www.unicode.org/Public/UCA/latest/
>  http://cldr.unicode.org/
> 
> Those are the decomposition charts, and don't actually say anything about equivalence outside of providing a
> canonical form for precomposed characters, as was discussed above.

Strange, I always thought the data was there.  Perhaps you should ask
a question on the Unicode mailing list, then.


* Re: On language-dependent defaults for character-folding
  2016-02-19  7:40                                               ` Elias Mårtenson
@ 2016-02-19 19:24                                                 ` Achim Gratz
  2016-02-20  5:05                                                   ` Elias Mårtenson
  0 siblings, 1 reply; 263+ messages in thread
From: Achim Gratz @ 2016-02-19 19:24 UTC (permalink / raw)
  To: emacs-devel

Elias Mårtenson writes:
> I'm a firm believer of putting one's money where one's mouth is, and I'm
> willing to work on it myself. However, right now I'm limited by the fact
> that I have no copyright assignment on file, so you can't merge anything I
> do. I have to bring this up with my employer's legal department again.

If your email address is an indicator of your employer, then to the best
of my knowledge that has been taken care of already, but please ask.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Wavetables for the Terratec KOMPLEXER:
http://Synth.Stromeko.net/Downloads.html#KomplexerWaves





* Re: On language-dependent defaults for character-folding
  2016-02-19  9:22                                               ` Elias Mårtenson
  2016-02-19 10:09                                                 ` Eli Zaretskii
@ 2016-02-19 20:38                                                 ` Marcin Borkowski
  1 sibling, 0 replies; 263+ messages in thread
From: Marcin Borkowski @ 2016-02-19 20:38 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: Eli Zaretskii, Lars Ingebrigtsen, emacs-devel


On 2016-02-19, at 10:22, Elias Mårtenson <lokedhs@gmail.com> wrote:

> I readily agree that using the decomposition is a clever way to get the
> functionality quite a long way, but the cases where it breaks down, it does
> so quite spectacularly, and that's what I (and others) have been opposing.

And I'd like to remind everyone that it breaks down both ways: non-Poles should
really be able to find "żółć" (btw, this is a real word, meaning "bile")
by searching for "zolc", and "l" and "ł" are currently /not/ equivalent
in the char-folding sense.
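A quick way to see why, sketched in Python with the standard `unicodedata`
module (for illustration only; which Unicode version is consulted depends on
the Python build): three of the four letters of "żółć" carry a decomposition
in the Unicode Character Database, but "ł" does not, and neither do ø or
U+2C65, which were mentioned elsewhere in this thread.

```python
import unicodedata

# Characters mentioned in this thread, written as escapes for portability:
# the four letters of the Polish word, then o-with-stroke, a-with-ring
# and U+2C65 (a-with-stroke).
CHARS = "\u017c\u00f3\u0142\u0107\u00f8\u00e5\u2c65"

# unicodedata.decomposition() returns the decomposition field from the
# Unicode Character Database, or "" when the character has none.
decompositions = {ch: unicodedata.decomposition(ch) for ch in CHARS}

# Characters that decomposition-based folding cannot relate to a base letter.
undecomposable = [ch for ch in CHARS if not decompositions[ch]]

for ch in CHARS:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {decompositions[ch] or '(none)'}")
```

Any folding scheme built only on this field will therefore treat "ł", "ø" and
U+2C65 as unrelated to "l", "o" and "a".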

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University




* Re: On language-dependent defaults for character-folding
  2016-02-18 17:07                                           ` Elias Mårtenson
  2016-02-18 17:21                                             ` Eli Zaretskii
@ 2016-02-19 20:47                                             ` Marcin Borkowski
  2016-02-20 14:31                                               ` Richard Stallman
  1 sibling, 1 reply; 263+ messages in thread
From: Marcin Borkowski @ 2016-02-19 20:47 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: Mark Oteiza, emacs-devel, Juri Linkov

On 2016-02-18, at 18:07, Elias Mårtenson <lokedhs@gmail.com> wrote:

> On 18 February 2016 at 12:53, Marcin Borkowski <mbork@mbork.pl> wrote:
>
>>
>> On 2016-02-18, at 01:11, Juri Linkov <juri@linkov.net> wrote:
>>
>> > How come char-folding is on by default in Chromium,
>> > and yet nobody has a problem with that?
>>
>> Well, nobody has a problem with the fact that Chromium does not have
>> anything like query-replace, either.
>>
>
> If this impacts replace-string as well, then it moves from being a mere
> irritant to a disaster when applied to Swedish. Imagine trying to replace

You misunderstood me.  I didn't mean replace is or should be affected.
I meant that Chromium is a tool for /consuming/ text, and Emacs is
a tool for both /consuming/ and /producing/ text (in fact, also for its
/editing/, which is distinct from producing: I spend quite a lot of time
on editing texts (in a natural language) written by others).  This
implies that search in Emacs has more use-cases than in a web browser
(think navigation in the file you are editing, for instance).  And yes,
in general this also means replacing, though this is irrelevant to this
discussion.

> Regards,
> Elias

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University


* Re: On language-dependent defaults for character-folding
  2016-02-19  8:20                                             ` Eli Zaretskii
  2016-02-19  9:22                                               ` Elias Mårtenson
@ 2016-02-19 22:44                                               ` Lars Ingebrigtsen
  2016-02-19 22:54                                                 ` Clément Pit--Claudel
  2016-02-20  8:09                                                 ` Eli Zaretskii
  1 sibling, 2 replies; 263+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-19 22:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Thanks.  But what does "respect the locale" mean, in practical terms?
> A large portion of the characters that have some decomposition, and
> thus will be folded when searching, belong to scripts that are not
> related to any language or other locale-specific attribute.  What do
> you think should be done with them in the context of this feature?

The locale says what language culture the user is from, and that's the
important thing for most users.  Not the language of the document or
anything like that.

Norwegian (like Danish and Swedish) has a 29 character alphabet, and
there are keys on our keyboards for all those letters.  Having any of
those characters show up when searching for other characters is as weird
for a Norwegian as it would be for an American to have any of their 26
characters in their alphabet substitute for another.

The Norwegian "extra" characters are æøå, of which only the last is
confused with another character by isearch in Emacs today.  I would
imagine that an American would like ø to be folded with o, for instance,
which Emacs doesn't do.

So as currently implemented, the feature is kinda both incomplete and
too intrusive at the same time.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no


* Re: On language-dependent defaults for character-folding
  2016-02-19 22:44                                               ` Lars Ingebrigtsen
@ 2016-02-19 22:54                                                 ` Clément Pit--Claudel
  2016-02-20  5:25                                                   ` Elias Mårtenson
  2016-02-20  8:09                                                 ` Eli Zaretskii
  1 sibling, 1 reply; 263+ messages in thread
From: Clément Pit--Claudel @ 2016-02-19 22:54 UTC (permalink / raw)
  To: emacs-devel


On 02/19/2016 05:44 PM, Lars Ingebrigtsen wrote:
> The locale says what language culture the user is from, and that's the
> important thing for most users.  Not the language of the document or
> anything like that.

Does it? I use GNU/Linux in English, but I'm from France. This seems to be pretty common among French programmers.

Clément.



* Re: On language-dependent defaults for character-folding
  2016-02-19 19:24                                                 ` Achim Gratz
@ 2016-02-20  5:05                                                   ` Elias Mårtenson
  2016-02-20 13:59                                                     ` Achim Gratz
  0 siblings, 1 reply; 263+ messages in thread
From: Elias Mårtenson @ 2016-02-20  5:05 UTC (permalink / raw)
  To: Achim Gratz; +Cc: emacs-devel


On 20 February 2016 at 03:24, Achim Gratz <Stromeko@nexgo.de> wrote:

> Elias Mårtenson writes:
> > I'm a firm believer of putting one's money where one's mouth is, and I'm
> > willing to work on it myself. However, right now I'm limited by the fact
> > that I have no copyright assignment on file, so you can't merge anything
> I
> > do. I have to bring this up with my employer's legal department again.
>
> If your email address is an indicator of your employer, then to the best
> of my knowledge that has been taken care of already, but please ask.


I'm posting this from a Gmail address. Perhaps you mistook it for a
google.com address? This is my personal email address. I work in the
banking industry where the legal departments tend to try to want to cross
the i's (or whatever the expression is).

Regards,
Elias


* Re: On language-dependent defaults for character-folding
  2016-02-19 19:18                                                         ` Eli Zaretskii
@ 2016-02-20  5:22                                                           ` Elias Mårtenson
  2016-02-20  6:31                                                             ` Lars Ingebrigtsen
  2016-02-20  9:21                                                             ` Eli Zaretskii
  0 siblings, 2 replies; 263+ messages in thread
From: Elias Mårtenson @ 2016-02-20  5:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel


On 20 February 2016 at 03:18, Eli Zaretskii <eliz@gnu.org> wrote:

> > Date: Fri, 19 Feb 2016 21:37:26 +0800
> > From: Elias Mårtenson <lokedhs@gmail.com>
> > Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org>
> >
> >  For example, if the buffer includes ñ (2 characters), should "C-s n"
> >  find the n in it?
> >
> > That depends on the locale of the user.
>
> There are use cases that are independent of the locale.  For example,
> imagine that you need to find all the literal n characters in a buffer
> because you are investigating a bug in the program that produced that
> buffer.  As an Emacs user, I need to do such jobs almost every day.  I
> don't want the results affected by the locale.
>

Of course I'm not saying that you should not be able to do this. All I'm
advocating here is sensible defaults.

> > However, from the point of view of a user, there should not be a visible
> > difference between the precomposed and the composed variants: they are the
> > exact same character.
>
> What if the user wants to find all those places where what looks like
> ñ is actually ñ?  Wouldn't that be a valid use case?
>

It would, but certainly a very rare one. For all intents and purposes the
two forms are (should be) equivalent.

> The reference you are looking for is the Unicode Standard itself.  It
> says to use the normalization forms, see for example section 5.16
> there.
>

I have read that section before, and I have now read it again. The section
certainly talks about searches that ignore diacritics, but does not discuss a
method for doing so. There is also a reference to TR29, but it refers to
grapheme clusters which would be a very strange way to do character folding
(Koreans would be very confused).

> Every character-folding search implementation decomposes characters
> before matching them.  So does Emacs.  We didn't invent this, and we
> certainly didn't use the decompositions where they weren't supposed to
> be used.  It's not a trick, it's what everyone else does to do the
> job.  See the ICU library, for example.
>

Every example you have given so far discusses the decomposition
equivalence, i.e. the fact that the two variants of ñ are the same. Section
5.16 discusses the _concept_ of allowing n and ñ to match similarly, but the
mechanism to do so is locale-dependent. This is what Unicode says, and that
is what I say. My position is simply that the default (if absolutely
nothing else overrides it) should be chosen to take the locale of the user
into account.

> > The decompositions are used in the normalisation forms to ensure that
> the two variants are treated equally
> > (such as the two alternative representations of ñ that we have been
> discussing).
>
> Yes, and any character-folding search uses normalization forms as
> well.
>

Yes, but that's not what normalisation forms were designed to do.

Again (I really apologise for repeating myself, I'm starting to sound like
a troll and that is truly not my intention), the purpose of normalisation
forms is to ensure that the two variants of ñ compare the same. They are not
designed to provide a mechanism to allow n to compare equal to ñ.

> > Yes. I am fully aware of this. But so be it. Having applications work
> differently depending on the locale of the
> > environment the application was started in is nothing new.
>
> It's not new.  It's old.  We should move on to more general
> environments that support multiple languages.  Emacs is such an
> environment.  The old l10n paradigms are fundamentally incompatible
> with that.
>

Sure, but doesn't it make sense to fall back to the user's default if the
buffer does not have an overriding locale?

> >  Being a multi-lingual environment, Emacs has no real notion of the
> >  locale.
> >
> > Perhaps it should?
>
> That'd be a step backward, IMO.
>

As opposed to having no concept of locale at all? I just have to disagree
with you on that.

> Strange, I always thought the data was there.  Perhaps you should ask
> a question on the Unicode mailing list, then.
>

That's a good idea actually. Thank you for the suggestion. I'm reading that
mailing list, and I will post a question there.

Regards,
Elias


* Re: On language-dependent defaults for character-folding
  2016-02-19 22:54                                                 ` Clément Pit--Claudel
@ 2016-02-20  5:25                                                   ` Elias Mårtenson
  2016-02-20 14:32                                                     ` Richard Stallman
  0 siblings, 1 reply; 263+ messages in thread
From: Elias Mårtenson @ 2016-02-20  5:25 UTC (permalink / raw)
  To: Clément Pit--Claudel; +Cc: emacs-devel


On 20 February 2016 at 06:54, Clément Pit--Claudel <clement.pit@gmail.com>
wrote:

> On 02/19/2016 05:44 PM, Lars Ingebrigtsen wrote:
> > The locale says what language culture the user is from, and that's the
> > important thing for most users.  Not the language of the document or
> > anything like that.
>
> Does it? I use GNU/Linux in English, but I'm from France. This seems to be
> a pretty common among French programmers.


But that is your choice, is it not? Linux (GNOME, actually) certainly has
very good French support and you made a conscious choice not to use it.
As such, you wouldn't be surprised to see English-oriented character
folding, would you?

Regards,
Elias


* Re: On language-dependent defaults for character-folding
  2016-02-20  5:22                                                           ` Elias Mårtenson
@ 2016-02-20  6:31                                                             ` Lars Ingebrigtsen
  2016-02-20  9:18                                                               ` Elias Mårtenson
  2016-02-20 10:34                                                               ` Eli Zaretskii
  2016-02-20  9:21                                                             ` Eli Zaretskii
  1 sibling, 2 replies; 263+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-20  6:31 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: Eli Zaretskii, emacs-devel

Elias Mårtenson <lokedhs@gmail.com> writes:

> Every example you have given so far discusses the decomposition
> equivalence, i.e. the fact that the two variants of ñ are the
> same. Section 5.16 discusses the _concept_ of allowing n and ñ to match
> similarly, but the mechanism to do so is locale-dependent. This is what
> Unicode says, and that is what I say.

Yes.

Here are my thoughts (I was sitting on a plane today):

It seems to me that we're considering using the Unicode decomposition
rules for "variant detection" because it's what we have.  But this
doesn't allow people to say `C-s l' to find ł or `C-s o' to find ø, and
this would obviously be something that many people would find helpful.

So the Unicode decomposition rules only get us halfway there.  On the
other hand, they go too far for other users, who absolutely do not want
`C-s o' to find ø, but would be really glad if `C-s hermes' would find
"Hermés" (or is it "Hermès"?  I can't even type that in on this
keyboard).

Emacs is awesome.  We should aim to make this extremely useful feature
awesome.

So: How many characters are we really talking about?  Unicode is big and
scary, but this only applies to alphabetical scripts, right?  That is,
all the Latin-like scripts, and...  possibly Greek/Hebrew/Cyrillic?  I
don't know?

But if we only consider the Latin scripts for a moment, there aren't
more than a few hundred Unicode points that we care about.  Basically
all the old iso-8859-foos from around Europe.  And what we want is a way
for people with normal keyboards (they have a-z in Latin alphabet
countries) to search for variants.

So: That sounds like an evening's work.

(defvar *character-variants*
  '((?a ?á ?å ?ä ...)
    (?o ?ø ?ö ?ó ...)
    ...))

Everything that somebody says "that's kinda an a, right?" goes on there.

Then we have something like:

(define-locale-exception :no ?a ?å)

There would be few of these exceptions per locale.  The Scandinavian
countries would have three each, and Denmark's and Norway's would be the
same.

That bit is more than an evening, but is something that people would
enjoy submitting exceptions to, I think.

And then we just look up the locale, create the mapping when we type
`C-s', and there we are.  An awesome, very useful feature that would
annoy nobody, and that should be on by default.
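To show the shape of the idea, here is a rough sketch in Python rather than
Emacs Lisp, purely for illustration. Everything in it is hypothetical: the
table contents are a tiny excerpt, the per-locale exception lists are
examples rather than agreed data, and `build_mapping` and `matches` are
names invented for this sketch.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical excerpt of a hand-maintained variant table:
# base character -> characters that "kinda are" that base character.
CHARACTER_VARIANTS: Dict[str, Set[str]] = {
    "a": {"\u00e1", "\u00e5", "\u00e4"},  # a-acute, a-ring, a-diaeresis
    "o": {"\u00f8", "\u00f6", "\u00f3"},  # o-stroke, o-diaeresis, o-acute
    "l": {"\u0142"},                      # l-stroke
}

# Hypothetical per-locale exceptions: (base, variant) pairs that must NOT
# be treated as equivalent for users of that locale. Incomplete on purpose.
LOCALE_EXCEPTIONS: Dict[str, Set[Tuple[str, str]]] = {
    "no": {("a", "\u00e5"), ("o", "\u00f8")},
    "sv": {("a", "\u00e5"), ("a", "\u00e4"), ("o", "\u00f6")},
}


def build_mapping(locale: str) -> Dict[str, FrozenSet[str]]:
    """Map each base character to everything a search for it should match."""
    exceptions = LOCALE_EXCEPTIONS.get(locale, set())
    mapping = {}
    for base, variants in CHARACTER_VARIANTS.items():
        allowed = {v for v in variants if (base, v) not in exceptions}
        mapping[base] = frozenset({base} | allowed)
    return mapping


def matches(query_char: str, text_char: str,
            mapping: Dict[str, FrozenSet[str]]) -> bool:
    """True if typing query_char should find text_char under this mapping."""
    return text_char in mapping.get(query_char, frozenset({query_char}))
```

With these example tables, a user in an "en" locale gets å when searching for
a, a user in a "no" locale does not, and both still get á and ł.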

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no


* Re: On language-dependent defaults for character-folding
  2016-02-19 22:44                                               ` Lars Ingebrigtsen
  2016-02-19 22:54                                                 ` Clément Pit--Claudel
@ 2016-02-20  8:09                                                 ` Eli Zaretskii
  2016-02-20 14:32                                                   ` Richard Stallman
  1 sibling, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-20  8:09 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Sat, 20 Feb 2016 09:44:26 +1100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Thanks.  But what does "respect the locale" mean, in practical terms?
> > A large portion of the characters that have some decomposition, and
> > thus will be folded when searching, belong to scripts that are not
> > related to any language or other locale-specific attribute.  What do
> > you think should be done with them in the context of this feature?
> 
> The locale says what language culture the user is from, and that's the
> important thing for most users.  Not the language of the document or
> anything like that.
> 
> Norwegian (like Danish and Swedish) has a 29 character alphabet, and
> there are keys on our keyboards for all those letters.  Having any of
> those characters show up when searching for other characters is as weird
> for a Norwegian as it would be for an American to have any of their 26
> characters in their alphabet substitute for another.
> 
> The Norwegian "extra" characters are æøå, of which only the latter is
> confused in Emacs by any other character by isearch today.  I would
> imagine that an American would like ø to be folded with o, for instance,
> which it doesn't do.
> 
> So as currently implemented, the feature is kinda both incomplete and
> too intrusive at the same time.

Are you saying that making the default depend on the locale would be
OK?




* Re: On language-dependent defaults for character-folding
  2016-02-20  6:31                                                             ` Lars Ingebrigtsen
@ 2016-02-20  9:18                                                               ` Elias Mårtenson
  2016-02-20 10:34                                                               ` Eli Zaretskii
  1 sibling, 0 replies; 263+ messages in thread
From: Elias Mårtenson @ 2016-02-20  9:18 UTC (permalink / raw)
  To: Lars Magne Ingebrigtsen; +Cc: Eli Zaretskii, emacs-devel


I think your message illustrates an opinion that is not only mine, in that
I am not against the idea of character folding. I mean, if I were, I'd just
ignore this discussion and just turn the feature off. What I want, and by
the looks of things, other people too, is to actually have this feature. I
just don't want it to be broken, and today it is broken because it's been
implemented based on incorrect assumptions.

On 20 Feb 2016 14:32, "Lars Ingebrigtsen" <larsi@gnus.org> wrote:

> It seems to me that we're considering using the Unicode decomposition
> rules for "variant detection" because it's what we have.  But this
> doesn't allow people to say `C-s l' to find ł or `C-s o' to find ø, and
> this would obviously be something that many people would find helpful.

The Unicode collation charts <http://unicode.org/charts/collation/> do
place ø in the "o" category. Eli said in an earlier message that the
collation charts were consulted, but when I test it, that doesn't seem to be
the case.

The Unicode character collation charts are the best generic solution that
Unicode gives us.

The proposal you put forward below seems very much like what I proposed
earlier; having the locale-dependent rules determine any exceptions and
then fall back to a generic method.

The question is what that generic method should be. The current trick of
decomposing and using the first character of the decomposition is not good
and breaks down very quickly. Clearly the collation charts should be
consulted instead, but this is not enough. I could spend quite some time
discussing all the issues that I can think of (to get an idea of it, look
up how Korean and Devanagari work, as well as the concept of "grapheme
clusters").

> So the Unicode decomposition rules only get us halfway there.  On the
> other hand, they go too far for other users, who absolutely do not want
> `C-s o' to find ø, but would be really glad if `C-s hermes' would find
> "Hermés" (or is it "Hermès"?  I can't even type that in on this
> keyboard).
>
> So: How many characters are we really talking about?  Unicode is big and
> scary, but this only applies to alphabetical scripts, right?  That is,
> all the Latin-like scripts, and...  possibly Greek/Hebrew/Cyrillic?  I
> don't know?

Cyrillic has the same issues. Also, most of the accented characters in Cyrillic
are historical and not used today. Therefore having this feature in
Cyrillic would most definitely be useful.

> But if we only consider the Latin scripts for a moment, there aren't
> more than a few hundred Unicode points that we care about.  Basically
> all the old iso-8859-foos from around Europe.  And what we want is a way
> for people with normal keyboards (they have a-z in Latin alphabet
> countries) to search for variants.

It's more than that, because it's not just single characters we're talking
about but also combinations. Of course, for European languages this can be
handled by comparing only the base character but in other languages this is
a much more complex issue.

That said, I agree with you on your proposed approach.

> That bit is more than an evening, but is something that people would
> enjoy submitting exceptions to, I think.

You can count me in. :-)

> And then we just look up the locale, create the mapping when we type
> `C-s', and there we are.  An awesome, very useful feature that would
> annoy nobody, and that should be on by default.

That would be amazing.

Regards,
Elias


* Re: On language-dependent defaults for character-folding
  2016-02-20  5:22                                                           ` Elias Mårtenson
  2016-02-20  6:31                                                             ` Lars Ingebrigtsen
@ 2016-02-20  9:21                                                             ` Eli Zaretskii
  2016-02-20 10:08                                                               ` Elias Mårtenson
  1 sibling, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-20  9:21 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: larsi, emacs-devel

> Date: Sat, 20 Feb 2016 13:22:57 +0800
> From: Elias Mårtenson <lokedhs@gmail.com>
> Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org>
> 
>  The reference you are looking for is the Unicode Standard itself. It
>  says to use the normalization forms, see for example section 5.16
>  there.
> 
> I have read that section before, and I have now read it again. The section certainly talks about searching that
> ignores diacritics, but does not discuss a method to do so. There is also a reference to TR29, but it refers to
> grapheme clusters which would be a very strange way to do character folding (Koreans would be very
> confused).
> 
>  Every character-folding search implementation decomposes characters
>  before matching them. So does Emacs. We didn't invent this, and we
>  certainly didn't use the decompositions where they weren't supposed to
>  be used. It's not a trick, it's what everyone else does to do the
>  job. See the ICU library, for example.
> 
> Every example you have given so far discusses the decomposition equivalence. I.e. the fact that the two
> variants of ñ are the same. Section 5.16 discusses the _concept_ of allowing n and ñ to match similarly but the
> mechanism to do so is locale-dependent. This is what Unicode says, and that is what I say. My position is
> simply that the default (if absolutely nothing else overrides it) should be chosen to take the locale of the user
> into account.
> 
>  > The decompositions are used in the normalisation forms to ensure that the two variants are treated
>  > equally (such as the two alternative representations of ñ that we have been discussing).
> 
>  Yes, and any character-folding search uses normalization forms as
>  well.
> 
> Yes, but that's not what normalisation forms were designed to do.

Your interpretation is wrong, because every implementation of
character-folding in search uses normalization forms.  So if you want
to maintain that whoever does that is abusing normalization forms, you
are not just up against Emacs, you are up against the ICU library and
others.  You are also up against http://www.unicode.org/notes/tn5/.

It is possible that you only see the "equivalence" parts of all these
sources.  But in that case, you are actually claiming that folding
characters should never be done at all!  "Folding" means mapping
_distinct_ character sequences to the same basic sequence.  You start
from a normalization form, then compare the results disregarding
certain secondary, tertiary, etc. differences.  The Emacs
implementation simply expresses this algorithm by using suitable
regular expressions, and it's currently only capable of either
ignoring all the non-base weights or none at all, but the principle is
preserved to the letter.
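
The distinction argued over here (normalization makes the two spellings of ñ equal; folding additionally makes them equal to n) can be illustrated outside Emacs. The following is a minimal sketch in Python using the standard `unicodedata` module; it is not the code in character-fold.el, and the helper name `fold` is made up for this example. It assumes that "ignoring all non-base weights" can be approximated by dropping every combining mark after NFD normalization.

```python
import unicodedata

def fold(text: str) -> str:
    """Decompose to NFD, then drop every combining mark.

    This approximates "ignore all non-base differences"; it is a
    sketch, not the algorithm Emacs implements with regexps.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

precomposed = "\u00f1"   # ñ as a single code point
decomposed = "n\u0303"   # n followed by COMBINING TILDE

# Step 1: normalization alone makes the two spellings of ñ compare equal.
same_after_nfd = (unicodedata.normalize("NFD", precomposed)
                  == unicodedata.normalize("NFD", decomposed))

# Step 2: folding goes further and maps both spellings to plain n.
folded = (fold(precomposed), fold(decomposed))
```

A character with no decomposition, such as ø, passes through `fold` unchanged, which is the limitation discussed elsewhere in this thread.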

> Again (I really apologise for repeating myself, I'm starting to sound like a troll and that is truly not my intention),
> the purpose of normalisation forms is to ensure that the two variants of ñ compare the same. It is not
> designed to provide a mechanism to allow n to compare equal to ñ.

Under character-folding that ignores diacritics, ñ should indeed
compare equal to n.

>  > Yes. I am fully aware of this. But so be it. Having applications work differently depending on the locale
>  > of the environment the application was started in is nothing new.
> 
>  It's not new. It's old. We should move on to more general
>  environments that support multiple languages. Emacs is such an
>  environment. The old l10n paradigms are fundamentally incompatible
>  with that.
> 
> Sure, but doesn't it make sense to fall back to the user's default if the buffer does not have an overriding
> locale?

I don't know what you mean by "buffer has an overriding locale".
Emacs buffers don't have a locale, and they cannot do that in
principle because we support multiple languages.  E.g., what could the
locale of the HELLO buffer created by "C-h H" be?

>  > Being a multi-lingual environment, Emacs has no real notion of the
>  > locale.
>  >
>  > Perhaps it should?
> 
>  That'd be a step backward, IMO.
> 
> As opposed to having no concept of locale at all?

Yes.  A multilingual environment cannot have a locale in principle.
It will cease being multilingual if it does.

>  Strange, I always thought the data was there. Perhaps you should ask
>  a question on the Unicode mailing list, then.
> 
> That's a good idea actually.

That's a relief.  I was beginning to suspect I don't have any good
ideas at all.




* Re: On language-dependent defaults for character-folding
  2016-02-20  9:21                                                             ` Eli Zaretskii
@ 2016-02-20 10:08                                                               ` Elias Mårtenson
  2016-02-20 10:44                                                                 ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Elias Mårtenson @ 2016-02-20 10:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 3056 bytes --]

On 20 February 2016 at 17:21, Eli Zaretskii <eliz@gnu.org> wrote:

> Your interpretation is wrong, because every implementation of
> character-folding in search uses normalization forms.  So if you want
> to maintain that whoever does that is abusing normalization forms, you
> are not just up against Emacs, you are up against the ICU library and
> others.  You are also up against http://www.unicode.org/notes/tn5/.
>

They may do so, but only because we're not exactly swimming in great
alternatives.


> It is possible that you only see the "equivalence" parts of all these
> sources.  But in that case, you are actually claiming that folding
> characters should never be done at all!  "Folding" means mapping
> _distinct_ character sequences to the same basic sequence.  You start
> from a normalization form, then compare the results disregarding
> certain secondary, tertiary, etc. differences.


Of course. But the fact that you start from a normalisation form is of
secondary relevance here. I'm thinking that perhaps repeating the fact that
the normalised form is used has somewhat clouded the discussion.

When you say "ignoring [...] differences", how do you determine those
differences?

> > Again (I really apologise for repeating myself, I'm starting to sound
> > like a troll and that is truly not my intention),
> > the purpose of normalisation forms is to ensure that the two variants
> > of ñ compare the same. It is not
> > designed to provide a mechanism to allow n to compare equal to ñ.
>
> Under character-folding that ignores diacritics, ñ should indeed
> compare equal to n.
>

Yes again. But how do you determine what rules to apply?


> > Sure, but doesn't it make sense to fall back to the user's default if
> > the buffer does not have an overriding locale?
>
> I don't know what you mean by "buffer has an overriding locale".
> Emacs buffers don't have a locale, and they cannot do that in
> principle because we support multiple languages.  E.g., what could the
> locale of the HELLO buffer created by "C-h H" be?
>

I was not talking about what Emacs does today. I was speaking about the
hypothetical case where buffers can have unique locales. I can see a few
cases where that would be a neat thing to have, but I have to scrape the
barrel to do so.


> > As opposed to having no concept of locale at all?
>
> Yes.  A multilingual environment cannot have a locale in principle.
> It will cease being multilingual if it does.
>

I guess we'll have to agree to disagree about this one. In any case, it's
for a different thread.


> >  Strange, I always thought the data was there. Perhaps you should ask
> >  a question on the Unicode mailing list, then.
> >
> > That's a good idea actually.
>
> That's a relief.  I was beginning to suspect I don't have any good
> ideas at all.
>

Apparently I have given the impression that I think your ideas are garbage.
I profoundly apologise for this and will try to be better going forward.

Regards,
Elias


* Re: On language-dependent defaults for character-folding
  2016-02-20  6:31                                                             ` Lars Ingebrigtsen
  2016-02-20  9:18                                                               ` Elias Mårtenson
@ 2016-02-20 10:34                                                               ` Eli Zaretskii
  2016-02-21  2:51                                                                 ` Lars Ingebrigtsen
  2016-02-21 12:44                                                                 ` Richard Stallman
  1 sibling, 2 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-20 10:34 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: lokedhs, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  emacs-devel <emacs-devel@gnu.org>
> Date: Sat, 20 Feb 2016 17:31:48 +1100
> 
> It seems to me that we're considering using the Unicode decomposition
> rules for "variant detection" because it's what we have.

No, we use decompositions because that's how equivalent strings are to
be compared and mapped/folded.

> But this doesn't allow people to say `C-s l' to find ł or `C-s o' to
> find ø, and this would obviously be something that many people would
> find helpful.
> 
> So the Unicode decomposition rules only get us halfway there.

Yes, the current implementation is just a first step.

> On the other hand, they go too far for other users, who absolutely do
> not want `C-s o' to find ø, but would be really glad if `C-s hermes'
> would find "Hermés" (or is it "Hermès"?  I can't even type that in
> on this keyboard).

Which is why this is toggle-able.

> (defvar *character-variants*
>   '((?a ?á ?å ?ä ...)
>     (?o ?ø ?ö ?ó ...)
>     ...))
> 
> Everything that somebody says "that's kinda an a, right?" goes on there.

The above won't support finding decomposed sequences as in á (there
are 2 characters here, they are just displayed as one).  I hope it's
agreed that it is imperative for us to support finding such decomposed
sequences (and we already do, under the current character-folding
default).  There are also more complicated cases like ǖ and ǖ (3
characters), where there are several diacritics which can be in either
order, and we still have to match them, because they look identical on
display.  We currently don't support that, but we should do that in
the future, and the decomposition data supports that.
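
As a small illustration of how the decomposition data helps with marks typed in different orders, here is a sketch in Python with the standard `unicodedata` module (not Emacs code). The example sequence, a base letter with a dot below and an acute above, is chosen for this sketch and does not come from the thread. Normalization puts combining marks with different canonical combining classes into one fixed order, so both spellings compare equal afterwards; marks that share a class are left in the order in which they were written.

```python
import unicodedata

DOT_BELOW = "\u0323"   # COMBINING DOT BELOW, canonical combining class 220
ACUTE = "\u0301"       # COMBINING ACUTE ACCENT, canonical combining class 230

one_order = "a" + DOT_BELOW + ACUTE
other_order = "a" + ACUTE + DOT_BELOW

# The two marks belong to different combining classes.
classes = (unicodedata.combining(DOT_BELOW), unicodedata.combining(ACUTE))

# As raw code-point sequences the two strings differ...
raw_equal = one_order == other_order

# ...but NFD applies canonical ordering, after which they are identical.
nfd_equal = (unicodedata.normalize("NFD", one_order)
             == unicodedata.normalize("NFD", other_order))
```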

It is, of course, possible to support this without normalization, by
having all those combinations in the database you proposed.  But why
should we bother creating and maintaining such a database (and
updating it whenever a new Unicode version is released), when one is
already available in data that we already read into Emacs?  So we
currently implement this by using the decomposition information in the
Unicode database.

Also, what would be the algorithm for searching using the data you
propose?  If you want to use regexps, then the data should already be
in the form of regexps, I think.  And I expect the regexp to look very
similar to what we currently construct in character-fold.el.

So what are we really arguing about here?  Is it about a feature that
will allow exempting specific decompositions from the search?  If so,
I don't think it would be hard to do that with the current
implementation, using just the locale-exception data (which should be
much smaller).  If that will make everyone happier, we can do this
now, if we are sure we won't have another round of prolonged dispute
about that.
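
A rough sketch of what "generic folding plus a small locale-exception table" could look like, written in Python for illustration only. The table `LOCALE_EXCEPTIONS`, its contents, and the locale tag "nb" are hypothetical placeholders, not data from CLDR or from Emacs; the generic rule is again assumed to be "drop combining marks after NFD".

```python
import unicodedata

# Hypothetical exception data: characters that a locale treats as letters
# in their own right, and which therefore must not fold to a base letter.
LOCALE_EXCEPTIONS = {
    "nb": {"\u00e5", "\u00c5"},   # å and Å for Norwegian (illustrative only)
}

def fold_char(ch: str, locale: str) -> str:
    """Fold one character unless the locale exempts it."""
    if ch in LOCALE_EXCEPTIONS.get(locale, set()):
        return ch
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def fold(text: str, locale: str) -> str:
    """Fold a string; composing first lets 'a' + ring be recognized as å."""
    composed = unicodedata.normalize("NFC", text)
    return "".join(fold_char(ch, locale) for ch in composed)

word = "bl\u00e5b\u00e6r"             # "blåbær"
norwegian = fold(word, "nb")          # å is exempt, so nothing changes
generic = fold(word, "en")            # no exceptions: å folds to a
```

The exception table stays small because it only lists the characters a locale wants to keep distinct; everything else still comes from the Unicode decomposition data.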

> And then we just look up the locale, create the mapping when we type
> `C-s', and there we are.  An awesome, very useful feature that would
> annoy nobody, and that should be on by default.

But it doesn't pass the simplest test above, so it really isn't good
enough.

Btw, this was already discussed in the past, before Artur sat down to
implement this stuff.  You may wish to re-read those discussions to
see the broader picture.


* Re: On language-dependent defaults for character-folding
  2016-02-20 10:08                                                               ` Elias Mårtenson
@ 2016-02-20 10:44                                                                 ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-20 10:44 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: larsi, emacs-devel

> Date: Sat, 20 Feb 2016 18:08:20 +0800
> From: Elias Mårtenson <lokedhs@gmail.com>
> Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org>
> 
>  It is possible that you only see the "equivalence" parts of all these
>  sources. But in that case, you are actually claiming that folding
>  characters should never be done at all! "Folding" means mapping
>  _distinct_ character sequences to the same basic sequence. You start
>  from a normalization form, then compare the results disregarding
>  certain secondary, tertiary, etc. differences.
> 
> Of course. But the fact that you start from a normalisation form is of secondary relevance here. I'm thinking that
> perhaps repeating the fact that the normalised form is used has somewhat clouded the discussion.
> 
> When you say "ignoring [...] differences", how do you determine those differences?
> 
>  > Again (I really apologise for repeating myself, I'm starting to sound like a troll and that is truly not my
>  > intention), the purpose of normalisation forms is to ensure that the two variants of ñ compare the same.
>  > It is not designed to provide a mechanism to allow n to compare equal to ñ.
> 
>  Under character-folding that ignores diacritics, ñ should indeed
>  compare equal to n.
> 
> 
> Yes again. But how do you determine what rules to apply?

Emacs currently ignores _any_ non-base differences, so ignoring is
simple: we disregard any characters in the decomposition except the
first one, which is the base character.

Further improvements in this direction will need to access additional
Unicode properties (to properly order the combining marks), and
perhaps additional tables.  But this is something to consider in the
future, and it will have to be done in C anyway; the regexp based
implementation cannot cut it.
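
For readers who want to see the "keep only the first character of the decomposition" rule expressed as a regexp, here is a simplified sketch in Python. It is not the code in character-fold.el: the names `VARIANTS` and `fold_regexp` are invented for this example, only code points U+00A0 to U+1FFF are scanned, and the set of combining marks is approximated by the block U+0300 to U+036F.

```python
import re
import unicodedata
from collections import defaultdict

def base_of(ch: str) -> str:
    """First character of the full canonical decomposition of CH."""
    return unicodedata.normalize("NFD", ch)[0]

# Map each base character to the precomposed characters that decompose
# to it.  The scanned range is deliberately limited for this sketch.
VARIANTS = defaultdict(set)
for cp in range(0x00A0, 0x2000):
    ch = chr(cp)
    base = base_of(ch)
    if base != ch:
        VARIANTS[base].add(ch)

# Approximation: any run of marks from the Combining Diacritical Marks block.
MARKS = "[\u0300-\u036f]*"

def fold_regexp(query: str) -> str:
    """Build a regexp matching QUERY with or without diacritics."""
    parts = []
    for ch in query:
        alternatives = "".join(sorted(VARIANTS.get(ch, set()) | {ch}))
        parts.append("[" + re.escape(alternatives) + "]" + MARKS)
    return "".join(parts)

pattern = re.compile(fold_regexp("hermes"))
found_precomposed = pattern.search("chez herm\u00e8s") is not None
found_decomposed = pattern.search("chez herme\u0300s") is not None
```

Because ø has no decomposition, it never enters the variant set of o, so a search for "o" built this way does not find it.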

>  > That's a good idea actually.
> 
>  That's a relief. I was beginning to suspect I don't have any good
>  ideas at all.
> 
> Apparently I have given the impression that I think your ideas are garbage. I profoundly apologise for this and
> will try to be better going forward.

My smilies are usually implicit, so no sweat.




* Re: On language-dependent defaults for character-folding
  2016-02-20  5:05                                                   ` Elias Mårtenson
@ 2016-02-20 13:59                                                     ` Achim Gratz
  0 siblings, 0 replies; 263+ messages in thread
From: Achim Gratz @ 2016-02-20 13:59 UTC (permalink / raw)
  To: emacs-devel

Elias Mårtenson writes:
> I'm posting this from a Gmail address. Perhaps you mistook it for a
> google.com address?
Yes, sorry, somehow I managed to read google.com…

> This is my personal email address. I work in the banking industry
> where the legal departments tend to try to want to cross the i's (or
> whatever the expression is).

Oh sure.  But ask them first if they have any dibs on code you write in
your spare time at all (it depends on where you live and work).  If not,
you don't need their signature at all to assign the copyright to the
FSF.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptations for Waldorf Q V3.00R3 and Q+ V3.54R2:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada





* Re: On language-dependent defaults for character-folding
  2016-02-19 20:47                                             ` Marcin Borkowski
@ 2016-02-20 14:31                                               ` Richard Stallman
  0 siblings, 0 replies; 263+ messages in thread
From: Richard Stallman @ 2016-02-20 14:31 UTC (permalink / raw)
  To: Marcin Borkowski; +Cc: mvoteiza, juri, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I meant that Chromium is a tool for /consuming/ text,

I understand what you mean, but please don't use the word
"consume" to describe looking at a document.

Visiting a web page does not consume it.

See http://gnu.org/philosophy/words-to-avoid.html.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.


* Re: On language-dependent defaults for character-folding
  2016-02-20  5:25                                                   ` Elias Mårtenson
@ 2016-02-20 14:32                                                     ` Richard Stallman
  2016-02-20 15:50                                                       ` Elias Mårtenson
  0 siblings, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-20 14:32 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: clement.pit, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > But that is your choice, is it not? Linux (GNOME, actually) certainly have
  > very good French support

GNOME is not part of Linux.  It was started by the GNU Project.
Are you talking about the GNU operating system and calling it "Linux"?

Please don't credit our work to someone else.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.


* Re: On language-dependent defaults for character-folding
  2016-02-20  8:09                                                 ` Eli Zaretskii
@ 2016-02-20 14:32                                                   ` Richard Stallman
  2016-02-24 23:27                                                     ` Rasmus
  0 siblings, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-20 14:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Are you saying that making the default depend on the locale would be
  > OK?

I think it is ok to use the locale as a sort of last default,
but more important than that is to make it easy to specify different
behaviors in Emacs, both globally and for a specific buffer.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.


* Re: On language-dependent defaults for character-folding
  2016-02-20 14:32                                                     ` Richard Stallman
@ 2016-02-20 15:50                                                       ` Elias Mårtenson
  2016-02-21 12:45                                                         ` Richard Stallman
  0 siblings, 1 reply; 263+ messages in thread
From: Elias Mårtenson @ 2016-02-20 15:50 UTC (permalink / raw)
  To: rms; +Cc: Clément Pit--Claudel, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 618 bytes --]

On 20 February 2016 at 22:32, Richard Stallman <rms@gnu.org> wrote:

>   > But that is your choice, is it not? Linux (GNOME, actually) certainly
>   > have very good French support
>
> GNOME is not part of Linux.  It was started by the GNU Project.
> Are you talking about the GNU operating system and calling it "Linux"?
>

I was specifically referring to GNOME, since it's the localised user
interface most people would interact with on a daily basis.

I have to admit that I was ignorant of the fact that GNU was involved in
it. I guess the G at the beginning of the name should have tipped me off.

Regards,
Elias


* Re: On language-dependent defaults for character-folding
  2016-02-20 10:34                                                               ` Eli Zaretskii
@ 2016-02-21  2:51                                                                 ` Lars Ingebrigtsen
  2016-02-21  6:28                                                                   ` Elias Mårtenson
  2016-02-21 16:25                                                                   ` Eli Zaretskii
  2016-02-21 12:44                                                                 ` Richard Stallman
  1 sibling, 2 replies; 263+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-21  2:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lokedhs, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> The above won't support finding decomposed sequences as in á (there
> are 2 characters here, they are just displayed as one).

They are displayed as two characters in this Emacs (current Ubuntu,
Emacs git master).  :-)

> I hope it's agreed that it is imperative for us to support finding
> such decomposed sequences (and we already do, under the current
> character-folding default).

Yes.

> It is, of course, possible to support this without normalization, by
> having all those combinations in the database you proposed.  But why
> should we bother creating and maintaining such a database (and
> updating it whenever a new Unicode version is released), when one is
> already available in data that we already read into Emacs?  So we
> currently implement this by using the decomposition information in the
> Unicode database.

If that database gives us all that, then I'm all for using that database
instead of creating our own, of course.  But why doesn't C-s o find ø,
and C-s l find ł then?  

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: On language-dependent defaults for character-folding
  2016-02-21  2:51                                                                 ` Lars Ingebrigtsen
@ 2016-02-21  6:28                                                                   ` Elias Mårtenson
  2016-02-21  8:14                                                                     ` Achim Gratz
                                                                                       ` (2 more replies)
  2016-02-21 16:25                                                                   ` Eli Zaretskii
  1 sibling, 3 replies; 263+ messages in thread
From: Elias Mårtenson @ 2016-02-21  6:28 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Eli Zaretskii, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1130 bytes --]

On 21 February 2016 at 10:51, Lars Ingebrigtsen <larsi@gnus.org> wrote:

> If that database gives us all that, then I'm all for using that database
> instead of creating our own, of course.  But why doesn't C-s o find ø,
> and C-s l find ł then?

Because under the Unicode decomposition rules, ø is not decomposable. I
can't explain why that is the case (probably because there is no reason to
have a combining /. After all, the only languages that use ø are languages
that use it as a character of its own).
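
This is easy to check against the Unicode Character Database. The sketch below uses Python's standard `unicodedata` module rather than Emacs; the helper `canonical_decomposition` is a name made up for this example. It returns the decomposition field of a character as a list of characters, skipping compatibility tags such as `<compat>`.

```python
import unicodedata

def canonical_decomposition(ch: str) -> list:
    """Return the UCD decomposition mapping of CH as a list of characters.

    An empty list means the character has no decomposition at all.
    """
    fields = unicodedata.decomposition(ch).split()
    return [chr(int(f, 16)) for f in fields if not f.startswith("<")]

o_diaeresis = canonical_decomposition("\u00f6")   # ö
a_ring = canonical_decomposition("\u00e5")        # å
o_stroke = canonical_decomposition("\u00f8")      # ø
l_stroke = canonical_decomposition("\u0142")      # ł
```

ö and å decompose into a base letter plus a combining mark, while ø and ł come back empty, so any folding based purely on decompositions has nothing to work with for them.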

On a related note, I would expect a search for ö to match ø. As would you,
I guess?

In the thread on the Unicode mailing list, the recommendation seems to be
to use the CLDR (http://cldr.unicode.org/). Of course, this assumes there
is a locale, but the choice of locale can easily be customisable (with the
default being the user's locale).

Another poster on the same thread mentioned that the CLDR doesn't go all
the way, but adding a set of exceptions on top of it shouldn't be hard. In
any case, the result would be significantly better than what is implemented
now.

Regards,
Elias


* Re: On language-dependent defaults for character-folding
  2016-02-21  6:28                                                                   ` Elias Mårtenson
@ 2016-02-21  8:14                                                                     ` Achim Gratz
  2016-02-23 16:56                                                                       ` Eli Zaretskii
  2016-02-21 10:05                                                                     ` Lars Ingebrigtsen
  2016-02-21 16:31                                                                     ` Eli Zaretskii
  2 siblings, 1 reply; 263+ messages in thread
From: Achim Gratz @ 2016-02-21  8:14 UTC (permalink / raw)
  To: emacs-devel

Elias Mårtenson writes:
> Because under the Unicode decomposition rules, ø is not decomposable. I
> can't explain why that is the case (probably because there is no reason to
> have a combining /. After all, the only languages that use ø are languages
> that use it as a character of its own).

AFAIK, for combining characters to be composable/decomposable the glyphs
must not overlap.  This is the same issue as with the Polish »ł« to the
best of my knowledge.

In other words, Unicode composition/decomposition rules tell you more
about the glyph construction than they do about useful strategies to
search for multiple characters.  The idea of using the base character of
the canonical decomposition in the search might still yield a useful
shortcut in most cases, but I'm not sure it is correct in all languages
even when that decomposition exists and, as the examples show, there are
cases where the non-decomposed character has to be treated specially.

Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptations for Waldorf Q V3.00R3 and Q+ V3.54R2:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada


* Re: On language-dependent defaults for character-folding
  2016-02-21  6:28                                                                   ` Elias Mårtenson
  2016-02-21  8:14                                                                     ` Achim Gratz
@ 2016-02-21 10:05                                                                     ` Lars Ingebrigtsen
  2016-02-21 11:01                                                                       ` Elias Mårtenson
  2016-02-21 16:31                                                                     ` Eli Zaretskii
  2 siblings, 1 reply; 263+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-21 10:05 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: Eli Zaretskii, emacs-devel

Elias Mårtenson <lokedhs@gmail.com> writes:

> On a related note, I would expect a search for ö to match ø. As would you, I
> guess?

No, I wouldn't.  :-)  Actually, I wouldn't expect anything other than
the 26 first letters of the alphabet to match variants.  

It's like it's fine if you're typing in lower case characters for them
to match upper case, too, but if you've bothered to type an upper case
character, then you probably don't want lower case characters to match.
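
The asymmetric rule described here (a plain letter in the search string matches its variants, but a typed variant only matches itself) can be sketched as follows. This is an illustration in Python under the same simplifying assumption as before, namely that a "variant" is whatever NFD normalization reduces to the base letter; the function names are invented for the example.

```python
import unicodedata

def fold(text: str) -> str:
    """Drop combining marks after NFD normalization."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def char_matches(typed: str, found: str) -> bool:
    """Asymmetric match, analogous to case folding in isearch.

    A typed plain letter matches any variant of itself; a typed
    variant matches only the same character (in either spelling).
    """
    if fold(typed) == typed:
        return fold(found) == typed
    return (unicodedata.normalize("NFC", found)
            == unicodedata.normalize("NFC", typed))

plain_finds_variant = char_matches("o", "\u00f6")      # o vs ö
variant_finds_plain = char_matches("\u00f6", "o")      # ö vs o
variant_finds_itself = char_matches("\u00f6", "o\u0308")
```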

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: On language-dependent defaults for character-folding
  2016-02-21 10:05                                                                     ` Lars Ingebrigtsen
@ 2016-02-21 11:01                                                                       ` Elias Mårtenson
  2016-02-21 16:02                                                                         ` Eli Zaretskii
  2016-02-22  1:58                                                                         ` Lars Ingebrigtsen
  0 siblings, 2 replies; 263+ messages in thread
From: Elias Mårtenson @ 2016-02-21 11:01 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Eli Zaretskii, emacs-devel


On 21 February 2016 at 18:05, Lars Ingebrigtsen <larsi@gnus.org> wrote:

> Elias Mårtenson <lokedhs@gmail.com> writes:
>
> > On a related note, I would expect a search for ö to match ø. As would
> > you, I guess?
>
> No, I wouldn't.  :-)  Actually, I wouldn't expect anything other than
> the 26 first letters of the alphabet to match variants.

All right, but at least in Sweden we often write Danish and Norwegian names
using ø and æ, so for us we definitely want to fold those into ö and ä.
That was what I was referring to. I.e. the former are definitely variants
of the latter. In fact, there is an argument to be made for "ü" to be a
variant of "y" as well, even though it's very rare (pretty much limited to
a single word: "Müsli").


> It's like it's fine if you're typing in lower case characters for them
> to match upper case, too, but if you've bothered to type an upper case
> character, then you probably don't want lower case characters to match.


This is how Emacs behaves today, is it not?

Regards,
Elias



* Re: On language-dependent defaults for character-folding
  2016-02-20 10:34                                                               ` Eli Zaretskii
  2016-02-21  2:51                                                                 ` Lars Ingebrigtsen
@ 2016-02-21 12:44                                                                 ` Richard Stallman
  2016-02-21 16:05                                                                   ` Eli Zaretskii
  1 sibling, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-21 12:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > It seems to me that we're considering using the Unicode decomposition
  > > rules for "variant detection" because it's what we have.

  > No, we use decompositions because that's how equivalent strings are to
  > be compared and mapped/folded.

Please let's drop the idea of determining the folding behavior
automatically from something in Unicode.  It is too rigid.

Users want many different folding behaviors.  Instead of insisting on
a particular set of equivalences, let's make it easy for users to
specify the foldings they want.
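
A minimal sketch of what a user-specified folding could look like, written in Python for illustration. The table `USER_FOLDS` and the function `fold_regex` are hypothetical names, not an existing Emacs interface:

```python
import re

# Hypothetical user-supplied table: a typed character maps to the extra
# characters it should also match.  The entries are examples only.
USER_FOLDS = {
    "o": "öø",
    "a": "äåæ",
}


def fold_regex(pattern, folds=USER_FOLDS):
    """Compile a regex in which each listed character also matches its extras."""
    parts = []
    for ch in pattern:
        extra = folds.get(ch, "")
        if extra:
            parts.append("[" + re.escape(ch) + re.escape(extra) + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


print(bool(fold_regex("sol").search("en søl")))   # the typed o also matches ø
print(bool(fold_regex("søl").search("en sol")))   # a typed ø matches only ø
```

Because the table is plain data, each user can add or remove equivalences without touching the search code.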

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.


* Re: On language-dependent defaults for character-folding
  2016-02-20 15:50                                                       ` Elias Mårtenson
@ 2016-02-21 12:45                                                         ` Richard Stallman
  0 siblings, 0 replies; 263+ messages in thread
From: Richard Stallman @ 2016-02-21 12:45 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: clement.pit, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I was specifically referring to GNOME, since it's the localised user
  > interface most people would interact with on a daily basis.

  > I have to admit that I was ignorant of the fact that GNU was involved in
  > it. I guess the G at the beginning of the name should have tipped me off.

Well, it certainly has nothing to do with Linux.
Linux is a kernel, nothing more.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.





* Re: On language-dependent defaults for character-folding
  2016-02-21 11:01                                                                       ` Elias Mårtenson
@ 2016-02-21 16:02                                                                         ` Eli Zaretskii
  2016-02-22  1:58                                                                         ` Lars Ingebrigtsen
  1 sibling, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-21 16:02 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: larsi, emacs-devel

> Date: Sun, 21 Feb 2016 19:01:06 +0800
> From: Elias Mårtenson <lokedhs@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel <emacs-devel@gnu.org>
> 
>  It's like it's fine if you're typing in lower case characters for them
>  to match upper case, too, but if you've bothered to type an upper case
>  character, then you probably don't want lower case characters to match.
> 
> This is how Emacs behaves today, is it not?

Yes.  It's called "asymmetric search".




* Re: On language-dependent defaults for character-folding
  2016-02-21 12:44                                                                 ` Richard Stallman
@ 2016-02-21 16:05                                                                   ` Eli Zaretskii
  2016-02-22 17:57                                                                     ` Richard Stallman
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-21 16:05 UTC (permalink / raw)
  To: rms; +Cc: larsi, lokedhs, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, lokedhs@gmail.com, emacs-devel@gnu.org
> Date: Sun, 21 Feb 2016 07:44:45 -0500
> 
>   > > It seems to me that we're considering using the Unicode decomposition
>   > > rules for "variant detection" because it's what we have.
> 
>   > No, we use decompositions because that's how equivalent strings are to
>   > be compared and mapped/folded.
> 
> Please let's drop the idea of determining the folding behavior
> automatically from something in Unicode.  It is too rigid.

We don't determine the behavior from Unicode.  We use the Unicode data
to implement the behavior we consider useful.

> Users want many different folding behaviors.  Instead of insisting on
> a particular set of equivalences, let's make it easy for users to
> specify the foldings they want.

Whatever additional behavior and nuances the users want, we can
implement it regardless of the Unicode data we use for the basic
folding (once we figure out what is it that they want and how to
implement that best).  There's no dichotomy here.




* Re: On language-dependent defaults for character-folding
  2016-02-21  2:51                                                                 ` Lars Ingebrigtsen
  2016-02-21  6:28                                                                   ` Elias Mårtenson
@ 2016-02-21 16:25                                                                   ` Eli Zaretskii
  2016-02-22  1:56                                                                     ` Lars Ingebrigtsen
  1 sibling, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-21 16:25 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: lokedhs, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: lokedhs@gmail.com,  emacs-devel@gnu.org
> Date: Sun, 21 Feb 2016 13:51:46 +1100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > The above won't support finding decomposed sequences as in á (there
> > are 2 characters here, they are just displayed as one).
> 
> They are displayed as two characters in this Emacs (current Ubuntu,
> Emacs git master).  :-)

Probably because your default font is not capable enough.  Or maybe
your build lacks libotf and/or libm17n?

> If that database gives us all that, then I'm all for using that database
> instead of creating our own, of course.  But why doesn't C-s o find ø,
> and C-s l find ł then?  

To avoid making yet another group of users angry, this time with no
firm basis at all ;-)




* Re: On language-dependent defaults for character-folding
  2016-02-21  6:28                                                                   ` Elias Mårtenson
  2016-02-21  8:14                                                                     ` Achim Gratz
  2016-02-21 10:05                                                                     ` Lars Ingebrigtsen
@ 2016-02-21 16:31                                                                     ` Eli Zaretskii
  2016-02-21 16:58                                                                       ` Elias Mårtenson
  2 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-21 16:31 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: larsi, emacs-devel

> Date: Sun, 21 Feb 2016 14:28:40 +0800
> From: Elias Mårtenson <lokedhs@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel <emacs-devel@gnu.org>
> 
>     If that database gives us all that, then I'm all for using that database
>     instead of creating our own, of course.  But why doesn't C-s o find ø,
>     and C-s l find ł then?
> 
> Because under the Unicode decomposition rules, ø is not decomposable. I can't explain why that is the case (probably because there is no reason to have a combining /.

I asked the question about this on the Unicode mailing list, let's see
what we get in response.

> After all, the only languages that use ø are languages that use it as a character of its own).

Not sure what this means: how is the usage of ø in this regard
different from, say, ä?

> In the thread on the Unicode mailing list, the recommendation seems to be to use the CLDR (http://cldr.unicode.org/). Of course, this assumes there is a locale, but the choice of locale can easily be customisable (with the default being the user's locale).

Not locale, language.

> Another poster on the same thread mentioned that the CLDR doesn't go all the way, but adding a set of exceptions on top of it shouldn't be hard. In any case, the result would be significantly better than what is implemented now.

The last part is not yet clear to me, as this aspect was never
discussed in enough detail.  I have now asked explicitly about that.




* Re: On language-dependent defaults for character-folding
  2016-02-21 16:31                                                                     ` Eli Zaretskii
@ 2016-02-21 16:58                                                                       ` Elias Mårtenson
  2016-02-21 17:23                                                                         ` Eli Zaretskii
  2016-02-22 17:59                                                                         ` Richard Stallman
  0 siblings, 2 replies; 263+ messages in thread
From: Elias Mårtenson @ 2016-02-21 16:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel


On 22 February 2016 at 00:31, Eli Zaretskii <eliz@gnu.org> wrote:

> > After all, the only languages that use ø are languages that use it as a
> > character of its own).
>
> Not sure what this means: how is the usage of ø in this regard
> different from, say, ä?

Well, if you are interested, here's how it works in the Scandinavian
languages:

Swedish has three extra characters: å, ä and ö. These are individual
characters, as has been discussed many times in this thread. Norwegian and
Danish have the same extra characters, except that they write them as å, æ
and ø (they also sort them in a different order, but that's beside the point).

Now, other languages may use the character (in the Unicode sense) ö as a
variation of o. In other words, o with ¨ on top of it. For users of such
languages ö is just a variation of o as we also have discussed before. On
the other hand, ø is not used as a variation of o in any language that I am
aware of.

In Sweden, when discussing Norwegian or Danish words (usually names) we
tend to keep their style of characters. So, for example, I might refer to
my Swedish friend Östen and my Norwegian friend Øystein. I would not spell
his name Öystein, even though it's technically the same letter.

However, when searching for "ö" I would certainly expect to match the first
letter of Øystein.
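
To make the expectation concrete, here is a small Python sketch of a language-dependent equivalence table. The table contents, the language codes used as keys, and the helper names are assumptions for illustration only:

```python
# Hypothetical per-language equivalence classes.  "sv" folds the Danish and
# Norwegian letters into the Swedish ones; "nb" is left empty as an example
# of a language whose users might not want that.
LANGUAGE_FOLDS = {
    "sv": ["öø", "äæ"],
    "nb": [],
}


def chars_equivalent(typed, found, lang):
    """True if the two characters match, ignoring case, under lang's table."""
    typed, found = typed.lower(), found.lower()
    if typed == found:
        return True
    classes = LANGUAGE_FOLDS.get(lang, ())
    return any(typed in cls and found in cls for cls in classes)


def find(needle, text, lang):
    """Return the index of the first match of needle in text, or -1."""
    n = len(needle)
    for i in range(len(text) - n + 1):
        window = text[i:i + n]
        if all(chars_equivalent(a, b, lang) for a, b in zip(needle, window)):
            return i
    return -1


print(find("ö", "Øystein", "sv"))   # matches at index 0 with the Swedish table
print(find("ö", "Øystein", "nb"))   # no match with the empty table
```

The default language could then be picked from the user's environment and overridden by a user option.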

> > In the thread on the Unicode mailing list, the recommendation seems to be
> > to use the CLDR (http://cldr.unicode.org/). Of course, this assumes there
> > is a locale, but the choice of locale can easily be customisable (with the
> > default being the user's locale).
>
> Not locale, language.

Right. I guess I'm getting ahead of myself. As you know, I'm advocating
choosing a default language based on the locale of the user.

Regards,
Elias



* Re: On language-dependent defaults for character-folding
  2016-02-21 16:58                                                                       ` Elias Mårtenson
@ 2016-02-21 17:23                                                                         ` Eli Zaretskii
  2016-02-21 18:48                                                                           ` Ivan Andrus
                                                                                             ` (2 more replies)
  2016-02-22 17:59                                                                         ` Richard Stallman
  1 sibling, 3 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-21 17:23 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: larsi, emacs-devel

> Date: Mon, 22 Feb 2016 00:58:37 +0800
> From: Elias Mårtenson <lokedhs@gmail.com>
> Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org>
> 
> Now, other languages may use the character (in the Unicode sense) ö as a variation of o. In other words, o
> with ¨ on top of it. For users of such languages ö is just a variation of o as we also have discussed before. On
> the other hand, ø is not used as a variation of o in any language that I am aware of.

I don't think this is correct.  I think ö is a letter on its own in
any language that uses it.  Which is why I don't see how it is
different from ø.




* Re: On language-dependent defaults for character-folding
  2016-02-21 17:23                                                                         ` Eli Zaretskii
@ 2016-02-21 18:48                                                                           ` Ivan Andrus
  2016-02-22 15:58                                                                           ` Wolfgang Jenkner
  2016-02-22 17:59                                                                           ` Richard Stallman
  2 siblings, 0 replies; 263+ messages in thread
From: Ivan Andrus @ 2016-02-21 18:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, Elias Mårtenson, emacs-devel

On Feb 21, 2016, at 10:23 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> Date: Mon, 22 Feb 2016 00:58:37 +0800
>> From: Elias Mårtenson <lokedhs@gmail.com>
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org>
>> 
>> Now, other languages may use the character (in the Unicode sense) ö as a variation of o. In other words, o
>> with ¨ on top of it. For users of such languages ö is just a variation of o as we also have discussed before. On
>> the other hand, ø is not used as a variation of o in any language that I am aware of.
> 
> I don't think this is correct.  I think ö is a letter on its own in
> any language that uses it.  Which is why I don't see how it is
> different from ø.

Well, the New Yorker writes coöperate [1], though it’s definitely an o.  That said, I don’t think we should worry overly about that case, since we Americans will want o to match them all. :-)

-Ivan

[1] https://en.wikipedia.org/wiki/Diaeresis_(diacritic)#English



* Re: On language-dependent defaults for character-folding
  2016-02-21 16:25                                                                   ` Eli Zaretskii
@ 2016-02-22  1:56                                                                     ` Lars Ingebrigtsen
  2016-02-22  9:20                                                                       ` Andreas Schwab
  0 siblings, 1 reply; 263+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-22  1:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lokedhs, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Probably because your default font is not capable enough.  Or maybe
> your build lacks libotf and/or libm17n?

Let's see...

  Does Emacs use -lfreetype?                              yes
  Does Emacs use -lm17n-flt?                              yes
  Does Emacs use -lotf?                                   yes
  Does Emacs use -lxft?                                   yes

And the font seems to be

    xft:-unknown-Ubuntu Mono-normal-normal-normal-*-24-*-*-*-m-0-iso10646-1 (#x27)

I don't think I've customised any of this stuff -- it's just the default
Ubuntu setup.  It's weird that the default Ubuntu font won't do the
right thing here...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no


* Re: On language-dependent defaults for character-folding
  2016-02-21 11:01                                                                       ` Elias Mårtenson
  2016-02-21 16:02                                                                         ` Eli Zaretskii
@ 2016-02-22  1:58                                                                         ` Lars Ingebrigtsen
  2016-02-22  2:34                                                                           ` Elias Mårtenson
  2016-02-22  3:38                                                                           ` Eli Zaretskii
  1 sibling, 2 replies; 263+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-22  1:58 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: Eli Zaretskii, emacs-devel

Elias Mårtenson <lokedhs@gmail.com> writes:

>  It's like it's fine if you're typing in lower case characters for them
>  to match upper case, too, but if you've bothered to type an upper case
>  character, then you probably don't want lower case characters to match.
>
> This is how Emacs behaves today, is it not?

Yes, and that's my point.  I'd expect character folding when doing
searches to work in an analogous fashion: If I type `C-s é', I would be
surprised if it found "e", but not the other way around.
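
The asymmetry can be sketched in a few lines of Python. This is only an illustration of the rule described above, not how Emacs implements it, and `strip_marks` and `char_matches` are made-up names:

```python
import unicodedata


def strip_marks(s):
    """Drop combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def char_matches(typed, found):
    """Asymmetric folding: only a plain typed character matches variants."""
    if typed == found:
        return True
    # Fold only when the typed character carries no marks of its own.
    if strip_marks(typed) == typed:
        return strip_marks(found) == typed
    return False


print(char_matches("e", "é"))   # True: plain e finds the accented form
print(char_matches("é", "e"))   # False: a typed é insists on é
```

This mirrors the case rule: the less specific input matches more, the more specific input matches only itself.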

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: On language-dependent defaults for character-folding
  2016-02-22  1:58                                                                         ` Lars Ingebrigtsen
@ 2016-02-22  2:34                                                                           ` Elias Mårtenson
  2016-02-22  2:48                                                                             ` Lars Ingebrigtsen
  2016-02-22 18:01                                                                             ` Richard Stallman
  2016-02-22  3:38                                                                           ` Eli Zaretskii
  1 sibling, 2 replies; 263+ messages in thread
From: Elias Mårtenson @ 2016-02-22  2:34 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Eli Zaretskii, emacs-devel


On 22 February 2016 at 09:58, Lars Ingebrigtsen <larsi@gnus.org> wrote:

> Elias Mårtenson <lokedhs@gmail.com> writes:
>
> >  It's like it's fine if you're typing in lower case characters for them
> >  to match upper case, too, but if you've bothered to type an upper case
> >  character, then you probably don't want lower case characters to match.
> >
> > This is how Emacs behaves today, is it not?
>
> Yes, and that's my point.  I'd expect character folding when doing
> searches to work in an analogous fashion: If I type `C-s é', I would be
> surprised if it found "e", but not the other way around.


But you are Danish, are you not? As such, I would have thought that when
you search for ø, you would want to find a Swedish ö? (this is the inverse
of the natural Swedish behaviour).

Regards,
Elias



* Re: On language-dependent defaults for character-folding
  2016-02-22  2:34                                                                           ` Elias Mårtenson
@ 2016-02-22  2:48                                                                             ` Lars Ingebrigtsen
  2016-02-22  6:13                                                                               ` Werner LEMBERG
  2016-02-22 18:01                                                                               ` Richard Stallman
  2016-02-22 18:01                                                                             ` Richard Stallman
  1 sibling, 2 replies; 263+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-22  2:48 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: Eli Zaretskii, emacs-devel

Elias Mårtenson <lokedhs@gmail.com> writes:

> But you are Danish, are you not?

Almost.  Norwegian.  :-)

> As such, I would have thought that when you search for ø, you would
> want to find a Swedish ö? (this is the inverse of the natural Swedish
> behaviour).

No, I think that would be weird behaviour, and is not something that I
ever wished would happen.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: On language-dependent defaults for character-folding
  2016-02-22  1:58                                                                         ` Lars Ingebrigtsen
  2016-02-22  2:34                                                                           ` Elias Mårtenson
@ 2016-02-22  3:38                                                                           ` Eli Zaretskii
  2016-02-22  3:57                                                                             ` Lars Ingebrigtsen
  1 sibling, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-22  3:38 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: lokedhs, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  emacs-devel <emacs-devel@gnu.org>
> Date: Mon, 22 Feb 2016 12:58:31 +1100
> 
> Elias Mårtenson <lokedhs@gmail.com> writes:
> 
> >  It's like it's fine if you're typing in lower case characters for them
> >  to match upper case, too, but if you've bothered to type an upper case
> >  character, then you probably don't want lower case characters to match.
> >
> > This is how Emacs behaves today, is it not?
> 
> Yes, and that's my point.  I'd expect character folding when doing
> searches to work in an analogous fashion: If I type `C-s é', I would be
> surprised if it found "e", but not the other way around.

Emacs behaves as you expect.  Did you try that?




* Re: On language-dependent defaults for character-folding
  2016-02-22  3:38                                                                           ` Eli Zaretskii
@ 2016-02-22  3:57                                                                             ` Lars Ingebrigtsen
  2016-02-22 16:10                                                                               ` Eli Zaretskii
  2016-02-22 18:58                                                                               ` John Wiegley
  0 siblings, 2 replies; 263+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-22  3:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: lokedhs, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Lars Ingebrigtsen <larsi@gnus.org>
>> Cc: Eli Zaretskii <eliz@gnu.org>,  emacs-devel <emacs-devel@gnu.org>
>> Date: Mon, 22 Feb 2016 12:58:31 +1100
>> 
>> Elias Mårtenson <lokedhs@gmail.com> writes:
>> 
>> >  It's like it's fine if you're typing in lower case characters for them
>> >  to match upper case, too, but if you've bothered to type an upper case
>> >  character, then you probably don't want lower case characters to match.
>> >
>> > This is how Emacs behaves today, is it not?
>> 
>> Yes, and that's my point.  I'd expect character folding when doing
>> searches to work in an analogous fashion: If I type `C-s é', I would be
>> surprised if it found "e", but not the other way around.
>
> Emacs behaves as you expect.  Did you try that?

I am describing how Emacs works today.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




* Re: On language-dependent defaults for character-folding
  2016-02-22  2:48                                                                             ` Lars Ingebrigtsen
@ 2016-02-22  6:13                                                                               ` Werner LEMBERG
  2016-02-22 18:03                                                                                 ` Richard Stallman
  2016-02-22 18:01                                                                               ` Richard Stallman
  1 sibling, 1 reply; 263+ messages in thread
From: Werner LEMBERG @ 2016-02-22  6:13 UTC (permalink / raw)
  To: larsi; +Cc: eliz, lokedhs, emacs-devel


>> As such, I would have thought that when you search for ø, you would
>> want to find a Swedish ö? (this is the inverse of the natural
>> Swedish behaviour).
> 
> No, I think that would be weird behaviour, and is not something that
> I ever wished would happen.

Well, being Austrian, I would like to have a full equivalence of ø to
ö while searching in German data...


    Werner


* Re: On language-dependent defaults for character-folding
  2016-02-22  1:56                                                                     ` Lars Ingebrigtsen
@ 2016-02-22  9:20                                                                       ` Andreas Schwab
  2016-02-23  1:46                                                                         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 263+ messages in thread
From: Andreas Schwab @ 2016-02-22  9:20 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Eli Zaretskii, lokedhs, emacs-devel

Lars Ingebrigtsen <larsi@gnus.org> writes:

> And the font seems to be
>
>     xft:-unknown-Ubuntu Mono-normal-normal-normal-*-24-*-*-*-m-0-iso10646-1 (#x27)

For both characters?

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."




* Re: On language-dependent defaults for character-folding
  2016-02-21 17:23                                                                         ` Eli Zaretskii
  2016-02-21 18:48                                                                           ` Ivan Andrus
@ 2016-02-22 15:58                                                                           ` Wolfgang Jenkner
  2016-02-22 16:35                                                                             ` Eli Zaretskii
  2016-02-22 17:59                                                                           ` Richard Stallman
  2 siblings, 1 reply; 263+ messages in thread
From: Wolfgang Jenkner @ 2016-02-22 15:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, Elias Mårtenson, emacs-devel

On Sun, Feb 21 2016, Eli Zaretskii wrote:

>> Date: Mon, 22 Feb 2016 00:58:37 +0800
>> From: Elias Mårtenson <lokedhs@gmail.com>
>> Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org>
>> 
>> Now, other languages may use the character (in the Unicode sense) ö as a variation of o. In other words, o
>> with ¨ on top of it. For users of such languages ö is just a variation of o as we also have discussed before. On
>> the other hand, ø is not used as a variation of o in any language that I am aware of.
>
> I don't think this is correct.  I think ö is a letter on its own in
> any language that uses it.  Which is why I don't see how it is
> different from ø.

In German dictionary collation order there's only a secondary difference
between o and ö [1]

&O<<ö<<<Ö

[1] http://unicode.org/repos/cldr/trunk/common/collation/de.xml




* Re: On language-dependent defaults for character-folding
  2016-02-22  3:57                                                                             ` Lars Ingebrigtsen
@ 2016-02-22 16:10                                                                               ` Eli Zaretskii
  2016-02-22 18:58                                                                               ` John Wiegley
  1 sibling, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-22 16:10 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: lokedhs, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: lokedhs@gmail.com,  emacs-devel@gnu.org
> Date: Mon, 22 Feb 2016 14:57:39 +1100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Lars Ingebrigtsen <larsi@gnus.org>
> >> Cc: Eli Zaretskii <eliz@gnu.org>,  emacs-devel <emacs-devel@gnu.org>
> >> Date: Mon, 22 Feb 2016 12:58:31 +1100
> >> 
> >> Elias Mårtenson <lokedhs@gmail.com> writes:
> >> 
> >> >  It's like it's fine if you're typing in lower case characters for them
> >> >  to match upper case, too, but if you've bothered to type an upper case
> >> >  character, then you probably don't want lower case characters to match.
> >> >
> >> > This is how Emacs behaves today, is it not?
> >> 
> >> Yes, and that's my point.  I'd expect character folding when doing
> >> searches to work in an analogous fashion: If I type `C-s é', I would be
> >> surprised if it found "e", but not the other way around.
> >
> > Emacs behaves as you expect.  Did you try that?
> 
> I am describing how Emacs works today.

So was I.  I just wanted to be sure Emacs behaves according to your
expectations in this case, and that you are not complaining about what
it does.




* Re: On language-dependent defaults for character-folding
  2016-02-22 15:58                                                                           ` Wolfgang Jenkner
@ 2016-02-22 16:35                                                                             ` Eli Zaretskii
  2016-02-22 16:56                                                                               ` Wolfgang Jenkner
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-22 16:35 UTC (permalink / raw)
  To: Wolfgang Jenkner; +Cc: larsi, lokedhs, emacs-devel

> From: Wolfgang Jenkner <wjenkner@inode.at>
> Cc: Elias Mårtenson <lokedhs@gmail.com>,  larsi@gnus.org,
>   emacs-devel@gnu.org
> Date: Mon, 22 Feb 2016 16:58:36 +0100
> 
> > I don't think this is correct.  I think ö is a letter on its own in
> > any language that uses it.  Which is why I don't see how it is
> > different from ø.
> 
> In German dictionary collation order there's only a secondary difference
> between o and ö [1]
> 
> &O<<ö<<<Ö

Yes, I know.  But that doesn't mean ö is not a letter on its own.

IOW, collation order says nothing about letter differences, IMO.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22 16:35                                                                             ` Eli Zaretskii
@ 2016-02-22 16:56                                                                               ` Wolfgang Jenkner
  2016-02-22 17:24                                                                                 ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Wolfgang Jenkner @ 2016-02-22 16:56 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

On Mon, Feb 22 2016, Eli Zaretskii wrote:

>> > I don't think this is correct.  I think ö is a letter on its own in
>> > any language that uses it.  Which is why I don't see how it is
>> > different from ø.
>> 
>> In German dictionary collation order there's only a secondary difference
>> between o and ö [1]
>> 
>> &O<<ö<<<Ö
>
> Yes, I know.  But that doesn't mean ö is not a letter on its own.
>
> IOW, collation order says nothing about letter differences, IMO.

I think it does.  All objections to making char-fold search the default
come from people who expect that letters with a *primary* difference in
their locale should not be conflated.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22 16:56                                                                               ` Wolfgang Jenkner
@ 2016-02-22 17:24                                                                                 ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-22 17:24 UTC (permalink / raw)
  To: Wolfgang Jenkner; +Cc: larsi, lokedhs, emacs-devel

> From: Wolfgang Jenkner <wjenkner@inode.at>
> Cc: larsi@gnus.org,  lokedhs@gmail.com,  emacs-devel@gnu.org
> Date: Mon, 22 Feb 2016 17:56:19 +0100
> 
> On Mon, Feb 22 2016, Eli Zaretskii wrote:
> 
> >> > I don't think this is correct.  I think ö is a letter on its own in
> >> > any language that uses it.  Which is why I don't see how it is
> >> > different from ø.
> >> 
> >> In German dictionary collation order there's only a secondary difference
> >> between o and ö [1]
> >> 
> >> &O<<ö<<<Ö
> >
> > Yes, I know.  But that doesn't mean ö is not a letter on its own.
> >
> > IOW, collation order says nothing about letter differences, IMO.
> 
> I think it does.  All objections to making char-fold search the default
> come from people who expect that letters with a *primary* difference in
> their locale should not be conflated.

I understand, and I didn't try to argue against that.  The sub-thread
about being a "letter on its own" is just a tangent, not directly
related to the issue at hand.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-21 16:05                                                                   ` Eli Zaretskii
@ 2016-02-22 17:57                                                                     ` Richard Stallman
  2016-02-22 18:34                                                                       ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-22 17:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > Please let's drop the idea of determining the folding behavior
  > > automatically from something in Unicode.  It is too rigid.

  > We don't determine the behavior from Unicode.  We use the Unicode data
  > to implement the behavior we consider useful.

What we have seen is that the behavior that comes from that Unicode
data does not please the users very much.  Users seem to have many
different ideas of what folding is useful, and disagree with each
other greatly.

We should not cling to the set of folding specs that happen to come
from that Unicode data.  Let's forget that Unicode data.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-21 16:58                                                                       ` Elias Mårtenson
  2016-02-21 17:23                                                                         ` Eli Zaretskii
@ 2016-02-22 17:59                                                                         ` Richard Stallman
  2016-02-22 18:51                                                                           ` Eli Zaretskii
  1 sibling, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-22 17:59 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: eliz, larsi, emacs-devel

  > Right. I guess I'm getting ahead of myself. As you know, I'm advocating
  > choosing a default language based on the locale of the user.

We need:

* A per-buffer language preference variable.
* A global value which becomes the default for new buffers.

The global value can be initialized when Emacs starts based on the
locale.
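
The two variables listed above could be sketched roughly as follows.
This is only an illustration in Python, not Emacs code: the names
`language_from_locale', `Settings' and `Buffer' are hypothetical, and
the locale parsing assumes the common `lang_TERRITORY.encoding@modifier'
shape.

```python
def language_from_locale(locale_name):
    """Extract the language code from a locale name such as 'sv_SE.UTF-8'."""
    if not locale_name or locale_name in ("C", "POSIX"):
        return None  # no usable language information
    # Drop the encoding ('.UTF-8'), the modifier ('@euro') and the territory ('_SE').
    return locale_name.split(".")[0].split("@")[0].split("_")[0].lower()


class Settings:
    """Global state: the default language for new buffers (hypothetical)."""

    def __init__(self, locale_name):
        # Initialized once at startup from the locale.
        self.default_language = language_from_locale(locale_name)


class Buffer:
    """A buffer whose language preference starts out as the global default."""

    def __init__(self, settings):
        # Copy the default at creation time; later changes are buffer-local.
        self.language = settings.default_language


settings = Settings("sv_SE.UTF-8")
swedish_notes = Buffer(settings)
danish_mail = Buffer(settings)
danish_mail.language = "da"  # overrides the default for this buffer only
```

Changing `danish_mail.language' leaves `swedish_notes' and any buffer
created later at the global default.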

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-21 17:23                                                                         ` Eli Zaretskii
  2016-02-21 18:48                                                                           ` Ivan Andrus
  2016-02-22 15:58                                                                           ` Wolfgang Jenkner
@ 2016-02-22 17:59                                                                           ` Richard Stallman
  2016-02-22 18:57                                                                             ` Eli Zaretskii
  2 siblings, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-22 17:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

  > I don't think this is correct.  I think ö is a letter on its own in
  > any language that uses it.  Which is why I don't see how it is
  > different from ø.

Users seem to disagree on whether to fold diacritics that make
different letters (ñ, ç, Polish l with slash) or only those that
modify a single letter (as á, à, â in French).

I think that we should have a user option which controls this and only
this.

That means we should have two levels of folding group definitions: the
smaller groups which hold variants of the same letter, and the bigger
groups which hold similar letters.

These groups need to depend on the language setting.  In English (and
in French), ö is a modified o.  In Swedish (and German, I think), ö
and o are different letters.

I think that each folding group should specify one character that is
the base.  This is because users also seem to disagree on what it
should mean to specify a non-base letter in the search string.

Some plausible meanings are

* Find that one and only that one.
* Treat it the same as specifying the base letter.

There should be a user option to choose between those two (and maybe
some other behaviors for a non-base letter in the search string).
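
A minimal sketch of the proposal above, in Python rather than Emacs
Lisp and under stated assumptions: `FoldingGroup', `matches_for' and
the mode names "literal" and "as-base" are invented for illustration,
the group tables are toy examples, and the two-level distinction
(variants of one letter versus merely similar letters) is left out to
keep it short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FoldingGroup:
    base: str            # the character designated as the base
    variants: frozenset  # the other members of the group

    def members(self):
        return self.variants | {self.base}


def matches_for(ch, groups, non_base_mode="literal"):
    """Return the set of characters that CH in a search string should match."""
    for group in groups:
        if ch == group.base:
            return group.members()       # base letter matches the whole group
        if ch in group.variants:
            if non_base_mode == "as-base":
                return group.members()   # treat it like the base letter
            return {ch}                  # "literal": that one and only that one
    return {ch}                          # not in any group: matches only itself


# Toy language-dependent tables (assumptions, not real Emacs data).
# \u00f6 is o with diaeresis, \u00f8 is o with stroke.
ENGLISH_GROUPS = [FoldingGroup("o", frozenset({"\u00f6", "\u00f8"}))]
SWEDISH_GROUPS = [FoldingGroup("\u00f6", frozenset({"\u00f8"}))]
```

With these tables, `matches_for("o", ENGLISH_GROUPS)' covers o, ö and
ø, whereas `matches_for("o", SWEDISH_GROUPS)' covers only o.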

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22  2:34                                                                           ` Elias Mårtenson
  2016-02-22  2:48                                                                             ` Lars Ingebrigtsen
@ 2016-02-22 18:01                                                                             ` Richard Stallman
  2016-02-22 18:58                                                                               ` Eli Zaretskii
                                                                                                 ` (2 more replies)
  1 sibling, 3 replies; 263+ messages in thread
From: Richard Stallman @ 2016-02-22 18:01 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: larsi, eliz, emacs-devel

  > But you are Danish, are you not? As such, I would have thought that when
  > you search for ø, you would want to find a Swedish ö? (this is the inverse
  > of the natural Swedish behaviour).

Elias and Lars, what do you two think searching for o should match?
Should it match ö and ø, or not?

If you want o not to match ö and ø, then you want ö and ø to be a
class by themselves.

One way to handle each class is the asymmetric way: searching for the base
character matches all of them, but searching for one of the other characters
matches only itself.

In Swedish, ö could be the base character and ø a variant.
In Danish, ø could be the base character and ö the variant.

Would each of you be happy with that mode?

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22  2:48                                                                             ` Lars Ingebrigtsen
  2016-02-22  6:13                                                                               ` Werner LEMBERG
@ 2016-02-22 18:01                                                                               ` Richard Stallman
  2016-02-22 19:06                                                                                 ` Eli Zaretskii
  1 sibling, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-22 18:01 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: eliz, lokedhs, emacs-devel

Lars, would you ever want any sort of folding between ö and ø?

Would you want to use my proposed setting where folding occurs only
between letters with and without an accent, and never folding between
related letters such as o and ø?  If you use that setting, then
ö and ø will also never fold.  Thus, you won't need to have any preference
about how folding should treat ö and ø, when users do enable it.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22  6:13                                                                               ` Werner LEMBERG
@ 2016-02-22 18:03                                                                                 ` Richard Stallman
  2016-02-22 18:27                                                                                   ` Werner LEMBERG
  0 siblings, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-22 18:03 UTC (permalink / raw)
  To: Werner LEMBERG; +Cc: larsi, lokedhs, eliz, emacs-devel

  > Well, being Austrian, I would like to have a full equivalence of ø to
  > ö while searching in German data...

In what use case would that make a difference, and how?
ø is not normally used in German, right?

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22 18:03                                                                                 ` Richard Stallman
@ 2016-02-22 18:27                                                                                   ` Werner LEMBERG
  0 siblings, 0 replies; 263+ messages in thread
From: Werner LEMBERG @ 2016-02-22 18:27 UTC (permalink / raw)
  To: rms; +Cc: larsi, lokedhs, eliz, emacs-devel

>   > Well, being Austrian, I would like to have a full equivalence of
>   > ø to ö while searching in German data...
> 
> In what use case would that make a difference, and how?

For example, the word `Øre' is usually written `Öre' in German (and
this is true for essentially all words containing ø), so it would be
good if a search for the latter finds the former and vice versa.

> ø is not normally used in German, right?

It is not used in the German language, but today there is a tendency
in German-speaking countries to use the original spelling in foreign
words.  Historically, however, many words were also `germanized' by
adapting the spelling to German (i.e., they became loan words), and
these use only German characters.  In many cases accents were lost
during the conversion to loan words; for example, a quite common name
in Germany and Austria is `Dvorak', with the original Czech spelling
being `Dvořák'.

    Werner

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22 17:57                                                                     ` Richard Stallman
@ 2016-02-22 18:34                                                                       ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-22 18:34 UTC (permalink / raw)
  To: rms; +Cc: larsi, lokedhs, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, lokedhs@gmail.com, emacs-devel@gnu.org
> Date: Mon, 22 Feb 2016 12:57:54 -0500
> 
>   > > Please let's drop the idea of determining the folding behavior
>   > > automatically from something in Unicode.  It is too rigid.
> 
>   > We don't determine the behavior from Unicode.  We use the Unicode data
>   > to implement the behavior we consider useful.
> 
> What we have seen is that the behavior that comes from that Unicode
> data does not please the users very much.  Users seem to have many
> different ideas of what folding is useful, and disagree with each
> other greatly.

My analysis of the discussion is that a small number of specific cases
of language-independent folding makes users of some languages unhappy.
The number of such cases is small, and they only bother users of a
small number of languages we support.

My conclusion from that is that the feature as implemented needs to be
augmented in minor ways, but is basically correct for the majority of
use cases.  IOW, it's not perfect, but it's a significant improvement
for many.

> We should not cling to the set of folding specs that happen to come
> from that Unicode data.  Let's forget that Unicode data.

That'd be a mistake tantamount to throwing the baby out with the
bathwater.  Besides, any alternative data to use for such a feature
will be either identical or very similar to what we use now.  The only
alternative that won't need such similar data is to decide to never
have this feature.  I don't think we want to do that.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22 17:59                                                                         ` Richard Stallman
@ 2016-02-22 18:51                                                                           ` Eli Zaretskii
  2016-02-23  0:14                                                                             ` Juri Linkov
  2016-02-26 20:23                                                                             ` Richard Stallman
  0 siblings, 2 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-22 18:51 UTC (permalink / raw)
  To: rms; +Cc: larsi, lokedhs, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: eliz@gnu.org, larsi@gnus.org, emacs-devel@gnu.org
> Date: Mon, 22 Feb 2016 12:59:00 -0500
> 
>   > Right. I guess I'm getting ahead of myself. As you know, I'm advocating
>   > choosing a default language based on the locale of the user.
> 
> We need:
> 
> * A per-buffer language preference variable.
> * A global value which becomes the default for new buffers.

That's unnecessarily restrictive; we can do better with the current
infrastructure.  Some encodings provide us with charset information,
which can be used to deduce the language of the text.  Some characters
belong to Unicode blocks that allow identification of the language, or
maybe a small group of languages.  In some cases, the text itself
comes with metadata which describes the language.  And there might be
other sources of information about the language.

It would be silly to disregard this information where it exists.

There are other aspects of this that need to be considered, if we want
language-specific searching to be solid.  E.g., what happens with
text copied to another buffer which might have a different per-buffer
language preference?  Does it suddenly behave differently when
searched?

But the most basic issue is that any significant development in these
directions requires re-implementing the feature on the C level and using
char-tables for folding, like we do with case-mapping.  So until
someone steps forward for the job, all we can do is small corrections
to the existing implementation.  For example, the default state of
character-folding might depend on the locale's language -- we could
turn it off by default for languages whose users expressed
dissatisfaction with the feature.  We could also augment the regular
expressions created for folding the search string by filtering out
variants that users of a particular language don't want.  If people
think these ideas will make more users happy, we can work on that.
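
The filtering idea in the last paragraph might look roughly like the
sketch below.  It is Python, not the actual Emacs implementation, and
everything in it is an assumption made for illustration: the tiny
`FOLD_TABLE', the per-language `EXCLUDED_BY_LANGUAGE' sets, and the
function name `fold_regex'.

```python
import re

# Toy folding data: base letter -> variants it should also match.
# \u00f6 = o with diaeresis, \u00f8 = o with stroke, \u00f3 = o with acute,
# \u00f1 = n with tilde.
FOLD_TABLE = {
    "o": "\u00f6\u00f8\u00f3",
    "n": "\u00f1",
}

# Hypothetical per-language lists of variants that should NOT be folded.
EXCLUDED_BY_LANGUAGE = {
    "sv": {"\u00f6"},
    "es": {"\u00f1"},
}


def fold_regex(search_string, language=None):
    """Build a regexp for SEARCH_STRING, dropping variants unwanted in LANGUAGE."""
    excluded = EXCLUDED_BY_LANGUAGE.get(language, set())
    parts = []
    for ch in search_string:
        variants = [v for v in FOLD_TABLE.get(ch, "") if v not in excluded]
        if variants:
            # Character class: the typed letter plus its remaining variants.
            parts.append("[" + re.escape(ch) + "".join(variants) + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))
```

For example, `fold_regex("ano")' would match "año", while
`fold_regex("ano", "es")' would match only the literal "ano".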

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22 17:59                                                                           ` Richard Stallman
@ 2016-02-22 18:57                                                                             ` Eli Zaretskii
  2016-02-23 17:43                                                                               ` Richard Stallman
                                                                                                 ` (2 more replies)
  0 siblings, 3 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-22 18:57 UTC (permalink / raw)
  To: rms; +Cc: larsi, lokedhs, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: lokedhs@gmail.com, larsi@gnus.org, emacs-devel@gnu.org
> Date: Mon, 22 Feb 2016 12:59:03 -0500
> 
> Users seem to disagree on whether to fold diacritics that make
> different letters (ñ, ç, Polish l with slash) or only those that
> modify a single letter (as á, à, â in French).
> 
> I think that we should have a user option which controls this and only
> this.
> 
> That means we should have two levels of folding group definitions: the
> smaller groups which hold variants of the same letter, and the bigger
> groups which hold similar letters.
> 
> These groups need to depend on the language setting.  In English (and
> in French), ö is a modified o.  In Swedish (and German, I think), ö
> and o are different letters.

This can be done if it will help.  But no one responded to these ideas
until now, so I'm not sure we are not in for another round of
rejections.

> I think that each folding group should specify one character that is
> the base.

I'm not sure what that means.  What is a "folding group"?

> This is because users also seem to disagree on what it
> should mean to specify a non-base letter in the search string.
> 
> Some plausible meanings are
> 
> * Find that one and only that one.
> * Treat it the same as specifying the base letter.
> 
> There should be a user option to choose between those two (and maybe
> some other behaviors for a non-base letter in the search string).

We already have both options, and in particular, if a non-base letter
appears explicitly in the search string, it will be searched
literally, similarly to what we do with case-insensitive search.
E.g., searching for ö doesn't find o or any other of its variants.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22  3:57                                                                             ` Lars Ingebrigtsen
  2016-02-22 16:10                                                                               ` Eli Zaretskii
@ 2016-02-22 18:58                                                                               ` John Wiegley
  2016-02-23  7:50                                                                                 ` Per Starbäck
  1 sibling, 1 reply; 263+ messages in thread
From: John Wiegley @ 2016-02-22 18:58 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: Eli Zaretskii, lokedhs, emacs-devel

>>>>> Lars Ingebrigtsen <larsi@gnus.org> writes:

> I am describing how Emacs works today.

I'm worried that this very long discussion on character-folding is going
nowhere. We're over 200 messages now, and it seems that the same arguments are
being repeated about what does and does not constitute a letter to be folded.
Or maybe my eyes glazed over, and that's what I think I'm seeing...

If there are other technical discussions to be branched from this topic, now
would be a good time to start new threads for them, if for no other reason
than to clarify what the outcome of those threads should hopefully be.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22 18:01                                                                             ` Richard Stallman
@ 2016-02-22 18:58                                                                               ` Eli Zaretskii
  2016-02-23  1:30                                                                               ` Lars Ingebrigtsen
  2016-02-23  2:03                                                                               ` Elias Mårtenson
  2 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-22 18:58 UTC (permalink / raw)
  To: rms; +Cc: larsi, lokedhs, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, eliz@gnu.org, emacs-devel@gnu.org
> Date: Mon, 22 Feb 2016 13:01:13 -0500
> 
> One way to handle each class is the asymmetric way: searching for the base
> character matches all of them, but searching for one of the other characters
> matches only itself.

Emacs already behaves like that.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22 18:01                                                                               ` Richard Stallman
@ 2016-02-22 19:06                                                                                 ` Eli Zaretskii
  2016-02-23 17:43                                                                                   ` Richard Stallman
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-22 19:06 UTC (permalink / raw)
  To: rms; +Cc: larsi, lokedhs, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: lokedhs@gmail.com, eliz@gnu.org, emacs-devel@gnu.org
> Date: Mon, 22 Feb 2016 13:01:26 -0500
> 
> Lars, would you ever want any sort of folding between ö and ø?
> 
> Would you want to use my proposed setting where folding occurs only
> between letters with and without an accent, and never folding between
> related letters such as o and ø?  If you use that setting, then
> ö and ø will also never fold.  Thus, you won't need to have any preference
> about how folding should treat ö and ø, when users do enable it.

Some minimal amount of folding will nevertheless be necessary even in
asymmetric mode, in order to find character sequences produced by
decomposing characters like ö into o and the combining mark ̈.  That's
because these two characters when juxtaposed (ö) look identical to the
precomposed character on most displays, so we should by default find
such decomposed sequences even when the search string includes the
precomposed character.
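
The point about decomposed sequences can be illustrated with Unicode
normalization.  The sketch below uses Python's standard `unicodedata'
module; the helper `contains_canonically' is a hypothetical name, and
it shows only the canonical-equivalence step, not how Emacs searches.

```python
import unicodedata

precomposed = "\u00f6"   # LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "o\u0308"   # 'o' followed by COMBINING DIAERESIS

# The two spellings look the same on screen but are different strings.
assert precomposed != decomposed

# Normalization maps one spelling to the other.
assert unicodedata.normalize("NFD", precomposed) == decomposed
assert unicodedata.normalize("NFC", decomposed) == precomposed


def contains_canonically(text, needle):
    """Return True if NEEDLE occurs in TEXT after both are put in NFD.

    Caveat: this is plain substring matching on the decomposed form, so
    a needle ending in a bare 'o' also matches an 'o' that is followed by
    a combining mark.  A stricter search would have to check for that.
    """
    return unicodedata.normalize("NFD", needle) in unicodedata.normalize("NFD", text)
```

A search for "Köln" typed with the precomposed character then finds
text where the same word was stored as "Ko" + combining diaeresis +
"ln", which a plain substring test would miss.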



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22 18:51                                                                           ` Eli Zaretskii
@ 2016-02-23  0:14                                                                             ` Juri Linkov
  2016-02-23 17:11                                                                               ` Eli Zaretskii
  2016-02-26 20:23                                                                             ` Richard Stallman
  1 sibling, 1 reply; 263+ messages in thread
From: Juri Linkov @ 2016-02-23  0:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, lokedhs, rms, emacs-devel

> But the most basic issue is that any significant development in these
> directions requires re-implementing the feature on the C level and using
> char-tables for folding, like we do with case-mapping.  So until
> someone steps forward for the job, all we can do is small corrections
> to the existing implementation.

Do I understand correctly that essentially what is necessary to do on the
C level is to extend char-tables with character insertions and deletions,
so that in addition to canonical equivalence mappings (like those used for
the existing case-mappings) char-tables would also support matching of
multi-character additions (like combining accents in the search
string) and deletions (like combining accents from the search string
missing in the search text)?

> For example, the default state of character-folding might depend on
> the locale's language -- we could turn it off by default for languages
> whose users expressed dissatisfaction with the feature.  We could also
> augment the regular expressions created for folding the search string
> by filtering out variants that users of a particular language don't
> want.  If people think these ideas will make more users happy, we can
> work on that.

It seems two user variables are necessary for customization:

1. inclusive folding groups that will include by default such pairs
   as o - ø, l - ł added to the Unicode decomposition-based rules,
   and allow the users to add more rules;

2. exclusive folding groups to exclude locale/language-dependent rules from
   the default mappings above, e.g. removing n - ñ for the "es" locale.
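
A rough sketch of how the two variables could combine with the
decomposition-based rules, written in Python with the standard
`unicodedata' module.  `EXTRA_INCLUDES', `LOCALE_EXCLUDES' and
`fold_pairs' are hypothetical names, and the pair representation
(base, variant) is an assumption made only for this illustration.

```python
import unicodedata


def decomposition_base(ch):
    """Return the first character of CH's canonical decomposition, or None."""
    decomp = unicodedata.decomposition(ch)
    if not decomp or decomp.startswith("<"):
        return None  # no decomposition, or only a compatibility one
    return chr(int(decomp.split()[0], 16))


# 1. Inclusive rules: pairs that Unicode decomposition does not provide.
#    \u00f8 = o with stroke, \u0142 = l with stroke.
EXTRA_INCLUDES = {("o", "\u00f8"), ("l", "\u0142")}

# 2. Exclusive rules: pairs to remove for a given locale language.
#    \u00f1 = n with tilde.
LOCALE_EXCLUDES = {"es": {("n", "\u00f1")}}


def fold_pairs(candidates, locale_lang=None):
    """Compute (base, variant) folding pairs for the characters in CANDIDATES."""
    pairs = set()
    for ch in candidates:
        base = decomposition_base(ch)
        if base is not None:
            pairs.add((base, ch))       # decomposition-based default rule
    pairs |= EXTRA_INCLUDES             # add the inclusive groups
    pairs -= LOCALE_EXCLUDES.get(locale_lang, set())  # drop locale exclusions
    return pairs
```

Here ø and ł need the inclusive list because they have no Unicode
decomposition, while ñ gets its default rule from the decomposition
data and is then removed again for the "es" locale.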

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22 18:01                                                                             ` Richard Stallman
  2016-02-22 18:58                                                                               ` Eli Zaretskii
@ 2016-02-23  1:30                                                                               ` Lars Ingebrigtsen
  2016-02-23 17:46                                                                                 ` Richard Stallman
  2016-02-23  2:03                                                                               ` Elias Mårtenson
  2 siblings, 1 reply; 263+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-23  1:30 UTC (permalink / raw)
  To: Richard Stallman; +Cc: eliz, Elias Mårtenson, emacs-devel

Richard Stallman <rms@gnu.org> writes:

> Elias and Lars, what do you two think searching for o should match?
> Should it match ö and ø, or not?

As a Norwegian, I think o should match ö, but not ø.  For Americans, it
should match both.

> One way to handle each class is the asymmetric way: searching for the base
> character matches all of them, but searching for one of the other characters
> matches only itself.
>
> In Swedish, ö could be the base character and ø a variant.
> In Danish, ø could be the base character and ö the variant.
>
> Would each of you be happy with that mode?

Hm...  I would personally be surprised if any of these characters
matched the other characters, but that may be just me.  Others seem to
find that helpful, apparently.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22  9:20                                                                       ` Andreas Schwab
@ 2016-02-23  1:46                                                                         ` Lars Ingebrigtsen
  2016-02-23  3:38                                                                           ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-23  1:46 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Eli Zaretskii, lokedhs, emacs-devel

Andreas Schwab <schwab@suse.de> writes:

> Lars Ingebrigtsen <larsi@gnus.org> writes:
>
>> And the font seems to be
>>
>>     xft:-unknown-Ubuntu Mono-normal-normal-normal-*-24-*-*-*-m-0-iso10646-1 (#x27)
>
> For both characters?

No, the second one is

    xft:-unknown-Abyssinica SIL-normal-normal-normal-*-24-*-*-*-*-0-iso10646-1 (#x11F)

Character code properties: customize what to show
  name: COMBINING ACUTE ACCENT
  old-name: NON-SPACING ACUTE
  general-category: Mn (Mark, Nonspacing)
  decomposition: (769) ('́')

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22 18:01                                                                             ` Richard Stallman
  2016-02-22 18:58                                                                               ` Eli Zaretskii
  2016-02-23  1:30                                                                               ` Lars Ingebrigtsen
@ 2016-02-23  2:03                                                                               ` Elias Mårtenson
  2016-02-23 17:46                                                                                 ` Richard Stallman
  2 siblings, 1 reply; 263+ messages in thread
From: Elias Mårtenson @ 2016-02-23  2:03 UTC (permalink / raw)
  To: rms; +Cc: Lars Ingebrigtsen, Eli Zaretskii, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1156 bytes --]

On 23 February 2016 at 02:01, Richard Stallman <rms@gnu.org> wrote:

>
>   > But you are Danish, are you not? As such, I would have thought that when
>   > you search for ø, you would want to find a Swedish ö? (this is the inverse
>   > of the natural Swedish behaviour).
>
> Elias and Lars, what do you two think searching for o should match?
> Should it match ö and ø, or not?
>

I can only speak for Swedish, and there, a search for o definitely should
not match ö (nor ø). This is the crux of this entire discussion, at least
for me.

However, a search for ö should match ø.


> IF you want o not to match ö and ø, then you want ö and ø to be a
> class by themselves.
>
> One way to handle each class is the asymmetric way: searching for the base
> character matches all of them, but searching for one of the other characters
> matches only itself.
>
> In Swedish, ö could be the base character and ø a variant.
> In Danish, ø could be the base character and ö the variant.
>
> Would each of you be happy with that mode?


This is exactly in line with what I have been proposing.

Regards,
Elias

[-- Attachment #2: Type: text/html, Size: 1741 bytes --]

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-23  1:46                                                                         ` Lars Ingebrigtsen
@ 2016-02-23  3:38                                                                           ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-23  3:38 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: schwab, lokedhs, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Eli Zaretskii <eliz@gnu.org>,  lokedhs@gmail.com,  emacs-devel@gnu.org
> Date: Tue, 23 Feb 2016 12:46:06 +1100
> 
> Andreas Schwab <schwab@suse.de> writes:
> 
> > Lars Ingebrigtsen <larsi@gnus.org> writes:
> >
> >> And the font seems to be
> >>
> >>     xft:-unknown-Ubuntu Mono-normal-normal-normal-*-24-*-*-*-m-0-iso10646-1 (#x27)
> >
> > For both characters?
> 
> No, the second one is
> 
>     xft:-unknown-Abyssinica SIL-normal-normal-normal-*-24-*-*-*-*-0-iso10646-1 (#x11F)

That's why you see them separate: Emacs can only compose characters if
their glyphs come from the same font.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22 18:58                                                                               ` John Wiegley
@ 2016-02-23  7:50                                                                                 ` Per Starbäck
  2016-02-23 16:29                                                                                   ` John Wiegley
  0 siblings, 1 reply; 263+ messages in thread
From: Per Starbäck @ 2016-02-23  7:50 UTC (permalink / raw)
  To: John Wiegley, Lars Ingebrigtsen, Eli Zaretskii, lokedhs,
	emacs-devel@gnu.org

2016-02-22 19:58 GMT+01:00 John Wiegley <jwiegley@gmail.com>:

> I'm worried that this very long discussion on character-folding is going
> nowhere. We're over 200 messages now, and it seems that the same arguments are
> being repeated about what does and does not constitute a letter to be folded.
> Or maybe my eyes glazed over, and that's what I think I'm seeing...

I would have liked a more focused discussion on the most pressing
issue, namely what to do regarding this in the upcoming release which
is currently in pretest.

Therefore I have avoided discussion on how to make the folding better
in the future, even though I have my views on details on the ideal way
to handle o vs ö vs ø, or how useful collation rules are, or how
useful a user's locale settings are, etc. Artur had an interesting
post on how he plans to make it better which I'd like to comment on
someday, but won't for the time being, because it just detracts.

All of this is interesting, but the planned substantial improvements
in character folding will not be in the next released version, so none
of those details matter and it's essentially just a question of a
default setting of off or on for the feature as it currently stands. I
think it has been shown without doubt that the feature as it currently
stands will lead to many disappointed users. As Artur has written:

> It's important that the default be helpful,
> without appearing to be "buggy" to unsuspecting users.

That is the view of what I understand to be the main developer of this
feature, who tried to set the default to off. I think this should have
been settled then, and think that Eli's view that it can be decided
later is just wrong. Pretests should test what we intend to ship.
Saying that it can be changed at the last moment just invites a
last-minute error, for example that someone forgets to update
the documentation that goes along with it. No more data is needed for
this decision. (More data and more discussion may be needed for
finding the best way forward after that, but that is something else.)

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-23  7:50                                                                                 ` Per Starbäck
@ 2016-02-23 16:29                                                                                   ` John Wiegley
  0 siblings, 0 replies; 263+ messages in thread
From: John Wiegley @ 2016-02-23 16:29 UTC (permalink / raw)
  To: Per Starbäck
  Cc: Lars Ingebrigtsen, lokedhs, Eli Zaretskii, emacs-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 1016 bytes --]

>>>>> Per Starbäck <per.starback@gmail.com> writes:

> That is the view of what I understand to be the main developer of this
> feature, who tried to set the default to off. I think this should have been
> settled then, and think that Eli's view that it can be decided later is just
> wrong.

I agree that the pretest should be a pre-test, not a candidate run for
features that won't appear in the final release.

I think the hope was that pretesting would reveal that people want character
folding, and so it really was a candidate for the next release. But I'm
getting a strong impression that character folding isn't quite ready for
prime-time as a default feature.

So right now, I'm looking for arguments that it *should* be made the default;
otherwise, it seems wise to me to let it wait until things have been hammered
out a lot more.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 629 bytes --]

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-21  8:14                                                                     ` Achim Gratz
@ 2016-02-23 16:56                                                                       ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-23 16:56 UTC (permalink / raw)
  To: Achim Gratz; +Cc: emacs-devel

> From: Achim Gratz <Stromeko@nexgo.de>
> Date: Sun, 21 Feb 2016 09:14:18 +0100
> 
> Elias Mårtenson writes:
> > Because under the Unicode decomposition rules, ø is not decomposable. I
> > can't explain why that is the case (probably because there is no reason to
> > have a combining /. After all, the only languages that use ø are languages
> > that use it as a character of its own).
> 
> AFAIK, for combining characters to be composable/decomposable the glyphs
> must not overlap.  This is the same issue as with the polish »ł« to the
> best of my knowledge.

The definitive answer is here, for those interested:

  http://www.unicode.org/mail-arch/unicode-ml/y2016-m02/0106.html

> In other words, unicode composition/decomposition rules tell you more
> about the glyph construction than they do about useful strategies to
> search for multiple characters.

That conclusion is too radical, IMO.  You will see in the above
message that the criterion you describe was just a means for the UTC
to draw a line somewhere, i.e. it was an ad-hoc rule more than
anything else.

> The idea of using the base character of the canonical decomposition
> in the search might still yield a useful shortcut in most cases, but
> I'm not sure it is correct in all languages even when that
> decomposition exists and, as the examples show, there are cases
> where the non-decomposed character has to be treated specially.

Language-specific tailoring is indeed needed for best results, but the
language-independent decompositions have their place.  E.g., you will
see in the Unicode collation database (UCA) a file named decomps.txt
that is basically a list of decompositions from UnicodeData.txt with
additions specifically for collation, searching, and matching
(including ł, btw).  Which tells me that the decomposition data in
UnicodeData.txt is a good basis for these features, it is not just
about glyph constructions.
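
To make the point about the decomposition data concrete, here is a
minimal sketch in Python (not Emacs code) that reads the same
UnicodeData.txt decompositions through the standard `unicodedata`
module.  The helper name `base_char` is made up for illustration, and
it looks at one level of decomposition only.

```python
import unicodedata

def base_char(ch):
    """Return the first code point of ch's canonical decomposition, or ch itself."""
    decomp = unicodedata.decomposition(ch)
    # Compatibility mappings start with a tag such as "<compat>"; skip those.
    if not decomp or decomp.startswith("<"):
        return ch
    return chr(int(decomp.split()[0], 16))

# ö decomposes into o + COMBINING DIAERESIS; ø and ł have no decomposition.
for ch in "\u00f6\u00f8\u0142":
    print(ch, repr(unicodedata.decomposition(ch)), base_char(ch))
```

Characters such as ø and ł come back unchanged, which is why a
supplementary list of mappings would be needed on top of the raw
decompositions.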

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-23  0:14                                                                             ` Juri Linkov
@ 2016-02-23 17:11                                                                               ` Eli Zaretskii
  2016-02-24  0:16                                                                                 ` Juri Linkov
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-23 17:11 UTC (permalink / raw)
  To: Juri Linkov; +Cc: larsi, lokedhs, rms, emacs-devel

> From: Juri Linkov <juri@linkov.net>
> Cc: rms@gnu.org,  larsi@gnus.org,  lokedhs@gmail.com,  emacs-devel@gnu.org
> Date: Tue, 23 Feb 2016 02:14:55 +0200
> 
> > But the most basic issue is that any significant development in these
> > directions requires re-implementing the feature on the C level, using
> > char-tables for folding, like we do with case-mapping.  So until
> > someone steps forward for the job, all we can do is small corrections
> > to the existing implementation.
> 
> Do I understand correctly that essentially what is necessary to do on the
> C level is to extend char-tables with character insertions and deletions,
> so in addition to canonical equivalence mappings (like are used for the
> existing case-mappings) char-tables should also support matching of
> multi-character additions (like combining accents in the search
> string) and deletions (like combining accents from the search string
> missing in the search text)?

I'm not sure I understand why you think char-tables need to be
extended in support of folding search.  AFAIU, we need a way to
normalize each character, both in the search string and in the
buffer/string we search.  This normalization involves decomposition
followed by reordering the combining diacritics into a canonical
order.  Then we just match one against the other, almost as usual
("almost" because we need to backtrack in the buffer/string upon
mismatch).  (Of course, decomposition of buffer/string text needs to
be done on the fly, but this is an implementation detail unrelated to
this discussion.)

So we need a char-table that maps each character into its
decomposition sequence, which AFAIR is something the current
char-tables can support already.  Am I missing something?
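
The normalize-then-match idea can be sketched as follows, in Python
rather than C or Lisp.  This is only an illustration under simplifying
assumptions: it normalizes whole strings up front instead of on the
fly, and the function names `normalize` and `folded_find` are invented
here.

```python
import unicodedata

def normalize(text):
    # NFD decomposes characters and puts combining marks into canonical order.
    return unicodedata.normalize("NFD", text)

def folded_find(needle, haystack):
    """Return the index in the normalized haystack where needle matches, or -1."""
    return normalize(haystack).find(normalize(needle))

precomposed = "sm\u00f6rg\u00e5s"      # ö and å as single code points
decomposed = "smo\u0308rga\u030as"      # o + diaeresis, a + ring above
print(folded_find(precomposed, decomposed))

# Combining marks typed in a different order normalize to the same sequence.
a = "a\u0301\u0323"   # acute (combining class 230), then dot below (class 220)
b = "a\u0323\u0301"
print(normalize(a) == normalize(b))
```

Note that the returned index refers to the normalized text, so a real
implementation would still have to map it back to a buffer position.
A plain substring search like this one also lets "o" match the first
half of a decomposed "ö", so some boundary check would be needed as
well.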

If you are interested in the details, I suggest reading
http://unicode.org/reports/tr10/ and in particular
http://unicode.org/reports/tr10/#Searching, which deals specifically
with searching.  http://www.unicode.org/notes/tn5/ is also useful
reading.

> > For example, the default state of character-folding might depend on
> > the locale's language -- we could turn it off by default for languages
> > whose users expressed dissatisfaction with the feature.  We could also
> > augment the regular expressions created for folding the search string
> > by filtering out variants that users of a particular language don't
> > want.  If people think these ideas will make more users happy, we can
> > work on that.
> 
> It seems two user variables are necessary for customization:
> 
> 1. inclusive folding groups that will include by default such pairs
>    as o - ø, l - ł added to the Unicode decomposition-based rules,
>    and allow the users to add more rules;
> 
> 2. exclusive folding groups to exclude locale/language-dependent rules from
>    the default mappings above, e.g. removing n - ñ for the "es" locale.

I think we should add those in item 1 unconditionally (i.e. include
them in the default mappings), and then exclude some of them under the
rules you describe in item 2.  Then the problem becomes easier, as we
only need to filter out some mappings, as determined by a single user
variable (whose default can come from the user locale).

The additional mappings can be picked up from the file decomps.txt in
the UCA database.
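
A rough sketch of that "include by default, then filter by language"
scheme, in Python for illustration only.  The tables `EXTRA_FOLDS` and
`EXCLUDED_BY_LANGUAGE` and their contents are hypothetical; real data
would come from decomps.txt and from user preferences.

```python
import unicodedata

# Hypothetical extra pairs that have no canonical decomposition (item 1).
EXTRA_FOLDS = {"\u00f8": "o", "\u0142": "l"}          # ø -> o, ł -> l

# Hypothetical per-language exclusions (item 2); the data is made up.
EXCLUDED_BY_LANGUAGE = {
    "es": {"\u00f1"},                                  # ñ
    "sv": {"\u00f6", "\u00f8", "\u00e5", "\u00e4"},  # ö ø å ä
}

def fold_char(ch, language=None):
    """Map ch to its base letter unless the language excludes it from folding."""
    if ch in EXCLUDED_BY_LANGUAGE.get(language, set()):
        return ch
    if ch in EXTRA_FOLDS:
        return EXTRA_FOLDS[ch]
    # Canonical decomposition, keeping only the non-combining characters.
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def fold(text, language=None):
    return "".join(fold_char(ch, language) for ch in text)

print(fold("se\u00f1or \u00f8"))         # no language: everything folds
print(fold("se\u00f1or \u00f8", "es"))   # Spanish: ñ is left alone
```

With this shape, a single exclusion variable per language is enough,
because the default mappings are always present and only get filtered.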

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22 18:57                                                                             ` Eli Zaretskii
@ 2016-02-23 17:43                                                                               ` Richard Stallman
  2016-02-23 18:03                                                                                 ` Eli Zaretskii
  2016-02-23 17:43                                                                               ` Richard Stallman
       [not found]                                                                               ` <<E1aYGze-000655-RM@fencepost.gnu.org>
  2 siblings, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-23 17:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > Some plausible meanings are
  > > 
  > > * Find that one and only that one.
  > > * Treat it the same as specifying the base letter.
  > > 
  > > There should be a user option to choose between those two (and maybe
  > > some other behaviors for a non-base letter in the search string).

  > We already have both options, and in particular, if a non-base letter
  > appears explicitly in the search string, it will be searched
  > literally, similarly to what we do with case-insensitive search.

Some users want that.  Some, it appears, want searching for any letter
in the group to find any letter in the group.  So I am suggesting we
offer both behaviors.
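
As an illustration of the two behaviors, here is a small Python sketch
(not the character-fold.el code).  The single folding group and the
names `GROUPS`, `char_pattern` and `search_regex` are assumptions made
up for this example; the first member of a group is taken as its base.

```python
import re

# One hypothetical folding group: the first character is the base.
GROUPS = [("o", "\u00f6", "\u00f8")]   # o, ö, ø

def char_pattern(ch, symmetric):
    """Return a regex fragment for one character of the search string."""
    for group in GROUPS:
        if ch in group:
            base = group[0]
            # Asymmetric: only the base expands.  Symmetric: any member does.
            if symmetric or ch == base:
                return "[" + "".join(re.escape(c) for c in group) + "]"
    return re.escape(ch)

def search_regex(text, symmetric=False):
    return re.compile("".join(char_pattern(ch, symmetric) for ch in text))

words = "ol \u00f6l \u00f8l"
print(search_regex("ol").findall(words))                        # base: all three
print(search_regex("\u00f8l").findall(words))                   # variant: itself
print(search_regex("\u00f8l", symmetric=True).findall(words))   # all three
```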

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22 18:57                                                                             ` Eli Zaretskii
  2016-02-23 17:43                                                                               ` Richard Stallman
@ 2016-02-23 17:43                                                                               ` Richard Stallman
       [not found]                                                                               ` <<E1aYGze-000655-RM@fencepost.gnu.org>
  2 siblings, 0 replies; 263+ messages in thread
From: Richard Stallman @ 2016-02-23 17:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > These groups need to depend on the language setting.  In English (and
  > > in French), ö is a modified o.  In Swedish (and German, I think), ö
  > > and o are different letters.

  > > I think that each folding group should specify one character that is
  > > the base.

  > I'm not sure what that means.  What is a "folding group"?

A group of characters which, under certain circumstances, isearch
should fold together (treat as equivalent).

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22 19:06                                                                                 ` Eli Zaretskii
@ 2016-02-23 17:43                                                                                   ` Richard Stallman
  2016-02-23 18:14                                                                                     ` Eli Zaretskii
  2016-02-23 20:21                                                                                     ` Yuri Khan
  0 siblings, 2 replies; 263+ messages in thread
From: Richard Stallman @ 2016-02-23 17:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Some minimal amount of folding will nevertheless be necessary even in
  > asymmetric mode, in order to find character sequences produced by
  > decomposing characters like ö into o and the combining mark ̈.  That's
  > because these two characters when juxtaposed (ö) look identical to the
  > precomposed character on most displays, so we should by default find
  > such decomposed sequences even when the search string includes the
  > precomposed character.

That is interesting.  It means we need several levels of folding:

* Different appearances of the same letter+decorations:
  as a single code point, or as a composition.

* Identical-looking distinct code points (Latin a and Cyrillic a).

* The same letter with different decorations (o and ö in English).

* Equivalent letters (ö and ø in Swedish).

* Non-equivalent letters modified from a common base (o and ö in
  Swedish).

The first level is language-independent and should be handled
symmetrically, with each folding group as an equivalence class.

Is there any need, ever, to disable the first level?
Perhaps it would be good to enable that all the time.

The second level is also language-independent.  Does anyone ever want
to turn it off?

The other levels are language-specific, and the user might want to
enable or disable them.  When enabled, the user might want them
handled symmetrically or asymmetrically.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-23  1:30                                                                               ` Lars Ingebrigtsen
@ 2016-02-23 17:46                                                                                 ` Richard Stallman
  2016-02-24  1:50                                                                                   ` Lars Ingebrigtsen
  0 siblings, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-23 17:46 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: eliz, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > As a Norwegian, I think o should match ö, but not ø.

Could you explain why that would be best for you?

  > Hm...  I would personally be surprised if any of these characters
  > matched the other characters, but that may be just me.  Others seem to
  > find that helpful, apparently.

Using my proposed levels (see the other message in this batch), I
think you would want to turn off this level

* Equivalent letters (ö and ø in Swedish).

and turn on this level, asymmetrically.

* Non-equivalent letters with a common base (o and ö/ø in Swedish).

Would you be happy with that?

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-23  2:03                                                                               ` Elias Mårtenson
@ 2016-02-23 17:46                                                                                 ` Richard Stallman
  0 siblings, 0 replies; 263+ messages in thread
From: Richard Stallman @ 2016-02-23 17:46 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: larsi, eliz, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I can only speak for Swedish, and there, a search for o definitely should
  > not match ö (nor ø). This is the crux of this entire discussion, at least
  > for me.

  > However, a search for ö should match ø.

Using my proposed levels (see the other message in this batch), I
think you would want to turn on this level, asymmetrically,

* Equivalent letters (ö and ø in Swedish).

and turn off this level.

* Non-equivalent letters with a common base (o and ö/ø in Swedish).

Would you be happy with that?

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* RE: On language-dependent defaults for character-folding
       [not found]                                                                               ` <<E1aYGze-000655-RM@fencepost.gnu.org>
@ 2016-02-23 18:00                                                                                 ` Drew Adams
  0 siblings, 0 replies; 263+ messages in thread
From: Drew Adams @ 2016-02-23 18:00 UTC (permalink / raw)
  To: rms, Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

>   > > Some plausible meanings are
>   > >
>   > > * Find that one and only that one.
>   > > * Treat it the same as specifying the base letter.
>   > >
>   > > There should be a user option to choose between those two (and maybe
>   > > some other behaviors for a non-base letter in the search string).
>   >
>   > We already have both options, and in particular, if a non-base letter
>   > appears explicitly in the search string, it will be searched
>   > literally, similarly to what we do with case-insensitive search.
> 
> Some users want that.  Some, it appears, want searching for any letter
> in the group to find any letter in the group.  So I am suggesting we
> offer both behaviors.

+1.

That is what I did, BTW, in my add-on to character-fold.el
(option `char-fold-symmetric').

And the same user can want one or the other behavior at different
times or in different contexts.

Besides choosing a behavior as a general preference at customize
time, you can toggle the behavior during Isearch, using `M-s ='
(command `isearchp-toggle-symmetric-char-fold'):

  Toggle option `char-fold-symmetric'.
  This does not also toggle character folding.

  Note that symmetric character folding can slow down search.
  Use longer search strings to reduce this problem, or use `M-s h L'
  to turn off lazy highlighting.

Moving some of the character-fold.el implementation to C would
no doubt speed things up.  But I hope that that will be done in a
fine-grained modular way, providing individual Lisp functions that
users can tweak.

For example, I might not have been able to add this alternative
behavior easily, were it not for the current regexp-using code
in character-fold.el.  I don't expect the same Lisp functions to
be available after the implementation of some things in C, but
let's try to make sure that a C implementation is not monolithic,
preventing easy extension using Lisp.

(No, I'm not suggesting that that has been the case in the past.
Just mentioning the need to be able to extend and experiment in
Lisp.)

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-23 17:43                                                                               ` Richard Stallman
@ 2016-02-23 18:03                                                                                 ` Eli Zaretskii
  2016-02-24 13:41                                                                                   ` Richard Stallman
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-23 18:03 UTC (permalink / raw)
  To: rms; +Cc: larsi, lokedhs, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: lokedhs@gmail.com, larsi@gnus.org, emacs-devel@gnu.org
> Date: Tue, 23 Feb 2016 12:43:26 -0500
> 
>   > We already have both options, and in particular, if a non-base letter
>   > appears explicitly in the search string, it will be searched
>   > literally, similarly to what we do with case-insensitive search.
> 
> Some users want that.  Some, it appears, want searching for any letter
> in the group to find any letter in the group.  So I am suggesting we
> offer both behaviors.

That's okay, but if we do, shouldn't we have similar options for
case-folding and perhaps also for "lax-space" matching?  Currently
they all behave asymmetrically.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-23 17:43                                                                                   ` Richard Stallman
@ 2016-02-23 18:14                                                                                     ` Eli Zaretskii
  2016-02-23 20:24                                                                                       ` Yuri Khan
                                                                                                         ` (2 more replies)
  2016-02-23 20:21                                                                                     ` Yuri Khan
  1 sibling, 3 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-23 18:14 UTC (permalink / raw)
  To: rms; +Cc: larsi, lokedhs, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, lokedhs@gmail.com, emacs-devel@gnu.org
> Date: Tue, 23 Feb 2016 12:43:56 -0500
> 
> That is interesting.  It means we need several levels of folding:
> 
> * Different appearances of the same letter+decorations:
>   as a single code point, or as a composition.
> 
> * Identical-looking distinct code points (Latin a and Cyrillic a).

This one is a very specialized feature needed only in some marginal
use cases (like looking for the so-called "confusables" -- characters
that look the same and could be used for deception, e.g. in URLs).

> * The same letter with different decorations (o and ö in English).
> 
> * Equivalent letters (ö and ø in Swedish).

Not just letters -- sequences of characters.  For example, å vs aa in
Danish, or ﬃ vs ffi.
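
A sketch of what sequence equivalents mean for matching, written in
Python purely as an illustration.  The ligature expansion comes from
Unicode compatibility decomposition; the Danish å/aa entry in
`SEQUENCE_FOLDS` is hypothetical tailoring data, and the function names
are invented here.

```python
import re
import unicodedata

# Hypothetical language-specific equivalents that span several characters.
SEQUENCE_FOLDS = {"da": {"\u00e5": ["aa"]}}   # å -> aa in Danish

def alternatives(ch, language=None):
    """Collect the strings that should match a single search character."""
    result = [ch]
    # Compatibility decomposition expands ligatures such as U+FB03 to "ffi".
    compat = unicodedata.normalize("NFKD", ch)
    if compat != ch:
        result.append(compat)
    result.extend(SEQUENCE_FOLDS.get(language, {}).get(ch, []))
    return result

def search_regex(text, language=None):
    parts = []
    for ch in text:
        alts = "|".join(re.escape(a) for a in alternatives(ch, language))
        parts.append("(?:" + alts + ")")
    return re.compile("".join(parts))

print(search_regex("\u00e5", "da").findall("\u00e5rhus aarhus"))
print(search_regex("\ufb03").findall("o\ufb03ce office"))
```

This only covers one direction, from the single character in the
search string to the longer sequence in the text.  Going the other way
(typing "ffi" and finding the ligature) requires looking at several
characters of the search string at once.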

> Is there any need, ever, to disable the first level?

One could imagine a use case when you want to find only precomposed
characters, not their decomposed equivalents.  But it should be rare
indeed.

> The other levels are language-specific, and the user might want to
> enable or disable them.

Not all of them are language-specific.  Some are valid in any
language.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-23 17:43                                                                                   ` Richard Stallman
  2016-02-23 18:14                                                                                     ` Eli Zaretskii
@ 2016-02-23 20:21                                                                                     ` Yuri Khan
  2016-02-23 21:15                                                                                       ` Marcin Borkowski
  1 sibling, 1 reply; 263+ messages in thread
From: Yuri Khan @ 2016-02-23 20:21 UTC (permalink / raw)
  To: rms@gnu.org; +Cc: Eli Zaretskii, lokedhs, Lars Ingebrigtsen, Emacs developers

On Tue, Feb 23, 2016 at 11:43 PM, Richard Stallman <rms@gnu.org> wrote:

> That is interesting.  It means we need several levels of folding:
>
> * Identical-looking distinct code points (Latin a and Cyrillic a).
> […]
> The second level is also language-independent.  Does anyone ever want
> to turn it off?

I see no reason to ever turn it on.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-23 18:14                                                                                     ` Eli Zaretskii
@ 2016-02-23 20:24                                                                                       ` Yuri Khan
  2016-02-25 12:11                                                                                         ` Richard Stallman
  2016-02-24 13:41                                                                                       ` Richard Stallman
  2016-02-24 13:41                                                                                       ` Richard Stallman
  2 siblings, 1 reply; 263+ messages in thread
From: Yuri Khan @ 2016-02-23 20:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, lokedhs, rms@gnu.org, Emacs developers

On Wed, Feb 24, 2016 at 12:14 AM, Eli Zaretskii <eliz@gnu.org> wrote:

>> * Identical-looking distinct code points (Latin a and Cyrillic a).
>
> This one is a very specialized feature needed only in some marginal
> use cases (like looking for the so-called "confusables" -- characters
> that look the same and could be used for deception, e.g. in URLs).

When looking for confusables, you don’t want to fold. You want to make
letters of different scripts stand out, e.g. by font-locking.
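
For what it is worth, the detection half of that idea can be sketched
in a few lines of Python.  This is a heuristic for illustration only:
it guesses the script from the first word of the Unicode character
name, which is an assumption of this sketch and not how a real
confusables check works, and the function names are made up.

```python
import unicodedata

def script_of(ch):
    """Rough script guess: the first word of the Unicode character name."""
    if not ch.isalpha():
        return None
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def mixed_script_words(text):
    """Return the words whose letters come from more than one script."""
    suspicious = []
    for word in text.split():
        scripts = {script_of(ch) for ch in word} - {None}
        if len(scripts) > 1:
            suspicious.append((word, sorted(scripts)))
    return suspicious

# The first "a" in the second word is U+0430 CYRILLIC SMALL LETTER A.
print(mixed_script_words("paypal p\u0430ypal"))
```

The words it returns are the ones a highlighter would then mark.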



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-23 20:21                                                                                     ` Yuri Khan
@ 2016-02-23 21:15                                                                                       ` Marcin Borkowski
  0 siblings, 0 replies; 263+ messages in thread
From: Marcin Borkowski @ 2016-02-23 21:15 UTC (permalink / raw)
  To: Yuri Khan
  Cc: Lars Ingebrigtsen, Eli Zaretskii, lokedhs, rms@gnu.org,
	Emacs developers


On 2016-02-23, at 21:21, Yuri Khan <yuri.v.khan@gmail.com> wrote:

> On Tue, Feb 23, 2016 at 11:43 PM, Richard Stallman <rms@gnu.org> wrote:
>
>> That is interesting.  It means we need several levels of folding:
>>
>> * Identical-looking distinct code points (Latin a and Cyrillic a).
>> […]
>> The second level is also language-independent.  Does anyone ever want
>> to turn it off?
>
> I see no reason to ever turn it on.

I do, but it is indeed an extremely specialized case, and it is unlikely
that anyone would use Emacs for that anyway.

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-23 17:11                                                                               ` Eli Zaretskii
@ 2016-02-24  0:16                                                                                 ` Juri Linkov
  2016-02-24 18:39                                                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Juri Linkov @ 2016-02-24  0:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, lokedhs, rms, emacs-devel

>> > But the most basic issue is that any significant development in these
>> > directions require to re-implement the feature on the C level, and use
>> > char-tables for folding, like we do with case-mapping.  So until
>> > someone steps forward for the job, all we can do is small corrections
>> > to the existing implementation.
>>
>> Do I understand correctly that essentially what is necessary to do on the
>> C level is to extend char-tables with character insertions and deletions,
>> so in addition to canonical equivalence mappings (like are used for the
>> existing case-mappings) char-tables should also support matching of
>> multi-character additions (like combining accents in the search
>> string) and deletions (like combining accents from the search string
>> missing in the search text)?
>
> I'm not sure I understand why you think char-tables need to be
> extended in support of folding search.  AFAIU, we need a way to
> normalize each character, both in the search string and in the
> buffer/string we search.  This normalization involves decomposition
> followed by reordering the combining diacritics into a canonical
> order.  Then we just match one against the other, almost as usual
> ("almost" because we need to backtrack in the buffer/string upon
> mismatch).  (Of course, decomposition of buffer/string text needs to
> be done on the fly, but this is an implementation detail unrelated to
> this discussion.)
>
> So we need a char-table that maps each character into its
> decomposition sequence, which AFAIR is something the current
> char-tables can support already.  Am I missing something?

Searching for a base character and matching a sequence of characters
(e.g. a base character and combining accents) might be already possible
by the current char-tables indexed by a base character.  But I see
no way to specify such a mapping in a char-table that e.g.
a character should be skipped in the search buffer.  Maybe this need
could be avoided in an asymmetric search with combining characters
in the search buffer, but still is required for ignorable characters.

> If you are interested in the details, I suggest reading
> http://unicode.org/reports/tr10/ and in particular
> http://unicode.org/reports/tr10/#Searching, which deals specifically
> with searching.  http://www.unicode.org/notes/tn5/ is also a useful
> reading.

Thanks, looks like a complete specification with comprehensive answers
to most questions.

>> > For example, the default state of character-folding might depend on
>> > the locale's language -- we could turn it off by default for languages
>> > whose users expressed dissatisfaction with the feature.  We could also
>> > augment the regular expressions created for folding the search string
>> > by filtering out variants that users of a particular language don't
>> > want.  If people think these ideas will make more users happy, we can
>> > work on that.
>>
>> It seems two user variables are necessary for customization:
>>
>> 1. inclusive folding groups that will include by default such pairs
>>    as o - ø, l - ł added to the Unicode decomposition-based rules,
>>    and allow the users to add more rules;
>>
>> 2. exclusive folding groups to exclude locale/language-dependent rules from
>>    the default mappings above, e.g. removing n - ñ for the "es" locale.
>
> I think we should add those in item 1 unconditionally (i.e. include
> them in the default mappings), and then exclude some of them under the
> rules you describe in item 2.  Then the problem becomes easier, as we
> only need to filter out some mappings, as determined by a single user
> variable (whose default can come from the user locale).

Better to have 4 variables (2 internal + 2 user customizable variables):

1.1. (internal) default mappings with additional data from decomps.txt

1.2. user mappings to add to the default list

2.1. (internal) locale-dependent mappings to remove from the default list

2.2. user mappings to remove from the default list
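
A rough Python sketch of how these four lists could combine, assuming each one is a mapping from a base character to a set of variants. The function name, the data layout, and the order of application (locale removals first, then user additions, then user removals) are assumptions made for illustration, not an existing Emacs interface.

```python
def effective_mappings(default, user_add, locale_remove, user_remove):
    """Combine the four lists into the mappings actually used for search."""
    result = {base: set(variants) for base, variants in default.items()}

    def subtract(removals):
        for base, variants in removals.items():
            if base in result:
                result[base] -= set(variants)

    subtract(locale_remove)                      # 2.1: locale-dependent removals
    for base, variants in user_add.items():      # 1.2: user additions
        result.setdefault(base, set()).update(variants)
    subtract(user_remove)                        # 2.2: user removals win last
    # Drop base characters that no longer fold to anything.
    return {base: variants for base, variants in result.items() if variants}

default = {"o": {"ö", "ø"}, "n": {"ñ"}, "l": {"ł"}}   # 1.1 (tiny excerpt)
es_locale = {"n": {"ñ"}}                              # 2.1 for the "es" locale
mappings = effective_mappings(default, {"a": {"å"}}, es_locale, {"o": {"ø"}})
```

Applying the user's removals last means an explicit user choice always overrides both the defaults and the locale.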

> The additional mappings can be picked up from the file decomps.txt in
> the UCA database.

It would be good to find all differences between UnicodeData.txt and
decomps.txt.  Is this the latest version?
http://unicode.org/Public/UCA/6.3.0/decomps.txt



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-23 17:46                                                                                 ` Richard Stallman
@ 2016-02-24  1:50                                                                                   ` Lars Ingebrigtsen
  2016-02-24  6:40                                                                                     ` Lars Brinkhoff
  2016-02-24 13:43                                                                                     ` Richard Stallman
  0 siblings, 2 replies; 263+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-24  1:50 UTC (permalink / raw)
  To: Richard Stallman; +Cc: eliz, lokedhs, emacs-devel

Richard Stallman <rms@gnu.org> writes:

>   > As a Norwegian, I think o should match ö, but not ø.
>
> Could you explain why that would be best for you?

ø is a different letter from o in our 29 letter alphabet, and is a
separate key on our keyboards.  ö is just a variation of o.

>   > Hm...  I would personally be surprised if any of these characters
>   > matched the other characters, but that may be just me.  Others seem to
>   > find that helpful, apparently.
>
> Using my proposed levels (see the other message in this batch), I
> think you would want to turn off this level
>
> * Equivalent letters (ö and ø in Swedish).
>
> and turn on this level, asymmetrically.
>
> * Non-equivalent letters with a common base (o and ö/ø in Swedish).
>
> Would you be happy with that?

Uhm...  I'm not quite sure.  This is all getting so complicated.  :-)

The original, and quite easy to understand, feature being discussed was
that if you search for "e", then all "e" variations should be found.
("Variation" here is "all those diacritics those furriners use all the
time".)  That's a feature I can get behind, and I think everybody would
like.
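
As a rough illustration of that simple feature, here is a minimal Python sketch. It assumes that a "variation" means a letter whose canonical decomposition (NFD) is a base letter plus combining marks; the helper names are made up for this example and are not Emacs functions.

```python
import unicodedata

def fold_diacritics(text):
    """Decompose canonically (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def folded_contains(needle, haystack):
    """True if needle occurs in haystack once diacritics are ignored."""
    return fold_diacritics(needle) in fold_diacritics(haystack)

# "ö" decomposes to "o" plus a combining diaeresis, so "o" finds it.
found_umlaut = folded_contains("o", "smör")
# "ø" has no canonical decomposition, so plain "o" does not find it.
found_slash = folded_contains("o", "ø")
```

Note that this sketch is symmetric (searching for "ö" also finds "o"), and that it leaves ø and ł alone because UnicodeData.txt gives them no decomposition.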

All this talk about equivalence classes feels like a totally different
feature.  Sure, in (older) Danish "å" can be spelled "aa", and they were
sorted the same way, so they're "equivalent".  But that's a totally
different and separate feature set.

It's the same with Swedes wanting ö and ø to be found.  It's out of the
scope of the simple, diacritic-ignoring feature that Emacs should
definitely have.

I think.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-24  1:50                                                                                   ` Lars Ingebrigtsen
@ 2016-02-24  6:40                                                                                     ` Lars Brinkhoff
  2016-02-24 13:43                                                                                     ` Richard Stallman
  1 sibling, 0 replies; 263+ messages in thread
From: Lars Brinkhoff @ 2016-02-24  6:40 UTC (permalink / raw)
  To: emacs-devel

Lars Ingebrigtsen <larsi@gnus.org> writes:
> Richard Stallman <rms@gnu.org> writes:
>>   > As a Norwegian, I think o should match ö, but not ø.
>> Could you explain why that would be best for you?
> ø is a different letter from o in our 29 letter alphabet, and
> is a separate key on our keyboards.  ö is just a variation of
> o.

Maybe your point about the keyboard can be a useful illustration
in the debate.  (Maybe it has been brought up already, in which
case I apologize.)

An English-speaking user would typically have a keyboard with
the letters a-z, so it can be quite handy to have o match ö and
ø, and n match ñ, because it's somewhat inconvenient to type
those letters on such a keyboard.

Swedish-speaking users probably have keyboards with a separate
ö key, so it's easy to search for ö without any folding.  (The
situation for ø is less clear; I can imagine that some Swedish
user would like it to be matched by both o and ö, or just o, or
just ö.)

Similarly, Spanish keyboards have a separate ñ key (I learned
that just now).

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-09 17:26 On language-dependent defaults for character-folding Artur Malabarba
                   ` (4 preceding siblings ...)
  2016-02-10 13:52 ` Adrian.B.Robert
@ 2016-02-24  9:58 ` Marcin Borkowski
  5 siblings, 0 replies; 263+ messages in thread
From: Marcin Borkowski @ 2016-02-24  9:58 UTC (permalink / raw)
  To: bruce.connor.am; +Cc: emacs-devel

Related (well, sort of): http://xkcd.com/1647/

;-)

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-23 18:03                                                                                 ` Eli Zaretskii
@ 2016-02-24 13:41                                                                                   ` Richard Stallman
  0 siblings, 0 replies; 263+ messages in thread
From: Richard Stallman @ 2016-02-24 13:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > That's okay, but if we do, shouldn't we have similar options for
  > case-folding and perhaps also for "lax-space" matching?

Not necessarily.  There is no principle that says we have to give
feature A whatever customizations we give to feature B.

We could implement these options for case folding and whitespace
matching if users want them.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-23 18:14                                                                                     ` Eli Zaretskii
  2016-02-23 20:24                                                                                       ` Yuri Khan
@ 2016-02-24 13:41                                                                                       ` Richard Stallman
  2016-02-24 17:54                                                                                         ` Eli Zaretskii
  2016-02-24 13:41                                                                                       ` Richard Stallman
  2 siblings, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-24 13:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > The other levels are language-specific, and the user might want to
  > > enable or disable them.

  > Not all of them are language-specific.  Some are valid in any
  > language.

Could you explain that more concretely?

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-23 18:14                                                                                     ` Eli Zaretskii
  2016-02-23 20:24                                                                                       ` Yuri Khan
  2016-02-24 13:41                                                                                       ` Richard Stallman
@ 2016-02-24 13:41                                                                                       ` Richard Stallman
  2016-02-24 17:56                                                                                         ` Eli Zaretskii
  2 siblings, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-24 13:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > * Equivalent letters (ö and ø in Swedish).

  > Not just letters -- sequences of characters.  For example, å vs aa in
  > Danish, or ﬃ vs ffi.

å and aa in Danish are equivalent, like ö and ø in Swedish.

Ligatures such as ﬃ are a different issue entirely.
The relationship between ﬃ vs ffi is language-independent
and similar to these two levels:

 * Different appearances of the same letter+decorations:
   as a single code point, or as a composition.

 * Identical-looking distinct code points (Latin a and Cyrillic a).

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-24  1:50                                                                                   ` Lars Ingebrigtsen
  2016-02-24  6:40                                                                                     ` Lars Brinkhoff
@ 2016-02-24 13:43                                                                                     ` Richard Stallman
  1 sibling, 0 replies; 263+ messages in thread
From: Richard Stallman @ 2016-02-24 13:43 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: eliz, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > Using my proposed levels (see the other message in this batch), I
  > > think you would want to turn off this level
  > >
  > > * Equivalent letters (ö and ø in Swedish).
  > >
  > > and turn on this level, asymmetrically.
  > >
  > > * Non-equivalent letters with a common base (o and ö/ø in Swedish).
  > >
  > > Would you be happy with that?

  > Uhm...  I'm not quite sure.

Please help out by thinking about the question.

What part are you not sure about?

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-24 13:41                                                                                       ` Richard Stallman
@ 2016-02-24 17:54                                                                                         ` Eli Zaretskii
  2016-02-25 12:15                                                                                           ` Richard Stallman
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-24 17:54 UTC (permalink / raw)
  To: rms; +Cc: larsi, lokedhs, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, lokedhs@gmail.com, emacs-devel@gnu.org
> Date: Wed, 24 Feb 2016 08:41:45 -0500
> 
>   > > The other levels are language-specific, and the user might want to
>   > > enable or disable them.
> 
>   > Not all of them are language-specific.  Some are valid in any
>   > language.
> 
> Could you explain that more concretely?

Not sure what to explain, to tell the truth.  What I had in mind is
cases like á, which I don't think any user of any language will ever
want to consider a non-decomposable character.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-24 13:41                                                                                       ` Richard Stallman
@ 2016-02-24 17:56                                                                                         ` Eli Zaretskii
  2016-02-25 12:15                                                                                           ` Richard Stallman
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-24 17:56 UTC (permalink / raw)
  To: rms; +Cc: larsi, lokedhs, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, lokedhs@gmail.com, emacs-devel@gnu.org
> Date: Wed, 24 Feb 2016 08:41:46 -0500
> 
>   > > * Equivalent letters (ö and ø in Swedish).
> 
>   > Not just letters -- sequences of characters.  For example, å vs aa in
>   > Danish, or ﬃ vs ffi.
> 
> å and aa in Danish are equivalent, like ö and ø in Swedish.
> 
> Ligatures such as ﬃ are a different issue entirely.
> The relationship between ﬃ vs ffi is language-independent
> and similar to these two levels:
> 
>  * Different appearances of the same letter+decorations:
>    as a single code point, or as a composition.
>  
>  * Identical-looking distinct code points (Latin a and Cyrillic a).

I didn't say the 2 examples were in the same class.  My point was that
we are not talking about equivalence of _characters_, we are talking
about equivalent character _sequences_.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-24  0:16                                                                                 ` Juri Linkov
@ 2016-02-24 18:39                                                                                   ` Eli Zaretskii
  2016-02-25  0:29                                                                                     ` Juri Linkov
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-24 18:39 UTC (permalink / raw)
  To: Juri Linkov; +Cc: larsi, lokedhs, rms, emacs-devel

> From: Juri Linkov <juri@linkov.net>
> Cc: rms@gnu.org,  larsi@gnus.org,  lokedhs@gmail.com,  emacs-devel@gnu.org
> Date: Wed, 24 Feb 2016 02:16:23 +0200
> 
> > So we need a char-table that maps each character into its
> > decomposition sequence, which AFAIR is something the current
> > char-tables can support already.  Am I missing something?
> 
> Searching for a base character and matching a sequence of characters
> (e.g. a base character and combining accents) might be already possible
> by the current char-tables indexed by a base character.  But I see
> no way to specify such a mapping in a char-table that e.g.
> a character should be skipped in the search buffer.  Maybe this need
> could be avoided in an asymmetric search with combining characters
> in the search buffer, but still is required for ignorable characters.

Whether ignorables can be supported by the current char-tables depends
on the data we store in that table.  It could be a vector of objects
that provide both the codepoint and its weight; then it's easy to
implement skipping characters by throwing away characters whose weight
is above the threshold specified by the caller.
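
A small Python sketch of that idea, under stated assumptions: the table entry for a character is a list of (codepoint, weight) pairs, with a made-up weight of 1 for base characters and 2 for combining marks. The function names are hypothetical, and a real implementation would also have to reorder marks across character boundaries.

```python
import unicodedata

def decomposition_entry(ch):
    """Return [(codepoint, weight), ...] for one character.

    Hypothetical weights: 1 for a base character, 2 for a combining mark.
    """
    return [(ord(c), 2 if unicodedata.combining(c) else 1)
            for c in unicodedata.normalize("NFD", ch)]

def search_key(text, max_weight):
    """Flatten text to codepoints, dropping entries heavier than max_weight."""
    return [cp
            for ch in text
            for cp, weight in decomposition_entry(ch)
            if weight <= max_weight]

# With max_weight=1 the accent is thrown away; with 2 it is kept.
loose = search_key("\u00e9", 1)    # precomposed e with acute
strict = search_key("\u00e9", 2)
```

Two strings then match at a given threshold when their keys are equal, so the caller chooses how much to ignore simply by changing `max_weight`.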

> >> It seems two user variables are necessary for customization:
> >>
> >> 1. inclusive folding groups that will include by default such pairs
> >>    as o - ø, l - ł added to the Unicode decomposition-based rules,
> >>    and allow the users to add more rules;
> >>
> >> 2. exclusive folding groups to exclude locale/language-dependent rules from
> >>    the default mappings above, e.g. removing n - ñ for the "es" locale.
> >
> > I think we should add those in item 1 unconditionally (i.e. include
> > them in the default mappings), and then exclude some of them under the
> > rules you describe in item 2.  Then the problem becomes easier, as we
> > only need to filter out some mappings, as determined by a single user
> > variable (whose default can come from the user locale).
> 
> Better to have 4 variables (2 internal + 2 user customizable variables):

Can you explain why it's better to have 4 variables rather than just
one?

> It would be good to find all differences between UnicodeData.txt and
> decomps.txt.  Is this the latest version?
> http://unicode.org/Public/UCA/6.3.0/decomps.txt

No, the latest is always here:

  http://unicode.org/Public/UCA/latest/decomps.txt

(The last release of Unicode is v8.0.)



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-20 14:32                                                   ` Richard Stallman
@ 2016-02-24 23:27                                                     ` Rasmus
  2016-02-25 20:46                                                       ` Richard Stallman
  0 siblings, 1 reply; 263+ messages in thread
From: Rasmus @ 2016-02-24 23:27 UTC (permalink / raw)
  To: emacs-devel

Richard Stallman <rms@gnu.org> writes:

> [[[ To any NSA and FBI agents reading my email: please consider    ]]]
> [[[ whether defending the US Constitution against all enemies,     ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
>   > Are you saying that making the default depend on the locale would be
>   > OK?
>
> I think it is ok to use the locale as a sort of last default,
> but more important than that is to make it easy to specify different
> behaviors in Emacs, both globally and for a specific buffer.

I think it should look at the /keyboard layout/ before the /locale/.
E.g. on my system the locale would suggest that I can easily type ñ,
though in fact I cannot:

     $ localectl 
       System Locale: LANG=es_ES.UTF-8
           VC Keymap: dk-latin1
          X11 Layout: dk
         X11 Variant: nodeadkeys

Rasmus

-- 
Hooray!




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-24 18:39                                                                                   ` Eli Zaretskii
@ 2016-02-25  0:29                                                                                     ` Juri Linkov
  2016-02-25 16:24                                                                                       ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Juri Linkov @ 2016-02-25  0:29 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, lokedhs, rms, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2158 bytes --]

>> >> It seems two user variables are necessary for customization:
>> >>
>> >> 1. inclusive folding groups that will include by default such pairs
>> >>    as o - ø, l - ł added to the Unicode decomposition-based rules,
>> >>    and allow the users to add more rules;
>> >>
>> >> 2. exclusive folding groups to exclude locale/language-dependent rules from
>> >>    the default mappings above, e.g. removing n - ñ for the "es" locale.
>> >
>> > I think we should add those in item 1 unconditionally (i.e. include
>> > them in the default mappings), and then exclude some of them under the
>> > rules you describe in item 2.  Then the problem becomes easier, as we
>> > only need to filter out some mappings, as determined by a single user
>> > variable (whose default can come from the user locale).
>> 
>> Better to have 4 variables (2 internal + 2 user customizable variables):
>
> Can you explain why it's better to have 4 variables rather than just
> one?

If you mean that one customizable variable should contain all mappings from
UnicodeData.txt and decomps.txt presented to the user for customization,
such a list will be too huge to customize: there are 5721 decompositions
in UnicodeData.txt, and 6674 decompositions in decomps.txt.

So we could have at least one default internal variable containing all
decompositions from UnicodeData.txt plus decompositions from decomps.txt
minus locale-dependent mappings.

Then 2 user customizable variables should be enough: one will allow
the users to add a mapping to the default list, and another to remove
a mapping from the default list.

>> It would be good to find all differences between UnicodeData.txt and
>> decomps.txt.  Is this the latest version?
>> http://unicode.org/Public/UCA/6.3.0/decomps.txt
>
> No, the latest is always here:
>
>   http://unicode.org/Public/UCA/latest/decomps.txt
>
> (The last release of Unicode is v8.0.)

Thanks, comparing UnicodeData.txt with the latest decomps.txt shows
1600 differences (such as ł decomposed to l and ̵ and ø to o and ̸)
that we need to add manually (the full set of differences is attached below):


[-- Attachment #2: UnicodeData_decomps.diff --]
[-- Type: text/x-diff, Size: 34090 bytes --]

< ¨ = <compat>   ̈
< ¯ = <compat>   ̄
< ´ = <compat>   ́
< ¸ = <compat>   ̧
> Æ = <sort> A  E
> Ð = <sort> D 
> Ø = O ̸
> ß = <sort> s  s
> æ = <sort> a  e
> ð = <sort> d 
> ø = o ̸
> Đ = D ̵
> đ = d ̵
> Ħ = H ̵
> ħ = h ̵
> Ł = L ̵
> ł = l ̵
> Œ = <sort> O  E
> œ = <sort> o  e
< ſ = <compat> s
> ſ = <sort> s 
> ƍ = <sort> z w
> ƾ = <sort> t s
< Ǣ = Æ ̄
< ǣ = æ ̄
> Ǣ = <sort> A  E ̄
> ǣ = <sort> a  e ̄
< Ǽ = Æ ́
< ǽ = æ ́
< Ǿ = Ø ́
< ǿ = ø ́
> Ǽ = <sort> A  E ́
> ǽ = <sort> a  e ́
> Ǿ = O ̸ ́
> ǿ = o ̸ ́
> ȸ = <sort> d b
> ȹ = <sort> q p
> ʣ = <sort> d z
> ʤ = <sort> d ʒ
> ʥ = <sort> d ʑ
> ʦ = <sort> t s
> ʧ = <sort> t ʃ
> ʨ = <sort> t ɕ
> ʩ = <sort> f ŋ
> ʪ = <sort> l s
> ʫ = <sort> l z
< ˘ = <compat>   ̆
< ˙ = <compat>   ̇
< ˚ = <compat>   ̊
< ˛ = <compat>   ̨
< ˜ = <compat>   ̃
< ˝ = <compat>   ̋
> ̍ = 
> ̎ = 
> ̒ = 
> ̕ = 
> ̖ = 
> ̗ = 
> ̘ = 
> ̙ = 
> ̚ = 
> ̜ = 
> ̝ = 
> ̞ = 
> ̟ = 
> ̠ = 
> ̩ = 
> ̪ = 
> ̫ = 
> ̬ = 
> ̯ = 
> ̳ = 
> ̶ = 
> ̷ = 
> ̺ = 
> ̻ = 
> ̼ = 
> ̽ = 
> ̾ = 
> ̿ = 
> ͆ = 
> ͇ = 
> ͈ = 
> ͉ = 
> ͊ = 
> ͋ = 
> ͌ = 
> ͍ = 
> ͎ = 
> ͐ = 
> ͑ = 
> ͒ = 
> ͓ = 
> ͔ = 
> ͕ = 
> ͖ = 
> ͗ = 
> ͙ = 
> ͚ = 
> ͛ = 
> ͜ = 
> ͝ = 
> ͞ = 
> ͟ = 
> ͢ = 
> ͣ = <sort> a
> ͤ = <sort> e
> ͥ = <sort> i
> ͦ = <sort> o
> ͧ = <sort> u
> ͨ = <sort> c
> ͩ = <sort> d
> ͪ = <sort> h
> ͫ = <sort> m
> ͬ = <sort> r
> ͭ = <sort> t
> ͮ = <sort> v
> ͯ = <sort> x
< ͺ = <compat>   ͅ
> ͺ = <sort> ι
< ΄ = <compat>   ́
< ΅ = <compat>   ̈ ́
> ΄ = ´
> ΅ = ¨ ́
> ς = <final> σ
> Ϗ = <sort> Κ α ι
> ϗ = <sort> κ α ι
< ϲ = <compat> ς
> ϲ = <compat> σ
> ҄ = 
> ҅ = ̔
> ҆ = ̓
> ҇ = 
> Ґ = <sort> Г 
> ґ = <sort> г 
> ֺ = ֹ
> ׇ = ָ
> ך = <final> כ
> ם = <final> מ
> ן = <final> נ
> ף = <final> פ
> ץ = <final> צ
> װ = <sort> ו ו
> ױ = <sort> ו י
> ײ = <sort> י י
< ٵ = <compat> ا ٴ
< ٶ = <compat> و ٴ
< ٷ = <compat> ۇ ٴ
< ٸ = <compat> ي ٴ
> ٴ = <sort> ء
> ٵ = <compat> ا ء
> ٶ = <compat> و ء
> ٷ = <compat> ۇ ء
> ٸ = <compat> ي ء
> ۥ = <sort> و
> ۦ = <sort> ي
> ۽ = <sort> ء 
> ۾ = <sort> م 
> ܔ = <sort> ܓ 
> ܜ = <sort> ܛ 
> ܤ = <final> ܣ
> ܧ = <sort> ܦ 
> ܭ = <sort> ܒ 
> ܮ = <sort> ܓ 
> ܯ = <sort> ܕ 
> ݁ = 
> ݂ = 
> ݅ = 
> ݆ = 
> ߨ = <sort> ߖ 
> ߩ = <sort> ߗ 
> ߪ = <sort> ߙ 
> ࠜ = ࠝ
> ࠞ = ࠠ
> ࠟ = ࠠ
> ࠡ = ࠣ
> ࠢ = ࠣ
> ࠤ = ࠥ
> ࠦ = ࠧ
> ࠨ = ࠪ
> ࠩ = ࠪ
> ࡙ = 
> ࡚ = 
> ࡛ = 
> ࢭ = <sort> ا
> ऀ = ँ
> ॓ = ̀
> ॔ = ́
> ঁ = ँ
> ং = ं
> ঃ = ः
> ় = ़
< ড় = ড ়
< ঢ় = ঢ ়
< য় = য ়
< ਲ਼ = ਲ ਼
< ਸ਼ = ਸ ਼
< ਖ਼ = ਖ ਼
< ਗ਼ = ਗ ਼
< ਜ਼ = ਜ ਼
< ਫ਼ = ਫ ਼
> ৎ = <sort> ত ্
> ড় = ড ़
> ঢ় = ঢ ़
> য় = য ़
> ਁ = ँ
> ਂ = ं
> ਃ = ः
> ਲ਼ = ਲ ़
> ਸ਼ = ਸ ़
> ਼ = ़
> ਖ਼ = ਖ ़
> ਗ਼ = ਗ ़
> ਜ਼ = ਜ ़
> ਫ਼ = ਫ ़
> ઁ = ँ
> ં = ं
> ઃ = ः
> ઼ = ़
> ଁ = ँ
> ଂ = ं
> ଃ = ः
> ଼ = ़
< ଡ଼ = ଡ ଼
< ଢ଼ = ଢ ଼
> ଡ଼ = ଡ ़
> ଢ଼ = ଢ ़
> ஂ = ं
> ఀ = ँ
> ఁ = ँ
> ం = ं
> ః = ः
> ಁ = ँ
> ಂ = ं
> ಃ = ः
> ಼ = ़
> ೋ = ೊ ೕ
> ഁ = ँ
> ം = ं
> ഃ = ः
> ൎ = <sort> ര ്
> ൺ = <sort> ണ ്
> ൻ = <sort> ന ്
> ർ = <sort> ര ്
> ൽ = <sort> ല ്
> ൾ = <sort> ള ്
> ൿ = <sort> ക ്
> ං = ं
> ඃ = ः
> ෝ = ො ්
< ำ = <compat> ํ า
< ຳ = <compat> ໍ າ
> ำ = ํ า
> ຳ = ໍ າ
> ༀ = <sort> ཨ ོ ं
> ༪ = <sort> ༡
> ༫ = <sort> ༢
> ༬ = <sort> ༣
> ༭ = <sort> ༤
> ༮ = <sort> ༥
> ༯ = <sort> ༦
> ༰ = <sort> ༧
> ༱ = <sort> ༨
> ༲ = <sort> ༩
> ༳ = <sort> ༠
> ཪ = <sort> ར 
> ཷ = <compat> ྲ ཱྀ
> ཹ = <compat> ླ ཱྀ
> ཾ = ं
> ཿ = ः
> ྺ = <sort> ྭ 
> ྻ = <sort> ྱ 
> ྼ = <sort> ྲ 
> ါ = <sort> ာ
> ံ = ं
> း = ः
> ဿ = <sort> သ ္ သ
>   = <sort>  
> ᚡ = <sort> ᚠ 
> ᚤ = <sort> ᚢ 
> ᚥ = <sort> ᚢ 
> ᚧ = <sort> ᚦ 
> ᚩ = <sort> ᚨ 
> ᚬ = <sort> ᚨ 
> ᚭ = <sort> ᚨ 
> ᚮ = <sort> ᚨ 
> ᚳ = <sort> ᚲ 
> ᚴ = <sort> ᚲ 
> ᚵ = <sort> ᚲ 
> ᚶ = <sort> ᚲ 
> ᚻ = <sort> ᚺ 
> ᚼ = <sort> ᚺ 
> ᚽ = <sort> ᚺ 
> ᚿ = <sort> ᚾ 
> ᛀ = <sort> ᚾ 
> ᛂ = <sort> ᛁ 
> ᛄ = <sort> ᛃ 
> ᛆ = <sort> ᛅ 
> ᛋ = <sort> ᛊ 
> ᛌ = <sort> ᛊ 
> ᛍ = <sort> ᛊ 
> ᛎ = <sort> ᛊ 
> ᛐ = <sort> ᛏ 
> ᛑ = <sort> ᛏ 
> ᛓ = <sort> ᛒ 
> ᛔ = <sort> ᛒ 
> ᛕ = <sort> ᛈ 
> ᛘ = <sort> ᛗ 
> ᛙ = <sort> ᛗ 
> ᛛ = <sort> ᛚ 
> ᛝ = <sort> ᛜ 
> ᛧ = <sort> ᛦ 
> ᛨ = <sort> ᛦ 
> ᛩ = <sort> ᚹ 
> ᛪ = <sort> ᛊ 
> ᛮ = <sort> ᛅ ᛚ
> ᛯ = <sort> ᛗ  ᛗ 
> ᛰ = <sort> ᚦ ᚦ
> ំ = ं
> ះ = ः
> ់ = 
> ៌ = 
> ៍ = 
> ៎ = 
> ៏ = 
> ័ = 
> ៑ = 
> ៝ = 
> ᤝ = <sort> ᤈ ᤩ
> ᤞ = <sort> ᤋ ᤪ
> ᧞ = <sort> ᦜ ᦶ
> ᧟ = <sort> ᦜ ᦶ ᧁ
> ᩔ = <sort> ᩆ ᩠ ᩆ
> ᩘ = <sort> ᨦ
> ᩙ = <sort> ᨦ
> ᩚ = <sort> ᨻ
> ᩛ = <sort> ᨻ
> ᩤ = <sort> ᩣ
> ᩴ = ं
> ᪰ = 
> ᪱ = 
> ᪲ = 
> ᪳ = 
> ᪴ = 
> ᪵ = 
> ᪶ = 
> ᪷ = 
> ᪸ = 
> ᪹ = 
> ᪺ = 
> ᪻ = 
> ᪼ = 
> ᪽ = 
> ᪾ = 
> ᬀ = ँ
> ᬁ = ँ
> ᬂ = ं
> ᬄ = ः
> ᬴ = ़
> ᮀ = ं
> ᮂ = ः
> ᮺ = <sort> ᮃ
> ᮾ = <final> ᮊ
> ᮿ = <final> ᮙ
> ᯁ = <sort> ᯀ
> ᯃ = <sort> ᯂ
> ᯄ = <sort> ᯂ
> ᯆ = <sort> ᯅ
> ᯈ = <sort> ᯇ
> ᯊ = <sort> ᯉ
> ᯌ = <sort> ᯋ
> ᯍ = <sort> ᯋ
> ᯏ = <sort> ᯎ
> ᯓ = <sort> ᯒ
> ᯕ = <sort> ᯔ
> ᯗ = <sort> ᯖ
> ᯙ = <sort> ᯘ
> ᯚ = <sort> ᯘ
> ᯜ = <sort> ᯛ
> ᯟ = <sort> ᯞ
> ᯦ = ़
> ᯨ = <sort> ᯧ
> ᯫ = <sort> ᯪ
> ᯭ = <sort> ᯬ
> ᯯ = <sort> ᯮ
> ᰷ = ़
> ᳪ = <sort> ᳩ
> ᳫ = <sort> ᳩ
> ᳬ = <sort> ᳩ
> ᳭ = ं
> ᳮ = <sort> ᳩ
> ᳯ = <sort> ᳩ
> ᳰ = <sort> ᳩ
> ᳱ = <sort> ᳩ
> ᳲ = ः
> ᳳ = ः
< ᴭ = <super> Æ
> ᴭ = <super> A  E
< ᵌ = <super> ɜ
> ᵌ = <super> ᴈ
> ᵎ = <super> ᴉ
> ᵹ = <sort> g 
> ᵺ = <sort> t  h
< ᶞ = <super> ð
> ᶞ = <super> d 
> ᷀ = 
> ᷁ = 
> ᷂ = 
> ᷃ = 
> ᷄ = 
> ᷅ = 
> ᷆ = 
> ᷇ = 
> ᷈ = 
> ᷉ = 
> ᷊ = <sort> r
> ᷋ = 
> ᷌ = 
> ᷍ = 
> ᷎ = 
> ᷏ = 
> ᷐ = 
> ᷑ = 
> ᷒ = <sort> ꝯ
> ᷓ = <sort> a 
> ᷔ = <sort> a  e
> ᷕ = <sort> a o
> ᷖ = <sort> a v
> ᷗ = <sort> c ̧
> ᷘ = <sort> d 
> ᷙ = <sort> d 
> ᷚ = <sort> g
> ᷛ = <sort> ɢ
> ᷜ = <sort> k
> ᷝ = <sort> l
> ᷞ = <sort> ʟ
> ᷟ = <sort> ᴍ
> ᷠ = <sort> n
> ᷡ = <sort> ɴ
> ᷢ = <sort> ʀ
> ᷣ = <sort> ꝛ
> ᷤ = <sort> s
> ᷥ = <sort> s 
> ᷦ = <sort> z
> ᷧ = <sort> ɑ
> ᷨ = <sort> b
> ᷩ = <sort> ꞵ
> ᷪ = <sort> ə
> ᷫ = <sort> f
> ᷬ = <sort> ꬸ
> ᷭ = <sort> o 
> ᷮ = <sort> p
> ᷯ = <sort> ʃ
> ᷰ = <sort> u 
> ᷱ = <sort> w
> ᷲ = <sort> a ̈
> ᷳ = <sort> o ̈
> ᷴ = <sort> u ̈
> ᷵ = 
> ᷼ = 
> ᷽ = 
> ᷾ = 
> ᷿ = 
< ẛ = <compat> s ̇
> ẛ = <sort> s  ̇
> ẞ = <sort> S  S
> Ỻ = <sort> L L
> ỻ = <sort> l l
< ᾽ = <compat>   ̓
> ᾽ = ᾿
< ᾿ = <compat>   ̓
< ῀ = <compat>   ͂
< ῁ = <compat>   ̈ ͂
> ῁ = ¨ ͂
< ῍ = <compat>   ̓ ̀
< ῎ = <compat>   ̓ ́
< ῏ = <compat>   ̓ ͂
> ῍ = ᾿ ̀
> ῎ = ᾿ ́
> ῏ = ᾿ ͂
< ῝ = <compat>   ̔ ̀
< ῞ = <compat>   ̔ ́
< ῟ = <compat>   ̔ ͂
> ῝ = ῾ ̀
> ῞ = ῾ ́
> ῟ = ῾ ͂
< ῭ = <compat>   ̈ ̀
< ΅ = <compat>   ̈ ́
> ῭ = ¨ ̀
> ΅ = ¨ ́
< ´ = <compat>   ́
< ῾ = <compat>   ̔
<   = <compat>  
<   = <compat>  
> ´ = ´
>   = <compat>  
>   = <compat>  
< ‗ = <compat>   ̳
< ‾ = <compat>   ̅
> ⃓ = ⃒
> ⃘ = 
> ⃙ = 
> ⃚ = 
> ⃝ = 
> ⃞ = 
> ⃟ = 
> ⃠ = 
> ⃢ = 
> ⃣ = 
> ⃤ = 
> ⃥ = 
> ⃪ = 
> ⃫ = 
> ⃬ = 
> ⃭ = 
> ⃮ = 
> ⃯ = 
> ⃰ = 
< ℏ = <font> ħ
> ℏ = <font> h ̵
> ⅍ = <sort> A / S
> ⓫ = <circle> 1 1
> ⓬ = <circle> 1 2
> ⓭ = <circle> 1 3
> ⓮ = <circle> 1 4
> ⓯ = <circle> 1 5
> ⓰ = <circle> 1 6
> ⓱ = <circle> 1 7
> ⓲ = <circle> 1 8
> ⓳ = <circle> 1 9
> ⓴ = <circle> 2 0
> ⓵ = <circle> 1
> ⓶ = <circle> 2
> ⓷ = <circle> 3
> ⓸ = <circle> 4
> ⓹ = <circle> 5
> ⓺ = <circle> 6
> ⓻ = <circle> 7
> ⓼ = <circle> 8
> ⓽ = <circle> 9
> ⓾ = <circle> 1 0
> ⓿ = <circle> 0
> ❶ = <circle> 1
> ❷ = <circle> 2
> ❸ = <circle> 3
> ❹ = <circle> 4
> ❺ = <circle> 5
> ❻ = <circle> 6
> ❼ = <circle> 7
> ❽ = <circle> 8
> ❾ = <circle> 9
> ❿ = <circle> 1 0
> ➀ = <circle> 1
> ➁ = <circle> 2
> ➂ = <circle> 3
> ➃ = <circle> 4
> ➄ = <circle> 5
> ➅ = <circle> 6
> ➆ = <circle> 7
> ➇ = <circle> 8
> ➈ = <circle> 9
> ➉ = <circle> 1 0
> ➊ = <circle> 1
> ➋ = <circle> 2
> ➌ = <circle> 3
> ➍ = <circle> 4
> ➎ = <circle> 5
> ➏ = <circle> 6
> ➐ = <circle> 7
> ➑ = <circle> 8
> ➒ = <circle> 9
> ➓ = <circle> 1 0
< ⵯ = <super> ⵡ
> ⳤ = <sort> ⲕ ⲁ ⲓ
> ⳯ = 
> ⳰ = ̔
> ⳱ = ̓
> ⷠ = <sort> б
> ⷡ = <sort> в
> ⷢ = <sort> г
> ⷣ = <sort> д
> ⷤ = <sort> ж
> ⷥ = <sort> з
> ⷦ = <sort> к
> ⷧ = <sort> л
> ⷨ = <sort> м
> ⷩ = <sort> н
> ⷪ = <sort> о
> ⷫ = <sort> п
> ⷬ = <sort> р
> ⷭ = <sort> с
> ⷮ = <sort> т
> ⷯ = <sort> х
> ⷰ = <sort> ц
> ⷱ = <sort> ч
> ⷲ = <sort> ш
> ⷳ = <sort> щ
> ⷴ = <sort> ѳ
> ⷵ = <sort> с т
> ⷶ = <sort> а
> ⷷ = <sort> е
> ⷸ = <sort> ꙉ
> ⷹ = <sort> ꙋ
> ⷺ = <sort> ѣ
> ⷻ = <sort> ю
> ⷼ = <sort> ꙗ
> ⷽ = <sort> ѧ
> ⷾ = <sort> ѫ
> ⷿ = <sort> ѭ
> ⺀ = <sort> 丶 
> ⺁ = <sort> 厂 
> ⺂ = <sort> 乛
> ⺃ = <sort> 乚
> ⺄ = <sort> 乙 
> ⺅ = <sort> 亻
> ⺆ = <sort> 冂 
> ⺇ = <sort> 几 
> ⺈ = <sort> 刀 
> ⺉ = <sort> 刂
> ⺊ = <sort> 卜 
> ⺋ = <sort> 卩 
> ⺌ = <sort> 小 
> ⺍ = <sort> 小 
> ⺎ = <sort> 尢 
> ⺏ = <sort> 尣
> ⺐ = <sort> 尢
> ⺑ = <sort> 尣 
> ⺒ = <sort> 巳
> ⺓ = <sort> 幺
> ⺔ = <sort> 彑
> ⺕ = <sort> 彐 
> ⺖ = <sort> 忄
> ⺗ = <sort> 心 
> ⺘ = <sort> 扌
> ⺙ = <sort> 攵
> ⺛ = <sort> 旡
> ⺜ = <sort> 日 
> ⺝ = <sort> 月 
> ⺞ = <sort> 歺 
> ⺠ = <sort> 民
> ⺡ = <sort> 氵
> ⺢ = <sort> 氺
> ⺣ = <sort> 灬
> ⺤ = <sort> 爫
> ⺥ = <sort> 爫 
> ⺦ = <sort> 丬
> ⺧ = <sort> 牛 
> ⺨ = <sort> 犭
> ⺩ = <sort> 王 
> ⺪ = <sort> 疋 
> ⺫ = <sort> 目 
> ⺬ = <sort> 示 
> ⺭ = <sort> 礻
> ⺮ = <sort> 竹 
> ⺯ = <sort> 糹
> ⺰ = <sort> 纟
> ⺱ = <sort> 罓
> ⺲ = <sort> 罒
> ⺳ = <sort> 罓 
> ⺴ = <sort> 罓 
> ⺵ = <sort> 罒 
> ⺶ = <sort> 羊 
> ⺷ = <sort> 羊 
> ⺸ = <sort> 羋
> ⺹ = <sort> 耂
> ⺺ = <sort> 肀
> ⺻ = <sort> 聿 
> ⺼ = <sort> 肉 
> ⺽ = <sort> 臼 
> ⺾ = <sort> 艹
> ⺿ = <sort> 艹 
> ⻀ = <sort> 艹 
> ⻁ = <sort> 虎
> ⻂ = <sort> 衤
> ⻃ = <sort> 覀
> ⻄ = <sort> 西
> ⻅ = <sort> 见
> ⻆ = <sort> 角
> ⻇ = <sort> 角 
> ⻈ = <sort> 讠
> ⻉ = <sort> 贝
> ⻊ = <sort> 足 
> ⻋ = <sort> 车
> ⻌ = <sort> 辶
> ⻍ = <sort> 辶 
> ⻎ = <sort> 辶 
> ⻏ = <sort> 邑 
> ⻐ = <sort> 钅
> ⻑ = <sort> 長
> ⻒ = <sort> 镸
> ⻓ = <sort> 长
> ⻔ = <sort> 门
> ⻕ = <sort> 阜 
> ⻖ = <sort> 阝
> ⻗ = <sort> 雨 
> ⻘ = <sort> 青
> ⻙ = <sort> 韦
> ⻚ = <sort> 页
> ⻛ = <sort> 风
> ⻜ = <sort> 飞
> ⻝ = <sort> 食
> ⻞ = <sort> 飠 
> ⻟ = <sort> 飠
> ⻠ = <sort> 饣
> ⻡ = <sort> 首 
> ⻢ = <sort> 马
> ⻣ = <sort> 骨 
> ⻤ = <sort> 鬼 
> ⻥ = <sort> 鱼
> ⻦ = <sort> 鸟
> ⻧ = <sort> 鹵 
> ⻨ = <sort> 麦
> ⻩ = <sort> 黄
> ⻪ = <sort> 黾
> ⻫ = <sort> 齊 
> ⻬ = <sort> 齐
> ⻭ = <sort> 齒 
> ⻮ = <sort> 齿
> ⻯ = <sort> 龍 
> ⻰ = <sort> 龙
> ⻱ = <sort> 龜 
> ⻲ = <sort> 龜 
> 〆 = <sort> し め
> 〲 = 〱 ゙
> 〴 = 〳 ゙
> 〼 = <sort> ま す
< ゛ = <compat>   ゙
< ゜ = <compat>   ゚
> ㆠ = <sort> ㄅ 
> ㆡ = <sort> ㄗ 
> ㆢ = <sort> ㄐ 
> ㆣ = <sort> ㄍ 
> ㆥ = <sort> ㆤ 
> ㆧ = <sort> ㄛ 
> ㆨ = <sort> ㄨ 
> ㆩ = <sort> ㄚ 
> ㆪ = <sort> ㄧ 
> ㆫ = <sort> ㄨ 
> ㆮ = <sort> ㄞ 
> ㆯ = <sort> ㄠ 
> ㆳ = <vertical> ㄧ 
> ㆴ = <final> ㄆ
> ㆵ = <final> ㄊ
> ㆶ = <final> ㄎ
> ㆷ = <final> ㄏ
> ㉈ = <circle> 1 0
> ㉉ = <circle> 2 0
> ㉊ = <circle> 3 0
> ㉋ = <circle> 4 0
> ㉌ = <circle> 5 0
> ㉍ = <circle> 6 0
> ㉎ = <circle> 7 0
> ㉏ = <circle> 8 0
< ㋐ = <circle> ア
< ㋑ = <circle> イ
< ㋒ = <circle> ウ
< ㋓ = <circle> エ
< ㋔ = <circle> オ
< ㋕ = <circle> カ
< ㋖ = <circle> キ
< ㋗ = <circle> ク
< ㋘ = <circle> ケ
< ㋙ = <circle> コ
< ㋚ = <circle> サ
< ㋛ = <circle> シ
< ㋜ = <circle> ス
< ㋝ = <circle> セ
< ㋞ = <circle> ソ
< ㋟ = <circle> タ
< ㋠ = <circle> チ
< ㋡ = <circle> ツ
< ㋢ = <circle> テ
< ㋣ = <circle> ト
< ㋤ = <circle> ナ
< ㋥ = <circle> ニ
< ㋦ = <circle> ヌ
< ㋧ = <circle> ネ
< ㋨ = <circle> ノ
< ㋩ = <circle> ハ
< ㋪ = <circle> ヒ
< ㋫ = <circle> フ
< ㋬ = <circle> ヘ
< ㋭ = <circle> ホ
< ㋮ = <circle> マ
< ㋯ = <circle> ミ
< ㋰ = <circle> ム
< ㋱ = <circle> メ
< ㋲ = <circle> モ
< ㋳ = <circle> ヤ
< ㋴ = <circle> ユ
< ㋵ = <circle> ヨ
< ㋶ = <circle> ラ
< ㋷ = <circle> リ
< ㋸ = <circle> ル
< ㋹ = <circle> レ
< ㋺ = <circle> ロ
< ㋻ = <circle> ワ
< ㋼ = <circle> ヰ
< ㋽ = <circle> ヱ
< ㋾ = <circle> ヲ
> ㋐ = <circlekata> ア
> ㋑ = <circlekata> イ
> ㋒ = <circlekata> ウ
> ㋓ = <circlekata> エ
> ㋔ = <circlekata> オ
> ㋕ = <circlekata> カ
> ㋖ = <circlekata> キ
> ㋗ = <circlekata> ク
> ㋘ = <circlekata> ケ
> ㋙ = <circlekata> コ
> ㋚ = <circlekata> サ
> ㋛ = <circlekata> シ
> ㋜ = <circlekata> ス
> ㋝ = <circlekata> セ
> ㋞ = <circlekata> ソ
> ㋟ = <circlekata> タ
> ㋠ = <circlekata> チ
> ㋡ = <circlekata> ツ
> ㋢ = <circlekata> テ
> ㋣ = <circlekata> ト
> ㋤ = <circlekata> ナ
> ㋥ = <circlekata> ニ
> ㋦ = <circlekata> ヌ
> ㋧ = <circlekata> ネ
> ㋨ = <circlekata> ノ
> ㋩ = <circlekata> ハ
> ㋪ = <circlekata> ヒ
> ㋫ = <circlekata> フ
> ㋬ = <circlekata> ヘ
> ㋭ = <circlekata> ホ
> ㋮ = <circlekata> マ
> ㋯ = <circlekata> ミ
> ㋰ = <circlekata> ム
> ㋱ = <circlekata> メ
> ㋲ = <circlekata> モ
> ㋳ = <circlekata> ヤ
> ㋴ = <circlekata> ユ
> ㋵ = <circlekata> ヨ
> ㋶ = <circlekata> ラ
> ㋷ = <circlekata> リ
> ㋸ = <circlekata> ル
> ㋹ = <circlekata> レ
> ㋺ = <circlekata> ロ
> ㋻ = <circlekata> ワ
> ㋼ = <circlekata> ヰ
> ㋽ = <circlekata> ヱ
> ㋾ = <circlekata> ヲ
< ㍸ = <square> d m <super> 2
< ㍹ = <square> d m <super> 3
> ㍸ = <square> d m 2
> ㍹ = <square> d m 3
< ㎕ = <square> μ <font> l
< ㎖ = <square> m <font> l
< ㎗ = <square> d <font> l
< ㎘ = <square> k <font> l
> ㎕ = <square> μ l
> ㎖ = <square> m l
> ㎗ = <square> d l
> ㎘ = <square> k l
< ㎟ = <square> m m <super> 2
< ㎠ = <square> c m <super> 2
< ㎡ = <square> m <super> 2
< ㎢ = <square> k m <super> 2
< ㎣ = <square> m m <super> 3
< ㎤ = <square> c m <super> 3
< ㎥ = <square> m <super> 3
< ㎦ = <square> k m <super> 3
> ㎟ = <square> m m 2
> ㎠ = <square> c m 2
> ㎡ = <square> m 2
> ㎢ = <square> k m 2
> ㎣ = <square> m m 3
> ㎤ = <square> c m 3
> ㎥ = <square> m 3
> ㎦ = <square> k m 3
< ㎨ = <square> m ∕ s <super> 2
> ㎨ = <square> m ∕ s 2
< ㎯ = <square> r a d ∕ s <super> 2
> ㎯ = <square> r a d ∕ s 2
> ꘐ = <sort> ꕘ
> ꘑ = <sort> ꕪ
> ꘒ = <sort> ꖇ
> ꘓ = <sort> ꔌ ꘋ
> ꘔ = <sort> ꔞ ꘋ
> ꘕ = <sort> ꔳ ꘋ
> ꘖ = <sort> ꕇ ꘌ
> ꘗ = <sort> ꕒ ꘋ
> ꘘ = <sort> ꕘ ꘌ
> ꘙ = <sort> ꕚ ꘌ
> ꘚ = <sort> ꕠ ꘋ
> ꘛ = <sort> ꖅ ꘋ
> ꘜ = <sort> ꖴ ꘋ
> ꘝ = <sort> ꗋ ꘋ
> ꘞ = <sort> ꗑ ꘌ
> ꘟ = <sort> ꗘ ꘋ
> ꘪ = <sort> ꕮ
> ꘫ = <sort> ꗑ
> Ꙩ = <sort> О
> ꙩ = <sort> о
> Ꙫ = <sort> О
> ꙫ = <sort> о
> Ꙭ = <sort> О
> ꙭ = <sort> о
> ꙮ = <sort> о
> ꙴ = <sort> є
> ꙵ = <sort> и
> ꙶ = <sort> і ̈
> ꙷ = <sort> у
> ꙸ = <sort> ъ
> ꙹ = <sort> ы
> ꙺ = <sort> ь
> ꙻ = <sort> ѡ
> ꙼ = 
> ꙽ = 
> Ꚙ = <sort> О
> ꚙ = <sort> о
> Ꚛ = <sort> О
> ꚛ = <sort> о
> ꚞ = <sort> ф
> ꚟ = <sort> ѥ
> Ꜩ = <sort> T z
> ꜩ = <sort> t z
> Ꜳ = <sort> A A
> ꜳ = <sort> a a
> Ꜵ = <sort> A O
> ꜵ = <sort> a o
> Ꜷ = <sort> A U
> ꜷ = <sort> a u
> Ꜹ = <sort> A V
> ꜹ = <sort> a v
> Ꜻ = <sort> A  V
> ꜻ = <sort> a  v
> Ꜽ = <sort> A Y
> ꜽ = <sort> a y
> Ꝏ = <sort> O O
> ꝏ = <sort> o o
> Ꝡ = <sort> V Y
> ꝡ = <sort> v y
< ꟸ = <super> Ħ
< ꟹ = <super> œ
> Ꝺ = <sort> D 
> ꝺ = <sort> d 
> Ꝼ = <sort> F 
> ꝼ = <sort> f 
> Ᵹ = <sort> G 
> Ꞃ = <sort> R 
> ꞃ = <sort> r 
> Ꞅ = <sort> S 
> ꞅ = <sort> s 
> Ꞇ = <sort> T 
> ꞇ = <sort> t 
> Ꞛ = <sort> A ̈
> ꞛ = <sort> a ̈
> Ꞝ = <sort> O ̈
> ꞝ = <sort> o ̈
> Ꞟ = <sort> U ̈
> ꞟ = <sort> u ̈
> Ꞡ = <sort> G 
> ꞡ = <sort> g 
> Ꞣ = <sort> K 
> ꞣ = <sort> k 
> Ꞥ = <sort> N 
> ꞥ = <sort> n 
> Ꞧ = <sort> R 
> ꞧ = <sort> r 
> Ꞩ = <sort> S 
> ꞩ = <sort> s 
> ꟸ = <super> H ̵
> ꟹ = <super> o  e
> ꠋ = ं
> ꢀ = ं
> ꢁ = ः
> ꣳ = <sort> ꣲ
> ꣴ = <sort> ꣲ
> ꣵ = <sort> ꣲ
> ꣶ = <sort> ꣲ
> ꣷ = <sort> ꣲ
> ꦀ = ँ
> ꦁ = ं
> ꦃ = ः
> ꦬ = <sort> ꦫ
> ꦳ = ़
< ﬅ = <compat> <compat> s t
> ﬅ = <compat> s  t
< ײַ = ײ ַ
> ײַ = <sort> י י ַ
< ﬦ = <font> ם
> ﬦ = <font> מ
< ךּ = ך ּ
> ךּ = <final> כ ּ
< ףּ = ף ּ
> ףּ = <final> פ ּ
< ﯝ = <isolated> <compat> ۇ ٴ
> ﯝ = <isolated> ۇ ء
< ﯪ = <isolated> ي ٔ ا
< ﯫ = <final> ي ٔ ا
< ﯬ = <isolated> ي ٔ ە
< ﯭ = <final> ي ٔ ە
< ﯮ = <isolated> ي ٔ و
< ﯯ = <final> ي ٔ و
< ﯰ = <isolated> ي ٔ ۇ
< ﯱ = <final> ي ٔ ۇ
< ﯲ = <isolated> ي ٔ ۆ
< ﯳ = <final> ي ٔ ۆ
< ﯴ = <isolated> ي ٔ ۈ
< ﯵ = <final> ي ٔ ۈ
< ﯶ = <isolated> ي ٔ ې
< ﯷ = <final> ي ٔ ې
< ﯸ = <initial> ي ٔ ې
< ﯹ = <isolated> ي ٔ ى
< ﯺ = <final> ي ٔ ى
< ﯻ = <initial> ي ٔ ى
> ﯪ = <isolated> ئ ا
> ﯫ = <final> ئ ا
> ﯬ = <isolated> ئ ە
> ﯭ = <final> ئ ە
> ﯮ = <isolated> ئ و
> ﯯ = <final> ئ و
> ﯰ = <isolated> ئ ۇ
> ﯱ = <final> ئ ۇ
> ﯲ = <isolated> ئ ۆ
> ﯳ = <final> ئ ۆ
> ﯴ = <isolated> ئ ۈ
> ﯵ = <final> ئ ۈ
> ﯶ = <isolated> ئ ې
> ﯷ = <final> ئ ې
> ﯸ = <initial> ئ ې
> ﯹ = <isolated> ئ ى
> ﯺ = <final> ئ ى
> ﯻ = <initial> ئ ى
< ﰀ = <isolated> ي ٔ ج
< ﰁ = <isolated> ي ٔ ح
< ﰂ = <isolated> ي ٔ م
< ﰃ = <isolated> ي ٔ ى
< ﰄ = <isolated> ي ٔ ي
> ﰀ = <isolated> ئ ج
> ﰁ = <isolated> ئ ح
> ﰂ = <isolated> ئ م
> ﰃ = <isolated> ئ ى
> ﰄ = <isolated> ئ ي
< ﱞ = <isolated>   ٌ ّ
< ﱟ = <isolated>   ٍ ّ
< ﱠ = <isolated>   َ ّ
< ﱡ = <isolated>   ُ ّ
< ﱢ = <isolated>   ِ ّ
< ﱣ = <isolated>   ّ ٰ
< ﱤ = <final> ي ٔ ر
< ﱥ = <final> ي ٔ ز
< ﱦ = <final> ي ٔ م
< ﱧ = <final> ي ٔ ن
< ﱨ = <final> ي ٔ ى
< ﱩ = <final> ي ٔ ي
> ﱞ = <isolated> ٌ ّ
> ﱟ = <isolated> ٍ ّ
> ﱠ = <isolated> َ ّ
> ﱡ = <isolated> ُ ّ
> ﱢ = <isolated> ِ ّ
> ﱣ = <isolated> ّ ٰ
> ﱤ = <final> ئ ر
> ﱥ = <final> ئ ز
> ﱦ = <final> ئ م
> ﱧ = <final> ئ ن
> ﱨ = <final> ئ ى
> ﱩ = <final> ئ ي
< ﲗ = <initial> ي ٔ ج
< ﲘ = <initial> ي ٔ ح
< ﲙ = <initial> ي ٔ خ
< ﲚ = <initial> ي ٔ م
< ﲛ = <initial> ي ٔ ه
> ﲗ = <initial> ئ ج
> ﲘ = <initial> ئ ح
> ﲙ = <initial> ئ خ
> ﲚ = <initial> ئ م
> ﲛ = <initial> ئ ه
< ﳟ = <medial> ي ٔ م
< ﳠ = <medial> ي ٔ ه
> ﳟ = <medial> ئ م
> ﳠ = <medial> ئ ه
< ﳲ = <medial> ـ َ ّ
< ﳳ = <medial> ـ ُ ّ
< ﳴ = <medial> ـ ِ ّ
> ﳲ = <medial> َ ّ
> ﳳ = <medial> ُ ّ
> ﳴ = <medial> ِ ّ
< ︙ = <vertical> <compat> . . .
< ︰ = <vertical> <compat> . .
> ︙ = <vertical> . . .
> ︠ = ͡
> ︢ = ͠
> ︧ = 
> ︩ = ͠
> ︮ = ҃
> ︰ = <vertical> . .
< ﹉ = <compat> <compat>   ̅
< ﹊ = <compat> <compat>   ̅
< ﹋ = <compat> <compat>   ̅
< ﹌ = <compat> <compat>   ̅
> ﹉ = <compat> ‾
> ﹊ = <compat> ‾
> ﹋ = <compat> ‾
> ﹌ = <compat> ‾
< ﹰ = <isolated>   ً
< ﹱ = <medial> ـ ً
< ﹲ = <isolated>   ٌ
< ﹴ = <isolated>   ٍ
< ﹶ = <isolated>   َ
< ﹷ = <medial> ـ َ
< ﹸ = <isolated>   ُ
< ﹹ = <medial> ـ ُ
< ﹺ = <isolated>   ِ
< ﹻ = <medial> ـ ِ
< ﹼ = <isolated>   ّ
< ﹽ = <medial> ـ ّ
< ﹾ = <isolated>   ْ
< ﹿ = <medial> ـ ْ
> ﹰ = <isolated> ً
> ﹱ = <medial> ً
> ﹲ = <isolated> ٌ
> ﹴ = <isolated> ٍ
> ﹶ = <isolated> َ
> ﹷ = <medial> َ
> ﹸ = <isolated> ُ
> ﹹ = <medial> ُ
> ﹺ = <isolated> ِ
> ﹻ = <medial> ِ
> ﹼ = <isolated> ّ
> ﹽ = <medial> ّ
> ﹾ = <isolated> ْ
> ﹿ = <medial> ْ
< ﺁ = <isolated> ا ٓ
< ﺂ = <final> ا ٓ
< ﺃ = <isolated> ا ٔ
< ﺄ = <final> ا ٔ
< ﺅ = <isolated> و ٔ
< ﺆ = <final> و ٔ
< ﺇ = <isolated> ا ٕ
< ﺈ = <final> ا ٕ
< ﺉ = <isolated> ي ٔ
< ﺊ = <final> ي ٔ
< ﺋ = <initial> ي ٔ
< ﺌ = <medial> ي ٔ
> ﺁ = <isolated> آ
> ﺂ = <final> آ
> ﺃ = <isolated> أ
> ﺄ = <final> أ
> ﺅ = <isolated> ؤ
> ﺆ = <final> ؤ
> ﺇ = <isolated> إ
> ﺈ = <final> إ
> ﺉ = <isolated> ئ
> ﺊ = <final> ئ
> ﺋ = <initial> ئ
> ﺌ = <medial> ئ
< ﻵ = <isolated> ل ا ٓ
< ﻶ = <final> ل ا ٓ
< ﻷ = <isolated> ل ا ٔ
< ﻸ = <final> ل ا ٔ
< ﻹ = <isolated> ل ا ٕ
< ﻺ = <final> ل ا ٕ
> ﻵ = <isolated> ل آ
> ﻶ = <final> ل آ
> ﻷ = <isolated> ل أ
> ﻸ = <final> ل أ
> ﻹ = <isolated> ل إ
> ﻺ = <final> ل إ
< ｧ = <narrow> ァ
< ｨ = <narrow> ィ
< ｩ = <narrow> ゥ
< ｪ = <narrow> ェ
< ｫ = <narrow> ォ
< ｬ = <narrow> ャ
< ｭ = <narrow> ュ
< ｮ = <narrow> ョ
< ｯ = <narrow> ッ
> ｧ = <smallnarrow> ア
> ｨ = <smallnarrow> イ
> ｩ = <smallnarrow> ウ
> ｪ = <smallnarrow> エ
> ｫ = <smallnarrow> オ
> ｬ = <smallnarrow> ヤ
> ｭ = <smallnarrow> ユ
> ｮ = <smallnarrow> ヨ
> ｯ = <smallnarrow> ツ
< ﾠ = <narrow> <compat> ᅠ
< ﾡ = <narrow> <compat> ᄀ
< ﾢ = <narrow> <compat> ᄁ
< ﾣ = <narrow> <compat> ᆪ
< ﾤ = <narrow> <compat> ᄂ
< ﾥ = <narrow> <compat> ᆬ
< ﾦ = <narrow> <compat> ᆭ
< ﾧ = <narrow> <compat> ᄃ
< ﾨ = <narrow> <compat> ᄄ
< ﾩ = <narrow> <compat> ᄅ
< ﾪ = <narrow> <compat> ᆰ
< ﾫ = <narrow> <compat> ᆱ
< ﾬ = <narrow> <compat> ᆲ
< ﾭ = <narrow> <compat> ᆳ
< ﾮ = <narrow> <compat> ᆴ
< ﾯ = <narrow> <compat> ᆵ
< ﾰ = <narrow> <compat> ᄚ
< ﾱ = <narrow> <compat> ᄆ
< ﾲ = <narrow> <compat> ᄇ
< ﾳ = <narrow> <compat> ᄈ
< ﾴ = <narrow> <compat> ᄡ
< ﾵ = <narrow> <compat> ᄉ
< ﾶ = <narrow> <compat> ᄊ
< ﾷ = <narrow> <compat> ᄋ
< ﾸ = <narrow> <compat> ᄌ
< ﾹ = <narrow> <compat> ᄍ
< ﾺ = <narrow> <compat> ᄎ
< ﾻ = <narrow> <compat> ᄏ
< ﾼ = <narrow> <compat> ᄐ
< ﾽ = <narrow> <compat> ᄑ
< ﾾ = <narrow> <compat> ᄒ
< ￂ = <narrow> <compat> ᅡ
< ￃ = <narrow> <compat> ᅢ
< ￄ = <narrow> <compat> ᅣ
< ￅ = <narrow> <compat> ᅤ
< ￆ = <narrow> <compat> ᅥ
< ￇ = <narrow> <compat> ᅦ
< ￊ = <narrow> <compat> ᅧ
< ￋ = <narrow> <compat> ᅨ
< ￌ = <narrow> <compat> ᅩ
< ￍ = <narrow> <compat> ᅪ
< ￎ = <narrow> <compat> ᅫ
< ￏ = <narrow> <compat> ᅬ
< ￒ = <narrow> <compat> ᅭ
< ￓ = <narrow> <compat> ᅮ
< ￔ = <narrow> <compat> ᅯ
< ￕ = <narrow> <compat> ᅰ
< ￖ = <narrow> <compat> ᅱ
< ￗ = <narrow> <compat> ᅲ
< ￚ = <narrow> <compat> ᅳ
< ￛ = <narrow> <compat> ᅴ
< ￜ = <narrow> <compat> ᅵ
> ﾠ = <narrow> ᅠ
> ﾡ = <narrow> ᄀ
> ﾢ = <narrow> ᄁ
> ﾣ = <narrow> ᆪ
> ﾤ = <narrow> ᄂ
> ﾥ = <narrow> ᆬ
> ﾦ = <narrow> ᆭ
> ﾧ = <narrow> ᄃ
> ﾨ = <narrow> ᄄ
> ﾩ = <narrow> ᄅ
> ﾪ = <narrow> ᆰ
> ﾫ = <narrow> ᆱ
> ﾬ = <narrow> ᆲ
> ﾭ = <narrow> ᆳ
> ﾮ = <narrow> ᆴ
> ﾯ = <narrow> ᆵ
> ﾰ = <narrow> ᄚ
> ﾱ = <narrow> ᄆ
> ﾲ = <narrow> ᄇ
> ﾳ = <narrow> ᄈ
> ﾴ = <narrow> ᄡ
> ﾵ = <narrow> ᄉ
> ﾶ = <narrow> ᄊ
> ﾷ = <narrow> ᄋ
> ﾸ = <narrow> ᄌ
> ﾹ = <narrow> ᄍ
> ﾺ = <narrow> ᄎ
> ﾻ = <narrow> ᄏ
> ﾼ = <narrow> ᄐ
> ﾽ = <narrow> ᄑ
> ﾾ = <narrow> ᄒ
> ￂ = <narrow> ᅡ
> ￃ = <narrow> ᅢ
> ￄ = <narrow> ᅣ
> ￅ = <narrow> ᅤ
> ￆ = <narrow> ᅥ
> ￇ = <narrow> ᅦ
> ￊ = <narrow> ᅧ
> ￋ = <narrow> ᅨ
> ￌ = <narrow> ᅩ
> ￍ = <narrow> ᅪ
> ￎ = <narrow> ᅫ
> ￏ = <narrow> ᅬ
> ￒ = <narrow> ᅭ
> ￓ = <narrow> ᅮ
> ￔ = <narrow> ᅯ
> ￕ = <narrow> ᅰ
> ￖ = <narrow> ᅱ
> ￗ = <narrow> ᅲ
> ￚ = <narrow> ᅳ
> ￛ = <narrow> ᅴ
> ￜ = <narrow> ᅵ
< ￣ = <wide> <compat>   ̄
> ￣ = <wide> ¯
< 𑂚 = 𑂙 𑂺
< 𑂜 = 𑂛 𑂺
< 𑂫 = 𑂥 𑂺
> 𐍶 = <sort> 𐍐
> 𐍷 = <sort> 𐍓
> 𐍸 = <sort> 𐍗
> 𐍹 = <sort> 𐍝
> 𐍺 = <sort> 𐍡
> 𐡭 = <final> 𐡮
> 𐢀 = <final> 𐢁
> 𐢂 = <final> 𐢃
> 𐢆 = <final> 𐢇
> 𐢌 = <final> 𐢍
> 𐢎 = <final> 𐢏
> 𐢐 = <final> 𐢑
> 𐢒 = <final> 𐢓
> 𐢔 = <final> 𐢕
> 𐢜 = <final> 𐢝
> 𐦀 = <sort> 𐦠 
> 𐦁 = <sort> 𐦡 
> 𐦂 = <sort> 𐦢 
> 𐦃 = <sort> 𐦣 
> 𐦄 = <sort> 𐦤 
> 𐦅 = <sort> 𐦥 
> 𐦆 = <sort> 𐦦 
> 𐦇 = <sort> 𐦦 
> 𐦈 = <sort> 𐦧 
> 𐦉 = <sort> 𐦨 
> 𐦊 = <sort> 𐦩 
> 𐦋 = <sort> 𐦩 
> 𐦌 = <sort> 𐦪 
> 𐦍 = <sort> 𐦪 
> 𐦎 = <sort> 𐦫 
> 𐦏 = <sort> 𐦫 
> 𐦐 = <sort> 𐦬 
> 𐦑 = <sort> 𐦭 
> 𐦒 = <sort> 𐦮 
> 𐦓 = <sort> 𐦯 
> 𐦔 = <sort> 𐦯 
> 𐦕 = <sort> 𐦱 
> 𐦖 = <sort> 𐦲 
> 𐦗 = <sort> 𐦳 
> 𐦘 = <sort> 𐦴 
> 𐦙 = <sort> 𐦴 
> 𐦚 = <sort> 𐦵 
> 𐦛 = <sort> 𐦵 
> 𐦜 = <sort> 𐦶 
> 𐦝 = <sort> 𐦷 
> 𐦰 = <sort> 𐦯 
> 𐨍 = 
> 𐨎 = ं
> 𐨏 = ः
> 𐫈 = <sort> 𐫇 
> 𐫥 = 
> 𐫦 = 
> 𐬮 = <sort> 𐬭 
> 𐰁 = <sort> 𐰀 
> 𐰄 = <sort> 𐰃 
> 𐰈 = <sort> 𐰇 
> 𐰊 = <sort> 𐰉 
> 𐰌 = <sort> 𐰋 
> 𐰎 = <sort> 𐰍 
> 𐰐 = <sort> 𐰏 
> 𐰒 = <sort> 𐰑 
> 𐰕 = <sort> 𐰔 
> 𐰗 = <sort> 𐰖 
> 𐰙 = <sort> 𐰘 
> 𐰛 = <sort> 𐰚 
> 𐰝 = <sort> 𐰜 
> 𐰟 = <sort> 𐰞 
> 𐰥 = <sort> 𐰤 
> 𐰧 = <sort> 𐰦 
> 𐰩 = <sort> 𐰨 
> 𐰫 = <sort> 𐰪 
> 𐰮 = <sort> 𐰭 
> 𐰳 = <sort> 𐰲 
> 𐰵 = <sort> 𐰴 
> 𐰷 = <sort> 𐰶 
> 𐰹 = <sort> 𐰸 
> 𐰻 = <sort> 𐰺 
> 𐱀 = <sort> 𐰿 
> 𐱂 = <sort> 𐱁 
> 𐱄 = <sort> 𐱃 
> 𐱆 = <sort> 𐱅 
> 𐲁 = <sort> 𐲀 
> 𐲊 = <sort> 𐲉 
> 𐲋 = <sort> 𐲉 
> 𐲑 = <sort> 𐲐 
> 𐲜 = <sort> 𐲛 
> 𐲞 = <sort> 𐲝 
> 𐲟 = <sort> 𐲝 
> 𐲣 = <sort> 𐲢 
> 𐲫 = <sort> 𐲪 
> 𐲭 = <sort> 𐲬 
> 𐳁 = <sort> 𐳀 
> 𐳊 = <sort> 𐳉 
> 𐳋 = <sort> 𐳉 
> 𐳑 = <sort> 𐳐 
> 𐳜 = <sort> 𐳛 
> 𐳞 = <sort> 𐳝 
> 𐳟 = <sort> 𐳝 
> 𐳣 = <sort> 𐳢 
> 𐳫 = <sort> 𐳪 
> 𐳭 = <sort> 𐳬 
> 𑀀 = ँ
> 𑀁 = ं
> 𑀂 = ः
> 𑂀 = ँ
> 𑂁 = ं
> 𑂂 = ः
> 𑂚 = 𑂙 ़
> 𑂜 = 𑂛 ़
> 𑂫 = 𑂥 ़
> 𑂺 = ़
> 𑄀 = ँ
> 𑄁 = ं
> 𑄂 = ः
> 𑅳 = ़
> 𑆀 = ँ
> 𑆁 = ं
> 𑆂 = ः
> 𑇊 = ़
> 𑈴 = ं
> 𑈶 = ़
> 𑈷 = ّ
> 𑋟 = ं
> 𑋩 = ़
> 𑌀 = ं
> 𑌁 = ँ
> 𑌂 = ं
> 𑌃 = ः
> 𑌼 = ़
> 𑒿 = ँ
> 𑓀 = ं
> 𑓁 = ः
> 𑓃 = ़
> 𑖼 = ँ
> 𑖽 = ं
> 𑖾 = ः
> 𑗀 = ़
> 𑗘 = <sort> 𑖂 
> 𑗙 = <sort> 𑖂 
> 𑗚 = <sort> 𑖃 
> 𑗛 = <sort> 𑖄 
> 𑗜 = <sort> 𑖲 
> 𑗝 = <sort> 𑖳 
> 𑘽 = ं
> 𑘾 = ः
> 𑙀 = ँ
> 𑚫 = ं
> 𑚬 = ः
> 𑚷 = ़
> 𑜅 = <sort> 𑜄 
> 𑜖 = <sort> 𑜕 
> 𖼆 = <sort> 𖼄
> 𖼓 = <sort> 𖼐
> 𖼥 = <sort> 𖼣
> 𖼿 = <sort> 𖼽
> 𛲝 = 
> 𛲞 = 
< 𝚹 = <font> <compat> Θ
> 𝚹 = <font> Θ
< 𝛓 = <font> ς
> 𝛓 = <font> σ
< 𝛜 = <font> <compat> ε
< 𝛝 = <font> <compat> θ
< 𝛞 = <font> <compat> κ
< 𝛟 = <font> <compat> φ
< 𝛠 = <font> <compat> ρ
< 𝛡 = <font> <compat> π
> 𝛜 = <font> ε
> 𝛝 = <font> θ
> 𝛞 = <font> κ
> 𝛟 = <font> φ
> 𝛠 = <font> ρ
> 𝛡 = <font> π
< 𝛳 = <font> <compat> Θ
> 𝛳 = <font> Θ
< 𝜍 = <font> ς
> 𝜍 = <font> σ
< 𝜖 = <font> <compat> ε
< 𝜗 = <font> <compat> θ
< 𝜘 = <font> <compat> κ
< 𝜙 = <font> <compat> φ
< 𝜚 = <font> <compat> ρ
< 𝜛 = <font> <compat> π
> 𝜖 = <font> ε
> 𝜗 = <font> θ
> 𝜘 = <font> κ
> 𝜙 = <font> φ
> 𝜚 = <font> ρ
> 𝜛 = <font> π
< 𝜭 = <font> <compat> Θ
> 𝜭 = <font> Θ
< 𝝇 = <font> ς
> 𝝇 = <font> σ
< 𝝐 = <font> <compat> ε
< 𝝑 = <font> <compat> θ
< 𝝒 = <font> <compat> κ
< 𝝓 = <font> <compat> φ
< 𝝔 = <font> <compat> ρ
< 𝝕 = <font> <compat> π
> 𝝐 = <font> ε
> 𝝑 = <font> θ
> 𝝒 = <font> κ
> 𝝓 = <font> φ
> 𝝔 = <font> ρ
> 𝝕 = <font> π
< 𝝧 = <font> <compat> Θ
> 𝝧 = <font> Θ
< 𝞁 = <font> ς
> 𝞁 = <font> σ
< 𝞊 = <font> <compat> ε
< 𝞋 = <font> <compat> θ
< 𝞌 = <font> <compat> κ
< 𝞍 = <font> <compat> φ
< 𝞎 = <font> <compat> ρ
< 𝞏 = <font> <compat> π
> 𝞊 = <font> ε
> 𝞋 = <font> θ
> 𝞌 = <font> κ
> 𝞍 = <font> φ
> 𝞎 = <font> ρ
> 𝞏 = <font> π
< 𝞡 = <font> <compat> Θ
> 𝞡 = <font> Θ
< 𝞻 = <font> ς
> 𝞻 = <font> σ
< 𝟄 = <font> <compat> ε
< 𝟅 = <font> <compat> θ
< 𝟆 = <font> <compat> κ
< 𝟇 = <font> <compat> φ
< 𝟈 = <font> <compat> ρ
< 𝟉 = <font> <compat> π
> 𝟄 = <font> ε
> 𝟅 = <font> θ
> 𝟆 = <font> κ
> 𝟇 = <font> φ
> 𝟈 = <font> ρ
> 𝟉 = <font> π
> 🄋 = <circle> 0
> 🄌 = <circle> 0
> 🅐 = <circle> A
> 🅑 = <circle> B
> 🅒 = <circle> C
> 🅓 = <circle> D
> 🅔 = <circle> E
> 🅕 = <circle> F
> 🅖 = <circle> G
> 🅗 = <circle> H
> 🅘 = <circle> I
> 🅙 = <circle> J
> 🅚 = <circle> K
> 🅛 = <circle> L
> 🅜 = <circle> M
> 🅝 = <circle> N
> 🅞 = <circle> O
> 🅟 = <circle> P
> 🅠 = <circle> Q
> 🅡 = <circle> R
> 🅢 = <circle> S
> 🅣 = <circle> T
> 🅤 = <circle> U
> 🅥 = <circle> V
> 🅦 = <circle> W
> 🅧 = <circle> X
> 🅨 = <circle> Y
> 🅩 = <circle> Z
> 🅰 = <square> A
> 🅱 = <square> B
> 🅲 = <square> C
> 🅳 = <square> D
> 🅴 = <square> E
> 🅵 = <square> F
> 🅶 = <square> G
> 🅷 = <square> H
> 🅸 = <square> I
> 🅹 = <square> J
> 🅺 = <square> K
> 🅻 = <square> L
> 🅼 = <square> M
> 🅽 = <square> N
> 🅾 = <square> O
> 🅿 = <square> P
> 🆀 = <square> Q
> 🆁 = <square> R
> 🆂 = <square> S
> 🆃 = <square> T
> 🆄 = <square> U
> 🆅 = <square> V
> 🆆 = <square> W
> 🆇 = <square> X
> 🆈 = <square> Y
> 🆉 = <square> Z
> 🆊 = <square> P
> 🆋 = <square> I C
> 🆌 = <square> P A
> 🆍 = <square> S A
> 🆎 = <square> A B
> 🆏 = <square> W C
> 🆑 = <square> C L
> 🆒 = <square> C O O L
> 🆓 = <square> F R E E
> 🆔 = <square> I D
> 🆕 = <square> N E W
> 🆖 = <square> N G
> 🆗 = <square> O K
> 🆘 = <square> S O S
> 🆙 = <square> U P !
> 🆚 = <square> V S

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-23 20:24                                                                                       ` Yuri Khan
@ 2016-02-25 12:11                                                                                         ` Richard Stallman
  2016-02-25 14:57                                                                                           ` Yuri Khan
  0 siblings, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-25 12:11 UTC (permalink / raw)
  To: Yuri Khan; +Cc: eliz, lokedhs, larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > When looking for confusables, you don’t want to fold. You want to make
  > letters of different scripts stand out, e.g. by font-locking.

That might be a good feature, but the devil is in the details.
Would you like to discuss possible details here?

Meanwhile, I don't think it has to be one or the other.
It might be good to do both.

It might be difficult to design a convention to distinguish
Latin a and Cyrillic a with fonts _all the time_.  So here's an idea:
when you search for Latin a and it finds Cyrillic a, it could put a special
font or color (this tty has no fonts) on the Cyrillic a
to show it matched as a confusable.  Likewise, if you search for Cyrillic a
and it finds Latin a, it would put that same font on the Latin a.

This needs just one font or color -- to indicate a confusable in search.
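
A minimal sketch of that search behaviour, written in Python rather than
Emacs Lisp for brevity. The CONFUSABLES table and both function names are
hypothetical; a real table would be far larger. Each match reports the
positions that matched only as a confusable, which is where the one
special face or color would go.

```python
# Hypothetical, tiny confusables table mapping a character to its
# look-alike representative. Only three Cyrillic letters are listed.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A  -> LATIN SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE -> LATIN SMALL LETTER E
    "\u043e": "o",  # CYRILLIC SMALL LETTER O  -> LATIN SMALL LETTER O
}


def skeleton(text):
    """Replace each character by its representative, one for one."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)


def search_confusable(query, text):
    """Yield (start, flagged) for every match of query in text.

    flagged lists the positions in text whose character differs from
    the query character, i.e. it matched only as a confusable.
    """
    q, t = skeleton(query), skeleton(text)
    start = t.find(q)
    while start != -1:
        flagged = [start + i for i, ch in enumerate(query)
                   if text[start + i] != ch]
        yield start, flagged
        start = t.find(q, start + 1)
```

Because the mapping is one character to one character, positions in the
folded text line up with positions in the original, and the same code
flags a Latin a when the query contains a Cyrillic a.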

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-24 17:54                                                                                         ` Eli Zaretskii
@ 2016-02-25 12:15                                                                                           ` Richard Stallman
  2016-02-25 12:38                                                                                             ` Joost Kremers
  0 siblings, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-25 12:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Not sure what to explain, to tell the truth.  What I had in mind is
  > cases like á, which I don't think any user of any language will ever
  > want to consider a non-decomposable character.

In French and Spanish, á is a decorated version of a.  Perhaps there
is no language in which á has any other status.

My point about decorated letters is that _in general_ the list of
decorated versions of letters is language-dependent.  For instance, ö
is a decorated o in English and French, but not in Swedish.  The
tables that define decorated letters need to be language-specific.

If it happens that all languages agree about á, that won't be a problem.
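
A rough sketch of such language-specific tables, in Python for brevity.
The NOT_DECORATED table, its language keys, and the fold function are
hypothetical names chosen for illustration; the default rule here is
simply "strip combining marks after canonical decomposition", which is
an assumption, not a description of what Emacs does.

```python
import unicodedata

# Hypothetical per-language exceptions: letters that count as letters
# in their own right and so must not fold to the base letter.
NOT_DECORATED = {
    "sv": set("\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6"),  # a-ring, a/o-diaeresis
    "es": set("\u00f1\u00d1"),                          # n-tilde
}


def fold(text, lang=None):
    """Strip decorations, except those the language treats as letters."""
    keep = NOT_DECORATED.get(lang, set())
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        # Drop combining marks, keep the base character(s).
        out.append("".join(c for c in decomposed
                           if not unicodedata.combining(c)))
    return "".join(out)
```

With no language given, "smörgås" folds to "smorgas"; with lang="sv" it
is left alone, while á still folds to a.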

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-24 17:56                                                                                         ` Eli Zaretskii
@ 2016-02-25 12:15                                                                                           ` Richard Stallman
  0 siblings, 0 replies; 263+ messages in thread
From: Richard Stallman @ 2016-02-25 12:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I didn't say the 2 examples were in the same class.  My point was that
  > we are not talking about equivalence of _characters_, we are talking
  > about equivalent character _sequences_.

That's true.  My point is, if folding is going to fold some sequences
with some letters, we need to put each sequence-match into the
appropriate level, in order to handle them properly.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-25 12:15                                                                                           ` Richard Stallman
@ 2016-02-25 12:38                                                                                             ` Joost Kremers
  2016-02-25 22:43                                                                                               ` John Wiegley
  0 siblings, 1 reply; 263+ messages in thread
From: Joost Kremers @ 2016-02-25 12:38 UTC (permalink / raw)
  To: rms; +Cc: Eli Zaretskii, lokedhs, larsi, emacs-devel


On Thu, Feb 25 2016, Richard Stallman wrote:
> If it happens that all languages agree about á, that won't be a problem.

I doubt that's the case. Though I don't actually speak the language, I
suspect that in Icelandic a and á are considered different letters. The
former is pronounced [a], the latter [au̯]. Similar considerations apply
to all the vowels e/é, i/í, o/ó, u/ú and y/ý.

-- 
Joost Kremers
Life has its moments



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-25 12:11                                                                                         ` Richard Stallman
@ 2016-02-25 14:57                                                                                           ` Yuri Khan
  2016-02-26 20:21                                                                                             ` Richard Stallman
  0 siblings, 1 reply; 263+ messages in thread
From: Yuri Khan @ 2016-02-25 14:57 UTC (permalink / raw)
  To: rms@gnu.org; +Cc: Eli Zaretskii, lokedhs, Lars Ingebrigtsen, Emacs developers

On Thu, Feb 25, 2016 at 6:11 PM, Richard Stallman <rms@gnu.org> wrote:

>   > When looking for confusables, you don’t want to fold. You want to make
>   > letters of different scripts stand out, e.g. by font-locking.
>
> That might be a good feature, but the devil is in the details.
> Would you like to discuss possible details here?

No.

> Meanwhile, I don't think it has to be one or the other.
> It might be good to do both.

What specific user scenario do you want to solve by folding
Latin/Greek/Cyrillic confusables?

> It might be difficult to design a convention to distinguish
> Latin a and Cyrillic a with fonts _all the time_.

There is no reason to distinguish them _all the time_. For convenient
reading, they should in fact be indistinguishable. The reader knows
from the surrounding context which letters are Latin and which are
Cyrillic.

It is when you are proof-reading text that it becomes important to
distinguish Latin and Cyrillic, to check that you don’t have a stray
Cyrillic letter within an English word, or vice-versa. (For that
matter, in this same mode it becomes important to distinguish various
kinds of Unicode spaces, hyphen/en dash/em dash/minus/figure dash,
degree sign/masculine ordinal, empty set/Latin letter o with stroke,
etc. A trained eye and a specially designed font go a long way.)
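
A small sketch of the proof-reading check described above, in Python.
It assumes the first word of a character's Unicode name is a good enough
script label, and only looks at Latin, Cyrillic and Greek; both function
names are made up for this example.

```python
import unicodedata


def script_of(ch):
    """Rough script guess taken from the start of the Unicode name."""
    name = unicodedata.name(ch, "")
    for script in ("LATIN", "CYRILLIC", "GREEK"):
        if name.startswith(script):
            return script
    return None


def mixed_script_words(text):
    """Return (word, scripts) for each word that mixes scripts."""
    hits = []
    for word in text.split():
        scripts = {script_of(ch) for ch in word if ch.isalpha()}
        scripts.discard(None)
        if len(scripts) > 1:
            hits.append((word, sorted(scripts)))
    return hits
```

A word made up entirely of Cyrillic letters is not reported; only a word
such as "word" typed with a Cyrillic o in the middle is.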

> So here's an idea:
> when you search for Latin a and it finds Cyrillic a, it could put a special
> font or color (this tty has no fonts) on the Cyrillic a
> to show it matched as a confusable.  Likewise, if you search for Cyrillic a
> and it finds Latin a, it would put that same font on the Latin a.
>
> This needs just one font or color -- to indicate a confusable in search.

That’s assuming we *do* want to fold confusables. I’d like to know a
use case first.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-25  0:29                                                                                     ` Juri Linkov
@ 2016-02-25 16:24                                                                                       ` Eli Zaretskii
  2016-02-29  0:22                                                                                         ` Juri Linkov
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-25 16:24 UTC (permalink / raw)
  To: Juri Linkov; +Cc: larsi, lokedhs, rms, emacs-devel

> From: Juri Linkov <juri@linkov.net>
> Cc: larsi@gnus.org,  lokedhs@gmail.com,  rms@gnu.org,  emacs-devel@gnu.org
> Date: Thu, 25 Feb 2016 02:29:11 +0200
> 
> >> >> It seems two user variables are necessary for customization:
> >> >>
> >> >> 1. inclusive folding groups that will include by default such pairs
> >> >>    as o - ø, l - ł added to the Unicode decomposition-based rules,
> >> >>    and allow the users to add more rules;
> >> >>
> >> >> 2. exclusive folding groups to exclude locale/language-dependent rules from
> >> >>    the default mappings above, e.g. removing n - ñ for the "es" locale.
> >> >
> >> > I think we should add those in item 1 unconditionally (i.e. include
> >> > them in the default mappings), and then exclude some of them under the
> >> > rules you describe in item 2.  Then the problem becomes easier, as we
> >> > only need to filter out some mappings, as determined by a single user
> >> > variable (whose default can come from the user locale).
> >> 
> >> Better to have 4 variables (2 internal + 2 user customizable variables):
> >
> > Can you explain why it's better to have 4 variables rather than just
> > one?
> 
> If you mean that one customizable variable should contain all mappings from
> UnicodeData.txt and decomps.txt presented to the user for customization,
> such a list will be too huge to customize: there are 5721 decompositions
> in UnicodeData.txt, and 6674 decompositions in decomps.txt.

No, of course not.  That would be extremely inconvenient.

What I envisioned is a single variable that holds a list of folding
sub-features.  Examples include ignoring diacritics, matching
ligatures and their decompositions, "controversial" foldings that
users of specific languages might not want, etc.  The default value
will hold all of the sub-features; users that don't want some of them
will be able to remove them from the list, which will affect the
mapping at search time.  We could also have a setting that means "DTRT
for my locale", which will remove the sub-features inappropriate for
the locale's language.  Stuff like that.

> So we could have at least one default internal variable containing all
> decompositions from UnicodeData.txt plus decompositions from decomps.txt
> minus locale-dependent mappings.

Internally, we need a translation table for mapping equivalent
characters.  This table should be recomputed (or selected among
several precomputed ones) according to the list of sub-features that
the user requested.
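
To make the idea concrete, here is a minimal Python sketch (not Emacs code) of a translation table recomputed from a list of sub-features. The sub-feature names, the `excluded` parameter, and the small sample range of code points are all assumptions made for illustration; none of them come from an actual implementation.

```python
import unicodedata

def fold_diacritics(ch):
    """Return the base character if CH canonically decomposes to base + combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    if len(decomposed) > 1 and all(unicodedata.combining(c) for c in decomposed[1:]):
        return decomposed[0]
    return None

def fold_ligatures(ch):
    """Return the compatibility expansion of a ligature (e.g. U+FB01 -> "fi")."""
    decomp = unicodedata.decomposition(ch)
    if decomp.startswith("<compat>") and "LIGATURE" in unicodedata.name(ch, ""):
        return "".join(chr(int(cp, 16)) for cp in decomp.split()[1:])
    return None

# Hypothetical sub-feature names mapped to the functions that implement them.
FOLDERS = {"diacritics": fold_diacritics, "ligatures": fold_ligatures}

# A small sample of code points, enough for the illustration (an assumption).
SAMPLE_CODE_POINTS = list(range(0x00C0, 0x0250)) + list(range(0xFB00, 0xFB07))

def build_fold_table(subfeatures, excluded=frozenset()):
    """Recompute the table for the requested sub-features, skipping EXCLUDED chars."""
    table = {}
    for cp in SAMPLE_CODE_POINTS:
        ch = chr(cp)
        if ch in excluded:
            continue
        for name in subfeatures:
            target = FOLDERS[name](ch)
            if target is not None:
                table[ch] = target
                break
    return table

def fold(text, table):
    """Apply the table to TEXT, leaving unmapped characters alone."""
    return "".join(table.get(ch, ch) for ch in text)
```

With both sub-features enabled, `fold` maps "mañana" to "manana"; passing `excluded={"ñ", "Ñ"}` (as a Spanish-locale default might) leaves "mañana" untouched while still folding "está" to "esta".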

> >   http://unicode.org/Public/UCA/latest/decomps.txt
> >
> > (The last release of Unicode is v8.0.)
> 
> Thanks, comparing UnicodeData.txt with the latest decomps.txt shows
> 1600 differences (such as ł decomposed to l and ̵ and ø to o and ̸)
> we need to add manually (a whole set of differences is attached below):

I think we need to create another uni-*.el file which defines a
decomposition char-table populated from decomps.txt.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-24 23:27                                                     ` Rasmus
@ 2016-02-25 20:46                                                       ` Richard Stallman
  0 siblings, 0 replies; 263+ messages in thread
From: Richard Stallman @ 2016-02-25 20:46 UTC (permalink / raw)
  To: Rasmus; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I think it should look at the /keyboard layout/ before the /locale/.

In principle you might be right, but how can Emacs find out the
keyboard layout?

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-25 12:38                                                                                             ` Joost Kremers
@ 2016-02-25 22:43                                                                                               ` John Wiegley
  2016-02-25 22:48                                                                                                 ` John Wiegley
  2016-02-26 18:13                                                                                                 ` Eli Zaretskii
  0 siblings, 2 replies; 263+ messages in thread
From: John Wiegley @ 2016-02-25 22:43 UTC (permalink / raw)
  To: Joost Kremers; +Cc: larsi, Eli Zaretskii, lokedhs, rms, emacs-devel

>>>>> Joost Kremers <joostkremers@fastmail.fm> writes:

> On Thu, Feb 25 2016, Richard Stallman wrote:
>> If it happens that all languages agree about á, that won't be a problem.

> I doubt that's the case. Though I don't actually speak the language, I
> suspect that in Icelandic a and á are considered different letters. The
> former is pronounced [a], the latter [au̯]. Similar considerations apply to
> all the vowels e/é, i/í, o/ó, u/ú and y/ý.

I'd like to ask at this point that this discussion move to Emacs Tangents, as
it is not approaching anything in the way of a technical consensus.

Sub-threads addressing specific, concrete issues are welcome on this list; but
the general discussion happening here is only creating volume without result.

Thank you,
-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-25 22:43                                                                                               ` John Wiegley
@ 2016-02-25 22:48                                                                                                 ` John Wiegley
  2016-02-26 18:13                                                                                                 ` Eli Zaretskii
  1 sibling, 0 replies; 263+ messages in thread
From: John Wiegley @ 2016-02-25 22:48 UTC (permalink / raw)
  To: Joost Kremers; +Cc: larsi, Eli Zaretskii, lokedhs, rms, emacs-devel

>>>>> John Wiegley <johnw@gnu.org> writes:

> Sub-threads addressing specific, concrete issues are welcome on this list;
> but the general discussion happening here is only creating volume without
> result.

Where by "sub-thread" I mean, changing the Subject as you reply to indicate
the precise point you wish to resolve through discussion here.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-25 22:43                                                                                               ` John Wiegley
  2016-02-25 22:48                                                                                                 ` John Wiegley
@ 2016-02-26 18:13                                                                                                 ` Eli Zaretskii
  2016-02-27  0:48                                                                                                   ` John Wiegley
  1 sibling, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-26 18:13 UTC (permalink / raw)
  To: John Wiegley; +Cc: joostkremers, larsi, lokedhs, rms, emacs-devel

> From: John Wiegley <jwiegley@gmail.com>
> Cc: rms@gnu.org,  Eli Zaretskii <eliz@gnu.org>,  lokedhs@gmail.com,  larsi@gnus.org,  emacs-devel@gnu.org
> Date: Thu, 25 Feb 2016 14:43:37 -0800
> 
> I'd like to ask at this point that this discussion move to Emacs Tangents, as
> it is not approaching anything in the way of a technical consensus.
> 
> Sub-threads addressing specific, concrete issues are welcome on this list; but
> the general discussion happening here is only creating volume without result.

The discussion (with a few exceptions) is about how to augment the
current implementation to make it more acceptable to various needs and
cultures.  So I think it's directly related to the pretest, and so
moving it to emacs-tangents would be wrong.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-25 14:57                                                                                           ` Yuri Khan
@ 2016-02-26 20:21                                                                                             ` Richard Stallman
  2016-02-27  5:47                                                                                               ` Yuri Khan
  0 siblings, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-26 20:21 UTC (permalink / raw)
  To: Yuri Khan; +Cc: eliz, lokedhs, larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > Meanwhile, I don't think it has to be one or the other.
  > > It might be good to do both.

  > What specific user scenario do you want to solve by folding
  > Latin/Greek/Cyrillic confusables?

If I saw an 'a' in the buffer, I'd like searching for 'a' to find it.
Of course, I will search for a Latin 'a'.  If the char in the buffer
is a Cyrillic 'a', I want isearch to find that too.

  > It is when you are proof-reading text that it becomes important to
  > distinguish Latin and Cyrillic, to check that you don’t have a stray
  > Cyrillic letter within an English word, or vice-versa.

If I want to check which kind of a it is, I can do that with C-x =.
It would never occur to me to test "Is this really a Cyrillic a"
by searching for a Latin a and seeing if that finds it.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-22 18:51                                                                           ` Eli Zaretskii
  2016-02-23  0:14                                                                             ` Juri Linkov
@ 2016-02-26 20:23                                                                             ` Richard Stallman
  1 sibling, 0 replies; 263+ messages in thread
From: Richard Stallman @ 2016-02-26 20:23 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > * A per-buffer language preference variable.
  > > * A global value which becomes the default for new buffers.

  > That's unnecessarily restrictive; we can do better with the current
  > infrastructure.

This is not a restriction, it is a feature.  It is meant to enable
people to do something convenient.

  >   Some encodings provide us with charset information,
  > which can be used to deduce the language of the text.  Some characters
  > belong to Unicode blocks that allow identification of the language, or
  > maybe a small group of languages.  In some cases, the text itself
  > comes with metadata which describes the language.  And there might be
  > other sources of information about the language.

If there are useful ways to determine the language from the text, that
work well enough that users won't complain, let's do it.  That would
be an add-on to the structure I proposed.

  > There are other aspects of this that need to be considered, if we want
  > for language-specific searching to be solid.  E.g., what happens with
  > text copied to another buffer which might have a different per-buffer
  > language preference? does it suddenly behave differently when
  > searched?

Yes.  If you want the two buffers to have the same language
preference, then maybe Emacs can guess that for you; if not, you can
specify it.

  > But the most basic issue is that any significant development in these
  > directions requires re-implementing the feature on the C level, and using
  > char-tables for folding, like we do with case-mapping.

It needs to use some sort of tables.  Whether they are the current
kind of char table, or some other structure, is something to be
determined.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-26 18:13                                                                                                 ` Eli Zaretskii
@ 2016-02-27  0:48                                                                                                   ` John Wiegley
  2016-02-27  8:38                                                                                                     ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: John Wiegley @ 2016-02-27  0:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: joostkremers, larsi, lokedhs, rms, emacs-devel

>>>>> Eli Zaretskii <eliz@gnu.org> writes:

> The discussion (with a few exceptions) is about how to augment the current
> implementation to make it more acceptable to various needs and cultures. So
> I think it's directly related to the pretest, and so moving it to
> emacs-tangents would be wrong.

In that case, can you please propose a plan for reaching such acceptability?
If I can clearly see what we're aiming toward, it will give me a context for
reading these messages, and help focus the discussion.

For example: what exactly makes it not acceptable today? what are the desirable
features of an "ideal implementation"? what are the variables we're trying to
hammer down? etc. Then I think we can meaningfully tackle this issue by
breaking it into the smaller pieces that make it up.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2


^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-26 20:21                                                                                             ` Richard Stallman
@ 2016-02-27  5:47                                                                                               ` Yuri Khan
  2016-02-27 19:54                                                                                                 ` Richard Stallman
  0 siblings, 1 reply; 263+ messages in thread
From: Yuri Khan @ 2016-02-27  5:47 UTC (permalink / raw)
  To: rms@gnu.org
  Cc: Eli Zaretskii, Elias Mårtenson, Lars Ingebrigtsen,
	Emacs developers

On Sat, Feb 27, 2016 at 2:21 AM, Richard Stallman <rms@gnu.org> wrote:

>   > What specific user scenario do you want to solve by folding
>   > Latin/Greek/Cyrillic confusables?
>
> If I saw an 'a' in the buffer, I'd like searching for 'a' to find it.
> Of course, I will search for a Latin 'a'.  If the char in the buffer
> is a Cyrillic 'a', I want isearch to find that too.

You don’t usually see an “а” in isolation. In normal text, you see at
least a word, and usually a sentence. Those give you enough context to
know it’s not a Latin “a”.

>   > It is when you are proof-reading text that it becomes important to
>   > distinguish Latin and Cyrillic, to check that you don’t have a stray
>   > Cyrillic letter within an English word, or vice-versa.
>
> If I want to check which kind of a it is, I can do that with C-x =.

You can do that if you already suspect one letter to be of the wrong
alphabet (e.g. your spell-checker tells you there is no such word as
“sрell-сhecker”). You cannot do that for any reasonably long stretch
of text.

> It would never occur to me to test "Is this really a Cyrillic a"
> by searching for a Latin a and seeing if that finds it.

Neither to me, though I might use a regexp isearch for [a-z] to
highlight all Latin letters in a paragraph where I expect none. It
would be confusing and misleading if it highlighted
[АВЕКМНОРСТХЬавеморстух].
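
The proof-reading scenario can be sketched in a few lines of Python. This is only an illustration, assuming that the Unicode character name is a good enough proxy for the script; the helper names `script_of` and `stray_letters` are made up for the example.

```python
import unicodedata

def script_of(ch):
    """Guess the script of CH from the first word of its Unicode name."""
    name = unicodedata.name(ch, "")
    for script in ("LATIN", "CYRILLIC", "GREEK"):
        if name.startswith(script):
            return script
    return None

def stray_letters(word, expected="LATIN"):
    """Return (index, char, script) for letters not in the EXPECTED script."""
    return [
        (i, ch, script_of(ch))
        for i, ch in enumerate(word)
        if ch.isalpha() and script_of(ch) not in (expected, None)
    ]

# The word below contains Cyrillic U+0440 and U+0441 in place of Latin p and c.
suspect = "s\u0440ell-\u0441hecker"
for index, ch, script in stray_letters(suspect):
    print(index, hex(ord(ch)), script)
```

A check like this only works if the Latin and Cyrillic look-alikes stay distinct; if a search folded them together, the stray letters would be indistinguishable from the expected ones.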

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-27  0:48                                                                                                   ` John Wiegley
@ 2016-02-27  8:38                                                                                                     ` Eli Zaretskii
  2016-02-27  8:58                                                                                                       ` John Wiegley
  2016-02-27 19:53                                                                                                       ` Richard Stallman
  0 siblings, 2 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-27  8:38 UTC (permalink / raw)
  To: John Wiegley; +Cc: joostkremers, larsi, lokedhs, rms, emacs-devel

> From: John Wiegley <jwiegley@gmail.com>
> Cc: joostkremers@fastmail.fm,  rms@gnu.org,  lokedhs@gmail.com,  larsi@gnus.org,  emacs-devel@gnu.org
> Date: Fri, 26 Feb 2016 16:48:21 -0800
> 
> 
> >>>>> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > The discussion (with a few exceptions) is about how to augment the current
> > implementation to make it more acceptable to various needs and cultures. So
> > I think it's directly related to the pretest, and so moving it to
> > emacs-tangents would be wrong.
> 
> In that case, can you please propose a plan for reaching such acceptability?
> If I can clearly see what we're aiming toward, it will give me a context for
> reading these messages, and help focus the discussion.
> 
> For example: what exactly makes it not acceptable today? what are the desirable
> features of an "ideal implementation"? what are the variables we're trying to
> hammer down? etc. Then I think we can meaningfully tackle this issue by
> breaking it into the smaller pieces that make it up.

The simplest change would be to have character-folding disabled by
default in some European locales whose users expressed objections to
having it on by default, due to folding of some characters that
shouldn't be folded in the languages of those locales.

Another, more complex, but still simple enough, possibility would be
to have character-folding on by default, but have the problematic
foldings filtered out from the regexp used by it.  We could either
always filter out all of them, or filter out only some of them, as
determined by the user locale.  For example, in the Spanish locales, ñ
will not be folded.

The next alternative is to come up with a fine-grained classification
of character-folding, and provide user options to control each one of
them independently, with the defaults determined by the user locale.
For example, one class of folding is the one required for matching
pre-composed characters such as á with its decomposed variant á;
another class is for finding "similar" characters, such as finding ⒜
when looking for a.  There should probably be classes that are
disliked by users of certain languages, such as ñ for Spanish.
Etc. etc.  (I think this alternative needs more research and user
feedback, and so is probably not for the release branch.)
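
As a rough illustration of such a classification, the following Python sketch sorts a candidate match into one of a few classes using Unicode normalization. The class names and the function `folding_class` are assumptions for the example only; a real classification would need the research and feedback mentioned above.

```python
import unicodedata

def folding_class(pattern_char, text):
    """Classify why TEXT could be treated as a match for PATTERN_CHAR."""
    if text == pattern_char:
        return "identical"
    nfd = unicodedata.normalize("NFD", text)
    # Pre-composed vs. decomposed forms of the same character.
    if unicodedata.normalize("NFD", pattern_char) == nfd:
        return "canonical-equivalent"
    # Same base letter once combining marks are dropped.
    stripped = "".join(c for c in nfd if not unicodedata.combining(c))
    if stripped == pattern_char:
        return "diacritic"
    # "Similar" characters whose compatibility decomposition contains the letter.
    if pattern_char in unicodedata.normalize("NFKD", text):
        return "compatibility"
    return None

print(folding_class("\u00e1", "a\u0301"))  # pre-composed vs. decomposed a-acute
print(folding_class("n", "\u00f1"))        # n vs. n-tilde
print(folding_class("a", "\u249c"))        # a vs. parenthesized a
```

Each class could then get its own user option, with a locale-dependent default; for example, a Spanish locale might keep the "diacritic" class on in general while exempting n-tilde.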

Maybe there are more alternatives, I don't know.  It's not like they
were explicitly proposed by someone; the above are just my personal
conclusions from reading the discussion.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-27  8:38                                                                                                     ` Eli Zaretskii
@ 2016-02-27  8:58                                                                                                       ` John Wiegley
  2016-02-27  9:30                                                                                                         ` Eli Zaretskii
  2016-02-27 19:53                                                                                                       ` Richard Stallman
  1 sibling, 1 reply; 263+ messages in thread
From: John Wiegley @ 2016-02-27  8:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: joostkremers, larsi, lokedhs, rms, emacs-devel

>>>>> Eli Zaretskii <eliz@gnu.org> writes:

> The simplest change would be to have character-folding disabled by default
> in some European locales whose users expressed objections to having it on by
> default, due to folding of some characters that shouldn't be folded in the
> languages of those locales.

> Another, more complex, but still simple enough, possibility would be to have
> character-folding on by default, but have the problematic foldings filtered
> out from the regexp used by it. We could either always filter out all of
> them, or filter out only some of them, as determined by the user locale. For
> example, in the Spanish locales, ñ will not be folded.

> The next alternative is to come up with a fine-grained classification of
> character-folding, and provide user options to control each one of them
> independently, with the defaults determined by the user locale. For example,
> one class of folding is the one required for matching pre-composed
> characters such as á with its decomposed variant á; another class is for
> finding "similar" characters, such as finding ⒜ when looking for a. There
> should probably be classes that are disliked by users of certain languages,
> such as ñ for Spanish. Etc. etc. (I think this alternative needs more
> research and user feedback, and so is probably not for the release branch.)

> Maybe there are more alternatives, I don't know. It's not like they were
> explicitly proposed by someone; the above are just my personal conclusions
> from reading the discussion.

Thank you for that summary. From that reading, it sounds like this will
require a fairly complex decision tree, to determine what should be folded
when based on the details of each particular country/language? That is, we
can't expect to make a single decision up front, but will need feedback from
users in every country that uses Emacs, in order to determine what the correct
settings are for each language?

And what about a Swedish speaker living in America who uses en_US because
that's what 90% of his text is in, who then wants to search some Swedish text?
Is it the locale that determines it, or something specific to the nature of
the text in each buffer? And how would Emacs know?

Unless I'm not seeing the light at the end of this tunnel, this feature is
just not ready for prime-time as a default. There are too many unanswered
questions, and it sounds like none of them can be answered in the abstract for
every case. I have a feeling we'd be getting bug reports constantly from users
whose language contains details we never anticipated.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2


^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-27  8:58                                                                                                       ` John Wiegley
@ 2016-02-27  9:30                                                                                                         ` Eli Zaretskii
  2016-02-27 16:22                                                                                                           ` Ken Brown
  2016-02-27 22:48                                                                                                           ` John Wiegley
  0 siblings, 2 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-27  9:30 UTC (permalink / raw)
  To: John Wiegley; +Cc: joostkremers, larsi, lokedhs, rms, emacs-devel

> From: John Wiegley <jwiegley@gmail.com>
> Cc: joostkremers@fastmail.fm,  rms@gnu.org,  lokedhs@gmail.com,  larsi@gnus.org,  emacs-devel@gnu.org
> Date: Sat, 27 Feb 2016 00:58:02 -0800
> 
> Thank you for that summary. From that reading, it sounds like this will
> require a fairly complex decision tree, to determine what should be folded
> when based on the details of each particular country/language?

I fail to see the complexity, but that's me.  In particular, the first
alternative (to have it disabled in certain locales) seems very simple
to me.

> And what about a Swedish speaker living in America who uses en_US because
> that's what 90% of his text is in, who then wants to search some Swedish text?
> Is it the locale that determines it, or something specific to the nature of
> the text in each buffer? And how would Emacs know?

I've asked these questions a lot in this discussion, and still the
majority thinks that the locale in which Emacs is started should be
used for the defaults.  So you are in fact arguing with what the
majority says, not with me.

> Unless I'm not seeing the light at the end of this tunnel, this feature is
> just not ready for prime-time as a default. There are too many unanswered
> questions, and it sounds like none of them can be answered in the abstract for
> every case. I have a feeling we'd be getting bug reports constantly from users
> whose language contains details we never anticipated.

Do we have a clear definition of the criteria for this
feature to be "ready for prime-time as a default"?  You are in effect
saying that we will never be able to find good answers for those
questions.  We shouldn't be dismissing a good feature such as this
one, which many users like, due to FUD-like arguments.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-27  9:30                                                                                                         ` Eli Zaretskii
@ 2016-02-27 16:22                                                                                                           ` Ken Brown
  2016-02-27 22:48                                                                                                           ` John Wiegley
  1 sibling, 0 replies; 263+ messages in thread
From: Ken Brown @ 2016-02-27 16:22 UTC (permalink / raw)
  To: Eli Zaretskii, John Wiegley
  Cc: joostkremers, larsi, lokedhs, rms, emacs-devel

On 2/27/2016 4:30 AM, Eli Zaretskii wrote:
>> From: John Wiegley <jwiegley@gmail.com>
>> Thank you for that summary. From that reading, it sounds like this will
>> require a fairly complex decision tree, to determine what should be folded
>> when based on the details of each particular country/language?
>
> I fail to see the complexity, but that's me.  In particular, the first
> alternative (to have it disabled in certain locales) seems very simple
> to me.

I strongly agree.  This would be an excellent compromise for 25.1.  It 
would enable many users to discover a useful new feature, while allowing 
time for future refinements that would improve the feature for users in 
the problematic locales.

Ken



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-27  8:38                                                                                                     ` Eli Zaretskii
  2016-02-27  8:58                                                                                                       ` John Wiegley
@ 2016-02-27 19:53                                                                                                       ` Richard Stallman
  2016-02-27 20:01                                                                                                         ` Eli Zaretskii
  1 sibling, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-27 19:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: joostkremers, larsi, johnw, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > The simplest change would be to have character-folding disabled by
  > default in some European locales whose users expressed objections to
  ...

Why not implement what I suggested?  Even though there are several
levels, in each case they boil down to a set of classes of characters,
each one either symmetric or asymmetric.  Once that calculation is done,
we can search for them with the existing mechanism.
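
A minimal Python sketch of that last step follows. It assumes that "asymmetric" means the base letter matches every member of its class while a variant matches only itself, and that "symmetric" means any member matches all members; the message does not spell this out, and the two classes listed are made-up examples.

```python
import re

# Hypothetical character classes; each is either symmetric or asymmetric.
FOLD_CLASSES = [
    {"base": "a", "members": "a\u00e1\u00e0\u00e2", "symmetric": False},
    {"base": '"', "members": '"\u201c\u201d', "symmetric": True},
]

def char_pattern(ch):
    """Return a regexp fragment for CH according to its class, if any."""
    for cls in FOLD_CLASSES:
        if ch in cls["members"] and (cls["symmetric"] or ch == cls["base"]):
            return "[" + re.escape(cls["members"]) + "]"
    return re.escape(ch)

def fold_regexp(search_string):
    """Build a regexp that an ordinary regexp search can then use."""
    return "".join(char_pattern(ch) for ch in search_string)

text = "a\u00e1b"
print(re.findall(fold_regexp("a"), text))       # base letter finds both forms
print(re.findall(fold_regexp("\u00e1"), text))  # accented letter finds only itself
```

Whatever levels of settings exist (global, per-language, per-buffer), they would only affect the contents of `FOLD_CLASSES`; the search step itself stays the same.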

  > That is, we
  > can't expect to make a single decision up front, but will need feedback from
  > users in every country that uses Emacs, in order to determine what the correct
  > settings are for each language?

Right.  Once we show it to people, we will start getting language-specific
definitions.

  > And what about a Swedish speaker living in America who uses en_US because
  > that's what 90% of his text is in, who then wants to search some Swedish text?
  > Is it the locale that determines it, or something specific to the nature of
  > the text in each buffer? And how would Emacs know?

Clearly we need to provide a way to set the language for each buffer.
We need this for several purposes, another one being the ispell dictionary.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-27  5:47                                                                                               ` Yuri Khan
@ 2016-02-27 19:54                                                                                                 ` Richard Stallman
  2016-02-27 20:02                                                                                                   ` Eli Zaretskii
                                                                                                                     ` (2 more replies)
  0 siblings, 3 replies; 263+ messages in thread
From: Richard Stallman @ 2016-02-27 19:54 UTC (permalink / raw)
  To: Yuri Khan; +Cc: eliz, lokedhs, larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > > If I saw an 'a' in the buffer, I'd like searching for 'a' to find it.
  > > Of course, I will search for a Latin 'a'.  If the char in the buffer
  > > is a Cyrillic 'a', I want isearch to find that too.

  > You don’t usually see an “а” in isolation. In normal text, you see at
  > least a word, and usually a sentence. Those give you enough context to
  > know it’s not a Latin “a”.

Often that is true.  Nonetheless, I stand by what I said:
I would rather have searching for Latin a match all a's.

  > Neither to me, though I might use a regexp isearch for [a-z] to
  > highlight all Latin letters in a paragraph where I expect none. It
  > would be confusing and misleading if it highlighted
  > [АВЕКМНОРСТХЬавеморстух].

Folding doesn't operate on [...], right?

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-27 19:53                                                                                                       ` Richard Stallman
@ 2016-02-27 20:01                                                                                                         ` Eli Zaretskii
  2016-02-28 10:24                                                                                                           ` Richard Stallman
       [not found]                                                                                                           ` <<E1aZyX5-0007bU-Mu@fencepost.gnu.org>
  0 siblings, 2 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-27 20:01 UTC (permalink / raw)
  To: rms; +Cc: joostkremers, larsi, johnw, lokedhs, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: johnw@gnu.org, joostkremers@fastmail.fm, larsi@gnus.org,
> 	lokedhs@gmail.com, emacs-devel@gnu.org
> Date: Sat, 27 Feb 2016 14:53:21 -0500
> 
>   > The simplest change would be to have character-folding disabled by
>   > default in some European locales whose users expressed objections to
>   ...
> 
> Why not implement what I suggested?  Even though there are several
> levels, in each case they boil down to a set of classes of characters,
> each one either symmetric or asymmetric.  Once that calculation is done,
> we can search for them with the existing mechanism.

I will have to see the code, but I expect your suggestion to be much
more complex, and thus unsuitable for the release branch.  It's okay
to do that on master, but John asked his questions wrt the release
branch.

>   > That is, we
>   > can't expect to make a single decision up front, but will need feedback from
>   > users in every country that uses Emacs, in order to determine what the correct
>   > settings are for each language?
> 
> Right.  Once we show it to people, we will start getting language-specific
> definitions.
> 
>   > And what about a Swedish speaker living in America who uses en_US because
>   > that's what 90% of his text is in, who then wants to search some Swedish text?
>   > Is it the locale that determines it, or something specific to the nature of
>   > the text in each buffer? And how would Emacs know?
> 
> Clearly we need to provide a way to set the language for each buffer.
> We need this for several purposes, another one being the ispell dictionary.

These are definitely out for the release branch.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-27 19:54                                                                                                 ` Richard Stallman
@ 2016-02-27 20:02                                                                                                   ` Eli Zaretskii
  2016-02-27 20:05                                                                                                   ` Eli Zaretskii
  2016-02-28  6:06                                                                                                   ` Yuri Khan
  2 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-27 20:02 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel, lokedhs, yuri.v.khan

> From: Richard Stallman <rms@gnu.org>
> CC: eliz@gnu.org, larsi@gnus.org, lokedhs@gmail.com,
> 	emacs-devel@gnu.org
> Date: Sat, 27 Feb 2016 14:54:00 -0500
> 
>   > You don’t usually see an “а” in isolation. In normal text, you see at
>   > least a word, and usually a sentence. Those give you enough context to
>   > know it’s not a Latin “a”.
> 
> Often that is true.  Nonetheless, I stand by what I said:
> I would rather have searching for Latin a match all a's.

I think you are in a tiny minority in this respect.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-27 19:54                                                                                                 ` Richard Stallman
  2016-02-27 20:02                                                                                                   ` Eli Zaretskii
@ 2016-02-27 20:05                                                                                                   ` Eli Zaretskii
  2016-02-28 10:25                                                                                                     ` Richard Stallman
  2016-02-28  6:06                                                                                                   ` Yuri Khan
  2 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-27 20:05 UTC (permalink / raw)
  To: rms; +Cc: larsi, emacs-devel, lokedhs, yuri.v.khan

> From: Richard Stallman <rms@gnu.org>
> Date: Sat, 27 Feb 2016 14:54:00 -0500
> Cc: eliz@gnu.org, lokedhs@gmail.com, larsi@gnus.org, emacs-devel@gnu.org
> 
>   > Neither to me, though I might use a regexp isearch for [a-z] to
>   > highlight all Latin letters in a paragraph where I expect none. It
>   > would be confusing and misleading if it highlighted
>   > [АВЕКМНОРСТХЬавеморстух].
> 
> Folding doesn't operate on [...], right?

No, but only because character-folding is implemented with regexps.
When it is re-implemented through translation tables, it will affect
regexp search as well.
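
To illustrate what "implemented with regexps" means here, a minimal
sketch in Python (not the actual Emacs code; the `EQUIVALENTS` table and
`fold_to_regexp` are hypothetical and far smaller than the real data).
Each literal character of a plain search string is expanded into a
character class, so a regexp the user typed, such as [a-z], never goes
through this expansion.

```python
import re

# Hypothetical, tiny equivalence table keyed by the base character.
EQUIVALENTS = {
    "a": "a\u00e1\u00e0\u00e2\u00e4",   # a, a-acute, a-grave, a-circumflex, a-diaeresis
    "e": "e\u00e9\u00e8\u00ea\u00eb",   # e, e-acute, e-grave, e-circumflex, e-diaeresis
}


def fold_to_regexp(text):
    """Turn a plain search string into a regexp that also matches variants."""
    parts = []
    for ch in text:
        if ch in EQUIVALENTS:
            parts.append("[" + EQUIVALENTS[ch] + "]")
        else:
            # Characters without an entry must match literally.
            parts.append(re.escape(ch))
    return "".join(parts)


pattern = fold_to_regexp("cafe")
found = re.search(pattern, "un caf\u00e9 noir") is not None
```

In this sketch the folding is one-directional: searching for "cafe"
finds the accented word, while searching for the accented word does not
find "cafe".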



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-27  9:30                                                                                                         ` Eli Zaretskii
  2016-02-27 16:22                                                                                                           ` Ken Brown
@ 2016-02-27 22:48                                                                                                           ` John Wiegley
  2016-02-28 15:57                                                                                                             ` Eli Zaretskii
  1 sibling, 1 reply; 263+ messages in thread
From: John Wiegley @ 2016-02-27 22:48 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: joostkremers, larsi, lokedhs, rms, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 3006 bytes --]

>>>>> Eli Zaretskii <eliz@gnu.org> writes:

> I've asked these questions a lot in this discussion, and still the majority
> thinks that the locale in which Emacs is started should be used for the
> defaults. So you are in fact arguing with what the majority says, not with
> me.

From what I've seen, this is a complex feature with many corner cases, some of
which may not have been encountered yet because it hasn't been "out in the
field" except for a few pretests.

> Do we have a clear definition of what are the criteria for this feature to
> be "ready for prime-time as a default"? You are in effect saying that we
> will never be able to find good answers for those questions. We shouldn't be
> dismissing a good feature such as this one, which many users like, due to
> FUD-like arguments.

Having such a clear definition would be the first criterion. :) Otherwise, I
feel like we're saying, "It sounds useful, why not enable it by default?"

Here are my somewhat fuzzy criteria:

 1. Questions about the feature should not prompt mega-threads that fail to
    reach clarity within a three-week time-frame. This indicates a lack of
    clarity about the feature among the core developers, and I believe users
    will notice this lack of clarity when trying out the feature.

 2. If there is work yet to be done, we should know what the work is.
    Otherwise, the feature may change in unpredictable ways in future
    versions. If that's the case, why make it the default before those
    decisions have been made?

 3. I would like to have a sense that this is a feature with either prior art,
    or considerable experience, behind it. Instead, I get the *feeling* (from
    reading this thread) that we're just starting to explore the idea of
    character-class-based searching, and it strikes me as odd that we would
    make our first attempt at it a default behavior for all users.

I've heard several people ask for it not to be a default, and I take that
seriously. The many complexities surrounding this feature make me uneasy. If
this were a product for sale, I'd have a huge question mark next to making
this a default behavior, given the confusion and false bug reports it is
likely to raise. Nothing I've read so far in this discussion has increased my
sense of security; quite the opposite, I become more wary by the week. It
seems like the more we poke this anthill, the more critters jump out.

That said, I'm quite happy for the feature to be there, and I will most
definitely turn it on. The question is whether it should become the default
for all users from the start. We can always enable it as a default later, so I
don't see a need to hurry. This could be a great feature to introduce as a
default in 26.1, if it receives good reception from early adopters in 25.x.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 629 bytes --]

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-27 19:54                                                                                                 ` Richard Stallman
  2016-02-27 20:02                                                                                                   ` Eli Zaretskii
  2016-02-27 20:05                                                                                                   ` Eli Zaretskii
@ 2016-02-28  6:06                                                                                                   ` Yuri Khan
  2 siblings, 0 replies; 263+ messages in thread
From: Yuri Khan @ 2016-02-28  6:06 UTC (permalink / raw)
  To: rms@gnu.org
  Cc: Eli Zaretskii, Elias Mårtenson, Lars Ingebrigtsen,
	Emacs developers

On Sun, Feb 28, 2016 at 1:54 AM, Richard Stallman <rms@gnu.org> wrote:

>   > […] I might use a regexp isearch for [a-z] to
>   > highlight all Latin letters in a paragraph where I expect none. It
>   > would be confusing and misleading if it highlighted
>   > [АВЕКМНОРСТХЬавеморстух].
>
> Folding doesn't operate on [...], right?

Case folding surely does. I was assuming it is the long-term plan that
character folding would operate consistently with case folding.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-27 20:01                                                                                                         ` Eli Zaretskii
@ 2016-02-28 10:24                                                                                                           ` Richard Stallman
  2016-02-28 16:01                                                                                                             ` Eli Zaretskii
       [not found]                                                                                                           ` <<E1aZyX5-0007bU-Mu@fencepost.gnu.org>
  1 sibling, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-28 10:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: joostkremers, larsi, johnw, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I will have to see the code, but I expect your suggestion to be much
  > more complex, and thus unsuitable for the release branch.  It's okay
  > to do that on master, but John asked his questions wrt the release
  > branch.

For the release, I think we should turn it off by default
and invite people to try turning it on.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-27 20:05                                                                                                   ` Eli Zaretskii
@ 2016-02-28 10:25                                                                                                     ` Richard Stallman
  0 siblings, 0 replies; 263+ messages in thread
From: Richard Stallman @ 2016-02-28 10:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, emacs-devel, lokedhs, yuri.v.khan

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > No, but only because character-folding is implemented with regexps.
  > When it is re-implemented through translation tables, it will affect
  > regexp search as well.

How to properly fold character ranges calls for some additional
thought.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-27 22:48                                                                                                           ` John Wiegley
@ 2016-02-28 15:57                                                                                                             ` Eli Zaretskii
  2016-02-28 16:59                                                                                                               ` Drew Adams
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-28 15:57 UTC (permalink / raw)
  To: John Wiegley; +Cc: joostkremers, larsi, lokedhs, rms, emacs-devel

> From: John Wiegley <jwiegley@gmail.com>
> Cc: joostkremers@fastmail.fm,  rms@gnu.org,  lokedhs@gmail.com,  larsi@gnus.org,  emacs-devel@gnu.org
> Date: Sat, 27 Feb 2016 14:48:31 -0800
> 
> From what I've seen, this is a complex feature with many corner cases, some of
> which may not have been encountered yet because it hasn't been "out in the
> field" except for a few pretests.

I don't see any corner use cases, just some parts that, for best
results, should be handled depending on the language of the text.
What we have now is IMNSHO good enough, although improvements are
welcome (and need infrastructure we don't currently have).  This is a
clear case of perfect being the enemy of good.

> The question is whether it should become the default for all users
> from the start. We can always enable it as a default later, so I
> don't see a need to hurry. This could be a great feature to
> introduce as a default in 26.1, if it receives good reception from
> early adopters in 25.x.

Why does it have to be a binary all or nothing decision?  Users of a
few languages found some of the folding patterns incorrect for their
language -- why not turn only those patterns off in the locales that
use only those languages?  Why should we have this decision affect
users who have nothing to do with those few languages?

Turning this summarily off will also disable features that AFAIR no
one objected to -- the ability to find á (a 2-character sequence) when
looking for á (one character), or vice versa.  I fail to see how a
failure to match by default in this use case would make any sense at
all.
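
To make the two forms concrete, here is a small sketch in Python (used
only because its standard `unicodedata` module makes this easy to check;
it is not how Emacs does it, and `canonically_equal` is a name made up
for the example). The two strings display the same but compare unequal
until both are normalized.

```python
import unicodedata

precomposed = "\u00e1"    # LATIN SMALL LETTER A WITH ACUTE, one character
decomposed = "a\u0301"    # "a" followed by COMBINING ACUTE ACCENT, two characters


def canonically_equal(x, y):
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", x) == unicodedata.normalize("NFD", y)


same_raw = precomposed == decomposed                       # plain comparison
same_folded = canonically_equal(precomposed, decomposed)   # after normalization
```

A literal search behaves like the plain comparison, which is why the
decomposed form is missed when folding is off.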

We should make our decisions in this matter based on understanding the
issues involved, and try very hard not to throw away the baby with the
bathwater.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-28 10:24                                                                                                           ` Richard Stallman
@ 2016-02-28 16:01                                                                                                             ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-28 16:01 UTC (permalink / raw)
  To: rms; +Cc: joostkremers, larsi, johnw, lokedhs, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: johnw@gnu.org, joostkremers@fastmail.fm, larsi@gnus.org,
> 	lokedhs@gmail.com, emacs-devel@gnu.org
> Date: Sun, 28 Feb 2016 05:24:59 -0500
> 
> For the release, I think we should turn it off by default
> and invite people to try turning it on.

That would be a grave mistake, IMO, since at least some parts of
folding are a must, and no one objected to them till now (neither
would I expect to see any objections).  See my other message for
details.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* RE: On language-dependent defaults for character-folding
  2016-02-28 15:57                                                                                                             ` Eli Zaretskii
@ 2016-02-28 16:59                                                                                                               ` Drew Adams
  2016-02-28 22:59                                                                                                                 ` John Wiegley
  0 siblings, 1 reply; 263+ messages in thread
From: Drew Adams @ 2016-02-28 16:59 UTC (permalink / raw)
  To: Eli Zaretskii, John Wiegley
  Cc: joostkremers, larsi, lokedhs, rms, emacs-devel

> > From what I've seen, this is a complex feature with many corner
> > cases, some of which may not have been encountered yet because it
> > hasn't been "out in the field" except for a few pretests.
> 
> I don't see any corner use cases, just some parts that, for best
> results, should be handled depending on the language of the text.
> What we have now is IMNSHO good enough, although improvements are
> welcome (and need infrastructure we don't currently have).  This is
> a clear case of perfect being the enemy of good.

I don't see anyone arguing that this feature is not "good enough" for
Emacs 25.1.  No one has suggested pulling the feature from the release.

The question is only whether it should be turned on by default.
Posing that question, and even deciding that it should not be, is not at
all "a clear case of perfect being the enemy of good."

> > The question is whether it should become the default for all
> > users from the start.

What John said.

> > We can always enable it as a default later, so I
> > don't see a need to hurry. This could be a great feature to
> > introduce as a default in 26.1, if it receives good reception from
> > early adopters in 25.x.
> 
> Why does it have to be a binary all or nothing decision?  Users of a
> few languages found some of the folding patterns incorrect for their
> language -- why not turn only those patterns off in the locales that
> use only those languages?  Why should we have this decision affect
> users who have nothing to do with those few languages?

That's a reasonable question: whether Emacs should have different
default values for this feature for different users/locales.

I tend to think that deciding to do that now would also be a bit
premature, but the question is reasonable.

> Turning this summarily off will also disable features that AFAIR no
> one objected to -- the ability to find á (a 2-character sequence)
> when looking for á (one character), or vice versa.  I fail to see
> how a failure to match by default in this use case would make any
> sense at all.

That "ability to find" would not disappear if char-folding were
off by default.  It is you who sounds like you are now making the
question into all-or-nothing.

> We should make our decisions in this matter based on understanding
> the issues involved, and try very hard not to throw away the baby
> with the bathwater.

I don't see anyone proposing to throw out the bathwater, much less
the baby with it.

Eli, you say here, quite often, that you think discussions about
what the default behavior of a feature should be are typically
fruitless, if not sterile.  But it seems clear that you care quite
a lot about this default behavior.

I'd say let it go.  There will be Emacs 25.2 and beyond.  And users
will try this new feature and give their feedback, which I expect
will be overwhelmingly positive - and informative for further
discussions here.

Based on user feedback and further discussion and analysis here
(this is not going away), Emacs Dev will improve and elaborate this
feature.  We will have better ideas about how to handle all of the
things that are currently not so clear.  There is plenty of time
to decide again whether this or that should be turned on by default.

What seems clear to me for Emacs 25.1 is that the feature should be
included AND that it should be simple to both (1) customize the
default behavior for a given user (i.e., what behavior search starts
with, a la `case-fold-search') and (2) toggle the behavior on the
fly, during Isearch.

Given (1) and (2), users can do what they like, and we can learn
later from them what behaviors might best be adopted for defaulting.

^ permalink raw reply	[flat|nested] 263+ messages in thread

* RE: On language-dependent defaults for character-folding
       [not found]                                                                                                             ` <<83oab0ako0.fsf@gnu.org>
@ 2016-02-28 17:00                                                                                                               ` Drew Adams
  2016-02-28 17:59                                                                                                                 ` Clément Pit--Claudel
  0 siblings, 1 reply; 263+ messages in thread
From: Drew Adams @ 2016-02-28 17:00 UTC (permalink / raw)
  To: Eli Zaretskii, rms; +Cc: joostkremers, larsi, johnw, lokedhs, emacs-devel

> > For the release, I think we should turn it off by default
> > and invite people to try turning it on.
> 
> That would be a grave mistake, IMO, since at least some parts of
> folding are a must, and no one objected to them till now (neither
> would I expect to see any objections).  See my other message for
> details.

Some parts are a must?  Which parts, and a must for what?
A must for the _default_ behavior?



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-28 17:00                                                                                                               ` Drew Adams
@ 2016-02-28 17:59                                                                                                                 ` Clément Pit--Claudel
  2016-02-28 18:04                                                                                                                   ` Eli Zaretskii
  2016-02-28 18:22                                                                                                                   ` Drew Adams
  0 siblings, 2 replies; 263+ messages in thread
From: Clément Pit--Claudel @ 2016-02-28 17:59 UTC (permalink / raw)
  To: emacs-devel


[-- Attachment #1.1: Type: text/plain, Size: 569 bytes --]

On 02/28/2016 12:00 PM, Drew Adams wrote:
>>> For the release, I think we should turn it off by default
>>> and invite people to try turning it on.
>>
>> That would be a grave mistake, IMO, since at least some parts of
>> folding are a must, and no one objected to them till now (neither
>> would I expect to see any objections).  See my other message for
>> details.
> 
> Some parts are a must?  Which parts, and a must for what?
> A must for the _default_ behavior?

I guess Eli had pairs such as .../… in mind; I have not seen any disagreement about them.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-28 17:59                                                                                                                 ` Clément Pit--Claudel
@ 2016-02-28 18:04                                                                                                                   ` Eli Zaretskii
  2016-02-28 18:15                                                                                                                     ` Clément Pit--Claudel
  2016-02-28 18:23                                                                                                                     ` Drew Adams
  2016-02-28 18:22                                                                                                                   ` Drew Adams
  1 sibling, 2 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-28 18:04 UTC (permalink / raw)
  To: Clément Pit--Claudel; +Cc: emacs-devel

> From: Clément Pit--Claudel <clement.pit@gmail.com>
> Date: Sun, 28 Feb 2016 12:59:44 -0500
> 
> > Some parts are a must?  Which parts, and a must for what?
> > A must for the _default_ behavior?
> 
> I guess Eli had pairs such as .../… in mind; I have not any disagreement about them.

No, I meant the pre-composed characters and their decomposed
equivalents.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-28 18:04                                                                                                                   ` Eli Zaretskii
@ 2016-02-28 18:15                                                                                                                     ` Clément Pit--Claudel
  2016-02-28 18:23                                                                                                                     ` Drew Adams
  1 sibling, 0 replies; 263+ messages in thread
From: Clément Pit--Claudel @ 2016-02-28 18:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel


[-- Attachment #1.1: Type: text/plain, Size: 480 bytes --]

On 02/28/2016 01:04 PM, Eli Zaretskii wrote:
>> From: Clément Pit--Claudel <clement.pit@gmail.com>
>> Date: Sun, 28 Feb 2016 12:59:44 -0500
>>
>>> Some parts are a must?  Which parts, and a must for what?
>>> A must for the _default_ behavior?
>>
>> I guess Eli had pairs such as .../… in mind; I have not any disagreement about them.
> 
> No, I meant the pre-composed characters and their decomposed
> equivalents.

Oh, I see. Thanks for clarifying! I agree fully.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 263+ messages in thread

* RE: On language-dependent defaults for character-folding
  2016-02-28 17:59                                                                                                                 ` Clément Pit--Claudel
  2016-02-28 18:04                                                                                                                   ` Eli Zaretskii
@ 2016-02-28 18:22                                                                                                                   ` Drew Adams
  2016-02-28 18:58                                                                                                                     ` Clément Pit--Claudel
  1 sibling, 1 reply; 263+ messages in thread
From: Drew Adams @ 2016-02-28 18:22 UTC (permalink / raw)
  To: Clément Pit--Claudel, emacs-devel

> >>> For the release, I think we should turn it off by default
> >>> and invite people to try turning it on.
> >>
> >> That would be a grave mistake, IMO, since at least some parts of
> >> folding are a must, and no one objected to them till now (neither
> >> would I expect to see any objections).  See my other message for
> >> details.
> >
> > Some parts are a must?  Which parts, and a must for what?
> > A must for the _default_ behavior?
> 
> I guess Eli had pairs such as .../. in mind; I have not any
> disagreement about them.

Why would such pairs be a "must" in terms of the default behavior?



^ permalink raw reply	[flat|nested] 263+ messages in thread

* RE: On language-dependent defaults for character-folding
  2016-02-28 18:04                                                                                                                   ` Eli Zaretskii
  2016-02-28 18:15                                                                                                                     ` Clément Pit--Claudel
@ 2016-02-28 18:23                                                                                                                     ` Drew Adams
  2016-02-28 18:46                                                                                                                       ` Eli Zaretskii
  1 sibling, 1 reply; 263+ messages in thread
From: Drew Adams @ 2016-02-28 18:23 UTC (permalink / raw)
  To: Eli Zaretskii, Clément Pit--Claudel; +Cc: emacs-devel

> > > Some parts are a must?  Which parts, and a must for what?
> > > A must for the _default_ behavior?
> >
> > I guess Eli had pairs such as .../. in mind; I have not any
> disagreement about them.
> 
> No, I meant the pre-composed characters and their decomposed
> equivalents.

Why a must in terms of default behavior?



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-28 18:23                                                                                                                     ` Drew Adams
@ 2016-02-28 18:46                                                                                                                       ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-28 18:46 UTC (permalink / raw)
  To: Drew Adams; +Cc: clement.pit, emacs-devel

> Date: Sun, 28 Feb 2016 10:23:23 -0800 (PST)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: emacs-devel@gnu.org
> 
> > > > Some parts are a must?  Which parts, and a must for what?
> > > > A must for the _default_ behavior?
> > >
> > > I guess Eli had pairs such as .../. in mind; I have not any
> > disagreement about them.
> > 
> > No, I meant the pre-composed characters and their decomposed
> > equivalents.
> 
> Why a must in terms of default behavior?

Because they look identical on display.



^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-28 18:22                                                                                                                   ` Drew Adams
@ 2016-02-28 18:58                                                                                                                     ` Clément Pit--Claudel
  0 siblings, 0 replies; 263+ messages in thread
From: Clément Pit--Claudel @ 2016-02-28 18:58 UTC (permalink / raw)
  To: Drew Adams, emacs-devel


[-- Attachment #1.1: Type: text/plain, Size: 308 bytes --]

On 02/28/2016 01:22 PM, Drew Adams wrote:
>> I guess Eli had pairs such as .../. in mind; I have not any
>> disagreement about them.
> 
> Why would such pairs be a "must" in terms of the default behavior?

I think your mailer (or mine) corrupted my message (or your quote). I wrote .../…, not .../.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 263+ messages in thread

* Re: On language-dependent defaults for character-folding
  2016-02-28 16:59                                                                                                               ` Drew Adams
@ 2016-02-28 22:59                                                                                                                 ` John Wiegley
  2016-02-29  0:22                                                                                                                   ` Drew Adams
  2016-02-29  0:31                                                                                                                   ` Juri Linkov
  0 siblings, 2 replies; 263+ messages in thread
From: John Wiegley @ 2016-02-28 22:59 UTC (permalink / raw)
  To: Drew Adams; +Cc: rms, joostkremers, lokedhs, emacs-devel, Eli Zaretskii, larsi

[-- Attachment #1: Type: text/plain, Size: 1950 bytes --]

>>>>> Drew Adams <drew.adams@oracle.com> writes:

> What seems clear to me for Emacs 25.1 is that the feature should be included
> AND that it should be simple to both (1) customize the default behavior for
> a given user (i.e., what behavior search starts with, a la
> `case-fold-search') and (2) toggle the behavior on the fly, during Isearch.

I think Drew has summarized perfectly what I would like to see happen. In
addition, I'd add one more item: Once 25.1 is released, I (or another) will
write a blog article publicizing this feature and touting its benefits, in
order to encourage people to try it out and discover how useful it can be.

However, making it a default in 25.1 is something I am simply not comfortable
doing, given the diversity of opinion on this list, plus my own misgivings
about so new (and nuanced) a feature. Yes, the visual equality of á and á is a
powerful argument, but as Drew said, there will be well-advertised ways to
both enable this feature, and to toggle it while searching. Users will not
lose any capacity by our decision, they will simply not experience it as a
default out of the box.

And so, my decision is that this feature will be off by default in the 25.1
release, with the genuine hope that it can be made solid enough to become a
default in a future release. It needn't even wait until 26.1, if we receive
enough positive feedback.

My thanks to everyone for the extensive and conscientious debate, and to Eli
for sticking to his guns. I am hopeful we will reach general consensus over
time, and that this feature will come to be recognized as a compelling aspect
of the Emacs feature set. Until that day, please forgive me my reservations;
I'm just not there yet in wanting this to become a default behavior.

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2


* Re: On language-dependent defaults for character-folding
  2016-02-25 16:24                                                                                       ` Eli Zaretskii
@ 2016-02-29  0:22                                                                                         ` Juri Linkov
  2016-02-29 16:27                                                                                           ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Juri Linkov @ 2016-02-29  0:22 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, lokedhs, rms, emacs-devel

> What I envisioned is a single variable that holds a list of folding
> sub-features.  Examples include ignoring diacritics, matching
> ligatures and their decompositions, "controversial" foldings that
> users of specific languages might not want, etc.  The default value
> will hold all of the sub-features; users that don't want some of them
> will be able to remove them from the list, which will affect the
> mapping at search time.  We could also have a setting that means "DTRT
> for my locale", which will remove the sub-features inappropriate for
> the locale's language.  Stuff like that.

Like (defcustom char-fold-defaults '(ignore-diacritics match-ligatures ...?

Not sure if such terms are self-descriptive.  At least plain pairs like
'((o ø) (l ł) ...) should be enough to customize at the base character level,
and later we might consider grouping such pairs into more high-level
features like ‘spanish-diacritics’, ‘swedish-diacritics’, etc.
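
To make the two shapes under discussion concrete, here is a minimal sketch
of a list of symbolic groups that expands to plain character pairs.  Every
name prefixed `my-' is hypothetical (nothing here exists in Emacs), and the
group names and sample characters are assumptions chosen for illustration.

```elisp
;; Sketch only: every name prefixed `my-' is hypothetical, not part of Emacs.
(defvar my-char-fold-groups
  '((latin-diacritics . ((?a ?á ?à ?â) (?e ?é ?è ?ê)))
    (stroke-letters   . ((?o ?ø) (?l ?ł))))
  "Alist of (GROUP . PAIRS); each element of PAIRS is (BASE EQUIVALENT...).")

(defcustom my-char-fold-defaults '(latin-diacritics stroke-letters)
  "Folding groups applied by default; remove a symbol to disable its group."
  :type '(repeat symbol)
  :group 'matching)

(defun my-char-fold-active-pairs (&optional groups)
  "Return the (BASE EQUIVALENT...) entries selected by GROUPS.
GROUPS defaults to the value of `my-char-fold-defaults'."
  (apply #'append
         (mapcar (lambda (group) (cdr (assq group my-char-fold-groups)))
                 (or groups my-char-fold-defaults))))
```

Calling (my-char-fold-active-pairs '(stroke-letters)) would then return only
the ø and ł entries, so a user who dislikes one group removes a single
symbol rather than editing individual characters.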

>> So we could have at least one default internal variable containing all
>> decompositions from UnicodeData.txt plus decompositions from decomps.txt
>> minus locale-dependent mappings.
>
> Internally, we need a translation table for mapping equivalent
> characters.  This table should be recomputed (or selected among
> several precomputed ones) according to the list of sub-features that
> the user requested.

Or maybe customizing a variable like `char-fold-language' (with the
default depending on the user's locale) could recompute the table when
the modified value is saved.
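
A rough sketch of that idea, assuming a user option whose :set function
rebuilds a char-table whenever the value changes.  All `my-' names are
hypothetical, and this is not a description of how char-fold.el works.

```elisp
;; Sketch only: all `my-' names are hypothetical, and this is not how
;; char-fold.el is actually implemented.
(defvar my-fold-table (make-char-table 'my-fold-table)
  "Char-table mapping a base character to a list of its equivalents.")

(defun my-fold-rebuild (pairs)
  "Rebuild `my-fold-table' from PAIRS, a list of (BASE EQUIVALENT...)."
  (setq my-fold-table (make-char-table 'my-fold-table))
  (dolist (entry pairs)
    (aset my-fold-table (car entry) (cdr entry)))
  my-fold-table)

(defcustom my-fold-pairs '((?o ?ø) (?l ?ł))
  "Equivalences used for folding; setting it via Customize rebuilds the table."
  :type '(repeat (repeat character))
  :set (lambda (symbol value)
         (set-default symbol value)
         (my-fold-rebuild value))
  :group 'matching)

(defun my-fold-regexp (char)
  "Return a regexp matching CHAR or any character folded onto it."
  (regexp-opt (mapcar #'char-to-string
                      (cons char (aref my-fold-table char)))))
```

With the default value, (my-fold-regexp ?l) matches both "l" and "ł"; after
(customize-set-variable 'my-fold-pairs nil) it matches only "l", because the
:set function has recomputed the table.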

>> >   http://unicode.org/Public/UCA/latest/decomps.txt
>> >
>> > (The last release of Unicode is v8.0.)
>>
>> Thanks, comparing UnicodeData.txt with the latest decomps.txt shows
>> 1600 differences (such as ł decomposed to l and ̵ and ø to o and ̸)
>> we need to add manually (a whole set of differences is attached below):
>
> I think we need to create another uni-*.el file which defines a
> decomposition char-table populated from decomps.txt.

The name of the currently used Unicode character property is “decomposition”.
What would be a good name for the property from decomps.txt? “decomposition2”?
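
For reference, the existing property can be inspected with
`get-char-code-property', which is a real Emacs function; the helper name
below is hypothetical and only wraps it for display.

```elisp
;; `get-char-code-property' exists in Emacs; `my-show-decomposition' is a
;; hypothetical helper for this illustration only.
(defun my-show-decomposition (char)
  "Return a string showing CHAR and its `decomposition' property."
  (format "%c -> %S" char (get-char-code-property char 'decomposition)))

;; á has a canonical decomposition in UnicodeData.txt:
;; the base letter a (97) followed by U+0301 COMBINING ACUTE ACCENT (769).
(my-show-decomposition ?á)
```

Trying the same call on ø or ł shows the gap discussed above: as the
comparison with decomps.txt indicates, UnicodeData.txt does not split them
into a base letter plus a combining stroke.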




* RE: On language-dependent defaults for character-folding
  2016-02-28 22:59                                                                                                                 ` John Wiegley
@ 2016-02-29  0:22                                                                                                                   ` Drew Adams
  2016-02-29  0:31                                                                                                                   ` Juri Linkov
  1 sibling, 0 replies; 263+ messages in thread
From: Drew Adams @ 2016-02-29  0:22 UTC (permalink / raw)
  To: John Wiegley
  Cc: rms, joostkremers, lokedhs, emacs-devel, Eli Zaretskii, larsi

> I'd add one more item: Once 25.1 is released, I (or another) will
> write a blog article publicizing this feature and touting its
> benefits, in order to encourage people to try it out and discover
> how useful it can be.

Good idea.  It would be good to include some of the use cases
brought up here (e.g. dealing with different languages).  People
here who are more familiar with specific cases could make
suggestions or propose corrections to whatever is written as a
first draft.

That way, these cases and their possible issues (so far) will be
out there, from the outset, in addition to the general info about
using the new feature.  That will help users who might run into
such use cases on their own, and doing that will help us get more
feedback from such users, for future enhancement of the feature.

Mentioning such things on the blog could be done in a separate
section, after the main points have been made.  In addition to
the benefits mentioned above, this will show people that Emacs is
thinking about such things and is open to suggestions about them.


* Re: On language-dependent defaults for character-folding
  2016-02-28 22:59                                                                                                                 ` John Wiegley
  2016-02-29  0:22                                                                                                                   ` Drew Adams
@ 2016-02-29  0:31                                                                                                                   ` Juri Linkov
  2016-02-29  3:45                                                                                                                     ` Eli Zaretskii
  1 sibling, 1 reply; 263+ messages in thread
From: Juri Linkov @ 2016-02-29  0:31 UTC (permalink / raw)
  To: Drew Adams; +Cc: rms, joostkremers, lokedhs, emacs-devel, Eli Zaretskii, larsi

>>>>>> Drew Adams <drew.adams@oracle.com> writes:
>
>> What seems clear to me for Emacs 25.1 is that the feature should be included
>> AND that it should be simple to both (1) customize the default behavior for
>> a given user (i.e., what behavior search starts with, a la
>> `case-fold-search') and (2) toggle the behavior on the fly, during Isearch.
>
> I think Drew has summarized perfectly what I would like to see happen. In
> addition, I'd add one more item: Once 25.1 is released, I (or another) will
> write a blog article publicizing this feature and touting its benefits, in
> order to encourage people to try it out and discover how useful it can be.
>
> However, making it a default in 25.1 is something I am simply not comfortable
> doing, given the diversity of opinion on this list, plus my own misgivings
> about so new (and nuanced) a feature. Yes, the visual equality of á and á is a
> powerful argument, but as Drew said, there will be well-advertised ways to
> both enable this feature, and to toggle it while searching. Users will not
> lose any capacity by our decision, they will simply not experience it as a
> default out of the box.
>
> And so, my decision is that this feature will be off by default in the 25.1
> release, with the genuine hope that it can be made solid enough to become a
> default in a future release. It needn't even wait until 26.1, if we receive
> enough positive feedback.
>
> My thanks to everyone for the extensive and conscientious debate, and to Eli
> for sticking to his guns. I am hopeful we will reach general consensus over
> time, and that this feature will come to be recognized as a compelling aspect
> of the Emacs feature set. Until that day, please forgive me my reservations;
> I'm just not there yet in wanting this to become a default behavior.

Even if it is disabled by default in the next release, do you think
we should still polish and finish this feature before the release,
so that users who choose to enable it find it bug-free and usable?
If so, I have a few ideas on how to help achieve this goal.




* Re: On language-dependent defaults for character-folding
  2016-02-29  0:31                                                                                                                   ` Juri Linkov
@ 2016-02-29  3:45                                                                                                                     ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-29  3:45 UTC (permalink / raw)
  To: Juri Linkov; +Cc: rms, joostkremers, lokedhs, emacs-devel, larsi, drew.adams

> From: Juri Linkov <juri@linkov.net>
> Cc: Eli Zaretskii <eliz@gnu.org>,  joostkremers@fastmail.fm,  larsi@gnus.org,  lokedhs@gmail.com,  rms@gnu.org,  emacs-devel@gnu.org
> Date: Mon, 29 Feb 2016 02:31:21 +0200
> 
> Even if disabled by default before the next release, do you think
> we still have to polish and finish this feature before the release,
> so the users willing to enable it would enjoy it bug-free and usable?

That goes without saying.

> In case of a positive answer, I have a few ideas how to help achieve
> this goal.

Thanks in advance.




* Re: On language-dependent defaults for character-folding
  2016-02-29  0:22                                                                                         ` Juri Linkov
@ 2016-02-29 16:27                                                                                           ` Eli Zaretskii
  2016-02-29 23:40                                                                                             ` Juri Linkov
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-29 16:27 UTC (permalink / raw)
  To: Juri Linkov; +Cc: larsi, lokedhs, rms, emacs-devel

> From: Juri Linkov <juri@linkov.net>
> Cc: larsi@gnus.org,  lokedhs@gmail.com,  rms@gnu.org,  emacs-devel@gnu.org
> Date: Mon, 29 Feb 2016 02:22:02 +0200
> 
> > What I envisioned is a single variable that holds a list of folding
> > sub-features.  Examples include ignoring diacritics, matching
> > ligatures and their decompositions, "controversial" foldings that
> > users of specific languages might not want, etc.  The default value
> > will hold all of the sub-features; users that don't want some of them
> > will be able to remove them from the list, which will affect the
> > mapping at search time.  We could also have a setting that means "DTRT
> > for my locale", which will remove the sub-features inappropriate for
> > the locale's language.  Stuff like that.
> 
> Like (defcustom char-fold-defaults '(ignore-diacritics match-ligatures ...?

Yes.

> Not sure if such terms are self-descriptive.  At least plain pairs like
> '((o ø) (l ł) ...) should be enough to customize at the base character level,
> and later we might consider grouping such pairs into more high-level
> features like ‘spanish-diacritics’, ‘swedish-diacritics’, etc.

Such grouping is what I had in mind.  I don't expect users to remember
these characters by heart.

> > I think we need to create another uni-*.el file which defines a
> > decomposition char-table populated from decomps.txt.
> 
> The name of the currently used Unicode character property is “decomposition”.
> What would be a good name for the property from decomps.txt? “decomposition2”?

I'm not good at naming stuff, but how about collating-decomposition or
decomposition-for-collation?




* Re: On language-dependent defaults for character-folding
  2016-02-29 16:27                                                                                           ` Eli Zaretskii
@ 2016-02-29 23:40                                                                                             ` Juri Linkov
  2016-03-01 16:44                                                                                               ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Juri Linkov @ 2016-02-29 23:40 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: larsi, lokedhs, rms, emacs-devel

>> Like (defcustom char-fold-defaults '(ignore-diacritics match-ligatures ...?
>
> Yes.
>
>> Not sure if such terms are self-descriptive.  At least plain pairs like
>> '((o ø) (l ł) ...) should be enough to customize at the base character level,
>> and later we might consider grouping such pairs into more high-level
>> features like ‘spanish-diacritics’, ‘swedish-diacritics’, etc.
>
> Such grouping is what I had in mind.  I don't expect users to remember
> these characters by heart.

OTOH, they definitely know what characters they want to ignore.

>> > I think we need to create another uni-*.el file which defines a
>> > decomposition char-table populated from decomps.txt.
>>
>> The name of the currently used Unicode character property is “decomposition”.
>> What would be a good name for the property from decomps.txt? “decomposition2”?
>
> I'm not good at naming stuff, but how about collating-decomposition or
> decomposition-for-collation?

Or we could put the decompositions from decomps.txt into the same table
as the UnicodeData.txt decompositions, but mark these additional
decompositions with a special tag "<collation>", or better, with
the same "<sort>" tag that decomps.txt itself uses.
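
As a minimal sketch of that layout, assume the extra entries carry a
leading `sort' symbol and that callers opt in to them explicitly.  The
variable, the function, and the choice of a Lisp symbol to stand for
"<sort>" are all hypothetical; only the two stroke decompositions are
taken from the comparison quoted earlier in this thread.

```elisp
;; Sketch only: `my-decompositions' and `my-decomposition' are hypothetical.
(defvar my-decompositions
  '((?á . (?a #x301))        ; canonical entry, as in UnicodeData.txt
    (?ø . (sort ?o #x338))   ; additional entry, tagged as coming from decomps.txt
    (?ł . (sort ?l #x335)))  ; additional entry, tagged as coming from decomps.txt
  "Alist of (CHAR . DECOMPOSITION); a leading `sort' marks decomps.txt data.")

(defun my-decomposition (char &optional include-sort)
  "Return the decomposition of CHAR as a list of characters.
Entries tagged `sort' are used only when INCLUDE-SORT is non-nil;
otherwise, and for unknown characters, return a list holding CHAR itself."
  (let ((value (cdr (assq char my-decompositions))))
    (cond ((null value) (list char))
          ((eq (car value) 'sort)
           (if include-sort (cdr value) (list char)))
          (t value))))
```

Code that wants only the UnicodeData.txt behavior keeps calling
(my-decomposition CHAR), while search folding could pass a non-nil second
argument to pick up the tagged entries as well.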




* Re: On language-dependent defaults for character-folding
  2016-02-29 23:40                                                                                             ` Juri Linkov
@ 2016-03-01 16:44                                                                                               ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-03-01 16:44 UTC (permalink / raw)
  To: Juri Linkov; +Cc: larsi, lokedhs, rms, emacs-devel

> From: Juri Linkov <juri@linkov.net>
> Cc: larsi@gnus.org,  lokedhs@gmail.com,  rms@gnu.org,  emacs-devel@gnu.org
> Date: Tue, 01 Mar 2016 01:40:12 +0200
> 
> >> What would be a good name for the property from decomps.txt? “decomposition2”?
> >
> > I'm not good at naming stuff, but how about collating-decomposition or
> > decomposition-for-collation?
> 
> Or to put decompositions from decomps.txt into the same table
> with UnicodeData.txt decompositions, but mark these additional
> decompositions by a special tag "<collation>", or better using
> the same tag "<sort>" introduced in decomps.txt.

Yes, I think this is a better alternative.




Thread overview: 263+ messages
2016-02-09 17:26 On language-dependent defaults for character-folding Artur Malabarba
2016-02-09 17:39 ` Pierpaolo Bernardi
2016-02-09 17:54   ` Paul Eggert
2016-02-10  0:49     ` Pierpaolo Bernardi
2016-02-10  2:20       ` Artur Malabarba
2016-02-10  3:01         ` Pierpaolo Bernardi
2016-02-10  9:55           ` Artur Malabarba
2016-02-10 18:12             ` Óscar Fuentes
2016-02-10 19:23               ` Artur Malabarba
2016-02-09 17:48 ` Drew Adams
2016-02-09 16:43   ` Artur Malabarba
2016-02-09 17:58 ` Eli Zaretskii
2016-02-09 17:10   ` Artur Malabarba
2016-02-09 18:21 ` Óscar Fuentes
2016-02-09 19:54   ` Artur Malabarba
2016-02-09 20:08     ` Eli Zaretskii
2016-02-10  1:58       ` Artur Malabarba
2016-02-09 21:07     ` Óscar Fuentes
2016-02-10  2:18       ` Artur Malabarba
2016-02-10  2:52         ` Óscar Fuentes
2016-02-10  2:56         ` Mark Oteiza
2016-02-10 15:25         ` Eli Zaretskii
2016-02-10 21:17           ` Artur Malabarba
2016-02-11  3:39             ` Eli Zaretskii
2016-02-12 22:36           ` Per Starbäck
2016-02-13  8:33             ` Eli Zaretskii
2016-02-13 10:10               ` Markus Triska
2016-02-13 10:21                 ` Eli Zaretskii
2016-02-13 16:46           ` joakim
2016-02-11  0:54         ` Juri Linkov
2016-02-11  1:37           ` Óscar Fuentes
2016-02-12  0:50             ` Juri Linkov
2016-02-12  1:50               ` Óscar Fuentes
2016-02-12  7:10                 ` Eli Zaretskii
2016-02-12  7:32                   ` Óscar Fuentes
2016-02-12  8:44                     ` Eli Zaretskii
2016-02-12 10:03                       ` Óscar Fuentes
2016-02-12 11:11                         ` Joost Kremers
2016-02-12 18:21                           ` Óscar Fuentes
2016-02-12 12:00                         ` Eli Zaretskii
2016-02-12 18:42                           ` Óscar Fuentes
2016-02-12 19:06                             ` Eli Zaretskii
2016-02-12 19:28                               ` Óscar Fuentes
2016-02-12 23:57                               ` Juri Linkov
2016-02-13  0:06                                 ` Drew Adams
2016-02-13  8:49                                 ` Eli Zaretskii
2016-02-13 17:20                                   ` Drew Adams
2016-02-13 17:58                                     ` Eli Zaretskii
2016-02-18 19:15                                       ` John Wiegley
2016-02-18 20:12                                         ` Eli Zaretskii
2016-02-19  5:11                                           ` Lars Ingebrigtsen
2016-02-19  8:20                                             ` Eli Zaretskii
2016-02-19  9:22                                               ` Elias Mårtenson
2016-02-19 10:09                                                 ` Eli Zaretskii
2016-02-19 10:51                                                   ` Elias Mårtenson
2016-02-19 11:46                                                     ` Eli Zaretskii
2016-02-19 13:37                                                       ` Elias Mårtenson
2016-02-19 19:18                                                         ` Eli Zaretskii
2016-02-20  5:22                                                           ` Elias Mårtenson
2016-02-20  6:31                                                             ` Lars Ingebrigtsen
2016-02-20  9:18                                                               ` Elias Mårtenson
2016-02-20 10:34                                                               ` Eli Zaretskii
2016-02-21  2:51                                                                 ` Lars Ingebrigtsen
2016-02-21  6:28                                                                   ` Elias Mårtenson
2016-02-21  8:14                                                                     ` Achim Gratz
2016-02-23 16:56                                                                       ` Eli Zaretskii
2016-02-21 10:05                                                                     ` Lars Ingebrigtsen
2016-02-21 11:01                                                                       ` Elias Mårtenson
2016-02-21 16:02                                                                         ` Eli Zaretskii
2016-02-22  1:58                                                                         ` Lars Ingebrigtsen
2016-02-22  2:34                                                                           ` Elias Mårtenson
2016-02-22  2:48                                                                             ` Lars Ingebrigtsen
2016-02-22  6:13                                                                               ` Werner LEMBERG
2016-02-22 18:03                                                                                 ` Richard Stallman
2016-02-22 18:27                                                                                   ` Werner LEMBERG
2016-02-22 18:01                                                                               ` Richard Stallman
2016-02-22 19:06                                                                                 ` Eli Zaretskii
2016-02-23 17:43                                                                                   ` Richard Stallman
2016-02-23 18:14                                                                                     ` Eli Zaretskii
2016-02-23 20:24                                                                                       ` Yuri Khan
2016-02-25 12:11                                                                                         ` Richard Stallman
2016-02-25 14:57                                                                                           ` Yuri Khan
2016-02-26 20:21                                                                                             ` Richard Stallman
2016-02-27  5:47                                                                                               ` Yuri Khan
2016-02-27 19:54                                                                                                 ` Richard Stallman
2016-02-27 20:02                                                                                                   ` Eli Zaretskii
2016-02-27 20:05                                                                                                   ` Eli Zaretskii
2016-02-28 10:25                                                                                                     ` Richard Stallman
2016-02-28  6:06                                                                                                   ` Yuri Khan
2016-02-24 13:41                                                                                       ` Richard Stallman
2016-02-24 17:54                                                                                         ` Eli Zaretskii
2016-02-25 12:15                                                                                           ` Richard Stallman
2016-02-25 12:38                                                                                             ` Joost Kremers
2016-02-25 22:43                                                                                               ` John Wiegley
2016-02-25 22:48                                                                                                 ` John Wiegley
2016-02-26 18:13                                                                                                 ` Eli Zaretskii
2016-02-27  0:48                                                                                                   ` John Wiegley
2016-02-27  8:38                                                                                                     ` Eli Zaretskii
2016-02-27  8:58                                                                                                       ` John Wiegley
2016-02-27  9:30                                                                                                         ` Eli Zaretskii
2016-02-27 16:22                                                                                                           ` Ken Brown
2016-02-27 22:48                                                                                                           ` John Wiegley
2016-02-28 15:57                                                                                                             ` Eli Zaretskii
2016-02-28 16:59                                                                                                               ` Drew Adams
2016-02-28 22:59                                                                                                                 ` John Wiegley
2016-02-29  0:22                                                                                                                   ` Drew Adams
2016-02-29  0:31                                                                                                                   ` Juri Linkov
2016-02-29  3:45                                                                                                                     ` Eli Zaretskii
2016-02-27 19:53                                                                                                       ` Richard Stallman
2016-02-27 20:01                                                                                                         ` Eli Zaretskii
2016-02-28 10:24                                                                                                           ` Richard Stallman
2016-02-28 16:01                                                                                                             ` Eli Zaretskii
     [not found]                                                                                                           ` <<E1aZyX5-0007bU-Mu@fencepost.gnu.org>
     [not found]                                                                                                             ` <<83oab0ako0.fsf@gnu.org>
2016-02-28 17:00                                                                                                               ` Drew Adams
2016-02-28 17:59                                                                                                                 ` Clément Pit--Claudel
2016-02-28 18:04                                                                                                                   ` Eli Zaretskii
2016-02-28 18:15                                                                                                                     ` Clément Pit--Claudel
2016-02-28 18:23                                                                                                                     ` Drew Adams
2016-02-28 18:46                                                                                                                       ` Eli Zaretskii
2016-02-28 18:22                                                                                                                   ` Drew Adams
2016-02-28 18:58                                                                                                                     ` Clément Pit--Claudel
2016-02-24 13:41                                                                                       ` Richard Stallman
2016-02-24 17:56                                                                                         ` Eli Zaretskii
2016-02-25 12:15                                                                                           ` Richard Stallman
2016-02-23 20:21                                                                                     ` Yuri Khan
2016-02-23 21:15                                                                                       ` Marcin Borkowski
2016-02-22 18:01                                                                             ` Richard Stallman
2016-02-22 18:58                                                                               ` Eli Zaretskii
2016-02-23  1:30                                                                               ` Lars Ingebrigtsen
2016-02-23 17:46                                                                                 ` Richard Stallman
2016-02-24  1:50                                                                                   ` Lars Ingebrigtsen
2016-02-24  6:40                                                                                     ` Lars Brinkhoff
2016-02-24 13:43                                                                                     ` Richard Stallman
2016-02-23  2:03                                                                               ` Elias Mårtenson
2016-02-23 17:46                                                                                 ` Richard Stallman
2016-02-22  3:38                                                                           ` Eli Zaretskii
2016-02-22  3:57                                                                             ` Lars Ingebrigtsen
2016-02-22 16:10                                                                               ` Eli Zaretskii
2016-02-22 18:58                                                                               ` John Wiegley
2016-02-23  7:50                                                                                 ` Per Starbäck
2016-02-23 16:29                                                                                   ` John Wiegley
2016-02-21 16:31                                                                     ` Eli Zaretskii
2016-02-21 16:58                                                                       ` Elias Mårtenson
2016-02-21 17:23                                                                         ` Eli Zaretskii
2016-02-21 18:48                                                                           ` Ivan Andrus
2016-02-22 15:58                                                                           ` Wolfgang Jenkner
2016-02-22 16:35                                                                             ` Eli Zaretskii
2016-02-22 16:56                                                                               ` Wolfgang Jenkner
2016-02-22 17:24                                                                                 ` Eli Zaretskii
2016-02-22 17:59                                                                           ` Richard Stallman
2016-02-22 18:57                                                                             ` Eli Zaretskii
2016-02-23 17:43                                                                               ` Richard Stallman
2016-02-23 18:03                                                                                 ` Eli Zaretskii
2016-02-24 13:41                                                                                   ` Richard Stallman
2016-02-23 17:43                                                                               ` Richard Stallman
     [not found]                                                                               ` <<E1aYGze-000655-RM@fencepost.gnu.org>
2016-02-23 18:00                                                                                 ` Drew Adams
2016-02-22 17:59                                                                         ` Richard Stallman
2016-02-22 18:51                                                                           ` Eli Zaretskii
2016-02-23  0:14                                                                             ` Juri Linkov
2016-02-23 17:11                                                                               ` Eli Zaretskii
2016-02-24  0:16                                                                                 ` Juri Linkov
2016-02-24 18:39                                                                                   ` Eli Zaretskii
2016-02-25  0:29                                                                                     ` Juri Linkov
2016-02-25 16:24                                                                                       ` Eli Zaretskii
2016-02-29  0:22                                                                                         ` Juri Linkov
2016-02-29 16:27                                                                                           ` Eli Zaretskii
2016-02-29 23:40                                                                                             ` Juri Linkov
2016-03-01 16:44                                                                                               ` Eli Zaretskii
2016-02-26 20:23                                                                             ` Richard Stallman
2016-02-21 16:25                                                                   ` Eli Zaretskii
2016-02-22  1:56                                                                     ` Lars Ingebrigtsen
2016-02-22  9:20                                                                       ` Andreas Schwab
2016-02-23  1:46                                                                         ` Lars Ingebrigtsen
2016-02-23  3:38                                                                           ` Eli Zaretskii
2016-02-21 12:44                                                                 ` Richard Stallman
2016-02-21 16:05                                                                   ` Eli Zaretskii
2016-02-22 17:57                                                                     ` Richard Stallman
2016-02-22 18:34                                                                       ` Eli Zaretskii
2016-02-20  9:21                                                             ` Eli Zaretskii
2016-02-20 10:08                                                               ` Elias Mårtenson
2016-02-20 10:44                                                                 ` Eli Zaretskii
2016-02-19 20:38                                                 ` Marcin Borkowski
2016-02-19 22:44                                               ` Lars Ingebrigtsen
2016-02-19 22:54                                                 ` Clément Pit--Claudel
2016-02-20  5:25                                                   ` Elias Mårtenson
2016-02-20 14:32                                                     ` Richard Stallman
2016-02-20 15:50                                                       ` Elias Mårtenson
2016-02-21 12:45                                                         ` Richard Stallman
2016-02-20  8:09                                                 ` Eli Zaretskii
2016-02-20 14:32                                                   ` Richard Stallman
2016-02-24 23:27                                                     ` Rasmus
2016-02-25 20:46                                                       ` Richard Stallman
2016-02-13 18:15                                     ` Artur Malabarba
2016-02-13 18:26                                       ` Drew Adams
2016-02-12 19:09                             ` Clément Pit--Claudel
2016-02-12 19:39                               ` Óscar Fuentes
2016-02-13 15:32                       ` Richard Stallman
2016-02-13 15:40                         ` Eli Zaretskii
2016-02-13 16:58                           ` Andreas Schwab
2016-02-13 17:44                             ` Eli Zaretskii
2016-02-13 16:37                       ` Marcin Borkowski
2016-02-13 16:50                         ` Eli Zaretskii
2016-02-13 17:15                           ` Marcin Borkowski
2016-02-13 17:45                             ` Eli Zaretskii
2016-02-13 17:52                               ` Marcin Borkowski
2016-02-13 17:46                             ` andres.ramirez
2016-02-14 13:59                           ` Richard Stallman
2016-02-12 23:50                 ` Juri Linkov
2016-02-13  0:33                   ` Óscar Fuentes
2016-02-14 13:57                     ` Richard Stallman
2016-02-14 14:27                       ` Óscar Fuentes
2016-02-15 10:28                         ` Richard Stallman
2016-02-15 12:31                           ` Óscar Fuentes
2016-02-15 17:45                             ` Richard Stallman
2016-02-16 13:54                               ` Elias Mårtenson
2016-02-16 14:30                               ` Per Starbäck
2016-02-16 19:32                                 ` Ken Brown
2016-02-16 23:49                                   ` Lars Ingebrigtsen
2016-02-17 16:03                                     ` Richard Stallman
2016-02-18  8:57                                   ` Alan Mackenzie
2016-02-18 17:27                                     ` Eli Zaretskii
2016-02-19 12:37                                       ` Richard Stallman
2016-02-19 18:31                                         ` John Wiegley
2016-02-17  8:00                                 ` Joost Kremers
2016-02-17 15:34                                   ` Eli Zaretskii
2016-02-17 18:30                                     ` Achim Gratz
2016-02-17 19:30                                       ` Eli Zaretskii
2016-02-17 20:26                                       ` Marcin Borkowski
2016-02-17 20:06                                     ` Joost Kremers
2016-02-17 20:15                                       ` Eli Zaretskii
2016-02-17 22:58                                         ` Ken Brown
2016-02-18  0:03                                           ` Vinicius Latorre
2016-02-18 17:29                                             ` Eli Zaretskii
2016-02-18  4:55                                           ` Marcin Borkowski
2016-02-18 11:26                                           ` Filipp Gunbin
2016-02-18 17:26                                             ` Eli Zaretskii
2016-02-19 12:30                                               ` Filipp Gunbin
2016-02-19 15:22                                                 ` Eli Zaretskii
2016-02-18 17:30                                           ` Eli Zaretskii
2016-02-17 22:53                                     ` Mark Oteiza
2016-02-18  0:11                                       ` Juri Linkov
2016-02-18  0:20                                         ` Mark Oteiza
2016-02-18 17:28                                           ` Eli Zaretskii
2016-02-18  4:53                                         ` Marcin Borkowski
2016-02-18 17:07                                           ` Elias Mårtenson
2016-02-18 17:21                                             ` Eli Zaretskii
2016-02-19  7:40                                               ` Elias Mårtenson
2016-02-19 19:24                                                 ` Achim Gratz
2016-02-20  5:05                                                   ` Elias Mårtenson
2016-02-20 13:59                                                     ` Achim Gratz
2016-02-19 20:47                                             ` Marcin Borkowski
2016-02-20 14:31                                               ` Richard Stallman
2016-02-18 17:46                                       ` Eli Zaretskii
2016-02-18 18:18                                         ` Mark Oteiza
2016-02-18 18:24                                           ` Eli Zaretskii
2016-02-18 16:30                                     ` Richard Stallman
2016-02-18 17:07                                       ` Eli Zaretskii
2016-02-13 16:38                 ` Marcin Borkowski
2016-02-13 17:58                   ` Content navigation (was: On language-dependent defaults for character-folding) Óscar Fuentes
2016-02-13 16:32       ` On language-dependent defaults for character-folding Marcin Borkowski
2016-02-13 16:47         ` Eli Zaretskii
2016-02-13 17:03           ` Marcin Borkowski
2016-02-10 13:52 ` Adrian.B.Robert
2016-02-24  9:58 ` Marcin Borkowski

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
