unofficial mirror of emacs-devel@gnu.org 
* Character folding in the pretest
@ 2016-02-03  0:31 Per Starbäck
  2016-02-03  6:34 ` Adrian.B.Robert
                   ` (3 more replies)
  0 siblings, 4 replies; 102+ messages in thread
From: Per Starbäck @ 2016-02-03  0:31 UTC (permalink / raw)
  To: emacs-devel@gnu.org

I brought up earlier that the new character fold feature that still
hasn't been in any released version of Emacs shouldn't be turned on by
default when it debuts.

Now I've tested the first prerelease of Emacs 25, and seen that it is
still turned on by default, so I'll revisit this and argue why this is
important. Probably what I say here is all I have to say.

=== There was a lot of agreement ===

RMS wrote that there ought to be a poll about the default. Eli wrote
that

> Such a poll could only work if the behavior intended to become the
> default is already available in released versions of Emacs, so users
> could turn it on and try it.  This is not the case with character
> folding, which is only available in development snapshots, and
> actually is still in flux: it changes in non-trivial ways almost every
> day.
>
> If we are afraid users will hate this default, we can turn it off in
> v25.1 and consider making it the default later.

RMS commented:
> That seems like the right approach.

Artur Malabarba wrote:
> I don't mind leaving this OFF by default in Emacs 25. So long as the
> eventual goal is to have it ON by default (preferably in 26).

Drew liked the feature and thought it should be turned off initially:
> My expectation, if we turn it off by default, is that users will
> try it, like it, and possibly ask for it to become the default
> behavior.  There is no reason to jump the gun on this.

Eli thought that it should remain turned on in the pretest to get more
testing:
> The entire time interval between Nov 15 this year and until we release
> Emacs 25.1 (which will take a few months, probably more than 6,
> judging by past experience) is supposed to provide that feedback.  All
> it takes to turn this off by default is changing the default value of
> a single variable (and change a couple of places in the User Manual to
> reflect that).  Once we decide to do that, it can be done very quickly
> and easily.  We can do that a day before the release, if we want to.
>
> OTOH, turning it off today means that it will get much less testing,
> and therefore bugs related to it (like the one reported just today in
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22090) will most probably
> remain hidden for who knows how long.

It's time to make that decision now.

=== Why ===

Because this is a big change that has repercussions, and not all the
major wrinkles have been ironed out yet. Some software throws big
changes like that in the face of its users, and more or less forces
them to get the kinks out or find out how to turn it off, but that is
usually not the Emacs way. Here the big kinks are usually already
taken care of when something is introduced to users who haven't
specifically asked for it. That's a good thing.

Eli argued against me that Emacs sometimes does that, for example with
bidi, which he argued was a much bigger change. In some ways it was,
but still, for people who don't use RTL languages all of that has been
more or less invisible, and for those who do it was obviously better
than being without it.

I know how the current character fold version is *just wrong* for
Swedes and other Scandinavians when handling their native languages.

There was a flurry of messages then which I couldn't keep up with, and
I thought most of it raised issues I had already answered anyway, but
I'm getting back to this now. One answer was that problems for
Scandinavians weren't relevant, because I had to show that it was
"_wrong_ in _most_ situations" to be relevant. I don't agree with
that, but even if you do, I think my Scandinavian example is only an
example, and that there probably are several similar ones in different
locales.

=== What was that Swedish example now again? ===

A and Ä.

In classical Latin U and V were the same letter. Not until the Late
Middle Ages were there these two forms, and they weren't differentiated,
one as a consonant and one as a vowel, until the 16th century.

In spite of their historical equivalence, they are clearly different
letters in, for example, English. Having a character fold feature where
a search for U found V would be *just wrong*. Since everyone on this
list knows English, everyone knows that.

What we get now for Swedish is very similar to that. Everyone who
knows Swedish knows that. Here ÅÄÖ are separate letters in all ways
from A and O, in spite of their historic origin tying them together.
That is just history. "Ä" has its own key on a keyboard, its own name
and its own position in the alphabet.

For a Swede to have a search for "varpa" in a Swedish text find
"värpa" or "varpå" would be *just wrong*. It would give a strong
impression of this being an American program not meant to be used for
Swedish.

Note that this is not me saying that we Swedes don't like character
folding. It's a perfectly good feature to have a search for "entre"
find "entré" or a search for "crepe" find "crêpe" because "é" and "ê"
are accented variants of "e". But "ä" in Swedish is in no way an
accented letter.

At this point several people usually reply "then just turn it off".
But the point is that by having it work like this out of the box it
sends a message to some new users that Emacs is not usable at all.
If they instead have some problems with a feature they have explicitly
turned on that's something else. Those who have turned it on know how
to turn it off. Others don't necessarily know that.

=== Are there other examples? ===

I won't say anything certain about a language I'm not a native
speaker of, but I think there are similar situations. I suspect for
example that Russian и and й are a similar pair, where it is *just
wrong* that a search for "и" (CYRILLIC SMALL LETTER I) also finds "й"
(CYRILLIC SMALL LETTER SHORT I).

All in all I see the need for a feature to adjust individual entries
of the character folding before it ought to be turned on by default.
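
One way to picture such per-entry adjustment, as a minimal sketch in
Python rather than Emacs Lisp: fold by stripping combining marks, but
let the caller pass a set of characters that must never be folded. The
names `fold`, `matches`, and `SWEDISH_KEEP` are hypothetical and not
part of Emacs; the decomposition data comes from Python's standard
`unicodedata` module, and the exception list is only an illustration.

```python
import unicodedata


def fold(text, keep=frozenset()):
    """Strip combining marks from each character unless it is in `keep`."""
    out = []
    # Work on precomposed (NFC) characters so membership in `keep` is reliable.
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)  # protected letter: leave untouched
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        # Keep only the non-combining code points (the base letter).
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


def matches(query, text, keep=frozenset()):
    """Substring search where both sides are folded with the same exceptions."""
    return fold(query, keep) in fold(text, keep)


# Hypothetical per-language exception list: letters that are not accented
# variants in Swedish and so should not be folded.
SWEDISH_KEEP = frozenset(unicodedata.normalize("NFC", "åäöÅÄÖ"))

print(matches("varpa", "värpa"))                     # plain folding
print(matches("varpa", "värpa", keep=SWEDISH_KEEP))  # with exceptions
print(matches("entre", "entré", keep=SWEDISH_KEEP))  # é is still folded
```

With the exception set, a search for "varpa" no longer matches "värpa",
while "entre" still finds "entré".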

=== Are users expecting this? ===

Has Emacs been late implementing character folding? Is everyone
expecting that now, so that it's important to turn it on in order not
to seem out of the loop?

It doesn't seem so. Eli wrote first that character folding was
introduced in Emacs to give users "what the other text-editing and
word-processing environments provide, what they therefore are expected
to expect". I answered that for example Gedit and Firefox didn't have
this feature, and then Eli wrote that I should "try more serious
editing environments" like MS Word. Since then I have had opportunity
to try MS Word 2013 and I couldn't find such a feature.

Maybe there was such a feature I couldn't find. Maybe it had been
turned off by the system administrators at my university. I don't
know, but on a random web source,
http://wordribbon.tips.net/T010627_Ignoring_Accented_Characters_in_Searches.html
I find it stated that MS Word (2007, 2010, and 2013) doesn't have such
a feature.

I don't think this is something users expect, but rather something that
will be an example of how Emacs does things better: for those who can
turn it on and get good results already, and for the rest of us when it
has become slightly more featureful.

=== Option menu ===

Also, please please add a checkbox for character folding just above or
below the one for case folding in the Options menu!!




* Re: Character folding in the pretest
  2016-02-03  0:31 Character folding in the pretest Per Starbäck
@ 2016-02-03  6:34 ` Adrian.B.Robert
  2016-02-03  8:00 ` Paul Eggert
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 102+ messages in thread
From: Adrian.B.Robert @ 2016-02-03  6:34 UTC (permalink / raw)
  To: emacs-devel


As a developer in a Scandinavian country I find the new character folding
very useful for searching in text when I have a US keyboard layout
enabled.  That said, I agree that it should not be the default, but
should be easily discoverable in the Options menu.





* Re: Character folding in the pretest
  2016-02-03  0:31 Character folding in the pretest Per Starbäck
  2016-02-03  6:34 ` Adrian.B.Robert
@ 2016-02-03  8:00 ` Paul Eggert
  2016-02-03 10:54   ` Yuri Khan
  2016-02-03 11:08 ` Artur Malabarba
  2016-02-03 15:39 ` Eli Zaretskii
  3 siblings, 1 reply; 102+ messages in thread
From: Paul Eggert @ 2016-02-03  8:00 UTC (permalink / raw)
  To: emacs-devel

Per Starbäck wrote:
> I suspect for
> example that Russian и and й are a similar pair, where it is *just
> wrong* that a search for "и" (CYRILLIC SMALL LETTER I) also finds "й"
> (CYRILLIC SMALL LETTER SHORT I).

For what it's worth, here is an amusing bug report involving two Russians who 
disagree about whether to accent-fold и and й:

http://tracker.firebirdsql.org/browse/CORE-4803





* Re: Character folding in the pretest
  2016-02-03  8:00 ` Paul Eggert
@ 2016-02-03 10:54   ` Yuri Khan
  2016-02-03 15:57     ` Filipp Gunbin
  0 siblings, 1 reply; 102+ messages in thread
From: Yuri Khan @ 2016-02-03 10:54 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Emacs developers

On Wed, Feb 3, 2016 at 2:00 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:

> For what it's worth, here is an amusing bug report involving two Russians
> who disagree about whether to accent-fold и and й:
>
> http://tracker.firebirdsql.org/browse/CORE-4803

Very funny. In Russian, И and Й are only treated as equivalent within
crossword puzzles; otherwise everybody agrees they are different
letters. Е and Ё, on the other hand, are a holywar-inducing contention
point.




* Re: Character folding in the pretest
  2016-02-03  0:31 Character folding in the pretest Per Starbäck
  2016-02-03  6:34 ` Adrian.B.Robert
  2016-02-03  8:00 ` Paul Eggert
@ 2016-02-03 11:08 ` Artur Malabarba
  2016-02-03 13:24   ` Stefan Monnier
                     ` (2 more replies)
  2016-02-03 15:39 ` Eli Zaretskii
  3 siblings, 3 replies; 102+ messages in thread
From: Artur Malabarba @ 2016-02-03 11:08 UTC (permalink / raw)
  To: Per Starbäck; +Cc: emacs-devel@gnu.org

Per Starbäck <per@starback.se> writes:

> I brought up earlier that the new character fold feature that still
> hasn't been in any released version of Emacs shouldn't be turned on by
> default when it debuts.

FTR, my opinion on this is still as you quoted:

>> I don't mind leaving this OFF by default in Emacs 25. So long as the
>> eventual goal is to have it ON by default (preferably in 26).

I do also share Eli's opinion, that it would be nice to get as much
(pre)testing as possible before the release. However, it's likely I'll
grow a little absent from this list in the next few months, so it's
entirely possible I'll miss out on the chance to turn this OFF before
release.

Does anyone volunteer to switch OFF this default shortly before release?
If not, I'll just do it now.

> I know how the current character fold version is *just wrong* for
> Swedes and other Scandinavians when handling their native languages.

The current version just follows the Unicode standard (plus a few ad-hoc
rules related to quotation marks), whose authors have certainly spent a
lot more time on this than us. This is just a polite way of saying
“we're not catering to any languages, take any complaints up with that
other team”.
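
To make the Unicode data in question concrete, here is a small Python
sketch (not the Emacs implementation) that prints the canonical
decomposition of a few characters discussed in this thread.
`base_and_marks` is a hypothetical helper name; it relies only on the
standard `unicodedata` module. That Emacs derives its folding from this
same decomposition data is an assumption of the sketch.

```python
import unicodedata


def base_and_marks(ch):
    """Return (base character, names of the combining marks) for one character."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0]
    marks = [unicodedata.name(c) for c in decomposed[1:]]
    return base, marks


# Characters raised in the discussion; "ø" is included as a contrast
# because it has no canonical decomposition.
for ch in ["é", "ä", "й", "ё", "ø"]:
    base, marks = base_and_marks(ch)
    print(f"{ch!r}: base={base!r} marks={marks}")
```

The data itself carries no notion of language: "ä" and "é" both
decompose to a base letter plus a combining mark, so the distinction
between an accented variant and a separate letter has to come from
somewhere else.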

Of course, that doesn't mean we can't improve support for specific
languages/locales in the future. But I've mentioned before that I don't
want to start designing APIs or sophisticated features on top of the
current implementation before seeing how it fares “in the wild” for at
least one release.

> === Option menu ===
>
> Also, please please add a checkbox for character folding just above or
> below the one for case folding in the Options menu!!

Yes, please!




* Re: Character folding in the pretest
  2016-02-03 11:08 ` Artur Malabarba
@ 2016-02-03 13:24   ` Stefan Monnier
  2016-02-03 13:35     ` Nicolas Petton
  2016-02-03 15:38   ` Eli Zaretskii
  2016-02-03 22:53   ` Richard Stallman
  2 siblings, 1 reply; 102+ messages in thread
From: Stefan Monnier @ 2016-02-03 13:24 UTC (permalink / raw)
  To: emacs-devel

> Does anyone volunteer to switch OFF this default shortly before release?

Am I the only one worried about making changes "shortly before release"?


        Stefan





* Re: Character folding in the pretest
  2016-02-03 13:24   ` Stefan Monnier
@ 2016-02-03 13:35     ` Nicolas Petton
  2016-02-03 15:06       ` Drew Adams
  2016-02-03 15:41       ` Eli Zaretskii
  0 siblings, 2 replies; 102+ messages in thread
From: Nicolas Petton @ 2016-02-03 13:35 UTC (permalink / raw)
  To: Stefan Monnier, emacs-devel


Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Does anyone volunteer to switch OFF this default shortly before release?
>
> Am I the only one worried about making changes "shortly before
> release"?

I agree.  If we want to turn it off by default for the release, I'd do
it now, so it gets in the next pretest.

Nico


* RE: Character folding in the pretest
  2016-02-03 13:35     ` Nicolas Petton
@ 2016-02-03 15:06       ` Drew Adams
  2016-02-03 15:41       ` Eli Zaretskii
  1 sibling, 0 replies; 102+ messages in thread
From: Drew Adams @ 2016-02-03 15:06 UTC (permalink / raw)
  To: Nicolas Petton, Stefan Monnier, emacs-devel

> >> Does anyone volunteer to switch OFF this default shortly before release?
> >
> > Am I the only one worried about making changes "shortly before
> > release"?
> 
> I agree.  If we want to turn it off by default for the release, I'd do
> it now, so it gets in the next pretest.

+1  WYTestIWYG




* Re: Character folding in the pretest
  2016-02-03 11:08 ` Artur Malabarba
  2016-02-03 13:24   ` Stefan Monnier
@ 2016-02-03 15:38   ` Eli Zaretskii
  2016-02-03 22:53   ` Richard Stallman
  2 siblings, 0 replies; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-03 15:38 UTC (permalink / raw)
  To: Artur Malabarba; +Cc: per, emacs-devel

> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Date: Wed, 03 Feb 2016 11:08:57 +0000
> Cc: "emacs-devel@gnu.org" <emacs-devel@gnu.org>
> 
> Does anyone volunteer to switch OFF this default shortly before release?
> If not, I'll just do it now.

Thanks, but there's no need to do this yet.  Doing that is easy, so if
you need a volunteer, here I am.




* Re: Character folding in the pretest
  2016-02-03  0:31 Character folding in the pretest Per Starbäck
                   ` (2 preceding siblings ...)
  2016-02-03 11:08 ` Artur Malabarba
@ 2016-02-03 15:39 ` Eli Zaretskii
  3 siblings, 0 replies; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-03 15:39 UTC (permalink / raw)
  To: Per Starbäck; +Cc: emacs-devel

> Date: Wed, 3 Feb 2016 01:31:11 +0100
> From: Per Starbäck <per@starback.se>
> 
> Eli thought that it should remain turned on in the pretest to get more
> testing:
> > The entire time interval between Nov 15 this year and until we release
> > Emacs 25.1 (which will take a few months, probably more than 6,
> > judging by past experience) is supposed to provide that feedback.  All
> > it takes to turn this off by default is changing the default value of
> > a single variable (and change a couple of places in the User Manual to
> > reflect that).  Once we decide to do that, it can be done very quickly
> > and easily.  We can do that a day before the release, if we want to.
> >
> > OTOH, turning it off today means that it will get much less testing,
> > and therefore bugs related to it (like the one reported just today in
> > http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22090) will most probably
> > remain hidden for who knows how long.
> 
> It's time to make that decision now.

IMO, it's too early for that.  As I said in the quote above, the time
interval for the feedback can go on until very close to the release.
That time is still far away.  The pretest just started less than a
week ago, and no new opinions were heard yet.  Let us collect the
feedback for a bit more than just a couple of days.

If someone wants to start a poll somewhere, please do, it will allow
us to collect more data and make better decisions.  If not, we will
have to go with what will be written here and on other relevant
forums.

> Also, please please add a checkbox for character folding just above or
> below the one for case folding in the Options menu!!

Indeed, patches are welcome for such an addition.  (Lax whitespace
option probably needs a similar option.)

Thanks.




* Re: Character folding in the pretest
  2016-02-03 13:35     ` Nicolas Petton
  2016-02-03 15:06       ` Drew Adams
@ 2016-02-03 15:41       ` Eli Zaretskii
  2016-02-03 15:55         ` Teemu Likonen
  2016-02-03 16:54         ` Clément Pit--Claudel
  1 sibling, 2 replies; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-03 15:41 UTC (permalink / raw)
  To: Nicolas Petton; +Cc: monnier, emacs-devel

> From: Nicolas Petton <nicolas@petton.fr>
> Date: Wed, 03 Feb 2016 14:35:46 +0100
> 
> If we want to turn it off by default for the release

We don't, at least not yet.  We want to collect feedback.





* Re: Character folding in the pretest
  2016-02-03 15:41       ` Eli Zaretskii
@ 2016-02-03 15:55         ` Teemu Likonen
  2016-02-03 16:16           ` Eli Zaretskii
  2016-02-03 16:54         ` Clément Pit--Claudel
  1 sibling, 1 reply; 102+ messages in thread
From: Teemu Likonen @ 2016-02-03 15:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Nicolas Petton, monnier, emacs-devel


Eli Zaretskii [2016-02-03 17:41:06+02] wrote:

>> From: Nicolas Petton <nicolas@petton.fr>
>> Date: Wed, 03 Feb 2016 14:35:46 +0100
>> If we want to turn it off by default for the release
>
> We don't, at least not yet.  We want to collect feedback.

Here's mine: I don't want "a" and "ä" to be the same in searches, by
default. In my language (Finnish) they are different letters and
phonemes, for example: "tai" (= or) and "täi" (= a louse); "sakki" (=
gang, crowd) and "säkki" (= a sack).

This is a great feature, though.

-- 
/// Teemu Likonen   - .-..   <https://github.com/tlikonen> //
// PGP: 4E10 55DC 84E9 DFF6 13D7 8557 719D 69D3 2453 9450 ///


* Re: Character folding in the pretest
  2016-02-03 10:54   ` Yuri Khan
@ 2016-02-03 15:57     ` Filipp Gunbin
  2016-02-03 16:24       ` Drew Adams
  2016-02-03 16:52       ` Yuri Khan
  0 siblings, 2 replies; 102+ messages in thread
From: Filipp Gunbin @ 2016-02-03 15:57 UTC (permalink / raw)
  To: Yuri Khan; +Cc: Paul Eggert, Emacs developers

On 03/02/2016 16:54 +0600, Yuri Khan wrote:

> Е and Ё, on the other hand, are a holywar-inducing contention
> point.

They have their own places in the Russian alphabet.  I think
char-folding should fold only "modified" letter variants into
"canonical" form (without any modifications).

Е and Ё are just separate letters, although we don't use Ё much...

Once I "fixed" all our text resources files at work and a colleague of
mine commented in review that Ё is used only in children's books.  I had
to revert the change.

Filipp




* Re: Character folding in the pretest
  2016-02-03 15:55         ` Teemu Likonen
@ 2016-02-03 16:16           ` Eli Zaretskii
  2016-02-06 13:41             ` Teemu Likonen
  0 siblings, 1 reply; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-03 16:16 UTC (permalink / raw)
  To: Teemu Likonen; +Cc: nicolas, monnier, emacs-devel

> From: Teemu Likonen <tlikonen@iki.fi>
> Cc: Nicolas Petton <nicolas@petton.fr>, monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Wed, 03 Feb 2016 17:55:50 +0200
> 
> > We want to collect feedback.
> 
> Here's mine: I don't want "a" and "ä" to be the same in searches, by
> default. In my language (Finnish) they are different letters and
> phonemes, for example: "tai" (= or) and "täi" (= a louse); "sakki" (=
> gang, crowd) and "säkki" (= a sack).

Thank you.




* RE: Character folding in the pretest
  2016-02-03 15:57     ` Filipp Gunbin
@ 2016-02-03 16:24       ` Drew Adams
  2016-02-03 16:46         ` Clément Pit--Claudel
  2016-02-03 16:52       ` Yuri Khan
  1 sibling, 1 reply; 102+ messages in thread
From: Drew Adams @ 2016-02-03 16:24 UTC (permalink / raw)
  To: Filipp Gunbin, Yuri Khan; +Cc: Paul Eggert, Emacs developers

> > Е and Ё, on the other hand, are a holywar-inducing contention
> > point.
> 
> They have their own places in the Russian alphabet.  I think
> char-folding should fold only "modified" letter variants into
> "canonical" form (without any modifications).
> 
> Е and Ё are just separate letters, although we don't use Ё much...
> 
> Once I "fixed" all our text resources files at work and a colleague of
> mine commented in review that Ё is used only in children's books.  I had
> to revert the change.

The point, IMO, is that there are multiple use cases, depending on
the user and the context (including, but not limited to, language).

What we really need are ways for _users_ to _easily_ express their
preferences, including perhaps preferences for different contexts
that they use, and including ways to express what they want on the
fly - not just ahead of time via Customize (e.g. default preferences).

That should be the _first_ order of business.  If we do a good
job of providing for that then anything additional we do
concerning DWIM or default behaviors is icing on the cake.

If we do not take care of the need to give users flexible control
then anything we do (DWIM or defaults) will be misguided for at
least some users and use cases.  It typically hurts more than helps,
IMO.

This is a general point, not limited to char folding or search.
Our priority should be to (1) yes, raise possible use cases for
discussion, such as is being done now in this thread, and (2)
come up with brilliant, easy-to-use ways to _give users control_.

Users are different, and even the same user has multiple use
cases - s?he does not want the same behavior all the time.
It is not enough to look at the user's language setting etc.
Only the user knows, at any given time, what s?he wants.

It is fine to be smart about the defaults we set, but that's
not the most important thing.  Likewise wrt coming up with
clever DWIM behavior.  But the smartest DWIM is brain dead
when compared with a live user.  And even the best default
behavior is no good for many use cases.  Users need to be
able to (easily) control the behavior.

Thinking first about defaults or DWIM is wrong, IMO.  We
should think first about how users can change the behavior,
including on the fly.




* Re: Character folding in the pretest
  2016-02-03 16:24       ` Drew Adams
@ 2016-02-03 16:46         ` Clément Pit--Claudel
  2016-02-03 17:28           ` Drew Adams
  2016-02-03 18:24           ` Clément Pit--Claudel
  0 siblings, 2 replies; 102+ messages in thread
From: Clément Pit--Claudel @ 2016-02-03 16:46 UTC (permalink / raw)
  To: emacs-devel


On 02/03/2016 11:24 AM, Drew Adams wrote:
> Thinking first about defaults or DWIM is wrong, IMO.  We
> should think first about how users can change the behavior,
> including on the fly.

I don't agree. This leads to Emacs being painful to use without large amounts of customization. Do many Emacs devs use an empty or almost empty .emacs?
Customizability is a strength, but the popularity of pre-packaged Emacs configurations (prelude, Emacs starter kit, Graphene, and countless .emacs.d repositories) says something about good defaults.

Clément.



* Re: Character folding in the pretest
  2016-02-03 15:57     ` Filipp Gunbin
  2016-02-03 16:24       ` Drew Adams
@ 2016-02-03 16:52       ` Yuri Khan
  1 sibling, 0 replies; 102+ messages in thread
From: Yuri Khan @ 2016-02-03 16:52 UTC (permalink / raw)
  To: Filipp Gunbin; +Cc: Paul Eggert, Emacs developers

On Wed, Feb 3, 2016 at 9:57 PM, Filipp Gunbin <fgunbin@fastmail.fm> wrote:

>> Е and Ё, on the other hand, are a holywar-inducing contention
>> point.
>
> They have their own places in the Russian alphabet.  I think
> char-folding should fold only "modified" letter variants into
> "canonical" form (without any modifications).
>
> Е and Ё are just separate letters, although we don't use Ё much...

Oh, we use it all the time. It’s just that many people habitually
write Е in place of Ё.

And this is exactly the reason why char folding becomes relevant for
this particular pair. When searching in a text by someone else, I
will want to fold so that I find occurrences where I would write Ё but
others would replace it with Е. Likewise, those other people, when
reading my text, will want to fold in order to find occurrences where
they would write Е but I would write Ё.

> Once I "fixed" all our text resources files at work and a colleague of
> mine commented in review that Ё is used only in children's books.  I had
> to revert the change.

In this situation, you will want to not fold, so that you can search
for all instances of Ё and decide which to replace with Е. (Even when
the policy is to avoid Ё, it is still mandatory in cases of
ambiguity.)




* Re: Character folding in the pretest
  2016-02-03 15:41       ` Eli Zaretskii
  2016-02-03 15:55         ` Teemu Likonen
@ 2016-02-03 16:54         ` Clément Pit--Claudel
  2016-02-03 17:01           ` John Wiegley
  2016-02-03 17:02           ` Eli Zaretskii
  1 sibling, 2 replies; 102+ messages in thread
From: Clément Pit--Claudel @ 2016-02-03 16:54 UTC (permalink / raw)
  To: emacs-devel


On 02/03/2016 10:41 AM, Eli Zaretskii wrote:
> We don't, at least not yet.  We want to collect feedback.

I love the new behaviour:

* It makes it much nicer to search through documents written in French when I'm not using a French keyboard.
* It also makes it easier to search through emails in which some accents have been omitted (probably for the same reason as above).
* It even makes it nicer to search for my own name: it's definitely wrong to spell it “Clement”, but many websites reject “Clément” due to the accent, so I end up with emails addressed to “Clement”.

I don't read Emacs' change logs carefully enough to hear about every new feature. Disabling features that I don't like doesn't bother me; on the other hand, discovering features to enable is much harder. So I'd vote for this being on by default.



* Re: Character folding in the pretest
  2016-02-03 16:54         ` Clément Pit--Claudel
@ 2016-02-03 17:01           ` John Wiegley
  2016-02-03 21:08             ` Óscar Fuentes
  2016-02-03 17:02           ` Eli Zaretskii
  1 sibling, 1 reply; 102+ messages in thread
From: John Wiegley @ 2016-02-03 17:01 UTC (permalink / raw)
  To: Clément Pit--Claudel; +Cc: emacs-devel

>>>>> Clément Pit--Claudel <clement.pit@gmail.com> writes:

> It makes it much nicer to search through documents written in French when
> I'm not using a French keyboard.

It's also nice when searching a Spanish document, where someone says "como"
and you want to search for it, but aren't sure if it was meant as a question
word (¿Cómo?) or a preposition (como).

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2




* Re: Character folding in the pretest
  2016-02-03 16:54         ` Clément Pit--Claudel
  2016-02-03 17:01           ` John Wiegley
@ 2016-02-03 17:02           ` Eli Zaretskii
  1 sibling, 0 replies; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-03 17:02 UTC (permalink / raw)
  To: Clément Pit--Claudel; +Cc: emacs-devel

> From: Clément Pit--Claudel <clement.pit@gmail.com>
> Date: Wed, 3 Feb 2016 11:54:41 -0500
> 
> On 02/03/2016 10:41 AM, Eli Zaretskii wrote:
> > We don't, at least not yet.  We want to collect feedback.
> 
> I love the new behaviour:

Thanks for your feedback.




* RE: Character folding in the pretest
  2016-02-03 16:46         ` Clément Pit--Claudel
@ 2016-02-03 17:28           ` Drew Adams
  2016-02-03 18:10             ` Clément Pit--Claudel
  2016-02-03 18:24           ` Clément Pit--Claudel
  1 sibling, 1 reply; 102+ messages in thread
From: Drew Adams @ 2016-02-03 17:28 UTC (permalink / raw)
  To: Clément Pit--Claudel, emacs-devel

> > Thinking first about defaults or DWIM is wrong, IMO.  We
> > should think first about how users can change the behavior,
> > including on the fly.
> 
> I don't agree. This leads to Emacs being painful to use without large
> amounts of customization. Do many Emacs devs use an empty or almost empty
> .emacs?
> Customizability is a strength, but the popularity of pre-packaged Emacs
> configurations (prelude, Emacs starter kit, Graphene, and countless .emacs.d
> repositories) says something about good defaults.

Please read what I wrote.  I do not argue that defaults are
unimportant, or that we should not choose good default behavior,
and choose it carefully.  Quite the contrary.

My point is that concentrating _first_ on the default behavior,
without considering various use cases, is a mistake.  (One reason
it is a mistake is precisely because without considering possible
use cases the default choice made is likely to not be the best one.)

I welcome the recent posts that point to different use cases.
The mere _possibility_ of char folding (treating different chars
equivalently, for some meanings of equivalence) means that there
can be, and so there will be, some very different needs and
preferences wrt which chars are to be handled as equivalent in
which contexts.  Better for us to start hearing about this at
the outset, so we have a wider vision of what this new feature
represents.

As to the popularity of starter kits:  Sure.  But the popularity of
_Emacs_ itself has a lot to do with its bendability - the fact that
different people can use it in different ways, and extend it or
customize it or change it on the fly to fit their needs.  Without
that, Emacs is not Emacs.

And in the case at hand, I feel that char folding does not yet
provide enough flexibility for users.  It provides a useful set
of foldings (equivalences) out of the box, and that's great, as a
start.  But we should make it more user-customizable.
Just one opinion.

It's not a case of one or the other: picking good defaults and
clever DWIM or providing ways for users to control the behavior.




* Re: Character folding in the pretest
  2016-02-03 17:28           ` Drew Adams
@ 2016-02-03 18:10             ` Clément Pit--Claudel
  0 siblings, 0 replies; 102+ messages in thread
From: Clément Pit--Claudel @ 2016-02-03 18:10 UTC (permalink / raw)
  To: Drew Adams, emacs-devel


On 02/03/2016 12:28 PM, Drew Adams wrote:
> Please read what I wrote.

Please don't assume that I didn't.



* Re: Character folding in the pretest
  2016-02-03 16:46         ` Clément Pit--Claudel
  2016-02-03 17:28           ` Drew Adams
@ 2016-02-03 18:24           ` Clément Pit--Claudel
  2016-02-03 18:31             ` Drew Adams
  1 sibling, 1 reply; 102+ messages in thread
From: Clément Pit--Claudel @ 2016-02-03 18:24 UTC (permalink / raw)
  To: emacs-devel


Amusingly, the tagline for Emacs Starter Kit is “Because the Emacs defaults are not so great sometimes.”

On 02/03/2016 11:46 AM, Clément Pit--Claudel wrote:
> On 02/03/2016 11:24 AM, Drew Adams wrote:
>> Thinking first about defaults or DWIM is wrong, IMO.  We
>> should think first about how users can change the behavior,
>> including on the fly.
> 
> I don't agree. This leads to Emacs being painful to use without large amounts of customization. Do many Emacs devs use an empty or almost empty .emacs?
> Customizability is a strength, but the popularity of pre-packaged Emacs configurations (prelude, Emacs starter kit, Graphene, and countless .emacs.d repositories) says something about good defaults.
> 
> Clément.
> 



* RE: Character folding in the pretest
  2016-02-03 18:24           ` Clément Pit--Claudel
@ 2016-02-03 18:31             ` Drew Adams
  0 siblings, 0 replies; 102+ messages in thread
From: Drew Adams @ 2016-02-03 18:31 UTC (permalink / raw)
  To: Clément Pit--Claudel, emacs-devel

> Amusingly, the tagline for Emacs Starter Kit is “Because the Emacs defaults
> are not so great sometimes.”

Yes, a starter kit is a customization, albeit one that its creator
expects will be useful to multiple users.




* Re: Character folding in the pretest
  2016-02-03 17:01           ` John Wiegley
@ 2016-02-03 21:08             ` Óscar Fuentes
  2016-02-03 22:32               ` John Wiegley
                                 ` (2 more replies)
  0 siblings, 3 replies; 102+ messages in thread
From: Óscar Fuentes @ 2016-02-03 21:08 UTC (permalink / raw)
  To: emacs-devel

John Wiegley <jwiegley@gmail.com> writes:

> It's also nice when searching a Spanish document, where someone says "como"
> and you want to search for it, but aren't sure if it was meant as a question
> word (¿Cómo?) or a preposition (como).

Furthermore, in Spanish nowadays you can't expect correct orthography,
even in supposedly educated environments. Also, involuntary typos
involving accents are common.

I like the feature very much, but I'm neutral wrt its default value. If
you ask me, as a programmer, I would say no, but as a Spaniard that
occasionally uses Emacs to write Spanish text, I'll say yes.

BTW, searching for `n' also matches `ñ', which is definitely wrong.
Those are not equivalent characters by any stretch.





* Re: Character folding in the pretest
  2016-02-03 21:08             ` Óscar Fuentes
@ 2016-02-03 22:32               ` John Wiegley
  2016-02-03 22:52                 ` Clément Pit--Claudel
  2016-02-03 23:50                 ` Sacha Chua
  2016-02-04  5:49               ` Ivan Andrus
  2016-02-04  8:40               ` Elias Mårtenson
  2 siblings, 2 replies; 102+ messages in thread
From: John Wiegley @ 2016-02-03 22:32 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: Sacha Chua, emacs-devel

>>>>> Óscar Fuentes <ofv@wanadoo.es> writes:

> BTW, searching for `n' also matches `ñ', which is definitely wrong. Those
> are not equivalent characters by any stretch.

I think a poll about this would be a good idea. There is enough contention
about having it as a default that we may prefer to wait, especially since it
does change the searching behavior that 24.x users are used to.

What's the best method these days for conducting such a poll? I wonder if
these types of polls are something our community ambassador, Sacha, would be
willing to take ownership of...

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2




* Re: Character folding in the pretest
  2016-02-03 22:32               ` John Wiegley
@ 2016-02-03 22:52                 ` Clément Pit--Claudel
  2016-02-03 23:50                 ` Sacha Chua
  1 sibling, 0 replies; 102+ messages in thread
From: Clément Pit--Claudel @ 2016-02-03 22:52 UTC (permalink / raw)
  To: emacs-devel


On 02/03/2016 05:32 PM, John Wiegley wrote:
>>>>>> Óscar Fuentes <ofv@wanadoo.es> writes:
> 
>> BTW, searching for `n' also matches `ñ', which is definitely wrong. Those
>> are not equivalent characters by any stretch.
> 
> I think a poll about this would be a good idea. There is enough contention
> about having it as a default that we may prefer to wait, especially since it
> does change the searching behavior that 24.x users are used to.
> 
> What's the best method these days for conducting such a poll? I wonder if
> these types of polls are something our community ambassador, Sacha, would be
> willing to take ownership of...

I wonder whether the meta Emacs Stack Exchange would work.

Clément.



* Re: Character folding in the pretest
  2016-02-03 11:08 ` Artur Malabarba
  2016-02-03 13:24   ` Stefan Monnier
  2016-02-03 15:38   ` Eli Zaretskii
@ 2016-02-03 22:53   ` Richard Stallman
  2 siblings, 0 replies; 102+ messages in thread
From: Richard Stallman @ 2016-02-03 22:53 UTC (permalink / raw)
  To: Artur Malabarba; +Cc: per, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

To get useful feedback from pretests, we need to ask the community to
respond.  Otherwise we will only hear from those who absolutely hate
it.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.





* Re: Character folding in the pretest
  2016-02-03 22:32               ` John Wiegley
  2016-02-03 22:52                 ` Clément Pit--Claudel
@ 2016-02-03 23:50                 ` Sacha Chua
  1 sibling, 0 replies; 102+ messages in thread
From: Sacha Chua @ 2016-02-03 23:50 UTC (permalink / raw)
  To: emacs-devel; +Cc: jwiegley

John Wiegley <jwiegley@gmail.com> writes:

> I think a poll about this would be a good idea. There is enough
> contention about having it as a default that we may prefer to wait,
> especially since it does change the searching behavior that 24.x users
> are used to. What's the best method these days for conducting such a poll?
> I wonder if these types of polls are something our community
> ambassador, Sacha, would be willing to take ownership of...

This approach from 2002 (
http://lists.gnu.org/archive/html/emacs-devel/2002-06/msg00170.html ) of
posting a lightly-structured e-mail-based poll so that people could
either share a quick answer or a more nuanced opinion seems to still be
a better way than, say, using a web-based multiple-choice poll.

On-list discussion seems to be slightly more useful than quick off-list
voting, although I think I can handle tallying quick votes sent to an
address off-list if needed.

Polling is a weird thing, anyway. You'll probably mostly hear from
people who feel strongly about it, so I'm not sure how representative
that will be for our user base. There are pretty good arguments on all
sides in the current thread, so I'm not sure if you'll get that much
additional information from a poll.

Still, if someone wants to draft a poll, I can help with the grunt-work
of distributing it (maybe emacs-devel, help-gnu-emacs, emacs-tangents,
Reddit, and Planet Emacsen), tallying the votes, and maybe updating a
proposal page with additional notes (maybe on EmacsWiki).

Sacha




* Re: Character folding in the pretest
  2016-02-03 21:08             ` Óscar Fuentes
  2016-02-03 22:32               ` John Wiegley
@ 2016-02-04  5:49               ` Ivan Andrus
  2016-02-04 21:30                 ` Richard Stallman
  2016-02-04  8:40               ` Elias Mårtenson
  2 siblings, 1 reply; 102+ messages in thread
From: Ivan Andrus @ 2016-02-04  5:49 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

On Feb 3, 2016, at 2:08 PM, Óscar Fuentes <ofv@wanadoo.es> wrote:
> 
> John Wiegley <jwiegley@gmail.com> writes:
> 
>> It's also nice when searching a Spanish document, where someone says "como"
>> and you want to search for it, but aren't sure if it was meant as a question
>> word (¿Cómo?) or a preposition (como).
> 
> Furthermore, in Spanish nowadays you can't expect correct orthography,
> even in supposedly educated environments. Also, involuntary typos
> involving accents are common.
> 
> I like the feature very much, but I'm neutral wrt its default value. If
> you ask me, as a programmer, I would say no, but as a Spaniard that
> occasionally uses Emacs to write Spanish text, I'll say yes.
> 
> BTW, searching for `n' also matches `ñ', which is definitely wrong.
> Those are not equivalent characters by any stretch.

Though folding b and v would be very helpful for some of the Spanish I read.  :-)

-Ivan



* Re: Character folding in the pretest
  2016-02-03 21:08             ` Óscar Fuentes
  2016-02-03 22:32               ` John Wiegley
  2016-02-04  5:49               ` Ivan Andrus
@ 2016-02-04  8:40               ` Elias Mårtenson
  2016-02-04 11:57                 ` Dirk-Jan C. Binnema
  2016-02-04 21:32                 ` Richard Stallman
  2 siblings, 2 replies; 102+ messages in thread
From: Elias Mårtenson @ 2016-02-04  8:40 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel


On 4 February 2016 at 05:08, Óscar Fuentes <ofv@wanadoo.es> wrote:

BTW, searching for `n' also matches `ñ', which is definitely wrong.
> Those are not equivalent characters by any stretch.
>

What type of character equivalence should be used is locale-dependent.
Everybody here agrees with that. Thus, the solution must also be
locale-dependent.

It would make sense to have the default based on the session's locale,
meaning that in a Swedish locale a, ä and å would be different and n and ñ
be different, but under a Spanish locale, the opposite would be true.


* Re: Character folding in the pretest
  2016-02-04  8:40               ` Elias Mårtenson
@ 2016-02-04 11:57                 ` Dirk-Jan C. Binnema
  2016-02-04 15:18                   ` Drew Adams
                                     ` (2 more replies)
  2016-02-04 21:32                 ` Richard Stallman
  1 sibling, 3 replies; 102+ messages in thread
From: Dirk-Jan C. Binnema @ 2016-02-04 11:57 UTC (permalink / raw)
  To: emacs-devel


On Thursday Feb 04 2016, Elias Mårtenson wrote:

> On 4 February 2016 at 05:08, Óscar Fuentes <ofv@wanadoo.es> wrote:
>
> BTW, searching for `n' also matches `ñ', which is definitely wrong.
>> Those are not equivalent characters by any stretch.

> What type of character equivalence should be used is locale-dependent.
> Everybody here agrees with that. Thus, the solution must also be
> locale-dependent.

> It would make sense to have the default based on the session's locale,
> meaning that in a Swedish locale a, ä and å would be different and n and ñ
> be different, but under a Spanish locale, the opposite would be true.

Character equivalence is based on the language(s) of whatever is in your
buffer, which might be correlated with your locale, but not more than
that.

Regardless, for the purpose of searching, my personal preference would
be to make folding rather inclusive; I don't really care about the exact
rules languages have come up with for what letters are considered "the same",
I just care for what I, as a user, would find the easiest to match.

So for instance, I'd like "angstrom" to match "Ångström" even though in
Swedish, a/Å and o/ö are not the same. Somewhat similar to how
languages' capitalization rules are ignored when searching
case-insensitively. A few false positives are not much of a problem.

That would also get my vote as a reasonable default for case-folding in
searches. But I'll happily take any default, as long as there's a way to
get the above behavior, preferably without having to change my locale.
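
The inclusive matching described above ("angstrom" finding "Ångström")
can be sketched in a few lines. This is only an illustration in Python,
not how Emacs implements character folding; the helper names fold and
inclusive_search are made up for the example. It decomposes each
character (NFD), drops the combining marks, and compares case-insensitively,
with no regard for locale:

```python
import unicodedata

def fold(text):
    """Drop combining marks after NFD decomposition, then fold case."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks such as
    # the ring above in Å or the diaeresis in ö.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

def inclusive_search(needle, haystack):
    """Return True if needle occurs in haystack once both are folded."""
    return fold(needle) in fold(haystack)

print(inclusive_search("angstrom", "Anders Ångström"))
print(inclusive_search("angstrom", "Neil Armstrong"))
```

The price of this approach is exactly the false positives mentioned
above: every letter that differs only by a diacritic is treated as the same.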

Kind regards,
Dirk.

-- 
Dirk-Jan C. Binnema                  Helsinki, Finland
e:djcb@djcbsoftware.nl           w:www.djcbsoftware.nl
pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C




* RE: Character folding in the pretest
  2016-02-04 11:57                 ` Dirk-Jan C. Binnema
@ 2016-02-04 15:18                   ` Drew Adams
  2016-02-04 15:59                     ` Óscar Fuentes
  2016-02-04 23:05                     ` Artur Malabarba
  2016-02-04 16:54                   ` Eli Zaretskii
  2016-02-04 17:26                   ` Teemu Likonen
  2 siblings, 2 replies; 102+ messages in thread
From: Drew Adams @ 2016-02-04 15:18 UTC (permalink / raw)
  To: Dirk-Jan C. Binnema, emacs-devel

> > It would make sense to have the default based on the session's locale,
> > meaning that in a Swedish locale a, ä and å would be different and n and ñ
> > be different, but under a Spanish locale, the opposite would be true.
> 
> Character equivalence is based on the language(s) of whatever is in your
> buffer, which might be correlated with your locale, but not more than
> that.
> 
> Regardless, for the purpose of searching, my personal preference would
> be to make folding rather inclusive; I don't really care about the exact
> rules languages have come up with for what letters are considered "the same",
> I just care for what I, as a user, would find the easiest to match.
> 
> So for instance, I'd like "angstrom" to match "Ångström" even though in
> Swedish, a/Å and o/ö are not the same. Somewhat similar to how
> languages' capitalization rules are ignored when searching
> case-insensitively. A few false positives are not much of a problem.
> 
> That would also get my vote as a reasonable default for case-folding in
> searches. But I'll happily take any default, as long as there's a way to
> get the above behavior, preferably without having to change my locale.

Both of these posts (one saying that it should be possible to take
locale into account, perhaps even for default behavior; the other
adding that someone might have a personal preference) point to the
existence of multiple use cases and users needing to be able to
(easily) control the behavior.

We can fine-tune defaulting at design time, to try to provide a
reasonable behavior for most use cases/contexts, but users still
need to be able to easily customize the sets of equivalence classes,
and they should be able to have multiple sets of such sets, which
they can activate in different contexts (e.g. modes).

That is really where the design effort should be, at this point.
We have a basic char-folding mechanism, but we do not yet provide
an easy way for a user to customize the behavior, let alone to
define/get the various behaviors that s?he might want in different
contexts.
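
To make "multiple sets of equivalence classes, activated per context"
concrete, here is one possible shape for the data, sketched in Python
rather than Emacs Lisp. Everything in it is hypothetical: the set names,
the context names (including "spanish-text", which is not a real mode),
and the function equivalent are assumptions for illustration, not an
existing or proposed Emacs interface.

```python
# Hypothetical named sets of equivalence classes. Each class is a set of
# characters that should match one another during search.
FOLD_SETS = {
    "diacritics": [{"n", "ñ"}, {"a", "á", "ä", "å"}, {"o", "ó", "ö"}],
    "spanish": [{"a", "á"}, {"o", "ó"}],  # n and ñ stay distinct
    "none": [],
}

# Hypothetical mapping from a context (for example a major mode) to the
# name of the set that is active there.
CONTEXT_SETS = {
    "text-mode": "diacritics",
    "spanish-text": "spanish",
    "prog-mode": "none",
}

def equivalent(a, b, context):
    """Return True if a and b match under the set active in context."""
    if a == b:
        return True
    classes = FOLD_SETS[CONTEXT_SETS.get(context, "none")]
    return any(a in cls and b in cls for cls in classes)

print(equivalent("n", "ñ", "text-mode"))
print(equivalent("n", "ñ", "spanish-text"))
```

With a layout like this, choosing a default amounts to choosing which
named set is active in which context, and users can add or edit sets
without touching the matching code.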




* Re: Character folding in the pretest
  2016-02-04 15:18                   ` Drew Adams
@ 2016-02-04 15:59                     ` Óscar Fuentes
  2016-02-04 16:36                       ` Clément Pit--Claudel
  2016-02-04 17:07                       ` Eli Zaretskii
  2016-02-04 23:05                     ` Artur Malabarba
  1 sibling, 2 replies; 102+ messages in thread
From: Óscar Fuentes @ 2016-02-04 15:59 UTC (permalink / raw)
  To: emacs-devel

Drew Adams <drew.adams@oracle.com> writes:

[snip]

> That is really where the design effort should be, at this point.
> We have a basic char-folding mechanism, but we do not yet provide
> an easy way for a user to customize the behavior, let alone to
> define/get the various behaviors that s?he might want in different
> contexts.

Allowing the user to configure the feature is good, but the defaults
should be usable. After seeing the case I mentioned (`n' matching `ñ' in
Spanish text) it is obvious that the feature is not ready for prime
time.





* Re: Character folding in the pretest
  2016-02-04 15:59                     ` Óscar Fuentes
@ 2016-02-04 16:36                       ` Clément Pit--Claudel
  2016-02-04 16:47                         ` Óscar Fuentes
  2016-02-04 20:23                         ` John Wiegley
  2016-02-04 17:07                       ` Eli Zaretskii
  1 sibling, 2 replies; 102+ messages in thread
From: Clément Pit--Claudel @ 2016-02-04 16:36 UTC (permalink / raw)
  To: emacs-devel


On 02/04/2016 10:59 AM, Óscar Fuentes wrote:
> After seeing the case I mentioned (`n' matching `ñ' in
> Spanish text) it is obvious that the feature is not ready for prime
> time.

This is interesting. I guess it boils down to whether you're trying to avoid false positives or false negatives. For me the strength of this feature is that it lets me find virtually anything using a dumb keyboard (one without easy access to accents); I don't care too much about false positives (that is, I don't mind if ‘n’ finds ‘ñ’). In that sense, it doesn't matter if letters "are different"; all that matters is whether they look different. I imagine that's why the Unicode standard defined things that way. It seems this behavior is consistent with that of most online search engines (I tried Google, Bing, and DuckDuckGo; all return accented matches for unaccented keywords).

I'm wary of smart solutions based on locale or buffer language. It's not uncommon to be writing a single document in multiple languages; especially if names are involved. Plus, it's not obvious that a single set of settings is enough for each locale. For example, one could argue that folding accents makes no sense in French: ‘supprimé’ means ‘removed’, but ‘supprime’ means ‘removes’. Yet it is not uncommon for people to write the latter for the former, especially when using a dumb keyboard.

Clément.



* Re: Character folding in the pretest
  2016-02-04 16:36                       ` Clément Pit--Claudel
@ 2016-02-04 16:47                         ` Óscar Fuentes
  2016-02-04 17:05                           ` Werner LEMBERG
                                             ` (2 more replies)
  2016-02-04 20:23                         ` John Wiegley
  1 sibling, 3 replies; 102+ messages in thread
From: Óscar Fuentes @ 2016-02-04 16:47 UTC (permalink / raw)
  To: emacs-devel

Clément Pit--Claudel <clement.pit@gmail.com> writes:

> On 02/04/2016 10:59 AM, Óscar Fuentes wrote:
>> After seeing the case I mentioned (`n' matching `ñ' in
>> Spanish text) it is obvious that the feature is not ready for prime
>> time.
>
> This is interesting. I guess it boils down to whether you're trying to
> avoid false positives or false negatives. For me the strength of this
> feature is that it lets me find virtually anything using a dumb
> keyboard (one without easy access to accents); I don't care too much
> about false positives (that is, I don't mind if ‘n’ finds ‘ñ’). In
> that sense, it doesn't matter if letters "are different"; all that
> matters is whether they look different. I imagine that's why the
> Unicode standard defined things that way. It seems this behavior is
> consistent with that of most online search engines (I tried Google,
> Bing, and DuckDuckGo; all return accented matches for unaccented
> keywords).

I see your point, but you are talking about accents all the time. In
Spanish `n' and `ñ' are different letters. `n' matching `ñ' is no
different than `p' matching `q'. I think that you will agree that some
of us will see that behavior as a glaring bug.

> I'm wary of smart solutions based on locale or buffer language. It's
> not uncommon to be writing a single document in multiple languages;
> especially if names are involved. Plus, it's not obvious that a single
> set of settings is enough for each locale. For example, one could
> argue that folding accents makes no sense in French: ‘supprimé’ means
> ‘removed’, but ‘supprime’ means ‘removes’. Yet it is not uncommon for
> people to write the latter for the former, especially when using a
> dumb keyboard.

I'm not sure how to fix this, but seeing similar reservations from other
users, some language-dependent behavior is unavoidable.





* Re: Character folding in the pretest
  2016-02-04 11:57                 ` Dirk-Jan C. Binnema
  2016-02-04 15:18                   ` Drew Adams
@ 2016-02-04 16:54                   ` Eli Zaretskii
  2016-02-04 17:36                     ` Paul Eggert
  2016-02-04 17:26                   ` Teemu Likonen
  2 siblings, 1 reply; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-04 16:54 UTC (permalink / raw)
  To: Dirk-Jan C. Binnema; +Cc: emacs-devel

> From: "Dirk-Jan C. Binnema" <djcb@djcbsoftware.nl>
> Date: Thu, 04 Feb 2016 13:57:36 +0200
> 
> > What type of character equivalence should be used is locale-dependent.
> > Everybody here agrees with that. Thus, the solution must also be
> > locale-dependent.
> 
> > It would make sense to have the default based on the session's locale,
> > meaning that in a Swedish locale a, ä and å would be different and n and ñ
> > be different, but under a Spanish locale, the opposite would be true.
> 
> Character equivalence is based on the language(s) of whatever is in your
> buffer, which might be correlated with your locale, but not more than
> that.

Indeed.  Emacs is a multilingual environment, so any assumption that
the main language in every buffer, or even in most buffers, is likely
to be the locale's language will misfire.

Also, Emacs has features that need to match characters which didn't come
from human-readable text at all, like file names.




* Re: Character folding in the pretest
  2016-02-04 16:47                         ` Óscar Fuentes
@ 2016-02-04 17:05                           ` Werner LEMBERG
  2016-02-05  5:09                             ` Elias Mårtenson
  2016-02-04 17:12                           ` Eli Zaretskii
  2016-02-04 17:27                           ` Clément Pit--Claudel
  2 siblings, 1 reply; 102+ messages in thread
From: Werner LEMBERG @ 2016-02-04 17:05 UTC (permalink / raw)
  To: ofv; +Cc: emacs-devel

>> This is interesting. I guess it boils down to whether you're trying
>> to avoid false positives or false negatives.  For me the strength
>> of this feature is that it lets me find virtually anything using a
>> dumb keyboard (one without easy access to accents); I don't care
>> too much about false positives (that is, I don't mind if ‘n’ finds
>> ‘ñ’).  In that sense, it doesn't matter if letters "are different";
>> all that matters is whether they look different.  I imagine that's
>> why the Unicode standard defined things that way.  It seems this
>> behavior is consistent with that of most online search engines (I
>> tried Google, Bing, and DuckDuckGo; all return accented matches for
>> unaccented keywords).
> 
> I see your point, but you are talking about accents all the time.
> In Spanish `n' and `ñ' are different letters.  `n' matching `ñ' is
> no different than `p' matching `q'.  I think that you will agree
> that some of us will see that behavior as a glaring bug.

This naturally leads to a possible user option: Having `optical'
matches or not, where `optical' means `base character plus diacritic
and/or slight modifications', e.g., o → ø → ö etc., etc.
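
A rough sketch of such an option, in Python for illustration only. One
detail it brings out: ø has no Unicode decomposition, so an `optical'
mode cannot rely on decomposition data alone and needs a small extra
table. The names OPTICAL_EXTRAS, base_char and optical_equal are
hypothetical, and the table holds just the one example pair.

```python
import unicodedata

# Hypothetical extra pairs for letters that Unicode does not decompose
# but that look like a modified base letter.
OPTICAL_EXTRAS = {"ø": "o", "Ø": "O"}

def base_char(ch, optical=True):
    """Return the base letter of ch when optical matching is enabled."""
    if not optical:
        return ch
    if ch in OPTICAL_EXTRAS:
        return OPTICAL_EXTRAS[ch]
    # For a precomposed letter such as ö, NFD yields the base letter
    # followed by its combining marks; keep only the first character.
    return unicodedata.normalize("NFD", ch)[0]

def optical_equal(a, b, optical=True):
    """Compare two characters, optionally ignoring optical modifications."""
    return base_char(a, optical) == base_char(b, optical)

print(optical_equal("o", "ø"))
print(optical_equal("ø", "ö"))
print(optical_equal("o", "ö", optical=False))
```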


    Werner


* Re: Character folding in the pretest
  2016-02-04 15:59                     ` Óscar Fuentes
  2016-02-04 16:36                       ` Clément Pit--Claudel
@ 2016-02-04 17:07                       ` Eli Zaretskii
  2016-02-04 17:31                         ` Clément Pit--Claudel
  1 sibling, 1 reply; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-04 17:07 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

> From: Óscar Fuentes <ofv@wanadoo.es>
> Date: Thu, 04 Feb 2016 16:59:18 +0100
> 
> After seeing the case I mentioned (`n' matching `ñ' in Spanish text)
> it is obvious that the feature is not ready for prime time.

The feature was _designed_ to do this, so it simply works as designed.
It can be turned off if you don't like the results, but saying it
isn't ready based on that is IMO inaccurate, if not incorrect.




* Re: Character folding in the pretest
  2016-02-04 16:47                         ` Óscar Fuentes
  2016-02-04 17:05                           ` Werner LEMBERG
@ 2016-02-04 17:12                           ` Eli Zaretskii
  2016-02-04 19:35                             ` Óscar Fuentes
  2016-02-04 17:27                           ` Clément Pit--Claudel
  2 siblings, 1 reply; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-04 17:12 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

> From: Óscar Fuentes <ofv@wanadoo.es>
> Date: Thu, 04 Feb 2016 17:47:54 +0100
> 
> I see your point, but you are talking about accents all the time. In
> Spanish `n' and `ñ' are different letters. `n' matching `ñ' is no
> different than `p' matching `q'.

Unicode disagrees:

  M-: (get-char-code-property ?ñ 'decomposition) RET

   => (110 771)

110 is 'n' and 771 is U+0303 NON-SPACING TILDE, a combining accent.
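
The same data can be inspected outside Emacs. A minimal sketch using
Python's standard unicodedata module (the helper name
decomposition_codepoints is made up for this illustration). Python
reports U+0303 under its current Unicode name, COMBINING TILDE.

```python
import unicodedata

def decomposition_codepoints(ch):
    """Return the decomposition mapping of ch as a list of integers."""
    fields = unicodedata.decomposition(ch).split()
    # Compatibility mappings begin with a tag such as "<compat>"; skip it.
    return [int(f, 16) for f in fields if not f.startswith("<")]

print(decomposition_codepoints("\u00f1"))  # ñ -> [110, 771]
print(unicodedata.name(chr(771)))          # COMBINING TILDE
print(decomposition_codepoints("q"))       # [] (no decomposition)
```

So `n' and `ñ' are related in the Unicode data in a way that `p' and
`q' are not, whatever a given language's alphabet says.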




* Re: Character folding in the pretest
  2016-02-04 11:57                 ` Dirk-Jan C. Binnema
  2016-02-04 15:18                   ` Drew Adams
  2016-02-04 16:54                   ` Eli Zaretskii
@ 2016-02-04 17:26                   ` Teemu Likonen
  2016-02-05  8:08                     ` Adrian.B.Robert
  2 siblings, 1 reply; 102+ messages in thread
From: Teemu Likonen @ 2016-02-04 17:26 UTC (permalink / raw)
  To: Dirk-Jan C. Binnema; +Cc: emacs-devel


Dirk-Jan C. Binnema [2016-02-04 13:57:36+02] wrote:

> Regardless, for the purpose of searching, my personal preference would
> be to make folding rather inclusive; I don't really care about the
> exact rules languages have come up with for what letters are considered
> "the same", I just care for what I, as a user, would find the easiest
> to match.

> That would also get my vote as a reasonable default for case-folding
> in searches. But I'll happily take any default, as long as there's a
> way to get the above behavior, preferably without having to change my
> locale.

I think that just a global setting and easy switch like M-s <something>
in isearch prompt is enough. I fear that any locale or language based
magic or intelligence is over-engineering and may cause annoying
surprises. Unexpected intelligence can be harmful too.

-- 
/// Teemu Likonen   - .-..   <https://github.com/tlikonen> //
// PGP: 4E10 55DC 84E9 DFF6 13D7 8557 719D 69D3 2453 9450 ///


* Re: Character folding in the pretest
  2016-02-04 16:47                         ` Óscar Fuentes
  2016-02-04 17:05                           ` Werner LEMBERG
  2016-02-04 17:12                           ` Eli Zaretskii
@ 2016-02-04 17:27                           ` Clément Pit--Claudel
  2016-02-04 17:34                             ` Eli Zaretskii
                                               ` (2 more replies)
  2 siblings, 3 replies; 102+ messages in thread
From: Clément Pit--Claudel @ 2016-02-04 17:27 UTC (permalink / raw)
  To: emacs-devel


On 02/04/2016 11:47 AM, Óscar Fuentes wrote:
> Clément Pit--Claudel <clement.pit@gmail.com> writes:
> 
>> On 02/04/2016 10:59 AM, Óscar Fuentes wrote:
>>> After seeing the case I mentioned (`n' matching `ñ' in Spanish
>>> text) it is obvious that the feature is not ready for prime 
>>> time.
>> 
>> This is interesting. I guess it boils down to whether you're trying
>> to avoid false positives or false negatives. For me the strength of
>> this feature is that it lets me find virtually anything using a
>> dumb keyboard (one without easy access to accents); I don't care
>> too much about false positives (that is, I don't mind if ‘n’ finds
>> ‘ñ’). In that sense, it doesn't matter if letters "are different";
>> all that matters is whether they look different. I imagine that's
>> why the Unicode standard defined things that way. It seems this
>> behavior is consistent with that of most online search engines (I
>> tried Google, Bing, and DuckDuckGo; all return accented matches for
>> unaccented keywords).
> 
> I see your point, but you are talking about accents all the time. In 
> Spanish `n' and `ñ' are different letters. `n' matching `ñ' is no 
> different than `p' matching `q'. I think that you will agree that
> some of us will see that behavior as a glaring bug.

I should have said diacritics instead of accents; sorry. The difference between n matching ñ and p matching q is that graphically, ñ is n + ~ (it can also be encoded that way: ñ).

Here's another issue that character folding solves; I'd like your thoughts on it. Try to search the text of my message for 'n' and 'ñ', without any sort of character folding.

This will match n but not ñ: ñ.
This will match ñ but not n: ñ.

Note that the behaviour has nothing to do with Emacs; most applications will behave the same. The first ñ is using n + combining tilde, while the second is a single character ñ. Both are legal representations of the Spanish letter ñ. With character folding, both match 'n'. This is a much more logical default, I think. The same thing can be said for virtually every diacritic.
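
The two encodings can be told apart programmatically. A small Python
sketch, using escapes so the two forms cannot be confused in transit
(the variable names are just for illustration):

```python
import unicodedata

precomposed = "\u00f1"   # ñ as a single code point
decomposed = "n\u0303"   # n followed by COMBINING TILDE

# The two strings render the same but are different code point sequences.
print(precomposed == decomposed)   # False

# A plain substring search for "n" finds only the decomposed form.
print("n" in decomposed)           # True
print("n" in precomposed)          # False

# Unicode normalization converts between the two representations.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```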

On a more personal note, I wouldn't see the character folding behaviour as a bug for French, where ç is quite different from c, and é is quite different from e.

>> I'm wary of smart solutions based on locale or buffer language.
>> It's not uncommon to be writing a single document in multiple
>> languages; especially if names are involved. Plus, it's not obvious
>> that a single set of settings is enough for each locale. For
>> example, one could argue that folding accents makes no sense in
>> French: ‘supprimé’ means ‘removed’, but ‘supprime’ means ‘removes’.
>> Yet it is not uncommon for people to write the latter for the
>> former, especially when using a dumb keyboard.
> 
> I'm not sure how to fix this, but seeing similar reservations from
> other users, some language-dependent behavior is unavoidable.

I don't think so. An on-off switch seems enough to begin with. Language-dependent folding could be a separate feature; Unicode folding (the current implementation) would be a fine feature to start with, I think.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 17:07                       ` Eli Zaretskii
@ 2016-02-04 17:31                         ` Clément Pit--Claudel
  0 siblings, 0 replies; 102+ messages in thread
From: Clément Pit--Claudel @ 2016-02-04 17:31 UTC (permalink / raw)
  To: emacs-devel


On 02/04/2016 12:07 PM, Eli Zaretskii wrote:
>> From: Óscar Fuentes <ofv@wanadoo.es>
>> Date: Thu, 04 Feb 2016 16:59:18 +0100
>>
>> After seeing the case I mentioned (`n' matching `ñ' in Spanish text)
>> it is obvious that the feature is not ready for prime time.
> 
> The feature was _designed_ to do this, so it simply works as designed.
> It can be turned off if you don't like the results, but saying it
> isn't ready based on that is IMO inaccurate, if not incorrect.

I agree. Maybe we're just discussing two different features? 

* One is Unicode standard character folding; it's implemented, it works as designed, it has very clear semantics based on a recognized standard, but we're not sure if it should be enabled by default (I'd vote yes).

* The other is language-dependent character folding; it isn't implemented (though some people think it could reuse some of the architecture used for Unicode folding), it doesn't have clear semantics (it's a matter of user preference, though we might be able to come up with good defaults), and many people would love such a feature.

Clément.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 17:27                           ` Clément Pit--Claudel
@ 2016-02-04 17:34                             ` Eli Zaretskii
  2016-02-04 18:18                             ` Yuri Khan
  2016-02-04 19:46                             ` Óscar Fuentes
  2 siblings, 0 replies; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-04 17:34 UTC (permalink / raw)
  To: Clément Pit--Claudel; +Cc: emacs-devel

> From: Clément Pit--Claudel <clement.pit@gmail.com>
> Date: Thu, 4 Feb 2016 12:27:49 -0500
> 
> > I'm not sure how to fix this, but seeing similar reservations from
> > other users, some language-dependent behavior is unavoidable.
> 
> I don't think so. An on-off switch seems enough to begin with. Language-dependent folding could be a separate feature; Unicode folding (the current implementation) would be a fine feature to start with, I think.

That's the idea, indeed: the feature currently provides
language-independent lax matching; language-dependent variations
should follow, once we (a) figure out how to know _the_ language at
any given place, and (b) acquire a database of those variations.
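
As a rough illustration of what such a database of variations might look like, here is a sketch in Python (Python is an assumption of this illustration, not how Emacs implements anything). The KEEP_DISTINCT table, its two entries, and the fold function are all hypothetical: the idea is simply that language-independent folding strips every combining mark, while a per-language table lists letters that must not be folded.

```python
import unicodedata

# Hypothetical per-language exceptions: letters that speakers treat as
# distinct from their unaccented look-alikes. Not a real database.
KEEP_DISTINCT = {
    "es": {"\u00f1"},                      # ñ
    "sv": {"\u00e5", "\u00e4", "\u00f6"},  # å ä ö
}

def strip_marks(ch):
    """Decompose one character and drop its combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(d for d in decomposed if unicodedata.combining(d) == 0)

def fold(text, lang=None):
    """Fold diacritics, except for letters protected in language `lang`."""
    protected = KEEP_DISTINCT.get(lang, set())
    out = []
    # Compose first, so that 'n' + combining tilde is also recognized as ñ.
    for ch in unicodedata.normalize("NFC", text):
        out.append(ch if ch.lower() in protected else strip_marks(ch))
    return "".join(out)

print(fold("sa\u00f1a"))          # sana  (language-independent folding)
print(fold("sa\u00f1a", "es"))    # saña  (ñ is protected for Spanish)
print(fold("cant\u00f3", "es"))   # canto (the acute accent is still folded)
```

The sketch takes the language as an explicit argument, which sidesteps point (a): it says nothing about how to determine the language at a given place in a buffer.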



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 16:54                   ` Eli Zaretskii
@ 2016-02-04 17:36                     ` Paul Eggert
  2016-02-04 17:45                       ` Eli Zaretskii
  0 siblings, 1 reply; 102+ messages in thread
From: Paul Eggert @ 2016-02-04 17:36 UTC (permalink / raw)
  To: Eli Zaretskii, Dirk-Jan C. Binnema; +Cc: emacs-devel

On 02/04/2016 08:54 AM, Eli Zaretskii wrote:
> Emacs is a multilingual environment, so any assumption that
> the main language in every buffer, or even in most buffers, is likely
> to be the locale's language will misfire.

True, but although Emacs is designed to be language-agnostic when 
handling buffer text, that doesn't mean it should be designed to be 
language-agnostic when handling user input. If Emacs starts up in a 
language-X locale, its user probably will be more comfortable using 
language-X rules for searching, even if the main language in a buffer is 
language Y. As an English-speaker when I search Swedish texts by hand, I 
normally want to use English-like rules because English is what I know 
and I can't really read the Swedish anyway. In English we tend to 
consider accents unimportant when searching, and because we treat 
“naïve” like “naive” we also treat “Ångström” like “Angstrom” even 
though the latter is not correct in Swedish.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 17:36                     ` Paul Eggert
@ 2016-02-04 17:45                       ` Eli Zaretskii
  2016-02-04 19:25                         ` Paul Eggert
  0 siblings, 1 reply; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-04 17:45 UTC (permalink / raw)
  To: Paul Eggert; +Cc: djcb, emacs-devel

> Cc: emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Thu, 4 Feb 2016 09:36:47 -0800
> 
> On 02/04/2016 08:54 AM, Eli Zaretskii wrote:
> > Emacs is a multilingual environment, so any assumption that
> > the main language in every buffer, or even in most buffers, is likely
> > to be the locale's language will misfire.
> 
> True, but although Emacs is designed to be language-agnostic when 
> handling buffer text, that doesn't mean it should be designed to be 
> language-agnostic when handling user input.

The user input in this case is a search string.  A search string is
likely to use the language of the text being searched, not the
language of the user's locale.  E.g., when I search Cyrillic text, I
will hardly ever use Hebrew, my locale language.

> As an English-speaker when I search Swedish texts by hand, I
> normally want to use English-like rules because English is what I
> know and I can't really read the Swedish anyway.

I'm not sure this is the use case we should cater to.  We should
instead cater to users who search text they _can_ read.

> In English we tend to consider accents unimportant when searching

Amazingly enough, Unicode advises the same.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 17:27                           ` Clément Pit--Claudel
  2016-02-04 17:34                             ` Eli Zaretskii
@ 2016-02-04 18:18                             ` Yuri Khan
  2016-02-04 19:46                             ` Óscar Fuentes
  2 siblings, 0 replies; 102+ messages in thread
From: Yuri Khan @ 2016-02-04 18:18 UTC (permalink / raw)
  To: Clément Pit--Claudel; +Cc: Emacs developers

On Thu, Feb 4, 2016 at 11:27 PM, Clément Pit--Claudel
<clement.pit@gmail.com> wrote:

> I should have said diacritics instead of accents; sorry. The difference between n matching ñ and p matching q is that graphically, ñ is n + ~ (it can also be encoded that way: ̃n).

This last example is wrong. Combining diacritics always affect the
preceding character, not the following. In your example, the tilde is
rendered over the space preceding n.

If you see the tilde over n, this indicates a bug in the font you are
using. (It is fairly common.)
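
The ordering rule can be checked with a short sketch in Python (an assumption of this illustration; the standard unicodedata module stands in for any Unicode-aware tool). Canonical composition only happens when the combining mark follows its base character.

```python
import unicodedata

TILDE = "\u0303"  # COMBINING TILDE

after_base = "n" + TILDE   # mark follows its base: a valid spelling of ñ
before_base = TILDE + "n"  # mark precedes 'n': it does not belong to the 'n'

# NFC composes base + mark into the single character U+00F1.
print(unicodedata.normalize("NFC", after_base) == "\u00f1")      # True

# With the mark first there is nothing to compose; the string is unchanged.
print(unicodedata.normalize("NFC", before_base) == before_base)  # True
```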



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 17:45                       ` Eli Zaretskii
@ 2016-02-04 19:25                         ` Paul Eggert
  2016-02-04 19:36                           ` Eli Zaretskii
  0 siblings, 1 reply; 102+ messages in thread
From: Paul Eggert @ 2016-02-04 19:25 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: djcb, emacs-devel

On 02/04/2016 09:45 AM, Eli Zaretskii wrote:
> We should instead cater to users who search text they _can_ read.

This depends on what one means by "read". I can "read" Swedish in the 
sense that I know where the word boundaries are and have some idea of 
how they're pronounced. I can also "read" Belarusian in the sense that I 
know Cyrillic and a bit of Russian and can follow Belarusian better than 
Swedish, though I easily get lost. In both cases, I'd prefer 
Unicode-type case folding even though it's "wrong" to ignore diacritics 
in the native languages.

Conversely, I can't "read" Hebrew or Chinese or Arabic in the same sense 
and so don't much care how folding works for those languages. Perhaps 
some Hebrew-speaking experts want פּ and פ and ף to be treated the same 
while searching, while other experts do not; it doesn't matter to me.

To help provide context here, most of my reading of non-English text is 
to support other free projects such as the tz database. That database is 
mostly English but contains short passages from other languages. I use 
Emacs for primary database maintenance, but often use other programs to 
browse the Internet as they're more convenient. I'll cut and paste out 
of a Firefox browser between a page of interest and Google Translate, 
for example. Examples of text under Emacs control include "Bahía", "Lịch 
hai thế kỷ", "中国科技史料", and "Новый счет времени". Most of the 
searching for this sort of thing in Emacs will involve typing strings 
like "bahia" and "lich" where I almost always prefer diacritic- and 
case-folded search.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 17:12                           ` Eli Zaretskii
@ 2016-02-04 19:35                             ` Óscar Fuentes
  2016-02-04 19:52                               ` Clément Pit--Claudel
  2016-02-04 20:05                               ` Eli Zaretskii
  0 siblings, 2 replies; 102+ messages in thread
From: Óscar Fuentes @ 2016-02-04 19:35 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> I see your point, but you are talking about accents all the time. In
>> Spanish `n' and `ñ' are different letters. `n' matching `ñ' is no
>> different than `p' matching `q'.
>
> Unicode disagrees:
>
>   M-: (get-char-code-property ?ñ 'decomposition) RET
>
>    => (110 771)
>
> 110 is 'n' and 771 is U+0303 NON-SPACING TILDE, a combining accent.

AFAIK Unicode doesn't mandate what the Spanish alphabet is.

I thought that the point of the feature was to provide searching with
support for character equivalence classes, which is very useful for the
case of Spanish (and other languages, I'm sure). But you are saying that
the feature is about how the characters are encoded by the computer and
not about how they are used by people. If that is true, it should be
disabled by default.




^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 19:25                         ` Paul Eggert
@ 2016-02-04 19:36                           ` Eli Zaretskii
  0 siblings, 0 replies; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-04 19:36 UTC (permalink / raw)
  To: Paul Eggert; +Cc: djcb, emacs-devel

> Cc: djcb@djcbsoftware.nl, emacs-devel@gnu.org
> From: Paul Eggert <eggert@cs.ucla.edu>
> Date: Thu, 4 Feb 2016 11:25:41 -0800
> 
> On 02/04/2016 09:45 AM, Eli Zaretskii wrote:
> > We should instead cater to users who search text they _can_ read.
> 
> This depends on what one means by "read". I can "read" Swedish in the 
> sense that I know where the word boundaries are and have some idea of 
> how they're pronounced. I can also "read" Belarusian in the sense that I 
> know Cyrillic and a bit of Russian and can follow Belarusian better than 
> Swedish, though I easily get lost. In both cases, I'd prefer 
> Unicode-type case folding even though it's "wrong" to ignore diacritics 
> in the native languages.

Then the current defaults are definitely for you, I think.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 17:27                           ` Clément Pit--Claudel
  2016-02-04 17:34                             ` Eli Zaretskii
  2016-02-04 18:18                             ` Yuri Khan
@ 2016-02-04 19:46                             ` Óscar Fuentes
  2016-02-04 20:06                               ` Clément Pit--Claudel
  2016-02-04 20:07                               ` Eli Zaretskii
  2 siblings, 2 replies; 102+ messages in thread
From: Óscar Fuentes @ 2016-02-04 19:46 UTC (permalink / raw)
  To: emacs-devel

Clément Pit--Claudel <clement.pit@gmail.com> writes:

[snip]

It seems that the feature is not geared towards natural language, but
for the cases where the user cares about how the character is composed.
As mentioned on my answer to Eli, this feature should default to off.

Your use case is not typical and is based on usage circumstances
(writing French with a US keyboard), personal opinions about what is
admissible, or factors depending on your language (maybe French has no
case similar to Spanish n/ñ), so I think that it is not convincing
enough to change my POV about the default status of the feature.




^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 19:35                             ` Óscar Fuentes
@ 2016-02-04 19:52                               ` Clément Pit--Claudel
  2016-02-04 20:05                               ` Eli Zaretskii
  1 sibling, 0 replies; 102+ messages in thread
From: Clément Pit--Claudel @ 2016-02-04 19:52 UTC (permalink / raw)
  To: emacs-devel


On 02/04/2016 02:35 PM, Óscar Fuentes wrote:
> Eli Zaretskii <eliz@gnu.org> writes:
> 
>>> I see your point, but you are talking about accents all the time. In
>>> Spanish `n' and `ñ' are different letters. `n' matching `ñ' is no
>>> different than `p' matching `q'.
>>
>> Unicode disagrees:
>>
>>   M-: (get-char-code-property ?ñ 'decomposition) RET
>>
>>    => (110 771)
>>
>> 110 is 'n' and 771 is U+0303 NON-SPACING TILDE, a combining accent.
> 
> AFAIK Unicode doesn't mandate what the Spanish alphabet is.
> 
> I thought that the point of the feature was to provide searching with
> support for character equivalence classes, which is very useful for the
> case of Spanish (and other languages, I'm sure). But you are saying that
> the feature is about how the characters are encoded by the computer and
> not about how they are used by people. If that is true, it should be
> disabled by default.

Why? This feature is simply folding as specified by the Unicode standard. Hopefully the way it is implemented will indeed lend itself to future extensions; using it for user-defined classes of substitutions would be nice. But I don't understand why the possibility of fancier (though less clearly defined) folding should disqualify this feature from becoming the default.

Also, it's not easy (I'd guess not possible) to give any sort of precise meaning to ‘how characters are used by people’. I still find this simple character folding quite useful; I just accept that it's visual folding, not semantic folding (and this list is well aware of the difficulties that arise when one tries to assign semantic meaning to characters; cf. the ‘’ vs `' debate). The semantics of this simple folding are as uncontroversial as can be; we're following an established standard. Maybe there's a better behaved notion of folding out there, but I'm not sure why its existence is relevant to the choice of a default, since we don't have an implementation (nor a spec) for that alternative.

Clément.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 19:35                             ` Óscar Fuentes
  2016-02-04 19:52                               ` Clément Pit--Claudel
@ 2016-02-04 20:05                               ` Eli Zaretskii
  1 sibling, 0 replies; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-04 20:05 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

> From: Óscar Fuentes <ofv@wanadoo.es>
> Date: Thu, 04 Feb 2016 20:35:53 +0100
> 
> >   M-: (get-char-code-property ?ñ 'decomposition) RET
> >
> >    => (110 771)
> >
> > 110 is 'n' and 771 is U+0303 NON-SPACING TILDE, a combining accent.
> 
> AFAIK Unicode doesn't mandate what the Spanish alphabet is.

I didn't say it did.

> I thought that the point of the feature was to provide searching with
> support for character equivalence classes

It is.

> But you are saying that the feature is about how the characters are
> encoded by the computer and not about how they are used by
> people. If that is true, it should be disabled by default.

But it isn't true.  This has (almost) nothing to do with encoding;
get-char-code-property accesses properties, not encodings.

Perhaps you aren't familiar with Unicode equivalence, in which case I
suggest these sources:

  http://unicode.org/reports/tr10/#Searching
  http://www.unicode.org/notes/tn5/
  http://www.unicode.org/reports/tr30/tr30-4.html
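
For readers without Emacs at hand, the same character property can be inspected from Python's standard unicodedata module. This is only a sketch and an assumption of this illustration; it is not how Emacs looks the property up.

```python
import unicodedata

ch = "\u00f1"  # ñ

# The decomposition property is reported as space-separated hex code points.
raw = unicodedata.decomposition(ch)
print(raw)                                   # 006E 0303

code_points = [int(h, 16) for h in raw.split()]
print(code_points)                           # [110, 771]

# Python reports the current Unicode names of the two code points.
print([unicodedata.name(chr(cp)) for cp in code_points])
# ['LATIN SMALL LETTER N', 'COMBINING TILDE']

# A non-zero combining class marks the second code point as a combining mark.
print(unicodedata.combining(chr(771)))       # 230
```

The values 110 and 771 are the same ones returned by the get-char-code-property call quoted above; they are a property of the character, independent of how a buffer or file is encoded.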



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 19:46                             ` Óscar Fuentes
@ 2016-02-04 20:06                               ` Clément Pit--Claudel
  2016-02-04 20:40                                 ` Óscar Fuentes
  2016-02-04 20:07                               ` Eli Zaretskii
  1 sibling, 1 reply; 102+ messages in thread
From: Clément Pit--Claudel @ 2016-02-04 20:06 UTC (permalink / raw)
  To: emacs-devel


On 02/04/2016 02:46 PM, Óscar Fuentes wrote:
> Your use case is not typical and is based on usage circumstances
> (writing French with a US keyboard), personal opinions about what is
> admissible, or factors depending on your language (maybe French has no
> case similar to Spanish n/ñ)

My name is a good example in French. Clément and Clement are not pronounced the same at all. I gave other examples in other messages.
My writing French with an American keyboard has nothing to do with this feature; we're talking about searching, not input methods.

> so I think that it is not convincing
> enough to change my POV about the default status of the feature.

I was not trying to change your POV; mostly to understand it. I think you've described a use case that is not covered by the current implementation (you want character folding to be smart, and to recognize whether the user knows that ñ and n are more different than á and a before deciding whether to fold ñ into n). But why should your use case not being covered by the current implementation prevent that implementation from becoming the default?



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 19:46                             ` Óscar Fuentes
  2016-02-04 20:06                               ` Clément Pit--Claudel
@ 2016-02-04 20:07                               ` Eli Zaretskii
  2016-02-04 20:52                                 ` Óscar Fuentes
  1 sibling, 1 reply; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-04 20:07 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

> From: Óscar Fuentes <ofv@wanadoo.es>
> Date: Thu, 04 Feb 2016 20:46:20 +0100
> 
> It seems that the feature is not geared towards natural language, but
> for the cases where the user cares about how the character is composed.

You misunderstood.  Decomposition is just a tool that is used to
search for equivalent character sequences.

> As mentioned on my answer to Eli, this feature should default to off.

AFAIU, that opinion is based on misunderstanding of what the feature
is supposed to do and support.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 16:36                       ` Clément Pit--Claudel
  2016-02-04 16:47                         ` Óscar Fuentes
@ 2016-02-04 20:23                         ` John Wiegley
  1 sibling, 0 replies; 102+ messages in thread
From: John Wiegley @ 2016-02-04 20:23 UTC (permalink / raw)
  To: Clément Pit--Claudel; +Cc: emacs-devel

>>>>> Clément Pit--Claudel <clement.pit@gmail.com> writes:

> For me the strength of this feature is that it lets me find virtually
> anything using a dumb keyboard (one without easy access to accents); I
> don't care too much about false positives (that is, I don't mind if ‘n’
> finds ‘ñ’).

Going beyond natural languages, there have been a few times when I've wanted
to search for equivalence expressions in an Agda file, for example, but really
I want it to match against anything similar, so typing "x = y", I'd like it to
find occurrences using ≈ ≅ ≃ ≡ =, etc..

This sort of lax searching is like taking a "quotient" of your buffer based on
the equivalence classes you're interested in, and then searching against that
version of the buffer. And there are many quotients to be taken, for many reasons.

A locale-based quotient for natural language text seems like a reasonable
default, unless pretesting/polling shows us otherwise. However, there will
always be times when you don't want it, or you want a different quotient
altogether, or even various combinations of them.
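
A minimal sketch of the "quotient" idea, in Python (an assumption of this illustration; nothing here is an Emacs API). The EQUALITY_LIKE class and the quotient_find helper are hypothetical: every symbol in the class is mapped to one representative, '=', and the search runs against that mapped version of the text.

```python
# Hypothetical equivalence class: symbols to be treated like '='.
EQUALITY_LIKE = "≈≅≃≡"

# Map every member of the class to its representative.
QUOTIENT = str.maketrans({c: "=" for c in EQUALITY_LIKE})

def quotient_find(text, query):
    """Return the index of the first match modulo the class, or -1.

    The mapping is one character to one character, so an index into
    the mapped text is also a valid index into the original text.
    """
    return text.translate(QUOTIENT).find(query.translate(QUOTIENT))

print(quotient_find("lemma : x ≡ y", "x = y"))  # 8
print(quotient_find("x ≈ y", "x = y"))          # 0
print(quotient_find("x < y", "x = y"))          # -1
```

A different quotient is just a different translation table, and tables can be merged to combine several of them.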

-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 20:06                               ` Clément Pit--Claudel
@ 2016-02-04 20:40                                 ` Óscar Fuentes
  2016-02-04 20:56                                   ` Clément Pit--Claudel
  0 siblings, 1 reply; 102+ messages in thread
From: Óscar Fuentes @ 2016-02-04 20:40 UTC (permalink / raw)
  To: emacs-devel

Clément Pit--Claudel <clement.pit@gmail.com> writes:

> My name is a good example in French. Clément and Clement are not
> pronounced the same at all. I gave other examples in other messages.

Sure, there are plenty of similar cases in Spanish. Every Spaniard knows
that "canto" and "cantó" are different words and, most likely, will
not be too upset, or will even be happy, to see isearch locate "cantó"
when searching for "canto". But the same doesn't apply to n/ñ. If a Spaniard
inputs "sana" on a search box and "saña" is found, he will regard the
software as either buggy, dumb or completely oblivious to Spanish
culture.

I'm unable to make isearch-query-replace work (it gives me
"isearch-query-replace: Wrong type argument: stringp, nil") but if the
replaced elements are the same as those found with Isearch, the n/ñ
thing can produce lots of hilarious (or embarrassing) anecdotes :-)

> I was not trying to change your POV; mostly to understand it. I think
> you've described a use case that is not covered by the current
> implementation (you want character folding to be smart, and to
> recognize whether the user knows that ñ and n are more different than
> á and a before folding deciding whether to fold ñ into n). But why
> should your use case not being covered by the current implementation
> prevent that implementation from becoming the default?

We are talking about isearch here, the most basic and accessible way of
text searching on Emacs. Introducing a change on how it works with the
consequence of creating an "it is not a bug, it is a feature" experience
for a fair chunk of the world's population seems like something that
should give us pause.

Personally, I'm fine with disabling the feature on my setup, but I'll
advise against setting defaults that appeal to users who see foreign
characters as glyphs instead of thinking of the users who actually see
meaning in those characters.




^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 20:07                               ` Eli Zaretskii
@ 2016-02-04 20:52                                 ` Óscar Fuentes
  2016-02-04 20:59                                   ` Clément Pit--Claudel
  2016-02-04 21:08                                   ` Eli Zaretskii
  0 siblings, 2 replies; 102+ messages in thread
From: Óscar Fuentes @ 2016-02-04 20:52 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Óscar Fuentes <ofv@wanadoo.es>
>> Date: Thu, 04 Feb 2016 20:46:20 +0100
>> 
>> It seems that the feature is not geared towards natural language, but
>> for the cases where the user cares about how the character is composed.
>
> You misunderstood.  Decomposition is just a tool that is used to
> search for equivalent character sequences.

Equivalent in the Unicode sense, right?

>> As mentioned on my answer to Eli, this feature should default to off.
>
> AFAIU, that opinion is based on misunderstanding of what the feature
> is supposed to do and support.

If my understanding is correct now (the feature is some Unicode thing
and not about how characters are used by people) I insist on defaulting
to off, unless we give up on making Emacs amenable to those who use a
text editor for natural languages.




^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 20:40                                 ` Óscar Fuentes
@ 2016-02-04 20:56                                   ` Clément Pit--Claudel
  2016-02-04 21:16                                     ` Óscar Fuentes
  0 siblings, 1 reply; 102+ messages in thread
From: Clément Pit--Claudel @ 2016-02-04 20:56 UTC (permalink / raw)
  To: emacs-devel


On 02/04/2016 03:40 PM, Óscar Fuentes wrote:
> If a Spaniard inputs "sana" on a search box and "saña" is found, he
> will regard the software as either buggy, dumb or completely
> oblivious to Spanish culture.

Is that true? Here are Google.es results for "sana"; Google seems to be happy to return saña too: 

> La Agencia Árabe Siria de Noticias
> sana.sy/es/
>
> saña - Definición - WordReference.com
> www.wordreference.com/definicion/saña
> 
> Saná - Wikipedia, la enciclopedia libre
> https://es.wikipedia.org/wiki/Saná

I'm seeing this both from France and from the US, on Google.es; is it different from Spain?

Clément.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 20:52                                 ` Óscar Fuentes
@ 2016-02-04 20:59                                   ` Clément Pit--Claudel
  2016-02-04 21:08                                   ` Eli Zaretskii
  1 sibling, 0 replies; 102+ messages in thread
From: Clément Pit--Claudel @ 2016-02-04 20:59 UTC (permalink / raw)
  To: emacs-devel


On 02/04/2016 03:52 PM, Óscar Fuentes wrote:
> I insist on defaulting to off, unless we give up on making Emacs
> amenable to those who use a text editor for natural languages.

I think this is a false dichotomy. I use Emacs for natural languages too, and I'm OK with that behavior being the default. It could also be made the default in prog-mode but not in text-mode, etc.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 20:52                                 ` Óscar Fuentes
  2016-02-04 20:59                                   ` Clément Pit--Claudel
@ 2016-02-04 21:08                                   ` Eli Zaretskii
  1 sibling, 0 replies; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-04 21:08 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

> From: Óscar Fuentes <ofv@wanadoo.es>
> Date: Thu, 04 Feb 2016 21:52:08 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > You misunderstood.  Decomposition is just a tool that is used to
> > search for equivalent character sequences.
> 
> Equivalent in the Unicode sense, right?

Equivalent in the following sense: if the text includes ñ (these are 2
separate characters, they are just combined for display), then
searching for either n or ñ (a single character in both cases) should
find that 2-character sequence.

This follows the "canonical equivalence", described in more detail
here:

  http://unicode.org/reports/tr10/#Canonical_Equivalence
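
The matching described above can be sketched in Python (an assumption of this illustration; this is not the Emacs implementation, and fold_with_offsets and folded_search are hypothetical helper names). The text and the search string are both decomposed and stripped of combining marks, and an offset table maps each folded character back to its position in the original text.

```python
import unicodedata

def fold_with_offsets(text):
    """Return (folded, offsets); offsets[i] is the index in `text`
    of the character that produced folded[i]."""
    folded, offsets = [], []
    for i, ch in enumerate(text):
        for d in unicodedata.normalize("NFD", ch):
            if unicodedata.combining(d) == 0:  # keep base characters only
                folded.append(d)
                offsets.append(i)
    return "".join(folded), offsets

def folded_search(text, query):
    """Return (start, end) of the first folded match in `text`, or None."""
    ftext, offsets = fold_with_offsets(text)
    fquery, _ = fold_with_offsets(query)
    if not fquery:
        return None
    pos = ftext.find(fquery)
    if pos < 0:
        return None
    start = offsets[pos]
    end = offsets[pos + len(fquery) - 1] + 1
    # Extend the match over combining marks that follow the last base.
    while end < len(text) and unicodedata.combining(text[end]) != 0:
        end += 1
    return start, end

text = "sen\u0303or"  # 'n' + COMBINING TILDE: two characters shown as one
print(folded_search(text, "n"))       # (2, 4)
print(folded_search(text, "\u00f1"))  # (2, 4)
```

Note that this sketch folds the search string as well, so searching for ñ would also find a plain n; that is laxer than the behavior described above, which only requires that n and ñ both find the 2-character sequence.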

> If my understanding is correct now (the feature is some Unicode thing
> and not about how characters are used by people) I insist on defaulting
> to off, unless we give up on making Emacs amenable to those who use a
> text editor for natural languages.

It _is_ about how characters are used, see above.

And you don't need to insist, you can just turn it off in your
sessions.  You have heard at least 2 people whose opinions are to the
contrary, for various valid reasons.



^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04 20:56                                   ` Clément Pit--Claudel
@ 2016-02-04 21:16                                     ` Óscar Fuentes
  0 siblings, 0 replies; 102+ messages in thread
From: Óscar Fuentes @ 2016-02-04 21:16 UTC (permalink / raw)
  To: emacs-devel

Clément Pit--Claudel <clement.pit@gmail.com> writes:

> On 02/04/2016 03:40 PM, Óscar Fuentes wrote:
>> If a Spaniard inputs "sana" on a search box and "saña" is found, he
>> will regard the software as either buggy, dumb or completely
>> oblivious to Spanish culture.
>
> Is that true? Here are Google.es results for "sana"; Google seems to be happy to return saña too: 
>
>> La Agencia Árabe Siria de Noticias
>> sana.sy/es/
>>
>> saña - Definición - WordReference.com
>> www.wordreference.com/definicion/saña
>> 
>> Saná - Wikipedia, la enciclopedia libre
>> https://es.wikipedia.org/wiki/Saná
>
> I'm seeing this both from France and from the US, on Google.es; is it different from Spain?

It is the same from Spain. Apparently Google is optimized for non-native
people who possibly don't see a real difference between `n' and `ñ', or
have no method for typing an `ñ' (by law, all keyboards sold in Spain
must have a dedicated key for `ñ').

Google is being dumb here (from a Spanish-speaking POV, maybe not so
from others' POV).




^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04  5:49               ` Ivan Andrus
@ 2016-02-04 21:30                 ` Richard Stallman
  0 siblings, 0 replies; 102+ messages in thread
From: Richard Stallman @ 2016-02-04 21:30 UTC (permalink / raw)
  To: Ivan Andrus; +Cc: ofv, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Though folding b and v would be very helpful for some of the Spanish I read.  :-)

This suggests a possible feature, phonetic search.  It would be too
hard to support English, I fear, but some other languages might be
easier.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.




^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: Character folding in the pretest
  2016-02-04  8:40               ` Elias Mårtenson
  2016-02-04 11:57                 ` Dirk-Jan C. Binnema
@ 2016-02-04 21:32                 ` Richard Stallman
  2016-02-08 14:12                   ` Marcin Borkowski
  1 sibling, 1 reply; 102+ messages in thread
From: Richard Stallman @ 2016-02-04 21:32 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: ofv, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > It would make sense to have the default based on the session's locale,

Maybe the locale should be the ultimate default, but I think we should
try to tie this to something else people specify in Emacs.

We have something called the language environment that we could
connect this to.

Perhaps we need another temporary and buffer-specific language setting.
It could control this, select the spelling dictionary, select a default
input method, and more.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.





* Re: Character folding in the pretest
  2016-02-04 15:18                   ` Drew Adams
  2016-02-04 15:59                     ` Óscar Fuentes
@ 2016-02-04 23:05                     ` Artur Malabarba
  2016-02-06  9:37                       ` Per Starbäck
  1 sibling, 1 reply; 102+ messages in thread
From: Artur Malabarba @ 2016-02-04 23:05 UTC (permalink / raw)
  To: Drew Adams; +Cc: Dirk-Jan C. Binnema, emacs-devel

>> > It would make sense to have the default based on the session's locale,

>> Character equivalence is based on the language(s) of whatever is in your
>> buffer,

> That is really where the design effort should be, at this point.
> We have a basic char-folding mechanism, but we do not yet provide
> an easy way for a user to customize the behavior, let alone to
> define/get the various behaviors that s?he might want in different
> contexts.

FTR, like I've said a couple of times already, I will invest more time
into making this customizable once I've seen how it's received.

Also (and this I haven't said yet) I do plan on providing a better
default depending on locale. When the time comes to actually implement
it I'll explain why I prefer locale (over some notion of buffer-local
language).




* Re: Character folding in the pretest
  2016-02-04 17:05                           ` Werner LEMBERG
@ 2016-02-05  5:09                             ` Elias Mårtenson
  2016-02-05  6:01                               ` Werner LEMBERG
  2016-02-06 12:58                               ` Rasmus
  0 siblings, 2 replies; 102+ messages in thread
From: Elias Mårtenson @ 2016-02-05  5:09 UTC (permalink / raw)
  To: Werner LEMBERG; +Cc: Óscar Fuentes, emacs-devel


On 5 Feb 2016 1:06 a.m., "Werner LEMBERG" <wl@gnu.org> wrote:
>
> This naturally leads to a possible user option: Having `optical'
> matches or not, where `optical' means `base character plus diacritic
> and/or slight modifications', e.g., o → ø → ö etc., etc.

I think this statement shows how easy it is to introduce cultural bias,
although the fact that your name sounds German suggests that personal
preference is involved.

How do you even define "optical similarities"? Should l and I compare the
same under this definition? They certainly look similar. What about p and
q? They look like mirror images of each other. What about z and s? They
even sound similar. To a Swedish speaker there are zero similarities
between a, ä and å. They are, in fact, just as different as a and z are to
an English speaker. I really cannot emphasise this enough, and reading this
thread tells me that it needs to be emphasised even more.

As someone who lives in an English speaking country and uses English
keyboards, while still working with documents in various languages, I see
first-hand the need to have ways of searching for characters that I can't
easily type on my keyboard, but this issue is orthogonal to that of
character equivalence. The conflating of these two issues is, in my
opinion, the root cause of many of the disagreements in this thread.

My personal preference is that the expected behaviour of searches is more
related to the locale of the user, rather than that of the document being
searched. In other words, as a non-Spanish speaker, I'd expect to be able
to find ñ when searching for n, even if the document I'm searching in is in
Spanish. There are definitely an infinite number of counter-examples to
this (enough to keep this thread going for another 100 messages, I'm sure),
but at least there is reason to consider making the default based on the
locale of the user.


* Re: Character folding in the pretest
  2016-02-05  5:09                             ` Elias Mårtenson
@ 2016-02-05  6:01                               ` Werner LEMBERG
  2016-02-05  6:36                                 ` Elias Mårtenson
  2016-02-08 14:05                                 ` Marcin Borkowski
  2016-02-06 12:58                               ` Rasmus
  1 sibling, 2 replies; 102+ messages in thread
From: Werner LEMBERG @ 2016-02-05  6:01 UTC (permalink / raw)
  To: lokedhs; +Cc: ofv, emacs-devel


>> This naturally leads to a possible user option: Having `optical'
>> matches or not, where `optical' means `base character plus
>> diacritic and/or slight modifications', e.g., o → ø → ö etc., etc.
> 
> How do you even define "optical similarities"?

Basically the same as Eli has described: Base character plus
diacritics, probably plus some basic shapes with `diacritics' that
Unicode doesn't represent as composable: o → ø, l → ł, d → đ, etc.
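
The distinction drawn here can be checked against the Unicode Character
Database.  The following is only a minimal Python sketch using the
standard `unicodedata' module; it is not Emacs code, and the helper name
`has_canonical_decomposition' is made up for illustration.  It reports
whether a character has a canonical decomposition into base character
plus combining mark:

```python
import unicodedata


def has_canonical_decomposition(ch):
    """Return True if the UCD lists a canonical decomposition for ch.

    Compatibility decompositions are tagged like '<compat> ...' and are
    deliberately not counted here.
    """
    decomp = unicodedata.decomposition(ch)
    return bool(decomp) and not decomp.startswith("<")


# ö, å and ñ decompose into base letter + combining mark;
# ø, ł and đ are atomic as far as the UCD is concerned.
for ch in "öåñøłđ":
    print(ch, unicodedata.name(ch), has_canonical_decomposition(ch))
```

So a folding that should also cover ø, ł and đ cannot rely on
decomposition alone and would need an extra, hand-maintained table.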

> Should l and I compare the same under this definition?  They
> certainly look similar.

No, since the similarity is a font issue only.  For this reason I
*never* use Arial-like fonts.

> What about p and q?  They look like mirror images of each other.
> What about z and s?  They even sound similar.

Nonsense.  I've clearly mentioned `base character plus diacritic'.
Why do you intentionally skip that?  Doing so reminds me of
Schopenhauer's first stratagem in `The Art of Being Right'...

> To a Swedish speaker there are zero similarities between a, ä and å.

I'm a native German speaker, and there is *zero* similarity in the
sound between `a' and `ä', say.  But it is quite common in English
texts, say, to omit the diaeresis dots, thus having a searching mode
that finds both `Hänsel und Gretel' and `Hansel and Gretel' at the
same time would be very valuable.

> My personal preference is that the expected behaviour of searches is
> more related to the locale of the user, rather than that of the
> document being searched.  In other words, as a non-Spanish speaker,
> I'd expect to be able to find ñ when searching for n, even if the
> document I'm searching in is in Spanish.  There are definitely an
> infinite number of counter-examples to this (enough to keep this
> thread going for another 100 messages, I'm sure), but at least there
> is reason to consider making the default based on the locale of the
> user.

What you describe naturally leads to another user option: Don't handle
characters as `equal' (with a proper definition of `equal') that
aren't `equal' in the user's locale.
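
Such an option could be sketched as follows.  This is Python for
illustration only, not a proposal for the Emacs implementation; the
table `DISTINCT_LETTERS' is a hypothetical, hand-written stand-in for
real locale data, and the locale codes are assumptions:

```python
import unicodedata

# Hypothetical table: letters that a locale treats as letters in their
# own right.  Hand-written for illustration, not taken from CLDR or Emacs.
DISTINCT_LETTERS = {
    "sv": set("åäöÅÄÖ"),
    "es": set("ñÑ"),
}


def fold(text, locale=None):
    """Strip diacritics, except from letters the locale keeps distinct."""
    keep = DISTINCT_LETTERS.get(locale, set())
    out = []
    # Compose first so that each accented letter is a single code point.
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed
                               if not unicodedata.combining(c)))
    return "".join(out)


print(fold("saña"))           # no locale: folds to the base letters
print(fold("saña", "es"))     # Spanish: ñ is kept
print(fold("canción", "es"))  # Spanish: the accent on ó is still folded
```

Two strings would then match when their folded forms are equal, so a
Spanish user still finds `canción' when typing `cancion', but not `saña'
when typing `sana'.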


    Werner


* Re: Character folding in the pretest
  2016-02-05  6:01                               ` Werner LEMBERG
@ 2016-02-05  6:36                                 ` Elias Mårtenson
  2016-02-05  7:15                                   ` Werner LEMBERG
  2016-02-05  7:52                                   ` Eli Zaretskii
  2016-02-08 14:05                                 ` Marcin Borkowski
  1 sibling, 2 replies; 102+ messages in thread
From: Elias Mårtenson @ 2016-02-05  6:36 UTC (permalink / raw)
  To: Werner LEMBERG; +Cc: Óscar Fuentes, emacs-devel


On 5 February 2016 at 14:01, Werner LEMBERG <wl@gnu.org> wrote:

>
> >> This naturally leads to a possible user option: Having `optical'
> >> matches or not, where `optical' means `base character plus
> >> diacritic and/or slight modifications', e.g., o → ø → ö etc., etc.
> >
> > How do you even define "optical similarities"?
>
> Basically the same as Eli has described: Base character plus
> diacritics, probably plus some basic shapes with `diacritics' that
> Unicode doesn't represent as composable: o → ø, l → ł, d → đ, etc.
>

Composability is somewhat arbitrary. The character composition has very
little to do with "visual similarities". Just have a look at character
compositions in Devanagari for example.


> > Should l and I compare the same under this definition?  They
> > certainly look similar.
>
> No, since the similarity is a font issue only.  For this reason I
> *never* use Arial-like fonts.
>

And that argument works equally well for a and å. They really have
_nothing_ in common. The fact that there exists a Unicode decomposition for
them is completely irrelevant to a Swedish speaker.

Also note that to a Swedish speaker (well, at least up until recently), W
and V were variations of the same character. Yet I'm not advocating that
Emacs should consider them similar unless the locale says they should be.

In fact, the links to the Unicode TR on collations that Eli posted mentions
that as a specific example.


> > What about p and q?  They look like mirror images of each other.
> > What about z and s?  They even sound similar.
>
> Nonsense.  I've clearly mentioned `base character plus diacritic'.
> Why do you intentionally skip that?  Doing so reminds me of
> Schopenhauer's first stratagem in `The Art of Being Right'...
>

I did not intentionally skip that. I would appreciate it if you didn't
assume that I was out to simply prove you wrong, or that I am here to troll.

I was using that as an example in trying to highlight that to some people
(like myself) ä just simply is not a character with a diacritic. It is in
German, but not in Swedish.

I think this is hard to explain because in many European languages (such as
English, German and French) you have characters which are variations or
alternatives. For example, in French you have the letter Œ, which is a
variation of "OE". Likewise in German, ß is a variation of SS and Ü is a
variation of UE. As far as I know, I could write "Müller" as "Mueller".

However, this is not true for Swedish. I'll say it again (and I apologise
for repeating myself, this kind of repetition makes me sound like the troll
that you accused me of being) but in Swedish the difference between Å and A
is just as great as the difference in English between the letters E and O.
Writing my last name as "Martenson" looks just as bizarre as me writing
your last name as "Merner". And yes, I picked M because it kinda looks like
an upside-down W and I'm doing that not because I'm really suggesting that
that equivalence should be implemented, but because I want to illustrate
just how silly it looks.



> > To a Swedish speaker there are zero similarities between a, ä and å.
>
> I'm a native German speaker, and there is *zero* similarity in the
> sound between `a' and `ä', say.


I know. I speak a little German. In fact, Ä is pronounced exactly the same in
German and Swedish. That said, as far as I can recall from my German
lessons 25 years ago, German grammar does see Ä as a variation of A. At
least they are sorted together in the dictionary.

The Swedish distinction is much greater. This discussion would have been much
easier if the letters looked completely different. :-)


> But it is quite common in English
> texts, say, to omit the diaeresis dots, thus having a searching mode
> that finds both `Hänsel und Gretel' and `Hansel and Gretel' at the
> same time would be very valuable.
>

I never said it's not valuable. I never even suggested that this kind of
comparison should not be possible.

In fact, I'm not even suggesting that this kind of comparison should not
be the default, even. Especially given the fact that locale-dependent
comparators are not very well supported in Emacs at the moment.

What I did want to do was try to explain that even though there is a
visual similarity between A, Ä and Å, to a Swedish speaker those
similarities are no greater than those of q and k. And definitely much more
different than W and V (which were, up until recently, sorted under V in
dictionaries and seen as simply a visual variation).

>
> What you describe naturally leads to another user option: Don't handle
> characters as `equal' (with a proper definition of `equal') that
> aren't `equal' in the user's locale.


This is exactly my point. And you have managed to compress hundreds of my
words into a single, district sentence. Thank you.


* Re: Character folding in the pretest
  2016-02-05  6:36                                 ` Elias Mårtenson
@ 2016-02-05  7:15                                   ` Werner LEMBERG
  2016-02-05  7:22                                     ` Elias Mårtenson
  2016-02-05  7:52                                   ` Eli Zaretskii
  1 sibling, 1 reply; 102+ messages in thread
From: Werner LEMBERG @ 2016-02-05  7:15 UTC (permalink / raw)
  To: lokedhs; +Cc: ofv, emacs-devel


>> Basically the same as Eli has described: Base character plus
>> diacritics, probably plus some basic shapes with `diacritics' that
>> Unicode doesn't represent as composable: o → ø, l → ł, d → đ, etc.
> 
> Composability is somewhat arbitrary.  The character composition has
> very little to do with "visual similarities".  Just have a look at
> character compositions in Devanagari for example.

Character compositions in Devanagari form ligatures.  This is a
completely different concept.  It is possible that a given character
sequence yields different renderings, depending on the availability of
a ligature in a font.  The same issue is present in Arabic, BTW.  What
we are discussing here is inherently bound to alphabetic scripts, in
particular Latin, Greek, and Cyrillic.  Abugida and Abjad scripts need
a separate solution, as do CJKV scripts.

> Likewise in German, ß is a variation of SS and Ü is a variation of
> UE.  As far as I know, I could write "Müller" as "Mueller".

In German, `Mueller' is an emergency representation if `ü' is not
available; it is highly discouraged otherwise.  But yes, it would be
beneficial if there were an option to make a search for `Mueller'
match `Müller' also (and vice versa).
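
For illustration only, such an option amounts to expanding the umlauts
into their digraphs on both sides before comparing.  The Python sketch
below is an assumption about how one might do that, not existing Emacs
behaviour; the table `GERMAN_EXPANSIONS' and both function names are
made up:

```python
# Hypothetical table of the German emergency spellings mentioned above.
GERMAN_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def expand_german(text):
    """Lower-case text and replace umlauts and ß by their digraphs."""
    text = text.lower()
    for letter, digraph in GERMAN_EXPANSIONS.items():
        text = text.replace(letter, digraph)
    return text


def same_after_expansion(a, b):
    """True if a and b are equal once both have been expanded."""
    return expand_german(a) == expand_german(b)


print(same_after_expansion("Müller", "Mueller"))
print(same_after_expansion("Müller", "Muller"))
```

Note that this is a different folding from stripping the dots: here
`Müller' matches `Mueller' but not `Muller'.  It also produces false
positives of its own, since any genuine `ue' is treated like `ü'.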

> However, this is not true for Swedish. I'll say it again (and I
> apologise for repeating myself, this kind of repetition makes me
> sound like the troll that you accused me of being) but in Swedish
> the difference between Å and A is just as great as the difference
> in English between the letters E and O.  [...]

Funnily, in your neighbouring country Denmark `A' and `Å' are much
nearer, cf. `Århus' vs. `Aarhus'.

>> What you describe naturally leads to another user option: Don't
>> handle characters as `equal' (with a proper definition of `equal')
>> that aren't `equal' in the user's locale.
> 
> This is exactly my point.  [...]

:)


    Werner


* Re: Character folding in the pretest
  2016-02-05  7:15                                   ` Werner LEMBERG
@ 2016-02-05  7:22                                     ` Elias Mårtenson
  2016-02-06 15:43                                       ` Rasmus
  0 siblings, 1 reply; 102+ messages in thread
From: Elias Mårtenson @ 2016-02-05  7:22 UTC (permalink / raw)
  To: Werner LEMBERG; +Cc: Óscar Fuentes, emacs-devel


On 5 February 2016 at 15:15, Werner LEMBERG <wl@gnu.org> wrote:

>
> > However, this is not true for Swedish. I'll say it again (and I
> > apologise for repeating myself, this kind of repetition makes me
> > sound like the troll that you accused me of being) but in Swedish
> > the difference between Å and A is just as great as the difference
> > in English between the letters E and O.  [...]
>
> Funnily, in your neighbouring country Denmark `A' and `Å' are much
> nearer, cf. `Århus' vs. `Aarhus'.
>

Yes, that is funny. And I wish my Danish was better so that I could explain
that. But yes, your observation is correct.

If I remember correctly, I wasn't even aware that Aarhus and Århus were the
same place until it was pointed out.


* Re: Character folding in the pretest
  2016-02-05  6:36                                 ` Elias Mårtenson
  2016-02-05  7:15                                   ` Werner LEMBERG
@ 2016-02-05  7:52                                   ` Eli Zaretskii
  2016-02-05 15:09                                     ` Filipp Gunbin
  1 sibling, 1 reply; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-05  7:52 UTC (permalink / raw)
  To: Elias Mårtenson; +Cc: ofv, emacs-devel

> Date: Fri, 5 Feb 2016 14:36:13 +0800
> From: Elias Mårtenson <lokedhs@gmail.com>
> Cc: Óscar Fuentes <ofv@wanadoo.es>,
> 	emacs-devel <emacs-devel@gnu.org>
> 
> What I did want to do was try to explain that even though there is a visual similarity between A, Ä and Å, to
> a Swedish speaker those similarities are no greater than those of q and k. And definitely much more different
> than W and V (which were, up until recently, sorted under V in dictionaries and seen as simply a visual
> variation).
> 
> 
>  What you describe naturally leads to another user option: Don't handle
>  characters as `equal' (with a proper definition of `equal') that
>  aren't `equal' in the user's locale.
> 
> This is exactly my point. And you have managed to compress hundreds of my words into a single, district
> sentence. Thank you. 

We are not going by visual similarity, or any other arbitrary
criteria.  We are using established rules specified by the UCD, the
Unicode Character Database, and the explanations that accompany it in
the standard itself.  The main rule is that equivalent character strings
should match (when character folding is enabled).

That character equivalence is language-dependent is a truism that
doesn't need to be argued.  The plan is to have language-dependent
variations as soon as Emacs acquires good infrastructure for doing
that in a useful manner.  The idea behind the current implementation
was that this feature will be useful even when it is
language-agnostic, which is the lowest level of compatibility cited in
the Unicode Standard (so the Unicode Consortium guys didn't think it
to be a stupid idea).
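
To make the two levels concrete, here is a minimal Python sketch using
the standard `unicodedata' module.  It is not the Emacs implementation;
the function names are made up, and how far beyond canonical equivalence
the Emacs feature goes is not spelled out in this message.  The first
function tests canonical equivalence as defined by the Unicode
normalization forms; the second is the looser, language-agnostic folding
that ignores combining marks and case:

```python
import unicodedata


def canonically_equal(a, b):
    """True if a and b are canonically equivalent (same NFD form)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def strip_marks(text):
    """Decompose text (NFKD) and drop all combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def folded_find(needle, haystack):
    """True if needle occurs in haystack after folding marks and case."""
    return strip_marks(needle).casefold() in strip_marks(haystack).casefold()


# Precomposed ñ versus n followed by U+0303 COMBINING TILDE:
print(canonically_equal("\u00f1", "n\u0303"))
# Plain n versus ñ: not equivalent, but matched by the looser folding.
print(canonically_equal("n", "\u00f1"))
print(folded_find("sana", "con saña"))
```

The first level is uncontroversial, since the two spellings of ñ are the
same text.  The disagreement in this thread is about the second level.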




* Re: Character folding in the pretest
  2016-02-04 17:26                   ` Teemu Likonen
@ 2016-02-05  8:08                     ` Adrian.B.Robert
  0 siblings, 0 replies; 102+ messages in thread
From: Adrian.B.Robert @ 2016-02-05  8:08 UTC (permalink / raw)
  To: emacs-devel

Teemu Likonen <tlikonen@iki.fi> writes:

> Dirk-Jan C. Binnema [2016-02-04 13:57:36+02] wrote:
>
>> Regardless, for the purpose of searching, my personal preference would
>> be to make folding rather inclusive; I don't really care about the
>> exact rules languages have come up for what letters are considered
>> "the same", I just care for what I, as a user, would find the easiest
>> to match.
>
>> ...
> I think that just a global setting and easy switch like M-s <something>
> in isearch prompt is enough. I fear that any locale or language based
> magic or intelligence is over-engineering and may cause annoying
> surprises. Unexpected intelligence can be harmful too.

+1

I sense a strong enmity between the perfect and the good here.
"Dumb" (Unicode-equivalence-based) character folding is a
godsend for searching through texts when using the "wrong"
keyboard layout, for whatever reason.  It also matches expectations
from using search engines, etc.  And exact matching can handle the
need for precision.  Using default=exact with an easy global option
for switching to Unicode folding will be a great step forward.





* Re: Character folding in the pretest
  2016-02-05  7:52                                   ` Eli Zaretskii
@ 2016-02-05 15:09                                     ` Filipp Gunbin
  2016-02-05 19:21                                       ` Eli Zaretskii
  0 siblings, 1 reply; 102+ messages in thread
From: Filipp Gunbin @ 2016-02-05 15:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

> The idea behind the current implementation was that this feature will
> be useful even when it is language-agnostic, which is the lowest level
> of compatibility cited in the Unicode Standard (so the Unicode
> Consortium guys didn't think it to be a stupid idea).

While we have strict rules for some languages, it's very helpful to
account for errors which natives and non-natives may make and fold as much
as possible - if a folded search gives too many false positives, that may
just be an indication that a more specific (not folded) search should be
used.

I now realize that I'd like to see folded even distinct letters like
Russian Е and Ё - I cannot tell in advance when the author did it
correctly.

However, having folding on by default will certainly tell me that Emacs
is not respecting the Russian alphabet, which some people here wrote about
too.

Filipp




* Re: Character folding in the pretest
  2016-02-05 15:09                                     ` Filipp Gunbin
@ 2016-02-05 19:21                                       ` Eli Zaretskii
  2016-02-05 21:12                                         ` Óscar Fuentes
  2016-02-06 19:49                                         ` Richard Stallman
  0 siblings, 2 replies; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-05 19:21 UTC (permalink / raw)
  To: Filipp Gunbin; +Cc: emacs-devel

> From: Filipp Gunbin <fgunbin@fastmail.fm>
> Cc: emacs-devel@gnu.org
> Date: Fri, 05 Feb 2016 18:09:23 +0300
> 
> I now realize that I'd like to see folded even distinct letters like
> Russian Е and Ё - I cannot tell in advance when the author did it
> correctly.
> 
> However, having folding on by default will certainly tell me that Emacs
> is not respecting the Russian alphabet, which some people here wrote about
> too.

Folding has nothing to do with respecting the alphabet.  A and a are
not the same letters, either, and have distinct positions within the
English alphabet, and yet it is customary to have case folded during
searching, and Emacs does that by default.




* Re: Character folding in the pretest
  2016-02-05 19:21                                       ` Eli Zaretskii
@ 2016-02-05 21:12                                         ` Óscar Fuentes
  2016-02-05 22:20                                           ` Eli Zaretskii
  2016-02-06 19:49                                           ` Richard Stallman
  2016-02-06 19:49                                         ` Richard Stallman
  1 sibling, 2 replies; 102+ messages in thread
From: Óscar Fuentes @ 2016-02-05 21:12 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Folding has nothing to do with respecting the alphabet.  A and a are
> not the same letters, either, and have distinct positions within the
> English alphabet,

This is big news to me. AFAIK `A' and `a' are the same letter, one in
uppercase form and the other in lowercase form. The English alphabet
consists of 26 letters. This is what I learned many years ago, but it
seems that it is all wrong.

In Spanish, `A' and `a' are the same letter. `á' and `a' are also the
same letter. `n' and `ñ' are not the same letter.

> and yet it is customary to have case folded during searching, and
> Emacs does that by default.

Maybe you are confusing C with English :-)

Seriously, if you want a feature for the people who think in terms of
encodings, that's fine, but please keep in mind that most people see
text as text, the same thing they can write with a pencil, not as a series of
bytes in Unicode, ASCII or whatever.





* Re: Character folding in the pretest
  2016-02-05 21:12                                         ` Óscar Fuentes
@ 2016-02-05 22:20                                           ` Eli Zaretskii
  2016-02-06 19:49                                           ` Richard Stallman
  1 sibling, 0 replies; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-05 22:20 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

> From: Óscar Fuentes <ofv@wanadoo.es>
> Date: Fri, 05 Feb 2016 22:12:34 +0100
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Folding has nothing to do with respecting the alphabet.  A and a are
> > not the same letters, either, and have distinct positions within the
> > English alphabet,
> 
> This is big news to me. AFAIK `A' and `a' are the same letter, one in
> uppercase form and the other in lowercase form. The English alphabet
> consists of 26 letters. This is what I learned many years ago, but it
> seems that it is all wrong.

You are missing the point.  The point is that "folding", by its very
definition, means mapping distinct things to the same value.  So no
one argues that the letters are different before they are folded.
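
Put in code, folding is nothing more than a key function, and two
strings match when their keys are equal.  The Python sketch below is
only an illustration of that definition; the helper name `group_by_fold'
is made up, and case folding is used as the example because it is the
folding nobody in this thread objects to:

```python
from collections import defaultdict


def group_by_fold(words, fold=str.casefold):
    """Group words into classes that share the same fold key."""
    groups = defaultdict(list)
    for word in words:
        groups[fold(word)].append(word)
    return dict(groups)


# 'A' and 'a' stay distinct strings, but land in the same class.
print(group_by_fold(["A", "a", "B"]))
```

Which characters end up in the same class depends entirely on the
function passed as `fold'; the mechanism itself is neutral.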

> Seriously, if you want a feature for the people who think in terms of
> encodings, that's fine, but please keep in mind that most people see
> text as text, the same thing they can write with a pencil, not as a series of
> bytes in Unicode, ASCII or whatever.

The notion of "text" moved a long way since we were in kindergarten.
The Unicode Standard is about plain text, not anything else.  We
slowly adapt to that, and character folding is one milestone on that
long journey.  It has nothing to do with encoding.




* Re: Character folding in the pretest
  2016-02-04 23:05                     ` Artur Malabarba
@ 2016-02-06  9:37                       ` Per Starbäck
  2016-02-06 10:41                         ` Eli Zaretskii
  0 siblings, 1 reply; 102+ messages in thread
From: Per Starbäck @ 2016-02-06  9:37 UTC (permalink / raw)
  To: Artur Malabarba; +Cc: Dirk-Jan C. Binnema, Drew Adams, emacs-devel

Oscar Fuentes wrote:

> If a Spaniard inputs "sana" on a search box and "saña" is found, he
> will regard the software as either buggy, dumb or completely
> oblivious to Spanish culture.

Similar to my example of how a Swede would see a search for "varpa"
finding "värpa" or "varpå" (all of the three being existing totally
different words).

When met with the "argument" that not many people speak Swedish anyway
I replied that it was only an example of what I knew best, and that
there probably were similar examples in several other languages. I'm
glad to hear there is one in Spanish, one of the largest languages of
the world. Now let's count the number of affected people again! :)

That character folding is dependent on locale is of course well-known
by those who work on this. Artur Malabarba wrote:

> FTR, like I've said a couple of times already, I will invest more time
> into making this customizable once I've seen how it's received.
> Also (and this I haven't said yet) I do plan on providing a better
> default depending on locale. When the time comes to actually implement
> it I'll explain why I prefer locale (over some notion of buffer-local
> language).

When Artur again confirmed that he is fine with having the new feature
turned off in Emacs 25 with the intention of having it turned on later,
after it has had enough testing, I thought this would finally be settled.

But evidently not yet... From the opposers it has been argued as if
this is something mandated by Unicode, so we can do nothing about it
but to follow. It doesn't matter if the result is seen as buggy or
dumb by users. "This feature is simply folding as specified by the
Unicode standard".

That is not so. Of course the Unicode Consortium is well aware of the
issues that I, Oscar and others are pointing out, and that I'm sure
Artur is well aware of.

Eli Zaretskii:
> Perhaps you aren't familiar with Unicode equivalence, in which case I
> suggest these sources:
>
>   http://unicode.org/reports/tr10/#Searching
>   http://www.unicode.org/notes/tn5/
>   http://www.unicode.org/reports/tr30/tr30-4.html

But of course these take up issues like we have mentioned here. The
first one mentions the aa/å equivalence in Danish for example. And to
quote the last one:

#  In the general case, different search term foldings are applied for
#  different languages. For example, accent distinctions are ignorable
#  for some languages, but not for others. In English the accent in
#  words like naïve is optional, while to a Swedish user 'o' and 'ö'
#  are distinct letters.

That is by the way the last draft of a withdrawn technical report.

  Draft UTR #30: Unicode Character Foldings has been withdrawn. It was
  never formally approved; the last public version was a draft
  UTR, which can be found at
  http://www.unicode.org/reports/tr30/tr30-4.html.

That shows not only that the issues I, Oscar and others are mentioning
are not something new that we just thought of that Unicode somehow
should have us ignore. It also shows that there *is* no technical
report on Unicode Character Foldings.

We have to break out of the circles this is going in. John Wiegley wrote:

> A locale-based quotient for natural language text seems like a reasonable
> default, unless pretesting/polling shows us otherwise. However, there will
> always be times when you don't want it, or you want a different quotient
> altogether, or even various combinations of them.

Yes, that would be a good default, but it's not a default that we
can have in the next Emacs, though there are good prospects that we can
have it in the one after that. Please John, put your foot down and don't
let this continue ad infinitum.

The options we have are instead:

(1) Let the default be as searching has worked before. Nothing gets
worse for anyone.

We'll have the start of an exciting new feature available, one that will be
just right for many users and that will be tried by a lot of others as well,
giving feedback for the continued development that Artur has written
that he is already planning.

(2) Make the fundamental feature of searching work fundamentally
differently out of the box, in a way that many users will see as
neat, and many other users will see as "buggy, dumb or completely
oblivious to" the user's culture.




* Re: Character folding in the pretest
  2016-02-06  9:37                       ` Per Starbäck
@ 2016-02-06 10:41                         ` Eli Zaretskii
  2016-02-06 12:52                           ` Rasmus
  2016-02-06 14:24                           ` Ken Brown
  0 siblings, 2 replies; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-06 10:41 UTC (permalink / raw)
  To: Per Starbäck; +Cc: djcb, drew.adams, bruce.connor.am, emacs-devel

> Date: Sat, 6 Feb 2016 10:37:06 +0100
> From: Per Starbäck <per.starback@gmail.com>
> Cc: "Dirk-Jan C. Binnema" <djcb@djcbsoftware.nl>,
> 	Drew Adams <drew.adams@oracle.com>, emacs-devel <emacs-devel@gnu.org>
> 
> From the opposers it has been argued as if this is something
> mandated by Unicode, so we can do nothing about it but to follow.

No one said anything like that.  The references to the Unicode
Standard and its various data and TRs are to make the point that the
feature as implemented is based on sound principles and not on some
arbitrary criteria.

No one said the feature is "mandated" in any way, shape or form.
Whether the feature should be turned on by default is a matter only
we the Emacs community will decide.

> It doesn't matter if the result is seen as buggy or dumb by
> users. "This feature is simply folding as specified by the Unicode
> standard".

The Unicode Standard specifies _how_ to fold during search.  It also
includes recommendations _when_ to fold.  It doesn't mandate anything,
and even if it did, we wouldn't need to heed that.  Your arguments in
this part are a red herring.

> That is not so. Of course the Unicode Consortium is well aware of the
> issues that I, Oscar and others are pointing out, and that I'm sure
> Artur is well aware of.

We are all aware of that, please give us credit that we know something
about the issues involved.  It is you who seem to misunderstand
important aspects of this, see below.

> Eli Zaretskii:
> > Perhaps you aren't familiar with Unicode equivalence, in which case I
> > suggest these sources:
> >
> >   http://unicode.org/reports/tr10/#Searching
> >   http://www.unicode.org/notes/tn5/
> >   http://www.unicode.org/reports/tr30/tr30-4.html
> 
> But of course these take up issues like we have mentioned here. The
> first one mentions the aa/å equivalence in Danish for example. And to
> quote the last one:
> 
> #  In the general case, different search term foldings are applied for
> #  different languages. For example, accent distinctions are ignorable
> #  for some languages, but not for others. In English the accent in
> #  words like naïve is optional, while to a Swedish user 'o' and 'ö'
> #  are distinct letters.

It seems that you have read only the parts that confirm your views in
your eyes, and skipped or dismissed the rest.  And now you are
spreading your misunderstanding among others.

The facts are different.  Unicode indeed recognizes that different
languages change the rules to some degree.  However, it defines
several distinct degrees of conformance, and what we have now is the
lowest possible level of conformance, the one that is not tailored to
any particular language.  See Section 3.8 of TR#10, referenced above,
and Table 13 there.  What we in fact implemented is the default
collation weights, which are independent of language tailoring.

This is similar to the data we use for case-folding: it doesn't
include any language-specific tailoring, and so in some cases, like
Turkish dotless i issue, produces results that are incorrect in the
context of some specific languages.  Still we use it, and it generally
works very well.
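
To make the dotless-i case concrete, here is a minimal sketch in Python,
used purely for illustration (it is not Emacs code).  Python's built-in
str.lower() applies the language-independent Unicode mappings, much like
the untailored data described above.  The TURKISH_LOWER table and the
turkish_lower helper are hypothetical names, showing only what a
language-specific tailoring would have to add.

```python
# Untailored lowercasing maps "I" to dotted "i" regardless of language.
# In Turkish, the lowercase of "I" is dotless "ı" (U+0131), and the
# lowercase of "İ" (U+0130) is plain "i".

# Hypothetical tailoring table, for illustration only.
TURKISH_LOWER = {"I": "\u0131", "\u0130": "i"}


def turkish_lower(text):
    """Lowercase text with the two Turkish exceptions, else the default mapping."""
    return "".join(TURKISH_LOWER.get(ch, ch.lower()) for ch in text)


word = "ISPARTA"
print(word.lower())         # untailored result, dotted i
print(turkish_lower(word))  # tailored result, dotless i
```

The untailored mapping is right for most languages and wrong for this
one, which is the trade-off being described for character folding too.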

In the long run, we should add language-specific tailoring to this and
other similar features.  Currently, we lack the infrastructure for
doing that in a useful way, so this further development must wait.
But it doesn't mean the feature isn't useful as it is now, and several
participants in this thread explicitly said they like what the feature
gives them.  Which doesn't surprise me, because it matches the advice
in the Unicode Standard, so I know we are on the right path.

> That is by the way the last draft of a withdrawn technical report.

(So why are you quoting from it and claim that it supports your POV?
If it's indeed a useless, withdrawn draft, then it has no relevance at
all, right?  Please decide whether you want to treat that report
seriously or not, and please be consistent with your decision.  Trying
to have the cake and also eat it doesn't add credibility to your
opinions.)

>   Draft UTR #30: Unicode Character Foldings has been withdrawn. It was
>   never formally approved; the last public version was a draft
>   UTR, which can be found at
>   http://www.unicode.org/reports/tr30/tr30-4.html.

Actually, that draft was mentioned because it includes interesting and
important stuff not mentioned in one place in any other publication I
know of.  I referred to it under an assumption that the reader will be
keenly interested in learning as much relevant background information
about the subject as possible, even if the report itself never made it
to the official status.

> We have to break out of the circles this is going in.

There are no circles.  We wanted to collect feedback, and we are
collecting it.  The pretest has been going on for merely one week, and the
feedback we have already is useful, and it keeps coming in.  Stopping
that and making the decision now makes no sense to me.  The release is
still quite far away, and we have nothing to lose by hearing from more
people.  Assuming we want to make an informed decision, there's no
rush.

> Please John, put your foot down and don't let this continue ad
> infinitum.

No one intends to continue "ad infinitum".  That's another red
herring.  We should continue collecting feedback for a couple more
pretest releases, that's all.  Then we can make the decision based on
that feedback.  I counted 10 people (excluding myself and Artur) who
expressed their clear opinions in this thread; that is way too few for
an intelligent decision, IMO.

> The options we have are instead:
> 
> (1) Let the default be as searching has worked before. Nothing gets
> worse for anyone.
> 
> We'll have the start of a new exciting feature available, that will be just
> right for many users, and that will be tried by a lot of others as well,
> giving feedback for the continued development that Artur has written
> that he already is planning.
> 
> (2) Make the fundamental feature searching work fundamentally
> different out of the box in a way that for many users will be seen as
> neat, and for many users will be seen as "buggy, dumb or completely
> oblivious to" the user's culture.

With all due respect, I don't think this is an objective description
of the alternatives.




* Re: Character folding in the pretest
  2016-02-06 10:41                         ` Eli Zaretskii
@ 2016-02-06 12:52                           ` Rasmus
  2016-02-06 14:31                             ` Eli Zaretskii
  2016-02-06 14:24                           ` Ken Brown
  1 sibling, 1 reply; 102+ messages in thread
From: Rasmus @ 2016-02-06 12:52 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> No one intends to continue "ad infinitum".  That's another red 
> herring.  We should continue collecting feedback for a couple 
> more pretest releases, that's all.  Then we can make the 
> decision based on that feedback.  I counted 10 people (excluding 
> myself and Artur) who expressed their clear opinions in this 
> thread; that is way too few for an intelligent decision, IMO. 

My language probably does not fit the agnostic approach. 
Nonetheless, this is an awesome feature and I think it should be 
on by default.

Thanks for working on this to all those who have done so!

Rasmus

-- 
C is for Cookie





* Re: Character folding in the pretest
  2016-02-05  5:09                             ` Elias Mårtenson
  2016-02-05  6:01                               ` Werner LEMBERG
@ 2016-02-06 12:58                               ` Rasmus
  1 sibling, 0 replies; 102+ messages in thread
From: Rasmus @ 2016-02-06 12:58 UTC (permalink / raw)
  To: emacs-devel

Elias Mårtenson <lokedhs@gmail.com> writes:

> My personal preference is that the expected behaviour of 
> searches is more related to the locale of the user, rather than 
> that of the document being searched. In other words, as a 
> non-Spanish speaker, I'd expect to be able to find ñ when 
> searching for n, even if the document I'm searching in is in 
> Spanish. There are definitely an infinite number of 
> counter-examples to this (enough to keep this thread going for 
> another 100 messages, I'm sure), but at least there is reason to 
> consider making the default based on the locale of the user. 

But what locale?  The keyboard makes the most sense, I guess, but 
plenty of people switch between layouts (native and English, say) 
and it might be confusing to have different search results based 
on that.

The "main" locale surely will not work IMO.  I use a Scando 
keyboard, my Gnome is set to Spanish, and I mostly compose 
documents in English, German or Danish....

Rasmus

-- 
Send from my Emacs





* Re: Character folding in the pretest
  2016-02-03 16:16           ` Eli Zaretskii
@ 2016-02-06 13:41             ` Teemu Likonen
  2016-02-06 14:33               ` Eli Zaretskii
  0 siblings, 1 reply; 102+ messages in thread
From: Teemu Likonen @ 2016-02-06 13:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: nicolas, monnier, emacs-devel


Eli Zaretskii [2016-02-03 18:16:25+02] wrote:

> From: Teemu Likonen <tlikonen@iki.fi>
>> Here's mine: I don't want "a" and "ä" to be the same in searches, by
>> default. In my language (Finnish) they are different letters and
>> phonemes, for example: "tai" (= or) and "täi" (= a louse); "sakki" (=
>> gang, crowd) and "säkki" (= a sack).
>
> Thank you.

Actually I take that back. I've been testing (and thinking) the
character folding feature more and it's very unlikely that users will
face problems with it in the Finnish language. It doesn't bother me if
the feature is on by default.

A global switch (a dynamic variable) would be a good thing and I think
it should override any locale or language based magic, if such magic is
even necessary.

-- 
/// Teemu Likonen   - .-..   <https://github.com/tlikonen> //
// PGP: 4E10 55DC 84E9 DFF6 13D7 8557 719D 69D3 2453 9450 ///


* Re: Character folding in the pretest
  2016-02-06 10:41                         ` Eli Zaretskii
  2016-02-06 12:52                           ` Rasmus
@ 2016-02-06 14:24                           ` Ken Brown
  2016-02-06 15:07                             ` Eli Zaretskii
  1 sibling, 1 reply; 102+ messages in thread
From: Ken Brown @ 2016-02-06 14:24 UTC (permalink / raw)
  To: Eli Zaretskii, Per Starbäck
  Cc: djcb, bruce.connor.am, drew.adams, emacs-devel

On 2/6/2016 5:41 AM, Eli Zaretskii wrote:
> No one intends to continue "ad infinitum".  That's another red
> herring.  We should continue collecting feedback for a couple more
> pretest releases, that's all.  Then we can make the decision based on
> that feedback.  I counted 10 people (excluding myself and Artur) who
> expressed their clear opinions in this thread; that is way too few for
> an intelligent decision, IMO.

I'll add one more.  I like character folding in its present form, and I 
will use it whether it's on by default or not.

As to whether it should be on by default, I agree with those who say 
it's too early to make that decision.

Ken




* Re: Character folding in the pretest
  2016-02-06 12:52                           ` Rasmus
@ 2016-02-06 14:31                             ` Eli Zaretskii
  0 siblings, 0 replies; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-06 14:31 UTC (permalink / raw)
  To: Rasmus; +Cc: emacs-devel

> From: Rasmus <rasmus@gmx.us>
> Date: Sat, 06 Feb 2016 13:52:23 +0100
> 
> My language probably does not fit the agnostic approach. 
> Nonetheless, this is an awesome feature and I think it should be 
> on by default.
> 
> Thanks for working on this to all those who have done so!

Thank you for your feedback.

(Most of the credit for the actual work goes to Artur, of course.)




* Re: Character folding in the pretest
  2016-02-06 13:41             ` Teemu Likonen
@ 2016-02-06 14:33               ` Eli Zaretskii
  2016-02-06 15:09                 ` Teemu Likonen
  0 siblings, 1 reply; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-06 14:33 UTC (permalink / raw)
  To: Teemu Likonen; +Cc: nicolas, monnier, emacs-devel

> From: Teemu Likonen <tlikonen@iki.fi>
> Cc: nicolas@petton.fr, monnier@iro.umontreal.ca, emacs-devel@gnu.org
> Date: Sat, 06 Feb 2016 15:41:44 +0200
> 
> Eli Zaretskii [2016-02-03 18:16:25+02] wrote:
> 
> > From: Teemu Likonen <tlikonen@iki.fi>
> >> Here's mine: I don't want "a" and "ä" to be the same in searches, by
> >> default. In my language (Finnish) they are different letters and
> >> phonemes, for example: "tai" (= or) and "täi" (= a louse); "sakki" (=
> >> gang, crowd) and "säkki" (= a sack).
> >
> > Thank you.
> 
> Actually I take that back. I've been testing (and thinking) the
> character folding feature more and it's very unlikely that users will
> face problems with it in the Finnish language. It doesn't bother me if
> the feature is on by default.

Thanks again for sharing your views.

> A global switch (a dynamic variable) would be a good thing and I think
> it should override any locale or language based magic, if such magic is
> even necessary.

Not sure I understand: a global switch to do what?  If to turn
character folding on and off, then such a possibility already exists.




* Re: Character folding in the pretest
  2016-02-06 14:24                           ` Ken Brown
@ 2016-02-06 15:07                             ` Eli Zaretskii
  0 siblings, 0 replies; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-06 15:07 UTC (permalink / raw)
  To: Ken Brown; +Cc: per.starback, djcb, bruce.connor.am, drew.adams, emacs-devel

> Cc: djcb@djcbsoftware.nl, drew.adams@oracle.com, bruce.connor.am@gmail.com,
>         emacs-devel@gnu.org
> From: Ken Brown <kbrown@cornell.edu>
> Date: Sat, 6 Feb 2016 09:24:24 -0500
> 
> I'll add one more.  I like character folding in its present form, and I 
> will use it whether it's on by default or not.
> 
> As to whether it should be on by default, I agree with those who say 
> it's too early to make that decision.

Thanks for the feedback.




* Re: Character folding in the pretest
  2016-02-06 14:33               ` Eli Zaretskii
@ 2016-02-06 15:09                 ` Teemu Likonen
  2016-02-06 18:38                   ` Artur Malabarba
  0 siblings, 1 reply; 102+ messages in thread
From: Teemu Likonen @ 2016-02-06 15:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: nicolas, monnier, emacs-devel


Eli Zaretskii [2016-02-06 16:33:36+02] wrote:

> From: Teemu Likonen <tlikonen@iki.fi>
>> A global switch (a dynamic variable) would be a good thing and I
>> think it should override any locale or language based magic, if such
>> magic is even necessary.
>
> Not sure I understand: a global switch to do what?  If to turn
> character folding on and off, then such a possibility already exists.

By global switch I meant a variable like case-fold-search but for
character folding. But after looking a bit more closely I found
search-default-regexp-mode. The "regexp" part in the variable name is
confusing but I guess I must read the whole "(emacs) Search" info node
and its subnodes. It's probably too long since the last time.

-- 
/// Teemu Likonen   - .-..   <https://github.com/tlikonen> //
// PGP: 4E10 55DC 84E9 DFF6 13D7 8557 719D 69D3 2453 9450 ///


* Re: Character folding in the pretest
  2016-02-05  7:22                                     ` Elias Mårtenson
@ 2016-02-06 15:43                                       ` Rasmus
  2016-02-06 15:51                                         ` Eli Zaretskii
  0 siblings, 1 reply; 102+ messages in thread
From: Rasmus @ 2016-02-06 15:43 UTC (permalink / raw)
  To: emacs-devel

Elias Mårtenson <lokedhs@gmail.com> writes:

> On 5 February 2016 at 15:15, Werner LEMBERG <wl@gnu.org> wrote: 
> 
>> 
>> > However, this is not true for Swedish. I'll say it again (and 
>> > I apologise for repeating myself, this kind of repetition 
>> > makes me sound like the troll that you accused me of being) 
>> > but in Swedish the difference between Å and A are just as 
>> > great as the difference in English between the letters E and 
>> > O.  [...] 
>> 
>> Funnily, in your neighbouring country Denmark `A' and `Å' are 
>> much nearer, cf. `Århus' vs. `Aarhus'. 
>> 
> 
> Yes, that is funny. And I wish my Danish was better so that I 
> could explain that. But yes, your observation is correct. 

Å and aa are the same, though Å apparently sorts before aa in the 
dictionary.  Å is the recommended symbol for the aa sound since 
1948, but in some cases like places one is free to choose 
(e.g. Århus and Aarhus and Aalborg and Ålborg; note "Ålborg" is 
uncommon and is never used by citizens of the city).

Since 2011 Aarhus is used in official documents, but both 
representations are generally correct.

For the purpose of the discussion you could argue that "arhus" 
should match Århus since an equivalent representation is aarhus...

Rasmus 

-- 
The second rule of Fight Club is: You do not talk about Fight Club





* Re: Character folding in the pretest
  2016-02-06 15:43                                       ` Rasmus
@ 2016-02-06 15:51                                         ` Eli Zaretskii
  0 siblings, 0 replies; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-06 15:51 UTC (permalink / raw)
  To: Rasmus; +Cc: emacs-devel

> From: Rasmus <rasmus@gmx.us>
> Date: Sat, 06 Feb 2016 16:43:14 +0100
> 
> For the purpose of the discussion you could argue that "arhus" 
> should match Århus

And it does, indeed, when both character-folding and case-folding are
turned on.
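
For readers who want to see why this match succeeds, here is a rough
sketch in Python, for illustration only.  It is not how Emacs does it:
per this thread, Emacs builds a regexp from the search string, whereas
this sketch simply folds both the query and the text, which may differ
from Emacs in some cases.  The names fold and folded_match are made up
for the example.

```python
import unicodedata


def fold(text):
    """Approximate character folding plus case folding.

    Decompose to NFD, drop combining marks, then case-fold.
    """
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def folded_match(query, text):
    """Return True if the folded query occurs in the folded text."""
    return fold(query) in fold(text)


print(fold("Århus"))
print(folded_match("arhus", "Århus"))
print(folded_match("tai", "täi"))
```

Å decomposes to A followed by a combining ring, so dropping the ring and
case-folding leaves "arhus".  With only one of the two foldings applied,
the lowercase, unaccented query would not match.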




* Re: Character folding in the pretest
  2016-02-06 15:09                 ` Teemu Likonen
@ 2016-02-06 18:38                   ` Artur Malabarba
  2016-02-06 19:08                     ` Eli Zaretskii
  0 siblings, 1 reply; 102+ messages in thread
From: Artur Malabarba @ 2016-02-06 18:38 UTC (permalink / raw)
  To: Teemu Likonen; +Cc: Nicolas Petton, Eli Zaretskii, Stefan Monnier, emacs-devel


On 6 Feb 2016 1:09 pm, "Teemu Likonen" <tlikonen@iki.fi> wrote:
> By global switch I meant a variable like case-fold-search but for
> character folding. But after looking a bit more closely I found
> search-default-regexp-mode. The "regexp" part in the variable name is
> confusing

I see how that's confusing. We should probably call it search-default-mode.
There's still time to make the change.

The regexp part is related to the implementation, so it's really of little
interest to the user and shouldn't be in the name.


* Re: Character folding in the pretest
  2016-02-06 18:38                   ` Artur Malabarba
@ 2016-02-06 19:08                     ` Eli Zaretskii
  2016-02-07  1:06                       ` Artur Malabarba
  0 siblings, 1 reply; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-06 19:08 UTC (permalink / raw)
  To: Artur Malabarba; +Cc: nicolas, tlikonen, monnier, emacs-devel

> Date: Sat, 6 Feb 2016 18:38:44 +0000
> From: Artur Malabarba <arturmalabarba@gmail.com>
> Cc: emacs-devel <emacs-devel@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca>, 
> 	Nicolas Petton <nicolas@petton.fr>, Eli Zaretskii <eliz@gnu.org>
> 
> On 6 Feb 2016 1:09 pm, "Teemu Likonen" <tlikonen@iki.fi> wrote:
> > By global switch I meant a variable like case-fold-search but for
> > character folding. But after looking a bit more closely I found
> > search-default-regexp-mode. The "regexp" part in the variable name is
> > confusing
> 
> I see how that's confusing. We should probably call it search-default-mode. 

Yes, please.

Thanks.




* Re: Character folding in the pretest
  2016-02-05 19:21                                       ` Eli Zaretskii
  2016-02-05 21:12                                         ` Óscar Fuentes
@ 2016-02-06 19:49                                         ` Richard Stallman
  1 sibling, 0 replies; 102+ messages in thread
From: Richard Stallman @ 2016-02-06 19:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: fgunbin, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Folding has nothing to do with respecting the alphabet.

I agree.  This is not a matter of principle, just convenience.

I am sure this feature will be convenient for many users if it is
configured right -- but what configuration is right appears not to be
obvious.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.





* Re: Character folding in the pretest
  2016-02-05 21:12                                         ` Óscar Fuentes
  2016-02-05 22:20                                           ` Eli Zaretskii
@ 2016-02-06 19:49                                           ` Richard Stallman
  1 sibling, 0 replies; 102+ messages in thread
From: Richard Stallman @ 2016-02-06 19:49 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > In Spanish, `A' and `a' are the same letter. `á' and `a' are also the
  > same letter. `n' and `ñ' are not the same letter.

Ok, but let's not make dire criticial remarks about it ;-).

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.





* Re: Character folding in the pretest
  2016-02-06 19:08                     ` Eli Zaretskii
@ 2016-02-07  1:06                       ` Artur Malabarba
  0 siblings, 0 replies; 102+ messages in thread
From: Artur Malabarba @ 2016-02-07  1:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Nicolas Petton, tlikonen, Stefan Monnier, emacs-devel

On 6 February 2016 at 19:08, Eli Zaretskii <eliz@gnu.org> wrote:
>> Date: Sat, 6 Feb 2016 18:38:44 +0000
>> From: Artur Malabarba <arturmalabarba@gmail.com>
>> Cc: emacs-devel <emacs-devel@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca>,
>>       Nicolas Petton <nicolas@petton.fr>, Eli Zaretskii <eliz@gnu.org>
>>
>> On 6 Feb 2016 1:09 pm, "Teemu Likonen" <tlikonen@iki.fi> wrote:
>> > By global switch I meant a variable like case-fold-search but for
>> > character folding. But after looking a bit more closely I found
>> > search-default-regexp-mode. The "regexp" part in the variable name is
>> > confusing
>>
>> I see how that's confusing. We should probably call it search-default-mode.
>
> Yes, please.
>
> Thanks.
Done




* Re: Character folding in the pretest
  2016-02-05  6:01                               ` Werner LEMBERG
  2016-02-05  6:36                                 ` Elias Mårtenson
@ 2016-02-08 14:05                                 ` Marcin Borkowski
  2016-02-08 17:48                                   ` Eli Zaretskii
  1 sibling, 1 reply; 102+ messages in thread
From: Marcin Borkowski @ 2016-02-08 14:05 UTC (permalink / raw)
  To: Werner LEMBERG; +Cc: ofv, lokedhs, emacs-devel


On 2016-02-05, at 07:01, Werner LEMBERG <wl@gnu.org> wrote:

>> How do you even define "optical similarities"?
>
> Basically the same as Eli has described: Base character plus
> diacritics, probably plus some basic shapes with `diacritics' that
> Unicode doesn't represent as composable: o → ø, l → ł, d → đ, etc.

Just as another datapoint in discussion: for me, searching for "l" and
finding "ł" seems a bit weird.  (The opposite even more so.)  I admit
this might be nice for people without access to a Polish keyboard, and in
fact the most popular layout for Polish keyboards is one where "AltGr +
l" stands for "ł", but they are really different letters, and similarly
with other such cases:

"łata" = "patch"
"lata" = "flies" (verb, as in "something flies")

"kąt" = "angle"
"kat" = "hangman"

Etc., etc.

BTW, strangely enough, here isearching for "l" does /not/ find "ł", but
isearching for "a" (with character folding on) finds "ą".  Whatever one
thinks about char folding, this is clearly a bug.

For Polish texts, I would rather turn char folding off.

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University




* Re: Character folding in the pretest
  2016-02-04 21:32                 ` Richard Stallman
@ 2016-02-08 14:12                   ` Marcin Borkowski
  0 siblings, 0 replies; 102+ messages in thread
From: Marcin Borkowski @ 2016-02-08 14:12 UTC (permalink / raw)
  To: rms; +Cc: ofv, Elias Mårtenson, emacs-devel


On 2016-02-04, at 22:32, Richard Stallman <rms@gnu.org> wrote:
>   > It would make sense to have the default based on the session's locale,
>
> Maybe the locale should be the ultimate default, but I think we should
> try to tie this to something else people specify in Emacs.
>
> We have something called the language environment that we could
> connect this to.
>
> Perhaps we need another temporary and buffer-specific language setting.
> It could control this, select the spelling dictionary, select a default
> input method, and more.

Yes, we need it.  Wasn't that discussed some time ago?

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University




* Re: Character folding in the pretest
  2016-02-08 14:05                                 ` Marcin Borkowski
@ 2016-02-08 17:48                                   ` Eli Zaretskii
  2016-02-08 17:57                                     ` Werner LEMBERG
  2016-02-08 19:18                                     ` Marcin Borkowski
  0 siblings, 2 replies; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-08 17:48 UTC (permalink / raw)
  To: Marcin Borkowski; +Cc: ofv, lokedhs, emacs-devel

> From: Marcin Borkowski <mbork@mbork.pl>
> Date: Mon, 08 Feb 2016 15:05:05 +0100
> Cc: ofv@wanadoo.es, lokedhs@gmail.com, emacs-devel@gnu.org
> 
> Just as another datapoint in discussion: for me, searching for "l" and
> finding "ł" seems a bit weird.  (The opposite even more so.)

Which is why neither one happens under character folding.

> BTW, strangely enough, here isearching for "l" does /not/ find "ł", but
> isearching for "a" (with character folding on) finds "ą".  Whatever one
> thinks about char folding, this is clearly a bug.

It's not a bug, it's the feature working as designed: we only fold
characters that have suitable decompositions in the Unicode Character
Database.  So:

  (get-char-code-property ?ą 'decomposition) => (97 808)

but

  (get-char-code-property ?ł 'decomposition) => (322)

IOW, ą is canonically equivalent to the 2-character sequence a ̨ (which
is why searching for a finds that character), while ł has no canonical
decomposition (nor any other decomposition).

This means that the Unicode guys decided that ł should not be
equivalent to any other sequence of characters, and therefore Emacs
doesn't find it unless you search for it literally.

If you want to know why ł doesn't have any decompositions, I suggest
asking on the Unicode mailing list; I'm sure they had good reasons,
most probably reasons that came from people who are experts in the
Polish language and its intricacies.  We just trust the results.
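
As a cross-check outside Emacs, the same Unicode Character Database
property can be read with Python's standard unicodedata module.  This
is only a sketch; the helper name decomposition_codepoints is made up
for illustration.  Note one difference in convention: Emacs returns the
character itself, (322), when there is no decomposition, while Python
returns an empty string, which the helper turns into an empty list.

```python
import unicodedata


def decomposition_codepoints(ch):
    """Return the UCD decomposition of ch as a list of ints (empty if none)."""
    fields = unicodedata.decomposition(ch).split()
    # Compatibility decompositions start with a tag such as "<compat>"; skip it.
    return [int(field, 16) for field in fields if not field.startswith("<")]


for ch in "ął":
    print(ch, decomposition_codepoints(ch))
```

For ą this gives the code points of a (97) and the combining ogonek
(808), matching the Emacs result above; for ł the list is empty.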




* Re: Character folding in the pretest
  2016-02-08 17:48                                   ` Eli Zaretskii
@ 2016-02-08 17:57                                     ` Werner LEMBERG
  2016-02-08 19:18                                     ` Marcin Borkowski
  1 sibling, 0 replies; 102+ messages in thread
From: Werner LEMBERG @ 2016-02-08 17:57 UTC (permalink / raw)
  To: eliz; +Cc: ofv, lokedhs, emacs-devel


> This means that the Unicode guys decided that ł should not be
> equivalent to any other sequence of characters, and therefore Emacs
> doesn't find it unless you search for it literally.

Well, I'm suggesting to extend Unicode rules here for the sake of
(non-Polish) users.

> If you want to know why ł doesn't have any decompositions, I suggest
> asking on the Unicode mailing list, [...]

It's quite easy: A decomposition happens only if the modifier at most
touches the glyph.  A glyph with a strike-through feature (ł, đ, ø,
etc.) is thus not decomposable.
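
The data for the characters named here can be checked with a short
Python sketch (illustration only; has_decomposition is a made-up helper
name).  It confirms the observation for these sample characters and
does not by itself establish the general rule.

```python
import unicodedata


def has_decomposition(ch):
    """Return True if the Unicode Character Database lists a decomposition."""
    return unicodedata.decomposition(ch) != ""


stroked = "łđø"     # base letter with a stroke through the glyph
accented = "ąäåñö"  # base letter with an attached or detached mark

for ch in stroked + accented:
    print(ch, has_decomposition(ch))
```

The three stroked letters report no decomposition, while the five
accented ones each decompose into a base letter plus a combining mark.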


    Werner


* Re: Character folding in the pretest
  2016-02-08 17:48                                   ` Eli Zaretskii
  2016-02-08 17:57                                     ` Werner LEMBERG
@ 2016-02-08 19:18                                     ` Marcin Borkowski
  2016-02-08 19:37                                       ` Eli Zaretskii
                                                         ` (3 more replies)
  1 sibling, 4 replies; 102+ messages in thread
From: Marcin Borkowski @ 2016-02-08 19:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: ofv, lokedhs, emacs-devel


On 2016-02-08, at 18:48, Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Marcin Borkowski <mbork@mbork.pl>
>> Date: Mon, 08 Feb 2016 15:05:05 +0100
>> Cc: ofv@wanadoo.es, lokedhs@gmail.com, emacs-devel@gnu.org
>> 
>> Just as another datapoint in discussion: for me, searching for "l" and
>> finding "ł" seems a bit weird.  (The opposite even more so.)
>
> Which is why neither one happens under character folding.
>
>> BTW, strangely enough, here isearching for "l" does /not/ find "ł", but
>> isearching for "a" (with character folding on) finds "ą".  Whatever one
>> thinks about char folding, this is clearly a bug.
>
> It's not a bug, it's the feature working as designed: we only fold
> characters that have suitable decompositions in the Unicode Character
> Database.  So:
>
>   (get-char-code-property ?ą 'decomposition) => (97 808)
>
> but
>
>   (get-char-code-property ?ł 'decomposition) => (322)
>
> IOW, ą is canonically equivalent to the 2-character sequence a ̨ (which
> is why searching for a finds that character), while ł has no canonical
> decomposition (nor any other decomposition).
>
> This means that the Unicode guys decided that ł should not be
> equivalent to any other sequence of characters, and therefore Emacs
> doesn't find it unless you search for it literally.
>
> If you want to know why ł doesn't have any decompositions, I suggest
> asking on the Unicode mailing list; I'm sure they had good reasons,
> most probably reasons that came from people who are experts in the
> Polish language and its intricacies.  We just trust the results.

Thanks for the explanation, Eli!

However, given the number of bugs/quirks in Unicode, I'd personally
prefer not to trust them too much.  (Though I understand that the Emacs
devs /have/ to trust someone, and choosing the Unicode people is
probably not a bad idea generally.)  Funnily, one of the more annoying
bugs in Unicode is connected with quotes, AFAIR.  (Why not beat a dead
horse? ;-))  And folding "ą" to "a" while not "ł" to "l" is something
which most Poles (I guess) would treat as a serious, WTF-level bug.  And
good luck to all non-Polish people with isearching for the name of Jan
Łukasiewicz (just to choose a Lisp-related name;-)).

Yet another datapoint suggesting that the issue is really complicated,
and that Drew is right: if this is not configurable by users, it might
end up more annoying than helping.  (Not to say it won't - I trust Artur
here.)

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University




* Re: Character folding in the pretest
  2016-02-08 19:18                                     ` Marcin Borkowski
@ 2016-02-08 19:37                                       ` Eli Zaretskii
       [not found]                                       ` <<83oabrouwj.fsf@gnu.org>
                                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 102+ messages in thread
From: Eli Zaretskii @ 2016-02-08 19:37 UTC (permalink / raw)
  To: Marcin Borkowski; +Cc: ofv, lokedhs, emacs-devel

> From: Marcin Borkowski <mbork@mbork.pl>
> Cc: wl@gnu.org, ofv@wanadoo.es, lokedhs@gmail.com, emacs-devel@gnu.org
> Date: Mon, 08 Feb 2016 20:18:48 +0100
> 
> Drew is right: if this is not configurable by users, it might end up
> more annoying than helping.

It's already configurable, always has been.  This is Emacs, right?





* RE: Character folding in the pretest
       [not found]                                       ` <<83oabrouwj.fsf@gnu.org>
@ 2016-02-09  0:04                                         ` Drew Adams
  0 siblings, 0 replies; 102+ messages in thread
From: Drew Adams @ 2016-02-09  0:04 UTC (permalink / raw)
  To: Eli Zaretskii, Marcin Borkowski; +Cc: ofv, lokedhs, emacs-devel

> > Drew is right: if this is not configurable by users, it might end up
> > more annoying than helping.
> 
> It's already configurable, always has been.  This is Emacs, right?

What I suggested was introducing easy, flexible, powerful ways to
customize/configure.  I gave more specifics, including ability to
(easily) define multiple equivalence classes, switch among them,
combine them in various ways, associate them with given modes, etc.

"This is Emacs" and "this is Lisp", so you can do nearly anything,
is not what I had in mind.

FWIW, I suggested these things not because otherwise "it might end
up more annoying than helping".  It's already a useful feature.

But it can and should become more useful still.  There's no hurry,
but there's also no harm in thinking about what ways a user might
interact with such possible additional features.




* Re: Character folding in the pretest
  2016-02-08 19:18                                     ` Marcin Borkowski
  2016-02-08 19:37                                       ` Eli Zaretskii
       [not found]                                       ` <<83oabrouwj.fsf@gnu.org>
@ 2016-02-09 12:15                                       ` Richard Stallman
       [not found]                                       ` <<E1aT7CM-0005LM-9f@fencepost.gnu.org>
  3 siblings, 0 replies; 102+ messages in thread
From: Richard Stallman @ 2016-02-09 12:15 UTC (permalink / raw)
  To: Marcin Borkowski; +Cc: ofv, eliz, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

I think it is clear that people want various different character
folding rules.  The differences depend partly on what language the
text is in, partly on whether the user actually speaks that language,
and partly on personal preference.

Rather than arguing for an a-priori rule, we should let users show us
what they actually like, and then try to find general patterns in
those preferences so that we can make general defaults that users
tend to like.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.





* RE: Character folding in the pretest
       [not found]                                       ` <<E1aT7CM-0005LM-9f@fencepost.gnu.org>
@ 2016-02-09 15:26                                         ` Drew Adams
  0 siblings, 0 replies; 102+ messages in thread
From: Drew Adams @ 2016-02-09 15:26 UTC (permalink / raw)
  To: rms, Marcin Borkowski; +Cc: ofv, eliz, lokedhs, emacs-devel

> I think it is clear that people want various different character
> folding rules.  The differences depend partly on what language the
> text is in, partly on whether the user actually speaks that language,
> and partly on personal preference.
> 
> Rather than arguing for an a-priori rule, we should let users show us
> what they actually like, and then try to find general patterns in
> those preferences so that we can make general defaults that users
> tend to like.

+1




end of thread, other threads:[~2016-02-09 15:26 UTC | newest]

Thread overview: 102+ messages
2016-02-03  0:31 Character folding in the pretest Per Starbäck
2016-02-03  6:34 ` Adrian.B.Robert
2016-02-03  8:00 ` Paul Eggert
2016-02-03 10:54   ` Yuri Khan
2016-02-03 15:57     ` Filipp Gunbin
2016-02-03 16:24       ` Drew Adams
2016-02-03 16:46         ` Clément Pit--Claudel
2016-02-03 17:28           ` Drew Adams
2016-02-03 18:10             ` Clément Pit--Claudel
2016-02-03 18:24           ` Clément Pit--Claudel
2016-02-03 18:31             ` Drew Adams
2016-02-03 16:52       ` Yuri Khan
2016-02-03 11:08 ` Artur Malabarba
2016-02-03 13:24   ` Stefan Monnier
2016-02-03 13:35     ` Nicolas Petton
2016-02-03 15:06       ` Drew Adams
2016-02-03 15:41       ` Eli Zaretskii
2016-02-03 15:55         ` Teemu Likonen
2016-02-03 16:16           ` Eli Zaretskii
2016-02-06 13:41             ` Teemu Likonen
2016-02-06 14:33               ` Eli Zaretskii
2016-02-06 15:09                 ` Teemu Likonen
2016-02-06 18:38                   ` Artur Malabarba
2016-02-06 19:08                     ` Eli Zaretskii
2016-02-07  1:06                       ` Artur Malabarba
2016-02-03 16:54         ` Clément Pit--Claudel
2016-02-03 17:01           ` John Wiegley
2016-02-03 21:08             ` Óscar Fuentes
2016-02-03 22:32               ` John Wiegley
2016-02-03 22:52                 ` Clément Pit--Claudel
2016-02-03 23:50                 ` Sacha Chua
2016-02-04  5:49               ` Ivan Andrus
2016-02-04 21:30                 ` Richard Stallman
2016-02-04  8:40               ` Elias Mårtenson
2016-02-04 11:57                 ` Dirk-Jan C. Binnema
2016-02-04 15:18                   ` Drew Adams
2016-02-04 15:59                     ` Óscar Fuentes
2016-02-04 16:36                       ` Clément Pit--Claudel
2016-02-04 16:47                         ` Óscar Fuentes
2016-02-04 17:05                           ` Werner LEMBERG
2016-02-05  5:09                             ` Elias Mårtenson
2016-02-05  6:01                               ` Werner LEMBERG
2016-02-05  6:36                                 ` Elias Mårtenson
2016-02-05  7:15                                   ` Werner LEMBERG
2016-02-05  7:22                                     ` Elias Mårtenson
2016-02-06 15:43                                       ` Rasmus
2016-02-06 15:51                                         ` Eli Zaretskii
2016-02-05  7:52                                   ` Eli Zaretskii
2016-02-05 15:09                                     ` Filipp Gunbin
2016-02-05 19:21                                       ` Eli Zaretskii
2016-02-05 21:12                                         ` Óscar Fuentes
2016-02-05 22:20                                           ` Eli Zaretskii
2016-02-06 19:49                                           ` Richard Stallman
2016-02-06 19:49                                         ` Richard Stallman
2016-02-08 14:05                                 ` Marcin Borkowski
2016-02-08 17:48                                   ` Eli Zaretskii
2016-02-08 17:57                                     ` Werner LEMBERG
2016-02-08 19:18                                     ` Marcin Borkowski
2016-02-08 19:37                                       ` Eli Zaretskii
     [not found]                                       ` <<83oabrouwj.fsf@gnu.org>
2016-02-09  0:04                                         ` Drew Adams
2016-02-09 12:15                                       ` Richard Stallman
     [not found]                                       ` <<E1aT7CM-0005LM-9f@fencepost.gnu.org>
2016-02-09 15:26                                         ` Drew Adams
2016-02-06 12:58                               ` Rasmus
2016-02-04 17:12                           ` Eli Zaretskii
2016-02-04 19:35                             ` Óscar Fuentes
2016-02-04 19:52                               ` Clément Pit--Claudel
2016-02-04 20:05                               ` Eli Zaretskii
2016-02-04 17:27                           ` Clément Pit--Claudel
2016-02-04 17:34                             ` Eli Zaretskii
2016-02-04 18:18                             ` Yuri Khan
2016-02-04 19:46                             ` Óscar Fuentes
2016-02-04 20:06                               ` Clément Pit--Claudel
2016-02-04 20:40                                 ` Óscar Fuentes
2016-02-04 20:56                                   ` Clément Pit--Claudel
2016-02-04 21:16                                     ` Óscar Fuentes
2016-02-04 20:07                               ` Eli Zaretskii
2016-02-04 20:52                                 ` Óscar Fuentes
2016-02-04 20:59                                   ` Clément Pit--Claudel
2016-02-04 21:08                                   ` Eli Zaretskii
2016-02-04 20:23                         ` John Wiegley
2016-02-04 17:07                       ` Eli Zaretskii
2016-02-04 17:31                         ` Clément Pit--Claudel
2016-02-04 23:05                     ` Artur Malabarba
2016-02-06  9:37                       ` Per Starbäck
2016-02-06 10:41                         ` Eli Zaretskii
2016-02-06 12:52                           ` Rasmus
2016-02-06 14:31                             ` Eli Zaretskii
2016-02-06 14:24                           ` Ken Brown
2016-02-06 15:07                             ` Eli Zaretskii
2016-02-04 16:54                   ` Eli Zaretskii
2016-02-04 17:36                     ` Paul Eggert
2016-02-04 17:45                       ` Eli Zaretskii
2016-02-04 19:25                         ` Paul Eggert
2016-02-04 19:36                           ` Eli Zaretskii
2016-02-04 17:26                   ` Teemu Likonen
2016-02-05  8:08                     ` Adrian.B.Robert
2016-02-04 21:32                 ` Richard Stallman
2016-02-08 14:12                   ` Marcin Borkowski
2016-02-03 17:02           ` Eli Zaretskii
2016-02-03 15:38   ` Eli Zaretskii
2016-02-03 22:53   ` Richard Stallman
2016-02-03 15:39 ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
