* On language-dependent defaults for character-folding
From: Artur Malabarba @ 2016-02-09 17:26 UTC
To: emacs-devel

Hi everyone,

Firstly, let me say that character folding will be more easily configurable soon. The current message is not about that, it's about the default behaviour. It's important that the default be helpful, without appearing to be "buggy" to unsuspecting users.

== Context ==

A lot of people have raised concerns with the default behaviour of character folding. The argument usually goes like this: “as a Spanish user, n and ñ are different letters, and if searching for n will find instances of ñ, then that is a false positive. This folding should be disabled for Spanish users.” (and so on).

One of the solutions suggested is that the set of foldings used by default should depend on some buffer-local notion of current language.

== My Point ==

I agree that the default behaviour should be a little smarter (i.e., I agree with the argument), but I disagree that the **buffer's** language has anything to do with that.

Char folding is primarily about being able to easily search for characters that you can't easily type. It also has secondary uses, like searching when you're not even sure which character you want to search for, but I'm focusing on the first.

The set of characters that I can easily type is defined by 3 things:
1. My keyboard layout.
2. The input method in the current Emacs buffer.
3. Any special commands/keybinds that I have specifically set up.

Note how the language of the text in the buffer does not show up there. It does not matter whether the current buffer is in English, Portuguese, or Spanish, I simply cannot type ñ without at least 4 keystrokes. As long as my keyboard layout is not Spanish, I want to be able to find ñ when searching for n. The language of the text is irrelevant. (I'm using Spanish as the example here, obviously this holds for most languages.)

That's why the default set of char foldings should depend on item 1 above. (It might eventually be nice to take item 2 into account too, and it's simply impossible to account for item 3.)

Note that it also doesn't matter whether or not I'm proficient in Spanish. I still can't type ñ in fewer than 4 keystrokes.

== Bottomline ==

I don't know if it's possible to figure out the language of the user's keyboard layout. But the point is that we should care about the language that the user can _type_ in, NOT the language that they happen to be _reading_ now nor the language that they happen to _know_.

Cheers everyone,
Artur
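The n/ñ example rests on Unicode decompositions: ñ (U+00F1) canonically decomposes to n followed by U+0303 COMBINING TILDE. A minimal Python sketch of decomposition-based folding, assuming that a character folds to the first code point of its NFD form, might look like this; `build_fold_table` and `folded_find` are hypothetical helpers for illustration, not Emacs's char-fold implementation.

```python
import unicodedata


def base_letter(ch: str) -> str:
    """Return the first code point of ch's canonical decomposition (NFD).

    For a precomposed letter such as U+00F1 (ñ) this is the base letter
    "n"; characters without a canonical decomposition map to themselves.
    """
    return unicodedata.normalize("NFD", ch)[0]


def build_fold_table(max_cp: int = 0x24F) -> dict[str, set[str]]:
    """Map each base letter to the precomposed characters that fold to it.

    Only code points up to max_cp are scanned (Basic Latin through
    Latin Extended-B by default) to keep the sketch small.
    """
    table: dict[str, set[str]] = {}
    for cp in range(max_cp + 1):
        ch = chr(cp)
        base = base_letter(ch)
        if base != ch:
            table.setdefault(base, set()).add(ch)
    return table


def folded_find(needle: str, haystack: str, table: dict[str, set[str]]) -> int:
    """Return the index of the first folded match of needle, or -1.

    Each needle character matches itself or any character that folds to
    it, so "n" finds "ñ" but "ñ" does not find "n" (folding is one-way).
    """
    for start in range(len(haystack) - len(needle) + 1):
        window = haystack[start:start + len(needle)]
        if all(h == n or h in table.get(n, ()) for n, h in zip(needle, window)):
            return start
    return -1


table = build_fold_table()
print(folded_find("n", "a\u00f1o", table))   # 1
print(folded_find("\u00f1", "ano", table))   # -1
```

The one-way behaviour is the crux of the complaint above: a Spanish user searching for n also hits ñ, which they regard as a different letter.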
* Re: On language-dependent defaults for character-folding
From: Pierpaolo Bernardi @ 2016-02-09 17:39 UTC
To: Artur Malabarba; +Cc: emacs-devel

On Tue, Feb 9, 2016 at 6:26 PM, Artur Malabarba <bruce.connor.am@gmail.com> wrote:
> == Bottomline ==
> I don't know if it's possible to figure out the language of the user's
> keyboard layout. But the point is that we should care about the
> language that the user can _type_ in, NOT the language that they
> happen to be _reading_ now nor the language that they happen to
> _know_.

So, if I'm using my laptop on which I use a US-international layout I will get no folding for any character in Latin-1, if I use a nearby machine with an Italian keyboard layout I get a different behavior, if I use another machine with a US layout I get another different behavior.

That will be the time that I revert to Emacs 19.

FWIW, my preference would be for a different function altogether, disjoint from the non-folding version.
* Re: On language-dependent defaults for character-folding
From: Paul Eggert @ 2016-02-09 17:54 UTC
To: Pierpaolo Bernardi, Artur Malabarba; +Cc: emacs-devel

On 02/09/2016 09:39 AM, Pierpaolo Bernardi wrote:
> So, if I'm using my laptop on which I use a US-international layout I
> will get no folding for any character in Latin-1

That's not what Artur's saying. The layout of the keyboard hardware is not the same thing as the language that the user can easily type in.

I agree with Artur's point: typically, searching convenience depends more on the language of the user doing the searching than on the language of the document being searched.
* Re: On language-dependent defaults for character-folding
From: Pierpaolo Bernardi @ 2016-02-10 0:49 UTC
To: Paul Eggert; +Cc: Artur Malabarba, emacs-devel

On Tue, Feb 9, 2016 at 6:54 PM, Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 02/09/2016 09:39 AM, Pierpaolo Bernardi wrote:
>>
>> So, if I'm using my laptop on which I use a US-international layout I
>> will get no folding for any character in Latin-1
>
> That's not what Artur's saying. The layout of the keyboard hardware is not
> the same thing as the language that the user can easily type in.

How so? The layout of the keyboard hardware and its driver are fundamental parts of what one can easily type in.

The point is that he proposes to have the default behavior of Emacs be different depending on random environmental features of the computer it's running on.
* Re: On language-dependent defaults for character-folding
From: Artur Malabarba @ 2016-02-10 2:20 UTC
To: Pierpaolo Bernardi; +Cc: Paul Eggert, emacs-devel

On 9 Feb 2016 10:49 pm, "Pierpaolo Bernardi" <olopierpa@gmail.com> wrote:
> The point is that he proposes to have the default behavior of Emacs be
> different depending on random environmental features of the computer
> it's running on.

Except for the word "random", yes, that was the proposal. Why do you feel that's bad?
* Re: On language-dependent defaults for character-folding
From: Pierpaolo Bernardi @ 2016-02-10 3:01 UTC
To: Artur Malabarba; +Cc: Paul Eggert, emacs-devel

On Wed, Feb 10, 2016 at 3:20 AM, Artur Malabarba <bruce.connor.am@gmail.com> wrote:
> On 9 Feb 2016 10:49 pm, "Pierpaolo Bernardi" <olopierpa@gmail.com> wrote:
>> The point is that he proposes to have the default behavior of Emacs be
>> different depending on random environmental features of the computer
>> it's running on.
>
> Except for the word "random", yes, that was the proposal. Why do you feel
> that's bad?

Because I want a consistent behavior. The example I made is not invented; I regularly use more than one machine. These machines have different keyboard layouts and drivers, because not all of them are under my control, and I cannot make their hardware and system software uniform, even if I wished to do so.
* Re: On language-dependent defaults for character-folding
From: Artur Malabarba @ 2016-02-10 9:55 UTC
To: Pierpaolo Bernardi; +Cc: Paul Eggert, emacs-devel

On 10 February 2016 at 03:01, Pierpaolo Bernardi <olopierpa@gmail.com> wrote:
>> Except for the word "random", yes, that was the proposal. Why do you feel
>> that's bad?
>
> Because I want a consistent behavior. The example I made is not
> invented; I regularly use more than one machine. These machines have
> different keyboard layouts and drivers, because not all of them are
> under my control, and I cannot make their hardware and system
> software uniform, even if I wished to do so.

That's my situation too. Half the time I'm on an English keyboard, where I would be glad if Emacs helped me out with Portuguese diacritics.

Of course, that's just my opinion. I'd like to understand other people's opinions too.
* Re: On language-dependent defaults for character-folding
From: Óscar Fuentes @ 2016-02-10 18:12 UTC
To: emacs-devel

Artur Malabarba <bruce.connor.am@gmail.com> writes:

> That's my situation too. Half the time I'm on an English keyboard,
> where I would be glad if Emacs helped me out with Portuguese
> diacritics.

Why don't you configure your input method? Almost all the time I use a US keyboard and have no problem entering diacritics, thanks to the US-International input method of the OS. Emacs has its own input method mechanism too, which works in an almost identical way.

[snip]
* Re: On language-dependent defaults for character-folding
From: Artur Malabarba @ 2016-02-10 19:23 UTC
To: Óscar Fuentes; +Cc: emacs-devel

On 10 Feb 2016 4:12 pm, "Óscar Fuentes" <ofv@wanadoo.es> wrote:
>
> > That's my situation too. Half the time I'm on an English keyboard,
> > where I would be glad if Emacs helped me out with Portuguese
> > diacritics.
>
> Why don't you configure your input method?

Yes, I usually turn on an input method. Char folding is just more convenient (WRT searching).
* RE: On language-dependent defaults for character-folding
From: Drew Adams @ 2016-02-09 17:48 UTC
To: bruce.connor.am, emacs-devel

> Char folding is primarily about being able to easily search for
> characters that you can't easily type. It also has secondary uses,
> like searching when you're not even sure which character you want to
> search for, but I'm focusing on the first.

I would say that it is primarily about searching for *any of a given set of characters*. It has nothing to do, necessarily, with the difficulty of typing certain characters, and it has nothing to do, necessarily, with not knowing which characters you want to search for.

It's simply about wanting to treat a given set of chars as equivalent for search purposes. How you input a search pattern (typing, pasting) is only one consideration, for operation.

> the point is that we should care about the
> language that the user can _type_ in, NOT the language that they
> happen to be _reading_ now nor the language that they happen to
> _know_.

Typing is only one consideration when defining default behavior. It is of course a reasonable thing to consider.
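Drew's framing, treating a given set of characters as equivalent for search purposes, can be sketched as equivalence classes compiled into a regular expression. The classes below are hypothetical examples and `fold_to_regexp` is a hypothetical helper; unlike decomposition-based folding, these classes are symmetric, so searching for ñ also finds n.

```python
import re

# Hypothetical equivalence classes: every member matches every other member.
EQUIVALENCE_CLASSES = [
    {"n", "\u00f1"},                 # n, ñ
    {"o", "\u00f3", "\u00f5"},       # o, ó, õ
]


def fold_to_regexp(pattern: str, classes: list[set[str]]) -> str:
    """Translate a literal search string into a regexp in which each
    character belonging to an equivalence class matches any member."""
    lookup: dict[str, set[str]] = {}
    for cls in classes:
        for ch in cls:
            lookup[ch] = cls
    parts = []
    for ch in pattern:
        cls = lookup.get(ch)
        if cls is None:
            # Characters outside every class are matched literally.
            parts.append(re.escape(ch))
        else:
            # Sorted so the generated character class is deterministic.
            parts.append("[" + "".join(re.escape(c) for c in sorted(cls)) + "]")
    return "".join(parts)
```

For example, `re.search(fold_to_regexp("ano", EQUIVALENCE_CLASSES), "a\u00f1o")` finds a match, and so does the reverse search for "año" in "ano". Which characters belong to which class is exactly the policy question this thread is debating.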
* Re: On language-dependent defaults for character-folding
From: Artur Malabarba @ 2016-02-09 16:43 UTC
To: Drew Adams; +Cc: emacs-devel

Drew Adams <drew.adams@oracle.com> writes:

> I would say that it is primarily about searching for *any of a
> given set of characters*. [...]
> It's simply about wanting to treat a given set of chars as
> equivalent for search purposes. How you input a search pattern
> (typing, pasting) is only one consideration, for operation.

Fair enough. It's good to know how others think of this feature.
* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-09 17:58 UTC
To: bruce.connor.am; +Cc: emacs-devel

> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Date: Tue, 9 Feb 2016 17:26:32 +0000
>
> I don't know if it's possible to figure out the language of the user's
> keyboard layout.

It's possible on some systems (maybe on all of them). But it isn't TRT, IMO, because one can use input methods external to Emacs, which makes this problem unsolvable, AFAIU.

I think our energy will be much better spent by preparing a data base of preferences by various groups of users, including (but not limited to) something that can be vaguely called "typical user of language X", for several values of X. I think we can come up with other types of groups as well.

Thanks.
* Re: On language-dependent defaults for character-folding
From: Artur Malabarba @ 2016-02-09 17:10 UTC
To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> I don't know if it's possible to figure out the language of the user's
>> keyboard layout.
>
> It's possible on some systems (maybe on all of them). But it isn't
> TRT, IMO, because one can use input methods external to Emacs, which
> makes this problem unsolvable, AFAIU.
>
> I think our energy will be much better spent by preparing a data base
> of preferences by various groups of users, including (but not limited
> to) something that can be vaguely called "typical user of language X",
> for several values of X.

I disagree that it's not TRT. Most problems are technically unsolvable if you take into account the infinity of ways that the user could have customized Emacs or their OS; that doesn't prevent us from solving the “typical” case.

But I'm also fine with your proposed alternative. Having a separate setting that governs multiple features and might allow us to identify a user's “main” language (or something like that) sounds useful too.

While I'd prefer to rely on “the language that the user types in”, relying on “the user's language” is a fine compromise, as long as “the buffer's language” doesn't factor in.
* Re: On language-dependent defaults for character-folding
From: Óscar Fuentes @ 2016-02-09 18:21 UTC
To: emacs-devel

Artur Malabarba <bruce.connor.am@gmail.com> writes:

[snip]

> == Bottomline ==
> I don't know if it's possible to figure out the language of the user's
> keyboard layout. But the point is that we should care about the
> language that the user can _type_ in,

Figuring out this (and acting upon that knowledge) looks like a quite complex task to me. In practice, letting the user tell Emacs about how the char folding should happen is more reasonable.

> NOT the language that they
> happen to be _reading_ now nor the language that they happen to
> _know_.

What I get from all this saga is that character folding is about allowing users to search for weird characters used by those funny-looking aliens who are harassed by the guards when they try to cross our borders :-) You don't care about what the character really is, you just notice that it is "that character I know with some decoration added" and then use the character you know for searching for the funny one.

I hope you all realize that the users who can benefit from this feature are those who are ill-equipped to *search* for certain characters, related to the Latin alphabet, and need to do that only occasionally. OTOH we have the people who actually write those characters, hence they don't need help for searching for them, and who will be pissed to discover that Isearch is broken.

We don't need a smarter feature, we need a sane default, which is "disabled". When activated, act as Unicode says, which seems to be clearly defined. That's it.

Much of the confusion on this topic originated in the expectation that the feature could be used for searching for equivalent characters within a language (*), but as that is not what it is about, the need for language-dependent customizations vanishes, and with it the complexity goes away too.

* Some languages (French) may benefit from the feature anyway, because their "equivalence classes" happen to coincide with what the character folding feature does.
* Re: On language-dependent defaults for character-folding
From: Artur Malabarba @ 2016-02-09 19:54 UTC
To: Óscar Fuentes; +Cc: emacs-devel

On 9 February 2016 at 18:21, Óscar Fuentes <ofv@wanadoo.es> wrote:
>> I don't know if it's possible to figure out the language of the user's
>> keyboard layout. But the point is that we should care about the
>> language that the user can type in,
>
> Figuring out this (and acting upon that knowledge) looks like a quite
> complex task to me. In practice, letting the user tell Emacs about how
> the char folding should happen is more reasonable.

1. Take the set of all characters in the language that the user types in;
2. Don't fold these characters.

That's all the complexity. If we have a database of characters in a language, this could even be done automatically. If we don't have such a database, then all we need is some quick input from a user of that language (this doesn't need to happen all at once, there's no rush).

> I hope you all realize that the users who can benefit from this feature
> are those who are ill-equipped to search for certain characters,

I could be wrong, but I think you just defined all users. In the Unicode standard used by Emacs, there are 5721 characters with a “decomposition” property. Is there a user who is well-equipped to type all of those characters?

> OTOH
> we have the people who actually write those characters, hence they don't
> need help for searching for them, and who will be pissed to discover
> that Isearch is broken.

The whole point here is to find defaults that won't fold characters of the user's language.
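The two-step rule above can be sketched by subtracting a per-language set of "distinct letters" from a decomposition-based fold table. The language codes and letter sets below are illustrative assumptions (ñ for Spanish; å, ä, ö for Swedish, where they are separate letters of the alphabet), not an existing Emacs database, and `decomposition_folds` and `language_folds` are hypothetical helpers.

```python
import unicodedata

# Hypothetical per-language sets of characters that users of the language
# type directly and treat as distinct letters, so they must not be folded.
DISTINCT_LETTERS: dict[str, set[str]] = {
    "es": {"\u00f1", "\u00d1"},                                          # ñ Ñ
    "sv": {"\u00e5", "\u00e4", "\u00f6", "\u00c5", "\u00c4", "\u00d6"},  # å ä ö Å Ä Ö
}


def decomposition_folds(chars: str) -> dict[str, set[str]]:
    """Map each base letter to the given characters whose NFD starts with it."""
    table: dict[str, set[str]] = {}
    for ch in chars:
        base = unicodedata.normalize("NFD", ch)[0]
        if base != ch:
            table.setdefault(base, set()).add(ch)
    return table


def language_folds(table: dict[str, set[str]], language: str) -> dict[str, set[str]]:
    """Step 1: look up the typing language's own letters.
    Step 2: remove them from every fold set, dropping empty entries."""
    keep_distinct = DISTINCT_LETTERS.get(language, set())
    result: dict[str, set[str]] = {}
    for base, variants in table.items():
        remaining = variants - keep_distinct
        if remaining:
            result[base] = remaining
    return result


sample = decomposition_folds("\u00f1\u00f3\u00e5\u00f6")   # ñ ó å ö
```

With this sample, `language_folds(sample, "es")` no longer folds ñ into n but still folds ó and ö into o, while the Swedish table keeps å and ö distinct and still folds ñ. Eli's follow-up below about canonically equivalent sequences would be a further exception on top of this.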
* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-09 20:08 UTC
To: bruce.connor.am; +Cc: ofv, emacs-devel

> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Date: Tue, 9 Feb 2016 19:54:57 +0000
> Cc: emacs-devel <emacs-devel@gnu.org>
>
> 1. Take the set of all characters in the language that the user types in;
> 2. Don't fold these characters.

I think we should make an exception to rule 2 for character sequences that are displayed as some character in the user's language: those must be folded, otherwise the result will be very confusing. For example, searching for ñ (one character) should also find the sequence of 2 characters n + U+0303 COMBINING TILDE (which is displayed as ñ), and vice versa, even for languages where ñ can be typed on the keyboard.
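Eli's exception is about canonical equivalence: the precomposed ñ (U+00F1) and the sequence n + U+0303 render identically but are different strings. A minimal Python sketch, assuming both the search string and the text are normalized to NFC before matching, shows how the two spellings can match each other without folding ñ into n. `nfc_find` is a hypothetical helper.

```python
import unicodedata

PRECOMPOSED = "\u00f1"   # ñ as a single code point
DECOMPOSED = "n\u0303"   # n followed by COMBINING TILDE, displayed as ñ


def nfc_find(needle: str, haystack: str) -> int:
    """Find needle in haystack after normalizing both to NFC.

    Canonically equivalent spellings then compare equal. The returned
    index refers to the normalized haystack, which can be shorter than
    the original text.
    """
    return unicodedata.normalize("NFC", haystack).find(
        unicodedata.normalize("NFC", needle))


print(PRECOMPOSED == DECOMPOSED)                        # False
print(nfc_find(PRECOMPOSED, "a" + DECOMPOSED + "o"))    # 1
print(nfc_find("n", "a\u00f1o"))                        # -1
```

The last line shows that this equivalence alone does not fold ñ into n, so it is compatible with keeping ñ distinct for Spanish users.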
* Re: On language-dependent defaults for character-folding
From: Artur Malabarba @ 2016-02-10 1:58 UTC
To: Eli Zaretskii; +Cc: emacs-devel

On 9 Feb 2016 6:08 pm, "Eli Zaretskii" <eliz@gnu.org> wrote:
> > 1. Take the set of all characters in the language that the user types in;
> > 2. Don't fold these characters.
>
> I think we should make an exception to rule 2 for character sequences
> that are displayed as some character in the user's language: those
> must be folded, otherwise the result will be very confusing. For
> example, searching for ñ (one character) should also find the sequence
> of 2 characters n + U+0303 COMBINING TILDE (which is displayed as ñ),
> and vice versa, even for languages where ñ can be typed on the keyboard.

I agree.
* Re: On language-dependent defaults for character-folding
From: Óscar Fuentes @ 2016-02-09 21:07 UTC
To: emacs-devel

Artur Malabarba <bruce.connor.am@gmail.com> writes:

> On 9 February 2016 at 18:21, Óscar Fuentes <ofv@wanadoo.es> wrote:
>>> I don't know if it's possible to figure out the language of the user's
>>> keyboard layout. But the point is that we should care about the
>>> language that the user can type in,
>>
>> Figuring out this (and acting upon that knowledge) looks like a quite
>> complex task to me. In practice, letting the user tell Emacs about how
>> the char folding should happen is more reasonable.
>
> 1. Take the set of all characters in the language that the user types in;
> 2. Don't fold these characters.

Today I read your blog post about this feature:

http://endlessparentheses.com/new-in-emacs-25-1-easily-search-non-ascii-characters.html

where you say

"As any Brazilian, I am a daily user of diacritical marks (ó, ã, ê, and the likes), and even though my keyboard can type these characters, I still enjoy the simplicity of not having to."

And now I'm utterly confused. Your example is about using the feature within your language, which you admit you have no problem with writing, and now you talk about not folding the characters of the user's language?

When I first looked at the feature I thought that it was precisely about what you mention in the blog entry, and deemed it something I would use for the same reasons you mention in your example, until I noticed the issue with n/ñ, when I was told that the feature was about something else.

> That's all the complexity. If we have a database of characters in a
> language, this could even be done automatically. If we don't have such
> a database, then all we need is some quick input from a user of that
> language (this doesn't need to happen all at once, there's no rush).
>
>> I hope you all realize that the users who can benefit from this feature
>> are those who are ill-equipped to search for certain characters,
>
> I could be wrong, but I think you just defined all users. In the
> Unicode standard used by Emacs, there are 5721 characters with a
> “decomposition” property. Is there a user who is well-equipped to type
> all of those characters?

(And how many of those 5721 characters can be matched from a Latin letter?)

How typical is it for an Emacs user to have to *search* (not write) for a composed character that he cannot type with his input setup? Sure, people like Eli may have to do that quite often, because he has a heterogeneous cultural background and also works on tasks related to internationalization, but it is reasonable to assume that most users will not need the feature often, if at all.

From my POV, if you see the feature as an aid for searching composed characters by people without the adequate input method, there is no problem at all. Just make it optional, perhaps toggleable while inside Isearch. This way the people who need it can use it, and Isearch will not break for the rest.

[snip]
* Re: On language-dependent defaults for character-folding
From: Artur Malabarba @ 2016-02-10 2:18 UTC
To: Óscar Fuentes; +Cc: emacs-devel

On 9 Feb 2016 7:07 pm, "Óscar Fuentes" <ofv@wanadoo.es> wrote:
> >
> > 1. Take the set of all characters in the language that the user types in;
> > 2. Don't fold these characters.
>
> Today I read your blog post about this feature: [...]
>
> And now I'm utterly confused. Your example is about using the feature
> within your language, which you admit you have no problem with writing,
> and now you talk about not folding the characters of the user's
> language?

I'm sorry that post confused you. That post states my personal preference (I like the "fold all Unicode decompositions" behaviour). That post does NOT reflect what I think should be the default. What I've written here on this thread is what I think should be the default.

Although currently Emacs does fold all decompositions by default, this is just temporary. We've said we would turn that off before release (and in fact I'll do that tomorrow (and amend my post too)).

> (And how many of those 5721 characters can be matched from a Latin
> letter?)

OK, I see what you meant.

> How typical is it for an Emacs user to have to *search* (not write) for a
> composed character that he cannot type with his input setup?

I have no idea, which is why this feature will be off by default until I feel confident it won't get in anyone's way.
* Re: On language-dependent defaults for character-folding
From: Óscar Fuentes @ 2016-02-10 2:52 UTC
To: emacs-devel

Artur Malabarba <bruce.connor.am@gmail.com> writes:

> I'm sorry that post confused you. That post states my personal preference
> (I like the "fold all Unicode decompositions" behaviour).

Possibly in Portuguese there is no problem with folding matching unrelated characters. If it weren't for the n/ñ case in Spanish, most likely I would turn on the feature in my setup.

>> How typical is it for an Emacs user to have to *search* (not write) for a
>> composed character that he cannot type with his input setup?
>
> I have no idea, which is why this feature will be off by default until I
> feel confident it won't get in anyone's way.

That's very reasonable. Thank you.
* Re: On language-dependent defaults for character-folding
From: Mark Oteiza @ 2016-02-10 2:56 UTC
To: emacs-devel

Artur Malabarba <bruce.connor.am@gmail.com> writes:

> Although currently Emacs does fold all decompositions by default, this
> is just temporary. We've said we would turn that off before release
> (and in fact I'll do that tomorrow (and amend my post too)).

Thank you.
* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-10 15:25 UTC
To: bruce.connor.am; +Cc: ofv, emacs-devel

> Date: Wed, 10 Feb 2016 02:18:03 +0000
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: emacs-devel <emacs-devel@gnu.org>
>
> > > I could be wrong, but I think you just defined all users. In the
> > > Unicode standard used by Emacs, there are 5721 characters with a
> > > “decomposition” property. Is there a user who is well-equipped to type
> > > all of those characters?
> >
> > (And how many of those 5721 characters can be matched from a Latin
> > letter?)
>
> OK, I see what you meant.

You do? I don't, because the answer to Óscar's question is: 376 if we count only canonical decompositions (which we must support, or users will hate us), and a whopping 1449 if we count compatibility decompositions as well. That's quite a few, I'd say, although AFAIR we don't find all of the compatibility decompositions under character folding, only some.

Btw, from my POV, the ease of searching for characters not on my keyboard is not the main point of this feature. The main feature is to search for similar characters. (Of course, I don't mind if someone likes this for other reasons.)

> Although currently Emacs does fold all decompositions by default, this is just temporary. We've said we would turn that off before release (and in fact I'll do that tomorrow (and amend my post too)).

We didn't say we will turn it off, we said we will _decide_ whether to turn it off. So please don't turn it off just yet, we are still collecting feedback. If anything, for now I counted more people who said they liked it than those who didn't (5 vs 9, by my count). I'm not saying we should already decide to leave it on, but turning it off is certainly premature. Less than two weeks have passed since the pretest began, there's no rush.

Thanks.
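The figures in this exchange (5721 characters with a decomposition; 376 or 1449 reachable from a Latin letter) depend on the Unicode version and on what "matched from a Latin letter" means. A rough Python sketch for computing comparable counts with the standard `unicodedata` module follows. It assumes "from a Latin letter" means that the fully decomposed form begins with an ASCII letter, and since Python ships a newer Unicode database than Emacs did in 2016, its results should not be expected to reproduce those numbers exactly.

```python
import string
import sys
import unicodedata


def decomposition_counts() -> dict[str, int]:
    """Count code points that have a Unicode decomposition mapping.

    A mapping that begins with a tag such as "<compat>" or "<font>" is a
    compatibility decomposition; an untagged mapping is canonical.
    """
    counts = {"any": 0, "canonical": 0,
              "canonical_from_latin": 0, "any_from_latin": 0}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        mapping = unicodedata.decomposition(ch)
        if not mapping:
            continue
        canonical = not mapping.startswith("<")
        # NFD applies canonical mappings recursively; NFKD also applies
        # compatibility mappings.
        full = unicodedata.normalize("NFD" if canonical else "NFKD", ch)
        from_latin = int(full[0] in string.ascii_letters)
        counts["any"] += 1
        counts["any_from_latin"] += from_latin
        if canonical:
            counts["canonical"] += 1
            counts["canonical_from_latin"] += from_latin
    return counts


counts = decomposition_counts()
print(counts)
```

The split between the "canonical" and "any" counts mirrors Eli's distinction: canonical equivalents must always match, while compatibility decompositions are a policy choice.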
* Re: On language-dependent defaults for character-folding 2016-02-10 15:25 ` Eli Zaretskii @ 2016-02-10 21:17 ` Artur Malabarba 2016-02-11 3:39 ` Eli Zaretskii 2016-02-12 22:36 ` Per Starbäck 2016-02-13 16:46 ` joakim 2 siblings, 1 reply; 263+ messages in thread From: Artur Malabarba @ 2016-02-10 21:17 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel [-- Attachment #1: Type: text/plain, Size: 1587 bytes --] On 10 Feb 2016 1:25 pm, "Eli Zaretskii" <eliz@gnu.org> wrote: > > > (And how many of those 5721 characters can be matched from a latin > > > letter?) > > > > OK, I see what you meant. > > You do? I think so. But I don't want to prolong that line of thought, because it wasn't a useful argument anyway. > Btw, from my POV, the ease of searching for characters not on my > keyboard is not the main point of this feature. The main feature is > to search for similar characters. (Of course, I don't mind if someone > likes this for other reasons.) That's actually my personal preference too. I like that I can search for "o" and hit "õ" (both are used in Portuguese text). However, this would not be a good _default_ for Brazilian users. Because once in a while you might not want it, and if the user didn't enable this behaviour himself he probably won't know that it can be disabled. (at least, this is what I think right now). > > Although currently Emacs does fold all decompositions by default, this is just temporary. We've said we would turn that off before release (and in fact I'll do that tomorrow (and ammend my post too)). > > We didn't say we will turn it off, we said we will _decide_ whether to > turn it off. So please don't turn it off just yet, we are still > collecting feedback. Sorry, I already did earlier today. Seems I was under the wrong impression. Feel free to turn it back on for now. FTR, my feedback is that I'd like to give the implementation a little more time before enabling it by default on a stable release. 
* Re: On language-dependent defaults for character-folding 2016-02-10 21:17 ` Artur Malabarba @ 2016-02-11 3:39 ` Eli Zaretskii 0 siblings, 0 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-11 3:39 UTC To: bruce.connor.am; +Cc: emacs-devel > Date: Wed, 10 Feb 2016 21:17:49 +0000 > From: Artur Malabarba <bruce.connor.am@gmail.com> > Cc: emacs-devel <emacs-devel@gnu.org> > > > We didn't say we will turn it off, we said we will _decide_ whether to > > turn it off. So please don't turn it off just yet, we are still > > collecting feedback. > > Sorry, I already did earlier today. Seems I was under the wrong impression. Feel free to turn it back on for > now. Done. > FTR, my feedback is that I'd like to give the implementation a little more time before enabling it by default on a > stable release. Thanks.
* Re: On language-dependent defaults for character-folding 2016-02-10 15:25 ` Eli Zaretskii 2016-02-10 21:17 ` Artur Malabarba @ 2016-02-12 22:36 ` Per Starbäck 2016-02-13 8:33 ` Eli Zaretskii 2016-02-13 16:46 ` joakim 2 siblings, 1 reply; 263+ messages in thread From: Per Starbäck @ 2016-02-12 22:36 UTC To: Eli Zaretskii; +Cc: ofv, Artur Malabarba, emacs-devel@gnu.org Eli wrote: > If anything, for now I counted more people who > said they liked it than those who didn't (5 vs 9, by my count). I'm > not saying we should already decide to leave it on, but turning it off > is certainly premature. Less than two weeks have passed since the > pretest began, there's no rush. Collecting feedback is good, but that counting seems pointless to me if you are counting one person mentioning that people in locale X will see that behaviour as buggy, dumb or completely oblivious to their culture as offset by one person saying they like the feature. It's not about liking the feature or not. We have to listen to what the feedback says instead of just counting it.
* Re: On language-dependent defaults for character-folding 2016-02-12 22:36 ` Per Starbäck @ 2016-02-13 8:33 ` Eli Zaretskii 2016-02-13 10:10 ` Markus Triska 0 siblings, 1 reply; 263+ messages in thread From: Eli Zaretskii @ 2016-02-13 8:33 UTC To: Per Starbäck; +Cc: ofv, bruce.connor.am, emacs-devel > Date: Fri, 12 Feb 2016 23:36:46 +0100 > From: Per Starbäck <per.starback@gmail.com> > Cc: Artur Malabarba <bruce.connor.am@gmail.com>, ofv@wanadoo.es, > "emacs-devel@gnu.org" <emacs-devel@gnu.org> > > Eli wrote: > > If anything, for now I counted more people who > > said they liked it than those who didn't (5 vs 9, by my count). I'm > > not saying we should already decide to leave it on, but turning it off > > is certainly premature. Less than two weeks have passed since the > > pretest began, there's no rush. > > Collecting feedback is good, but that counting seems pointless to me > if you are counting one person mentioning that people in locale X will > see that behaviour as buggy, dumb or completely oblivious to their > culture as offset by one person saying they like the feature. It's not > about liking the feature or not. We have to listen to what the > feedback says instead of just counting it. The issue is whether this should stay on by default, and those are the only opinions I count (after carefully reading everything people write about the subject). The strength of the opinion is not something that IMO can be reliably taken into account, because of different writing styles different people use, and because for most of us English is not our first language. The nuances of the wording can therefore be entirely random.
* Re: On language-dependent defaults for character-folding 2016-02-13 8:33 ` Eli Zaretskii @ 2016-02-13 10:10 ` Markus Triska 2016-02-13 10:21 ` Eli Zaretskii 0 siblings, 1 reply; 263+ messages in thread From: Markus Triska @ 2016-02-13 10:10 UTC To: emacs-devel Hi Eli, Eli Zaretskii <eliz@gnu.org> writes: > The issue is whether this should stay on by default, and those are the > only opinions I count (after carefully reading everything people write > about the subject). Please count me in the "default should be off" category. Thank you and all the best, Markus
* Re: On language-dependent defaults for character-folding 2016-02-13 10:10 ` Markus Triska @ 2016-02-13 10:21 ` Eli Zaretskii 0 siblings, 0 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-13 10:21 UTC To: Markus Triska; +Cc: emacs-devel > From: Markus Triska <triska@metalevel.at> > Date: Sat, 13 Feb 2016 11:10:07 +0100 > > Please count me in the "default should be off" category. Done. Thanks.
* Re: On language-dependent defaults for character-folding 2016-02-10 15:25 ` Eli Zaretskii 2016-02-10 21:17 ` Artur Malabarba 2016-02-12 22:36 ` Per Starbäck @ 2016-02-13 16:46 ` joakim 2 siblings, 0 replies; 263+ messages in thread From: joakim @ 2016-02-13 16:46 UTC To: Eli Zaretskii; +Cc: ofv, bruce.connor.am, emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> Date: Wed, 10 Feb 2016 02:18:03 +0000 >> From: Artur Malabarba <bruce.connor.am@gmail.com> >> Cc: emacs-devel <emacs-devel@gnu.org> >> >> > > I could be wrong, but I think you just defined all users. In the >> > > Unicode standard used by Emacs, there are 5721 characters with a >> > > “decomposition” property. Is there a user who is well-equipped to type >> > > all of those characters? >> > >> > (And how many of those 5721 characters can be matched from a latin >> > letter?) >> >> OK, I see what you meant. > > You do? I don't, because the answer to Óscar's question is: 376 if we > count only canonical decompositions (which we must support, or users > will hate us), and a whopping 1449 if we count compatibility > decompositions as well. That's quite a few, I'd say, although AFAIR > we don't find all of the compatibility decompositions under character > folding, only some. > > Btw, from my POV, the ease of searching for characters not on my > keyboard is not the main point of this feature. The main feature is > to search for similar characters. (Of course, I don't mind if someone > likes this for other reasons.) > >> Although currently Emacs does fold all decompositions by default, this is just temporary. We've said we would turn that off before release (and in fact I'll do that tomorrow (and amend my post too)). > > We didn't say we will turn it off, we said we will _decide_ whether to > turn it off. So please don't turn it off just yet, we are still > collecting feedback. If anything, for now I counted more people who > said they liked it than those who didn't (5 vs 9, by my count).
I'm > not saying we should already decide to leave it on, but turning it off > is certainly premature. Less than two weeks have passed since the > pretest began, there's no rush. I like character folding, I write mainly in Swedish and English. The mix of Swedish and English usually winds up being horrible, so character folding helps find things in source code where you are not sure if Swedish characters have been guillotined or not (ÅÄÖ becomes AAO). That said, I think the question of whether something should be default or not generates way too much warm air. I think ELPA should carry a number of installable themes that present a coherent set of defaults. So you could just install 'emacs-xtra-everything' from ELPA and get many interesting features suitable for a fast machine. Or you could go with 'emacs-orthodoxy' which disables certain new settings (like for instance 'C-x M-o runs the command dired-omit-mode'. I didn't like the newfangled C-x prefix. Otherwise I'm mostly positive toward newfangledness). > Thanks. -- Joakim Verona
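Eli's figures quoted above (376 characters reachable from a Latin letter through canonical decompositions, 1449 once compatibility decompositions are included) can be roughly approximated with Python's standard `unicodedata` module. This is a sketch under stated assumptions, not how Emacs builds its folding table: it counts a character when its full NFD (canonical) or NFKD (compatibility) decomposition begins with an ASCII letter, and `count_latin_based` is a hypothetical helper name. The totals depend on the Unicode version bundled with Python and on this crude definition of "matched from a latin letter", so they need not reproduce the numbers in the thread.

```python
import string
import sys
import unicodedata

ASCII_LETTERS = frozenset(string.ascii_letters)


def count_latin_based(max_cp=sys.maxunicode):
    """Return (canonical, total) counts of characters whose full
    decomposition starts with an ASCII Latin letter.

    canonical: characters with an untagged (canonical) decomposition,
               e.g. U+00F1 -> "006E 0303" (n + combining tilde).
    total:     canonical ones plus those with a tagged compatibility
               decomposition, e.g. U+249C -> "<compat> 0028 0061 0029".
    """
    canonical = 0
    total = 0
    for cp in range(max_cp + 1):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch)
        if not decomp:
            continue  # character has no decomposition property
        is_compat = decomp.startswith("<")
        # Decompose recursively, so that e.g. U+01D6 (u with diaeresis
        # and macron) is traced back to a plain "u".
        form = "NFKD" if is_compat else "NFD"
        if unicodedata.normalize(form, ch)[0] in ASCII_LETTERS:
            total += 1
            if not is_compat:
                canonical += 1
    return canonical, total


if __name__ == "__main__":
    canonical, total = count_latin_based()
    print(f"canonical: {canonical}, canonical + compatibility: {total}")
```

Looking only at the first character of the decomposition is a deliberate simplification. A real folding table also has to decide what to do with multi-letter expansions such as ligatures.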
* Re: On language-dependent defaults for character-folding 2016-02-10 2:18 ` Artur Malabarba ` (2 preceding siblings ...) 2016-02-10 15:25 ` Eli Zaretskii @ 2016-02-11 0:54 ` Juri Linkov 2016-02-11 1:37 ` Óscar Fuentes 3 siblings, 1 reply; 263+ messages in thread From: Juri Linkov @ 2016-02-11 0:54 UTC To: Artur Malabarba; +Cc: Óscar Fuentes, emacs-devel > I have no idea, which is why this feature will be off by default until I > feel confident it won't get in anyone's way. How regrettable it would be to disable such a useful feature. I'm using char-folding every day a dozen times on multiple languages/scripts in Chromium, and it's a major inconvenience not to be able to use the same in Emacs. Let's not hide/postpone this feature due to an inability to reach a consensus on the default values - we could use the same defaults as in Chromium. These are sane defaults based on Unicode standards and used by millions of users. I haven't noticed any annoying matching by the default rules despite not being able to change hard-coded rules or disable char-folding. Unlike Chromium, Emacs is more extensible and customizable, thus we urgently need to provide customization, so everyone could easily add/remove char-folding rules to/from the default set.
* Re: On language-dependent defaults for character-folding 2016-02-11 0:54 ` Juri Linkov @ 2016-02-11 1:37 ` Óscar Fuentes 2016-02-12 0:50 ` Juri Linkov 0 siblings, 1 reply; 263+ messages in thread From: Óscar Fuentes @ 2016-02-11 1:37 UTC To: emacs-devel Juri Linkov <juri@linkov.net> writes: >> I have no idea, which is why this feature will be off by default until I >> feel confident it won't get in anyone's way. > > How regrettable it would be to disable such a useful feature. I'm using > char-folding every day a dozen times on multiple languages/scripts in > Chromium, and it's a major inconvenience not to be able to use the same > in Emacs. Is there something that prevents you from enabling the feature on your setup? > Let's not hide/postpone this feature due to an inability to > reach a consensus on the default values - we could use the same defaults > as in Chromium. Just checked. Chromium has the n/ñ bug. Chrome doesn't. > These are sane defaults based on Unicode standards Unicode doesn't have a say in what is correct in any given language. > and used by millions of users. Do you have statistics about Chromium users who take advantage of character folding? > I haven't noticed any annoying matching by the > default rules despite not being able to change hard-coded rules or disable > char-folding. Possibly the languages you use do not collide with naïve character composition rules, or you ignore them or simply don't care about such rules. > Unlike Chromium, Emacs is more extensible and customizable, > thus we urgently need to provide customization, so everyone could easily > add/remove char-folding rules to/from the default set. It is reasonable to expect from a serious text editor that when you search for a letter it finds that letter, not unrelated letters. With the default configuration, of course.
* Re: On language-dependent defaults for character-folding 2016-02-11 1:37 ` Óscar Fuentes @ 2016-02-12 0:50 ` Juri Linkov 2016-02-12 1:50 ` Óscar Fuentes 0 siblings, 1 reply; 263+ messages in thread From: Juri Linkov @ 2016-02-12 0:50 UTC To: Óscar Fuentes; +Cc: emacs-devel > Possibly the languages you use do not collide with naïve character > composition rules, or you ignore them or simply don't care about such > rules. Isearch shines in navigation. For example, to move point quickly to the part of your message that contains the word “naïve”, I could simply type ‘C-s naive’. Otherwise, it would take a lot of time entering the char “LATIN SMALL LETTER I WITH DIAERESIS” into the search string. This is the reason why char-folding search is so enormously useful, even though “naïve” and “naive” are different words from the formal grammatical point of view. >> Unlike Chromium, Emacs is more extensible and customizable, >> thus we urgently need to provide customization, so everyone could easily >> add/remove char-folding rules to/from the default set. > > It is reasonable to expect from a serious text editor that when you > search for a letter it finds that letter, not unrelated letters. With > the default configuration, of course. It's much safer to have a default where you are not in danger of missing important things. When a strict non-case-folding search skips a match, you don't know about this loss until you later discover the damage. With the case-folding search, you're visiting all possible matches, and when you think it finds too much, you can narrow the results by disabling this feature. This is why its counterpart case-fold-search is opt-out as well.
* Re: On language-dependent defaults for character-folding 2016-02-12 0:50 ` Juri Linkov @ 2016-02-12 1:50 ` Óscar Fuentes 2016-02-12 7:10 ` Eli Zaretskii ` (2 more replies) 0 siblings, 3 replies; 263+ messages in thread From: Óscar Fuentes @ 2016-02-12 1:50 UTC To: emacs-devel Juri Linkov <juri@linkov.net> writes: >> Possibly the languages you use do not collide with naïve character >> composition rules, or you ignore them or simply don't care about such >> rules. > > Isearch shines in navigation. My opinion is that Isearch is terrible for navigation. You may be interested in ace-jump or avy, for jumping to a point that is visible, or a plethora of terrific packages for jumping to a point that is not visible. [snip] > It's much safer to have a default where you are not in danger of missing > important things. A search that matches unrelated text is broken. Full stop. It is possible that, for whatever reason, the brokenness can be convenient for you, but enabling a feature which is convenient for some users and plain wrong for others is not reasonable. > When a strict non-case-folding search skips a match, > you don't know about this loss until you later discover the damage. > With the case-folding search, you're visiting all possible matches, ñ is not a match for n, as long as you follow the rules of the Spanish language. That's the crux of the matter. It is the same as if an English speaker searched "vow" and matched "wow". > and when you think it finds too much, you can narrow the results > by disabling this feature. This is why its counterpart case-fold-search > is opt-out as well. case-fold-search is in another category. character-folding *could* be ok as a default if it were governed by the linguistic rules expected by the user. That's not easy to implement, though, as it seems that there is controversy about some languages. Spanish is very easy in that respect.
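One way to read "governed by the linguistic rules expected by the user" is a generic diacritic-stripping fold plus a per-language list of letters that must stay distinct. The Python sketch below is hypothetical. `LANGUAGE_EXCEPTIONS`, `fold` and `folded_search` are made-up names, only a Spanish entry is filled in, and the approach is not modeled on Emacs's own character-folding implementation.

```python
import unicodedata

# Hypothetical per-language table of letters that must NOT be folded to
# their base letter. Only Spanish is filled in, as an illustration.
LANGUAGE_EXCEPTIONS = {
    "es": {"\u00f1", "\u00d1"},  # ñ and Ñ are letters in their own right
}


def fold(text, lang=None):
    """Drop combining marks from every character, except characters that
    the given language lists as distinct letters."""
    keep = LANGUAGE_EXCEPTIONS.get(lang, set())
    out = []
    # Compose first, so that "n" + U+0303 is recognized as the exempt ñ.
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed
                               if not unicodedata.combining(c)))
    return "".join(out)


def folded_search(text, pattern, lang=None):
    """True if the folded pattern occurs in the folded text."""
    return fold(pattern, lang) in fold(text, lang)


if __name__ == "__main__":
    print(folded_search("na\u00efve", "naive"))            # True: ï folds to i
    print(folded_search("a\u00f1o", "n"))                  # True: no language, ñ folds to n
    print(folded_search("a\u00f1o", "n", lang="es"))       # False: Spanish keeps ñ
    print(folded_search("ping\u00fcino", "u", lang="es"))  # True: ü still folds to u
```

This sketch folds both the pattern and the text, so searching for "ü" would also find a plain "u". That symmetric behaviour is only one possible design choice. A per-language table like this is also the kind of rule set that users would need to be able to add to or remove from.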
* Re: On language-dependent defaults for character-folding 2016-02-12 1:50 ` Óscar Fuentes @ 2016-02-12 7:10 ` Eli Zaretskii 2016-02-12 7:32 ` Óscar Fuentes 2016-02-12 23:50 ` Juri Linkov 2016-02-13 16:38 ` Marcin Borkowski 2 siblings, 1 reply; 263+ messages in thread From: Eli Zaretskii @ 2016-02-12 7:10 UTC To: Óscar Fuentes; +Cc: emacs-devel > From: Óscar Fuentes <ofv@wanadoo.es> > Date: Fri, 12 Feb 2016 02:50:20 +0100 > > ñ is not a match for n, as long as you follow the rules of the Spanish > language. Actually, it should be when ñ is in fact ñ (two characters).
* Re: On language-dependent defaults for character-folding 2016-02-12 7:10 ` Eli Zaretskii @ 2016-02-12 7:32 ` Óscar Fuentes 2016-02-12 8:44 ` Eli Zaretskii 0 siblings, 1 reply; 263+ messages in thread From: Óscar Fuentes @ 2016-02-12 7:32 UTC To: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> ñ is not a match for n, as long as you follow the rules of the >> Spanish language. > > Actually, it should be when ñ is in fact ñ (two characters). If ñ is meant to be read as ñ, as when it is found in a Spanish word, then ñ and ñ are the same for all purposes, so no match should happen. Again, composition rules are irrelevant for a knowledgeable reader of a given language. What matters is the meaning of the characters (composed or not).
* Re: On language-dependent defaults for character-folding 2016-02-12 7:32 ` Óscar Fuentes @ 2016-02-12 8:44 ` Eli Zaretskii 2016-02-12 10:03 ` Óscar Fuentes ` (2 more replies) 0 siblings, 3 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-12 8:44 UTC To: Óscar Fuentes; +Cc: emacs-devel > From: Óscar Fuentes <ofv@wanadoo.es> > Date: Fri, 12 Feb 2016 08:32:25 +0100 > > Eli Zaretskii <eliz@gnu.org> writes: > > >> ñ is not a match for n, as long as you follow the rules of the > >> Spanish language. > > > > Actually, it should be when ñ is in fact ñ (two characters). > > If ñ is meant to be read as ñ Don't you see them displayed identically in Emacs (and in any other program that correctly implements display of combining accents)? Maybe I don't really understand that "if" part. > as when it is found in a Spanish word, Display of combining accents is not language-specific. It should always happen in human-readable text. > then ñ and ñ are the same for all purposes, so no match should happen. You mean, a match should happen, right? Otherwise, I'm afraid I see no sense in this logic: IMO identical-looking text should match, or else users will kill us. If you agree that a match is TRT in these (and other similar) cases, then you should agree that _some_ form of character folding should be turned on by default. > Again, composition rules are irrelevant for a knowledgeable reader of a > given language. What matters is the meaning of the characters (composed > or not). What is "the meaning of the characters"? Can pieces of text that are displayed identically have different meaning?
* Re: On language-dependent defaults for character-folding 2016-02-12 8:44 ` Eli Zaretskii @ 2016-02-12 10:03 ` Óscar Fuentes 2016-02-12 11:11 ` Joost Kremers 2016-02-12 12:00 ` Eli Zaretskii 2016-02-13 15:32 ` Richard Stallman 2016-02-13 16:37 ` Marcin Borkowski 2 siblings, 2 replies; 263+ messages in thread From: Óscar Fuentes @ 2016-02-12 10:03 UTC To: Eli Zaretskii; +Cc: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> If ñ is meant to be read as ñ > > Don't you see them displayed identically in Emacs (and in any other > program that correctly implements display of combining accents)? > Maybe I don't really understand that "if" part. They look a bit different here. >> as when it is found in a Spanish word, > > Display of combining accents is not language-specific. It should > always happen in human-readable text. > >> then ñ and ñ are the same for all purposes, so no match should happen. > > You mean, a match should happen, right? ñ shall match ñ, but n shall match neither, from a Spaniard's POV. > Otherwise, I'm afraid I see > no sense in this logic: IMO identical-looking text should match, or > else users will kill us. Agreed, although in practice your example is not a big issue since I do expect to rarely see ñ (the composed variant) used in Spanish text. And probably not easy to implement at all for the general case (all identical-looking combinations for all languages). > If you agree that a match is TRT in these (and other similar) cases, > then you should agree that _some_ form of character folding should be > turned on by default. I see where you are coming from ;-) In my first message on this thread I said that I was ambivalent wrt the default status of this feature, before finding the n/ñ issue. Not so after. A Spaniard could also deem it useful to match ú and ü while searching for u.
See, the problem here is not character-folding itself, but how it works: a non-Spaniard could expect matching ñ while searching for n, because for him ñ is an `n' with a tilde, which is essentially the same case as the `u' example mentioned above but from the POV of someone who doesn't know Spanish. (*) [snip] * My English dictionary says: 1. tilde -- (a diacritical mark (~) placed over the letter n in Spanish to indicate a palatal nasal sound or over a vowel in Portuguese to indicate nasalization) No wonder that so many people seem to have a hard time recognizing that ñ is a letter like any other in Spanish, not just an `n' with a tilde.
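The position described here treats precomposed ñ (U+00F1) and n followed by U+0303 COMBINING TILDE as the same letter, and lets plain n match neither. That is roughly what a search does if it compares canonically normalized (NFC) text instead of raw code points. A raw code-point search, by contrast, still sees a literal n in the two-character form. The minimal Python sketch below assumes only the standard `unicodedata` module. The names `raw_find` and `nfc_find` are illustrative and unrelated to Emacs's isearch.

```python
import unicodedata

PRECOMPOSED = "a\u00f1o"   # "año" with U+00F1 LATIN SMALL LETTER N WITH TILDE
DECOMPOSED = "an\u0303o"   # "año" spelled as n + U+0303 COMBINING TILDE


def _nfc(s):
    """Canonical composition (NFC)."""
    return unicodedata.normalize("NFC", s)


def raw_find(text, pattern):
    """Plain code-point search, with no normalization at all."""
    return pattern in text


def nfc_find(text, pattern):
    """Compare NFC forms, so that canonically equivalent spellings of a
    letter match each other, but a bare n does not match ñ."""
    return _nfc(pattern) in _nfc(text)


if __name__ == "__main__":
    for name, text in (("precomposed", PRECOMPOSED), ("decomposed", DECOMPOSED)):
        print(name,
              "raw n:", raw_find(text, "n"),
              "nfc n:", nfc_find(text, "n"),
              "nfc \u00f1:", nfc_find(text, "\u00f1"))
```

Running it prints `raw n: False` for the precomposed text and `raw n: True` for the decomposed one. Both print `nfc n: False` and `nfc ñ: True`. Because the match is found in a normalized copy, its offsets no longer line up with positions in the original buffer. A real implementation would have to map them back, and this sketch ignores that.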
* Re: On language-dependent defaults for character-folding 2016-02-12 10:03 ` Óscar Fuentes @ 2016-02-12 11:11 ` Joost Kremers 2016-02-12 18:21 ` Óscar Fuentes 2016-02-12 12:00 ` Eli Zaretskii 1 sibling, 1 reply; 263+ messages in thread From: Joost Kremers @ 2016-02-12 11:11 UTC To: Óscar Fuentes; +Cc: Eli Zaretskii, emacs-devel On Fri, Feb 12 2016, Óscar Fuentes <ofv@wanadoo.es> wrote: > No wonder that so many people seem to have a hard time recognizing that > ñ is a letter like any other in Spanish, not just an `n' with a tilde. Actually, without wanting to be pedantic, but ⟨ñ⟩ (the grapheme) *is* just an ⟨n⟩ with a tilde, regardless of the language one is talking about. The reason why a native speaker of Spanish considers n and ñ to be two different letters is because they represent two different *phonemes* of the Spanish language: /n/ vs. /ɲ/. The term `letter' (as an alphabetic character) is notoriously imprecise, which is the cause of much confusion. -- Joost Kremers Life has its moments
* Re: On language-dependent defaults for character-folding 2016-02-12 11:11 ` Joost Kremers @ 2016-02-12 18:21 ` Óscar Fuentes 0 siblings, 0 replies; 263+ messages in thread From: Óscar Fuentes @ 2016-02-12 18:21 UTC To: emacs-devel Joost Kremers <joostkremers@fastmail.fm> writes: > Actually, without wanting to be pedantic, but ⟨ñ⟩ (the grapheme) *is* > just an ⟨n⟩ with a tilde, regardless of the language one is talking > about. The reason why a native speaker of Spanish considers n and ñ to > be two different letters is because they represent two different > *phonemes* of the Spanish language: /n/ vs. /ɲ/. Actually, Spaniards consider ñ to be a letter because that is what we are taught at school. That's what sets our expectations when we use text editors. > The term `letter' (as an alphabetic character) is notoriously imprecise, > which is the cause of much confusion. In Spanish, "letter" is precisely defined. We have 27 of them. `ch' and `ll' were letters in Spanish until 2010, when the Academies decided to demote them, following widespread public opinion. That will not happen to ñ anytime soon.
* Re: On language-dependent defaults for character-folding 2016-02-12 10:03 ` Óscar Fuentes 2016-02-12 11:11 ` Joost Kremers @ 2016-02-12 12:00 ` Eli Zaretskii 2016-02-12 18:42 ` Óscar Fuentes 1 sibling, 1 reply; 263+ messages in thread From: Eli Zaretskii @ 2016-02-12 12:00 UTC To: Óscar Fuentes; +Cc: emacs-devel > From: Óscar Fuentes <ofv@wanadoo.es> > Cc: emacs-devel@gnu.org > Date: Fri, 12 Feb 2016 11:03:09 +0100 > > Eli Zaretskii <eliz@gnu.org> writes: > > >> If ñ is meant to be read as ñ > > > > Don't you see them displayed identically in Emacs (and in any other > > program that correctly implements display of combining accents)? > > Maybe I don't really understand that "if" part. > > They look a bit different here. It could be an issue with your default font. Perhaps it doesn't have the precomposed glyph. > ñ shall match ñ, but n shall match neither, from a Spaniard's POV. But in the case of 2 characters, a literal n is present in the buffer, so not finding it would be a miss, don't you think? > > Otherwise, I'm afraid I see > > no sense in this logic: IMO identical-looking text should match, or > > else users will kill us. > > Agreed, although in practice your example is not a big issue since I do > expect to rarely see ñ (the composed variant) used in Spanish text. And > probably not easy to implement at all for the general case (all > identical-looking combinations for all languages). We do that by using the Unicode database, because then we are free from the need to decide whether a given diacritic can or cannot combine with a given base character. > > If you agree that a match is TRT in these (and other similar) cases, > > then you should agree that _some_ form of character folding should be > > turned on by default. > > I see where you are coming from ;-) In my first message on this thread I > said that I was ambivalent wrt the default status of this feature, > before finding the n/ñ issue. Not so after.
A Spaniard could also deem it > useful to match ú and ü while searching for u. See, the problem here is > not character-folding itself, but how it works: a non-Spaniard could > expect matching ñ while searching for n, because for him ñ is an `n' with > a tilde, which is essentially the same case as the `u' example mentioned > above but from the POV of someone who doesn't know Spanish. (*) What about finding ⒜ when searching for a, don't you want to find that? This is not specific to any language.
* Re: On language-dependent defaults for character-folding 2016-02-12 12:00 ` Eli Zaretskii @ 2016-02-12 18:42 ` Óscar Fuentes 2016-02-12 19:06 ` Eli Zaretskii 2016-02-12 19:09 ` Clément Pit--Claudel 0 siblings, 2 replies; 263+ messages in thread From: Óscar Fuentes @ 2016-02-12 18:42 UTC To: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> ñ shall match ñ, but n shall match neither, from a Spaniard's POV. > > But in the case of 2 characters, a literal n is present in the buffer, > so not finding it would be a miss, don't you think? Then you are not thinking as a Spaniard, but as someone who is versed in character representations by computers. In practice, n matching ñ (the composed one) will not be a big issue, since it will happen rarely. Same for the rest of compositions that look like ñ but are not "the" ñ. If someone complains, we can explain what the problem is and that we opted for handling such compositions as groups of characters. > What about finding ⒜ when searching for a, don't you want to find > that? This is not specific to any language. That would be nice, sometimes. If I search for (a), should it match ⒜? What if I wish to replace all occurrences of (a) with [1]? Do you really want to go down that route? But we are digressing. Eli, you are missing the point. If you wish to set Emacs defaults as per the convenience of people who think of text as a series of codes at the expense of breaking basic expectations of those who see text as... text, well, frankly, I don't think it is a good decision.
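The ⒜ example is a compatibility decomposition. In the Unicode database, U+249C PARENTHESIZED LATIN SMALL LETTER A decomposes to the three characters "(a)", so any folding based on compatibility (NFKC/NFKD) normalization also makes a search for "(a)" hit ⒜. The small Python sketch below assumes only the standard `unicodedata` module, and `compat_fold` is an illustrative name. It shows why a replace-all performed on compatibility-folded text would rewrite the ⒜ too.

```python
import unicodedata


def compat_fold(s):
    """Apply compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", s)


TEXT = "item \u249c and item (a)"  # one U+249C and one literal "(a)"

if __name__ == "__main__":
    print(unicodedata.decomposition("\u249c"))  # <compat> 0028 0061 0029
    print(TEXT.count("(a)"))                    # 1: a raw search sees only the literal one
    print(compat_fold(TEXT).count("(a)"))       # 2: a folded search sees both
    # A naive replace-all on the folded copy rewrites the parenthesized letter too:
    print(compat_fold(TEXT).replace("(a)", "[1]"))  # item [1] and item [1]
```

Folding for matching and leaving the buffer text untouched would avoid the rewrite. However, the user would still be asked about ⒜ at every match, which is the trade-off under discussion.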
* Re: On language-dependent defaults for character-folding 2016-02-12 18:42 ` Óscar Fuentes @ 2016-02-12 19:06 ` Eli Zaretskii 2016-02-12 19:28 ` Óscar Fuentes 2016-02-12 23:57 ` Juri Linkov 2016-02-12 19:09 ` Clément Pit--Claudel 1 sibling, 2 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-12 19:06 UTC To: Óscar Fuentes; +Cc: emacs-devel > From: Óscar Fuentes <ofv@wanadoo.es> > Date: Fri, 12 Feb 2016 19:42:50 +0100 > > Eli Zaretskii <eliz@gnu.org> writes: > > >> ñ shall match ñ, but n shall match neither, from a Spaniard's POV. > > > > But in the case of 2 characters, a literal n is present in the buffer, > > so not finding it would be a miss, don't you think? > > Then you are not thinking as a Spaniard, but as someone who is versed > in character representations by computers. Aren't there Spaniards who are also versed in character representations by computers? > In practice, n matching ñ (the composed one) will not be a big issue, > since it will happen rarely. Same for the rest of compositions that > look like ñ but are not "the" ñ. If someone complains, we can explain > what the problem is and that we opted for handling such compositions as > groups of characters. So you do think this, too, is not a problem? > > What about finding ⒜ when searching for a, don't you want to find > > that? This is not specific to any language. > > That would be nice, sometimes. If I search for (a), should it match ⒜? I don't know. What do you think? > What if I wish to replace all occurrences of (a) with [1]? Do you really > want to go down that route? I don't think so, no. > But we are digressing. Eli, you are missing the point. If you wish to > set Emacs defaults as per the convenience of people who think of text as > a series of codes at the expense of breaking basic expectations of those > who see text as... text, well, frankly, I don't think it is a good > decision.
I was trying to develop a dialogue which will help me and you understand where your resistance begins and where it ends. I think it's important to do that to better understand the issues, but if you don't want that, we can stop at any moment.
* Re: On language-dependent defaults for character-folding 2016-02-12 19:06 ` Eli Zaretskii @ 2016-02-12 19:28 ` Óscar Fuentes 2016-02-12 23:57 ` Juri Linkov 1 sibling, 0 replies; 263+ messages in thread From: Óscar Fuentes @ 2016-02-12 19:28 UTC To: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> > But in the case of 2 characters, a literal n is present in the buffer, >> > so not finding it would be a miss, don't you think? >> >> Then you are not thinking as a Spaniard, but as someone who is versed >> in character representations by computers. > > Aren't there Spaniards who are also versed in character > representations by computers? Maybe less than 0.1% of the population, but yes. Even those may prefer a default that works for them as Spaniards rather than a default that works for them as users familiar with text encoding. >> In practice, n matching ñ (the composed one) will not be a big issue, >> since it will happen rarely. Same for the rest of compositions that >> look like ñ but are not "the" ñ. If someone complains, we can explain >> what the problem is and that we opted for handling such compositions as >> groups of characters. > > So you do think this, too, is not a problem? Do we have resources for setting a default that works as expected by each and every user all the time? (If possible at all.) >> > What about finding ⒜ when searching for a, don't you want to find >> > that? This is not specific to any language. >> >> That would be nice, sometimes. If I search for (a), should it match ⒜? > > I don't know. What do you think? It depends. It's like `a' matching `á' but on steroids. Sometimes I'll find it convenient and sometimes inconvenient. Those cases are different from doing something that is plain wrong for a set of users and convenient for others. >> But we are digressing. Eli, you are missing the point.
If you wish to >> set Emacs defaults as per the convenience of people who think of text as >> a series of codes at the expense of breaking basic expectations of those >> who see text as... text, well, frankly, I don't think it is a good >> decision. > > I was trying to develop a dialogue which will help me and you > understand where your resistance begins and where it ends. I think > it's important to do that to better understand the issues, but if you > don't want that, we can stop at any moment. I think I have explained it many times, but here it goes again: character folding, as implemented today, might be convenient for some users, but a glaring bug for others, so its default status (in the release) should be chosen accordingly. What's so difficult to understand about that?
* Re: On language-dependent defaults for character-folding From: Juri Linkov @ 2016-02-12 23:57 UTC To: Eli Zaretskii; +Cc: Óscar Fuentes, emacs-devel > I was trying to develop a dialogue which will help me and you > understand where your resistance begins and where it ends. I think > it's important to do that to better understand the issues, but if you > don't want that, we can stop any moment. Can't we somehow use the same char-folding that is implemented in the ICU String Search Service (which is also used for search in Chromium): http://userguide.icu-project.org/collation/icu-string-search-service which supports matching of accented letters, conjoined letters, and ignorable punctuation? As described in http://userguide.icu-project.org/collation/concepts there are several levels of character matching: 1. Primary Level: differences between base characters 2. Secondary Level: Accents in the characters 3. Tertiary Level: Upper and lower case differences in characters 4. Quaternary Level: Punctuation is ignored (where e.g. snake-cased “black_bird” matches camel-cased “blackBird”) 5. Identical Level Maybe our customization could provide options to choose between all these levels?
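The first three levels in the list above can be approximated without ICU, which makes the proposed customization more concrete. The sketch below is *not* ICU's collation-based matcher. It approximates primary strength by canonically decomposing, dropping combining marks and case-folding. It approximates secondary strength by case-folding only, and tertiary strength by canonical equivalence alone. The names `fold_key` and `matches` are hypothetical. The quaternary (punctuation) and identical levels are left out.

```python
import unicodedata

def _strip_marks(s: str) -> str:
    """Canonically decompose S and drop every combining mark."""
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))

def fold_key(s: str, level: int) -> str:
    """Rough comparison key for strength LEVEL (1, 2 or 3); not ICU's algorithm."""
    if level == 1:   # primary: ignore accents and case
        return _strip_marks(s).casefold()
    if level == 2:   # secondary: ignore case, keep accents
        return unicodedata.normalize("NFC", s.casefold())
    if level == 3:   # tertiary: canonical equivalence only
        return unicodedata.normalize("NFC", s)
    raise ValueError(f"unsupported level: {level}")

def matches(needle: str, haystack: str, level: int) -> bool:
    """Substring search after mapping both strings to LEVEL keys."""
    return fold_key(needle, level) in fold_key(haystack, level)

text = "Espan\u0303a"  # "España" stored with a decomposed ñ
for level in (1, 2, 3):
    print(level, matches("espana", text, level), matches("España", text, level))
```

A user option choosing among such levels would then select which key function the search uses. Real ICU levels are defined by collation weights (UTS #10), so results will differ on many edge cases.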
* RE: On language-dependent defaults for character-folding From: Drew Adams @ 2016-02-13 0:06 UTC To: Juri Linkov, Eli Zaretskii; +Cc: Óscar Fuentes, emacs-devel > As is described in http://userguide.icu-project.org/collation/concepts > there are several levels of character matching: > > 1. Primary Level: differences between base characters > > 2. Secondary Level: Accents in the characters > > 3. Tertiary Level: Upper and lower case differences in characters > > 4. Quaternary Level: Punctuation is ignored (where e.g. snake-cased > “black_bird” matches camel-cased “blackBird”) > > 5. Identical Level > > Maybe our customization could provide options to choose > between all these levels? +1 And not just options but also toggle commands. Thanks for guiding us to consider such groups (in addition to other groupings that have been mentioned).
* Re: On language-dependent defaults for character-folding From: Eli Zaretskii @ 2016-02-13 8:49 UTC To: Juri Linkov; +Cc: ofv, emacs-devel > From: Juri Linkov <juri@linkov.net> > Cc: Óscar Fuentes <ofv@wanadoo.es>, emacs-devel@gnu.org > Date: Sat, 13 Feb 2016 01:57:33 +0200 > > Can't we somehow use the same char-folding as is implemented in > ICU String Search Service (this is also used for search in Chromium): > http://userguide.icu-project.org/collation/icu-string-search-service > that supports matching of accented letters, conjoined letters, > and ignorable punctuation. > > As is described in http://userguide.icu-project.org/collation/concepts > there are several levels of character matching: > > 1. Primary Level: differences between base characters > > 2. Secondary Level: Accents in the characters > > 3. Tertiary Level: Upper and lower case differences in characters > > 4. Quaternary Level: Punctuation is ignored (where e.g. snake-cased > “black_bird” matches camel-cased “blackBird”) > > 5. Identical Level > > Maybe our customization could provide options to choose > between all these levels? That's the final goal, yes. The current implementation is just the initial step, and it basically does just item #1. (The list above is about collation, not about searching, so the wording does not really fit the searching use case. Also, they just reiterate what the Unicode TR#10, http://unicode.org/reports/tr10/, specifies.) The implementation should really be on the C level, like the case-folding support. The current implementation isn't, and therefore has several disadvantages, some of which were already pointed out (e.g., the regexp it uses that gets exposed in some situations and causes users to be surprised).
For these and other reasons, I think we should replace the current implementation with one that's in search_buffer, driven by tables generated from the Unicode database. I also think we will be unable to move to the higher levels mentioned above without first moving the implementation into search_buffer. Volunteers are welcome to work on that. Doing this will eventually require using the data in DUCET (Default Unicode Collation Element Table) and CLDR (Common Locale Data Repository), I think, to support both the language-independent and language-dependent folding. But this is only needed for the next levels; the current level, which basically only looks at the base character, doesn't need fancy databases apart from what we already have. At the time, no one stepped forward to do this on the C level, and the current implementation was considered good enough for the first step.
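As a rough illustration of what "tables generated from the Unicode database" can mean at the current, base-character level, the Python sketch below derives a base-character → variants table from canonical decompositions. It is a toy built on stated assumptions, not Emacs's own folding table or the proposed C-level tables. `build_fold_table` is a hypothetical name. To stay fast it scans only the Latin-1 Supplement and Latin Extended-A/B ranges. It ignores compatibility decompositions and all locale tailoring (the DUCET/CLDR data mentioned above).

```python
import unicodedata
from collections import defaultdict

def build_fold_table(start: int = 0x00C0, end: int = 0x0250) -> dict:
    """Map each base character to the characters in [START, END) whose
    canonical decomposition is that base followed only by combining marks."""
    table = defaultdict(set)
    for cp in range(start, end):
        ch = chr(cp)
        decomposed = unicodedata.normalize("NFD", ch)  # full canonical decomposition
        base, marks = decomposed[0], decomposed[1:]
        if marks and all(unicodedata.combining(m) for m in marks):
            table[base].add(ch)
    return dict(table)

table = build_fold_table()
print(sorted(table["n"]))  # characters that fold to "n" at this level
print(any("\u00f8" in variants for variants in table.values()))  # ø: no decomposition
```

Letters such as ø, which Unicode does not decompose, never enter the table. Folding them would need the collation data rather than UnicodeData.txt alone.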
* RE: On language-dependent defaults for character-folding From: Drew Adams @ 2016-02-13 17:20 UTC To: Eli Zaretskii, Juri Linkov; +Cc: ofv, emacs-devel > The implementation should really be on the C level, like the > case-folding support. The current implementation isn't, and > therefore has several disadvantages some of which were already > pointed out (e.g., the regexp it uses that gets exposed in some > situations and causes users to be surprised). I would like to see a list of the disadvantages laid out clearly. In general, I prefer that things be implemented in Lisp. That leaves them far more open to Emacs users, and hence to imagination and enhancement - which can often help Emacs farther down the road. Implementation in C makes great sense in some cases, but it would help to see the detailed arguments (cases). The argument that a complex, not-user-friendly, under-the-covers regexp might sometimes get exposed to users is OK, but it is not really compelling (for me). Some users, in some cases, might well want to make use of such a regexp (e.g. tweaking it). And we might be able to find ways to not expose it for most uses. (I don't reject the messy-regexp argument. I just don't find it sufficiently compelling on its own.) > For these and other reasons, Can we see them, please? > I also think we will be unable to move to the higher levels > mentioned above without first moving the implementation into > search_buffer. How so? (Reasons.) If there are important, e.g., performance reasons for coding some functionality in C, can we at least try to limit it - do that in component pieces rather than as a monolithic take-it-or-leave-it whole?
I'm interested in maximizing what Lisp users can do with this, other things being equal (IOW, use C only for what is absolutely necessary).
* Re: On language-dependent defaults for character-folding From: Eli Zaretskii @ 2016-02-13 17:58 UTC To: Drew Adams; +Cc: ofv, emacs-devel, juri > Date: Sat, 13 Feb 2016 09:20:39 -0800 (PST) > From: Drew Adams <drew.adams@oracle.com> > Cc: ofv@wanadoo.es, emacs-devel@gnu.org > > > The implementation should really be on the C level, like the > > case-folding support. The current implementation isn't, and > > therefore has several disadvantages some of which were already > > pointed out (e.g., the regexp it uses that gets exposed in some > > situations and causes users to be surprised). > > I would like to see a list of the disadvantages laid out clearly. They have been mentioned in the discussions from the time this feature was designed until this day. I'm sorry, but I have no time for searching and summarizing them. It isn't easier for me than for anyone else, and doesn't require any specialized knowledge. > In general, I prefer that things be implemented in Lisp. > That leaves them far more open to Emacs users, and hence to > imagination and enhancement - which can often help Emacs > farther down the road. Not in this case. Search must be fast, and it must support regular expressions and complex character transformations, all of which cannot be done well in Lisp, even if we expose buffer text to Lisp, something we don't have today. > Implementation in C makes great sense in some cases, but it > would help to see the detailed arguments (cases). These arguments were already given; you will find them in the archives. > The argument that a complex, not-user-friendly, under-the-covers > regexp might sometimes get exposed to users is OK, but it is not > really compelling (for me). Some users, in some case, might well > want to make use of such a regexp (e.g. tweaking it).
Users should tweak the tables that tell Emacs how to fold characters, not the results of folding. That is how they customize case folding today (if they do), via case-tables.
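The difference between tweaking the folding *tables* and tweaking the folding *results* can be shown with a toy model. In the Python sketch below, `fold_table` and `fold_regexp` are hypothetical names, not Emacs functions. The table maps a base character to the characters that should also match it. The search regexp is derived from the table on every query, which is the same idea as the generated regexp discussed earlier in the thread, though not Emacs's actual output. A user who wants Spanish behavior edits one table entry instead of hand-editing a regexp.

```python
import re

# Hypothetical user-editable folding table: base character -> variants.
fold_table = {
    "n": {"\u00f1", "\u0144"},                      # ñ, ń
    "a": {"\u00e1", "\u00e0", "\u00e4", "\u00e5"},  # á, à, ä, å
}

def fold_regexp(query: str, table: dict) -> str:
    """Build a regexp in which each query character also matches its variants."""
    parts = []
    for ch in query:
        variants = sorted({ch} | table.get(ch, set()))
        if len(variants) == 1:
            parts.append(re.escape(ch))
        else:
            parts.append("[" + "".join(re.escape(v) for v in variants) + "]")
    return "".join(parts)

# With the default table, searching for "an" finds "añ" in "año".
print(bool(re.search(fold_regexp("an", fold_table), "a\u00f1o")))

# A Spanish-speaking user edits a copy of the table, not the generated regexp.
spanish_table = {base: set(v) for base, v in fold_table.items()}
spanish_table["n"].discard("\u00f1")
print(bool(re.search(fold_regexp("an", spanish_table), "a\u00f1o")))
```

This mirrors the case-table analogy above. The table is the customization point, and the regexp is only a derived artifact.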
* Re: On language-dependent defaults for character-folding From: John Wiegley @ 2016-02-18 19:15 UTC To: Eli Zaretskii; +Cc: ofv, juri, Drew Adams, emacs-devel Hi Eli, I see you've kept a running tally of votes for the default setting of this feature. Do you have a summary yet? Given the sheer volume of concerned responses, both for and against, my inclination is to vote OFF by default, until we have more experience and understanding. However, if the tally shows a distinct majority (at least 2/3) wanting it on by default, I'll take that into account. We can always turn it back on in a later release -- and users can always configure it at any time -- so this isn't a cliff we're driving off of. It's more a question of how much use (and thus, feedback) the feature will receive during 25.x if we turn it off by default. -- John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F 60E1 46C4 BD1A 7AC1 4BA2 http://newartisans.com
* Re: On language-dependent defaults for character-folding From: Eli Zaretskii @ 2016-02-18 20:12 UTC To: John Wiegley; +Cc: ofv, juri, drew.adams, emacs-devel > From: John Wiegley <jwiegley@gmail.com> > Cc: Drew Adams <drew.adams@oracle.com>, ofv@wanadoo.es, emacs-devel@gnu.org, juri@linkov.net > Date: Thu, 18 Feb 2016 11:15:22 -0800 > > I see you've kept a running tally of votes for the default nature of this > feature. Do you have a summary yet? I can count ;-) > Given the sheer volume of concerned response, both for and against, my > inclination is to vote OFF by default, until we have more experience and > understanding. However, if the tally shows a distinct majority (at least 2/3) > wanting it on by default, I'll take that account. I think it's too early to make the decision. The feedback has only started to accumulate, and we are nowhere near a release. What's the rush?
* Re: On language-dependent defaults for character-folding From: Lars Ingebrigtsen @ 2016-02-19 5:11 UTC To: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: > I can count ;-) Here's my vote: I think character folding is a good idea, and that it should be turned on by default if it respects the locale. If not, it should be off by default. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no
* Re: On language-dependent defaults for character-folding From: Eli Zaretskii @ 2016-02-19 8:20 UTC To: Lars Ingebrigtsen; +Cc: emacs-devel > From: Lars Ingebrigtsen <larsi@gnus.org> > Date: Fri, 19 Feb 2016 16:11:41 +1100 > > Here's my vote: I think character folding is a good idea, and that it > should be turned on by default if it respects the locale. If not, it > should be off by default. Thanks. But what does "respect the locale" mean, in practical terms? A large portion of the characters that have some decomposition, and thus will be folded when searching, belong to scripts that are not related to any language or other locale-specific attribute. What do you think should be done with them in the context of this feature?
* Re: On language-dependent defaults for character-folding From: Elias Mårtenson @ 2016-02-19 9:22 UTC To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel On 19 February 2016 at 16:20, Eli Zaretskii <eliz@gnu.org> wrote: > > From: Lars Ingebrigtsen <larsi@gnus.org> > > Date: Fri, 19 Feb 2016 16:11:41 +1100 > > > > Here's my vote: I think character folding is a good idea, and that it > > should be turned on by default if it respects the locale. If not, it > > should be off by default. > > Thanks. But what does "respect the locale" mean, in practical terms? > A large portion of the characters that have some decomposition, and > thus will be folded when searching, belong to scripts that are not > related to any language or other locale-specific attribute. What do > you think should be done with them in the context of this feature? > The Unicode character decomposition was never meant to be used to provide a feature such as character folding in Emacs. But Unicode really doesn't provide a good alternative. The standard itself states that this belongs to the realm of localisation (IIRC, it even goes as far as mentioning Swedish as a counterexample). I readily agree that using the decomposition is a clever way to get the functionality quite a long way, but in the cases where it breaks down, it does so quite spectacularly, and that's what I (and others) have been opposing. My suggestion would be to apply several levels of comparisons: 1. Check if the characters have locale-specific folding rules (for Swedish, this would be no more than 3-5 characters or so). If not: 2. Check the equivalence according to the Unicode collation charts: http://unicode.org/charts/collation/ 3.
(maybe) Use the decomposition trick. As for the per-locale exception tables mentioned in point 1, I don't know if such information is easily available. It may be possible to extract it from the localedata files from Glibc. But even if it isn't, creating one for a language should be trivial since we only need a list of character groups that should _not_ be folded, which for most languages should be a very small list (in fact, for most(?) it's probably empty). Regards, Elias
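The layered proposal above can be sketched as a per-character predicate. Locale exceptions are consulted first. Only if they do not apply does the matcher fall back to decomposition-based folding. In this Python sketch, `NO_FOLD`, `base_char` and `char_matches` are hypothetical names. The Swedish entries follow the example in the message, and the Spanish entry reflects Óscar's complaint earlier in the thread. The middle step (collation-chart equivalences) is deliberately omitted, because it would need DUCET data.

```python
import unicodedata

# Hypothetical per-locale exception tables: letters that must never fold
# to their base letter in that locale.
NO_FOLD = {
    "sv": {"\u00e5", "\u00e4", "\u00f6", "\u00c5", "\u00c4", "\u00d6"},  # å ä ö Å Ä Ö
    "es": {"\u00f1", "\u00d1"},                                          # ñ Ñ
}

def base_char(ch: str) -> str:
    """First character of CH's canonical decomposition (its base letter)."""
    return unicodedata.normalize("NFD", ch)[0]

def char_matches(query: str, text: str, locale: str) -> bool:
    """Does the search character QUERY match the buffer character TEXT?"""
    if query == text:
        return True
    # Step 1: a locale-protected letter only ever matches itself.
    if text in NO_FOLD.get(locale, set()):
        return False
    # Step 2 (collation-chart equivalences) is omitted in this sketch.
    # Step 3: decomposition-based folding to the base character.
    return base_char(text) == query

for locale in ("sv", "es", "en"):
    print(locale, char_matches("a", "\u00e4", locale), char_matches("n", "\u00f1", locale))
```

The exception lists only have to name the letters that are *not* folded. That is why they can stay short, or even empty, for many languages.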
* Re: On language-dependent defaults for character-folding From: Eli Zaretskii @ 2016-02-19 10:09 UTC To: Elias Mårtenson; +Cc: larsi, emacs-devel > Date: Fri, 19 Feb 2016 17:22:18 +0800 > From: Elias Mårtenson <lokedhs@gmail.com> > Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org> > > The Unicode character decomposition was never meant to be used to provide a feature such as character > folding in Emacs. That's not true. Canonical equivalence, which is encoded in canonical decompositions, is a must for searching. Otherwise, what looks the same on display will not be found, and will look like a bug. See the example I gave with ñ and ñ (the latter one is 2 characters). So using decomposition is not a trick; it simply uses the same data that determines equivalence of character sequences. > My suggestion would be to apply several levels of comparisons: > > 1. Check if the characters have locale-specific folding rules (for Swedish, this would be no more than 3-5 > characters or so). If not: > 2. Check the equivalence according to the Unicode collation charts: http://unicode.org/charts/collation/ > 3. (maybe) Use the decomposition trick 2 and 3 are the same as what we do already, AFAICT. (Collation charts describe ordering, which is irrelevant for searching; other than that, you will see that Emacs already implements the data shown in http://unicode.org/charts/collation/.) As for the locale-specific parts: using that will only DTRT if we assume that the majority of searches are done in buffers holding text in the locale's language. Is that a good assumption?
We are talking about a multilingual Emacs, in an age of global communications, where you can have conversations with someone on the other side of the world, or read text that combines several languages in the same buffer. Do we really want to go back to the l10n days, when there was ever only one locale that was interesting -- the current one? I wonder. > As for the per-locale exception tables mentioned in point 1, I don't know if such information is easily available. It is; Unicode provides it. We just haven't imported it yet. > It may be possible to extract it from the localedata files from Glibc. But even if it isn't, creating one for a > language should be trivial since we only need a list of character groups that should _not_ be folded, which for > most languages should be a very small list (in fact, for most(?) it's probably empty). It's more complex than that, but patches are welcome, of course. Note that the prerequisite for anything more complicated and elaborate than what we have now is to re-implement character-folding on the C level, inside search.c functions. The current implementation is at its limits already. I tried to convince the interested people to do this in C to begin with, but couldn't, and the feature was important enough to have even in its current implementation.
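The canonical-equivalence point in this message can be checked directly. ñ stored as one code point (U+00F1) and as n followed by U+0303 COMBINING TILDE are different strings but canonically equivalent. The decomposition field in UnicodeData.txt is what records this. A minimal Python sketch using only the standard library follows. `canonically_equal` is a hypothetical helper that compares NFC forms.

```python
import unicodedata

precomposed = "\u00f1"  # ñ as a single code point
decomposed = "n\u0303"  # n followed by COMBINING TILDE

def canonically_equal(a: str, b: str) -> bool:
    """True if A and B are canonically equivalent (same NFC form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(precomposed == decomposed)                   # False: different code points
print(canonically_equal(precomposed, decomposed))  # True: same character
print(unicodedata.decomposition(precomposed))      # 006E 0303
```

A search honoring canonical equivalence should find either form when the user types either form. Whether typing a plain n should also find them is the separate question debated in the rest of the thread.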
* Re: On language-dependent defaults for character-folding From: Elias Mårtenson @ 2016-02-19 10:51 UTC To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel On 19 February 2016 at 18:09, Eli Zaretskii <eliz@gnu.org> wrote: > > The Unicode character decomposition was never meant to be used to > provide a feature such as character > > folding in Emacs. > > That's not true. Canonical equivalence, which is encoded in canonical > decompositions, is a must for searching. Otherwise, what looks the > same on display will not be found, and will look like a bug. See the > example I gave with ñ and ñ (the latter one is 2 characters). > Of course you have to use the decomposition algorithms to ensure that the precomposed and decomposed variations of the same character compare equal. This is, however, different from using the decomposition to decompose a character and then using the base character as the thing to match against. The latter is what Emacs is doing today, as far as I understand. > 2 and 3 are the same as we do already, AFAICT. (Collation charts > describe ordering, which is irrelevant for searching; other than that, > you will see that Emacs already implements the data shown in > http://unicode.org/charts/collation/.) > The collation charts also describe equivalence. If you look at the Latin collation chart, for example (http://unicode.org/charts/collation/chart_Latin.html), you will see that the characters are grouped. These are the equivalences I'm referring to. Now, I note that on these charts, U+0061 LATIN SMALL LETTER A and U+2C65 LATIN SMALL LETTER A WITH STROKE compare as different characters, and the latter does not have a decomposition. Should this also be addressed?
> As for the locale-specific parts: using that will only DTRT if we > assume that the majority of searches are done in buffers holding text > in locale's language. Is that a good assumption? My opinion is that the default search behaviour should depend primarily on the locale of the entire Emacs session, i.e. the locale of the user starting the application. I'm not disagreeing that allowing a buffer-local locale to override this behaviour is a good idea, but as a Swedish speaker I really see å, ä and a as completely separate things, even if the language of the buffer that I am editing happens to be English. The equivalence of these characters is the odd behaviour here, and the one that should be enabled explicitly. Also, if I happen to be editing a Spanish document (I don't speak Spanish) I would find equivalence of ñ and n to be incredibly useful, even though Óscar would grind his teeth at it. :-) > We are talking > about a multilingual Emacs, in an age of global communications, where > you can have conversations with someone on the other side of the > world, or read text that combines several languages in the same > buffer. Do we really want to go back to the l10n days, when there was > ever only one locale that was interesting -- the current one? I > wonder. > Actually, I think so. This is because search equivalence is inherently a local thing. The behaviour of search is more tied to a user's preference than to the locale of the given buffer, in most cases. At least that's my opinion. The bike shed can have many colours. > It is, Unicode provides it. We just didn't import it yet. > It does? I was looking for such tables, but didn't find any. Do you have a link? > It's more complex than that, but patches are welcome, of course.
> Having spent the better part of the day trying to solve a C++ design problem that I had originally hand-waved as being trivial, I know what you mean… > Note that the prerequisite for anything more complicated and elaborate > than what we have now is to re-implement character-folding on the C > level, inside search.c functions. The current implementation is at > its limits already. I tried to convince the interested people to do > this in C to be gin with, but couldn't, and the feature was important > enough to have even in its current implementation. > I'm not going to offer to do this until I'm sure that I can have the copyright assignment done. But I am interested in it. Regards, Elias
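The distinction drawn in this message is between canonical equivalence and matching on the base character of a decomposition, and it is easy to see side by side. In the Python sketch below, `canon_key` compares strings up to canonical equivalence. `base_fold_key` decomposes and drops combining marks, which approximates the base-character matching attributed here to the current implementation. Both names are hypothetical. The last lines confirm the observation that U+2C65 has no decomposition, so decomposition-based folding cannot fold it to a.

```python
import unicodedata

def canon_key(s: str) -> str:
    """Canonical equivalence only: precomposed and decomposed forms agree."""
    return unicodedata.normalize("NFC", s)

def base_fold_key(s: str) -> str:
    """Base-character folding: decompose, then drop combining marks."""
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))

word = "ni\u00f1o"  # niño, with a precomposed ñ
print(canon_key("nin\u0303o") == canon_key(word))    # True: same word, other encoding
print(canon_key("nino") == canon_key(word))          # False: n is not ñ
print(base_fold_key("nino") == base_fold_key(word))  # True: ñ folded to n

stroke_a = "\u2c65"  # ⱥ LATIN SMALL LETTER A WITH STROKE
print(repr(unicodedata.decomposition(stroke_a)))     # '' (no decomposition)
print(base_fold_key(stroke_a) == "a")                # False
```

The first key is what nobody in the thread objects to. The second is the behavior whose default is under dispute.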
* Re: On language-dependent defaults for character-folding From: Eli Zaretskii @ 2016-02-19 11:46 UTC To: Elias Mårtenson; +Cc: larsi, emacs-devel > Date: Fri, 19 Feb 2016 18:51:47 +0800 > From: Elias Mårtenson <lokedhs@gmail.com> > Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org> > > > The Unicode character decomposition was never meant to be used to provide a feature such as > character > > folding in Emacs. > > That's not true. Canonical equivalence, which is encoded in canonical > decompositions, is a must for searching. Otherwise, what looks the > same on display will not be found, and will look like a bug. See the > example I gave with ñ and ñ (the latter one is 2 characters). > > Of course you have to use the decomposition algorithms to ensure that the precomposed and decomposed > variations of the same character compares equal. Then you agree that _some_ form of character-folding should be turned on by default? > This is, however, different from using the decomposition to to decompose a character and then using the > base character as the thing to match against. The latter is what Emacs is doing today, as far as I understand. Please describe in more detail why you think what Emacs does today is not what it should do. It's possible we have a miscommunication here. For example, if the buffer includes ñ (2 characters), should "C-s n" find the n in it? > 2 and 3 are the same as we do already, AFAICT. (Collation charts > describe ordering, which is irrelevant for searching; other than that, > you will see that Emacs already implements the data shown in > http://unicode.org/charts/collation/.) > > The collation charts also describe equivalence.
That equivalence is encoded in the decomposition data that is part of UnicodeData.txt, which Emacs uses for character-folding. > If you look at the latin collation chart for example > (http://unicode.org/charts/collation/chart_Latin.html) you will see that the characters are grouped. These are > the equivalences I'm referring to. Yes. And if you look at the entries of the equivalent characters in UnicodeData.txt, you will see that they have decompositions, which is what Emacs uses for searching when character-folding is in effect. > Now, I note that on these charts, U+0061 LATIN SMALL LETTER A and U+2C65 LATIN SMALL LETTER A > WITH STROKE compares as different characters, and the latter does not have a decomposition. Should this > also be addressed? Maybe so, but given the controversy even about what we do now, which is a subset, I doubt that extending what we do now is a wise move. > As for the locale-specific parts: using that will only DTRT if we > assume that the majority of searches are done in buffers holding text > in locale's language. Is that a good assumption? > > My opinion is that the default search behaviour should depend primarily on the locale of the entire Emacs > session. I.e. the locale of the user starting the application. I'm not disagreeing that allowing a buffer-local locale > override this behaviour is a good idea, but as a Swedish speaker I really see å, ä and a as completely > separate things, even if the language of the buffer that I am editing happens to be English. The equivalence of > these characters is the odd behaviour here, and the one that should be enabled explicitly. > > Also, if I happen to be editing a Spanish document (I don't speak Spanish) I would find equivalence of ñ and n > to be incredibly useful, even though Óscar would grind his teeth at it. :-) So you are in fact making two contradicting statements here.
Indeed, the locale in which Emacs started says almost nothing about the documents being edited, nor even about the user's preferences: it is easy to imagine a user whose "native" locale is X starting Emacs in another locale. > We are talking > about a multilingual Emacs, in an age of global communications, where > you can have conversations with someone on the other side of the > world, or read text that combines several languages in the same > buffer. Do we really want to go back to the l10n days, when there was > ever only one locale that was interesting -- the current one? I > wonder. > > Actually, I think so. This is because the search equivalence is inherently a local thing. Being a multi-lingual environment, Emacs has no real notion of the locale. > It is, Unicode provides it. We just didn't import it yet. > > It does? I was looking for such tables, but didn't find it. Do you have a link? Look for DUCET and its tailoring data. These should be a good starting point: http://www.unicode.org/Public/UCA/latest/ http://cldr.unicode.org/ > Note that the prerequisite for anything more complicated and elaborate > than what we have now is to re-implement character-folding on the C > level, inside search.c functions. The current implementation is at > its limits already. I tried to convince the interested people to do > this in C to be gin with, but couldn't, and the feature was important > enough to have even in its current implementation. > > I'm not going to offer to do this until I'm sure that I can have the copyright assignment done. But I am > interested in it. Thanks.
* Re: On language-dependent defaults for character-folding From: Elias Mårtenson @ 2016-02-19 13:37 UTC To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel On 19 February 2016 at 19:46, Eli Zaretskii <eliz@gnu.org> wrote: > > Of course you have to use the decomposition algorithms to ensure that > the precomposed and decomposed > > variations of the same character compares equal. > > Then you agree that _some_ form of character-folding should be turned > on by default? > Yes. > > This is, however, different from using the decomposition to to decompose > a character and then using the > > base character as the thing to match against. The latter is what Emacs > is doing today, as far as I understand. > > Please describe in more detail why do you think what Emacs does today > is not what you think it should do. It's possible we have a > miscommunication here. > The main issue for me is that it matches things that should not be matched. A secondary (minor) issue is that some things that should be matched are not (see my example with U+2C65). > For example, if the buffer includes ñ (2 characters), should "C-s n" > find the n in it? > That depends on the locale of the user. However, from the point of view of a user, there should be no visible difference between the precomposed and the decomposed variants: they are the exact same character. This is in line with Unicode recommendations (https://en.wikipedia.org/wiki/Unicode_equivalence). Note: I know that it's possible that I am wrong about this and that Unicode actually _has_ said that the equivalence tables can be used for this purpose (i.e. decompose and only use the primary character).
If that is the case, I'd be interested to see a reference to that, but I will still be of the same opinion that doing so will result in broken behaviour for a certain class of user. Thus, if I am Spanish, I will _not_ want any of those to match "n". If I'm Swedish I will likely want both of them to match "n". > That equivalence is encoded in the decomposition data that is part of > UnicodeData.txt which Emacs uses for character-folding. > The equivalence tables explain that the precomposed character U+00F1 is equivalent to the specific sequence U+006E U+0303. That is all they say. They do not say that ñ is a variation of n. It's an instruction for how to construct a given character. The decompositions are used in the normalisation forms to ensure that the two variants are treated equally (such as the two alternative representations of ñ that we have been discussing). > > If you look at the latin collation chart for example > > (http://unicode.org/charts/collation/chart_Latin.html) you will see > that the characters are grouped. These are > > the equivalences I'm referring to. > > Yes. And if you look at the entries of the equivalent characters in > UnicodeData.txt, you will see there they have decompositions, which is > what Emacs uses for searching when character-folding is in effect. > Yes, and this is where the crux of our disagreement lies, I think. I previously referred to using the decompositions as a guide to character equivalence as a "trick". I stand by this, since this is not the purpose of the decompositions. The best resources Unicode provides for that purpose (to my knowledge) are the collation charts that I mentioned previously (http://unicode.org/charts/collation/). > > Now, I note that on these charts, U+0061 LATIN SMALL LETTER A and U+2C65 > LATIN SMALL LETTER A > > WITH STROKE compares as different characters, and the latter does not > have a decomposition. Should this > > also be addressed?
> > > Maybe so, but given the controversy even about what we do now, which > is a subset, I'd doubt extending what we do now is a wise move. > I was just asking to understand your position better. > > As for the locale-specific parts: using that will only DTRT if we > > assume that the majority of searches are done in buffers holding text > > in locale's language. Is that a good assumption? > > > > My opinion is that the default search behaviour should depend primarily > on the locale of the entire Emacs > > session. I.e. the locale of the user starting the application. I'm not > disagreeing that allowing a buffer-local locale > > to override this behaviour is a good idea, but as a Swedish speaker I > really see å, ä and a as completely > > separate things, even if the language of the buffer that I am editing > happens to be English. The equivalence of > > these characters is the odd behaviour here, and the one that should be > enabled explicitly. > > > > Also, if I happen to be editing a Spanish document (I don't speak > Spanish) I would find equivalence of ñ and n > > to be incredibly useful, even though Óscar would grind his teeth at it. > :-) > > So you are in fact making two contradicting statements here. Interesting. I have re-read what I wrote and I really don't see myself holding two contradictory statements. Perhaps you think that I am both against folding and not, at the same time. If that's the case, let me try to rephrase: I like the idea of character folding. But if it's implemented incorrectly (by my standards, of course), I would rather not have it at all, since it will be highly annoying. > Indeed, > the locale in which Emacs started says almost nothing about the > documents being edited, nor even about the user's preferences: it is > easy to imagine a user whose "native" locale is X starting Emacs in > another locale. > Yes. I am fully aware of this. But so be it.
Having applications work differently depending on the locale of the environment the application was started in is nothing new. > > We are talking > > about a multilingual Emacs, in an age of global communications, where > > you can have conversations with someone on the other side of the > > world, or read text that combines several languages in the same > > buffer. Do we really want to go back to the l10n days, when there was > > ever only one locale that was interesting -- the current one? I > > wonder. > > > > Actually, I think so. This is because the search equivalence is > inherently a local thing. > > Being a multi-lingual environment, Emacs has no real notion of the > locale. > Perhaps it should? > > It is, Unicode provides it. We just didn't import it yet. > > > > It does? I was looking for such tables, but didn't find them. Do you have > a link? > > Look for DUCET and its tailoring data. These should be a good > starting point: > > http://www.unicode.org/Public/UCA/latest/ > http://cldr.unicode.org/ > Those are the decomposition charts, and don't actually say anything about equivalence outside of providing a canonical form for precomposed characters, as was discussed above. Regards, Elias ^ permalink raw reply [flat|nested] 263+ messages in thread
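The distinction Elias draws between canonical equivalence (two encodings of the same ñ) and base-letter folding (n matching ñ) can be checked against the Unicode data itself. Here is a minimal sketch in Python using only the standard `unicodedata` module. The helper name `canonically_equal` is illustrative, and this is not how Emacs implements anything:

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Return True if a and b are canonically equivalent (same NFD form)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


precomposed = "\u00f1"  # ñ as a single code point, U+00F1
decomposed = "n\u0303"  # U+006E followed by U+0303 COMBINING TILDE

print(precomposed == decomposed)                   # False: different code points
print(canonically_equal(precomposed, decomposed))  # True: the same character
print(canonically_equal("n", precomposed))         # False: n is still not ñ
print(unicodedata.decomposition(precomposed))      # 006E 0303
print(repr(unicodedata.decomposition("\u2c65")))   # '' (U+2C65 has none)
```

Normalization makes the two spellings of ñ equal while keeping n distinct from ñ. Treating n and ñ as a match, which is what character folding does, is a separate step layered on top of normalization.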
* Re: On language-dependent defaults for character-folding 2016-02-19 13:37 ` Elias Mårtenson @ 2016-02-19 19:18 ` Eli Zaretskii 2016-02-20 5:22 ` Elias Mårtenson 0 siblings, 1 reply; 263+ messages in thread From: Eli Zaretskii @ 2016-02-19 19:18 UTC (permalink / raw) To: Elias Mårtenson; +Cc: larsi, emacs-devel > Date: Fri, 19 Feb 2016 21:37:26 +0800 > From: Elias Mårtenson <lokedhs@gmail.com> > Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org> > > For example, if the buffer includes ñ (2 characters), should "C-s n" > find the n in it? > > That depends on the locale of the user. There are use cases that are independent of the locale. For example, imagine that you need to find all the literal n characters in a buffer because you are investigating a bug in the program that produced that buffer. As an Emacs user, I need to do such jobs almost every day. I don't want the results affected by the locale. > However, from the point of a user, there should not be a visible > difference between the precomposed and the composed variants are the > exact same character. What if the user wants to find all those places where what looks like ñ is actually the decomposed sequence n + U+0303? Wouldn't that be a valid use case? > Note: I know that it's possible that I am wrong about this and that Unicode actually _has_ said that the > equivalence tables can be used for this purpose (I.e. decompose and only use the primary character). If that is > the case, I'd be interested to see a reference to that, but I will still be of the same opinion that doing so will > result in broken behaviour for a certain class of user. The reference you are looking for is the Unicode Standard itself. It says to use the normalization forms; see, for example, section 5.16 there. > The equivalence tables explains that the precomposed character U+00F1 is equivalent to the specific > sequence U+006E U+0303. That is all it says. It does not say that ñ is a variation of n.
It's an instruction how > to construct a given character. Every character-folding search implementation decomposes characters before matching them. So does Emacs. We didn't invent this, and we certainly didn't use the decompositions where they weren't supposed to be used. It's not a trick, it's what everyone else does to do the job. See the ICU library, for example. > The decompositions are used in the normalisation forms to ensure that the two variants are treated equally > (such as the two alternative representations of ñ that we have been discussing). Yes, and any character-folding search uses normalization forms as well. > Indeed, > the locale in which Emacs started says almost nothing about the > documents being edited, nor even about the user's preferences: it is > easy to imagine a user whose "native" locale is X starting Emacs in > another locale. > > Yes. I am fully aware of this. But so be it. Having applications work differently depending on the locale of the > environment the application was started in is nothing new. It's not new. It's old. We should move on to more general environments that support multiple languages. Emacs is such an environment. The old l10n paradigms are fundamentally incompatible with that. > Being a multi-lingual environment, Emacs has no real notion of the > locale. > > Perhaps it should? That'd be a step backward, IMO. > > It is, Unicode provides it. We just didn't import it yet. > > > > It does? I was looking for such tables, but didn't find it. Do you have a link? > > Look for DUCET and its tailoring data. These should be a good > starting point: > > http://www.unicode.org/Public/UCA/latest/ > http://cldr.unicode.org/ > > Those are the decomposition charts, and don't actually say anything about equivalence outside of providing a > canonical form for precomposed characters, as was discussed above. Strange, I always thought the data was there. Perhaps you should ask a question on the Unicode mailing list, then. 
^ permalink raw reply [flat|nested] 263+ messages in thread
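The decompose-then-match approach Eli describes can be approximated by decomposing to NFD and discarding combining marks. This is a simplified, symmetric sketch: it folds both the text and the query, unlike Emacs's regexp-based character-fold.el, and the function names are hypothetical:

```python
import unicodedata


def base_fold(s: str) -> str:
    """Decompose s (NFD) and drop every combining mark, keeping base characters."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


def folded_contains(text: str, query: str) -> bool:
    """Symmetric simplification: fold both sides, then do a plain substring test."""
    return base_fold(query) in base_fold(text)


print(base_fold("Hermès"))                          # Hermes
print(base_fold("n\u0303") == base_fold("\u00f1"))  # True: both become "n"
print(base_fold("Søren Łódź"))                      # Søren Łodz
print(folded_contains("Hermès", "Hermes"))          # True
print(folded_contains("Søren", "Soren"))            # False
```

The sketch also shows the limitation raised later in the thread: ø and Ł have no canonical decomposition, so they pass through folding unchanged. NFD also splits precomposed Hangul syllables into jamo, which is one reason a real implementation needs more care than this.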
* Re: On language-dependent defaults for character-folding 2016-02-19 19:18 ` Eli Zaretskii @ 2016-02-20 5:22 ` Elias Mårtenson 2016-02-20 6:31 ` Lars Ingebrigtsen 2016-02-20 9:21 ` Eli Zaretskii 0 siblings, 2 replies; 263+ messages in thread From: Elias Mårtenson @ 2016-02-20 5:22 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel On 20 February 2016 at 03:18, Eli Zaretskii <eliz@gnu.org> wrote: > > Date: Fri, 19 Feb 2016 21:37:26 +0800 > > From: Elias Mårtenson <lokedhs@gmail.com> > > Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org> > > > > > For example, if the buffer includes ñ (2 characters), should "C-s n" > > find the n in it? > > > > That depends on the locale of the user. > > There are use cases that are independent of the locale. For example, > imagine that you need to find all the literal n characters in a buffer > because you are investigating a bug in the program that produced that > buffer. As an Emacs user, I need to do such jobs almost every day. I > don't want the results affected by the locale. > Of course I'm not saying that you should not be able to do this. All I'm advocating here is sensible defaults. > > However, from the point of a user, there should not be a visible > > difference between the precomposed and the composed variants are the > > exact same character. > > What if the user wants to find all those places where what looks like > ñ is actually ñ? Wouldn't that be a valid use case? > It would, but certainly a very rare one. For all intents and purposes the two forms are (should be) equivalent. > The reference you are looking for is the Unicode Standard itself. It > says to use the normalization forms, see for example section 5.16 > there. > I have read that section before, and I have now read it again. The section certainly talks about searching that ignores diacritics, but does not discuss a method to do so.
There is also a reference to TR29, but it refers to grapheme clusters, which would be a very strange way to do character folding (Koreans would be very confused). > Every character-folding search implementation decomposes characters > before matching them. So does Emacs. We didn't invent this, and we > certainly didn't use the decompositions where they weren't supposed to > be used. It's not a trick, it's what everyone else does to do the > job. See the ICU library, for example. > Every example you have given so far discusses the decomposition equivalence, i.e. the fact that the two variants of ñ are the same. Section 5.16 discusses the _concept_ of allowing n and ñ to match, but the mechanism to do so is locale-dependent. This is what Unicode says, and that is what I say. My position is simply that the default (if absolutely nothing else overrides it) should be chosen to take the locale of the user into account. > > The decompositions are used in the normalisation forms to ensure that > the two variants are treated equally > > (such as the two alternative representations of ñ that we have been > discussing). > > Yes, and any character-folding search uses normalization forms as > well. > Yes, but that's not what normalisation forms were designed to do. Again (I really apologise for repeating myself, I'm starting to sound like a troll and that is truly not my intention), the purpose of normalisation forms is to ensure that the two variants of ñ compare the same. They were not designed to provide a mechanism to allow n to compare equal to ñ. > > Yes. I am fully aware of this. But so be it. Having applications work > differently depending on the locale of the > > environment the application was started in is nothing new. > > It's not new. It's old. We should move on to more general > environments that support multiple languages. Emacs is such an > environment. The old l10n paradigms are fundamentally incompatible > with that.
> Sure, but doesn't it make sense to fall back to the user's default if the buffer does not have an overriding locale? > > Being a multi-lingual environment, Emacs has no real notion of the > > locale. > > > > Perhaps it should? > > That'd be a step backward, IMO. > As opposed to having no concept of locale at all? I just have to disagree with you on that. > Strange, I always thought the data was there. Perhaps you should ask > a question on the Unicode mailing list, then. > That's a good idea actually. Thank you for the suggestion. I'm reading that mailing list, and I will post a question there. Regards, Elias ^ permalink raw reply [flat|nested] 263+ messages in thread
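Elias's suggestion (use a buffer-local locale when one is set, and otherwise the locale Emacs was started in) amounts to a simple precedence rule. Below is a hypothetical Python sketch. Treating LC_COLLATE as the relevant category and reducing a locale name to its language code are assumptions made for illustration, and `fold_locale` is not an existing Emacs function:

```python
from typing import Mapping, Optional


def fold_locale(buffer_locale: Optional[str],
                env: Mapping[str, str]) -> Optional[str]:
    """Return the language whose folding exceptions should apply.

    Hypothetical precedence: a buffer-local setting wins; otherwise use the
    session locale, checking LC_ALL, then LC_COLLATE, then LANG (the usual
    POSIX order for one category).  None means "no locale-specific rules".
    """
    if buffer_locale:
        return buffer_locale
    for var in ("LC_ALL", "LC_COLLATE", "LANG"):
        value = env.get(var, "")
        if value:
            # "sv_SE.UTF-8" -> "sv"; "C" and "POSIX" carry no language.
            lang = value.split(".")[0].split("@")[0].split("_")[0]
            return None if lang in ("C", "POSIX") else lang
    return None


session = {"LANG": "sv_SE.UTF-8"}
print(fold_locale(None, session))  # sv: Swedish exceptions apply everywhere
print(fold_locale("es", session))  # es: a Spanish buffer overrides the session
print(fold_locale(None, {"LC_ALL": "C", "LANG": "sv_SE.UTF-8"}))  # None
```

In practice `env` would be `os.environ`. Passing it in explicitly keeps the precedence rule easy to test.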
* Re: On language-dependent defaults for character-folding 2016-02-20 5:22 ` Elias Mårtenson @ 2016-02-20 6:31 ` Lars Ingebrigtsen 2016-02-20 9:18 ` Elias Mårtenson 2016-02-20 10:34 ` Eli Zaretskii 2016-02-20 9:21 ` Eli Zaretskii 1 sibling, 2 replies; 263+ messages in thread From: Lars Ingebrigtsen @ 2016-02-20 6:31 UTC (permalink / raw) To: Elias Mårtenson; +Cc: Eli Zaretskii, emacs-devel Elias Mårtenson <lokedhs@gmail.com> writes: > Every example you have given so far discusses the decomposition > equivalence. I.e. the fact that the who variants of ñ are the > same. Section 5.16 discuss the _concept_ of allowing n and ñ match > similarly but the mechanism to do so is locale-dependent. This is what > Unicode says, and that is what I say. Yes. Here are my thoughts (I was sitting on a plane today): It seems to me that we're considering using the Unicode decomposition rules for "variant detection" because it's what we have. But this doesn't allow people to say `C-s l' to find ł or `C-s o' to find ø, and this would obviously be something that many people would find helpful. So the Unicode decomposition rules only get us halfway there. On the other hand, they go too far for other users, who absolutely do not want `C-s o' to find ø, but would be really glad if `C-s hermes' would find "Hermés" (or is it "Hermès"? I can't even type that in on this keyboard). Emacs is awesome. We should aim to make this extremely useful feature awesome. So: How many characters are we really talking about? Unicode is big and scary, but this only applies to alphabetical scripts, right? That is, all the Latin-like scripts, and... possibly Greek/Hebrew/Cyrillic? I don't know? But if we only consider the Latin scripts for a moment, there aren't more than a few hundred Unicode code points that we care about. Basically all the old iso-8859-foos from around Europe. And what we want is a way for people with normal keyboards (they have a-z in Latin alphabet countries) to search for variants.
So: That sounds like an evening's work. (defvar *character-variants* '((?a ?á ?å ?ä ...) (?o ?ø ?ö ?ó ...) ...)) Everything that somebody says "that's kinda an a, right?" goes on there. Then we have something like: (define-locale-exception :no ?a ?å) There would be few of these exceptions per locale. The Scandinavian countries would have three each, and Denmark's and Norway's would be the same. That bit is more than an evening, but is something that people would enjoy submitting exceptions to, I think. And then we just look up the locale, create the mapping when we type `C-s', and there we are. An awesome, very useful feature that would annoy nobody, and that should be on by default. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 263+ messages in thread
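Lars's proposal (one shared variants table plus a short list of per-locale exceptions) could look roughly like this in Python. The table contents follow his examples, plus ñ from earlier in the thread. The exception entries and the names `CHARACTER_VARIANTS`, `LOCALE_EXCEPTIONS` and `variants_for` are hypothetical and illustrative, not complete:

```python
from typing import Dict, Optional, Set, Tuple

# Shared table: everything somebody would call "kinda an a" goes here.
CHARACTER_VARIANTS: Dict[str, Set[str]] = {
    "a": {"á", "å", "ä"},
    "o": {"ø", "ö", "ó"},
    "n": {"ñ"},
}

# Per-locale exceptions: (base, variant) pairs that must NOT fold together.
# Illustrative entries only.
LOCALE_EXCEPTIONS: Dict[str, Set[Tuple[str, str]]] = {
    "no": {("a", "å"), ("o", "ø")},
    "sv": {("a", "å"), ("a", "ä"), ("o", "ö")},
    "es": {("n", "ñ")},
}


def variants_for(base: str, lang: Optional[str]) -> Set[str]:
    """Characters that a query character `base` should match under `lang`."""
    excluded = LOCALE_EXCEPTIONS.get(lang, set()) if lang else set()
    allowed = {v for v in CHARACTER_VARIANTS.get(base, set())
               if (base, v) not in excluded}
    return {base} | allowed


print(sorted(variants_for("n", "es")))  # ['n']
print(sorted(variants_for("n", "sv")))  # ['n', 'ñ']
print(sorted(variants_for("a", "sv")))  # ['a', 'á']
```

Keeping the exceptions as a separate, small per-locale set means the shared table never has to be duplicated per language. That matches Lars's estimate of only a few exceptions per locale.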
* Re: On language-dependent defaults for character-folding 2016-02-20 6:31 ` Lars Ingebrigtsen @ 2016-02-20 9:18 ` Elias Mårtenson 2016-02-20 10:34 ` Eli Zaretskii 1 sibling, 0 replies; 263+ messages in thread From: Elias Mårtenson @ 2016-02-20 9:18 UTC (permalink / raw) To: Lars Magne Ingebrigtsen; +Cc: Eli Zaretskii, emacs-devel I think your message illustrates an opinion that is not only mine, in that I am not against the idea of character folding. I mean, if I were, I'd just ignore this discussion and just turn the feature off. What I want, and by the looks of things, other people too, is to actually have this feature. I just don't want it to be broken, and today it is broken because it's been implemented based on incorrect assumptions. On 20 Feb 2016 14:32, "Lars Ingebrigtsen" <larsi@gnus.org> wrote: > It seems to me that we're considering using the Unicode decomposition > rules for "variant detection" because it's what we have. But this > doesn't allow people to say `C-s l' to find ł or `C-s o' to find ø, and > this would obviously be something that many people would find helpful. The Unicode collation charts <http://unicode.org/charts/collation/> do place ø in the "o" category. Eli said in an earlier message that the collation charts were consulted, but when I test it, that doesn't seem to be the case. The Unicode character collation charts are the best generic solution that Unicode gives us. The proposal you put forward below seems very much like what I proposed earlier; having the locale-dependent rules determine any exceptions and then fall back to a generic method. The question is what that generic method should be. The current trick of decomposing and using the first character of the decomposition is not good and breaks down very quickly. Clearly the collation charts should be consulted instead, but this is not enough.
I could spend quite some time discussing all the issues that I can think of (to get an idea of it, look up how Korean and Devanagari work, as well as the concept of "grapheme clusters"). > So the Unicode decomposition rules only get us halfway there. On the > other hand, they go to far for other users, who absolutely do not want > `C-s o' to find ø, but would be really glad if `C-s hermes' would find > "Hermés" (or is it "Hermès"? I can't even type > So: How many characters are we really talking about? Unicode is big and > scary, but this only applies to alphabetical scripts, right? That is, > all the Latin-like scripts, and... possibly Greek/Hebrew/Cyrillic? I > don't know? Cyrillic has these issues. Also, most of the accented characters in Cyrillic are historical and not used today. Therefore having this feature in Cyrillic would most definitely be useful. > But if we only consider the Latin scripts for a moment, there aren't > more than a few hundred Unicode points that we care about. Basically > all the old iso-8859-foos from around Europe. And what we want is a way > for people with normal keyboards (they have a-z in Latin alphabet > countries) to search for variants. It's more than that, because it's not just single characters we're talking about but also combinations. Of course, for European languages this can be handled by comparing only the base character, but in other languages this is a much more complex issue. That said, I agree with you on your proposed approach. > That bit is more than an evening, but is something that people would > enjoy submitting exceptions to, I think. You can count me in. :-) > And then we just look up the locale, create the mapping when we type > `C-s', and there we are. An awesome, very useful feature that would > annoy nobody, and that should be on by default. That would be amazing. Regards, Elias ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-20 6:31 ` Lars Ingebrigtsen 2016-02-20 9:18 ` Elias Mårtenson @ 2016-02-20 10:34 ` Eli Zaretskii 2016-02-21 2:51 ` Lars Ingebrigtsen 2016-02-21 12:44 ` Richard Stallman 1 sibling, 2 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-20 10:34 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: lokedhs, emacs-devel > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel <emacs-devel@gnu.org> > Date: Sat, 20 Feb 2016 17:31:48 +1100 > > It seems to me that we're considering using the Unicode decomposition > rules for "variant detection" because it's what we have. No, we use decompositions because that's how equivalent strings are to be compared and mapped/folded. > But this doesn't allow people to say `C-s l' to find ł or `C-s o' to > find ø, and this would obviously be something that many people would > find helpful. > > So the Unicode decomposition rules only get us halfway there. Yes, the current implementation is just a first step. > On the other hand, they go to far for other users, who absolutely do > not want `C-s o' to find ø, but would be really glad if `C-s hermes' > would find "Hermés" (or is it "Hermès"? I can't even type that in > on this keyboard). Which is why this is toggle-able. > (defvar *character-variants* > '((?a ?á ?å ?ä ...) > (?o ?ø ?ö ?ó ...) > ...)) > > Everything that somebody says "that's kinda an a, right?" goes on there. The above won't support finding decomposed sequences as in á (there are 2 characters here, they are just displayed as one). I hope it's agreed that it is imperative for us to support finding such decomposed sequences (and we already do, under the current character-folding default). There are also more complicated cases like ǖ and ǖ (3 characters), where there are several diacritics which can be in either order, and we still have to match them, because they look identical on display. 
We currently don't support that, but we should do that in the future, and the decomposition data supports that. It is, of course, possible to support this without normalization, by having all those combinations in the database you proposed. But why should we bother creating and maintaining such a database (and updating it whenever a new Unicode version is released), when one is already available in data that we already read into Emacs? So we currently implement this by using the decomposition information in the Unicode database. Also, what would be the algorithm for searching using the data you propose? If you want to use regexps, then the data should already be in the form of regexps, I think. And I expect the regexp to look very similar to what we currently construct in character-fold.el. So what are we really arguing about here? Is it about a feature that will allow exempting specific decompositions from the search? If so, I don't think it would be hard to do that with the current implementation, using just the locale-exception data (which should be much smaller). If that will make everyone happier, we can do this now, if we are sure we won't have another round of prolonged dispute about that. > And then we just look up the locale, create the mapping when we type > `C-s', and there we are. An awesome, very useful feature that would > annoy nobody, and that should be on by default. But it doesn't pass the simplest test above, so it really isn't good enough. Btw, this was already discussed in the past, before Artur sat down to implement this stuff. You may wish to re-read those discussions to see the broader picture. ^ permalink raw reply [flat|nested] 263+ messages in thread
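Eli's two requirements can both be expressed in a generated regexp. The first is matching decomposed sequences such as n followed by U+0303. The second is exempting locale-specific pairs using only a small exception set. The following Python sketch derives the precomposed variants from canonical decompositions. It is not the code in character-fold.el. Restricting the scan to a few Latin blocks is an assumption made to keep it small, and `fold_pattern` is a hypothetical name:

```python
import re
import unicodedata
from typing import AbstractSet, Dict, Set

# Assumption: only scan a few Latin blocks to keep the sketch small.
LATIN_RANGES = [(0x00C0, 0x024F), (0x1E00, 0x1EFF)]
COMBINING_MARKS = "[\u0300-\u036f]*"  # Combining Diacritical Marks block


def build_variants() -> Dict[str, Set[str]]:
    """Map each base letter to precomposed characters that decompose to it."""
    table: Dict[str, Set[str]] = {}
    for lo, hi in LATIN_RANGES:
        for cp in range(lo, hi + 1):
            ch = chr(cp)
            nfd = unicodedata.normalize("NFD", ch)
            # Keep only "base + combining marks" decompositions.
            if len(nfd) > 1 and all(unicodedata.combining(m) for m in nfd[1:]):
                table.setdefault(nfd[0], set()).add(ch)
    return table


VARIANTS = build_variants()


def fold_pattern(query: str, excluded: AbstractSet[str] = frozenset()) -> str:
    """Build a regexp in which each query character also matches its variants.

    `excluded` holds precomposed characters a locale does not want folded;
    their decomposed spellings are blocked with a negative lookahead.
    """
    parts = []
    for ch in query:
        variants = VARIANTS.get(ch, set())
        allowed = sorted(v for v in variants if v not in excluded)
        banned_tails = sorted(unicodedata.normalize("NFD", v)[1:]
                              for v in variants if v in excluded)
        guard = "".join("(?!" + re.escape(t) + ")" for t in banned_tails)
        # Base letter, optionally followed by combining marks (decomposed form).
        alternatives = [re.escape(ch) + guard + COMBINING_MARKS]
        alternatives += [re.escape(v) for v in allowed]
        parts.append("(?:" + "|".join(alternatives) + ")")
    return "".join(parts)


text = "nino ni\u00f1o nin\u0303o"
print(len(re.findall(fold_pattern("nino"), text)))                # 3
print(re.findall(fold_pattern("nino", excluded={"\u00f1"}), text))  # ['nino']
```

With an empty exception set, the query "nino" finds "nino" and both spellings of "niño". With ñ exempted, as a Spanish locale might request, only the literal "nino" matches. This illustrates Eli's point that exemptions need only the small exception data.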
* Re: On language-dependent defaults for character-folding 2016-02-20 10:34 ` Eli Zaretskii @ 2016-02-21 2:51 ` Lars Ingebrigtsen 2016-02-21 6:28 ` Elias Mårtenson 2016-02-21 16:25 ` Eli Zaretskii 2016-02-21 12:44 ` Richard Stallman 1 sibling, 2 replies; 263+ messages in thread From: Lars Ingebrigtsen @ 2016-02-21 2:51 UTC (permalink / raw) To: Eli Zaretskii; +Cc: lokedhs, emacs-devel Eli Zaretskii <eliz@gnu.org> writes: > The above won't support finding decomposed sequences as in á (there > are 2 characters here, they are just displayed as one). They are displayed as two characters in this Emacs (current Ubuntu, Emacs git master). :-) > I hope it's agreed that it is imperative for us to support finding > such decomposed sequences (and we already do, under the current > character-folding default). Yes. > It is, of course, possible to support this without normalization, by > having all those combinations in the database you proposed. But why > should we bother creating and maintaining such a database (and > updating it whenever a new Unicode version is released), when one is > already available in data that we already read into Emacs? So we > currently implement this by using the decomposition information in the > Unicode database. If that database gives us all that, then I'm all for using that database instead of creating our own, of course. But why doesn't C-s o find ø, and C-s l find ł then? -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 263+ messages in thread
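An earlier message in the thread raised words with several diacritics that can be typed in different orders. Canonical ordering in normalization makes the order irrelevant only for marks with different combining classes. Here is a short Python check with the standard `unicodedata` module; the helper name `same_text` is illustrative:

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Canonical equivalence test: compare the NFD forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Dot below (class 220) and dot above (class 230): order does not matter.
print(same_text("d\u0323\u0307", "d\u0307\u0323"))  # True

# ǖ (U+01D6) is u + diaeresis + macron; both marks have class 230.
print(unicodedata.combining("\u0308"), unicodedata.combining("\u0304"))  # 230 230
print(same_text("\u01d6", "u\u0308\u0304"))  # True: the canonical order
print(same_text("\u01d6", "u\u0304\u0308"))  # False: swapped same-class marks
```

In ǖ, the diaeresis and the macron share combining class 230. Normalization therefore keeps the two orders distinct, and a search would need explicit handling to treat them alike.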
* Re: On language-dependent defaults for character-folding 2016-02-21 2:51 ` Lars Ingebrigtsen @ 2016-02-21 6:28 ` Elias Mårtenson 2016-02-21 8:14 ` Achim Gratz ` (2 more replies) 2016-02-21 16:25 ` Eli Zaretskii 1 sibling, 3 replies; 263+ messages in thread From: Elias Mårtenson @ 2016-02-21 6:28 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: Eli Zaretskii, emacs-devel On 21 February 2016 at 10:51, Lars Ingebrigtsen <larsi@gnus.org> wrote: > If that database gives us all that, then I'm all for using that database > instead of creating our own, of course. But why doesn't C-s o find ø, > and C-s l find ł then? Because under the Unicode decomposition rules, ø is not decomposable. I can't explain why that is the case (probably because there is no reason to have a combining /. After all, the only languages that use ø are languages that use it as a character of its own). On a related note, I would expect a search for ö to match ø. As would you, I guess? In the thread on the Unicode mailing list, the recommendation seems to be to use the CLDR (http://cldr.unicode.org/). Of course, this assumes there is a locale, but the choice of locale can easily be made customisable (with the default being the user's locale). Another poster on the same thread mentioned that the CLDR doesn't go all the way, but adding a set of exceptions on top of it shouldn't be hard. In any case, the result would be significantly better than what is implemented now. Regards, Elias ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-21 6:28 ` Elias Mårtenson @ 2016-02-21 8:14 ` Achim Gratz 2016-02-23 16:56 ` Eli Zaretskii 0 siblings, 1 reply; 263+ messages in thread From: Achim Gratz @ 2016-02-21 8:14 UTC (permalink / raw) To: emacs-devel Elias Mårtenson writes: > Because under the Unicode decomposition rules, ø is not decomposable. I > can't explain why that is the case (probably because there is no reason to > have a combining /. After all, the only languages that use ø are languages > that use it as a character of its own). AFAIK, for combining characters to be composable/decomposable the glyphs must not overlap. This is the same issue as with the Polish »ł« to the best of my knowledge. In other words, Unicode composition/decomposition rules tell you more about the glyph construction than they do about useful strategies to search for multiple characters. The idea of using the base character of the canonical decomposition in the search might still yield a useful shortcut in most cases, but I'm not sure it is correct in all languages even when that decomposition exists and, as the examples show, there are cases where the non-decomposed character has to be treated specially. Regards, Achim. -- +<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+ SD adaptations for Waldorf Q V3.00R3 and Q+ V3.54R2: http://Synth.Stromeko.net/Downloads.html#WaldorfSDada ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-21 8:14 ` Achim Gratz @ 2016-02-23 16:56 ` Eli Zaretskii 0 siblings, 0 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-23 16:56 UTC (permalink / raw) To: Achim Gratz; +Cc: emacs-devel > From: Achim Gratz <Stromeko@nexgo.de> > Date: Sun, 21 Feb 2016 09:14:18 +0100 > > Elias Mårtenson writes: > > Because under the Unicode decomposition rules, ø is not decomposable. I > > can't explain why that is the case (probably because there is no reason to > > have a combining /. After all, the only languages that use ø are languages > > that use it as a character of its own). > > AFAIK, for combining characters to be composable/decomposable the glyphs > must not overlap. This is the same issue as with the polish »ł« to the > best of my knowledge. The definitive answer is here, for those interested: http://www.unicode.org/mail-arch/unicode-ml/y2016-m02/0106.html > In other words, unicode composition/decomposition rules tell you more > about the glyph construction than they do about useful strategies to > search for multiple characters. That conclusion is too radical, IMO. You will see in the above message that the criterion you describe was just a means for the UTC to draw a line somewhere, i.e. it was an ad-hoc rule more than anything else. > The idea of using the base character of the canonical decomposition > in the search might still yield a useful shortcut in most cases, but > I'm not sure it is correct in all languages even when that > decomposition exists and, as the examples show, there are cases > where the non-decomposed character has to be treated specially. Language-specific tailoring is indeed needed for best results, but the language-independent decompositions have their place. 
E.g., you will see in the Unicode collation database (UCA) a file named decomps.txt that is basically a list of decompositions from UnicodeData.txt with additions specifically for collation, searching, and matching (including ł, btw). Which tells me that the decomposition data in UnicodeData.txt is a good basis for these features, it is not just about glyph constructions. ^ permalink raw reply [flat|nested] 263+ messages in thread
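Eli describes decompositions from UnicodeData.txt, augmented with extra entries for searching. That approach can be sketched as a lookup that first tries the canonical decomposition and then a small supplementary table. The supplement below covers the ø and ł examples from this thread. It is hypothetical and does not reproduce the contents of the UCA decomps.txt file:

```python
import unicodedata

# Hypothetical supplement for letters that have no canonical
# decomposition in UnicodeData.txt (the thread's ø and ł examples).
EXTRA_BASES = {"ø": "o", "Ø": "O", "ł": "l", "Ł": "L"}


def base_letter(ch: str) -> str:
    """Base letter of ch: canonical decomposition first, then the supplement."""
    nfd = unicodedata.normalize("NFD", ch)
    if len(nfd) > 1 and all(unicodedata.combining(m) for m in nfd[1:]):
        return nfd[0]
    return EXTRA_BASES.get(ch, ch)


for ch in "ñåøł":
    print(ch, repr(unicodedata.decomposition(ch)), base_letter(ch))
# ñ '006E 0303' n
# å '0061 030A' a
# ø '' o
# ł '' l
```

The decomposition data handles most accented letters on its own. The supplement is only needed for the comparatively few letters that Unicode encodes without a decomposition.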
* Re: On language-dependent defaults for character-folding 2016-02-21 6:28 ` Elias Mårtenson 2016-02-21 8:14 ` Achim Gratz @ 2016-02-21 10:05 ` Lars Ingebrigtsen 2016-02-21 11:01 ` Elias Mårtenson 2016-02-21 16:31 ` Eli Zaretskii 2 siblings, 1 reply; 263+ messages in thread From: Lars Ingebrigtsen @ 2016-02-21 10:05 UTC (permalink / raw) To: Elias Mårtenson; +Cc: Eli Zaretskii, emacs-devel Elias Mårtenson <lokedhs@gmail.com> writes: > On a related note, I would expect a search for ö to match ø. As would you, I > guess? No, I wouldn't. :-) Actually, I wouldn't expect anything other than the 26 first letters of the alphabet to match variants. It's like it's fine if you're typing in lower case characters for them to match upper case, too, but if you've bothered to type an upper case character, then you probably don't want lower case characters to match. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-21 10:05 ` Lars Ingebrigtsen @ 2016-02-21 11:01 ` Elias Mårtenson 2016-02-21 16:02 ` Eli Zaretskii 2016-02-22 1:58 ` Lars Ingebrigtsen 0 siblings, 2 replies; 263+ messages in thread From: Elias Mårtenson @ 2016-02-21 11:01 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: Eli Zaretskii, emacs-devel On 21 February 2016 at 18:05, Lars Ingebrigtsen <larsi@gnus.org> wrote: > Elias Mårtenson <lokedhs@gmail.com> writes: > > > On a related note, I would expect a search for ö to match ø. As would > you, I > > guess? > > No, I wouldn't. :-) Actually, I wouldn't expect anything other than > the 26 first letters of the alphabet to match variants. > All right, but at least in Sweden we often write Danish and Norwegian names using ø and æ, so for us we definitely want to fold those into ö and ä. That was what I was referring to. I.e. the former are definitely variants of the latter. In fact, there is an argument to be made for "ü" to be a variant of "y" as well, even though it's very rare (pretty much limited to a single word: "Müsli"). > It's like it's fine if you're typing in lower case characters for them > to match upper case, too, but if you've bothered to type an upper case > character, then you probably don't want lower case characters to match. This is how Emacs behaves today, is it not? Regards, Elias ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-21 11:01 ` Elias Mårtenson @ 2016-02-21 16:02 ` Eli Zaretskii 2016-02-22 1:58 ` Lars Ingebrigtsen 1 sibling, 0 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-21 16:02 UTC (permalink / raw) To: Elias Mårtenson; +Cc: larsi, emacs-devel > Date: Sun, 21 Feb 2016 19:01:06 +0800 > From: Elias Mårtenson <lokedhs@gmail.com> > Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel <emacs-devel@gnu.org> > > It's like it's fine if you're typing in lower case characters for them > to match upper case, too, but if you've bothered to type an upper case > character, then you probably don't want lower case characters to match. > > This is how Emacs behaves today, is it not? Yes. It's called "asymmetric search". ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding
From: Lars Ingebrigtsen @ 2016-02-22 1:58 UTC
To: Elias Mårtenson; +Cc: Eli Zaretskii, emacs-devel

Elias Mårtenson <lokedhs@gmail.com> writes:

> It's like it's fine if you're typing in lower case characters for them
> to match upper case, too, but if you've bothered to type an upper case
> character, then you probably don't want lower case characters to match.
>
> This is how Emacs behaves today, is it not?

Yes, and that's my point. I'd expect character folding when doing
searches to work in an analogous fashion: If I type `C-s é', I would be
surprised if it found "e", but not the other way around.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no

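The asymmetric rule Lars describes can be sketched in a few lines of Python. This is a minimal illustration, not how Emacs implements char folding: it assumes a "decoration" means a Unicode combining mark that canonical decomposition (NFD) can split off, and the helper names `strip_marks`, `char_matches` and `asymmetric_find` are hypothetical. Case folding is deliberately left out.

```python
import unicodedata


def strip_marks(ch: str) -> str:
    """Canonically decompose ch (NFD) and drop any combining marks."""
    return "".join(c for c in unicodedata.normalize("NFD", ch)
                   if not unicodedata.combining(c))


def char_matches(pattern_ch: str, text_ch: str) -> bool:
    """Asymmetric rule: an undecorated pattern character matches any
    decorated form of itself; a decorated one matches only itself."""
    if strip_marks(pattern_ch) == pattern_ch:
        return strip_marks(text_ch) == pattern_ch
    return pattern_ch == text_ch


def asymmetric_find(pattern: str, text: str) -> int:
    """Index of the first match in the NFC form of text, or -1.
    Letters without a canonical decomposition (such as ø) never fold."""
    p = unicodedata.normalize("NFC", pattern)
    t = unicodedata.normalize("NFC", text)
    for start in range(len(t) - len(p) + 1):
        window = t[start:start + len(p)]
        if all(char_matches(pc, tc) for pc, tc in zip(p, window)):
            return start
    return -1


print(asymmetric_find("e", "caf\u00e9"))  # 3: a plain e finds é
print(asymmetric_find("\u00e9", "cafe"))  # -1: é does not find a plain e
```

Because ø has no canonical decomposition, this sketch never folds o to ø; that behaviour would need an explicit table, which is exactly the language-dependent part discussed below.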
* Re: On language-dependent defaults for character-folding
From: Elias Mårtenson @ 2016-02-22 2:34 UTC
To: Lars Ingebrigtsen; +Cc: Eli Zaretskii, emacs-devel

On 22 February 2016 at 09:58, Lars Ingebrigtsen <larsi@gnus.org> wrote:

> Elias Mårtenson <lokedhs@gmail.com> writes:
>
> > It's like it's fine if you're typing in lower case characters for them
> > to match upper case, too, but if you've bothered to type an upper case
> > character, then you probably don't want lower case characters to match.
> >
> > This is how Emacs behaves today, is it not?
>
> Yes, and that's my point. I'd expect character folding when doing
> searches to work in an analogous fashion: If I type `C-s é', I would be
> surprised if it found "e", but not the other way around.

But you are Danish, are you not? As such, I would have thought that when
you search for ø, you would want to find a Swedish ö? (this is the
inverse of the natural Swedish behaviour).

Regards,
Elias

* Re: On language-dependent defaults for character-folding
From: Lars Ingebrigtsen @ 2016-02-22 2:48 UTC
To: Elias Mårtenson; +Cc: Eli Zaretskii, emacs-devel

Elias Mårtenson <lokedhs@gmail.com> writes:

> But you are Danish, are you not?

Almost. Norwegian. :-)

> As such, I would have thought that when you search for ø, you would
> want to find a Swedish ö? (this is the inverse of the natural Swedish
> behaviour).

No, I think that would be weird behaviour, and is not something that
I ever wished would happen.

* Re: On language-dependent defaults for character-folding
From: Werner LEMBERG @ 2016-02-22 6:13 UTC
To: larsi; +Cc: eliz, lokedhs, emacs-devel

>> As such, I would have thought that when you search for ø, you would
>> want to find a Swedish ö? (this is the inverse of the natural
>> Swedish behaviour).
>
> No, I think that would be weird behaviour, and is not something that
> I ever wished would happen.

Well, being Austrian, I would like to have a full equivalence of ø to
ö while searching in German data...

Werner

* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-22 18:03 UTC
To: Werner LEMBERG; +Cc: larsi, lokedhs, eliz, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Well, being Austrian, I would like to have a full equivalence of ø to
  > ö while searching in German data...

In what use case would that make a difference, and how?
ø is not normally used in German, right?

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.

* Re: On language-dependent defaults for character-folding
From: Werner LEMBERG @ 2016-02-22 18:27 UTC
To: rms; +Cc: larsi, lokedhs, eliz, emacs-devel

> > Well, being Austrian, I would like to have a full equivalence of
> > ø to ö while searching in German data...
>
> In what use case would that make a difference, and how?

For example, the word `Øre' is usually written `Öre' in German (and
this is true for essentially all words containing ø), so it would be
good if a search for the latter finds the former and vice versa.

> ø is not normally used in German, right?

It is not used in the German language, but today there is a tendency
in German speaking countries to use the original spelling in foreign
words. However, during history many words were also `germanized' by
adapting the spelling to German (i.e., becoming loan words), and here
only German characters are used.

In many cases accents were lost during the conversion to loan words;
for example, a quite common name in Germany and Austria is `Dvorak',
with the original Czech spelling being `Dvořák'.

Werner

* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-22 18:01 UTC
To: Lars Ingebrigtsen; +Cc: eliz, lokedhs, emacs-devel

Lars, would you ever want any sort of folding between ö and ø?

Would you want to use my proposed setting where folding occurs only
between letters with and without an accent, and never folding between
related letters such as o and ø? If you use that setting, then
ö and ø will also never fold. Thus, you won't need to have any preference
about how folding should treat ö and ø, when users do enable it.

* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-22 19:06 UTC
To: rms; +Cc: larsi, lokedhs, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: lokedhs@gmail.com, eliz@gnu.org, emacs-devel@gnu.org
> Date: Mon, 22 Feb 2016 13:01:26 -0500
>
> Lars, would you ever want any sort of folding between ö and ø?
>
> Would you want to use my proposed setting where folding occurs only
> between letters with and without an accent, and never folding between
> related letters such as o and ø? If you use that setting, then
> ö and ø will also never fold. Thus, you won't need to have any preference
> about how folding should treat ö and ø, when users do enable it.

Some minimal amount of folding will nevertheless be necessary even in
asymmetric mode, in order to find character sequences produced by
decomposing characters like ö into o and the combining mark ̈. That's
because these two characters when juxtaposed (ö) look identical to the
precomposed character on most displays, so we should by default find
such decomposed sequences even when the search string includes the
precomposed character.

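The precomposed/decomposed distinction Eli describes can be demonstrated with Python's standard `unicodedata` module. The sketch below assumes that canonical equivalence (Unicode NFC normalization) is the right notion of "looks identical" here; `find_canonical` is a hypothetical helper, not an Emacs function.

```python
import unicodedata

precomposed = "\u00f6"  # ö as one code point (LATIN SMALL LETTER O WITH DIAERESIS)
decomposed = "o\u0308"  # o followed by U+0308 COMBINING DIAERESIS

# Different code point sequences, so a naive comparison fails...
print(precomposed == decomposed)                                # False
# ...but both have the same canonical (NFC) form.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True


def find_canonical(needle: str, haystack: str) -> int:
    """Search after NFC-normalizing both strings.
    Note: the returned index refers to the normalized haystack."""
    return unicodedata.normalize("NFC", haystack).find(
        unicodedata.normalize("NFC", needle))


print(find_canonical("\u00f6", "Bo\u0308rse"))  # 1
```

A real editor would also have to map the match back to buffer positions, since normalization can change the string length.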
* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-23 17:43 UTC
To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

  > Some minimal amount of folding will nevertheless be necessary even in
  > asymmetric mode, in order to find character sequences produced by
  > decomposing characters like ö into o and the combining mark ̈. That's
  > because these two characters when juxtaposed (ö) look identical to the
  > precomposed character on most displays, so we should by default find
  > such decomposed sequences even when the search string includes the
  > precomposed character.

That is interesting. It means we need several levels of folding:

* Different appearances of the same letter+decorations:
as a single code point, or as a composition.

* Identical-looking distinct code points (Latin a and Cyrillic a).

* The same letter with different decorations (o and ö in English).

* Equivalent letters (ö and ø in Swedish).

* Non-equivalent letters modified from a common base (o and ö in Swedish).

The first level is language-independent and should be handled
symmetrically, with each folding group as an equivalence class.

Is there any need, ever, to disable the first level?
Perhaps it would be good to enable that all the time.

The second level is also language-independent. Does anyone ever
want to turn it off?

The other levels are language-specific, and the user might want to
enable or disable them. When enabled, the user might want them
handled symmetrically or asymmetrically.

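One way to picture the levels listed above is as separate tables, each flagged as language-specific or not and as symmetric or asymmetric. The Python sketch below is purely hypothetical (the `FoldLevel` class, the `equivalents` function and all table contents are illustrative, not from Emacs) and covers only the first, third and fourth levels. It performs a single lookup pass, with no transitive closure across levels.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FoldLevel:
    """One level of folding. table maps a plain form to the variants it
    may match; all tables below are hypothetical examples."""
    name: str
    language_specific: bool
    symmetric: bool
    table: dict[str, set[str]] = field(default_factory=dict)


def equivalents(ch: str, enabled: list[FoldLevel]) -> set[str]:
    """Strings a search for ch should match under the enabled levels."""
    result = {ch}
    for level in enabled:
        for plain, variants in level.table.items():
            group = {plain} | variants
            # Asymmetric levels fold only from the plain form outward;
            # symmetric levels treat the whole group as an equivalence class.
            if ch == plain or (level.symmetric and ch in variants):
                result |= group
    return result


composition = FoldLevel("composition", False, True, {"\u00f6": {"o\u0308"}})
english_decorations = FoldLevel("decorations (en)", True, False,
                                {"o": {"\u00f6", "\u00f3", "\u00f2"}})
swedish_equivalents = FoldLevel("equivalent letters (sv)", True, True,
                                {"\u00f6": {"\u00f8"}})

print(sorted(equivalents("\u00f8", [composition, swedish_equivalents])))
```

With English decorations enabled, searching for o finds ö, but searching for ö does not find o, because that level is asymmetric.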
* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-23 18:14 UTC
To: rms; +Cc: larsi, lokedhs, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, lokedhs@gmail.com, emacs-devel@gnu.org
> Date: Tue, 23 Feb 2016 12:43:56 -0500
>
> That is interesting. It means we need several levels of folding:
>
> * Different appearances of the same letter+decorations:
> as a single code point, or as a composition.
>
> * Identical-looking distinct code points (Latin a and Cyrillic a).

This one is a very specialized feature needed only in some marginal
use cases (like looking for the so-called "confusables" -- characters
that look the same and could be used for deception, e.g. in URLs).

> * The same letter with different decorations (o and ö in English).
>
> * Equivalent letters (ö and ø in Swedish).

Not just letters -- sequences of characters. For example, å vs aa in
Danish, or ﬃ vs ffi.

> Is there any need, ever, to disable the first level?

One could imagine a use case when you want to find only precomposed
characters, not their decomposed equivalents. But it should be rare
indeed.

> The other levels are language-specific, and the user might want to
> enable or disable them.

Not all of them are language-specific. Some are valid in any
language.

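Eli's two sequence examples behave differently under Unicode normalization, which the following Python sketch illustrates. The ligature ﬃ (U+FB03) has a compatibility decomposition, so NFKC already maps it to "ffi". By contrast, å vs aa is a Danish spelling convention that Unicode does not encode, so it needs a language table. `DANISH_SEQUENCES` and `fold_danish` are hypothetical, and the naive string replacement would misfire on words where "aa" is not an å.

```python
import unicodedata

# U+FB03 LATIN SMALL LIGATURE FFI has a compatibility decomposition,
# so NFKC normalization already maps it to the three letters "ffi".
print(unicodedata.normalize("NFKC", "\ufb03") == "ffi")  # True

# å (U+00E5) has only a canonical decomposition (a + ring above), and
# NFKC recomposes it, so Unicode alone never relates it to "aa".
print(unicodedata.normalize("NFKC", "\u00e5") == "\u00e5")  # True

# A hypothetical Danish table supplies the sequence-level equivalence.
DANISH_SEQUENCES = {"aa": "\u00e5", "Aa": "\u00c5", "AA": "\u00c5"}


def fold_danish(s: str) -> str:
    """Sketch only: NFKC-normalize, then rewrite aa/Aa/AA as å/Å."""
    s = unicodedata.normalize("NFKC", s)
    for seq, letter in DANISH_SEQUENCES.items():
        s = s.replace(seq, letter)
    return s


print(fold_danish("Aalborg") == fold_danish("\u00c5lborg"))  # True
```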
* Re: On language-dependent defaults for character-folding
From: Yuri Khan @ 2016-02-23 20:24 UTC
To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, lokedhs, rms@gnu.org, Emacs developers

On Wed, Feb 24, 2016 at 12:14 AM, Eli Zaretskii <eliz@gnu.org> wrote:

>> * Identical-looking distinct code points (Latin a and Cyrillic a).
>
> This one is a very specialized feature needed only in some marginal
> use cases (like looking for the so-called "confusables" -- characters
> that look the same and could be used for deception, e.g. in URLs).

When looking for confusables, you don’t want to fold. You want to make
letters of different scripts stand out, e.g. by font-locking.

* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-25 12:11 UTC
To: Yuri Khan; +Cc: eliz, lokedhs, larsi, emacs-devel

  > When looking for confusables, you don’t want to fold. You want to make
  > letters of different scripts stand out, e.g. by font-locking.

That might be a good feature, but the devil is in the details.
Would you like to discuss possible details here?

Meanwhile, I don't think it has to be one or the other.
It might be good to do both.

It might be difficult to design a convention to distinguish
Latin a and Cyrillic a with fonts _all the time_. So here's an idea:
when you search for Latin a and it finds Cyrillic a, it could put a special
font or color (this tty has no fonts) on the Cyrillic a
to show it matched as a confusable. Likewise, if you search for Cyrillic a
and it finds Latin a, it would put that same font on the Latin a.

This needs just one font or color -- to indicate a confusable in search.

* Re: On language-dependent defaults for character-folding
From: Yuri Khan @ 2016-02-25 14:57 UTC
To: rms@gnu.org; +Cc: Eli Zaretskii, lokedhs, Lars Ingebrigtsen, Emacs developers

On Thu, Feb 25, 2016 at 6:11 PM, Richard Stallman <rms@gnu.org> wrote:

> > When looking for confusables, you don’t want to fold. You want to make
> > letters of different scripts stand out, e.g. by font-locking.
>
> That might be a good feature, but the devil is in the details.
> Would you like to discuss possible details here?

No.

> Meanwhile, I don't think it has to be one or the other.
> It might be good to do both.

What specific user scenario do you want to solve by folding
Latin/Greek/Cyrillic confusables?

> It might be difficult to design a convention to distinguish
> Latin a and Cyrillic a with fonts _all the time_.

There is no reason to distinguish them _all the time_. For convenient
reading, they should in fact be indistinguishable. The reader knows
from the surrounding context which letters are Latin and which are
Cyrillic.

It is when you are proof-reading text that it becomes important to
distinguish Latin and Cyrillic, to check that you don’t have a stray
Cyrillic letter within an English word, or vice-versa.

(For that matter, in this same mode it becomes important to
distinguish various kinds of Unicode spaces, hyphen/en dash/em
dash/minus/figure dash, degree sign/masculine ordinal, empty
set/Latin letter o with stroke, etc. A trained eye and a specially
designed font goes a long way.)

> So here's an idea:
> when you search for Latin a and it finds Cyrillic a, it could put a special
> font or color (this tty has no fonts) on the Cyrillic a
> to show it matched as a confusable. Likewise, if you search for Cyrillic a
> and it finds Latin a, it would put that same font on the Latin a.
>
> This needs just one font or color -- to indicate a confusable in search.

That’s assuming we *do* want to fold confusables. I’d like to know a
use case first.

* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-26 20:21 UTC
To: Yuri Khan; +Cc: eliz, lokedhs, larsi, emacs-devel

  > > Meanwhile, I don't think it has to be one or the other.
  > > It might be good to do both.

  > What specific user scenario do you want to solve by folding
  > Latin/Greek/Cyrillic confusables?

If I saw an 'a' in the buffer, I'd like searching for 'a' to find it.
Of course, I will search for a Latin 'a'. If the char in the buffer
is a Cyrillic 'a', I want isearch to find that too.

  > It is when you are proof-reading text that it becomes important to
  > distinguish Latin and Cyrillic, to check that you don’t have a stray
  > Cyrillic letter within an English word, or vice-versa.

If I want to check which kind of a it is, I can do that with C-x =.

It would never occur to me to test "Is this really a Cyrillic a"
by searching for a Latin a and seeing if that finds it.

* Re: On language-dependent defaults for character-folding
From: Yuri Khan @ 2016-02-27 5:47 UTC
To: rms@gnu.org
Cc: Eli Zaretskii, Elias Mårtenson, Lars Ingebrigtsen, Emacs developers

On Sat, Feb 27, 2016 at 2:21 AM, Richard Stallman <rms@gnu.org> wrote:

> > What specific user scenario do you want to solve by folding
> > Latin/Greek/Cyrillic confusables?
>
> If I saw an 'a' in the buffer, I'd like searching for 'a' to find it.
> Of course, I will search for a Latin 'a'. If the char in the buffer
> is a Cyrillic 'a', I want isearch to find that too.

You don’t usually see an “а” in isolation. In normal text, you see at
least a word, and usually a sentence. Those give you enough context to
know it’s not a Latin “a”.

> > It is when you are proof-reading text that it becomes important to
> > distinguish Latin and Cyrillic, to check that you don’t have a stray
> > Cyrillic letter within an English word, or vice-versa.
>
> If I want to check which kind of a it is, I can do that with C-x =.

You can do that if you already suspect one letter to be of the wrong
alphabet (e.g. your spell-checker tells you there is no such word as
“sрell-сhecker”). You cannot do that for any reasonably long stretch
of text.

> It would never occur to me to test "Is this really a Cyrillic a"
> by searching for a Latin a and seeing if that finds it.

Neither to me, though I might use a regexp isearch for [a-z] to
highlight all Latin letters in a paragraph where I expect none. It
would be confusing and misleading if it highlighted
[АВЕКМНОРСТХЬавеморстух].

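Yuri's proof-reading check relies on `[a-z]` matching only the 26 Latin letters. The Python sketch below uses Python's `re` module rather than Emacs regexps, and the test word is an assumed example with two Cyrillic look-alikes spelled out as escapes. It shows the check working precisely because no character folding is applied to the range; folding Cyrillic look-alikes into `[a-z]` would hide both strays.

```python
import re
import unicodedata

# "spell-checker" with two Cyrillic look-alikes: U+0440 (р) and U+0441 (с).
word = "s\u0440ell-\u0441hecker"

# Without character folding, [a-z] covers only the 26 ASCII lowercase
# letters, so any letter it fails to match here comes from another script.
strays = [ch for ch in word
          if ch.isalpha() and not re.fullmatch(r"[a-z]", ch)]

for ch in strays:
    # Print the code point and Unicode name of each stray letter.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```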
* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-27 19:54 UTC
To: Yuri Khan; +Cc: eliz, lokedhs, larsi, emacs-devel

  > > If I saw an 'a' in the buffer, I'd like searching for 'a' to find it.
  > > Of course, I will search for a Latin 'a'. If the char in the buffer
  > > is a Cyrillic 'a', I want isearch to find that too.

  > You don’t usually see an “а” in isolation. In normal text, you see at
  > least a word, and usually a sentence. Those give you enough context to
  > know it’s not a Latin “a”.

Often that is true. Nonetheless, I stand by what I said:
I would rather have searching for Latin a match all a's.

  > Neither to me, though I might use a regexp isearch for [a-z] to
  > highlight all Latin letters in a paragraph where I expect none. It
  > would be confusing and misleading if it highlighted
  > [АВЕКМНОРСТХЬавеморстух].

Folding doesn't operate on [...], right?

* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-27 20:02 UTC
To: rms; +Cc: larsi, emacs-devel, lokedhs, yuri.v.khan

> From: Richard Stallman <rms@gnu.org>
> CC: eliz@gnu.org, larsi@gnus.org, lokedhs@gmail.com,
>     emacs-devel@gnu.org
> Date: Sat, 27 Feb 2016 14:54:00 -0500
>
> > You don’t usually see an “а” in isolation. In normal text, you see at
> > least a word, and usually a sentence. Those give you enough context to
> > know it’s not a Latin “a”.
>
> Often that is true. Nonetheless, I stand by what I said:
> I would rather have searching for Latin a match all a's.

I think you are in a tiny minority in this respect.

* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-27 20:05 UTC
To: rms; +Cc: larsi, emacs-devel, lokedhs, yuri.v.khan

> From: Richard Stallman <rms@gnu.org>
> Date: Sat, 27 Feb 2016 14:54:00 -0500
> Cc: eliz@gnu.org, lokedhs@gmail.com, larsi@gnus.org, emacs-devel@gnu.org
>
> > Neither to me, though I might use a regexp isearch for [a-z] to
> > highlight all Latin letters in a paragraph where I expect none. It
> > would be confusing and misleading if it highlighted
> > [АВЕКМНОРСТХЬавеморстух].
>
> Folding doesn't operate on [...], right?

No, but only because character-folding is implemented with regexps.
When it is re-implemented through translation tables, it will affect
regexp search as well.

* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-28 10:25 UTC
To: Eli Zaretskii; +Cc: larsi, emacs-devel, lokedhs, yuri.v.khan

  > No, but only because character-folding is implemented with regexps.
  > When it is re-implemented through translation tables, it will affect
  > regexp search as well.

How to properly fold character ranges calls for some additional thought.

* Re: On language-dependent defaults for character-folding
From: Yuri Khan @ 2016-02-28 6:06 UTC
To: rms@gnu.org
Cc: Eli Zaretskii, Elias Mårtenson, Lars Ingebrigtsen, Emacs developers

On Sun, Feb 28, 2016 at 1:54 AM, Richard Stallman <rms@gnu.org> wrote:

> > […] I might use a regexp isearch for [a-z] to
> > highlight all Latin letters in a paragraph where I expect none. It
> > would be confusing and misleading if it highlighted
> > [АВЕКМНОРСТХЬавеморстух].
>
> Folding doesn't operate on [...], right?

Case folding surely does. I was assuming it is the long-term plan that
character folding would operate consistently with case folding.

* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-24 13:41 UTC
To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

  > > The other levels are language-specific, and the user might want to
  > > enable or disable them.

  > Not all of them are language-specific. Some are valid in any
  > language.

Could you explain that more concretely?

* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-24 17:54 UTC
To: rms; +Cc: larsi, lokedhs, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, lokedhs@gmail.com, emacs-devel@gnu.org
> Date: Wed, 24 Feb 2016 08:41:45 -0500
>
> > > The other levels are language-specific, and the user might want to
> > > enable or disable them.
>
> > Not all of them are language-specific. Some are valid in any
> > language.
>
> Could you explain that more concretely?

Not sure what to explain, to tell the truth. What I had in mind is
cases like á, which I don't think any user of any language will ever
want to consider a non-decomposable character.

* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-25 12:15 UTC
To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

  > Not sure what to explain, to tell the truth. What I had in mind is
  > cases like á, which I don't think any user of any language will ever
  > want to consider a non-decomposable character.

In French and Spanish, á is a decorated version of a.
Perhaps there is no language in which á has any other status.

My point about decorated letters is that _in general_ the list
of decorated versions of letters is language-dependent.
For instance, ö is a decorated o in English and French,
but not in Swedish. The tables that define decorated letters
need to be language-specific.

If it happens that all languages agree about á, that won't be a problem.

* Re: On language-dependent defaults for character-folding
From: Joost Kremers @ 2016-02-25 12:38 UTC
To: rms; +Cc: Eli Zaretskii, lokedhs, larsi, emacs-devel

On Thu, Feb 25 2016, Richard Stallman wrote:
> If it happens that all languages agree about á, that won't be a problem.

I doubt that's the case. Though I don't actually speak the language, I
suspect that in Icelandic a and á are considered different letters. The
former is pronounced [a], the latter [au̯]. Similar considerations apply
to all the vowels e/é, i/í, o/ó, u/ú and y/ý.

-- 
Joost Kremers
Life has its moments

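RMS's point that decoration tables must be language-specific can be sketched in Python as a per-language exception list layered on top of generic mark stripping. Everything here is a hypothetical illustration: `DISTINCT_LETTERS` and `fold_letter` are invented names. The Swedish entries follow the thread, and the Icelandic ones follow Joost's (self-described) suspicion, so treat the data as illustrative rather than authoritative.

```python
import unicodedata

# Hypothetical per-language sets of accented letters that are separate
# letters of the alphabet, and so must not fold to their base letter.
DISTINCT_LETTERS = {
    "en": set(),
    "sv": {"\u00e5", "\u00e4", "\u00f6"},                                # å ä ö
    "is": {"\u00e1", "\u00e9", "\u00ed", "\u00f3", "\u00fa", "\u00fd"},  # á é í ó ú ý
}


def fold_letter(ch: str, lang: str) -> str:
    """Return the form ch folds to for lang: itself if it is a distinct
    letter there, else its base letter with combining marks removed."""
    if ch in DISTINCT_LETTERS.get(lang, set()):
        return ch
    return "".join(c for c in unicodedata.normalize("NFD", ch)
                   if not unicodedata.combining(c))


print(fold_letter("\u00f6", "en"), fold_letter("\u00f6", "sv"))
```

Letters such as ø, which have no canonical decomposition, come back unchanged in every language here; relating them to o or ö would need yet another table.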
* Re: On language-dependent defaults for character-folding
From: John Wiegley @ 2016-02-25 22:43 UTC
To: Joost Kremers; +Cc: larsi, Eli Zaretskii, lokedhs, rms, emacs-devel

>>>>> Joost Kremers <joostkremers@fastmail.fm> writes:

> On Thu, Feb 25 2016, Richard Stallman wrote:
>> If it happens that all languages agree about á, that won't be a problem.

> I doubt that's the case. Though I don't actually speak the language, I
> suspect that in Icelandic a and á are considered different letters. The
> former is pronounced [a], the latter [au̯]. Similar considerations apply to
> all the vowels e/é, i/í, o/ó, u/ú and y/ý.

I'd like to ask at this point that this discussion move to Emacs Tangents, as
it is not approaching anything in the way of a technical consensus.

Sub-threads addressing specific, concrete issues are welcome on this list; but
the general discussion happening here is only creating volume without result.

Thank you,
-- 
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2

* Re: On language-dependent defaults for character-folding
From: John Wiegley @ 2016-02-25 22:48 UTC
To: Joost Kremers; +Cc: larsi, Eli Zaretskii, lokedhs, rms, emacs-devel

>>>>> John Wiegley <johnw@gnu.org> writes:

> Sub-threads addressing specific, concrete issues are welcome on this list;
> but the general discussion happening here is only creating volume without
> result.

Where by "sub-thread" I mean, changing the Subject as you reply to indicate
the precise point you wish to resolve through discussion here.

* Re: On language-dependent defaults for character-folding 2016-02-25 22:43 ` John Wiegley 2016-02-25 22:48 ` John Wiegley @ 2016-02-26 18:13 ` Eli Zaretskii 2016-02-27 0:48 ` John Wiegley 1 sibling, 1 reply; 263+ messages in thread From: Eli Zaretskii @ 2016-02-26 18:13 UTC (permalink / raw) To: John Wiegley; +Cc: joostkremers, larsi, lokedhs, rms, emacs-devel > From: John Wiegley <jwiegley@gmail.com> > Cc: rms@gnu.org, Eli Zaretskii <eliz@gnu.org>, lokedhs@gmail.com, larsi@gnus.org, emacs-devel@gnu.org > Date: Thu, 25 Feb 2016 14:43:37 -0800 > > I'd like to ask at this point that this discussion move to Emacs Tangents, as > it is not approaching anything in the way of a technical consensus. > > Sub-threads addressing specific, concrete issues are welcome on this list; but > the general discussion happening here is only creating volume without result. The discussion (with a few exceptions) is about how to augment the current implementation to make it more acceptable to various needs and cultures. So I think it's directly related to the pretest, and so moving it to emacs-tangents would be wrong. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-26 18:13 ` Eli Zaretskii @ 2016-02-27 0:48 ` John Wiegley 2016-02-27 8:38 ` Eli Zaretskii 0 siblings, 1 reply; 263+ messages in thread From: John Wiegley @ 2016-02-27 0:48 UTC (permalink / raw) To: Eli Zaretskii; +Cc: joostkremers, larsi, lokedhs, rms, emacs-devel >>>>> Eli Zaretskii <eliz@gnu.org> writes: > The discussion (with a few exceptions) is about how to augment the current > implementation to make it more acceptable to various needs and cultures. So > I think it's directly related to the pretest, and so moving it to > emacs-tangents would be wrong. In that case, can you please propose a plan for reaching such acceptability? If I can clearly see what we're aiming toward, it will give me a context for reading these messages, and help focus the discussion. For example: what exactly makes it not acceptable today? What are the desirable features of an "ideal implementation"? What are the variables we're trying to hammer down? Etc. Then I think we can meaningfully tackle this issue by breaking it into the smaller pieces that make it up. -- John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2 ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-27 0:48 ` John Wiegley @ 2016-02-27 8:38 ` Eli Zaretskii 2016-02-27 8:58 ` John Wiegley 2016-02-27 19:53 ` Richard Stallman 0 siblings, 2 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-27 8:38 UTC (permalink / raw) To: John Wiegley; +Cc: joostkremers, larsi, lokedhs, rms, emacs-devel > From: John Wiegley <jwiegley@gmail.com> > Cc: joostkremers@fastmail.fm, rms@gnu.org, lokedhs@gmail.com, larsi@gnus.org, emacs-devel@gnu.org > Date: Fri, 26 Feb 2016 16:48:21 -0800 > > >>>>> Eli Zaretskii <eliz@gnu.org> writes: > > > The discussion (with a few exceptions) is about how to augment the current > > implementation to make it more acceptable to various needs and cultures. So > > I think it's directly related to the pretest, and so moving it to > > emacs-tangents would be wrong. > > In that case, can you please propose a plan for reaching such acceptability? > If I can clearly see what we're aiming toward, it will give me a context for > reading these messages, and help focus the discussion. > > For example: what exactly makes it not acceptable today? What are the desirable > features of an "ideal implementation"? What are the variables we're trying to > hammer down? Etc. Then I think we can meaningfully tackle this issue by > breaking it into the smaller pieces that make it up. The simplest change would be to have character-folding disabled by default in some European locales whose users expressed objections to having it on by default, due to folding of some characters that shouldn't be folded in the languages of those locales. Another, more complex, but still simple enough, possibility would be to have character-folding on by default, but have the problematic foldings filtered out from the regexp used by it. We could either always filter out all of them, or filter out only some of them, as determined by the user locale.
For example, in the Spanish locales, ñ will not be folded. The next alternative is to come up with a fine-grained classification of character-folding, and provide user options to control each one of them independently, with the defaults determined by the user locale. For example, one class of folding is the one required for matching pre-composed characters such as á with its decomposed variant á; another class is for finding "similar" characters, such as finding ⒜ when looking for a. There should probably be classes that are disliked by users of certain languages, such as ñ for Spanish. Etc. etc. (I think this alternative needs more research and user feedback, and so is probably not for the release branch.) Maybe there are more alternatives, I don't know. It's not like they were explicitly proposed by someone; the above is just my personal conclusions from reading the discussion. ^ permalink raw reply [flat|nested] 263+ messages in thread
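The classes Eli lists (canonical equivalence between precomposed and decomposed á, plus diacritic folding that some locales reject, such as ñ for Spanish) can be illustrated with a minimal Python sketch. It assumes Python's `unicodedata` module stands in for Emacs's character tables; `LOCALE_NO_FOLD`, `fold_char`, `char_matches` and `char_fold_search` are hypothetical names, not Emacs's char-fold API. Folding is one-directional (a plain letter finds accented ones, not the reverse), matching is case-sensitive, and the "similar characters" class (⒜ for a) is not modeled.

```python
import unicodedata

# Hypothetical per-locale exclusion table: characters that count as
# distinct letters in that language and therefore must not fold.
LOCALE_NO_FOLD = {
    "es": frozenset("ñÑ"),
}

def fold_char(ch, keep=frozenset()):
    """Strip diacritics from one NFC character unless it is in `keep`."""
    if ch in keep:
        return ch
    base = "".join(c for c in unicodedata.normalize("NFD", ch)
                   if not unicodedata.combining(c))
    return base or ch

def char_matches(needle_ch, text_ch, keep):
    # Canonical class: after NFC, precomposed and decomposed á are equal.
    if needle_ch == text_ch:
        return True
    # Diacritic class, one direction only: a plain letter in the search
    # string finds accented variants in the text, not the reverse.
    return needle_ch == fold_char(text_ch, keep)

def char_fold_search(needle, text, keep=frozenset()):
    """Return the index of the first match in NFC(text), or -1."""
    n = unicodedata.normalize("NFC", needle)
    t = unicodedata.normalize("NFC", text)
    for i in range(len(t) - len(n) + 1):
        if all(char_matches(a, b, keep) for a, b in zip(n, t[i:i + len(n)])):
            return i
    return -1

if __name__ == "__main__":
    text = "El niño bebe un café"
    print(char_fold_search("cafe", text))                         # 16
    print(char_fold_search("nino", text))                         # 3
    print(char_fold_search("nino", text, LOCALE_NO_FOLD["es"]))   # -1
```

Keeping the locale exclusions as a separate set means the default folding table stays the same for every user, and only the filtering step changes per locale. This corresponds to the second alternative above.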
* Re: On language-dependent defaults for character-folding 2016-02-27 8:38 ` Eli Zaretskii @ 2016-02-27 8:58 ` John Wiegley 2016-02-27 9:30 ` Eli Zaretskii 2016-02-27 19:53 ` Richard Stallman 1 sibling, 1 reply; 263+ messages in thread From: John Wiegley @ 2016-02-27 8:58 UTC (permalink / raw) To: Eli Zaretskii; +Cc: joostkremers, larsi, lokedhs, rms, emacs-devel >>>>> Eli Zaretskii <eliz@gnu.org> writes: > The simplest change would be to have character-folding disabled by default > in some European locales whose users expressed objections to having it on by > default, due to folding of some characters that shouldn't be folded in the > languages of those locales. > Another, more complex, but still simple enough, possibility would be to have > character-folding on by default, but have the problematic foldings filtered > out from the regexp used by it. We could either always filter out all of > them, or filter out only some of them, as determined by the user locale. For > example, in the Spanish locales, ñ will not be folded. > The next alternative is to come up with a fine-grained classification of > character-folding, and provide user options to control each one of them > independently, with the defaults determined by the user locale. For example, > one class of folding is the one required for matching pre-composed > characters such as á with its decomposed variant á; another class is for > finding "similar" characters, such as finding ⒜ when looking for a. There > should probably be classes that are disliked by users of certain languages, > such as ñ for Spanish. Etc. etc. (I think this alternative needs more > research and user feedback, and so is probably not for the release branch.) > Maybe there are more alternatives, I don't know. It's not like they were > explicitly proposed by someone; the above is just my personal conclusions > from reading the discussion. Thank you for that summary.
From that reading, it sounds like this will require a fairly complex decision tree, to determine what should be folded when based on the details of each particular country/language? That is, we can't expect to make a single decision up front, but will need feedback from users in every country that uses Emacs, in order to determine what the correct settings are for each language? And what about a Swedish speaker living in America who uses en_US because that's what 90% of his text is in, who then wants to search some Swedish text? Is it the locale that determines it, or something specific to the nature of the text in each buffer? And how would Emacs know? Unless I'm not seeing the light at the end of this tunnel, this feature is just not ready for prime-time as a default. There are too many unanswered questions, and it sounds like none of them can be answered in the abstract for every case. I have a feeling we'd be getting bug reports constantly from users whose language contains details we never anticipated. -- John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2 ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-27 8:58 ` John Wiegley @ 2016-02-27 9:30 ` Eli Zaretskii 2016-02-27 16:22 ` Ken Brown 2016-02-27 22:48 ` John Wiegley 0 siblings, 2 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-27 9:30 UTC (permalink / raw) To: John Wiegley; +Cc: joostkremers, larsi, lokedhs, rms, emacs-devel > From: John Wiegley <jwiegley@gmail.com> > Cc: joostkremers@fastmail.fm, rms@gnu.org, lokedhs@gmail.com, larsi@gnus.org, emacs-devel@gnu.org > Date: Sat, 27 Feb 2016 00:58:02 -0800 > > Thank you for that summary. From that reading, it sounds like this will > require a fairly complex decision tree, to determine what should be folded > when based on the details of each particular country/language? I fail to see the complexity, but that's me. In particular, the first alternative (to have it disabled in certain locales) seems very simple to me. > And what about a Swedish speaker living in America who uses en_US because > that's what 90% of his text is in, who then wants to search some Swedish text? > Is it the locale that determines it, or something specific to the nature of > the text in each buffer? And how would Emacs know? I've asked these questions a lot in this discussion, and still the majority thinks that the locale in which Emacs is started should be used for the defaults. So you are in fact arguing with what the majority says, not with me. > Unless I'm not seeing the light at the end of this tunnel, this feature is > just not ready for prime-time as a default. There are too many unanswered > questions, and it sounds like none of them can be answered in the abstract for > every case. I have a feeling we'd be getting bug reports constantly from users > whose language contains details we never anticipated. Do we have a clear definition of what are the criteria for this feature to be "ready for prime-time as a default"? 
You are in effect saying that we will never be able to find good answers for those questions. We shouldn't be dismissing a good feature such as this one, which many users like, due to FUD-like arguments. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-27 9:30 ` Eli Zaretskii @ 2016-02-27 16:22 ` Ken Brown 2016-02-27 22:48 ` John Wiegley 1 sibling, 0 replies; 263+ messages in thread From: Ken Brown @ 2016-02-27 16:22 UTC (permalink / raw) To: Eli Zaretskii, John Wiegley Cc: joostkremers, larsi, lokedhs, rms, emacs-devel On 2/27/2016 4:30 AM, Eli Zaretskii wrote: >> From: John Wiegley <jwiegley@gmail.com> >> Thank you for that summary. From that reading, it sounds like this will >> require a fairly complex decision tree, to determine what should be folded >> when based on the details of each particular country/language? > > I fail to see the complexity, but that's me. In particular, the first > alternative (to have it disabled in certain locales) seems very simple > to me. I strongly agree. This would be an excellent compromise for 25.1. It would enable many users to discover a useful new feature, while allowing time for future refinements that would improve the feature for users in the problematic locales. Ken ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-27 9:30 ` Eli Zaretskii 2016-02-27 16:22 ` Ken Brown @ 2016-02-27 22:48 ` John Wiegley 2016-02-28 15:57 ` Eli Zaretskii 1 sibling, 1 reply; 263+ messages in thread From: John Wiegley @ 2016-02-27 22:48 UTC (permalink / raw) To: Eli Zaretskii; +Cc: joostkremers, larsi, lokedhs, rms, emacs-devel >>>>> Eli Zaretskii <eliz@gnu.org> writes: > I've asked these questions a lot in this discussion, and still the majority > thinks that the locale in which Emacs is started should be used for the > defaults. So you are in fact arguing with what the majority says, not with > me. From what I've seen, this is a complex feature with many corner cases, some of which may not have been encountered yet because it hasn't been "out in the field" except for a few pretests. > Do we have a clear definition of what are the criteria for this feature to > be "ready for prime-time as a default"? You are in effect saying that we > will never be able to find good answers for those questions. We shouldn't be > dismissing a good feature such as this one, which many users like, due to > FUD-like arguments. Having such a clear definition would be the first criterion. :) Otherwise, I feel like we're saying, "It sounds useful, why not enable it by default?" Here are my somewhat fuzzy criteria: 1. Questions about the feature should not prompt mega-threads that fail to reach clarity within a three-week time frame. This indicates a lack of clarity about the feature among the core developers, and I believe users will notice this lack of clarity when trying out the feature. 2. If there is work yet to be done, we should know what the work is. Otherwise, the feature may change in unpredictable ways in future versions. If that's the case, why make it the default before those decisions have been made?
3. I would like to have a sense that this is a feature with either prior art, or considerable experience, behind it. Instead, I get the *feeling* (from reading this thread) that we're just starting to explore the idea of character-class-based searching, and it strikes me as odd that we would make our first attempt at it a default behavior for all users. I've heard several people ask for it not to be a default, and I take that seriously. The many complexities surrounding this feature make me uneasy. If this were a product for sale, I'd have a huge question mark next to making this a default behavior, given the confusion and false bug reports it is likely to raise. Nothing I've read so far in this discussion has increased my sense of security; quite the opposite, I become more wary by the week. It seems like the more we poke this anthill, the more critters jump out. That said, I'm quite happy for the feature to be there, and I will most definitely turn it on. The question is whether it should become the default for all users from the start. We can always enable it as a default later, so I don't see a need to hurry. This could be a great feature to introduce as a default in 26.1, if it receives good reception from early adopters in 25.x. -- John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2 ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-27 22:48 ` John Wiegley @ 2016-02-28 15:57 ` Eli Zaretskii 2016-02-28 16:59 ` Drew Adams 0 siblings, 1 reply; 263+ messages in thread From: Eli Zaretskii @ 2016-02-28 15:57 UTC (permalink / raw) To: John Wiegley; +Cc: joostkremers, larsi, lokedhs, rms, emacs-devel > From: John Wiegley <jwiegley@gmail.com> > Cc: joostkremers@fastmail.fm, rms@gnu.org, lokedhs@gmail.com, larsi@gnus.org, emacs-devel@gnu.org > Date: Sat, 27 Feb 2016 14:48:31 -0800 > > From what I've seen, this is a complex feature with many corner cases, some of > which may not have been encountered yet because it hasn't been "out in the > field" except for a few pretests. I don't see any corner use cases, just some parts that, for best results, should be handled depending on the language of the text. What we have now is IMNSHO good enough, although improvements are welcome (and need infrastructure we don't currently have). This is a clear case of perfect being the enemy of good. > The question is whether it should become the default for all users > from the start. We can always enable it as a default later, so I > don't see a need to hurry. This could be a great feature to > introduce as a default in 26.1, if it receives good reception from > early adopters in 25.x. Why does it have to be a binary all or nothing decision? Users of a few languages found some of the folding patterns incorrect for their language -- why not turn only those patterns off in the locales that use only those languages? Why should we have this decision affect users who have nothing to do with those few languages? Turning this summarily off will also disable features that AFAIR no one objected to -- the ability to find á (a 2-character sequence) when looking for á (one character), or vice versa. I fail to see how a failure to match by default in this use case would make any sense at all. 
We should make our decisions in this matter based on understanding the issues involved, and try very hard not to throw away the baby with the bathwater. ^ permalink raw reply [flat|nested] 263+ messages in thread
* RE: On language-dependent defaults for character-folding 2016-02-28 15:57 ` Eli Zaretskii @ 2016-02-28 16:59 ` Drew Adams 2016-02-28 22:59 ` John Wiegley 0 siblings, 1 reply; 263+ messages in thread From: Drew Adams @ 2016-02-28 16:59 UTC (permalink / raw) To: Eli Zaretskii, John Wiegley Cc: joostkremers, larsi, lokedhs, rms, emacs-devel > > From what I've seen, this is a complex feature with many corner > > cases, some of which may not have been encountered yet because it > > hasn't been "out in the field" except for a few pretests. > > I don't see any corner use cases, just some parts that, for best > results, should be handled depending on the language of the text. > What we have now is IMNSHO good enough, although improvements are > welcome (and need infrastructure we don't currently have). This is > a clear case of perfect being the enemy of good. I don't see anyone arguing that this feature is not "good enough" for Emacs 25.1. No one has suggested pulling the feature from the release. The question is only whether it should be turned on by default. Posing that question, and even deciding that it is not, is not at all "a clear case of perfect being the enemy of good." > > The question is whether it should become the default for all > > users from the start. What John said. > > We can always enable it as a default later, so I > > don't see a need to hurry. This could be a great feature to > > introduce as a default in 26.1, if it receives good reception from > > early adopters in 25.x. > > Why does it have to be a binary all or nothing decision? Users of a > few languages found some of the folding patterns incorrect for their > language -- why not turn only those patterns off in the locales that > use only those languages? Why should we have this decision affect > users who have nothing to do with those few languages? That's a reasonable question: whether Emacs should have different default values for this feature for different users/locales. 
I tend to think that deciding to do that now would also be a bit premature, but the question is reasonable. > Turning this summarily off will also disable features that AFAIR no > one objected to -- the ability to find á (a 2-character sequence) > when looking for á (one character), or vice versa. I fail to see > how a failure to match by default in this use case would make any > sense at all. That "ability to find" would not disappear if char-folding were off by default. It is you who sounds like you are now making the question into all-or-nothing. > We should make our decisions in this matter based on understanding > the issues involved, and try very hard not to throw away the baby > with the bathwater. I don't see anyone proposing to throw out the bathwater, much less the baby with it. Eli, you say here, quite often, that you think discussions about what the default behavior of a feature should be are typically fruitless, if not sterile. But it seems clear that you care quite a lot about this default behavior. I'd say let it go. There will be Emacs 25.2 and beyond. And users will try this new feature and give their feedback, which I expect will be overwhelmingly positive - and informative for further discussions here. Based on user feedback and further discussion and analysis here (this is not going away), Emacs Dev will improve and elaborate this feature. We will have better ideas about how to handle all of the things that are currently not so clear. There is plenty of time to decide again whether this or that should be turned on by default. What seems clear to me for Emacs 25.1 is that the feature should be included AND that it should be simple to both (1) customize the default behavior for a given user (i.e., what behavior search starts with, a la `case-fold-search') and (2) toggle the behavior on the fly, during Isearch. Given (1) and (2), users can do what they like, and we can learn later from them what behaviors might best be adopted for defaulting. 
^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-28 16:59 ` Drew Adams @ 2016-02-28 22:59 ` John Wiegley 2016-02-29 0:22 ` Drew Adams 2016-02-29 0:31 ` Juri Linkov 0 siblings, 2 replies; 263+ messages in thread From: John Wiegley @ 2016-02-28 22:59 UTC (permalink / raw) To: Drew Adams; +Cc: rms, joostkremers, lokedhs, emacs-devel, Eli Zaretskii, larsi >>>>> Drew Adams <drew.adams@oracle.com> writes: > What seems clear to me for Emacs 25.1 is that the feature should be included > AND that it should be simple to both (1) customize the default behavior for > a given user (i.e., what behavior search starts with, a la > `case-fold-search') and (2) toggle the behavior on the fly, during Isearch. I think Drew has summarized perfectly what I would like to see happen. In addition, I'd add one more item: Once 25.1 is released, I (or another) will write a blog article publicizing this feature and touting its benefits, in order to encourage people to try it out and discover how useful it can be. However, making it a default in 25.1 is something I am simply not comfortable doing, given the diversity of opinion on this list, plus my own misgivings about so new (and nuanced) a feature. Yes, the visual equality of á and á is a powerful argument, but as Drew said, there will be well-advertised ways to both enable this feature, and to toggle it while searching. Users will not lose any capacity by our decision; they will simply not experience it as a default out of the box. And so, my decision is that this feature will be off by default in the 25.1 release, with the genuine hope that it can be made solid enough to become a default in a future release. It needn't even wait until 26.1, if we receive enough positive feedback. My thanks to everyone for the extensive and conscientious debate, and to Eli for sticking to his guns.
I am hopeful we will reach general consensus over time, and that this feature will come to be recognized as a compelling aspect of the Emacs feature set. Until that day, please forgive me my reservations; I'm just not there yet in wanting this to become a default behavior. -- John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2 ^ permalink raw reply [flat|nested] 263+ messages in thread
* RE: On language-dependent defaults for character-folding 2016-02-28 22:59 ` John Wiegley @ 2016-02-29 0:22 ` Drew Adams 2016-02-29 0:31 ` Juri Linkov 1 sibling, 0 replies; 263+ messages in thread From: Drew Adams @ 2016-02-29 0:22 UTC (permalink / raw) To: John Wiegley Cc: rms, joostkremers, lokedhs, emacs-devel, Eli Zaretskii, larsi > I'd add one more item: Once 25.1 is released, I (or another) will > write a blog article publicizing this feature and touting its > benefits, in order to encourage people to try it out and discover > how useful it can be. Good idea. It would be good to include some of the use cases brought up here (e.g. dealing with different languages). People here who are more familiar with specific cases could make suggestions or propose corrections to whatever is written as a first draft. That way, these cases and their possible issues (so far) will be out there, from the outset, in addition to the general info about using the new feature. That will help users who might run into such use cases on their own, and doing that will help us get more feedback from such users, for future enhancement of the feature. Mentioning such things on the blog could be done in a separate section, after the main points have been made. In addition to the benefits mentioned above, this will show people that Emacs is thinking about such things and is open to suggestions about them. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-28 22:59 ` John Wiegley 2016-02-29 0:22 ` Drew Adams @ 2016-02-29 0:31 ` Juri Linkov 2016-02-29 3:45 ` Eli Zaretskii 1 sibling, 1 reply; 263+ messages in thread From: Juri Linkov @ 2016-02-29 0:31 UTC (permalink / raw) To: Drew Adams; +Cc: rms, joostkremers, lokedhs, emacs-devel, Eli Zaretskii, larsi >>>>>> Drew Adams <drew.adams@oracle.com> writes: > >> What seems clear to me for Emacs 25.1 is that the feature should be included >> AND that it should be simple to both (1) customize the default behavior for >> a given user (i.e., what behavior search starts with, a la >> `case-fold-search') and (2) toggle the behavior on the fly, during Isearch. > > I think Drew has summarized perfectly what I would like to see happen. In > addition, I'd add one more item: Once 25.1 is released, I (or another) will > write a blog article publicizing this feature and touting its benefits, in > order to encourage people to try it out and discover how useful it can be. > > However, making it a default in 25.1 is something I am simply not comfortable > doing, given the diversity of opinion on this list, plus my own misgivings > about so new (and nuanced) a feature. Yes, the visual equality of á and á is a > powerful argument, but as Drew said, there will be well-advertised ways to > both enable this feature, and to toggle it while searching. Users will not > lose any capacity by our decision; they will simply not experience it as a > default out of the box. > > And so, my decision is that this feature will be off by default in the 25.1 > release, with the genuine hope that it can be made solid enough to become a > default in a future release. It needn't even wait until 26.1, if we receive > enough positive feedback. > > My thanks to everyone for the extensive and conscientious debate, and to Eli > for sticking to his guns.
I am hopeful we will reach general consensus over > time, and that this feature will come to be recognized as a compelling aspect > of the Emacs feature set. Until that day, please forgive me my reservations; > I'm just not there yet in wanting this to become a default behavior. Even if disabled by default before the next release, do you think we still have to polish and finish this feature before the release, so the users willing to enable it would enjoy it bug-free and usable? In case of a positive answer, I have a few ideas how to help achieve this goal. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-29 0:31 ` Juri Linkov @ 2016-02-29 3:45 ` Eli Zaretskii 0 siblings, 0 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-29 3:45 UTC (permalink / raw) To: Juri Linkov; +Cc: rms, joostkremers, lokedhs, emacs-devel, larsi, drew.adams > From: Juri Linkov <juri@linkov.net> > Cc: Eli Zaretskii <eliz@gnu.org>, joostkremers@fastmail.fm, larsi@gnus.org, lokedhs@gmail.com, rms@gnu.org, emacs-devel@gnu.org > Date: Mon, 29 Feb 2016 02:31:21 +0200 > > Even if disabled by default before the next release, do you think > we still have to polish and finish this feature before the release, > so the users willing to enable it would enjoy it bug-free and usable? That goes without saying. > In case of a positive answer, I have a few ideas how to help achieve > this goal. Thanks in advance. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-27 8:38 ` Eli Zaretskii 2016-02-27 8:58 ` John Wiegley @ 2016-02-27 19:53 ` Richard Stallman 2016-02-27 20:01 ` Eli Zaretskii 1 sibling, 1 reply; 263+ messages in thread From: Richard Stallman @ 2016-02-27 19:53 UTC (permalink / raw) To: Eli Zaretskii; +Cc: joostkremers, larsi, johnw, lokedhs, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > The simplest change would be to have character-folding disabled by > default in some European locales whose users expressed objections to ... Why not implement what I suggested? Even though there are several levels, in each case they boil down into a set of classes of characters, each one either symmetric or asymmetric. Once that calculation is done, we can search for them with the existing mechanism. > That is, we > can't expect to make a single decision up front, but will need feedback from > users in every country that uses Emacs, in order to determine what the correct > settings are for each language? Right. Once we show it to people, we will start getting language-specific definitions. > And what about a Swedish speaker living in America who uses en_US because > that's what 90% of his text is in, who then wants to search some Swedish text? > Is it the locale that determines it, or something specific to the nature of > the text in each buffer? And how would Emacs know? Clearly we need to provide a way to set the language for each buffer. We need this for several purposes, another one being the ispell dictionary. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 263+ messages in thread
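Richard's point is that every folding level reduces to a set of character classes, each either symmetric (any member finds any other) or asymmetric (the base letter finds every variant, but each variant finds only itself). Once compiled, those classes can feed an ordinary regexp search, in the spirit of Eli's remark about the regexp character folding uses. A minimal Python sketch of that idea follows. The sample classes, `build_match_table` and `to_regexp` are hypothetical illustrations rather than Emacs code, and the sketch assumes the text is already NFC-normalized.

```python
import re

# Hypothetical folding classes: (base, variants, symmetric).
# Asymmetric: the base finds every variant, each variant finds only itself.
# Symmetric: every member of the class finds every other member.
DEFAULT_CLASSES = [
    ("a", "áàâä", False),
    ("n", "ñ", False),
    ("'", "\u2018\u2019", True),  # ASCII apostrophe and curly quotes
]

# A language-specific set simply omits the classes its users reject.
SPANISH_CLASSES = [c for c in DEFAULT_CLASSES if c[0] != "n"]

def build_match_table(classes):
    """Map each character to the set of characters it should match."""
    table = {}
    for base, variants, symmetric in classes:
        group = {base, *variants}
        table.setdefault(base, set()).update(group)
        for v in variants:
            table.setdefault(v, set()).add(v)
            if symmetric:
                table[v].update(group)
    return table

def to_regexp(needle, table):
    """Turn a search string into a regexp of character alternatives."""
    parts = []
    for ch in needle:
        alts = table.get(ch, {ch})
        if len(alts) == 1:
            parts.append(re.escape(ch))
        else:
            parts.append("[" + "".join(re.escape(c) for c in sorted(alts)) + "]")
    return "".join(parts)

if __name__ == "__main__":
    table = build_match_table(DEFAULT_CLASSES)
    print(to_regexp("año", table))   # ñ is a variant, so it matches only itself
    print(bool(re.search(to_regexp("ano", table), "año")))
    print(bool(re.search(to_regexp("ano", build_match_table(SPANISH_CLASSES)), "año")))
```

Because the per-language choice only selects which classes go into the table, the search step itself stays the same for every locale. The open question in the thread is how that selection should be made (locale, buffer language, or user option).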
* Re: On language-dependent defaults for character-folding 2016-02-27 19:53 ` Richard Stallman @ 2016-02-27 20:01 ` Eli Zaretskii 2016-02-28 10:24 ` Richard Stallman [not found] ` <<E1aZyX5-0007bU-Mu@fencepost.gnu.org> 0 siblings, 2 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-27 20:01 UTC (permalink / raw) To: rms; +Cc: joostkremers, larsi, johnw, lokedhs, emacs-devel > From: Richard Stallman <rms@gnu.org> > CC: johnw@gnu.org, joostkremers@fastmail.fm, larsi@gnus.org, > lokedhs@gmail.com, emacs-devel@gnu.org > Date: Sat, 27 Feb 2016 14:53:21 -0500 > > > The simplest change would be to have character-folding disabled by > > default in some European locales whose users expressed objections to > ... > > Why not implement what I suggested? Even though there are several > levels, in each case they boil down into a set of classes of characters, > each one either symmetric or asymmetric. Once that calculation is done, > we can search for them with the existing mechanism. I will have to see the code, but I expect your suggestion to be much more complex, and thus unsuitable for the release branch. It's okay to do that on master, but John asked his questions wrt the release branch. > > That is, we > > can't expect to make a single decision up front, but will need feedback from > > users in every country that uses Emacs, in order to determine what the correct > > settings are for each language? > > Right. Once we show it to people, we will start getting language-specific > definitions. > > > And what about a Swedish speaker living in America who uses en_US because > > that's what 90% of his text is in, who then wants to search some Swedish text? > > Is it the locale that determines it, or something specific to the nature of > > the text in each buffer? And how would Emacs know? > > Clearly we need to provide a way to set the language for each buffer. > We need this for several purposes, another one being the ispell dictionary. 
These are definitely out for the release branch. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-27 20:01 ` Eli Zaretskii @ 2016-02-28 10:24 ` Richard Stallman 2016-02-28 16:01 ` Eli Zaretskii [not found] ` <<E1aZyX5-0007bU-Mu@fencepost.gnu.org> 1 sibling, 1 reply; 263+ messages in thread From: Richard Stallman @ 2016-02-28 10:24 UTC (permalink / raw) To: Eli Zaretskii; +Cc: joostkremers, larsi, johnw, lokedhs, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > I will have to see the code, but I expect your suggestion to be much > more complex, and thus unsuitable for the release branch. It's okay > to do that on master, but John asked his questions wrt the release > branch. For the release, I think we should turn it off by default and invite people to try turning it on. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-28 10:24 ` Richard Stallman @ 2016-02-28 16:01 ` Eli Zaretskii 0 siblings, 0 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-28 16:01 UTC (permalink / raw) To: rms; +Cc: joostkremers, larsi, johnw, lokedhs, emacs-devel > From: Richard Stallman <rms@gnu.org> > CC: johnw@gnu.org, joostkremers@fastmail.fm, larsi@gnus.org, > lokedhs@gmail.com, emacs-devel@gnu.org > Date: Sun, 28 Feb 2016 05:24:59 -0500 > > For the release, I think we should turn it off by default > and invite people to try turning it on. That would be a grave mistake, IMO, since at least some parts of folding are a must, and no one objected to them till now (neither would I expect to see any objections). See my other message for details. ^ permalink raw reply [flat|nested] 263+ messages in thread
* RE: On language-dependent defaults for character-folding [not found] ` <<83oab0ako0.fsf@gnu.org> @ 2016-02-28 17:00 ` Drew Adams 2016-02-28 17:59 ` Clément Pit--Claudel 0 siblings, 1 reply; 263+ messages in thread From: Drew Adams @ 2016-02-28 17:00 UTC (permalink / raw) To: Eli Zaretskii, rms; +Cc: joostkremers, larsi, johnw, lokedhs, emacs-devel > > For the release, I think we should turn it off by default > > and invite people to try turning it on. > > That would be a grave mistake, IMO, since at least some parts of > folding are a must, and no one objected to them till now (neither > would I expect to see any objections). See my other message for > details. Some parts are a must? Which parts, and a must for what? A must for the _default_ behavior? ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-28 17:00 ` Drew Adams @ 2016-02-28 17:59 ` Clément Pit--Claudel 2016-02-28 18:04 ` Eli Zaretskii 2016-02-28 18:22 ` Drew Adams 0 siblings, 2 replies; 263+ messages in thread From: Clément Pit--Claudel @ 2016-02-28 17:59 UTC (permalink / raw) To: emacs-devel On 02/28/2016 12:00 PM, Drew Adams wrote: >>> For the release, I think we should turn it off by default >>> and invite people to try turning it on. >> >> That would be a grave mistake, IMO, since at least some parts of >> folding are a must, and no one objected to them till now (neither >> would I expect to see any objections). See my other message for >> details. > > Some parts are a must? Which parts, and a must for what? > A must for the _default_ behavior? I guess Eli had pairs such as .../… in mind; I have not any disagreement about them. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-28 17:59 ` Clément Pit--Claudel @ 2016-02-28 18:04 ` Eli Zaretskii 2016-02-28 18:15 ` Clément Pit--Claudel 2016-02-28 18:23 ` Drew Adams 2016-02-28 18:22 ` Drew Adams 1 sibling, 2 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-28 18:04 UTC (permalink / raw) To: Clément Pit--Claudel; +Cc: emacs-devel > From: Clément Pit--Claudel <clement.pit@gmail.com> > Date: Sun, 28 Feb 2016 12:59:44 -0500 > > > Some parts are a must? Which parts, and a must for what? > > A must for the _default_ behavior? > > I guess Eli had pairs such as .../… in mind; I have not any disagreement about them. No, I meant the pre-composed characters and their decomposed equivalents. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-28 18:04 ` Eli Zaretskii @ 2016-02-28 18:15 ` Clément Pit--Claudel 2016-02-28 18:23 ` Drew Adams 1 sibling, 0 replies; 263+ messages in thread From: Clément Pit--Claudel @ 2016-02-28 18:15 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel On 02/28/2016 01:04 PM, Eli Zaretskii wrote: >> From: Clément Pit--Claudel <clement.pit@gmail.com> >> Date: Sun, 28 Feb 2016 12:59:44 -0500 >> >>> Some parts are a must? Which parts, and a must for what? >>> A must for the _default_ behavior? >> >> I guess Eli had pairs such as .../… in mind; I have not any disagreement about them. > > No, I meant the pre-composed characters and their decomposed > equivalents. Oh, I see. Thanks for clarifying! I agree fully. ^ permalink raw reply [flat|nested] 263+ messages in thread
* RE: On language-dependent defaults for character-folding 2016-02-28 18:04 ` Eli Zaretskii 2016-02-28 18:15 ` Clément Pit--Claudel @ 2016-02-28 18:23 ` Drew Adams 2016-02-28 18:46 ` Eli Zaretskii 1 sibling, 1 reply; 263+ messages in thread From: Drew Adams @ 2016-02-28 18:23 UTC (permalink / raw) To: Eli Zaretskii, Clément Pit--Claudel; +Cc: emacs-devel > > > Some parts are a must? Which parts, and a must for what? > > > A must for the _default_ behavior? > > > > I guess Eli had pairs such as .../. in mind; I have not any > disagreement about them. > > No, I meant the pre-composed characters and their decomposed > equivalents. Why a must in terms of default behavior? ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-28 18:23 ` Drew Adams @ 2016-02-28 18:46 ` Eli Zaretskii 0 siblings, 0 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-28 18:46 UTC (permalink / raw) To: Drew Adams; +Cc: clement.pit, emacs-devel > Date: Sun, 28 Feb 2016 10:23:23 -0800 (PST) > From: Drew Adams <drew.adams@oracle.com> > Cc: emacs-devel@gnu.org > > > > > Some parts are a must? Which parts, and a must for what? > > > > A must for the _default_ behavior? > > > > > > I guess Eli had pairs such as .../. in mind; I have not any > > disagreement about them. > > > > No, I meant the pre-composed characters and their decomposed > > equivalents. > > Why a must in terms of default behavior? Because they look identical on display. ^ permalink raw reply [flat|nested] 263+ messages in thread
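Eli's reason is that a precomposed character and its decomposed equivalent look identical on display. Unicode calls this relationship canonical equivalence. A small Python illustration using only the standard `unicodedata` module (this is not how Emacs implements folding) shows that the two forms differ as code point sequences yet compare equal once normalized:

```python
import unicodedata

precomposed = "\u00e9"   # é: LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT, two code points


def canonically_equal(a, b):
    """True if a and b are the same text once both are put in NFC form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == decomposed)                   # False
print(len(precomposed), len(decomposed))           # 1 2
print(canonically_equal(precomposed, decomposed))  # True
```

A search that compares raw code points would miss the decomposed form, even though both forms render the same.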
* RE: On language-dependent defaults for character-folding 2016-02-28 17:59 ` Clément Pit--Claudel 2016-02-28 18:04 ` Eli Zaretskii @ 2016-02-28 18:22 ` Drew Adams 2016-02-28 18:58 ` Clément Pit--Claudel 1 sibling, 1 reply; 263+ messages in thread From: Drew Adams @ 2016-02-28 18:22 UTC (permalink / raw) To: Clément Pit--Claudel, emacs-devel > >>> For the release, I think we should turn it off by default > >>> and invite people to try turning it on. > >> > >> That would be a grave mistake, IMO, since at least some parts of > >> folding are a must, and no one objected to them till now (neither > >> would I expect to see any objections). See my other message for > >> details. > > > > Some parts are a must? Which parts, and a must for what? > > A must for the _default_ behavior? > > I guess Eli had pairs such as .../. in mind; I have not any > disagreement about them. Why would such pairs be a "must" in terms of the default behavior? ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-28 18:22 ` Drew Adams @ 2016-02-28 18:58 ` Clément Pit--Claudel 0 siblings, 0 replies; 263+ messages in thread From: Clément Pit--Claudel @ 2016-02-28 18:58 UTC (permalink / raw) To: Drew Adams, emacs-devel On 02/28/2016 01:22 PM, Drew Adams wrote: >> I guess Eli had pairs such as .../. in mind; I have not any >> disagreement about them. > > Why would such pairs be a "must" in terms of the default behavior? I think your mailer (or mine) corrupted my message (or your quote). I wrote .../…, not .../. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-23 18:14 ` Eli Zaretskii 2016-02-23 20:24 ` Yuri Khan 2016-02-24 13:41 ` Richard Stallman @ 2016-02-24 13:41 ` Richard Stallman 2016-02-24 17:56 ` Eli Zaretskii 2 siblings, 1 reply; 263+ messages in thread From: Richard Stallman @ 2016-02-24 13:41 UTC (permalink / raw) To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > * Equivalent letters (ö and ø in Swedish). > Not just letters -- sequences of characters. For example, å vs aa in > Danish, or ffi vs ﬃ. å and aa in Danish are equivalent, like ö and ø in Swedish. Ligatures such as ﬃ are a different issue entirely. The relationship between ffi vs ﬃ is language-independent and similar to these two levels: * Different appearances of the same letter+decorations: as a single code point, or as a composition. * Identical-looking distinct code points (Latin a and Cyrillic a). -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-24 13:41 ` Richard Stallman @ 2016-02-24 17:56 ` Eli Zaretskii 2016-02-25 12:15 ` Richard Stallman 0 siblings, 1 reply; 263+ messages in thread From: Eli Zaretskii @ 2016-02-24 17:56 UTC (permalink / raw) To: rms; +Cc: larsi, lokedhs, emacs-devel > From: Richard Stallman <rms@gnu.org> > CC: larsi@gnus.org, lokedhs@gmail.com, emacs-devel@gnu.org > Date: Wed, 24 Feb 2016 08:41:46 -0500 > > > > * Equivalent letters (ö and ø in Swedish). > > > Not just letters -- sequences of characters. For example, å vs aa in > > Danish, or ffi vs ﬃ. > > å and aa in Danish are equivalent, like ö and ø in Swedish. > > Ligatures such as ﬃ are a different issue entirely. > The relationship between ffi vs ﬃ is language-independent > and similar to these two levels: > > * Different appearances of the same letter+decorations: > as a single code point, or as a composition. > > * Identical-looking distinct code points (Latin a and Cyrillic a). I didn't say the 2 examples were in the same class. My point was that we are not talking about equivalence of _characters_, we are talking about equivalent character _sequences_. ^ permalink raw reply [flat|nested] 263+ messages in thread
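Eli says the equivalences are between character sequences, and Richard says the ligature case is language-independent. Unicode compatibility normalization (NFKC) illustrates both points in Python. It expands the single code point ﬃ into the three letters ffi, but it knows nothing about Danish å versus aa, so that equivalence would have to come from a language-specific table. The helper name below is illustrative only.

```python
import unicodedata


def compat_equal(a, b):
    """True if a and b are equal after compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


ligature = "\ufb03"   # ﬃ: LATIN SMALL LIGATURE FFI, a single code point

print(len(ligature), len("ffi"))       # 1 3
print(compat_equal(ligature, "ffi"))   # True: language-independent
print(compat_equal("\u00e5", "aa"))    # False: å/aa is a Danish convention
```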
* Re: On language-dependent defaults for character-folding 2016-02-24 17:56 ` Eli Zaretskii @ 2016-02-25 12:15 ` Richard Stallman 0 siblings, 0 replies; 263+ messages in thread From: Richard Stallman @ 2016-02-25 12:15 UTC (permalink / raw) To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > I didn't say the 2 examples were in the same class. My point was that > we are not talking about equivalence of _characters_, we are talking > about equivalent character _sequences_. That's true. My point is, if folding is going to fold some sequences with some letters, we need to put each sequence-match into the appropriate level, in order to handle them properly. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-23 17:43 ` Richard Stallman 2016-02-23 18:14 ` Eli Zaretskii @ 2016-02-23 20:21 ` Yuri Khan 2016-02-23 21:15 ` Marcin Borkowski 1 sibling, 1 reply; 263+ messages in thread From: Yuri Khan @ 2016-02-23 20:21 UTC (permalink / raw) To: rms@gnu.org; +Cc: Eli Zaretskii, lokedhs, Lars Ingebrigtsen, Emacs developers On Tue, Feb 23, 2016 at 11:43 PM, Richard Stallman <rms@gnu.org> wrote: > That is interesting. It means we need several levels of folding: > > * Identical-looking distinct code points (Latin a and Cyrillic a). > […] > The second level is also language-independent. Does anyone ever want > to turn it off? I see no reason to ever turn it on. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-23 20:21 ` Yuri Khan @ 2016-02-23 21:15 ` Marcin Borkowski 0 siblings, 0 replies; 263+ messages in thread From: Marcin Borkowski @ 2016-02-23 21:15 UTC (permalink / raw) To: Yuri Khan Cc: Lars Ingebrigtsen, Eli Zaretskii, lokedhs, rms@gnu.org, Emacs developers On 2016-02-23, at 21:21, Yuri Khan <yuri.v.khan@gmail.com> wrote: > On Tue, Feb 23, 2016 at 11:43 PM, Richard Stallman <rms@gnu.org> wrote: > >> That is interesting. It means we need several levels of folding: >> >> * Identical-looking distinct code points (Latin a and Cyrillic a). >> […] >> The second level is also language-independent. Does anyone ever want >> to turn it off? > > I see no reason to ever turn it on. I do, but it is indeed an extremely specialized case, and it is unlikely that anyone would use Emacs for that anyway. Best, -- Marcin Borkowski http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski Faculty of Mathematics and Computer Science Adam Mickiewicz University ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-22 2:34 ` Elias Mårtenson 2016-02-22 2:48 ` Lars Ingebrigtsen @ 2016-02-22 18:01 ` Richard Stallman 2016-02-22 18:58 ` Eli Zaretskii ` (2 more replies) 1 sibling, 3 replies; 263+ messages in thread From: Richard Stallman @ 2016-02-22 18:01 UTC (permalink / raw) To: Elias Mårtenson; +Cc: larsi, eliz, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > But you are Danish, are you not? As such, I would have thought that when > you search for ø, you would want to find a Swedish ö? (this is the inverse > of the natural Swedish behaviour). Elias and Lars, what do you two think searching for o should match? Should it match ö and ø, or not? IF you want o not to match ö and ø, then you want ö and ø to be a class by themselves. One way to handle each class is the asymmetric way: searching for the base character matches all of them, but searching for one of the other characters matches only itself. In Swedish, ö could be the base character and ø a variant. In Danish, ø could be the base character and ö the variant. Would each of you be happy with that mode? -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 263+ messages in thread
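The asymmetric scheme proposed here chooses the base character per language. A Python sketch of that scheme follows. The language codes and table contents are hypothetical and cover only the ö/ø example from this message.

```python
# Hypothetical per-language asymmetric classes: base character -> the
# variants it also finds. Swedish makes ö the base; Danish makes ø the base.
ASYMMETRIC_CLASSES = {
    "sv": {"ö": {"ø"}},
    "da": {"ø": {"ö"}},
}


def char_matches(query_ch, text_ch, lang):
    """A base character finds its variants; a variant finds only itself."""
    if query_ch == text_ch:
        return True
    variants = ASYMMETRIC_CLASSES.get(lang, {}).get(query_ch, set())
    return text_ch in variants


print(char_matches("ö", "ø", "sv"))  # True: base finds variant
print(char_matches("ø", "ö", "sv"))  # False: variant finds only itself
print(char_matches("ø", "ö", "da"))  # True: roles reversed in Danish
```

Because o is not a member of either class, a search for o finds neither ö nor ø in this sketch. That is the "class by themselves" case the message describes.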
* Re: On language-dependent defaults for character-folding 2016-02-22 18:01 ` Richard Stallman @ 2016-02-22 18:58 ` Eli Zaretskii 2016-02-23 1:30 ` Lars Ingebrigtsen 2016-02-23 2:03 ` Elias Mårtenson 2 siblings, 0 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-22 18:58 UTC (permalink / raw) To: rms; +Cc: larsi, lokedhs, emacs-devel > From: Richard Stallman <rms@gnu.org> > CC: larsi@gnus.org, eliz@gnu.org, emacs-devel@gnu.org > Date: Mon, 22 Feb 2016 13:01:13 -0500 > > One way to handle each class is the asymmetric way: searching for the base > character matches all of them, but searching for one of the other characters > matches only itself. Emacs already behaves like that. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-22 18:01 ` Richard Stallman 2016-02-22 18:58 ` Eli Zaretskii @ 2016-02-23 1:30 ` Lars Ingebrigtsen 2016-02-23 17:46 ` Richard Stallman 2016-02-23 2:03 ` Elias Mårtenson 2 siblings, 1 reply; 263+ messages in thread From: Lars Ingebrigtsen @ 2016-02-23 1:30 UTC (permalink / raw) To: Richard Stallman; +Cc: eliz, Elias Mårtenson, emacs-devel Richard Stallman <rms@gnu.org> writes: > Elias and Lars, what do you two think searching for o should match? > Should it match ö and ø, or not? As a Norwegian, I think o should match ö, but not ø. For Americans, it should match both. > One way to handle each class is the asymmetric way: searching for the base > character matches all of them, but searching for one of the other characters > matches only itself. > > In Swedish, ö could be the base character and ø a variant. > In Danish, ø could be the base character and ö the variant. > > Would each of you be happy with that mode? Hm... I would personally be surprised if any of these characters matched the other characters, but that may be just me. Others seem to find that helpful, apparently. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-23 1:30 ` Lars Ingebrigtsen @ 2016-02-23 17:46 ` Richard Stallman 2016-02-24 1:50 ` Lars Ingebrigtsen 0 siblings, 1 reply; 263+ messages in thread From: Richard Stallman @ 2016-02-23 17:46 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: eliz, lokedhs, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > As a Norwegian, I think o should match ö, but not ø. Could you explain why that would be best for you? > Hm... I would personally be surprised if any of these characters > matched the other characters, but that may be just me. Others seem to > find that helpful, apparently. Using my proposed levels (see the other message in this batch), I think you would want to turn off this level * Equivalent letters (ö and ø in Swedish). and turn on this level, asymmetrically. * Non-equivalent letters with a common base (o and ö/ø in Swedish). Would you be happy with that? -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-23 17:46 ` Richard Stallman @ 2016-02-24 1:50 ` Lars Ingebrigtsen 2016-02-24 6:40 ` Lars Brinkhoff 2016-02-24 13:43 ` Richard Stallman 0 siblings, 2 replies; 263+ messages in thread From: Lars Ingebrigtsen @ 2016-02-24 1:50 UTC (permalink / raw) To: Richard Stallman; +Cc: eliz, lokedhs, emacs-devel Richard Stallman <rms@gnu.org> writes: > > As a Norwegian, I think o should match ö, but not ø. > > Could you explain why that would be best for you? ø is a different letter from o in our 29 letter alphabet, and is a separate key on our keyboards. ö is just a variation of o. > > Hm... I would personally be surprised if any of these characters > > matched the other characters, but that may be just me. Others seem to > > find that helpful, apparently. > > Using my proposed levels (see the other message in this batch), I > think you would want to turn off this level > > * Equivalent letters (ö and ø in Swedish). > > and turn on this level, asymmetrically. > > * Non-equivalent letters with a common base (o and ö/ø in Swedish). > > Would you be happy with that? Uhm... I'm not quite sure. This is all getting so complicated. :-) The original, and quite easy to understand, feature being discussed was that if you search for "e", then all "e" variations should be found. ("Variation" here is "all those diacritics those furriners use all the time".) That's a feature I can get behind, and I think everybody would like. All this talk about equivalence classes feels like a totally different feature. Sure, in (older) Danish "å" can be spelled "aa", and they were sorted the same way, so they're "equivalent". But that's a totally different and separate feature set. It's the same with Swedes wanting ö and ø to be found. It's out of the scope of the simple, diacritic-ignoring feature that Emacs should definitely have. I think. -- (domestic pets only, the antidote for overdose, milk.) 
bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-24 1:50 ` Lars Ingebrigtsen @ 2016-02-24 6:40 ` Lars Brinkhoff 2016-02-24 13:43 ` Richard Stallman 1 sibling, 0 replies; 263+ messages in thread From: Lars Brinkhoff @ 2016-02-24 6:40 UTC (permalink / raw) To: emacs-devel Lars Ingebrigtsen <larsi@gnus.org> writes: > Richard Stallman <rms@gnu.org> writes: >> > As a Norwegian, I think o should match ö, but not ø. >> Could you explain why that would be best for you? > ø is a different letter from o in our 29 letter alphabet, and > is a separate key on our keyboards. ö is just a variation of > o. Maybe your point about the keyboard can serve as a useful illustration in the debate. (Maybe it has been brought up already, in which case I apologize.) An English-speaking user would typically have a keyboard with the letters a-z, so it can be quite handy to have o match ö and ø, and n match ñ, because it's somewhat inconvenient to type those letters on such a keyboard. Swedish-speaking users probably have keyboards with a separate ö key, so it's easy to search for ö without any folding. (The situation for ø is less clear; I can imagine that some Swedish users would like it to be matched by both o and ö, or just o, or just ö.) Similarly, Spanish keyboards have a separate ñ key (I learned that just now). ^ permalink raw reply [flat|nested] 263+ messages in thread
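The keyboard-based default described here, which is also the argument of Artur's original post, could be sketched in Python as follows. The per-layout key sets are hypothetical and hard-coded, since this thread left open how to detect the user's actual layout. The sketch finds a character's base letter by Unicode decomposition (NFD) with combining marks removed.

```python
import unicodedata

# Hypothetical sets of letters that have their own key on a few layouts.
ASCII_LETTERS = set("abcdefghijklmnopqrstuvwxyz")
LAYOUT_KEYS = {
    "us": ASCII_LETTERS,
    "se": ASCII_LETTERS | set("åäö"),
    "es": ASCII_LETTERS | set("ñ"),
}


def base_letter(ch):
    """Decompose ch and drop combining marks, e.g. 'ñ' -> 'n'."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def search_matches(query_ch, text_ch, layout):
    """Fold text_ch to query_ch only if text_ch has no key of its own."""
    if query_ch == text_ch:
        return True
    if text_ch in LAYOUT_KEYS[layout]:
        return False  # easy to type, so the user can search for it directly
    return base_letter(text_ch) == query_ch


print(search_matches("n", "ñ", "us"))  # True
print(search_matches("n", "ñ", "es"))  # False
print(search_matches("o", "ö", "se"))  # False
```

The decomposition shortcut has one limitation: ø has no canonical decomposition in Unicode. So on a US layout, o would not find ø here without an explicit table entry.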
* Re: On language-dependent defaults for character-folding 2016-02-24 1:50 ` Lars Ingebrigtsen 2016-02-24 6:40 ` Lars Brinkhoff @ 2016-02-24 13:43 ` Richard Stallman 1 sibling, 0 replies; 263+ messages in thread From: Richard Stallman @ 2016-02-24 13:43 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: eliz, lokedhs, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > Using my proposed levels (see the other message in this batch), I > > think you would want to turn off this level > > > > * Equivalent letters (ö and ø in Swedish). > > > > and turn on this level, asymmetrically. > > > > * Non-equivalent letters with a common base (o and ö/ø in Swedish). > > > > Would you be happy with that? > Uhm... I'm not quite sure. Please help out by thinking about the question. What part are you not sure about? -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-22 18:01 ` Richard Stallman 2016-02-22 18:58 ` Eli Zaretskii 2016-02-23 1:30 ` Lars Ingebrigtsen @ 2016-02-23 2:03 ` Elias Mårtenson 2016-02-23 17:46 ` Richard Stallman 2 siblings, 1 reply; 263+ messages in thread From: Elias Mårtenson @ 2016-02-23 2:03 UTC (permalink / raw) To: rms; +Cc: Lars Ingebrigtsen, Eli Zaretskii, emacs-devel On 23 February 2016 at 02:01, Richard Stallman <rms@gnu.org> wrote: > > > But you are Danish, are you not? As such, I would have thought that when > > you search for ø, you would want to find a Swedish ö? (this is the inverse > > of the natural Swedish behaviour). > > Elias and Lars, what do you two think searching for o should match? > Should it match ö and ø, or not? > I can only speak for Swedish, and there, a search for o definitely should not match ö (nor ø). This is the crux of this entire discussion, at least for me. However, a search for ö should match ø. > IF you want o not to match ö and ø, then you want ö and ø to be a > class by themselves. > > One way to handle each class is the asymmetric way: searching for the base > character matches all of them, but searching for one of the other characters > matches only itself. > > In Swedish, ö could be the base character and ø a variant. > In Danish, ø could be the base character and ö the variant. > > Would each of you be happy with that mode? This is exactly in line with what I have been proposing. Regards, Elias ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-23 2:03 ` Elias Mårtenson @ 2016-02-23 17:46 ` Richard Stallman 0 siblings, 0 replies; 263+ messages in thread From: Richard Stallman @ 2016-02-23 17:46 UTC (permalink / raw) To: Elias Mårtenson; +Cc: larsi, eliz, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > I can only speak for Swedish, and there, a search for o definitely should > not match ö (nor ø). This is the crux of this entire discussion, at least > for me. > However, a search for ö should match ø. Using my proposed levels (see the other message in this batch), I think you would want to turn on this level, asymmetrically, * Equivalent letters (ö and ø in Swedish). and turn off this level. * Non-equivalent letters with a common base (o and ö/ø in Swedish). Would you be happy with that? -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 263+ messages in thread
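Richard's per-level proposal lets each level be independently off, symmetric or asymmetric. It can be sketched in Python. The level names, their contents, and the language codes are hypothetical. The Norwegian entry in particular is only an assumption based on Lars's remark that ø is a separate letter while ö is a variation of o. The two configurations follow the suggestions made to Lars and to Elias in this thread.

```python
from enum import Enum


class Mode(Enum):
    OFF = "off"
    SYMMETRIC = "symmetric"
    ASYMMETRIC = "asymmetric"


# Hypothetical level contents as (base, variant) pairs, per language.
LEVELS = {
    "sv": {"equivalent-letters": [("ö", "ø")],
           "common-base": [("o", "ö"), ("o", "ø")]},
    "no": {"equivalent-letters": [],
           "common-base": [("o", "ö")]},
}


def matches(query_ch, text_ch, lang, config):
    """Check every enabled level for a pair that lets query_ch find text_ch."""
    if query_ch == text_ch:
        return True
    for level, pairs in LEVELS.get(lang, {}).items():
        mode = config.get(level, Mode.OFF)
        if mode is Mode.OFF:
            continue
        for base, variant in pairs:
            if query_ch == base and text_ch == variant:
                return True
            if mode is Mode.SYMMETRIC and query_ch == variant and text_ch == base:
                return True
    return False


LARS = {"equivalent-letters": Mode.OFF, "common-base": Mode.ASYMMETRIC}
ELIAS = {"equivalent-letters": Mode.ASYMMETRIC, "common-base": Mode.OFF}

print(matches("o", "ö", "no", LARS))   # True
print(matches("o", "ø", "no", LARS))   # False
print(matches("ö", "ø", "sv", ELIAS))  # True
print(matches("o", "ö", "sv", ELIAS))  # False
```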
* Re: On language-dependent defaults for character-folding 2016-02-22 1:58 ` Lars Ingebrigtsen 2016-02-22 2:34 ` Elias Mårtenson @ 2016-02-22 3:38 ` Eli Zaretskii 2016-02-22 3:57 ` Lars Ingebrigtsen 1 sibling, 1 reply; 263+ messages in thread From: Eli Zaretskii @ 2016-02-22 3:38 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: lokedhs, emacs-devel > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel <emacs-devel@gnu.org> > Date: Mon, 22 Feb 2016 12:58:31 +1100 > > Elias Mårtenson <lokedhs@gmail.com> writes: > > > It's like it's fine if you're typing in lower case characters for them > > to match upper case, too, but if you've bothered to type an upper case > > character, then you probably don't want lower case characters to match. > > > > This is how Emacs behaves today, is it not? > > Yes, and that's my point. I'd expect character folding when doing > searches to work in an analogous fashion: If I type `C-s é', I would be > surprised if it found "e", but not the other way around. Emacs behaves as you expect. Did you try that? ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-22 3:38 ` Eli Zaretskii @ 2016-02-22 3:57 ` Lars Ingebrigtsen 2016-02-22 16:10 ` Eli Zaretskii 2016-02-22 18:58 ` John Wiegley 0 siblings, 2 replies; 263+ messages in thread From: Lars Ingebrigtsen @ 2016-02-22 3:57 UTC (permalink / raw) To: Eli Zaretskii; +Cc: lokedhs, emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> From: Lars Ingebrigtsen <larsi@gnus.org> >> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel <emacs-devel@gnu.org> >> Date: Mon, 22 Feb 2016 12:58:31 +1100 >> >> Elias Mårtenson <lokedhs@gmail.com> writes: >> >> > It's like it's fine if you're typing in lower case characters for them >> > to match upper case, too, but if you've bothered to type an upper case >> > character, then you probably don't want lower case characters to match. >> > >> > This is how Emacs behaves today, is it not? >> >> Yes, and that's my point. I'd expect character folding when doing >> searches to work in an analogous fashion: If I type `C-s é', I would be >> surprised if it found "e", but not the other way around. > > Emacs behaves as you expect. Did you try that? I am describing how Emacs works today. -- (domestic pets only, the antidote for overdose, milk.) bloggy blog: http://lars.ingebrigtsen.no ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-22 3:57 ` Lars Ingebrigtsen @ 2016-02-22 16:10 ` Eli Zaretskii 2016-02-22 18:58 ` John Wiegley 1 sibling, 0 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-22 16:10 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: lokedhs, emacs-devel > From: Lars Ingebrigtsen <larsi@gnus.org> > Cc: lokedhs@gmail.com, emacs-devel@gnu.org > Date: Mon, 22 Feb 2016 14:57:39 +1100 > > Eli Zaretskii <eliz@gnu.org> writes: > > >> From: Lars Ingebrigtsen <larsi@gnus.org> > >> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel <emacs-devel@gnu.org> > >> Date: Mon, 22 Feb 2016 12:58:31 +1100 > >> > >> Elias Mårtenson <lokedhs@gmail.com> writes: > >> > >> > It's like it's fine if you're typing in lower case characters for them > >> > to match upper case, too, but if you've bothered to type an upper case > >> > character, then you probably don't want lower case characters to match. > >> > > >> > This is how Emacs behaves today, is it not? > >> > >> Yes, and that's my point. I'd expect character folding when doing > >> searches to work in an analogous fashion: If I type `C-s é', I would be > >> surprised if it found "e", but not the other way around. > > > > Emacs behaves as you expect. Did you try that? > > I am describing how Emacs works today. So was I. I just wanted to be sure Emacs behaves according to your expectations in this case, and that you are not complaining about what it does. ^ permalink raw reply [flat|nested] 263+ messages in thread
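Lars expects plain e to find é, but not é to find e. This asymmetry mirrors the case behaviour quoted above, where lowercase finds uppercase but not the reverse. A per-character Python approximation follows. It is an illustration based on Unicode decomposition, not Emacs's implementation. Emacs's case rule also looks at the whole search string rather than at individual characters.

```python
import unicodedata


def strip_marks(s):
    """Decompose s and drop combining marks: 'É' -> 'E'."""
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))


def char_found(query_ch, text_ch):
    """Plain lowercase query characters are lenient; a character the user
    took the trouble to type (uppercase or accented) must match exactly."""
    if query_ch == text_ch:
        return True
    if not query_ch.islower() or strip_marks(query_ch) != query_ch:
        return False
    return strip_marks(text_ch).lower() == query_ch


print(char_found("e", "é"))  # True
print(char_found("é", "e"))  # False
print(char_found("e", "E"))  # True
print(char_found("E", "e"))  # False
```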
* Re: On language-dependent defaults for character-folding 2016-02-22 3:57 ` Lars Ingebrigtsen 2016-02-22 16:10 ` Eli Zaretskii @ 2016-02-22 18:58 ` John Wiegley 2016-02-23 7:50 ` Per Starbäck 1 sibling, 1 reply; 263+ messages in thread From: John Wiegley @ 2016-02-22 18:58 UTC (permalink / raw) To: Lars Ingebrigtsen; +Cc: Eli Zaretskii, lokedhs, emacs-devel >>>>> Lars Ingebrigtsen <larsi@gnus.org> writes: > I am describing how Emacs works today. I'm worried that this very long discussion on character-folding is going nowhere. We're over 200 messages now, and it seems that the same arguments are being repeated about what does and does not constitute a letter to be folded. Or maybe my eyes glazed over, and that's what I think I'm seeing... If there are other technical discussions to be branched from this topic, now would be a good time to start new threads for them, if for no other reason than to clarify what the outcome of those threads should hopefully be. -- John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2 ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-22 18:58 ` John Wiegley @ 2016-02-23 7:50 ` Per Starbäck 2016-02-23 16:29 ` John Wiegley 0 siblings, 1 reply; 263+ messages in thread From: Per Starbäck @ 2016-02-23 7:50 UTC (permalink / raw) To: John Wiegley, Lars Ingebrigtsen, Eli Zaretskii, lokedhs, emacs-devel@gnu.org 2016-02-22 19:58 GMT+01:00 John Wiegley <jwiegley@gmail.com>: > I'm worried that this very long discussion on character-folding is going > nowhere. We're over 200 messages now, and it seems that the same arguments are > being repeated about what does and does not constitute a letter to be folded. > Or maybe my eyes glazed over, and that's what I think I'm seeing... I would have liked a more focused discussion on the most pressing issue, namely what to do regarding this in the upcoming release which is currently in pretest. Therefore I have avoided discussion on how to make the folding better in the future, even though I have views on the details of the ideal way to handle o vs ö vs ø, or how useful collation rules are, or how useful a user's locale settings are, etc. Artur had an interesting post on how he plans to make it better which I'd like to comment on someday, but won't for the time being, because it would only distract from that question. All of this is interesting, but the planned substantial improvements in character folding will not be in the next released version, so none of those details matter and it's essentially just a question of a default setting of off or on for the feature as it currently stands. I think it has been shown without doubt that the feature as it currently stands will lead to many disappointed users. As Artur has written: > It's important that the default be helpful, > without appearing to be "buggy" to unsuspecting users. That is the view of what I understand to be the main developer of this feature, who tried to set the default to off.
I think this should have been settled then, and think that Eli's view that it can be decided later is just wrong. Pretests should test what we intend to ship. Saying that it can be changed at the last moment just invites last-minute errors, for example someone forgetting to update the documentation that goes along with it. No more data is needed for this decision. (More data and more discussion may be needed for finding the best way forward after that, but that is something else.) ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-23 7:50 ` Per Starbäck @ 2016-02-23 16:29 ` John Wiegley 0 siblings, 0 replies; 263+ messages in thread From: John Wiegley @ 2016-02-23 16:29 UTC (permalink / raw) To: Per Starbäck Cc: Lars Ingebrigtsen, lokedhs, Eli Zaretskii, emacs-devel@gnu.org >>>>> Per Starbäck <per.starback@gmail.com> writes: > That is the view of what I understand to be the main developer of this > feature, who tried to set the default to off. I think this should have been > settled then, and think that Eli's view that it can be decided later is just > wrong. I agree that the pretest should be a pre-test, not a candidate run for features that won't appear in the final release. I think the hope was that pretesting would reveal that people want character folding, and so it really was a candidate for the next release. But I'm getting a strong impression that character folding isn't quite ready for prime-time as a default feature. So right now, I'm looking for arguments that it *should* be made the default; otherwise, it seems wise to me to let it wait until things have been hammered out a lot more. -- John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F 60E1 46C4 BD1A 7AC1 4BA2 http://newartisans.com ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-21 6:28 ` Elias Mårtenson 2016-02-21 8:14 ` Achim Gratz 2016-02-21 10:05 ` Lars Ingebrigtsen @ 2016-02-21 16:31 ` Eli Zaretskii 2016-02-21 16:58 ` Elias Mårtenson 2 siblings, 1 reply; 263+ messages in thread From: Eli Zaretskii @ 2016-02-21 16:31 UTC (permalink / raw) To: Elias Mårtenson; +Cc: larsi, emacs-devel > Date: Sun, 21 Feb 2016 14:28:40 +0800 > From: Elias Mårtenson <lokedhs@gmail.com> > Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel <emacs-devel@gnu.org> > > If that database gives us all that, then I'm all for using that database > instead of creating our own, of course. But why doesn't C-s o find ø, > and C-s l find ł then? > > Because under the Unicode decomposition rules, ø is not decomposable. I can't explain why that is the case (probably because there is no reason to have a combining /. I asked the question about this on the Unicode mailing list, let's see what we get in response. > After all, the only languages that use ø are languages that use it as a character of its own). Not sure what this means: how is the usage of ø in this regard different from, say, ä? > In the thread on the Unicode mailing list, the recommendation seems to be to use the CLDR (http://cldr.unicode.org/). Of course, this assumes there is a locale, but the choice of locale can easily be customisable (with the default being the user's locale). Not locale, language. > Another poster on the same thread mentioned that the CLDR doesn't go all the way, but adding a set of exceptions on top of it shouldn't be hard. In any case, the result would be significantly better than what is implemented now. The last part is not yet clear to me, as this aspect was never discussed in enough detail. I have now asked explicitly about that. ^ permalink raw reply [flat|nested] 263+ messages in thread
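Elias's explanation above, that ø has no Unicode decomposition while ö does, is easy to check against the Unicode Character Database. The following minimal sketch uses Python's standard `unicodedata` module (chosen only for illustration; it is not Emacs code) and assumes that "base letter" means the first code point of a character's canonical decomposition, taken one level deep:

```python
import unicodedata


def base_letter(ch):
    """Return the base letter of CH from its canonical decomposition,
    or None if Unicode gives CH no canonical decomposition at all."""
    decomp = unicodedata.decomposition(ch)
    if not decomp or decomp.startswith("<"):
        # "" means no decomposition; a leading "<tag>" marks a
        # compatibility decomposition, which this sketch ignores.
        return None
    # The first code point of a canonical decomposition is the base
    # (only one level is followed here).
    return chr(int(decomp.split()[0], 16))


for ch in "öäñçøł":
    print(ch, unicodedata.name(ch), "->", base_letter(ch))
```

A folding table derived only from these decompositions therefore lets o match ö but can never let it match ø, and likewise for l and ł.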
* Re: On language-dependent defaults for character-folding 2016-02-21 16:31 ` Eli Zaretskii @ 2016-02-21 16:58 ` Elias Mårtenson 2016-02-21 17:23 ` Eli Zaretskii 2016-02-22 17:59 ` Richard Stallman 0 siblings, 2 replies; 263+ messages in thread From: Elias Mårtenson @ 2016-02-21 16:58 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel On 22 February 2016 at 00:31, Eli Zaretskii <eliz@gnu.org> wrote: > > > After all, the only languages that use ø are languages that use it as a > character of its own). > > Not sure what this means: how is the usage of ø in this regard > different from, say, ä? > Well, if you are interested, here's how it works in the Scandinavian languages: Swedish has three extra characters: å, ä and ö. These are individual characters, as has been discussed many times in this thread. Norwegian and Danish have the same extra characters, except that they write them as å, æ and ø (they also sort them in a different order, but that's beside the point). Now, other languages may use the character (in the Unicode sense) ö as a variation of o. In other words, o with ¨ on top of it. For users of such languages ö is just a variation of o, as we have also discussed before. On the other hand, ø is not used as a variation of o in any language that I am aware of. In Sweden, when discussing Norwegian or Danish words (usually names), we tend to keep their style of characters. So, for example, I might refer to my Swedish friend Östen and my Norwegian friend Øystein. I would not spell his name Öystein, even though it's technically the same letter. However, when searching for "ö" I would certainly expect to match the first letter of Øystein. > In the thread on the Unicode mailing list, the recommendation seems to be > to use the CLDR (http://cldr.unicode.org/).
Of course, this assumes there > is a locale, but the choice of locale can easily be customisable (with the > default being the user's locale). > > Not locale, language. > Right. I guess I'm getting ahead of myself. As you know, I'm advocating choosing a default language based on the locale of the user. Regards, Elias ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-21 16:58 ` Elias Mårtenson @ 2016-02-21 17:23 ` Eli Zaretskii 2016-02-21 18:48 ` Ivan Andrus ` (2 more replies) 2016-02-22 17:59 ` Richard Stallman 1 sibling, 3 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-21 17:23 UTC (permalink / raw) To: Elias Mårtenson; +Cc: larsi, emacs-devel > Date: Mon, 22 Feb 2016 00:58:37 +0800 > From: Elias Mårtenson <lokedhs@gmail.com> > Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org> > > Now, other languages may use the character (in the Unicode sense) ö as a variation of o. In other words, o > with ¨ on top of it. For users of such languages ö is just a variation of o as we also have discussed before. On > the other hand, ø is not used as a variation of o in any language that I am aware of. I don't think this is correct. I think ö is a letter on its own in any language that uses it. Which is why I don't see how it is different from ø. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-21 17:23 ` Eli Zaretskii @ 2016-02-21 18:48 ` Ivan Andrus 2016-02-22 15:58 ` Wolfgang Jenkner 2016-02-22 17:59 ` Richard Stallman 2 siblings, 0 replies; 263+ messages in thread From: Ivan Andrus @ 2016-02-21 18:48 UTC (permalink / raw) To: Eli Zaretskii; +Cc: larsi, Elias Mårtenson, emacs-devel On Feb 21, 2016, at 10:23 AM, Eli Zaretskii <eliz@gnu.org> wrote: > >> Date: Mon, 22 Feb 2016 00:58:37 +0800 >> From: Elias Mårtenson <lokedhs@gmail.com> >> Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org> >> >> Now, other languages may use the character (in the Unicode sense) ö as a variation of o. In other words, o >> with ¨ on top of it. For users of such languages ö is just a variation of o as we also have discussed before. On >> the other hand, ø is not used as a variation of o in any language that I am aware of. > > I don't think this is correct. I think ö is a letter on its own in > any language that uses it. Which is why I don't see how it is > different from ø. Well, the New Yorker writes coöperate [1], though it’s definitely an o. That said, I don’t think we should worry overly about that case, since we Americans will want o to match them all. :-) -Ivan [1] https://en.wikipedia.org/wiki/Diaeresis_(diacritic)#English ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-21 17:23 ` Eli Zaretskii 2016-02-21 18:48 ` Ivan Andrus @ 2016-02-22 15:58 ` Wolfgang Jenkner 2016-02-22 16:35 ` Eli Zaretskii 2016-02-22 17:59 ` Richard Stallman 2 siblings, 1 reply; 263+ messages in thread From: Wolfgang Jenkner @ 2016-02-22 15:58 UTC (permalink / raw) To: Eli Zaretskii; +Cc: larsi, Elias Mårtenson, emacs-devel On Sun, Feb 21 2016, Eli Zaretskii wrote: >> Date: Mon, 22 Feb 2016 00:58:37 +0800 >> From: Elias Mårtenson <lokedhs@gmail.com> >> Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org> >> >> Now, other languages may use the character (in the Unicode sense) ö as a variation of o. In other words, o >> with ¨ on top of it. For users of such languages ö is just a variation of o as we also have discussed before. On >> the other hand, ø is not used as a variation of o in any language that I am aware of. > > I don't think this is correct. I think ö is a letter on its own in > any language that uses it. Which is why I don't see how it is > different from ø. In German dictionary collation order there's only a secondary difference between o and ö [1] &O<<ö<<<Ö [1] http://unicode.org/repos/cldr/trunk/common/collation/de.xml ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-22 15:58 ` Wolfgang Jenkner @ 2016-02-22 16:35 ` Eli Zaretskii 2016-02-22 16:56 ` Wolfgang Jenkner 0 siblings, 1 reply; 263+ messages in thread From: Eli Zaretskii @ 2016-02-22 16:35 UTC (permalink / raw) To: Wolfgang Jenkner; +Cc: larsi, lokedhs, emacs-devel > From: Wolfgang Jenkner <wjenkner@inode.at> > Cc: Elias Mårtenson <lokedhs@gmail.com>, larsi@gnus.org, > emacs-devel@gnu.org > Date: Mon, 22 Feb 2016 16:58:36 +0100 > > > I don't think this is correct. I think ö is a letter on its own in > > any language that uses it. Which is why I don't see how it is > > different from ø. > > In German dictionary collation order there's only a secondary difference > between o and ö [1] > > &O<<ö<<<Ö Yes, I know. But that doesn't mean ö is not a letter on its own. IOW, collation order says nothing about letter differences, IMO. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-22 16:35 ` Eli Zaretskii @ 2016-02-22 16:56 ` Wolfgang Jenkner 2016-02-22 17:24 ` Eli Zaretskii 0 siblings, 1 reply; 263+ messages in thread From: Wolfgang Jenkner @ 2016-02-22 16:56 UTC (permalink / raw) To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel On Mon, Feb 22 2016, Eli Zaretskii wrote: >> > I don't think this is correct. I think ö is a letter on its own in >> > any language that uses it. Which is why I don't see how it is >> > different from ø. >> >> In German dictionary collation order there's only a secondary difference >> between o and ö [1] >> >> &O<<ö<<<Ö > > Yes, I know. But that doesn't mean ö is not a letter on its own. > > IOW, collation order says nothing about letter differences, IMO. I think it does. All objections to making char-fold search the default come from people who expect that letters with a *primary* difference in their locale should not be conflated. ^ permalink raw reply [flat|nested] 263+ messages in thread
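Wolfgang's criterion can be stated as a simple rule: fold two letters only when a language's collation treats them as equal at primary strength. The sketch below is a toy illustration in Python; the table `PRIMARY_WEIGHT` and the function `may_fold` are hypothetical, and the weights are made-up numbers that only encode the German `&O<<ö` rule quoted above and the Swedish convention of sorting ö after z as a separate letter. A real implementation would read the CLDR tailorings instead.

```python
# Hypothetical primary weights for a few letters.  The numbers are made up;
# only whether two letters share a weight matters.
PRIMARY_WEIGHT = {
    # German dictionary order ("&O<<ö"): ö differs from o only at the
    # secondary level, so both get the same primary weight.
    "de": {"o": 15, "ö": 15, "z": 26},
    # Swedish sorts ö after z as a letter of its own: a primary difference.
    "sv": {"o": 15, "z": 26, "ö": 29},
}


def may_fold(language, a, b):
    """Return True if A and B are equal at primary strength in LANGUAGE,
    i.e. if conflating them in a search should not surprise its users."""
    weights = PRIMARY_WEIGHT[language]
    return weights[a] == weights[b]


for lang in ("de", "sv"):
    print(lang, "o/ö:", may_fold(lang, "o", "ö"))
```

Under this rule the same pair of letters folds for German users and stays distinct for Swedish users, which is the language dependence the thread keeps returning to.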
* Re: On language-dependent defaults for character-folding 2016-02-22 16:56 ` Wolfgang Jenkner @ 2016-02-22 17:24 ` Eli Zaretskii 0 siblings, 0 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-22 17:24 UTC (permalink / raw) To: Wolfgang Jenkner; +Cc: larsi, lokedhs, emacs-devel > From: Wolfgang Jenkner <wjenkner@inode.at> > Cc: larsi@gnus.org, lokedhs@gmail.com, emacs-devel@gnu.org > Date: Mon, 22 Feb 2016 17:56:19 +0100 > > On Mon, Feb 22 2016, Eli Zaretskii wrote: > > >> > I don't think this is correct. I think ö is a letter on its own in > >> > any language that uses it. Which is why I don't see how it is > >> > different from ø. > >> > >> In German dictionary collation order there's only a secondary difference > >> between o and ö [1] > >> > >> &O<<ö<<<Ö > > > > Yes, I know. But that doesn't mean ö is not a letter on its own. > > > > IOW, collation order says nothing about letter differences, IMO. > > I think it does. All objections to making char-fold search the default > come from people who expect that letters with a *primary* difference in > their locale should not be conflated. I understand, and I didn't try to argue against that. The sub-thread about being a "letter on its own" is just a tangent, not directly related to the issue at hand. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-21 17:23 ` Eli Zaretskii 2016-02-21 18:48 ` Ivan Andrus 2016-02-22 15:58 ` Wolfgang Jenkner @ 2016-02-22 17:59 ` Richard Stallman 2016-02-22 18:57 ` Eli Zaretskii 2 siblings, 1 reply; 263+ messages in thread From: Richard Stallman @ 2016-02-22 17:59 UTC (permalink / raw) To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > I don't think this is correct. I think ö is a letter on its own in > any language that uses it. Which is why I don't see how it is > different from ø. Users seem to disagree on whether to fold diacritics that make different letters (ñ, ç, Polish l with slash) or only those that modify a single letter (such as á, à, â in French). I think that we should have a user option which controls this and only this. That means we should have two levels of folding group definitions: the smaller groups which hold variants of the same letter, and the bigger groups which hold similar letters. These groups need to depend on the language setting. In English (and in French), ö is a modified o. In Swedish (and German, I think), ö and o are different letters. I think that each folding group should specify one character that is the base. This is because users also seem to disagree on what it should mean to specify a non-base letter in the search string. Some plausible meanings are * Find that one and only that one. * Treat it the same as specifying the base letter. There should be a user option to choose between those two (and maybe some other behaviors for a non-base letter in the search string). -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html.
^ permalink raw reply [flat|nested] 263+ messages in thread
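Richard's proposal of two levels of language-dependent folding groups, each with a designated base character, could be modeled roughly as follows. This is a hypothetical Python sketch: `FoldingGroup`, `GROUPS` and `fold_set` are invented names, not part of Emacs, and the group contents are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FoldingGroup:
    base: str           # the character a user would normally type
    members: frozenset  # every character folded together, base included


# Hypothetical tables.  "narrow" groups hold variants of one letter;
# "broad" groups also take in different but similar-looking letters.
SWEDISH = [
    FoldingGroup("o", frozenset("oóòô")),
    # In Swedish ö is a letter of its own, so it is the base of its own
    # group; Elias expects a search for ö to find ø as well.
    FoldingGroup("ö", frozenset("öø")),
]
GROUPS = {
    "en": {
        "narrow": [FoldingGroup("o", frozenset("oóòôö"))],
        "broad": [FoldingGroup("o", frozenset("oóòôöø"))],
    },
    # In this toy table the two levels coincide for Swedish.
    "sv": {"narrow": SWEDISH, "broad": SWEDISH},
}


def fold_set(language, level, ch):
    """Return the set of characters a search for CH may match."""
    for group in GROUPS[language][level]:
        if ch == group.base:
            return set(group.members)
    # A non-base character matches only itself ("find that one and only
    # that one"), the first of the two behaviors Richard lists.
    return {ch}


print(sorted(fold_set("en", "broad", "o")))
print(sorted(fold_set("sv", "narrow", "o")))
```

Here the `level` argument plays the role of the single user option Richard suggests, while the language key selects the language-dependent groups.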
* Re: On language-dependent defaults for character-folding 2016-02-22 17:59 ` Richard Stallman @ 2016-02-22 18:57 ` Eli Zaretskii 2016-02-23 17:43 ` Richard Stallman ` (2 more replies) 0 siblings, 3 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-22 18:57 UTC (permalink / raw) To: rms; +Cc: larsi, lokedhs, emacs-devel > From: Richard Stallman <rms@gnu.org> > CC: lokedhs@gmail.com, larsi@gnus.org, emacs-devel@gnu.org > Date: Mon, 22 Feb 2016 12:59:03 -0500 > > Users seem to disagree on whether to fold diacritics that make > different letters (ñ, ç, polish l with slash) or only those that > modify a single letter (as á, à, â in French). > > I think that we should have a user option which controls this and only > this. > > That means we should have two levels of folding group definitions: the > smaller groups which hold variants of the same letter, and the bigger > groups which hold similar letters. > > These groups need to depend on the language setting. In English (and > in French), ö is a modified o. In Swedish (and German, I think), ö > and o are different letters. This can be done if it will help. But no one has responded to these ideas until now, so I'm not sure we won't be in for another round of rejections. > I think that each folding group should specify one character that is > the base. I'm not sure what that means. What is a "folding group"? > This is because users also seem to disagree on what it > should mean to specify a non-base letter in the search string. > > Some plausible meanings are > > * Find that one and only that one. > * Treat it the same as specifying the base letter. > > There should be a user option to choose between those two (and maybe > some other behaviors for a non-base letter in the search string). We already have both options, and in particular, if a non-base letter appears explicitly in the search string, it will be searched literally, similarly to what we do with case-insensitive search.
E.g., searching for ö doesn't find o or any other of its variants. ^ permalink raw reply [flat|nested] 263+ messages in thread
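The asymmetric behavior Eli describes, together with the symmetric alternative Richard and Drew ask for, can be sketched as a function that turns the search string into a regular expression, in the spirit of the regexp-based code in character-fold.el that Drew mentions later in the thread. `VARIANTS`, `GROUP_OF` and `fold_regexp` are hypothetical names, and the table is a tiny illustrative subset.

```python
import re

# Hypothetical folding table: base letter -> all of its variants (base first).
VARIANTS = {"o": "oóòôö", "n": "nñ"}

# Reverse index: each variant -> the variant string of its group.
GROUP_OF = {v: group for group in VARIANTS.values() for v in group}


def fold_regexp(query, symmetric=False):
    """Turn QUERY into a regexp string.

    Asymmetric (default): a base letter matches all of its variants, while
    a variant typed explicitly matches only itself, by analogy with case
    folding.  Symmetric: any member of a group matches the whole group."""
    parts = []
    for ch in query:
        group = VARIANTS.get(ch) or (GROUP_OF.get(ch) if symmetric else None)
        parts.append("[" + re.escape(group) + "]" if group else re.escape(ch))
    return "".join(parts)


print(fold_regexp("pino"))                  # pi[nñ][oóòôö]
print(fold_regexp("piño"))                  # piñ[oóòôö]
print(fold_regexp("piño", symmetric=True))  # pi[nñ][oóòôö]
```

With the default, typing ñ explicitly narrows the search to ñ, just as Eli says the current code does; passing `symmetric=True` gives the other behavior.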
* Re: On language-dependent defaults for character-folding 2016-02-22 18:57 ` Eli Zaretskii @ 2016-02-23 17:43 ` Richard Stallman 2016-02-23 18:03 ` Eli Zaretskii 2016-02-23 17:43 ` Richard Stallman [not found] ` <<E1aYGze-000655-RM@fencepost.gnu.org> 2 siblings, 1 reply; 263+ messages in thread From: Richard Stallman @ 2016-02-23 17:43 UTC (permalink / raw) To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > Some plausible meanings are > > > > * Find that one and only that one. > > * Treat it the same as specifying the base letter. > > > > There should be a user option to choose between those two (and maybe > > some other behaviors for a non-base letter in the search string). > We already have both options, and in particular, if a non-base letter > appears explicitly in the search string, it will be searched > literally, similarly to what we do with case-insensitive search. Some users want that. Some, it appears, want searching for any letter in the group to find any letter in the group. So I am suggesting we offer both behaviors. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-23 17:43 ` Richard Stallman @ 2016-02-23 18:03 ` Eli Zaretskii 2016-02-24 13:41 ` Richard Stallman 0 siblings, 1 reply; 263+ messages in thread From: Eli Zaretskii @ 2016-02-23 18:03 UTC (permalink / raw) To: rms; +Cc: larsi, lokedhs, emacs-devel > From: Richard Stallman <rms@gnu.org> > CC: lokedhs@gmail.com, larsi@gnus.org, emacs-devel@gnu.org > Date: Tue, 23 Feb 2016 12:43:26 -0500 > > > We already have both options, and in particular, if a non-base letter > > appears explicitly in the search string, it will be searched > > literally, similarly to what we do with case-insensitive search. > > Some users want that. Some, it appears, want searching for any letter > in the group to find any letter in the group. So I am suggesting we > offer both behaviors. That's okay, but if we do, shouldn't we have similar options for case-folding and perhaps also for "lax-space" matching? Currently they all behave asymmetrically. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-23 18:03 ` Eli Zaretskii @ 2016-02-24 13:41 ` Richard Stallman 0 siblings, 0 replies; 263+ messages in thread From: Richard Stallman @ 2016-02-24 13:41 UTC (permalink / raw) To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > That's okay, but if we do, shouldn't we have similar options for > case-folding and perhaps also for "lax-space" matching? Not necessarily. There is no principle that says we have to give feature A whatever customizations we give to feature B. We could implement these options for case folding and whitespace matching if users want them. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-22 18:57 ` Eli Zaretskii 2016-02-23 17:43 ` Richard Stallman @ 2016-02-23 17:43 ` Richard Stallman [not found] ` <<E1aYGze-000655-RM@fencepost.gnu.org> 2 siblings, 0 replies; 263+ messages in thread From: Richard Stallman @ 2016-02-23 17:43 UTC (permalink / raw) To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > These groups need to depend on the language setting. In English (and > > in French), ö is a modified o. In Swedish (and German, I think), ö > > and o are different letters. > > I think that each folding group should specify one character that is > > the base. > I'm not sure what that means. What is a "folding group"? A group of characters which, under certain circumstances, isearch should fold together (treat as equivalent). -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 263+ messages in thread
* RE: On language-dependent defaults for character-folding [not found] ` <<E1aYGze-000655-RM@fencepost.gnu.org> @ 2016-02-23 18:00 ` Drew Adams 0 siblings, 0 replies; 263+ messages in thread From: Drew Adams @ 2016-02-23 18:00 UTC (permalink / raw) To: rms, Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel > > > Some plausible meanings are > > > > > > * Find that one and only that one. > > > * Treat it the same as specifying the base letter. > > > > > > There should be a user option to choose between those two (and maybe > > > some other behaviors for a non-base letter in the search string). > > > > We already have both options, and in particular, if a non-base letter > > appears explicitly in the search string, it will be searched > > literally, similarly to what we do with case-insensitive search. > > Some users want that. Some, it appears, want searching for any letter > in the group to find any letter in the group. So I am suggesting we > offer both behaviors. +1. That is what I did, BTW, in my add-on to character-fold.el (option `char-fold-symmetric'). And the same user can want one or the other behavior at different times or in different contexts. Besides choosing a behavior as a general preference at customize time, you can toggle the behavior during Isearch, using `M-s =' (command `isearchp-toggle-symmetric-char-fold'): Toggle option `char-fold-symmetric'. This does not also toggle character folding. Note that symmetric character folding can slow down search. Use longer search strings to reduce this problem, or use `M-s h L' to turn off lazy highlighting. Moving some of the character-fold.el implementation to C would no doubt speed things up. But I hope that that will be done in a fine-grained modular way, providing individual Lisp functions that users can tweak. For example, I might not have been able to add this alternative behavior easily, were it not for the current regexp-using code in character-fold.el. 
I don't expect the same Lisp functions to be available after the implementation of some things in C, but let's try to make sure that a C implementation is not monolithic, preventing easy extension using Lisp. (No, I'm not suggesting that that has been the case in the past. Just mentioning the need to be able to extend and experiment in Lisp.) ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-21 16:58 ` Elias Mårtenson 2016-02-21 17:23 ` Eli Zaretskii @ 2016-02-22 17:59 ` Richard Stallman 2016-02-22 18:51 ` Eli Zaretskii 1 sibling, 1 reply; 263+ messages in thread From: Richard Stallman @ 2016-02-22 17:59 UTC (permalink / raw) To: Elias Mårtenson; +Cc: eliz, larsi, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > Right. I guess I'm getting ahead of myself. As you know, I'm advocating > choosing a default language based on the locale of the user. We need: * A per-buffer language preference variable. * A global value which becomes the default for new buffers. The global value can be initialized when Emacs starts based on the locale. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 263+ messages in thread
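Richard's two-variable design (a per-buffer language preference whose global default is initialized from the locale) might look like this in a Python sketch. `language_from_locale` and `Buffer` are hypothetical names; the precedence LC_ALL, then LC_CTYPE, then LANG follows the usual POSIX convention, and consulting the character-type category rather than another one is an assumption.

```python
import os


def language_from_locale(env=None):
    """Guess a language code such as "sv" from POSIX locale variables.

    LC_ALL overrides LC_CTYPE, which overrides LANG; relying on the
    character-type category here is an assumption of this sketch."""
    env = os.environ if env is None else env
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            # "sv_SE.UTF-8@euro" -> "sv"
            lang = value.split("@")[0].split(".")[0].split("_")[0]
            return None if lang in ("C", "POSIX") else lang
    return None


class Buffer:
    # Global default, computed once at startup; new buffers inherit it
    # (loosely analogous to `setq-default' for a buffer-local variable).
    language = language_from_locale()

    def __init__(self, name, language=None):
        self.name = name
        if language is not None:
            # Per-buffer preference: the instance attribute shadows the default.
            self.language = language


notes = Buffer("notas.txt", language="es")
scratch = Buffer("*scratch*")
print(notes.language, scratch.language)
```

Eli's reply points out that charset information, Unicode blocks or document metadata could refine the per-buffer value; this sketch covers only the locale fallback.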
* Re: On language-dependent defaults for character-folding 2016-02-22 17:59 ` Richard Stallman @ 2016-02-22 18:51 ` Eli Zaretskii 2016-02-23 0:14 ` Juri Linkov 2016-02-26 20:23 ` Richard Stallman 0 siblings, 2 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-22 18:51 UTC (permalink / raw) To: rms; +Cc: larsi, lokedhs, emacs-devel > From: Richard Stallman <rms@gnu.org> > CC: eliz@gnu.org, larsi@gnus.org, emacs-devel@gnu.org > Date: Mon, 22 Feb 2016 12:59:00 -0500 > > > Right. I guess I'm getting ahead of myself. As you know, I'm advocating > > choosing a default language based on the locale of the user. > > We need: > > * A per-buffer language preference variable. > * A global value which becomes the default for new buffers. That's unnecessarily restrictive; we can do better with the current infrastructure. Some encodings provide us with charset information, which can be used to deduce the language of the text. Some characters belong to Unicode blocks that allow identification of the language, or maybe a small group of languages. In some cases, the text itself comes with metadata which describes the language. And there might be other sources of information about the language. It would be silly to disregard this information where it exists. There are other aspects of this that need to be considered, if we want language-specific searching to be solid. E.g., what happens with text copied to another buffer which might have a different per-buffer language preference? Does it suddenly behave differently when searched? But the most basic issue is that any significant development in these directions requires re-implementing the feature at the C level, and using char-tables for folding, like we do with case-mapping. So until someone steps forward for the job, all we can do is small corrections to the existing implementation.
For example, the default state of character-folding might depend on the locale's language -- we could turn it off by default for languages whose users expressed dissatisfaction with the feature. We could also augment the regular expressions created for folding the search string by filtering out variants that users of a particular language don't want. If people think these ideas will make more users happy, we can work on that. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-22 18:51 ` Eli Zaretskii @ 2016-02-23 0:14 ` Juri Linkov 2016-02-23 17:11 ` Eli Zaretskii 2016-02-26 20:23 ` Richard Stallman 1 sibling, 1 reply; 263+ messages in thread From: Juri Linkov @ 2016-02-23 0:14 UTC (permalink / raw) To: Eli Zaretskii; +Cc: larsi, lokedhs, rms, emacs-devel > But the most basic issue is that any significant development in these > directions require to re-implement the feature on the C level, and use > char-tables for folding, like we do with case-mapping. So until > someone steps forward for the job, all we can do is small corrections > to the existing implementation. Do I understand correctly that essentially what is necessary to do on the C level is to extend char-tables with character insertions and deletions, so in addition to canonical equivalence mappings (like are used for the existing case-mappings) char-tables should also support matching of multi-character additions (like combining accents in the search string) and deletions (like combining accents from the search string missing in the search text)? > For example, the default state of character-folding might depend on > the locale's language -- we could turn it off by default for languages > whose users expressed dissatisfaction with the feature. We could also > augment the regular expressions created for folding the search string > by filtering out variants that users of a particular language don't > want. If people think these ideas will make more users happy, we can > work on that. It seems two user variables are necessary for customization: 1. inclusive folding groups that will include by default such pairs as o - ø, l - ł added to the Unicode decomposition-based rules, and allow the users to add more rules; 2. exclusive folding groups to exclude locale/language-dependent rules from the default mappings above, e.g. removing n - ñ for the "es" locale. ^ permalink raw reply [flat|nested] 263+ messages in thread
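Juri's two customization variables, combined with the ordering Eli suggests in his reply (add the extra pairs unconditionally, then filter them per language), could be sketched like this. `EXTRA_FOLDINGS`, `EXCLUDED_FOLDINGS` and `folding_table` are hypothetical names, the per-language exclusions are examples only, and a real table would cover all of Unicode rather than a short sample string.

```python
import unicodedata


def decomposition_table(chars):
    """Map each base letter to the characters in CHARS that canonically
    decompose to it (one level only); other characters map to themselves."""
    table = {}
    for ch in chars:
        d = unicodedata.decomposition(ch)
        base = chr(int(d.split()[0], 16)) if d and not d.startswith("<") else ch
        table.setdefault(base, {base}).add(ch)
    return table


# Hypothetical option 1 (inclusive): pairs Unicode decompositions miss.
EXTRA_FOLDINGS = {"o": {"ø"}, "l": {"ł"}}

# Hypothetical option 2 (exclusive): per-language foldings to drop.
EXCLUDED_FOLDINGS = {
    "es": {"n": {"ñ"}},
    "sv": {"a": {"å", "ä"}, "o": {"ö", "ø"}},
}


def folding_table(language, chars="aåäeélłnñoöø"):
    """Decomposition-based groups plus the extra pairs (added for every
    language), then filtered by LANGUAGE's exclusions."""
    table = decomposition_table(chars)
    for base, extra in EXTRA_FOLDINGS.items():
        table.setdefault(base, {base}).update(extra)
    for base, dropped in EXCLUDED_FOLDINGS.get(language, {}).items():
        if base in table:
            table[base] -= dropped
    return table


print(sorted(folding_table("en")["o"]))  # ['o', 'ö', 'ø']
print(sorted(folding_table("es")["n"]))  # ['n']
```

Only one language setting is consulted, so its default could come from the user's locale as discussed earlier in the thread.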
* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-23 17:11 UTC
To: Juri Linkov; +Cc: larsi, lokedhs, rms, emacs-devel

> From: Juri Linkov <juri@linkov.net>
> Cc: rms@gnu.org, larsi@gnus.org, lokedhs@gmail.com, emacs-devel@gnu.org
> Date: Tue, 23 Feb 2016 02:14:55 +0200
>
> > But the most basic issue is that any significant development in these
> > directions requires re-implementing the feature on the C level and using
> > char-tables for folding, like we do with case-mapping. So until
> > someone steps forward for the job, all we can do is small corrections
> > to the existing implementation.
>
> Do I understand correctly that essentially what is necessary to do on the
> C level is to extend char-tables with character insertions and deletions,
> so in addition to canonical equivalence mappings (like those used for the
> existing case-mappings) char-tables should also support matching of
> multi-character additions (like combining accents in the search
> string) and deletions (like combining accents from the search string
> missing in the search text)?

I'm not sure I understand why you think char-tables need to be
extended in support of folding search. AFAIU, we need a way to
normalize each character, both in the search string and in the
buffer/string we search. This normalization involves decomposition
followed by reordering the combining diacritics into a canonical
order. Then we just match one against the other, almost as usual
("almost" because we need to backtrack in the buffer/string upon
mismatch). (Of course, decomposition of buffer/string text needs to
be done on the fly, but this is an implementation detail unrelated to
this discussion.)
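The normalization step described here (decomposition followed by canonical reordering of combining marks) is what Unicode calls NFD, and Python exposes it as `unicodedata.normalize("NFD", ...)`. The sketch below shows three different spellings of ậ becoming identical after normalization. `normalized_find` is a hypothetical helper that searches a normalized copy of the text. It therefore skips the on-the-fly decomposition and the mapping back to buffer positions that a real implementation would need.

```python
import unicodedata

def normalize(text):
    """Canonical decomposition followed by canonical reordering (Unicode NFD)."""
    return unicodedata.normalize("NFD", text)

def normalized_find(needle, haystack):
    """Index of needle in the normalized haystack, or -1.

    Simplification: the index refers to the normalized copy, not to the
    original haystack, and a match may stop before trailing diacritics.
    """
    return normalize(haystack).find(normalize(needle))

precomposed = "\u1ead"      # ậ as a single code point
order_a = "a\u0302\u0323"   # a, COMBINING CIRCUMFLEX ACCENT, COMBINING DOT BELOW
order_b = "a\u0323\u0302"   # a, COMBINING DOT BELOW, COMBINING CIRCUMFLEX ACCENT

# Canonical order sorts marks by combining class: dot below (220)
# comes before circumflex (230).
print(unicodedata.combining("\u0323"), unicodedata.combining("\u0302"))  # 220 230
print(normalize(precomposed) == normalize(order_a) == normalize(order_b))  # True
print(normalized_find(order_a, "c\u1eadn"))  # 1
```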
So we need a char-table that maps each character into its
decomposition sequence, which AFAIR is something the current
char-tables can support already. Am I missing something?

If you are interested in the details, I suggest reading
http://unicode.org/reports/tr10/ and in particular
http://unicode.org/reports/tr10/#Searching, which deals specifically
with searching. http://www.unicode.org/notes/tn5/ is also useful
reading.

> > For example, the default state of character-folding might depend on
> > the locale's language -- we could turn it off by default for languages
> > whose users expressed dissatisfaction with the feature. We could also
> > augment the regular expressions created for folding the search string
> > by filtering out variants that users of a particular language don't
> > want. If people think these ideas will make more users happy, we can
> > work on that.
>
> It seems two user variables are necessary for customization:
>
> 1. inclusive folding groups that will include by default such pairs
>    as o - ø, l - ł added to the Unicode decomposition-based rules,
>    and allow the users to add more rules;
>
> 2. exclusive folding groups to exclude locale/language-dependent rules from
>    the default mappings above, e.g. removing n - ñ for the "es" locale.

I think we should add those in item 1 unconditionally (i.e. include
them in the default mappings), and then exclude some of them under the
rules you describe in item 2. Then the problem becomes easier, as we
only need to filter out some mappings, as determined by a single user
variable (whose default can come from the user locale).

The additional mappings can be picked up from the file decomps.txt in
the UCA database.
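The single-variable approach suggested here could look like the Python sketch below. It assumes the default comes from the usual POSIX locale variables, checked in the order `LC_ALL`, `LC_CTYPE`, `LANG`. `LANGUAGE_EXCLUSIONS`, `ALL_MAPPINGS` and the helper names are hypothetical. Only the Spanish n/ñ pair comes from this thread.

```python
import os

# Hypothetical per-language exclusions: (base, variant) pairs to drop.
# Only the Spanish n/ñ pair comes from this thread.
LANGUAGE_EXCLUSIONS = {"es": {("n", "ñ"), ("N", "Ñ")}}

# Hypothetical full mapping: every addition is included unconditionally.
ALL_MAPPINGS = {"n": {"ñ", "ń"}, "o": {"ö", "ø"}, "l": {"ł"}}

def locale_language(environ=None):
    """Language part of the first non-empty LC_ALL, LC_CTYPE or LANG."""
    environ = os.environ if environ is None else environ
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(var)
        if value:
            # "es_ES.UTF-8" -> "es", "ca_ES@valencia" -> "ca"
            return value.split(".")[0].split("@")[0].split("_")[0]
    return None

def default_exclusions(environ=None):
    """Default value of the single user variable, derived from the locale."""
    return set(LANGUAGE_EXCLUSIONS.get(locale_language(environ), ()))

def filter_mappings(mappings, exclusions):
    """Drop the excluded (base, variant) pairs and keep everything else."""
    return {base: {v for v in variants if (base, v) not in exclusions}
            for base, variants in mappings.items()}

spanish = filter_mappings(ALL_MAPPINGS, default_exclusions({"LANG": "es_ES.UTF-8"}))
english = filter_mappings(ALL_MAPPINGS, default_exclusions({"LANG": "en_US.UTF-8"}))
print(sorted(spanish["n"]), sorted(english["n"]))  # ['ń'] ['ñ', 'ń']
```

A user who disagrees with the locale-derived default would override just that one variable, for example with an empty set.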
* Re: On language-dependent defaults for character-folding
From: Juri Linkov @ 2016-02-24 0:16 UTC
To: Eli Zaretskii; +Cc: larsi, lokedhs, rms, emacs-devel

>> > But the most basic issue is that any significant development in these
>> > directions requires re-implementing the feature on the C level and using
>> > char-tables for folding, like we do with case-mapping. So until
>> > someone steps forward for the job, all we can do is small corrections
>> > to the existing implementation.
>>
>> Do I understand correctly that essentially what is necessary to do on the
>> C level is to extend char-tables with character insertions and deletions,
>> so in addition to canonical equivalence mappings (like those used for the
>> existing case-mappings) char-tables should also support matching of
>> multi-character additions (like combining accents in the search
>> string) and deletions (like combining accents from the search string
>> missing in the search text)?
>
> I'm not sure I understand why you think char-tables need to be
> extended in support of folding search. AFAIU, we need a way to
> normalize each character, both in the search string and in the
> buffer/string we search. This normalization involves decomposition
> followed by reordering the combining diacritics into a canonical
> order. Then we just match one against the other, almost as usual
> ("almost" because we need to backtrack in the buffer/string upon
> mismatch). (Of course, decomposition of buffer/string text needs to
> be done on the fly, but this is an implementation detail unrelated to
> this discussion.)
>
> So we need a char-table that maps each character into its
> decomposition sequence, which AFAIR is something the current
> char-tables can support already. Am I missing something?

Searching for a base character and matching a sequence of characters
(e.g. a base character and combining accents) might already be possible
with the current char-tables indexed by a base character. But I see
no way to specify a char-table mapping saying that, e.g., a character
should be skipped in the search buffer. Maybe this need could be
avoided in an asymmetric search with combining characters in the
search buffer, but it is still required for ignorable characters.

> If you are interested in the details, I suggest reading
> http://unicode.org/reports/tr10/ and in particular
> http://unicode.org/reports/tr10/#Searching, which deals specifically
> with searching. http://www.unicode.org/notes/tn5/ is also useful
> reading.

Thanks, this looks like a complete specification with comprehensive
answers to most questions.

>> > For example, the default state of character-folding might depend on
>> > the locale's language -- we could turn it off by default for languages
>> > whose users expressed dissatisfaction with the feature. We could also
>> > augment the regular expressions created for folding the search string
>> > by filtering out variants that users of a particular language don't
>> > want. If people think these ideas will make more users happy, we can
>> > work on that.
>>
>> It seems two user variables are necessary for customization:
>>
>> 1. inclusive folding groups that will include by default such pairs
>>    as o - ø, l - ł added to the Unicode decomposition-based rules,
>>    and allow the users to add more rules;
>>
>> 2. exclusive folding groups to exclude locale/language-dependent rules from
>>    the default mappings above, e.g. removing n - ñ for the "es" locale.
>
> I think we should add those in item 1 unconditionally (i.e. include
> them in the default mappings), and then exclude some of them under the
> rules you describe in item 2. Then the problem becomes easier, as we
> only need to filter out some mappings, as determined by a single user
> variable (whose default can come from the user locale).

Better to have 4 variables (2 internal + 2 user customizable variables):

1.1. (internal) default mappings with additional data from decomps.txt
1.2. user mappings to add to the default list
2.1. (internal) locale-dependent mappings to remove from the default list
2.2. user mappings to remove from the default list

> The additional mappings can be picked up from the file decomps.txt in
> the UCA database.

It would be good to find all differences between UnicodeData.txt and
decomps.txt. Is this the latest version?
http://unicode.org/Public/UCA/6.3.0/decomps.txt
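The four variables could be combined as shown below. This is a Python sketch with hypothetical names. The thread does not say in which order the layers apply, so the sketch assumes user choices are applied last: user additions can restore a pair that the locale removed, and user removals always win. Mappings are modeled as `(base, variant)` pairs, and each set holds only a tiny illustrative excerpt.

```python
def effective_mappings(default, user_add, locale_remove, user_remove):
    """Combine the four variables into the set of active (base, variant) pairs.

    Assumed order: internal defaults minus internal locale removals,
    then user additions, then user removals, so user choices win.
    """
    return ((set(default) - set(locale_remove)) | set(user_add)) - set(user_remove)

# 1.1 (internal) defaults, e.g. from UnicodeData.txt plus decomps.txt (tiny excerpt)
DEFAULT = {("n", "ñ"), ("o", "ö"), ("o", "ø"), ("l", "ł")}
# 2.1 (internal) locale-dependent removals, here for an "es" locale
LOCALE_REMOVE = {("n", "ñ")}
# 1.2 and 2.2: hypothetical user customizations
USER_ADD = {("'", "\u2019")}   # apostrophe also finds RIGHT SINGLE QUOTATION MARK
USER_REMOVE = {("o", "ö")}

active = effective_mappings(DEFAULT, USER_ADD, LOCALE_REMOVE, USER_REMOVE)
print(sorted(active))
```

One possible benefit of keeping the internal layers separate from the user layers is that the internal tables can be regenerated from newer Unicode data without touching anyone's customizations.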
* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-24 18:39 UTC
To: Juri Linkov; +Cc: larsi, lokedhs, rms, emacs-devel

> From: Juri Linkov <juri@linkov.net>
> Cc: rms@gnu.org, larsi@gnus.org, lokedhs@gmail.com, emacs-devel@gnu.org
> Date: Wed, 24 Feb 2016 02:16:23 +0200
>
> > So we need a char-table that maps each character into its
> > decomposition sequence, which AFAIR is something the current
> > char-tables can support already. Am I missing something?
>
> Searching for a base character and matching a sequence of characters
> (e.g. a base character and combining accents) might already be possible
> with the current char-tables indexed by a base character. But I see
> no way to specify a char-table mapping saying that, e.g., a character
> should be skipped in the search buffer. Maybe this need could be
> avoided in an asymmetric search with combining characters in the
> search buffer, but it is still required for ignorable characters.

Whether ignorables can be supported by the current char-tables depends
on the data we store in that table. It could be a vector of objects
that provide both the codepoint and its weight; then it's easy to
implement skipping characters by throwing away characters whose weight
is above the threshold specified by the caller.

> >> It seems two user variables are necessary for customization:
> >>
> >> 1. inclusive folding groups that will include by default such pairs
> >>    as o - ø, l - ł added to the Unicode decomposition-based rules,
> >>    and allow the users to add more rules;
> >>
> >> 2. exclusive folding groups to exclude locale/language-dependent rules from
> >>    the default mappings above, e.g. removing n - ñ for the "es" locale.
> >
> > I think we should add those in item 1 unconditionally (i.e. include
> > them in the default mappings), and then exclude some of them under the
> > rules you describe in item 2. Then the problem becomes easier, as we
> > only need to filter out some mappings, as determined by a single user
> > variable (whose default can come from the user locale).
>
> Better to have 4 variables (2 internal + 2 user customizable variables):

Can you explain why it's better to have 4 variables rather than just
one?

> It would be good to find all differences between UnicodeData.txt and
> decomps.txt. Is this the latest version?
> http://unicode.org/Public/UCA/6.3.0/decomps.txt

No, the latest is always here:

  http://unicode.org/Public/UCA/latest/decomps.txt

(The last release of Unicode is v8.0.)
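The weight-based idea can be sketched in Python. The weights here are hypothetical and are not the real UCA weights:

- 0 for base characters,
- 1 for combining diacritics, taken from the NFD form,
- 2 for a hand-picked set of ignorables.

A plain dictionary `TABLE` stands in for a char-table. Before comparing, the sketch throws away every element whose weight is above the caller's threshold. It also rejects a match that would stop just before a diacritic that still counts, which is the backtracking-on-mismatch case mentioned earlier in the thread.

```python
import unicodedata

# Hypothetical weights: 0 = base character, 1 = combining diacritic,
# 2 = ignorable.  The ignorable set below is an illustrative choice.
IGNORABLE = {"\u00ad", "\u200d"}  # SOFT HYPHEN, ZERO WIDTH JOINER
TABLE = {}  # per-character cache standing in for a char-table

def table_entry(ch):
    """Vector of (codepoint, weight) pairs for ch, built from its NFD form."""
    if ch not in TABLE:
        if ch in IGNORABLE:
            TABLE[ch] = ((ord(ch), 2),)
        else:
            TABLE[ch] = tuple((ord(c), 1 if unicodedata.combining(c) else 0)
                              for c in unicodedata.normalize("NFD", ch))
    return TABLE[ch]

def elements(text, threshold):
    """(codepoint, weight, index-in-text) triples whose weight is <= threshold."""
    return [(cp, w, i) for i, ch in enumerate(text)
            for cp, w in table_entry(ch) if w <= threshold]

def folded_find(needle, haystack, threshold=0):
    """Index in haystack where needle matches, or -1.

    threshold 0: compare base characters only; 1: diacritics also count;
    2: ignorables count too.
    """
    pattern = [cp for cp, _, _ in elements(needle, threshold)]
    hay = elements(haystack, threshold)
    if not pattern:
        return 0
    n = len(pattern)
    for k in range(len(hay) - n + 1):
        if [cp for cp, _, _ in hay[k:k + n]] != pattern:
            continue
        # Reject a match that ends right before a diacritic that still counts,
        # e.g. "n" against the "n" of "ñ" when threshold >= 1.
        if k + n < len(hay) and hay[k + n][1] == 1:
            continue
        return hay[k][2]
    return -1

print(folded_find("nino", "el niño"))               # 3
print(folded_find("nino", "el niño", threshold=1))  # -1
```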
* Re: On language-dependent defaults for character-folding
From: Juri Linkov @ 2016-02-25 0:29 UTC
To: Eli Zaretskii; +Cc: larsi, lokedhs, rms, emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2158 bytes --]

>> >> It seems two user variables are necessary for customization:
>> >>
>> >> 1. inclusive folding groups that will include by default such pairs
>> >>    as o - ø, l - ł added to the Unicode decomposition-based rules,
>> >>    and allow the users to add more rules;
>> >>
>> >> 2. exclusive folding groups to exclude locale/language-dependent rules from
>> >>    the default mappings above, e.g. removing n - ñ for the "es" locale.
>> >
>> > I think we should add those in item 1 unconditionally (i.e. include
>> > them in the default mappings), and then exclude some of them under the
>> > rules you describe in item 2. Then the problem becomes easier, as we
>> > only need to filter out some mappings, as determined by a single user
>> > variable (whose default can come from the user locale).
>>
>> Better to have 4 variables (2 internal + 2 user customizable variables):
>
> Can you explain why it's better to have 4 variables rather than just
> one?

If you mean that one customizable variable should contain all mappings
from UnicodeData.txt and decomps.txt presented to the user for
customization, such a list would be too large to customize: there are
5721 decompositions in UnicodeData.txt and 6674 decompositions in
decomps.txt.

So we could have at least one default internal variable containing all
decompositions from UnicodeData.txt, plus the decompositions from
decomps.txt, minus the locale-dependent mappings. Then 2 user-customizable
variables should be enough: one to let users add a mapping to the
default list, and another to remove a mapping from the default list.
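A count like the one for UnicodeData.txt can be reproduced approximately with Python's `unicodedata` module, which ships its own copy of that file. The totals depend on `unicodedata.unidata_version`, which is newer than the Unicode 8.0 data discussed here, so expect a number different from 5721. The sketch also confirms that ł and ø have no decomposition at all in UnicodeData.txt. That is why pairs like l - ł and o - ø have to come from decomps.txt or be added by hand.

```python
import sys
import unicodedata

def decompositions():
    """Yield (char, decomposition field) for each code point that has one."""
    for cp in range(sys.maxunicode + 1):
        field = unicodedata.decomposition(chr(cp))
        if field:  # an empty string means "no decomposition"
            yield chr(cp), field

pairs = list(decompositions())
# Fields starting with a "<tag>" (e.g. <compat>, <font>) are compatibility
# decompositions; the rest are canonical.
compat = sum(1 for _, field in pairs if field.startswith("<"))
print(unicodedata.unidata_version, len(pairs), "total,", compat, "compatibility")

print(unicodedata.decomposition("ñ"))        # 006E 0303
print(repr(unicodedata.decomposition("ł")))  # ''
```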
>> It would be good to find all differences between UnicodeData.txt and >> decomps.txt. Is this the latest version? >> http://unicode.org/Public/UCA/6.3.0/decomps.txt > > No, the latest is always here: > > http://unicode.org/Public/UCA/latest/decomps.txt > > (The last release of Unicode is v8.0.) Thanks, comparing UnicodeData.txt with the latest decomps.txt shows 1600 differences (such as ł decomposed to l and ̵ and ø to o and ̸) we need to add manually (a whole set of differences is attached below): [-- Attachment #2: UnicodeData_decomps.diff --] [-- Type: text/x-diff, Size: 34090 bytes --] < ¨ = <compat> ̈ < ¯ = <compat> ̄ < ´ = <compat> ́ < ¸ = <compat> ̧ > Æ = <sort> A E > Ð = <sort> D > Ø = O ̸ > ß = <sort> s s > æ = <sort> a e > ð = <sort> d > ø = o ̸ > Đ = D ̵ > đ = d ̵ > Ħ = H ̵ > ħ = h ̵ > Ł = L ̵ > ł = l ̵ > Œ = <sort> O E > œ = <sort> o e < ſ = <compat> s > ſ = <sort> s > ƍ = <sort> z w > ƾ = <sort> t s < Ǣ = Æ ̄ < ǣ = æ ̄ > Ǣ = <sort> A E ̄ > ǣ = <sort> a e ̄ < Ǽ = Æ ́ < ǽ = æ ́ < Ǿ = Ø ́ < ǿ = ø ́ > Ǽ = <sort> A E ́ > ǽ = <sort> a e ́ > Ǿ = O ̸ ́ > ǿ = o ̸ ́ > ȸ = <sort> d b > ȹ = <sort> q p > ʣ = <sort> d z > ʤ = <sort> d ʒ > ʥ = <sort> d ʑ > ʦ = <sort> t s > ʧ = <sort> t ʃ > ʨ = <sort> t ɕ > ʩ = <sort> f ŋ > ʪ = <sort> l s > ʫ = <sort> l z < ˘ = <compat> ̆ < ˙ = <compat> ̇ < ˚ = <compat> ̊ < ˛ = <compat> ̨ < ˜ = <compat> ̃ < ˝ = <compat> ̋ > ̍ = > ̎ = > ̒ = > ̕ = > ̖ = > ̗ = > ̘ = > ̙ = > ̚ = > ̜ = > ̝ = > ̞ = > ̟ = > ̠ = > ̩ = > ̪ = > ̫ = > ̬ = > ̯ = > ̳ = > ̶ = > ̷ = > ̺ = > ̻ = > ̼ = > ̽ = > ̾ = > ̿ = > ͆ = > ͇ = > ͈ = > ͉ = > ͊ = > ͋ = > ͌ = > ͍ = > ͎ = > ͐ = > ͑ = > ͒ = > ͓ = > ͔ = > ͕ = > ͖ = > ͗ = > ͙ = > ͚ = > ͛ = > ͜ = > ͝ = > ͞ = > ͟ = > ͢ = > ͣ = <sort> a > ͤ = <sort> e > ͥ = <sort> i > ͦ = <sort> o > ͧ = <sort> u > ͨ = <sort> c > ͩ = <sort> d > ͪ = <sort> h > ͫ = <sort> m > ͬ = <sort> r > ͭ = <sort> t > ͮ = <sort> v > ͯ = <sort> x < ͺ = <compat> ͅ > ͺ = <sort> ι < ΄ = <compat> ́ < ΅ = <compat> ̈ ́ > ΄ = ´ > ΅ = ¨ ́ > ς = <final> σ > Ϗ = 
<sort> Κ α ι > ϗ = <sort> κ α ι < ϲ = <compat> ς > ϲ = <compat> σ > ҄ = > ҅ = ̔ > ҆ = ̓ > ҇ = > Ґ = <sort> Г > ґ = <sort> г > ֺ = ֹ > ׇ = ָ > ך = <final> כ > ם = <final> מ > ן = <final> נ > ף = <final> פ > ץ = <final> צ > װ = <sort> ו ו > ױ = <sort> ו י > ײ = <sort> י י < ٵ = <compat> ا ٴ < ٶ = <compat> و ٴ < ٷ = <compat> ۇ ٴ < ٸ = <compat> ي ٴ > ٴ = <sort> ء > ٵ = <compat> ا ء > ٶ = <compat> و ء > ٷ = <compat> ۇ ء > ٸ = <compat> ي ء > ۥ = <sort> و > ۦ = <sort> ي > ۽ = <sort> ء > ۾ = <sort> م > ܔ = <sort> ܓ > ܜ = <sort> ܛ > ܤ = <final> ܣ > ܧ = <sort> ܦ > ܭ = <sort> ܒ > ܮ = <sort> ܓ > ܯ = <sort> ܕ > ݁ = > ݂ = > ݅ = > ݆ = > ߨ = <sort> ߖ > ߩ = <sort> ߗ > ߪ = <sort> ߙ > ࠜ = ࠝ > ࠞ = ࠠ > ࠟ = ࠠ > ࠡ = ࠣ > ࠢ = ࠣ > ࠤ = ࠥ > ࠦ = ࠧ > ࠨ = ࠪ > ࠩ = ࠪ > ࡙ = > ࡚ = > ࡛ = > ࢭ = <sort> ا > ऀ = ँ > ॓ = ̀ > ॔ = ́ > ঁ = ँ > ং = ं > ঃ = ः > ় = ़ < ড় = ড ় < ঢ় = ঢ ় < য় = য ় < ਲ਼ = ਲ ਼ < ਸ਼ = ਸ ਼ < ਖ਼ = ਖ ਼ < ਗ਼ = ਗ ਼ < ਜ਼ = ਜ ਼ < ਫ਼ = ਫ ਼ > ৎ = <sort> ত ্ > ড় = ড ़ > ঢ় = ঢ ़ > য় = য ़ > ਁ = ँ > ਂ = ं > ਃ = ः > ਲ਼ = ਲ ़ > ਸ਼ = ਸ ़ > ਼ = ़ > ਖ਼ = ਖ ़ > ਗ਼ = ਗ ़ > ਜ਼ = ਜ ़ > ਫ਼ = ਫ ़ > ઁ = ँ > ં = ं > ઃ = ः > ઼ = ़ > ଁ = ँ > ଂ = ं > ଃ = ः > ଼ = ़ < ଡ଼ = ଡ ଼ < ଢ଼ = ଢ ଼ > ଡ଼ = ଡ ़ > ଢ଼ = ଢ ़ > ஂ = ं > ఀ = ँ > ఁ = ँ > ం = ं > ః = ः > ಁ = ँ > ಂ = ं > ಃ = ः > ಼ = ़ > ೋ = ೊ ೕ > ഁ = ँ > ം = ं > ഃ = ः > ൎ = <sort> ര ് > ൺ = <sort> ണ ് > ൻ = <sort> ന ് > ർ = <sort> ര ് > ൽ = <sort> ല ് > ൾ = <sort> ള ് > ൿ = <sort> ക ് > ං = ं > ඃ = ः > ෝ = ො ් < ำ = <compat> ํ า < ຳ = <compat> ໍ າ > ำ = ํ า > ຳ = ໍ າ > ༀ = <sort> ཨ ོ ं > ༪ = <sort> ༡ > ༫ = <sort> ༢ > ༬ = <sort> ༣ > ༭ = <sort> ༤ > ༮ = <sort> ༥ > ༯ = <sort> ༦ > ༰ = <sort> ༧ > ༱ = <sort> ༨ > ༲ = <sort> ༩ > ༳ = <sort> ༠ > ཪ = <sort> ར > ཷ = <compat> ྲ ཱྀ > ཹ = <compat> ླ ཱྀ > ཾ = ं > ཿ = ः > ྺ = <sort> ྭ > ྻ = <sort> ྱ > ྼ = <sort> ྲ > ါ = <sort> ာ > ံ = ं > း = ः > ဿ = <sort> သ ္ သ > = <sort> > ᚡ = <sort> ᚠ > ᚤ = <sort> ᚢ > ᚥ = <sort> ᚢ > ᚧ = <sort> ᚦ > ᚩ = <sort> ᚨ > ᚬ = <sort> ᚨ > ᚭ = <sort> ᚨ > ᚮ = <sort> ᚨ > ᚳ = <sort> ᚲ > ᚴ = <sort> ᚲ > ᚵ = 
<sort> ᚲ > ᚶ = <sort> ᚲ > ᚻ = <sort> ᚺ > ᚼ = <sort> ᚺ > ᚽ = <sort> ᚺ > ᚿ = <sort> ᚾ > ᛀ = <sort> ᚾ > ᛂ = <sort> ᛁ > ᛄ = <sort> ᛃ > ᛆ = <sort> ᛅ > ᛋ = <sort> ᛊ > ᛌ = <sort> ᛊ > ᛍ = <sort> ᛊ > ᛎ = <sort> ᛊ > ᛐ = <sort> ᛏ > ᛑ = <sort> ᛏ > ᛓ = <sort> ᛒ > ᛔ = <sort> ᛒ > ᛕ = <sort> ᛈ > ᛘ = <sort> ᛗ > ᛙ = <sort> ᛗ > ᛛ = <sort> ᛚ > ᛝ = <sort> ᛜ > ᛧ = <sort> ᛦ > ᛨ = <sort> ᛦ > ᛩ = <sort> ᚹ > ᛪ = <sort> ᛊ > ᛮ = <sort> ᛅ ᛚ > ᛯ = <sort> ᛗ ᛗ > ᛰ = <sort> ᚦ ᚦ > ំ = ं > ះ = ः > ់ = > ៌ = > ៍ = > ៎ = > ៏ = > ័ = > ៑ = > ៝ = > ᤝ = <sort> ᤈ ᤩ > ᤞ = <sort> ᤋ ᤪ > ᧞ = <sort> ᦜ ᦶ > ᧟ = <sort> ᦜ ᦶ ᧁ > ᩔ = <sort> ᩆ ᩠ ᩆ > ᩘ = <sort> ᨦ > ᩙ = <sort> ᨦ > ᩚ = <sort> ᨻ > ᩛ = <sort> ᨻ > ᩤ = <sort> ᩣ > ᩴ = ं > ᪰ = > ᪱ = > ᪲ = > ᪳ = > ᪴ = > ᪵ = > ᪶ = > ᪷ = > ᪸ = > ᪹ = > ᪺ = > ᪻ = > ᪼ = > ᪽ = > ᪾ = > ᬀ = ँ > ᬁ = ँ > ᬂ = ं > ᬄ = ः > ᬴ = ़ > ᮀ = ं > ᮂ = ः > ᮺ = <sort> ᮃ > ᮾ = <final> ᮊ > ᮿ = <final> ᮙ > ᯁ = <sort> ᯀ > ᯃ = <sort> ᯂ > ᯄ = <sort> ᯂ > ᯆ = <sort> ᯅ > ᯈ = <sort> ᯇ > ᯊ = <sort> ᯉ > ᯌ = <sort> ᯋ > ᯍ = <sort> ᯋ > ᯏ = <sort> ᯎ > ᯓ = <sort> ᯒ > ᯕ = <sort> ᯔ > ᯗ = <sort> ᯖ > ᯙ = <sort> ᯘ > ᯚ = <sort> ᯘ > ᯜ = <sort> ᯛ > ᯟ = <sort> ᯞ > ᯦ = ़ > ᯨ = <sort> ᯧ > ᯫ = <sort> ᯪ > ᯭ = <sort> ᯬ > ᯯ = <sort> ᯮ > ᰷ = ़ > ᳪ = <sort> ᳩ > ᳫ = <sort> ᳩ > ᳬ = <sort> ᳩ > ᳭ = ं > ᳮ = <sort> ᳩ > ᳯ = <sort> ᳩ > ᳰ = <sort> ᳩ > ᳱ = <sort> ᳩ > ᳲ = ः > ᳳ = ः < ᴭ = <super> Æ > ᴭ = <super> A E < ᵌ = <super> ɜ > ᵌ = <super> ᴈ > ᵎ = <super> ᴉ > ᵹ = <sort> g > ᵺ = <sort> t h < ᶞ = <super> ð > ᶞ = <super> d > ᷀ = > ᷁ = > ᷂ = > ᷃ = > ᷄ = > ᷅ = > ᷆ = > ᷇ = > ᷈ = > ᷉ = > ᷊ = <sort> r > ᷋ = > ᷌ = > ᷍ = > ᷎ = > ᷏ = > ᷐ = > ᷑ = > ᷒ = <sort> ꝯ > ᷓ = <sort> a > ᷔ = <sort> a e > ᷕ = <sort> a o > ᷖ = <sort> a v > ᷗ = <sort> c ̧ > ᷘ = <sort> d > ᷙ = <sort> d > ᷚ = <sort> g > ᷛ = <sort> ɢ > ᷜ = <sort> k > ᷝ = <sort> l > ᷞ = <sort> ʟ > ᷟ = <sort> ᴍ > ᷠ = <sort> n > ᷡ = <sort> ɴ > ᷢ = <sort> ʀ > ᷣ = <sort> ꝛ > ᷤ = <sort> s > ᷥ = <sort> s > ᷦ = <sort> z > ᷧ = <sort> ɑ > ᷨ = <sort> b > ᷩ = <sort> ꞵ > ᷪ = <sort> ə > ᷫ = <sort> f > ᷬ = <sort> ꬸ 
> ᷭ = <sort> o > ᷮ = <sort> p > ᷯ = <sort> ʃ > ᷰ = <sort> u > ᷱ = <sort> w > ᷲ = <sort> a ̈ > ᷳ = <sort> o ̈ > ᷴ = <sort> u ̈ > ᷵ = > ᷼ = > ᷽ = > ᷾ = > ᷿ = < ẛ = <compat> s ̇ > ẛ = <sort> s ̇ > ẞ = <sort> S S > Ỻ = <sort> L L > ỻ = <sort> l l < ᾽ = <compat> ̓ > ᾽ = ᾿ < ᾿ = <compat> ̓ < ῀ = <compat> ͂ < ῁ = <compat> ̈ ͂ > ῁ = ¨ ͂ < ῍ = <compat> ̓ ̀ < ῎ = <compat> ̓ ́ < ῏ = <compat> ̓ ͂ > ῍ = ᾿ ̀ > ῎ = ᾿ ́ > ῏ = ᾿ ͂ < ῝ = <compat> ̔ ̀ < ῞ = <compat> ̔ ́ < ῟ = <compat> ̔ ͂ > ῝ = ῾ ̀ > ῞ = ῾ ́ > ῟ = ῾ ͂ < ῭ = <compat> ̈ ̀ < ΅ = <compat> ̈ ́ > ῭ = ¨ ̀ > ΅ = ¨ ́ < ´ = <compat> ́ < ῾ = <compat> ̔ < = <compat> < = <compat> > ´ = ´ > = <compat> > = <compat> < ‗ = <compat> ̳ < ‾ = <compat> ̅ > ⃓ = ⃒ > ⃘ = > ⃙ = > ⃚ = > ⃝ = > ⃞ = > ⃟ = > ⃠ = > ⃢ = > ⃣ = > ⃤ = > ⃥ = > ⃪ = > ⃫ = > ⃬ = > ⃭ = > ⃮ = > ⃯ = > ⃰ = < ℏ = <font> ħ > ℏ = <font> h ̵ > ⅍ = <sort> A / S > ⓫ = <circle> 1 1 > ⓬ = <circle> 1 2 > ⓭ = <circle> 1 3 > ⓮ = <circle> 1 4 > ⓯ = <circle> 1 5 > ⓰ = <circle> 1 6 > ⓱ = <circle> 1 7 > ⓲ = <circle> 1 8 > ⓳ = <circle> 1 9 > ⓴ = <circle> 2 0 > ⓵ = <circle> 1 > ⓶ = <circle> 2 > ⓷ = <circle> 3 > ⓸ = <circle> 4 > ⓹ = <circle> 5 > ⓺ = <circle> 6 > ⓻ = <circle> 7 > ⓼ = <circle> 8 > ⓽ = <circle> 9 > ⓾ = <circle> 1 0 > ⓿ = <circle> 0 > ❶ = <circle> 1 > ❷ = <circle> 2 > ❸ = <circle> 3 > ❹ = <circle> 4 > ❺ = <circle> 5 > ❻ = <circle> 6 > ❼ = <circle> 7 > ❽ = <circle> 8 > ❾ = <circle> 9 > ❿ = <circle> 1 0 > ➀ = <circle> 1 > ➁ = <circle> 2 > ➂ = <circle> 3 > ➃ = <circle> 4 > ➄ = <circle> 5 > ➅ = <circle> 6 > ➆ = <circle> 7 > ➇ = <circle> 8 > ➈ = <circle> 9 > ➉ = <circle> 1 0 > ➊ = <circle> 1 > ➋ = <circle> 2 > ➌ = <circle> 3 > ➍ = <circle> 4 > ➎ = <circle> 5 > ➏ = <circle> 6 > ➐ = <circle> 7 > ➑ = <circle> 8 > ➒ = <circle> 9 > ➓ = <circle> 1 0 < ⵯ = <super> ⵡ > ⳤ = <sort> ⲕ ⲁ ⲓ > ⳯ = > ⳰ = ̔ > ⳱ = ̓ > ⷠ = <sort> б > ⷡ = <sort> в > ⷢ = <sort> г > ⷣ = <sort> д > ⷤ = <sort> ж > ⷥ = <sort> з > ⷦ = <sort> к > ⷧ = <sort> л > ⷨ = <sort> м > ⷩ = <sort> н > ⷪ = <sort> о > ⷫ = <sort> п > ⷬ = 
<sort> р > ⷭ = <sort> с > ⷮ = <sort> т > ⷯ = <sort> х > ⷰ = <sort> ц > ⷱ = <sort> ч > ⷲ = <sort> ш > ⷳ = <sort> щ > ⷴ = <sort> ѳ > ⷵ = <sort> с т > ⷶ = <sort> а > ⷷ = <sort> е > ⷸ = <sort> ꙉ > ⷹ = <sort> ꙋ > ⷺ = <sort> ѣ > ⷻ = <sort> ю > ⷼ = <sort> ꙗ > ⷽ = <sort> ѧ > ⷾ = <sort> ѫ > ⷿ = <sort> ѭ > ⺀ = <sort> 丶 > ⺁ = <sort> 厂 > ⺂ = <sort> 乛 > ⺃ = <sort> 乚 > ⺄ = <sort> 乙 > ⺅ = <sort> 亻 > ⺆ = <sort> 冂 > ⺇ = <sort> 几 > ⺈ = <sort> 刀 > ⺉ = <sort> 刂 > ⺊ = <sort> 卜 > ⺋ = <sort> 卩 > ⺌ = <sort> 小 > ⺍ = <sort> 小 > ⺎ = <sort> 尢 > ⺏ = <sort> 尣 > ⺐ = <sort> 尢 > ⺑ = <sort> 尣 > ⺒ = <sort> 巳 > ⺓ = <sort> 幺 > ⺔ = <sort> 彑 > ⺕ = <sort> 彐 > ⺖ = <sort> 忄 > ⺗ = <sort> 心 > ⺘ = <sort> 扌 > ⺙ = <sort> 攵 > ⺛ = <sort> 旡 > ⺜ = <sort> 日 > ⺝ = <sort> 月 > ⺞ = <sort> 歺 > ⺠ = <sort> 民 > ⺡ = <sort> 氵 > ⺢ = <sort> 氺 > ⺣ = <sort> 灬 > ⺤ = <sort> 爫 > ⺥ = <sort> 爫 > ⺦ = <sort> 丬 > ⺧ = <sort> 牛 > ⺨ = <sort> 犭 > ⺩ = <sort> 王 > ⺪ = <sort> 疋 > ⺫ = <sort> 目 > ⺬ = <sort> 示 > ⺭ = <sort> 礻 > ⺮ = <sort> 竹 > ⺯ = <sort> 糹 > ⺰ = <sort> 纟 > ⺱ = <sort> 罓 > ⺲ = <sort> 罒 > ⺳ = <sort> 罓 > ⺴ = <sort> 罓 > ⺵ = <sort> 罒 > ⺶ = <sort> 羊 > ⺷ = <sort> 羊 > ⺸ = <sort> 羋 > ⺹ = <sort> 耂 > ⺺ = <sort> 肀 > ⺻ = <sort> 聿 > ⺼ = <sort> 肉 > ⺽ = <sort> 臼 > ⺾ = <sort> 艹 > ⺿ = <sort> 艹 > ⻀ = <sort> 艹 > ⻁ = <sort> 虎 > ⻂ = <sort> 衤 > ⻃ = <sort> 覀 > ⻄ = <sort> 西 > ⻅ = <sort> 见 > ⻆ = <sort> 角 > ⻇ = <sort> 角 > ⻈ = <sort> 讠 > ⻉ = <sort> 贝 > ⻊ = <sort> 足 > ⻋ = <sort> 车 > ⻌ = <sort> 辶 > ⻍ = <sort> 辶 > ⻎ = <sort> 辶 > ⻏ = <sort> 邑 > ⻐ = <sort> 钅 > ⻑ = <sort> 長 > ⻒ = <sort> 镸 > ⻓ = <sort> 长 > ⻔ = <sort> 门 > ⻕ = <sort> 阜 > ⻖ = <sort> 阝 > ⻗ = <sort> 雨 > ⻘ = <sort> 青 > ⻙ = <sort> 韦 > ⻚ = <sort> 页 > ⻛ = <sort> 风 > ⻜ = <sort> 飞 > ⻝ = <sort> 食 > ⻞ = <sort> 飠 > ⻟ = <sort> 飠 > ⻠ = <sort> 饣 > ⻡ = <sort> 首 > ⻢ = <sort> 马 > ⻣ = <sort> 骨 > ⻤ = <sort> 鬼 > ⻥ = <sort> 鱼 > ⻦ = <sort> 鸟 > ⻧ = <sort> 鹵 > ⻨ = <sort> 麦 > ⻩ = <sort> 黄 > ⻪ = <sort> 黾 > ⻫ = <sort> 齊 > ⻬ = <sort> 齐 > ⻭ = <sort> 齒 > ⻮ = <sort> 齿 > ⻯ = <sort> 龍 > ⻰ = <sort> 龙 > ⻱ = <sort> 龜 > ⻲ = <sort> 龜 > 〆 = 
<sort> し め > 〲 = 〱 ゙ > 〴 = 〳 ゙ > 〼 = <sort> ま す < ゛ = <compat> ゙ < ゜ = <compat> ゚ > ㆠ = <sort> ㄅ > ㆡ = <sort> ㄗ > ㆢ = <sort> ㄐ > ㆣ = <sort> ㄍ > ㆥ = <sort> ㆤ > ㆧ = <sort> ㄛ > ㆨ = <sort> ㄨ > ㆩ = <sort> ㄚ > ㆪ = <sort> ㄧ > ㆫ = <sort> ㄨ > ㆮ = <sort> ㄞ > ㆯ = <sort> ㄠ > ㆳ = <vertical> ㄧ > ㆴ = <final> ㄆ > ㆵ = <final> ㄊ > ㆶ = <final> ㄎ > ㆷ = <final> ㄏ > ㉈ = <circle> 1 0 > ㉉ = <circle> 2 0 > ㉊ = <circle> 3 0 > ㉋ = <circle> 4 0 > ㉌ = <circle> 5 0 > ㉍ = <circle> 6 0 > ㉎ = <circle> 7 0 > ㉏ = <circle> 8 0 < ㋐ = <circle> ア < ㋑ = <circle> イ < ㋒ = <circle> ウ < ㋓ = <circle> エ < ㋔ = <circle> オ < ㋕ = <circle> カ < ㋖ = <circle> キ < ㋗ = <circle> ク < ㋘ = <circle> ケ < ㋙ = <circle> コ < ㋚ = <circle> サ < ㋛ = <circle> シ < ㋜ = <circle> ス < ㋝ = <circle> セ < ㋞ = <circle> ソ < ㋟ = <circle> タ < ㋠ = <circle> チ < ㋡ = <circle> ツ < ㋢ = <circle> テ < ㋣ = <circle> ト < ㋤ = <circle> ナ < ㋥ = <circle> ニ < ㋦ = <circle> ヌ < ㋧ = <circle> ネ < ㋨ = <circle> ノ < ㋩ = <circle> ハ < ㋪ = <circle> ヒ < ㋫ = <circle> フ < ㋬ = <circle> ヘ < ㋭ = <circle> ホ < ㋮ = <circle> マ < ㋯ = <circle> ミ < ㋰ = <circle> ム < ㋱ = <circle> メ < ㋲ = <circle> モ < ㋳ = <circle> ヤ < ㋴ = <circle> ユ < ㋵ = <circle> ヨ < ㋶ = <circle> ラ < ㋷ = <circle> リ < ㋸ = <circle> ル < ㋹ = <circle> レ < ㋺ = <circle> ロ < ㋻ = <circle> ワ < ㋼ = <circle> ヰ < ㋽ = <circle> ヱ < ㋾ = <circle> ヲ > ㋐ = <circlekata> ア > ㋑ = <circlekata> イ > ㋒ = <circlekata> ウ > ㋓ = <circlekata> エ > ㋔ = <circlekata> オ > ㋕ = <circlekata> カ > ㋖ = <circlekata> キ > ㋗ = <circlekata> ク > ㋘ = <circlekata> ケ > ㋙ = <circlekata> コ > ㋚ = <circlekata> サ > ㋛ = <circlekata> シ > ㋜ = <circlekata> ス > ㋝ = <circlekata> セ > ㋞ = <circlekata> ソ > ㋟ = <circlekata> タ > ㋠ = <circlekata> チ > ㋡ = <circlekata> ツ > ㋢ = <circlekata> テ > ㋣ = <circlekata> ト > ㋤ = <circlekata> ナ > ㋥ = <circlekata> ニ > ㋦ = <circlekata> ヌ > ㋧ = <circlekata> ネ > ㋨ = <circlekata> ノ > ㋩ = <circlekata> ハ > ㋪ = <circlekata> ヒ > ㋫ = <circlekata> フ > ㋬ = <circlekata> ヘ > ㋭ = <circlekata> ホ > ㋮ = <circlekata> マ > ㋯ = <circlekata> ミ > ㋰ = <circlekata> ム > ㋱ = 
<circlekata> メ > ㋲ = <circlekata> モ > ㋳ = <circlekata> ヤ > ㋴ = <circlekata> ユ > ㋵ = <circlekata> ヨ > ㋶ = <circlekata> ラ > ㋷ = <circlekata> リ > ㋸ = <circlekata> ル > ㋹ = <circlekata> レ > ㋺ = <circlekata> ロ > ㋻ = <circlekata> ワ > ㋼ = <circlekata> ヰ > ㋽ = <circlekata> ヱ > ㋾ = <circlekata> ヲ < ㍸ = <square> d m <super> 2 < ㍹ = <square> d m <super> 3 > ㍸ = <square> d m 2 > ㍹ = <square> d m 3 < ㎕ = <square> μ <font> l < ㎖ = <square> m <font> l < ㎗ = <square> d <font> l < ㎘ = <square> k <font> l > ㎕ = <square> μ l > ㎖ = <square> m l > ㎗ = <square> d l > ㎘ = <square> k l < ㎟ = <square> m m <super> 2 < ㎠ = <square> c m <super> 2 < ㎡ = <square> m <super> 2 < ㎢ = <square> k m <super> 2 < ㎣ = <square> m m <super> 3 < ㎤ = <square> c m <super> 3 < ㎥ = <square> m <super> 3 < ㎦ = <square> k m <super> 3 > ㎟ = <square> m m 2 > ㎠ = <square> c m 2 > ㎡ = <square> m 2 > ㎢ = <square> k m 2 > ㎣ = <square> m m 3 > ㎤ = <square> c m 3 > ㎥ = <square> m 3 > ㎦ = <square> k m 3 < ㎨ = <square> m ∕ s <super> 2 > ㎨ = <square> m ∕ s 2 < ㎯ = <square> r a d ∕ s <super> 2 > ㎯ = <square> r a d ∕ s 2 > ꘐ = <sort> ꕘ > ꘑ = <sort> ꕪ > ꘒ = <sort> ꖇ > ꘓ = <sort> ꔌ ꘋ > ꘔ = <sort> ꔞ ꘋ > ꘕ = <sort> ꔳ ꘋ > ꘖ = <sort> ꕇ ꘌ > ꘗ = <sort> ꕒ ꘋ > ꘘ = <sort> ꕘ ꘌ > ꘙ = <sort> ꕚ ꘌ > ꘚ = <sort> ꕠ ꘋ > ꘛ = <sort> ꖅ ꘋ > ꘜ = <sort> ꖴ ꘋ > ꘝ = <sort> ꗋ ꘋ > ꘞ = <sort> ꗑ ꘌ > ꘟ = <sort> ꗘ ꘋ > ꘪ = <sort> ꕮ > ꘫ = <sort> ꗑ > Ꙩ = <sort> О > ꙩ = <sort> о > Ꙫ = <sort> О > ꙫ = <sort> о > Ꙭ = <sort> О > ꙭ = <sort> о > ꙮ = <sort> о > ꙴ = <sort> є > ꙵ = <sort> и > ꙶ = <sort> і ̈ > ꙷ = <sort> у > ꙸ = <sort> ъ > ꙹ = <sort> ы > ꙺ = <sort> ь > ꙻ = <sort> ѡ > ꙼ = > ꙽ = > Ꚙ = <sort> О > ꚙ = <sort> о > Ꚛ = <sort> О > ꚛ = <sort> о > ꚞ = <sort> ф > ꚟ = <sort> ѥ > Ꜩ = <sort> T z > ꜩ = <sort> t z > Ꜳ = <sort> A A > ꜳ = <sort> a a > Ꜵ = <sort> A O > ꜵ = <sort> a o > Ꜷ = <sort> A U > ꜷ = <sort> a u > Ꜹ = <sort> A V > ꜹ = <sort> a v > Ꜻ = <sort> A V > ꜻ = <sort> a v > Ꜽ = <sort> A Y > ꜽ = <sort> a y > Ꝏ = <sort> O O > ꝏ = <sort> o o > Ꝡ = <sort> V Y > ꝡ = 
<sort> v y < ꟸ = <super> Ħ < ꟹ = <super> œ > Ꝺ = <sort> D > ꝺ = <sort> d > Ꝼ = <sort> F > ꝼ = <sort> f > Ᵹ = <sort> G > Ꞃ = <sort> R > ꞃ = <sort> r > Ꞅ = <sort> S > ꞅ = <sort> s > Ꞇ = <sort> T > ꞇ = <sort> t > Ꞛ = <sort> A ̈ > ꞛ = <sort> a ̈ > Ꞝ = <sort> O ̈ > ꞝ = <sort> o ̈ > Ꞟ = <sort> U ̈ > ꞟ = <sort> u ̈ > Ꞡ = <sort> G > ꞡ = <sort> g > Ꞣ = <sort> K > ꞣ = <sort> k > Ꞥ = <sort> N > ꞥ = <sort> n > Ꞧ = <sort> R > ꞧ = <sort> r > Ꞩ = <sort> S > ꞩ = <sort> s > ꟸ = <super> H ̵ > ꟹ = <super> o e > ꠋ = ं > ꢀ = ं > ꢁ = ः > ꣳ = <sort> ꣲ > ꣴ = <sort> ꣲ > ꣵ = <sort> ꣲ > ꣶ = <sort> ꣲ > ꣷ = <sort> ꣲ > ꦀ = ँ > ꦁ = ं > ꦃ = ः > ꦬ = <sort> ꦫ > ꦳ = ़ < ſt = <compat> <compat> s t > ſt = <compat> s t < ײַ = ײ ַ > ײַ = <sort> י י ַ < ﬦ = <font> ם > ﬦ = <font> מ < ךּ = ך ּ > ךּ = <final> כ ּ < ףּ = ף ּ > ףּ = <final> פ ּ < ﯝ = <isolated> <compat> ۇ ٴ > ﯝ = <isolated> ۇ ء < ﯪ = <isolated> ي ٔ ا < ﯫ = <final> ي ٔ ا < ﯬ = <isolated> ي ٔ ە < ﯭ = <final> ي ٔ ە < ﯮ = <isolated> ي ٔ و < ﯯ = <final> ي ٔ و < ﯰ = <isolated> ي ٔ ۇ < ﯱ = <final> ي ٔ ۇ < ﯲ = <isolated> ي ٔ ۆ < ﯳ = <final> ي ٔ ۆ < ﯴ = <isolated> ي ٔ ۈ < ﯵ = <final> ي ٔ ۈ < ﯶ = <isolated> ي ٔ ې < ﯷ = <final> ي ٔ ې < ﯸ = <initial> ي ٔ ې < ﯹ = <isolated> ي ٔ ى < ﯺ = <final> ي ٔ ى < ﯻ = <initial> ي ٔ ى > ﯪ = <isolated> ئ ا > ﯫ = <final> ئ ا > ﯬ = <isolated> ئ ە > ﯭ = <final> ئ ە > ﯮ = <isolated> ئ و > ﯯ = <final> ئ و > ﯰ = <isolated> ئ ۇ > ﯱ = <final> ئ ۇ > ﯲ = <isolated> ئ ۆ > ﯳ = <final> ئ ۆ > ﯴ = <isolated> ئ ۈ > ﯵ = <final> ئ ۈ > ﯶ = <isolated> ئ ې > ﯷ = <final> ئ ې > ﯸ = <initial> ئ ې > ﯹ = <isolated> ئ ى > ﯺ = <final> ئ ى > ﯻ = <initial> ئ ى < ﰀ = <isolated> ي ٔ ج < ﰁ = <isolated> ي ٔ ح < ﰂ = <isolated> ي ٔ م < ﰃ = <isolated> ي ٔ ى < ﰄ = <isolated> ي ٔ ي > ﰀ = <isolated> ئ ج > ﰁ = <isolated> ئ ح > ﰂ = <isolated> ئ م > ﰃ = <isolated> ئ ى > ﰄ = <isolated> ئ ي < ﱞ = <isolated> ٌ ّ < ﱟ = <isolated> ٍ ّ < ﱠ = <isolated> َ ّ < ﱡ = <isolated> ُ ّ < ﱢ = <isolated> ِ ّ < ﱣ = <isolated> ّ ٰ < ﱤ = <final> ي ٔ ر < ﱥ = <final> ي ٔ ز < ﱦ = 
<final> ي ٔ م < ﱧ = <final> ي ٔ ن < ﱨ = <final> ي ٔ ى < ﱩ = <final> ي ٔ ي > ﱞ = <isolated> ٌ ّ > ﱟ = <isolated> ٍ ّ > ﱠ = <isolated> َ ّ > ﱡ = <isolated> ُ ّ > ﱢ = <isolated> ِ ّ > ﱣ = <isolated> ّ ٰ > ﱤ = <final> ئ ر > ﱥ = <final> ئ ز > ﱦ = <final> ئ م > ﱧ = <final> ئ ن > ﱨ = <final> ئ ى > ﱩ = <final> ئ ي < ﲗ = <initial> ي ٔ ج < ﲘ = <initial> ي ٔ ح < ﲙ = <initial> ي ٔ خ < ﲚ = <initial> ي ٔ م < ﲛ = <initial> ي ٔ ه > ﲗ = <initial> ئ ج > ﲘ = <initial> ئ ح > ﲙ = <initial> ئ خ > ﲚ = <initial> ئ م > ﲛ = <initial> ئ ه < ﳟ = <medial> ي ٔ م < ﳠ = <medial> ي ٔ ه > ﳟ = <medial> ئ م > ﳠ = <medial> ئ ه < ﳲ = <medial> ـ َ ّ < ﳳ = <medial> ـ ُ ّ < ﳴ = <medial> ـ ِ ّ > ﳲ = <medial> َ ّ > ﳳ = <medial> ُ ّ > ﳴ = <medial> ِ ّ < ︙ = <vertical> <compat> . . . < ︰ = <vertical> <compat> . . > ︙ = <vertical> . . . > ︠ = ͡ > ︢ = ͠ > ︧ = > ︩ = ͠ > ︮ = ҃ > ︰ = <vertical> . . < ﹉ = <compat> <compat> ̅ < ﹊ = <compat> <compat> ̅ < ﹋ = <compat> <compat> ̅ < ﹌ = <compat> <compat> ̅ > ﹉ = <compat> ‾ > ﹊ = <compat> ‾ > ﹋ = <compat> ‾ > ﹌ = <compat> ‾ < ﹰ = <isolated> ً < ﹱ = <medial> ـ ً < ﹲ = <isolated> ٌ < ﹴ = <isolated> ٍ < ﹶ = <isolated> َ < ﹷ = <medial> ـ َ < ﹸ = <isolated> ُ < ﹹ = <medial> ـ ُ < ﹺ = <isolated> ِ < ﹻ = <medial> ـ ِ < ﹼ = <isolated> ّ < ﹽ = <medial> ـ ّ < ﹾ = <isolated> ْ < ﹿ = <medial> ـ ْ > ﹰ = <isolated> ً > ﹱ = <medial> ً > ﹲ = <isolated> ٌ > ﹴ = <isolated> ٍ > ﹶ = <isolated> َ > ﹷ = <medial> َ > ﹸ = <isolated> ُ > ﹹ = <medial> ُ > ﹺ = <isolated> ِ > ﹻ = <medial> ِ > ﹼ = <isolated> ّ > ﹽ = <medial> ّ > ﹾ = <isolated> ْ > ﹿ = <medial> ْ < ﺁ = <isolated> ا ٓ < ﺂ = <final> ا ٓ < ﺃ = <isolated> ا ٔ < ﺄ = <final> ا ٔ < ﺅ = <isolated> و ٔ < ﺆ = <final> و ٔ < ﺇ = <isolated> ا ٕ < ﺈ = <final> ا ٕ < ﺉ = <isolated> ي ٔ < ﺊ = <final> ي ٔ < ﺋ = <initial> ي ٔ < ﺌ = <medial> ي ٔ > ﺁ = <isolated> آ > ﺂ = <final> آ > ﺃ = <isolated> أ > ﺄ = <final> أ > ﺅ = <isolated> ؤ > ﺆ = <final> ؤ > ﺇ = <isolated> إ > ﺈ = <final> إ > ﺉ = <isolated> ئ > ﺊ = <final> ئ > ﺋ = <initial> ئ > ﺌ = <medial> ئ < 
ﻵ = <isolated> ل ا ٓ < ﻶ = <final> ل ا ٓ < ﻷ = <isolated> ل ا ٔ < ﻸ = <final> ل ا ٔ < ﻹ = <isolated> ل ا ٕ < ﻺ = <final> ل ا ٕ > ﻵ = <isolated> ل آ > ﻶ = <final> ل آ > ﻷ = <isolated> ل أ > ﻸ = <final> ل أ > ﻹ = <isolated> ل إ > ﻺ = <final> ل إ < ァ = <narrow> ァ < ィ = <narrow> ィ < ゥ = <narrow> ゥ < ェ = <narrow> ェ < ォ = <narrow> ォ < ャ = <narrow> ャ < ュ = <narrow> ュ < ョ = <narrow> ョ < ッ = <narrow> ッ > ァ = <smallnarrow> ア > ィ = <smallnarrow> イ > ゥ = <smallnarrow> ウ > ェ = <smallnarrow> エ > ォ = <smallnarrow> オ > ャ = <smallnarrow> ヤ > ュ = <smallnarrow> ユ > ョ = <smallnarrow> ヨ > ッ = <smallnarrow> ツ < ᅠ = <narrow> <compat> ᅠ < ᄀ = <narrow> <compat> ᄀ < ᄁ = <narrow> <compat> ᄁ < ᆪ = <narrow> <compat> ᆪ < ᄂ = <narrow> <compat> ᄂ < ᆬ = <narrow> <compat> ᆬ < ᆭ = <narrow> <compat> ᆭ < ᄃ = <narrow> <compat> ᄃ < ᄄ = <narrow> <compat> ᄄ < ᄅ = <narrow> <compat> ᄅ < ᆰ = <narrow> <compat> ᆰ < ᆱ = <narrow> <compat> ᆱ < ᆲ = <narrow> <compat> ᆲ < ᆳ = <narrow> <compat> ᆳ < ᆴ = <narrow> <compat> ᆴ < ᆵ = <narrow> <compat> ᆵ < ᄚ = <narrow> <compat> ᄚ < ᄆ = <narrow> <compat> ᄆ < ᄇ = <narrow> <compat> ᄇ < ᄈ = <narrow> <compat> ᄈ < ᄡ = <narrow> <compat> ᄡ < ᄉ = <narrow> <compat> ᄉ < ᄊ = <narrow> <compat> ᄊ < ᄋ = <narrow> <compat> ᄋ < ᄌ = <narrow> <compat> ᄌ < ᄍ = <narrow> <compat> ᄍ < ᄎ = <narrow> <compat> ᄎ < ᄏ = <narrow> <compat> ᄏ < ᄐ = <narrow> <compat> ᄐ < ᄑ = <narrow> <compat> ᄑ < ᄒ = <narrow> <compat> ᄒ < ᅡ = <narrow> <compat> ᅡ < ᅢ = <narrow> <compat> ᅢ < ᅣ = <narrow> <compat> ᅣ < ᅤ = <narrow> <compat> ᅤ < ᅥ = <narrow> <compat> ᅥ < ᅦ = <narrow> <compat> ᅦ < ᅧ = <narrow> <compat> ᅧ < ᅨ = <narrow> <compat> ᅨ < ᅩ = <narrow> <compat> ᅩ < ᅪ = <narrow> <compat> ᅪ < ᅫ = <narrow> <compat> ᅫ < ᅬ = <narrow> <compat> ᅬ < ᅭ = <narrow> <compat> ᅭ < ᅮ = <narrow> <compat> ᅮ < ᅯ = <narrow> <compat> ᅯ < ᅰ = <narrow> <compat> ᅰ < ᅱ = <narrow> <compat> ᅱ < ᅲ = <narrow> <compat> ᅲ < ᅳ = <narrow> <compat> ᅳ < ᅴ = <narrow> <compat> ᅴ < ᅵ = <narrow> <compat> ᅵ > ᅠ = <narrow> ᅠ > ᄀ = <narrow> ᄀ > ᄁ = <narrow> ᄁ > 
ᆪ = <narrow> ᆪ > ᄂ = <narrow> ᄂ > ᆬ = <narrow> ᆬ > ᆭ = <narrow> ᆭ > ᄃ = <narrow> ᄃ > ᄄ = <narrow> ᄄ > ᄅ = <narrow> ᄅ > ᆰ = <narrow> ᆰ > ᆱ = <narrow> ᆱ > ᆲ = <narrow> ᆲ > ᆳ = <narrow> ᆳ > ᆴ = <narrow> ᆴ > ᆵ = <narrow> ᆵ > ᄚ = <narrow> ᄚ > ᄆ = <narrow> ᄆ > ᄇ = <narrow> ᄇ > ᄈ = <narrow> ᄈ > ᄡ = <narrow> ᄡ > ᄉ = <narrow> ᄉ > ᄊ = <narrow> ᄊ > ᄋ = <narrow> ᄋ > ᄌ = <narrow> ᄌ > ᄍ = <narrow> ᄍ > ᄎ = <narrow> ᄎ > ᄏ = <narrow> ᄏ > ᄐ = <narrow> ᄐ > ᄑ = <narrow> ᄑ > ᄒ = <narrow> ᄒ > ᅡ = <narrow> ᅡ > ᅢ = <narrow> ᅢ > ᅣ = <narrow> ᅣ > ᅤ = <narrow> ᅤ > ᅥ = <narrow> ᅥ > ᅦ = <narrow> ᅦ > ᅧ = <narrow> ᅧ > ᅨ = <narrow> ᅨ > ᅩ = <narrow> ᅩ > ᅪ = <narrow> ᅪ > ᅫ = <narrow> ᅫ > ᅬ = <narrow> ᅬ > ᅭ = <narrow> ᅭ > ᅮ = <narrow> ᅮ > ᅯ = <narrow> ᅯ > ᅰ = <narrow> ᅰ > ᅱ = <narrow> ᅱ > ᅲ = <narrow> ᅲ > ᅳ = <narrow> ᅳ > ᅴ = <narrow> ᅴ > ᅵ = <narrow> ᅵ <  ̄ = <wide> <compat> ̄ >  ̄ = <wide> ¯ < 𑂚 = 𑂙 𑂺 < 𑂜 = 𑂛 𑂺 < 𑂫 = 𑂥 𑂺 > 𐍶 = <sort> 𐍐 > 𐍷 = <sort> 𐍓 > 𐍸 = <sort> 𐍗 > 𐍹 = <sort> 𐍝 > 𐍺 = <sort> 𐍡 > 𐡭 = <final> 𐡮 > 𐢀 = <final> 𐢁 > 𐢂 = <final> 𐢃 > 𐢆 = <final> 𐢇 > 𐢌 = <final> 𐢍 > 𐢎 = <final> 𐢏 > 𐢐 = <final> 𐢑 > 𐢒 = <final> 𐢓 > 𐢔 = <final> 𐢕 > 𐢜 = <final> 𐢝 > 𐦀 = <sort> 𐦠 > 𐦁 = <sort> 𐦡 > 𐦂 = <sort> 𐦢 > 𐦃 = <sort> 𐦣 > 𐦄 = <sort> 𐦤 > 𐦅 = <sort> 𐦥 > 𐦆 = <sort> 𐦦 > 𐦇 = <sort> 𐦦 > 𐦈 = <sort> 𐦧 > 𐦉 = <sort> 𐦨 > 𐦊 = <sort> 𐦩 > 𐦋 = <sort> 𐦩 > 𐦌 = <sort> 𐦪 > 𐦍 = <sort> 𐦪 > 𐦎 = <sort> 𐦫 > 𐦏 = <sort> 𐦫 > 𐦐 = <sort> 𐦬 > 𐦑 = <sort> 𐦭 > 𐦒 = <sort> 𐦮 > 𐦓 = <sort> 𐦯 > 𐦔 = <sort> 𐦯 > 𐦕 = <sort> 𐦱 > 𐦖 = <sort> 𐦲 > 𐦗 = <sort> 𐦳 > 𐦘 = <sort> 𐦴 > 𐦙 = <sort> 𐦴 > 𐦚 = <sort> 𐦵 > 𐦛 = <sort> 𐦵 > 𐦜 = <sort> 𐦶 > 𐦝 = <sort> 𐦷 > 𐦰 = <sort> 𐦯 > 𐨍 = > 𐨎 = ं > 𐨏 = ः > 𐫈 = <sort> 𐫇 > 𐫥 = > 𐫦 = > 𐬮 = <sort> 𐬭 > 𐰁 = <sort> 𐰀 > 𐰄 = <sort> 𐰃 > 𐰈 = <sort> 𐰇 > 𐰊 = <sort> 𐰉 > 𐰌 = <sort> 𐰋 > 𐰎 = <sort> 𐰍 > 𐰐 = <sort> 𐰏 > 𐰒 = <sort> 𐰑 > 𐰕 = <sort> 𐰔 > 𐰗 = <sort> 𐰖 > 𐰙 = <sort> 𐰘 > 𐰛 = <sort> 𐰚 > 𐰝 = <sort> 𐰜 > 𐰟 = <sort> 𐰞 > 𐰥 = <sort> 𐰤 > 𐰧 = <sort> 𐰦 > 𐰩 = <sort> 𐰨 > 𐰫 = <sort> 𐰪 > 𐰮 = <sort> 𐰭 > 𐰳 = <sort> 𐰲 > 𐰵 = <sort> 𐰴 > 𐰷 = <sort> 𐰶 > 𐰹 
= <sort> 𐰸 > 𐰻 = <sort> 𐰺 > 𐱀 = <sort> 𐰿 > 𐱂 = <sort> 𐱁 > 𐱄 = <sort> 𐱃 > 𐱆 = <sort> 𐱅 > 𐲁 = <sort> 𐲀 > 𐲊 = <sort> 𐲉 > 𐲋 = <sort> 𐲉 > 𐲑 = <sort> 𐲐 > 𐲜 = <sort> 𐲛 > 𐲞 = <sort> 𐲝 > 𐲟 = <sort> 𐲝 > 𐲣 = <sort> 𐲢 > 𐲫 = <sort> 𐲪 > 𐲭 = <sort> 𐲬 > 𐳁 = <sort> 𐳀 > 𐳊 = <sort> 𐳉 > 𐳋 = <sort> 𐳉 > 𐳑 = <sort> 𐳐 > 𐳜 = <sort> 𐳛 > 𐳞 = <sort> 𐳝 > 𐳟 = <sort> 𐳝 > 𐳣 = <sort> 𐳢 > 𐳫 = <sort> 𐳪 > 𐳭 = <sort> 𐳬 > 𑀀 = ँ > 𑀁 = ं > 𑀂 = ः > 𑂀 = ँ > 𑂁 = ं > 𑂂 = ः > 𑂚 = 𑂙 ़ > 𑂜 = 𑂛 ़ > 𑂫 = 𑂥 ़ > 𑂺 = ़ > 𑄀 = ँ > 𑄁 = ं > 𑄂 = ः > 𑅳 = ़ > 𑆀 = ँ > 𑆁 = ं > 𑆂 = ः > 𑇊 = ़ > 𑈴 = ं > 𑈶 = ़ > 𑈷 = ّ > 𑋟 = ं > 𑋩 = ़ > 𑌀 = ं > 𑌁 = ँ > 𑌂 = ं > 𑌃 = ः > 𑌼 = ़ > 𑒿 = ँ > 𑓀 = ं > 𑓁 = ः > 𑓃 = ़ > 𑖼 = ँ > 𑖽 = ं > 𑖾 = ः > 𑗀 = ़ > 𑗘 = <sort> 𑖂 > 𑗙 = <sort> 𑖂 > 𑗚 = <sort> 𑖃 > 𑗛 = <sort> 𑖄 > 𑗜 = <sort> 𑖲 > 𑗝 = <sort> 𑖳 > 𑘽 = ं > 𑘾 = ः > 𑙀 = ँ > 𑚫 = ं > 𑚬 = ः > 𑚷 = ़ > 𑜅 = <sort> 𑜄 > 𑜖 = <sort> 𑜕 > 𖼆 = <sort> 𖼄 > 𖼓 = <sort> 𖼐 > 𖼥 = <sort> 𖼣 > 𖼿 = <sort> 𖼽 > 𛲝 = > 𛲞 = < 𝚹 = <font> <compat> Θ > 𝚹 = <font> Θ < 𝛓 = <font> ς > 𝛓 = <font> σ < 𝛜 = <font> <compat> ε < 𝛝 = <font> <compat> θ < 𝛞 = <font> <compat> κ < 𝛟 = <font> <compat> φ < 𝛠 = <font> <compat> ρ < 𝛡 = <font> <compat> π > 𝛜 = <font> ε > 𝛝 = <font> θ > 𝛞 = <font> κ > 𝛟 = <font> φ > 𝛠 = <font> ρ > 𝛡 = <font> π < 𝛳 = <font> <compat> Θ > 𝛳 = <font> Θ < 𝜍 = <font> ς > 𝜍 = <font> σ < 𝜖 = <font> <compat> ε < 𝜗 = <font> <compat> θ < 𝜘 = <font> <compat> κ < 𝜙 = <font> <compat> φ < 𝜚 = <font> <compat> ρ < 𝜛 = <font> <compat> π > 𝜖 = <font> ε > 𝜗 = <font> θ > 𝜘 = <font> κ > 𝜙 = <font> φ > 𝜚 = <font> ρ > 𝜛 = <font> π < 𝜭 = <font> <compat> Θ > 𝜭 = <font> Θ < 𝝇 = <font> ς > 𝝇 = <font> σ < 𝝐 = <font> <compat> ε < 𝝑 = <font> <compat> θ < 𝝒 = <font> <compat> κ < 𝝓 = <font> <compat> φ < 𝝔 = <font> <compat> ρ < 𝝕 = <font> <compat> π > 𝝐 = <font> ε > 𝝑 = <font> θ > 𝝒 = <font> κ > 𝝓 = <font> φ > 𝝔 = <font> ρ > 𝝕 = <font> π < 𝝧 = <font> <compat> Θ > 𝝧 = <font> Θ < 𝞁 = <font> ς > 𝞁 = <font> σ < 𝞊 = <font> <compat> ε < 𝞋 = <font> <compat> θ < 𝞌 = <font> <compat> κ < 𝞍 = <font> <compat> φ < 𝞎 = 
<font> <compat> ρ < 𝞏 = <font> <compat> π > 𝞊 = <font> ε > 𝞋 = <font> θ > 𝞌 = <font> κ > 𝞍 = <font> φ > 𝞎 = <font> ρ > 𝞏 = <font> π < 𝞡 = <font> <compat> Θ > 𝞡 = <font> Θ < 𝞻 = <font> ς > 𝞻 = <font> σ < 𝟄 = <font> <compat> ε < 𝟅 = <font> <compat> θ < 𝟆 = <font> <compat> κ < 𝟇 = <font> <compat> φ < 𝟈 = <font> <compat> ρ < 𝟉 = <font> <compat> π > 𝟄 = <font> ε > 𝟅 = <font> θ > 𝟆 = <font> κ > 𝟇 = <font> φ > 𝟈 = <font> ρ > 𝟉 = <font> π > 🄋 = <circle> 0 > 🄌 = <circle> 0 > 🅐 = <circle> A > 🅑 = <circle> B > 🅒 = <circle> C > 🅓 = <circle> D > 🅔 = <circle> E > 🅕 = <circle> F > 🅖 = <circle> G > 🅗 = <circle> H > 🅘 = <circle> I > 🅙 = <circle> J > 🅚 = <circle> K > 🅛 = <circle> L > 🅜 = <circle> M > 🅝 = <circle> N > 🅞 = <circle> O > 🅟 = <circle> P > 🅠 = <circle> Q > 🅡 = <circle> R > 🅢 = <circle> S > 🅣 = <circle> T > 🅤 = <circle> U > 🅥 = <circle> V > 🅦 = <circle> W > 🅧 = <circle> X > 🅨 = <circle> Y > 🅩 = <circle> Z > 🅰 = <square> A > 🅱 = <square> B > 🅲 = <square> C > 🅳 = <square> D > 🅴 = <square> E > 🅵 = <square> F > 🅶 = <square> G > 🅷 = <square> H > 🅸 = <square> I > 🅹 = <square> J > 🅺 = <square> K > 🅻 = <square> L > 🅼 = <square> M > 🅽 = <square> N > 🅾 = <square> O > 🅿 = <square> P > 🆀 = <square> Q > 🆁 = <square> R > 🆂 = <square> S > 🆃 = <square> T > 🆄 = <square> U > 🆅 = <square> V > 🆆 = <square> W > 🆇 = <square> X > 🆈 = <square> Y > 🆉 = <square> Z > 🆊 = <square> P > 🆋 = <square> I C > 🆌 = <square> P A > 🆍 = <square> S A > 🆎 = <square> A B > 🆏 = <square> W C > 🆑 = <square> C L > 🆒 = <square> C O O L > 🆓 = <square> F R E E > 🆔 = <square> I D > 🆕 = <square> N E W > 🆖 = <square> N G > 🆗 = <square> O K > 🆘 = <square> S O S > 🆙 = <square> U P ! > 🆚 = <square> V S
* Re: On language-dependent defaults for character-folding
  2016-02-25 0:29 ` Juri Linkov
@ 2016-02-25 16:24   ` Eli Zaretskii
  2016-02-29 0:22     ` Juri Linkov
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-25 16:24 UTC
To: Juri Linkov; +Cc: larsi, lokedhs, rms, emacs-devel

> From: Juri Linkov <juri@linkov.net>
> Cc: larsi@gnus.org, lokedhs@gmail.com, rms@gnu.org, emacs-devel@gnu.org
> Date: Thu, 25 Feb 2016 02:29:11 +0200
>
> >> >> It seems two user variables are necessary for customization:
> >> >>
> >> >> 1. inclusive folding groups that will include by default such pairs
> >> >>    as o - ø, l - ł added to the Unicode decomposition-based rules,
> >> >>    and allow the users to add more rules;
> >> >>
> >> >> 2. exclusive folding groups to exclude locale/language-dependent rules from
> >> >>    the default mappings above, e.g. removing n - ñ for the "es" locale.
> >> >
> >> > I think we should add those in item 1 unconditionally (i.e. include
> >> > them in the default mappings), and then exclude some of them under the
> >> > rules you describe in item 2. Then the problem becomes easier, as we
> >> > only need to filter out some mappings, as determined by a single user
> >> > variable (whose default can come from the user locale).
> >>
> >> Better to have 4 variables (2 internal + 2 user customizable variables):
> >
> > Can you explain why it's better to have 4 variables rather than just
> > one?
>
> If you mean that one customizable variable should contain all mappings from
> UnicodeData.txt and decomps.txt presented to the user for customization,
> such a list will be too huge to customize: there are 5721 decompositions
> in UnicodeData.txt, and 6674 decompositions in decomps.txt.

No, of course not. That would be extremely inconvenient.

What I envisioned is a single variable that holds a list of folding
sub-features. Examples include ignoring diacritics, matching
ligatures and their decompositions, "controversial" foldings that
users of specific languages might not want, etc. The default value
will hold all of the sub-features; users that don't want some of them
will be able to remove them from the list, which will affect the
mapping at search time. We could also have a setting that means "DTRT
for my locale", which will remove the sub-features inappropriate for
the locale's language. Stuff like that.

> So we could have at least one default internal variable containing all
> decompositions from UnicodeData.txt plus decompositions from decomps.txt
> minus locale-dependent mappings.

Internally, we need a translation table for mapping equivalent
characters. This table should be recomputed (or selected among
several precomputed ones) according to the list of sub-features that
the user requested.

> > http://unicode.org/Public/UCA/latest/decomps.txt
> >
> > (The last release of Unicode is v8.0.)
>
> Thanks, comparing UnicodeData.txt with the latest decomps.txt shows
> 1600 differences (such as ł decomposed to l and ̵ and ø to o and ̸)
> we need to add manually (a whole set of differences is attached below):

I think we need to create another uni-*.el file which defines a
decomposition char-table populated from decomps.txt.
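Eli's proposal (one user-facing list of folding sub-features, from which the internal table is recomputed) might be sketched roughly as follows. This is a Python illustration, not Emacs Lisp and not how Emacs actually implements char folding; the sub-feature names `ignore-diacritics`, `match-ligatures` and `fold-n-tilde`, and the function `build_fold_table`, are hypothetical, and the table only covers the candidate characters passed in.

```python
import unicodedata

# Hypothetical sub-feature names (not real Emacs symbols). As proposed,
# the default value enables every sub-feature.
DEFAULT_SUB_FEATURES = ["ignore-diacritics", "match-ligatures", "fold-n-tilde"]

# "Controversial" foldings, keyed by the sub-feature that enables them.
# Only n/ñ and N/Ñ are listed, purely as an example.
CONTROVERSIAL = {"fold-n-tilde": {("n", "\u00f1"), ("N", "\u00d1")}}


def build_fold_table(sub_features, candidates):
    """Map each search string to the set of characters it should match."""
    table = {}
    for ch in candidates:
        nfd = unicodedata.normalize("NFD", ch)
        if ("ignore-diacritics" in sub_features and len(nfd) > 1
                and all(unicodedata.combining(m) for m in nfd[1:])):
            # A base letter followed only by combining marks, e.g. ñ -> n + U+0303.
            table.setdefault(nfd[0], set()).add(ch)
        elif ("match-ligatures" in sub_features
              and unicodedata.decomposition(ch).startswith("<compat>")):
            # Compatibility decompositions, e.g. the ligature U+FB01 -> "fi".
            table.setdefault(unicodedata.normalize("NFKD", ch), set()).add(ch)
    # The table is rebuilt from the list each time, so removing a
    # sub-feature from the list simply drops its pairs here.
    for feature, pairs in CONTROVERSIAL.items():
        if feature not in sub_features:
            for base, variant in pairs:
                table.get(base, set()).discard(variant)
    return table


everything = build_fold_table(DEFAULT_SUB_FEATURES, "\u00f1\u00e1\ufb01")
no_n_tilde = build_fold_table(["ignore-diacritics", "match-ligatures"], "\u00f1\u00e1")
print(sorted(everything))                               # ['a', 'fi', 'n']
print(no_n_tilde["n"], no_n_tilde["a"] == {"\u00e1"})   # set() True
```

Removing `fold-n-tilde` from the list (which is roughly what a "DTRT for my locale" setting might do for Spanish) leaves the rest of the diacritic folding, such as a/á, untouched.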
* Re: On language-dependent defaults for character-folding
  2016-02-25 16:24 ` Eli Zaretskii
@ 2016-02-29 0:22   ` Juri Linkov
  2016-02-29 16:27     ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Juri Linkov @ 2016-02-29 0:22 UTC
To: Eli Zaretskii; +Cc: larsi, lokedhs, rms, emacs-devel

> What I envisioned is a single variable that holds a list of folding
> sub-features. Examples include ignoring diacritics, matching
> ligatures and their decompositions, "controversial" foldings that
> users of specific languages might not want, etc. The default value
> will hold all of the sub-features; users that don't want some of them
> will be able to remove them from the list, which will affect the
> mapping at search time. We could also have a setting that means "DTRT
> for my locale", which will remove the sub-features inappropriate for
> the locale's language. Stuff like that.

Like (defcustom char-fold-defaults '(ignore-diacritics match-ligatures ...?

Not sure if such terms are self-descriptive. At least plain pairs like
'((o ø) (l ł) ...) should be enough to customize at the base character level,
and later we might consider grouping such pairs into a more high-level
features like ‘spanish-diacritics’, ‘swedish-diacritics’, etc.

>> So we could have at least one default internal variable containing all
>> decompositions from UnicodeData.txt plus decompositions from decomps.txt
>> minus locale-dependent mappings.
>
> Internally, we need a translation table for mapping equivalent
> characters. This table should be recomputed (or selected among
> several precomputed ones) according to the list of sub-features that
> the user requested.

Or maybe customizing a variable like (defcustom char-fold-language
(with the default depending on the user locale) could reevaluate
the table on saving the modified value.

>> > http://unicode.org/Public/UCA/latest/decomps.txt
>> >
>> > (The last release of Unicode is v8.0.)
>>
>> Thanks, comparing UnicodeData.txt with the latest decomps.txt shows
>> 1600 differences (such as ł decomposed to l and ̵ and ø to o and ̸)
>> we need to add manually (a whole set of differences is attached below):
>
> I think we need to create another uni-*.el file which defines a
> decomposition char-table populated from decomps.txt.

The name of the currently used Unicode character property is “decomposition”.
What would be a good name for the property from decomps.txt? “decomposition2”?
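Juri's point that UnicodeData.txt has no decomposition at all for letters such as ø and ł is easy to check with Python's `unicodedata` module, whose data is derived from UnicodeData.txt. The sketch below keeps decomps.txt-style data in a separate supplementary table, in the spirit of Eli's suggestion of a second char-table. The two entries are the examples from Juri's message, and `SUPPLEMENTARY_DECOMPOSITIONS` and `fold_base` are hypothetical names.

```python
import unicodedata

# Hypothetical supplementary table standing in for data read from
# decomps.txt. Only the two examples from Juri's message are listed.
SUPPLEMENTARY_DECOMPOSITIONS = {
    "\u00f8": "o\u0338",  # ø -> o + COMBINING LONG SOLIDUS OVERLAY
    "\u0142": "l\u0335",  # ł -> l + COMBINING SHORT STROKE OVERLAY
}


def fold_base(ch):
    """Return the base character CH should fold to (CH itself if none)."""
    nfd = unicodedata.normalize("NFD", ch)
    if nfd == ch:
        # No canonical decomposition in UnicodeData.txt; try the supplement.
        nfd = SUPPLEMENTARY_DECOMPOSITIONS.get(ch, ch)
    return nfd[0]


# UnicodeData.txt gives ñ a canonical decomposition, but none for ø or ł.
print(repr(unicodedata.decomposition("\u00f1")))  # '006E 0303'
print(repr(unicodedata.decomposition("\u00f8")))  # ''
print(fold_base("\u00f1"), fold_base("\u00f8"), fold_base("\u0142"), fold_base("x"))  # n o l x
```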
* Re: On language-dependent defaults for character-folding
  2016-02-29 0:22 ` Juri Linkov
@ 2016-02-29 16:27   ` Eli Zaretskii
  2016-02-29 23:40     ` Juri Linkov
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-29 16:27 UTC
To: Juri Linkov; +Cc: larsi, lokedhs, rms, emacs-devel

> From: Juri Linkov <juri@linkov.net>
> Cc: larsi@gnus.org, lokedhs@gmail.com, rms@gnu.org, emacs-devel@gnu.org
> Date: Mon, 29 Feb 2016 02:22:02 +0200
>
> > What I envisioned is a single variable that holds a list of folding
> > sub-features. Examples include ignoring diacritics, matching
> > ligatures and their decompositions, "controversial" foldings that
> > users of specific languages might not want, etc. The default value
> > will hold all of the sub-features; users that don't want some of them
> > will be able to remove them from the list, which will affect the
> > mapping at search time. We could also have a setting that means "DTRT
> > for my locale", which will remove the sub-features inappropriate for
> > the locale's language. Stuff like that.
>
> Like (defcustom char-fold-defaults '(ignore-diacritics match-ligatures ...?

Yes.

> Not sure if such terms are self-descriptive. At least plain pairs like
> '((o ø) (l ł) ...) should be enough to customize at the base character level,
> and later we might consider grouping such pairs into a more high-level
> features like ‘spanish-diacritics’, ‘swedish-diacritics’, etc.

Such grouping is what I had in mind. I don't expect users to remember
these characters by heart.

> > I think we need to create another uni-*.el file which defines a
> > decomposition char-table populated from decomps.txt.
>
> The name of the currently used Unicode character property is “decomposition”.
> What would be a good name for the property from decomps.txt? “decomposition2”?

I'm not good at naming stuff, but how about collating-decomposition or
decomposition-for-collation?
* Re: On language-dependent defaults for character-folding
  2016-02-29 16:27 ` Eli Zaretskii
@ 2016-02-29 23:40   ` Juri Linkov
  2016-03-01 16:44     ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Juri Linkov @ 2016-02-29 23:40 UTC
To: Eli Zaretskii; +Cc: larsi, lokedhs, rms, emacs-devel

>> Like (defcustom char-fold-defaults '(ignore-diacritics match-ligatures ...?
>
> Yes.
>
>> Not sure if such terms are self-descriptive. At least plain pairs like
>> '((o ø) (l ł) ...) should be enough to customize at the base character level,
>> and later we might consider grouping such pairs into a more high-level
>> features like ‘spanish-diacritics’, ‘swedish-diacritics’, etc.
>
> Such grouping is what I had in mind. I don't expect users to remember
> these characters by heart.

OTOH, they definitely know what characters they want to ignore.

>> > I think we need to create another uni-*.el file which defines a
>> > decomposition char-table populated from decomps.txt.
>>
>> The name of the currently used Unicode character property is “decomposition”.
>> What would be a good name for the property from decomps.txt? “decomposition2”?
>
> I'm not good at naming stuff, but how about collating-decomposition or
> decomposition-for-collation?

Or to put decompositions from decomps.txt into the same table
with UnicodeData.txt decompositions, but mark these additional
decompositions by a special tag "<collation>", or better using
the same tag "<sort>" introduced in decomps.txt.
* Re: On language-dependent defaults for character-folding
  2016-02-29 23:40 ` Juri Linkov
@ 2016-03-01 16:44   ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-03-01 16:44 UTC
To: Juri Linkov; +Cc: larsi, lokedhs, rms, emacs-devel

> From: Juri Linkov <juri@linkov.net>
> Cc: larsi@gnus.org, lokedhs@gmail.com, rms@gnu.org, emacs-devel@gnu.org
> Date: Tue, 01 Mar 2016 01:40:12 +0200
>
> >> What would be a good name for the property from decomps.txt? “decomposition2”?
> >
> > I'm not good at naming stuff, but how about collating-decomposition or
> > decomposition-for-collation?
>
> Or to put decompositions from decomps.txt into the same table
> with UnicodeData.txt decompositions, but mark these additional
> decompositions by a special tag "<collation>", or better using
> the same tag "<sort>" introduced in decomps.txt.

Yes, I think this is a better alternative.
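The alternative agreed on here, one table holding both UnicodeData.txt and decomps.txt decompositions with the latter tagged `<sort>`, might look roughly like this in outline. The two `<sort>` entries for ø and ł are illustrative assumptions written in UnicodeData.txt's decomposition-field syntax, not verified decomps.txt lines. `SORT_ENTRIES`, `decomposition_entry` and `folds` are hypothetical names. The point is that a folding builder can then accept or reject entries by their tag.

```python
import unicodedata

# Illustrative extra entries in UnicodeData.txt's decomposition-field
# syntax, tagged <sort> as proposed; the exact values are assumptions.
SORT_ENTRIES = {"\u00f8": "<sort> 006F 0338", "\u0142": "<sort> 006C 0335"}


def decomposition_entry(ch):
    """Return (tag, decomposed_string) for CH, or None if it has none.

    The tag is None for canonical decompositions, otherwise a string
    such as "<compat>" or "<sort>".
    """
    raw = unicodedata.decomposition(ch) or SORT_ENTRIES.get(ch, "")
    if not raw:
        return None
    fields = raw.split()
    tag = fields.pop(0) if fields[0].startswith("<") else None
    return tag, "".join(chr(int(f, 16)) for f in fields)


def folds(ch, allowed_tags):
    """True if CH has a decomposition whose tag the user allows."""
    entry = decomposition_entry(ch)
    return entry is not None and entry[0] in allowed_tags


# Canonical decompositions only, versus canonical plus the <sort> extras.
print(folds("\u00f8", {None}))            # False
print(folds("\u00f8", {None, "<sort>"}))  # True
```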
* Re: On language-dependent defaults for character-folding
  2016-02-22 18:51 ` Eli Zaretskii
  2016-02-23 0:14   ` Juri Linkov
@ 2016-02-26 20:23   ` Richard Stallman
  1 sibling, 0 replies; 263+ messages in thread
From: Richard Stallman @ 2016-02-26 20:23 UTC
To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

> > * A per-buffer language preference variable.
> > * A global value which becomes the default for new buffers.

> That's unnecessarily restrictive; we can do better with the current
> infrastructure.

This is not a restriction, it is a feature. It is meant to enable
people to do something convenient.

> Some encodings provide us with charset information,
> which can be used to deduce the language of the text. Some characters
> belong to Unicode blocks that allow identification of the language, or
> maybe a small group of languages. In some cases, the text itself
> comes with metadata which describes the language. And there might be
> other sources of information about the language.

If there are useful ways to determine the language from the text, that
work well enough that users won't complain, let's do it. That would be
an add-on to the structure I proposed.

> There are other aspects of this that need to be considered, if we want
> for language-specific searching to be solid. E.g., what happens with
> text copied to another buffer which might have a different per-buffer
> language preference? does it suddenly behave differently when
> searched?

Yes. If you want the two buffers to have the same language preference,
then maybe Emacs can guess that for you; if not, you can specify it.

> But the most basic issue is that any significant development in these
> directions require to re-implement the feature on the C level, and use
> char-tables for folding, like we do with case-mapping.

It needs to use some sort of tables. Whether they are the current kind
of char table, or some other structure, is something to be determined.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
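RMS's structure (a per-buffer language preference plus a global value that buffers default to) behaves much like an Emacs buffer-local variable with a default value. The following rough Python model also illustrates the consequence Eli raises: the same text can search differently in two buffers. `LanguagePreference`, `EXCLUSIONS` and `folds_to` are hypothetical names, and the Spanish n/ñ exclusion is just the example used in this thread. Whether a later change to the global value should affect buffers that already exist is a design choice; this sketch arbitrarily makes such buffers follow the new default.

```python
import unicodedata

# Hypothetical per-language exclusions: pairs that should NOT fold.
EXCLUSIONS = {"es": {("n", "\u00f1"), ("N", "\u00d1")}}  # n/ñ, N/Ñ


class LanguagePreference:
    """A global default plus optional per-buffer overrides.

    Buffers without an override see the current global value, similar to
    how an Emacs buffer-local variable falls back to its default value.
    """

    def __init__(self, default=None):
        self.default = default
        self._local = {}

    def set_local(self, buffer_name, language):
        self._local[buffer_name] = language

    def value(self, buffer_name):
        return self._local.get(buffer_name, self.default)


def folds_to(search_char, text_char, language):
    """True if searching for SEARCH_CHAR should find TEXT_CHAR."""
    if (search_char, text_char) in EXCLUSIONS.get(language, set()):
        return False
    return unicodedata.normalize("NFD", text_char)[0] == search_char


prefs = LanguagePreference(default="en")
prefs.set_local("notas.txt", "es")
# The same "ñ" is found from *scratch* but not from the Spanish buffer.
print(folds_to("n", "\u00f1", prefs.value("*scratch*")))  # True
print(folds_to("n", "\u00f1", prefs.value("notas.txt")))  # False
```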
* Re: On language-dependent defaults for character-folding
  2016-02-21 2:51 ` Lars Ingebrigtsen
  2016-02-21 6:28   ` Elias Mårtenson
@ 2016-02-21 16:25   ` Eli Zaretskii
  2016-02-22 1:56     ` Lars Ingebrigtsen
  1 sibling, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-21 16:25 UTC
To: Lars Ingebrigtsen; +Cc: lokedhs, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: lokedhs@gmail.com, emacs-devel@gnu.org
> Date: Sun, 21 Feb 2016 13:51:46 +1100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > The above won't support finding decomposed sequences as in á (there
> > are 2 characters here, they are just displayed as one).
>
> They are displayed as two characters in this Emacs (current Ubuntu,
> Emacs git master). :-)

Probably because your default font is not capable enough. Or maybe
your build lacks libotf and/or libm17n?

> If that database gives us all that, then I'm all for using that database
> instead of creating our own, of course. But why doesn't C-s o find ø,
> and C-s l find ł then?

To avoid making yet another group of users angry, this time with no
firm basis at all ;-)
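Eli's remark about decomposed sequences (a followed by U+0301 COMBINING ACUTE ACCENT is two characters, even when it is displayed as one) means that a character-folding regexp must cover both spellings. The following Python sketch builds such a regexp. It is only an analogue of what such a regexp has to match, not Emacs's implementation. Two simplifying assumptions apply: only the Latin-1 Supplement and Latin Extended-A/B blocks are scanned for precomposed letters, and only the Combining Diacritical Marks block (U+0300 to U+036F) counts as combining marks. `char_fold_regexp` is a hypothetical name.

```python
import re
import unicodedata

# Simplifying assumptions: only these blocks are scanned for precomposed
# letters, and only U+0300..U+036F are treated as combining marks.
LATIN_CANDIDATES = [chr(cp) for cp in range(0x00C0, 0x0250)]
COMBINING_MARKS = "[\u0300-\u036f]*"


def char_fold_regexp(query, candidates=LATIN_CANDIDATES):
    """Build a regexp that matches QUERY while ignoring diacritics."""
    variants = {}
    for ch in candidates:
        nfd = unicodedata.normalize("NFD", ch)
        if len(nfd) > 1 and all(unicodedata.combining(m) for m in nfd[1:]):
            variants.setdefault(nfd[0], []).append(ch)
    parts = []
    for c in query:
        # The base letter or any precomposed variant of it, optionally
        # followed by combining marks (which covers decomposed input).
        chars = [c] + variants.get(c, [])
        parts.append("[" + "".join(re.escape(x) for x in chars) + "]"
                     + COMBINING_MARKS)
    return "".join(parts)


rx = char_fold_regexp("an")
samples = ["pan", "p\u00e1n", "pa\u0301n", "pa\u00f1", "pat"]
print([bool(re.search(rx, s)) for s in samples])
# -> [True, True, True, True, False]
```

The third sample is the decomposed case Eli describes: it matches only because the pattern allows combining marks after each base letter, not because of the precomposed variants.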
* Re: On language-dependent defaults for character-folding
  2016-02-21 16:25 ` Eli Zaretskii
@ 2016-02-22 1:56   ` Lars Ingebrigtsen
  2016-02-22 9:20     ` Andreas Schwab
  0 siblings, 1 reply; 263+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-22 1:56 UTC
To: Eli Zaretskii; +Cc: lokedhs, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Probably because your default font is not capable enough. Or maybe
> your build lacks libotf and/or libm17n?

Let's see...

  Does Emacs use -lfreetype? yes
  Does Emacs use -lm17n-flt? yes
  Does Emacs use -lotf? yes
  Does Emacs use -lxft? yes

And the font seems to be

xft:-unknown-Ubuntu Mono-normal-normal-normal-*-24-*-*-*-m-0-iso10646-1 (#x27)

I don't think I've customised any of this stuff -- it's just the default
Ubuntu setup. It's weird that the default Ubuntu font won't do the
right thing here...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
* Re: On language-dependent defaults for character-folding
  2016-02-22 1:56 ` Lars Ingebrigtsen
@ 2016-02-22 9:20   ` Andreas Schwab
  2016-02-23 1:46     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 263+ messages in thread
From: Andreas Schwab @ 2016-02-22 9:20 UTC
To: Lars Ingebrigtsen; +Cc: Eli Zaretskii, lokedhs, emacs-devel

Lars Ingebrigtsen <larsi@gnus.org> writes:

> And the font seems to be
>
> xft:-unknown-Ubuntu Mono-normal-normal-normal-*-24-*-*-*-m-0-iso10646-1 (#x27)

For both characters?

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
* Re: On language-dependent defaults for character-folding
  2016-02-22 9:20 ` Andreas Schwab
@ 2016-02-23 1:46   ` Lars Ingebrigtsen
  2016-02-23 3:38     ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Lars Ingebrigtsen @ 2016-02-23 1:46 UTC
To: Andreas Schwab; +Cc: Eli Zaretskii, lokedhs, emacs-devel

Andreas Schwab <schwab@suse.de> writes:

> Lars Ingebrigtsen <larsi@gnus.org> writes:
>
>> And the font seems to be
>>
>> xft:-unknown-Ubuntu Mono-normal-normal-normal-*-24-*-*-*-m-0-iso10646-1 (#x27)
>
> For both characters?

No, the second one is

xft:-unknown-Abyssinica SIL-normal-normal-normal-*-24-*-*-*-*-0-iso10646-1 (#x11F)

Character code properties: customize what to show
  name: COMBINING ACUTE ACCENT
  old-name: NON-SPACING ACUTE
  general-category: Mn (Mark, Nonspacing)
  decomposition: (769) ('́')

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
* Re: On language-dependent defaults for character-folding
  2016-02-23 1:46 ` Lars Ingebrigtsen
@ 2016-02-23 3:38   ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-23 3:38 UTC
To: Lars Ingebrigtsen; +Cc: schwab, lokedhs, emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: Eli Zaretskii <eliz@gnu.org>, lokedhs@gmail.com, emacs-devel@gnu.org
> Date: Tue, 23 Feb 2016 12:46:06 +1100
>
> Andreas Schwab <schwab@suse.de> writes:
>
> > Lars Ingebrigtsen <larsi@gnus.org> writes:
> >
> >> And the font seems to be
> >>
> >> xft:-unknown-Ubuntu Mono-normal-normal-normal-*-24-*-*-*-m-0-iso10646-1 (#x27)
> >
> > For both characters?
>
> No, the second one is
>
> xft:-unknown-Abyssinica SIL-normal-normal-normal-*-24-*-*-*-*-0-iso10646-1 (#x11F)

That's why you see them separate: Emacs can only compose characters if
their glyphs come from the same font.
* Re: On language-dependent defaults for character-folding
  2016-02-20 10:34 ` Eli Zaretskii
  2016-02-21 2:51   ` Lars Ingebrigtsen
@ 2016-02-21 12:44   ` Richard Stallman
  2016-02-21 16:05     ` Eli Zaretskii
  1 sibling, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-21 12:44 UTC
To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

> > It seems to me that we're considering using the Unicode decomposition
> > rules for "variant detection" because it's what we have.

> No, we use decompositions because that's how equivalent strings are to
> be compared and mapped/folded.

Please let's drop the idea of determining the folding behavior
automatically from something in Unicode. It is too rigid.

Users want many different folding behaviors. Instead of insisting on
a particular set of equivalences, let's make it easy for users to
specify the foldings they want.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: On language-dependent defaults for character-folding
  2016-02-21 12:44 ` Richard Stallman
@ 2016-02-21 16:05   ` Eli Zaretskii
  2016-02-22 17:57     ` Richard Stallman
  0 siblings, 1 reply; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-21 16:05 UTC
To: rms; +Cc: larsi, lokedhs, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, lokedhs@gmail.com, emacs-devel@gnu.org
> Date: Sun, 21 Feb 2016 07:44:45 -0500
>
> > > It seems to me that we're considering using the Unicode decomposition
> > > rules for "variant detection" because it's what we have.
>
> > No, we use decompositions because that's how equivalent strings are to
> > be compared and mapped/folded.
>
> Please let's drop the idea of determining the folding behavior
> automatically from something in Unicode. It is too rigid.

We don't determine the behavior from Unicode. We use the Unicode data
to implement the behavior we consider useful.

> Users want many different folding behaviors. Instead of insisting on
> a particular set of equivalences, let's make it easy for users to
> specify the foldings they want.

Whatever additional behavior and nuances the users want, we can
implement it regardless of the Unicode data we use for the basic
folding (once we figure out what is it that they want and how to
implement that best). There's no dichotomy here.
* Re: On language-dependent defaults for character-folding
  2016-02-21 16:05 ` Eli Zaretskii
@ 2016-02-22 17:57   ` Richard Stallman
  2016-02-22 18:34     ` Eli Zaretskii
  0 siblings, 1 reply; 263+ messages in thread
From: Richard Stallman @ 2016-02-22 17:57 UTC
To: Eli Zaretskii; +Cc: larsi, lokedhs, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

> > Please let's drop the idea of determining the folding behavior
> > automatically from something in Unicode. It is too rigid.

> We don't determine the behavior from Unicode. We use the Unicode data
> to implement the behavior we consider useful.

What we have seen is that the behavior that comes from that Unicode
data does not please the users very much. Users seem to have many
different ideas of what folding is useful, and disagree with each
other greatly.

We should not cling to the set of folding specs that happen to come
from that Unicode data. Let's forget that Unicode data.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: On language-dependent defaults for character-folding
  2016-02-22 17:57 ` Richard Stallman
@ 2016-02-22 18:34   ` Eli Zaretskii
  0 siblings, 0 replies; 263+ messages in thread
From: Eli Zaretskii @ 2016-02-22 18:34 UTC
To: rms; +Cc: larsi, lokedhs, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: larsi@gnus.org, lokedhs@gmail.com, emacs-devel@gnu.org
> Date: Mon, 22 Feb 2016 12:57:54 -0500
>
> > > Please let's drop the idea of determining the folding behavior
> > > automatically from something in Unicode. It is too rigid.
>
> > We don't determine the behavior from Unicode. We use the Unicode data
> > to implement the behavior we consider useful.
>
> What we have seen is that the behavior that comes from that Unicode
> data does not please the users very much. Users seem to have many
> different ideas of what folding is useful, and disagree with each
> other greatly.

My analysis of the discussion is that a small number of specific cases
of language-independent folding makes users of some languages unhappy.
The number of such cases is small, and they only bother users of a
small number of languages we support. My conclusion from that is that
the feature as implemented needs to be augmented in minor ways, but is
basically correct for the majority of use cases. IOW, it's not
perfect, but it's a significant improvement for many.

> We should not cling to the set of folding specs that happen to come
> from that Unicode data. Let's forget that Unicode data.

That'd be a mistake tantamount to throwing the baby with the
bathwater. Besides, any alternative data to use for such a feature
will be either identical or very similar to what we use now. The only
alternative that won't need such similar data is to decide to never
have this feature. I don't think we want to do that.
* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-20 9:21 UTC
To: Elias Mårtenson; +Cc: larsi, emacs-devel

> Date: Sat, 20 Feb 2016 13:22:57 +0800
> From: Elias Mårtenson <lokedhs@gmail.com>
> Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org>
>
> The reference you are looking for is the Unicode Standard itself. It says to use the normalization forms, see for example section 5.16 there.
>
> I have read that section before, and I have now read it again. The section certainly talks about searching that ignores diacritics, but does not discuss a method to do so. There is also a reference to TR29, but it refers to grapheme clusters, which would be a very strange way to do character folding (Koreans would be very confused).
>
> Every character-folding search implementation decomposes characters before matching them. So does Emacs. We didn't invent this, and we certainly didn't use the decompositions where they weren't supposed to be used. It's not a trick, it's what everyone else does to do the job. See the ICU library, for example.
>
> Every example you have given so far discusses the decomposition equivalence, i.e. the fact that the two variants of ñ are the same. Section 5.16 discusses the _concept_ of allowing n and ñ to match, but the mechanism to do so is locale-dependent. This is what Unicode says, and that is what I say. My position is simply that the default (if absolutely nothing else overrides it) should be chosen to take the locale of the user into account.
>
> > The decompositions are used in the normalisation forms to ensure that the two variants are treated equally (such as the two alternative representations of ñ that we have been discussing).
>
> Yes, and any character-folding search uses normalization forms as well.
>
> Yes, but that's not what normalisation forms were designed to do.

Your interpretation is wrong, because every implementation of character-folding in search uses normalization forms. So if you want to maintain that whoever does that is abusing normalization forms, you are not just up against Emacs, you are up against the ICU library and others. You are also up against http://www.unicode.org/notes/tn5/.

It is possible that you only see the "equivalence" parts of all these sources. But in that case, you are actually claiming that folding characters should never be done at all! "Folding" means mapping _distinct_ character sequences to the same basic sequence. You start from a normalization form, then compare the results disregarding certain secondary, tertiary, etc. differences. The Emacs implementation simply expresses this algorithm by using suitable regular expressions, and it's currently only capable of either ignoring all the non-base weights or none at all, but the principle is preserved to the letter.

> Again (I really apologise for repeating myself, I'm starting to sound like a troll and that is truly not my intention), the purpose of normalisation forms is to ensure that the two variants of ñ compare the same. It is not designed to provide a mechanism to allow n to compare equal to ñ.

Under character-folding that ignores diacritics, ñ should indeed compare equal to n.

> > Yes. I am fully aware of this. But so be it. Having applications work differently depending on the locale of the environment the application was started in is nothing new.
>
> It's not new. It's old. We should move on to more general environments that support multiple languages. Emacs is such an environment. The old l10n paradigms are fundamentally incompatible with that.
>
> Sure, but doesn't it make sense to fall back to the user's default if the buffer does not have an overriding locale?

I don't know what you mean by "buffer has an overriding locale". Emacs buffers don't have a locale, and they cannot do that in principle because we support multiple languages. E.g., what could the locale of the HELLO buffer created by "C-h H" be?

> > Being a multi-lingual environment, Emacs has no real notion of the locale.
> >
> > Perhaps it should?
>
> That'd be a step backward, IMO.
>
> As opposed to having no concept of locale at all?

Yes. A multilingual environment cannot have a locale in principle. It will cease being multilingual if it does.

> Strange, I always thought the data was there. Perhaps you should ask a question on the Unicode mailing list, then.
>
> That's a good idea actually.

That's a relief. I was beginning to suspect I don't have any good ideas at all.
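The disagreement above is between two different operations: canonical equivalence (precomposed ñ, U+00F1, and n followed by U+0303 COMBINING TILDE are the same text) and folding (ñ is deliberately treated as n). A minimal Python sketch using only the standard `unicodedata` module makes the distinction concrete. The function names are illustrative, not anything Emacs provides, and "decompose with NFD, then drop combining marks" is one common way to fold diacritics, not necessarily how Emacs does it internally.

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """Equivalence: compare after NFC normalization (same text, different encodings)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def fold_diacritics(s: str) -> str:
    """Folding: decompose (NFD), then drop every combining mark, keeping base characters."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def fold_equal(a: str, b: str) -> bool:
    """Compare two strings while ignoring all diacritics."""
    return fold_diacritics(a) == fold_diacritics(b)

precomposed = "\u00f1"   # ñ as a single code point
decomposed = "n\u0303"   # n + COMBINING TILDE

print(canonically_equal(precomposed, decomposed))  # True: same text
print(canonically_equal(precomposed, "n"))         # False: normalization alone never maps ñ to n
print(fold_equal(precomposed, "n"))                # True: folding discards the tilde on purpose
```

Normalization is the first step of the fold, which is why both sides of the argument can cite it: it guarantees the two encodings of ñ behave identically, while the decision to discard the mark is a separate, policy-level step.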
* Re: On language-dependent defaults for character-folding
From: Elias Mårtenson @ 2016-02-20 10:08 UTC
To: Eli Zaretskii; +Cc: Lars Ingebrigtsen, emacs-devel

On 20 February 2016 at 17:21, Eli Zaretskii <eliz@gnu.org> wrote:

> Your interpretation is wrong, because every implementation of character-folding in search uses normalization forms. So if you want to maintain that whoever does that is abusing normalization forms, you are not just up against Emacs, you are up against the ICU library and others. You are also up against http://www.unicode.org/notes/tn5/.

They may do so, but only because we're not exactly swimming in great alternatives.

> It is possible that you only see the "equivalence" parts of all these sources. But in that case, you are actually claiming that folding characters should never be done at all! "Folding" means mapping _distinct_ character sequences to the same basic sequence. You start from a normalization form, then compare the results disregarding certain secondary, tertiary, etc. differences.

Of course. But the fact that you start from a normalisation form is of secondary relevance here. I think that perhaps repeating the fact that the normalised form is used has somewhat clouded the discussion.

When you say "ignoring [...] differences", how do you determine those differences?

> > Again (I really apologise for repeating myself, I'm starting to sound like a troll and that is truly not my intention), the purpose of normalisation forms is to ensure that the two variants of ñ compare the same. It is not designed to provide a mechanism to allow n to compare equal to ñ.
>
> Under character-folding that ignores diacritics, ñ should indeed compare equal to n.

Yes again.
But how do you determine what rules to apply?

> > Sure, but doesn't it make sense to fall back to the user's default if the buffer does not have an overriding locale?
>
> I don't know what you mean by "buffer has an overriding locale". Emacs buffers don't have a locale, and they cannot do that in principle because we support multiple languages. E.g., what could the locale of the HELLO buffer created by "C-h H" be?

I was not talking about what Emacs does today. I was speaking about the hypothetical case where buffers can have unique locales. I can see a few cases where that would be a neat thing to have, but I have to scrape the barrel to do so.

> > As opposed to having no concept of locale at all?
>
> Yes. A multilingual environment cannot have a locale in principle. It will cease being multilingual if it does.

I guess we'll have to agree to disagree about this one. In any case, it's for a different thread.

> > Strange, I always thought the data was there. Perhaps you should ask a question on the Unicode mailing list, then.
> >
> > That's a good idea actually.
>
> That's a relief. I was beginning to suspect I don't have any good ideas at all.

Apparently I have given the impression that I think your ideas are garbage. I profoundly apologise for this and will try to be better going forward.

Regards,
Elias
* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-20 10:44 UTC
To: Elias Mårtenson; +Cc: larsi, emacs-devel

> Date: Sat, 20 Feb 2016 18:08:20 +0800
> From: Elias Mårtenson <lokedhs@gmail.com>
> Cc: Lars Ingebrigtsen <larsi@gnus.org>, emacs-devel <emacs-devel@gnu.org>
>
> It is possible that you only see the "equivalence" parts of all these sources. But in that case, you are actually claiming that folding characters should never be done at all! "Folding" means mapping _distinct_ character sequences to the same basic sequence. You start from a normalization form, then compare the results disregarding certain secondary, tertiary, etc. differences.
>
> Of course. But the fact that you start from a normalisation form is of secondary relevance here. I think that perhaps repeating the fact that the normalised form is used has somewhat clouded the discussion.
>
> When you say "ignoring [...] differences", how do you determine those differences?
>
> > Again (I really apologise for repeating myself, I'm starting to sound like a troll and that is truly not my intention), the purpose of normalisation forms is to ensure that the two variants of ñ compare the same. It is not designed to provide a mechanism to allow n to compare equal to ñ.
>
> Under character-folding that ignores diacritics, ñ should indeed compare equal to n.
>
> Yes again. But how do you determine what rules to apply?

Emacs currently ignores _any_ non-base differences, so ignoring is simple: we disregard any characters in the decomposition except the first one, which is the base character. Further improvements in this direction will need to access additional Unicode properties (to properly order the combining marks), and perhaps additional tables.
But this is something to consider in the future, and it will have to be done in C anyway; the regexp-based implementation cannot cut it.

> > That's a good idea actually.
>
> That's a relief. I was beginning to suspect I don't have any good ideas at all.
>
> Apparently I have given the impression that I think your ideas are garbage. I profoundly apologise for this and will try to be better going forward.

My smilies are usually implicit, so no sweat.
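Eli's description of the current rule (keep only the first character of a decomposition, which is the base character) can be approximated in a few lines of Python with the standard `unicodedata` module. This is a sketch of the idea only, not the Emacs implementation, which builds regexps from Emacs's own character tables. `base_char` and `fold` are hypothetical names, and following only canonical decompositions is an assumption made here: keeping just the first character of a compatibility decomposition such as the ligature ﬁ would silently drop the "i".

```python
import unicodedata

def base_char(ch: str) -> str:
    """Return the base character of CH under the "keep only the first
    character of the decomposition" rule, applied repeatedly.

    Only canonical decompositions are followed; compatibility ones (whose
    decomposition string starts with a tag such as "<compat>") are left alone.
    """
    while True:
        fields = unicodedata.decomposition(ch).split()
        if not fields or fields[0].startswith("<"):
            return ch
        ch = chr(int(fields[0], 16))   # first code point is the base character

def fold(s: str) -> str:
    """Fold every character of S to its base character (hypothetical helper)."""
    return "".join(base_char(c) for c in s)

print(fold("se\u00f1or"))   # senor
print(base_char("\u01d6"))  # u
```

Repeating the lookup matters for characters such as ǖ (U+01D6), whose decomposition is ü plus a macron; ü in turn decomposes to u plus a diaeresis.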
* Re: On language-dependent defaults for character-folding
From: Marcin Borkowski @ 2016-02-19 20:38 UTC
To: Elias Mårtenson; +Cc: Eli Zaretskii, Lars Ingebrigtsen, emacs-devel

On 2016-02-19, at 10:22, Elias Mårtenson <lokedhs@gmail.com> wrote:

> I readily agree that using the decomposition is a clever way to get the functionality quite a long way, but the cases where it breaks down, it does so quite spectacularly, and that's what I (and others) have been opposing.

And I'd like to remind everyone that it breaks down both ways: non-Poles should really be able to find "żółć" (btw, this is a real word, meaning "bile") by searching for "zolc", and "l" and "ł" are currently /not/ equivalent in the char-folding sense.

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University
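Marcin's "żółć" example shows the limit of decomposition-based folding: ż, ó and ć decompose to a base letter plus a combining mark, but ł (U+0142) has no Unicode decomposition at all, so no rule of the form "keep the base character" can reach l. The Python sketch below demonstrates this with the standard `unicodedata` module; the `EXTRA_FOLDS` supplementary table is hypothetical, one possible shape for the "additional tables" mentioned earlier in the thread.

```python
import unicodedata

def base_char(ch: str) -> str:
    """Follow canonical decompositions, keeping only the first code point each time."""
    while True:
        fields = unicodedata.decomposition(ch).split()
        if not fields or fields[0].startswith("<"):
            return ch
        ch = chr(int(fields[0], 16))

def fold(s: str, extra=None) -> str:
    """Fold S by decomposition; EXTRA maps characters that have no decomposition."""
    extra = extra or {}
    return "".join(extra.get(c, base_char(c)) for c in s)

# Hypothetical supplementary table: Unicode gives ł / Ł no decomposition.
EXTRA_FOLDS = {"\u0142": "l", "\u0141": "L"}

word = "\u017c\u00f3\u0142\u0107"        # żółć
print(unicodedata.decomposition("\u0142") == "")  # True
print(fold(word))                        # zołc: decomposition alone leaves ł untouched
print(fold(word, EXTRA_FOLDS))           # zolc
```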
* Re: On language-dependent defaults for character-folding
From: Lars Ingebrigtsen @ 2016-02-19 22:44 UTC
To: Eli Zaretskii; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Thanks. But what does "respect the locale" mean, in practical terms? A large portion of the characters that have some decomposition, and thus will be folded when searching, belong to scripts that are not related to any language or other locale-specific attribute. What do you think should be done with them in the context of this feature?

The locale says what language culture the user is from, and that's the important thing for most users. Not the language of the document or anything like that.

Norwegian (like Danish and Swedish) has a 29-character alphabet, and there are keys on our keyboards for all those letters. Having any of those characters show up when searching for other characters is as weird for a Norwegian as it would be for an American to have any of the 26 characters in their alphabet substitute for another.

The Norwegian "extra" characters are æøå, of which only the last is currently confused with another character by isearch in Emacs. I would imagine that an American would like ø to be folded with o, for instance, which it doesn't do.

So as currently implemented, the feature is kinda both incomplete and too intrusive at the same time.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
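Lars's observation follows from the Unicode data: å (U+00E5) decomposes to a plus a combining ring above, while æ and ø have no decomposition, so diacritic folding touches only å. A sketch of the locale-dependent exception he is asking for could look like the Python below. The `NO_FOLD_BY_LANGUAGE` table and the "nb" key are hypothetical; the Norwegian entry simply lists the three letters named in the message. Input is normalized to NFC first so that a decomposed å is protected too.

```python
import unicodedata

# Hypothetical per-language sets of letters that must never be folded.
NO_FOLD_BY_LANGUAGE = {
    "nb": set("\u00e6\u00f8\u00e5\u00c6\u00d8\u00c5"),   # æøå ÆØÅ
}

def fold_char(c: str) -> str:
    """Decompose C (NFD) and drop combining marks."""
    return "".join(x for x in unicodedata.normalize("NFD", c)
                   if not unicodedata.combining(x))

def fold(s: str, language: str = "") -> str:
    """Fold diacritics in S, except for letters the language treats as distinct."""
    keep = NO_FOLD_BY_LANGUAGE.get(language, set())
    s = unicodedata.normalize("NFC", s)   # so that a + U+030A is seen as å
    return "".join(c if c in keep else fold_char(c) for c in s)

word = "bl\u00e5b\u00e6r"          # blåbær
print(fold(word))                 # blabær: only å folds; æ has no decomposition
print(fold(word, language="nb"))  # blåbær: nothing folded for a Norwegian user
```

The same table lookup would also be the natural place for additions in the other direction, such as folding ø to o for users who cannot type it, which decomposition alone cannot provide.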
* Re: On language-dependent defaults for character-folding
From: Clément Pit--Claudel @ 2016-02-19 22:54 UTC
To: emacs-devel

On 02/19/2016 05:44 PM, Lars Ingebrigtsen wrote:
> The locale says what language culture the user is from, and that's the important thing for most users. Not the language of the document or anything like that.

Does it? I use GNU/Linux in English, but I'm from France. This seems to be a pretty common choice among French programmers.

Clément.
* Re: On language-dependent defaults for character-folding
From: Elias Mårtenson @ 2016-02-20 5:25 UTC
To: Clément Pit--Claudel; +Cc: emacs-devel

On 20 February 2016 at 06:54, Clément Pit--Claudel <clement.pit@gmail.com> wrote:

> On 02/19/2016 05:44 PM, Lars Ingebrigtsen wrote:
> > The locale says what language culture the user is from, and that's the important thing for most users. Not the language of the document or anything like that.
>
> Does it? I use GNU/Linux in English, but I'm from France. This seems to be a pretty common choice among French programmers.

But that is your choice, is it not? Linux (GNOME, actually) certainly has very good French support and you made a conscious choice not to use it. As such, you wouldn't be surprised to see English-oriented character folding, would you?

Regards,
Elias
* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-20 14:32 UTC
To: Elias Mårtenson; +Cc: clement.pit, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

> But that is your choice, is it not? Linux (GNOME, actually) certainly has
> very good French support

GNOME is not part of Linux. It was started by the GNU Project. Are you talking about the GNU operating system and calling it "Linux"? Please don't credit our work to someone else.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: On language-dependent defaults for character-folding
From: Elias Mårtenson @ 2016-02-20 15:50 UTC
To: rms; +Cc: Clément Pit--Claudel, emacs-devel

On 20 February 2016 at 22:32, Richard Stallman <rms@gnu.org> wrote:

> > But that is your choice, is it not? Linux (GNOME, actually) certainly has very good French support
>
> GNOME is not part of Linux. It was started by the GNU Project. Are you talking about the GNU operating system and calling it "Linux"?

I was specifically referring to GNOME, since it's the localised user interface most people would interact with on a daily basis.

I have to admit that I was ignorant of the fact that GNU was involved in it. I guess the G at the beginning of the name should have tipped me off.

Regards,
Elias
* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-21 12:45 UTC
To: Elias Mårtenson; +Cc: clement.pit, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

> I was specifically referring to GNOME, since it's the localised user interface most people would interact with on a daily basis.

> I have to admit that I was ignorant of the fact that GNU was involved in it. I guess the G at the beginning of the name should have tipped me off.

Well, it certainly has nothing to do with Linux. Linux is a kernel, nothing more.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-20 8:09 UTC
To: Lars Ingebrigtsen; +Cc: emacs-devel

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: emacs-devel@gnu.org
> Date: Sat, 20 Feb 2016 09:44:26 +1100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > Thanks. But what does "respect the locale" mean, in practical terms? A large portion of the characters that have some decomposition, and thus will be folded when searching, belong to scripts that are not related to any language or other locale-specific attribute. What do you think should be done with them in the context of this feature?
>
> The locale says what language culture the user is from, and that's the important thing for most users. Not the language of the document or anything like that.
>
> Norwegian (like Danish and Swedish) has a 29-character alphabet, and there are keys on our keyboards for all those letters. Having any of those characters show up when searching for other characters is as weird for a Norwegian as it would be for an American to have any of the 26 characters in their alphabet substitute for another.
>
> The Norwegian "extra" characters are æøå, of which only the last is currently confused with another character by isearch in Emacs. I would imagine that an American would like ø to be folded with o, for instance, which it doesn't do.
>
> So as currently implemented, the feature is kinda both incomplete and too intrusive at the same time.

Are you saying that making the default depend on the locale would be OK?
* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-20 14:32 UTC
To: Eli Zaretskii; +Cc: larsi, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

> Are you saying that making the default depend on the locale would be
> OK?

I think it is ok to use the locale as a sort of last default, but more important than that is to make it easy to specify different behaviors in Emacs, both globally and for a specific buffer.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: On language-dependent defaults for character-folding
From: Rasmus @ 2016-02-24 23:27 UTC
To: emacs-devel

Richard Stallman <rms@gnu.org> writes:

> [[[ To any NSA and FBI agents reading my email: please consider ]]]
> [[[ whether defending the US Constitution against all enemies, ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
> > Are you saying that making the default depend on the locale would be OK?
>
> I think it is ok to use the locale as a sort of last default, but more important than that is to make it easy to specify different behaviors in Emacs, both globally and for a specific buffer.

I think it should look at the /keyboard layout/ before the /locale/. E.g. on my system the locale would suggest that I can easily type ñ, though in fact I cannot:

    $ localectl
       System Locale: LANG=es_ES.UTF-8
           VC Keymap: dk-latin1
          X11 Layout: dk
         X11 Variant: nodeadkeys

Rasmus

-- 
Hooray!
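Rasmus's proposal amounts to a priority order: consult the keyboard layout first, and fall back to the locale only when no layout is known. The Python sketch below parses text in the format of the `localectl` output he quotes and applies that order. How Emacs itself could discover the layout is exactly the open question raised in the reply, so everything here is an assumption: `localectl` exists only on systemd-based systems, and `parse_localectl` and `preferred_language` are hypothetical helpers. Note also that layout names ("dk") and language codes ("da") are different vocabularies, so a real implementation would need a mapping between them.

```python
def parse_localectl(output: str) -> dict:
    """Parse 'Key: value' lines in the format printed by `localectl`."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def preferred_language(fields: dict) -> str:
    """Prefer the X11 keyboard layout; fall back to the language part of LANG."""
    layout = fields.get("X11 Layout")
    if layout:
        return layout.split(",")[0]          # first layout if several are configured
    lang = fields.get("System Locale", "")
    if lang.startswith("LANG="):
        # "LANG=es_ES.UTF-8" -> "es"
        return lang[len("LANG="):].split("_")[0].split(".")[0]
    return ""

sample = """   System Locale: LANG=es_ES.UTF-8
       VC Keymap: dk-latin1
      X11 Layout: dk
     X11 Variant: nodeadkeys"""

print(preferred_language(parse_localectl(sample)))  # dk, not es
```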
* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-25 20:46 UTC
To: Rasmus; +Cc: emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

> I think it should look at the /keyboard layout/ before the /locale/.

In principle you might be right, but how can Emacs find out the keyboard layout?

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* RE: On language-dependent defaults for character-folding
From: Artur Malabarba @ 2016-02-13 18:15 UTC
To: Drew Adams; +Cc: Óscar Fuentes, Eli Zaretskii, Juri Linkov, emacs-devel

On 13 Feb 2016 3:20 pm, "Drew Adams" <drew.adams@oracle.com> wrote:
>
> > The implementation should really be on the C level, like the case-folding support. The current implementation isn't, and therefore has several disadvantages some of which were already pointed out...
>
> I would like to see a list of the disadvantages laid out clearly.

See a thread here called “Char-folding: how can we implement matching multiple characters as a single "thing"?”.

In summary, char folding was generating regexps that were too long for Emacs to handle. The best solution we reached was to make char folding dumber, so that the resulting regexps wouldn't grow exponentially. The C-level implementations of char folding that have been discussed wouldn't have this problem because they wouldn't need regexps.

Even with the current solution, char folding can still produce too-long regexps if the input string is very long (which it handles by falling back on regular search).

A second disadvantage is that you can't do char folding for regexp searches (though I can't tell how common that would be).
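To illustrate why a regexp-based approach runs into length limits, here is a minimal Python sketch (not the Emacs code) that turns each searched character into a bracket expression listing every precomposed character that folds to it. Even this simple version grows with every input character, and it sidesteps the hard part Artur mentions: matching decomposed sequences such as n followed by U+0303, which needs multi-character alternatives at each position (per the thread he cites, that is where the growth got out of hand). The scanned code-point range and the `max_len` cutoff are hypothetical choices; the fallback to a literal pattern mirrors the behaviour he describes.

```python
import re
import unicodedata
from collections import defaultdict

def base_char(ch: str) -> str:
    """Follow canonical decompositions, keeping only the first code point each time."""
    while True:
        fields = unicodedata.decomposition(ch).split()
        if not fields or fields[0].startswith("<"):
            return ch
        ch = chr(int(fields[0], 16))

# Map each base character to the precomposed characters that fold to it.
# Scanning only U+00C0..U+024F (Latin-1 letters through Latin Extended-B)
# is an arbitrary choice that keeps the table small.
FOLDS = defaultdict(set)
for cp in range(0x00C0, 0x0250):
    ch = chr(cp)
    base = base_char(ch)
    if base != ch:
        FOLDS[base].add(ch)

def char_fold_regexp(s: str, max_len: int = 5000) -> str:
    """Build one bracket expression per input character; fall back to a
    literal pattern when the result would exceed MAX_LEN characters."""
    parts = []
    for c in s:
        variants = FOLDS.get(c)
        if variants:
            members = sorted({c} | variants)
            parts.append("[" + "".join(re.escape(v) for v in members) + "]")
        else:
            parts.append(re.escape(c))
    pattern = "".join(parts)
    return pattern if len(pattern) <= max_len else re.escape(s)

print(bool(re.fullmatch(char_fold_regexp("pena"), "pe\u00f1a")))  # True
print(len(char_fold_regexp("pena")) > len("pena"))                # True
```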
* RE: On language-dependent defaults for character-folding
From: Drew Adams @ 2016-02-13 18:26 UTC
To: bruce.connor.am
Cc: Óscar Fuentes, Eli Zaretskii, Juri Linkov, emacs-devel

> > > The implementation should really be on the C level, like the case-folding support. The current implementation isn't, and therefore has several disadvantages some of which were already pointed out...
> >
> > I would like to see a list of the disadvantages laid out clearly.
>
> See a thread here called “Char-folding: how can we implement matching multiple characters as a single "thing"?”.
>
> In summary, char folding was generating regexps that were too long for Emacs to handle. The best solution we reached was to make char folding dumber, so that the resulting regexps wouldn't grow exponentially. The C-level implementations of char folding that have been discussed wouldn't have this problem because they wouldn't need regexps.
>
> Even with the current solution, char folding can still produce too-long regexps if the input string is very long (which it handles by falling back on regular search).
>
> A second disadvantage is that you can't do char folding for regexp searches (though I can't tell how common that would be).

Yes, I read that part of the thread. But thanks for the reminder.
* Re: On language-dependent defaults for character-folding
From: Clément Pit--Claudel @ 2016-02-12 19:09 UTC
To: emacs-devel

Hey Óscar,

On 02/12/2016 01:42 PM, Óscar Fuentes wrote:
> But we are digressing. Eli, you are missing the point. If you wish to set Emacs defaults as per the convenience of people who think of text as a series of codes at the expense of breaking basic expectations of those who see text as... text, well, frankly, I don't think it is a good decision.

I think your opinion is clear; so is that of other people in this thread. Don't generalize excessively, however: I don't think of text as a series of codes, but I do love the current default, and it meets many of my expectations.

Clément.
* Re: On language-dependent defaults for character-folding
From: Óscar Fuentes @ 2016-02-12 19:39 UTC
To: emacs-devel

Clément Pit--Claudel <clement.pit@gmail.com> writes:

> I think your opinion is clear; so is that of other people in this thread. Don't generalize excessively, however: I don't think of text as a series of codes, but I do love the current default, and it meets many of my expectations.

Clément, as mentioned in my first message, I thought that character-folding *could* be a good default until I found the n/ñ issue and read what other people wrote about similar cases in their languages. And even in its current state character-folding is something that can be useful from time to time to me, so I'm glad that it exists.

But this is not about me. I can enable or disable any feature, at any time, in my config. It's about developing Emacs, and that requires thinking about what's good for our users (current and future).
* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-13 15:32 UTC
To: Eli Zaretskii; +Cc: ofv, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

> > If ñ is meant to be read as ñ
>
> Don't you see them displayed identically in Emacs (and in any other program that correctly implements display of combining accents)?
> Maybe I don't really understand that "if" part.

I am using Emacs on a Linux console. I see them as two characters. The first is n, and the second displays as a diamond.

I get the impression Emacs expects them to display as a single character, though, because it messes up cursor positioning. (Someone told me a variable to set to prevent that messing up, but I failed to set it in .emacs and I don't remember its name now. Does anyone recall?)

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-13 15:40 UTC
To: rms, Kenichi Handa; +Cc: ofv, emacs-devel

> From: Richard Stallman <rms@gnu.org>
> CC: ofv@wanadoo.es, emacs-devel@gnu.org
> Date: Sat, 13 Feb 2016 10:32:22 -0500
>
> > > If ñ is meant to be read as ñ
>
> > Don't you see them displayed identically in Emacs (and in any other program that correctly implements display of combining accents)?
> > Maybe I don't really understand that "if" part.
>
> I am using Emacs on a Linux console. I see them as two characters. The first is n, and the second displays as a diamond.

Your console doesn't combine them into one.

> I get the impression Emacs expects them to display as a single character, though, because it messes up cursor positioning. (Someone told me a variable to set to prevent that messing up, but I failed to set it in .emacs and I don't remember its name now. Does anyone recall?)

It's auto-composition-mode.

I asked Handa-san (CC'ed) earlier whether we should turn off auto-composition-mode on a TTY, but didn't get any responses. Maybe I will have better luck now.
* Re: On language-dependent defaults for character-folding
From: Andreas Schwab @ 2016-02-13 16:58 UTC
To: Eli Zaretskii; +Cc: ofv, Kenichi Handa, rms, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> I asked Handa-san (CC'ed) earlier whether we should turn off auto-composition-mode on a TTY, but didn't get any responses.

It depends on the terminal emulator. Some implement composition (xterm, konsole), others don't (linux console).

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-13 17:44 UTC
To: Andreas Schwab; +Cc: ofv, handa, rms, emacs-devel

> From: Andreas Schwab <schwab@linux-m68k.org>
> Cc: rms@gnu.org, Kenichi Handa <handa@gnu.org>, ofv@wanadoo.es, emacs-devel@gnu.org
> Date: Sat, 13 Feb 2016 17:58:37 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > I asked Handa-san (CC'ed) earlier whether we should turn off auto-composition-mode on a TTY, but didn't get any responses.
>
> It depends on the terminal emulator. Some implement composition (xterm, konsole), others don't (linux console).

So I guess we need to be more selective.

But I think a more serious problem is that auto-composition-mode is buffer-local, whereas we want it to be terminal-local.
* Re: On language-dependent defaults for character-folding
From: Marcin Borkowski @ 2016-02-13 16:37 UTC
To: Eli Zaretskii; +Cc: Óscar Fuentes, emacs-devel

On 2016-02-12, at 09:44, Eli Zaretskii <eliz@gnu.org> wrote:

> You mean, a match should happen, right? Otherwise, I'm afraid I see no sense in this logic: IMO identically looking text should match, or else users will kill us.

What about, say "a" and "а"? ;-)

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University
* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-13 16:50 UTC
To: Marcin Borkowski; +Cc: ofv, emacs-devel

> From: Marcin Borkowski <mbork@mbork.pl>
> Cc: Óscar Fuentes <ofv@wanadoo.es>, emacs-devel@gnu.org
> Date: Sat, 13 Feb 2016 17:37:48 +0100
>
> On 2016-02-12, at 09:44, Eli Zaretskii <eliz@gnu.org> wrote:
>
> > You mean, a match should happen, right? Otherwise, I'm afraid I see no sense in this logic: IMO identically looking text should match, or else users will kill us.
>
> What about, say "a" and "а"? ;-)

They don't look identical, and in any case, it should be clear they should never match, except when specifically searching for so-called "confusables".
* Re: On language-dependent defaults for character-folding
From: Marcin Borkowski @ 2016-02-13 17:15 UTC
To: Eli Zaretskii; +Cc: ofv, emacs-devel

On 2016-02-13, at 17:50, Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Marcin Borkowski <mbork@mbork.pl>
>> Cc: Óscar Fuentes <ofv@wanadoo.es>, emacs-devel@gnu.org
>> Date: Sat, 13 Feb 2016 17:37:48 +0100
>>
>> On 2016-02-12, at 09:44, Eli Zaretskii <eliz@gnu.org> wrote:
>>
>> > You mean, a match should happen, right? Otherwise, I'm afraid I see no sense in this logic: IMO identically looking text should match, or else users will kill us.
>>
>> What about, say "a" and "а"? ;-)
>
> They don't look identical, and in any case, it should be clear they should never match, except when specifically searching for so-called "confusables".

Well, they look *exactly* identical on my Emacs. I even C-x C-='d a few times - still no difference. And there are more pairs like this.

All this means it is way more complex than most people imagine.

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University
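Marcin's pair is a homoglyph ("confusable") case, which lies entirely outside decomposition-based folding: Latin a and Cyrillic а are distinct code points with no decomposition linking them, however similar the glyphs. A short Python check with the standard `unicodedata` module shows this. Detecting confusables would need separate data (Unicode publishes it with UTS #39), which this sketch does not use.

```python
import unicodedata

latin_a, cyrillic_a = "a", "\u0430"

for ch in (latin_a, cyrillic_a):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0061 LATIN SMALL LETTER A
# U+0430 CYRILLIC SMALL LETTER A

def fold_diacritics(s: str) -> str:
    """Decompose (NFD) and drop combining marks, as in diacritic folding."""
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))

# Neither letter has a decomposition, so diacritic folding keeps them distinct.
print(fold_diacritics(latin_a) == fold_diacritics(cyrillic_a))  # False
print(unicodedata.decomposition(cyrillic_a) == "")              # True
```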
* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-13 17:45 UTC
To: Marcin Borkowski; +Cc: ofv, emacs-devel

> From: Marcin Borkowski <mbork@mbork.pl>
> Cc: ofv@wanadoo.es, emacs-devel@gnu.org
> Date: Sat, 13 Feb 2016 18:15:35 +0100
>
> >> What about, say "a" and "а"? ;-)
> >
> > They don't look identical, and in any case, it should be clear they
> > should never match, except when specifically searching for so-called
> > "confusables".
>
> Well, they look *exactly* identical on my Emacs. I even C-x C-='d a few
> times - still no difference. And there are more pairs like this.
>
> All this means it is way more complex than most people imagine.

Of course it is. But the important thing is Emacs does TRT with this
(and other) aspects of this complexity.
* Re: On language-dependent defaults for character-folding
From: Marcin Borkowski @ 2016-02-13 17:52 UTC
To: Eli Zaretskii; +Cc: ofv, emacs-devel

On 2016-02-13, at 18:45, Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Marcin Borkowski <mbork@mbork.pl>
>> Cc: ofv@wanadoo.es, emacs-devel@gnu.org
>> Date: Sat, 13 Feb 2016 18:15:35 +0100
>>
>> >> What about, say "a" and "а"? ;-)
>> >
>> > They don't look identical, and in any case, it should be clear they
>> > should never match, except when specifically searching for so-called
>> > "confusables".
>>
>> Well, they look *exactly* identical on my Emacs. I even C-x C-='d a few
>> times - still no difference. And there are more pairs like this.
>>
>> All this means it is way more complex than most people imagine.
>
> Of course it is. But the important thing is Emacs does TRT with this
> (and other) aspects of this complexity.

Of course you're right. (Though there exist rare cases where looking
for one /should/ find the other one.)

What I wanted to say is that this is a counterexample to this sentence:

> identically looking text should match, or else users will kill us.

Best,

--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University
* Re: On language-dependent defaults for character-folding
From: andres.ramirez @ 2016-02-13 17:46 UTC
To: Marcin Borkowski; +Cc: ofv, Eli Zaretskii, emacs-devel

They do not look the same on a linux virtual console. (Perhaps just in
X they look the same)

BR

On Sat, 13 Feb 2016 12:15:35 -0500, Marcin Borkowski wrote:

> >> What about, say "a" and "а"? ;-)
> >
> > They don't look identical, and in any case, it should be clear they
> > should never match, except when specifically searching for so-called
> > "confusables".
>
> Well, they look *exactly* identical on my Emacs. I even C-x C-='d a few
> times - still no difference. And there are more pairs like this.
>
> All this means it is way more complex than most people imagine.
* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-14 13:59 UTC
To: Eli Zaretskii; +Cc: ofv, emacs-devel

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

> > What about, say "a" and "а"? ;-)
> They don't look identical,

They look identical on my tty.

--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
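A minimal Python sketch of the "a" versus "а" point. It assumes the second letter in the exchange is U+0430 CYRILLIC SMALL LETTER A, and the helper name `describe` is hypothetical. The two letters can render identically, as reported above, yet they are distinct code points, and Unicode decomposition does not relate them. Folding built on decomposition data therefore keeps them apart.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


latin_a = "a"
cyrillic_a = "\u0430"  # assumed to be the look-alike "а" quoted in the thread

print(describe(latin_a))     # U+0061 LATIN SMALL LETTER A
print(describe(cyrillic_a))  # U+0430 CYRILLIC SMALL LETTER A

# Neither canonical (NFD) nor compatibility (NFKD) decomposition maps the
# Cyrillic letter onto the Latin one, so decomposition-based folding
# leaves the pair unmatched.
print(unicodedata.normalize("NFKD", cyrillic_a) == latin_a)  # False
```

Matching such pairs would need a separate confusables table, such as the one published with Unicode Technical Standard #39, rather than decomposition data.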
* Re: On language-dependent defaults for character-folding
From: Juri Linkov @ 2016-02-12 23:50 UTC
To: Óscar Fuentes; +Cc: emacs-devel

> That's not easy to implement, though, as it seems that there is
> controversy on some languages.

Don't you agree that it is very convenient to type just ‘C-s naive’
to find “naïve”?

What about https://en.wikipedia.org/wiki/%C3%8F that brings
an example in French of maïs (maize) vs. mais (but)?

And what to do with Spanish loanwords in English where the letter ñ
is kept intact as you can see in:
https://en.wikipedia.org/wiki/English_terms_with_diacritical_marks
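The folding under discussion can be sketched in Python: decompose each character (NFD) and drop the combining marks. This is a minimal sketch under stated assumptions, not Emacs's char-fold implementation. `fold`, `matches`, and the `keep` parameter are hypothetical names. `keep` stands in for a per-language set of letters that must stay distinct, such as ñ for Spanish or å, ä, ö for Swedish. The example shows both the convenience (naive finds naïve) and the false positives being debated (mais finds maïs, n finds ñ).

```python
import unicodedata


def fold(text: str, keep: frozenset = frozenset()) -> str:
    """Strip diacritics via canonical decomposition, except for letters in `keep`.

    `keep` is a hypothetical per-language exclusion set, e.g. frozenset("ñ")
    for Spanish or frozenset("åäö") for Swedish.
    """
    out = []
    # Compose first so that `keep` sees precomposed letters such as "ñ".
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)
            continue
        # NFD splits e.g. "ï" into "i" + U+0308; drop the combining mark.
        for part in unicodedata.normalize("NFD", ch):
            if not unicodedata.combining(part):
                out.append(part)
    return "".join(out)


def matches(needle: str, haystack: str, keep: frozenset = frozenset()) -> bool:
    """True if `needle` occurs in `haystack` once both are folded."""
    return fold(needle, keep) in fold(haystack, keep)


print(fold("naïve"))                               # naive
print(matches("naive", "a naïve question"))        # True
print(matches("mais", "du maïs"))                  # True: the maïs/mais collision
print(matches("n", "año"))                         # True: what Spanish readers object to
print(matches("n", "año", keep=frozenset("ñ")))    # False
print(matches("a", "här", keep=frozenset("åäö")))  # False
```

Both needle and text are folded, so a `keep` set only changes which letters stay distinct. How to choose that set (buffer language, keyboard layout, or a user option) remains the open question in this thread.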
* Re: On language-dependent defaults for character-folding
From: Óscar Fuentes @ 2016-02-13 0:33 UTC
To: emacs-devel

Juri Linkov <juri@linkov.net> writes:

>> That's not easy to implement, though, as it seems that there is
>> controversy on some languages.
>
> Don't you agree that it is very convenient to type just ‘C-s naive’
> to find “naïve”?

Oh, yes, it is convenient, no doubt. As it is convenient to ask for `a'
and be given `á'. That is convenient to me at least as much as to
anybody else.

What I find flabbergasting is the insistence on ignoring the "some
cases will be regarded as glaring bugs" part. This is beginning to turn
into a study on psychological bias :-)

[snip]
* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-14 13:57 UTC
To: Óscar Fuentes; +Cc: emacs-devel

> What I find flabbergasting is the insistence on ignoring the "some
> cases will be regarded as glaring bugs" part.

Might that depend on what we say the feature is supposed to do?

--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: On language-dependent defaults for character-folding
From: Óscar Fuentes @ 2016-02-14 14:27 UTC
To: Richard Stallman; +Cc: emacs-devel

Richard Stallman <rms@gnu.org> writes:

> > What I find flabbergasting is the insistence on ignoring the "some
> > cases will be regarded as glaring bugs" part.
>
> Might that depend on what we say the feature is supposed to do?

I'm disputing its default status, not the feature itself. Apparently,
some people here think that the feature should be enabled by default. A
search mechanism that matches unrelated letters, no less!
* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-15 10:28 UTC
To: Óscar Fuentes; +Cc: emacs-devel

> I'm disputing its default status, not the feature itself. Apparently,
> some people here think that the feature should be enabled by default. A
> search mechanism that matches unrelated letters, no less!

There is no a priori reason why it should be on by default, or why it
should be off by default. It is just a matter of what most users
prefer.

You've made it clear you prefer off by default. Maybe most users
agree with you. I don't know.

But there is no a priori reason why searching for n should not find ñ.

--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: On language-dependent defaults for character-folding
From: Óscar Fuentes @ 2016-02-15 12:31 UTC
To: Richard Stallman; +Cc: emacs-devel

Richard Stallman <rms@gnu.org> writes:

> > I'm disputing its default status, not the feature itself. Apparently,
> > some people here think that the feature should be enabled by default. A
> > search mechanism that matches unrelated letters, no less!
>
> There is no a priori reason why it should be on by default, or why it
> should be off by default. It is just a matter of what most users
> prefer.
>
> You've made it clear you prefer off by default. Maybe most users
> agree with you. I don't know.
>
> But there is no a priori reason why searching for n should not find ñ.

I've mentioned several times yet that, for someone who was educated on
Spanish, searching for n and finding ñ is no different than searching
for v and finding w.

There are similar cases on other languages. This looks like a strong a
priori reason to me.

My point was repeated ad nauseam. I'll stop arguing about this issue
now.
* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-15 17:45 UTC
To: Óscar Fuentes; +Cc: emacs-devel

> I've mentioned several times yet that, for someone who was educated on
> Spanish, searching for n and finding ñ is no different than searching
> for v and finding w.

Whether searching for v should also find w is not a question of principle.
It's a question of what is convenient for users.

--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: On language-dependent defaults for character-folding
From: Elias Mårtenson @ 2016-02-16 13:54 UTC
To: rms; +Cc: Óscar Fuentes, emacs-devel

On 16 Feb 2016 1:46 a.m., "Richard Stallman" <rms@gnu.org> wrote:
>
> > I've mentioned several times yet that, for someone who was educated on
> > Spanish, searching for n and finding ñ is no different than searching
> > for v and finding w.
>
> Whether searching for v should also find w is not a question of principle.
> It's a question of what is convenient for users.

In Swedish, this would be useful indeed. Up until recently V and W even
sorted together in the dictionary.

Anyway, I will also follow Óscar's lead and not post anything more on
this subject.

Regards,
Elias
* Re: On language-dependent defaults for character-folding
From: Per Starbäck @ 2016-02-16 14:30 UTC
To: rms; +Cc: Óscar Fuentes, emacs-devel@gnu.org

> > I've mentioned several times yet that, for someone who was educated on
> > Spanish, searching for n and finding ñ is no different than searching
> > for v and finding w.
>
> Whether searching for v should also find w is not a question of principle.
> It's a question of what is convenient for users.

Sure, we can avoid formulating principles that explain the regularities
in what is convenient or not, but then it's also a question of *how
much* inconvenient it is.

Having a search for "i" also find "j" would for most users be very
inconvenient, to the point that it would be seen as a bug. But for
someone using classical Latin it could be convenient.

Even *if* classical Latin was really big today that search behaviour
would still be seen as a bug by those using for example English. If 60%
used classical Latin and 40% used English we shouldn't just count the
numbers and conclude that the i/j search thing would be a good thing to
activate by default. Something seen as a glaring bug must weigh more
than just convenience.

### Searching for "n" and finding "ñ" in Spanish, or searching for
"a" and finding "ä" in Swedish
### are just as strange as searching for "i" and finding "j" in English.

It's as if many people on this list just won't believe that statement,
which is very frustrating.

It feels a bit like reporting that a particular feature that is useful
otherwise makes the computer explode if you use it in Utah or Nevada
and getting the answer that a recent count concluded that the feature
is convenient for most users.
* Re: On language-dependent defaults for character-folding
From: Ken Brown @ 2016-02-16 19:32 UTC
To: Per Starbäck, rms; +Cc: Óscar Fuentes, emacs-devel@gnu.org

On 2/16/2016 9:30 AM, Per Starbäck wrote:
> ### Searching for "n" and finding "ñ" in Spanish, or searching for
> "a" and finding "ä" in Swedish
> ### are just as strange as searching for "i" and finding "j" in English.
>
> It's as if many people on this list just won't believe that statement,
> which is very frustrating.

I've been following this discussion, and I haven't seen any indication
that people don't believe that statement. What I have seen is
disagreement about its importance. I've also seen several people say
that we should wait for more feedback from pretesters before deciding
what the default should be. What's the harm in that?

Ken
* Re: On language-dependent defaults for character-folding
From: Lars Ingebrigtsen @ 2016-02-16 23:49 UTC
To: Ken Brown; +Cc: Óscar Fuentes, Per Starbäck, rms, emacs-devel@gnu.org

Ken Brown <kbrown@cornell.edu> writes:

> On 2/16/2016 9:30 AM, Per Starbäck wrote:
>> ### Searching for "n" and finding "ñ" in Spanish, or searching for
>> "a" and finding "ä" in Swedish
>> ### are just as strange as searching for "i" and finding "j" in English.
>>
>> It's as if many people on this list just won't believe that statement,
>> which is very frustrating.
>
> I've been following this discussion, and I haven't seen any indication
> that people don't believe that statement. What I have seen is
> disagreement about its importance.

Yeah, it seems that people think it's unimportant that if you search for
"i", Emacs will find "j" instead.

--
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-17 16:03 UTC
To: Lars Ingebrigtsen; +Cc: ofv, per.starback, kbrown, emacs-devel

> Yeah, it seems that people think it's unimportant that if you search for
> "i", Emacs will find "j" instead.

My point is that this isn't a matter of principle. It is a practical
question.

I would consider that a misfeature, for my actual editing; but I might
like it if I were editing Latin.

--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: On language-dependent defaults for character-folding
From: Alan Mackenzie @ 2016-02-18 8:57 UTC
To: Ken Brown
Cc: Óscar Fuentes, Per Starbäck, Eli Zaretskii, rms, emacs-devel

Hello, Ken.

Sorry if I'm piggy-backing on your post, a bit.

On Tue, Feb 16, 2016 at 02:32:44PM -0500, Ken Brown wrote:
> On 2/16/2016 9:30 AM, Per Starbäck wrote:
> > ### Searching for "n" and finding "ñ" in Spanish, or searching for
> > "a" and finding "ä" in Swedish
> > ### are just as strange as searching for "i" and finding "j" in English.
> >
> > It's as if many people on this list just won't believe that statement,
> > which is very frustrating.

> I've been following this discussion, and I haven't seen any indication
> that people don't believe that statement. What I have seen is
> disagreement about its importance. I've also seen several people say
> that we should wait for more feedback from pretesters before deciding
> what the default should be. What's the harm in that?

What I see is a feature that, while important, is not yet ready for
prime time. It irritates, at the very least, native speakers of Swedish
and Spanish; it is now clear it needs to be configurable for the user's
language. It also makes clumsy and inappropriate use of regular
expressions; I think it's generally acknowledged we need to move much of
the implementation from Lisp to C.

In short, character folding as it currently is is really in an
experimental stage. I therefore vote for it to be disabled by default.

> Ken

--
Alan Mackenzie (Nuremberg, Germany).
* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-18 17:27 UTC
To: Alan Mackenzie; +Cc: ofv, per.starback, rms, kbrown, emacs-devel

> Date: Thu, 18 Feb 2016 08:57:30 +0000
> Cc: Eli Zaretskii <eliz@gnu.org>,
>  Per Starbäck <per.starback@gmail.com>, rms@gnu.org,
>  Óscar Fuentes <ofv@wanadoo.es>, emacs-devel@gnu.org
> From: Alan Mackenzie <acm@muc.de>
>
> In short, character folding as it currently is is really in an
> experimental stage. I therefore vote for it to be disabled by default.

Thanks, noted.
* Re: On language-dependent defaults for character-folding
From: Richard Stallman @ 2016-02-19 12:37 UTC
To: Eli Zaretskii; +Cc: ofv, acm, emacs-devel, kbrown, per.starback

We're not looking for "votes" on this list. That would be the wrong way to
make a decision. We need to poll the users -- but that too will not be a
simple matter of counting votes.

--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: On language-dependent defaults for character-folding
From: John Wiegley @ 2016-02-19 18:31 UTC
To: Richard Stallman
Cc: ofv, kbrown, per.starback, emacs-devel, acm, Eli Zaretskii

>>>>> Richard Stallman <rms@gnu.org> writes:

> We're not looking for "votes" on this list. That would be the wrong way to
> make a decision. We need to poll the users -- but that too will not be a
> simple matter of counting votes.

That's understood. We're looking to gauge the sentiment of the developers
here on this list, but the final decision will take every factor into
account.

--
John Wiegley                  GPG fingerprint = 4710 CF98 AF9B 327B B80F
http://newartisans.com                          60E1 46C4 BD1A 7AC1 4BA2
* Re: On language-dependent defaults for character-folding
From: Joost Kremers @ 2016-02-17 8:00 UTC
To: Per Starbäck; +Cc: Óscar Fuentes, rms, emacs-devel@gnu.org

On Tue, Feb 16 2016, Per Starbäck <per.starback@gmail.com> wrote:
> ### Searching for "n" and finding "ñ" in Spanish, or searching for
> "a" and finding "ä" in Swedish
> ### are just as strange as searching for "i" and finding "j" in English.
>
> It's as if many people on this list just won't believe that statement,
> which is very frustrating.

My impression of this thread is that people *do* understand the
importance of making char-folding language-dependent and that the
maintainers hope to implement this at some point in the future. For
various reasons, however, it is not possible to do so now.

The general opinion is also that char-folding is nonetheless useful to
many users, despite the fact that it will generate incorrect results in
some languages. The only question that needs to be answered right now is
whether the feature will be turned on or off by default. And on that
point, the tendency seems to be to have it off by default, with the
ability to toggle it within an i-search.

--
Joost Kremers
Life has its moments
* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-17 15:34 UTC
To: Joost Kremers; +Cc: ofv, per.starback, rms, emacs-devel

> From: Joost Kremers <joostkremers@fastmail.fm>
> Date: Wed, 17 Feb 2016 09:00:02 +0100
> Cc: Óscar Fuentes <ofv@wanadoo.es>, rms@gnu.org,
>  "emacs-devel@gnu.org" <emacs-devel@gnu.org>
>
> The general opinion is also that char-folding is nonetheless useful to
> many users, despite the fact that it will generate incorrect results in
> some languages. The only question that needs to be answered right now is
> whether the feature will be turned on or off by default. And on that
> point, the tendency seems to be to have it off by default, with the
> ability to toggle it within an i-search.

Actually, my counts indicate that more people want it on by default
than off.
* Re: On language-dependent defaults for character-folding
From: Achim Gratz @ 2016-02-17 18:30 UTC
To: emacs-devel

Eli Zaretskii writes:
>> The general opinion is also that char-folding is nonetheless useful to
>> many users, despite the fact that it will generate incorrect results in
>> some languages. The only question that needs to be answered right now is
>> whether the feature will be turned on or off by default. And on that
>> point, the tendency seems to be to have it off by default, with the
>> ability to toggle it within an i-search.
>
> Actually, my counts indicate that more people want it on by default
> than off.

Well, if you're already counting, I don't want it on by default.

I do have potential uses for the feature, but it must be switchable on
the spot, when and where I need it. Even a mode-based customization
seems too heavy-handed to me, at least in the modes I envision to work
most of the time.

Allow me to make a general remark towards the trend lately to "let's
switch on every newfangled feature by default because it can be switched
off via customization". I'm quite sure I can't be the only one who
regularly has to work on new systems or accounts that only offer a stock
Emacs. It is simply not possible to always figure out which Emacs
version is installed and then spending the next half hour customizing
it (if that's even allowed). So please keep the stock Emacs settings
stable.

Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptation for Waldorf rackAttack V1.04R1:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada
* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-17 19:30 UTC
To: Achim Gratz; +Cc: emacs-devel

> From: Achim Gratz <Stromeko@nexgo.de>
> Date: Wed, 17 Feb 2016 19:30:09 +0100
>
> Eli Zaretskii writes:
> >> The general opinion is also that char-folding is nonetheless useful to
> >> many users, despite the fact that it will generate incorrect results in
> >> some languages. The only question that needs to be answered right now is
> >> whether the feature will be turned on or off by default. And on that
> >> point, the tendency seems to be to have it off by default, with the
> >> ability to toggle it within an i-search.
> >
> > Actually, my counts indicate that more people want it on by default
> > than off.
>
> Well, if you're already counting, I don't want it on by default.

I'm counting because that's what we all wanted: a poll, or some
approximation of it. How else can a poll be summarized, if the numbers
of those for and against are not known?

> it must be switchable on the spot, when and where I need it.

It is, please see the documentation. You can turn it on and off for a
particular search (during the search), and you can do that for the next
searches.

> Allow me to make a general remark towards the trend lately to "let's
> switch on every newfangled feature by default because it can be switched
> off via customization".

There's no such trend, AFAIK.
* Re: On language-dependent defaults for character-folding
From: Marcin Borkowski @ 2016-02-17 20:26 UTC
To: Achim Gratz; +Cc: emacs-devel

On 2016-02-17, at 19:30, Achim Gratz <Stromeko@nexgo.de> wrote:

> I do have potential uses for the feature, but it must be switchable on
> the spot, when and where I need it. Even a mode-based customization
> seems too heavy-handed to me, at least in the modes I envision to work
> most of the time.

+1.

Actually, this was /very/ useful for me just a few hours ago. I would
like to thank all the involved for this feature!

Still, I think the default should be "off".

Best,

--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University
* Re: On language-dependent defaults for character-folding
From: Joost Kremers @ 2016-02-17 20:06 UTC
To: Eli Zaretskii; +Cc: ofv, per.starback, rms, emacs-devel

On Wed, Feb 17 2016, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Joost Kremers <joostkremers@fastmail.fm>
>> Date: Wed, 17 Feb 2016 09:00:02 +0100
>> Cc: Óscar Fuentes <ofv@wanadoo.es>, rms@gnu.org,
>>  "emacs-devel@gnu.org" <emacs-devel@gnu.org>
>>
>> The general opinion is also that char-folding is nonetheless useful to
>> many users, despite the fact that it will generate incorrect results in
>> some languages. The only question that needs to be answered right now is
>> whether the feature will be turned on or off by default. And on that
>> point, the tendency seems to be to have it off by default, with the
>> ability to toggle it within an i-search.
>
> Actually, my counts indicate that more people want it on by default
> than off.

Then put me down for an "off". :-)

(Is this a binding referendum?)

--
Joost Kremers
Life has its moments
* Re: On language-dependent defaults for character-folding
From: Eli Zaretskii @ 2016-02-17 20:15 UTC
To: Joost Kremers; +Cc: ofv, per.starback, rms, emacs-devel

> From: Joost Kremers <joostkremers@fastmail.fm>
> Cc: ofv@wanadoo.es, per.starback@gmail.com, rms@gnu.org, emacs-devel@gnu.org
> Date: Wed, 17 Feb 2016 21:06:11 +0100
>
> > Actually, my counts indicate that more people want it on by default
> > than off.
>
> Then put me down for an "off". :-)

Done.

> (Is this a binding referendum?)

Yes, of course.
* Re: On language-dependent defaults for character-folding
From: Ken Brown @ 2016-02-17 22:58 UTC
To: Eli Zaretskii, Joost Kremers; +Cc: ofv, per.starback, rms, emacs-devel

On 2/17/2016 3:15 PM, Eli Zaretskii wrote:
>> From: Joost Kremers <joostkremers@fastmail.fm>
>> Cc: ofv@wanadoo.es, per.starback@gmail.com, rms@gnu.org, emacs-devel@gnu.org
>> Date: Wed, 17 Feb 2016 21:06:11 +0100
>>
>>> Actually, my counts indicate that more people want it on by default
>>> than off.
>>
>> Then put me down for an "off". :-)
>
> Done.

I wrote earlier with positive feedback about character folding, but I
didn't express an opinion about what the default should be. I'm now ready
to cast my vote for having it on by default.

My reason is that I think many users are likely to find character folding
useful, but they are unlikely to discover that it exists if it is not on by
default. I have read the claims that character folding in its present form
will be viewed as a bug by speakers of certain languages. But I think the
possible benefits to others outweigh the possible harm done to those who
initially think it's a bug.

Ken
* Re: On language-dependent defaults for character-folding 2016-02-17 22:58 ` Ken Brown @ 2016-02-18 0:03 ` Vinicius Latorre 2016-02-18 17:29 ` Eli Zaretskii 2016-02-18 4:55 ` Marcin Borkowski ` (2 subsequent siblings) 3 siblings, 1 reply; 263+ messages in thread From: Vinicius Latorre @ 2016-02-18 0:03 UTC (permalink / raw) To: Ken Brown Cc: ofv, rms, Joost Kremers, per.starback, emacs-devel, Eli Zaretskii My vote is off by default. On Wed, Feb 17, 2016 at 8:58 PM, Ken Brown <kbrown@cornell.edu> wrote: > On 2/17/2016 3:15 PM, Eli Zaretskii wrote: >>> From: Joost Kremers <joostkremers@fastmail.fm> >>> Cc: ofv@wanadoo.es, per.starback@gmail.com, rms@gnu.org, emacs-devel@gnu.org >>> Date: Wed, 17 Feb 2016 21:06:11 +0100 >>> >>>> Actually, my counts indicate that more people want it on by default >>>> than off. >>> >>> Then put me down for an "off". :-) >> >> Done. > > I wrote earlier with positive feedback about character folding, but I > didn't express an opinion about what the default should be. I'm now ready > to cast my vote for having it on by default. > > My reason is that I think many users are likely to find character folding > useful, but they are unlikely to discover that it exists if it is not on by > default. I have read the claims that character folding in its present form > will be viewed as a bug by speakers of certain languages. But I think the > possible benefits to others outweigh the possible harm done to those who > initially think it's a bug. > > Ken ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-18 0:03 ` Vinicius Latorre @ 2016-02-18 17:29 ` Eli Zaretskii 0 siblings, 0 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-18 17:29 UTC (permalink / raw) To: Vinicius Latorre Cc: ofv, rms, kbrown, joostkremers, per.starback, emacs-devel > Date: Wed, 17 Feb 2016 22:03:53 -0200 > From: Vinicius Latorre <viniciusjl.gnu@gmail.com> > Cc: Eli Zaretskii <eliz@gnu.org>, Joost Kremers <joostkremers@fastmail.fm>, ofv@wanadoo.es, > per.starback@gmail.com, rms@gnu.org, emacs-devel <emacs-devel@gnu.org> > > My vote is off by default. Thanks. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-17 22:58 ` Ken Brown 2016-02-18 0:03 ` Vinicius Latorre @ 2016-02-18 4:55 ` Marcin Borkowski 2016-02-18 11:26 ` Filipp Gunbin 2016-02-18 17:30 ` Eli Zaretskii 3 siblings, 0 replies; 263+ messages in thread From: Marcin Borkowski @ 2016-02-18 4:55 UTC (permalink / raw) To: Ken Brown Cc: ofv, rms, Joost Kremers, per.starback, emacs-devel, Eli Zaretskii On 2016-02-17, at 23:58, Ken Brown <kbrown@cornell.edu> wrote: > My reason is that I think many users are likely to find character > folding useful, but they are unlikely to discover that it exists if it > is not on by default. [...] Wait, you mean that you suspect they will not read the manual cover to cover (or NEWS, if they are already Emacs users)‽ > Ken Best, -- Marcin Borkowski http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski Faculty of Mathematics and Computer Science Adam Mickiewicz University ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-17 22:58 ` Ken Brown 2016-02-18 0:03 ` Vinicius Latorre 2016-02-18 4:55 ` Marcin Borkowski @ 2016-02-18 11:26 ` Filipp Gunbin 2016-02-18 17:26 ` Eli Zaretskii 2016-02-18 17:30 ` Eli Zaretskii 3 siblings, 1 reply; 263+ messages in thread From: Filipp Gunbin @ 2016-02-18 11:26 UTC (permalink / raw) To: emacs-devel I think the default should be "on" only when we have documented and stable logic (even if the implementation has bugs) that is not going to change much from version to version. Otherwise, people who switch versions often (as Achim wrote earlier) will be confused. Probably the default will not simply be "on", but rather one strategy chosen among several alternatives, perhaps similar to the default set of minibuffer completion strategies, with additional strategies available. So I'm for "off" now. Filipp ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-18 11:26 ` Filipp Gunbin @ 2016-02-18 17:26 ` Eli Zaretskii 2016-02-19 12:30 ` Filipp Gunbin 0 siblings, 1 reply; 263+ messages in thread From: Eli Zaretskii @ 2016-02-18 17:26 UTC (permalink / raw) To: Filipp Gunbin; +Cc: emacs-devel > From: Filipp Gunbin <fgunbin@fastmail.fm> > Date: Thu, 18 Feb 2016 14:26:02 +0300 > > I think the default should be "on" only when we have documented and > stable logic (even if the implementation has bugs) that is not going to > change much from version to version. > > Otherwise, people who switch versions often (as Achim wrote earlier) > will be confused. I'm not sure I understand what you mean by this. If we decide to leave the option on by default, it will stay on for substantial amount of time. And the same if we decide to turn it off by default. Defaults don't change frequently in Emacs, as a matter of policy, precisely for the reasons you mention. Why should you think this option will be any different? > So I'm for "off" now. Thanks. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-18 17:26 ` Eli Zaretskii @ 2016-02-19 12:30 ` Filipp Gunbin 2016-02-19 15:22 ` Eli Zaretskii 0 siblings, 1 reply; 263+ messages in thread From: Filipp Gunbin @ 2016-02-19 12:30 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel Hi Eli, On 18/02/2016 19:26 +0200, Eli Zaretskii wrote: >> From: Filipp Gunbin <fgunbin@fastmail.fm> >> Date: Thu, 18 Feb 2016 14:26:02 +0300 >> >> I think the default should be "on" only when we have documented and >> stable logic (even if the implementation has bugs) that is not going to >> change much from version to version. >> >> Otherwise, people who switch versions often (as Achim wrote earlier) >> will be confused. > > I'm not sure I understand what you mean by this. If we decide to > leave the option on by default, it will stay on for substantial amount > of time. And the same if we decide to turn it off by default. > Defaults don't change frequently in Emacs, as a matter of policy, > precisely for the reasons you mention. Why should you think this > option will be any different? I wrote about the logic of folding. There is an ongoing discussion about it and I wanted to stress that if we make the feature "on" by default and then change algorithm, there will be radically different behavior in different versions (besides bugfix). Maybe that's so obvious that it's not even worth saying. Filipp ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-19 12:30 ` Filipp Gunbin @ 2016-02-19 15:22 ` Eli Zaretskii 0 siblings, 0 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-19 15:22 UTC (permalink / raw) To: Filipp Gunbin; +Cc: emacs-devel > From: Filipp Gunbin <fgunbin@fastmail.fm> > Cc: emacs-devel@gnu.org > Date: Fri, 19 Feb 2016 15:30:21 +0300 > > if we make the feature "on" by default and then change algorithm, > there will be radically different behavior in different versions > (besides bugfix). This is unlikely to happen, precisely for the reasons you said it shouldn't. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-17 22:58 ` Ken Brown ` (2 preceding siblings ...) 2016-02-18 11:26 ` Filipp Gunbin @ 2016-02-18 17:30 ` Eli Zaretskii 3 siblings, 0 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-18 17:30 UTC (permalink / raw) To: Ken Brown; +Cc: joostkremers, ofv, emacs-devel, rms, per.starback > Cc: ofv@wanadoo.es, per.starback@gmail.com, rms@gnu.org, emacs-devel@gnu.org > From: Ken Brown <kbrown@cornell.edu> > Date: Wed, 17 Feb 2016 17:58:45 -0500 > > I wrote earlier with positive feedback about character folding, but I > didn't express an opinion about what the default should be. I'm now > ready to cast my vote for having it on by default. > > My reason is that I think many users are likely to find character > folding useful, but they are unlikely to discover that it exists if it > is not on by default. I have read the claims that character folding in > its present form will be viewed as a bug by speakers of certain > languages. But I think the possible benefits to others outweigh the > possible harm done to those who initially think it's a bug. Thanks. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-17 15:34 ` Eli Zaretskii 2016-02-17 18:30 ` Achim Gratz 2016-02-17 20:06 ` Joost Kremers @ 2016-02-17 22:53 ` Mark Oteiza 2016-02-18 0:11 ` Juri Linkov 2016-02-18 17:46 ` Eli Zaretskii 2016-02-18 16:30 ` Richard Stallman 3 siblings, 2 replies; 263+ messages in thread From: Mark Oteiza @ 2016-02-17 22:53 UTC (permalink / raw) To: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> From: Joost Kremers <joostkremers@fastmail.fm> >> Date: Wed, 17 Feb 2016 09:00:02 +0100 >> Cc: Óscar Fuentes <ofv@wanadoo.es>, rms@gnu.org, >> "emacs-devel@gnu.org" <emacs-devel@gnu.org> >> >> The general opinion is also that char-folding is nonetheless useful to >> many users, despite the fact that it will generate incorrect results in >> some languages. The only question that needs to be answered right now is >> whether the feature will be turned on or off by default. And on that >> point, the tendency seems to be to have it off by default, with the >> ability to toggle it within an i-search. > > Actually, my counts indicate that more people want it on by default > than off. I didn't know what character folding was before this was implemented in Emacs, and AFAICT the only other thing I happen to have installed that does this is Chromium. While it's a neat feature, it should default to off. I hope it becomes more customizable w.r.t. the arguments against char-folding's current behavior. It appears that char-folding's dependence on elisp regex is a crutch. Long PS: I think the news items in "** Search and Replace" need to be clearer. In particular: - *** New user option ... should perhaps mention character-fold-to-regexp if that ends up being the default - *** `isearch' and ... should mention how to disable/enable character folding for isearch, whatever the default ends up being - *** New function ... 
should mention that it is to be added to `search-default-regexp-mode' To me, these appear to be completely disjoint despite having everything to do with char-folding. I think one would have to know how isearch actually works in order to put it together from reading the NEWS as it is currently. I'd be happy to make the changes, but that requires knowing what the default will be. ^ permalink raw reply [flat|nested] 263+ messages in thread
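For readers unfamiliar with the mechanism behind these NEWS entries: as the thread describes it, character folding works by installing a function such as `character-fold-to-regexp` in `search-default-regexp-mode`, so that the search string is turned into a regular expression matching each character's equivalents. The Python sketch below illustrates only that idea and is not Emacs's code. It assumes the equivalence table is derived from Unicode canonical decompositions of a few Latin blocks, and that folding goes in one direction only (a plain letter matches its accented variants, not the reverse); `build_fold_table` and `fold_to_regexp` are hypothetical names.

```python
import re
import unicodedata


def build_fold_table(first=0x00C0, last=0x024F):
    """Map each base character to the precomposed characters in the code
    point range [first, last] whose canonical decomposition (NFD) starts
    with it, e.g. "n" -> {"ñ", "ń", "ņ", "ň", ...}.  The range is an
    assumption: Latin-1 Supplement through Latin Extended-B."""
    table = {}
    for cp in range(first, last + 1):
        ch = chr(cp)
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) > 1:  # precomposed: base + combining mark(s)
            table.setdefault(decomposed[0], set()).add(ch)
    return table


FOLD_TABLE = build_fold_table()


def fold_to_regexp(search):
    """Hypothetical analogue of a fold-to-regexp function: every character
    that has variants becomes a bracket expression listing the character
    and its variants; everything else is escaped literally."""
    parts = []
    for ch in search:
        variants = FOLD_TABLE.get(ch)
        if variants:
            members = [ch] + sorted(variants)
            parts.append("[" + "".join(re.escape(m) for m in members) + "]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


pattern = fold_to_regexp("ratt")
print(re.findall(pattern, "rätt eller ratt"))  # ['rätt', 'ratt']
```

Because accented letters are never keys of the table, searching for "rätt" still matches only "rätt", while searching for "ratt" also finds "rätt": the one-directional behavior is a design choice of this sketch.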
* Re: On language-dependent defaults for character-folding 2016-02-17 22:53 ` Mark Oteiza @ 2016-02-18 0:11 ` Juri Linkov 2016-02-18 0:20 ` Mark Oteiza 2016-02-18 4:53 ` Marcin Borkowski 2016-02-18 17:46 ` Eli Zaretskii 1 sibling, 2 replies; 263+ messages in thread From: Juri Linkov @ 2016-02-18 0:11 UTC (permalink / raw) To: Mark Oteiza; +Cc: emacs-devel > I didn't know what character folding was before this was implemented in > Emacs, and AFAICT the only other thing I happen to have installed that > does this is Chromium. How come char-folding is on by default in Chromium, and yet nobody has a problem with that? ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-18 0:11 ` Juri Linkov @ 2016-02-18 0:20 ` Mark Oteiza 2016-02-18 17:28 ` Eli Zaretskii 2016-02-18 4:53 ` Marcin Borkowski 1 sibling, 1 reply; 263+ messages in thread From: Mark Oteiza @ 2016-02-18 0:20 UTC (permalink / raw) To: Juri Linkov; +Cc: emacs-devel On 18/02/16 at 02:11am, Juri Linkov wrote: > > I didn't know what character folding was before this was implemented in > > Emacs, and AFAICT the only other thing I happen to have installed that > > does this is Chromium. > > How come char-folding is on by default in Chromium, > and yet nobody has a problem with that? Apparently it has just been an open issue for six years. https://code.google.com/p/chromium/issues/detail?id=31609 ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-18 0:20 ` Mark Oteiza @ 2016-02-18 17:28 ` Eli Zaretskii 0 siblings, 0 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-18 17:28 UTC (permalink / raw) To: Mark Oteiza; +Cc: emacs-devel, juri > Date: Wed, 17 Feb 2016 19:20:27 -0500 > From: Mark Oteiza <mvoteiza@udel.edu> > Cc: emacs-devel@gnu.org > > > How come char-folding is on by default in Chromium, > > and yet nobody has a problem with that? > > Apparently it has just been an open issue for six years. > https://code.google.com/p/chromium/issues/detail?id=31609 Most of the complaints there are because Chromium doesn't provide any way to turn the folding off. There's no such problem in Emacs, of course. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-18 0:11 ` Juri Linkov 2016-02-18 0:20 ` Mark Oteiza @ 2016-02-18 4:53 ` Marcin Borkowski 2016-02-18 17:07 ` Elias Mårtenson 1 sibling, 1 reply; 263+ messages in thread From: Marcin Borkowski @ 2016-02-18 4:53 UTC (permalink / raw) To: Juri Linkov; +Cc: Mark Oteiza, emacs-devel On 2016-02-18, at 01:11, Juri Linkov <juri@linkov.net> wrote: >> I didn't know what character folding was before this was implemented in >> Emacs, and AFAICT the only other thing I happen to have installed that >> does this is Chromium. > > How come char-folding is on by default in Chromium, > and yet nobody has a problem with that? Well, nobody has a problem with the fact that Chromium does not have anything like query-replace, either. ;-) Best, -- Marcin Borkowski http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski Faculty of Mathematics and Computer Science Adam Mickiewicz University ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-18 4:53 ` Marcin Borkowski @ 2016-02-18 17:07 ` Elias Mårtenson 2016-02-18 17:21 ` Eli Zaretskii 2016-02-19 20:47 ` Marcin Borkowski 0 siblings, 2 replies; 263+ messages in thread From: Elias Mårtenson @ 2016-02-18 17:07 UTC (permalink / raw) To: Marcin Borkowski; +Cc: Mark Oteiza, emacs-devel, Juri Linkov On 18 February 2016 at 12:53, Marcin Borkowski <mbork@mbork.pl> wrote: > > On 2016-02-18, at 01:11, Juri Linkov <juri@linkov.net> wrote: > > > How come char-folding is on by default in Chromium, > > and yet nobody has a problem with that? > > Well, nobody has a problem with the fact that Chromium does not have > anything like query-replace, either. > If this impacts replace-string as well, then it moves from being a mere irritant to a disaster when applied to Swedish. Imagine trying to replace the word "correct" and you end up having the word "steering wheel" be silently replaced as well (the former is "rätt" in Swedish, while the latter is "ratt"). If my vote counts, it's obviously "off". Regards, Elias ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-18 17:07 ` Elias Mårtenson @ 2016-02-18 17:21 ` Eli Zaretskii 2016-02-19 7:40 ` Elias Mårtenson 2016-02-19 20:47 ` Marcin Borkowski 1 sibling, 1 reply; 263+ messages in thread From: Eli Zaretskii @ 2016-02-18 17:21 UTC (permalink / raw) To: Elias Mårtenson; +Cc: mvoteiza, juri, emacs-devel > Date: Fri, 19 Feb 2016 01:07:31 +0800 > From: Elias Mårtenson <lokedhs@gmail.com> > Cc: Mark Oteiza <mvoteiza@udel.edu>, emacs-devel <emacs-devel@gnu.org>, > Juri Linkov <juri@linkov.net> > > If this impacts replace-string as well, then it moves from being a mere irritant to a disaster when applied to > Swedish. Imagine trying to replace the word "correct" and you end up having the word "steering wheel" be > silently replaced as well (the former is "rätt" in Swedish, while the latter is "ratt"). There's no reason to assume Emacs development is that stupid. From the Emacs manual: The replacement commands by default do not use character folding (*note character folding: Lax Search.) when looking for the text to replace. To enable character folding for matching in ‘query-replace’ and ‘replace-string’, set the variable ‘replace-character-fold’ to a non-‘nil’ value. (This setting does not affect the replacement text, only how Emacs finds the text to replace. It also doesn’t affect ‘replace-regexp’.) > If my vote counts, it's obviously "off". In general, or because you thought replacement commands fold characters? In this message: http://lists.gnu.org/archive/html/emacs-devel/2016-02/msg00245.html you expressed a different opinion: I'm not even suggesting that this kind of comparisons should not be the default, even. Especially given the fact that locale-dependent comparators are not very well supported in Emacs at the moment. This seems to be mildly in favor of the feature being on by default, or maybe I misunderstand what you wanted to say here. ^ permalink raw reply [flat|nested] 263+ messages in thread
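The behavior described in the manual excerpt Eli quotes can be sketched in Python using Elias's Swedish pair. That behavior has two parts: replacement commands fold only when the user opts in, and folding affects only how the text to replace is found. `replace_string` and its `char_fold` flag are hypothetical stand-ins for `replace-string` and `replace-character-fold`. Deriving the equivalents from Unicode canonical decompositions, and folding only from plain to accented letters, are assumptions of this sketch, not a description of Emacs's code.

```python
import re
import unicodedata


def fold_class(ch, first=0x00C0, last=0x024F):
    """Regex fragment matching ch, or any precomposed character in the
    assumed code point range whose canonical decomposition starts with ch."""
    variants = []
    for cp in range(first, last + 1):
        decomposed = unicodedata.normalize("NFD", chr(cp))
        if len(decomposed) > 1 and decomposed[0] == ch:
            variants.append(chr(cp))
    if not variants:
        return re.escape(ch)
    return "[" + re.escape(ch) + "".join(variants) + "]"


def replace_string(text, old, new, char_fold=False):
    """Hypothetical stand-in for a replace-string command.  char_fold plays
    the role of an opt-in variable: it changes only how `old` is found,
    never the replacement text `new`."""
    if char_fold:
        pattern = "".join(fold_class(ch) for ch in old)
    else:
        pattern = re.escape(old)
    # A function replacement inserts `new` literally (no backslash escapes).
    return re.sub(pattern, lambda match: new, text)


text = "rätt eller ratt"
print(replace_string(text, "ratt", "RATT"))                  # rätt eller RATT
print(replace_string(text, "ratt", "RATT", char_fold=True))  # RATT eller RATT
```

With the default `char_fold=False` only the literal "ratt" is replaced; opting in also rewrites "rätt", which is the false positive Elias was worried about.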
* Re: On language-dependent defaults for character-folding 2016-02-18 17:21 ` Eli Zaretskii @ 2016-02-19 7:40 ` Elias Mårtenson 2016-02-19 19:24 ` Achim Gratz 0 siblings, 1 reply; 263+ messages in thread From: Elias Mårtenson @ 2016-02-19 7:40 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Mark Oteiza, emacs-devel, Juri Linkov On 19 February 2016 at 01:21, Eli Zaretskii <eliz@gnu.org> wrote: > > > If my vote counts, it's obviously "off". > > In general, or because you thought replacement commands fold > characters? > Because I thought replacement was affected. > In this message: > > http://lists.gnu.org/archive/html/emacs-devel/2016-02/msg00245.html > > you expressed a different opinion: > > I'm not even suggesting that this kind of comparisons should not be > the default, even. Especially given the fact that locale-dependent > comparators are not very well supported in Emacs at the moment. > > This seems to be mildly in favor of the feature being on by default, > or maybe I misunderstand what you wanted to say here. That is correct. I was mildly in favour, as long as it's limited to interactive search commands. As I have read the rest of the discussion, I have shifted slightly toward the negative end of the spectrum to the point that my opinion is "off" by default right now, until locale-aware searching is available in Emacs. I'm a firm believer of putting one's money where one's mouth is, and I'm willing to work on it myself. However, right now I'm limited by the fact that I have no copyright assignment on file, so you can't merge anything I do. I have to bring this up with my employer's legal department again. Regards, Elias ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-19 7:40 ` Elias Mårtenson @ 2016-02-19 19:24 ` Achim Gratz 2016-02-20 5:05 ` Elias Mårtenson 0 siblings, 1 reply; 263+ messages in thread From: Achim Gratz @ 2016-02-19 19:24 UTC (permalink / raw) To: emacs-devel Elias Mårtenson writes: > I'm a firm believer of putting one's money where one's mouth is, and I'm > willing to work on it myself. However, right now I'm limited by the fact > that I have no copyright assignment on file, so you can't merge anything I > do. I have to bring this up with my employer's legal department again. If your email address is an indicator of your employer, then to the best of my knowledge that has been taken care of already, but please ask. Regards, Achim. -- +<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+ Wavetables for the Terratec KOMPLEXER: http://Synth.Stromeko.net/Downloads.html#KomplexerWaves ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-19 19:24 ` Achim Gratz @ 2016-02-20 5:05 ` Elias Mårtenson 2016-02-20 13:59 ` Achim Gratz 0 siblings, 1 reply; 263+ messages in thread From: Elias Mårtenson @ 2016-02-20 5:05 UTC (permalink / raw) To: Achim Gratz; +Cc: emacs-devel On 20 February 2016 at 03:24, Achim Gratz <Stromeko@nexgo.de> wrote: > Elias Mårtenson writes: > > I'm a firm believer of putting one's money where one's mouth is, and I'm > > willing to work on it myself. However, right now I'm limited by the fact > > that I have no copyright assignment on file, so you can't merge anything I > > do. I have to bring this up with my employer's legal department again. > > If your email address is an indicator of your employer, then to the best > of my knowledge that has been taken care of already, but please ask. I'm posting this from a Gmail address. Perhaps you mistook it for a google.com address? This is my personal email address. I work in the banking industry where the legal departments tend to try to want to cross the i's (or whatever the expression is). Regards, Elias ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-20 5:05 ` Elias Mårtenson @ 2016-02-20 13:59 ` Achim Gratz 0 siblings, 0 replies; 263+ messages in thread From: Achim Gratz @ 2016-02-20 13:59 UTC (permalink / raw) To: emacs-devel Elias Mårtenson writes: > I'm posting this from a Gmail address. Perhaps you mistook it for a > google.com address? Yes, sorry, somehow I managed to read google.com… > This is my personal email address. I work in the banking industry > where the legal departments tend to try to want to cross the i's (or > whatever the expression is). Oh sure. But ask them first if they have any dibs on code you write in your spare time at all (it depends on where you live and work). If not, you don't need their signature at all to assign the copyright to the FSF. Regards, Achim. -- +<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+ SD adaptations for Waldorf Q V3.00R3 and Q+ V3.54R2: http://Synth.Stromeko.net/Downloads.html#WaldorfSDada ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-18 17:07 ` Elias Mårtenson 2016-02-18 17:21 ` Eli Zaretskii @ 2016-02-19 20:47 ` Marcin Borkowski 2016-02-20 14:31 ` Richard Stallman 1 sibling, 1 reply; 263+ messages in thread From: Marcin Borkowski @ 2016-02-19 20:47 UTC (permalink / raw) To: Elias Mårtenson; +Cc: Mark Oteiza, emacs-devel, Juri Linkov On 2016-02-18, at 18:07, Elias Mårtenson <lokedhs@gmail.com> wrote: > On 18 February 2016 at 12:53, Marcin Borkowski <mbork@mbork.pl> wrote: > >> >> On 2016-02-18, at 01:11, Juri Linkov <juri@linkov.net> wrote: >> >> > How come char-folding is on by default in Chromium, >> > and yet nobody has a problem with that? >> >> Well, nobody has a problem with the fact that Chromium does not have >> anything like query-replace, either. >> > > If this impacts replace-string as well, then it moves from being a mere > irritant to a disaster when applied to Swedish. Imagine trying to replace You misunderstood me. I didn't mean replace is or should be affected. I meant that Chromium is a tool for /consuming/ text, and Emacs is a tool for both /consuming/ and /producing/ text (in fact, also for its /editing/, which is distinct from producing: I spend quite a lot of time on editing texts (in a natural language) written by others). This implies that search in Emacs has more use-cases than in a web browser (think navigation in the file you are editing, for instance). And yes, in general this also means replacing, though this is irrelevant to this discussion. > Regards, > Elias Best, -- Marcin Borkowski http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski Faculty of Mathematics and Computer Science Adam Mickiewicz University ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-19 20:47 ` Marcin Borkowski @ 2016-02-20 14:31 ` Richard Stallman 0 siblings, 0 replies; 263+ messages in thread From: Richard Stallman @ 2016-02-20 14:31 UTC (permalink / raw) To: Marcin Borkowski; +Cc: mvoteiza, juri, lokedhs, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > I meant that Chromium is a tool for /consuming/ text, I understand what you mean, but please don't use the word "consume" to describe looking at a document. Visiting a web page does not consume it. See http://gnu.org/philosophy/words-to-avoid.html. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-17 22:53 ` Mark Oteiza 2016-02-18 0:11 ` Juri Linkov @ 2016-02-18 17:46 ` Eli Zaretskii 2016-02-18 18:18 ` Mark Oteiza 1 sibling, 1 reply; 263+ messages in thread From: Eli Zaretskii @ 2016-02-18 17:46 UTC (permalink / raw) To: Mark Oteiza; +Cc: emacs-devel > From: Mark Oteiza <mvoteiza@udel.edu> > Date: Wed, 17 Feb 2016 17:53:27 -0500 > > I didn't know what character folding was before this was implemented in > Emacs, and AFAICT the only other thing I happen to have installed that > does this is Chromium. We don't have to always be the Nth application on the block to implement something useful. When Emacs was first introduced, it pioneered many features that nowadays are taken for granted. There's no reason why this trend should stop, IMO. > While it's a neat feature, it should default to off. Thanks for providing feedback. > It appears that char-folding's dependence on elisp regex is a > crutch. You (or anyone else) are welcome to work on re-implementing this in search.c similarly to case-folding we already have there. The current implementation was accepted because the feature was deemed important, and no one stepped forward to do it in C. > Long PS: I think the news items in "** Search and Replace" need to be > clearer. In particular: > > - *** New user option ... should perhaps mention character-fold-to-regexp if > that ends up being the default Done. > - *** `isearch' and ... should mention how to disable/enable character > folding for isearch, whatever the default ends up being I added that. > - *** New function ... should mention that it is to be added to > `search-default-regexp-mode' The first item above already does (after the changes you proposed above), so this sounds redundant. > To me, these appear to be completely disjoint despite having everything > to do with char-folding. 
I think one would have to know how isearch > actually works in order to put it together from reading the NEWS as it > is currently. The grouping in NEWS is not meant to facilitate putting it all together, in the sense of creating some overall picture of the underlying implementation. The grouping is there to make it easier to grasp changes related to the same feature or group of features, that's all. Thanks. ^ permalink raw reply [flat|nested] 263+ messages in thread
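Eli's suggestion is to match folded characters directly in search.c, similarly to the existing case folding there, instead of building a regular expression first. A rough Python sketch of that alternative follows. It assumes a per-character folding function derived from Unicode canonical decomposition, which is not Emacs's actual table, and one-directional folding. `fold`, `chars_match`, and `folded_find` are hypothetical names, and the naive scan stands in for whatever optimized search loop a C implementation would use.

```python
import unicodedata


def fold(ch):
    """Base character of ch under canonical decomposition: "ä" -> "a"."""
    return unicodedata.normalize("NFD", ch)[0]


def chars_match(pattern_ch, text_ch):
    # One-directional folding: a plain letter in the pattern also matches an
    # accented variant in the text, but an accented pattern letter matches
    # only itself.
    return text_ch == pattern_ch or fold(text_ch) == pattern_ch


def folded_find(pattern, text, start=0):
    """Index of the first folded match of pattern in text at or after start,
    or -1 if there is none.  Naive O(len(text) * len(pattern)) scan."""
    m = len(pattern)
    for i in range(start, len(text) - m + 1):
        if all(chars_match(p, t) for p, t in zip(pattern, text[i:i + m])):
            return i
    return -1


print(folded_find("ratt", "en rätt ratt"))  # 3
print(folded_find("rätt", "en ratt"))       # -1
```

Comparing character by character avoids constructing and compiling a regexp for every search string. This sketch ignores text stored in decomposed form (a base letter followed by a combining mark), which a real implementation would also need to handle.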
* Re: On language-dependent defaults for character-folding 2016-02-18 17:46 ` Eli Zaretskii @ 2016-02-18 18:18 ` Mark Oteiza 2016-02-18 18:24 ` Eli Zaretskii 0 siblings, 1 reply; 263+ messages in thread From: Mark Oteiza @ 2016-02-18 18:18 UTC (permalink / raw) To: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: >> From: Mark Oteiza <mvoteiza@udel.edu> >> Date: Wed, 17 Feb 2016 17:53:27 -0500 >> >> I didn't know what character folding was before this was implemented in >> Emacs, and AFAICT the only other thing I happen to have installed that >> does this is Chromium. > > We don't have to always be the Nth application on the block to > implement something useful. When Emacs was first introduced, it > pioneered many features that nowadays are taken for granted. There's > no reason why this trend should stop, IMO. If Emacs does become the first application to implement char-folding and provide a means to overcome the language issues associated with the current implementation, that will be impressive. >> It appears that char-folding's dependence on elisp regex is a >> crutch. > > You (or anyone else) are welcome to work on re-implementing this in > search.c similarly to case-folding we already have there. The current > implementation was accepted because the feature was deemed important, > and no one stepped forward to do it in C. Good to know that patches are welcome. >> Long PS: I think the news items in "** Search and Replace" need to be >> clearer. In particular: >> >> - *** New user option ... should perhaps mention character-fold-to-regexp if >> that ends up being the default > > Done. > >> - *** `isearch' and ... should mention how to disable/enable character >> folding for isearch, whatever the default ends up being > > I added that. > >> - *** New function ... should mention that it is to be added to >> `search-default-regexp-mode' > > The first item above already does (after the changes you proposed > above), so this sounds redundant. 
Indeed, thanks ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-18 18:18 ` Mark Oteiza @ 2016-02-18 18:24 ` Eli Zaretskii 0 siblings, 0 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-18 18:24 UTC (permalink / raw) To: Mark Oteiza; +Cc: emacs-devel > From: Mark Oteiza <mvoteiza@udel.edu> > Date: Thu, 18 Feb 2016 13:18:12 -0500 > > Eli Zaretskii <eliz@gnu.org> writes: > > >> From: Mark Oteiza <mvoteiza@udel.edu> > >> Date: Wed, 17 Feb 2016 17:53:27 -0500 > >> > >> I didn't know what character folding was before this was implemented in > >> Emacs, and AFAICT the only other thing I happen to have installed that > >> does this is Chromium. > > > > We don't have to always be the Nth application on the block to > > implement something useful. When Emacs was first introduced, it > > pioneered many features that nowadays are taken for granted. There's > > no reason why this trend should stop, IMO. > > If Emacs does become the first application to implement char-folding and > provide a means to overcome the language issues associated with the > current implementation, that will be impressive. "A journey of a thousand miles begins with a single step." ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-17 15:34 ` Eli Zaretskii ` (2 preceding siblings ...) 2016-02-17 22:53 ` Mark Oteiza @ 2016-02-18 16:30 ` Richard Stallman 2016-02-18 17:07 ` Eli Zaretskii 3 siblings, 1 reply; 263+ messages in thread From: Richard Stallman @ 2016-02-18 16:30 UTC (permalink / raw) To: Eli Zaretskii; +Cc: joostkremers, per.starback, ofv, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > And on that > > point, the tendency seems to be to have it off by default, with the > > ability to toggle it within an i-search. > Actually, my counts indicate that more people want it on by default > than off. We should think about why people want what they want, and how much the feature helps or hurts them -- not just count people. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 263+ messages in thread
* Re: On language-dependent defaults for character-folding 2016-02-18 16:30 ` Richard Stallman @ 2016-02-18 17:07 ` Eli Zaretskii 0 siblings, 0 replies; 263+ messages in thread From: Eli Zaretskii @ 2016-02-18 17:07 UTC To: rms; +Cc: joostkremers, per.starback, ofv, emacs-devel > From: Richard Stallman <rms@gnu.org> > CC: joostkremers@fastmail.fm, per.starback@gmail.com, ofv@wanadoo.es, > emacs-devel@gnu.org > Date: Thu, 18 Feb 2016 11:30:33 -0500 > > > Actually, my counts indicate that more people want it on by default > > than off. > > We should think about why people want what they want, and how much > the feature helps or hurts them -- not just count people. I do both, although counting is easier, since people don't necessarily explain their desires clearly enough.
* Re: On language-dependent defaults for character-folding 2016-02-12 1:50 ` Óscar Fuentes 2016-02-12 7:10 ` Eli Zaretskii 2016-02-12 23:50 ` Juri Linkov @ 2016-02-13 16:38 ` Marcin Borkowski 2016-02-13 17:58 ` Content navigation (was: On language-dependent defaults for character-folding) Óscar Fuentes 2 siblings, 1 reply; 263+ messages in thread From: Marcin Borkowski @ 2016-02-13 16:38 UTC To: Óscar Fuentes; +Cc: emacs-devel On 2016-02-12, at 02:50, Óscar Fuentes <ofv@wanadoo.es> wrote: >> Isearch shines in navigation. > > My opinion is that Isearch is terrible for navigation. You may be > interested in ace-jump or avy, for jumping to a point that is visible, > or a plethora of terrific packages for jumping to a point that is not > visible. I know this is a bit OT, but could you enumerate some of those packages? I use avy, but I'd be interested in navigating to places I don't see, too. Best, -- Marcin Borkowski http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski Faculty of Mathematics and Computer Science Adam Mickiewicz University
* Content navigation (was: On language-dependent defaults for character-folding) 2016-02-13 16:38 ` Marcin Borkowski @ 2016-02-13 17:58 ` Óscar Fuentes 0 siblings, 0 replies; 263+ messages in thread From: Óscar Fuentes @ 2016-02-13 17:58 UTC To: emacs-devel; +Cc: help-gnu-emacs Marcin Borkowski <mbork@mbork.pl> writes: > On 2016-02-12, at 02:50, Óscar Fuentes <ofv@wanadoo.es> wrote: > >>> Isearch shines in navigation. >> >> My opinion is that Isearch is terrible for navigation. You may be >> interested in ace-jump or avy, for jumping to a point that is visible, >> or a plethora of terrific packages for jumping to a point that is not >> visible. > > I know this is a bit OT, but could you enumerate some of those packages? > I use avy, but I'd be interested in navigating to places I don't see, too. It all depends on personal preferences, willingness to get accustomed to new ways of doing things, etc. It also depends on the type of content you work with (code, plain text, org files...). You can start by looking at what Emacs provides out of the box: registers, the mark ring, imenu... Also modes that hide the content you don't care about: hide-show mode, narrow to region... smaller content, easier navigation. A direct replacement for Isearch which is much better suited for navigation (and searching in general) is Swiper. There are packages for quickly visiting special places, such as goto-change for jumping to the places you edited. Packages that depend on more or less specialized info provided by ctags and similar analyzers. More sophisticated ones such as Semantic, Clang... In the end, the key parts are how the information is managed by the search mechanism (from simple character sequences to tokens with attached meaning), the match system that links your input to candidate targets and the UI that shows those candidates and allows you to jump to them.
Personally, I use registers, goto-change, TAGS tables plus etags and probably something more that I can't remember right now. For the completion system and UI, ido with the flx matching algorithm. flx-isearch is much more convenient than Isearch for searching for identifiers in my code. Instead of ido, other people use helm or ivy as completion systems (ivy comes with swiper). This is just scratching the surface. I'm sure that I'm omitting many interesting packages. Others can chime in with their favourite packages. Follow ups set to emacs-help.
* Re: On language-dependent defaults for character-folding 2016-02-09 21:07 ` Óscar Fuentes 2016-02-10 2:18 ` Artur Malabarba @ 2016-02-13 16:32 ` Marcin Borkowski 2016-02-13 16:47 ` Eli Zaretskii 1 sibling, 1 reply; 263+ messages in thread From: Marcin Borkowski @ 2016-02-13 16:32 UTC To: Óscar Fuentes; +Cc: emacs-devel On 2016-02-09, at 22:07, Óscar Fuentes <ofv@wanadoo.es> wrote: > How typical is it for an Emacs user to have to *search* (not write) for a > composed character that he cannot type with his input setup? Please do not forget about use cases like mine. I work for a journal, and I do copyediting (among other things). Situations where sloppy authors write sometimes "Poincaré" and sometimes "Poincare" are not rare. Character folding is a blessing in such cases. Best,
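Marcin's copyediting case, where one name appears both as "Poincaré" and as "Poincare", can be illustrated with a small Python sketch. It is not how Emacs implements char folding: it assumes that "folding" means canonical decomposition (NFD) followed by dropping combining marks, and the helper names `fold` and `folded_matches` are hypothetical. Unlike a real incremental search, it compares whole words and folds both the text and the query.

```python
from __future__ import annotations

import re
import unicodedata


def fold(s: str) -> str:
    """Decompose S canonically (NFD) and drop combining marks, so "é" -> "e".

    Letters without a canonical decomposition (e.g. Polish "ł") are left alone.
    """
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def folded_matches(text: str, query: str) -> list[str]:
    """Return every word of TEXT that equals QUERY once both are folded.

    TEXT is normalized to NFC first, so a decomposed "e" + U+0301 is
    recomposed into one character and stays inside a single word match.
    """
    target = fold(query)
    words = re.findall(r"\w+", unicodedata.normalize("NFC", text))
    return [w for w in words if fold(w) == target]


# A sloppy manuscript: precomposed, decomposed and unaccented spellings.
manuscript = "Poincaré proved it; later Poincare\u0301 and Poincare were cited."
print(folded_matches(manuscript, "Poincare"))  # ['Poincaré', 'Poincaré', 'Poincare']
```

Because both sides are folded, searching for "Poincaré" also finds the unaccented spelling; a search that treats the two directions differently would be a different design choice. The whole-word restriction keeps the sketch short, since matching inside words would require mapping folded offsets back to the original string.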
* Re: On language-dependent defaults for character-folding 2016-02-13 16:32 ` On language-dependent defaults for character-folding Marcin Borkowski @ 2016-02-13 16:47 ` Eli Zaretskii 2016-02-13 17:03 ` Marcin Borkowski 0 siblings, 1 reply; 263+ messages in thread From: Eli Zaretskii @ 2016-02-13 16:47 UTC To: Marcin Borkowski; +Cc: ofv, emacs-devel > From: Marcin Borkowski <mbork@mbork.pl> > Date: Sat, 13 Feb 2016 17:32:36 +0100 > Cc: emacs-devel@gnu.org > > Please do not forget about use cases like mine. I work for a journal, > and I do copyediting (among other things). Situations where sloppy > authors write sometimes "Poincaré" and sometimes "Poincare" are not > rare. Character folding is a blessing in such cases. But in a previous message you said: For Polish texts, I would rather turn char folding off. How to reconcile that with what you say above?
* Re: On language-dependent defaults for character-folding 2016-02-13 16:47 ` Eli Zaretskii @ 2016-02-13 17:03 ` Marcin Borkowski 0 siblings, 0 replies; 263+ messages in thread From: Marcin Borkowski @ 2016-02-13 17:03 UTC To: Eli Zaretskii; +Cc: ofv, emacs-devel On 2016-02-13, at 17:47, Eli Zaretskii <eliz@gnu.org> wrote: >> From: Marcin Borkowski <mbork@mbork.pl> >> Date: Sat, 13 Feb 2016 17:32:36 +0100 >> Cc: emacs-devel@gnu.org >> >> Please do not forget about use cases like mine. I work for a journal, >> and I do copyediting (among other things). Situations where sloppy >> authors write sometimes "Poincaré" and sometimes "Poincare" are not >> rare. Character folding is a blessing in such cases. > > But in a previous message you said: > > For Polish texts, I would rather turn char folding off. > > How to reconcile that with what you say above? Easily. Different use cases. Usually I want to have it off, but when I work on an article containing lots of foreign names *written not by me*, I'd turn it on instantly. Best,
* Re: On language-dependent defaults for character-folding 2016-02-09 17:26 On language-dependent defaults for character-folding Artur Malabarba ` (3 preceding siblings ...) 2016-02-09 18:21 ` Óscar Fuentes @ 2016-02-10 13:52 ` Adrian.B.Robert 2016-02-24 9:58 ` Marcin Borkowski 5 siblings, 0 replies; 263+ messages in thread From: Adrian.B.Robert @ 2016-02-10 13:52 UTC To: emacs-devel Artur Malabarba <bruce.connor.am@gmail.com> writes: > Char folding is primarily about being able to easily search for > characters that you can't easily type. It also has secondary uses, > like searching when you're not even sure which character you want to > search for, but I'm focusing on the first. Thank you. I wish there were more postings of actual use cases like this in the present discussion. I feel like a lot of the posts so far are along the lines of "Because X, I don't want this to be the *default*", which it isn't going to be anyway, and very few are about "I want character folding so I can *do* Y." So far I've seen: 1) Easily search for characters that are not easy to type, by casting a wide net. 2) Search for composed and decomposed variants of the same character. Note that these would be best served by two *different* features: #2 by true Unicode-composition folding, and #1 by broader "optical" classes that are roughly but not exactly captured by searching for any character whose decomposition contains the template. Are there any other things that people *would like* to do with character folding (besides turning it off if it got in their way)?
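The two features Adrian distinguishes can be contrasted in a short Python sketch using the standard `unicodedata` module. Canonical equivalence (#2) is modelled as equality after NFC normalization; the broad "optical" class (#1) is modelled as every character whose compatibility decomposition (NFKD) contains the template. The function names, the choice of NFKD, and the restriction to the Latin-1 Supplement and Latin Extended-A ranges (U+00C0 to U+017F) are assumptions made to keep the example small; this is not Emacs's implementation.

```python
from __future__ import annotations

import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Feature #2: precomposed and decomposed spellings compare equal."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def optical_class(template: str, start: int = 0x00C0, end: int = 0x017F) -> set[str]:
    """Feature #1: TEMPLATE plus every character in [START, END] whose
    compatibility decomposition (NFKD) contains TEMPLATE."""
    members = {template}
    for cp in range(start, end + 1):
        ch = chr(cp)
        decomposed = unicodedata.normalize("NFKD", ch)
        # Skip characters with no decomposition at all (e.g. "ł", "×").
        if decomposed != ch and template in decomposed:
            members.add(ch)
    return members


# "é" typed as one code point vs. "e" + COMBINING ACUTE ACCENT (U+0301):
print(canonically_equivalent("\u00e9", "e\u0301"))  # True
# Canonical equivalence never equates different letters:
print(canonically_equivalent("n", "\u00f1"))  # False
# The wide net, by contrast, puts "ñ" in the class of "n":
print(sorted(optical_class("n")))
```

This makes the Spanish complaint from the start of the thread concrete: canonical equivalence keeps "n" and "ñ" apart, while the decomposition-based class merges them, so only the second feature raises the question of language-dependent defaults.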
* Re: On language-dependent defaults for character-folding 2016-02-09 17:26 On language-dependent defaults for character-folding Artur Malabarba ` (4 preceding siblings ...) 2016-02-10 13:52 ` Adrian.B.Robert @ 2016-02-24 9:58 ` Marcin Borkowski 5 siblings, 0 replies; 263+ messages in thread From: Marcin Borkowski @ 2016-02-24 9:58 UTC To: bruce.connor.am; +Cc: emacs-devel Related (well, sort of): http://xkcd.com/1647/ ;-)