* Questions about isearch @ 2015-11-25 18:41 Eli Zaretskii 2015-11-25 19:20 ` Rasmus ` (5 more replies) 0 siblings, 6 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-25 18:41 UTC (permalink / raw) To: emacs-devel These questions came out of review and extensive updates of the search and replace sections of the Emacs manual: 1. Character folding doesn't catch ligatures, such as æ (should it match the two characters "ae")? 2. It also doesn't match ä (a single character) with ä (2 characters, which Emacs correctly composes into 1 grapheme cluster). Should it? 3. With the default value t of isearch-hide-immediately, one match in invisible text is not hidden, and remains on display. To repro: emacs -Q C-x C-f etc/NEWS RET C-c C-q C-s require C-s <RIGHT> This leaves the match and its surrounding hidden text on screen. I can understand the rationale, but the doc string doesn't say anything about this feature. On the contrary, it says: Whatever the value, all opened invisible text is hidden again after exiting the search. ^^^ 4. What is the equivalent of case-replace and the letter-case related behavior of replace commands to character folding? E.g., if the replace command specifies to replace "foo" with "bar", and we found "föo", should we replace it with "bär" or something, by analogy with letter-case behavior? ^ permalink raw reply [flat|nested] 94+ messages in thread
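Questions 1 and 2 line up with distinctions that Unicode itself draws, which can be checked outside Emacs. The following is a minimal Python sketch using only the standard `unicodedata` module; it is not Emacs code and says nothing about how Emacs implements character folding. The helper name `same_after_nfd` is made up for this illustration. It shows that the one-character and two-character forms of ä are canonically equivalent, that æ (U+00E6) has no Unicode decomposition, and that the ligature U+FB00 does have a compatibility decomposition to "ff".

```python
import unicodedata

PRECOMPOSED = "\u00e4"   # ä as a single code point
DECOMPOSED = "a\u0308"   # a followed by COMBINING DIAERESIS


def same_after_nfd(left: str, right: str) -> bool:
    """Compare two strings after canonical decomposition (NFD)."""
    nfd_left = unicodedata.normalize("NFD", left)
    nfd_right = unicodedata.normalize("NFD", right)
    return nfd_left == nfd_right


# The two spellings differ as raw strings but agree once normalized.
raw_equal = PRECOMPOSED == DECOMPOSED
nfd_equal = same_after_nfd(PRECOMPOSED, DECOMPOSED)

# U+00E6 (æ) carries no decomposition mapping in the Unicode database,
# so normalization leaves it untouched.
ae_decomposition = unicodedata.decomposition("\u00e6")
ae_folded = unicodedata.normalize("NFKD", "\u00e6")

# U+FB00 (the ff ligature) has a compatibility decomposition.
ff_folded = unicodedata.normalize("NFKD", "\ufb00")
```

If folding were derived purely from Unicode decompositions, the ä case and the ff ligature could be handled that way, while matching "ae" against æ would need an explicit extra rule.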
* Re: Questions about isearch 2015-11-25 18:41 Questions about isearch Eli Zaretskii @ 2015-11-25 19:20 ` Rasmus 2015-11-25 20:02 ` Steinar Bang 2015-11-25 20:10 ` Eli Zaretskii 2015-11-25 20:14 ` Artur Malabarba ` (4 subsequent siblings) 5 siblings, 2 replies; 94+ messages in thread From: Rasmus @ 2015-11-25 19:20 UTC (permalink / raw) To: emacs-devel Hi, Eli Zaretskii <eliz@gnu.org> writes: > These questions came out of review and extensive updates of the search > and replace sections of the Emacs manual: > > 1. Character folding doesn't catch ligatures, such as æ (should it match > the two characters "ae")? In Danish I would not consider this a ligature, but a separate letter. It can be written as ae, however. Thus, it would probably be nice to match it via ’ae’. But where to stop? How about ’å’ (matched by ’a’)? Should it be captured by "aa"? Ø by ’oe’? There’s also ’œ’... There are probably lots of these weird cases. > 2. It also doesn't match ä (a single character) with ä (2 characters, > which Emacs correctly composes into 1 grapheme cluster). Should it? This reminds me: UTF-8 "stroked through a" (a̶) is also displayed as ä rather than as a struck-through a in Emacs on my system. But this is probably a different issue. Thanks, Rasmus -- Together we'll stand, divided we'll fall ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-25 19:20 ` Rasmus @ 2015-11-25 20:02 ` Steinar Bang 2015-11-26 14:46 ` Richard Stallman 2015-11-25 20:10 ` Eli Zaretskii 1 sibling, 1 reply; 94+ messages in thread From: Steinar Bang @ 2015-11-25 20:02 UTC (permalink / raw) To: emacs-devel >>>>> Rasmus <rasmus@gmx.us>: > Hi, > Eli Zaretskii <eliz@gnu.org> writes: >> These questions came out of review and extensive updates of the search >> and replace sections of the Emacs manual: >> >> 1. Character folding doesn't catch ligatures, such as æ (should it match >> the two characters "ae")? > In Danish I would not consider this a ligature, but a separate letter. It > can be written as ae, however. Hm... could this happen other than when transcribing a Danish name containing "æ" to an alphabet without Danish letters...? > Thus, it would probably be nice to match it via ’ae’. Speaking for the Norwegians: probably not! > But where to stop? How about ’å’ (matched by ’a’)? Absolutely not! > Should it be captured by "aa"? Actually perhaps yes, but only for names, and only if the locale is Norwegian (and presumably also Danish). Actually, considering the limitations, probably not. > Ø by ’oe’? No. For Norwegian the situation would be similar to "ae", and it would be for a case that is increasingly going away: having to transcribe a name in USASCII only. But it would make sense to make it search for "ö", because it rarely but _may_ be used instead of "ø" in Norwegian, and there are cases where Norwegian words differ only on "ø" vs. "ö" (funnily enough less so on "æ" vs. "ä"... an informal observation by me is that many words spelled with "ä" in Swedish are spelled with "e" in Norwegian (and pronounced with an "æ" sound)). > There’s also ’œ’... Which is French for (more or less) the same sound as "ø" ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-25 20:02 ` Steinar Bang @ 2015-11-26 14:46 ` Richard Stallman 2015-11-26 16:22 ` Eli Zaretskii 0 siblings, 1 reply; 94+ messages in thread From: Richard Stallman @ 2015-11-26 14:46 UTC (permalink / raw) To: Steinar Bang; +Cc: emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > In Danish I would not consider this a ligature, but a separate letter. It > > can be written as ae, however. > Hm... could this happen other than when transcribing a Danish name > containing "æ" to an alphabet without Danish letters...? > > Thus, it would probably be nice to match it via ’ae’. > Speaking for the Norwegians: probably not! > > But where to stop? How about ’å’ (matched by ’a’)? > Absolutely not! > > Should it be captured by "aa"? > Actually perhaps yes, but only for names, and only if the locale is > Norwegian (and presumably also Danish). > Actually, considering the limitations, probably not. It seems that perhaps we need these correspondences to depend on the language in use. That's true for case conversion as well. For instance the way to upcase 'i' is 'I' in most languages, but in Turkish it's a character I can't find a way to enter in Emacs. It seems to me that we want to introduce a concept of current language which would control these things, and also the language for spell checking, and maybe some other things. In some cases, the current language is determined by which characters appear. That would work fine for scripts that are used for just one language. It would be hard to do that for Latin scripts, though. For latin scripts one might always have to specify it explicitly, but it could be specified by a file local variable or other such per-file customization mechanism. The language environment, which already exists, is something different. 
It controls how to recognize character codings, and therefore has to be global. The current language should be per-buffer and perhaps should vary between parts of a buffer. So they can't be the same thing. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 94+ messages in thread
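The point about case conversion depending on the language can be made concrete. Below is a small Python sketch, not Emacs code; the function `upcase` and its `language` parameter are hypothetical names chosen for this illustration. It contrasts the default, language-independent upcasing of "i" with the Turkish rule, where "i" upcases to the dotted capital İ (U+0130) and the dotless ı (U+0131) upcases to plain "I".

```python
import unicodedata

DOTTED_CAPITAL_I = "\u0130"  # İ, used in Turkish as the capital of i


def upcase(text: str, language: str = "") -> str:
    """Upcase text; apply the Turkish dotted-i rule when language is "tr".

    The language codes are an assumption of this sketch. Only the one
    Turkish rule discussed in the thread is modeled.
    """
    if language == "tr":
        # Map i to the dotted capital first; str.upper() leaves U+0130
        # unchanged and already maps dotless U+0131 to plain "I".
        text = text.replace("i", DOTTED_CAPITAL_I)
    return text.upper()


default_result = upcase("istanbul")        # language-independent rule
turkish_result = upcase("istanbul", "tr")  # Turkish rule
char_name = unicodedata.name(DOTTED_CAPITAL_I)
```

The same text therefore has two correct upcased forms, and nothing in the characters themselves says which one is wanted; that information has to come from a language setting of some kind.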
* Re: Questions about isearch 2015-11-26 14:46 ` Richard Stallman @ 2015-11-26 16:22 ` Eli Zaretskii 2015-11-26 20:46 ` Per Starbäck 2015-11-27 6:37 ` Richard Stallman 0 siblings, 2 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-26 16:22 UTC (permalink / raw) To: rms; +Cc: sb, emacs-devel > From: Richard Stallman <rms@gnu.org> > Date: Thu, 26 Nov 2015 09:46:09 -0500 > Cc: emacs-devel@gnu.org > > It seems that perhaps we need these correspondences to depend > on the language in use. > > That's true for case conversion as well. For instance the way > to upcase 'i' is 'I' in most languages, but in Turkish it's a > character I can't find a way to enter in Emacs. (That character is İ, U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE.) IMO, it is more important to have language-independent matching in Emacs. Language-specific rules are also needed in some situations, but they are secondary for Emacs. > It seems to me that we want to introduce a concept of current language It's a problematic concept for Emacs, which is a multi-lingual environment. For example, what is the "current language" of the buffer showing this message? It cannot be US English, since it includes characters not in that language, and can easily include Turkish words. Or consider the etc/HELLO file. We could probably have a text property which will specify the language, but we don't have good means to set such a property. IOW, where would that information come from? > which would control these things, and also the language for spell checking, > and maybe some other things. Actually, modern spell-checkers can support multiple languages in the same spell-checking job (in a nutshell, they check dictionaries for each language they were told to use). In any case, a spell-checker has a simpler job in this respect: it checks one word at a time, so all it needs is the language for that one word. Conceptually, this is much simpler than what Emacs needs. 
> In some cases, the current language is determined by which characters > appear. That would work fine for scripts that are used for just one > language. It would be hard to do that for Latin scripts, though. > For latin scripts one might always have to specify it explicitly, > but it could be specified by a file local variable or other such > per-file customization mechanism. We already know which script each character belongs to: (aref char-script-table ?a) => latin But, as you say, this only rarely helps to deduce the language. > The language environment, which already exists, is something > different. It controls how to recognize character codings, and > therefore has to be global. The current language should be per-buffer > and perhaps should vary between parts of a buffer. So they can't > be the same thing. Indeed. But defining the current language of a buffer isn't sufficient, either, for Emacs. For that reason, we generally provide language-agnostic sorting, searching, etc. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-26 16:22 ` Eli Zaretskii @ 2015-11-26 20:46 ` Per Starbäck 2015-11-26 21:02 ` Eli Zaretskii 2015-11-26 23:18 ` Rasmus 2015-11-27 6:37 ` Richard Stallman 1 sibling, 2 replies; 94+ messages in thread From: Per Starbäck @ 2015-11-26 20:46 UTC (permalink / raw) To: emacs-devel@gnu.org; +Cc: Eli Zaretskii, sb, rms > IMO, it is more important to have language-independent matching in > Emacs. Language-specific rules are also needed in some situations, > but they are secondary for Emacs. > >> It seems to me that we want to introduce a concept of current language Yes! The language of a buffer is something I have wished for a long long time, probably using minor modes. It has primarily been to have the correct ispell dictionary and to have different abbrevs depending on language. With the new search folding it is much more needed. > It's a problematic concept for Emacs, which is a multi-lingual > environment. For example, what is the "current language" of the > buffer showing this message? It's in English. > It cannot be US English, since it > includes characters not in that language, and can easily include > Turkish words. Or consider the etc/HELLO file. I don't understand at all what you are saying here. Yes, of course Turkish words (and any character) can be in an English text. That doesn't make it false that it is in English. Do you just mean that it can be hard to determine the language of a text automatically? > We could probably have a text property which will specify the > language, but we don't have good means to set such a property. IOW, > where that information would come from? I don't envision a text property, but just a value for the buffer, because it is much easier and good enough for most things. Yes, there are situations where you might want to differentiate it like that, but that goes for other things we have in modes as well. (It would sometimes be nice to get Javascript mode for part of an HTML file etc.) 
So from where do we get it? Normally from the user. Many users mostly write in a few languages, like Swedish and English to take myself as an example. What I want is an indication "en" or "sv" somewhere in the information line and commands to toggle between my favourite languages. Sometimes it can be determined automatically. For example when opening an HTML file Emacs could look at the "lang" attribute, in a LaTeX file it could see how you use packages like Babel or Polyglossia. And in any text file various methods (like n-gram frequencies) can be used to try to identify the language automatically. I think the focus should be on buffers being able to have a (natural) language, and commands to change that. It would be quite sufficient with: * a setting listing what languages I normally want to use (the first one being the default) * a cycling command that sets the language to the next in that list (that is a toggle when you have a two-element list) * a command to explicitly set any valid value Anything else can be done a lot later, and as experiments outside of the core. Automatic detection is neat, but not really needed. And exactly what changes the different languages need will be worked out piece by piece over time in the different language communities. The important thing is that there is some hook to hang your code on. * Why it is so important, now with the new search folding * For Scandinavians it is really important, because (with Swedish as example) åäö are really totally their own letters in the Swedish alphabet, regardless of their historic origin. To have a search for "varpa" in a Swedish text find "värpa" or "varpå" would be just wrong. It would give a strong impression of this being an American program not meant to be used for Swedish. An analogue would be finding "jamb" when looking for "iamb" in English, where I and J are totally different letters, even though they originally (in Latin) were the same. 
Or you start an isearch for "valid" and after the first four letters you are inside "dualism". (U and V also were the same letter originally.) Confusing and irritating, and something to make people turn off this search folding which would be sad, because it's a nice thing to have. ^ permalink raw reply [flat|nested] 94+ messages in thread
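The Swedish example above can be sketched as a per-language exemption list on top of language-independent folding. This is a Python illustration under stated assumptions, not a description of Emacs: the `PROTECTED` table, the language codes, and the functions `fold` and `matches` are all hypothetical, and folding is approximated here by stripping combining marks after canonical decomposition.

```python
import unicodedata

# Hypothetical per-language table of letters that must NOT be folded.
# "" stands for the language-independent default.
PROTECTED = {
    "": set(),
    "sv": set("\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6"),  # å ä ö Å Ä Ö
}


def fold(text: str, language: str = "") -> str:
    """Strip diacritics, except from letters protected in `language`."""
    keep = PROTECTED.get(language, set())
    folded = []
    # Normalize to NFC first so protected letters are single code points.
    for char in unicodedata.normalize("NFC", text):
        if char in keep:
            folded.append(char)
            continue
        decomposed = unicodedata.normalize("NFD", char)
        folded.append(
            "".join(c for c in decomposed if not unicodedata.combining(c))
        )
    return "".join(folded)


def matches(needle: str, haystack: str, language: str = "") -> bool:
    """Substring search after folding both sides with the same rules."""
    return fold(needle, language) in fold(haystack, language)
```

With the default rules, `matches("varpa", "v\u00e4rpa")` succeeds; with `language="sv"` it does not, while an accent that is not a separate Swedish letter, as in "café", still folds. The design question the thread argues about is where the `language` argument would come from: a buffer-wide setting, a text property, or nothing at all.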
* Re: Questions about isearch 2015-11-26 20:46 ` Per Starbäck @ 2015-11-26 21:02 ` Eli Zaretskii 2015-11-26 21:35 ` Marcin Borkowski 2015-11-27 6:38 ` Richard Stallman 2015-11-26 23:18 ` Rasmus 1 sibling, 2 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-26 21:02 UTC (permalink / raw) To: Per Starbäck; +Cc: sb, rms, emacs-devel > Date: Thu, 26 Nov 2015 21:46:49 +0100 > From: Per Starbäck <per@starback.se> > Cc: rms@gnu.org, Eli Zaretskii <eliz@gnu.org>, sb@dod.no > > > It cannot be US English, since it > > includes characters not in that language, and can easily include > > Turkish words. Or consider the etc/HELLO file. > > I don't understand at all what you are saying here. Yes, of course > Turkish words (and any character) can be in an English text. That > doesn't make it false that it is in English. Do you just mean that it > can be hard do determine the language of a text automatically? So you will sort Turkish words in an otherwise English text according to English rules? And spell-check them using an English dictionary? I don't think so. A language attribute is something that should control how certain linguistic operations are tailored. You cannot use one language's rules with words from another language. So saying that an email message that is mostly in English, but includes words and phrases from another language, is in English is not useful, at least for handling the non-English parts of that message. And what about etc/HELLO? what language is it in? There are more non-English words there than English words, and no language in particular can claim it has the majority of the words, or even too many to count as "many". How do we treat such buffers? what rules of character folding do we apply there? > > We could probably have a text property which will specify the > > language, but we don't have good means to set such a property. IOW, > > where that information would come from? 
> > I don't envision a text property, but just a value for the buffer, > because it is much easier and good enough for most things. Yes, there > are situations where you might want to differentiate it like that, but > that goes for other things we have in modes as well. (It would > sometimes be nice to get Javascript mode for part of an HTML file > etc.) Having Javascript in HTML just makes it highlighted wrongly. That's aesthetically bad (and there's a todo item to solve that problem), but that's not fatal. Trying to treat a word in Japanese according to Latin rules is much worse. So I think a per-buffer language attribute is the wrong way to go. We need a finer granularity. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-26 21:02 ` Eli Zaretskii @ 2015-11-26 21:35 ` Marcin Borkowski 2015-11-27 7:43 ` Eli Zaretskii 2015-11-27 6:38 ` Richard Stallman 1 sibling, 1 reply; 94+ messages in thread From: Marcin Borkowski @ 2015-11-26 21:35 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Per Starbäck, sb, rms, emacs-devel On 2015-11-26, at 22:02, Eli Zaretskii <eliz@gnu.org> wrote: > And what about etc/HELLO? what language is it in? [...] And, as I mentioned a few times in various places, there is another case (and unlike etc/HELLO, it actually happens IRL): bibliographies of scientific papers. It is not uncommon for such a bibliography to contain titles/journal names in various languages. (Probably the most extreme example might be "Funkcialaj Ekvacioj", a Japanese journal with an Esperanto title and mostly or only English papers.) AFAIK (though I'm not 100% sure), standard LaTeX tools (i.e., BibLaTeX) do not support such a situation (which is bad, since it is really needed to have different hyphenation rules for different parts of these entries - be glad that Emacs doesn't have to care about those!). Another LaTeX bibliography tool, amsrefs, handles them well; but it's not very popular. For a less extreme example, consider e.g. Latin phrases in the midst of an English text; not uncommon, for instance in law texts (but not only there). > So I think a per-buffer language attribute is the wrong way to go. We > need a finer granularity. Yes. OTOH, my feeling is that a solution which would be correct 85% of the time is better than no solution. Best, -- Marcin Borkowski http://octd.wmi.amu.edu.pl/pl/Marcin_Borkowski Wydział Matematyki i Informatyki Uniwersytet im. Adama Mickiewicza ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-26 21:35 ` Marcin Borkowski @ 2015-11-27 7:43 ` Eli Zaretskii 0 siblings, 0 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-27 7:43 UTC (permalink / raw) To: Marcin Borkowski; +Cc: per, sb, rms, emacs-devel > From: Marcin Borkowski <mbork@mbork.pl> > Cc: Per Starbäck <per@starback.se>, sb@dod.no, rms@gnu.org, > emacs-devel@gnu.org > Date: Thu, 26 Nov 2015 22:35:10 +0100 > > OTOH, my feeling is that a solution which would be correct 85% of the > time is better than no solution. That could well be so, yes. But even for such a partial solution, we still need gobs of infrastructure we don't have. For example, people mentioned language-dependent character folding: to be able to do that we need a large language-dependent database of collation data. That probably means importing or accessing the Unicode CLDR (http://cldr.unicode.org/). (We could instead rely on the underlying libc to provide that, but then it would only work on glibc-based systems, and would require switching locales each time we need another language, which is IMO cumbersome, inefficient, and inelegant.) We cannot seriously speak about language-dependent processing before we have that data and functions to use it. Having that infrastructure is also necessary for more sophisticated language-sensitive processing that I think we should eventually have, so patches to add such a functionality are welcome. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-26 21:02 ` Eli Zaretskii 2015-11-26 21:35 ` Marcin Borkowski @ 2015-11-27 6:38 ` Richard Stallman 2015-11-27 8:53 ` Eli Zaretskii 2015-11-27 16:21 ` raman 1 sibling, 2 replies; 94+ messages in thread From: Richard Stallman @ 2015-11-27 6:38 UTC (permalink / raw) To: Eli Zaretskii; +Cc: per, sb, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > So you will sort Turkish words in an otherwise English text according > to English rules? And spell-check them using an English dictionary? > I don't think so. You seem to be trying to design an ultimate, ideal current language facility. We might want to get there eventually, but I think we should start with something simple. After all, most buffers have only one language in them. If there are a few words in another language, the user probably won't find it hard to deal with the fact that Emacs does not know they are in another language. Having a selectable language for the whole buffer is going to be better than the current situation where you can't select it. If you have a table in Turkish in the middle of an English document, and you want to sort the table, you can switch the buffer to Turkish, sort, and switch back to English. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 6:38 ` Richard Stallman @ 2015-11-27 8:53 ` Eli Zaretskii 2015-11-27 16:21 ` raman 1 sibling, 0 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-27 8:53 UTC (permalink / raw) To: rms; +Cc: per, sb, emacs-devel > From: Richard Stallman <rms@gnu.org> > CC: per@starback.se, emacs-devel@gnu.org, sb@dod.no > Date: Fri, 27 Nov 2015 01:38:32 -0500 > > You seem to be trying to design an ultimate, ideal current language > facility. Yes. > We might want to get there eventually, but I think we should start > with something simple. IMO, the initial implementation could have only partial support for multiple languages, but the design should allow for extending that all the way towards the eventual goal, which cannot possibly be a single language per buffer, not in Emacs 2X. > After all, most buffers have only one language in them. Not in my experience: buffers that combine English and Hebrew are something I see every day. The simplest example is email: the headers are in English, while the body is in a mix of Hebrew and Latin (usually English) words. > Having a selectable language for the whole buffer is going to be > better than the current situation where you can't select it. I agree. > If you have a table in Turkish in the middle of a English document, > and you want to sort the table, you can switch the buffer to Turkish, > sort, and switch back to English. Language-specific processing is not limited to sorting contiguous regions in the buffer. This discussion started from Isearch, so the example which underlines the issues is searching for a string with character-folding enabled -- this should automatically apply language-specific rules when it hits a possible match in the Turkish portion, then switch back to English when the match is in the English part. Same with spelling -- you'd want flyspell to use the right language in each portion, without the need to restart the speller program with another dictionary. 
* Re: Questions about isearch 2015-11-27 6:38 ` Richard Stallman 2015-11-27 8:53 ` Eli Zaretskii @ 2015-11-27 16:21 ` raman 1 sibling, 0 replies; 94+ messages in thread From: raman @ 2015-11-27 16:21 UTC (permalink / raw) To: Richard Stallman; +Cc: Eli Zaretskii, per, sb, emacs-devel Richard Stallman <rms@gnu.org> writes: 1+. For the specific case of say a Turkish table in an English buffer, etc, Emacs' facilities of narrow-to-region etc can be used to advantage while applying language-specific processing. > [[[ To any NSA and FBI agents reading my email: please consider ]]] > [[[ whether defending the US Constitution against all enemies, ]]] > [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > > So you will sort Turkish words in an otherwise English text according > > to English rules? And spell-check them using an English dictionary? > > I don't think so. > > You seem to be trying to design an ultimate, ideal current language > facility. We might want to get there eventually, but I think we > should start with something simple. After all, most buffers have only > one language in them. If there are a few words in another language, > the user probably won't find it hard to deal with the fact that Emacs > does not know they are in another language. > > Having a selectable language for the whole buffer > is going to be better than the current situation > where you can't select it. > > If you have a table in Turkish in the middle of a English document, > and you want to sort the table, you can switch the buffer to Turkish, > sort, and switch back to English. -- ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-26 20:46 ` Per Starbäck 2015-11-26 21:02 ` Eli Zaretskii @ 2015-11-26 23:18 ` Rasmus 2015-11-27 7:46 ` Eli Zaretskii 1 sibling, 1 reply; 94+ messages in thread From: Rasmus @ 2015-11-26 23:18 UTC (permalink / raw) To: emacs-devel Per Starbäck <per@starback.se> writes: > * Why it is so important, now with the new search folding * > > For Scandinavians it is really important, because (with Swedish as > example) åäö are really totally their own letters in the Swedish > alphabet, regardless of their historic origin. To have a search for > "varpa" in a Swedish text find "värpa" or "varpå" would be just wrong. > It would give a strong impression of this being an American program > not meant to be used for Swedish. Still, imagine you are stuck in some environment where you do not have a Scando keyboard (or that you have to find a ’Øystein’ as a non-Scando). It may be useful to be able to search without having to access difficult letters (though the TeX input method solves most such issues for me). Rasmus -- Enough with the bla bla! ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-26 23:18 ` Rasmus @ 2015-11-27 7:46 ` Eli Zaretskii 0 siblings, 0 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-27 7:46 UTC (permalink / raw) To: Rasmus; +Cc: emacs-devel > From: Rasmus <rasmus@gmx.us> > Date: Fri, 27 Nov 2015 00:18:35 +0100 > > Per Starbäck <per@starback.se> writes: > > > * Why it is so important, now with the new search folding * > > > > For Scandinavians it is really important, because (with Swedish as > > example) åäö are really totally their own letters in the Swedish > > alphabet, regardless of their historic origin. To have a search for > > "varpa" in a Swedish text find "värpa" or "varpå" would be just wrong. > > It would give a strong impression of this being an American program > > not meant to be used for Swedish. > > Still, imagine you are stuck in some environment where you do not have a > Scando keyboard (or that you have to find a ’Øystein’ as a non-Scando). > It may be useful to be useful to be able to search without having to > access difficult letters (though the TeX input method solves most such > issues for me). Since there are various needs and situations, this is customizable, both for the current-search and for future searches. So I don't think this is worth arguing about: Emacs gives you both alternatives. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-26 16:22 ` Eli Zaretskii 2015-11-26 20:46 ` Per Starbäck @ 2015-11-27 6:37 ` Richard Stallman 2015-11-27 8:39 ` Eli Zaretskii 1 sibling, 1 reply; 94+ messages in thread From: Richard Stallman @ 2015-11-27 6:37 UTC (permalink / raw) To: Eli Zaretskii; +Cc: sb, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > It's a problematic concept for Emacs, which is a multi-lingual > environment. For example, what is the "current language" of the > buffer showing this message? English. That's what I would select for it. > It cannot be US English, since it > includes characters not in that language, Of course it can be. If I were editing that text, I would not select Turkish for it. But if you want to select Turkish for it, you could do that. The user should be able to select any current language for a given buffer. > We could probably have a text property which will specify the > language, but we don't have good means to set such a property. IOW, > where that information would come from? We don't need anything that fancy for the initial feature. Just the ability to select the language for any buffer would be a great start. Indeed, it would not help you much for etc/HELLO, but so what? It can be useful for many situations, even if it can't handle really complex situations any better than now. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 6:37 ` Richard Stallman @ 2015-11-27 8:39 ` Eli Zaretskii 0 siblings, 0 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-27 8:39 UTC (permalink / raw) To: rms; +Cc: sb, emacs-devel > From: Richard Stallman <rms@gnu.org> > CC: sb@dod.no, emacs-devel@gnu.org > Date: Fri, 27 Nov 2015 01:37:52 -0500 > > The user should be able to select any current language > for a given buffer. That would make the feature too tedious, at least to my taste. But I'm not opposed to having that if someone finds it useful. We currently lack significant infrastructure to do language-specific processing; adding such infrastructure would be a good step forward, and is needed as prerequisite for both the simplistic and the more sophisticated features, so it should be welcome, I think. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-25 19:20 ` Rasmus 2015-11-25 20:02 ` Steinar Bang @ 2015-11-25 20:10 ` Eli Zaretskii 2015-11-25 20:41 ` Mike Kupfer 1 sibling, 1 reply; 94+ messages in thread From: Eli Zaretskii @ 2015-11-25 20:10 UTC (permalink / raw) To: Rasmus; +Cc: emacs-devel > From: Rasmus <rasmus@gmx.us> > Date: Wed, 25 Nov 2015 20:20:20 +0100 > > > 1. Character folding doesn't catch ligatures, such as æ (should it match > > the two characters "ae")? > > In Danish I would not consider this a ligature, but a separate letter. It > can be written as ae, however. Thus, it would probably be nice to match > it via ’ae’. But where to stop? How about ’å’ (matched by ’a’)? Should > it be captured by "aa"? Ø by ’oe’? There’s also ’œ’... > > Probably there’s lots of these weird cases. Please read the node "Lax Search" in the Emacs manual. That ship sailed several months ago, and Emacs already supports "character folding", and thus yes, 'a' matches 'å' (and also 'ä' and 'á' and 'ǎ' and many others). We don't make these matches language dependent, because Emacs is a multi-lingual environment, and most text is not tagged with a particular language. So we use language-independent folding, and AFAIU "ae" should have matched 'æ' under the rules we use. But it doesn't. (Similarly "ff" and 'ﬀ' and others.) > > 2. It also doesn't match ä (a single character) with ä (2 characters, > > which Emacs correctly composes into 1 grapheme cluster). Should it? > > This reminds me: UTF-8 "stroked through a" (a̶) is also displayed as ä > rather than the stroke through a Emacs on my system. But this is probably > a different issue. Display is a different issue, indeed. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-25 20:10 ` Eli Zaretskii @ 2015-11-25 20:41 ` Mike Kupfer 2015-11-25 20:56 ` Eli Zaretskii 0 siblings, 1 reply; 94+ messages in thread From: Mike Kupfer @ 2015-11-25 20:41 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel Eli Zaretskii wrote: > Please read the node "Lax Search" in the Emacs manual. Is that something new in Emacs 25? I didn't find it in 24.5. mike ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-25 20:41 ` Mike Kupfer @ 2015-11-25 20:56 ` Eli Zaretskii 0 siblings, 0 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-25 20:56 UTC (permalink / raw) To: Mike Kupfer; +Cc: emacs-devel > From: Mike Kupfer <m.kupfer@acm.org> > cc: emacs-devel@gnu.org > Date: Wed, 25 Nov 2015 12:41:52 -0800 > > Eli Zaretskii wrote: > > > Please read the node "Lax Search" in the Emacs manual. > > Is that something new in Emacs 25? Yes. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-25 18:41 Questions about isearch Eli Zaretskii 2015-11-25 19:20 ` Rasmus @ 2015-11-25 20:14 ` Artur Malabarba 2015-11-25 20:30 ` Marcin Borkowski ` (2 more replies) 2015-11-25 23:15 ` Mike Kupfer ` (3 subsequent siblings) 5 siblings, 3 replies; 94+ messages in thread From: Artur Malabarba @ 2015-11-25 20:14 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel On 25 Nov 2015 6:41 pm, "Eli Zaretskii" <eliz@gnu.org> wrote: > 1. Character folding doesn't catch ligatures, such as æ (should it match > the two characters "ae")? I've no idea. It would be easy to add. Those who use ligatures need to tell us whether that makes sense. > 2. It also doesn't match ä (a single character) with ä (2 characters, > which Emacs correctly composes into 1 grapheme cluster). Should it? Possibly. Since they look the same, might make things easier on users. But I wouldn't know as I've never seen the second version used anywhere. > 4. What is the equivalent of case-replace and the letter-case related > behavior of replace commands to character folding? E.g., if the > replace command specifies to replace "foo" with "bar", and we found > "föo", should we replace it with "bär" or something, by analogy with > letter-case behavior? I don't think we should do that. Case replacement makes sense because the way you capitalize a word is frequently (though not always) independent of the word itself. That's not the case with char folding. At least in Portuguese, accents only go in very specific places, and I would _never_ want emacs to add an accent to the replacement text just because the word being replaced happened to have an accent. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-25 20:14 ` Artur Malabarba @ 2015-11-25 20:30 ` Marcin Borkowski 2015-11-25 20:38 ` Eli Zaretskii 2015-11-25 20:36 ` Eli Zaretskii 2015-11-26 16:08 ` Rasmus 2 siblings, 1 reply; 94+ messages in thread From: Marcin Borkowski @ 2015-11-25 20:30 UTC (permalink / raw) To: bruce.connor.am; +Cc: Eli Zaretskii, emacs-devel On 2015-11-25, at 21:14, Artur Malabarba <bruce.connor.am@gmail.com> wrote: > On 25 Nov 2015 6:41 pm, "Eli Zaretskii" <eliz@gnu.org> wrote: >> 1. Character folding doesn't catch ligatures, such as æ (should it match >> the two characters "ae")? > > I've no idea. It would be easy to add. > Those who use ligatures need to tell us whether that makes sense. I'm not sure whether this is relevant, but a place where ligatures come up naturally is TeX's pdf files, which can be isearched with pdf-tools. Currently, searching for "fi" when the document contains the corresponding ligature Just Works™. I'm not sure what would happen in case of e.g. a result of a pdf->text conversion. >> 4. What is the equivalent of case-replace and the letter-case related >> behavior of replace commands to character folding? E.g., if the >> replace command specifies to replace "foo" with "bar", and we found >> "föo", should we replace it with "bär" or something, by analogy with >> letter-case behavior? > > I don't think we should do that. Case replacement makes sense because the > way you capitalize a word is frequently (though not always) independent of > the word itself. That's not the case with char folding. At least in > Portuguese, accents only go in very specific places, and I would _never_ > want emacs to add an accent to the replacement text just because the word > being replaced happened to have an accent. +1. In Polish, e.g. "a" and "ą" (or "n" and "ń", etc.) are different letters, representing different sounds, and possibly changing the meaning of a word (for instance, "kat" is an executioner and "kąt" is an angle). 
Best, -- Marcin Borkowski http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski Faculty of Mathematics and Computer Science Adam Mickiewicz University ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-25 20:30 ` Marcin Borkowski @ 2015-11-25 20:38 ` Eli Zaretskii 2015-11-25 21:58 ` Artur Malabarba 0 siblings, 1 reply; 94+ messages in thread From: Eli Zaretskii @ 2015-11-25 20:38 UTC (permalink / raw) To: Marcin Borkowski; +Cc: bruce.connor.am, emacs-devel > From: Marcin Borkowski <mbork@mbork.pl> > Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel <emacs-devel@gnu.org> > Date: Wed, 25 Nov 2015 21:30:17 +0100 > > >> 4. What is the equivalent of case-replace and the letter-case related > >> behavior of replace commands to character folding? E.g., if the > >> replace command specifies to replace "foo" with "bar", and we found > >> "föo", should we replace it with "bär" or something, by analogy with > >> letter-case behavior? > > > > I don't think we should do that. Case replacement makes sense because the > > way you capitalize a word is frequently (though not always) independent of > > the word itself. That's not the case with char folding. At least in > > Portuguese, accents only go in very specific places, and I would _never_ > > want emacs to add an accent to the replacement text just because the word > > being replaced happened to have an accent. > > +1. In Polish, e.g. "a" and "ą" (or "n" and "ń", etc.) are different > letters, representing different sounds, and possibly changing the > meaning of a word (for instance, "kat" is an executioner and "kąt" is an > angle). But replacement is all about _changing_ text, so this argument doesn't seem to be applicable. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-25 20:38 ` Eli Zaretskii @ 2015-11-25 21:58 ` Artur Malabarba 2015-11-25 23:04 ` Mike Kupfer 2015-11-26 13:28 ` Steinar Bang 0 siblings, 2 replies; 94+ messages in thread From: Artur Malabarba @ 2015-11-25 21:58 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel On 25 Nov 2015 8:38 pm, "Eli Zaretskii" <eliz@gnu.org> wrote: > > >> 4. What is the equivalent of case-replace and the letter-case related > > >> behavior of replace commands to character folding? > But replacement is all about _changing_ text, so this argument doesn't > seem to be applicable. Just to be clear. If Emacs tries to be clever about accents when I'm replacing text, it will do the wrong thing at least 100% of the time in Portuguese text. :-) ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-25 21:58 ` Artur Malabarba @ 2015-11-25 23:04 ` Mike Kupfer 2015-11-26 3:40 ` Eli Zaretskii 2015-11-26 13:28 ` Steinar Bang 1 sibling, 1 reply; 94+ messages in thread From: Mike Kupfer @ 2015-11-25 23:04 UTC (permalink / raw) To: bruce.connor.am, Eli Zaretskii; +Cc: emacs-devel Artur Malabarba wrote: > Just to be clear. If Emacs tries to be clever about accents when I'm > replacing text, it will do the wrong thing at least 100% of the time in > Portuguese text. :-) To give a more concrete example, if I try to replace "papa" ("pope" in Italian) with "Francis", I would not want Emacs to also replace (or even suggest replacing) "papà" ("dad" in Italian) with "Francis". mike ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-25 23:04 ` Mike Kupfer @ 2015-11-26 3:40 ` Eli Zaretskii 2015-11-27 19:50 ` Mike Kupfer 0 siblings, 1 reply; 94+ messages in thread From: Eli Zaretskii @ 2015-11-26 3:40 UTC (permalink / raw) To: Mike Kupfer; +Cc: bruce.connor.am, emacs-devel > From: Mike Kupfer <m.kupfer@acm.org> > cc: emacs-devel <emacs-devel@gnu.org> > Date: Wed, 25 Nov 2015 15:04:37 -0800 > > To give a more concrete example, if I try to replace "papa" ("pope" in > Italian) with "Francis", I would not want Emacs to also replace (or even > suggest replacing) "papà" ("dad" in Italian) with "Francis". By default, Emacs won't. But if you set replace-character-fold non-nil, it will. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-26 3:40 ` Eli Zaretskii @ 2015-11-27 19:50 ` Mike Kupfer 2015-11-27 20:06 ` Eli Zaretskii 0 siblings, 1 reply; 94+ messages in thread From: Mike Kupfer @ 2015-11-27 19:50 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel Eli Zaretskii wrote: > > From: Mike Kupfer <m.kupfer@acm.org> > > cc: emacs-devel <emacs-devel@gnu.org> > > Date: Wed, 25 Nov 2015 15:04:37 -0800 > > > > To give a more concrete example, if I try to replace "papa" ("pope" in > > Italian) with "Francis", I would not want Emacs to also replace (or even > > suggest replacing) "papà" ("dad" in Italian) with "Francis". > > By default, Emacs won't. But if you set replace-character-fold > non-nil, it will. Ah, I see; thanks. Assuming that there won't be any major changes for 25.1 in this area, I think it would be helpful for the "Lax Search" Info node to say something about replace-character-fold, particularly since that node mentions the relationship between case-fold-search and replace commands. And maybe replace-character-fold should be listed in the "Search Customizations" node? Also, I'm confused about the exact semantics of replace-character-fold. Its help string says it applies to query-replace. Experimentation shows that it also applies to replace-string, but not replace-regexp ("[ab]" does not match "ä" even when replace-character-fold is non-nil). I'm not sure what's intended here, particularly since replace-regexp does honor case-fold-search. And speaking of case-fold-search, it is documented as buffer-local when set. Should search-default-regexp-mode and replace-character-fold do the same? regards, mike ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 19:50 ` Mike Kupfer @ 2015-11-27 20:06 ` Eli Zaretskii 2015-11-27 23:57 ` Artur Malabarba 2015-11-28 1:36 ` Mike Kupfer 0 siblings, 2 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-27 20:06 UTC (permalink / raw) To: Mike Kupfer; +Cc: emacs-devel > From: Mike Kupfer <m.kupfer@acm.org> > cc: emacs-devel@gnu.org > Date: Fri, 27 Nov 2015 11:50:18 -0800 > > Assuming that there won't be any major changes for 25.1 in this area, I > think it would be helpful for the "Lax Search" Info node to say > something about replace-character-fold, particularly since that node > mentions the relationship between case-fold-search and replace commands. > And maybe replace-character-fold should be listed in the "Search > Customizations" node? There's a companion node "Replacement and Lax Matches", which describes this variable. > Also, I'm confused about the exact semantics of replace-character-fold. > Its help string says it applies to query-replace. That's a mistake that should be fixed, thanks. > Experimentation shows that it also applies to replace-string, but > not replace-regexp ("[ab]" does not match "ä" even when > replace-character-fold is non-nil). I'm not sure what's intended > here, particularly since replace-regexp does honor case-fold-search. Not sure whether this is intended, please submit a bug report. > And speaking of case-fold-search, it is documented as buffer-local when > set. Should search-default-regexp-mode and replace-character-fold do > the same? No, I don't think so. case-fold-search is not only for searching commands, so it follows a different logic. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 20:06 ` Eli Zaretskii @ 2015-11-27 23:57 ` Artur Malabarba 2015-11-28 1:36 ` Mike Kupfer 1 sibling, 0 replies; 94+ messages in thread From: Artur Malabarba @ 2015-11-27 23:57 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Mike Kupfer, emacs-devel On 27 Nov 2015 8:06 pm, "Eli Zaretskii" <eliz@gnu.org> wrote: > > Experimentation shows that it also applies to replace-string, but > > not replace-regexp ("[ab]" does not match "ä" even when > > replace-character-fold is non-nil). I'm not sure what's intended > > here, particularly since replace-regexp does honor case-fold-search. > > Not sure whether this is intended, please submit a bug report. It's a known limitation. Char folding works by converting a plain string to a regexp, so it does not work on regexps. The same happens with isearch: you can't do a char-folding regexp isearch. ^ permalink raw reply [flat|nested] 94+ messages in thread
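To make the limitation above concrete, here is a minimal sketch of the "plain string to regexp" idea. It is not the actual Emacs implementation: `my-fold-table' and `my-fold-to-regexp' are hypothetical names, and the table holds toy data chosen only for illustration.

```elisp
;; Toy table, for illustration only: each entry maps an "easy to type"
;; character to extra strings it should also match.  This is NOT the
;; data Emacs uses; `my-fold-table' is a hypothetical name.
(defvar my-fold-table
  '((?a "ä" "á" "å")
    (?o "ö" "ó"))
  "Hypothetical folding table: (CHAR EQUIVALENT-STRING...).")

(defun my-fold-to-regexp (string)
  "Return a regexp matching the plain STRING, folded per `my-fold-table'.
STRING is treated literally, character by character, so any regexp
syntax in it is quoted rather than interpreted."
  (mapconcat
   (lambda (char)
     (let ((extras (cdr (assq char my-fold-table))))
       (if extras
           ;; Alternation of the character itself and its equivalents.
           (concat "\\(?:"
                   (mapconcat #'regexp-quote
                              (cons (string char) extras)
                              "\\|")
                   "\\)")
         ;; No entry in the table: match the character literally.
         (regexp-quote (string char)))))
   string
   ""))

(string-match-p (my-fold-to-regexp "varpa") "värpå") ; non-nil: folded match
(string-match-p (my-fold-to-regexp "värpa") "varpa") ; nil: "ä" matches only itself
(string-match-p (my-fold-to-regexp "[ab]") "ä")      ; nil: "[ab]" was quoted
```

Because the input is expanded one character at a time, a regexp such as "[ab]" would be quoted and matched literally, which is consistent with the behavior reported for replace-regexp above.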
* Re: Questions about isearch 2015-11-27 20:06 ` Eli Zaretskii 2015-11-27 23:57 ` Artur Malabarba @ 2015-11-28 1:36 ` Mike Kupfer 2015-11-28 9:28 ` Eli Zaretskii 1 sibling, 1 reply; 94+ messages in thread From: Mike Kupfer @ 2015-11-28 1:36 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel Eli Zaretskii wrote: > > From: Mike Kupfer <m.kupfer@acm.org> > > cc: emacs-devel@gnu.org > > Date: Fri, 27 Nov 2015 11:50:18 -0800 > > > > Assuming that there won't be any major changes for 25.1 in this area, I > > think it would be helpful for the "Lax Search" Info node to say > > something about replace-character-fold, particularly since that node > > mentions the relationship between case-fold-search and replace commands. > > And maybe replace-character-fold should be listed in the "Search > > Customizations" node? > > There's a companion node "Replacement and Lax Matches", which > describes this variable. Okay, so can a cross-reference to "Replacement and Lax Matches" be added to the "Lax Search" node? I mean, I did what you suggested in an earlier reply to someone else: I went straight to the "Lax Search" node. I didn't see anything in there to give me a clue about replace-character-fold. I did see the cross-reference to "Replace", but that was in the context of case-fold-search, which, unlike character folding, does apply to replace commands. With the current "Lax Search" text, there's just not enough of a hint to the reader that additional important information is available. Also, will the help strings for the search and the replace functions be updated to mention the relevant character folding variables? They already mention case-fold-search. (And I'd be less concerned about the "Lax Search" text if the help string gave me the right clue.) thanks, mike ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-28 1:36 ` Mike Kupfer @ 2015-11-28 9:28 ` Eli Zaretskii 0 siblings, 0 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-28 9:28 UTC (permalink / raw) To: Mike Kupfer; +Cc: emacs-devel > From: Mike Kupfer <m.kupfer@acm.org> > cc: emacs-devel@gnu.org > Date: Fri, 27 Nov 2015 17:36:08 -0800 > > > There's a companion node "Replacement and Lax Matches", which > > describes this variable. > > Okay, so can a cross-reference to "Replacement and Lax Matches" be added > to the "Lax Search" node? I added it now. > Also, will the help strings for the search and the replace functions be > updated to mention the relevant character folding variables? They > already mention case-fold-search. I found no search commands whose doc strings mention case-fold-search. I did find such references in replace commands, and added the reference to replace-character-fold there. Thanks. In the future, please post such suggestions as bug reports rather than here; if nothing else, that makes it easier to refer to the discussions in the log messages. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-25 21:58 ` Artur Malabarba 2015-11-25 23:04 ` Mike Kupfer @ 2015-11-26 13:28 ` Steinar Bang 1 sibling, 0 replies; 94+ messages in thread From: Steinar Bang @ 2015-11-26 13:28 UTC (permalink / raw) To: emacs-devel >>>>> Artur Malabarba <bruce.connor.am@gmail.com>: > Just to be clear. If Emacs tries to be clever about accents when I'm > replacing text, it will do the wrong thing at least 100% of the time in > Portuguese text. :-) Ditto for Norwegian. (Well perhaps not 100% of the time, since there are only 3 special letters, but still...) ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-25 20:14 ` Artur Malabarba 2015-11-25 20:30 ` Marcin Borkowski @ 2015-11-25 20:36 ` Eli Zaretskii 2015-11-25 21:49 ` Artur Malabarba 2015-11-27 12:03 ` Artur Malabarba 2015-11-26 16:08 ` Rasmus 2 siblings, 2 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-25 20:36 UTC (permalink / raw) To: bruce.connor.am; +Cc: emacs-devel > Date: Wed, 25 Nov 2015 20:14:06 +0000 > From: Artur Malabarba <bruce.connor.am@gmail.com> > Cc: emacs-devel <emacs-devel@gnu.org> > > > 1. Character folding doesn't catch ligatures, such as æ (should it match > > the two characters "ae")? > > I've no idea. It would be easy to add. No, I meant to ask why it doesn't work already. AFAIU, the decomposition of ﬀ is "ff": (get-char-code-property ?ﬀ 'decomposition) => (compat 102 102) but searching for 'f' doesn't match the ligature. (æ doesn't have a decomposition in the Unicode database, so maybe it's a different case.) > Those who use ligatures need to tell us whether that makes sense. I thought we used decomposition data automatically, no? > > 2. It also doesn't match ä (a single character) with ä (2 characters, > > which Emacs correctly composes into 1 grapheme cluster). Should it? > > Possibly. Since they look the same, might make things easier on users. But I > wouldn't know as I've never seen the second version used anywhere. Once again, the decomposition attribute says we should match them: (get-char-code-property ?ä 'decomposition) => (97 776) and the second character in ä is U+0308 = 776. Doesn't that say we should have matched them? ^ permalink raw reply [flat|nested] 94+ messages in thread
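The two lookups quoted above can be turned into strings for easier comparison with a small sketch like the following. The helper name `my-decomposed-string' is hypothetical, and the tag handling assumes that a compatibility decomposition starts with a symbol such as `compat', as in the output shown in the message.

```elisp
;; Sketch only: `my-decomposed-string' is a hypothetical helper,
;; not a function provided by Emacs.
(defun my-decomposed-string (char)
  "Return the `decomposition' property of CHAR as a string.
A leading tag symbol (for example `compat') is dropped, on the
assumption that tags always come first in the property value."
  (let ((dec (get-char-code-property char 'decomposition)))
    (apply #'string
           (if (symbolp (car dec)) (cdr dec) dec))))

;; Code points are used instead of literal characters so that the
;; example survives any normalization of this text.
(my-decomposed-string #xE4)    ; U+00E4: "a" followed by U+0308
(my-decomposed-string #xFB00)  ; U+FB00, the ff ligature: "ff"
```

Going by the values quoted above, the first call yields a two-character string and the second yields plain "ff", which is why both cases look like candidates for matching.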
* Re: Questions about isearch 2015-11-25 20:36 ` Eli Zaretskii @ 2015-11-25 21:49 ` Artur Malabarba 2015-11-26 3:34 ` Eli Zaretskii 2015-11-27 12:03 ` Artur Malabarba 1 sibling, 1 reply; 94+ messages in thread From: Artur Malabarba @ 2015-11-25 21:49 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel On 25 Nov 2015 8:36 pm, "Eli Zaretskii" <eliz@gnu.org> wrote: > > > Date: Wed, 25 Nov 2015 20:14:06 +0000 > > From: Artur Malabarba <bruce.connor.am@gmail.com> > > Cc: emacs-devel <emacs-devel@gnu.org> > > > > > 1. Character folding doesn't catch ligatures, such as æ (should it match > > > the two characters "ae")? > > > > I've no idea. It would be easy to add. > > No, I meant to ask why it doesn't work already. AFAIU, the > decomposition of ﬀ is "ff": > > (get-char-code-property ?ﬀ 'decomposition) > => (compat 102 102) > > but searching for 'f' doesn't match the ligature. (æ doesn't have a > decomposition in the Unicode database, so maybe it's a different > case.) I see. I thought this was a case of adding an ad hoc rule. I'll have to look into it over the weekend to see why f doesn't match ﬀ. > > > 2. It also doesn't match ä (a single character) with ä (2 characters, > > > which Emacs correctly composes into 1 grapheme cluster). Should it? > > > > Possibly. Since they look the same, might make things easier on users. But I > > wouldn't know as I've never seen the second version used anywhere. > > Once again, the decomposition attribute says we should match them: > > (get-char-code-property ?ä 'decomposition) > => (97 776) > > and the second character in ä is U+0308 = 776. Doesn't that say we > should have matched them? That's different. Currently we use the decomposition attribute to decide that "a" should match ä. 
Our approach so far has been that searching for the "easy to type" characters should match the "hard to type" characters, but searching for the "hard to type" characters will only match the character itself. So right now it is working as intended. We can (and I think we should) extend that last case so that searching for the "hard to type" characters will only match the character itself or its exact decomposition. ^ permalink raw reply [flat|nested] 94+ messages in thread
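The extension proposed above, where a "hard to type" character matches itself or its exact decomposition, could look roughly like this. This is a sketch under assumptions, not the change that went into Emacs: `my-char-or-decomposition-regexp' is a hypothetical name, and "exact decomposition" is read here as a multi-character decomposition without a tag symbol.

```elisp
;; Sketch only: a hypothetical helper, not part of Emacs.
(defun my-char-or-decomposition-regexp (char)
  "Return a regexp matching CHAR or its untagged multi-character decomposition.
Characters without such a decomposition match only themselves."
  (let ((dec (get-char-code-property char 'decomposition)))
    (if (and (cdr dec) (integerp (car dec)))
        ;; Either the precomposed character or the decomposed sequence.
        (concat "\\(?:"
                (regexp-quote (string char))
                "\\|"
                (regexp-quote (apply #'string dec))
                "\\)")
      (regexp-quote (string char)))))

;; U+00E4 should match itself and "a" + U+0308, but not a bare "a".
(let ((re (my-char-or-decomposition-regexp #xE4)))
  (list (string-match-p re (string #xE4))
        (string-match-p re (string ?a #x308))
        (string-match-p re "a")))
```

With the decomposition quoted earlier in the thread, the first two matches succeed and the third fails, so the asymmetry between "easy" and "hard" characters is preserved.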
* Re: Questions about isearch 2015-11-25 21:49 ` Artur Malabarba @ 2015-11-26 3:34 ` Eli Zaretskii 0 siblings, 0 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-26 3:34 UTC (permalink / raw) To: bruce.connor.am; +Cc: emacs-devel > Date: Wed, 25 Nov 2015 21:49:58 +0000 > From: Artur Malabarba <bruce.connor.am@gmail.com> > Cc: emacs-devel <emacs-devel@gnu.org> > > > > > 2. It also doesn't match ä (a single character) with ä (2 characters, > > > > which Emacs correctly composes into 1 grapheme cluster). Should it? > > > > > > Possibly. Since they look the same, might make things easier on users. But I > > > wouldn't know as I've never seen the second version used anywhere. > > > > Once again, the decomposition attribute says we should match them: > > > > (get-char-code-property ?ä 'decomposition) > > => (97 776) > > > > and the second character in ä is U+0308 = 776. Doesn't that say we > > should have matched them? > > That's different. Currently we use the decomposition attribute to decide that > "a" should match ä. Our approach so far has been that searching for the "easy > to type" characters should match the "hard to type" characters, but searching > for the "hard to type" characters will only match the character itself. So > right now it is working as intended. > > We can (and I think we should) extend that last case so that searching for the > "hard to type" characters will only match the character itself or its exact > decomposition. The first part (matching only itself) already works, AFAICS. If the latter doesn't require too deep changes, I think we should do that for Emacs 25.1, because it would be confusing not to have that. Thanks. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-25 20:36 ` Eli Zaretskii 2015-11-25 21:49 ` Artur Malabarba @ 2015-11-27 12:03 ` Artur Malabarba 2015-11-27 14:36 ` Eli Zaretskii 1 sibling, 1 reply; 94+ messages in thread From: Artur Malabarba @ 2015-11-27 12:03 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel Eli Zaretskii <eliz@gnu.org> writes: > No, I meant to ask why it doesn't work already. AFAIU, the > decomposition of ﬀ is "ff": > > (get-char-code-property ?ﬀ 'decomposition) > => (compat 102 102) > > but searching for 'f' doesn't match the ligature. It does for me. In this very buffer, if I isearch for 'f' I can get to the ligature above. Are you sure char-fold was ON when you tested? > (æ doesn't have a > decomposition in the Unicode database, so maybe it's a different > case.) True. If people think it makes sense, we can add an ad-hoc rule for 'a' to match 'æ'. >> > 2. It also doesn't match ä (a single character) with ä (2 characters, >> > which Emacs correctly composes into 1 grapheme cluster). Should it? Done now. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 12:03 ` Artur Malabarba @ 2015-11-27 14:36 ` Eli Zaretskii 2015-11-27 16:50 ` Per Starbäck 2015-11-27 16:55 ` Artur Malabarba 0 siblings, 2 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-27 14:36 UTC (permalink / raw) To: Artur Malabarba; +Cc: emacs-devel > From: Artur Malabarba <bruce.connor.am@gmail.com> > Cc: emacs-devel@gnu.org > Date: Fri, 27 Nov 2015 12:03:11 +0000 > > Eli Zaretskii <eliz@gnu.org> writes: > > > No, I meant to ask why it doesn't work already. AFAIU, the > > decomposition of ﬀ is "ff": > > > > (get-char-code-property ?ﬀ 'decomposition) > > => (compat 102 102) > > > > but searching for 'f' doesn't match the ligature. > > It does for me. In this very buffer, if I isearch for 'f' I can get to > the ligature above. Right, it does. I think I tried "ff", not "f". Is that supposed to work? > Are you sure char-fold was ON when you tested? It was in "emacs -Q", so yes. > >> > 2. It also doesn't match ä (a single character) with ä (2 characters, > >> > which Emacs correctly composes into 1 grapheme cluster). Should it? > > Done now. Thanks. But if this now works, why doesn't "ff" find ﬀ or vice versa? Isn't that the same case? ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 14:36 ` Eli Zaretskii @ 2015-11-27 16:50 ` Per Starbäck 2015-11-27 18:10 ` Artur Malabarba ` (2 more replies) 2015-11-27 16:55 ` Artur Malabarba 1 sibling, 3 replies; 94+ messages in thread From: Per Starbäck @ 2015-11-27 16:50 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Artur Malabarba, emacs-devel@gnu.org Oh, I have so many thoughts about this, but I'll stick to the character folding for now, which is why language setting in Emacs has become a lot more urgent now than it has been during the previous years I have wished for this. As I wrote before, ÅÄÖ are really separate letters in Swedish, just as separate from A and O as U is from V, or I is from J. I wrote: > To have a search for > "varpa" in a Swedish text find "värpa" or "varpå" would be just wrong. > It would give a strong impression of this being an American program > not meant to be used for Swedish. One answer I got was that it's possible to turn this off. Yes, it is, but defaults are important for what impression you give. I haven't been active on the list for some time, but when I have expressed opinions on Emacs here before it has often been not thinking about myself, but thinking about the students that I teach Emacs, so that *I* can change settings is not enough for my consideration. Also character folding is a great feature! I don't want to turn it off! It's just that it's bad to fold characters that are in no way seen as variants but totally different letters. There are few languages using Latin letters where it is like that, so any universal poll will say that this isn't a big problem. (For example Germans also use Ä and Ö, but much more seen as A-with-Umlaut than as something separate.) But to see how this will be received here, imagine that Emacs came from the Roman empire. (The empire never ended!) 
Of course we all know some Latin, so we have no problems with the menus and help texts being in Latin, even though we often use it for editing texts in other languages, like English. Now there's a new version with a new feature, character folding, and when you (an American user) try to use the new version of Emacs you happen to edit a text Can dualism still be considered valid? You do a C-s to position yourself at "valid" there, but to your surprise and irritation you have to type all five letters, because still at "vali" you are stuck in "dualism" because those imperialistic Romans think that U and V are the "same" letter. That's just wrong. So what is the right way out? A possibility to set buffer language says I. Eli says that a buffer language is not enough: Eli: > This discussion started from Isearch, so the > example which underlines the issues is searching for a string with > character-folding enabled -- this should automatically apply > language-specific rules when it hits a possible match in the Turkish > portion, then switch back to English when the match is in the English > part. I don't agree, and see this as an important difference between the language of a segment and the language of a document (which I would write a lot more about if I didn't try to stick just to the character folding issue now). If you are a non-Swede looking at a text including : Eli Heckscher referred to this in his "Varpå beror det att några : människor är rika och andra fattiga?" from 1913. and do an Isearch for "varpa" with accent folding on, you *should* find that "Varpå". You see the text with some "Varpa" with some diacritical mark, so of course you should find it with that search. You can't be expected to know about Swedish preferences just because there happens to be a short text fragment in Swedish in the text. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 16:50 ` Per Starbäck @ 2015-11-27 18:10 ` Artur Malabarba 2015-11-27 18:42 ` Per Starbäck 2015-11-27 21:33 ` raman 2016-02-28 0:27 ` Mathias Dahl 2 siblings, 1 reply; 94+ messages in thread From: Artur Malabarba @ 2015-11-27 18:10 UTC (permalink / raw) To: Per Starbäck; +Cc: Eli Zaretskii, emacs-devel On 27 Nov 2015 4:50 pm, "Per Starbäck" <per.starback@gmail.com> wrote: > As I wrote before, ÅÄÖ are really separate letters in Swedish, Do they have their own keys on the keyboard? In Portuguese, aãá are never interchangeable. Still, I find char folding very convenient because it saves me keystrokes (ã and á don't get their own keys so they require two keystrokes to type). ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 18:10 ` Artur Malabarba @ 2015-11-27 18:42 ` Per Starbäck 0 siblings, 0 replies; 94+ messages in thread From: Per Starbäck @ 2015-11-27 18:42 UTC (permalink / raw) To: Artur Malabarba; +Cc: Eli Zaretskii, emacs-devel >> As I wrote before, ÅÄÖ are really separate letters in Swedish, > > Do they have their own keys on the keyboard? > In Portuguese, aãá are never interchangeable. Still, I find char folding > very convenient because it saves me keystrokes (ã and á don't get their own > keys so they require two keystrokes to type). Yes, of course they have, as they are just as much different letters as I and J. Being interchangeable is not the same thing. For example "e" and "é" are not interchangeable in Swedish either, and "ide" and "idé" are different words, but having "C-s i d e" find "idé" would be good character folding as "é" is "e" with an accent. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 16:50 ` Per Starbäck 2015-11-27 18:10 ` Artur Malabarba @ 2015-11-27 21:33 ` raman 2016-02-28 0:27 ` Mathias Dahl 2 siblings, 0 replies; 94+ messages in thread From: raman @ 2015-11-27 21:33 UTC (permalink / raw) To: Per Starbäck; +Cc: Eli Zaretskii, Artur Malabarba, emacs-devel@gnu.org This issue may be better thought of as a character-set issue, rather than a language issue. -- ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 16:50 ` Per Starbäck 2015-11-27 18:10 ` Artur Malabarba 2015-11-27 21:33 ` raman @ 2016-02-28 0:27 ` Mathias Dahl 2016-02-28 15:58 ` Eli Zaretskii 2 siblings, 1 reply; 94+ messages in thread From: Mathias Dahl @ 2016-02-28 0:27 UTC (permalink / raw) To: Per Starbäck, emacs-devel@gnu.org; +Cc: Eli Zaretskii, Artur Malabarba [-- Attachment #1: Type: text/plain, Size: 2636 bytes --] On Fri, Nov 27, 2015 at 5:50 PM, Per Starbäck <per.starback@gmail.com> wrote: One answer I got was that it's possible to turn this off. Yes, it is, > but defaults are important for what impression you give. I haven't > been active on the list for some time, but when I have expressed > opinions on Emacs here before it has often been not thinking about > myself, but thinking about the students that I teach Emacs, so that > *I* can change settings is not enough for my consideration. > > Also character folding is a great feature! I don't want to turn it > off! It's just that it's bad to fold characters that are in no way > seen as variants but totally different letters. > I agree with Per that this new feature is problematic. I have used Emacs for soon 20 years and up until now, if I search for an "a" I find only "a". From my view, suddenly finding "ä" or "å" as well would, in my view, be to find "false hits". Surely one could argue that case folding has the same problem but I think those are less and it has been the default for as long as I have used Emacs and I think it is common in most programs to have this behavior by default. This new feature however I cannot remember seeing anywhere so it cannot be that important to have it turned on by default. I am sure the new feature is useful to some, but for me it will just be annoying. I have "ä" and "å" keys on my keyboard so I have no problem inputting them. 
When I visit other countries where people do not have such keyboards I simply turn on the Swedish input method swedish-postfix to enter these letters. I think having this feature on by default might risk annoying more users than it will benefit. I'm quite certain, if I tried to get a Swedish college to try out Emacs, that they would comment on such a feature as being quite strange. I do not agree it would be the same as finding a "u" when searching for "v", but still... Now that I know about this feature I will turn it off and enable it only when I need it, but I wish it would have been the other way around, that users who need it would need to enable it. Sorry for coming late to the party on this one... /Mathias PS. Per mentioned that the scenario with "ide" matching "idé" would be okay. I'm divided on that one. "é" is not an official part of the Swedish alphabet, like "å" and "ä", so from some perspective it would be okay, but it feels like a very slippery slope... Probably, as some have advocated here, if there would be a way to express the language for a buffer or region of text a feature like this *might* fit better. [-- Attachment #2: Type: text/html, Size: 3635 bytes --] ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2016-02-28 0:27 ` Mathias Dahl @ 2016-02-28 15:58 ` Eli Zaretskii 2016-02-28 17:52 ` Mathias Dahl 0 siblings, 1 reply; 94+ messages in thread From: Eli Zaretskii @ 2016-02-28 15:58 UTC (permalink / raw) To: Mathias Dahl; +Cc: per.starback, bruce.connor.am, emacs-devel > From: Mathias Dahl <mathias.dahl@gmail.com> > Date: Sun, 28 Feb 2016 01:27:10 +0100 > Cc: Eli Zaretskii <eliz@gnu.org>, Artur Malabarba <bruce.connor.am@gmail.com> > > I agree with Per that this new feature is problematic. I have used Emacs > for soon 20 years and up until now, if I search for an "a" I find only > "a". From my view, suddenly finding "ä" or "å" as well would, in my > view, be to find "false hits". What about finding "ä" (a 2-character sequence) when looking for "ä", or finding "å" (1 character) when looking for "å" (2 characters) -- would you consider these false hits as well? > Surely one could argue that case folding has the same problem but I > think those are less and it has been the default for as long as I > have used Emacs and I think it is common in most programs to have > this behavior by default. This new feature however I cannot remember > seeing anywhere so it cannot be that important to have it turned on > by default. Emacs has many features on by default that are not anywhere else, or weren't when Emacs introduced them. So I don't think this argument should guide our decisions. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2016-02-28 15:58 ` Eli Zaretskii @ 2016-02-28 17:52 ` Mathias Dahl 2016-02-28 18:02 ` Eli Zaretskii 0 siblings, 1 reply; 94+ messages in thread From: Mathias Dahl @ 2016-02-28 17:52 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Per Starbäck, Artur Malabarba, emacs-devel [-- Attachment #1: Type: text/plain, Size: 2342 bytes --] > > > I agree with Per that this new feature is problematic. I have used Emacs > > for soon 20 years and up until now, if I search for an "a" I find only > > "a". From my view, suddenly finding "ä" or "å" as well would, in my > > view, be to find "false hits". > > What about finding "ä" (a 2-character sequence) when looking for "ä", or > finding "å" (1 character) when looking for "å" (2 characters) -- would > you consider these false hits as well? > I have not thought about that scenario (in fact, I did not know there was a difference), but since it visually looks the same I would probably be surprised to not find the former when searching using the latter. It is a scenario that I would think is extremely unlikely to happen for "ä-users" like me though but I guess that is just anecdotal evidence. > Surely one could argue that case folding has the same problem but I > > think those are less and it has been the default for as long as I > > have used Emacs and I think it is common in most programs to have > > this behavior by default. This new feature however I cannot remember > > seeing anywhere so it cannot be that important to have it turned on > > by default. > > Emacs has many features on by default that are not anywhere else, or > weren't when Emacs introduced them. So I don't think this argument > should guide our decisions. > I don't agree. Just because this is not common in other places does not mean we must use that as the sole argument for such a decision, but I definitely think it can *guide* us, together with other arguments. Much better, of course, would be a poll among users. 
Since I came late to this discussion I don't know if such a poll was done. I have not heard about the use cases for this change either. In what scenarios is this useful, and do those scenarios happen often enough to motivate such a feature being on by default (and do they outnumber the cases where it causes problems)? I might possibly use this feature myself sometime, but it will not be the normal case. I view this a bit like the difference between a normal and a regexp isearch, with the difference that I would use this much less often than I use regexp isearch. Or "word isearch", which I never use (possibly because I don't have much need for it). [-- Attachment #2: Type: text/html, Size: 3490 bytes --] ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2016-02-28 17:52 ` Mathias Dahl @ 2016-02-28 18:02 ` Eli Zaretskii 2016-02-29 13:32 ` Richard Stallman 0 siblings, 1 reply; 94+ messages in thread From: Eli Zaretskii @ 2016-02-28 18:02 UTC (permalink / raw) To: Mathias Dahl; +Cc: per.starback, bruce.connor.am, emacs-devel > From: Mathias Dahl <mathias.dahl@gmail.com> > Date: Sun, 28 Feb 2016 18:52:38 +0100 > Cc: Per Starbäck <per.starback@gmail.com>, > Artur Malabarba <bruce.connor.am@gmail.com>, emacs-devel@gnu.org > > Much better, of course, would be a poll among users. Since I came late to > this discussion I don't know if such a poll was done. The discussions were right here, you can simply read them (provided that you have enough time ;-): http://lists.gnu.org/archive/html/emacs-devel/2016-02/msg00089.html http://lists.gnu.org/archive/html/emacs-devel/2016-02/msg00506.html ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2016-02-28 18:02 ` Eli Zaretskii @ 2016-02-29 13:32 ` Richard Stallman 2016-02-29 16:04 ` Eli Zaretskii 0 siblings, 1 reply; 94+ messages in thread From: Richard Stallman @ 2016-02-29 13:32 UTC (permalink / raw) To: Eli Zaretskii; +Cc: per.starback, emacs-devel, bruce.connor.am, mathias.dahl [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > Much better, of course, would be a poll among users. Since I came late to > > this discussion I don't know if such a poll was done. > The discussions were right here, you can simply read them (provided > that you have enough time ;-): A discussion here is the first step, but not a substitute for a poll of users. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2016-02-29 13:32 ` Richard Stallman @ 2016-02-29 16:04 ` Eli Zaretskii 2016-03-01 16:52 ` Richard Stallman 0 siblings, 1 reply; 94+ messages in thread From: Eli Zaretskii @ 2016-02-29 16:04 UTC (permalink / raw) To: rms; +Cc: per.starback, emacs-devel, bruce.connor.am, mathias.dahl > From: Richard Stallman <rms@gnu.org> > CC: mathias.dahl@gmail.com, per.starback@gmail.com, > bruce.connor.am@gmail.com, emacs-devel@gnu.org > Date: Mon, 29 Feb 2016 08:32:05 -0500 > > > > Much better, of course, would be a poll among users. Since I came late to > > > this discussion I don't know if such a poll was done. > > > The discussions were right here, you can simply read them (provided > > that you have enough time ;-): > > A discussion here is the first step, but not a substitute for a poll > of users. That discussion is the closest approximation to a poll we had, so reading it should probably be useful for someone who missed it. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2016-02-29 16:04 ` Eli Zaretskii @ 2016-03-01 16:52 ` Richard Stallman 0 siblings, 0 replies; 94+ messages in thread From: Richard Stallman @ 2016-03-01 16:52 UTC (permalink / raw) To: Eli Zaretskii; +Cc: per.starback, emacs-devel, bruce.connor.am, mathias.dahl [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > A discussion here is the first step, but not a substitute for a poll > > of users. > That discussion is the closest approximation to a poll we had, so > reading it should probably be useful for someone who missed it. It is probably pertinent reading, but if it was only on this list, it doesn't come close to a poll of the users. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 14:36 ` Eli Zaretskii 2015-11-27 16:50 ` Per Starbäck @ 2015-11-27 16:55 ` Artur Malabarba 2015-11-27 17:52 ` Eli Zaretskii 2015-11-27 21:18 ` Stephen Berman 1 sibling, 2 replies; 94+ messages in thread From: Artur Malabarba @ 2015-11-27 16:55 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel [-- Attachment #1: Type: text/plain, Size: 1399 bytes --] On 27 Nov 2015 2:36 pm, "Eli Zaretskii" <eliz@gnu.org> wrote: > > It does for me. In this very buffer, if I isearch for 'f' I can get to > > the ligature above. > > Right, it does. I think I tried "ff", not "f". Is that supposed to > work? No. We don't support having multiple characters match a single string. This is a design limitation. We can (and should) discuss improving this. But for now I think it should be documented as not supported. > > >> > 2. It also doesn't match ä (a single character) with ä (2 characters, > > >> > which Emacs correctly composes into 1 grapheme cluster). Should it? > > > > Done now. > > Thanks. > > But if this now works, why doesn't "ff" find ﬀ or vice versa? Isn't > that the same case? No. Each one is a different scenario here. - "ff" not finding ﬀ is a case where multiple chars in the search string can't be collapsed as a single thing (see above). It's the same reason why 'ä' still doesn't match ä. - ä now finds 'ä'. Because that is exactly its decomposition. - ﬀ doesn't find "ff", because the decomposition of ﬀ is not exactly (f f), it's actually (compat f f). This was a decision, it's not a limitation. I figured that a character should only match its decomposition if the decomposition is strictly made of chars. Otherwise you get things like ¹ matching 1 (which I thought we didn't want). [-- Attachment #2: Type: text/html, Size: 1811 bytes --] ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 16:55 ` Artur Malabarba @ 2015-11-27 17:52 ` Eli Zaretskii 2015-11-27 21:18 ` Stephen Berman 1 sibling, 0 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-27 17:52 UTC (permalink / raw) To: bruce.connor.am; +Cc: emacs-devel > Date: Fri, 27 Nov 2015 16:55:45 +0000 > From: Artur Malabarba <bruce.connor.am@gmail.com> > Cc: emacs-devel <emacs-devel@gnu.org> > > > Right, it does. I think I tried "ff", not "f". Is that supposed to > > work? > > No. We don't support having multiple characters match a single string. > > This is a design limitation. We can (and should) discuss improving this. But > for now I think it should be documented as not supported. Is it reasonable to have ä match ä, but not the other way around? > - ä now finds 'ä'. Because that is exactly its decomposition. > - ﬀ doesn't find "ff", because the decomposition of ﬀ is not exactly (f f), > it's actually (compat f f). This was a decision, it's not a limitation. So you are saying we support canonical decompositions, but not compatibility decompositions, I see. However, it sounds inconsistent to me, because searching for a does find ⓐ, although ⓐ's decomposition is also "not exactly a". I'm afraid it will be hard to explain to the users why some of these match, while others don't. Are there any downsides in adding compatibility decompositions to what character folding supports? > I figured that a character should only match its decomposition if the > decomposition is strictly made of chars. Otherwise you get things like ¹ > matching 1 (which I thought we didn't want). Well, I think we do want that. At least MS Word does that by default, so it isn't entirely silly or without precedent. ^ permalink raw reply [flat|nested] 94+ messages in thread
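The canonical versus compatibility distinction discussed above can be inspected directly in the Unicode Character Database. The following is a minimal sketch in Python (not Emacs Lisp, and not the character-fold implementation); the helper name `decomposition_kind` is hypothetical. It only reports which kind of decomposition each character from the thread carries.

```python
import unicodedata

def decomposition_kind(ch):
    """Return 'none', 'canonical', or the compatibility tag (e.g. 'compat')."""
    raw = unicodedata.decomposition(ch)
    if not raw:
        return "none"
    first = raw.split()[0]
    # Compatibility decompositions start with a tag such as <compat> or <super>.
    if first.startswith("<"):
        return first.strip("<>")
    return "canonical"

# Precomposed a-diaeresis, the ff ligature, superscript one, circled small a.
for ch in ("\u00e4", "\ufb00", "\u00b9", "\u24d0"):
    print(f"U+{ord(ch):04X}", decomposition_kind(ch), unicodedata.decomposition(ch))
```

Only the first of these is a canonical decomposition; the ligature, the superscript digit, and the circled letter all carry a compatibility tag, which is the inconsistency raised in the message above.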
* Re: Questions about isearch 2015-11-27 16:55 ` Artur Malabarba 2015-11-27 17:52 ` Eli Zaretskii @ 2015-11-27 21:18 ` Stephen Berman 2015-11-28 0:04 ` Artur Malabarba 2015-11-28 5:36 ` Richard Stallman 1 sibling, 2 replies; 94+ messages in thread From: Stephen Berman @ 2015-11-27 21:18 UTC (permalink / raw) To: Artur Malabarba; +Cc: Eli Zaretskii, emacs-devel On Fri, 27 Nov 2015 16:55:45 +0000 Artur Malabarba <bruce.connor.am@gmail.com> wrote: > On 27 Nov 2015 2:36 pm, "Eli Zaretskii" <eliz@gnu.org> wrote: >> > It does for me. In this very buffer, if I isearch for 'f' I can get to >> > the ligature above. >> >> Right, it does. I think I tried "ff", not "f". Is that supposed to >> work? > > No. We don't support having multiple characters match a single string. Is this why "ss" does not match the German letter "ß"? I assume the reason "s" does not match "ß" is that the latter does not have a decomposition including "s", whereas the decomposition of e.g. "ﬀ" does include "f", correct? (Though I actually think that may be the preferred behavior for the search string "s" when searching German text, in contrast to the search string "ss", which I think should be able to find "ß".) In fact, looking at the value of character-fold-table, it seems to me that the current implementation of folding based on character decomposition often yields surprising results: e.g. "f" matches not only "ﬀ" but also "㎙" and "ﬄ", but "m" and "l" fail to match the latter two, respectively. I would expect these three search strings either all to match or all to fail to match all three composed character strings. Another shortcoming is that the decompositions do not respect case-folding, e.g. "f" fails to match "ℱ" and "℻" (with case-folding enabled), whereas "F" does match them, but fails to match "ﬀ", etc. (also, "A" and "X" fail to match "℻"). Steve Berman ^ permalink raw reply [flat|nested] 94+ messages in thread
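The pattern reported above (only "f" or "F" matching, never "m", "l", "A" or "X") would be consistent with a table keyed on the first character of each decomposition. That reading is an inference from the examples in this message, not something the thread confirms. A small Python sketch, with the hypothetical helper `decomposition_chars`, lists the characters each of the five examples decomposes into:

```python
import unicodedata

def decomposition_chars(ch):
    """Characters in ch's decomposition, with any <tag> field dropped."""
    fields = unicodedata.decomposition(ch).split()
    return [chr(int(f, 16)) for f in fields if not f.startswith("<")]

examples = {
    "\ufb00": "LATIN SMALL LIGATURE FF",
    "\u3399": "SQUARE FM",
    "\ufb04": "LATIN SMALL LIGATURE FFL",
    "\u2131": "SCRIPT CAPITAL F",
    "\u213b": "FACSIMILE SIGN",
}
for ch, name in examples.items():
    chars = decomposition_chars(ch)
    print(f"U+{ord(ch):04X} {name}: first={chars[0]!r} all={chars!r}")
```

In every case the first character is "f" or "F", while the letters that fail to match appear only in later positions.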
* Re: Questions about isearch 2015-11-27 21:18 ` Stephen Berman @ 2015-11-28 0:04 ` Artur Malabarba 2015-11-28 7:49 ` Eli Zaretskii 2015-11-28 16:14 ` Stephen Berman 2015-11-28 5:36 ` Richard Stallman 1 sibling, 2 replies; 94+ messages in thread From: Artur Malabarba @ 2015-11-28 0:04 UTC (permalink / raw) To: Stephen Berman; +Cc: Eli Zaretskii, emacs-devel [-- Attachment #1: Type: text/plain, Size: 1120 bytes --] On 27 Nov 2015 9:18 pm, "Stephen Berman" <stephen.berman@gmx.net> wrote: > > No. We don't support having multiple characters match a single string. > > Is this why "ss" does not match the German letter "ß"? Indeed. > I assume the > reason "s" does not match "ß" is that the latter does not have a > decomposition including "s", whereas the decomposition of e.g. "ﬀ" does > include "f", correct? Yes. > In fact, looking at the value of character-fold-table, it seems to me > that the current implementation of folding based on character > decomposition often yields surprising results: e.g. "f" matches not only > "ﬀ" but also "㎙" and "ﬄ", but "m" and "l" fail to match the latter two, > respectively. This was by choice, and it would be trivial to change. Do others find it surprising? > Another shortcoming is that the decompositions do not respect > case-folding, e.g. "f" fails to match "ℱ" and "℻" (with case-folding > enabled), whereas "F" does match them, but fails to match "ﬀ". True. This can be fixed, I think. Could you file a bug report so we don't forget? [-- Attachment #2: Type: text/html, Size: 1589 bytes --] ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-28 0:04 ` Artur Malabarba @ 2015-11-28 7:49 ` Eli Zaretskii 2015-11-28 16:14 ` Stephen Berman 1 sibling, 0 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-28 7:49 UTC (permalink / raw) To: bruce.connor.am; +Cc: stephen.berman, emacs-devel > Date: Sat, 28 Nov 2015 00:04:33 +0000 > From: Artur Malabarba <bruce.connor.am@gmail.com> > Cc: emacs-devel <emacs-devel@gnu.org>, Eli Zaretskii <eliz@gnu.org> > > On 27 Nov 2015 9:18 pm, "Stephen Berman" <stephen.berman@gmx.net> wrote: > > > No. We don't support having multiple characters match a single string. > > > > Is this why "ss" does not match the German letter "ß"? > > Indeed. In fact, ß doesn't have a decomposition at all in the Unicode database: (get-char-code-property ?ß 'decomposition) => 223 IOW, it "decomposes" into itself, an indication of no decomposition. > > I assume the > > reason "s" does not match "ß" is that the latter does not have a > > decomposition including "s", whereas the decomposition of e.g. "ﬀ" does > > include "f", correct? > > Yes. > > > In fact, looking at the value of character-fold-table, it seems to me > > that the current implementation of folding based on character > > decomposition often yields surprising results: e.g. "f" matches not only > > "ﬀ" but also "㎙" and "ﬄ", but "m" and "l" fail to match the latter two, > > respectively. > > This was by choice, and it would be trivial to change. Do others find it > surprising? I do. I think these should match. > > Another shortcoming is that the decompositions do not respect > > case-folding, e.g. "f" fails to match "ℱ" and "℻" (with case-folding > > enabled), whereas "F" does match them, but fails to match "ﬀ". > > True. This can be fixed, I think. Could you file a bug report so we don't > forget? This should be fixed for v25.1 as well, I think. Thanks. ^ permalink raw reply [flat|nested] 94+ messages in thread
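The point about ß can be cross-checked outside Emacs. A minimal Python sketch (the helper name `has_decomposition` is hypothetical) shows that the Unicode database lists no decomposition for ß, while full case folding, a separate Unicode mechanism, does map it to "ss". Whether character folding should borrow from case folding is not something this thread settles.

```python
import unicodedata

def has_decomposition(ch):
    """True if the Unicode database lists any decomposition for ch."""
    return unicodedata.decomposition(ch) != ""

sharp_s = "\u00df"  # LATIN SMALL LETTER SHARP S
print(has_decomposition(sharp_s))
# Neither normalization form changes it, since there is nothing to decompose.
print(unicodedata.normalize("NFKD", sharp_s) == sharp_s)
# Full case folding is a different table and does expand it.
print(sharp_s.casefold())
```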
* Re: Questions about isearch 2015-11-28 0:04 ` Artur Malabarba 2015-11-28 7:49 ` Eli Zaretskii @ 2015-11-28 16:14 ` Stephen Berman 1 sibling, 0 replies; 94+ messages in thread From: Stephen Berman @ 2015-11-28 16:14 UTC (permalink / raw) To: Artur Malabarba; +Cc: Eli Zaretskii, emacs-devel On Sat, 28 Nov 2015 00:04:33 +0000 Artur Malabarba <bruce.connor.am@gmail.com> wrote: > On 27 Nov 2015 9:18 pm, "Stephen Berman" <stephen.berman@gmx.net> wrote: >> > No. We don't support having multiple characters match a single string. >> >> Is this why "ss" does not match the German letter "ß"? > > Indeed. > >> I assume the >> reason "s" does not match "ß" is that the latter does not have a >> decomposition including "s", whereas the decomposition of e.g. "ﬀ" does >> include "f", correct? > > Yes. > >> In fact, looking at the value of character-fold-table, it seems to me >> that the current implementation of folding based on character >> decomposition often yields surprising results: e.g. "f" matches not only >> "ﬀ" but also "㎙" and "ﬄ", but "m" and "l" fail to match the latter two, >> respectively. > > This was by choice, and it would be trivial to change. Do others find it surprising? > >> Another shortcoming is that the decompositions do not respect >> case-folding, e.g. "f" fails to match "ℱ" and "℻" (with case-folding >> enabled), whereas "F" does match them, but fails to match "ﬀ". > > True. This can be fixed, I think. Could you file a bug report so we don't forget? > > Although you already said you'd be working on these issues (thanks), I filed a bug for the record (bug#22038). Steve Berman ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 21:18 ` Stephen Berman 2015-11-28 0:04 ` Artur Malabarba @ 2015-11-28 5:36 ` Richard Stallman 2015-11-28 8:33 ` Eli Zaretskii 2015-11-28 8:40 ` Marcin Borkowski 1 sibling, 2 replies; 94+ messages in thread From: Richard Stallman @ 2015-11-28 5:36 UTC (permalink / raw) To: Stephen Berman; +Cc: eliz, bruce.connor.am, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] Ligatures are a different issue from letters with diacritics. I think that ideally ligatures should be equivalent, in search, to the sequence of characters they combine. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-28 5:36 ` Richard Stallman @ 2015-11-28 8:33 ` Eli Zaretskii 2015-11-28 8:40 ` Marcin Borkowski 1 sibling, 0 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-28 8:33 UTC (permalink / raw) To: rms; +Cc: stephen.berman, bruce.connor.am, emacs-devel > From: Richard Stallman <rms@gnu.org> > CC: bruce.connor.am@gmail.com, eliz@gnu.org, emacs-devel@gnu.org > Date: Sat, 28 Nov 2015 00:36:21 -0500 > > Ligatures are a different issue from letters with diacritics. I think > that ideally ligatures should be equivalent, in search, to the > sequence of characters they combine. I agree. I think we should make that work for Emacs 25.1, because anything else means too much inconsistency, and will be hard to explain and document. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-28 5:36 ` Richard Stallman 2015-11-28 8:33 ` Eli Zaretskii @ 2015-11-28 8:40 ` Marcin Borkowski 2015-11-28 9:46 ` Eli Zaretskii 1 sibling, 1 reply; 94+ messages in thread From: Marcin Borkowski @ 2015-11-28 8:40 UTC (permalink / raw) To: rms; +Cc: eliz, Stephen Berman, bruce.connor.am, emacs-devel On 2015-11-28, at 06:36, Richard Stallman <rms@gnu.org> wrote: > [[[ To any NSA and FBI agents reading my email: please consider ]]] > [[[ whether defending the US Constitution against all enemies, ]]] > [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > Ligatures are a different issue from letters with diacritics. I think > that ideally ligatures should be equivalent, in search, to the > sequence of characters they combine. Watching this discussion, I'm just astonished that no-one complained (yet?) that searching for "et" does not find "&" (and/or vice versa). ;-) Best, -- Marcin Borkowski http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski Faculty of Mathematics and Computer Science Adam Mickiewicz University ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-28 8:40 ` Marcin Borkowski @ 2015-11-28 9:46 ` Eli Zaretskii 2015-11-28 10:23 ` Artur Malabarba 0 siblings, 1 reply; 94+ messages in thread From: Eli Zaretskii @ 2015-11-28 9:46 UTC (permalink / raw) To: Marcin Borkowski; +Cc: stephen.berman, rms, bruce.connor.am, emacs-devel > From: Marcin Borkowski <mbork@mbork.pl> > Date: Sat, 28 Nov 2015 09:40:06 +0100 > Cc: eliz@gnu.org, Stephen Berman <stephen.berman@gmx.net>, > bruce.connor.am@gmail.com, emacs-devel@gnu.org > > Watching this discussion, I'm just astonished that no-one complained > (yet?) that searching for "et" does not find "&" (and/or vice versa). Why complain? Emacs lets you customize this feature to do that as well. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-28 9:46 ` Eli Zaretskii @ 2015-11-28 10:23 ` Artur Malabarba 2015-11-28 11:14 ` Eli Zaretskii ` (4 more replies) 0 siblings, 5 replies; 94+ messages in thread From: Artur Malabarba @ 2015-11-28 10:23 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Stephen Berman, Richard Stallman, emacs-devel Ok. I'm going to work on the char-folding a little bit more today to implement support for multi-char matches and to combine it with case-folding. Hopefully that will iron out the final inconsistencies. 2015-11-28 9:46 GMT+00:00 Eli Zaretskii <eliz@gnu.org>: >> From: Marcin Borkowski <mbork@mbork.pl> >> Date: Sat, 28 Nov 2015 09:40:06 +0100 >> Cc: eliz@gnu.org, Stephen Berman <stephen.berman@gmx.net>, >> bruce.connor.am@gmail.com, emacs-devel@gnu.org >> >> Watching this discussion, I'm just astonished that no-one complained >> (yet?) that searching for "et" does not find "&" (and/or vice versa). > > Why complain? Emacs lets you customize this feature to do that as > well. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-28 10:23 ` Artur Malabarba @ 2015-11-28 11:14 ` Eli Zaretskii 2015-11-28 14:41 ` Eli Zaretskii ` (3 subsequent siblings) 4 siblings, 0 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-28 11:14 UTC (permalink / raw) To: bruce.connor.am; +Cc: stephen.berman, rms, emacs-devel > Date: Sat, 28 Nov 2015 10:23:12 +0000 > From: Artur Malabarba <bruce.connor.am@gmail.com> > Cc: Marcin Borkowski <mbork@mbork.pl>, Richard Stallman <rms@gnu.org>, Stephen Berman <stephen.berman@gmx.net>, > emacs-devel <emacs-devel@gnu.org> > > Ok. I'm going to work on the char-folding a little bit more today to > implement support for multi-char matches and to combine it with > case-folding. Hopefully that will iron out the final inconsistencies. Thanks. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-28 10:23 ` Artur Malabarba 2015-11-28 11:14 ` Eli Zaretskii @ 2015-11-28 14:41 ` Eli Zaretskii 2015-11-28 15:41 ` Artur Malabarba 2015-11-28 16:48 ` character folding future [was: Questions about isearch] Drew Adams ` (2 subsequent siblings) 4 siblings, 1 reply; 94+ messages in thread From: Eli Zaretskii @ 2015-11-28 14:41 UTC (permalink / raw) To: bruce.connor.am; +Cc: emacs-devel > Date: Sat, 28 Nov 2015 10:23:12 +0000 > From: Artur Malabarba <bruce.connor.am@gmail.com> > Cc: Stephen Berman <stephen.berman@gmx.net>, Richard Stallman <rms@gnu.org>, > emacs-devel <emacs-devel@gnu.org> > > Ok. I'm going to work on the char-folding a little bit more today to > implement support for multi-char matches and to combine it with > case-folding. Hopefully that will iron out the final inconsistencies. Maybe you could also take a look at this document: http://www.unicode.org/reports/tr30/tr30-4.html (This is a draft of a report that was never approved, but that doesn't mean it cannot teach us something useful.) In particular, section 5.2 there mentions several problematic foldings, which we might consider disabling. For example, the ones mentioned in 5.2.1 and 5.2.2. Thanks. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-28 14:41 ` Eli Zaretskii @ 2015-11-28 15:41 ` Artur Malabarba 2015-11-28 16:29 ` Artur Malabarba 0 siblings, 1 reply; 94+ messages in thread From: Artur Malabarba @ 2015-11-28 15:41 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel 2015-11-28 14:41 GMT+00:00 Eli Zaretskii <eliz@gnu.org>: >> Date: Sat, 28 Nov 2015 10:23:12 +0000 >> From: Artur Malabarba <bruce.connor.am@gmail.com> >> Cc: Stephen Berman <stephen.berman@gmx.net>, Richard Stallman <rms@gnu.org>, >> emacs-devel <emacs-devel@gnu.org> >> >> Ok. I'm going to work on the char-folding a little bit more today to >> implement support for multi-char matches and to combine it with >> case-folding. Hopefully that will iron out the final inconsistencies. I'm running bootstrap now to make sure I didn't break anything. Then I'll push. > Maybe you could also take a look at this document: > > http://www.unicode.org/reports/tr30/tr30-4.html > > In particular, section 5.2 there mentions several problematic > foldings, which we might consider disabling. For example, the ones > mentioned in 5.2.1 and 5.2.2. Thanks for the pointer. None of those really worry me WRT searching. Char folding is supposed to be convenient at the cost of being unable to distinguish some strings. But I guess they could be a problem for query-replace. Someone replacing 58 with 59 probably doesn't want to replace 5⑧ with 59. Since folding is disabled by default on query-replace, I think it would be a bit of a shame to disable these "risky" foldings completely. Perhaps query-replace could use a different char table, with only a subset. Or perhaps it would be sufficient to just make this "danger" very clear in the docstring of `replace-character-fold'. ^ permalink raw reply [flat|nested] 94+ messages in thread
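The "58 versus 5⑧" concern can be made concrete with a few lines of Python. This is only a sketch of the risk, using NFKC normalization as a stand-in for compatibility folding; it is not how query-replace is implemented, and the function name `count_hits` is hypothetical.

```python
import unicodedata

def count_hits(pattern, text, fold=False):
    """Count occurrences of pattern in text, optionally after NFKC folding."""
    if fold:
        pattern = unicodedata.normalize("NFKC", pattern)
        text = unicodedata.normalize("NFKC", text)
    return text.count(pattern)

text = "items 58 and 5\u2467"  # the second item ends in CIRCLED DIGIT EIGHT
print("literal hits:", count_hits("58", text))
print("folded hits: ", count_hits("58", text, fold=True))
```

With folding the circled digit becomes an ordinary 8, so a replacement of 58 by 59 would also touch the second item, which is the case the message above warns about.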
* Re: Questions about isearch 2015-11-28 15:41 ` Artur Malabarba @ 2015-11-28 16:29 ` Artur Malabarba 2015-11-28 17:27 ` Eli Zaretskii 2015-11-28 17:44 ` Eli Zaretskii 0 siblings, 2 replies; 94+ messages in thread From: Artur Malabarba @ 2015-11-28 16:29 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel 2015-11-28 15:41 GMT+00:00 Artur Malabarba <bruce.connor.am@gmail.com>: > 2015-11-28 14:41 GMT+00:00 Eli Zaretskii <eliz@gnu.org>: >>> Date: Sat, 28 Nov 2015 10:23:12 +0000 >>> From: Artur Malabarba <bruce.connor.am@gmail.com> >>> Cc: Stephen Berman <stephen.berman@gmx.net>, Richard Stallman <rms@gnu.org>, >>> emacs-devel <emacs-devel@gnu.org> >>> >>> Ok. I'm going to work on the char-folding a little bit more today to >>> implement support for multi-char matches and to combine it with >>> case-folding. Hopefully that will iron out the final inconsistencies. > > I'm running bootstrap now to make sure I didn't break anything. Then I'll push. It is now pushed. I changed quite a bit of the logic, so please do look out for regressions. Things we do now: - 'ä' matches 'ä' - 'ä' matches 'ä' - 'a' matches both of them - 'ff' matches 'ﬀ' - 'ﬀ' does NOT match 'ff'. This is by choice, because the decomposition of 'ﬀ' is actually (compat f f). We can change this choice if desired. - `case-fold-search' is respected. ^ permalink raw reply [flat|nested] 94+ messages in thread
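The first three rules in the list above follow from canonical equivalence. A toy Python model, which is an assumption-laden sketch and not the pushed Emacs code, folds a string by decomposing it canonically (NFD) and dropping combining marks; the name `strip_marks` is hypothetical. The last two lines show why the ligature is treated differently: canonical decomposition leaves it alone, and only compatibility decomposition (NFKD) expands it.

```python
import unicodedata

def strip_marks(s):
    """Canonically decompose s and drop combining marks (toy folding)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

precomposed = "\u00e4"   # one code point
decomposed = "a\u0308"   # 'a' followed by COMBINING DIAERESIS

# Both spellings fold to plain 'a', so 'a' can match either of them.
print(strip_marks(precomposed) == strip_marks(decomposed) == "a")

# Canonical decomposition does not touch the ff ligature; compatibility does.
print(unicodedata.normalize("NFD", "\ufb00") == "\ufb00")
print(unicodedata.normalize("NFKD", "\ufb00") == "ff")
```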
* Re: Questions about isearch 2015-11-28 16:29 ` Artur Malabarba @ 2015-11-28 17:27 ` Eli Zaretskii 2015-11-28 17:44 ` Eli Zaretskii 1 sibling, 0 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-28 17:27 UTC (permalink / raw) To: bruce.connor.am; +Cc: emacs-devel > Date: Sat, 28 Nov 2015 16:29:00 +0000 > From: Artur Malabarba <bruce.connor.am@gmail.com> > Cc: emacs-devel <emacs-devel@gnu.org> > > It is now pushed. Thanks! > - 'ff' matches 'ﬀ' > - 'ﬀ' does NOT match 'ff'. This is by choice, because the > decomposition of 'ﬀ' is actually (compat f f). We can change this > choice if desired. I think the last one should also match. It is very hard to explain to users this asymmetry. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-28 16:29 ` Artur Malabarba 2015-11-28 17:27 ` Eli Zaretskii @ 2015-11-28 17:44 ` Eli Zaretskii 2015-11-28 18:31 ` Artur Malabarba 1 sibling, 1 reply; 94+ messages in thread From: Eli Zaretskii @ 2015-11-28 17:44 UTC (permalink / raw) To: bruce.connor.am; +Cc: emacs-devel > Date: Sat, 28 Nov 2015 16:29:00 +0000 > From: Artur Malabarba <bruce.connor.am@gmail.com> > Cc: emacs-devel <emacs-devel@gnu.org> > > It is now pushed. I changed quite a bit of the logic, so please do > look out for regressions. Two of the tests are failing for me: Test character-fold--test-consistency condition: (invalid-regexp "Regular expression too big") FAILED 1/4 character-fold--test-consistency passed 2/4 character-fold--test-fold-to-regexp Test character-fold--test-lax-whitespace condition: (invalid-regexp "Regular expression too big") FAILED 3/4 character-fold--test-lax-whitespace passed 4/4 character-fold--test-some-defaults Let me know if I can provide more information. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-28 17:44 ` Eli Zaretskii @ 2015-11-28 18:31 ` Artur Malabarba 2015-11-28 18:57 ` Eli Zaretskii 0 siblings, 1 reply; 94+ messages in thread From: Artur Malabarba @ 2015-11-28 18:31 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel On 28 Nov 2015 5:44 pm, "Eli Zaretskii" <eliz@gnu.org> wrote: > Two of the tests are failing for me: > > Test character-fold--test-consistency condition: > (invalid-regexp "Regular expression too big") > FAILED 1/4 character-fold--test-consistency > passed 2/4 character-fold--test-fold-to-regexp > Test character-fold--test-lax-whitespace condition: > (invalid-regexp "Regular expression too big") > FAILED 3/4 character-fold--test-lax-whitespace > passed 4/4 character-fold--test-some-defaults > > Let me know if I can provide more information. Yes, I was getting this too. I reduced the length of the random strings in the test from 100 to 50 in order to stop getting this. But it looks like your system wants it to be even lower. Can you try reducing it a bit more? Sadly, if you're forced to make it too small, then we'll have to think of another way to handle this. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-28 18:31 ` Artur Malabarba @ 2015-11-28 18:57 ` Eli Zaretskii 2015-11-28 20:00 ` Artur Malabarba 0 siblings, 1 reply; 94+ messages in thread From: Eli Zaretskii @ 2015-11-28 18:57 UTC (permalink / raw) To: bruce.connor.am; +Cc: emacs-devel > Date: Sat, 28 Nov 2015 18:31:58 +0000 > From: Artur Malabarba <bruce.connor.am@gmail.com> > Cc: emacs-devel <emacs-devel@gnu.org> > > > Test character-fold--test-consistency condition: > > (invalid-regexp "Regular expression too big") > > FAILED 1/4 character-fold--test-consistency > > passed 2/4 character-fold--test-fold-to-regexp > > Test character-fold--test-lax-whitespace condition: > > (invalid-regexp "Regular expression too big") > > FAILED 3/4 character-fold--test-lax-whitespace > > passed 4/4 character-fold--test-some-defaults > > > > Let me know if I can provide more information. > > Yes, I was getting this too. I reduced the length of the random strings in the > test from 100 to 50 in order to stop getting this. But it looks like your > system wants it to be even lower. > > Can you try reducing it a bit more? This works for me: diff --git a/test/automated/character-fold-tests.el b/test/automated/character-fold-tests.el index 3a288b9..cf19584 100644 --- a/test/automated/character-fold-tests.el +++ b/test/automated/character-fold-tests.el @@ -37,13 +37,13 @@ character-fold--test-search-with-contents \f (ert-deftest character-fold--test-consistency () - (dotimes (n 50) + (dotimes (n 30) (let ((w (character-fold--random-word n))) ;; A folded string should always match the original string. (character-fold--test-search-with-contents w w)))) (ert-deftest character-fold--test-lax-whitespace () - (dotimes (n 50) + (dotimes (n 40) (let ((w1 (character-fold--random-word n)) (w2 (character-fold--random-word n)) (search-spaces-regexp "\\s-+")) ^ permalink raw reply related [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-28 18:57 ` Eli Zaretskii @ 2015-11-28 20:00 ` Artur Malabarba 2015-11-28 20:08 ` Artur Malabarba 0 siblings, 1 reply; 94+ messages in thread From: Artur Malabarba @ 2015-11-28 20:00 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel Ok. I think that's still plausible for interactive uses. I'll add a comment to the docstring warning about the danger of long regexps, and I'll make sure isearch acts gracefully if such a situation is ever encountered. Still, this function can probably be optimized. I'll try to revisit it before release. 2015-11-28 18:57 GMT+00:00 Eli Zaretskii <eliz@gnu.org>: >> Date: Sat, 28 Nov 2015 18:31:58 +0000 >> From: Artur Malabarba <bruce.connor.am@gmail.com> >> Cc: emacs-devel <emacs-devel@gnu.org> >> >> > Test character-fold--test-consistency condition: >> > (invalid-regexp "Regular expression too big") >> > FAILED 1/4 character-fold--test-consistency >> > passed 2/4 character-fold--test-fold-to-regexp >> > Test character-fold--test-lax-whitespace condition: >> > (invalid-regexp "Regular expression too big") >> > FAILED 3/4 character-fold--test-lax-whitespace >> > passed 4/4 character-fold--test-some-defaults >> > >> > Let me know if I can provide more information. >> >> Yes, I was getting this too. I reduced the length of the random strings in the >> test from 100 to 50 in order to stop getting this. But it looks like your >> system wants it to be even lower. >> >> Can you try reducing it a bit more? 
> > This works for me: > > diff --git a/test/automated/character-fold-tests.el b/test/automated/character-fold-tests.el > index 3a288b9..cf19584 100644 > --- a/test/automated/character-fold-tests.el > +++ b/test/automated/character-fold-tests.el > @@ -37,13 +37,13 @@ character-fold--test-search-with-contents > > > (ert-deftest character-fold--test-consistency () > - (dotimes (n 50) > + (dotimes (n 30) > (let ((w (character-fold--random-word n))) > ;; A folded string should always match the original string. > (character-fold--test-search-with-contents w w)))) > > (ert-deftest character-fold--test-lax-whitespace () > - (dotimes (n 50) > + (dotimes (n 40) > (let ((w1 (character-fold--random-word n)) > (w2 (character-fold--random-word n)) > (search-spaces-regexp "\\s-+")) ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-28 20:00 ` Artur Malabarba @ 2015-11-28 20:08 ` Artur Malabarba 2015-11-28 20:47 ` Eli Zaretskii 0 siblings, 1 reply; 94+ messages in thread From: Artur Malabarba @ 2015-11-28 20:08 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel 2015-11-28 20:00 GMT+00:00 Artur Malabarba <bruce.connor.am@gmail.com>: > Ok. I think that's still plausible for interactive uses. I'll add a > comment to the docstring warning about the danger of long regexps, and > I'll make sure isearch acts gracefully if such a situation is ever > encountered. Or is there a fixed limit I can use for the max size of a regexp? ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-28 20:08 ` Artur Malabarba @ 2015-11-28 20:47 ` Eli Zaretskii 0 siblings, 0 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-28 20:47 UTC (permalink / raw) To: bruce.connor.am; +Cc: emacs-devel > Date: Sat, 28 Nov 2015 20:08:52 +0000 > From: Artur Malabarba <bruce.connor.am@gmail.com> > Cc: emacs-devel <emacs-devel@gnu.org> > > 2015-11-28 20:00 GMT+00:00 Artur Malabarba <bruce.connor.am@gmail.com>: > > Ok. I think that's still plausible for interactive uses. I'll add a > > comment to the docstring warning about the danger of long regexps, and > > I'll make sure isearch acts gracefully if such a situation is ever > > encountered. > > Or is there a fixed limit I can use for the max size of a regexp? AFAIU, it's MAX_BUF_SIZE in regex.c. ^ permalink raw reply [flat|nested] 94+ messages in thread
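[Editorial aside, not part of the original messages.] The "act gracefully" idea discussed above could look roughly like the following. This is a sketch under stated assumptions, not the change that was actually made to isearch: the function name `my/search-forward-folded' is hypothetical, and the regexp builder is passed in as an argument (in the thread it would be `character-fold-to-regexp') so that the example does not depend on any particular Emacs version.

```elisp
;; Hypothetical wrapper (an illustration, not code from the thread):
;; search with a generated regexp, and fall back to a literal search
;; when the regexp compiler rejects it.
(defun my/search-forward-folded (string to-regexp)
  "Search forward for STRING using the regexp built by TO-REGEXP.
TO-REGEXP is a function taking STRING and returning a regexp.
If compiling that regexp signals `invalid-regexp' (for example
\"Regular expression too big\"), search for STRING literally."
  (condition-case nil
      (re-search-forward (funcall to-regexp string) nil t)
    (invalid-regexp
     (search-forward string nil t))))
```

Falling back to a literal search loses the folding for that one query, which seems preferable to an error in the middle of an interactive search; checking the regexp length against a fixed limit beforehand, as asked above, would be an alternative.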
* character folding future [was: Questions about isearch] 2015-11-28 10:23 ` Artur Malabarba 2015-11-28 11:14 ` Eli Zaretskii 2015-11-28 14:41 ` Eli Zaretskii @ 2015-11-28 16:48 ` Drew Adams 2015-11-28 18:34 ` Artur Malabarba 2015-12-01 11:34 ` Artur Malabarba 2015-11-29 6:03 ` Questions about isearch Richard Stallman 2015-11-29 9:39 ` Andreas Röhler 4 siblings, 2 replies; 94+ messages in thread From: Drew Adams @ 2015-11-28 16:48 UTC (permalink / raw) To: bruce.connor.am, Eli Zaretskii Cc: Stephen Berman, Richard Stallman, emacs-devel > Ok. I'm going to work on the char-folding a little bit more today to > implement support for multi-char matches and to combine it with > case-folding. Hopefully that will iron out the final inconsistencies. Thanks for working on this, Artur. I invite you to also take a look at some code I wrote for this, which I've put in `character-fold+.el'. It follows a previous discussion. Any of that, or similar, that gets added to vanilla Emacs will mean one less thing for me to bother with. ;-) A description is here: http://www.emacswiki.org/emacs/CharacterFoldPlus. The code is here: http://www.emacswiki.org/emacs/download/character-fold%2b.el The additions are essentially these: 1. An option, `char-fold-ad-hoc', for the ad hoc char foldings. Default value: the same ad hoc foldings as vanilla Emacs (quotation marks). 2. A Boolean option, `char-fold-symmetric', which when non-nil means that all members of a folding equivalence class are treated equivalently, whether base char, compositions, or other strings of chars. This lets you search for e' or é and find e and any of the other members of its class (including composition strings). The default value is nil (off). 3. A general workhorse function, `update-char-fold-table', that updates the value of variable `character-fold-table' (from which it was derived). It is used when option `char-fold-symmetric' is toggled, and it makes use of options `char-fold-ad-hoc' and `char-fold-symmetric'. 4. 
`character-fold-to-regexp' is advised, to reflect whether char folding is currently symmetric. Library Isearch+ provides a toggle for `char-fold-symmetric', bound by default to `M-s =' during Isearch. Another Isearch toggle can be useful when char folding is symmetric: `M-s h L', which toggles lazy highlighting, which can slow things down when using symmetric char folding. The code for `isearch+.el' is here: http://www.emacswiki.org/emacs/download/isearch%2b.el Earlier, I invited a discussion about future customization of character folding (and folding in general). That hasn't happened, so far. But `char-fold-ad-hoc' could be a start. One possibility is for an alist option, whose entries would each be a list (MODES CLASSES), where CLASSES is a list of char-folding classes such as that of `char-fold-ad-hoc'. When any of the MODES is current, those CLASSES would be used by `update-char-fold-table'. Users could thus: 1. Add their own equivalence classes. 2. Associate any number of such classes with particular modes. 3. Customize the ad hoc classes used by default. In addition, we could provide the class that abstracts from diacriticals explicitly, as another, non-customizable (?) class, so that users could include or exclude it too wrt specific modes. (Currently it is implicit in char folding, i.e., hard-coded.) Letting users exclude the broad diacritical class and include their own classes would accommodate wanting some diacritical foldings but not others. With symmetric folding it should offer considerable flexibility. Utility functions that do some of the work currently done by `update-char-fold-table' could be created, to be used by users to easily create their own diacritical classes. Currently, that part is still hard-coded (only ad hoc foldings are open to user customization, so far). ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: character folding future [was: Questions about isearch] 2015-11-28 16:48 ` character folding future [was: Questions about isearch] Drew Adams @ 2015-11-28 18:34 ` Artur Malabarba 2015-12-01 11:34 ` Artur Malabarba 1 sibling, 0 replies; 94+ messages in thread From: Artur Malabarba @ 2015-11-28 18:34 UTC (permalink / raw) To: Drew Adams; +Cc: Eli Zaretskii, Stephen Berman, emacs-devel 2015-11-28 16:48 GMT+00:00 Drew Adams <drew.adams@oracle.com>: > > A description is here: > http://www.emacswiki.org/emacs/CharacterFoldPlus. > The code is here: > http://www.emacswiki.org/emacs/download/character-fold%2b.el Thanks for the links Drew. I'll have a look at your code to see how you tackled these items. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: character folding future [was: Questions about isearch] 2015-11-28 16:48 ` character folding future [was: Questions about isearch] Drew Adams 2015-11-28 18:34 ` Artur Malabarba @ 2015-12-01 11:34 ` Artur Malabarba 2015-12-01 15:48 ` Drew Adams 1 sibling, 1 reply; 94+ messages in thread From: Artur Malabarba @ 2015-12-01 11:34 UTC (permalink / raw) To: Drew Adams; +Cc: Eli Zaretskii, Stephen Berman, Richard Stallman, emacs-devel 2015-11-28 16:48 GMT+00:00 Drew Adams <drew.adams@oracle.com>: > 1. An option, `char-fold-ad-hoc', for the ad hoc char foldings. > Default value: the same ad hoc foldings as vanilla Emacs > (quotation marks). Thanks for the code again. A list of ad-hoc foldings for the user to customize (your `char-fold-ad-hoc') is something I want too (ideally, as soon as 25.1). The reason I didn't include it initially is that the character-fold-table can take many seconds to generate, so it's pretty important that it be generated at compile time. I suppose one solution is to make it a defcustom with a :set property that updates the char-fold-table, and clearly state in the docstring that editing this variable can add several seconds to emacs startup time. ^ permalink raw reply [flat|nested] 94+ messages in thread
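[Editorial aside, not part of the original message.] The "defcustom with a :set property" idea suggested above could be sketched as follows. Everything named `my/...' is hypothetical and stands in for the real option and table; the rebuild function here is deliberately trivial, whereas regenerating the real `character-fold-table' is the expensive step the message is worried about.

```elisp
;; Hypothetical names throughout (an illustration, not code from the
;; thread): an option whose :set function rebuilds a derived table.
(defvar my/fold-table nil
  "Derived char-table mapping a character to the strings it also matches.")

(defun my/rebuild-fold-table (classes)
  "Rebuild `my/fold-table' from CLASSES, a list of (CHAR STRING...)."
  (let ((table (make-char-table 'my/fold-table)))
    (dolist (class classes)
      (aset table (car class) (cdr class)))
    (setq my/fold-table table)))

(defcustom my/fold-ad-hoc
  ;; ASCII double quote also matches the curly double quotes.
  '((?\" "\u201c" "\u201d"))
  "Ad hoc foldings, as a list of (CHAR STRING...).
Changing this option rebuilds `my/fold-table', which in a real
implementation could take noticeable time."
  :type '(repeat (cons character (repeat string)))
  :set (lambda (symbol value)
         (set-default symbol value)
         (my/rebuild-fold-table value))
  :group 'isearch)
```

With this arrangement the table is rebuilt when the option is first defined and whenever it is changed through Customize or `customize-set-variable'; a plain `setq' would bypass the :set function and leave the table stale, which is one reason to say so in the docstring.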
* RE: character folding future [was: Questions about isearch] 2015-12-01 11:34 ` Artur Malabarba @ 2015-12-01 15:48 ` Drew Adams 2015-12-03 23:54 ` Artur Malabarba 0 siblings, 1 reply; 94+ messages in thread From: Drew Adams @ 2015-12-01 15:48 UTC (permalink / raw) To: bruce.connor.am Cc: Eli Zaretskii, Stephen Berman, Richard Stallman, emacs-devel > > 1. An option, `char-fold-ad-hoc', for the ad hoc char foldings. > > Default value: the same ad hoc foldings as vanilla Emacs > > (quotation marks). > > Thanks for the code again. A list of ad-hoc foldings for the user to > customize (your `char-fold-ad-hoc') is something I want too (ideally, > as soon as 25.1). The reason I didn't include it initially is that the > character-fold-table can take many seconds to generate, so it's pretty > important that it be generated at compile time. > > I suppose one solution is to make it a defcustom with a :set property > that updates the char-fold-table, and clearly state in the docstring > that editing this variable can add several seconds to emacs startup > time. 1. Do you really see that "character-fold-table can take many seconds to generate"? I don't see that, AFAICT. 2. Have you tried it? What difference do you see in the generation time? Did you really see that it "can add several seconds"? 3. I do it now in `character-fold+.el'. (Did it for `char-fold-symmetric' from the outset, but just now added it also for `char-fold-ad-hoc'.) And AFAICT there is no noticeable time difference in initializing, and none for updating `character-fold-table' after a user customizes `char-fold-ad-hoc'. Not noticeable is quite different from "many seconds" or "several seconds". Maybe this is platform dependent? I'm using MS Windows 7 on an average laptop that is a few years old (nothing special wrt memory or CPU). Do you see the same thing I see, in terms of time, if you try `character-fold+.el'? 4. 
There _is_ a noticeable delay when a user customizes `char-fold-symmetric', of course - that does a lot more work. But there is no delay for that initially. It is off by default, so `update-char-fold-table' does nothing with it, except when a user customizes or toggles it on. 5. I think it makes sense in any case to factor out the code that creates/updates the table (as in my function `update-char-fold-table'). ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: character folding future [was: Questions about isearch] 2015-12-01 15:48 ` Drew Adams @ 2015-12-03 23:54 ` Artur Malabarba 0 siblings, 0 replies; 94+ messages in thread From: Artur Malabarba @ 2015-12-03 23:54 UTC (permalink / raw) To: Drew Adams; +Cc: Eli Zaretskii, Stephen Berman, Richard Stallman, emacs-devel Drew Adams <drew.adams@oracle.com> writes: > 1. Do you really see that "character-fold-table can take many > seconds to generate"? I don't see that, AFAICT. > > 2. ... No, you're right. I guess I was still carrying my memories from the initial implementations, which did take a few seconds. The current version takes ~0.3 sec on my machine if byte-compiled. While that's far from pleasant (for a lot of people, +0.3 sec of startup time would be noticeable), I guess it's reasonable enough. After all, the user will be warned in the docstring about this caveat. > 5. I think it makes sense in any case to factor out the > code that creates/updates the table (as in my function > `update-char-fold-table'). I agree. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-28 10:23 ` Artur Malabarba ` (2 preceding siblings ...) 2015-11-28 16:48 ` character folding future [was: Questions about isearch] Drew Adams @ 2015-11-29 6:03 ` Richard Stallman 2015-11-29 15:48 ` Eli Zaretskii 2015-11-29 9:39 ` Andreas Röhler 4 siblings, 1 reply; 94+ messages in thread From: Richard Stallman @ 2015-11-29 6:03 UTC (permalink / raw) To: bruce.connor.am; +Cc: eliz, stephen.berman, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] Would you please set it up so that ^J in search does not match anything but a newline? I often used to search for a blank line with C-s C-j C-j. I often used to search for WORD at the start of a line with C-s C-j WORD. They are both broken now. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-29 6:03 ` Questions about isearch Richard Stallman @ 2015-11-29 15:48 ` Eli Zaretskii 0 siblings, 0 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-29 15:48 UTC (permalink / raw) To: rms; +Cc: stephen.berman, bruce.connor.am, emacs-devel > From: Richard Stallman <rms@gnu.org> > CC: eliz@gnu.org, stephen.berman@gmx.net, emacs-devel@gnu.org > Date: Sun, 29 Nov 2015 01:03:40 -0500 > > Would you please set it up so that ^J in search > does not match anything but a newline? > > I often used to search for a blank line with C-s C-j C-j. > I often used to search for WORD at the start of a line > with C-s C-j WORD. They are both broken now. Both of these work for me, on the emacs-25 branch and on master. When did you last update your repository? ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-28 10:23 ` Artur Malabarba ` (3 preceding siblings ...) 2015-11-29 6:03 ` Questions about isearch Richard Stallman @ 2015-11-29 9:39 ` Andreas Röhler 2015-11-29 15:52 ` Eli Zaretskii 2015-11-30 16:05 ` Paul Eggert 4 siblings, 2 replies; 94+ messages in thread From: Andreas Röhler @ 2015-11-29 9:39 UTC (permalink / raw) To: emacs-devel; +Cc: Eli Zaretskii, Artur Malabarba Am 28.11.2015 um 11:23 schrieb Artur Malabarba: > Ok. I'm going to work on the char-folding a little bit more today to > implement support for multi-char matches and to combine it with > case-folding. Hopefully that will iron out the final inconsistencies. As mentioned earlier, this runs into the infinite. Not only new languages arise every day. Think also at ancient languages. Think at math and new symbolic languages. The possibilities of combining known and still unknown characters tend to be infinite. Char-folding is an indo-european centric sledge-hammer: heavy and limited. > > 2015-11-28 9:46 GMT+00:00 Eli Zaretskii <eliz@gnu.org>: >>> From: Marcin Borkowski <mbork@mbork.pl> >>> Date: Sat, 28 Nov 2015 09:40:06 +0100 >>> Cc: eliz@gnu.org, Stephen Berman <stephen.berman@gmx.net>, >>> bruce.connor.am@gmail.com, emacs-devel@gnu.org >>> >>> Watching this discussion, I'm just astonished that no-one complained >>> (yet?) that searching for "et" does not find "&" (and/or vice versa). >> >> Why complain? Emacs lets you customize this feature to do that as >> well. > ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-29 9:39 ` Andreas Röhler @ 2015-11-29 15:52 ` Eli Zaretskii 2015-11-30 9:39 ` Andreas Röhler 2015-11-30 16:05 ` Paul Eggert 1 sibling, 1 reply; 94+ messages in thread From: Eli Zaretskii @ 2015-11-29 15:52 UTC (permalink / raw) To: Andreas Röhler; +Cc: bruce.connor.am, emacs-devel > Cc: Eli Zaretskii <eliz@gnu.org>, Artur Malabarba > <bruce.connor.am@gmail.com>, Marcin Borkowski <mbork@mbork.pl> > From: Andreas Röhler <andreas.roehler@online.de> > Date: Sun, 29 Nov 2015 10:39:06 +0100 > > Not only new languages arise every day. Think also at ancient languages. Emacs's search capabilities are language-agnostic. So the number of languages, whether finite or infinite, doesn't affect the issues being discussed. Generally, with very few exceptions, letters and symbols that belong to some script are not folded to or with characters of other scripts, because the Unicode database precludes that. So each new language and script simply adds more assigned codepoints for its characters, but has no effect whatsoever on character folding or on Emacs search capabilities. > Think at math and new symbolic languages. Are you saying that searching for א should not find ℵ? Or that looking for π should not find ℼ? Or 1 shouldn't find 𝟏? Not even as an option? Why should we deprive Emacs users of such an important feature? Those who don't want it can always customize their Emacs not to do that. > Char-folding is an indo-european centric sledge-hammer: heavy and limited. This is nothing but unfounded name-calling. Please don't. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-29 15:52 ` Eli Zaretskii @ 2015-11-30 9:39 ` Andreas Röhler 2015-11-30 15:53 ` Eli Zaretskii 0 siblings, 1 reply; 94+ messages in thread From: Andreas Röhler @ 2015-11-30 9:39 UTC (permalink / raw) To: Eli Zaretskii; +Cc: bruce.connor.am, emacs-devel Am 29.11.2015 um 16:52 schrieb Eli Zaretskii: >> Cc: Eli Zaretskii <eliz@gnu.org>, Artur Malabarba >> <bruce.connor.am@gmail.com>, Marcin Borkowski <mbork@mbork.pl> >> From: Andreas Röhler <andreas.roehler@online.de> >> Date: Sun, 29 Nov 2015 10:39:06 +0100 >> >> Not only new languages arise every day. Think also at ancient languages. > > Emacs's search capabilities are language-agnostic. AFAIU the notion of case-folding refers to the idea of upper- or lowercase - which makes sense only with a couple of languages. If case-folding is off by default, there should be no harm. So let's wait for the next report resp. feature requests WRT to folding. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-30 9:39 ` Andreas Röhler @ 2015-11-30 15:53 ` Eli Zaretskii 0 siblings, 0 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-30 15:53 UTC (permalink / raw) To: Andreas Röhler; +Cc: bruce.connor.am, emacs-devel > Cc: emacs-devel@gnu.org, bruce.connor.am@gmail.com, mbork@mbork.pl > From: Andreas Röhler <andreas.roehler@online.de> > Date: Mon, 30 Nov 2015 10:39:45 +0100 > > Emacs's search capabilities are language-agnostic. > > AFAIU the notion of case-folding refers to the idea of upper- or lowercase - which makes sense only with a couple of languages. That appears to be incorrect, because UCD, the Unicode Character Database, is not tailored to any language in particular, and yet it does specify letter-case pairs for many characters beyond ASCII. The language-specific variations to this basic data are then provided by further databases, such as CLDR. Emacs currently supports only the language-independent part of case folding, character equivalences, and other related features. Addition of new languages does not and cannot affect that. > If case-folding is off by default, there should be no harm. Case folding was ON by default in Emacs since about forever. It is natural to many, and can be easily turned off by those who don't like it. That is why, IMO, we hear almost no complaints about it. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-29 9:39 ` Andreas Röhler 2015-11-29 15:52 ` Eli Zaretskii @ 2015-11-30 16:05 ` Paul Eggert 1 sibling, 0 replies; 94+ messages in thread From: Paul Eggert @ 2015-11-30 16:05 UTC (permalink / raw) To: Andreas Röhler, emacs-devel On 11/29/2015 01:39 AM, Andreas Röhler wrote: > Char-folding is an indo-european centric sledge-hammer While that may be true, it is a very commonly-used sledgehammer. Sometimes sledgehammers are good tools to use. As Eli says, case-folding has been on by default in Emacs for ages. Case-folding is also Indo-European-centric, but that has been OK. As char-folding by and large does not affect unicase alphabets I don't see why readers and writers of non-Indo-European text would care about it one way or another. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-25 20:14 ` Artur Malabarba 2015-11-25 20:30 ` Marcin Borkowski 2015-11-25 20:36 ` Eli Zaretskii @ 2015-11-26 16:08 ` Rasmus 2 siblings, 0 replies; 94+ messages in thread From: Rasmus @ 2015-11-26 16:08 UTC (permalink / raw) To: emacs-devel Artur Malabarba <bruce.connor.am@gmail.com> writes: >> 2. It also doesn't match ä (a single character) with ä (2 characters, >> which Emacs correctly composes into 1 grapheme cluster). Should it? > > Possibly. Since they look the same, might make things easier on users. But > I wouldn't know as I've never seen the second version used anywhere. Based on how they look on my screen (superscripted but underlined), I'd used these symbols for addresses in Spain, e.g. in Catalan, Carrer del Cabanes 15, 2º, 3ª secondO piso, tercerA puerta. Rasmus -- It was you, Jezebel, it was you ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-25 18:41 Questions about isearch Eli Zaretskii 2015-11-25 19:20 ` Rasmus 2015-11-25 20:14 ` Artur Malabarba @ 2015-11-25 23:15 ` Mike Kupfer 2015-11-26 14:45 ` Richard Stallman ` (2 subsequent siblings) 5 siblings, 0 replies; 94+ messages in thread From: Mike Kupfer @ 2015-11-25 23:15 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel Eli Zaretskii wrote: > 2. It also doesn't match ä (a single character) with ä (2 characters, > which Emacs correctly composes into 1 grapheme cluster). Should it? They should match IMO. The difference between composed and decomposed characters should be an implementation detail that's not exposed to users. mike ^ permalink raw reply [flat|nested] 94+ messages in thread
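[Editorial aside, not part of the original message.] One way to see why the composed and decomposed spellings are "the same" text is Unicode normalization. The sketch below uses the `ucs-normalize' library that ships with Emacs; it is a different technique from the regexp-based character folding discussed in this thread, and the predicate name `my/canonically-equal-p' is hypothetical.

```elisp
(require 'ucs-normalize)

;; Hypothetical predicate (an illustration, not code from the thread):
;; two strings are canonically equivalent if their NFD forms are equal.
(defun my/canonically-equal-p (s1 s2)
  "Return non-nil if S1 and S2 are equal after NFD normalization."
  (string= (ucs-normalize-NFD-string s1)
           (ucs-normalize-NFD-string s2)))

;; One precomposed character, U+00E4 ...
(defconst my/a-umlaut-composed "\u00e4")
;; ... versus two characters, `a' followed by COMBINING DIAERESIS.
(defconst my/a-umlaut-decomposed "a\u0308")

;; The raw strings differ in length (1 versus 2 characters), yet
;; they compare equal once both are normalized.
(my/canonically-equal-p my/a-umlaut-composed my/a-umlaut-decomposed)
```

Normalizing both the search string and the buffer text is only one possible approach; it would not, by itself, make a bare `a' match either spelling, which is what folding adds on top.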
* Re: Questions about isearch 2015-11-25 18:41 Questions about isearch Eli Zaretskii ` (2 preceding siblings ...) 2015-11-25 23:15 ` Mike Kupfer @ 2015-11-26 14:45 ` Richard Stallman 2015-11-27 0:43 ` Juri Linkov 2015-11-27 8:02 ` Andreas Röhler 5 siblings, 0 replies; 94+ messages in thread From: Richard Stallman @ 2015-11-26 14:45 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > 1. Character folding doesn't catch ligatures, such as æ (should it match > the two characters "ae")? > 2. It also doesn't match ä (a single character) with ä (2 characters, > which Emacs correctly composes into 1 grapheme cluster). Should it? This might be a good thing to poll the users about. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-25 18:41 Questions about isearch Eli Zaretskii ` (3 preceding siblings ...) 2015-11-26 14:45 ` Richard Stallman @ 2015-11-27 0:43 ` Juri Linkov 2015-11-27 8:07 ` Eli Zaretskii 2015-11-27 8:02 ` Andreas Röhler 5 siblings, 1 reply; 94+ messages in thread From: Juri Linkov @ 2015-11-27 0:43 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel > 3. With the default value t of isearch-hide-immediately, one match in > invisible text is not hidden, and remains on display. To repro: > > emacs -Q > C-x C-f etc/NEWS RET > C-c C-q > C-s require C-s <RIGHT> > > This leaves the match and its surrounding hidden text on screen. I > can understand the rationale, but the doc string doesn't say anything > about this feature. On the contrary, it says: > > Whatever the value, all opened invisible text is hidden again after > exiting the search. ^^^ I see no answers to your 3rd question, so I wanted to clarify whether this is something new or can you reproduce the same in older versions? ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 0:43 ` Juri Linkov @ 2015-11-27 8:07 ` Eli Zaretskii 2015-11-27 23:24 ` Juri Linkov 0 siblings, 1 reply; 94+ messages in thread From: Eli Zaretskii @ 2015-11-27 8:07 UTC (permalink / raw) To: Juri Linkov; +Cc: emacs-devel > From: Juri Linkov <juri@linkov.net> > Cc: emacs-devel@gnu.org > Date: Fri, 27 Nov 2015 02:43:54 +0200 > > > 3. With the default value t of isearch-hide-immediately, one match in > > invisible text is not hidden, and remains on display. To repro: > > > > emacs -Q > > C-x C-f etc/NEWS RET > > C-c C-q > > C-s require C-s <RIGHT> > > > > This leaves the match and its surrounding hidden text on screen. I > > can understand the rationale, but the doc string doesn't say anything > > about this feature. On the contrary, it says: > > > > Whatever the value, all opened invisible text is hidden again after > > exiting the search. ^^^ > > I see no answers to your 3rd question, so I wanted to clarify whether > this is something new or can you reproduce the same in older versions? I don't know if it's new; it probably isn't. And that isn't my problem; my problem that triggered that question is solely to see that the documentation of this option is correct and accurate. So the only question that bothers me at this time is whether what I described is the intended behavior, in which case the doc string needs to be fixed (and in fact I already fixed it to that effect). Or maybe the doc string is right and the code is wrong. Can you tell? Thanks. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 8:07 ` Eli Zaretskii @ 2015-11-27 23:24 ` Juri Linkov 2015-11-28 8:09 ` Eli Zaretskii 0 siblings, 1 reply; 94+ messages in thread From: Juri Linkov @ 2015-11-27 23:24 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel >> > 3. With the default value t of isearch-hide-immediately, one match in >> > invisible text is not hidden, and remains on display. To repro: >> > >> > emacs -Q >> > C-x C-f etc/NEWS RET >> > C-c C-q >> > C-s require C-s <RIGHT> >> > >> > This leaves the match and its surrounding hidden text on screen. I >> > can understand the rationale, but the doc string doesn't say anything >> > about this feature. On the contrary, it says: >> > >> > Whatever the value, all opened invisible text is hidden again after >> > exiting the search. ^^^ >> >> I see no answers to your 3rd question, so I wanted to clarify whether >> this is something new or can you reproduce the same in older versions? > > I don't know if it's new; it probably isn't. And that isn't my > problem; my problem that triggered that question is solely to see that > the documentation of this option is correct and accurate. > > So the only question that bothers me at this time is whether what I > described is the intended behavior, in which case the doc string needs > to be fixed (and in fact I already fixed it to that effect). Or maybe > the doc string is right and the code is wrong. > > Can you tell? I believe this is the intended behavior since the comment of isearch-clean-overlays says this explicitly: ;; This is called when exiting isearch. It closes the temporary ;; opened overlays, except the ones that contain the latest match. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 23:24 ` Juri Linkov @ 2015-11-28 8:09 ` Eli Zaretskii 0 siblings, 0 replies; 94+ messages in thread From: Eli Zaretskii @ 2015-11-28 8:09 UTC (permalink / raw) To: Juri Linkov; +Cc: emacs-devel > From: Juri Linkov <juri@linkov.net> > Cc: emacs-devel@gnu.org > Date: Sat, 28 Nov 2015 01:24:36 +0200 > > I believe this is the intended behavior since the comment of > isearch-clean-overlays says this explicitly: > > ;; This is called when exiting isearch. It closes the temporary > ;; opened overlays, except the ones that contain the latest match. > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Thanks, this means the changes I did in the documentation are TRT. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-25 18:41 Questions about isearch Eli Zaretskii ` (4 preceding siblings ...) 2015-11-27 0:43 ` Juri Linkov @ 2015-11-27 8:02 ` Andreas Röhler 2015-11-27 8:57 ` Eli Zaretskii 5 siblings, 1 reply; 94+ messages in thread From: Andreas Röhler @ 2015-11-27 8:02 UTC (permalink / raw) To: emacs-devel; +Cc: Eli Zaretskii On 25.11.2015 at 19:41, Eli Zaretskii wrote: > These questions came out of review and extensive updates of the search > and replace sections of the Emacs manual: > > 1. Character folding doesn't catch ligatures, such as æ (should it match > the two characters "ae")? > > 2. It also doesn't match ä (a single character) with ä (2 characters, > which Emacs correctly composes into 1 grapheme cluster). Should it? > > 3. With the default value t of isearch-hide-immediately, one match in > invisible text is not hidden, and remains on display. To repro: > > emacs -Q > C-x C-f etc/NEWS RET > C-c C-q > C-s require C-s <RIGHT> > > This leaves the match and its surrounding hidden text on screen. I > can understand the rationale, but the doc string doesn't say anything > about this feature. On the contrary, it says: > > Whatever the value, all opened invisible text is hidden again after > exiting the search. ^^^ > > 4. What is the equivalent of case-replace and the letter-case related > behavior of replace commands to character folding? E.g., if the > replace command specifies to replace "foo" with "bar", and we found > "föo", should we replace it with "bär" or something, by analogy with > letter-case behavior? > > Considering language special cases worldwide at core will run into infinity. Would expect support of unicode-characters. Mapping them should be the task of special language-modes built upon, i.e. a text-norwegian etc. In order to support languages, isearch might accept modifiers, like fill-paragraph does with fill-paragraph-function. Thus the regexps handed over may change.
^ permalink raw reply [flat|nested] 94+ messages in thread
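Questions 1 and 2 quoted above can be illustrated outside Emacs. What follows is a minimal Python sketch, not Emacs's character-folding code: it relies only on the standard `unicodedata` module, and the helper name `fold` is hypothetical. It suggests why the one-character and two-character spellings of ä can be made to match (they agree after canonical decomposition), and why æ is a different kind of problem (it has no Unicode decomposition, so matching it with "ae" would need a separate, presumably language-specific, table).

```python
import unicodedata


def fold(text):
    """Decompose canonically (NFD) and drop combining marks.

    Hypothetical helper for illustration only; this is an assumption
    about one way to fold, not how Emacs implements it.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


precomposed = "\u00e4"  # ä as a single character
combined = "a\u0308"    # a followed by COMBINING DIAERESIS

# The two spellings differ as raw strings but agree after normalization.
print(precomposed == combined)
print(unicodedata.normalize("NFD", precomposed) == combined)

# Folding reduces both spellings, and "föo", to their base letters.
print(fold(precomposed), fold(combined), fold("f\u00f6o"))

# æ (U+00E6) has no decomposition mapping, so even compatibility
# normalization leaves it unchanged and never yields "ae".
print(unicodedata.normalize("NFKD", "\u00e6") == "\u00e6")
```

Under this reading, question 2 is a normalization issue that applies to any language, whereas question 1 is closer to the language-specific mapping discussed in this subthread.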
* Re: Questions about isearch 2015-11-27 8:02 ` Andreas Röhler @ 2015-11-27 8:57 ` Eli Zaretskii 2015-11-27 10:03 ` Artur Malabarba 0 siblings, 1 reply; 94+ messages in thread From: Eli Zaretskii @ 2015-11-27 8:57 UTC (permalink / raw) To: Andreas Röhler; +Cc: emacs-devel > Cc: Eli Zaretskii <eliz@gnu.org> > From: Andreas Röhler <andreas.roehler@online.de> > Date: Fri, 27 Nov 2015 09:02:19 +0100 > > > 4. What is the equivalent of case-replace and the letter-case related > > behavior of replace commands to character folding? E.g., if the > > replace command specifies to replace "foo" with "bar", and we found > > "föo", should we replace it with "bär" or something, by analogy with > > letter-case behavior? > > Considering language special cases worldwide at core will run into infinity. The number of languages is finite. > Would expect support of unicode-characters. Mapping them should be the > task of special language-modes built upon, i.e. a text-norwegian etc. The question I asked is should we do that _in_general_? If the answer is YES, then the language-specific rules might tell _how_ to do that in each case. But that's a different issue. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 8:57 ` Eli Zaretskii @ 2015-11-27 10:03 ` Artur Malabarba 2015-11-27 10:29 ` Eli Zaretskii 2015-11-29 9:08 ` Andreas Röhler 0 siblings, 2 replies; 94+ messages in thread From: Artur Malabarba @ 2015-11-27 10:03 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Andreas Röhler, emacs-devel On 27 Nov 2015 8:57 am, "Eli Zaretskii" <eliz@gnu.org> wrote: > > Considering language special cases worldwide at core will run into infinity. > > The number of languages is finite. > > > Would expect support of unicode-characters. Mapping them should be the > > task of special language-modes built upon, i.e. a text-norwegian etc. > > The question I asked is should we do that _in_general_? If the answer > is YES, then the language-specific rules might tell _how_ to do that > in each case. But that's a different issue. I think this topic goes beyond isearch, and people not reading the current thread might be interested in it. Maybe we should start a new thread just for this special "language support", starting by listing points of what would be the goals of such a feature. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 10:03 ` Artur Malabarba @ 2015-11-27 10:29 ` Eli Zaretskii 2015-11-27 10:47 ` Artur Malabarba 2015-11-29 9:08 ` Andreas Röhler 1 sibling, 1 reply; 94+ messages in thread From: Eli Zaretskii @ 2015-11-27 10:29 UTC (permalink / raw) To: bruce.connor.am; +Cc: andreas.roehler, emacs-devel > Date: Fri, 27 Nov 2015 10:03:33 +0000 > From: Artur Malabarba <bruce.connor.am@gmail.com> > Cc: emacs-devel <emacs-devel@gnu.org>, Andreas Röhler <andreas.roehler@online.de> > > Maybe we should start a new thread just for this special > "language support". Starting by listing points of what would be the > goals of such a feature. Please feel free. But I must say that my OP was triggered by the need to document the new features mentioned in NEWS as not yet documented. This is part of preparing Emacs for the release of v25.1. I asked those questions because working on the documentation made me wonder what should and shouldn't work, such that the parts that we intend to work are documented, and the entire feature is fairly complete and self-consistent. Design and implementation of new significant features should take a back seat at this time, if we want to release Emacs 25.1 any time soon. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 10:29 ` Eli Zaretskii @ 2015-11-27 10:47 ` Artur Malabarba 0 siblings, 0 replies; 94+ messages in thread From: Artur Malabarba @ 2015-11-27 10:47 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Andreas Röhler, emacs-devel On 27 Nov 2015 10:29 am, "Eli Zaretskii" <eliz@gnu.org> wrote: > But I must say that my OP was triggered by the need to document the > new features mentioned in NEWS as not yet documented. This is part of > preparing Emacs for the release of v25.1. I asked those questions > because working on the documentation made me wonder what should and > shouldn't work, such that the parts that we intend to work are > documented, and the entire feature is fairly complete and > self-consistent. Design and implementation of new significant > features should take a back seat at this time, if we want to release > Emacs 25.1 any time soon. 100% agreed. ^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch 2015-11-27 10:03 ` Artur Malabarba 2015-11-27 10:29 ` Eli Zaretskii @ 2015-11-29 9:08 ` Andreas Röhler 1 sibling, 0 replies; 94+ messages in thread From: Andreas Röhler @ 2015-11-29 9:08 UTC (permalink / raw) To: bruce.connor.am, Eli Zaretskii; +Cc: emacs-devel On 27.11.2015 at 11:03, Artur Malabarba wrote: > On 27 Nov 2015 8:57 am, "Eli Zaretskii" <eliz@gnu.org> wrote: > > > Considering language special cases worldwide at core will run into > infinity. > > > > The number of languages is finite. > > > > > Would expect support of unicode-characters. Mapping them should be the > > > task of special language-modes built upon, i.e. a text-norwegian etc. > > > > The question I asked is should we do that _in_general_? If the answer > > is YES, then the language-specific rules might tell _how_ to do that > > in each case. But that's a different issue. > > I think this topic goes beyond isearch, and people not reading the > current thread might be interested in it. Maybe we should start a > new thread just for this special "language support". Starting by listing > points of what would be the goals of such a feature. > As isearch accepts a regexp as an argument, it should be possible to implement mode-specific commands. Probably not a task of the core. ^ permalink raw reply [flat|nested] 94+ messages in thread