* char equivalence classes in search - why not symmetric? @ 2015-09-01 15:46 Drew Adams 2015-09-01 15:52 ` Davis Herring 2015-09-01 16:16 ` Eli Zaretskii 0 siblings, 2 replies; 86+ messages in thread From: Drew Adams @ 2015-09-01 15:46 UTC (permalink / raw) To: emacs-devel When character folding is turned on, shouldn't you be able to search for á and find (match) a, à, ã, ª, â, å, and ä? I think so. Currently you cannot - you can only do the reverse: search for a and find any of the above. a is treated specially. Why? I suppose that the logic behind the current implementation is to mirror what we do with case-fold searching. But is that the right thing in this case? For case-fold searching, it was thought that if you bother to hold the Shift key and thus use an uppercase letter then you want to match case, and otherwise you do not (case-insensitive). This was essentially, I think, a shortcut for programmers, and it was introduced at a time when much of the code being searched was case-ambivalent. (UNIX was still pretty much an exception at that point, in distinguishing lowercase letters.) Whether or not this behavior for case-fold is still a good thing is questionable now, I think. I don't think it is necessary now or particularly useful. And I think it can be confusing to newbies. Why should searching for A be different from searching for a, wrt case matching? But I'm not really questioning the behavior of case-fold searching now. I am questioning applying this same behavior to char folding. To me, folding a group of chars together for search purposes should be symmetric - go both ways. It should, in effect, treat the given group of chars as equivalent - as an equivalence class wrt searching. Why not? Why, when char folding, treat plain a specially for searching? Why not treat á, a, à, ã, ª, â, å, and ä the same? Isn't that the point here? We are telling Isearch that they are equivalent. 
Why pick one of them as the canonical search-pattern to use for finding any of them? Why privilege a over á, a, à, ã, ª, â, å, and ä?

Now most of the time I, like most people, will be typing a instead of á into a search string. But that's not really the point. I think users should be able to use any members of an equivalence class of chars indifferently.

And when it comes to chars other than letters, it might well be that some users, with some keyboards, will find some chars in an equivalence class easier to type than others. Let them use/type whichever they like, no?

This feature, welcome as it is, seems only half-baked, so far. How about equality for char-folding equivalence?
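The difference between the current (asymmetric) behavior and the symmetric behavior requested above can be illustrated with a small Python sketch. This is not how Emacs implements character folding; it assumes that "folding" means applying Unicode NFKD decomposition and dropping combining marks, and the function names `fold`, `matches_asymmetric` and `matches_symmetric` are hypothetical.

```python
import unicodedata


def fold(ch: str) -> str:
    """Decompose ch (NFKD) and drop combining marks, e.g. 'á' -> 'a', 'ª' -> 'a'."""
    decomposed = unicodedata.normalize("NFKD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def matches_asymmetric(pattern_ch: str, text_ch: str) -> bool:
    """Assumed model of the current behavior: a plain base character in the
    pattern matches every variant, a decorated character matches only itself."""
    if fold(pattern_ch) == pattern_ch:
        return fold(text_ch) == pattern_ch
    return text_ch == pattern_ch


def matches_symmetric(pattern_ch: str, text_ch: str) -> bool:
    """Requested behavior: all members of the class match each other."""
    return fold(pattern_ch) == fold(text_ch)


if __name__ == "__main__":
    variants = ["a", "\u00e1", "\u00e0", "\u00e3", "\u00aa", "\u00e2", "\u00e5", "\u00e4"]
    for pattern in ("a", "\u00e1"):
        asym = [v for v in variants if matches_asymmetric(pattern, v)]
        sym = [v for v in variants if matches_symmetric(pattern, v)]
        print(pattern, "asymmetric:", asym, "symmetric:", sym)
```

Under these assumptions, searching for a finds the whole class in both models, while searching for á finds only á in the asymmetric model and the whole class in the symmetric one.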
* Re: char equivalence classes in search - why not symmetric? 2015-09-01 15:46 char equivalence classes in search - why not symmetric? Drew Adams @ 2015-09-01 15:52 ` Davis Herring 2015-09-01 16:51 ` Stefan Monnier 2015-09-01 17:51 ` Drew Adams 2015-09-01 16:16 ` Eli Zaretskii 1 sibling, 2 replies; 86+ messages in thread From: Davis Herring @ 2015-09-01 15:52 UTC (permalink / raw) To: Drew Adams; +Cc: emacs-devel > Whether or not this behavior for case-fold is still a good thing > is questionable now, I think. I don't think it is necessary now > or particularly useful. And I think it can be confusing to > newbies. Why should searching for A be different from searching > for a, wrt case matching? Because having both input characters mean the same thing uselessly deprives the user of expressive power. > Why not? Why, when char folding, treat plain a specially for > searching? Why not treat á, a, à, ã, ª, â, å, and ä the same? For exactly the same reason. > And when it comes to chars other than letters, it might well > be that some users, with some keyboards, will find some chars > in an equivalence class easier to type than others. Let them > use/type whichever they like, no? It would make sense to provide a customization option to control which character meant the whole set -- if anyone would use it. Are there in fact keyboards where the accented characters are significantly easier? > This feature, welcome as it is, seems only half-baked, so far. > How about equality for char-folding equivalence? These are code points, not oppressed minorities. Davis -- This product is sold by volume, not by mass. If it appears too dense or too sparse, it is because mass-energy conversion has occurred during shipping. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric?
  2015-09-01 15:52 ` Davis Herring
@ 2015-09-01 16:51   ` Stefan Monnier
  2015-09-01 17:51   ` Drew Adams
  1 sibling, 0 replies; 86+ messages in thread
From: Stefan Monnier @ 2015-09-01 16:51 UTC
To: Davis Herring; +Cc: Drew Adams, emacs-devel

>> This feature, welcome as it is, seems only half-baked, so far.
>> How about equality for char-folding equivalence?
> These are code points, not oppressed minorities.

How 'bout we dedicate Sep 17 of every year to all those Unicode
characters left in the dark?

        Stefan
* RE: char equivalence classes in search - why not symmetric? 2015-09-01 15:52 ` Davis Herring 2015-09-01 16:51 ` Stefan Monnier @ 2015-09-01 17:51 ` Drew Adams 2015-09-01 18:40 ` Davis Herring 2015-09-01 20:10 ` Stephen J. Turnbull 1 sibling, 2 replies; 86+ messages in thread From: Drew Adams @ 2015-09-01 17:51 UTC (permalink / raw) To: Davis Herring; +Cc: emacs-devel > Because having both input characters mean the same thing > uselessly deprives the user of expressive power. Examples/arguments/reasons, please. IOW, prove it. You can always toggle char folding, just as you can toggle case folding. IMO, more users have been tripped up than helped by the rule that "An upper-case letter anywhere in the incremental search string makes the search case-sensitive." (emacs) Search Case. Letting a user toggle between matching chars one-to-one and matching chars according to equivalence classes, is sufficient and clear, IMO. Adding rules on top of this is not helpful. But I would not oppose the current behavior as an option. Let users decide whether matching is symmetric or asymmetric. Maybe even let users toggle, or cycle among these two folding (one-many) behaviors and unfolded (one-one matching) behavior. > > Why not? Why, when char folding, treat plain a specially for > > searching? Why not treat á, a, à, ã, ª, â, å, and ä the same? > > For exactly the same reason. What reason? Please show how this optional matching behavior "deprives the user of expressive power". ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-01 17:51 ` Drew Adams @ 2015-09-01 18:40 ` Davis Herring 2015-09-01 19:09 ` Drew Adams 2015-09-01 22:45 ` Juri Linkov 2015-09-01 20:10 ` Stephen J. Turnbull 1 sibling, 2 replies; 86+ messages in thread From: Davis Herring @ 2015-09-01 18:40 UTC (permalink / raw) To: Drew Adams; +Cc: emacs-devel >> Because having both input characters mean the same thing >> uselessly deprives the user of expressive power. > > Examples/arguments/reasons, please. IOW, prove it. I'm sorry: I thought it was obvious. For case folding, there are three sets of characters that might be considered a match: [a], [A], and [aA]. The default Emacs behavior is to make "a" mean [aA] and "A" mean [A]. For the (relatively rare) case in which [a] is desired, one can turn case-fold-search off (e.g., with M-c). Then you gain [a] and lose [aA] as a choice (you can't have all three from just two characters!). With your suggestion (which addresses only case-fold-search, of course), we would have only [aA] available whether you typed "a" or "A". That is the less expressive power: the semantically distinct options available have been reduced. Of course, with more than one character there are yet other possibilities: for two characters there are 9, of which "ab" gives you [aA][bB] and each of the other three permutations give one (case-sensitive) match each. 4/9 isn't great, but it's better than 1/9! > IMO, more users have been tripped up than helped by the rule > that "An upper-case letter anywhere in the incremental search > string makes the search case-sensitive." (emacs) Search Case. How did that upper-case letter get there? Commands like C-w are careful not to add uppercase letters if there aren't already some. So the user must have typed it explicitly, and so they were paying attention to case and have no need for a case-insensitive search. 
The only harm is if they are inconsistent in their typing -- during something as brief as isearch.

Davis

-- 
This product is sold by volume, not by mass. If it appears too dense or too sparse, it is because mass-energy conversion has occurred during shipping.
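The counting argument in the message above (9 possible matchers for a two-letter pattern, of which the current rule reaches 4 and an always-fold rule reaches 1) can be checked with a short Python sketch. The model is deliberately simplified and the names `smart_case`, `always_fold`, `reachable` and `all_matchers` are hypothetical: it considers only what can be expressed by typing, without the M-c toggle.

```python
from itertools import product


def smart_case(pattern: str):
    """Rule described above: an all-lower-case pattern folds case,
    any upper-case letter makes the whole pattern case-sensitive."""
    if pattern == pattern.lower():
        return tuple(frozenset({c, c.upper()}) for c in pattern)
    return tuple(frozenset({c}) for c in pattern)


def always_fold(pattern: str):
    """Symmetric rule: every letter matches both cases, whatever was typed."""
    return tuple(frozenset({c.lower(), c.upper()}) for c in pattern)


def reachable(rule, letters: str = "ab"):
    """Distinct matchers a user can express by typing any case variant of letters."""
    inputs = ("".join(p) for p in product(*[(c, c.upper()) for c in letters]))
    return {rule(p) for p in inputs}


def all_matchers(letters: str = "ab"):
    """Every per-character choice among [x], [X] and [xX]."""
    per_char = [
        [frozenset({c}), frozenset({c.upper()}), frozenset({c, c.upper()})]
        for c in letters
    ]
    return set(product(*per_char))


if __name__ == "__main__":
    total = len(all_matchers())
    print("current rule:", len(reachable(smart_case)), "of", total)
    print("always fold: ", len(reachable(always_fold)), "of", total)
```

The sketch only counts what a single typed pattern can express; it says nothing about which of those matchers users actually want.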
* RE: char equivalence classes in search - why not symmetric? 2015-09-01 18:40 ` Davis Herring @ 2015-09-01 19:09 ` Drew Adams 2015-09-01 22:45 ` Juri Linkov 1 sibling, 0 replies; 86+ messages in thread From: Drew Adams @ 2015-09-01 19:09 UTC (permalink / raw) To: Davis Herring; +Cc: emacs-devel > >> Because having both input characters mean the same thing > >> uselessly deprives the user of expressive power. > > > > Examples/arguments/reasons, please. IOW, prove it. > > I'm sorry: I thought it was obvious. For case folding, there are three > sets of characters that might be considered a match: [a], [A], and [aA]. > The default Emacs behavior is to make "a" mean [aA] and "A" mean [A]. > For the (relatively rare) case in which [a] is desired, one can turn > case-fold-search off (e.g., with M-c). Then you gain [a] and lose [aA] > as a choice (you can't have all three from just two characters!). You are just echoing what the implementation does, not giving any supporting reasons for it. "You can't have all three from just two characters" sounds important - except that it doesn't mean anything. It is quite possible for the behavior to be any of these: a matches a only a matches a and A A matches A only A matches a and A The current implementation does not provide for the last possibility. In that, it can be argued that it "deprives the user of expressive power". But I won't bother making that argument for case folding. I am not arguing for a change now in the longstanding case-fold behavior. I am arguing that we get this right for char folding. > With your suggestion (which addresses only case-fold-search, of course), > we would have only [aA] available whether you typed "a" or "A". That is > the less expressive power: the semantically distinct options available > have been reduced. That's your suggestion perhaps. It's certainly not mine. I suggest letting the user match a to a, a to [aA], A to A, and A to [aA]. That is more expressive power, not less. 
With it, the "semantically distinct options available" have been increased. > Of course, with more than one character there are yet other > possibilities: for two characters there are 9, of which "ab" gives you > [aA][bB] and each of the other three permutations give one > (case-sensitive) match each. 4/9 isn't great, but it's better than 1/9! See above. You are reducing possibilities, not expanding them. > > IMO, more users have been tripped up than helped by the rule > > that "An upper-case letter anywhere in the incremental search > > string makes the search case-sensitive." (emacs) Search Case. > > How did that upper-case letter get there? Commands like C-w are careful > not to add uppercase letters if there aren't already some. So the user > must have typed it explicitly, and so they were paying attention to case > and have no need for a case-insensitive search. The only harm is if > they are inconsistent in their typing -- during something as brief as > isearch. A char in a search string can "get there" because a user typed it, and that can be because for that user it is easy to type. Or it can get there from a previous search (same Isearch invocation or not). Or it can "get there" by yanking copied text. Try typing or pasting "réduction" to Google, and see if it ignores hits such as "reduction". Good luck with that. Silly Google, missing the "obvious". It should be obvious that it can be useful to match the pattern "réduction" against "reduction", just as it can be useful to match the pattern "reduction" against "réduction" (and "réduction" against "réduction" and "reduction" against "reduction"). To remove this possibility, thus reducing user expressiveness, you really should come up with a reason. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric?
  2015-09-01 18:40 ` Davis Herring
  2015-09-01 19:09   ` Drew Adams
@ 2015-09-01 22:45   ` Juri Linkov
  2015-09-02  0:33     ` Drew Adams
  1 sibling, 1 reply; 86+ messages in thread
From: Juri Linkov @ 2015-09-01 22:45 UTC
To: Davis Herring; +Cc: Drew Adams, emacs-devel

> I'm sorry: I thought it was obvious. For case folding, there are three
> sets of characters that might be considered a match: [a], [A], and [aA].
> The default Emacs behavior is to make "a" mean [aA] and "A" mean [A].
> For the (relatively rare) case in which [a] is desired, one can turn
> case-fold-search off (e.g., with M-c). Then you gain [a] and lose [aA]
> as a choice (you can't have all three from just two characters!).

Or in a brief table:

‘C-s a’      matches [aA]
‘C-s a M-c’  matches [a]
‘C-s A’      matches [A]
‘C-s A M-c’  matches [aA]

Substituting ‘A’ into ‘ä’ (other equivalent chars omitted for brevity):

‘C-s a’      matches [aä]
‘C-s a M-'’  matches [a]
‘C-s ä’      matches [ä]
‘C-s ä M-'’  matches [aä]

I see no problem implementing the same.

BTW, could this scheme be applied to whitespace matching as well?

‘C-s SPC’          matches [SPC TAB]
‘C-s SPC M-s SPC’  matches [SPC]
‘C-s TAB’          matches [TAB]
‘C-s TAB M-s SPC’  matches [SPC TAB]

> How did that upper-case letter get there? Commands like C-w are careful
> not to add uppercase letters if there aren't already some. So the user
> must have typed it explicitly, and so they were paying attention to case
> and have no need for a case-insensitive search. The only harm is if
> they are inconsistent in their typing -- during something as brief as
> isearch.

Yanking a string with upper-case letters into Isearch does more harm
by converting them into lower-case. I believe yanking a string
should not strip diacritics either.
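The three tables above follow one rule: the "base" character folds by default, any other member of the class is exact by default, and the toggle inverts whichever default applies. A minimal Python sketch of that rule follows. The function `match_set` and its parameters are hypothetical, not an Emacs API, and the choice of base character per class is an assumption taken from the tables.

```python
def match_set(typed: str, toggled: bool, base: str, variants) -> frozenset:
    """Return the set of characters that `typed` matches.

    base     -- the class member that folds by default (a, SPC, ...)
    variants -- the other members of the class (A, ä, TAB, ...)
    toggled  -- True if the user pressed the toggle key after typing
    """
    whole_class = frozenset({base}) | frozenset(variants)
    folds_by_default = typed == base
    # The toggle flips the default, so folding is on when exactly one holds.
    folds = folds_by_default != toggled
    return whole_class if folds else frozenset({typed})


if __name__ == "__main__":
    rows = [
        ("a", "a", {"A"}),        # case folding
        ("a", "a", {"\u00e4"}),   # character folding (only ä shown)
        (" ", " ", {"\t"}),       # whitespace folding
    ]
    for _, base, variants in rows:
        for typed in [base, *sorted(variants)]:
            for toggled in (False, True):
                result = sorted(match_set(typed, toggled, base, variants))
                print(repr(typed), "toggled" if toggled else "default", result)
```

With this rule, all four combinations of typed character and match set are reachable for each class, which is the point of the fourth row in each table.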
* RE: char equivalence classes in search - why not symmetric?
  2015-09-01 22:45 ` Juri Linkov
@ 2015-09-02  0:33   ` Drew Adams
  0 siblings, 0 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-02 0:33 UTC
To: Juri Linkov, Davis Herring; +Cc: emacs-devel

> Or in a brief table:
>
> ‘C-s a’ matches [aA]
> ‘C-s a M-c’ matches [a]
> ‘C-s A’ matches [A]
> ‘C-s A M-c’ matches [aA]
>
> Substituting ‘A’ into ‘ä’ (other equivalent chars omitted for brevity):
>
> ‘C-s a’ matches [aä]
> ‘C-s a M-'’ matches [a]
> ‘C-s ä’ matches [ä]
> ‘C-s ä M-'’ matches [aä]
>
> I see no problem implementing the same.

Did you mean `M-s '' instead of `M-''? If so, except for the
last line, that's what we have now, IIUC.

And yes, that would be one way to do it (get the 4 match
possibilities I requested). Gets my vote.

> BTW, could this scheme be applied to whitespace matching as well?
>
> ‘C-s SPC’ matches [SPC TAB]
> ‘C-s SPC M-s SPC’ matches [SPC]
> ‘C-s TAB’ matches [TAB]
> ‘C-s TAB M-s SPC’ matches [SPC TAB]

Sounds good to me. Again, gets my vote.

But in each case, I would want there to be a user option that
controls the default behavior, just as `case-fold-search' does.

That should be the first fix, as I mentioned earlier: change
`character-fold-search' to a defcustom. Let a user decide which
default behavior s?he wants for char folding - and whitespace
folding as well.

It is very handy to me that search always starts by default by
respecting case, because my customized value of `case-fold-search'
is nil. I would not want to have to do `M-c' each time I start
a search. Likewise, for char folding (`M-s '') and whitespace
folding (`M-s SPC').

> > How did that upper-case letter get there? Commands like C-w are careful
> > not to add uppercase letters if there aren't already some. So the user
> > must have typed it explicitly, and so they were paying attention to case
> > and have no need for a case-insensitive search. The only harm is if
> > they are inconsistent in their typing -- during something as brief as
> > isearch.
>
> Yanking a string with upper-case letters into Isearch does more harm
> by converting them into lower-case. I believe yanking a string
> should not strip diacritics either.

That too gets my vote - WYYIWYG: what you yank is what you get.
* RE: char equivalence classes in search - why not symmetric? 2015-09-01 17:51 ` Drew Adams 2015-09-01 18:40 ` Davis Herring @ 2015-09-01 20:10 ` Stephen J. Turnbull 1 sibling, 0 replies; 86+ messages in thread From: Stephen J. Turnbull @ 2015-09-01 20:10 UTC (permalink / raw) To: Drew Adams; +Cc: emacs-devel Drew Adams writes: > > Because having both input characters mean the same thing > > uselessly deprives the user of expressive power. > > Examples/arguments/reasons, please. IOW, prove it. With "a" and "A" as distinct entities I can express either of two things in one character. If I equivalence them, I can only express one thing. 2 > 1. Q.E.D. On the contrary, "we could have an option" is not a reason for having the option. We now have a working approach which has the advantage of being modeless while not imposing an excessive efficiency burden. By that I mean capitalized words are relatively uncommon, and therefore not likely to constitute a huge number of unwanted "hits" in an isearch for an entirely lowercase string. I'm not *sure* the same efficiency will be true for "accent folding", but you cannot possibly be sure it's false. The current approach is good enough for now, and experience will accumulate over time. Wait for it. > You can always toggle char folding, just as you can toggle > case folding. Modal behavior in user commands is generally avoided in Emacs where it isn't absolutely necessary. Bottom line, burden of proof is on *you*. > IMO, You repeatedly mention your opinion in the same message where you ask others to prove things. Yet your opinion is not evidence for anything except your opinion. > Let users decide whether matching is symmetric or asymmetric. I say to them: "Use the source, Luke!" ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-01 15:46 char equivalence classes in search - why not symmetric? Drew Adams 2015-09-01 15:52 ` Davis Herring @ 2015-09-01 16:16 ` Eli Zaretskii [not found] ` <<38061f42-eaf1-47c6-b74d-f676ac952b18@default> ` (2 more replies) 1 sibling, 3 replies; 86+ messages in thread From: Eli Zaretskii @ 2015-09-01 16:16 UTC (permalink / raw) To: Drew Adams; +Cc: emacs-devel > Date: Tue, 1 Sep 2015 08:46:26 -0700 (PDT) > From: Drew Adams <drew.adams@oracle.com> > > When character folding is turned on, shouldn't you be able to > search for á and find (match) a, à, ã, ª, â, å, and ä? No. You should find only á. > I think so. Currently you cannot - you can only do the reverse: > search for a and find any of the above. a is treated specially. > Why? It's the same principle as with case-folding: if you type "FOO", you will not find the lowercase variant. > I suppose that the logic behind the current implementation is > to mirror what we do with case-fold searching. But is that the > right thing in this case? It's what the Unicode Standard recommends, and IMO it makes a lot of sense. See http://unicode.org/reports/tr10/#Searching. > To me, folding a group of chars together for search purposes > should be symmetric - go both ways. You will see that the above Unicode report explicitly recommends to make it _asymmetric_. > Why not? Why, when char folding, treat plain a specially for > searching? Why not treat á, a, à, ã, ª, â, å, and ä the same? > Isn't that the point here? We are telling Isearch that they > are equivalent. Why pick one of them as the canonical > search-pattern to use for finding any of them? Why privilege > a over á, a, à, ã, ª, â, å, and ä? Because we are not "telling Isearch that they are equivalent". We are asking for matches that disregard the diacriticals (and in case of ª also higher-order collation-order variation). 
> Now most of the time I, like most people, will by typing a > instead of á into a search string. But that's not really the > point. I think users should be able to use any members of an > equivalence class of chars indifferently. That'd make searching for exactly á unnecessarily complicated and/or cumbersome, for no good reason. The symmetry you suggest has no practical advantages (because you can find all of these characters by just specifying a), but does have significant practical disadvantages. > This feature, welcome as it is, seems only half-baked, so far. No need for derogatory language, thank you. We certainly have a lot to learn about this feature, but half-baked it isn't. ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-01 16:16 ` Eli Zaretskii [not found] ` <<38061f42-eaf1-47c6-b74d-f676ac952b18@default> @ 2015-09-01 17:50 ` Drew Adams 2015-09-01 18:15 ` Eli Zaretskii 2015-09-02 15:34 ` Richard Stallman 2 siblings, 1 reply; 86+ messages in thread From: Drew Adams @ 2015-09-01 17:50 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel > > When character folding is turned on, shouldn't you be able to > > search for á and find (match) a, à, ã, ª, â, å, and ä? > > No. You should find only á. No reason? > > I think so. Currently you cannot - you can only do the > > reverse: search for a and find any of the above. a is treated > > specially. Why? > > It's the same principle as with case-folding: if you type "FOO", > you will not find the lowercase variant. You're just echoing what it does, not supporting the behavior with reasons. And I already mentioned what you say here. > > I suppose that the logic behind the current implementation is > > to mirror what we do with case-fold searching. But is that the > > right thing in this case? > > It's what the Unicode Standard recommends, and IMO it makes a > lot of sense. See http://unicode.org/reports/tr10/#Searching. I don't see that, when reading that section. I do see that it explicitly calls out that behavior as an _option_: 8.2 Asymmetric Search Users often find asymmetric searching to be a useful option. That users can find this optionally useful, I have no doubt. And I wouldn't be against making it a user option in Emacs. But I do not see anything in the section you cited that says that this asymmetric behavior is required, or recommended. In any case, Emacs is not beholden to any particular standard, as RMS so often reminds us. The question is what is useful for Emacs users. If you think "it makes a lot of sense" then you should have no difficulty giving some of that sense. So far, none; just appeals to authority. 
> > To me, folding a group of chars together for search purposes > > should be symmetric - go both ways. > > You will see that the above Unicode report explicitly recommends > to make it _asymmetric_. No, I do not see that. I see that the report points out that such an optional behavior can be useful for some users. And it specifically points out the case "When doing an asymmetric search", making clear that there is also the case when NOT doing an asymmetric search. Obviously, for the simpler case of a symmetric search there is no need for a section describing it - it is straightforward, whereas the asymmetric search case takes some explaining. Which is precisely what makes it more complex for users. Nowhere in that report do I see that asymmetric search is the only, or even the recommended, search behavior. It is explicitly pointed out as an optional behavior. But I read the section quickly, and you are the expert. Please point to where I am mistaken. > > Why not? Why, when char folding, treat plain a specially for > > searching? Why not treat á, a, à, ã, ª, â, å, and ä the same? > > Isn't that the point here? We are telling Isearch that they > > are equivalent. Why pick one of them as the canonical > > search-pattern to use for finding any of them? Why privilege > > a over á, a, à, ã, ª, â, å, and ä? > > Because we are not "telling Isearch that they are equivalent". I think we should be. At least that should be one possibility. > We are asking for matches that disregard the diacriticals > (and in case of ª also higher-order collation-order variation). No. You are asking for that only when you use a search pattern that does not use the diacriticals. When you search with á in the pattern you are NOT asking for matches that disregard the diacriticals. And why not? So far, no reasons given. I would favor being able not just to toggle between folded and unfolded search but to cycle among folded-symmetric, folded-asymmetric, and unfolded. Why not? 
> > Now most of the time I, like most people, will by typing a > > instead of á into a search string. But that's not really the > > point. I think users should be able to use any members of an > > equivalence class of chars indifferently. > > That'd make searching for exactly á unnecessarily complicated and/or > cumbersome, for no good reason. The symmetry you suggest has no > practical advantages (because you can find all of these characters by > just specifying a), but does have significant practical disadvantages. Assertions with no supporting reasons/examples. > > This feature, welcome as it is, seems only half-baked, so far. > > No need for derogatory language, thank you. Where I work, "half-baked" is used often, and it means not entirely finished, whether that refers to dev, QA, doc, whatever. It is not used in a derogatory way. And I made very clear that I welcome this feature. If you feel that "half-baked" in the context of software development is derogatory then I apologize for using the term. Let me say it this way: This feature, welcome as it is, seems not entirely finished. Whether now or later, I would like to see it go further. > We certainly have a lot to learn about this feature, And to document. And hopefully to further develop in the future. > but half-baked it isn't. Certainly the doc is half-baked, if baked at all. And in terms of the longer term goal of facilitating users modifying the classes of chars that are treated equivalently, and of defining their own sets of such classes, we are not there yet. Saying this does not take away from the progress made so far. This is a very welcome feature. ^ permalink raw reply [flat|nested] 86+ messages in thread
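The three behaviors named in the message above (folded-symmetric, folded-asymmetric, unfolded) and the idea of cycling among them can be sketched in Python as follows. This is only an illustration under stated assumptions: folding is modeled as NFKD decomposition with combining marks removed, case is not folded at all, and `FoldMode`, `next_mode`, `strip_marks`, `char_matches` and `find_all` are hypothetical names, not Emacs functions.

```python
import unicodedata
from enum import Enum


class FoldMode(Enum):
    UNFOLDED = "unfolded"
    ASYMMETRIC = "folded-asymmetric"
    SYMMETRIC = "folded-symmetric"


def next_mode(mode: FoldMode) -> FoldMode:
    """Cycle unfolded -> folded-asymmetric -> folded-symmetric -> unfolded."""
    members = list(FoldMode)
    return members[(members.index(mode) + 1) % len(members)]


def strip_marks(s: str) -> str:
    """NFKD-decompose s and drop combining marks ('é' -> 'e')."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def char_matches(p: str, t: str, mode: FoldMode) -> bool:
    if mode is FoldMode.UNFOLDED:
        return p == t
    if mode is FoldMode.SYMMETRIC:
        return strip_marks(p) == strip_marks(t)
    # Asymmetric: a plain pattern char matches its variants,
    # a decorated pattern char matches only itself.
    if strip_marks(p) == p:
        return strip_marks(t) == p
    return p == t


def find_all(pattern: str, text: str, mode: FoldMode) -> list:
    """Return the start indices of every match of pattern in text."""
    pattern = unicodedata.normalize("NFC", pattern)
    text = unicodedata.normalize("NFC", text)
    n = len(pattern)
    return [
        i
        for i in range(len(text) - n + 1)
        if all(char_matches(p, t, mode) for p, t in zip(pattern, text[i:i + n]))
    ]


if __name__ == "__main__":
    text = "reduction r\u00e9duction"
    mode = FoldMode.UNFOLDED
    for _ in FoldMode:
        print(mode.value, find_all("r\u00e9duction", text, mode))
        mode = next_mode(mode)
```

In this model, the pattern "réduction" finds only the accented word in the unfolded and asymmetric modes, and both words in the symmetric mode; the pattern "reduction" finds both words in either folded mode.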
* Re: char equivalence classes in search - why not symmetric? 2015-09-01 17:50 ` Drew Adams @ 2015-09-01 18:15 ` Eli Zaretskii 2015-09-01 18:46 ` Drew Adams 2015-09-08 5:36 ` Ulrich Mueller 0 siblings, 2 replies; 86+ messages in thread From: Eli Zaretskii @ 2015-09-01 18:15 UTC (permalink / raw) To: Drew Adams; +Cc: emacs-devel > Date: Tue, 1 Sep 2015 10:50:22 -0700 (PDT) > From: Drew Adams <drew.adams@oracle.com> > Cc: emacs-devel@gnu.org > > > We are asking for matches that disregard the diacriticals > > (and in case of ª also higher-order collation-order variation). > > No. You are asking for that only when you use a search pattern > that does not use the diacriticals. When you search with á in > the pattern you are NOT asking for matches that disregard the > diacriticals. And why not? Because á does include a diacritical. By specifying it, the user told us the diacriticals are important, and shouldn't be disregarded. > > It's what the Unicode Standard recommends, and IMO it makes a > > lot of sense. See http://unicode.org/reports/tr10/#Searching. > > I don't see that, when reading that section. I do see that it > explicitly calls out that behavior as an _option_: > > 8.2 Asymmetric Search > Users often find asymmetric searching to be a useful option. "Users often find asymmetric searching to be a useful option" sounds like a recommendation to me. ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-01 18:15 ` Eli Zaretskii @ 2015-09-01 18:46 ` Drew Adams 2015-09-01 19:19 ` Eli Zaretskii 2015-09-08 5:36 ` Ulrich Mueller 1 sibling, 1 reply; 86+ messages in thread From: Drew Adams @ 2015-09-01 18:46 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel > > > We are asking for matches that disregard the diacriticals > > > (and in case of ª also higher-order collation-order variation). > > > > No. You are asking for that only when you use a search pattern > > that does not use the diacriticals. When you search with á in > > the pattern you are NOT asking for matches that disregard the > > diacriticals. And why not? > > Because á does include a diacritical. By specifying it, the user told > us the diacriticals are important, and shouldn't be disregarded. Again, you are just parroting what the implementation does, not giving a reason supporting it. By turning on folding, a user can be said to be choosing to disregard diacriticals. Again, both options for fold matching should probably be available. There is no reason to hard-code one of them at design time. At least no reason has been put forth so far. > > > It's what the Unicode Standard recommends, and IMO it makes a > > > lot of sense. See http://unicode.org/reports/tr10/#Searching. > > > > I don't see that, when reading that section. I do see that it > > explicitly calls out that behavior as an _option_: > > > > 8.2 Asymmetric Search > > Users often find asymmetric searching to be a useful option. > > "Users often find asymmetric searching to be a useful option" sounds > like a recommendation to me. No, it is not. Not at all. That, and all of the text about this, makes clear, AFAICT, that this is a useful OPTIONAL behavior. That is the language used: "a useful option". Nowhere (AFAICT) is there any language supporting an interpretation of this as the recommended behavior. 
The language instead clearly points out that there are different behaviors covered by the report. And the one that is complex and needs explanation is clearly called out as an optional behavior. Not the recommended behavior, but a useful behavior to consider for inclusion as an option. Anyway, thanks for confirming that there was not some text that I missed, which in fact recommends asymmetric matching. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-01 18:46 ` Drew Adams @ 2015-09-01 19:19 ` Eli Zaretskii 2015-09-01 20:15 ` Drew Adams 0 siblings, 1 reply; 86+ messages in thread From: Eli Zaretskii @ 2015-09-01 19:19 UTC (permalink / raw) To: Drew Adams; +Cc: emacs-devel > Date: Tue, 1 Sep 2015 11:46:11 -0700 (PDT) > From: Drew Adams <drew.adams@oracle.com> > Cc: emacs-devel@gnu.org > > > > > We are asking for matches that disregard the diacriticals > > > > (and in case of ª also higher-order collation-order variation). > > > > > > No. You are asking for that only when you use a search pattern > > > that does not use the diacriticals. When you search with á in > > > the pattern you are NOT asking for matches that disregard the > > > diacriticals. And why not? > > > > Because á does include a diacritical. By specifying it, the user told > > us the diacriticals are important, and shouldn't be disregarded. > > Again, you are just parroting what the implementation does ??? I explained the interpretation of the user input, how's that implementation? > Again, both options for fold matching should probably be available. > There is no reason to hard-code one of them at design time. > > At least no reason has been put forth so far. You've got all the reasons, you just refuse to hear them. Time to bail out. ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-01 19:19 ` Eli Zaretskii @ 2015-09-01 20:15 ` Drew Adams 0 siblings, 0 replies; 86+ messages in thread From: Drew Adams @ 2015-09-01 20:15 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel > > > > > We are asking for matches that disregard the diacriticals > > > > > (and in case of ª also higher-order collation-order variation). > > > > > > > > No. You are asking for that only when you use a search pattern > > > > that does not use the diacriticals. When you search with á in > > > > the pattern you are NOT asking for matches that disregard the > > > > diacriticals. And why not? ^^^^^^^^^^^^ > > > Because á does include a diacritical. By specifying it, the user told > > > us the diacriticals are important, and shouldn't be disregarded. > > > > Again, you are just parroting what the implementation does > > ??? I explained the interpretation of the user input, how's that > implementation? You described the current interpretation, by Emacs, of the user input á. That's "what the implementation does." That does not explain why use of á in a search string _should_ mean that diacriticals are important and shouldn't be disregarded. And that was the question I asked - why should this be the (only) behavior? Your answer is, just because it _is_ the behavior. Because it is the behavior, users expect it and we can interpret what they want in terms of it. Well yes, sure - it's the only choice they have now. It _is_ the behavior, so of course they use it accordingly. They type á in order to match á. So what? > > Again, both options for fold matching should probably be available. > > There is no reason to hard-code one of them at design time. > > At least no reason has been put forth so far. > > You've got all the reasons, you just refuse to hear them. > Time to bail out. The only reason you gave is that this is what Emacs does now. And that that means that this is what a user expects. 
S?he types á to match á and a to match a (or variants, with char folding). User intention is clear here: s?he gets the behavior s?he asks Emacs for. QED. Sorry, that's not a reason _why_ this should be the (only) behavior available to a user. It's just repeating that users expect this behavior from Emacs and so act accordingly. That they get what they expect is no proof that that is the only useful behavior. It just shows that they know what Emacs does. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-01 18:15 ` Eli Zaretskii 2015-09-01 18:46 ` Drew Adams @ 2015-09-08 5:36 ` Ulrich Mueller 2015-09-08 6:04 ` Jean-Christophe Helary ` (2 more replies) 1 sibling, 3 replies; 86+ messages in thread From: Ulrich Mueller @ 2015-09-08 5:36 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Drew Adams, emacs-devel >>>>> On Tue, 01 Sep 2015, Eli Zaretskii wrote: >> No. You are asking for that only when you use a search pattern >> that does not use the diacriticals. When you search with á in >> the pattern you are NOT asking for matches that disregard the >> diacriticals. And why not? > Because á does include a diacritical. By specifying it, the user > told us the diacriticals are important, and shouldn't be > disregarded. I disagree. When I search for "Müller" I want it to also match "Muller" because some people (e.g., in French speaking countries) use this as an approximation of the spelling. (I'd also like it to match "Mueller" but that's a different issue.) Ulrich ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-08 5:36 ` Ulrich Mueller @ 2015-09-08 6:04 ` Jean-Christophe Helary 2015-09-08 13:31 ` Stephen J. Turnbull 2015-09-08 13:39 ` Drew Adams 2015-09-08 15:47 ` Eli Zaretskii 2015-09-08 20:09 ` Richard Stallman 2 siblings, 2 replies; 86+ messages in thread From: Jean-Christophe Helary @ 2015-09-08 6:04 UTC (permalink / raw) To: emacs-devel > On Sep 8, 2015, at 14:36, Ulrich Mueller <ulm@gentoo.org> wrote: > >>>>>> On Tue, 01 Sep 2015, Eli Zaretskii wrote: > >>> No. You are asking for that only when you use a search pattern >>> that does not use the diacriticals. When you search with á in >>> the pattern you are NOT asking for matches that disregard the >>> diacriticals. And why not? > >> Because á does include a diacritical. By specifying it, the user >> told us the diacriticals are important, and shouldn't be >> disregarded. > > I disagree. When I search for "Müller" I want it to also match > "Muller" because some people (e.g., in French speaking countries) use > this as an approximation of the spelling. It's fine that emacs is "different", but common (nano, vi, GUI editors, word processors) behaviour is that a search strictly matches the string, and that creates expectations. For the Muller case above, as a translator I could see myself search for Muller to correct it to Müller and not be happy to have all the correct Müllers showing up in the search. Let's just put flags that trigger case/diacritic matching, they could be on in default emacs, but they should be somewhere. Jean-Christophe Helary ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-08 6:04 ` Jean-Christophe Helary @ 2015-09-08 13:31 ` Stephen J. Turnbull 2015-09-08 14:24 ` Drew Adams [not found] ` <<8cf269bc-69d8-4752-8506-de8d992512e1@default> 2015-09-08 13:39 ` Drew Adams 1 sibling, 2 replies; 86+ messages in thread From: Stephen J. Turnbull @ 2015-09-08 13:31 UTC (permalink / raw) To: Jean-Christophe Helary; +Cc: emacs-devel Jean-Christophe Helary writes: > Let's just put flags that trigger case/diacritic matching, they > could be on in default emacs, but they should be somewhere. They're already there. The discussion here is entirely about the DWIM UI of isearch that allows requesting strict matching by having at least one uppercase or accented character, even though lax mode is enabled. Drew prefers a UI that enables/disables strict mode using a special isearch command bound to a key. That would be plausible, if the DWIM UI for case fold search in isearch weren't 3 decades old. But the DWIM UI *is* 3 decades old, and successful. Drew disputes that, but in the 25 years I've followed Emacs development this is the first time I've seen anybody complain about the DWIM-ish case folding feature. Note that incremental case-folded search (usually with no escape for strict matching!) has been widely adopted in web and file browsers. I'm +1 on generalizing this UI to "diacritic folding" in isearch. The other question is that of Ulrich Müller, who points out that it's natural for him to type his name correctly, but he'd like to laxly match Mueller and Muller, too.[1] It's a valid use case, obviously, but based on an analogy to experience with DWIMish case-folding in Emacs, I believe most users will quickly adjust to typing "muller" when they want a poor man's version of full "orthographic equivalence". Individuals may not, but I believe the great majority will, since I'm sure it's anatomically easier to type "muller" than "Müller", even on a German keyboard. 
Footnotes: [1] Drew also argues this point, but from an abstract insistence on "symmetry", which doesn't really exist here for representational, anatomical, psychological reasons, and let's not forget personal historical reasons like "Müller is my name". ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-08 13:31 ` Stephen J. Turnbull @ 2015-09-08 14:24 ` Drew Adams 2015-09-08 15:21 ` Stephen J. Turnbull ` (2 more replies) [not found] ` <<8cf269bc-69d8-4752-8506-de8d992512e1@default> 1 sibling, 3 replies; 86+ messages in thread From: Drew Adams @ 2015-09-08 14:24 UTC (permalink / raw) To: Stephen J. Turnbull, Jean-Christophe Helary; +Cc: emacs-devel > The discussion here is entirely about the DWIM > UI of isearch that allows requesting strict matching by having at > least one uppercase or accented character, even though lax mode is > enabled. The proposal is explicitly *not* for the former, now. The weird exception of an uppercase letter making the current search be case-sensitive, even though you have toggled case sensitivity OFF, is not under attack now. Personally, yes, I would get rid of that anomaly too at some point, but I'm not proposing that now. Likewise, for the anomaly that whitespace folding is switched off by SPC SPC. That too, I would like to see removed eventually, but I'm not proposing that now either. The point now is to DTRT wrt char folding - the new feature. > Drew prefers a UI that enables/disables strict mode using a > special isearch command bound to a key. We already have that. What I'm proposing in this thread is that when char folding is on, it work symmetrically: Folding should let you use `é' in the search string to match any of the accented or unaccented variants, just as it does for `e' in the search string. Nothing more. What's good for `e' should be good for `é' and all the rest. It's about equivalence classes. There is no reason to limit search strings to one privileged member of an equivalence class when trying to match any members of the class. That's all. > That would be plausible, if the DWIM > UI for case fold search in isearch weren't 3 decades old. See above. I am *not* now proposing a change to case-fold behavior. 
I've made that clear from the beginning, and repeated it several times now. But it seems that it is easier, for those not favorable to what I (and Juri, apparently) propose, to harp on the age-old anomaly of uppercase case-fold annulment as, somehow (?), an argument against clean, symmetric char folding. Please argue about the topic at hand (see Subject line), not whether the 1980s decision to make an exception for an uppercase letter in the search string was or is a good idea. > But the DWIM UI *is* 3 decades old, and successful. Drew > disputes that, No, Drew does not. You cannot show one place where anything Drew has written suggests that he disputes that. > but in the 25 years I've followed Emacs development this is > the first time I've seen anybody complain about the DWIM-ish > case folding feature. Live and learn. ;-) That is not the topic of this thread, in any case. > Note that incremental case-folded search (usually with no escape for > strict matching!) has been widely adopted in web and file browsers. Uh, no. Case folding, yes. But not case folding that switches off (becoming case-sensitive) just because you include an uppercase letter in the search string. Not in any browser I have, at least. Nor in Notepad or TextPad or other simple editors that newbies or non-programmers might be used to. But again, *not* the subject of this topic. > I'm +1 on generalizing this UI to "diacritic folding" in isearch. By "this UI", I guess you mean that if there is a char with a diacritic in the search string then that should turn off char folding, preventing you from matching text ignoring diacritics. That would be unfortunate - a strict loss (inability to match `é' against `e'; only ability to match `e' against `é'), and with no gain. > The other question is that of Ulrich Müller, who points out that it's > natural for him to type his name correctly, but he'd like to laxly > match Mueller and Muller, too.[1] Same as my resumé example, yes. 
And the use case includes various quotation marks (e.g. curly) in the search string and wanting to match various others in the text. E.g., you copy some text from a web page, which includes some curly quote marks, and you want to match text in your buffer but ignoring the difference in quote-mark type. Likewise, for any of the other equivalence classes. No reason to privilege any particular member of a class, making it so that only that member can be used in a search string to match the other members. We've seen no argument supporting such asymmetry. (I can imagine an argument in terms of implementation, but we have not heard that yet. And *no* argument has been given in user terms - UI. Why should users be limited wrt which class member they can use to match a class?) > It's a valid use case, obviously, > but based on an analogy to experience with DWIMish case-folding in > Emacs, I believe most users will quickly adjust to typing "muller" > when they want a poor man's version of full "orthographic > equivalence". Individuals may not, but I believe the great majority > will, since I'm sure it's anatomically easier to type "muller" than > "Müller", even on a German keyboard. It's not only about typing. That seems to be the main point that those who repeat this mantra forget. Text can be pasted into an Isearch string, including text copied from outside Emacs. Text using any Unicode chars, from any languages. > Footnotes: > [1] Drew also argues this point, but from an abstract insistence on > "symmetry", which doesn't really exist here for representational, > anatomical, psychological reasons, and let's not forget personal > historical reasons like "Müller is my name". Nonsense. I gave concrete examples. It's not an academic argument. It's about really having character folding, not just a one-way character folding that requires you to type (or edit a pasted string) _only_ the "canonical" chars that are folded. 
It's a practical argument, not an abstract insistence on symmetry. Being _able_ to fold `é' to `e' or `è', and to fold one kind of quote mark to others, is, yes, a normal use case. Nothing odd, abstract, or academic about it. Herr Müller confirms this with his own example. This should be a no-brainer, IMO. ^ permalink raw reply [flat|nested] 86+ messages in thread
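To make the two behaviors argued over in this message concrete, here is a minimal Emacs Lisp sketch. It is not the Emacs implementation: `my-fold-class` and `my-fold-regexp` are hypothetical names, and the class is a tiny hand-picked sample rather than the table Emacs derives from Unicode decompositions.

```elisp
;; Hypothetical sample class: "e" followed by é, è, ê, ë.
;; The first element plays the role of the "base" char.
(defvar my-fold-class
  (mapcar #'char-to-string '(?e #xe9 #xe8 #xea #xeb))
  "Sample char-folding equivalence class, base char first.")

(defun my-fold-regexp (char-string symmetric)
  "Return a regexp for CHAR-STRING under the sample folding rules.
With SYMMETRIC non-nil, any member of `my-fold-class' matches the
whole class.  Otherwise only the base (first) member does, and
every other member matches just itself."
  (if (and (member char-string my-fold-class)
           (or symmetric
               (equal char-string (car my-fold-class))))
      (regexp-opt my-fold-class)
    (regexp-quote char-string)))
```

With SYMMETRIC nil, a pattern containing é matches only é, which is the behavior the thread describes as current. With SYMMETRIC non-nil, é in the pattern also matches plain e, which is the proposed behavior.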
* RE: char equivalence classes in search - why not symmetric? 2015-09-08 14:24 ` Drew Adams @ 2015-09-08 15:21 ` Stephen J. Turnbull 2015-09-08 16:58 ` Drew Adams 2015-09-08 20:15 ` Richard Stallman 2015-09-08 20:15 ` Richard Stallman 2 siblings, 1 reply; 86+ messages in thread From: Stephen J. Turnbull @ 2015-09-08 15:21 UTC (permalink / raw) To: Drew Adams; +Cc: emacs-devel Drew Adams writes: > This should be a no-brainer, IMO. Put your code on ELPA and demonstrate its superiority. Since it's a no-brainer, there's no risk. ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-08 15:21 ` Stephen J. Turnbull @ 2015-09-08 16:58 ` Drew Adams 2015-09-08 17:38 ` Stephen J. Turnbull 0 siblings, 1 reply; 86+ messages in thread From: Drew Adams @ 2015-09-08 16:58 UTC (permalink / raw) To: Stephen J. Turnbull; +Cc: emacs-devel > > This should be a no-brainer, IMO. > > Put your code on ELPA and demonstrate its superiority. > Since it's a no-brainer, there's no risk. If you're going to quote something written by someone else, please at least do not mislead by taking it totally out of context. Here is that text in context. It says nothing about implementation being a no-brainer. > Being _able_ to fold `é' to `e' or `è', and to fold one > kind of quote mark to others, is, yes, a normal use case. > Nothing odd, abstract, or academic about it. Herr Müller > confirms this with his own example. This should be a > no-brainer, IMO. It's about _what_ users can do. It should be a no-brainer, IMO, that users should _be able_ to do what Ulrich, Juri, and I have requested. That same emphasis on _being able_ was in the original text quoted, but you still ignored it. _How_ to fix the current implementation to support that behavior is a different question. Feel free to raise that question - _how_ to do it - in another thread, if you are interested. And contribute code to it, if you like. The question this thread raises is why and why not do it. I've approached this question only from a user point of view (it is useful to be able to do it). But it is fine to present implementation-related considerations that argue against (or for) doing it. None seen so far. ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-08 16:58 ` Drew Adams @ 2015-09-08 17:38 ` Stephen J. Turnbull 2015-09-09 22:52 ` Drew Adams 0 siblings, 1 reply; 86+ messages in thread From: Stephen J. Turnbull @ 2015-09-08 17:38 UTC (permalink / raw) To: Drew Adams; +Cc: emacs-devel Drew Adams writes: > I've approached this question only from a user point of > view (it is useful to be able to do it). Well, since I'm not going to do it any time soon, and you haven't even considered doing it yet, this thread is moot. ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-08 17:38 ` Stephen J. Turnbull @ 2015-09-09 22:52 ` Drew Adams 2015-09-10 3:12 ` Drew Adams 0 siblings, 1 reply; 86+ messages in thread From: Drew Adams @ 2015-09-09 22:52 UTC (permalink / raw) To: Stephen J. Turnbull; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1944 bytes --]

> > I've approached this question only from a user point of
> > view (it is useful to be able to do it).
>
> Well, since I'm not going to do it any time soon, and you
> haven't even considered doing it yet, this thread is moot.

AFAICT, this (or similar) is the only code needed. It fixes the char-table entries for the equivalent chars, so each points to the equivalence class and not just to itself. (Currently, only the "base" char points to the equivalence class.)

;; Add an entry for each equivalent char.
(let ((others ()))
  (map-char-table
   (lambda (base v)
     (let ((chrs (aref equiv base)))
       (when (consp chrs)
         (dolist (chr (cdr chrs))
           (push (cons (string-to-char chr) (remove chr chrs)) others)))))
   equiv)
  (dolist (it others)
    (let ((base (car it))
          (chars (cdr it)))
      (aset equiv base (append chars (aref equiv base))))))

This code fragment is included in the attached code that updates `character-fold-table'. Evaluate the attached code, to try the behavior proposed in this thread.

The attached code provides:

* A Boolean option, `char-fold-symmetric', so you can choose which behavior you want. (Let users decide, instead of "flipping a coin" at design time.) If you use Customize (or the equivalent) to change the option value then `character-fold-table' is automatically updated to reflect the new option value.

* A function that updates `character-fold-table' to reflect the option value. It evaluates the above code conditionally.

Just as now, you can use M-s ' to toggle char folding. With the option value non-nil you get the behavior proposed in this thread. With the option value nil you get the current, more limited behavior.

[I'm no expert on char tables. Perhaps the code could be improved. But this seems to work OK. I think it exhibits the proposed behavior.]

[-- Attachment #2: symmetric-char-fold.el --]
[-- Type: application/octet-stream, Size: 5948 bytes --]

;; Load this file, to evaluate these two definitions in order.
;;
;; The second is an option that lets you choose the proposed behavior
;; or the current Emacs behavior, for character folding. The first is
;; a function that redefines the char-table used for character folding
;; (`character-fold-table'), so that it reflects the option value.
;;
;; When the option is non-nil, `character-fold-table' includes
;; equivalence entries for each member of a char-folding class (an
;; equivalence class wrt search). When the option is nil,
;; `character-fold-table' includes equivalence entries only for the
;; "base" character of each class.
;;
;; Use M-s ' to toggle char folding, as usual.

(defun update-char-fold-table ()
  "Update the value of variable `character-fold-table'.
The new value reflects the current value of `char-fold-symmetric'."
  (setq character-fold-table
        (let* ((equiv (make-char-table 'character-fold-table))
               (table (unicode-property-table-internal 'decomposition))
               (func (char-table-extra-slot table 1)))
          ;; Ensure the table is populated.
          (map-char-table
           (lambda (i v) (when (consp i) (funcall func (car i) v table)))
           table)
          ;; Compile a list of all complex chars that each simple char should match.
          (map-char-table
           (lambda (i dec)
             (when (consp dec)
               ;; Discard a possible formatting tag.
               (when (symbolp (car dec))
                 (setq dec (cdr dec)))
               ;; Skip trivial cases like ?a decomposing to (?a).
               (unless (or (and (eq i (car dec))
                                (not (cdr dec))))
                 (let ((d dec)
                       (fold-decomp t)
                       k found)
                   (while (and d (not found))
                     (setq k (pop d))
                     ;; Is k a number or letter, per unicode standard?
                     (setq found (memq (get-char-code-property k 'general-category)
                                       '(Lu Ll Lt Lm Lo Nd Nl No))))
                   (if found
                       ;; Check if the decomposition has more than one letter,
                       ;; because then we don't want the first letter to match
                       ;; the decomposition.
                       (dolist (k d)
                         (when (and fold-decomp
                                    (memq (get-char-code-property k 'general-category)
                                          '(Lu Ll Lt Lm Lo Nd Nl No)))
                           (setq fold-decomp nil)))
                     ;; If there's no number or letter on the
                     ;; decomposition, take the first character in it.
                     (setq found (car-safe dec)))
                   ;; Finally, we only fold multi-char decomposition if at
                   ;; least one of the chars is non-spacing (combining).
                   (when fold-decomp
                     (setq fold-decomp nil)
                     (dolist (k dec)
                       (when (and (not fold-decomp)
                                  (> (get-char-code-property k 'canonical-combining-class) 0))
                         (setq fold-decomp t))))
                   ;; Add i to the list of characters that k can
                   ;; represent. Also possibly add its decomposition, so we can
                   ;; match multi-char representations like (format "a%c" 769)
                   (when (and found (not (eq i k)))
                     (let ((chars (cons (char-to-string i) (aref equiv k))))
                       (aset equiv k
                             (if fold-decomp
                                 (cons (apply #'string dec) chars)
                               chars))))))))
           table)
          ;; Add some manual entries.
          (dolist (it '((?\" "＂" "“" "”" "”" "„" "⹂" "〞" "‟" "‟" "❞" "❝" "❠" "“" "„" "〝" "〟" "🙷" "🙶" "🙸" "«" "»")
                        (?' "❟" "❛" "❜" "‘" "’" "‚" "‛" "‚" "" "❮" "❯" "‹" "›")
                        (?` "❛" "‘" "‛" "" "❮" "‹")))
            (let ((idx (car it))
                  (chars (cdr it)))
              (aset equiv idx (append chars (aref equiv idx)))))
          ;; --------8<------the only addition----------------
          (when char-fold-symmetric
            ;; Add an entry for each equivalent char.
            (let ((others ()))
              (map-char-table
               (lambda (base v)
                 (let ((chrs (aref equiv base)))
                   (when (consp chrs)
                     (dolist (chr (cdr chrs))
                       (push (cons (string-to-char chr) (remove chr chrs)) others)))))
               equiv)
              (dolist (it others)
                (let ((base (car it))
                      (chars (cdr it)))
                  (aset equiv base (append chars (aref equiv base)))))))
          ;; --------8<---------------------------------------
          ;; Convert the lists of characters we compiled into regexps.
          (map-char-table
           (lambda (i v)
             (let ((re (regexp-opt (cons (char-to-string i) v))))
               (if (consp i)
                   (set-char-table-range equiv i re)
                 (aset equiv i re))))
           equiv)
          equiv)))

(defcustom char-fold-symmetric t
  "Non-nil means char-fold searching treats equivalent chars the same.
That is, use of any of a set of char-fold equivalent chars in a search
string finds any of them in the text being searched.

If nil then only the \"base\" or \"canonical\" char of the set matches
any of them. The others match only themselves, even when char-folding
is turned on."
  :set (lambda (sym defs)
         (custom-set-default sym defs)
         (update-char-fold-table))
  :type 'boolean :group 'isearch)

^ permalink raw reply [flat|nested] 86+ messages in thread
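The "add an entry for each equivalent char" step in the message above may be easier to follow on a plain alist than on a char-table. The following is only a sketch under stated assumptions: `my-symmetrize-fold-alist` is a hypothetical helper, it handles single-character members only, and it deliberately puts the base char into every member's list, which is an assumption about the intended result rather than a transcription of the attached code.

```elisp
(defun my-symmetrize-fold-alist (alist)
  "Return ALIST extended so that every class member is also a key.
ALIST maps a base character to a list of single-character strings
that the base should match.  In the result, each of those
characters maps to the base plus the remaining members.
Hypothetical helper, for illustration only."
  (let ((result (copy-sequence alist)))
    (dolist (entry alist)
      (let* ((base (car entry))
             (members (cdr entry))
             ;; The whole class as strings, base first.
             (class (cons (char-to-string base) members)))
        (dolist (m members)
          ;; Key: the member as a character.  Value: everyone else.
          (push (cons (string-to-char m) (remove m class))
                result))))
    result))
```

Starting from a single entry that maps e to é and è, the result keeps that entry and adds one for é (matching e and è) and one for è (matching e and é).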
* RE: char equivalence classes in search - why not symmetric? 2015-09-09 22:52 ` Drew Adams @ 2015-09-10 3:12 ` Drew Adams 2015-09-10 21:46 ` Drew Adams 0 siblings, 1 reply; 86+ messages in thread From: Drew Adams @ 2015-09-10 3:12 UTC (permalink / raw) To: Stephen J. Turnbull; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1324 bytes --]

> AFAICT, this (or similar) is the only code needed.

Sorry, I spoke too soon.

1. The following two lines are needed, before evaluating the code I sent earlier. (I've attached an update that includes them, so you can just load/evaluate it.)

(setq character-fold-search t)
(load-library "character-fold")

This is due to the way the vanilla code is at the moment. This also means that for this testing char folding will be on, to start with.

2. The code I have is not sufficient for everything. You can use it to see what the behavior is for single-char entries in the char table, which includes accented chars (chars with diacritics). But it does not also handle multiple-char entries in the table.

For instance, you can search for "é" and get char folding, but you cannot search for "é" and get char folding. The first of these is just the char named LATIN SMALL LETTER E WITH ACUTE. The second is plain "e" composed with "́" (the char named COMBINING ACUTE ACCENT).

Some more work would be needed to make such combinations work too. As I said, I'm no expert on char tables. But the attached code should give you a good idea of what is involved.

At the end of the file I included some commented-out e chars to search for. (Use `C-u C-x =' on a char to see what it really is.)

[-- Attachment #2: symmetric-char-fold.el --]
[-- Type: application/octet-stream, Size: 5807 bytes --]

(setq character-fold-search t)
(load-library "character-fold")

(defun update-char-fold-table ()
  "Update the value of variable `character-fold-table'.
The new value reflects the current value of `char-fold-symmetric'."
  (setq character-fold-table
        (let* ((equiv (make-char-table 'character-fold-table))
               (table (unicode-property-table-internal 'decomposition))
               (func (char-table-extra-slot table 1)))
          ;; Ensure the table is populated.
          (map-char-table
           (lambda (i v) (when (consp i) (funcall func (car i) v table)))
           table)
          ;; Compile a list of all complex chars that each simple char should match.
          (map-char-table
           (lambda (i dec)
             (when (consp dec)
               ;; Discard a possible formatting tag.
               (when (symbolp (car dec))
                 (setq dec (cdr dec)))
               ;; Skip trivial cases like ?a decomposing to (?a).
               (unless (or (and (eq i (car dec))
                                (not (cdr dec))))
                 (let ((d dec)
                       (fold-decomp t)
                       k found)
                   (while (and d (not found))
                     (setq k (pop d))
                     ;; Is k a number or letter, per unicode standard?
                     (setq found (memq (get-char-code-property k 'general-category)
                                       '(Lu Ll Lt Lm Lo Nd Nl No))))
                   (if found
                       ;; Check if the decomposition has more than one letter,
                       ;; because then we don't want the first letter to match
                       ;; the decomposition.
                       (dolist (k d)
                         (when (and fold-decomp
                                    (memq (get-char-code-property k 'general-category)
                                          '(Lu Ll Lt Lm Lo Nd Nl No)))
                           (setq fold-decomp nil)))
                     ;; If there's no number or letter on the
                     ;; decomposition, take the first character in it.
                     (setq found (car-safe dec)))
                   ;; Finally, we only fold multi-char decomposition if at
                   ;; least one of the chars is non-spacing (combining).
                   (when fold-decomp
                     (setq fold-decomp nil)
                     (dolist (k dec)
                       (when (and (not fold-decomp)
                                  (> (get-char-code-property k 'canonical-combining-class) 0))
                         (setq fold-decomp t))))
                   ;; Add i to the list of characters that k can
                   ;; represent. Also possibly add its decomposition, so we can
                   ;; match multi-char representations like (format "a%c" 769)
                   (when (and found (not (eq i k)))
                     (let ((chars (cons (char-to-string i) (aref equiv k))))
                       (aset equiv k
                             (if fold-decomp
                                 (cons (apply #'string dec) chars)
                               chars))))))))
           table)
          ;; Add some manual entries.
          (dolist (it '((?\" "＂" "“" "”" "”" "„" "⹂" "〞" "‟" "‟" "❞" "❝" "❠" "“" "„" "〝" "〟" "🙷" "🙶" "🙸" "«" "»")
                        (?' "❟" "❛" "❜" "‘" "’" "‚" "‛" "‚" "" "❮" "❯" "‹" "›")
                        (?` "❛" "‘" "‛" "" "❮" "‹")))
            (let ((idx (car it))
                  (chars (cdr it)))
              (aset equiv idx (append chars (aref equiv idx)))))
          ;; --------8<------the only addition----------------
          (when char-fold-symmetric
            ;; Add an entry for each equivalent char.
            (let ((others ()))
              (map-char-table
               (lambda (base v)
                 (let ((chrs (aref equiv base)))
                   (when (consp chrs)
                     (dolist (chr (cdr chrs))
                       (push (cons (string-to-char chr) (remove chr chrs)) others)))))
               equiv)
              (dolist (it others)
                (let ((base (car it))
                      (chars (cdr it)))
                  (aset equiv base (append chars (aref equiv base)))))))
          ;; --------8<---------------------------------------
          ;; Convert the lists of characters we compiled into regexps.
          (map-char-table
           (lambda (i v)
             (let ((re (regexp-opt (cons (char-to-string i) v))))
               (if (consp i)
                   (set-char-table-range equiv i re)
                 (aset equiv i re))))
           equiv)
          equiv)))

(defcustom char-fold-symmetric t
  "Non-nil means char-fold searching treats equivalent chars the same.
That is, use of any of a set of char-fold equivalent chars in a search
string finds any of them in the text being searched.

If nil then only the \"base\" or \"canonical\" char of the set matches
any of them. The others match only themselves, even when char-folding
is turned on."
  :set (lambda (sym defs)
         (custom-set-default sym defs)
         (update-char-fold-table))
  :type 'boolean :group 'isearch)

;; ("𝚎" "𝙚" "𝘦" "𝗲" "𝖾" "𝖊" "𝕖" "𝔢" "𝓮" "𝒆" "𝑒" "𝐞" "e" "㋎" "㋍" "ⓔ" "⒠"
;; "ⅇ" "ℯ" "ₑ" "ẽ" "ẽ" "ẻ" "ẻ" "ẹ" "ẹ" "ḛ" "ḛ" "ḙ" "ḙ" "ᵉ" "ȩ" "ȩ" "ȇ" "ȇ"
;; "ȅ" "ȅ" "ě" "ě" "ę" "ę" "ė" "ė" "ĕ" "ĕ" "ē" "ē" "ë" "ë" "ê" "ê" "é" "é" "è" "è")

;; No good yet: "𝚎" "ẽ" "ẻ" "ẹ" "ḛ" "ḙ" "ȩ" "ȇ" "ȅ"
;; "ě" "ę" "ė" "ĕ" "ē" "ë" "ê" "é" "è"

^ permalink raw reply [flat|nested] 86+ messages in thread
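The composed/decomposed distinction described in the message above is hard to see on screen, since both forms usually render identically. A small sketch, assuming only that Emacs's Unicode property tables are available; the `my-` names are hypothetical:

```elisp
;; One character: LATIN SMALL LETTER E WITH ACUTE (U+00E9).
(defconst my-e-acute-composed (string #xe9))

;; Two characters: "e" followed by COMBINING ACUTE ACCENT (U+0301).
(defconst my-e-acute-decomposed (string ?e #x301))

(defun my-decomposition-string (char)
  "Return CHAR's Unicode decomposition as a string.
A leading formatting tag (a symbol), if present, is dropped.
Hypothetical helper, for illustration only."
  (let ((dec (get-char-code-property char 'decomposition)))
    (when (and (consp dec) (symbolp (car dec)))
      (setq dec (cdr dec)))
    (apply #'string dec)))
```

The two constants are different strings of length 1 and 2, and `(my-decomposition-string #xe9)` produces the two-character form. That is why a table keyed on single characters cannot, by itself, fold a search string that contains the decomposed form.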
* RE: char equivalence classes in search - why not symmetric? 2015-09-10 3:12 ` Drew Adams @ 2015-09-10 21:46 ` Drew Adams 0 siblings, 0 replies; 86+ messages in thread From: Drew Adams @ 2015-09-10 21:46 UTC (permalink / raw) To: emacs-devel [-- Attachment #1: Type: text/plain, Size: 1707 bytes --] Yesterday I said: > 2. The code I have is not sufficient for everything. You can > use it to see what the behavior is for single-char entries in the > char table, which includes accented chars (chars with diacritics). > But it does not also handle multiple-char entries in the table. > > For instance, you can search for "é" and get char folding, but you > cannot search for "é" and get char folding. The first of these is > just the char named LATIN SMALL LETTER E WITH ACUTE. The second is > plain "e" composed with "́" (the char named COMBINING ACUTE ACCENT). > > Some more work would be needed to make such combinations work too. > As I said, I'm no expert on char tables. But the attached code > should give you a good idea of what is involved. The attached version seems to take care of this, so you can search with, say, the decomposition "é" and get the same effect as searching for the fully composed char "é". Again, just load the file, to try it out. Remember that M-s ' toggles char folding. At the end of the file there are a few strings you can use to test. When you see two consecutive strings there that look the same, the first is a decomposition, and the second is the same char fully composed. For example: "é" "é". (The first string is two chars, however it might be displayed.) `C-u C-x =' on the first char of the first string tells you: LATIN SMALL LETTER E, decomposition: (101) ('e') and on the second char it tells you: COMBINING ACUTE ACCENT, decomposition: (769) ('́'). 
`C-u C-x =' on the single char of the second string tells you: LATIN SMALL LETTER E WITH ACUTE, decomposition: (101 769) ('e' '́')

[-- Attachment #2: symmetric-char-fold.el --]
[-- Type: application/octet-stream, Size: 7320 bytes --]

(setq character-fold-search t)

(load-library "character-fold")

(defvar char-fold-decomps ()
  "List of conses of a decomposition and its base char.")

(defun update-char-fold-table ()
  "Update the value of variable `character-fold-table'.
The new value reflects the current value of `char-fold-symmetric'."
  (setq char-fold-decomps ())
  (setq character-fold-table
        (let* ((equiv (make-char-table 'character-fold-table))
               (table (unicode-property-table-internal 'decomposition))
               (func (char-table-extra-slot table 1)))
          ;; Ensure the table is populated.
          (map-char-table
           (lambda (i v)
             (when (consp i) (funcall func (car i) v table)))
           table)
          ;; Compile a list of all complex chars that each simple char should match.
          (map-char-table
           (lambda (i dec)
             (when (consp dec)
               ;; Discard a possible formatting tag.
               (when (symbolp (car dec)) (setq dec (cdr dec)))
               ;; Skip trivial cases like ?a decomposing to (?a).
               (unless (and (eq i (car dec)) (not (cdr dec)))
                 (let ((d dec)
                       (fold-decomp t)
                       k found)
                   (while (and d (not found))
                     (setq k (pop d))
                     ;; Is k a number or letter, per unicode standard?
                     (setq found (memq (get-char-code-property k 'general-category)
                                       '(Lu Ll Lt Lm Lo Nd Nl No))))
                   (if found
                       ;; Check if the decomposition has more than one letter,
                       ;; because then we don't want the first letter to match
                       ;; the decomposition.
                       (dolist (k d)
                         (when (and fold-decomp
                                    (memq (get-char-code-property k 'general-category)
                                          '(Lu Ll Lt Lm Lo Nd Nl No)))
                           (setq fold-decomp nil)))
                     ;; If there's no number or letter on the
                     ;; decomposition, take the first character in it.
                     (setq found (car-safe dec)))
                   ;; Finally, we only fold multi-char decomposition if at
                   ;; least one of the chars is non-spacing (combining).
                   (when fold-decomp
                     (setq fold-decomp nil)
                     (dolist (k dec)
                       (when (and (not fold-decomp)
                                  (> (get-char-code-property k 'canonical-combining-class) 0))
                         (setq fold-decomp t))))
                   ;; Add i to the list of characters that k can
                   ;; represent. Also possibly add its decomposition, so we can
                   ;; match multi-char representations like (format "a%c" 769)
                   (when (and found (not (eq i k)))
                     (let ((chr-strgs (cons (char-to-string i) (aref equiv k))))
                       (aset equiv k
                             (if fold-decomp
                                 (cons (apply #'string dec) chr-strgs)
                               chr-strgs))))))))
           table)
          ;; Add some manual entries.
          ;; The invisible char U+E0022 is written as an escape below
          ;; (assumed; it does not survive as visible text).
          (dolist (it '((?\" "＂" "“" "”" "”" "„" "⹂" "〞" "‟" "‟" "❞" "❝" "❠" "“" "„" "〝" "〟" "🙷" "🙶" "🙸" "«" "»")
                        (?' "❟" "❛" "❜" "‘" "’" "‚" "‛" "‚" "\U000E0022" "❮" "❯" "‹" "›")
                        (?` "❛" "‘" "‛" "\U000E0022" "❮" "‹")))
            (let ((idx (car it))
                  (chr-strgs (cdr it)))
              (aset equiv idx (append chr-strgs (aref equiv idx)))))
          ;; --------8<------the only addition----------------
          (when char-fold-symmetric
            ;; Add an entry for each equivalent char.
            (let ((others ()))
              (map-char-table
               (lambda (base v)
                 (let ((chr-strgs (aref equiv base)))
                   (when (consp chr-strgs)
                     (dolist (strg (cdr chr-strgs))
                       (if (< (length strg) 2)
                           (push (cons (string-to-char strg) (remove strg chr-strgs))
                                 others)
                         ;; A decomposition. Add it and its base char to `char-fold-decomps'.
                         (push (cons strg (char-to-string base)) char-fold-decomps))))))
               equiv)
              (dolist (it others)
                (let ((base (car it))
                      (chr-strgs (cdr it)))
                  (aset equiv base (append chr-strgs (aref equiv base)))))))
          ;; --------8<---------------------------------------
          ;; Convert the lists of characters we compiled into regexps.
          (map-char-table
           (lambda (i v)
             (let ((re (regexp-opt (cons (char-to-string i) v))))
               (if (consp i)
                   (set-char-table-range equiv i re)
                 (aset equiv i re))))
           equiv)
          equiv)))

(defun character-fold-to-regexp (string &optional lax)
  "Return a regexp matching anything that character-folds into STRING.
If `character-fold-search' is nil, just `regexp-quote' STRING.
Otherwise:
Replace any decompositions in `character-fold-table' by their base
chars, so search will match all equivalents.
Then replace any chars in STRING that have entries in
`character-fold-table' by their entries (which are regexps), and
replace other chars in STRING by `regexp-quote' applied to them.
Non-nil LAX means any whitespace char can match any number of times."
  (if (not character-fold-search)
      (regexp-quote string)
    (when char-fold-decomps
      (dolist (decomp char-fold-decomps)
        (setq string (replace-regexp-in-string
                      (regexp-quote (car decomp)) (cdr decomp)
                      string 'FIXED-CASE 'LITERAL))))
    (apply #'concat
           (mapcar (lambda (c)
                     (if (and lax (memq c '(?\s ?\t ?\r ?\n)))
                         "[ \t\n\r\xa0\x2002\x2d\x200a\x202f\x205f\x3000]+"
                       (or (aref character-fold-table c)
                           (regexp-quote (string c)))))
                   string))))

(defcustom char-fold-symmetric t
  "Non-nil means char-fold searching treats equivalent chars the same.
That is, use of any of a set of char-fold equivalent chars in a search
string finds any of them in the text being searched.

If nil then only the \"base\" or \"canonical\" char of the set matches
any of them.  The others match only themselves, even when char-folding
is turned on."
  :set (lambda (sym defs)
         (custom-set-default sym defs)
         (update-char-fold-table))
  :type 'boolean :group 'isearch)

;; Test by searching for these strings.
;; ("𝚎" "𝙚" "𝘦" "𝗲" "𝖾" "𝖊" "𝕖" "𝔢" "𝓮" "𝒆" "𝑒" "𝐞" "e" "㋎" "㋍" "ⓔ" "⒠"
;; "ⅇ" "ℯ" "ₑ" "ẽ" "ẽ" "ẻ" "ẻ" "ẹ" "ẹ" "ḛ" "ḛ" "ḙ" "ḙ" "ᵉ" "ȩ" "ȩ" "ȇ" "ȇ"
;; "ȅ" "ȅ" "ě" "ě" "ę" "ę" "ė" "ė" "ĕ" "ĕ" "ē" "ē" "ë" "ë" "ê" "ê" "é" "é" "è" "è")

^ permalink raw reply [flat|nested] 86+ messages in thread
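The composed/decomposed distinction described in the message above can be checked outside Isearch. What follows is only a minimal sketch, assuming the `ucs-normalize' library that ships with Emacs; `my-decomposition-demo' is a hypothetical name used for illustration and is not part of the attached file.

```elisp
(require 'ucs-normalize)

;; Hypothetical helper, for illustration only.
(defun my-decomposition-demo ()
  "Compare the fully composed e-acute with its two-char decomposition."
  (let ((composed (string #xe9))          ; LATIN SMALL LETTER E WITH ACUTE
        (decomposed (string ?e #x301)))   ; e + COMBINING ACUTE ACCENT
    (list (length composed)               ; one char
          (length decomposed)             ; two chars, however displayed
          ;; The same property that `C-u C-x =' reports.
          (get-char-code-property #xe9 'decomposition)
          ;; Composing the two-char form yields the single char.
          (string= composed (ucs-normalize-NFC-string decomposed)))))

(my-decomposition-demo)
;; => (1 2 (101 769) t)
```

The attached code takes a different route (it rewrites known decompositions in the search string to their base char), so this only illustrates why the two strings differ even when they look the same.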
* Re: char equivalence classes in search - why not symmetric? 2015-09-08 14:24 ` Drew Adams 2015-09-08 15:21 ` Stephen J. Turnbull @ 2015-09-08 20:15 ` Richard Stallman 2015-09-08 20:15 ` Richard Stallman 2 siblings, 0 replies; 86+ messages in thread From: Richard Stallman @ 2015-09-08 20:15 UTC (permalink / raw) To: Drew Adams; +Cc: stephen, jean.christophe.helary, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > Personally, yes, I would get rid of that anomaly too at some > point, but I'm not proposing that now. Likewise, for the > anomaly that whitespace folding is switched off by SPC SPC. SPC SPC should match only a pair of spaces! -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-08 14:24 ` Drew Adams 2015-09-08 15:21 ` Stephen J. Turnbull 2015-09-08 20:15 ` Richard Stallman @ 2015-09-08 20:15 ` Richard Stallman 2015-09-08 21:25 ` Drew Adams 2 siblings, 1 reply; 86+ messages in thread From: Richard Stallman @ 2015-09-08 20:15 UTC (permalink / raw) To: Drew Adams; +Cc: stephen, jean.christophe.helary, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > Nothing more. What's good for `e' should be good for `é' and > all the rest. It's about equivalence classes. That would be a change for the worse, since it would reduce the range of searches that the user can specify with one character in the search. Currently the user can either search for "any kind of e" or "only é" or "only è" or "only ê", etc. With your change, the user would be limited to searching for "any kind of e". That would be a step back in flexibility. Since the current interface is fairly natural, there is no loss in offering the user all these options. I would not oppose offering a configuration setting to get the behavior you want. There is nothing to lose with that. But the current behavior is a more useful default than the behavior you would like. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-08 20:15 ` Richard Stallman @ 2015-09-08 21:25 ` Drew Adams 2015-09-09 15:07 ` Richard Stallman 0 siblings, 1 reply; 86+ messages in thread From: Drew Adams @ 2015-09-08 21:25 UTC (permalink / raw) To: rms; +Cc: stephen, jean.christophe.helary, emacs-devel

> > Nothing more. What's good for `e' should be good for `é' and
> > all the rest. It's about equivalence classes.
>
> That would be a change for the worse, since it would reduce the range
> of searches that the user can specify with one character in the
> search.

Not at all. It adds to what the user can do. It does not subtract.

> Currently the user can either search for "any kind of e" or "only é"
> or "only è" or "only ê", etc.

That would still be the case.

The only difference would be that when s?he wants to search for "any kind of e" s?he can use any of the equivalent e-chars. Any of [eéèêæë] would behave the same as `e' does now, when searching for any of [eéèêæë].

> With your change, the user would be limited to searching for "any kind
> of e". That would be a step back in flexibility.

Not at all. Just as now, the user can toggle char folding OFF, to search for the search string literally, i.e., to take its chars as what they are, and not consider them as representative of an equivalence class.

With folding OFF, `e' searches only for `e'; `é' searches only for `é', and so on.

These are all of the possibilities, for `e' and `é':

Folding ON/OFF   Search string char   Buffer chars that match
--------------   ------------------   -----------------------
OFF              e                    [e]
OFF              é                    [é]
ON               e                    [eéèêæë]
ON               é                    [eéèêæë]  <======= MISSING NOW

And the same goes for any of the other e-chars. With folding off it matches only itself. With folding on it matches any of its class.

This proposal adds more matching possibilities. It does not remove any possibilities.

Currently, you cannot do what is shown in the last line above. You cannot use é to search for [eéèêæë]. Similarly, you cannot use a curly quote to search for other kinds of quote marks. You are currently limited to using only the "canonical" chars that represent their class.

That removes the possibility of pasting text into the search string and being able to get char-folding search. Quote marks are a good example of chars in text that you might copy and try to search for.

To do that, if the copied text contained curly quotes then you would need to use `M-e' and edit the search string, to convert each of them to the corresponding "canonical" member of the quotation-mark equivalences, an ascii quote mark.

There is no good reason to make users jump through such a hoop. (Plus, they would need to know what the "canonical" char is, for each equivalence class they might want to use.) Let any member of a class represent the class.

> Since the current interface is fairly natural, there is no loss in
> offering the user all these options.

All what options? The proposal does not remove any matching options. On the contrary, it adds matching options.

> I would not oppose offering a configuration setting to get the
> behavior you want. There is nothing to lose with that. But the
> current behavior is a more useful default than the behavior you would
> like.

Did you understand what is being proposed, when you wrote that? If so, how is the current restriction to `e' for matching [eéèêæë] more useful than letting any e-char do the same?

^ permalink raw reply [flat|nested] 86+ messages in thread
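The four rows of the ON/OFF table in the message above can be reproduced with a toy model. This is a minimal sketch, not the real `character-fold-table': the variable `my-fold-classes' and the functions `my-fold-class-of' and `my-fold-to-regexp' are hypothetical names, and the single class below is made-up sample data with the "base" char first.

```elisp
(require 'cl-lib)

;; Hypothetical toy data: one equivalence class, base char first.
(defvar my-fold-classes '("eéèêë")
  "List of strings; each string is one class of equivalent chars.")

(defun my-fold-class-of (char symmetric)
  "Return the class that CHAR stands for, or nil.
With SYMMETRIC nil, only the base (first) char stands for its class.
With SYMMETRIC non-nil, any member of the class does."
  (cl-find-if (lambda (class)
                (if symmetric
                    (cl-find char class)
                  (eq char (aref class 0))))
              my-fold-classes))

(defun my-fold-to-regexp (str fold symmetric)
  "Return a regexp for STR.
Non-nil FOLD turns folding on.  SYMMETRIC selects the proposed behavior."
  (mapconcat (lambda (c)
               (let ((class (and fold (my-fold-class-of c symmetric))))
                 (if class
                     (regexp-opt (mapcar #'char-to-string class))
                   (regexp-quote (string c)))))
             str ""))

;; Folding ON, asymmetric (current behavior): é matches only é.
(string-match-p (my-fold-to-regexp "é" t nil) "e")   ; nil
;; Folding ON, symmetric (the "MISSING NOW" row): é also matches e.
(string-match-p (my-fold-to-regexp "é" t t) "e")     ; non-nil
```

With FOLD nil both variants fall back to `regexp-quote', which corresponds to the two OFF rows.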
* Re: char equivalence classes in search - why not symmetric? 2015-09-08 21:25 ` Drew Adams @ 2015-09-09 15:07 ` Richard Stallman 2015-09-09 15:21 ` Drew Adams 0 siblings, 1 reply; 86+ messages in thread From: Richard Stallman @ 2015-09-09 15:07 UTC (permalink / raw) To: Drew Adams; +Cc: stephen, jean.christophe.helary, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > Currently the user can either search for "any kind of e" or "only é" > > or "only è" or "only ê", etc. > That would still be the case. > The only difference would be that when s?he wants to search for "any > kind of e" s?he can use any of the equivalent e-chars. Any of [eéèêæë] > would behave the same as `e' does not, when searching for any of [eéèêæë]. This seems to be a miscommunication. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-09 15:07 ` Richard Stallman @ 2015-09-09 15:21 ` Drew Adams 2015-09-10 2:03 ` Richard Stallman 0 siblings, 1 reply; 86+ messages in thread From: Drew Adams @ 2015-09-09 15:21 UTC (permalink / raw) To: rms; +Cc: stephen, jean.christophe.helary, emacs-devel > > > Currently the user can either search for "any kind of e" or "only é" > > > or "only è" or "only ê", etc. > > > That would still be the case. > > The only difference would be that when s?he wants to search for "any > > kind of e" s?he can use any of the equivalent e-chars. Any of [eéèêæë] > > would behave the same as `e' does not, when searching for any of > > [eéèêæë]. > > This seems to be a miscommunication. That communication is itself unclear. _What_ seems to you to be a miscommunication? The point is that what you say is true currently would still be the case with what is proposed in this thread. The user would continue to be able to search for either any kind of e or for only a specific kind of e. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-09 15:21 ` Drew Adams @ 2015-09-10 2:03 ` Richard Stallman 2015-09-10 3:23 ` Drew Adams 0 siblings, 1 reply; 86+ messages in thread From: Richard Stallman @ 2015-09-10 2:03 UTC (permalink / raw) To: Drew Adams; +Cc: stephen, jean.christophe.helary, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > Currently the user can either search for "any kind of e" or "only é" > > or "only è" or "only ê", etc. I mean, that the user can do all of these with one character, not using any toggle command. > That would still be the case. > The only difference would be that when s?he wants to search for "any > kind of e" s?he can use any of the equivalent e-chars. No, another difference would be that NONE of the other options is possible with one character -- all would require a toggle command that people may not remember. (I don't.) > The point is that what you say is true currently would still be the > case with what is proposed in this thread. The user would continue > to be able to search for either any kind of e or for only a specific > kind of e. The user would continue to be able to do this _somehow_, but not as now without using a separate toggle command. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-10 2:03 ` Richard Stallman @ 2015-09-10 3:23 ` Drew Adams 2015-09-11 10:28 ` Richard Stallman 2015-09-11 10:28 ` Richard Stallman 0 siblings, 2 replies; 86+ messages in thread From: Drew Adams @ 2015-09-10 3:23 UTC (permalink / raw) To: rms; +Cc: stephen, jean.christophe.helary, emacs-devel > > > Currently the user can either search for "any kind of e" or "only é" > > > or "only è" or "only ê", etc. > > I mean, that the user can do all of these with one character, not > using any toggle command. Yes, that is the difference in our views. Sure, "with one character", but the flip side is that if you happen to have é in your search string, however it got there (e.g. by pasting), then with your preferred behavior you *cannot* use your search string to search for "any kind of e". This is maybe clearer when you think about copying some text to search for from outside Emacs, and that text might have curly quotes in it, in multiple places, and the text that you want to search might use other kinds of quotes, and you want the matching to match quotes regardless of type. In that use case, you are screwed in the current design. Nothing to be done, to get char-fold search, until you replace all such non-base chars in the search string with their corresponding base chars. (And you talk about difficulty remembering? Try remembering the base char of each equivalence class... Sure, letters and numerals are easy, but not some others. And we're just getting started.) > > That would still be the case. > > The only difference would be that when s?he wants to search for "any > > kind of e" s?he can use any of the equivalent e-chars. > > No, another difference would be that NONE of the other options > is possible with one character -- all would require a toggle command > that people may not remember. (I don't.) NONE of what other options? All of the same search behaviors are available. 
That is, you can find any search target that you can find today, using any search string that you use today. On the difficulty of toggling char folding: Do you remember how to toggle case sensitivity? How come you do? Because you've done it a few times? And if you forget, you use `C-s C-h'? Or you use `C-h f isearch-forward'? How hard is that? Anyway, it's not likely I'll convince you to enjoy the feature yourself. But maybe you can appreciate giving users the choice? > > The point is that what you say is true currently would still be the > > case with what is proposed in this thread. The user would continue > > to be able to search for either any kind of e or for only a specific > > kind of e. > > The user would continue to be able to do this _somehow_, but not as > now without using a separate toggle command. Correct. But at least that use case would still be available. Currently, the use case that the proposal provides for is not even possible - not never not noway not nohow. That's really the point: provide that possibility. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-10 3:23 ` Drew Adams @ 2015-09-11 10:28 ` Richard Stallman 2015-09-11 13:28 ` Stefan Monnier 2015-09-11 16:31 ` Drew Adams 2015-09-11 10:28 ` Richard Stallman 1 sibling, 2 replies; 86+ messages in thread From: Richard Stallman @ 2015-09-11 10:28 UTC (permalink / raw) To: Drew Adams; +Cc: stephen, jean.christophe.helary, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > Yes, that is the difference in our views. Sure, "with one character", > but the flip side is that if you happen to have é in your search string, > however it got there (e.g. by pasting), then with your preferred behavior > you *cannot* use your search string to search for "any kind of e". You are right, for what I originally proposed. It would be like the current situation with case folding, that you can't paste in a search string with capital letters and search for it in a case-independent way. However, in the case of case folding, we solve that by downcasing text when pasting it into search strings. We could de-accent strings too when pasting them. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-11 10:28 ` Richard Stallman @ 2015-09-11 13:28 ` Stefan Monnier 2015-09-11 16:33 ` Drew Adams 2015-09-12 15:28 ` Richard Stallman 2015-09-11 16:31 ` Drew Adams 1 sibling, 2 replies; 86+ messages in thread From: Stefan Monnier @ 2015-09-11 13:28 UTC (permalink / raw) To: Richard Stallman; +Cc: stephen, jean.christophe.helary, Drew Adams, emacs-devel > current situation with case folding, that you can't paste in a search > string with capital letters and search for it in a case-independent way. Yes, you can: Use M-c to explicitly choose whether to case-fold or not. > However, in the case of case folding, we solve that by downcasing > text when pasting it into search strings. We could de-accent strings > too when pasting them. Actually, the way we downcase it has problems. E.g. Go to the beginning of this paragraph (i.e. before "Actually") and do: C-s C-w M-c and you end up searching for an exact (non-case-folded) match of "actually" rather than "Actually", so it won't even match the "Actually" from which you got it. Stefan ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-11 13:28 ` Stefan Monnier @ 2015-09-11 16:33 ` Drew Adams 2015-09-11 20:59 ` Juri Linkov 2015-09-12 15:28 ` Richard Stallman 1 sibling, 1 reply; 86+ messages in thread From: Drew Adams @ 2015-09-11 16:33 UTC (permalink / raw) To: Stefan Monnier, Richard Stallman Cc: stephen, jean.christophe.helary, emacs-devel > > current situation with case folding, that you can't paste in > > a search string with capital letters and search for it in a > > case-independent way. > > Yes, you can: Use M-c to explicitly choose whether to case-fold or not. Your "Yes" is really an agreement that no, you cannot, but you can at least override/cancel Emacs's DWIM behavior, by then using `M-c' to explicitly turn case-folding back on. That is, after you figure out that Emacs has turned the tables on you (and there is no signal that it has - no message telling you that it is now searching case-sensitively), you can insist that it go back to the mode you had already chosen: case-insensitive. And thank goodness Emacs does not remove this possibility of overriding its second-guessing. > > However, in the case of case folding, we solve that by downcasing > > text when pasting it into search strings. We could de-accent > > strings too when pasting them. > > Actually, the way we downcase it has problems. E.g. Go to the > beginning of this paragraph (i.e. before "Actually") and do: > C-s C-w M-c and you end up searching for an exact (non-case-folded) > match of "actually" rather than "Actually", so it won't even match the > "Actually" from which you got it. Yes. And see my reply to RMS - if you paste text with an uppercase letter while editing the search string using `M-e', case-folding is still turned off automatically. IOW, the automatic downcasing DWIM is used only when you use `C-M-y' (or `C-w') to yank some text at point into the search string. What was said about automatic downcasing is not true for pasting in general. 
Which points to another possibility of user confusion (inconsistency). ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-11 16:33 ` Drew Adams @ 2015-09-11 20:59 ` Juri Linkov 2015-09-11 23:11 ` Drew Adams 0 siblings, 1 reply; 86+ messages in thread From: Juri Linkov @ 2015-09-11 20:59 UTC (permalink / raw) To: Drew Adams Cc: stephen, jean.christophe.helary, emacs-devel, Stefan Monnier, Richard Stallman > That is, after you figure out that Emacs has turned the tables on > you (and there is no signal that it has - no message telling you > that it is now searching case-sensitively), For the automatic toggling of case-sensitivity we could display the same message as displayed for manual toggling with ‘M-s c’. > IOW, the automatic downcasing DWIM is used only when you use `C-M-y' > (or `C-w') to yank some text at point into the search string. What > was said about automatic downcasing is not true for pasting in > general. Which points to another possibility of use confusion > (inconsistency). No, pasting is broken too: try to paste the upper case “A” with ‘C-s C-y’ (isearch-yank-kill) - it's irrecoverably converted to lower case. ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-11 20:59 ` Juri Linkov @ 2015-09-11 23:11 ` Drew Adams 0 siblings, 0 replies; 86+ messages in thread From: Drew Adams @ 2015-09-11 23:11 UTC (permalink / raw) To: Juri Linkov Cc: stephen, jean.christophe.helary, emacs-devel, Stefan Monnier, Richard Stallman > > That is, after you figure out that Emacs has turned the tables on > > you (and there is no signal that it has - no message telling you > > that it is now searching case-sensitively), > > For the automatic toggling of case-sensitivity we could display the same > message as displayed for manual toggling with ‘M-s c’. Yes, please. We should also discuss (in another thread, please) other, additional or better/instead ways to show the user state changes and the current state. > > IOW, the automatic downcasing DWIM is used only when you use `C-M-y' > > (or `C-w') to yank some text at point into the search string. What > > was said about automatic downcasing is not true for pasting in > > general. Which points to another possibility of use confusion > > (inconsistency). > > No, pasting is broken too: try to paste the upper case “A” with ‘C-s > C-y’ (isearch-yank-kill) - it's irrecoverably converted to lower case. Oh, right. So two ways to get broken pasting in that sense, and one way to get broken pasting in the other sense (the example I gave, with `M-e'). ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-11 13:28 ` Stefan Monnier 2015-09-11 16:33 ` Drew Adams @ 2015-09-12 15:28 ` Richard Stallman 1 sibling, 0 replies; 86+ messages in thread From: Richard Stallman @ 2015-09-12 15:28 UTC (permalink / raw) To: Stefan Monnier; +Cc: stephen, jean.christophe.helary, drew.adams, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > Actually, the way we downcase it has problems. E.g. Go to the beginning > of this paragraph (i.e. before "Actually") and do: > C-s C-w M-c > and you end up searching for an exact (non-case-folded) match of > "actually" rather than "Actually", so it won't even match the "Actually" > from which you got it. Perhaps the case-ignore toggle should affect chars as you enter them. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-11 10:28 ` Richard Stallman 2015-09-11 13:28 ` Stefan Monnier @ 2015-09-11 16:31 ` Drew Adams 1 sibling, 0 replies; 86+ messages in thread From: Drew Adams @ 2015-09-11 16:31 UTC (permalink / raw) To: rms; +Cc: stephen, jean.christophe.helary, emacs-devel > > Yes, that is the difference in our views. Sure, "with one > > character", but the flip side is that if you happen to have > > é in your search string, however it got there (e.g. by > > pasting), then with your preferred behavior you *cannot* > > use your search string to search for "any kind of e". > > You are right, for what I originally proposed. It would be like the > current situation with case folding, that you can't paste in a search > string with capital letters and search for it in a case-independent way. Exactly. You cannot. But you can still (thankfully) explicitly toggle afterward using `M-c', to turn case folding back on. > However, in the case of case folding, we solve that by downcasing > text when pasting it into search strings. We could de-accent strings > too when pasting them. Actually, Emacs does *not* do that in the general case for pasting copied text. emacs -Q ; `case-fold-search' is t Copy uppercase A from some text to the kill ring. In a buffer that has both lowercase and uppercase a's: C-s M-e C-y ; Paste the uppercase A. It appears uppercase. C-s ; Only uppercase A's are found. It does what you describe only when you yank text at point (e.g., using `C-M-y' or `C-w'). The use case I've been insisting on is copying some text from anywhere (e.g., from a web browser outside Emacs). That text can contain any chars. But anyway, I can agree that what you describe (automatic downcasing and removal of accents) might be a reasonable possibility to consider. But what if a user then wants unfolded search, after such pasting? S?he then needs to toggle anyway. 
I don't prefer such a design because it is another automatic switching of "mode" (folding ON/OFF). It happens behind the user's back, trying to second-guess what is best for all users in all contexts: DWIM (do something hardcoded, which someone thought at design time everyone will want at runtime).

You don't like using a toggle key, which I can understand. Without toggling, which makes intention explicit/clear, you must rely on these things:

1. The mode setting the folding behavior (ON/OFF) appropriately - e.g., Info turns it ON locally, regardless of a user's customization of global `case-fold-search'. (This is good.)

2. DWIM: uppercase or accented char in the search string turns folding off. Pasting into the search string strips pasted text of uppercase and accents. (Good for you, bad for me.)

If that doesn't fit what a user wants in a given context (e.g., if s?he wants to search case-sensitively in Info) then s?he needs to toggle anyway.

I suspect that you might exaggerate the inconvenience, even for yourself, of having to explicitly toggle when you want to change state/mode. I use a version of Isearch that requires such toggling, and in practice I rarely toggle!

Why do I rarely need to toggle? Perhaps because:

* I usually want case-sensitive search.

* The cases where I do not are usually covered by #1: the mode (e.g. Info) DTRT locally.

At any rate, perhaps we could agree that users can prefer different behaviors? And let Emacs give them the choice? At customization time at a minimum, and in some cases via an on-the-fly toggle key? (If you don't need such a toggle then you certainly don't need to worry about memorizing it. ;-))

^ permalink raw reply [flat|nested] 86+ messages in thread
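The case-folding DWIM rule discussed in the message above (an uppercase letter in the search string switches case folding off) can be sketched as follows. This is only an approximation under stated assumptions: `my-effective-case-fold' is a hypothetical name, it relies on the existing helper `isearch-no-upper-case-p', and it ignores other Isearch state such as `search-upper-case' and an explicit `M-c' toggle.

```elisp
(require 'isearch)

;; Hypothetical helper, for illustration only.
(defun my-effective-case-fold (search-string)
  "Return non-nil if a search for SEARCH-STRING would ignore case.
Sketch of the DWIM rule: fold only when `case-fold-search' is non-nil
and SEARCH-STRING contains no uppercase letter."
  (and case-fold-search
       (isearch-no-upper-case-p search-string nil)))

(let ((case-fold-search t))
  (list (my-effective-case-fold "actually")    ; non-nil: folding stays on
        (my-effective-case-fold "Actually")))  ; nil: uppercase A turns it off
```

An explicit toggle, as proposed in the thread, would replace the second condition with a user-controlled flag instead of inspecting the search string.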
* Re: char equivalence classes in search - why not symmetric? 2015-09-10 3:23 ` Drew Adams 2015-09-11 10:28 ` Richard Stallman @ 2015-09-11 10:28 ` Richard Stallman 2015-09-11 16:31 ` Drew Adams 1 sibling, 1 reply; 86+ messages in thread From: Richard Stallman @ 2015-09-11 10:28 UTC (permalink / raw) To: Drew Adams; +Cc: stephen, jean.christophe.helary, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > No, another difference would be that NONE of the other options > > is possible with one character -- all would require a toggle command > > that people may not remember. (I don't.) > NONE of what other options? Currently you can type a single character and do any of these things: * Search for A with or without any accent. * Search for Á only. * Search for À only. * Search for  only. * Search for Ä only. and likewise for each accented variant of A that exists in Unicode. With your change, all of those characters would do the same thing: search for A with or without any accent. So there would be only one thing you can do in regard to searching for As, without using some sort of toggling command. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-11 10:28 ` Richard Stallman @ 2015-09-11 16:31 ` Drew Adams 2015-09-12 15:29 ` Richard Stallman 0 siblings, 1 reply; 86+ messages in thread From: Drew Adams @ 2015-09-11 16:31 UTC (permalink / raw) To: rms; +Cc: stephen, jean.christophe.helary, emacs-devel > Currently you can type a single character and do any of these things: > * Search for A with or without any accent. (1) > * Search for Á only. (3) > * Search for À only. (4) > * Search for  only. (5) > * Search for Ä only. > and likewise for each accented variant of A that exists in Unicode. (6) > > With your change, all of those characters would do the same thing: > search for A with or without any accent. > > So there would be only one thing you can do in regard to searching for > As, without using some sort of toggling command. Correct. We are agreeing about the facts, which is good. Per proposal: With char folding ON: (1) Search for A with or without any accent. (2) Search for "each accented variant of A that exists in Unicode", with or without any accent. With char folding OFF: (3), (4), (5), (6) Search for Á, À, Â, Ä only (and likewise for each...) What the current design misses is possibility (2). You *cannot* search using "Müller" and find "Muller" etc. And yes, with the proposal a user explicitly expresses an intention to search with or without char folding, by hitting a key to turn it ON/OFF. There is no automatic turn-OFF just because there is a char with a diacritic in the search string. What's more, a user option can let users choose which behavior they prefer, instead of hardcoding that choice into the design. What's more, a user can (or we could) add a toggle key for flipping that behavior: both Drew and Richard could quickly switch "designs" on the fly, if they wanted to. Why not? ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-11 16:31 ` Drew Adams @ 2015-09-12 15:29 ` Richard Stallman 0 siblings, 0 replies; 86+ messages in thread From: Richard Stallman @ 2015-09-12 15:29 UTC (permalink / raw) To: Drew Adams; +Cc: stephen, jean.christophe.helary, emacs-devel > Correct. We are agreeing about the facts, which is good. Per proposal: > With char folding ON: > (1) Search for A with or without any accent. > (2) Search for "each accented variant of A that exists in Unicode", > with or without any accent. That seems to be a description of how it works now. ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? [not found] ` <<E1ZZPIS-0005rf-DJ@fencepost.gnu.org> @ 2015-09-08 21:46 ` Drew Adams 0 siblings, 0 replies; 86+ messages in thread From: Drew Adams @ 2015-09-08 21:46 UTC (permalink / raw) To: rms, Drew Adams; +Cc: stephen, jean.christophe.helary, emacs-devel > > Personally, yes, I would get rid of that anomaly too at some > > point, but I'm not proposing that now. Likewise, for the > > anomaly that whitespace folding is switched off by SPC SPC. > > SPC SPC should match only a pair of spaces! This is not the topic of this thread. But if and when we get to that discussion (if I'm still around in 20 years ;-)), the right answer is the same as for the current proposal, which is about char folding: Just toggle whitespace-folding OFF. "Just say NO" to SPC SPC matching any string of whitespace. Whitespace folding will stay off as long as you do not toggle it ON. And you can customize the default behavior so that it starts either ON or OFF. (I'm not a fan of whitespace folding most of the time, so I will turn it OFF by default, personally. I do want SPC SPC to match only two consecutive spaces, nearly all the time.) The simple idea is that folding is either on or off. When on, equivalence classes are used - whether for diacritics ("char folding"), or for case (case folding), or for whitespace (whitespace folding). But that's just a preview of a possible FUTURE discussion. No one is proposing NOW that we change the current behavior of case folding or whitespace folding. The topic here, now, is char folding - whether it should treat all chars of a class the same or not. What do you LOSE with the proposed behavior (for char folding now, and perhaps for case or whitespace folding later)? You lose the fact that any particular members of an equivalence class are "canonical", and so using one of them during folding automatically switches folding off. E.g., currently, using é in a search string turns char folding off. 
And of course using an uppercase char turns case folding off. And SPC SPC turns whitespace folding off. What do you GAIN with the proposed behavior? You need not type a particular, privileged member of a class in order to match any member of the class. Any member will match any member (including itself, of course). The point is to have users explicitly hit a key to toggle folding. That enables the use of any char in a class to match any other char in the same class. That's the tradeoff. With the proposal, there is nothing to remember, no exceptions or special rules. Folding is either on or off, and a single key toggles it (for each kind of folding). ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? [not found] ` <<E1ZamkM-0005d4-RN@fencepost.gnu.org> @ 2015-09-12 15:59 ` Drew Adams 0 siblings, 0 replies; 86+ messages in thread From: Drew Adams @ 2015-09-12 15:59 UTC (permalink / raw) To: rms, Drew Adams; +Cc: stephen, jean.christophe.helary, emacs-devel > > Correct. We are agreeing about the facts, which is good. Per > > proposal: > > > > With char folding ON: > > > > (1) Search for A with or without any accent. > > (2) Search for "each accented variant of A that exists in > > Unicode", with or without any accent. > > That seems to be a description of how it works now. No, it is not meant to be. #2 means use any of the variants (in the search string) to search for any of the variants (in the text being searched). It is the proposal of this thread. (#2 was admittedly not expressed well: I tried to reuse your two expressions, and the result of combining them was clumsy.) Currently, to search for all of the variants (any of them, indifferently), you must use the base character in the search string. You cannot use any of the variants in the search string to get the same effect. Only the base char lets you search for the class, i.e., use char folding. (But I think you already realized this.) ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-08 6:04 ` Jean-Christophe Helary 2015-09-08 13:31 ` Stephen J. Turnbull @ 2015-09-08 13:39 ` Drew Adams 2015-09-08 21:19 ` Juri Linkov 1 sibling, 1 reply; 86+ messages in thread From: Drew Adams @ 2015-09-08 13:39 UTC (permalink / raw) To: Jean-Christophe Helary, emacs-devel > > I disagree. When I search for "Müller" I want it to also match > > "Muller" because some people (e.g., in French speaking countries) use > > this as an approximation of the spelling. > > It's fine that emacs is "different", but common (nano, vi, GUI editors, word > processors) behaviour is that a search strictly matches the string, and that > creates expectations. For the Muller case above, as a translator I could see > myself search for Muller to correct it to Müller and not be happy to have > all the correct Müllers showing up in the search. Not a problem, provided we have a toggle like what Juri suggested. Toggle literal vs char folding. And ensure that char folding is symmetric (this thread), and not just one-way as it is now. I agree with you about the default behavior (literal, not folded). But of course users need to be able to customize the default behavior, so they start out with whichever behavior they prefer. > Let's just put flags that trigger case/diacritic matching, they could be on > in default emacs, but they should be somewhere. Yup. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-08 13:39 ` Drew Adams @ 2015-09-08 21:19 ` Juri Linkov 2015-09-09 15:07 ` Richard Stallman 0 siblings, 1 reply; 86+ messages in thread From: Juri Linkov @ 2015-09-08 21:19 UTC (permalink / raw) To: Drew Adams; +Cc: Jean-Christophe Helary, emacs-devel >> > I disagree. When I search for "Müller" I want it to also match >> > "Muller" because some people (e.g., in French speaking countries) use >> > this as an approximation of the spelling. >> >> It's fine that emacs is "different", but common (nano, vi, GUI editors, word >> processors) behaviour is that a search strictly matches the string, and that >> creates expectations. In Web browsers by default “u” matches “ü” as well as “ü” matches “u”. >> For the Muller case above, as a translator I could see >> myself search for Muller to correct it to Müller and not be happy to have >> all the correct Müllers showing up in the search. > > Not a problem, provided we have a toggle like what Juri suggested. > Toggle literal vs char folding. And ensure that char folding is > symmetric (this thread), and not just one-way as it is now. Do you mean a toggle for an individual character in the search string or a toggle for the whole search string? Also is it a three-state toggle between literal match, “ü” matches only “ü”, “ü” matches both “u” and “ü”, “ü” matches “u”, “ü” and all other variants like “ú”? ^ permalink raw reply [flat|nested] 86+ messages in thread
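The three states asked about above can be written down as a small Python sketch. It is only an illustration of the question, not of any existing Isearch option; `FoldMode`, `base_of`, and `char_matches` are hypothetical names, and "base" is assumed to mean the character with its combining marks stripped.

```python
import unicodedata
from enum import Enum


class FoldMode(Enum):
    LITERAL = 1        # "ü" matches only "ü"
    BASE_AND_SELF = 2  # "ü" matches "u" and "ü"
    WHOLE_CLASS = 3    # "ü" matches "u", "ü", "ú", ...


def base_of(ch: str) -> str:
    """Return the character with its combining marks removed."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def char_matches(pattern_ch: str, text_ch: str, mode: FoldMode) -> bool:
    """Decide whether one pattern character matches one text character."""
    p = unicodedata.normalize("NFC", pattern_ch)
    t = unicodedata.normalize("NFC", text_ch)
    if mode is FoldMode.LITERAL:
        return t == p
    if mode is FoldMode.BASE_AND_SELF:
        return t in (p, base_of(p))
    return base_of(t) == base_of(p)
```

With these definitions, "ü" against "ú" fails in the first two modes and succeeds only in `WHOLE_CLASS`.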
* Re: char equivalence classes in search - why not symmetric? 2015-09-08 21:19 ` Juri Linkov @ 2015-09-09 15:07 ` Richard Stallman 0 siblings, 0 replies; 86+ messages in thread From: Richard Stallman @ 2015-09-09 15:07 UTC (permalink / raw) To: Juri Linkov; +Cc: jean.christophe.helary, drew.adams, emacs-devel > Do you mean a toggle for an individual character in the search string > or a toggle for the whole search string? I think it needs to be a toggle that applies to the input keys, so when you toggle it, the new state affects subsequent keys. ^ permalink raw reply [flat|nested] 86+ messages in thread
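A toggle that applies to input keys, rather than to the whole search string, could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the class name `SearchInput`, the default state (folding on), and the choice to search NFD-normalized text with a regexp over the combining-mark range U+0300..U+036F.

```python
import re
import unicodedata

COMBINING = "[\u0300-\u036f]"  # combining diacritical marks block


class SearchInput:
    """Records, for each typed character, the folding state at that time."""

    def __init__(self) -> None:
        self.folding = True  # assumed default
        self.keys = []       # list of (character, folded?) pairs

    def toggle(self) -> None:
        """Flip the state; only characters typed afterwards are affected."""
        self.folding = not self.folding

    def type_keys(self, text: str) -> None:
        for ch in unicodedata.normalize("NFC", text):
            self.keys.append((ch, self.folding))

    def regex(self) -> "re.Pattern[str]":
        parts = []
        for ch, folded in self.keys:
            decomposed = unicodedata.normalize("NFD", ch)
            if folded:
                # Base character followed by any number of marks.
                base = "".join(
                    c for c in decomposed if not unicodedata.combining(c)
                )
                parts.append(re.escape(base) + COMBINING + "*")
            else:
                # Exactly this character, with no further marks attached.
                parts.append(re.escape(decomposed) + "(?!" + COMBINING + ")")
        return re.compile("".join(parts))

    def found_in(self, text: str) -> bool:
        return self.regex().search(unicodedata.normalize("NFD", text)) is not None
```

For example, typing "M", toggling folding off, typing "ü", toggling it back on, and typing "ller" gives a search that finds "Müller" and "Müllér" but not "Muller".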
* Re: char equivalence classes in search - why not symmetric? 2015-09-08 5:36 ` Ulrich Mueller 2015-09-08 6:04 ` Jean-Christophe Helary @ 2015-09-08 15:47 ` Eli Zaretskii 2015-09-08 16:57 ` Drew Adams 2015-09-08 21:20 ` Juri Linkov 2015-09-08 20:09 ` Richard Stallman 2 siblings, 2 replies; 86+ messages in thread From: Eli Zaretskii @ 2015-09-08 15:47 UTC (permalink / raw) To: Ulrich Mueller; +Cc: drew.adams, emacs-devel > Date: Tue, 8 Sep 2015 07:36:51 +0200 > Cc: Drew Adams <drew.adams@oracle.com>, emacs-devel@gnu.org > From: Ulrich Mueller <ulm@gentoo.org> > > >>>>> On Tue, 01 Sep 2015, Eli Zaretskii wrote: > > >> No. You are asking for that only when you use a search pattern > >> that does not use the diacriticals. When you search with á in > >> the pattern you are NOT asking for matches that disregard the > >> diacriticals. And why not? > > > Because á does include a diacritical. By specifying it, the user > > told us the diacriticals are important, and shouldn't be > > disregarded. > > I disagree. When I search for "Müller" I want it to also match > "Muller" Then you should type "Muller" instead of "Müller". > (I'd also like it to match "Mueller" but that's a different issue.) With this feature, you can. ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-08 15:47 ` Eli Zaretskii @ 2015-09-08 16:57 ` Drew Adams 2015-09-08 21:20 ` Juri Linkov 1 sibling, 0 replies; 86+ messages in thread From: Drew Adams @ 2015-09-08 16:57 UTC (permalink / raw) To: Eli Zaretskii, Ulrich Mueller; +Cc: emacs-devel > > > Because á does include a diacritical. By specifying it, the user > > > told us the diacriticals are important, and shouldn't be > > > disregarded. > > > > I disagree. When I search for "Müller" I want it to also match > > "Muller" > > Then you should type "Muller" instead of "Müller". I believe Ulrich is specifically asking to be able to type (or paste) "Müller" _instead of having_ to type "Muller", to match both "Müller" and "Muller". Telling him to just type "Muller" ignores his request and his argument that it is useful to be able to do what he asks. That's all we've heard, so far, as an argument against the proposal: You don't need it; just get by with the canonical chars instead of accented chars in search strings, if you want char folding. No reason given why someone should not _be able_ to do what Ulrich wants. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-08 15:47 ` Eli Zaretskii 2015-09-08 16:57 ` Drew Adams @ 2015-09-08 21:20 ` Juri Linkov 2015-09-09 2:42 ` Eli Zaretskii 1 sibling, 1 reply; 86+ messages in thread From: Juri Linkov @ 2015-09-08 21:20 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Ulrich Mueller, drew.adams, emacs-devel >> (I'd also like it to match "Mueller" but that's a different issue.) > > With this feature, you can. This is not what I see. The generated regexp for “u” is: \(?:u[̀-̄̆̈-̨̛̣̤̭̰̌̏̑]\|[uù-üũūŭůűųưǔȕȗᵘᵤṳṵṷụủ⒰ⓤu𝐮𝑢𝒖𝓊𝓾𝔲𝕦𝖚𝗎𝘂𝘶𝙪𝚞]\) that doesn't match “ue”. ^ permalink raw reply [flat|nested] 86+ messages in thread
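A regexp of the shape quoted above (base character plus combining marks, or one character from a class of precomposed variants) can be derived from Unicode data. The Python sketch below is an approximation under stated assumptions, not the code Emacs uses: it scans only code points below U+0250, considers canonical decompositions only, and `variants` and `fold_regexp` are hypothetical names.

```python
import re
import unicodedata


def variants(base: str, limit: int = 0x250) -> str:
    """Precomposed characters below `limit` that decompose to `base`
    followed only by combining marks."""
    found = []
    for cp in range(limit):
        ch = chr(cp)
        decomposed = unicodedata.normalize("NFD", ch)
        if (
            len(decomposed) > 1
            and decomposed[0] == base
            and all(unicodedata.combining(c) for c in decomposed[1:])
        ):
            found.append(ch)
    return "".join(found)


def fold_regexp(base: str) -> str:
    """Regexp matching `base` plus combining marks, or any single variant."""
    char_class = re.escape(base) + "".join(re.escape(c) for c in variants(base))
    return "(?:" + re.escape(base) + "[\u0300-\u036f]+|[" + char_class + "])"
```

Because every alternative consumes exactly one base letter (with or without marks), a pattern built this way for "Muller" cannot match "Mueller", which is the observation made in the message above.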
* Re: char equivalence classes in search - why not symmetric? 2015-09-08 21:20 ` Juri Linkov @ 2015-09-09 2:42 ` Eli Zaretskii 2015-09-09 11:23 ` Artur Malabarba [not found] ` <<CAAdUY-JMQVsRFku8nwX8JcA9k6Y9sHWoVL6ZC60RHnjoj0cd+Q@mail.gmail.com> 0 siblings, 2 replies; 86+ messages in thread From: Eli Zaretskii @ 2015-09-09 2:42 UTC (permalink / raw) To: Juri Linkov; +Cc: ulm, drew.adams, emacs-devel > From: Juri Linkov <juri@linkov.net> > Cc: Ulrich Mueller <ulm@gentoo.org>, drew.adams@oracle.com, emacs-devel@gnu.org > Date: Wed, 09 Sep 2015 00:20:20 +0300 > > >> (I'd also like it to match "Mueller" but that's a different issue.) > > > > With this feature, you can. > > This is not what I see. This needs customizing the equivalence set. ^ permalink raw reply [flat|nested] 86+ messages in thread
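What "customizing the equivalence set" might amount to can be sketched in Python with a user-supplied table of extra equivalents. The table `EXTRA_EQUIVALENTS` and the functions `char_regexp` and `search` are hypothetical; the sketch also assumes that every pattern character is folded to its base, which is one of the designs under discussion rather than a settled behaviour.

```python
import re
import unicodedata

# Hypothetical user customization: extra strings a character may stand for.
EXTRA_EQUIVALENTS = {"ü": ["ue"], "ö": ["oe"], "ä": ["ae"]}


def char_regexp(ch: str) -> str:
    """Regexp for one pattern character: user extras, then base plus marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    alternatives = [re.escape(s) for s in EXTRA_EQUIVALENTS.get(ch, [])]
    alternatives.append(re.escape(base) + "[\u0300-\u036f]*")
    return "(?:" + "|".join(alternatives) + ")"


def search(pattern: str, text: str) -> bool:
    """Search NFD-normalized text with a regexp built per pattern character."""
    normalized = unicodedata.normalize("NFC", pattern)
    regexp = "".join(char_regexp(ch) for ch in normalized)
    return re.search(regexp, unicodedata.normalize("NFD", text)) is not None
```

With this table, searching for "Müller" finds "Müller", "Muller", and "Mueller", while searching for "Muller" still does not find "Mueller", because the extra entry is keyed on "ü" only.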
* Re: char equivalence classes in search - why not symmetric? 2015-09-09 2:42 ` Eli Zaretskii @ 2015-09-09 11:23 ` Artur Malabarba 2015-09-09 13:32 ` Drew Adams 2015-09-09 15:12 ` Richard Stallman [not found] ` <<CAAdUY-JMQVsRFku8nwX8JcA9k6Y9sHWoVL6ZC60RHnjoj0cd+Q@mail.gmail.com> 1 sibling, 2 replies; 86+ messages in thread From: Artur Malabarba @ 2015-09-09 11:23 UTC (permalink / raw) To: Eli Zaretskii; +Cc: ulm, emacs-devel, Drew Adams, Juri Linkov If I may weigh in. I think the whole discussion of whether this should be symmetric or not is pointless. There are arguments for both sides, and without any significant amount of empirical evidence, any choice is as good as flipping a coin. I'd much rather we focus effort on making the equiv-classes easier to customize. 2015-09-09 3:42 GMT+01:00 Eli Zaretskii <eliz@gnu.org>: >> From: Juri Linkov <juri@linkov.net> >> Cc: Ulrich Mueller <ulm@gentoo.org>, drew.adams@oracle.com, emacs-devel@gnu.org >> Date: Wed, 09 Sep 2015 00:20:20 +0300 >> >> >> (I'd also like it to match "Mueller" but that's a different issue.) >> > >> > With this feature, you can. >> >> This is not what I see. > > This needs customizing the equivalence set. Yes. Discussing how to expose easy and useful customization to the user is a much more useful discussion IMO. ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-09 11:23 ` Artur Malabarba @ 2015-09-09 13:32 ` Drew Adams 2015-09-09 15:12 ` Richard Stallman 1 sibling, 0 replies; 86+ messages in thread From: Drew Adams @ 2015-09-09 13:32 UTC (permalink / raw) To: bruce.connor.am, Eli Zaretskii; +Cc: ulm, emacs-devel, Juri Linkov > If I may weigh in. I think the whole discussion of whether this should > be symmetric or not is pointless. There are arguments for both sides, > and without any significant amount of empirical evidence, any choice > is as good as flipping a coin. > > I'd much rather we focus effort on making the equiv-classes easier to > customize. 1. You are welcome to say that you would rather flip a coin than try to discuss what this thread proposes. 2. I too would like to see progress wrt a discussion about letting users easily define new equivalence classes and customize existing equivalence classes. But please start a separate thread for that. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-09 11:23 ` Artur Malabarba 2015-09-09 13:32 ` Drew Adams @ 2015-09-09 15:12 ` Richard Stallman 2015-09-11 20:50 ` Juri Linkov 1 sibling, 1 reply; 86+ messages in thread From: Richard Stallman @ 2015-09-09 15:12 UTC (permalink / raw) To: bruce.connor.am; +Cc: eliz, juri, ulm, drew.adams, emacs-devel > I'd much rather we focus effort on making the equiv-classes easier to customize. Let's not call them "equiv-classes", because that term presupposes symmetry. (An equivalence relation is symmetric.) Let's call them search classes for characters. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-09 15:12 ` Richard Stallman @ 2015-09-11 20:50 ` Juri Linkov 0 siblings, 0 replies; 86+ messages in thread From: Juri Linkov @ 2015-09-11 20:50 UTC (permalink / raw) To: Richard Stallman; +Cc: drew.adams, eliz, ulm, bruce.connor.am, emacs-devel > > I'd much rather we focus effort on making the equiv-classes easier to customize. > > Let's not call them "equiv-classes", because that term presupposes > symmetry. (An equivalence relation is symmetric.) Let's call them > search classes for characters. A case table can define all of them: upcase, canonicalize, and equivalence classes, so char-folding could define equiv-classes as well. ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? [not found] ` <<E1ZZh2a-0003u6-Fj@fencepost.gnu.org> @ 2015-09-09 15:22 ` Drew Adams 2015-09-10 2:03 ` Richard Stallman 0 siblings, 1 reply; 86+ messages in thread From: Drew Adams @ 2015-09-09 15:22 UTC (permalink / raw) To: rms, bruce.connor.am; +Cc: eliz, juri, ulm, drew.adams, emacs-devel > > I'd much rather we focus effort on making the equiv-classes easier to > > customize. > > Let's not call them "equiv-classes", because that term presupposes > symmetry. (An equivalence relation is symmetric.) Let's call them > search classes for characters. They are equivalence classes. The chars are equivalent when searched for (with char folding turned on). The equivalence relation is among the chars in the class. This equivalence has nothing to do with the symmetry of handling them between search string and searched text. Whether or not they should _also_ be equivalent (handled the same way) when used in the search string is the topic of this thread. But even without that improvement, i.e., currently, the chars are equivalent when searched for. "Search classes for characters" means little. It says nothing about what makes them a class - what they have in common. What they have in common is that they are treated the same (equivalently) when searched for. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-09 15:22 ` Drew Adams @ 2015-09-10 2:03 ` Richard Stallman 2015-09-10 3:15 ` Drew Adams 0 siblings, 1 reply; 86+ messages in thread From: Richard Stallman @ 2015-09-10 2:03 UTC (permalink / raw) To: Drew Adams; +Cc: ulm, bruce.connor.am, emacs-devel, eliz, juri, drew.adams > They are equivalence classes. The chars are equivalent when searched > for (with char folding turned on). No, they aren't. For instance, A and Á are not equivalent in search. Searching for A will match Á, but searching for Á will not match A. To make them equivalent would be a change for the worse. I already explained why. Current ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-10 2:03 ` Richard Stallman @ 2015-09-10 3:15 ` Drew Adams 2015-09-10 6:57 ` David Kastrup 2015-09-10 15:50 ` Richard Stallman 0 siblings, 2 replies; 86+ messages in thread From: Drew Adams @ 2015-09-10 3:15 UTC (permalink / raw) To: rms; +Cc: eliz, emacs-devel, ulm, bruce.connor.am, juri > > They are equivalence classes. The chars are equivalent when searched > > for (with char folding turned on). > > No, they aren't. For instance, A and Á are not equivalent in search. > Searching for A will match Á, but searching for Á will not match A. Please read what I said: "The chars are equivalent when searched for." ^^^^^^^^^^^^^^^^^ I did *not* say, as you say, that they are "equivalent in search." I tried to carefully distinguish the two uses of the chars: when used as search targets (they are currently equivalent) vs when used in the search string (they are not equivalent, currently). If you search for A with char folding on you will find both A and Á (and all the rest of the A family). All members of that family (class) are equivalent _as search targets_. Currently. 100% equivalent. They form an equivalence class wrt the operation of searching _for_ them. They are not yet equivalent also as search patterns, i.e., when used in the search string. That is the proposal of this thread: to make them equivalent also in their use in a search string (when char folding is turned on). > To make them equivalent would be a change for the worse. > I already explained why. The only explanation I saw from you was that you want the presence of an accented char in the search string to automatically turn off char folding. That's your preference. It leads to an absolute reduction of possibilities for users (they cannot abstract from accents when there are accented chars in the search string). But you have every right to prefer that limitation. 
Please be aware that with what is being proposed a user can still, anytime, get diacritic-sensitive search when there are accented chars in the search string (and when there are not). It is sufficient to toggle off char folding. You want that toggling off to happen automatically, based on the mere presence of an accented char in the search string. I don't, because users lose the possibility of getting char-folded search whenever there are accented chars in the search string. They then need to edit the search string if they want to abstract from diacritics, replacing any such chars with the unaccented ("base") versions, in order to get char-fold search. In the code I sent, I provided a user option, to let _users_ decide which behavior they want, individually: the one you prefer or the one I prefer. Why not give them the choice? ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-10 3:15 ` Drew Adams @ 2015-09-10 6:57 ` David Kastrup 2015-09-10 15:02 ` Drew Adams 2015-09-10 15:50 ` Richard Stallman 1 sibling, 1 reply; 86+ messages in thread From: David Kastrup @ 2015-09-10 6:57 UTC (permalink / raw) To: Drew Adams; +Cc: rms, ulm, bruce.connor.am, juri, eliz, emacs-devel Drew Adams <drew.adams@oracle.com> writes: >> > They are equivalence classes. The chars are equivalent when searched >> > for (with char folding turned on). >> >> No, they aren't. For instance, A and Á are not equivalent in search. >> Searching for A will match Á, but searching for Á will not match A. > > Please read what I said: "The chars are equivalent when searched for." > ^^^^^^^^^^^^^^^^^ They aren't. Searching with the search string "Á" will find "Á" but not "A". > I did *not* say, as you say, that they are "equivalent in search." > I tried to carefully distinguish the two uses of the chars: when used > as search targets (they are currently equivalent) vs when used in the > search string (they are not equivalent, currently). Yes, there is a distinction between search targets and search spec. But they are different in either category. -- David Kastrup ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-10 6:57 ` David Kastrup @ 2015-09-10 15:02 ` Drew Adams 0 siblings, 0 replies; 86+ messages in thread From: Drew Adams @ 2015-09-10 15:02 UTC (permalink / raw) To: David Kastrup; +Cc: rms, ulm, bruce.connor.am, juri, eliz, emacs-devel > >>> They are equivalence classes. The chars are equivalent when searched > >>> for (with char folding turned on). > >> > >> No, they aren't. For instance, A and Á are not equivalent in search. > >> Searching for A will match Á, but searching for Á will not match A. > > > > Please read what I said: "The chars are equivalent when searched for." > > ^^^^^^^^^^^^^^^^^ (with char-fold search, i.e., ignoring diacritics - that's the context) > They aren't. Searching with the search string "Á" will find "Á" but > not "A". For anyone who really still does not understand, and anyone who might be pretending not to understand ;-): When search is case-insensitive, occurrences of a and A in the searched text are found equivalently. As search targets, a and A are equivalent for case-insensitive search. If you ask to find an occurrence of the first letter of the English alphabet, and you say that you don't care about case, you find, as you expect, either a or A, indifferently. a and A in the searched text are treated the same by case folding. They form an equivalence class in this context. But in Emacs, if you put A in the search string then you inhibit, turn OFF, blow away case-insensitive search - case is no longer folded. So of course any statement about the behavior of case-fold search is irrelevant then. Likewise, for char folding. When char folding is on, A and Á in the searched text are found equivalently. As search targets, A and Á are equivalent for char-fold search. If you don't care about diacritics, you can expect to find either A or Á, indifferently, and you do, when char folding is in effect. A and Á in the searched text are treated the same by char folding. 
They form an equivalence class in this context. But in Emacs, currently, if you put Á in the search string then you inhibit, turn OFF, blow away char-fold search. So of course any statement about the behavior of char-fold search is irrelevant then. a and A for case folding, and A and Á for char folding, form equivalence classes wrt being found in searched text. Case folding does NOT apply if you put A in the search string. Char folding does NOT apply if you put Á in the search string. Ulrich Müller CANNOT search for his last name using Müller in the search string and have search ignore diacritics, so that it matches indifferently Müller and Muller. That is, char folding simply DOES NOT WORK here - verboten. (He can of course use regexp search to work around the limitation.) > > I did *not* say, as you say, that they are "equivalent in search." > > I tried to carefully distinguish the two uses of the chars: when used > > as search targets (they are currently equivalent) vs when used in the > > search string (they are not equivalent, currently). > > Yes, there is a distinction between search targets and search spec. But > they are different in either category. Indeed, sigh. The point of the proposal of this thread is to _allow_ users to search _using char folding_ regardless of whether there are diacritics in the search string. They would still be able to use search without char folding, e.g., to search for Á and find only Á, not also A. ^ permalink raw reply [flat|nested] 86+ messages in thread
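The parallel drawn above between case folding and char folding can be put into a short Python sketch. It models only the rule as the message describes it (an uppercase letter switches case folding off; an accented character switches char folding off), with a hypothetical `auto_off` flag standing in for the user option the thread proposes. The function names are assumptions, not Emacs APIs.

```python
import unicodedata


def effective_case_fold(search_string: str, case_fold: bool = True) -> bool:
    """Case folding stays on only if the search string has no uppercase letter."""
    return case_fold and not any(ch.isupper() for ch in search_string)


def has_diacritic(search_string: str) -> bool:
    """True if any character carries a combining mark after NFD."""
    decomposed = unicodedata.normalize("NFD", search_string)
    return any(unicodedata.combining(ch) for ch in decomposed)


def effective_char_fold(
    search_string: str, char_fold: bool = True, auto_off: bool = True
) -> bool:
    """Char folding as described: with `auto_off`, an accented character in
    the search string disables it; without, only the explicit toggle does."""
    if not char_fold:
        return False
    return not (auto_off and has_diacritic(search_string))
```

With `auto_off=False`, a search string such as "Müller" keeps folding in effect until the user toggles it off explicitly, which is the behaviour the proposal asks for.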
* Re: char equivalence classes in search - why not symmetric? 2015-09-10 3:15 ` Drew Adams 2015-09-10 6:57 ` David Kastrup @ 2015-09-10 15:50 ` Richard Stallman 1 sibling, 0 replies; 86+ messages in thread From: Richard Stallman @ 2015-09-10 15:50 UTC (permalink / raw) To: Drew Adams; +Cc: eliz, emacs-devel, ulm, bruce.connor.am, juri > > No, they aren't. For instance, A and Á are not equivalent in search. > > Searching for A will match Á, but searching for Á will not match A. > Please read what I said: "The chars are equivalent when searched for." > ^^^^^^^^^^^^^^^^^ I stand corrected. Strictly speaking, that is true. But since the term's implications could be misleading, let's avoid the word "equivalence" and say it in other ways. > That is the proposal of this thread: to make > them equivalent also in their use in a search string (when char > folding is turned on). I think that is a mistake. > The only explanation I saw from you was that you want the presence > of an accented char in the search string to automatically turn off > char folding. That's your preference. I proposed that. But perhaps making Á match only Á is better. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-08 5:36 ` Ulrich Mueller 2015-09-08 6:04 ` Jean-Christophe Helary 2015-09-08 15:47 ` Eli Zaretskii @ 2015-09-08 20:09 ` Richard Stallman 2015-09-08 21:00 ` Drew Adams 2015-09-08 21:47 ` Ulrich Mueller 2 siblings, 2 replies; 86+ messages in thread From: Richard Stallman @ 2015-09-08 20:09 UTC (permalink / raw) To: Ulrich Mueller; +Cc: eliz, drew.adams, emacs-devel > I disagree. When I search for "Müller" I want it to also match > "Muller" because some people (e.g., in French speaking countries) use > this as an approximation of the spelling. Are you suggesting that searching for ü should match u but not ú or ù? ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-08 20:09 ` Richard Stallman @ 2015-09-08 21:00 ` Drew Adams 2015-09-09 15:06 ` Richard Stallman 2015-09-08 21:47 ` Ulrich Mueller 1 sibling, 1 reply; 86+ messages in thread From: Drew Adams @ 2015-09-08 21:00 UTC (permalink / raw) To: rms, Ulrich Mueller; +Cc: eliz, emacs-devel > > I disagree. When I search for "Müller" I want it to also match > > "Muller" because some people (e.g., in French speaking countries) use > > this as an approximation of the spelling. > > Are you suggesting that searching for ü should match u but not ú or ù? I'm not speaking for Ulrich, but no, I am not suggesting that. The proposal behind this thread is that when char folding is turned ON, any char CHR in a given equivalence class would match any other char in that class, when CHR is used in a search string. So if char folding is on, you can find any of [eéèêæë] in the buffer text using any of those chars in the search string, not just `e' in the search string. None of them has a privileged role in the search string. To match only one of those folding-equivalent chars (e.g., only `e' or `é'), you would turn OFF char folding and use that exact char in the search string. Char folding would be togglable, as now, using `M-s ''. The only difference would be that when char folding is on, any of [eéèêæë] would act the same way in a search string. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-08 21:00 ` Drew Adams @ 2015-09-09 15:06 ` Richard Stallman 0 siblings, 0 replies; 86+ messages in thread From: Richard Stallman @ 2015-09-09 15:06 UTC (permalink / raw) To: Drew Adams; +Cc: ulm, eliz, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > > Are you suggesting that searching for ü should match u but not ú or ù? > I'm not speaking for Ulrich, but no, I am not suggesting that. I was asking Ulrich. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-08 20:09 ` Richard Stallman 2015-09-08 21:00 ` Drew Adams @ 2015-09-08 21:47 ` Ulrich Mueller 1 sibling, 0 replies; 86+ messages in thread From: Ulrich Mueller @ 2015-09-08 21:47 UTC (permalink / raw) To: rms; +Cc: eliz, drew.adams, emacs-devel >>>>> On Tue, 08 Sep 2015, Richard Stallman wrote: >> I disagree. When I search for "Müller" I want it to also match >> "Muller" because some people (e.g., in French speaking countries) >> use this as an approximation of the spelling. > Are you suggesting that searching for ü should match u but not ú or ù? No, I am not. It is fine if the search would match a u with any diacritics. It does not make much of a practical difference because both Múller and Mùller are unlikely spellings. Ulrich ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-01 16:16 ` Eli Zaretskii [not found] ` <<38061f42-eaf1-47c6-b74d-f676ac952b18@default> 2015-09-01 17:50 ` Drew Adams @ 2015-09-02 15:34 ` Richard Stallman 2015-09-02 15:56 ` Drew Adams ` (3 more replies) 2 siblings, 4 replies; 86+ messages in thread From: Richard Stallman @ 2015-09-02 15:34 UTC (permalink / raw) To: Eli Zaretskii; +Cc: drew.adams, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] Since it is possible to search for only 'á', it would be nice to have some convenient way to search only for 'a' with no accents. The only convenient interface I can think of is that you type, in a postfix input method, a ' DEL. Currently that is equivalent to typing just a. But we could conceivably make it different. Can someone think of some other interface for this? -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-02 15:34 ` Richard Stallman @ 2015-09-02 15:56 ` Drew Adams 2015-09-02 16:05 ` Eli Zaretskii ` (2 subsequent siblings) 3 siblings, 0 replies; 86+ messages in thread From: Drew Adams @ 2015-09-02 15:56 UTC (permalink / raw) To: rms, Eli Zaretskii; +Cc: emacs-devel > Since it is possible to search for only 'á', it would be nice to have > some convenient way to search only for 'a' with no accents. > > The only convenient interface I can think of is that you type, in a > postfix input method, a ' DEL. Currently that is equivalent to typing > just a. But we could conceivably make it different. > > Can someone think of some other interface for this? Yes, I mentioned this. And see the proposal from Juri in this thread: During Isearch, `M-s '' (he wrote `M-'' but I think he meant `M-s '') would toggle character folding, just as `M-c' toggles case folding. If char folding is on then `a' matches all of the variants (á etc.). But if it is off then `a' matches only `a'. Users could customize the default (on or off), just as they can today customize `case-fold-search'. So someone could leave char folding on most of the time, and toggle it off anytime using `M-s '', or vice versa, leave it off most of the time and toggle it on. If it is on and you want to search for only `a', not also á etc.: C-s M-s ' a ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-02 15:34 ` Richard Stallman 2015-09-02 15:56 ` Drew Adams @ 2015-09-02 16:05 ` Eli Zaretskii 2015-09-02 21:51 ` Jean-Christophe Helary 2015-09-02 16:10 ` Artur Malabarba 2015-09-03 19:49 ` Pip Cet 3 siblings, 1 reply; 86+ messages in thread From: Eli Zaretskii @ 2015-09-02 16:05 UTC (permalink / raw) To: rms; +Cc: drew.adams, emacs-devel > From: Richard Stallman <rms@gnu.org> > CC: drew.adams@oracle.com, emacs-devel@gnu.org > Date: Wed, 02 Sep 2015 11:34:28 -0400 > > Since it is possible to search for only 'á', it would be nice to have > some convenient way to search only for 'a' with no accents. > > The only convenient interface I can think of is that you type, in a > postfix input method, a ' DEL. Currently that is equivalent to typing > just a. But we could conceivably make it different. > > Can someone think of some other interface for this? What is its equivalent for letter-case differences? IOW, how do I search for a without also catching A? ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-02 16:05 ` Eli Zaretskii @ 2015-09-02 21:51 ` Jean-Christophe Helary 2015-09-02 22:15 ` Drew Adams ` (2 more replies) 0 siblings, 3 replies; 86+ messages in thread From: Jean-Christophe Helary @ 2015-09-02 21:51 UTC (permalink / raw) To: emacs-devel > On Sep 3, 2015, at 01:05, Eli Zaretskii <eliz@gnu.org> wrote: > > What is its equivalent for letter-case differences? IOW, how do I > search for a without also catching A? Maybe the default is wrong: a should catch only a (and not aAàá etc.); a case modifier would allow a to catch aA; and a diacritic modifier would allow a to catch aàá etc. The case and diacritic modifiers can be combined freely so that a can catch aAàÀáÁ etc. I.e., the default is to catch *exactly* what the user types. Jean-Christophe Helary ^ permalink raw reply [flat|nested] 86+ messages in thread
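The scheme in this message (exact match by default, plus two independent modifiers that can be combined) can be sketched as follows. This is an illustration in Python under stated assumptions, not a description of any existing Emacs option: `search_key`, `matches`, `fold_case` and `fold_diacritics` are hypothetical names, and "diacritic folding" is approximated by canonical decomposition (NFD) with combining marks removed.

```python
import unicodedata


def strip_marks(s: str) -> str:
    """Drop combining marks after canonical decomposition (an approximation)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def search_key(s: str, fold_case: bool = False, fold_diacritics: bool = False) -> str:
    """Normalize a string according to whichever modifiers are switched on."""
    if fold_diacritics:
        s = strip_marks(s)
    if fold_case:
        s = s.casefold()
    return s


def matches(pattern: str, candidate: str, **modifiers) -> bool:
    """Compare pattern and candidate under the same set of modifiers.

    With no modifiers this is plain equality: the default catches
    exactly what the user typed.
    """
    return search_key(pattern, **modifiers) == search_key(candidate, **modifiers)
```

With no modifiers, `a` matches only `a`. With `fold_case=True` it also matches `A`; with `fold_diacritics=True` it also matches `à` and `á`; with both, it matches `À` and `Á` as well. Each modifier is applied to both sides, so the result does not depend on which variant was typed.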
* RE: char equivalence classes in search - why not symmetric? 2015-09-02 21:51 ` Jean-Christophe Helary @ 2015-09-02 22:15 ` Drew Adams 2015-09-03 15:37 ` Richard Stallman 2015-09-03 2:41 ` Eli Zaretskii 2015-09-03 15:00 ` Stefan Monnier 2 siblings, 1 reply; 86+ messages in thread From: Drew Adams @ 2015-09-02 22:15 UTC (permalink / raw) To: Jean-Christophe Helary, emacs-devel > > What is its equivalent for letter-case differences? IOW, how do I > > search for a without also catching A? > > Maybe the default is wrong: > a should catch only a (and not aAàá etc.) > a case modifier would allow a to catch aA > and a diacritic modifier would allow a to catch aàá etc. > the free case and diacritic modifier can be combined so that a can catch > aAàÀáÁ etc. > > ie, the default it to catch *exactly* what the user types. Personally, I too think that is better default behavior. For char folding, case folding, and whitespace folding. But it's not very important, as long as users can (a) set their own default behavior by customizing one or more options and (b) easily toggle each kind of folding on the fly. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-02 22:15 ` Drew Adams @ 2015-09-03 15:37 ` Richard Stallman 0 siblings, 0 replies; 86+ messages in thread From: Richard Stallman @ 2015-09-03 15:37 UTC (permalink / raw) To: Drew Adams; +Cc: jean.christophe.helary, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] Jean-Christophe Helary wrote: > > Maybe the default is wrong: > > a should catch only a (and not aAàá etc.) > > a case modifier would allow a to catch aA > > and a diacritic modifier would allow a to catch aàá etc. What are this "case modifier" and "diacritic modifier"? If they are easy to type, this might be convenient. If they are hard, I think the existing default is better for handling case, and maybe for diacritics too. Meanwhile, there is also the issue of discoverability. If case-fold search required memorizing a special character, most users would not memorize it and would never use it. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-02 21:51 ` Jean-Christophe Helary 2015-09-02 22:15 ` Drew Adams @ 2015-09-03 2:41 ` Eli Zaretskii 2015-09-03 3:08 ` Jean-Christophe Helary 2015-09-03 15:00 ` Stefan Monnier 2 siblings, 1 reply; 86+ messages in thread From: Eli Zaretskii @ 2015-09-03 2:41 UTC (permalink / raw) To: Jean-Christophe Helary; +Cc: emacs-devel > From: Jean-Christophe Helary <jean.christophe.helary@gmail.com> > Date: Thu, 3 Sep 2015 06:51:07 +0900 > > the default it to catch *exactly* what the user types. That goes against long-standing Emacs practice, and I envision strong objections. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-03 2:41 ` Eli Zaretskii @ 2015-09-03 3:08 ` Jean-Christophe Helary 2015-09-03 7:28 ` Artur Malabarba 2015-09-03 14:33 ` Eli Zaretskii 0 siblings, 2 replies; 86+ messages in thread From: Jean-Christophe Helary @ 2015-09-03 3:08 UTC (permalink / raw) To: emacs-devel > On Sep 3, 2015, at 11:41, Eli Zaretskii <eliz@gnu.org> wrote: > >> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com> >> Date: Thu, 3 Sep 2015 06:51:07 +0900 >> >> the default it to catch *exactly* what the user types. > > That goes against long-standing Emacs practice, and I envision strong > objections. Even if the current behavior were to be emulated by appropriate variables? Jean-Christophe Helary ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-03 3:08 ` Jean-Christophe Helary @ 2015-09-03 7:28 ` Artur Malabarba 2015-09-03 17:15 ` Drew Adams 2015-09-03 14:33 ` Eli Zaretskii 1 sibling, 1 reply; 86+ messages in thread From: Artur Malabarba @ 2015-09-03 7:28 UTC (permalink / raw) To: Jean-Christophe Helary; +Cc: emacs-devel On 3 Sep 2015 4:08 am, "Jean-Christophe Helary" < jean.christophe.helary@gmail.com> wrote: > > > > On Sep 3, 2015, at 11:41, Eli Zaretskii <eliz@gnu.org> wrote: > > > > That goes against long-standing Emacs practice, and I envision strong > > objections. > > Even if the current behavior were to be emulated by appropriate variables? Yes, and you can count me among those objections. When I first started with emacs, case folding by default was something I liked a lot, before I ever knew how to configure this stuff. I also only learned about lax whitespace when it became the default (IIRC). It was a feature that already existed and yet I had no idea because it wasn't default. ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-03 7:28 ` Artur Malabarba @ 2015-09-03 17:15 ` Drew Adams 2015-09-07 13:52 ` Nix 0 siblings, 1 reply; 86+ messages in thread From: Drew Adams @ 2015-09-03 17:15 UTC (permalink / raw) To: bruce.connor.am, Jean-Christophe Helary; +Cc: emacs-devel > Yes, and you can count me among those objections. > When I first started with emacs, case folding by default was something > I liked a lot, before I ever knew how to configure this stuff. > I also only learned about lax whitespace when it became the default (IIRC). > It was a feature that already existed and yet I had no idea because > it wasn't default. Emacs _should_ work on improving discoverability, IMO, but that is a separate discussion. IMO and FWIW, it is misguided to provide confusing, dwim behavior by default. Hard for a newbie to guess what the behavior really is, because it is too complex, conditional, contextual, whatever. The argument that we have this nifty feature and newbies won't discover it on their own easily, so let's foist it upon them from the outset, as the default behavior, is quite misguided. What should be done is to have simple, obvious default behavior, easy to fathom. AND to have easy ways to discover alternate, optional, fancy behavior that some of us might be convinced is handier, more powerful, more elegant, or more clever. Discoverability is not an argument for choosing any default behavior. Poor discoverability is an argument for improving discoverability. Nothing more. That should be a no-brainer, IMO, but we hear this over and over again. Developers like to show off the clever things they come up with. That's human and normal. Add such things, sure, but don't make them the default behavior. Especially when they are brand new. 
That a somewhat dwimish default was chosen for case folding 40 years ago, back when I was programming FORTRAN and most editing and programming involved case-insensitive contexts, should not be an argument for using it today - and certainly not for doubling down on it for new developments (e.g. char folding). It should instead be a reason to revisit whether we, in 2015, should continue to have search be case-insensitive by default. There is only one reasonable argument I can see in favor of keeping case insensitivity the default, and it does not at all apply to the other kinds of folding we are talking about now (char folding, whitespace folding). This is why I said: But I won't bother making that argument for case folding. I am not arguing for a change now in the longstanding case-fold behavior. I am arguing that we get this right for char folding. What is that somewhat reasonable argument for turning on case insensitivity by default? Habit. I see no other good argument for it "nowadays". Forty years ago, yes; today, no. Today, most contexts involve both uppercase and lowercase letters, and they are distinguished semantically (case-sensitive). It's perhaps a bit odd that some of those who are so quick to argue for "modernizing" Emacs might also argue to keep their case insensitivity by default. Old-fartness is relative? The rule about least surprise for newbies I expressed above applies even more to the dwim rule that an uppercase letter in the search string magically flips search to case sensitivity. Handy as you might find that dwim, it is hardly immediately clear to a newbie what is going on. Other editors that are case-insensitive by default do not throw such a gotcha at new users. (Emacs is not your average editor, and it is great that Emacs does fancier things than most do, but we're talking about default behavior here.) 
I mention this to try to put a stop to the application of an old rationale for case folding to char folding etc., not to argue that we should (now) consider changing the default behavior for case folding. To be clear, and to try to forestall the usual whining from some: I don't care much what the _default_ behavior is for char folding. That's not what this thread is about. I, like Jean-Christophe apparently, think that it helps newbies more to have Isearch, by default, search for just what you type (imagine!). But I don't feel strongly about that. What is more important is to be able to (a) customize the default behavior and (b) toggle it anytime during Isearch. Also important, to me, is to be able, as I proposed and as Juri apparently seconded, to have `á' match any of the `a' variants, just like `a' can do. That is, be able to toggle whether `á' (or `a') matches only itself or all `a' variants - e.g., as Juri proposed, using `M-''. And that, BTW, is the topic of this thread (see Subject line). What goes for `a' should also go for `á': either of them should be able to match, au choix, either itself alone or any of its char-folding variants (and yes, they _are_ equivalences). I also support Juri's mention of doing the same for whitespace folding: letting `M-s SPC' toggle whitespace dwimming (option `search-whitespace-regexp'). But we can also separate out that discussion from the current topic, which is about char folding. The general argument about the default behavior is that what a user puts in the search string is what should be looked for. If s?he inserts a SPC char then only a SPC char should be sought. If s?he inserts two consecutive SPC chars then only two consecutive SPC chars should be sought. You want cleverer, handier behavior? Customize the option. Attempts to finesse the confusion and the possible useful dwim behaviors tend to end with even more complex dwim behavior: rules upon rules.
See recent discussions about whitespace, where we hear things like SPC should (by default) match any amount of any whitespace, but SPC SPC should match only SPC SPC. Unless the moon is full or it is Tuesday before noon... Epicycles upon epicycles. Far better to keep the default behavior simple and immediately understandable - no need to look up the doc and study a dwim flowchart. On top of that we can add any fancy alternative behaviors we think are handier or more clever. But let's not impose those on newbies as default behavior, no matter how helpful and ingenious we are convinced they might be. And certainly not with the excuse that it makes the fancy feature more discoverable. [The last (so far) of the folding things is what `M-s i' does: it toggles search behavior for invisible text. I'm OK with the default value in this case, but it too could be open for discussion in the general context of folding. That too is best left for a separate discussion.] ^ permalink raw reply [flat|nested] 86+ messages in thread
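The two whitespace behaviors contrasted in this message, literal matching versus "SPC matches any amount of any whitespace", can be sketched like this. It is a Python illustration only: `build_search_regex` and `lax_whitespace` are hypothetical names, and Python's `\s+` merely stands in for whatever pattern a user would put in `search-whitespace-regexp`.

```python
import re


def build_search_regex(search_string: str, lax_whitespace: bool = True):
    """Compile a search string into a regex, literally or with lax whitespace.

    Assumption: in lax mode every run of one or more spaces in the search
    string is allowed to match any run of whitespace in the text.
    """
    if not lax_whitespace:
        # Literal search: one SPC matches exactly one SPC, two match two.
        return re.compile(re.escape(search_string))
    # Split on runs of spaces, escape the pieces, and rejoin with \s+.
    pieces = re.split(r" +", search_string)
    return re.compile(r"\s+".join(re.escape(piece) for piece in pieces))
```

In lax mode `build_search_regex("foo bar")` finds `foo`, a tab, a newline and `bar` in sequence; in literal mode the same search string matches only `foo`, exactly one space, then `bar`. The literal branch is the "what you type is what is sought" default argued for above, and the lax branch is the customizable alternative.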
* Re: char equivalence classes in search - why not symmetric? 2015-09-03 17:15 ` Drew Adams @ 2015-09-07 13:52 ` Nix 2015-09-07 17:07 ` Drew Adams 2015-09-08 2:17 ` Richard Stallman 0 siblings, 2 replies; 86+ messages in thread From: Nix @ 2015-09-07 13:52 UTC (permalink / raw) To: Drew Adams; +Cc: Jean-Christophe Helary, bruce.connor.am, emacs-devel On 3 Sep 2015, Drew Adams spake thusly: > IMO and FWIW, it is misguided to provide confusing, dwim behavior > by default. Hard for a newbie to guess what the behavior really > is, because it is too complex, conditional, contextual, whatever. FWIW I just introduced Emacs to a newbie last month -- using trunk Emacs because that's what I happened to have available. She was very happy indeed about not only isearch, not only case-fold search but specifically char-fold search, and she writes stuff using diacritics all the time. The key to remember here is that there are many use cases in which it is better if isearch finds something similar to what you typed than if it misses something you were looking for: you can always hit C-s again! So thanks to case-fold and char-fold search she doesn't have to worry about getting either the case or diacritics right, and can cut down on chording and compose characters while searching. So that's one newbie in particular who would vociferously disagree with you. > What should be done is to have simple, obvious default behavior, She found "searching ignores accent-like things and case" to be easy and instantly understandable, even though the implementation of ignoring even case is (thanks to case-conversion tables) quite complicated in a Unicode world. -- NULL && (void) ^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric? 2015-09-07 13:52 ` Nix @ 2015-09-07 17:07 ` Drew Adams 2015-09-07 23:23 ` Nix 2015-09-08 2:17 ` Richard Stallman 1 sibling, 1 reply; 86+ messages in thread From: Drew Adams @ 2015-09-07 17:07 UTC (permalink / raw) To: Nix; +Cc: Jean-Christophe Helary, bruce.connor.am, emacs-devel > FWIW I just introduced Emacs to a newbie last month -- using trunk Emacs > because that's what I happened to have available. She was very happy > indeed about not only isearch, not only case-fold search but > specifically char-fold search, and she writes stuff using diacritics all > the time. > > The key to remember here is that there are many use cases in which it is > better if isearch finds something similar to what you typed than if it > misses something you were looking for: you can always hit C-s again! > So thanks to case-fold and char-fold search she doesn't have to worry > about getting either the case or diacritics right, and can cut down on > chording and compose characters while searching. > > So that's one newbie in particular who would vociferously disagree > with you. > > > What should be done is to have simple, obvious default behavior, ^^^^^^^ > She found "searching ignores accent-like things and case" to be easy > and instantly understandable, even though the implementation of > ignoring even case is (thanks to case-conversion tables) quite > complicated in a Unicode world. Anecdotal evidence from one newbie. OK. I don't see anything in your description of her understanding of Isearch that shows that she "would vociferously disagree" with my proposition that literal search is a better default behavior, but I guess that is how you feel. So be it. Nevertheless, I wonder a bit about her nonsurprise and instant understanding wrt char folding. Did she just search for something like `a' and find things like `á'? Or did she also search for something like `á' and find things like `a'? 
(She could not have, as that is not yet implemented, AFAIK.) I would be somewhat surprised if she would not be somewhat surprised that looking for `á' can find `a'. Note the current discussion and the Subject line. This thread is about making char folding treat `á' and `a' as equivalent, i.e., both directions. I think it should be clear that searching for and finding exactly what you type is _absolutely_ easier to understand than finding things that you did not type. Of course, both literal and dwim searching might be easy enough in some contexts or for some users. So sure, this absolute difference in ease of understanding does not preclude the existence of some users for whom even the most complex mapping of search string to search hits might be "easy and instantly understandable". Such users should not be bothered by whichever behavior is chosen as default. Regexp vs literal search is a good example of literal search being easier to "get". Regexp search requires some extra understanding of, or feeling for, the mapping between search patterns and what the patterns match; literal search does not: what you type is what you find, literally. I doubt that all newbies expect our whitespace folding or find it natural. Likewise, how non-nil `case-fold-search' treats the presence of an uppercase letter in the search string. These things are not obvious, in general, even if you can point to a new user for whom they seem to be obvious. The uppercase-letter-in-search-string behavior, in particular, is unusual - not common in text editors. That might have made sense as default behavior for Emacs in 1985, but now? These things are gotchas, even if there might be some newbies who do not seem to have ever been "got" by them. It is better not to make such behavior the default, as long as the alternative is useful. And it is easy enough to customize search to make such dwim searching the default for any particular user. And it is trivial to toggle the behavior anytime. 
There is no special reason to make the default behavior a "gotcha" one. The _only argument_ that I have heard, for making folding searches the default behavior, and the only one that I can imagine, is that if we do not do so then users might not discover them quickly, and so they might miss out on how useful they can be. I repeat what I said before about that: Discoverability is not an argument for choosing any default behavior. Poor discoverability is an argument for improving discoverability. Nothing more. > The key to remember here is that there are many use cases in > which it is better if isearch finds something similar to what > you typed than if it misses something you were looking for No, that is not anything key to remember, in this discussion. No one has doubted that non-literal search can be extremely useful. That is in fact one of the reasons for this thread: make char-fold search do exactly that for any char in the search string, including for a char with diacritics. Currently, it always searches only literally for `á', even when char folding is turned on. It should be clear that no one is arguing against the usefulness of folding search. The post you responded to was a counter to the false argument that we should turn char folding on by default because it facilitates discovery of this nifty feature. This thread is not really about what the default behavior should be, but I did address that extraneous argument, and you did respond. If there is a need to continue about that topic, we should do it in a separate thread. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric? 2015-09-07 17:07 ` Drew Adams @ 2015-09-07 23:23 ` Nix 0 siblings, 0 replies; 86+ messages in thread From: Nix @ 2015-09-07 23:23 UTC (permalink / raw) To: Drew Adams; +Cc: Jean-Christophe Helary, bruce.connor.am, emacs-devel On 7 Sep 2015, Drew Adams spake thusly: >> FWIW I just introduced Emacs to a newbie last month -- using trunk Emacs >> because that's what I happened to have available. She was very happy >> indeed about not only isearch, not only case-fold search but >> specifically char-fold search, and she writes stuff using diacritics all >> the time. >> >> The key to remember here is that there are many use cases in which it is >> better if isearch finds something similar to what you typed than if it >> misses something you were looking for: you can always hit C-s again! >> So thanks to case-fold and char-fold search she doesn't have to worry >> about getting either the case or diacritics right, and can cut down on >> chording and compose characters while searching. >> >> So that's one newbie in particular who would vociferously disagree >> with you. >> >> > What should be done is to have simple, obvious default behavior, > ^^^^^^^ >> She found "searching ignores accent-like things and case" to be easy >> and instantly understandable, even though the implementation of >> ignoring even case is (thanks to case-conversion tables) quite >> complicated in a Unicode world. > > Anecdotal evidence from one newbie. OK. I think that counters your anecdotal evidence that newbies would find it confusing: at least one doesn't. (It's not like either of us are remotely newbies. Heck, I can't even remember what it was like to be one, so anecdata is all I have to contribute on this front.) > Nevertheless, I wonder a bit about her nonsurprise and instant > understanding wrt char folding. Did she just search for > something like `a' and find things like `á'? 
Or did she also > search for something like `á' and find things like `a'? (She > could not have, as that is not yet implemented, AFAIK.) She did the former, of course -- the latter is harder to type, so I cannot imagine any situation in which anyone would expect it. The whole nature of *-fold-search is that you can search for the non-chorded basis of things that must be typed with chords or which are otherwise composite and get the composite variants too. > I would be somewhat surprised if she would not be somewhat > surprised that looking for `á' can find `a'. Perhaps you never thought of it in terms of the keyboard. :) > Note the current discussion and the Subject line. This > thread is about making char folding treat `á' and `a' as > equivalent, i.e., both directions. I think that would be deeply bizarre. Searching for 'Foo' does not find 'foo' when case-fold-search is on: this is, as has been noted, precisely analogous to this longstanding Emacs behaviour. > I think it should be clear that searching for and finding > exactly what you type is _absolutely_ easier to understand > than finding things that you did not type. Finding things without having to type the whole thing in is exactly what isearch has always been about. This is just an extension of that, and not even a very big one. > So sure, this absolute difference in ease of understanding > does not preclude the existence of some users for whom even > the most complex mapping of search string to search hits > might be "easy and instantly understandable". Such users > should not be bothered by whichever behavior is chosen as > default. Are you actually reduced to saying that actual newbies' experience is obviously less significant than your guesses as to what newbies will surely find less confusing?? Try it on actual newbies. I bet you they won't be confused, based on my single data point :) > I doubt that all newbies expect our whitespace folding or > find it natural. Haven't you seen non-geeks typing?
They leave multiple spaces routinely (often due to hitting space at the end of a run of typing, then again at the start of the next one) and expect them to act like just one. This seems quite reasonable to me, even if random irregular spacing does look too ugly for me to perpetrate it myself. > Likewise, how non-nil `case-fold-search' > treats the presence of an uppercase letter in the search > string. The thing is, both of these are more or less obscure. When people discover that lowercase also finds uppercase, or that non-diacritic also finds diacritic, they generally respond by not bothering to use uppercase in search terms for a long time. So by the time they encounter even the first half of the behaviour you call so confusing they are no longer quite newbies. > These things are not obvious, in general, even if you can > point to a new user for whom they seem to be obvious. I don't think they're even relevant to new users, precisely *because* they are not terribly discoverable. > These things are gotchas, even if there might be some > newbies who do not seem to have ever been "got" by them. This would be more convincing if you could point to any instances of newbies actually being confused by it. > It is better not to make such behavior the default, as > long as the alternative is useful. That ship sailed decades ago. -- NULL && (void) ^ permalink raw reply [flat|nested] 86+ messages in thread
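The asymmetric case rule both sides refer to in this exchange (a lowercase search string matches either case, while any uppercase letter in it makes the search case-sensitive) can be written down compactly. This is a Python sketch of the rule as described in the thread, not the Isearch source; `is_case_sensitive` and `find` are hypothetical names, and the lowercasing shortcut assumes text whose length does not change when lowercased.

```python
def is_case_sensitive(search_string: str, case_fold_search: bool = True) -> bool:
    """Decide whether a search should distinguish case.

    Rule as described in the thread: with case folding enabled, an
    uppercase letter anywhere in the search string switches folding off
    for that search.
    """
    if not case_fold_search:
        return True
    return any(ch.isupper() for ch in search_string)


def find(search_string: str, text: str, case_fold_search: bool = True) -> int:
    """Return the index of the first match, or -1 if there is none."""
    if is_case_sensitive(search_string, case_fold_search):
        return text.find(search_string)
    # Folded search: compare lowercase forms of both sides.
    return text.lower().find(search_string.lower())
```

Here `find("foo", "a Foo")` returns 2, but `find("Foo", "a foo")` returns -1: the relation is one-directional, which is the behavior the thread's subject line asks whether char folding should copy.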
* Re: char equivalence classes in search - why not symmetric? 2015-09-07 13:52 ` Nix 2015-09-07 17:07 ` Drew Adams @ 2015-09-08 2:17 ` Richard Stallman 1 sibling, 0 replies; 86+ messages in thread From: Richard Stallman @ 2015-09-08 2:17 UTC (permalink / raw) To: Nix; +Cc: jean.christophe.helary, bruce.connor.am, drew.adams, emacs-devel [[[ To any NSA and FBI agents reading my email: please consider ]]] [[[ whether defending the US Constitution against all enemies, ]]] [[[ foreign or domestic, requires you to follow Snowden's example. ]]] > FWIW I just introduced Emacs to a newbie last month -- using trunk Emacs > because that's what I happened to have available. She was very happy > indeed about not only isearch, not only case-fold search but > specifically char-fold search, and she writes stuff using diacritics all > the time. I expect people will generally like it. -- Dr Richard Stallman President, Free Software Foundation (gnu.org, fsf.org) Internet Hall-of-Famer (internethalloffame.org) Skype: No way! See stallman.org/skype.html. ^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric?
  2015-09-03 3:08 ` Jean-Christophe Helary
  2015-09-03 7:28 ` Artur Malabarba
@ 2015-09-03 14:33 ` Eli Zaretskii
  1 sibling, 0 replies; 86+ messages in thread
From: Eli Zaretskii @ 2015-09-03 14:33 UTC
To: Jean-Christophe Helary; +Cc: emacs-devel

> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com>
> Date: Thu, 3 Sep 2015 12:08:17 +0900
>
> > On Sep 3, 2015, at 11:41, Eli Zaretskii <eliz@gnu.org> wrote:
> >
> >> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com>
> >> Date: Thu, 3 Sep 2015 06:51:07 +0900
> >>
> >> the default it to catch *exactly* what the user types.
> >
> > That goes against long-standing Emacs practice, and I envision strong
> > objections.
>
> Even if the current behavior were to be emulated by appropriate variables?

You mean, customization variables?

We were talking about the _default_ behavior. It's that default that
I think people will object to have changed towards case-sensitivity.
* Re: char equivalence classes in search - why not symmetric?
  2015-09-02 21:51 ` Jean-Christophe Helary
  2015-09-02 22:15 ` Drew Adams
  2015-09-03 2:41 ` Eli Zaretskii
@ 2015-09-03 15:00 ` Stefan Monnier
  2015-09-03 16:15 ` Drew Adams
  2 siblings, 1 reply; 86+ messages in thread
From: Stefan Monnier @ 2015-09-03 15:00 UTC
To: Jean-Christophe Helary; +Cc: emacs-devel

> ie, the default it to catch *exactly* what the user types.

I disagree. But if you want to add a Custom var to let users change the
default, that's fine by me.
Personally for those rare cases when I need to explicitly disable
case-folding in isearch, `M-c' works well enough,

Stefan
* RE: char equivalence classes in search - why not symmetric?
  2015-09-03 15:00 ` Stefan Monnier
@ 2015-09-03 16:15 ` Drew Adams
  2015-09-03 16:23 ` Eli Zaretskii
  0 siblings, 1 reply; 86+ messages in thread
From: Drew Adams @ 2015-09-03 16:15 UTC
To: Stefan Monnier, Jean-Christophe Helary; +Cc: emacs-devel

> > ie, the default it to catch *exactly* what the user types.
>
> I disagree. But if you want to add a Custom var to let users change the
> default, that's fine by me.
> Personally for those rare cases when I need to explicitly disable
> case-folding in isearch, `M-c' works well enough,

There already is such a Custom var: `case-fold-search'.

And in the rare cases where I need to explicitly _enable_ case folding
in Isearch, `M-c' works well enough. I've customized `case-fold-search'
to turn it OFF by default.

But the question Jean-Christophe raised is about the _default_ behavior.
And BTW, he raised it specifically wrt char folding, not case folding.

The attempt, each time, to hark back to the fact that Emacs defaults
_case_ folding to ON, in the context of a discussion about _char_
folding, is lamentable.

We can deal with case folding later, if there is enough interest in
reconsidering its default behavior. In this thread the question is
about char folding, first and foremost.
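As a rough illustration of the setup Drew Adams describes above (case folding off by default, with `M-c' left available to toggle it for a single search), an init-file fragment might look like the sketch below. It is not taken from the thread; it assumes only the standard `case-fold-search' variable and the default Isearch binding of `M-c'.

```elisp
;; Sketch only: assumes this goes in the user's init file.
;; `case-fold-search' becomes buffer-local when set, so use
;; `setq-default' to change the value seen by every buffer
;; that has not set it itself.
(setq-default case-fold-search nil)

;; During an incremental search, `M-c' (`isearch-toggle-case-fold')
;; still flips case folding for that one search.
```

This only concerns case folding; it leaves char folding, the actual subject of the thread, untouched.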
* Re: char equivalence classes in search - why not symmetric?
  2015-09-03 16:15 ` Drew Adams
@ 2015-09-03 16:23 ` Eli Zaretskii
  2015-09-03 16:46 ` Drew Adams
  0 siblings, 1 reply; 86+ messages in thread
From: Eli Zaretskii @ 2015-09-03 16:23 UTC
To: Drew Adams; +Cc: jean.christophe.helary, monnier, emacs-devel

> Date: Thu, 3 Sep 2015 09:15:40 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: emacs-devel@gnu.org
>
> But the question Jean-Christophe raised is about the _default_
> behavior.

Indeed.

> And BTW, he raised it specifically wrt char folding, not case folding.

That's not true. Quote:

> Maybe the default is wrong:
> a should catch only a (and not aAàá etc.)
> a case modifier would allow a to catch aA
> and a diacritic modifier would allow a to catch aàá etc.
> the free case and diacritic modifier can be combined so that a can catch
> aAàÀáÁ etc.
>
> ie, the default it to catch *exactly* what the user types.

> The attempt, each time, to hark back to the fact that Emacs defaults
> _case_ folding to ON, in the context of a discussion about _char_
> folding, is lamentable.
>
> We can deal with case folding later, if there is enough interest in
> reconsidering its default behavior. In this thread the question is
> about char folding, first and foremost.

I reacted specifically to Jean-Christophe's suggestion to change the
default for case-fold-search.
* RE: char equivalence classes in search - why not symmetric?
  2015-09-03 16:23 ` Eli Zaretskii
@ 2015-09-03 16:46 ` Drew Adams
  0 siblings, 0 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-03 16:46 UTC
To: Eli Zaretskii; +Cc: jean.christophe.helary, monnier, emacs-devel

> > But the question Jean-Christophe raised is about the _default_
> > behavior.
>
> Indeed.
>
> > And BTW, he raised it specifically wrt char folding, not case folding.
>
> That's not true. Quote:
>
> >> Maybe the default is wrong:
> >> a should catch only a (and not aAàá etc.)
> >> a case modifier would allow a to catch aA
> >> and a diacritic modifier would allow a to catch aàá etc.
> >> the free case and diacritic modifier can be combined so that a can catch
> >> aAàÀáÁ etc.
> >>
> >> ie, the default it to catch *exactly* what the user types.

Well, OK, he did mention case as well as char folding, yes.
He made, I think, a valid general point. But I agree that we
should leave case folding out of it.

> > The attempt, each time, to hark back to the fact that Emacs defaults
> > _case_ folding to ON, in the context of a discussion about _char_
> > folding, is lamentable.
> >
> > We can deal with case folding later, if there is enough interest in
> > reconsidering its default behavior. In this thread the question is
> > about char folding, first and foremost.
>
> I reacted specifically to Jean-Christophe's suggestion to change the
> default for case-fold-search.

OK. We can agree to separate that out from the current discussion.
* Re: char equivalence classes in search - why not symmetric?
  2015-09-02 15:34 ` Richard Stallman
  2015-09-02 15:56 ` Drew Adams
  2015-09-02 16:05 ` Eli Zaretskii
@ 2015-09-02 16:10 ` Artur Malabarba
  2015-09-03 19:49 ` Pip Cet
  3 siblings, 0 replies; 86+ messages in thread
From: Artur Malabarba @ 2015-09-02 16:10 UTC
Cc: emacs-devel

> Since it is possible to search for only 'á', it would be nice to have
> some convenient way to search only for 'a' with no accents.
>
> The only convenient interface I can think of is that you type, in a
> postfix input method, a ' DEL. Currently that is equivalent to typing
> just a. But we could conceivably make it different.
>
> Can someone think of some other interface for this?

You can toggle off char folding with M-s '. That's the same number of keys
as this idea where you would type an accent and then delete. Of course, one
affects the entire search string, while the other would only affect that
specific letter.
* Re: char equivalence classes in search - why not symmetric?
  2015-09-02 15:34 ` Richard Stallman
  ` (2 preceding siblings ...)
  2015-09-02 16:10 ` Artur Malabarba
@ 2015-09-03 19:49 ` Pip Cet
  3 siblings, 0 replies; 86+ messages in thread
From: Pip Cet @ 2015-09-03 19:49 UTC
To: rms; +Cc: Eli Zaretskii, drew.adams, emacs-devel

How about "C-q a"? C-q SPC already is special-cased to mean something
different in isearch mode, so it wouldn't be a drastic change.

Of course that doesn't solve the problem for characters that are not
represented on the user's keyboard; the quick fix that comes to mind is
that quoted-insert with a negative prefix could read a character using
the input method (not using read-quoted-char), then insert it as though
C-q had been used with the corresponding positive prefix, so when using
the TeX input method, "C-- C-q \ a l p h a" would be equivalent, during
isearch, to "C-q α", to search for an alpha without an accent, breathing
mark, or iota subscriptum.

(The minus sign seems logical to me because we can think of C-q as a
two-step command: switch to "literal mode", then read a character to
insert. C-- C-q does the opposite: read a character, then go into
"literal mode" to insert).

In essence, that would make C-q yet another modifier key...

On Wed, Sep 2, 2015 at 3:34 PM, Richard Stallman <rms@gnu.org> wrote:

> [[[ To any NSA and FBI agents reading my email: please consider ]]]
> [[[ whether defending the US Constitution against all enemies, ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
> Since it is possible to search for only 'á', it would be nice to have
> some convenient way to search only for 'a' with no accents.
>
> The only convenient interface I can think of is that you type, in a
> postfix input method, a ' DEL. Currently that is equivalent to typing
> just a. But we could conceivably make it different.
>
> Can someone think of some other interface for this?
>
> --
> Dr Richard Stallman
> President, Free Software Foundation (gnu.org, fsf.org)
> Internet Hall-of-Famer (internethalloffame.org)
> Skype: No way! See stallman.org/skype.html.