* char equivalence classes in search - why not symmetric?
@ 2015-09-01 15:46 Drew Adams
2015-09-01 15:52 ` Davis Herring
2015-09-01 16:16 ` Eli Zaretskii
0 siblings, 2 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-01 15:46 UTC
To: emacs-devel
When character folding is turned on, shouldn't you be able to
search for á and find (match) a, à, ã, ª, â, å, and ä?
I think so. Currently you cannot - you can only do the reverse:
search for a and find any of the above. a is treated specially.
Why?
I suppose that the logic behind the current implementation is
to mirror what we do with case-fold searching. But is that the
right thing in this case?
For case-fold searching, it was thought that if you bother to
hold the Shift key and thus use an uppercase letter then you
want to match case, and otherwise you do not (case-insensitive).
This was essentially, I think, a shortcut for programmers, and
it was introduced at a time when much of the code being searched
was case-ambivalent. (UNIX was still pretty much an exception
at that point, in distinguishing lowercase letters.)
Whether or not this behavior for case-fold is still a good thing
is questionable now, I think. I don't think it is necessary now
or particularly useful. And I think it can be confusing to
newbies. Why should searching for A be different from searching
for a, wrt case matching?
But I'm not really questioning the behavior of case-fold
searching now. I am questioning applying this same behavior
to char folding.
To me, folding a group of chars together for search purposes
should be symmetric - go both ways. It should, in effect,
treat the given group of chars as equivalent - as an
equivalence class wrt searching.
Why not? Why, when char folding, treat plain a specially for
searching? Why not treat á, a, à, ã, ª, â, å, and ä the same?
Isn't that the point here? We are telling Isearch that they
are equivalent. Why pick one of them as the canonical
search-pattern to use for finding any of them? Why privilege
a over á, a, à, ã, ª, â, å, and ä?
Now most of the time I, like most people, will be typing a
instead of á into a search string. But that's not really the
point. I think users should be able to use any members of an
equivalence class of chars indifferently.
And when it comes to chars other than letters, it might well
be that some users, with some keyboards, will find some chars
in an equivalence class easier to type than others. Let them
use/type whichever they like, no?
This feature, welcome as it is, seems only half-baked, so far.
How about equality for char-folding equivalence?
* Re: char equivalence classes in search - why not symmetric?
2015-09-01 15:46 char equivalence classes in search - why not symmetric? Drew Adams
@ 2015-09-01 15:52 ` Davis Herring
2015-09-01 16:51 ` Stefan Monnier
2015-09-01 17:51 ` Drew Adams
2015-09-01 16:16 ` Eli Zaretskii
1 sibling, 2 replies; 86+ messages in thread
From: Davis Herring @ 2015-09-01 15:52 UTC
To: Drew Adams; +Cc: emacs-devel
> Whether or not this behavior for case-fold is still a good thing
> is questionable now, I think. I don't think it is necessary now
> or particularly useful. And I think it can be confusing to
> newbies. Why should searching for A be different from searching
> for a, wrt case matching?
Because having both input characters mean the same thing uselessly
deprives the user of expressive power.
> Why not? Why, when char folding, treat plain a specially for
> searching? Why not treat á, a, à, ã, ª, â, å, and ä the same?
For exactly the same reason.
> And when it comes to chars other than letters, it might well
> be that some users, with some keyboards, will find some chars
> in an equivalence class easier to type than others. Let them
> use/type whichever they like, no?
It would make sense to provide a customization option to control which
character meant the whole set -- if anyone would use it. Are there in
fact keyboards where the accented characters are significantly easier?
> This feature, welcome as it is, seems only half-baked, so far.
> How about equality for char-folding equivalence?
These are code points, not oppressed minorities.
Davis
--
This product is sold by volume, not by mass. If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.
* Re: char equivalence classes in search - why not symmetric?
2015-09-01 15:46 char equivalence classes in search - why not symmetric? Drew Adams
2015-09-01 15:52 ` Davis Herring
@ 2015-09-01 16:16 ` Eli Zaretskii
[not found] ` <<38061f42-eaf1-47c6-b74d-f676ac952b18@default>
` (2 more replies)
1 sibling, 3 replies; 86+ messages in thread
From: Eli Zaretskii @ 2015-09-01 16:16 UTC
To: Drew Adams; +Cc: emacs-devel
> Date: Tue, 1 Sep 2015 08:46:26 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
>
> When character folding is turned on, shouldn't you be able to
> search for á and find (match) a, à, ã, ª, â, å, and ä?
No. You should find only á.
> I think so. Currently you cannot - you can only do the reverse:
> search for a and find any of the above. a is treated specially.
> Why?
It's the same principle as with case-folding: if you type "FOO", you
will not find the lowercase variant.
> I suppose that the logic behind the current implementation is
> to mirror what we do with case-fold searching. But is that the
> right thing in this case?
It's what the Unicode Standard recommends, and IMO it makes a lot of
sense. See http://unicode.org/reports/tr10/#Searching.
> To me, folding a group of chars together for search purposes
> should be symmetric - go both ways.
You will see that the above Unicode report explicitly recommends to
make it _asymmetric_.
> Why not? Why, when char folding, treat plain a specially for
> searching? Why not treat á, a, à, ã, ª, â, å, and ä the same?
> Isn't that the point here? We are telling Isearch that they
> are equivalent. Why pick one of them as the canonical
> search-pattern to use for finding any of them? Why privilege
> a over á, a, à, ã, ª, â, å, and ä?
Because we are not "telling Isearch that they are equivalent". We are
asking for matches that disregard the diacriticals (and in case of ª
also higher-order collation-order variation).
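
A minimal Python sketch of this asymmetric interpretation, offered as an illustration only and not as what Emacs actually does. The helper names `fold` and `asymmetric_match` are hypothetical, and folding by way of Unicode NFKD decomposition is an assumption:

```python
import unicodedata

def fold(ch: str) -> str:
    """Drop combining marks after compatibility decomposition (NFKD).

    Assumption: "disregarding diacriticals" is approximated by NFKD,
    which also maps U+00AA (feminine ordinal) to plain "a".
    """
    decomposed = unicodedata.normalize("NFKD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def asymmetric_match(pattern_ch: str, text_ch: str) -> bool:
    """Plain pattern chars match every variant; marked ones match only themselves."""
    if fold(pattern_ch) == pattern_ch:
        # The pattern carries no diacritical, so ignore diacriticals in the text.
        return fold(text_ch) == pattern_ch
    # The pattern carries a diacritical, so require an exact match.
    return pattern_ch == text_ch
```

With this sketch, a pattern of plain a accepts á, à, ã, ª, â, å, and ä, while a pattern of á accepts only á.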
> Now most of the time I, like most people, will be typing a
> instead of á into a search string. But that's not really the
> point. I think users should be able to use any members of an
> equivalence class of chars indifferently.
That'd make searching for exactly á unnecessarily complicated and/or
cumbersome, for no good reason. The symmetry you suggest has no
practical advantages (because you can find all of these characters by
just specifying a), but does have significant practical disadvantages.
> This feature, welcome as it is, seems only half-baked, so far.
No need for derogatory language, thank you. We certainly have a lot
to learn about this feature, but half-baked it isn't.
* Re: char equivalence classes in search - why not symmetric?
2015-09-01 15:52 ` Davis Herring
@ 2015-09-01 16:51 ` Stefan Monnier
2015-09-01 17:51 ` Drew Adams
1 sibling, 0 replies; 86+ messages in thread
From: Stefan Monnier @ 2015-09-01 16:51 UTC
To: Davis Herring; +Cc: Drew Adams, emacs-devel
>> This feature, welcome as it is, seems only half-baked, so far.
>> How about equality for char-folding equivalence?
> These are code points, not oppressed minorities.
How 'bout we dedicate Sep 17 of every year to all those Unicode characters
left in the dark?
Stefan
* RE: char equivalence classes in search - why not symmetric?
2015-09-01 16:16 ` Eli Zaretskii
[not found] ` <<38061f42-eaf1-47c6-b74d-f676ac952b18@default>
@ 2015-09-01 17:50 ` Drew Adams
2015-09-01 18:15 ` Eli Zaretskii
2015-09-02 15:34 ` Richard Stallman
2 siblings, 1 reply; 86+ messages in thread
From: Drew Adams @ 2015-09-01 17:50 UTC
To: Eli Zaretskii; +Cc: emacs-devel
> > When character folding is turned on, shouldn't you be able to
> > search for á and find (match) a, à, ã, ª, â, å, and ä?
>
> No. You should find only á.
No reason?
> > I think so. Currently you cannot - you can only do the
> > reverse: search for a and find any of the above. a is treated
> > specially. Why?
>
> It's the same principle as with case-folding: if you type "FOO",
> you will not find the lowercase variant.
You're just echoing what it does, not supporting the behavior with
reasons. And I already mentioned what you say here.
> > I suppose that the logic behind the current implementation is
> > to mirror what we do with case-fold searching. But is that the
> > right thing in this case?
>
> It's what the Unicode Standard recommends, and IMO it makes a
> lot of sense. See http://unicode.org/reports/tr10/#Searching.
I don't see that, when reading that section. I do see that it
explicitly calls out that behavior as an _option_:
8.2 Asymmetric Search
Users often find asymmetric searching to be a useful option.
That users can find this optionally useful, I have no doubt.
And I wouldn't be against making it a user option in Emacs.
But I do not see anything in the section you cited that says
that this asymmetric behavior is required, or recommended.
In any case, Emacs is not beholden to any particular standard,
as RMS so often reminds us. The question is what is useful for
Emacs users.
If you think "it makes a lot of sense" then you should have
no difficulty giving some of that sense. So far, none; just
appeals to authority.
> > To me, folding a group of chars together for search purposes
> > should be symmetric - go both ways.
>
> You will see that the above Unicode report explicitly recommends
> to make it _asymmetric_.
No, I do not see that. I see that the report points out that
such an optional behavior can be useful for some users.
And it specifically points out the case "When doing an
asymmetric search", making clear that there is also the case
when NOT doing an asymmetric search.
Obviously, for the simpler case of a symmetric search there
is no need for a section describing it - it is straightforward,
whereas the asymmetric search case takes some explaining.
Which is precisely what makes it more complex for users.
Nowhere in that report do I see that asymmetric search is the
only, or even the recommended, search behavior. It is
explicitly pointed out as an optional behavior.
But I read the section quickly, and you are the expert.
Please point to where I am mistaken.
> > Why not? Why, when char folding, treat plain a specially for
> > searching? Why not treat á, a, à, ã, ª, â, å, and ä the same?
> > Isn't that the point here? We are telling Isearch that they
> > are equivalent. Why pick one of them as the canonical
> > search-pattern to use for finding any of them? Why privilege
> > a over á, a, à, ã, ª, â, å, and ä?
>
> Because we are not "telling Isearch that they are equivalent".
I think we should be. At least that should be one possibility.
> We are asking for matches that disregard the diacriticals
> (and in case of ª also higher-order collation-order variation).
No. You are asking for that only when you use a search pattern
that does not use the diacriticals. When you search with á in
the pattern you are NOT asking for matches that disregard the
diacriticals. And why not? So far, no reasons given.
I would favor being able not just to toggle between folded
and unfolded search but to cycle among folded-symmetric,
folded-asymmetric, and unfolded. Why not?
> > Now most of the time I, like most people, will be typing a
> > instead of á into a search string. But that's not really the
> > point. I think users should be able to use any members of an
> > equivalence class of chars indifferently.
>
> That'd make searching for exactly á unnecessarily complicated and/or
> cumbersome, for no good reason. The symmetry you suggest has no
> practical advantages (because you can find all of these characters by
> just specifying a), but does have significant practical disadvantages.
Assertions with no supporting reasons/examples.
> > This feature, welcome as it is, seems only half-baked, so far.
>
> No need for derogatory language, thank you.
Where I work, "half-baked" is used often, and it means not
entirely finished, whether that refers to dev, QA, doc, whatever.
It is not used in a derogatory way. And I made very clear that
I welcome this feature.
If you feel that "half-baked" in the context of software
development is derogatory then I apologize for using the term.
Let me say it this way: This feature, welcome as it is, seems
not entirely finished. Whether now or later, I would like to
see it go further.
> We certainly have a lot to learn about this feature,
And to document. And hopefully to further develop in the future.
> but half-baked it isn't.
Certainly the doc is half-baked, if baked at all. And in
terms of the longer term goal of facilitating users modifying
the classes of chars that are treated equivalently, and of
defining their own sets of such classes, we are not there yet.
Saying this does not take away from the progress made so far.
This is a very welcome feature.
* RE: char equivalence classes in search - why not symmetric?
2015-09-01 15:52 ` Davis Herring
2015-09-01 16:51 ` Stefan Monnier
@ 2015-09-01 17:51 ` Drew Adams
2015-09-01 18:40 ` Davis Herring
2015-09-01 20:10 ` Stephen J. Turnbull
1 sibling, 2 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-01 17:51 UTC
To: Davis Herring; +Cc: emacs-devel
> Because having both input characters mean the same thing
> uselessly deprives the user of expressive power.
Examples/arguments/reasons, please. IOW, prove it.
You can always toggle char folding, just as you can toggle
case folding.
IMO, more users have been tripped up than helped by the rule
that "An upper-case letter anywhere in the incremental search
string makes the search case-sensitive." (emacs) Search Case.
Letting a user toggle between matching chars one-to-one and
matching chars according to equivalence classes, is sufficient
and clear, IMO. Adding rules on top of this is not helpful.
But I would not oppose the current behavior as an option.
Let users decide whether matching is symmetric or asymmetric.
Maybe even let users toggle, or cycle among these two folding
(one-many) behaviors and unfolded (one-one matching) behavior.
> > Why not? Why, when char folding, treat plain a specially for
> > searching? Why not treat á, a, à, ã, ª, â, å, and ä the same?
>
> For exactly the same reason.
What reason? Please show how this optional matching
behavior "deprives the user of expressive power".
* Re: char equivalence classes in search - why not symmetric?
2015-09-01 17:50 ` Drew Adams
@ 2015-09-01 18:15 ` Eli Zaretskii
2015-09-01 18:46 ` Drew Adams
2015-09-08 5:36 ` Ulrich Mueller
0 siblings, 2 replies; 86+ messages in thread
From: Eli Zaretskii @ 2015-09-01 18:15 UTC
To: Drew Adams; +Cc: emacs-devel
> Date: Tue, 1 Sep 2015 10:50:22 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: emacs-devel@gnu.org
>
> > We are asking for matches that disregard the diacriticals
> > (and in case of ª also higher-order collation-order variation).
>
> No. You are asking for that only when you use a search pattern
> that does not use the diacriticals. When you search with á in
> the pattern you are NOT asking for matches that disregard the
> diacriticals. And why not?
Because á does include a diacritical. By specifying it, the user told
us the diacriticals are important, and shouldn't be disregarded.
> > It's what the Unicode Standard recommends, and IMO it makes a
> > lot of sense. See http://unicode.org/reports/tr10/#Searching.
>
> I don't see that, when reading that section. I do see that it
> explicitly calls out that behavior as an _option_:
>
> 8.2 Asymmetric Search
> Users often find asymmetric searching to be a useful option.
"Users often find asymmetric searching to be a useful option" sounds
like a recommendation to me.
* Re: char equivalence classes in search - why not symmetric?
2015-09-01 17:51 ` Drew Adams
@ 2015-09-01 18:40 ` Davis Herring
2015-09-01 19:09 ` Drew Adams
2015-09-01 22:45 ` Juri Linkov
2015-09-01 20:10 ` Stephen J. Turnbull
1 sibling, 2 replies; 86+ messages in thread
From: Davis Herring @ 2015-09-01 18:40 UTC
To: Drew Adams; +Cc: emacs-devel
>> Because having both input characters mean the same thing
>> uselessly deprives the user of expressive power.
>
> Examples/arguments/reasons, please. IOW, prove it.
I'm sorry: I thought it was obvious. For case folding, there are three
sets of characters that might be considered a match: [a], [A], and [aA].
The default Emacs behavior is to make "a" mean [aA] and "A" mean [A].
For the (relatively rare) case in which [a] is desired, one can turn
case-fold-search off (e.g., with M-c). Then you gain [a] and lose [aA]
as a choice (you can't have all three from just two characters!).
With your suggestion (which addresses only case-fold-search, of course),
we would have only [aA] available whether you typed "a" or "A". That is
the less expressive power: the semantically distinct options available
have been reduced.
Of course, with more than one character there are yet other
possibilities: for two characters there are 9, of which "ab" gives you
[aA][bB] and each of the other three permutations give one
(case-sensitive) match each. 4/9 isn't great, but it's better than 1/9!
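
The counting above can be checked with a short Python sketch. It is only an illustration of the rule as described in this message, not Emacs code; the function names `smart_case_matches` and `always_fold_matches` are hypothetical:

```python
from itertools import product

def smart_case_matches(pattern, candidates):
    """Current rule: any uppercase letter makes the search case-sensitive."""
    if pattern == pattern.lower():
        # All-lowercase pattern: fold case.
        return frozenset(c for c in candidates if c.lower() == pattern)
    # Pattern contains an uppercase letter: match exactly.
    return frozenset(c for c in candidates if c == pattern)

def always_fold_matches(pattern, candidates):
    """Alternative rule: "a" and "A" always mean the same thing."""
    return frozenset(c for c in candidates if c.lower() == pattern.lower())

# The four ways to type a two-letter pattern: ab, aB, Ab, AB.
candidates = ["".join(p) for p in product("aA", "bB")]

# Distinct match sets a user can express by choosing what to type.
smart = {smart_case_matches(p, candidates) for p in candidates}
folded = {always_fold_matches(p, candidates) for p in candidates}

# Each position could mean [a], [A], or [aA]: 3 choices per position.
total = 3 ** 2

print(f"smart case: {len(smart)}/{total}, always fold: {len(folded)}/{total}")
```

The sketch counts distinct sets of matching strings, so the all-lowercase pattern contributes one set ([aA][bB]) and each pattern with an uppercase letter contributes one exact-match set.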
> IMO, more users have been tripped up than helped by the rule
> that "An upper-case letter anywhere in the incremental search
> string makes the search case-sensitive." (emacs) Search Case.
How did that upper-case letter get there? Commands like C-w are careful
not to add uppercase letters if there aren't already some. So the user
must have typed it explicitly, and so they were paying attention to case
and have no need for a case-insensitive search. The only harm is if
they are inconsistent in their typing -- during something as brief as
isearch.
Davis
--
This product is sold by volume, not by mass. If it appears too dense or
too sparse, it is because mass-energy conversion has occurred during
shipping.
* RE: char equivalence classes in search - why not symmetric?
2015-09-01 18:15 ` Eli Zaretskii
@ 2015-09-01 18:46 ` Drew Adams
2015-09-01 19:19 ` Eli Zaretskii
2015-09-08 5:36 ` Ulrich Mueller
1 sibling, 1 reply; 86+ messages in thread
From: Drew Adams @ 2015-09-01 18:46 UTC
To: Eli Zaretskii; +Cc: emacs-devel
> > > We are asking for matches that disregard the diacriticals
> > > (and in case of ª also higher-order collation-order variation).
> >
> > No. You are asking for that only when you use a search pattern
> > that does not use the diacriticals. When you search with á in
> > the pattern you are NOT asking for matches that disregard the
> > diacriticals. And why not?
>
> Because á does include a diacritical. By specifying it, the user told
> us the diacriticals are important, and shouldn't be disregarded.
Again, you are just parroting what the implementation does, not
giving a reason supporting it. By turning on folding, a user
can be said to be choosing to disregard diacriticals.
Again, both options for fold matching should probably be available.
There is no reason to hard-code one of them at design time.
At least no reason has been put forth so far.
> > > It's what the Unicode Standard recommends, and IMO it makes a
> > > lot of sense. See http://unicode.org/reports/tr10/#Searching.
> >
> > I don't see that, when reading that section. I do see that it
> > explicitly calls out that behavior as an _option_:
> >
> > 8.2 Asymmetric Search
> > Users often find asymmetric searching to be a useful option.
>
> "Users often find asymmetric searching to be a useful option" sounds
> like a recommendation to me.
No, it is not. Not at all. That, and all of the text about this,
makes clear, AFAICT, that this is a useful OPTIONAL behavior.
That is the language used: "a useful option". Nowhere (AFAICT)
is there any language supporting an interpretation of this as
the recommended behavior.
The language instead clearly points out that there are different
behaviors covered by the report. And the one that is complex and
needs explanation is clearly called out as an optional behavior.
Not the recommended behavior, but a useful behavior to consider
for inclusion as an option.
Anyway, thanks for confirming that there was not some text
that I missed, which in fact recommends asymmetric matching.
* RE: char equivalence classes in search - why not symmetric?
2015-09-01 18:40 ` Davis Herring
@ 2015-09-01 19:09 ` Drew Adams
2015-09-01 22:45 ` Juri Linkov
1 sibling, 0 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-01 19:09 UTC
To: Davis Herring; +Cc: emacs-devel
> >> Because having both input characters mean the same thing
> >> uselessly deprives the user of expressive power.
> >
> > Examples/arguments/reasons, please. IOW, prove it.
>
> I'm sorry: I thought it was obvious. For case folding, there are three
> sets of characters that might be considered a match: [a], [A], and [aA].
> The default Emacs behavior is to make "a" mean [aA] and "A" mean [A].
> For the (relatively rare) case in which [a] is desired, one can turn
> case-fold-search off (e.g., with M-c). Then you gain [a] and lose [aA]
> as a choice (you can't have all three from just two characters!).
You are just echoing what the implementation does, not giving
any supporting reasons for it.
"You can't have all three from just two characters" sounds
important - except that it doesn't mean anything.
It is quite possible for the behavior to be any of these:
a matches a only
a matches a and A
A matches A only
A matches a and A
The current implementation does not provide for the last
possibility. In that, it can be argued that it "deprives
the user of expressive power".
But I won't bother making that argument for case folding.
I am not arguing for a change now in the longstanding
case-fold behavior. I am arguing that we get this right
for char folding.
> With your suggestion (which addresses only case-fold-search, of course),
> we would have only [aA] available whether you typed "a" or "A". That is
> the less expressive power: the semantically distinct options available
> have been reduced.
That's your suggestion perhaps. It's certainly not mine.
I suggest letting the user match a to a, a to [aA], A to A, and
A to [aA]. That is more expressive power, not less. With it,
the "semantically distinct options available" have been increased.
> Of course, with more than one character there are yet other
> possibilities: for two characters there are 9, of which "ab" gives you
> [aA][bB] and each of the other three permutations give one
> (case-sensitive) match each. 4/9 isn't great, but it's better than 1/9!
See above. You are reducing possibilities, not expanding them.
> > IMO, more users have been tripped up than helped by the rule
> > that "An upper-case letter anywhere in the incremental search
> > string makes the search case-sensitive." (emacs) Search Case.
>
> How did that upper-case letter get there? Commands like C-w are careful
> not to add uppercase letters if there aren't already some. So the user
> must have typed it explicitly, and so they were paying attention to case
> and have no need for a case-insensitive search. The only harm is if
> they are inconsistent in their typing -- during something as brief as
> isearch.
A char in a search string can "get there" because a user typed it,
and that can be because for that user it is easy to type. Or it can
get there from a previous search (same Isearch invocation or not).
Or it can "get there" by yanking copied text.
Try typing or pasting "réduction" to Google, and see if it ignores
hits such as "reduction". Good luck with that. Silly Google,
missing the "obvious".
It should be obvious that it can be useful to match the pattern
"réduction" against "reduction", just as it can be useful to
match the pattern "reduction" against "réduction" (and "réduction"
against "réduction" and "reduction" against "reduction").
To remove this possibility, thus reducing user expressiveness,
you really should come up with a reason.
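
For illustration, symmetric matching of this kind could be sketched in Python as follows. This is a rough sketch under stated assumptions, not a description of Isearch or of Google: the names `fold` and `symmetric_search` are hypothetical, and folding is approximated by stripping combining marks after NFKD decomposition.

```python
import unicodedata

def fold(text: str) -> str:
    """Strip combining marks after NFKD decomposition (an assumed folding rule)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def symmetric_search(pattern: str, text: str) -> bool:
    """Fold both sides, so the pattern and the text are treated alike."""
    return fold(pattern) in fold(text)
```

Because both the pattern and the text are folded, "réduction" finds "reduction" and "reduction" finds "réduction"; an exact search would remain available by turning folding off.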
* Re: char equivalence classes in search - why not symmetric?
2015-09-01 18:46 ` Drew Adams
@ 2015-09-01 19:19 ` Eli Zaretskii
2015-09-01 20:15 ` Drew Adams
0 siblings, 1 reply; 86+ messages in thread
From: Eli Zaretskii @ 2015-09-01 19:19 UTC
To: Drew Adams; +Cc: emacs-devel
> Date: Tue, 1 Sep 2015 11:46:11 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: emacs-devel@gnu.org
>
> > > > We are asking for matches that disregard the diacriticals
> > > > (and in case of ª also higher-order collation-order variation).
> > >
> > > No. You are asking for that only when you use a search pattern
> > > that does not use the diacriticals. When you search with á in
> > > the pattern you are NOT asking for matches that disregard the
> > > diacriticals. And why not?
> >
> > Because á does include a diacritical. By specifying it, the user told
> > us the diacriticals are important, and shouldn't be disregarded.
>
> Again, you are just parroting what the implementation does
??? I explained the interpretation of the user input, how's that
implementation?
> Again, both options for fold matching should probably be available.
> There is no reason to hard-code one of them at design time.
>
> At least no reason has been put forth so far.
You've got all the reasons, you just refuse to hear them.
Time to bail out.
* RE: char equivalence classes in search - why not symmetric?
2015-09-01 17:51 ` Drew Adams
2015-09-01 18:40 ` Davis Herring
@ 2015-09-01 20:10 ` Stephen J. Turnbull
1 sibling, 0 replies; 86+ messages in thread
From: Stephen J. Turnbull @ 2015-09-01 20:10 UTC
To: Drew Adams; +Cc: emacs-devel
Drew Adams writes:
> > Because having both input characters mean the same thing
> > uselessly deprives the user of expressive power.
>
> Examples/arguments/reasons, please. IOW, prove it.
With "a" and "A" as distinct entities I can express either of two
things in one character. If I equivalence them, I can only express
one thing. 2 > 1. Q.E.D.
On the contrary, "we could have an option" is not a reason for having
the option. We now have a working approach which has the advantage of
being modeless while not imposing an excessive efficiency burden. By
that I mean capitalized words are relatively uncommon, and therefore
not likely to constitute a huge number of unwanted "hits" in an
isearch for an entirely lowercase string.
I'm not *sure* the same efficiency will be true for "accent folding",
but you cannot possibly be sure it's false. The current approach is
good enough for now, and experience will accumulate over time. Wait
for it.
> You can always toggle char folding, just as you can toggle
> case folding.
Modal behavior in user commands is generally avoided in Emacs where it
isn't absolutely necessary.
Bottom line, burden of proof is on *you*.
> IMO,
You repeatedly mention your opinion in the same message where you ask
others to prove things. Yet your opinion is not evidence for anything
except your opinion.
> Let users decide whether matching is symmetric or asymmetric.
I say to them: "Use the source, Luke!"
* RE: char equivalence classes in search - why not symmetric?
2015-09-01 19:19 ` Eli Zaretskii
@ 2015-09-01 20:15 ` Drew Adams
0 siblings, 0 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-01 20:15 UTC
To: Eli Zaretskii; +Cc: emacs-devel
> > > > > We are asking for matches that disregard the diacriticals
> > > > > (and in case of ª also higher-order collation-order variation).
> > > >
> > > > No. You are asking for that only when you use a search pattern
> > > > that does not use the diacriticals. When you search with á in
> > > > the pattern you are NOT asking for matches that disregard the
> > > > diacriticals. And why not?
^^^^^^^^^^^^
> > > Because á does include a diacritical. By specifying it, the user told
> > > us the diacriticals are important, and shouldn't be disregarded.
> >
> > Again, you are just parroting what the implementation does
>
> ??? I explained the interpretation of the user input, how's that
> implementation?
You described the current interpretation, by Emacs, of the
user input á. That's "what the implementation does."
That does not explain why using á in a search string _should_ mean
that diacriticals are important and shouldn't be disregarded.
And that was the question I asked - why should this be the
(only) behavior? Your answer is, just because it _is_ the
behavior.
Because it is the behavior, users expect it and we can interpret
what they want in terms of it. Well yes, sure - it's the only
choice they have now. It _is_ the behavior, so of course they
use it accordingly. They type á in order to match á. So what?
> > Again, both options for fold matching should probably be available.
> > There is no reason to hard-code one of them at design time.
> > At least no reason has been put forth so far.
>
> You've got all the reasons, you just refuse to hear them.
> Time to bail out.
The only reason you gave is that this is what Emacs does now.
And that that means that this is what a user expects. S?he
types á to match á and a to match a (or variants, with char
folding). User intention is clear here: s?he gets the behavior
s?he asks Emacs for. QED.
Sorry, that's not a reason _why_ this should be the (only)
behavior available to a user. It's just repeating that users
expect this behavior from Emacs and so act accordingly.
That they get what they expect is no proof that that is the
only useful behavior. It just shows that they know what
Emacs does.
* Re: char equivalence classes in search - why not symmetric?
2015-09-01 18:40 ` Davis Herring
2015-09-01 19:09 ` Drew Adams
@ 2015-09-01 22:45 ` Juri Linkov
2015-09-02 0:33 ` Drew Adams
1 sibling, 1 reply; 86+ messages in thread
From: Juri Linkov @ 2015-09-01 22:45 UTC
To: Davis Herring; +Cc: Drew Adams, emacs-devel
> I'm sorry: I thought it was obvious. For case folding, there are three
> sets of characters that might be considered a match: [a], [A], and [aA].
> The default Emacs behavior is to make "a" mean [aA] and "A" mean [A].
> For the (relatively rare) case in which [a] is desired, one can turn
> case-fold-search off (e.g., with M-c). Then you gain [a] and lose [aA]
> as a choice (you can't have all three from just two characters!).
Or in a brief table:
‘C-s a’ matches [aA]
‘C-s a M-c’ matches [a]
‘C-s A’ matches [A]
‘C-s A M-c’ matches [aA]
Substituting ‘A’ into ‘ä’ (other equivalent chars omitted for brevity):
‘C-s a’ matches [aä]
‘C-s a M-'’ matches [a]
‘C-s ä’ matches [ä]
‘C-s ä M-'’ matches [aä]
I see no problem implementing the same.
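
The four-row scheme can be sketched in a few lines of Python. This is only a model of the table, not Isearch code; `FOLD_CLASSES`, `base_of`, and `match_set` are hypothetical names, and the single class shown is a stand-in for the real equivalence data:

```python
# Hypothetical fold table: base character -> all members of its class.
# "\u00e4" is ä; other equivalent chars are omitted for brevity.
FOLD_CLASSES = {"a": {"a", "\u00e4"}}

def base_of(ch):
    """Return the base character of ch's class, or ch itself if it has none."""
    for base, members in FOLD_CLASSES.items():
        if ch in members:
            return base
    return ch

def match_set(ch, toggled=False):
    """Characters matched by typing ch, optionally followed by the toggle key."""
    base = base_of(ch)
    folds_by_default = (ch == base)      # plain char folds; a variant is literal
    fold = folds_by_default != toggled   # the toggle flips whichever default applies
    return set(FOLD_CLASSES.get(base, {ch})) if fold else {ch}
```

Calling `match_set` for a and ä, with and without the toggle, reproduces the four rows of the table; the same shape would work for case by making the class {a, A} with base a.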
BTW, could this scheme be applied to whitespace matching as well?
‘C-s SPC’ matches [SPC TAB]
‘C-s SPC M-s SPC’ matches [SPC]
‘C-s TAB’ matches [TAB]
‘C-s TAB M-s SPC’ matches [SPC TAB]
> How did that upper-case letter get there? Commands like C-w are careful
> not to add uppercase letters if there aren't already some. So the user
> must have typed it explicitly, and so they were paying attention to case
> and have no need for a case-insensitive search. The only harm is if
> they are inconsistent in their typing -- during something as brief as
> isearch.
Yanking a string with upper-case letters into Isearch does more harm
by converting them into lower-case. I believe yanking a string
should not strip diacritics either.
* RE: char equivalence classes in search - why not symmetric?
2015-09-01 22:45 ` Juri Linkov
@ 2015-09-02 0:33 ` Drew Adams
0 siblings, 0 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-02 0:33 UTC (permalink / raw)
To: Juri Linkov, Davis Herring; +Cc: emacs-devel
> Or in a brief table:
>
> ‘C-s a’ matches [aA]
> ‘C-s a M-c’ matches [a]
> ‘C-s A’ matches [A]
> ‘C-s A M-c’ matches [aA]
>
> Substituting ‘A’ into ‘ä’ (other equivalent chars omitted for brevity):
>
> ‘C-s a’ matches [aä]
> ‘C-s a M-'’ matches [a]
> ‘C-s ä’ matches [ä]
> ‘C-s ä M-'’ matches [aä]
>
> I see no problem implementing the same.
Did you mean `M-s '' instead of `M-''? If so, except for the last
line, that's what we have now, IIUC.
And yes, that would be one way to do it (get the 4 match possibilities
I requested). Gets my vote.
> BTW, could this scheme be applied to whitespace matching as well?
>
> ‘C-s SPC’ matches [SPC TAB]
> ‘C-s SPC M-s SPC’ matches [SPC]
> ‘C-s TAB’ matches [TAB]
> ‘C-s TAB M-s SPC’ matches [SPC TAB]
Sounds good to me. Again, gets my vote.
But in each case, I would want there to be a user option that
controls the default behavior, just as `case-fold-search' does.
That should be the first fix, as I mentioned earlier: change
`character-fold-search' to a defcustom. Let a user decide
which default behavior s?he wants for char folding - and
whitespace folding as well.
It is very handy to me that search always starts by default
by respecting case, because my customized value of
`case-fold-search' is nil. I would not want to have to
do `M-c' each time I start a search. Likewise, for char
folding (`M-s '') and whitespace folding (`M-s SPC').
> > How did that upper-case letter get there? Commands like C-w are careful
> > not to add uppercase letters if there aren't already some. So the user
> > must have typed it explicitly, and so they were paying attention to case
> > and have no need for a case-insensitive search. The only harm is if
> > they are inconsistent in their typing -- during something as brief as
> > isearch.
>
> Yanking a string with upper-case letters into Isearch does more harm
> by converting them into lower-case. I believe yanking a string
> should not strip diacritics either.
That too gets my vote - WYYIWYG: what you yank is what you get.
* Re: char equivalence classes in search - why not symmetric?
2015-09-01 16:16 ` Eli Zaretskii
[not found] ` <<38061f42-eaf1-47c6-b74d-f676ac952b18@default>
2015-09-01 17:50 ` Drew Adams
@ 2015-09-02 15:34 ` Richard Stallman
2015-09-02 15:56 ` Drew Adams
` (3 more replies)
2 siblings, 4 replies; 86+ messages in thread
From: Richard Stallman @ 2015-09-02 15:34 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: drew.adams, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Since it is possible to search for only 'á', it would be nice to have
some convenient way to search only for 'a' with no accents.
The only convenient interface I can think of is that you type, in a
postfix input method, a ' DEL. Currently that is equivalent to typing
just a. But we could conceivably make it different.
Can someone think of some other interface for this?
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* RE: char equivalence classes in search - why not symmetric?
2015-09-02 15:34 ` Richard Stallman
@ 2015-09-02 15:56 ` Drew Adams
2015-09-02 16:05 ` Eli Zaretskii
` (2 subsequent siblings)
3 siblings, 0 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-02 15:56 UTC (permalink / raw)
To: rms, Eli Zaretskii; +Cc: emacs-devel
> Since it is possible to search for only 'á', it would be nice to have
> some convenient way to search only for 'a' with no accents.
>
> The only convenient interface I can think of is that you type, in a
> postfix input method, a ' DEL. Currently that is equivalent to typing
> just a. But we could conceivably make it different.
>
> Can someone think of some other interface for this?
Yes, I mentioned this. And see the proposal from Juri in this thread:
During Isearch, `M-s '' (he wrote `M-'' but I think he meant `M-s '')
would toggle character folding, just as `M-c' toggles case folding.
If char folding is on then `a' matches all of the variants (á etc.).
But if it is off then `a' matches only `a'.
Users could customize the default (on or off), just as they can today
customize `case-fold-search'.
So someone could leave char folding on most of the time, and toggle it
off anytime using `M-s '', or vice versa, leave it off most of the time
and toggle it on.
If it is on and you want to search for only `a', not also á etc.:
C-s M-s ' a
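To make the "customizable default plus on-the-fly toggle" idea concrete, here is a small Python sketch. Everything in it is hypothetical: `CHAR_FOLD_DEFAULT` stands in for a user option, `SearchSession` for one Isearch invocation, and stripping combining marks after NFD decomposition is just an assumed approximation of char folding.

```python
import unicodedata

# Hypothetical stand-in for a user option that sets the default.
CHAR_FOLD_DEFAULT = True

def strip_marks(text):
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

class SearchSession:
    """One search: starts from the default, can be toggled at any time."""

    def __init__(self, fold=None):
        self.fold = CHAR_FOLD_DEFAULT if fold is None else fold

    def toggle_char_fold(self):
        """Flip char folding for this session only."""
        self.fold = not self.fold

    def hits(self, needle, words):
        """Return the words that contain needle under the current setting."""
        if self.fold:
            key = strip_marks(needle)
            return [w for w in words if key in strip_marks(w)]
        return [w for w in words if needle in w]

words = ["cafe", "caf\u00e9"]       # the second word ends in e-acute
session = SearchSession()           # folding on, from the default
folded_hits = session.hits("cafe", words)    # both words
session.toggle_char_fold()          # like typing the toggle key mid-search
literal_hits = session.hits("cafe", words)   # only the unaccented word
```

A user who prefers literal search would set the default to off and toggle folding on when needed; the session logic stays the same either way.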
* Re: char equivalence classes in search - why not symmetric?
2015-09-02 15:34 ` Richard Stallman
2015-09-02 15:56 ` Drew Adams
@ 2015-09-02 16:05 ` Eli Zaretskii
2015-09-02 21:51 ` Jean-Christophe Helary
2015-09-02 16:10 ` Artur Malabarba
2015-09-03 19:49 ` Pip Cet
3 siblings, 1 reply; 86+ messages in thread
From: Eli Zaretskii @ 2015-09-02 16:05 UTC (permalink / raw)
To: rms; +Cc: drew.adams, emacs-devel
> From: Richard Stallman <rms@gnu.org>
> CC: drew.adams@oracle.com, emacs-devel@gnu.org
> Date: Wed, 02 Sep 2015 11:34:28 -0400
>
> Since it is possible to search for only 'á', it would be nice to have
> some convenient way to search only for 'a' with no accents.
>
> The only convenient interface I can think of is that you type, in a
> postfix input method, a ' DEL. Currently that is equivalent to typing
> just a. But we could conceivably make it different.
>
> Can someone think of some other interface for this?
What is its equivalent for letter-case differences? IOW, how do I
search for a without also catching A?
* Re: char equivalence classes in search - why not symmetric?
2015-09-02 15:34 ` Richard Stallman
2015-09-02 15:56 ` Drew Adams
2015-09-02 16:05 ` Eli Zaretskii
@ 2015-09-02 16:10 ` Artur Malabarba
2015-09-03 19:49 ` Pip Cet
3 siblings, 0 replies; 86+ messages in thread
From: Artur Malabarba @ 2015-09-02 16:10 UTC (permalink / raw)
Cc: emacs-devel
> Since it is possible to search for only 'á', it would be nice to have
> some convenient way to search only for 'a' with no accents.
>
> The only convenient interface I can think of is that you type, in a
> postfix input method, a ' DEL. Currently that is equivalent to typing
> just a. But we could conceivably make it different.
>
> Can someone think of some other interface for this?
You can toggle off char folding with M-s '. That's the same number of keys
as this idea where you would type an accent and then delete.
Of course, one affects the entire search string, while the other would only
affect that specific letter.
* Re: char equivalence classes in search - why not symmetric?
2015-09-02 16:05 ` Eli Zaretskii
@ 2015-09-02 21:51 ` Jean-Christophe Helary
2015-09-02 22:15 ` Drew Adams
` (2 more replies)
0 siblings, 3 replies; 86+ messages in thread
From: Jean-Christophe Helary @ 2015-09-02 21:51 UTC (permalink / raw)
To: emacs-devel
> On Sep 3, 2015, at 01:05, Eli Zaretskii <eliz@gnu.org> wrote:
>
> What is its equivalent for letter-case differences? IOW, how do I
> search for a without also catching A?
Maybe the default is wrong:
a should catch only a (and not aAàá etc.)
a case modifier would allow a to catch aA
and a diacritic modifier would allow a to catch aàá etc.
the free case and diacritic modifier can be combined so that a can catch aAàÀáÁ etc.
i.e., the default is to catch *exactly* what the user types.
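A minimal Python sketch of how two independent modifiers could combine, assuming (purely for illustration) that "case modifier" means comparing casefolded characters and "diacritic modifier" means comparing characters with combining marks stripped. The names `fold_key` and `catches` are invented for the example.

```python
import unicodedata

def fold_key(ch, ignore_case=False, ignore_marks=False):
    """Reduce ch to the key it is compared by under the given modifiers."""
    if ignore_marks:
        decomposed = unicodedata.normalize("NFD", ch)
        ch = "".join(c for c in decomposed if not unicodedata.combining(c))
    if ignore_case:
        ch = ch.casefold()
    return ch

def catches(typed, candidate, ignore_case=False, ignore_marks=False):
    """True if `typed` should catch `candidate` with these modifiers on."""
    return (fold_key(typed, ignore_case, ignore_marks)
            == fold_key(candidate, ignore_case, ignore_marks))

# a, A, a-grave, A-grave, a-acute, A-acute
candidates = "aA\u00e0\u00c0\u00e1\u00c1"

exact = [c for c in candidates if catches("a", c)]
with_case = [c for c in candidates if catches("a", c, ignore_case=True)]
with_marks = [c for c in candidates if catches("a", c, ignore_marks=True)]
with_both = [c for c in candidates
             if catches("a", c, ignore_case=True, ignore_marks=True)]
```

With no modifier only a is caught; each modifier alone widens the set along one axis, and both together catch all six candidates.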
Jean-Christophe Helary
* RE: char equivalence classes in search - why not symmetric?
2015-09-02 21:51 ` Jean-Christophe Helary
@ 2015-09-02 22:15 ` Drew Adams
2015-09-03 15:37 ` Richard Stallman
2015-09-03 2:41 ` Eli Zaretskii
2015-09-03 15:00 ` Stefan Monnier
2 siblings, 1 reply; 86+ messages in thread
From: Drew Adams @ 2015-09-02 22:15 UTC (permalink / raw)
To: Jean-Christophe Helary, emacs-devel
> > What is its equivalent for letter-case differences? IOW, how do I
> > search for a without also catching A?
>
> Maybe the default is wrong:
> a should catch only a (and not aAàá etc.)
> a case modifier would allow a to catch aA
> and a diacritic modifier would allow a to catch aàá etc.
> the free case and diacritic modifier can be combined so that a can catch
> aAàÀáÁ etc.
>
> ie, the default it to catch *exactly* what the user types.
Personally, I too think that is better default behavior.
For char folding, case folding, and whitespace folding.
But it's not very important, as long as users can (a) set
their own default behavior by customizing one or more options
and (b) easily toggle each kind of folding on the fly.
* Re: char equivalence classes in search - why not symmetric?
2015-09-02 21:51 ` Jean-Christophe Helary
2015-09-02 22:15 ` Drew Adams
@ 2015-09-03 2:41 ` Eli Zaretskii
2015-09-03 3:08 ` Jean-Christophe Helary
2015-09-03 15:00 ` Stefan Monnier
2 siblings, 1 reply; 86+ messages in thread
From: Eli Zaretskii @ 2015-09-03 2:41 UTC (permalink / raw)
To: Jean-Christophe Helary; +Cc: emacs-devel
> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com>
> Date: Thu, 3 Sep 2015 06:51:07 +0900
>
> the default it to catch *exactly* what the user types.
That goes against long-standing Emacs practice, and I envision strong
objections.
* Re: char equivalence classes in search - why not symmetric?
2015-09-03 2:41 ` Eli Zaretskii
@ 2015-09-03 3:08 ` Jean-Christophe Helary
2015-09-03 7:28 ` Artur Malabarba
2015-09-03 14:33 ` Eli Zaretskii
0 siblings, 2 replies; 86+ messages in thread
From: Jean-Christophe Helary @ 2015-09-03 3:08 UTC (permalink / raw)
To: emacs-devel
> On Sep 3, 2015, at 11:41, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com>
>> Date: Thu, 3 Sep 2015 06:51:07 +0900
>>
>> the default it to catch *exactly* what the user types.
>
> That goes against long-standing Emacs practice, and I envision strong
> objections.
Even if the current behavior were to be emulated by appropriate variables?
Jean-Christophe Helary
* Re: char equivalence classes in search - why not symmetric?
2015-09-03 3:08 ` Jean-Christophe Helary
@ 2015-09-03 7:28 ` Artur Malabarba
2015-09-03 17:15 ` Drew Adams
2015-09-03 14:33 ` Eli Zaretskii
1 sibling, 1 reply; 86+ messages in thread
From: Artur Malabarba @ 2015-09-03 7:28 UTC (permalink / raw)
To: Jean-Christophe Helary; +Cc: emacs-devel
On 3 Sep 2015 4:08 am, "Jean-Christophe Helary" <
jean.christophe.helary@gmail.com> wrote:
>
>
> > On Sep 3, 2015, at 11:41, Eli Zaretskii <eliz@gnu.org> wrote:
> >
> > That goes against long-standing Emacs practice, and I envision strong
> > objections.
>
> Even if the current behavior were to be emulated by appropriate variables?
Yes, and you can count me among those objections.
When I first started with emacs, case folding by default was something I
liked a lot, before I ever knew how to configure this stuff.
I also only learned about lax whitespace when it became the default (IIRC).
It was a feature that already existed and yet I had no idea because it
wasn't default.
* Re: char equivalence classes in search - why not symmetric?
2015-09-03 3:08 ` Jean-Christophe Helary
2015-09-03 7:28 ` Artur Malabarba
@ 2015-09-03 14:33 ` Eli Zaretskii
1 sibling, 0 replies; 86+ messages in thread
From: Eli Zaretskii @ 2015-09-03 14:33 UTC (permalink / raw)
To: Jean-Christophe Helary; +Cc: emacs-devel
> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com>
> Date: Thu, 3 Sep 2015 12:08:17 +0900
>
> > On Sep 3, 2015, at 11:41, Eli Zaretskii <eliz@gnu.org> wrote:
> >
> >> From: Jean-Christophe Helary <jean.christophe.helary@gmail.com>
> >> Date: Thu, 3 Sep 2015 06:51:07 +0900
> >>
> >> the default it to catch *exactly* what the user types.
> >
> > That goes against long-standing Emacs practice, and I envision strong
> > objections.
>
> Even if the current behavior were to be emulated by appropriate variables?
You mean, customization variables? We were talking about the
_default_ behavior. It's changing that default towards
case-sensitivity that I think people will object to.
* Re: char equivalence classes in search - why not symmetric?
2015-09-02 21:51 ` Jean-Christophe Helary
2015-09-02 22:15 ` Drew Adams
2015-09-03 2:41 ` Eli Zaretskii
@ 2015-09-03 15:00 ` Stefan Monnier
2015-09-03 16:15 ` Drew Adams
2 siblings, 1 reply; 86+ messages in thread
From: Stefan Monnier @ 2015-09-03 15:00 UTC (permalink / raw)
To: Jean-Christophe Helary; +Cc: emacs-devel
> ie, the default it to catch *exactly* what the user types.
I disagree. But if you want to add a Custom var to let users change the
default, that's fine by me.
Personally for those rare cases when I need to explicitly disable
case-folding in isearch, `M-c' works well enough,
Stefan
* Re: char equivalence classes in search - why not symmetric?
2015-09-02 22:15 ` Drew Adams
@ 2015-09-03 15:37 ` Richard Stallman
0 siblings, 0 replies; 86+ messages in thread
From: Richard Stallman @ 2015-09-03 15:37 UTC (permalink / raw)
To: Drew Adams; +Cc: jean.christophe.helary, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Jean-Christophe Helary wrote:
> > Maybe the default is wrong:
> > a should catch only a (and not aAàá etc.)
> > a case modifier would allow a to catch aA
> > and a diacritic modifier would allow a to catch aàá etc.
What are this "case modifier" and "diacritic modifier"?
If they are easy to type, this might be convenient.
If they are hard, I think the existing default is better for
handling case, and maybe for diacritics too.
Meanwhile, there is also the issue of discoverability.
If case-fold search required memorizing a special character,
most users would not memorize it and would never use it.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* RE: char equivalence classes in search - why not symmetric?
2015-09-03 15:00 ` Stefan Monnier
@ 2015-09-03 16:15 ` Drew Adams
2015-09-03 16:23 ` Eli Zaretskii
0 siblings, 1 reply; 86+ messages in thread
From: Drew Adams @ 2015-09-03 16:15 UTC (permalink / raw)
To: Stefan Monnier, Jean-Christophe Helary; +Cc: emacs-devel
> > ie, the default it to catch *exactly* what the user types.
>
> I disagree. But if you want to add a Custom var to let users change the
> default, that's fine by me.
> Personally for those rare cases when I need to explicitly disable
> case-folding in isearch, `M-c' works well enough,
There already is such a Custom var: `case-fold-search'.
And in the rare cases where I need to explicitly _enable_ case
folding in Isearch, `M-c' works well enough. I've customized
`case-fold-search' to turn it OFF by default.
But the question Jean-Christophe raised is about the _default_
behavior.
And BTW, he raised it specifically wrt char folding, not case folding.
The attempt, each time, to hark back to the fact that Emacs defaults
_case_ folding to ON, in the context of a discussion about _char_
folding, is lamentable.
We can deal with case folding later, if there is enough interest in
reconsidering its default behavior. In this thread the question is
about char folding, first and foremost.
* Re: char equivalence classes in search - why not symmetric?
2015-09-03 16:15 ` Drew Adams
@ 2015-09-03 16:23 ` Eli Zaretskii
2015-09-03 16:46 ` Drew Adams
0 siblings, 1 reply; 86+ messages in thread
From: Eli Zaretskii @ 2015-09-03 16:23 UTC (permalink / raw)
To: Drew Adams; +Cc: jean.christophe.helary, monnier, emacs-devel
> Date: Thu, 3 Sep 2015 09:15:40 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: emacs-devel@gnu.org
>
> But the question Jean-Christophe raised is about the _default_
> behavior.
Indeed.
> And BTW, he raised it specifically wrt char folding, not case folding.
That's not true. Quote:
> Maybe the default is wrong:
> a should catch only a (and not aAàá etc.)
> a case modifier would allow a to catch aA
> and a diacritic modifier would allow a to catch aàá etc.
> the free case and diacritic modifier can be combined so that a can catch
> aAàÀáÁ etc.
>
> ie, the default it to catch *exactly* what the user types.
> The attempt, each time, to hark back to the fact that Emacs defaults
> _case_ folding to ON, in the context of a discussion about _char_
> folding, is lamentable.
>
> We can deal with case folding later, if there is enough interest in
> reconsidering its default behavior. In this thread the question is
> about char folding, first and foremost.
I reacted specifically to Jean-Christophe's suggestion to change the
default for case-fold-search.
* RE: char equivalence classes in search - why not symmetric?
2015-09-03 16:23 ` Eli Zaretskii
@ 2015-09-03 16:46 ` Drew Adams
0 siblings, 0 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-03 16:46 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: jean.christophe.helary, monnier, emacs-devel
> > But the question Jean-Christophe raised is about the _default_
> > behavior.
>
> Indeed.
>
> > And BTW, he raised it specifically wrt char folding, not case folding.
>
> That's not true. Quote:
>
> >> Maybe the default is wrong:
> >> a should catch only a (and not aAàá etc.)
> >> a case modifier would allow a to catch aA
> >> and a diacritic modifier would allow a to catch aàá etc.
> >> the free case and diacritic modifier can be combined so that a can catch
> >> aAàÀáÁ etc.
> >>
> >> ie, the default it to catch *exactly* what the user types.
Well, OK, he did mention case as well as char folding, yes.
He made, I think, a valid general point. But I agree that we should
leave case folding out of it.
> > The attempt, each time, to hark back to the fact that Emacs defaults
> > _case_ folding to ON, in the context of a discussion about _char_
> > folding, is lamentable.
> >
> > We can deal with case folding later, if there is enough interest in
> > reconsidering its default behavior. In this thread the question is
> > about char folding, first and foremost.
>
> I reacted specifically to Jean-Christophe's suggestion to change the
> default for case-fold-search.
OK. We can agree to separate that out from the current discussion.
* RE: char equivalence classes in search - why not symmetric?
2015-09-03 7:28 ` Artur Malabarba
@ 2015-09-03 17:15 ` Drew Adams
2015-09-07 13:52 ` Nix
0 siblings, 1 reply; 86+ messages in thread
From: Drew Adams @ 2015-09-03 17:15 UTC (permalink / raw)
To: bruce.connor.am, Jean-Christophe Helary; +Cc: emacs-devel
> Yes, and you can count me among those objections.
> When I first started with emacs, case folding by default was something
> I liked a lot, before I ever knew how to configure this stuff.
> I also only learned about lax whitespace when it became the default (IIRC).
> It was a feature that already existed and yet I had no idea because
> it wasn't default.
Emacs _should_ work on improving discoverability, IMO, but that
is a separate discussion.
IMO and FWIW, it is misguided to provide confusing, dwim behavior
by default. Hard for a newbie to guess what the behavior really
is, because it is too complex, conditional, contextual, whatever.
The argument that we have this nifty feature and newbies won't
discover it on their own easily, so let's foist it upon them
from the outset, as the default behavior, is quite misguided.
What should be done is to have simple, obvious default behavior,
easy to fathom. AND to have easy ways to discover alternate,
optional, fancy behavior that some of us might be convinced is
handier, more powerful, more elegant, or more clever.
Discoverability is not an argument for choosing any default
behavior. Poor discoverability is an argument for improving
discoverability. Nothing more.
That should be a no-brainer, IMO, but we hear this over and over
again. Developers like to show off the clever things they come
up with. That's human and normal. Add such things, sure, but
don't make them the default behavior. Especially when they are
brand new.
That a somewhat dwimish default was chosen for case folding 40
years ago, back when I was programming FORTRAN and most editing
and programming involved case-insensitive contexts, should not be
an argument for using it today - and certainly not for doubling
down on it for new developments (e.g. char folding).
It should instead be a reason to revisit whether we, in 2015,
should continue to have search be case-insensitive by default.
There is only one reasonable argument I can see in favor of
keeping case insensitivity the default, and it does not at all
apply to the other kinds of folding we are talking about now
(char folding, whitespace folding). This is why I said:
But I won't bother making that argument for case folding.
I am not arguing for a change now in the longstanding
case-fold behavior. I am arguing that we get this right
for char folding.
What is that somewhat reasonable argument for turning on case
insensitivity by default? Habit. I see no other good argument
for it "nowadays". Forty years ago, yes; today, no. Today,
most contexts involve both uppercase and lowercase letters,
and they are distinguished semantically (case-sensitive).
It's perhaps a bit odd that some of those who are so quick to
argue for "modernizing" Emacs might also argue to keep their
case insensitivity by default. Old-fartness is relative?
The rule about least surprise for newbies I expressed above
applies even more to the dwim rule that an uppercase letter
in the search string magically flips search to case sensitivity.
Handy as you might find that dwim, it is hardly immediately
clear to a newbie what is going on. Other editors that are
case-insensitive by default do not throw such a gotcha at new
users. (Emacs is not your average editor, and it is great
that Emacs does fancier things than most do, but we're talking
about default behavior here.)
I mention this to try to put a stop to the application of an
old rationale for case folding to char folding etc., not to
argue that we should (now) consider changing the default
behavior for case folding.
To be clear, and to try to forestall the usual whining from some:
I don't care much what the _default_ behavior is for char folding.
That's not what this thread is about.
I, like Jean-Christophe apparently, think that it helps newbies
more to have Isearch, by default, search for just what you type
(imagine!). But I don't feel strongly about that.
What is more important is to be able to (a) customize the default
behavior and (b) toggle it anytime during Isearch.
Also important, to me, is to be able, as I proposed and as Juri
apparently seconded, to have `á' match any of the `a' variants,
just like `a' can do. That is, be able to toggle whether `á'
(or `a') matches only itself or all `a' variants - e.g., as Juri
proposed, using `M-''.
And that, BTW, is the topic of this thread (see Subject line).
What goes for `a' should also go for `á': either of them should
be able to match, au choix, either itself alone or any of its
char-folding variants (and yes, they _are_ equivalences).
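What "equivalence" means here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `class_key`, `equivalent` and `equivalence_classes` are invented names, and the class of a char is taken to be its NFKD decomposition with combining marks removed, which may not coincide with the table Emacs actually uses.

```python
import unicodedata
from collections import defaultdict

def class_key(ch):
    """Map ch to the representative of its (assumed) folding class."""
    decomposed = unicodedata.normalize("NFKD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def equivalent(x, y):
    """Symmetric by construction: both sides are reduced to the same key."""
    return class_key(x) == class_key(y)

def equivalence_classes(chars):
    """Group chars into classes; no member is privileged as the pattern."""
    classes = defaultdict(set)
    for ch in chars:
        classes[class_key(ch)].add(ch)
    return dict(classes)

# a-acute, a, a-grave, a-tilde, feminine ordinal, a-circumflex,
# a-ring, a-diaeresis, plus e and e-acute for contrast
sample = "\u00e1a\u00e0\u00e3\u00aa\u00e2\u00e5\u00e4e\u00e9"
classes = equivalence_classes(sample)
```

Because matching compares keys rather than asking whether the typed char is the "plain" one, searching for any member of a class finds every other member, in both directions.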
I also support Juri's mention of doing the same for whitespace
folding: letting `M-s SPC' toggle whitespace dwimming (option
`search-whitespace-regexp'). But we can also separate out that
discussion from the current topic, which is about char folding.
The general argument about the default behavior is that what a
user puts in the search string is what should be looked for.
If s?he inserts a SPC char then only a SPC char should be sought.
If s?he inserts two consecutive SPC chars then only two
consecutive SPC chars should be sought. You want cleverer,
handier behavior? Customize the option.
Attempts to finesse the confusion and the possible useful dwim
behaviors tend to end with even more complex dwim behavior: rules
upon rules. See recent discussions about whitespace, where we
hear things like SPC should (by default) match any amount of any
whitespace, but SPC SPC should match only SPC SPC. Unless the
moon is full or it is Tuesday before noon... Epicycles upon
epicycles.
Far better to keep the default behavior simple and immediately
understandable - no need to look up the doc and study a dwim
flowchart. On top of that we can add any fancy alternative
behaviors we think are handier or more clever.
But let's not impose those on newbies as default behavior,
no matter how helpful and ingenious we are convinced they
might be. And certainly not with the excuse that it makes
the fancy feature more discoverable.
[The last (so far) of the folding things is what `M-s i' does:
it toggles search behavior for invisible text. I'm OK with the
default value in this case, but it too could be open for
discussion in the general context of folding. That too is best
left for a separate discussion.]
* Re: char equivalence classes in search - why not symmetric?
2015-09-02 15:34 ` Richard Stallman
` (2 preceding siblings ...)
2015-09-02 16:10 ` Artur Malabarba
@ 2015-09-03 19:49 ` Pip Cet
3 siblings, 0 replies; 86+ messages in thread
From: Pip Cet @ 2015-09-03 19:49 UTC (permalink / raw)
To: rms; +Cc: Eli Zaretskii, drew.adams, emacs-devel
How about "C-q a"? C-q SPC already is special-cased to mean something
different in isearch mode, so it wouldn't be a drastic change.
Of course that doesn't solve the problem for characters that are not
represented on the user's keyboard; the quick fix that comes to mind is
that quoted-insert with a negative prefix could read a character using the
input method (not using read-quoted-char), then insert it as though C-q had
been used with the corresponding positive prefix, so when using the TeX
input method, "C-- C-q \ a l p h a" would be equivalent, during isearch, to
"C-q α", to search for an alpha without an accent, breathing mark, or iota
subscriptum. (The minus sign seems logical to me because we can think of
C-q as a two-step command: switch to "literal mode", then read a character
to insert. C-- C-q does the opposite: read a character, then go into
"literal mode" to insert).
In essence, that would make C-q yet another modifier key...
On Wed, Sep 2, 2015 at 3:34 PM, Richard Stallman <rms@gnu.org> wrote:
> [[[ To any NSA and FBI agents reading my email: please consider ]]]
> [[[ whether defending the US Constitution against all enemies, ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
> Since it is possible to search for only 'á', it would be nice to have
> some convenient way to search only for 'a' with no accents.
>
> The only convenient interface I can think of is that you type, in a
> postfix input method, a ' DEL. Currently that is equivalent to typing
> just a. But we could conceivably make it different.
>
> Can someone think of some other interface for this?
>
>
>
> --
> Dr Richard Stallman
> President, Free Software Foundation (gnu.org, fsf.org)
> Internet Hall-of-Famer (internethalloffame.org)
> Skype: No way! See stallman.org/skype.html.
>
>
>
* Re: char equivalence classes in search - why not symmetric?
2015-09-03 17:15 ` Drew Adams
@ 2015-09-07 13:52 ` Nix
2015-09-07 17:07 ` Drew Adams
2015-09-08 2:17 ` Richard Stallman
0 siblings, 2 replies; 86+ messages in thread
From: Nix @ 2015-09-07 13:52 UTC (permalink / raw)
To: Drew Adams; +Cc: Jean-Christophe Helary, bruce.connor.am, emacs-devel
On 3 Sep 2015, Drew Adams spake thusly:
> IMO and FWIW, it is misguided to provide confusing, dwim behavior
> by default. Hard for a newbie to guess what the behavior really
> is, because it is too complex, conditional, contextual, whatever.
FWIW I just introduced Emacs to a newbie last month -- using trunk Emacs
because that's what I happened to have available. She was very happy
indeed about not only isearch, not only case-fold search but
specifically char-fold search, and she writes stuff using diacritics all
the time.
The key to remember here is that there are many use cases in which it is
better if isearch finds something similar to what you typed than if it
misses something you were looking for: you can always hit C-s again!
So thanks to case-fold and char-fold search she doesn't have to worry
about getting either the case or diacritics right, and can cut down on
chording and compose characters while searching.
So that's one newbie in particular who would vociferously disagree with
you.
> What should be done is to have simple, obvious default behavior,
She found "searching ignores accent-like things and case" to be easy and
instantly understandable, even though the implementation of ignoring
even case is (thanks to case-conversion tables) quite complicated in a
Unicode world.
--
NULL && (void)
* RE: char equivalence classes in search - why not symmetric?
2015-09-07 13:52 ` Nix
@ 2015-09-07 17:07 ` Drew Adams
2015-09-07 23:23 ` Nix
2015-09-08 2:17 ` Richard Stallman
1 sibling, 1 reply; 86+ messages in thread
From: Drew Adams @ 2015-09-07 17:07 UTC (permalink / raw)
To: Nix; +Cc: Jean-Christophe Helary, bruce.connor.am, emacs-devel
> FWIW I just introduced Emacs to a newbie last month -- using trunk Emacs
> because that's what I happened to have available. She was very happy
> indeed about not only isearch, not only case-fold search but
> specifically char-fold search, and she writes stuff using diacritics all
> the time.
>
> The key to remember here is that there are many use cases in which it is
> better if isearch finds something similar to what you typed than if it
> misses something you were looking for: you can always hit C-s again!
> So thanks to case-fold and char-fold search she doesn't have to worry
> about getting either the case or diacritics right, and can cut down on
> chording and compose characters while searching.
>
> So that's one newbie in particular who would vociferously disagree
> with you.
>
> > What should be done is to have simple, obvious default behavior,
^^^^^^^
> She found "searching ignores accent-like things and case" to be easy
> and instantly understandable, even though the implementation of
> ignoring even case is (thanks to case-conversion tables) quite
> complicated in a Unicode world.
Anecdotal evidence from one newbie. OK.
I don't see anything in your description of her understanding of
Isearch that shows that she "would vociferously disagree" with
my proposition that literal search is a better default behavior,
but I guess that is how you feel. So be it.
Nevertheless, I wonder a bit about her nonsurprise and instant
understanding wrt char folding. Did she just search for
something like `a' and find things like `á'? Or did she also
search for something like `á' and find things like `a'? (She
could not have, as that is not yet implemented, AFAIK.)
I would be somewhat surprised if she would not be somewhat
surprised that looking for `á' can find `a'.
Note the current discussion and the Subject line. This
thread is about making char folding treat `á' and `a' as
equivalent, i.e., both directions.
I think it should be clear that searching for and finding
exactly what you type is _absolutely_ easier to understand
than finding things that you did not type. Of course, both
literal and dwim searching might be easy enough in some
contexts or for some users.
So sure, this absolute difference in ease of understanding
does not preclude the existence of some users for whom even
the most complex mapping of search string to search hits
might be "easy and instantly understandable". Such users
should not be bothered by whichever behavior is chosen as
default.
Regexp vs literal search is a good example of literal search
being easier to "get". Regexp search requires some extra
understanding of, or feeling for, the mapping between search
patterns and what the patterns match; literal search does not:
what you type is what you find, literally.
I doubt that all newbies expect our whitespace folding or
find it natural. Likewise, how non-nil `case-fold-search'
treats the presence of an uppercase letter in the search
string.
These things are not obvious, in general, even if you can
point to a new user for whom they seem to be obvious. The
uppercase-letter-in-search-string behavior, in particular,
is unusual - not common in text editors. That might have
made sense as default behavior for Emacs in 1985, but now?
These things are gotchas, even if there might be some
newbies who do not seem to have ever been "got" by them.
It is better not to make such behavior the default, as
long as the alternative is useful.
And it is easy enough to customize search to make such
dwim searching the default for any particular user.
And it is trivial to toggle the behavior anytime.
There is no special reason to make the default behavior
a "gotcha" one.
The _only argument_ that I have heard, for making folding
searches the default behavior, and the only one that I can
imagine, is that if we do not do so then users might not
discover them quickly, and so they might miss out on how
useful they can be.
I repeat what I said before about that:
Discoverability is not an argument for choosing any
default behavior.
Poor discoverability is an argument for improving
discoverability. Nothing more.
> The key to remember here is that there are many use cases in
> which it is better if isearch finds something similar to what
> you typed than if it misses something you were looking for
No, that is not anything key to remember, in this discussion.
No one has doubted that non-literal search can be extremely
useful. That is in fact one of the reasons for this thread:
make char-fold search do exactly that for any char in the
search string, including for a char with diacritics.
Currently, it always searches only literally for `á', even
when char folding is turned on.
It should be clear that no one is arguing against the
usefulness of folding search. The post you responded to
was a counter to the false argument that we should turn
char folding on by default because it facilitates discovery
of this nifty feature.
This thread is not really about what the default behavior
should be, but I did address that extraneous argument,
and you did respond. If there is a need to continue about
that topic, we should do it in a separate thread.
* Re: char equivalence classes in search - why not symmetric?
2015-09-07 17:07 ` Drew Adams
@ 2015-09-07 23:23 ` Nix
0 siblings, 0 replies; 86+ messages in thread
From: Nix @ 2015-09-07 23:23 UTC (permalink / raw)
To: Drew Adams; +Cc: Jean-Christophe Helary, bruce.connor.am, emacs-devel
On 7 Sep 2015, Drew Adams spake thusly:
>> FWIW I just introduced Emacs to a newbie last month -- using trunk Emacs
>> because that's what I happened to have available. She was very happy
>> indeed about not only isearch, not only case-fold search but
>> specifically char-fold search, and she writes stuff using diacritics all
>> the time.
>>
>> The key to remember here is that there are many use cases in which it is
>> better if isearch finds something similar to what you typed than if it
>> misses something you were looking for: you can always hit C-s again!
>> So thanks to case-fold and char-fold search she doesn't have to worry
>> about getting either the case or diacritics right, and can cut down on
>> chording and compose characters while searching.
>>
>> So that's one newbie in particular who would vociferously disagree
>> with you.
>>
>> > What should be done is to have simple, obvious default behavior,
> ^^^^^^^
>> She found "searching ignores accent-like things and case" to be easy
>> and instantly understandable, even though the implementation of
>> ignoring even case is (thanks to case-conversion tables) quite
>> complicated in a Unicode world.
>
> Anecdotal evidence from one newbie. OK.
I think that counters your anecdotal evidence that newbies would find it
confusing: at least one doesn't. (It's not like either of us are
remotely newbies. Heck, I can't even remember what it was like to be
one, so anecdata is all I have to contribute on this front.)
> Nevertheless, I wonder a bit about her nonsurprise and instant
> understanding wrt char folding. Did she just search for
> something like `a' and find things like `á'? Or did she also
> search for something like `á' and find things like `a'? (She
> could not have, as that is not yet implemented, AFAIK.)
She did the former, of course -- the latter is harder to type, so I
cannot imagine any situation in which anyone would expect it. The whole
nature of *-fold-search is that you can search for the non-chorded basis
of things that must be typed with chords or which are otherwise
composite and get the composite variants too.
> I would be somewhat surprised if she would not be somewhat
> surprised that looking for `á' can find `a'.
Perhaps you never thought of it in terms of the keyboard. :)
> Note the current discussion and the Subject line. This
> thread is about making char folding treat `á' and `a' as
> equivalent, i.e., both directions.
I think that would be deeply bizarre. Searching for 'Foo' does not find
'foo' when case-fold-search is on: this is, as has been noted, precisely
analogous to this longstanding Emacs behaviour.
> I think it should be clear that searching for and finding
> exactly what you type is _absolutely_ easier to understand
> than finding things that you did not type.
Finding things without having to type the whole thing in is exactly what
isearch has always been about. This is just an extension of that, and
not even a very big one.
> So sure, this absolute difference in ease of understanding
> does not preclude the existence of some users for whom even
> the most complex mapping of search string to search hits
> might be "easy and instantly understandable". Such users
> should not be bothered by whichever behavior is chosen as
> default.
Are you actually reduced to saying that actual newbies' experience is
obviously less significant than your guesses as to what newbies will
surely find less confusing??
Try it on actual newbies. I bet you they won't be confused, based on my
single data point :)
> I doubt that all newbies expect our whitespace folding or
> find it natural.
Haven't you seen non-geeks typing? They leave multiple spaces routinely
(often due to hitting space at the end of a run of typing, then again at
the start of the next one) and expect them to act like just one. This
seems quite reasonable to me, even if random irregular spacing does look
too ugly for me to perpetrate it myself.
> Likewise, how non-nil `case-fold-search'
> treats the presence of an uppercase letter in the search
> string.
The thing is, both of these are more or less obscure. When people
discover that lowercase also finds uppercase, or that non-diacritic also
finds diacritic, they generally respond by not bothering to use
uppercase in search terms for a long time. So by the time they encounter
even the first half of the behaviour you call so confusing they are no
longer quite newbies.
> These things are not obvious, in general, even if you can
> point to a new user for whom they seem to be obvious.
I don't think they're even relevant to new users, precisely *because*
they are not terribly discoverable.
> These things are gotchas, even if there might be some
> newbies who do not seem to have ever been "got" by them.
This would be more convincing if you could point to any instances of
newbies actually being confused by it.
> It is better not to make such behavior the default, as
> long as the alternative is useful.
That ship sailed decades ago.
--
NULL && (void)
* Re: char equivalence classes in search - why not symmetric?
2015-09-07 13:52 ` Nix
2015-09-07 17:07 ` Drew Adams
@ 2015-09-08 2:17 ` Richard Stallman
1 sibling, 0 replies; 86+ messages in thread
From: Richard Stallman @ 2015-09-08 2:17 UTC (permalink / raw)
To: Nix; +Cc: jean.christophe.helary, bruce.connor.am, drew.adams, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> FWIW I just introduced Emacs to a newbie last month -- using trunk Emacs
> because that's what I happened to have available. She was very happy
> indeed about not only isearch, not only case-fold search but
> specifically char-fold search, and she writes stuff using diacritics all
> the time.
I expect people will generally like it.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: char equivalence classes in search - why not symmetric?
2015-09-01 18:15 ` Eli Zaretskii
2015-09-01 18:46 ` Drew Adams
@ 2015-09-08 5:36 ` Ulrich Mueller
2015-09-08 6:04 ` Jean-Christophe Helary
` (2 more replies)
1 sibling, 3 replies; 86+ messages in thread
From: Ulrich Mueller @ 2015-09-08 5:36 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Drew Adams, emacs-devel
>>>>> On Tue, 01 Sep 2015, Eli Zaretskii wrote:
>> No. You are asking for that only when you use a search pattern
>> that does not use the diacriticals. When you search with á in
>> the pattern you are NOT asking for matches that disregard the
>> diacriticals. And why not?
> Because á does include a diacritical. By specifying it, the user
> told us the diacriticals are important, and shouldn't be
> disregarded.
I disagree. When I search for "Müller" I want it to also match
"Muller" because some people (e.g., in French speaking countries) use
this as an approximation of the spelling.
(I'd also like it to match "Mueller" but that's a different issue.)
Ulrich
* Re: char equivalence classes in search - why not symmetric?
2015-09-08 5:36 ` Ulrich Mueller
@ 2015-09-08 6:04 ` Jean-Christophe Helary
2015-09-08 13:31 ` Stephen J. Turnbull
2015-09-08 13:39 ` Drew Adams
2015-09-08 15:47 ` Eli Zaretskii
2015-09-08 20:09 ` Richard Stallman
2 siblings, 2 replies; 86+ messages in thread
From: Jean-Christophe Helary @ 2015-09-08 6:04 UTC (permalink / raw)
To: emacs-devel
> On Sep 8, 2015, at 14:36, Ulrich Mueller <ulm@gentoo.org> wrote:
>
>>>>>> On Tue, 01 Sep 2015, Eli Zaretskii wrote:
>
>>> No. You are asking for that only when you use a search pattern
>>> that does not use the diacriticals. When you search with á in
>>> the pattern you are NOT asking for matches that disregard the
>>> diacriticals. And why not?
>
>> Because á does include a diacritical. By specifying it, the user
>> told us the diacriticals are important, and shouldn't be
>> disregarded.
>
> I disagree. When I search for "Müller" I want it to also match
> "Muller" because some people (e.g., in French speaking countries) use
> this as an approximation of the spelling.
It's fine that emacs is "different", but common (nano, vi, GUI editors, word processors) behaviour is that a search strictly matches the string, and that creates expectations. For the Muller case above, as a translator I could see myself search for Muller to correct it to Müller and not be happy to have all the correct Müllers showing up in the search.
Let's just put flags that trigger case/diacritic matching, they could be on in default emacs, but they should be somewhere.
Jean-Christophe Helary
* Re: char equivalence classes in search - why not symmetric?
2015-09-08 6:04 ` Jean-Christophe Helary
@ 2015-09-08 13:31 ` Stephen J. Turnbull
2015-09-08 14:24 ` Drew Adams
[not found] ` <<8cf269bc-69d8-4752-8506-de8d992512e1@default>
2015-09-08 13:39 ` Drew Adams
1 sibling, 2 replies; 86+ messages in thread
From: Stephen J. Turnbull @ 2015-09-08 13:31 UTC (permalink / raw)
To: Jean-Christophe Helary; +Cc: emacs-devel
Jean-Christophe Helary writes:
> Let's just put flags that trigger case/diacritic matching, they
> could be on in default emacs, but they should be somewhere.
They're already there. The discussion here is entirely about the DWIM
UI of isearch that allows requesting strict matching by having at
least one uppercase or accented character, even though lax mode is
enabled.
Drew prefers a UI that enables/disables strict mode using a special
isearch command bound to a key. That would be plausible, if the DWIM
UI for case fold search in isearch weren't 3 decades old. But the
DWIM UI *is* 3 decades old, and successful. Drew disputes that, but
in the 25 years I've followed Emacs development this is the first time
I've seen anybody complain about the DWIM-ish case folding feature.
Note that incremental case-folded search (usually with no escape for
strict matching!) has been widely adopted in web and file browsers.
I'm +1 on generalizing this UI to "diacritic folding" in isearch.
The other question is that of Ulrich Müller, who points out that it's
natural for him to type his name correctly, but he'd like to laxly
match Mueller and Muller, too.[1] It's a valid use case, obviously,
but based on an analogy to experience with DWIMish case-folding in
Emacs, I believe most users will quickly adjust to typing "muller"
when they want a poor man's version of full "orthographic
equivalence". Individuals may not, but I believe the great majority
will, since I'm sure it's anatomically easier to type "muller" than
"Müller", even on a German keyboard.
Footnotes:
[1] Drew also argues this point, but from an abstract insistence on
"symmetry", which doesn't really exist here for representational,
anatomical, psychological reasons, and let's not forget personal
historical reasons like "Müller is my name".
* RE: char equivalence classes in search - why not symmetric?
2015-09-08 6:04 ` Jean-Christophe Helary
2015-09-08 13:31 ` Stephen J. Turnbull
@ 2015-09-08 13:39 ` Drew Adams
2015-09-08 21:19 ` Juri Linkov
1 sibling, 1 reply; 86+ messages in thread
From: Drew Adams @ 2015-09-08 13:39 UTC (permalink / raw)
To: Jean-Christophe Helary, emacs-devel
> > I disagree. When I search for "Müller" I want it to also match
> > "Muller" because some people (e.g., in French speaking countries) use
> > this as an approximation of the spelling.
>
> It's fine that emacs is "different", but common (nano, vi, GUI editors, word
> processors) behaviour is that a search strictly matches the string, and that
> creates expectations. For the Muller case above, as a translator I could see
> myself search for Muller to correct it to Müller and not be happy to have
> all the correct Müllers showing up in the search.
Not a problem, provided we have a toggle like what Juri suggested.
Toggle literal vs char folding. And ensure that char folding is
symmetric (this thread), and not just one-way as it is now.
I agree with you about the default behavior (literal, not folded).
But of course users need to be able to customize the default
behavior, so they start out with whichever behavior they prefer.
> Let's just put flags that trigger case/diacritic matching, they could be on
> in default emacs, but they should be somewhere.
Yup.
* RE: char equivalence classes in search - why not symmetric?
2015-09-08 13:31 ` Stephen J. Turnbull
@ 2015-09-08 14:24 ` Drew Adams
2015-09-08 15:21 ` Stephen J. Turnbull
` (2 more replies)
[not found] ` <<8cf269bc-69d8-4752-8506-de8d992512e1@default>
1 sibling, 3 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-08 14:24 UTC (permalink / raw)
To: Stephen J. Turnbull, Jean-Christophe Helary; +Cc: emacs-devel
> The discussion here is entirely about the DWIM
> UI of isearch that allows requesting strict matching by having at
> least one uppercase or accented character, even though lax mode is
> enabled.
The proposal is explicitly *not* for the former, now. The weird
exception of an uppercase letter making the current search be
case-sensitive, even though you have toggled case sensitivity
OFF, is not under attack now.
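[Editorial sketch, not part of the original message.] The uppercase exception described above can be illustrated in a few lines of Python. This is only a model of the rule as stated in this thread, not Emacs's implementation; the function name `smart_case_search` and the `case_fold` parameter are hypothetical.

```python
def smart_case_search(pattern, text, case_fold=True):
    """Return start indices of matches of pattern in text.

    Models the rule discussed in the thread: case is ignored only when
    case folding is enabled AND the pattern has no uppercase letter.
    """
    if not pattern:
        return []
    # An uppercase letter in the pattern makes this search case-sensitive.
    ignore_case = case_fold and not any(ch.isupper() for ch in pattern)

    def same(a, b):
        # Compare character by character so indices refer to the original text.
        return a.lower() == b.lower() if ignore_case else a == b

    n = len(pattern)
    return [i for i in range(len(text) - n + 1)
            if all(same(p, t) for p, t in zip(pattern, text[i:i + n]))]
```

Under this model, searching "Foo foo" for "foo" matches both words, while searching for "Foo" matches only the capitalized one.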
Personally, yes, I would get rid of that anomaly too at some
point, but I'm not proposing that now. Likewise, for the
anomaly that whitespace folding is switched off by SPC SPC.
That too, I would like to see removed eventually, but I'm not
proposing that now either.
The point now is to DTRT wrt char folding - the new feature.
> Drew prefers a UI that enables/disables strict mode using a
> special isearch command bound to a key.
We already have that. What I'm proposing in this thread is
that when char folding is on, it work symmetrically: Folding
should let you use `é' in the search string to match any of
the accented or unaccented variants, just as it does for `e'
in the search string.
Nothing more. What's good for `e' should be good for `é' and
all the rest. It's about equivalence classes. There is no
reason to limit search strings to one privileged member of
an equivalence class when trying to match any members of the
class. That's all.
> That would be plausible, if the DWIM
> UI for case fold search in isearch weren't 3 decades old.
See above. I am *not* now proposing a change to case-fold
behavior. I've made that clear from the beginning, and
repeated it several times now.
But it seems that it is easier, for those not favorable to
what I (and Juri, apparently) propose, to harp on the age-old
anomaly of uppercase case-fold annulment as, somehow (?), an
argument against clean, symmetric char folding.
Please argue about the topic at hand (see Subject line),
not whether the 1980s decision to make an exception for
an uppercase letter in the search string was or is a
good idea.
> But the DWIM UI *is* 3 decades old, and successful. Drew
> disputes that,
No, Drew does not. You cannot show one place where anything
Drew has written suggests that he disputes that.
> but in the 25 years I've followed Emacs development this is
> the first time I've seen anybody complain about the DWIM-ish
> case folding feature.
Live and learn. ;-) That is not the topic of this thread,
in any case.
> Note that incremental case-folded search (usually with no escape for
> strict matching!) has been widely adopted in web and file browsers.
Uh, no. Case folding, yes. But not case folding that
switches off (becoming case-sensitive) just because you
include an uppercase letter in the search string. Not in any
browser I have, at least. Nor in Notepad or TextPad or other
simple editors that newbies or non-programmers might be used to.
But again, *not* the subject of this topic.
> I'm +1 on generalizing this UI to "diacritic folding" in isearch.
By "this UI", I guess you mean that if there is a char with
a diacritic in the search string then that should turn off
char folding, preventing you from matching text ignoring
diacritics.
That would be unfortunate - a strict loss (inability to
match `é' against `e'; only ability to match `e' against `é'),
and with no gain.
> The other question is that of Ulrich Müller, who points out that it's
> natural for him to type his name correctly, but he'd like to laxly
> match Mueller and Muller, too.[1]
Same as my resumé example, yes.
And the use case includes various quotation marks (e.g. curly)
in the search string and wanting to match various others in
the text. E.g., you copy some text from a web page, which
includes some curly quote marks, and you want to match text
in your buffer but ignoring the difference in quote-mark type.
Likewise, for any of the other equivalence classes. No reason
to privilege any particular member of a class, making it so
that only that member can be used in a search string to match
the other members. We've seen no argument supporting such
asymmetry.
(I can imagine an argument in terms of implementation, but
we have not heard that yet. And *no* argument has been
given in user terms - UI. Why should users be limited wrt
which class member they can use to match a class?)
> It's a valid use case, obviously,
> but based on an analogy to experience with DWIMish case-folding in
> Emacs, I believe most users will quickly adjust to typing "muller"
> when they want a poor man's version of full "orthographic
> equivalence". Individuals may not, but I believe the great majority
> will, since I'm sure it's anatomically easier to type "muller" than
> "Müller", even on a German keyboard.
It's not only about typing. That seems to be the main point
that those who repeat this mantra forget. Text can be pasted
into an Isearch string, including text copied from outside
Emacs. Text using any Unicode chars, from any languages.
> Footnotes:
> [1] Drew also argues this point, but from an abstract insistence on
> "symmetry", which doesn't really exist here for representational,
> anatomical, psychological reasons, and let's not forget personal
> historical reasons like "Müller is my name".
Nonsense. I gave concrete examples. It's not an academic
argument. It's about really having character folding, not
just a one-way character folding that requires you to type
(or edit a pasted string) _only_ the "canonical" chars that
are folded. It's a practical argument, not an abstract
insistence on symmetry.
Being _able_ to fold `é' to `e' or `è', and to fold one kind
of quote mark to others, is, yes, a normal use case. Nothing
odd, abstract, or academic about it. Herr Müller confirms
this with his own example. This should be a no-brainer, IMO.
* RE: char equivalence classes in search - why not symmetric?
2015-09-08 14:24 ` Drew Adams
@ 2015-09-08 15:21 ` Stephen J. Turnbull
2015-09-08 16:58 ` Drew Adams
2015-09-08 20:15 ` Richard Stallman
2015-09-08 20:15 ` Richard Stallman
2 siblings, 1 reply; 86+ messages in thread
From: Stephen J. Turnbull @ 2015-09-08 15:21 UTC (permalink / raw)
To: Drew Adams; +Cc: emacs-devel
Drew Adams writes:
> This should be a no-brainer, IMO.
Put your code on ELPA and demonstrate its superiority. Since it's a
no-brainer, there's no risk.
* Re: char equivalence classes in search - why not symmetric?
2015-09-08 5:36 ` Ulrich Mueller
2015-09-08 6:04 ` Jean-Christophe Helary
@ 2015-09-08 15:47 ` Eli Zaretskii
2015-09-08 16:57 ` Drew Adams
2015-09-08 21:20 ` Juri Linkov
2015-09-08 20:09 ` Richard Stallman
2 siblings, 2 replies; 86+ messages in thread
From: Eli Zaretskii @ 2015-09-08 15:47 UTC (permalink / raw)
To: Ulrich Mueller; +Cc: drew.adams, emacs-devel
> Date: Tue, 8 Sep 2015 07:36:51 +0200
> Cc: Drew Adams <drew.adams@oracle.com>, emacs-devel@gnu.org
> From: Ulrich Mueller <ulm@gentoo.org>
>
> >>>>> On Tue, 01 Sep 2015, Eli Zaretskii wrote:
>
> >> No. You are asking for that only when you use a search pattern
> >> that does not use the diacriticals. When you search with á in
> >> the pattern you are NOT asking for matches that disregard the
> >> diacriticals. And why not?
>
> > Because á does include a diacritical. By specifying it, the user
> > told us the diacriticals are important, and shouldn't be
> > disregarded.
>
> I disagree. When I search for "Müller" I want it to also match
> "Muller"
Then you should type "Muller" instead of "Müller".
> (I'd also like it to match "Mueller" but that's a different issue.)
With this feature, you can.
* RE: char equivalence classes in search - why not symmetric?
2015-09-08 15:47 ` Eli Zaretskii
@ 2015-09-08 16:57 ` Drew Adams
2015-09-08 21:20 ` Juri Linkov
1 sibling, 0 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-08 16:57 UTC (permalink / raw)
To: Eli Zaretskii, Ulrich Mueller; +Cc: emacs-devel
> > > Because á does include a diacritical. By specifying it, the user
> > > told us the diacriticals are important, and shouldn't be
> > > disregarded.
> >
> > I disagree. When I search for "Müller" I want it to also match
> > "Muller"
>
> Then you should type "Muller" instead of "Müller".
I believe Ulrich is specifically asking to be able to type (or
paste) "Müller" _instead of having_ to type "Muller", to match
both "Müller" and "Muller".
Telling him to just type "Muller" ignores his request and his
argument that it is useful to be able to do what he asks.
That's all we've heard, so far, as an argument against the
proposal: You don't need it; just get by with the canonical
chars instead of accented chars in search strings, if you
want char folding.
No reason given why someone should not _be able_ to do
what Ulrich wants.
* RE: char equivalence classes in search - why not symmetric?
2015-09-08 15:21 ` Stephen J. Turnbull
@ 2015-09-08 16:58 ` Drew Adams
2015-09-08 17:38 ` Stephen J. Turnbull
0 siblings, 1 reply; 86+ messages in thread
From: Drew Adams @ 2015-09-08 16:58 UTC (permalink / raw)
To: Stephen J. Turnbull; +Cc: emacs-devel
> > This should be a no-brainer, IMO.
>
> Put your code on ELPA and demonstrate its superiority.
> Since it's a no-brainer, there's no risk.
If you're going to quote something written by someone else,
please at least do not mislead by taking it totally out of
context.
Here is that text in context. It says nothing about
implementation being a no-brainer.
> Being _able_ to fold `é' to `e' or `è', and to fold one
> kind of quote mark to others, is, yes, a normal use case.
> Nothing odd, abstract, or academic about it. Herr Müller
> confirms this with his own example. This should be a
> no-brainer, IMO.
It's about _what_ users can do. It should be a no-brainer,
IMO, that users should _be able_ to do what Ulrich, Juri,
and I have requested. That same emphasis on _being able_
was in the original text quoted, but you still ignored it.
_How_ to fix the current implementation to support that
behavior is a different question. Feel free to raise that
question - _how_ to do it - in another thread, if you are
interested. And contribute code to it, if you like.
The question this thread raises is why and why not do it.
I've approached this question only from a user point of
view (it is useful to be able to do it). But it is fine
to present implementation-related considerations that
argue against (or for) doing it. None seen so far.
* RE: char equivalence classes in search - why not symmetric?
2015-09-08 16:58 ` Drew Adams
@ 2015-09-08 17:38 ` Stephen J. Turnbull
2015-09-09 22:52 ` Drew Adams
0 siblings, 1 reply; 86+ messages in thread
From: Stephen J. Turnbull @ 2015-09-08 17:38 UTC (permalink / raw)
To: Drew Adams; +Cc: emacs-devel
Drew Adams writes:
> I've approached this question only from a user point of
> view (it is useful to be able to do it).
Well, since I'm not going to do it any time soon, and you haven't even
considered doing it yet, this thread is moot.
* Re: char equivalence classes in search - why not symmetric?
2015-09-08 5:36 ` Ulrich Mueller
2015-09-08 6:04 ` Jean-Christophe Helary
2015-09-08 15:47 ` Eli Zaretskii
@ 2015-09-08 20:09 ` Richard Stallman
2015-09-08 21:00 ` Drew Adams
2015-09-08 21:47 ` Ulrich Mueller
2 siblings, 2 replies; 86+ messages in thread
From: Richard Stallman @ 2015-09-08 20:09 UTC (permalink / raw)
To: Ulrich Mueller; +Cc: eliz, drew.adams, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> I disagree. When I search for "Müller" I want it to also match
> "Muller" because some people (e.g., in French speaking countries) use
> this as an approximation of the spelling.
Are you suggesting that searching for ü should match u but not ú or ù?
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: char equivalence classes in search - why not symmetric?
2015-09-08 14:24 ` Drew Adams
2015-09-08 15:21 ` Stephen J. Turnbull
@ 2015-09-08 20:15 ` Richard Stallman
2015-09-08 20:15 ` Richard Stallman
2 siblings, 0 replies; 86+ messages in thread
From: Richard Stallman @ 2015-09-08 20:15 UTC (permalink / raw)
To: Drew Adams; +Cc: stephen, jean.christophe.helary, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> Personally, yes, I would get rid of that anomaly too at some
> point, but I'm not proposing that now. Likewise, for the
> anomaly that whitespace folding is switched off by SPC SPC.
SPC SPC should match only a pair of spaces!
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: char equivalence classes in search - why not symmetric?
2015-09-08 14:24 ` Drew Adams
2015-09-08 15:21 ` Stephen J. Turnbull
2015-09-08 20:15 ` Richard Stallman
@ 2015-09-08 20:15 ` Richard Stallman
2015-09-08 21:25 ` Drew Adams
2 siblings, 1 reply; 86+ messages in thread
From: Richard Stallman @ 2015-09-08 20:15 UTC (permalink / raw)
To: Drew Adams; +Cc: stephen, jean.christophe.helary, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> Nothing more. What's good for `e' should be good for `é' and
> all the rest. It's about equivalence classes.
That would be a change for the worse, since it would reduce the range
of searches that the user can specify with one character in the
search.
Currently the user can either search for "any kind of e" or "only é"
or "only è" or "only ê", etc.
With your change, the user would be limited to searching for "any kind
of e". That would be a step back in flexibility.
Since the current interface is fairly natural, there is no loss in
offering the user all these options.
I would not oppose offering a configuration setting to get the
behavior you want. There is nothing to lose with that. But the
current behavior is a more useful default than the behavior you would
like.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric?
2015-09-08 20:09 ` Richard Stallman
@ 2015-09-08 21:00 ` Drew Adams
2015-09-09 15:06 ` Richard Stallman
2015-09-08 21:47 ` Ulrich Mueller
1 sibling, 1 reply; 86+ messages in thread
From: Drew Adams @ 2015-09-08 21:00 UTC (permalink / raw)
To: rms, Ulrich Mueller; +Cc: eliz, emacs-devel
> > I disagree. When I search for "Müller" I want it to also match
> > "Muller" because some people (e.g., in French speaking countries) use
> > this as an approximation of the spelling.
>
> Are you suggesting that searching for ü should match u but not ú or ù?
I'm not speaking for Ulrich, but no, I am not suggesting that.
The proposal behind this thread is that when char folding is turned
ON, any char CHR in a given equivalence class would match any other
char in that class, when CHR is used in a search string.
So if char folding is on, you can find any of [eéèêæë] in the buffer
text using any of those chars in the search string, not just `e' in
the search string. None of them has a privileged role in the search
string.
To match only one of those folding-equivalent chars (e.g., only `e'
or `é'), you would turn OFF char folding and use that exact char in
the search string.
Char folding would be togglable, as now, using `M-s ''. The only
difference would be that when char folding is on, any of [eéèêæë]
would act the same way in a search string.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric?
2015-09-08 13:39 ` Drew Adams
@ 2015-09-08 21:19 ` Juri Linkov
2015-09-09 15:07 ` Richard Stallman
0 siblings, 1 reply; 86+ messages in thread
From: Juri Linkov @ 2015-09-08 21:19 UTC (permalink / raw)
To: Drew Adams; +Cc: Jean-Christophe Helary, emacs-devel
>> > I disagree. When I search for "Müller" I want it to also match
>> > "Muller" because some people (e.g., in French speaking countries) use
>> > this as an approximation of the spelling.
>>
>> It's fine that emacs is "different", but common (nano, vi, GUI editors, word
>> processors) behaviour is that a search strictly matches the string, and that
>> creates expectations.
In Web browsers by default “u” matches “ü” as well as “ü” matches “u”.
>> For the Muller case above, as a translator I could see
>> myself search for Muller to correct it to Müller and not be happy to have
>> all the correct Müllers showing up in the search.
>
> Not a problem, provided we have a toggle like what Juri suggested.
> Toggle literal vs char folding. And ensure that char folding is
> symmetric (this thread), and not just one-way as it is now.
Do you mean a toggle for an individual character in the search string
or a toggle for the whole search string? Also is it a three-state toggle
between literal match, “ü” matches only “ü”, “ü” matches both “u” and “ü”,
“ü” matches “u”, “ü” and all other variants like “ú”?
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric?
2015-09-08 15:47 ` Eli Zaretskii
2015-09-08 16:57 ` Drew Adams
@ 2015-09-08 21:20 ` Juri Linkov
2015-09-09 2:42 ` Eli Zaretskii
1 sibling, 1 reply; 86+ messages in thread
From: Juri Linkov @ 2015-09-08 21:20 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Ulrich Mueller, drew.adams, emacs-devel
>> (I'd also like it to match "Mueller" but that's a different issue.)
>
> With this feature, you can.
This is not what I see. The generated regexp for “u” is:
\(?:u[̀-̄̆̈-̨̛̣̤̭̰̌̏̑]\|[uù-üũūŭůűųưǔȕȗᵘᵤṳṵṷụủ⒰ⓤu𝐮𝑢𝒖𝓊𝓾𝔲𝕦𝖚𝗎𝘂𝘶𝙪𝚞]\)
that doesn't match “ue”.
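The same effect can be reproduced with a much smaller stand-in for that regexp. This is only a sketch: `my-u-regexp' and `my-muller-p' are hypothetical helpers, and the short character class is an assumption made for illustration, not the regexp that character folding actually generates.

```elisp
;; Hypothetical helpers, for illustration only.
(defun my-u-regexp (&optional with-ue)
  "Return a simplified regexp for the letter u.
The class covers u plus U+00F9..U+00FC (u with grave, acute,
circumflex and diaeresis).  WITH-UE non-nil also accepts the
two-char spelling \"ue\"."
  (let ((class "[u\u00f9-\u00fc]"))
    (if with-ue
        (concat "\\(?:ue\\|" class "\\)")
      class)))

(defun my-muller-p (text &optional with-ue)
  "Return t if TEXT is a spelling of Mueller that the regexp accepts."
  (and (string-match-p (concat "\\`M" (my-u-regexp with-ue) "ller\\'")
                       text)
       t))

(my-muller-p "M\u00fcller")   ; t: u with diaeresis is in the class
(my-muller-p "Muller")        ; t
(my-muller-p "Mueller")       ; nil: the class matches exactly one char
(my-muller-p "Mueller" t)     ; t: "ue" is now an explicit alternative
```

A character class can only ever match one char, so a two-char spelling such as "ue" needs its own alternative in the regexp.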
^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric?
2015-09-08 20:15 ` Richard Stallman
@ 2015-09-08 21:25 ` Drew Adams
2015-09-09 15:07 ` Richard Stallman
0 siblings, 1 reply; 86+ messages in thread
From: Drew Adams @ 2015-09-08 21:25 UTC (permalink / raw)
To: rms; +Cc: stephen, jean.christophe.helary, emacs-devel
> > Nothing more. What's good for `e' should be good for `é' and
> > all the rest. It's about equivalence classes.
>
> That would be a change for the worse, since it would reduce the range
> of searches that the user can specify with one character in the
> search.
Not at all. It adds to what the user can do. It does not subtract.
> Currently the user can either search for "any kind of e" or "only é"
> or "only è" or "only ê", etc.
That would still be the case.
The only difference would be that when s?he wants to search for "any
kind of e" s?he can use any of the equivalent e-chars. Any of [eéèêæë]
would behave the same as `e' does now, when searching for any of [eéèêæë].
> With your change, the user would be limited to searching for "any kind
> of e". That would be a step back in flexibility.
Not at all. Just as now, the user can toggle char folding OFF, to
search for the search string literally, i.e., to take its chars as
what they are, and not consider them as representative of an
equivalence class.
With folding OFF, `e' searches only for `e'; `é' searches only for `é',
and so on.
These are all of the possibilities, for `e' and `é':

Folding ON/OFF   Search string char   Buffer chars that match
--------------   ------------------   -----------------------
OFF              e                    [e]
OFF              é                    [é]
ON               e                    [eéèêæë]
ON               é                    [eéèêæë]  <======= MISSING NOW
And the same goes for any of the other e-chars. With folding off
it matches only itself. With folding on it matches any of its class.
This proposal adds more matching possibilities. It does
not remove any possibilities.
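The four cases can be written out as a toy model. This is a sketch under stated assumptions: `my-fold-class' and `my-fold-matches' are hypothetical names, the class is a hand-written list rather than anything taken from `character-fold-table', and "current" versus "proposed" is modeled by a plain SYMMETRIC argument.

```elisp
;; Hypothetical toy model; it does not touch the real folding table.
(defvar my-fold-class (list ?e #xe9 #xe8 #xea #xeb)
  "Toy folding class: e, then e with acute, grave, circumflex, diaeresis.
The first element plays the role of the \"base\" char.")

(defun my-fold-matches (search-char folding symmetric)
  "Return the list of buffer chars that SEARCH-CHAR would match.
FOLDING non-nil means char folding is on.  SYMMETRIC non-nil means
any member of the class stands for the whole class (the proposal);
nil means only the base char does (the current behavior)."
  (cond ((not folding) (list search-char))
        ((eq search-char (car my-fold-class)) my-fold-class)
        ((and symmetric (memq search-char my-fold-class)) my-fold-class)
        (t (list search-char))))

(my-fold-matches ?e   nil nil)  ; only e
(my-fold-matches #xe9 nil nil)  ; only e-acute
(my-fold-matches ?e   t   nil)  ; the whole class
(my-fold-matches #xe9 t   nil)  ; only e-acute (current behavior)
(my-fold-matches #xe9 t   t)    ; the whole class (proposed behavior)
```

With folding off, both variants return only the literal char, so the symmetric variant differs from the current one in the last case only.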
Currently, you cannot do what is shown in the last line above.
You cannot use é to search for [eéèêæë]. Similarly, you cannot
use a curly quote to search for other kinds of quote marks.
You are currently limited to using only the "canonical" chars
that represent their class. That removes the possibility of
pasting text into the search string and being able to get
char-folding search.
Quote marks are a good example of chars in text that you might copy
and try to search for. To do that, if the copied text contained
curly quotes then you would need to use `M-e' and edit the search
string, to convert each of them to the corresponding "canonical"
member of the quotation-mark equivalences, an ascii quote mark.
There is no good reason to make users jump through such a hoop.
(Plus, they would need to know what the "canonical" char is, for
each equivalence class they might want to use.) Let any member
of a class represent the class.
> Since the current interface is fairly natural, there is no loss in
> offering the user all these options.
All what options? The proposal does not remove any matching options.
On the contrary, it adds matching options.
> I would not oppose offering a configuration setting to get the
> behavior you want. There is nothing to lose with that. But the
> current behavior is a more useful default than the behavior you would
> like.
Did you understand what is being proposed, when you wrote that?
If so, how is the current restriction to `e' for matching [eéèêæë]
more useful than letting any e-char do the same?
^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric?
[not found] ` <<E1ZZPIS-0005rf-DJ@fencepost.gnu.org>
@ 2015-09-08 21:46 ` Drew Adams
0 siblings, 0 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-08 21:46 UTC (permalink / raw)
To: rms, Drew Adams; +Cc: stephen, jean.christophe.helary, emacs-devel
> > Personally, yes, I would get rid of that anomaly too at some
> > point, but I'm not proposing that now. Likewise, for the
> > anomaly that whitespace folding is switched off by SPC SPC.
>
> SPC SPC should match only a pair of spaces!
This is not the topic of this thread. But if and when we get to
that discussion (if I'm still around in 20 years ;-)), the right
answer is the same as for the current proposal, which is about
char folding:
Just toggle whitespace-folding OFF. "Just say NO" to SPC SPC
matching any string of whitespace. Whitespace folding will stay
off as long as you do not toggle it ON. And you can customize
the default behavior so that it starts either ON or OFF.
(I'm not a fan of whitespace folding most of the time, so I
will turn it OFF by default, personally. I do want SPC SPC
to match only two consecutive spaces, nearly all the time.)
The simple idea is that folding is either on or off.
When on, equivalence classes are used - whether for diacritics
("char folding"), or for case (case folding), or for whitespace
(whitespace folding).
But that's just a preview of a possible FUTURE discussion.
No one is proposing NOW that we change the current behavior
of case folding or whitespace folding. The topic here, now,
is char folding - whether it should treat all chars of a class
the same or not.
What do you LOSE with the proposed behavior (for char folding
now, and perhaps for case or whitespace folding later)? You
lose the fact that any particular members of an equivalence
class are "canonical", and so using one of them during folding
automatically switches folding off.
E.g., currently, using é in a search string turns char folding
off. And of course using an uppercase char turns case folding
off. And SPC SPC turns whitespace folding off.
What do you GAIN with the proposed behavior? You need not
type a particular, privileged member of a class in order to
match any member of the class. Any member will match any
member (including itself, of course).
The point is to have users explicitly hit a key to toggle folding.
That enables the use of any char in a class to match any other
char in the same class. That's the tradeoff.
With the proposal, there is nothing to remember, no exceptions
or special rules. Folding is either on or off, and a single
key toggles it (for each kind of folding).
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric?
2015-09-08 20:09 ` Richard Stallman
2015-09-08 21:00 ` Drew Adams
@ 2015-09-08 21:47 ` Ulrich Mueller
1 sibling, 0 replies; 86+ messages in thread
From: Ulrich Mueller @ 2015-09-08 21:47 UTC (permalink / raw)
To: rms; +Cc: eliz, drew.adams, emacs-devel
>>>>> On Tue, 08 Sep 2015, Richard Stallman wrote:
>> I disagree. When I search for "Müller" I want it to also match
>> "Muller" because some people (e.g., in French speaking countries)
>> use this as an approximation of the spelling.
> Are you suggesting that searching for ü should match u but not ú or ù?
No, I am not. It is fine if the search would match a u with any
diacritics. It does not make much of a practical difference because
both Múller and Mùller are unlikely spellings.
Ulrich
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric?
2015-09-08 21:20 ` Juri Linkov
@ 2015-09-09 2:42 ` Eli Zaretskii
2015-09-09 11:23 ` Artur Malabarba
[not found] ` <<CAAdUY-JMQVsRFku8nwX8JcA9k6Y9sHWoVL6ZC60RHnjoj0cd+Q@mail.gmail.com>
0 siblings, 2 replies; 86+ messages in thread
From: Eli Zaretskii @ 2015-09-09 2:42 UTC (permalink / raw)
To: Juri Linkov; +Cc: ulm, drew.adams, emacs-devel
> From: Juri Linkov <juri@linkov.net>
> Cc: Ulrich Mueller <ulm@gentoo.org>, drew.adams@oracle.com, emacs-devel@gnu.org
> Date: Wed, 09 Sep 2015 00:20:20 +0300
>
> >> (I'd also like it to match "Mueller" but that's a different issue.)
> >
> > With this feature, you can.
>
> This is not what I see.
This needs customizing the equivalence set.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric?
2015-09-09 2:42 ` Eli Zaretskii
@ 2015-09-09 11:23 ` Artur Malabarba
2015-09-09 13:32 ` Drew Adams
2015-09-09 15:12 ` Richard Stallman
[not found] ` <<CAAdUY-JMQVsRFku8nwX8JcA9k6Y9sHWoVL6ZC60RHnjoj0cd+Q@mail.gmail.com>
1 sibling, 2 replies; 86+ messages in thread
From: Artur Malabarba @ 2015-09-09 11:23 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: ulm, emacs-devel, Drew Adams, Juri Linkov
If I may weigh in. I think the whole discussion of whether this should
be symmetric or not is pointless. There are arguments for both sides,
and without any significant amount of empirical evidence, any choice
is as good as flipping a coin.
I'd much rather we focus effort on making the equiv-classes easier to customize.
2015-09-09 3:42 GMT+01:00 Eli Zaretskii <eliz@gnu.org>:
>> From: Juri Linkov <juri@linkov.net>
>> Cc: Ulrich Mueller <ulm@gentoo.org>, drew.adams@oracle.com, emacs-devel@gnu.org
>> Date: Wed, 09 Sep 2015 00:20:20 +0300
>>
>> >> (I'd also like it to match "Mueller" but that's a different issue.)
>> >
>> > With this feature, you can.
>>
>> This is not what I see.
>
> This needs customizing the equivalence set.
Yes. Discussing how to expose easy and useful customization to the
user is a much more useful discussion IMO.
^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric?
2015-09-09 11:23 ` Artur Malabarba
@ 2015-09-09 13:32 ` Drew Adams
2015-09-09 15:12 ` Richard Stallman
1 sibling, 0 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-09 13:32 UTC (permalink / raw)
To: bruce.connor.am, Eli Zaretskii; +Cc: ulm, emacs-devel, Juri Linkov
> If I may weigh in. I think the whole discussion of whether this should
> be symmetric or not is pointless. There are arguments for both sides,
> and without any significant amount of empirical evidence, any choice
> is as good as flipping a coin.
>
> I'd much rather we focus effort on making the equiv-classes easier to
> customize.
1. You are welcome to say that you would rather flip a coin than
try to discuss what this thread proposes.
2. I too would like to see progress wrt a discussion about letting
users easily define new equivalence classes and customize
existing equivalence classes. But please start a separate
thread for that.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric?
2015-09-08 21:00 ` Drew Adams
@ 2015-09-09 15:06 ` Richard Stallman
0 siblings, 0 replies; 86+ messages in thread
From: Richard Stallman @ 2015-09-09 15:06 UTC (permalink / raw)
To: Drew Adams; +Cc: ulm, eliz, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> > Are you suggesting that searching for ü should match u but not ú or ù?
> I'm not speaking for Ulrich, but no, I am not suggesting that.
I was asking Ulrich.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric?
2015-09-08 21:19 ` Juri Linkov
@ 2015-09-09 15:07 ` Richard Stallman
0 siblings, 0 replies; 86+ messages in thread
From: Richard Stallman @ 2015-09-09 15:07 UTC (permalink / raw)
To: Juri Linkov; +Cc: jean.christophe.helary, drew.adams, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> Do you mean a toggle for an individual character in the search string
> or a toggle for the whole search string?
I think it needs to be a toggle that applies to the input keys,
so when you toggle it, the new state affects subsequent keys.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric?
2015-09-08 21:25 ` Drew Adams
@ 2015-09-09 15:07 ` Richard Stallman
2015-09-09 15:21 ` Drew Adams
0 siblings, 1 reply; 86+ messages in thread
From: Richard Stallman @ 2015-09-09 15:07 UTC (permalink / raw)
To: Drew Adams; +Cc: stephen, jean.christophe.helary, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> > Currently the user can either search for "any kind of e" or "only é"
> > or "only è" or "only ê", etc.
> That would still be the case.
> The only difference would be that when s?he wants to search for "any
> kind of e" s?he can use any of the equivalent e-chars. Any of [eéèêæë]
> would behave the same as `e' does now, when searching for any of [eéèêæë].
This seems to be a miscommunication.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric?
2015-09-09 11:23 ` Artur Malabarba
2015-09-09 13:32 ` Drew Adams
@ 2015-09-09 15:12 ` Richard Stallman
2015-09-11 20:50 ` Juri Linkov
1 sibling, 1 reply; 86+ messages in thread
From: Richard Stallman @ 2015-09-09 15:12 UTC (permalink / raw)
To: bruce.connor.am; +Cc: eliz, juri, ulm, drew.adams, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> I'd much rather we focus effort on making the equiv-classes easier to customize.
Let's not call them "equiv-classes", because that term presupposes
symmetry. (An equivalence relation is symmetric.) Let's call them
search classes for characters.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric?
2015-09-09 15:07 ` Richard Stallman
@ 2015-09-09 15:21 ` Drew Adams
2015-09-10 2:03 ` Richard Stallman
0 siblings, 1 reply; 86+ messages in thread
From: Drew Adams @ 2015-09-09 15:21 UTC (permalink / raw)
To: rms; +Cc: stephen, jean.christophe.helary, emacs-devel
> > > Currently the user can either search for "any kind of e" or "only é"
> > > or "only è" or "only ê", etc.
>
> > That would still be the case.
> > The only difference would be that when s?he wants to search for "any
> > kind of e" s?he can use any of the equivalent e-chars. Any of [eéèêæë]
> > would behave the same as `e' does now, when searching for any of
> > [eéèêæë].
>
> This seems to be a miscommunication.
That communication is itself unclear. _What_ seems to you to be a
miscommunication?
The point is that what you say is true currently would still be the
case with what is proposed in this thread. The user would continue
to be able to search for either any kind of e or for only a specific
kind of e.
^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric?
[not found] ` <<E1ZZh2a-0003u6-Fj@fencepost.gnu.org>
@ 2015-09-09 15:22 ` Drew Adams
2015-09-10 2:03 ` Richard Stallman
0 siblings, 1 reply; 86+ messages in thread
From: Drew Adams @ 2015-09-09 15:22 UTC (permalink / raw)
To: rms, bruce.connor.am; +Cc: eliz, juri, ulm, drew.adams, emacs-devel
> > I'd much rather we focus effort on making the equiv-classes easier to
> > customize.
>
> Let's not call them "equiv-classes", because that term presupposes
> symmetry. (An equivalence relation is symmetric.) Let's call them
> search classes for characters.
They are equivalence classes. The chars are equivalent when searched
for (with char folding turned on). The equivalence relation is among
the chars in the class.
This equivalence has nothing to do with the symmetry of handling them
between search string and searched text.
Whether or not they should _also_ be equivalent (handled the same way)
when used in the search string is the topic of this thread.
But even without that improvement, i.e., currently, the chars are
equivalent when searched for.
"Search classes for characters" means little. It says nothing about
what makes them a class - what they have in common. What they have
in common is that they are treated the same (equivalently) when
searched for.
^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric?
2015-09-08 17:38 ` Stephen J. Turnbull
@ 2015-09-09 22:52 ` Drew Adams
2015-09-10 3:12 ` Drew Adams
0 siblings, 1 reply; 86+ messages in thread
From: Drew Adams @ 2015-09-09 22:52 UTC (permalink / raw)
To: Stephen J. Turnbull; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1944 bytes --]
> > I've approached this question only from a user point of
> > view (it is useful to be able to do it).
>
> Well, since I'm not going to do it any time soon, and you
> haven't even considered doing it yet, this thread is moot.
AFAICT, this (or similar) is the only code needed. It fixes
the char-table entries for the equivalent chars, so each points
to the equivalence class and not just to itself. (Currently,
only the "base" char points to the equivalence class.)
;; Add an entry for each equivalent char.
(let ((others ()))
  (map-char-table
   (lambda (base v)
     (let ((chrs (aref equiv base)))
       (when (consp chrs)
         (dolist (chr (cdr chrs))
           (push (cons (string-to-char chr) (remove chr chrs))
                 others)))))
   equiv)
  (dolist (it others)
    (let ((base (car it))
          (chars (cdr it)))
      (aset equiv base (append chars (aref equiv base))))))
This code fragment is included in the attached code that updates
`character-fold-table'. Evaluate the attached code, to try the
behavior proposed in this thread.
The attached code provides:
* A Boolean option, `char-fold-symmetric', so you can choose which
behavior you want. (Let users decide, instead of "flipping a
coin" at design time.)
If you use Customize (or the equivalent) to change the option
value then `character-fold-table' is automatically updated to
reflect the new option value.
* A function that updates `character-fold-table' to reflect the
option value. It evaluates the above code conditionally.
Just as now, you can use M-s ' to toggle char folding. With the
option value non-nil you get the behavior proposed in this thread.
With the option value nil you get the current, more limited behavior.
[I'm no expert on char tables. Perhaps the code could be improved.
But this seems to work OK. I think it exhibits the proposed behavior.]
[-- Attachment #2: symmetric-char-fold.el --]
[-- Type: application/octet-stream, Size: 5948 bytes --]
;; Load this file, to evaluate these two definitions in order.
;;
;; The second is an option that lets you choose the proposed behavior
;; or the current Emacs behavior, for character folding. The first is
;; a function that redefines the char-table used for character folding
;; (`character-fold-table'), so that it reflects the option value.
;;
;; When the option is non-nil, `character-fold-table' includes
;; equivalence entries for each member of a char-folding class (an
;; equivalence class wrt search). When the option is nil,
;; `character-fold-table' includes equivalence entries only for the
;; "base" character of each class.
;;
;; Use M-s ' to toggle char folding, as usual.
(defun update-char-fold-table ()
  "Update the value of variable `character-fold-table'.
The new value reflects the current value of `char-fold-symmetric'."
  (setq character-fold-table
        (let* ((equiv (make-char-table 'character-fold-table))
               (table (unicode-property-table-internal 'decomposition))
               (func (char-table-extra-slot table 1)))
          ;; Ensure the table is populated.
          (map-char-table
           (lambda (i v) (when (consp i) (funcall func (car i) v table)))
           table)
          ;; Compile a list of all complex chars that each simple char should match.
          (map-char-table
           (lambda (i dec)
             (when (consp dec)
               ;; Discard a possible formatting tag.
               (when (symbolp (car dec))
                 (setq dec (cdr dec)))
               ;; Skip trivial cases like ?a decomposing to (?a).
               (unless (or (and (eq i (car dec)) (not (cdr dec))))
                 (let ((d dec)
                       (fold-decomp t)
                       k found)
                   (while (and d (not found))
                     (setq k (pop d))
                     ;; Is k a number or letter, per unicode standard?
                     (setq found (memq (get-char-code-property k 'general-category)
                                       '(Lu Ll Lt Lm Lo Nd Nl No))))
                   (if found
                       ;; Check if the decomposition has more than one letter,
                       ;; because then we don't want the first letter to match
                       ;; the decomposition.
                       (dolist (k d)
                         (when (and fold-decomp
                                    (memq (get-char-code-property k 'general-category)
                                          '(Lu Ll Lt Lm Lo Nd Nl No)))
                           (setq fold-decomp nil)))
                     ;; If there's no number or letter on the
                     ;; decomposition, take the first character in it.
                     (setq found (car-safe dec)))
                   ;; Finally, we only fold multi-char decomposition if at
                   ;; least one of the chars is non-spacing (combining).
                   (when fold-decomp
                     (setq fold-decomp nil)
                     (dolist (k dec)
                       (when (and (not fold-decomp)
                                  (> (get-char-code-property k 'canonical-combining-class) 0))
                         (setq fold-decomp t))))
                   ;; Add i to the list of characters that k can
                   ;; represent. Also possibly add its decomposition, so we can
                   ;; match multi-char representations like (format "a%c" 769)
                   (when (and found (not (eq i k)))
                     (let ((chars (cons (char-to-string i) (aref equiv k))))
                       (aset equiv k (if fold-decomp
                                         (cons (apply #'string dec) chars)
                                       chars))))))))
           table)
          ;; Add some manual entries.
          (dolist (it '((?\" "＂" "“" "”" "”" "„" "⹂" "〞" "‟" "‟" "❞" "❝"
"❠" "“" "„" "〝" "〟" "🙷" "🙶" "🙸" "«" "»")
(?' "❟" "❛" "❜" "‘" "’" "‚" "‛" "‚" "" "❮" "❯" "‹" "›")
(?` "❛" "‘" "‛" "" "❮" "‹")))
            (let ((idx (car it))
                  (chars (cdr it)))
              (aset equiv idx (append chars (aref equiv idx)))))
          ;; --------8<------the only addition----------------
          (when char-fold-symmetric
            ;; Add an entry for each equivalent char.
            (let ((others ()))
              (map-char-table
               (lambda (base v)
                 (let ((chrs (aref equiv base)))
                   (when (consp chrs)
                     (dolist (chr (cdr chrs))
                       (push (cons (string-to-char chr) (remove chr chrs)) others)))))
               equiv)
              (dolist (it others)
                (let ((base (car it))
                      (chars (cdr it)))
                  (aset equiv base (append chars (aref equiv base)))))))
          ;; --------8<---------------------------------------
          ;; Convert the lists of characters we compiled into regexps.
          (map-char-table
           (lambda (i v) (let ((re (regexp-opt (cons (char-to-string i) v))))
                           (if (consp i)
                               (set-char-table-range equiv i re)
                             (aset equiv i re))))
           equiv)
          equiv)))
(defcustom char-fold-symmetric t
  "Non-nil means char-fold searching treats equivalent chars the same.
That is, use of any of a set of char-fold equivalent chars in a search
string finds any of them in the text being searched.
If nil then only the \"base\" or \"canonical\" char of the set matches
any of them. The others match only themselves, even when char-folding
is turned on."
  :set (lambda (sym defs)
         (custom-set-default sym defs)
         (update-char-fold-table))
  :type 'boolean :group 'isearch)
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric?
2015-09-09 15:22 ` Drew Adams
@ 2015-09-10 2:03 ` Richard Stallman
2015-09-10 3:15 ` Drew Adams
0 siblings, 1 reply; 86+ messages in thread
From: Richard Stallman @ 2015-09-10 2:03 UTC (permalink / raw)
To: Drew Adams; +Cc: ulm, bruce.connor.am, emacs-devel, eliz, juri, drew.adams
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> They are equivalence classes. The chars are equivalent when searched
> for (with char folding turned on).
No, they aren't. For instance, A and Á are not equivalent in search.
Searching for A will match Á, but searching for Á will not match A.
To make them equivalent would be a change for the worse.
I already explained why.
Current
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: char equivalence classes in search - why not symmetric?
2015-09-09 15:21 ` Drew Adams
@ 2015-09-10 2:03 ` Richard Stallman
2015-09-10 3:23 ` Drew Adams
0 siblings, 1 reply; 86+ messages in thread
From: Richard Stallman @ 2015-09-10 2:03 UTC (permalink / raw)
To: Drew Adams; +Cc: stephen, jean.christophe.helary, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> > Currently the user can either search for "any kind of e" or "only é"
> > or "only è" or "only ê", etc.
I mean that the user can do all of these with one character, not
using any toggle command.
> That would still be the case.
> The only difference would be that when s?he wants to search for "any
> kind of e" s?he can use any of the equivalent e-chars.
No, another difference would be that NONE of the other options
is possible with one character -- all would require a toggle command
that people may not remember. (I don't.)
> The point is that what you say is true currently would still be the
> case with what is proposed in this thread. The user would continue
> to be able to search for either any kind of e or for only a specific
> kind of e.
The user would continue to be able to do this _somehow_, but not as
now without using a separate toggle command.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
^ permalink raw reply [flat|nested] 86+ messages in thread
* RE: char equivalence classes in search - why not symmetric?
2015-09-09 22:52 ` Drew Adams
@ 2015-09-10 3:12 ` Drew Adams
2015-09-10 21:46 ` Drew Adams
0 siblings, 1 reply; 86+ messages in thread
From: Drew Adams @ 2015-09-10 3:12 UTC (permalink / raw)
To: Stephen J. Turnbull; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1324 bytes --]
> AFAICT, this (or similar) is the only code needed.
Sorry, I spoke too soon.
1. The following two lines are needed, before evaluating the code
I sent earlier. (I've attached an update that includes them, so
you can just load/evaluate it.)
(setq character-fold-search t)
(load-library "character-fold")
This is due to the way the vanilla code is at the moment. This also
means that for this testing char folding will be on, to start with.
2. The code I have is not sufficient for everything. You can
use it to see what the behavior is for single-char entries in the
char table, which includes accented chars (chars with diacritics).
But it does not also handle multiple-char entries in the table.
For instance, you can search for "é" and get char folding, but you
cannot search for "é" and get char folding. The first of these is
just the char named LATIN SMALL LETTER E WITH ACUTE. The second is
plain "e" composed with "́" (the char named COMBINING ACUTE ACCENT).
Some more work would be needed to make such combinations work too.
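The difference between the two spellings of "é" can be inspected directly. The sketch below is illustrative only: the `my-' names are hypothetical, and using NFC normalization to compare the two forms is an assumption about one possible approach, not something the attached code does.

```elisp
(require 'ucs-normalize)

;; Hypothetical names, for illustration only.
(defvar my-precomposed (string #xe9)
  "LATIN SMALL LETTER E WITH ACUTE, a single char.")
(defvar my-decomposed (string ?e #x301)
  "Plain e followed by COMBINING ACUTE ACCENT, two chars.")

(defun my-same-after-nfc-p (a b)
  "Return non-nil if strings A and B are equal after NFC normalization."
  (string= (ucs-normalize-NFC-string a)
           (ucs-normalize-NFC-string b)))

(length my-precomposed)                             ; 1
(length my-decomposed)                              ; 2
(string= my-precomposed my-decomposed)              ; nil
(my-same-after-nfc-p my-precomposed my-decomposed)  ; t
;; The property that the table-building code reads for each char:
(get-char-code-property #xe9 'decomposition)        ; (101 769)
```

The two strings display identically but differ in length, which is why a table keyed on single chars handles the first form and needs extra work for the second.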
As I said, I'm no expert on char tables. But the attached code
should give you a good idea of what is involved.
At the end of the file I included some commented-out e chars to
search for. (Use `C-u C-x =' on a char to see what it really is.)
[-- Attachment #2: symmetric-char-fold.el --]
[-- Type: application/octet-stream, Size: 5807 bytes --]
(setq character-fold-search t)
(load-library "character-fold")
(defun update-char-fold-table ()
"Update the value of variable `character-fold-table'.
The new value reflects the current value of `char-fold-symmetric'."
(setq character-fold-table
(let* ((equiv (make-char-table 'character-fold-table))
(table (unicode-property-table-internal 'decomposition))
(func (char-table-extra-slot table 1)))
;; Ensure the table is populated.
(map-char-table
(lambda (i v) (when (consp i) (funcall func (car i) v table)))
table)
;; Compile a list of all complex chars that each simple char should match.
(map-char-table
(lambda (i dec)
(when (consp dec)
;; Discard a possible formatting tag.
(when (symbolp (car dec))
(setq dec (cdr dec)))
;; Skip trivial cases like ?a decomposing to (?a).
(unless (or (and (eq i (car dec)) (not (cdr dec))))
(let ((d dec)
(fold-decomp t)
k found)
(while (and d (not found))
(setq k (pop d))
;; Is k a number or letter, per unicode standard?
(setq found (memq (get-char-code-property k 'general-category)
'(Lu Ll Lt Lm Lo Nd Nl No))))
(if found
;; Check if the decomposition has more than one letter,
;; because then we don't want the first letter to match
;; the decomposition.
(dolist (k d)
(when (and fold-decomp
(memq (get-char-code-property k 'general-category)
'(Lu Ll Lt Lm Lo Nd Nl No)))
(setq fold-decomp nil)))
;; If there's no number or letter on the
;; decomposition, take the first character in it.
(setq found (car-safe dec)))
;; Finally, we only fold multi-char decomposition if at
;; least one of the chars is non-spacing (combining).
(when fold-decomp
(setq fold-decomp nil)
(dolist (k dec)
(when (and (not fold-decomp)
(> (get-char-code-property k 'canonical-combining-class) 0))
(setq fold-decomp t))))
;; Add i to the list of characters that k can
;; represent. Also possibly add its decomposition, so we can
;; match multi-char representations like (format "a%c" 769)
(when (and found (not (eq i k)))
(let ((chars (cons (char-to-string i) (aref equiv k))))
(aset equiv k (if fold-decomp
(cons (apply #'string dec) chars)
chars))))))))
table)
;; Add some manual entries.
(dolist (it '((?\" "＂" "“" "”" "”" "„" "⹂" "〞" "‟" "‟" "❞" "❝"
"❠" "“" "„" "〝" "〟" "🙷" "🙶" "🙸" "«" "»")
(?' "❟" "❛" "❜" "‘" "’" "‚" "‛" "‚" "" "❮" "❯" "‹" "›")
(?` "❛" "‘" "‛" "" "❮" "‹")))
(let ((idx (car it))
(chars (cdr it)))
(aset equiv idx (append chars (aref equiv idx)))))
;; --------8<------the only addition----------------
(when char-fold-symmetric
;; Add an entry for each equivalent char.
(let ((others ()))
(map-char-table
(lambda (base v)
(let ((chrs (aref equiv base)))
(when (consp chrs)
(dolist (chr (cdr chrs))
(push (cons (string-to-char chr) (remove chr chrs)) others)))))
equiv)
(dolist (it others)
(let ((base (car it))
(chars (cdr it)))
(aset equiv base (append chars (aref equiv base)))))))
;; --------8<---------------------------------------
;; Convert the lists of characters we compiled into regexps.
(map-char-table
(lambda (i v) (let ((re (regexp-opt (cons (char-to-string i) v))))
(if (consp i)
(set-char-table-range equiv i re)
(aset equiv i re))))
equiv)
equiv)))
(defcustom char-fold-symmetric t
"Non-nil means char-fold searching treats equivalent chars the same.
That is, use of any of a set of char-fold equivalent chars in a search
string finds any of them in the text being searched.
If nil then only the \"base\" or \"canonical\" char of the set matches
any of them. The others match only themselves, even when char-folding
is turned on."
:set (lambda (sym defs)
(custom-set-default sym defs)
(update-char-fold-table))
:type 'boolean :group 'isearch)
;; ("𝚎" "𝙚" "𝘦" "𝗲" "𝖾" "𝖊" "𝕖" "𝔢" "𝓮" "𝒆" "𝑒" "𝐞" "e" "㋎" "㋍" "ⓔ" "⒠"
;; "ⅇ" "ℯ" "ₑ" "ẽ" "ẽ" "ẻ" "ẻ" "ẹ" "ẹ" "ḛ" "ḛ" "ḙ" "ḙ" "ᵉ" "ȩ" "ȩ" "ȇ" "ȇ"
;; "ȅ" "ȅ" "ě" "ě" "ę" "ę" "ė" "ė" "ĕ" "ĕ" "ē" "ē" "ë" "ë" "ê" "ê" "é" "é" "è" "è")
;; No good yet: "𝚎" "ẽ" "ẻ" "ẹ" "ḛ" "ḙ" "ȩ" "ȇ" "ȅ"
;; "ě" "ę" "ė" "ĕ" "ē" "ë" "ê" "é" "è"
* RE: char equivalence classes in search - why not symmetric?
2015-09-10 2:03 ` Richard Stallman
@ 2015-09-10 3:15 ` Drew Adams
2015-09-10 6:57 ` David Kastrup
2015-09-10 15:50 ` Richard Stallman
0 siblings, 2 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-10 3:15 UTC (permalink / raw)
To: rms; +Cc: eliz, emacs-devel, ulm, bruce.connor.am, juri
> > They are equivalence classes. The chars are equivalent when searched
> > for (with char folding turned on).
>
> No, they aren't. For instance, A and Á are not equivalent in search.
> Searching for A will match Á, but searching for Á will not match A.
Please read what I said: "The chars are equivalent when searched for."
^^^^^^^^^^^^^^^^^
I did *not* say, as you say, that they are "equivalent in search."
I tried to carefully distinguish the two uses of the chars: when used
as search targets (they are currently equivalent) vs when used in the
search string (they are not equivalent, currently).
If you search for A with char folding on you will find both A and Á
(and all the rest of the A family). All members of that family (class)
are equivalent _as search targets_. Currently. 100% equivalent.
They form an equivalence class wrt the operation of searching _for_
them.
They are not yet equivalent also as search patterns, i.e., when used
in the search string. That is the proposal of this thread: to make
them equivalent also in their use in a search string (when char
folding is turned on).
> To make them equivalent would be a change for the worse.
> I already explained why.
The only explanation I saw from you was that you want the presence
of an accented char in the search string to automatically turn off
char folding. That's your preference.
It leads to an absolute reduction of possibilities for users (they
cannot abstract from accents in search when there are accented
chars in the search string). But you have every right to prefer
that limitation.
Please be aware that with what is being proposed a user can still,
anytime, get diacritic-sensitive search when there are accented
chars in the search string (and when there are not). It is sufficient
to toggle off char folding.
You want that toggling off to happen automatically, based on the
mere presence of an accented char in the search string. I don't,
because users lose the possibility of getting char-folded search
whenever there are accented chars in the search string. They
then need to edit the search string if they want to abstract from
diacritics, replacing any such chars with the unaccented ("base")
versions, in order to get char-fold search.
In the code I sent, I provided a user option, to let _users_
decide which behavior they want, individually: the one you
prefer or the one I prefer. Why not give them the choice?
* RE: char equivalence classes in search - why not symmetric?
2015-09-10 2:03 ` Richard Stallman
@ 2015-09-10 3:23 ` Drew Adams
2015-09-11 10:28 ` Richard Stallman
2015-09-11 10:28 ` Richard Stallman
0 siblings, 2 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-10 3:23 UTC (permalink / raw)
To: rms; +Cc: stephen, jean.christophe.helary, emacs-devel
> > > Currently the user can either search for "any kind of e" or "only é"
> > > or "only è" or "only ê", etc.
>
> I mean, that the user can do all of these with one character, not
> using any toggle command.
Yes, that is the difference in our views. Sure, "with one character",
but the flip side is that if you happen to have é in your search string,
however it got there (e.g. by pasting), then with your preferred behavior
you *cannot* use your search string to search for "any kind of e".
This is maybe clearer when you think about copying some text to search
for from outside Emacs, and that text might have curly quotes in it,
in multiple places, and the text that you want to search might use
other kinds of quotes, and you want the matching to match quotes
regardless of type.
In that use case, you are screwed in the current design. Nothing to
be done, to get char-fold search, until you replace all such non-base
chars in the search string with their corresponding base chars.
(And you talk about difficulty remembering? Try remembering the base
char of each equivalence class... Sure, letters and numerals are
easy, but not some others. And we're just getting started.)
> > That would still be the case.
> > The only difference would be that when s?he wants to search for "any
> > kind of e" s?he can use any of the equivalent e-chars.
>
> No, another difference would be that NONE of the other options
> is possible with one character -- all would require a toggle command
> that people may not remember. (I don't.)
NONE of what other options? All of the same search behaviors are
available. That is, you can find any search target that you can
find today, using any search string that you use today.
On the difficulty of toggling char folding:
Do you remember how to toggle case sensitivity? How come you do?
Because you've done it a few times? And if you forget, you use
`C-s C-h'? Or you use `C-h f isearch-forward'? How hard is that?
Anyway, it's not likely I'll convince you to enjoy the feature
yourself. But maybe you can appreciate giving users the choice?
> > The point is that what you say is true currently would still be the
> > case with what is proposed in this thread. The user would continue
> > to be able to search for either any kind of e or for only a specific
> > kind of e.
>
> The user would continue to be able to do this _somehow_, but not as
> now without using a separate toggle command.
Correct. But at least that use case would still be available.
Currently, the use case that the proposal provides for is not
even possible - not never not noway not nohow. That's really
the point: provide that possibility.
* Re: char equivalence classes in search - why not symmetric?
2015-09-10 3:15 ` Drew Adams
@ 2015-09-10 6:57 ` David Kastrup
2015-09-10 15:02 ` Drew Adams
2015-09-10 15:50 ` Richard Stallman
1 sibling, 1 reply; 86+ messages in thread
From: David Kastrup @ 2015-09-10 6:57 UTC (permalink / raw)
To: Drew Adams; +Cc: rms, ulm, bruce.connor.am, juri, eliz, emacs-devel
Drew Adams <drew.adams@oracle.com> writes:
>> > They are equivalence classes. The chars are equivalent when searched
>> > for (with char folding turned on).
>>
>> No, they aren't. For instance, A and Á are not equivalent in search.
>> Searching for A will match Á, but searching for Á will not match A.
>
> Please read what I said: "The chars are equivalent when searched for."
> ^^^^^^^^^^^^^^^^^
They aren't. Searching with the search string "Á" will find "Á" but not
"A".
> I did *not* say, as you say, that they are "equivalent in search."
> I tried to carefully distinguish the two uses of the chars: when used
> as search targets (they are currently equivalent) vs when used in the
> search string (they are not equivalent, currently).
Yes, there is a distinction between search targets and search spec. But
they are different in either category.
--
David Kastrup
* RE: char equivalence classes in search - why not symmetric?
2015-09-10 6:57 ` David Kastrup
@ 2015-09-10 15:02 ` Drew Adams
0 siblings, 0 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-10 15:02 UTC (permalink / raw)
To: David Kastrup; +Cc: rms, ulm, bruce.connor.am, juri, eliz, emacs-devel
> >>> They are equivalence classes. The chars are equivalent when searched
> >>> for (with char folding turned on).
> >>
> >> No, they aren't. For instance, A and Á are not equivalent in search.
> >> Searching for A will match Á, but searching for Á will not match A.
> >
> > Please read what I said: "The chars are equivalent when searched for."
> > ^^^^^^^^^^^^^^^^^
(with char-fold search, i.e., ignoring diacritics - that's the context)
> They aren't. Searching with the search string "Á" will find "Á" but
> not "A".
For anyone who really still does not understand, and anyone who might
be pretending not to understand ;-):
When search is case-insensitive, occurrences of a and A in the
searched text are found equivalently. As search targets, a and A are
equivalent for case-insensitive search.
If you ask to find an occurrence of the first letter of the English
alphabet, and you say that you don't care about case, you find, as you
expect, either a or A, indifferently.
a and A in the searched text are treated the same by case folding.
They form an equivalence class in this context.
But in Emacs, if you put A in the search string then you inhibit, turn
OFF, blow away case-insensitive search - case is no longer folded. So
of course any statement about the behavior of case-fold search is
irrelevant then.
Likewise, for char folding.
When char folding is on, A and Á in the searched text are found
equivalently. As search targets, A and Á are equivalent for char-fold
search.
If you don't care about diacritics, you can expect to find either A or
Á, indifferently, and you do, when char folding is in effect.
A and Á in the searched text are treated the same by char folding.
They form an equivalence class in this context.
But in Emacs, currently, if you put Á in the search string then you
inhibit, turn OFF, blow away char-fold search. So of course any
statement about the behavior of char-fold search is irrelevant then.
a and A for case folding, and A and Á for char folding, form
equivalence classes wrt being found in searched text. Case folding
does NOT apply if you put A in the search string. Char folding does
NOT apply if you put Á in the search string.
Ulrich Müller CANNOT search for his last name using Müller in the
search string and have search ignore diacritics, so that it matches
indifferently Müller and Muller. That is, char folding simply DOES
NOT WORK here - verboten. (He can of course use regexp search to work
around the limitation.)
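A rough sketch of that regexp workaround, with a hand-written character alternative standing in for what char folding would generate. The variable name is made up, and the pattern covers only the precomposed "ü", not "u" followed by a combining diaeresis:

```elisp
;; Hand-written stand-in for a folded "ü": match either "u" or "ü".
(defvar my-muller-regexp "M[uü]ller")

;; Case folding is off here, so only the diacritic is being abstracted.
(let ((case-fold-search nil))
  (list (string-match-p my-muller-regexp "Ulrich Müller")
        (string-match-p my-muller-regexp "Ulrich Muller")))
;; Both elements are non-nil: each spelling matches.
```

Interactively, the equivalent is `C-M-s M[uü]ller'.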
> > I did *not* say, as you say, that they are "equivalent in search."
> > I tried to carefully distinguish the two uses of the chars: when used
> > as search targets (they are currently equivalent) vs when used in the
> > search string (they are not equivalent, currently).
>
> Yes, there is a distinction between search targets and search spec. But
> they are different in either category.
Indeed, sigh.
The point of the proposal of this thread is to _allow_ users to search
_using char folding_ regardless of whether there are diacritics in the
search string. They would still be able to use search without char
folding, e.g., to search for Á and find only Á, not also A.
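The case-folding half of the analogy above can be checked from Lisp. This is only a sketch: it uses `string-match-p' rather than Isearch itself, plus `isearch-no-upper-case-p', which, as far as I can tell, is the test Isearch applies to the search string when deciding whether to keep case folding on.

```elisp
;; With case folding on, a and A are found equivalently, whichever
;; of the two is used as the pattern.
(let ((case-fold-search t))
  (list (string-match-p "a" "A")     ; 0
        (string-match-p "A" "a")))   ; 0

;; Isearch (preloaded, so no `require' is needed) checks the search
;; string for uppercase letters.
(isearch-no-upper-case-p "a" nil)    ; t: folding can stay on
(isearch-no-upper-case-p "A" nil)    ; nil: folding gets turned off
```

The matching itself is symmetric. What makes A behave differently in the search string is the separate check in the last two forms.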
* Re: char equivalence classes in search - why not symmetric?
2015-09-10 3:15 ` Drew Adams
2015-09-10 6:57 ` David Kastrup
@ 2015-09-10 15:50 ` Richard Stallman
1 sibling, 0 replies; 86+ messages in thread
From: Richard Stallman @ 2015-09-10 15:50 UTC (permalink / raw)
To: Drew Adams; +Cc: eliz, emacs-devel, ulm, bruce.connor.am, juri
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> > No, they aren't. For instance, A and Á are not equivalent in search.
> > Searching for A will match Á, but searching for Á will not match A.
> Please read what I said: "The chars are equivalent when searched for."
> ^^^^^^^^^^^^^^^^^
I stand corrected. Strictly speaking, that is true. But since the
term's implications could be misleading, let's avoid the word
"equivalence" and say it in other ways.
> That is the proposal of this thread: to make
> them equivalent also in their use in a search string (when char
> folding is turned on).
I think that is a mistake.
> The only explanation I saw from you was that you want the presence
> of an accented char in the search string to automatically turn off
> char folding. That's your preference.
I proposed that. But perhaps making Á match only Á is better.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* RE: char equivalence classes in search - why not symmetric?
2015-09-10 3:12 ` Drew Adams
@ 2015-09-10 21:46 ` Drew Adams
0 siblings, 0 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-10 21:46 UTC (permalink / raw)
To: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1707 bytes --]
Yesterday I said:
> 2. The code I have is not sufficient for everything. You can
> use it to see what the behavior is for single-char entries in the
> char table, which includes accented chars (chars with diacritics).
> But it does not also handle multiple-char entries in the table.
>
> For instance, you can search for "é" and get char folding, but you
> cannot search for "é" and get char folding. The first of these is
> just the char named LATIN SMALL LETTER E WITH ACUTE. The second is
> plain "e" composed with "́" (the char named COMBINING ACUTE ACCENT).
>
> Some more work would be needed to make such combinations work too.
> As I said, I'm no expert on char tables. But the attached code
> should give you a good idea of what is involved.
The attached version seems to take care of this, so you can search
with, say, the decomposition "é" and get the same effect as
searching for the fully composed char "é".
Again, just load the file, to try it out. Remember that M-s '
toggles char folding.
At the end of the file there are a few strings you can use to test.
When you see two consecutive strings there that look the same, the
first is a decomposition, and the second is the same char fully
composed.
For example: "é" "é". (The first string is two chars, however it
might be displayed.)
`C-u C-x =' on the first char of the first string tells you:
LATIN SMALL LETTER E, decomposition: (101) ('e')
and on the second char it tells you:
COMBINING ACUTE ACCENT, decomposition: (769) ('́').
`C-u C-x =' on the single char of the second string tells you:
LATIN SMALL LETTER E WITH ACUTE, decomposition: (101 769) ('e' '́')
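The idea behind handling decompositions, reduced to a minimal standalone sketch: before looking anything up, replace each known decomposition in the search string by its base char. The names `my-decomps' and `my-replace-decomps' are made up for this illustration. In the attached file the list is `char-fold-decomps' and it is filled from the char table, not written by hand.

```elisp
;; Hypothetical stand-in for `char-fold-decomps': conses of a
;; decomposition and its base char, both as strings.
(defvar my-decomps
  (list (cons (string ?e #x301) "e")    ; e + COMBINING ACUTE ACCENT
        (cons (string ?u #x308) "u")))  ; u + COMBINING DIAERESIS

(defun my-replace-decomps (string)
  "Return STRING with each decomposition in `my-decomps' replaced by its base."
  (dolist (decomp my-decomps string)
    (setq string (replace-regexp-in-string
                  (regexp-quote (car decomp)) (cdr decomp)
                  string 'fixed-case 'literal))))

(my-replace-decomps (concat "caf" (string ?e #x301)))  ; "cafe"
```

Once only base chars are left, each one can be looked up in the table as before, so the decomposed form ends up matching the same set of equivalents as the composed one.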
[-- Attachment #2: symmetric-char-fold.el --]
[-- Type: application/octet-stream, Size: 7320 bytes --]
(setq character-fold-search t)
(load-library "character-fold")
(defvar char-fold-decomps ()
"List of conses of a decomposition and its base char.")
(defun update-char-fold-table ()
"Update the value of variable `character-fold-table'.
The new value reflects the current value of `char-fold-symmetric'."
(setq char-fold-decomps ())
(setq character-fold-table
(let* ((equiv (make-char-table 'character-fold-table))
(table (unicode-property-table-internal 'decomposition))
(func (char-table-extra-slot table 1)))
;; Ensure the table is populated.
(map-char-table (lambda (i v) (when (consp i) (funcall func (car i) v table)))
table)
;; Compile a list of all complex chars that each simple char should match.
(map-char-table
(lambda (i dec)
(when (consp dec)
;; Discard a possible formatting tag.
(when (symbolp (car dec))
(setq dec (cdr dec)))
;; Skip trivial cases like ?a decomposing to (?a).
(unless (and (eq i (car dec)) (not (cdr dec)))
(let ((d dec)
(fold-decomp t)
k found)
(while (and d (not found))
(setq k (pop d))
;; Is k a number or letter, per unicode standard?
(setq found (memq (get-char-code-property k 'general-category)
'(Lu Ll Lt Lm Lo Nd Nl No))))
(if found
;; Check if the decomposition has more than one letter,
;; because then we don't want the first letter to match
;; the decomposition.
(dolist (k d)
(when (and fold-decomp
(memq (get-char-code-property k 'general-category)
'(Lu Ll Lt Lm Lo Nd Nl No)))
(setq fold-decomp nil)))
;; If there's no number or letter on the
;; decomposition, take the first character in it.
(setq found (car-safe dec)))
;; Finally, we only fold multi-char decomposition if at
;; least one of the chars is non-spacing (combining).
(when fold-decomp
(setq fold-decomp nil)
(dolist (k dec)
(when (and (not fold-decomp)
(> (get-char-code-property k 'canonical-combining-class) 0))
(setq fold-decomp t))))
;; Add i to the list of characters that k can
;; represent. Also possibly add its decomposition, so we can
;; match multi-char representations like (format "a%c" 769)
(when (and found (not (eq i k)))
(let ((chr-strgs (cons (char-to-string i) (aref equiv k))))
(aset equiv k (if fold-decomp
(cons (apply #'string dec) chr-strgs)
chr-strgs))))))))
table)
;; Add some manual entries.
(dolist (it '((?\" "＂" "“" "”" "”" "„" "⹂" "〞" "‟" "‟" "❞" "❝"
"❠" "“" "„" "〝" "〟" "🙷" "🙶" "🙸" "«" "»")
(?' "❟" "❛" "❜" "‘" "’" "‚" "‛" "‚" "" "❮" "❯" "‹" "›")
(?` "❛" "‘" "‛" "" "❮" "‹")))
(let ((idx (car it))
(chr-strgs (cdr it)))
(aset equiv idx (append chr-strgs (aref equiv idx)))))
;; --------8<------the only addition----------------
(when char-fold-symmetric
;; Add an entry for each equivalent char.
(let ((others ()))
(map-char-table
(lambda (base v)
(let ((chr-strgs (aref equiv base)))
(when (consp chr-strgs)
(dolist (strg (cdr chr-strgs))
(if (< (length strg) 2)
(push (cons (string-to-char strg) (remove strg chr-strgs)) others)
;; A decomposition. Add it and its base char to `char-fold-decomps'.
(push (cons strg (char-to-string base)) char-fold-decomps))))))
equiv)
(dolist (it others)
(let ((base (car it))
(chr-strgs (cdr it)))
(aset equiv base (append chr-strgs (aref equiv base)))))))
;; --------8<---------------------------------------
;; Convert the lists of characters we compiled into regexps.
(map-char-table
(lambda (i v) (let ((re (regexp-opt (cons (char-to-string i) v))))
(if (consp i)
(set-char-table-range equiv i re)
(aset equiv i re))))
equiv)
equiv)))
(defun character-fold-to-regexp (string &optional lax)
"Return a regexp matching anything that character-folds into STRING.
If `character-fold-search' is nil, just `regexp-quote' STRING.
Otherwise:
Replace any decompositions in `character-fold-table' by their base
chars, so search will match all equivalents. Then replace any chars
in STRING that have entries in `character-fold-table' by their
entries (which are regexps), and replace other chars in STRING by
`regexp-quote' applied to them.
Non-nil LAX means any whitespace char can match any number of times."
(if (not character-fold-search)
(regexp-quote string)
(when char-fold-decomps
(dolist (decomp char-fold-decomps)
(setq string (replace-regexp-in-string
(regexp-quote (car decomp)) (cdr decomp) string 'FIXED-CASE 'LITERAL))))
(apply #'concat
(mapcar (lambda (c) (if (and lax (memq c '(?\s ?\t ?\r ?\n)))
"[ \t\n\r\xa0\x2002\x2d\x200a\x202f\x205f\x3000]+"
(or (aref character-fold-table c)
(regexp-quote (string c)))))
string))))
(defcustom char-fold-symmetric t
"Non-nil means char-fold searching treats equivalent chars the same.
That is, use of any of a set of char-fold equivalent chars in a search
string finds any of them in the text being searched.
If nil then only the \"base\" or \"canonical\" char of the set matches
any of them. The others match only themselves, even when char-folding
is turned on."
:set (lambda (sym defs)
(custom-set-default sym defs)
(update-char-fold-table))
:type 'boolean :group 'isearch)
;; Test by searching for these strings.
;; ("𝚎" "𝙚" "𝘦" "𝗲" "𝖾" "𝖊" "𝕖" "𝔢" "𝓮" "𝒆" "𝑒" "𝐞" "e" "㋎" "㋍" "ⓔ" "⒠"
;; "ⅇ" "ℯ" "ₑ" "ẽ" "ẽ" "ẻ" "ẻ" "ẹ" "ẹ" "ḛ" "ḛ" "ḙ" "ḙ" "ᵉ" "ȩ" "ȩ" "ȇ" "ȇ"
;; "ȅ" "ȅ" "ě" "ě" "ę" "ę" "ė" "ė" "ĕ" "ĕ" "ē" "ē" "ë" "ë" "ê" "ê" "é" "é" "è" "è")
* Re: char equivalence classes in search - why not symmetric?
2015-09-10 3:23 ` Drew Adams
@ 2015-09-11 10:28 ` Richard Stallman
2015-09-11 13:28 ` Stefan Monnier
2015-09-11 16:31 ` Drew Adams
2015-09-11 10:28 ` Richard Stallman
1 sibling, 2 replies; 86+ messages in thread
From: Richard Stallman @ 2015-09-11 10:28 UTC (permalink / raw)
To: Drew Adams; +Cc: stephen, jean.christophe.helary, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> Yes, that is the difference in our views. Sure, "with one character",
> but the flip side is that if you happen to have é in your search string,
> however it got there (e.g. by pasting), then with your preferred behavior
> you *cannot* use your search string to search for "any kind of e".
You are right, for what I originally proposed. It would be like the
current situation with case folding, that you can't paste in a search
string with capital letters and search for it in a case-independent way.
However, in the case of case folding, we solve that by downcasing
text when pasting it into search strings. We could de-accent strings
too when pasting them.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: char equivalence classes in search - why not symmetric?
2015-09-10 3:23 ` Drew Adams
2015-09-11 10:28 ` Richard Stallman
@ 2015-09-11 10:28 ` Richard Stallman
2015-09-11 16:31 ` Drew Adams
1 sibling, 1 reply; 86+ messages in thread
From: Richard Stallman @ 2015-09-11 10:28 UTC (permalink / raw)
To: Drew Adams; +Cc: stephen, jean.christophe.helary, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> > No, another difference would be that NONE of the other options
> > is possible with one character -- all would require a toggle command
> > that people may not remember. (I don't.)
> NONE of what other options?
Currently you can type a single character and do any of these things:
* Search for A with or without any accent.
* Search for Á only.
* Search for À only.
* Search for Â only.
* Search for Ä only.
and likewise for each accented variant of A that exists in Unicode.
With your change, all of those characters would do the same thing:
search for A with or without any accent.
So there would be only one thing you can do in regard to searching for As,
without using some sort of toggling command.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: char equivalence classes in search - why not symmetric?
2015-09-11 10:28 ` Richard Stallman
@ 2015-09-11 13:28 ` Stefan Monnier
2015-09-11 16:33 ` Drew Adams
2015-09-12 15:28 ` Richard Stallman
2015-09-11 16:31 ` Drew Adams
1 sibling, 2 replies; 86+ messages in thread
From: Stefan Monnier @ 2015-09-11 13:28 UTC (permalink / raw)
To: Richard Stallman; +Cc: stephen, jean.christophe.helary, Drew Adams, emacs-devel
> current situation with case folding, that you can't paste in a search
> string with capital letters and search for it in a case-independent way.
Yes, you can: Use M-c to explicitly choose whether to case-fold or not.
> However, in the case of case folding, we solve that by downcasing
> text when pasting it into search strings. We could de-accent strings
> too when pasting them.
Actually, the way we downcase it has problems. E.g. Go to the beginning
of this paragraph (i.e. before "Actually") and do:
C-s C-w M-c
and you end up searching for an exact (non-case-folded) match of
"actually" rather than "Actually", so it won't even match the "Actually"
from which you got it.
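The same effect outside Isearch, as a small sketch. It assumes the default `search-upper-case' value of `not-yanks', under which yanked text is downcased; the variable name is made up.

```elisp
;; What yanking "Actually" into the search string gives by default.
(defvar my-yanked (downcase "Actually"))   ; "actually"

;; With folding on, the downcased string still finds the original word.
(let ((case-fold-search t))
  (string-match-p my-yanked "Actually"))   ; 0

;; After M-c turns folding off, it no longer does.
(let ((case-fold-search nil))
  (string-match-p my-yanked "Actually"))   ; nil
```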
Stefan
* RE: char equivalence classes in search - why not symmetric?
2015-09-11 10:28 ` Richard Stallman
@ 2015-09-11 16:31 ` Drew Adams
2015-09-12 15:29 ` Richard Stallman
0 siblings, 1 reply; 86+ messages in thread
From: Drew Adams @ 2015-09-11 16:31 UTC (permalink / raw)
To: rms; +Cc: stephen, jean.christophe.helary, emacs-devel
> Currently you can type a single character and do any of these things:
> * Search for A with or without any accent.
(1)
> * Search for Á only.
(3)
> * Search for À only.
(4)
> * Search for Â only.
(5)
> * Search for Ä only.
> and likewise for each accented variant of A that exists in Unicode.
(6)
>
> With your change, all of those characters would do the same thing:
> search for A with or without any accent.
>
> So there would be only one thing you can do in regard to searching for
> As, without using some sort of toggling command.
Correct. We are agreeing about the facts, which is good. Per proposal:
With char folding ON:
(1) Search for A with or without any accent.
(2) Search for "each accented variant of A that exists in Unicode",
with or without any accent.
With char folding OFF:
(3), (4), (5), (6) Search for Á, À, Â, Ä only (and likewise for each...)
What the current design misses is possibility (2). You *cannot*
search using "Müller" and find "Muller" etc.
And yes, with the proposal a user explicitly expresses an intention
to search with or without char folding, by hitting a key to turn it
ON/OFF. There is no automatic turn-OFF just because there is a char
with a diacritic in the search string.
What's more, a user option can let users choose which behavior they
prefer, instead of hardcoding that choice into the design. What's
more, a user can (or we could) add a toggle key for flipping that
behavior: both Drew and Richard could quickly switch "designs" on
the fly, if they wanted to.
Why not?
* RE: char equivalence classes in search - why not symmetric?
2015-09-11 10:28 ` Richard Stallman
2015-09-11 13:28 ` Stefan Monnier
@ 2015-09-11 16:31 ` Drew Adams
1 sibling, 0 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-11 16:31 UTC (permalink / raw)
To: rms; +Cc: stephen, jean.christophe.helary, emacs-devel
> > Yes, that is the difference in our views. Sure, "with one
> > character", but the flip side is that if you happen to have
> > é in your search string, however it got there (e.g. by
> > pasting), then with your preferred behavior you *cannot*
> > use your search string to search for "any kind of e".
>
> You are right, for what I originally proposed. It would be like the
> current situation with case folding, that you can't paste in a search
> string with capital letters and search for it in a case-independent way.
Exactly. You cannot. But you can still (thankfully)
explicitly toggle afterward using `M-c', to turn case
folding back on.
> However, in the case of case folding, we solve that by downcasing
> text when pasting it into search strings. We could de-accent strings
> too when pasting them.
Actually, Emacs does *not* do that in the general case for
pasting copied text.
emacs -Q ; `case-fold-search' is t
Copy uppercase A from some text to the kill ring.
In a buffer that has both lowercase and uppercase a's:
C-s M-e C-y ; Paste the uppercase A. It appears uppercase.
C-s ; Only uppercase A's are found.
It does what you describe only when you yank text at point
(e.g., using `C-M-y' or `C-w'). The use case I've been
insisting on is copying some text from anywhere (e.g.,
from a web browser outside Emacs). That text can contain
any chars.
But anyway, I can agree that what you describe (automatic
downcasing and removal of accents) might be a reasonable
possibility to consider.
But what if a user then wants unfolded search, after such
pasting? S?he then needs to toggle anyway.
I don't prefer such a design because it is another automatic
switching of "mode" (folding ON/OFF). It happens behind the
user's back, trying to second-guess what is best for all users
in all contexts: DWIM (do something hardcoded, which someone
thought at design time everyone will want at runtime).
You don't like using a toggle key, which I can understand.
Without toggling, which makes intention explicit/clear, you
must rely on these things:
1. The mode setting the folding behavior (ON/OFF) appropriately
- e.g., Info turns it ON locally, regardless of a user's
customization of global `case-fold-search'. (This is good.)
2. DWIM: an uppercase or accented char in the search string turns
folding OFF. Pasting into the search string strips pasted
text of uppercase and accents. (Good for you, bad for me.)
If that doesn't fit what a user wants in a given context (e.g.,
if s?he wants to search case-sensitively in Info) then s?he
needs to toggle anyway.
I suspect that you might exaggerate the inconvenience, even
for yourself, of having to explicitly toggle when you want
to change state/mode. I use a version of Isearch that
requires such toggling, and in practice I rarely toggle!
Why do I rarely need to toggle? Perhaps because:
* I usually want case-sensitive search.
* The cases where I do not are usually covered by #1:
the mode (e.g. Info) DTRT locally.
At any rate, perhaps we could agree that users can prefer
different behaviors? And let Emacs give them the choice?
At customization time at a minimum, and in some cases via
an on-the-fly toggle key?
(If you don't need such a toggle then you certainly don't
need to worry about memorizing it. ;-))
* RE: char equivalence classes in search - why not symmetric?
2015-09-11 13:28 ` Stefan Monnier
@ 2015-09-11 16:33 ` Drew Adams
2015-09-11 20:59 ` Juri Linkov
2015-09-12 15:28 ` Richard Stallman
1 sibling, 1 reply; 86+ messages in thread
From: Drew Adams @ 2015-09-11 16:33 UTC (permalink / raw)
To: Stefan Monnier, Richard Stallman
Cc: stephen, jean.christophe.helary, emacs-devel
> > current situation with case folding, that you can't paste in
> > a search string with capital letters and search for it in a
> > case-independent way.
>
> Yes, you can: Use M-c to explicitly choose whether to case-fold or not.
Your "Yes" is really an agreement that no, you cannot, but you can
at least override/cancel Emacs's DWIM behavior, by then using `M-c'
to explicitly turn case-folding back on.
That is, after you figure out that Emacs has turned the tables on
you (and there is no signal that it has - no message telling you
that it is now searching case-sensitively), you can insist that
it go back to the mode you had already chosen: case-insensitive.
And thank goodness Emacs does not remove this possibility of
overriding its second-guessing.
> > However, in the case of case folding, we solve that by downcasing
> > text when pasting it into search strings. We could de-accent
> > strings too when pasting them.
>
> Actually, the way we downcase it has problems. E.g. Go to the
> beginning of this paragraph (i.e. before "Actually") and do:
> C-s C-w M-c and you end up searching for an exact (non-case-folded)
> match of "actually" rather than "Actually", so it won't even match the
> "Actually" from which you got it.
Yes. And see my reply to RMS - if you paste text with an uppercase
letter while editing the search string using `M-e', case-folding is
still turned off automatically.
IOW, the automatic downcasing DWIM is used only when you use `C-M-y'
(or `C-w') to yank some text at point into the search string. What
was said about automatic downcasing is not true for pasting in
general. Which points to another possible source of user confusion
(inconsistency).
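
The `C-s C-w M-c' problem quoted above can be modeled in a few lines of Python. This is only a sketch under stated assumptions: the helper names are hypothetical, and the model assumes that text yanked from the buffer is downcased before it enters the search string, as described in this thread.

```python
def yank_word_at_point(word):
    """Model of yanking from the buffer: the yanked text is downcased."""
    return word.lower()


def matches(search_string, text, case_fold):
    """Return True if search_string occurs in text under the given mode."""
    if case_fold:
        return search_string.lower() in text.lower()
    return search_string in text


buffer_text = "Actually, the way we downcase it has problems."

# C-s C-w: the word at point enters the search string, already downcased.
search_string = yank_word_at_point("Actually")

found_folded = matches(search_string, buffer_text, case_fold=True)
# M-c: switch to case-sensitive search; the original case is already lost.
found_exact = matches(search_string, buffer_text, case_fold=False)
```

Once the search string has been downcased, switching to case-sensitive search looks for "actually", which no longer matches the "Actually" it was taken from.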
* Re: char equivalence classes in search - why not symmetric?
2015-09-09 15:12 ` Richard Stallman
@ 2015-09-11 20:50 ` Juri Linkov
0 siblings, 0 replies; 86+ messages in thread
From: Juri Linkov @ 2015-09-11 20:50 UTC (permalink / raw)
To: Richard Stallman; +Cc: drew.adams, eliz, ulm, bruce.connor.am, emacs-devel
> > I'd much rather we focus effort on making the equiv-classes easier to customize.
>
> Let's not call them "equiv-classes", because that term presupposes
> symmetry. (An equivalence relation is symmetric.) Let's call them
> search classes for characters.
A case table can define all of them: upcase, canonicalize, and equivalence
classes, so char-folding could define equiv-classes as well.
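
As a rough illustration of that idea, the Python sketch below builds two of those mappings from a list of character classes: a canonicalize map that sends every member to one representative, and an equivalences map that sends each member to the next one in its class, so the whole class can be recovered by following the cycle. The function names and the sample class are assumptions made for illustration; this is not the Emacs case-table API.

```python
def build_tables(classes):
    """Build (canonicalize, equivalences) dicts from sequences of characters."""
    canonicalize = {}
    equivalences = {}
    for members in classes:
        canon = members[0]  # first member acts as the representative
        for i, ch in enumerate(members):
            canonicalize[ch] = canon
            # Each char points to the next one; the last wraps to the first.
            equivalences[ch] = members[(i + 1) % len(members)]
    return canonicalize, equivalences


def class_of(ch, equivalences):
    """Follow the cycle starting at ch and return every member of its class."""
    seen = [ch]
    nxt = equivalences.get(ch, ch)
    while nxt != ch:
        seen.append(nxt)
        nxt = equivalences[nxt]
    return seen


# Sample class: a, a-acute, a-grave (written as escapes to avoid ambiguity).
canonicalize, equivalences = build_tables(["a\u00e1\u00e0"])
members_from_accented = class_of("\u00e1", equivalences)
```

Because the equivalences map is a cycle, starting from any member reaches all the others, which is the symmetric behavior asked for in this thread.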
* Re: char equivalence classes in search - why not symmetric?
2015-09-11 16:33 ` Drew Adams
@ 2015-09-11 20:59 ` Juri Linkov
2015-09-11 23:11 ` Drew Adams
0 siblings, 1 reply; 86+ messages in thread
From: Juri Linkov @ 2015-09-11 20:59 UTC (permalink / raw)
To: Drew Adams
Cc: stephen, jean.christophe.helary, emacs-devel, Stefan Monnier,
Richard Stallman
> That is, after you figure out that Emacs has turned the tables on
> you (and there is no signal that it has - no message telling you
> that it is now searching case-sensitively),
For the automatic toggling of case-sensitivity we could display the same
message as displayed for manual toggling with ‘M-s c’.
> IOW, the automatic downcasing DWIM is used only when you use `C-M-y'
> (or `C-w') to yank some text at point into the search string. What
> was said about automatic downcasing is not true for pasting in
> general. Which points to another possible source of user confusion
> (inconsistency).
No, pasting is broken too: try to paste the upper case “A” with ‘C-s
C-y’ (isearch-yank-kill) - it's irrecoverably converted to lower case.
* RE: char equivalence classes in search - why not symmetric?
2015-09-11 20:59 ` Juri Linkov
@ 2015-09-11 23:11 ` Drew Adams
0 siblings, 0 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-11 23:11 UTC (permalink / raw)
To: Juri Linkov
Cc: stephen, jean.christophe.helary, emacs-devel, Stefan Monnier,
Richard Stallman
> > That is, after you figure out that Emacs has turned the tables on
> > you (and there is no signal that it has - no message telling you
> > that it is now searching case-sensitively),
>
> For the automatic toggling of case-sensitivity we could display the same
> message as displayed for manual toggling with ‘M-s c’.
Yes, please.
We should also discuss (in another thread, please) other,
additional or better/instead ways to show the user state
changes and the current state.
> > IOW, the automatic downcasing DWIM is used only when you use `C-M-y'
> > (or `C-w') to yank some text at point into the search string. What
> > was said about automatic downcasing is not true for pasting in
> > general. Which points to another possible source of user confusion
> > (inconsistency).
>
> No, pasting is broken too: try to paste the upper case “A” with ‘C-s
> C-y’ (isearch-yank-kill) - it's irrecoverably converted to lower case.
Oh, right. So two ways to get broken pasting in that sense,
and one way to get broken pasting in the other sense (the
example I gave, with `M-e').
* Re: char equivalence classes in search - why not symmetric?
2015-09-11 13:28 ` Stefan Monnier
2015-09-11 16:33 ` Drew Adams
@ 2015-09-12 15:28 ` Richard Stallman
1 sibling, 0 replies; 86+ messages in thread
From: Richard Stallman @ 2015-09-12 15:28 UTC (permalink / raw)
To: Stefan Monnier; +Cc: stephen, jean.christophe.helary, drew.adams, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> Actually, the way we downcase it has problems. E.g. Go to the beginning
> of this paragraph (i.e. before "Actually") and do:
> C-s C-w M-c
> and you end up searching for an exact (non-case-folded) match of
> "actually" rather than "Actually", so it won't even match the "Actually"
> from which you got it.
Perhaps the case-ignore toggle should affect chars as you enter them.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: char equivalence classes in search - why not symmetric?
2015-09-11 16:31 ` Drew Adams
@ 2015-09-12 15:29 ` Richard Stallman
0 siblings, 0 replies; 86+ messages in thread
From: Richard Stallman @ 2015-09-12 15:29 UTC (permalink / raw)
To: Drew Adams; +Cc: stephen, jean.christophe.helary, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> Correct. We are agreeing about the facts, which is good. Per proposal:
> With char folding ON:
> (1) Search for A with or without any accent.
> (2) Search for "each accented variant of A that exists in Unicode",
> with or without any accent.
That seems to be a description of how it works now.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* RE: char equivalence classes in search - why not symmetric?
[not found] ` <<E1ZamkM-0005d4-RN@fencepost.gnu.org>
@ 2015-09-12 15:59 ` Drew Adams
0 siblings, 0 replies; 86+ messages in thread
From: Drew Adams @ 2015-09-12 15:59 UTC (permalink / raw)
To: rms, Drew Adams; +Cc: stephen, jean.christophe.helary, emacs-devel
> > Correct. We are agreeing about the facts, which is good. Per
> > proposal:
> >
> > With char folding ON:
> >
> > (1) Search for A with or without any accent.
> > (2) Search for "each accented variant of A that exists in
> > Unicode", with or without any accent.
>
> That seems to be a description of how it works now.
No, it is not meant to.
#2 means use any of the variants (in the search string) to
search for any of the variants (in the text being searched).
It is the proposal of this thread.
(#2 was admittedly not expressed well: I tried to reuse
your two expressions, and the result of combining them was
clumsy.)
Currently, to search for all of the variants (any of them,
indifferently), you must use the base character in the
search string. You cannot use any of the variants in the
search string, to get the same effect. Only the base char
lets you search for the class, i.e., use char folding.
(But I think you already realized this.)
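
The difference between the current behavior and the proposal can be sketched in Python. This is an illustration only, not how Emacs implements char folding: the function names are hypothetical, and the "base character" is approximated here by Unicode NFKD decomposition with combining marks removed.

```python
import unicodedata


def base_char(ch):
    """Approximate the base character: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", ch)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped or ch


def asymmetric_match(pattern_ch, text_ch):
    """Current behavior as described: only a base char in the pattern folds."""
    if pattern_ch == base_char(pattern_ch):
        return base_char(text_ch) == pattern_ch
    return pattern_ch == text_ch


def symmetric_match(pattern_ch, text_ch):
    """Proposed behavior: any member of a class matches any other member."""
    return base_char(pattern_ch) == base_char(text_ch)


A_ACUTE = "\u00e1"      # a with acute accent
A_DIAERESIS = "\u00e4"  # a with diaeresis

base_finds_variant = asymmetric_match("a", A_ACUTE)
variant_finds_base = asymmetric_match(A_ACUTE, "a")
variant_finds_base_symmetric = symmetric_match(A_ACUTE, "a")
variant_finds_variant_symmetric = symmetric_match(A_ACUTE, A_DIAERESIS)
```

Under the asymmetric rule, plain a finds the accented variant but not the reverse. Under the symmetric rule, every member of the class finds every other member.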
end of thread, other threads:[~2015-09-12 15:59 UTC | newest]
Thread overview: 86+ messages
2015-09-01 15:46 char equivalence classes in search - why not symmetric? Drew Adams
2015-09-01 15:52 ` Davis Herring
2015-09-01 16:51 ` Stefan Monnier
2015-09-01 17:51 ` Drew Adams
2015-09-01 18:40 ` Davis Herring
2015-09-01 19:09 ` Drew Adams
2015-09-01 22:45 ` Juri Linkov
2015-09-02 0:33 ` Drew Adams
2015-09-01 20:10 ` Stephen J. Turnbull
2015-09-01 16:16 ` Eli Zaretskii
[not found] ` <<38061f42-eaf1-47c6-b74d-f676ac952b18@default>
[not found] ` <<83r3miatvl.fsf@gnu.org>
[not found] ` <<21998.29683.916211.867479@a1i15.kph.uni-mainz.de>
[not found] ` <<9A972800-D8F0-4DA8-877E-07D5BDC2E1F9@gmail.com>
2015-09-01 17:50 ` Drew Adams
2015-09-01 18:15 ` Eli Zaretskii
2015-09-01 18:46 ` Drew Adams
2015-09-01 19:19 ` Eli Zaretskii
2015-09-01 20:15 ` Drew Adams
2015-09-08 5:36 ` Ulrich Mueller
2015-09-08 6:04 ` Jean-Christophe Helary
2015-09-08 13:31 ` Stephen J. Turnbull
2015-09-08 14:24 ` Drew Adams
2015-09-08 15:21 ` Stephen J. Turnbull
2015-09-08 16:58 ` Drew Adams
2015-09-08 17:38 ` Stephen J. Turnbull
2015-09-09 22:52 ` Drew Adams
2015-09-10 3:12 ` Drew Adams
2015-09-10 21:46 ` Drew Adams
2015-09-08 20:15 ` Richard Stallman
2015-09-08 20:15 ` Richard Stallman
2015-09-08 21:25 ` Drew Adams
2015-09-09 15:07 ` Richard Stallman
2015-09-09 15:21 ` Drew Adams
2015-09-10 2:03 ` Richard Stallman
2015-09-10 3:23 ` Drew Adams
2015-09-11 10:28 ` Richard Stallman
2015-09-11 13:28 ` Stefan Monnier
2015-09-11 16:33 ` Drew Adams
2015-09-11 20:59 ` Juri Linkov
2015-09-11 23:11 ` Drew Adams
2015-09-12 15:28 ` Richard Stallman
2015-09-11 16:31 ` Drew Adams
2015-09-11 10:28 ` Richard Stallman
2015-09-11 16:31 ` Drew Adams
2015-09-12 15:29 ` Richard Stallman
[not found] ` <<8cf269bc-69d8-4752-8506-de8d992512e1@default>
[not found] ` <<E1ZZPIS-0005rf-DJ@fencepost.gnu.org>
2015-09-08 21:46 ` Drew Adams
[not found] ` <<E1ZZPIT-0005s6-ST@fencepost.gnu.org>
[not found] ` <<da54a6cb-90eb-481d-aa20-acfad612e709@default>
[not found] ` <<E1ZZgxz-0006X2-Bg@fencepost.gnu.org>
[not found] ` <<cb107072-7f90-41fb-9aff-075d50eb65bb@default>
[not found] ` <<E1ZZrCm-0001x4-9a@fencepost.gnu.org>
[not found] ` <<4f3b1db3-d3d2-480f-8662-fbf7c74aa67f@default>
[not found] ` <<E1ZaLZR-0002Bf-8q@fencepost.gnu.org>
[not found] ` <<e77f8e7b-581f-436d-816a-c8daed734ff5@default>
[not found] ` <<E1ZamkM-0005d4-RN@fencepost.gnu.org>
2015-09-12 15:59 ` Drew Adams
2015-09-08 13:39 ` Drew Adams
2015-09-08 21:19 ` Juri Linkov
2015-09-09 15:07 ` Richard Stallman
2015-09-08 15:47 ` Eli Zaretskii
2015-09-08 16:57 ` Drew Adams
2015-09-08 21:20 ` Juri Linkov
2015-09-09 2:42 ` Eli Zaretskii
2015-09-09 11:23 ` Artur Malabarba
2015-09-09 13:32 ` Drew Adams
2015-09-09 15:12 ` Richard Stallman
2015-09-11 20:50 ` Juri Linkov
[not found] ` <<CAAdUY-JMQVsRFku8nwX8JcA9k6Y9sHWoVL6ZC60RHnjoj0cd+Q@mail.gmail.com>
[not found] ` <<E1ZZh2a-0003u6-Fj@fencepost.gnu.org>
2015-09-09 15:22 ` Drew Adams
2015-09-10 2:03 ` Richard Stallman
2015-09-10 3:15 ` Drew Adams
2015-09-10 6:57 ` David Kastrup
2015-09-10 15:02 ` Drew Adams
2015-09-10 15:50 ` Richard Stallman
2015-09-08 20:09 ` Richard Stallman
2015-09-08 21:00 ` Drew Adams
2015-09-09 15:06 ` Richard Stallman
2015-09-08 21:47 ` Ulrich Mueller
2015-09-02 15:34 ` Richard Stallman
2015-09-02 15:56 ` Drew Adams
2015-09-02 16:05 ` Eli Zaretskii
2015-09-02 21:51 ` Jean-Christophe Helary
2015-09-02 22:15 ` Drew Adams
2015-09-03 15:37 ` Richard Stallman
2015-09-03 2:41 ` Eli Zaretskii
2015-09-03 3:08 ` Jean-Christophe Helary
2015-09-03 7:28 ` Artur Malabarba
2015-09-03 17:15 ` Drew Adams
2015-09-07 13:52 ` Nix
2015-09-07 17:07 ` Drew Adams
2015-09-07 23:23 ` Nix
2015-09-08 2:17 ` Richard Stallman
2015-09-03 14:33 ` Eli Zaretskii
2015-09-03 15:00 ` Stefan Monnier
2015-09-03 16:15 ` Drew Adams
2015-09-03 16:23 ` Eli Zaretskii
2015-09-03 16:46 ` Drew Adams
2015-09-02 16:10 ` Artur Malabarba
2015-09-03 19:49 ` Pip Cet
[not found] <<2a7b9134-af2a-462d-af6c-d02bad60bbe8@default>