* Questions about isearch
@ 2015-11-25 18:41 Eli Zaretskii
2015-11-25 19:20 ` Rasmus
` (5 more replies)
0 siblings, 6 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-25 18:41 UTC (permalink / raw)
To: emacs-devel
These questions came out of review and extensive updates of the search
and replace sections of the Emacs manual:
1. Character folding doesn't catch ligatures, such as æ (should it match
the two characters "ae")?
2. It also doesn't match ä (a single character) with ä (2 characters,
which Emacs correctly composes into 1 grapheme cluster). Should it?
3. With the default value t of isearch-hide-immediately, one match in
invisible text is not hidden, and remains on display. To repro:
emacs -Q
C-x C-f etc/NEWS RET
C-c C-q
C-s require C-s <RIGHT>
This leaves the match and its surrounding hidden text on screen. I
can understand the rationale, but the doc string doesn't say anything
about this feature. On the contrary, it says:
Whatever the value, all opened invisible text is hidden again after
exiting the search. ^^^
4. What is the equivalent of case-replace and the letter-case related
behavior of replace commands to character folding? E.g., if the
replace command specifies to replace "foo" with "bar", and we found
"föo", should we replace it with "bär" or something, by analogy with
letter-case behavior?
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-25 18:41 Questions about isearch Eli Zaretskii
@ 2015-11-25 19:20 ` Rasmus
2015-11-25 20:02 ` Steinar Bang
2015-11-25 20:10 ` Eli Zaretskii
2015-11-25 20:14 ` Artur Malabarba
` (4 subsequent siblings)
5 siblings, 2 replies; 94+ messages in thread
From: Rasmus @ 2015-11-25 19:20 UTC (permalink / raw)
To: emacs-devel
Hi,
Eli Zaretskii <eliz@gnu.org> writes:
> These questions came out of review and extensive updates of the search
> and replace sections of the Emacs manual:
>
> 1. Character folding doesn't catch ligatures, such as æ (should it match
> the two characters "ae")?
In Danish I would not consider this a ligature, but a separate letter. It
can be written as ae, however. Thus, it would probably be nice to match
it via ’ae’. But where to stop? How about ’å’ (matched by ’a’)? Should
it be captured by "aa"? Ø by ’oe’? There’s also ’œ’...
Probably there’s lots of these weird cases.
> 2. It also doesn't match ä (a single character) with ä (2 characters,
> which Emacs correctly composes into 1 grapheme cluster). Should it?
This reminds me: UTF-8 "stroked through a" (a̶) is also displayed as ä
rather than as a struck-through a in Emacs on my system. But this is probably
a different issue.
Thanks,
Rasmus
--
Together we'll stand, divided we'll fall
* Re: Questions about isearch
2015-11-25 19:20 ` Rasmus
@ 2015-11-25 20:02 ` Steinar Bang
2015-11-26 14:46 ` Richard Stallman
2015-11-25 20:10 ` Eli Zaretskii
1 sibling, 1 reply; 94+ messages in thread
From: Steinar Bang @ 2015-11-25 20:02 UTC (permalink / raw)
To: emacs-devel
>>>>> Rasmus <rasmus@gmx.us>:
> Hi,
> Eli Zaretskii <eliz@gnu.org> writes:
>> These questions came out of review and extensive updates of the search
>> and replace sections of the Emacs manual:
>>
>> 1. Character folding doesn't catch ligatures, such as æ (should it match
>> the two characters "ae")?
> In Danish I would not consider this a ligature, but a separate letter. It
> can be written as ae, however.
Hm... could this happen other than when transcribing a Danish name
containing "æ" to an alphabet without Danish letters...?
> Thus, it would probably be nice to match it via ’ae’.
Speaking for the Norwegians: probably not!
> But where to stop? How about ’å’ (matched by ’a’)?
Absolutely not!
> Should it be captured by "aa"?
Actually perhaps yes, but only for names, and only if the locale is
Norwegian (and presumably also Danish).
Actually, considering the limitations, probably not.
> Ø by ’oe’?
No. For Norwegian the situation would be similar to "ae", and it would
be for a case that is increasingly going away: having to transcribe a
name in USASCII only.
But it would make sense to make it search for "ö", because it rarely but
_may_ be used instead of "ø" in Norwegian, and there are cases where
Norwegian words differ only on "ø" vs. "ö" (funnily enough less so on
"æ" vs. "ä"... an informal observation by me is that many words spelled
with "ä" in Swedish are spelled with "e" in Norwegian (and pronounced
with an "æ" sound)).
> There’s also ’œ’...
Which is French for (more or less) the same sound as "ø"
* Re: Questions about isearch
2015-11-25 19:20 ` Rasmus
2015-11-25 20:02 ` Steinar Bang
@ 2015-11-25 20:10 ` Eli Zaretskii
2015-11-25 20:41 ` Mike Kupfer
1 sibling, 1 reply; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-25 20:10 UTC (permalink / raw)
To: Rasmus; +Cc: emacs-devel
> From: Rasmus <rasmus@gmx.us>
> Date: Wed, 25 Nov 2015 20:20:20 +0100
>
> > 1. Character folding doesn't catch ligatures, such as æ (should it match
> > the two characters "ae")?
>
> In Danish I would not consider this a ligature, but a separate letter. It
> can be written as ae, however. Thus, it would probably be nice to match
> it via ’ae’. But where to stop? How about ’å’ (matched by ’a’)? Should
> it be captured by "aa"? Ø by ’oe’? There’s also ’œ’...
>
> Probably there’s lots of these weird cases.
Please read the node "Lax Search" in the Emacs manual. That ship
sailed several months ago, and Emacs already supports "character
folding", and thus yes, 'a' matches 'å' (and also 'ä' and 'á' and 'ǎ'
and many others). We don't make these matches language dependent,
because Emacs is a multi-lingual environment, and most text is not
tagged with a particular language. So we use language-independent
folding, and AFAIU "ae" should have matched 'æ' under the rules we
use. But it doesn't. (Similarly "ff" and 'ﬀ' and others.)
> > 2. It also doesn't match ä (a single character) with ä (2 characters,
> > which Emacs correctly composes into 1 grapheme cluster). Should it?
>
> This reminds me: UTF-8 "stroked through a" (a̶) is also displayed as ä
> rather than as a struck-through a in Emacs on my system. But this is probably
> a different issue.
Display is a different issue, indeed.
* Re: Questions about isearch
2015-11-25 18:41 Questions about isearch Eli Zaretskii
2015-11-25 19:20 ` Rasmus
@ 2015-11-25 20:14 ` Artur Malabarba
2015-11-25 20:30 ` Marcin Borkowski
` (2 more replies)
2015-11-25 23:15 ` Mike Kupfer
` (3 subsequent siblings)
5 siblings, 3 replies; 94+ messages in thread
From: Artur Malabarba @ 2015-11-25 20:14 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
On 25 Nov 2015 6:41 pm, "Eli Zaretskii" <eliz@gnu.org> wrote:
> 1. Character folding doesn't catch ligatures, such as æ (should it match
> the two characters "ae")?
I've no idea. It would be easy to add.
Those who use ligatures need to tell us whether that makes sense.
> 2. It also doesn't match ä (a single character) with ä (2 characters,
> which Emacs correctly composes into 1 grapheme cluster). Should it?
Possibly. Since they look the same, might make things easier on users. But
I wouldn't know as I've never seen the second version used anywhere.
> 4. What is the equivalent of case-replace and the letter-case related
> behavior of replace commands to character folding? E.g., if the
> replace command specifies to replace "foo" with "bar", and we found
> "föo", should we replace it with "bär" or something, by analogy with
> letter-case behavior?
I don't think we should do that. Case replacement makes sense because the
way you capitalize a word is frequently (though not always) independent of
the word itself. That's not the case with char folding. At least in
Portuguese, accents only go in very specific places, and I would _never_
want emacs to add an accent to the replacement text just because the word
being replaced happened to have an accent.
* Re: Questions about isearch
2015-11-25 20:14 ` Artur Malabarba
@ 2015-11-25 20:30 ` Marcin Borkowski
2015-11-25 20:38 ` Eli Zaretskii
2015-11-25 20:36 ` Eli Zaretskii
2015-11-26 16:08 ` Rasmus
2 siblings, 1 reply; 94+ messages in thread
From: Marcin Borkowski @ 2015-11-25 20:30 UTC (permalink / raw)
To: bruce.connor.am; +Cc: Eli Zaretskii, emacs-devel
On 2015-11-25, at 21:14, Artur Malabarba <bruce.connor.am@gmail.com> wrote:
> On 25 Nov 2015 6:41 pm, "Eli Zaretskii" <eliz@gnu.org> wrote:
>> 1. Character folding doesn't catch ligatures, such as æ (should it match
>> the two characters "ae")?
>
> I've no idea. It would be easy to add.
> Those who use ligatures need to tell us whether that makes sense.
I'm not sure whether this is relevant, but a place where ligatures come
up naturally is TeX's pdf files, which can be isearched with pdf-tools.
Currently, searching for "fi" when the document contains the
corresponding ligature Just Works™. I'm not sure what would happen in
case of e.g. a result of a pdf->text conversion.
>> 4. What is the equivalent of case-replace and the letter-case related
>> behavior of replace commands to character folding? E.g., if the
>> replace command specifies to replace "foo" with "bar", and we found
>> "föo", should we replace it with "bär" or something, by analogy with
>> letter-case behavior?
>
> I don't think we should do that. Case replacement makes sense because the
> way you capitalize a word is frequently (though not always) independent of
> the word itself. That's not the case with char folding. At least in
> Portuguese, accents only go in very specific places, and I would _never_
> want emacs to add an accent to the replacement text just because the word
> being replaced happened to have an accent.
+1. In Polish, e.g. "a" and "ą" (or "n" and "ń", etc.) are different
letters, representing different sounds, and possibly changing the
meaning of a word (for instance, "kat" is an executioner and "kąt" is an
angle).
Best,
--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University
* Re: Questions about isearch
2015-11-25 20:14 ` Artur Malabarba
2015-11-25 20:30 ` Marcin Borkowski
@ 2015-11-25 20:36 ` Eli Zaretskii
2015-11-25 21:49 ` Artur Malabarba
2015-11-27 12:03 ` Artur Malabarba
2015-11-26 16:08 ` Rasmus
2 siblings, 2 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-25 20:36 UTC (permalink / raw)
To: bruce.connor.am; +Cc: emacs-devel
> Date: Wed, 25 Nov 2015 20:14:06 +0000
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: emacs-devel <emacs-devel@gnu.org>
>
> > 1. Character folding doesn't catch ligatures, such as æ (should it match
> > the two characters "ae")?
>
> I've no idea. It would be easy to add.
No, I meant to ask why it doesn't work already. AFAIU, the
decomposition of ﬀ is "ff":
(get-char-code-property ?ﬀ 'decomposition)
=> (compat 102 102)
but searching for 'f' doesn't match the ligature. (æ doesn't have a
decomposition in the Unicode database, so maybe it's a different
case.)
> Those who use ligatures need to tell us whether that makes sense.
I thought we used decomposition data automatically, no?
> > 2. It also doesn't match ä (a single character) with ä (2 characters,
> > which Emacs correctly composes into 1 grapheme cluster). Should it?
>
> Possibly. Since they look the same, might make things easier on users. But I
> wouldn't know as I've never seen the second version used anywhere.
Once again, the decomposition attribute says we should match them:
(get-char-code-property ?ä 'decomposition)
=> (97 776)
and the second character in ä is U+0308 = 776. Doesn't that say we
should have matched them?
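For readers who want to inspect the same Unicode data outside Emacs, here
is a minimal sketch using Python's standard unicodedata module. Python is
an assumption of this note, not something used in the thread, and the
helper name decomposition_of is hypothetical. It prints the raw
decomposition field for the three characters discussed above.

```python
import unicodedata


def decomposition_of(ch: str) -> str:
    """Return the raw Unicode decomposition field for one character ('' if none)."""
    return unicodedata.decomposition(ch)


# U+FB00 LATIN SMALL LIGATURE FF: compatibility decomposition to "f" "f".
print(decomposition_of("\ufb00"))        # <compat> 0066 0066
# U+00E4 LATIN SMALL LETTER A WITH DIAERESIS: canonical decomposition a + U+0308.
print(decomposition_of("\u00e4"))        # 0061 0308
# U+00E6 LATIN SMALL LETTER AE: no decomposition in the Unicode database.
print(repr(decomposition_of("\u00e6")))  # ''
```

The hexadecimal values 0066, 0061 and 0308 are the decimal 102, 97 and 776
reported by get-char-code-property.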
* Re: Questions about isearch
2015-11-25 20:30 ` Marcin Borkowski
@ 2015-11-25 20:38 ` Eli Zaretskii
2015-11-25 21:58 ` Artur Malabarba
0 siblings, 1 reply; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-25 20:38 UTC (permalink / raw)
To: Marcin Borkowski; +Cc: bruce.connor.am, emacs-devel
> From: Marcin Borkowski <mbork@mbork.pl>
> Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel <emacs-devel@gnu.org>
> Date: Wed, 25 Nov 2015 21:30:17 +0100
>
> >> 4. What is the equivalent of case-replace and the letter-case related
> >> behavior of replace commands to character folding? E.g., if the
> >> replace command specifies to replace "foo" with "bar", and we found
> >> "föo", should we replace it with "bär" or something, by analogy with
> >> letter-case behavior?
> >
> > I don't think we should do that. Case replacement makes sense because the
> > way you capitalize a word is frequently (though not always) independent of
> > the word itself. That's not the case with char folding. At least in
> > Portuguese, accents only go in very specific places, and I would _never_
> > want emacs to add an accent to the replacement text just because the word
> > being replaced happened to have an accent.
>
> +1. In Polish, e.g. "a" and "ą" (or "n" and "ń", etc.) are different
> letters, representing different sounds, and possibly changing the
> meaning of a word (for instance, "kat" is an executioner and "kąt" is an
> angle).
But replacement is all about _changing_ text, so this argument doesn't
seem to be applicable.
* Re: Questions about isearch
2015-11-25 20:10 ` Eli Zaretskii
@ 2015-11-25 20:41 ` Mike Kupfer
2015-11-25 20:56 ` Eli Zaretskii
0 siblings, 1 reply; 94+ messages in thread
From: Mike Kupfer @ 2015-11-25 20:41 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Eli Zaretskii wrote:
> Please read the node "Lax Search" in the Emacs manual.
Is that something new in Emacs 25? I didn't find it in 24.5.
mike
* Re: Questions about isearch
2015-11-25 20:41 ` Mike Kupfer
@ 2015-11-25 20:56 ` Eli Zaretskii
0 siblings, 0 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-25 20:56 UTC (permalink / raw)
To: Mike Kupfer; +Cc: emacs-devel
> From: Mike Kupfer <m.kupfer@acm.org>
> cc: emacs-devel@gnu.org
> Date: Wed, 25 Nov 2015 12:41:52 -0800
>
> Eli Zaretskii wrote:
>
> > Please read the node "Lax Search" in the Emacs manual.
>
> Is that something new in Emacs 25?
Yes.
* Re: Questions about isearch
2015-11-25 20:36 ` Eli Zaretskii
@ 2015-11-25 21:49 ` Artur Malabarba
2015-11-26 3:34 ` Eli Zaretskii
2015-11-27 12:03 ` Artur Malabarba
1 sibling, 1 reply; 94+ messages in thread
From: Artur Malabarba @ 2015-11-25 21:49 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
On 25 Nov 2015 8:36 pm, "Eli Zaretskii" <eliz@gnu.org> wrote:
>
> > Date: Wed, 25 Nov 2015 20:14:06 +0000
> > From: Artur Malabarba <bruce.connor.am@gmail.com>
> > Cc: emacs-devel <emacs-devel@gnu.org>
> >
> > > 1. Character folding doesn't catch ligatures, such as æ (should it match
> > > the two characters "ae")?
> >
> > I've no idea. It would be easy to add.
>
> No, I meant to ask why it doesn't work already. AFAIU, the
> decomposition of ﬀ is "ff":
>
> (get-char-code-property ?ﬀ 'decomposition)
> => (compat 102 102)
>
> but searching for 'f' doesn't match the ligature. (æ doesn't have a
> decomposition in the Unicode database, so maybe it's a different
> case.)
I see. I thought this was a case of adding an adhoc rule.
I'll have to look into it over the weekend to see why f doesn't match ﬀ.
> > > 2. It also doesn't match ä (a single character) with ä (2 characters,
> > > which Emacs correctly composes into 1 grapheme cluster). Should it?
> >
> > Possibly. Since they look the same, might make things easier on users. But I
> > wouldn't know as I've never seen the second version used anywhere.
>
> Once again, the decomposition attribute says we should match them:
>
> (get-char-code-property ?ä 'decomposition)
> => (97 776)
>
> and the second character in ä is U+0308 = 776. Doesn't that say we
> should have matched them?
That's different. Currently we use the decomposition attribute to decide
that "a" should match ä. Our approach so far has been that searching for
the "easy to type" characters should match the "hard to type" characters,
but searching for the "hard to type" characters will only match the
character itself. So right now it is working as intended.
We can (and I think we should) extend that last case so that searching for
the "hard to type" characters will only match the character itself or its
exact decomposition.
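The asymmetric rule described above, including the proposed extension, can
be sketched roughly as follows. This is an illustration in Python, not the
Emacs implementation: the names MARK, fold_regex and fold_search are
hypothetical, and the sketch simplifies by normalizing the text to NFD and
by treating only the U+0300..U+036F block as combining marks.

```python
import re
import unicodedata

# Combining Diacritical Marks block only; a simplification, other mark ranges exist.
MARK = "[\u0300-\u036f]"


def fold_regex(query):
    """Compile QUERY into a regex meant to be run against NFD-normalized text."""
    q = unicodedata.normalize("NFD", query)
    parts = []
    i = 0
    while i < len(q):
        # Collect one base character plus the combining marks typed after it.
        j = i + 1
        while j < len(q) and unicodedata.combining(q[j]):
            j += 1
        cluster = q[i:j]
        if len(cluster) == 1:
            # "Easy to type": a bare character also matches its accented forms.
            parts.append(re.escape(cluster) + MARK + "*")
        else:
            # "Hard to type": match exactly these marks, composed or decomposed.
            parts.append(re.escape(cluster) + "(?!" + MARK + ")")
        i = j
    return re.compile("".join(parts))


def fold_search(query, text):
    """Return True if QUERY is found in TEXT under the rules sketched above."""
    return fold_regex(query).search(unicodedata.normalize("NFD", text)) is not None


print(fold_search("a", "\u00e4"))        # True: plain a finds precomposed ä
print(fold_search("\u00e4", "a\u0308"))  # True: ä finds its exact decomposition
print(fold_search("\u00e4", "a"))        # False: ä does not find plain a
```

Normalizing the text first is only one way to get the "itself or its exact
decomposition" behaviour; a real implementation could equally build an
alternation of the two spellings.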
* Re: Questions about isearch
2015-11-25 20:38 ` Eli Zaretskii
@ 2015-11-25 21:58 ` Artur Malabarba
2015-11-25 23:04 ` Mike Kupfer
2015-11-26 13:28 ` Steinar Bang
0 siblings, 2 replies; 94+ messages in thread
From: Artur Malabarba @ 2015-11-25 21:58 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
On 25 Nov 2015 8:38 pm, "Eli Zaretskii" <eliz@gnu.org> wrote:
> > >> 4. What is the equivalent of case-replace and the letter-case related
> > >> behavior of replace commands to character folding?
> But replacement is all about _changing_ text, so this argument doesn't
> seem to be applicable.
Just to be clear. If Emacs tries to be clever about accents when I'm
replacing text, it will do the wrong thing at least 100% of the time in
Portuguese text. :-)
* Re: Questions about isearch
2015-11-25 21:58 ` Artur Malabarba
@ 2015-11-25 23:04 ` Mike Kupfer
2015-11-26 3:40 ` Eli Zaretskii
2015-11-26 13:28 ` Steinar Bang
1 sibling, 1 reply; 94+ messages in thread
From: Mike Kupfer @ 2015-11-25 23:04 UTC (permalink / raw)
To: bruce.connor.am, Eli Zaretskii; +Cc: emacs-devel
Artur Malabarba wrote:
> Just to be clear. If Emacs tries to be clever about accents when I'm
> replacing text, it will do the wrong thing at least 100% of the time in
> Portuguese text. :-)
To give a more concrete example, if I try to replace "papa" ("pope" in
Italian) with "Francis", I would not want Emacs to also replace (or even
suggest replacing) "papà" ("dad" in Italian) with "Francis".
mike
* Re: Questions about isearch
2015-11-25 18:41 Questions about isearch Eli Zaretskii
2015-11-25 19:20 ` Rasmus
2015-11-25 20:14 ` Artur Malabarba
@ 2015-11-25 23:15 ` Mike Kupfer
2015-11-26 14:45 ` Richard Stallman
` (2 subsequent siblings)
5 siblings, 0 replies; 94+ messages in thread
From: Mike Kupfer @ 2015-11-25 23:15 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Eli Zaretskii wrote:
> 2. It also doesn't match ä (a single character) with ä (2 characters,
> which Emacs correctly composes into 1 grapheme cluster). Should it?
They should match IMO. The difference between composed and decomposed
characters should be an implementation detail that's not exposed to
users.
mike
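The composed and decomposed spellings mentioned above are canonically
equivalent in Unicode, which is why hiding the difference from users is
feasible. A minimal sketch in Python (an assumption of this note; the
helper name canonically_equal is hypothetical) compares the two after
normalizing both to NFC:

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u00e4"      # ä as one code point
decomposed = "a\u0308"   # a followed by U+0308 COMBINING DIAERESIS

print(composed == decomposed)                   # False: different code points
print(canonically_equal(composed, decomposed))  # True: same after normalization
```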
* Re: Questions about isearch
2015-11-25 21:49 ` Artur Malabarba
@ 2015-11-26 3:34 ` Eli Zaretskii
0 siblings, 0 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-26 3:34 UTC (permalink / raw)
To: bruce.connor.am; +Cc: emacs-devel
> Date: Wed, 25 Nov 2015 21:49:58 +0000
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: emacs-devel <emacs-devel@gnu.org>
>
> > > > 2. It also doesn't match ä (a single character) with ä (2 characters,
> > > > which Emacs correctly composes into 1 grapheme cluster). Should it?
> > >
> > > > Possibly. Since they look the same, might make things easier on users. But I
> > > > wouldn't know as I've never seen the second version used anywhere.
> >
> > Once again, the decomposition attribute says we should match them:
> >
> > (get-char-code-property ?ä 'decomposition)
> > => (97 776)
> >
> > and the second character in ä is U+0308 = 776. Doesn't that say we
> > should have matched them?
>
> That's different. Currently we use the decomposition attribute to decide that
> "a" should match ä. Our approach so far has been that searching for the "easy
> to type" characters should match the "hard to type" characters, but searching
> for the "hard to type" characters will only match the character itself. So
> right now it is working as intended.
>
> We can (and I think we should) extend that last case so that searching for the
> "hard to type" characters will only match the character itself or its exact
> decomposition.
The first part (matching only itself) already works, AFAICS. If the
latter doesn't require too deep changes, I think we should do that for
Emacs 25.1, because it would be confusing not to have that.
Thanks.
* Re: Questions about isearch
2015-11-25 23:04 ` Mike Kupfer
@ 2015-11-26 3:40 ` Eli Zaretskii
2015-11-27 19:50 ` Mike Kupfer
0 siblings, 1 reply; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-26 3:40 UTC (permalink / raw)
To: Mike Kupfer; +Cc: bruce.connor.am, emacs-devel
> From: Mike Kupfer <m.kupfer@acm.org>
> cc: emacs-devel <emacs-devel@gnu.org>
> Date: Wed, 25 Nov 2015 15:04:37 -0800
>
> To give a more concrete example, if I try to replace "papa" ("pope" in
> Italian) with "Francis", I would not want Emacs to also replace (or even
> suggest replacing) "papà" ("dad" in Italian) with "Francis".
By default, Emacs won't. But if you set replace-character-fold
non-nil, it will.
* Re: Questions about isearch
2015-11-25 21:58 ` Artur Malabarba
2015-11-25 23:04 ` Mike Kupfer
@ 2015-11-26 13:28 ` Steinar Bang
1 sibling, 0 replies; 94+ messages in thread
From: Steinar Bang @ 2015-11-26 13:28 UTC (permalink / raw)
To: emacs-devel
>>>>> Artur Malabarba <bruce.connor.am@gmail.com>:
> Just to be clear. If Emacs tries to be clever about accents when I'm
> replacing text, it will do the wrong thing at least 100% of the time in
> Portuguese text. :-)
Ditto for Norwegian. (Well perhaps not 100% of the time, since there
are only 3 special letters, but still...)
* Re: Questions about isearch
2015-11-25 18:41 Questions about isearch Eli Zaretskii
` (2 preceding siblings ...)
2015-11-25 23:15 ` Mike Kupfer
@ 2015-11-26 14:45 ` Richard Stallman
2015-11-27 0:43 ` Juri Linkov
2015-11-27 8:02 ` Andreas Röhler
5 siblings, 0 replies; 94+ messages in thread
From: Richard Stallman @ 2015-11-26 14:45 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> 1. Character folding doesn't catch ligatures, such as æ (should it match
> the two characters "ae")?
> 2. It also doesn't match ä (a single character) with ä (2 characters,
> which Emacs correctly composes into 1 grapheme cluster). Should it?
This might be a good thing to poll the users about.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: Questions about isearch
2015-11-25 20:02 ` Steinar Bang
@ 2015-11-26 14:46 ` Richard Stallman
2015-11-26 16:22 ` Eli Zaretskii
0 siblings, 1 reply; 94+ messages in thread
From: Richard Stallman @ 2015-11-26 14:46 UTC (permalink / raw)
To: Steinar Bang; +Cc: emacs-devel
> > In Danish I would not consider this a ligature, but a separate letter. It
> > can be written as ae, however.
> Hm... could this happen other than when transcribing a Danish name
> containing "æ" to an alphabet without Danish letters...?
> > Thus, it would probably be nice to match it via ’ae’.
> Speaking for the Norwegians: probably not!
> > But where to stop? How about ’å’ (matched by ’a’)?
> Absolutely not!
> > Should it be captured by "aa"?
> Actually perhaps yes, but only for names, and only if the locale is
> Norwegian (and presumably also Danish).
> Actually, considering the limitations, probably not.
It seems that perhaps we need these correspondences to depend
on the language in use.
That's true for case conversion as well. For instance the way
to upcase 'i' is 'I' in most languages, but in Turkish it's a
character I can't find a way to enter in Emacs.
It seems to me that we want to introduce a concept of current language
which would control these things, and also the language for spell checking,
and maybe some other things.
In some cases, the current language is determined by which characters
appear. That would work fine for scripts that are used for just one
language. It would be hard to do that for Latin scripts, though.
For Latin scripts one might always have to specify it explicitly,
but it could be specified by a file local variable or other such
per-file customization mechanism.
The language environment, which already exists, is something
different. It controls how to recognize character codings, and
therefore has to be global. The current language should be per-buffer
and perhaps should vary between parts of a buffer. So they can't
be the same thing.
* Re: Questions about isearch
2015-11-25 20:14 ` Artur Malabarba
2015-11-25 20:30 ` Marcin Borkowski
2015-11-25 20:36 ` Eli Zaretskii
@ 2015-11-26 16:08 ` Rasmus
2 siblings, 0 replies; 94+ messages in thread
From: Rasmus @ 2015-11-26 16:08 UTC (permalink / raw)
To: emacs-devel
Artur Malabarba <bruce.connor.am@gmail.com> writes:
>> 2. It also doesn't match ä (a single character) with ä (2 characters,
>> which Emacs correctly composes into 1 grapheme cluster). Should it?
>
> Possibly. Since they look the same, might make things easier on users. But
> I wouldn't know as I've never seen the second version used anywhere.
Based on how they look on my screen (superscripted but underlined), I'd
use these symbols for addresses in Spain, e.g. in Catalan,
Carrer del Cabanes 15, 2ö, 3ä
secondO piso, tercerA puerta.
Rasmus
--
It was you, Jezebel, it was you
* Re: Questions about isearch
2015-11-26 14:46 ` Richard Stallman
@ 2015-11-26 16:22 ` Eli Zaretskii
2015-11-26 20:46 ` Per Starbäck
2015-11-27 6:37 ` Richard Stallman
0 siblings, 2 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-26 16:22 UTC (permalink / raw)
To: rms; +Cc: sb, emacs-devel
> From: Richard Stallman <rms@gnu.org>
> Date: Thu, 26 Nov 2015 09:46:09 -0500
> Cc: emacs-devel@gnu.org
>
> It seems that perhaps we need these correspondences to depend
> on the language in use.
>
> That's true for case conversion as well. For instance the way
> to upcase 'i' is 'I' in most languages, but in Turkish it's a
> character I can't find a way to enter in Emacs.
(That character is İ, U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE.)
IMO, it is more important to have language-independent matching in
Emacs. Language-specific rules are also needed in some situations,
but they are secondary for Emacs.
> It seems to me that we want to introduce a concept of current language
It's a problematic concept for Emacs, which is a multi-lingual
environment. For example, what is the "current language" of the
buffer showing this message? It cannot be US English, since it
includes characters not in that language, and can easily include
Turkish words. Or consider the etc/HELLO file.
We could probably have a text property which will specify the
language, but we don't have good means to set such a property. IOW,
where would that information come from?
> which would control these things, and also the language for spell checking,
> and maybe some other things.
Actually, modern spell-checkers can support multiple languages in the
same spell-checking job (in a nutshell, they check dictionaries for
each language they were told to use).
In any case, a spell-checker has a simpler job in this respect: it
checks one word at a time, so all it needs is the language for that
one word. Conceptually, this is much simpler than what Emacs needs.
> In some cases, the current language is determined by which characters
> appear. That would work fine for scripts that are used for just one
> language. It would be hard to do that for Latin scripts, though.
> For Latin scripts one might always have to specify it explicitly,
> but it could be specified by a file local variable or other such
> per-file customization mechanism.
We already know which script each character belongs to:
(aref char-script-table ?a) => latin
But, as you say, this only rarely helps to deduce the language.
> The language environment, which already exists, is something
> different. It controls how to recognize character codings, and
> therefore has to be global. The current language should be per-buffer
> and perhaps should vary between parts of a buffer. So they can't
> be the same thing.
Indeed. But defining the current language of a buffer isn't
sufficient, either, for Emacs.
For that reason, we generally provide language-agnostic sorting,
searching, etc.
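The Turkish case-conversion example above shows the gap between
language-agnostic and language-specific rules. As a rough sketch in Python
(an assumption of this note): the built-in str.upper() is
language-independent, and a Turkish tailoring has to be layered on top.
The function upcase and its language argument are hypothetical, and only
the dotted-i rule is handled.

```python
import unicodedata


def upcase(text, language=None):
    """Upcase TEXT; with language "tr", map i to U+0130 first."""
    if language == "tr":
        # Turkish: lowercase dotted i upcases to capital I with dot above.
        text = text.replace("i", "\u0130")
    return text.upper()


print(unicodedata.name("\u0130"))  # LATIN CAPITAL LETTER I WITH DOT ABOVE
print(upcase("istanbul"))          # ISTANBUL
print(upcase("istanbul", "tr"))    # İSTANBUL
```

Where the language tag would come from is exactly the open question in
this thread; the sketch simply takes it as an argument.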
* Re: Questions about isearch
2015-11-26 16:22 ` Eli Zaretskii
@ 2015-11-26 20:46 ` Per Starbäck
2015-11-26 21:02 ` Eli Zaretskii
2015-11-26 23:18 ` Rasmus
2015-11-27 6:37 ` Richard Stallman
1 sibling, 2 replies; 94+ messages in thread
From: Per Starbäck @ 2015-11-26 20:46 UTC (permalink / raw)
To: emacs-devel@gnu.org; +Cc: Eli Zaretskii, sb, rms
> IMO, it is more important to have language-independent matching in
> Emacs. Language-specific rules are also needed in some situations,
> but they are secondary for Emacs.
>
>> It seems to me that we want to introduce a concept of current language
Yes! The language of a buffer is something I have wished for a long
long time, probably using minor modes. It has primarily been to have
the correct ispell dictionary and to have different abbrevs depending
on language.
With the new search folding it is much more needed.
> It's a problematic concept for Emacs, which is a multi-lingual
> environment. For example, what is the "current language" of the
> buffer showing this message?
It's in English.
> It cannot be US English, since it
> includes characters not in that language, and can easily include
> Turkish words. Or consider the etc/HELLO file.
I don't understand at all what you are saying here. Yes, of course
Turkish words (and any character) can be in an English text. That
doesn't make it false that it is in English. Do you just mean that it
can be hard to determine the language of a text automatically?
> We could probably have a text property which will specify the
> language, but we don't have good means to set such a property. IOW,
> where that information would come from?
I don't envision a text property, but just a value for the buffer,
because it is much easier and good enough for most things. Yes, there
are situations where you might want to differentiate it like that, but
that goes for other things we have in modes as well. (It would
sometimes be nice to get Javascript mode for part of an HTML file
etc.)
So from where do we get it? Normally from the user. Many users mostly
write in a few languages, like Swedish and English to take myself as
an example. What I want is an indication "en" or "sv" somewhere in the
information line and commands to toggle between my favourite
languages.
Sometimes it can be determined automatically. For example when opening
a html file Emacs could look at the "lang" attribute, in a LaTeX file
it could see how you use packages like Babel or Polyglossia. And in
any text file various methods (like n-gram frequencies) can be used to
try to identify the language automatically.
I think the focus should be on buffers being able to have a (natural)
language, and commands to change that. It would be quite sufficient
with:
* a setting listing what languages I normally want to use (the first
one being the default)
* a cycling command that sets the language to the next in that list
(which is a toggle when the list has two entries)
* a command to explicitly set any valid value
Anything else can be done a lot later, and as experiments outside of
the core. Automatic detection is neat, but not really needed. And
exactly what changes the different languages need will be worked out
piece by piece over time in the different language communities. The
important thing is that there is some hook to hang your code on.
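The three pieces listed above could be as small as the following sketch. It is written in Python only to keep it short and checkable, not in Emacs Lisp, and every name in it (Buffer, PREFERRED_LANGUAGES, VALID_LANGUAGES, cycle_language, set_language) is hypothetical and not an existing Emacs facility.

```python
from dataclasses import dataclass

# Hypothetical user setting: the languages I normally use, default first.
PREFERRED_LANGUAGES = ["sv", "en"]

# Hypothetical set of values that set_language accepts.
VALID_LANGUAGES = {"sv", "en", "de", "tr", "la"}


@dataclass
class Buffer:
    """Stand-in for an editor buffer that carries one natural language."""
    name: str
    language: str = PREFERRED_LANGUAGES[0]


def cycle_language(buf, preferred=PREFERRED_LANGUAGES):
    """Set buf.language to the next preferred language, wrapping around.

    With a two-entry list this behaves as a toggle.  A buffer whose
    language is not in the list falls back to the first entry.
    """
    if buf.language in preferred:
        nxt = (preferred.index(buf.language) + 1) % len(preferred)
    else:
        nxt = 0
    buf.language = preferred[nxt]
    return buf.language


def set_language(buf, language):
    """Explicitly set any valid language on the buffer."""
    if language not in VALID_LANGUAGES:
        raise ValueError(f"unknown language: {language!r}")
    buf.language = language
    return buf.language
```

Everything else (automatic detection, per-language behavior) would hang off the single per-buffer value, which is the point of keeping the core this small.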
* Why it is so important, now with the new search folding *
For Scandinavians it is really important, because (with Swedish as
example) åäö are really totally their own letters in the Swedish
alphabet, regardless of their historic origin. To have a search for
"varpa" in a Swedish text find "värpa" or "varpå" would be just wrong.
It would give a strong impression of this being an American program
not meant to be used for Swedish.
An analogue would be finding "jamb" when looking for "iamb" in
English, where I and J are totally different letters, even though they
originally (in Latin) were the same. Or you start an isearch for
"valid" and after the first four letters you are inside "dualism". (U
and V also were the same letter originally.) Confusing and irritating,
and something to make people turn off this search folding which would
be sad, because it's a nice thing to have.
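As a rough illustration of the behavior asked for here, the sketch below folds accents by Unicode decomposition but exempts letters that a given language treats as distinct. It is Python rather than Emacs Lisp, it is not how Emacs implements character folding, and the DISTINCT_LETTERS table and the fold and matches helpers are assumptions made up for the example.

```python
import unicodedata

# Hypothetical per-language table of letters that must never be folded.
# Escapes are used so the entries are certainly precomposed:
# U+00E5 å, U+00E4 ä, U+00F6 ö and their capitals.
DISTINCT_LETTERS = {
    "sv": set("\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6"),
}


def fold(text, language=None):
    """Strip combining marks, except from letters distinct in `language`."""
    keep = DISTINCT_LETTERS.get(language, set())
    out = []
    # Normalize to NFC first so "a" + U+0308 and precomposed "ä" agree.
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed
                           if not unicodedata.combining(c)))
    return "".join(out)


def matches(query, text, language=None):
    """True if the folded query occurs in the folded text."""
    return fold(query, language) in fold(text, language)
```

With no language set, `matches("varpa", "v\u00e4rpa")` is true; with `language="sv"` it is false, while accents that are not Swedish letters (as in "résumé") still fold.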
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-26 20:46 ` Per Starbäck
@ 2015-11-26 21:02 ` Eli Zaretskii
2015-11-26 21:35 ` Marcin Borkowski
2015-11-27 6:38 ` Richard Stallman
2015-11-26 23:18 ` Rasmus
1 sibling, 2 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-26 21:02 UTC (permalink / raw)
To: Per Starbäck; +Cc: sb, rms, emacs-devel
> Date: Thu, 26 Nov 2015 21:46:49 +0100
> From: Per Starbäck <per@starback.se>
> Cc: rms@gnu.org, Eli Zaretskii <eliz@gnu.org>, sb@dod.no
>
> > It cannot be US English, since it
> > includes characters not in that language, and can easily include
> > Turkish words. Or consider the etc/HELLO file.
>
> I don't understand at all what you are saying here. Yes, of course
> Turkish words (and any character) can be in an English text. That
> doesn't make it false that it is in English. Do you just mean that it
> can be hard to determine the language of a text automatically?
So you will sort Turkish words in an otherwise English text according
to English rules? And spell-check them using an English dictionary?
I don't think so.
A language attribute is something that should control how certain
linguistic operations are tailored. You cannot use one language's
rules with words from another language.
So saying that an email message that is mostly in English, but
includes words and phrases from another language, is in English is not
useful, at least for handling the non-English parts of that message.
And what about etc/HELLO? What language is it in? There are more
non-English words there than English words, and no language in
particular can claim it has the majority of the words, or even enough
of them to count as "many". How do we treat such buffers? What rules of
character folding do we apply there?
> > We could probably have a text property which will specify the
> > language, but we don't have good means to set such a property. IOW,
> > where that information would come from?
>
> I don't envision a text property, but just a value for the buffer,
> because it is much easier and good enough for most things. Yes, there
> are situations where you might want to differentiate it like that, but
> that goes for other things we have in modes as well. (It would
> sometimes be nice to get Javascript mode for part of an HTML file
> etc.)
Having Javascript in HTML just makes it highlighted wrongly. That's
aesthetically bad (and there's a todo item to solve that problem), but
that's not fatal. Trying to treat a word in Japanese according to
Latin rules is much worse.
So I think a per-buffer language attribute is the wrong way to go. We
need a finer granularity.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-26 21:02 ` Eli Zaretskii
@ 2015-11-26 21:35 ` Marcin Borkowski
2015-11-27 7:43 ` Eli Zaretskii
2015-11-27 6:38 ` Richard Stallman
1 sibling, 1 reply; 94+ messages in thread
From: Marcin Borkowski @ 2015-11-26 21:35 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Per Starbäck, sb, rms, emacs-devel
On 2015-11-26, at 22:02, Eli Zaretskii <eliz@gnu.org> wrote:
> And what about etc/HELLO? what language is it in? [...]
And, as I mentioned a few times in various places, there is another case
(and unlike etc/HELLO, it actually happens IRL): bibliographies of
scientific papers. It is not uncommon for such a bibliography to
contain titles/journal names in various languages. (Probably the most
extreme example might be "Funkcialaj Ekvacioj", a Japanese journal with
an Esperanto title and mostly or only English papers.)
AFAIK (though I'm not 100% sure), standard LaTeX tools (i.e., BibLaTeX)
do not support such a situation (which is bad, since it is really needed
to have different hyphenation rules for different parts of these entries
- be glad that Emacs doesn't have to care about those!). Another LaTeX
bibliography tool, amsrefs, handles them well; but it's not very
popular.
For a less extreme example, consider e.g. Latin phrases in the midst of
an English text; not uncommon, for instance in law texts (but not only
there).
> So I think a per-buffer language attribute is the wrong way to go. We
> need a finer granularity.
Yes.
OTOH, my feeling is that a solution which would be correct 85% of the
time is better than no solution.
Best,
--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/pl/Marcin_Borkowski
Wydział Matematyki i Informatyki
Uniwersytet im. Adama Mickiewicza
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-26 20:46 ` Per Starbäck
2015-11-26 21:02 ` Eli Zaretskii
@ 2015-11-26 23:18 ` Rasmus
2015-11-27 7:46 ` Eli Zaretskii
1 sibling, 1 reply; 94+ messages in thread
From: Rasmus @ 2015-11-26 23:18 UTC (permalink / raw)
To: emacs-devel
Per Starbäck <per@starback.se> writes:
> * Why it is so important, now with the new search folding *
>
> For Scandinavians it is really important, because (with Swedish as
> example) åäö are really totally their own letters in the Swedish
> alphabet, regardless of their historic origin. To have a search for
> "varpa" in a Swedish text find "värpa" or "varpå" would be just wrong.
> It would give a strong impression of this being an American program
> not meant to be used for Swedish.
Still, imagine you are stuck in some environment where you do not have a
Scando keyboard (or that you have to find a ’Øystein’ as a non-Scando).
It may be useful to be able to search without having to
access difficult letters (though the TeX input method solves most such
issues for me).
Rasmus
--
Enough with the bla bla!
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-25 18:41 Questions about isearch Eli Zaretskii
` (3 preceding siblings ...)
2015-11-26 14:45 ` Richard Stallman
@ 2015-11-27 0:43 ` Juri Linkov
2015-11-27 8:07 ` Eli Zaretskii
2015-11-27 8:02 ` Andreas Röhler
5 siblings, 1 reply; 94+ messages in thread
From: Juri Linkov @ 2015-11-27 0:43 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
> 3. With the default value t of isearch-hide-immediately, one match in
> invisible text is not hidden, and remains on display. To repro:
>
> emacs -Q
> C-x C-f etc/NEWS RET
> C-c C-q
> C-s require C-s <RIGHT>
>
> This leaves the match and its surrounding hidden text on screen. I
> can understand the rationale, but the doc string doesn't say anything
> about this feature. On the contrary, it says:
>
> Whatever the value, all opened invisible text is hidden again after
> exiting the search. ^^^
I see no answers to your 3rd question, so I wanted to clarify whether
this is something new or can you reproduce the same in older versions?
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-26 16:22 ` Eli Zaretskii
2015-11-26 20:46 ` Per Starbäck
@ 2015-11-27 6:37 ` Richard Stallman
2015-11-27 8:39 ` Eli Zaretskii
1 sibling, 1 reply; 94+ messages in thread
From: Richard Stallman @ 2015-11-27 6:37 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: sb, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> It's a problematic concept for Emacs, which is a multi-lingual
> environment. For example, what is the "current language" of the
> buffer showing this message?
English. That's what I would select for it.
> It cannot be US English, since it
> includes characters not in that language,
Of course it can be. If I were editing that text, I would
not select Turkish for it.
But if you want to select Turkish for it, you could do that.
The user should be able to select any current language
for a given buffer.
> We could probably have a text property which will specify the
> language, but we don't have good means to set such a property. IOW,
> where that information would come from?
We don't need anything that fancy for the initial feature. Just the
ability to select the language for any buffer would be a great start.
Indeed, it would not help you much for etc/HELLO, but so what?
It can be useful for many situations, even if it can't handle
really complex situations any better than now.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-26 21:02 ` Eli Zaretskii
2015-11-26 21:35 ` Marcin Borkowski
@ 2015-11-27 6:38 ` Richard Stallman
2015-11-27 8:53 ` Eli Zaretskii
2015-11-27 16:21 ` raman
1 sibling, 2 replies; 94+ messages in thread
From: Richard Stallman @ 2015-11-27 6:38 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: per, sb, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> So you will sort Turkish words in an otherwise English text according
> to English rules? And spell-check them using an English dictionary?
> I don't think so.
You seem to be trying to design an ultimate, ideal current language
facility. We might want to get there eventually, but I think we
should start with something simple. After all, most buffers have only
one language in them. If there are a few words in another language,
the user probably won't find it hard to deal with the fact that Emacs
does not know they are in another language.
Having a selectable language for the whole buffer
is going to be better than the current situation
where you can't select it.
If you have a table in Turkish in the middle of an English document,
and you want to sort the table, you can switch the buffer to Turkish,
sort, and switch back to English.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-26 21:35 ` Marcin Borkowski
@ 2015-11-27 7:43 ` Eli Zaretskii
0 siblings, 0 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-27 7:43 UTC (permalink / raw)
To: Marcin Borkowski; +Cc: per, sb, rms, emacs-devel
> From: Marcin Borkowski <mbork@mbork.pl>
> Cc: Per Starbäck <per@starback.se>, sb@dod.no, rms@gnu.org,
> emacs-devel@gnu.org
> Date: Thu, 26 Nov 2015 22:35:10 +0100
>
> OTOH, my feeling is that a solution which would be correct 85% of the
> time is better than no solution.
That could well be so, yes. But even for such a partial solution, we
still need gobs of infrastructure we don't have. For example, people
mentioned language-dependent character folding: to be able to do that
we need a large language-dependent database of collation data. That
probably means importing or accessing the Unicode CLDR
(http://cldr.unicode.org/). (We could instead rely on the underlying
libc to provide that, but then it would only work on glibc-based
systems, and would require switching locales each time we need another
language, which is IMO cumbersome, inefficient, and inelegant.) We
cannot seriously speak about language-dependent processing before we
have that data and functions to use it.
Having that infrastructure is also necessary for more sophisticated
language-sensitive processing that I think we should eventually have,
so patches to add such a functionality are welcome.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-26 23:18 ` Rasmus
@ 2015-11-27 7:46 ` Eli Zaretskii
0 siblings, 0 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-27 7:46 UTC (permalink / raw)
To: Rasmus; +Cc: emacs-devel
> From: Rasmus <rasmus@gmx.us>
> Date: Fri, 27 Nov 2015 00:18:35 +0100
>
> Per Starbäck <per@starback.se> writes:
>
> > * Why it is so important, now with the new search folding *
> >
> > For Scandinavians it is really important, because (with Swedish as
> > example) åäö are really totally their own letters in the Swedish
> > alphabet, regardless of their historic origin. To have a search for
> > "varpa" in a Swedish text find "värpa" or "varpå" would be just wrong.
> > It would give a strong impression of this being an American program
> > not meant to be used for Swedish.
>
> Still, imagine you are stuck in some environment where you do not have a
> Scando keyboard (or that you have to find a ’Øystein’ as a non-Scando).
> It may be useful to be able to search without having to
> access difficult letters (though the TeX input method solves most such
> issues for me).
Since there are various needs and situations, this is customizable,
both for the current search and for future searches. So I don't think
this is worth arguing about: Emacs gives you both alternatives.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-25 18:41 Questions about isearch Eli Zaretskii
` (4 preceding siblings ...)
2015-11-27 0:43 ` Juri Linkov
@ 2015-11-27 8:02 ` Andreas Röhler
2015-11-27 8:57 ` Eli Zaretskii
5 siblings, 1 reply; 94+ messages in thread
From: Andreas Röhler @ 2015-11-27 8:02 UTC (permalink / raw)
To: emacs-devel; +Cc: Eli Zaretskii
Am 25.11.2015 um 19:41 schrieb Eli Zaretskii:
> These questions came out of review and extensive updates of the search
> and replace sections of the Emacs manual:
>
> 1. Character folding doesn't catch ligatures, such as æ (should it match
> the two characters "ae")?
>
> 2. It also doesn't match ä (a single character) with ä (2 characters,
> which Emacs correctly composes into 1 grapheme cluster). Should it?
>
> 3. With the default value t of isearch-hide-immediately, one match in
> invisible text is not hidden, and remains on display. To repro:
>
> emacs -Q
> C-x C-f etc/NEWS RET
> C-c C-q
> C-s require C-s <RIGHT>
>
> This leaves the match and its surrounding hidden text on screen. I
> can understand the rationale, but the doc string doesn't say anything
> about this feature. On the contrary, it says:
>
> Whatever the value, all opened invisible text is hidden again after
> exiting the search. ^^^
>
> 4. What is the equivalent of case-replace and the letter-case related
> behavior of replace commands to character folding? E.g., if the
> replace command specifies to replace "foo" with "bar", and we found
> "föo", should we replace it with "bär" or something, by analogy with
> letter-case behavior?
>
>
Considering language-specific special cases worldwide in the core will
run into infinity.
I would expect support for Unicode characters. Mapping them should be the
task of special language modes built on top, e.g. a text-norwegian mode.
In order to support languages, isearch might accept modifiers, like
fill-paragraph does with fill-paragraph-function. Thus the regexps
handed over may change.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 0:43 ` Juri Linkov
@ 2015-11-27 8:07 ` Eli Zaretskii
2015-11-27 23:24 ` Juri Linkov
0 siblings, 1 reply; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-27 8:07 UTC (permalink / raw)
To: Juri Linkov; +Cc: emacs-devel
> From: Juri Linkov <juri@linkov.net>
> Cc: emacs-devel@gnu.org
> Date: Fri, 27 Nov 2015 02:43:54 +0200
>
> > 3. With the default value t of isearch-hide-immediately, one match in
> > invisible text is not hidden, and remains on display. To repro:
> >
> > emacs -Q
> > C-x C-f etc/NEWS RET
> > C-c C-q
> > C-s require C-s <RIGHT>
> >
> > This leaves the match and its surrounding hidden text on screen. I
> > can understand the rationale, but the doc string doesn't say anything
> > about this feature. On the contrary, it says:
> >
> > Whatever the value, all opened invisible text is hidden again after
> > exiting the search. ^^^
>
> I see no answers to your 3rd question, so I wanted to clarify whether
> this is something new or can you reproduce the same in older versions?
I don't know if it's new; it probably isn't. But that isn't my
concern; what triggered the question was solely making sure that
the documentation of this option is correct and accurate.
So the only question that bothers me at this time is whether what I
described is the intended behavior, in which case the doc string needs
to be fixed (and in fact I already fixed it to that effect). Or maybe
the doc string is right and the code is wrong.
Can you tell?
Thanks.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 6:37 ` Richard Stallman
@ 2015-11-27 8:39 ` Eli Zaretskii
0 siblings, 0 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-27 8:39 UTC (permalink / raw)
To: rms; +Cc: sb, emacs-devel
> From: Richard Stallman <rms@gnu.org>
> CC: sb@dod.no, emacs-devel@gnu.org
> Date: Fri, 27 Nov 2015 01:37:52 -0500
>
> The user should be able to select any current language
> for a given buffer.
That would make the feature too tedious, at least to my taste.
But I'm not opposed to having that if someone finds it useful. We
currently lack significant infrastructure to do language-specific
processing; adding such infrastructure would be a good step forward,
and is needed as prerequisite for both the simplistic and the more
sophisticated features, so it should be welcome, I think.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 6:38 ` Richard Stallman
@ 2015-11-27 8:53 ` Eli Zaretskii
2015-11-27 16:21 ` raman
1 sibling, 0 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-27 8:53 UTC (permalink / raw)
To: rms; +Cc: per, sb, emacs-devel
> From: Richard Stallman <rms@gnu.org>
> CC: per@starback.se, emacs-devel@gnu.org, sb@dod.no
> Date: Fri, 27 Nov 2015 01:38:32 -0500
>
> You seem to be trying to design an ultimate, ideal current language
> facility.
Yes.
> We might want to get there eventually, but I think we should start
> with something simple.
IMO, the initial implementation could have only partial support for
multiple languages, but the design should allow for extending that all
the way towards the eventual goal, which cannot possibly be a single
language per buffer, not in Emacs 2X.
> After all, most buffers have only one language in them.
Not in my experience: buffers that combine English and Hebrew are
something I see every day. The simplest example is email: the headers
are in English, while the body is in a mix of Hebrew and Latin
(usually English) words.
> Having a selectable language for the whole buffer is going to be
> better than the current situation where you can't select it.
I agree.
> If you have a table in Turkish in the middle of an English document,
> and you want to sort the table, you can switch the buffer to Turkish,
> sort, and switch back to English.
Language-specific processing is not limited to sorting contiguous
regions in the buffer. This discussion started from Isearch, so the
example which underlines the issues is searching for a string with
character-folding enabled -- this should automatically apply
language-specific rules when it hits a possible match in the Turkish
portion, then switch back to English when the match is in the English
part. Same with spelling -- you'd want flyspell to use the right
language in each portion, without the need to restart the speller
program with another dictionary.
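A minimal sketch of what per-portion rules could look like, assuming the text is already split into runs tagged with a language (where those tags would come from is exactly the open question). This is Python for illustration only; DISTINCT_LETTERS, fold and search_runs are hypothetical names, and matches that span a run boundary are deliberately not handled.

```python
import unicodedata

# Hypothetical table: letters a language treats as distinct (after
# case folding, so lower case only): U+00E5 å, U+00E4 ä, U+00F6 ö.
DISTINCT_LETTERS = {"sv": set("\u00e5\u00e4\u00f6")}


def fold(text, language):
    """Case-fold and strip combining marks, honoring `language` rules."""
    keep = DISTINCT_LETTERS.get(language, set())
    out = []
    for ch in unicodedata.normalize("NFC", text.casefold()):
        if ch in keep:
            out.append(ch)
        else:
            nfd = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in nfd
                               if not unicodedata.combining(c)))
    return "".join(out)


def search_runs(query, runs):
    """Return indexes of runs containing `query` under each run's rules.

    `runs` is a list of (language, text) pairs.
    """
    hits = []
    for i, (language, text) in enumerate(runs):
        if fold(query, language) in fold(text, language):
            hits.append(i)
    return hits


runs = [
    ("en", "His r\u00e9sum\u00e9 cites "),
    ("sv", "Varp\u00e5 beror det"),
    ("en", " from 1913."),
]
print(search_runs("resume", runs))      # English run folds the accents
print(search_runs("varpa", runs))       # Swedish run keeps å distinct
print(search_runs("varp\u00e5", runs))  # exact letter still matches
```

The same loop shape would apply to spell-checking: pick the dictionary per run instead of the folding table.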
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 8:02 ` Andreas Röhler
@ 2015-11-27 8:57 ` Eli Zaretskii
2015-11-27 10:03 ` Artur Malabarba
0 siblings, 1 reply; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-27 8:57 UTC (permalink / raw)
To: Andreas Röhler; +Cc: emacs-devel
> Cc: Eli Zaretskii <eliz@gnu.org>
> From: Andreas Röhler <andreas.roehler@online.de>
> Date: Fri, 27 Nov 2015 09:02:19 +0100
>
> > 4. What is the equivalent of case-replace and the letter-case related
> > behavior of replace commands to character folding? E.g., if the
> > replace command specifies to replace "foo" with "bar", and we found
> > "föo", should we replace it with "bär" or something, by analogy with
> > letter-case behavior?
>
> Considering language special cases worldwide at core will run into infinity.
The number of languages is finite.
> Would expect support of unicode-characters. Mapping them should be the
> task of special language-modes built upon, i.e. a text-norwegian etc.
The question I asked is should we do that _in_general_? If the answer
is YES, then the language-specific rules might tell _how_ to do that
in each case. But that's a different issue.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 8:57 ` Eli Zaretskii
@ 2015-11-27 10:03 ` Artur Malabarba
2015-11-27 10:29 ` Eli Zaretskii
2015-11-29 9:08 ` Andreas Röhler
0 siblings, 2 replies; 94+ messages in thread
From: Artur Malabarba @ 2015-11-27 10:03 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Andreas Röhler, emacs-devel
On 27 Nov 2015 8:57 am, "Eli Zaretskii" <eliz@gnu.org> wrote:
> > Considering language special cases worldwide at core will run into
> > infinity.
>
> The number of languages is finite.
>
> > Would expect support of unicode-characters. Mapping them should be the
> > task of special language-modes built upon, i.e. a text-norwegian etc.
>
> The question I asked is should we do that _in_general_? If the answer
> is YES, then the language-specific rules might tell _how_ to do that
> in each case. But that's a different issue.
I think this topic goes beyond isearch, and people not reading the current
thread might be interested in it. Maybe we should start a new thread
just for this special "language support", starting by listing the goals
of such a feature.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 10:03 ` Artur Malabarba
@ 2015-11-27 10:29 ` Eli Zaretskii
2015-11-27 10:47 ` Artur Malabarba
2015-11-29 9:08 ` Andreas Röhler
1 sibling, 1 reply; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-27 10:29 UTC (permalink / raw)
To: bruce.connor.am; +Cc: andreas.roehler, emacs-devel
> Date: Fri, 27 Nov 2015 10:03:33 +0000
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: emacs-devel <emacs-devel@gnu.org>, Andreas Röhler <andreas.roehler@online.de>
>
> Maybe we should start a new thread just for this special
> "language support". Starting by listing points of what would be the
> goals of such a feature.
Please feel free.
But I must say that my OP was triggered by the need to document the
new features mentioned in NEWS as not yet documented. This is part of
preparing Emacs for the release of v25.1. I asked those questions
because working on the documentation made me wonder what should and
shouldn't work, such that the parts that we intend to work are
documented, and the entire feature is fairly complete and
self-consistent. Design and implementation of new significant
features should take a back seat at this time, if we want to release
Emacs 25.1 any time soon.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 10:29 ` Eli Zaretskii
@ 2015-11-27 10:47 ` Artur Malabarba
0 siblings, 0 replies; 94+ messages in thread
From: Artur Malabarba @ 2015-11-27 10:47 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Andreas Röhler, emacs-devel
On 27 Nov 2015 10:29 am, "Eli Zaretskii" <eliz@gnu.org> wrote:
> But I must say that my OP was triggered by the need to document the
> new features mentioned in NEWS as not yet documented. This is part of
> preparing Emacs for the release of v25.1. I asked those questions
> because working on the documentation made me wonder what should and
> shouldn't work, such that the parts that we intend to work are
> documented, and the entire feature is fairly complete and
> self-consistent. Design and implementation of new significant
> features should take a back seat at this time, if we want to release
> Emacs 25.1 any time soon.
100% agreed.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-25 20:36 ` Eli Zaretskii
2015-11-25 21:49 ` Artur Malabarba
@ 2015-11-27 12:03 ` Artur Malabarba
2015-11-27 14:36 ` Eli Zaretskii
1 sibling, 1 reply; 94+ messages in thread
From: Artur Malabarba @ 2015-11-27 12:03 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Eli Zaretskii <eliz@gnu.org> writes:
> No, I meant to ask why it doesn't work already. AFAIU, the
> decomposition of ﬀ is "ff":
>
> (get-char-code-property ?ﬀ 'decomposition)
> => (compat 102 102)
>
> but searching for 'f' doesn't match the ligature.
It does for me. In this very buffer, if I isearch for 'f' I can get to
the ligature above. Are you sure char-fold was ON when you tested?
> (æ doesn't have a
> decomposition in the Unicode database, so maybe it's a different
> case.)
True. If people think it makes sense, we can add an ad-hoc rule for 'a'
to match 'æ'
>> > 2. It also doesn't match ä (a single character) with ä (2 characters,
>> > which Emacs correctly composes into 1 grapheme cluster). Should it?
Done now.
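The three cases in this message can be checked against the Unicode Character Database outside Emacs as well. The snippet below uses Python's standard unicodedata module purely as an independent cross-check; the describe and code_points helpers are made up for the example and have nothing to do with the isearch code.

```python
import unicodedata


def code_points(text):
    """Render each character of `text` as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in text]


def describe(ch):
    """Collect the raw decomposition field and the NFD/NFKD forms."""
    return {
        "decomposition": unicodedata.decomposition(ch),
        "nfd": code_points(unicodedata.normalize("NFD", ch)),
        "nfkd": code_points(unicodedata.normalize("NFKD", ch)),
    }


# U+FB00 (ff ligature), U+00E6 (ae), U+00E4 (a with diaeresis)
for ch in ("\ufb00", "\u00e6", "\u00e4"):
    print(code_points(ch)[0], describe(ch))
```

The ligature has only a compatibility decomposition (so NFD leaves it alone and NFKD yields two f's), æ has none at all, and ä has a canonical decomposition into a plus U+0308.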
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 12:03 ` Artur Malabarba
@ 2015-11-27 14:36 ` Eli Zaretskii
2015-11-27 16:50 ` Per Starbäck
2015-11-27 16:55 ` Artur Malabarba
0 siblings, 2 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-27 14:36 UTC (permalink / raw)
To: Artur Malabarba; +Cc: emacs-devel
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: emacs-devel@gnu.org
> Date: Fri, 27 Nov 2015 12:03:11 +0000
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > No, I meant to ask why it doesn't work already. AFAIU, the
> > decomposition of ﬀ is "ff":
> >
> > (get-char-code-property ?ﬀ 'decomposition)
> > => (compat 102 102)
> >
> > but searching for 'f' doesn't match the ligature.
>
> It does for me. In this very buffer, if I isearch for 'f' I can get to
> the ligature above.
Right, it does. I think I tried "ff", not "f". Is that supposed to
work?
> Are you sure char-fold was ON when you tested?
It was in "emacs -Q", so yes.
> >> > 2. It also doesn't match ä (a single character) with ä (2 characters,
> >> > which Emacs correctly composes into 1 grapheme cluster). Should it?
>
> Done now.
Thanks.
But if this now works, why doesn't "ff" find ﬀ or vice versa? Isn't
that the same case?
^ permalink raw reply [flat|nested] 94+ messages in thread
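For comparison, a matcher that folds both the search string and the text to the same normal form is symmetric by construction, so "ff" finds the ligature and the ligature finds "ff". The sketch below shows that idea in Python; it is an assumption about one possible approach, not a description of what the isearch code does, and compat_fold and find_folded are hypothetical names.

```python
import unicodedata


def compat_fold(text):
    """Compatibility-decompose and drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def find_folded(query, text):
    """True if `query` occurs in `text` once both sides are folded."""
    return compat_fold(query) in compat_fold(text)
```

Because both sides go through the same function, `find_folded("ff", "o\ufb00ice")` and `find_folded("\ufb00", "office")` are both true, and a decomposed "a" plus U+0308 matches the precomposed ä as well.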
* Re: Questions about isearch
2015-11-27 6:38 ` Richard Stallman
2015-11-27 8:53 ` Eli Zaretskii
@ 2015-11-27 16:21 ` raman
1 sibling, 0 replies; 94+ messages in thread
From: raman @ 2015-11-27 16:21 UTC (permalink / raw)
To: Richard Stallman; +Cc: Eli Zaretskii, per, sb, emacs-devel
Richard Stallman <rms@gnu.org> writes:
1+.
For the specific case of say a Turkish table in an English buffer, etc,
Emacs' facilities of narrow-to-region etc can be used to advantage while
applying language-specific processing.
> [[[ To any NSA and FBI agents reading my email: please consider ]]]
> [[[ whether defending the US Constitution against all enemies, ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
> > So you will sort Turkish words in an otherwise English text according
> > to English rules? And spell-check them using an English dictionary?
> > I don't think so.
>
> You seem to be trying to design an ultimate, ideal current language
> facility. We might want to get there eventually, but I think we
> should start with something simple. After all, most buffers have only
> one language in them. If there are a few words in another language,
> the user probably won't find it hard to deal with the fact that Emacs
> does not know they are in another language.
>
> Having a selectable language for the whole buffer
> is going to be better than the current situation
> where you can't select it.
>
> If you have a table in Turkish in the middle of an English document,
> and you want to sort the table, you can switch the buffer to Turkish,
> sort, and switch back to English.
--
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 14:36 ` Eli Zaretskii
@ 2015-11-27 16:50 ` Per Starbäck
2015-11-27 18:10 ` Artur Malabarba
` (2 more replies)
2015-11-27 16:55 ` Artur Malabarba
1 sibling, 3 replies; 94+ messages in thread
From: Per Starbäck @ 2015-11-27 16:50 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Artur Malabarba, emacs-devel@gnu.org
Oh, I have so many thoughts about this, but I'll stick to the
character folding for now, which is why language setting in Emacs has
become a lot more urgent now than it has been during the previous
years I have wished for this.
As I wrote before, ÅÄÖ are really separate letters in Swedish, just
as separate from A and O as U is from V, or I is from J. I wrote:
> To have a search for
> "varpa" in a Swedish text find "värpa" or "varpå" would be just wrong.
> It would give a strong impression of this being an American program
> not meant to be used for Swedish.
One answer I got was that it's possible to turn this off. Yes, it is,
but defaults are important for what impression you give. I haven't
been active on the list for some time, but when I have expressed
opinions on Emacs here before it has often been not thinking about
myself, but thinking about the students that I teach Emacs, so that
*I* can change settings is not enough for my consideration.
Also character folding is a great feature! I don't want to turn it
off! It's just that it's bad to fold characters that are in no way
seen as variants but totally different letters.
There are few languages using Latin letters where it is like that, so
any universal poll will say that this isn't a big problem. (For
example Germans also use Ä and Ö, but much more seen as A-with-Umlaut
than as something separate.)
But to see how this will be received here, imagine that Emacs came
from the Roman empire. (The empire never ended!) Of course we all know
some Latin, so we have no problems with the menus and help texts being
in Latin, even though we often use it for editing texts in other
languages, like English. Now there's a new version with a new feature,
character folding, and when you (an American user) try to use the new
version of Emacs you happen to edit this text:
Can dualism still be considered valid?
You do a C-s to position yourself at "valid" there, but to your
surprise and irritation you have to type all five letters, because
still at "vali" you are stuck in "dualism" because those imperialistic
Romans think that U and V are the "same" letter. That's just wrong.
So what is the right way out? A possibility to set buffer language
says I. Eli says that a buffer language is not enough:
Eli:
> This discussion started from Isearch, so the
> example which underlines the issues is searching for a string with
> character-folding enabled -- this should automatically apply
> language-specific rules when it hits a possible match in the Turkish
> portion, then switch back to English when the match is in the English
> part.
I don't agree, and see this as an important difference between the
language of a segment and the language of a
document (which I would write a lot more about if I didn't try to
stick just to the character folding issue now).
If you are a non-Swede looking at a text including
: Eli Heckscher referred to this in his "Varpå beror det att några
: människor är rika och andra fattiga?" from 1913.
and do an Isearch for "varpa" with accent folding on, you *should*
find that "Varpå". You see the text with some "Varpa" with some
diacritical mark, of course you should find it with that search. You
can't be expected to know about Swedish preferences just because there
happens to be a short text fragment in Swedish in the text.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 14:36 ` Eli Zaretskii
2015-11-27 16:50 ` Per Starbäck
@ 2015-11-27 16:55 ` Artur Malabarba
2015-11-27 17:52 ` Eli Zaretskii
2015-11-27 21:18 ` Stephen Berman
1 sibling, 2 replies; 94+ messages in thread
From: Artur Malabarba @ 2015-11-27 16:55 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
On 27 Nov 2015 2:36 pm, "Eli Zaretskii" <eliz@gnu.org> wrote:
> > It does for me. In this very buffer, if I isearch for 'f' I can get to
> > the ligature above.
>
> Right, it does. I think I tried "ff", not "f". Is that supposed to
> work?
No. We don't support having multiple characters match a single string.
This is a design limitation. We can (and should) discuss improving this.
But for now I think it should be documented as not supported.
> > >> > 2. It also doesn't match ä (a single character) with ä (2
> > >> > characters, which Emacs correctly composes into 1 grapheme cluster). Should it?
> >
> > Done now.
>
> Thanks.
>
> But if this now works, why doesn't "ff" find ﬀ or vice versa? Isn't
> that the same case?
No. Each one is a different scenario here.
- "ff" not finding ﬀ is a case of multiple chars in the search string
can't be collapsed as a single thing (see above). It's the same reason why
'ä' still doesn't match ä.
- ä now finds 'ä'. Because that is exactly its decomposition.
- ﬀ doesn't find "ff", because the decomposition of ﬀ is not exactly (f f),
it's actually (compat f f). This was a decision, it's not a limitation.
I figured that a character should only match its decomposition if the
decomposition is strictly made of chars. Otherwise you get things like ¹
matching 1 (which I thought we didn't want).
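For readers who want to check the distinction above outside Emacs, here is a minimal sketch using Python's standard unicodedata module. It is an illustration only, not the Emacs implementation, and the helper name describe is made up for this example. It splits a character's Unicode decomposition into its tag and its component characters; an absent tag means the decomposition is canonical.

```python
import unicodedata

def describe(ch):
    """Return (kind, components) for the Unicode decomposition of one character.

    kind is "none" when the character has no decomposition, "canonical" when
    the mapping carries no tag, and otherwise the tag name (compat, super, ...).
    """
    raw = unicodedata.decomposition(ch)
    if not raw:
        return ("none", [])
    fields = raw.split()
    if fields[0].startswith("<"):
        # Tagged mapping such as "<compat> 0066 0066".
        kind, fields = fields[0].strip("<>"), fields[1:]
    else:
        kind = "canonical"
    return (kind, [chr(int(field, 16)) for field in fields])

# Characters discussed in the thread, written as escapes to keep them unambiguous.
for ch in ("\u00e4", "\ufb00", "\u00b9", "\u00df"):
    print("U+%04X" % ord(ch), describe(ch))
```

With this helper, U+00E4 (a with diaeresis) reports a canonical decomposition into "a" plus the combining diaeresis, while the ligature U+FB00 is tagged compat and U+00B9 (superscript one) is tagged super. That is the line a "canonical decompositions only" rule draws.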
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 16:55 ` Artur Malabarba
@ 2015-11-27 17:52 ` Eli Zaretskii
2015-11-27 21:18 ` Stephen Berman
1 sibling, 0 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-27 17:52 UTC (permalink / raw)
To: bruce.connor.am; +Cc: emacs-devel
> Date: Fri, 27 Nov 2015 16:55:45 +0000
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: emacs-devel <emacs-devel@gnu.org>
>
> > Right, it does. I think I tried "ff", not "f". Is that supposed to
> > work?
>
> No. We don't support having multiple characters match a single string.
>
> This is a design limitation. We can (and should) discuss improving this. But
> for now I think it should be documented as not supported.
Is it reasonable to have ä match ä, but not the other way around?
> - ä now finds 'ä'. Because that is exactly its decomposition.
> - ﬀ doesn't find "ff", because the decomposition of ﬀ is not exactly (f f),
> it's actually (compat f f). This was a decision, it's not a limitation.
So you are saying we support canonical decompositions, but not
compatibility decompositions, I see. However, it sounds inconsistent
to me, because searching for a does find ⓐ, although ⓐ's decomposition
is also "not exactly a". I'm afraid it will be hard to explain to the
users why some of these match, while others don't.
Are there any downsides in adding compatibility decompositions to what
character folding supports?
> I figured that a character should only match its decomposition if the
> decomposition is strictly made of chars. Otherwise you get things like ¹
> matching 1 (which I thought we didn't want).
Well, I think we do want that. At least MS Word does that by default,
so it isn't entirely silly or without precedent.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 16:50 ` Per Starbäck
@ 2015-11-27 18:10 ` Artur Malabarba
2015-11-27 18:42 ` Per Starbäck
2015-11-27 21:33 ` raman
2016-02-28 0:27 ` Mathias Dahl
2 siblings, 1 reply; 94+ messages in thread
From: Artur Malabarba @ 2015-11-27 18:10 UTC (permalink / raw)
To: Per Starbäck; +Cc: Eli Zaretskii, emacs-devel
On 27 Nov 2015 4:50 pm, "Per Starbäck" <per.starback@gmail.com> wrote:
> As I wrote before, ÅÄÖ are really separate letters in Swedish,
Do they have their own keys on the keyboard?
In Portuguese, aãá are never interchangeable. Still, I find char folding
very convenient because it saves me keystrokes (ã and á don't get their own
keys so they require two keystrokes to type).
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 18:10 ` Artur Malabarba
@ 2015-11-27 18:42 ` Per Starbäck
0 siblings, 0 replies; 94+ messages in thread
From: Per Starbäck @ 2015-11-27 18:42 UTC (permalink / raw)
To: Artur Malabarba; +Cc: Eli Zaretskii, emacs-devel
>> As I wrote before, ÅÄÖ are really separate letters in Swedish,
>
> Do they have their own keys on the keyboard?
> In Portuguese, aãá are never interchangeable. Still, I find char folding
> very convenient because it saves me keystrokes (ã and á don't get their own
> keys so they require two keystrokes to type).
Yes, of course they have, as they are just as much different letters as I and J.
Being interchangeable is not the same thing. For example "e" and "é"
are not interchangeable in Swedish either, and "ide" and "idé" are
different words, but having "C-s i d e" find "idé" would be good
character folding as "é" is "e" with an accent.
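The behavior asked for here (fold accents in general, but leave a language's own letters alone) can be sketched as follows. This is a hypothetical illustration in Python, not anything Emacs provides: the function fold, its keep parameter, and the SWEDISH_LETTERS set are assumptions invented for the example.

```python
import unicodedata

# Letters that a Swedish-language buffer would treat as distinct base letters
# (a-ring, a-diaeresis, o-diaeresis, lower and upper case), written as escapes.
SWEDISH_LETTERS = frozenset("\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6")

def fold(text, keep=frozenset()):
    """Strip diacritics from text, except for characters listed in keep."""
    out = []
    # Compose first so that a base letter plus combining mark is seen as one unit.
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)

print(fold("id\u00e9", keep=SWEDISH_LETTERS))     # accent on e is folded away
print(fold("v\u00e4rpa", keep=SWEDISH_LETTERS))   # a-diaeresis is kept
print(fold("v\u00e4rpa"))                         # no language set: folded
```

Two strings would then count as a match when their folded forms are equal, so a buffer whose language is Swedish would pass the keep set and a buffer in another language would not.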
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-26 3:40 ` Eli Zaretskii
@ 2015-11-27 19:50 ` Mike Kupfer
2015-11-27 20:06 ` Eli Zaretskii
0 siblings, 1 reply; 94+ messages in thread
From: Mike Kupfer @ 2015-11-27 19:50 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Eli Zaretskii wrote:
> > From: Mike Kupfer <m.kupfer@acm.org>
> > cc: emacs-devel <emacs-devel@gnu.org>
> > Date: Wed, 25 Nov 2015 15:04:37 -0800
> >
> > To give a more concrete example, if I try to replace "papa" ("pope" in
> > Italian) with "Francis", I would not want Emacs to also replace (or even
> > suggest replacing) "papà" ("dad" in Italian) with "Francis".
>
> By default, Emacs won't. But if you set replace-character-fold
> non-nil, it will.
Ah, I see; thanks.
Assuming that there won't be any major changes for 25.1 in this area, I
think it would be helpful for the "Lax Search" Info node to say
something about replace-character-fold, particularly since that node
mentions the relationship between case-fold-search and replace commands.
And maybe replace-character-fold should be listed in the "Search
Customizations" node?
Also, I'm confused about the exact semantics of replace-character-fold.
Its help string says it applies to query-replace. Experimentation shows
that it also applies to replace-string, but not replace-regexp ("[ab]"
does not match "ä" even when replace-character-fold is non-nil). I'm
not sure what's intended here, particularly since replace-regexp does
honor case-fold-search.
And speaking of case-fold-search, it is documented as buffer-local when
set. Should search-default-regexp-mode and replace-character-fold do
the same?
regards,
mike
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 19:50 ` Mike Kupfer
@ 2015-11-27 20:06 ` Eli Zaretskii
2015-11-27 23:57 ` Artur Malabarba
2015-11-28 1:36 ` Mike Kupfer
0 siblings, 2 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-27 20:06 UTC (permalink / raw)
To: Mike Kupfer; +Cc: emacs-devel
> From: Mike Kupfer <m.kupfer@acm.org>
> cc: emacs-devel@gnu.org
> Date: Fri, 27 Nov 2015 11:50:18 -0800
>
> Assuming that there won't be any major changes for 25.1 in this area, I
> think it would be helpful for the "Lax Search" Info node to say
> something about replace-character-fold, particularly since that node
> mentions the relationship between case-fold-search and replace commands.
> And maybe replace-character-fold should be listed in the "Search
> Customizations" node?
There's a companion node "Replacement and Lax Matches", which
describes this variable.
> Also, I'm confused about the exact semantics of replace-character-fold.
> Its help string says it applies to query-replace.
That's a mistake that should be fixed, thanks.
> Experimentation shows that it also applies to replace-string, but
> not replace-regexp ("[ab]" does not match "ä" even when
> replace-character-fold is non-nil). I'm not sure what's intended
> here, particularly since replace-regexp does honor case-fold-search.
Not sure whether this is intended, please submit a bug report.
> And speaking of case-fold-search, it is documented as buffer-local when
> set. Should search-default-regexp-mode and replace-character-fold do
> the same?
No, I don't think so. case-fold-search is not only for searching
commands, so it follows a different logic.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 16:55 ` Artur Malabarba
2015-11-27 17:52 ` Eli Zaretskii
@ 2015-11-27 21:18 ` Stephen Berman
2015-11-28 0:04 ` Artur Malabarba
2015-11-28 5:36 ` Richard Stallman
1 sibling, 2 replies; 94+ messages in thread
From: Stephen Berman @ 2015-11-27 21:18 UTC (permalink / raw)
To: Artur Malabarba; +Cc: Eli Zaretskii, emacs-devel
On Fri, 27 Nov 2015 16:55:45 +0000 Artur Malabarba <bruce.connor.am@gmail.com> wrote:
> On 27 Nov 2015 2:36 pm, "Eli Zaretskii" <eliz@gnu.org> wrote:
>> > It does for me. In this very buffer, if I isearch for 'f' I can get to
>> > the ligature above.
>>
>> Right, it does. I think I tried "ff", not "f". Is that supposed to
>> work?
>
> No. We don't support having multiple characters match a single string.
Is this why "ss" does not match the German letter "ß"? I assume the
reason "s" does not match "ß" is that the latter does not have a
decomposition including "s", whereas the decomposition of e.g. "ﬀ" does
include "f", correct? (Though I actually think that may be the
preferred behavior for the search string "s" when searching German text,
in contrast to the search string "ss", which I think should be able to
find "ß".)
In fact, looking at the value of character-fold-table, it seems to me
that the current implementation of folding based on character
decomposition often yields surprising results: e.g. "f" matches not only
"ﬀ" but also "㎙" and "ﬄ", but "m" and "l" fail to match the latter two,
respectively. I would expect these three search strings either all to
match or all to fail to match all three composed character strings.
Another shortcoming is that the decompositions do not respect
case-folding, e.g. "f" fails to match "ℱ" and "℻" (with case-folding
enabled), whereas "F" does match them, but fails to match "ﬀ",
etc. (also, "A" and "X" fail to match "℻").
Steve Berman
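The expectation described above (every letter of a compatibility decomposition should be findable, regardless of case) can be sketched outside Emacs with Python's standard unicodedata module. This is only an illustration of the desired outcome, not a description of how character-fold-table is built; the names fold_key and folded_contains are made up for the example.

```python
import unicodedata

def fold_key(text):
    """Expand compatibility decompositions, drop combining marks, fold case."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

def folded_contains(needle, haystack):
    """True if needle occurs in haystack once both are folded."""
    return fold_key(needle) in fold_key(haystack)

# U+3399 is the "fm" square sign, U+FB04 the "ffl" ligature,
# U+213B the facsimile sign and U+2131 the script capital F.
print(folded_contains("m", "\u3399"))
print(folded_contains("l", "\ufb04"))
print(folded_contains("f", "\u213b"))
print(folded_contains("F", "\ufb00"))
```

Because the whole expansion is compared rather than only its first character, "f", "m" and "l" behave alike here, and upper and lower case search strings give the same answer. As a side effect of Python's casefold, "ss" also matches the German sharp s in this sketch, even though that letter has no Unicode decomposition.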
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 16:50 ` Per Starbäck
2015-11-27 18:10 ` Artur Malabarba
@ 2015-11-27 21:33 ` raman
2016-02-28 0:27 ` Mathias Dahl
2 siblings, 0 replies; 94+ messages in thread
From: raman @ 2015-11-27 21:33 UTC (permalink / raw)
To: Per Starbäck; +Cc: Eli Zaretskii, Artur Malabarba, emacs-devel@gnu.org
This issue may be better thought of as a character-set issue, rather
than a language issue.
--
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 8:07 ` Eli Zaretskii
@ 2015-11-27 23:24 ` Juri Linkov
2015-11-28 8:09 ` Eli Zaretskii
0 siblings, 1 reply; 94+ messages in thread
From: Juri Linkov @ 2015-11-27 23:24 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
>> > 3. With the default value t of isearch-hide-immediately, one match in
>> > invisible text is not hidden, and remains on display. To repro:
>> >
>> > emacs -Q
>> > C-x C-f etc/NEWS RET
>> > C-c C-q
>> > C-s require C-s <RIGHT>
>> >
>> > This leaves the match and its surrounding hidden text on screen. I
>> > can understand the rationale, but the doc string doesn't say anything
>> > about this feature. On the contrary, it says:
>> >
>> > Whatever the value, all opened invisible text is hidden again after
>> > exiting the search. ^^^
>>
>> I see no answers to your 3rd question, so I wanted to clarify whether
>> this is something new or can you reproduce the same in older versions?
>
> I don't know if it's new; it probably isn't. And that isn't my
> problem; my problem that triggered that question is solely to see that
> the documentation of this option is correct and accurate.
>
> So the only question that bothers me at this time is whether what I
> described is the intended behavior, in which case the doc string needs
> to be fixed (and in fact I already fixed it to that effect). Or maybe
> the doc string is right and the code is wrong.
>
> Can you tell?
I believe this is the intended behavior since the comment of
isearch-clean-overlays says this explicitly:
;; This is called when exiting isearch. It closes the temporary
;; opened overlays, except the ones that contain the latest match.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 20:06 ` Eli Zaretskii
@ 2015-11-27 23:57 ` Artur Malabarba
2015-11-28 1:36 ` Mike Kupfer
1 sibling, 0 replies; 94+ messages in thread
From: Artur Malabarba @ 2015-11-27 23:57 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Mike Kupfer, emacs-devel
On 27 Nov 2015 8:06 pm, "Eli Zaretskii" <eliz@gnu.org> wrote:
> > Experimentation shows that it also applies to replace-string, but
> > not replace-regexp ("[ab]" does not match "ä" even when
> > replace-character-fold is non-nil). I'm not sure what's intended
> > here, particularly since replace-regexp does honor case-fold-search.
>
> Not sure whether this is intended, please submit a bug report.
It's a known limitation. Char folding works by converting a plain string to
a regexp, so it does not work on regexps.
The same happens with isearch. You can't do a char-folding regexp isearch.
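To make the "plain string to regexp" idea concrete, here is a rough sketch in Python. It is not the Emacs code; the table range, the names build_fold_table, FOLD_TABLE and to_fold_regexp, and the choice to consider only canonical decompositions are all assumptions made for the illustration.

```python
import re
import unicodedata

def build_fold_table(first=0x00C0, last=0x017F):
    """Map each base character to the precomposed characters built on it.

    Only canonical decompositions in a small Latin range are considered.
    """
    table = {}
    for codepoint in range(first, last + 1):
        ch = chr(codepoint)
        decomp = unicodedata.decomposition(ch)
        if not decomp or decomp.startswith("<"):
            continue  # no decomposition, or a tagged (compatibility) one
        base = chr(int(decomp.split()[0], 16))
        table.setdefault(base, set()).add(ch)
    return table

FOLD_TABLE = build_fold_table()
COMBINING = "[\u0300-\u036f]*"  # optional trailing combining diacritical marks

def to_fold_regexp(text):
    """Turn a literal search string into a regexp that also matches variants."""
    parts = []
    for ch in text:
        variants = "".join(sorted(FOLD_TABLE.get(ch, ())))
        parts.append("[" + re.escape(ch + variants) + "]" + COMBINING)
    return "".join(parts)

pattern = to_fold_regexp("papa")
print(re.search(pattern, "il pap\u00e0") is not None)
```

Every input character is treated as a literal and replaced by a character class, which is why the approach has nothing sensible to do with an input that is already a regexp such as "[ab]".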
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 21:18 ` Stephen Berman
@ 2015-11-28 0:04 ` Artur Malabarba
2015-11-28 7:49 ` Eli Zaretskii
2015-11-28 16:14 ` Stephen Berman
2015-11-28 5:36 ` Richard Stallman
1 sibling, 2 replies; 94+ messages in thread
From: Artur Malabarba @ 2015-11-28 0:04 UTC (permalink / raw)
To: Stephen Berman; +Cc: Eli Zaretskii, emacs-devel
On 27 Nov 2015 9:18 pm, "Stephen Berman" <stephen.berman@gmx.net> wrote:
> > No. We don't support having multiple characters match a single string.
>
> Is this why "ss" does not match the German letter "ß"?
Indeed.
> I assume the
> reason "s" does not match "ß" is that the latter does not have a
> decomposition including "s", whereas the decomposition of e.g. "ﬀ" does
> include "f", correct?
Yes.
> In fact, looking at the value of character-fold-table, it seems to me
> that the current implementation of folding based on character
> decomposition often yields surprising results: e.g. "f" matches not only
> "ﬀ" but also "㎙" and "ﬄ", but "m" and "l" fail to match the latter two,
> respectively.
This was by choice, and it would be trivial to change. Do others find it
surprising?
> Another shortcoming is that the decompositions do not respect
> case-folding, e.g. "f" fails to match "ℱ" and "℻" (with case-folding
> enabled), whereas "F" does match them, but fails to match "ﬀ".
True. This can be fixed, I think. Could you file a bug report so we don't
forget?
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 20:06 ` Eli Zaretskii
2015-11-27 23:57 ` Artur Malabarba
@ 2015-11-28 1:36 ` Mike Kupfer
2015-11-28 9:28 ` Eli Zaretskii
1 sibling, 1 reply; 94+ messages in thread
From: Mike Kupfer @ 2015-11-28 1:36 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Eli Zaretskii wrote:
> > From: Mike Kupfer <m.kupfer@acm.org>
> > cc: emacs-devel@gnu.org
> > Date: Fri, 27 Nov 2015 11:50:18 -0800
> >
> > Assuming that there won't be any major changes for 25.1 in this area, I
> > think it would be helpful for the "Lax Search" Info node to say
> > something about replace-character-fold, particularly since that node
> > mentions the relationship between case-fold-search and replace commands.
> > And maybe replace-character-fold should be listed in the "Search
> > Customizations" node?
>
> There's a companion node "Replacement and Lax Matches", which
> describes this variable.
Okay, so can a cross-reference to "Replacement and Lax Matches" be added
to the "Lax Search" node? I mean, I did what you suggested in an
earlier reply to someone else: I went straight to the "Lax Search" node.
I didn't see anything in there to give me a clue about
replace-character-fold. I did see the cross-reference to "Replace", but
that was in the context of case-fold-search, which, unlike character
folding, does apply to replace commands. With the current "Lax Search"
text, there's just not enough of a hint to the reader that additional
important information is available.
Also, will the help strings for the search and the replace functions be
updated to mention the relevant character folding variables? They
already mention case-fold-search. (And I'd be less concerned about the
"Lax Search" text if the help string gave me the right clue.)
thanks,
mike
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 21:18 ` Stephen Berman
2015-11-28 0:04 ` Artur Malabarba
@ 2015-11-28 5:36 ` Richard Stallman
2015-11-28 8:33 ` Eli Zaretskii
2015-11-28 8:40 ` Marcin Borkowski
1 sibling, 2 replies; 94+ messages in thread
From: Richard Stallman @ 2015-11-28 5:36 UTC (permalink / raw)
To: Stephen Berman; +Cc: eliz, bruce.connor.am, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Ligatures are a different issue from letters with diacritics. I think
that ideally ligatures should be equivalent, in search, to the
sequence of characters they combine.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 0:04 ` Artur Malabarba
@ 2015-11-28 7:49 ` Eli Zaretskii
2015-11-28 16:14 ` Stephen Berman
1 sibling, 0 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-28 7:49 UTC (permalink / raw)
To: bruce.connor.am; +Cc: stephen.berman, emacs-devel
> Date: Sat, 28 Nov 2015 00:04:33 +0000
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: emacs-devel <emacs-devel@gnu.org>, Eli Zaretskii <eliz@gnu.org>
>
> On 27 Nov 2015 9:18 pm, "Stephen Berman" <stephen.berman@gmx.net> wrote:
> > > No. We don't support having multiple characters match a single string.
> >
> > Is this why "ss" does not match the German letter "ß"?
>
> Indeed.
In fact, ß doesn't have a decomposition at all in the Unicode
database:
(get-char-code-property ?ß 'decomposition) => 223
IOW, it "decomposes" into itself, an indication of no decomposition.
> > I assume the
> > reason "s" does not match "ß" is that the latter does not have a
> > decomposition including "s", whereas the decomposition of e.g. "ﬀ" does
> > include "f", correct?
>
> Yes.
>
> > In fact, looking at the value of character-fold-table, it seems to me
> > that the current implementation of folding based on character
> > decomposition often yields surprising results: e.g. "f" matches not only
> > "ﬀ" but also "㎙" and "ﬄ", but "m" and "l" fail to match the latter two,
> > respectively.
>
> This was by choice, and it would be trivial to change. Do others find it
> surprising?
I do. I think these should match.
> > Another shortcoming is that the decompositions do not respect
> > case-folding, e.g. "f" fails to match "ℱ" and "℻" (with case-folding
> > enabled), whereas "F" does match them, but fails to match "ﬀ".
>
> True. This can be fixed, I think. Could you file a bug report so we don't
> forget?
This should be fixed for v25.1 as well, I think.
Thanks.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 23:24 ` Juri Linkov
@ 2015-11-28 8:09 ` Eli Zaretskii
0 siblings, 0 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-28 8:09 UTC (permalink / raw)
To: Juri Linkov; +Cc: emacs-devel
> From: Juri Linkov <juri@linkov.net>
> Cc: emacs-devel@gnu.org
> Date: Sat, 28 Nov 2015 01:24:36 +0200
>
> I believe this is the intended behavior since the comment of
> isearch-clean-overlays says this explicitly:
>
> ;; This is called when exiting isearch. It closes the temporary
> ;; opened overlays, except the ones that contain the latest match.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Thanks, this means the changes I did in the documentation are TRT.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 5:36 ` Richard Stallman
@ 2015-11-28 8:33 ` Eli Zaretskii
2015-11-28 8:40 ` Marcin Borkowski
1 sibling, 0 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-28 8:33 UTC (permalink / raw)
To: rms; +Cc: stephen.berman, bruce.connor.am, emacs-devel
> From: Richard Stallman <rms@gnu.org>
> CC: bruce.connor.am@gmail.com, eliz@gnu.org, emacs-devel@gnu.org
> Date: Sat, 28 Nov 2015 00:36:21 -0500
>
> Ligatures are a different issue from letters with diacritics. I think
> that ideally ligatures should be equivalent, in search, to the
> sequence of characters they combine.
I agree. I think we should make that work for Emacs 25.1, because
anything else means too much inconsistency, and will be hard to
explain and document.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 5:36 ` Richard Stallman
2015-11-28 8:33 ` Eli Zaretskii
@ 2015-11-28 8:40 ` Marcin Borkowski
2015-11-28 9:46 ` Eli Zaretskii
1 sibling, 1 reply; 94+ messages in thread
From: Marcin Borkowski @ 2015-11-28 8:40 UTC (permalink / raw)
To: rms; +Cc: eliz, Stephen Berman, bruce.connor.am, emacs-devel
On 2015-11-28, at 06:36, Richard Stallman <rms@gnu.org> wrote:
> [[[ To any NSA and FBI agents reading my email: please consider ]]]
> [[[ whether defending the US Constitution against all enemies, ]]]
> [[[ foreign or domestic, requires you to follow Snowden's example. ]]]
>
> Ligatures are a different issue from letters with diacritics. I think
> that ideally ligatures should be equivalent, in search, to the
> sequence of characters they combine.
Watching this discussion, I'm just astonished that no-one complained
(yet?) that searching for "et" does not find "&" (and/or vice versa).
;-)
Best,
--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 1:36 ` Mike Kupfer
@ 2015-11-28 9:28 ` Eli Zaretskii
0 siblings, 0 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-28 9:28 UTC (permalink / raw)
To: Mike Kupfer; +Cc: emacs-devel
> From: Mike Kupfer <m.kupfer@acm.org>
> cc: emacs-devel@gnu.org
> Date: Fri, 27 Nov 2015 17:36:08 -0800
>
> > There's a companion node "Replacement and Lax Matches", which
> > describes this variable.
>
> Okay, so can a cross-reference to "Replacement and Lax Matches" be added
> to the "Lax Search" node?
I added it now.
> Also, will the help strings for the search and the replace functions be
> updated to mention the relevant character folding variables? They
> already mention case-fold-search.
I found no search commands whose doc strings mention case-fold-search.
I did find such references in replace commands, and added the
reference to replace-character-fold there.
Thanks. In the future, please post such suggestions as bug reports
rather than here; if nothing else, that makes it easier to refer to
the discussions in the log messages.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 8:40 ` Marcin Borkowski
@ 2015-11-28 9:46 ` Eli Zaretskii
2015-11-28 10:23 ` Artur Malabarba
0 siblings, 1 reply; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-28 9:46 UTC (permalink / raw)
To: Marcin Borkowski; +Cc: stephen.berman, rms, bruce.connor.am, emacs-devel
> From: Marcin Borkowski <mbork@mbork.pl>
> Date: Sat, 28 Nov 2015 09:40:06 +0100
> Cc: eliz@gnu.org, Stephen Berman <stephen.berman@gmx.net>,
> bruce.connor.am@gmail.com, emacs-devel@gnu.org
>
> Watching this discussion, I'm just astonished that no-one complained
> (yet?) that searching for "et" does not find "&" (and/or vice versa).
Why complain? Emacs lets you customize this feature to do that as
well.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 9:46 ` Eli Zaretskii
@ 2015-11-28 10:23 ` Artur Malabarba
2015-11-28 11:14 ` Eli Zaretskii
` (4 more replies)
0 siblings, 5 replies; 94+ messages in thread
From: Artur Malabarba @ 2015-11-28 10:23 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Stephen Berman, Richard Stallman, emacs-devel
Ok. I'm going to work on the char-folding a little bit more today to
implement support for multi-char matches and to combine it with
case-folding. Hopefully that will iron out the final inconsistencies.
2015-11-28 9:46 GMT+00:00 Eli Zaretskii <eliz@gnu.org>:
>> From: Marcin Borkowski <mbork@mbork.pl>
>> Date: Sat, 28 Nov 2015 09:40:06 +0100
>> Cc: eliz@gnu.org, Stephen Berman <stephen.berman@gmx.net>,
>> bruce.connor.am@gmail.com, emacs-devel@gnu.org
>>
>> Watching this discussion, I'm just astonished that no-one complained
>> (yet?) that searching for "et" does not find "&" (and/or vice versa).
>
> Why complain? Emacs lets you customize this feature to do that as
> well.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 10:23 ` Artur Malabarba
@ 2015-11-28 11:14 ` Eli Zaretskii
2015-11-28 14:41 ` Eli Zaretskii
` (3 subsequent siblings)
4 siblings, 0 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-28 11:14 UTC (permalink / raw)
To: bruce.connor.am; +Cc: stephen.berman, rms, emacs-devel
> Date: Sat, 28 Nov 2015 10:23:12 +0000
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: Marcin Borkowski <mbork@mbork.pl>, Richard Stallman <rms@gnu.org>, Stephen Berman <stephen.berman@gmx.net>,
> emacs-devel <emacs-devel@gnu.org>
>
> Ok. I'm going to work on the char-folding a little bit more today to
> implement support for multi-char matches and to combine it with
> case-folding. Hopefully that will iron out the final inconsistencies.
Thanks.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 10:23 ` Artur Malabarba
2015-11-28 11:14 ` Eli Zaretskii
@ 2015-11-28 14:41 ` Eli Zaretskii
2015-11-28 15:41 ` Artur Malabarba
2015-11-28 16:48 ` character folding future [was: Questions about isearch] Drew Adams
` (2 subsequent siblings)
4 siblings, 1 reply; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-28 14:41 UTC (permalink / raw)
To: bruce.connor.am; +Cc: emacs-devel
> Date: Sat, 28 Nov 2015 10:23:12 +0000
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: Stephen Berman <stephen.berman@gmx.net>, Richard Stallman <rms@gnu.org>,
> emacs-devel <emacs-devel@gnu.org>
>
> Ok. I'm going to work on the char-folding a little bit more today to
> implement support for multi-char matches and to combine it with
> case-folding. Hopefully that will iron out the final inconsistencies.
Maybe you could also take a look at this document:
http://www.unicode.org/reports/tr30/tr30-4.html
(This is a draft of a report that was never approved, but that doesn't
mean it cannot teach us something useful.)
In particular, section 5.2 there mentions several problematic
foldings, which we might consider disabling. For example, the ones
mentioned in 5.2.1 and 5.2.2.
Thanks.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 14:41 ` Eli Zaretskii
@ 2015-11-28 15:41 ` Artur Malabarba
2015-11-28 16:29 ` Artur Malabarba
0 siblings, 1 reply; 94+ messages in thread
From: Artur Malabarba @ 2015-11-28 15:41 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
2015-11-28 14:41 GMT+00:00 Eli Zaretskii <eliz@gnu.org>:
>> Date: Sat, 28 Nov 2015 10:23:12 +0000
>> From: Artur Malabarba <bruce.connor.am@gmail.com>
>> Cc: Stephen Berman <stephen.berman@gmx.net>, Richard Stallman <rms@gnu.org>,
>> emacs-devel <emacs-devel@gnu.org>
>>
>> Ok. I'm going to work on the char-folding a little bit more today to
>> implement support for multi-char matches and to combine it with
>> case-folding. Hopefully that will iron out the final inconsistencies.
I'm running bootstrap now to make sure I didn't break anything. Then I'll push.
> Maybe you could also take a look at this document:
>
> http://www.unicode.org/reports/tr30/tr30-4.html
>
> In particular, section 5.2 there mentions several problematic
> foldings, which we might consider disabling. For example, the ones
> mentioned in 5.2.1 and 5.2.2.
Thanks for the pointer.
None of those really worry me WRT searching. Char folding is supposed
to be convenient at the cost of being unable to distinguish some
strings.
But I guess they could be a problem for query-replace. Someone
replacing 58 with 59 probably doesn't want to replace 5⑧ with 59.
Since folding is disabled by default in query-replace, I think it
would be a bit of a shame to disable these "risky" foldings completely.
Perhaps query-replace could use a different char table, with only a
subset. Or perhaps it would be sufficient to just make this
"danger" very clear in the docstring of `replace-character-fold'.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 0:04 ` Artur Malabarba
2015-11-28 7:49 ` Eli Zaretskii
@ 2015-11-28 16:14 ` Stephen Berman
1 sibling, 0 replies; 94+ messages in thread
From: Stephen Berman @ 2015-11-28 16:14 UTC (permalink / raw)
To: Artur Malabarba; +Cc: Eli Zaretskii, emacs-devel
On Sat, 28 Nov 2015 00:04:33 +0000 Artur Malabarba <bruce.connor.am@gmail.com> wrote:
> On 27 Nov 2015 9:18 pm, "Stephen Berman" <stephen.berman@gmx.net> wrote:
>> > No. We don't support having multiple characters match a single string.
>>
>> Is this why "ss" does not match the German letter "ß"?
>
> Indeed.
>
>> I assume the
>> reason "s" does not match "ß" is that the latter does not have a
>> decomposition including "s", whereas the decomposition of e.g. "ff" does
>> include "f", correct?
>
> Yes.
>
>> In fact, looking at the value of character-fold-table, it seems to me
>> that the current implementation of folding based on character
>> decomposition often yields surprising results: e.g. "f" matches not only
>> "ff" but also "㎙" and "ffl", but "m" and "l" fail to match the latter two,
>> respectively.
>
> This was by choice, and it would be trivial to change. Do others find it surprising?
>
>> Another shortcoming is that the decompositions do not respect
>> case-folding, e.g. "f" fails to match "ℱ" and "℻" (with case-folding
>> enabled), whereas "F" does match them, but fails to match "ff".
>
> True. This can be fixed, I think. Could you file a bug report so we don't forget?
>
>
Although you already said you'd be working on these issues (thanks), I
filed a bug for the record (bug#22038).
Steve Berman
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 15:41 ` Artur Malabarba
@ 2015-11-28 16:29 ` Artur Malabarba
2015-11-28 17:27 ` Eli Zaretskii
2015-11-28 17:44 ` Eli Zaretskii
0 siblings, 2 replies; 94+ messages in thread
From: Artur Malabarba @ 2015-11-28 16:29 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
2015-11-28 15:41 GMT+00:00 Artur Malabarba <bruce.connor.am@gmail.com>:
> 2015-11-28 14:41 GMT+00:00 Eli Zaretskii <eliz@gnu.org>:
>>> Date: Sat, 28 Nov 2015 10:23:12 +0000
>>> From: Artur Malabarba <bruce.connor.am@gmail.com>
>>> Cc: Stephen Berman <stephen.berman@gmx.net>, Richard Stallman <rms@gnu.org>,
>>> emacs-devel <emacs-devel@gnu.org>
>>>
>>> Ok. I'm going to work on the char-folding a little bit more today to
>>> implement support for multi-char matches and to combine it with
>>> case-folding. Hopefully that will iron out the final inconsistencies.
>
> I'm running bootstrap now to make sure I didn't break anything. Then I'll push.
It is now pushed. I changed quite a bit of the logic, so please do
look out for regressions.
Things we do now:
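- 'ä' (one precomposed character) matches 'ä' (a plus a combining diaeresis)
- 'ä' (a plus a combining diaeresis) matches 'ä' (one precomposed character)
- 'a' matches both of them
- 'ff' (two characters) matches 'ff' (the ligature)
- 'ff' (the ligature) does NOT match 'ff' (two characters). This is by
choice, because the decomposition of 'ff' is actually (compat f f). We
can change this choice if desired.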
- `case-fold-search' is respected.
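
The "(compat f f)" remark above can be checked directly. What follows is
a minimal sketch, assuming only the built-in `get-char-code-property';
the helper name `my-show-decomposition' is hypothetical and is not part
of Emacs or of the change that was pushed.

```elisp
;; Hypothetical helper (not part of Emacs): look up the Unicode
;; decomposition property that the folding rules above refer to.
(defun my-show-decomposition (char)
  "Return the Unicode `decomposition' property of CHAR."
  (get-char-code-property char 'decomposition))

;; Precomposed ä (U+00E4): a canonical decomposition, i.e. a list of
;; plain character codes (base letter, then combining diaeresis U+0308).
(my-show-decomposition ?\u00E4)

;; Ligature ff (U+FB00): a compatibility decomposition, so the list
;; starts with the symbol `compat' before the two f characters.
(my-show-decomposition ?\uFB00)
```

Evaluating the two calls in *scratch* shows the difference the message
relies on: only the second list is tagged `compat'.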
^ permalink raw reply [flat|nested] 94+ messages in thread
* character folding future [was: Questions about isearch]
2015-11-28 10:23 ` Artur Malabarba
2015-11-28 11:14 ` Eli Zaretskii
2015-11-28 14:41 ` Eli Zaretskii
@ 2015-11-28 16:48 ` Drew Adams
2015-11-28 18:34 ` Artur Malabarba
2015-12-01 11:34 ` Artur Malabarba
2015-11-29 6:03 ` Questions about isearch Richard Stallman
2015-11-29 9:39 ` Andreas Röhler
4 siblings, 2 replies; 94+ messages in thread
From: Drew Adams @ 2015-11-28 16:48 UTC (permalink / raw)
To: bruce.connor.am, Eli Zaretskii
Cc: Stephen Berman, Richard Stallman, emacs-devel
> Ok. I'm going to work on the char-folding a little bit more today to
> implement support for multi-char matches and to combine it with
> case-folding. Hopefully that will iron out the final inconsistencies.
Thanks for working on this, Artur. I invite you to also
take a look at some code I wrote for this, which I've put
in `character-fold+.el'. It follows a previous discussion.
Any of that, or similar, that gets added to vanilla Emacs
will mean one less thing for me to bother with. ;-)
A description is here:
http://www.emacswiki.org/emacs/CharacterFoldPlus.
The code is here:
http://www.emacswiki.org/emacs/download/character-fold%2b.el
The additions are essentially these:
1. An option, `char-fold-ad-hoc', for the ad hoc char foldings.
Default value: the same ad hoc foldings as vanilla Emacs
(quotation marks).
2. A Boolean option, `char-fold-symmetric', which when non-nil
means that all members of a folding equivalence class are
treated equivalently, whether base char, compositions, or
other strings of chars. This lets you search for e' or é
and find e and any of the other members of its class
(including composition strings). The default value is nil
(off).
3. A general workhorse function, `update-char-fold-table',
that updates the value of variable `character-fold-table'
(from which it was derived). It is used when option
`char-fold-symmetric' is toggled, and it makes use of
options `char-fold-ad-hoc' and `char-fold-symmetric'.
4. `character-fold-to-regexp' is advised, to reflect whether
char folding is currently symmetric.
Library Isearch+ provides a toggle for `char-fold-symmetric',
bound by default to `M-s =' during Isearch.
Another Isearch toggle can be useful when char folding is
symmetric: `M-s h L', which toggles lazy highlighting, which
can slow things down when using symmetric char folding.
The code for `isearch+.el' is here:
http://www.emacswiki.org/emacs/download/isearch%2b.el
Earlier, I invited a discussion about future customization
of character folding (and folding in general). That hasn't
happened, so far. But `char-fold-ad-hoc' could be a start.
One possibility is for an alist option, whose entries would
each be a list (MODES CLASSES), where CLASSES is a list of
char-folding classes such as that of `char-fold-ad-hoc'.
When any of the MODES is current, those CLASSES would be
used by `update-char-fold-table'.
Users could thus:
1. Add their own equivalence classes.
2. Associate any number of such classes with particular modes.
3. Customize the ad hoc classes used by default.
In addition, we could provide the class that abstracts
from diacriticals explicitly, as another, non-customizable
(?) class, so that users could include or exclude it too
wrt specific modes. (Currently it is implicit in char
folding, i.e., hard-coded.)
Letting users exclude the broad diacritical class and
include their own classes would accommodate wanting some
diacritical foldings but not others. With symmetric
folding it should offer considerable flexibility.
Utility functions that do some of the work currently done
by `update-char-fold-table' could be created, to be used
by users to easily create their own diacritical classes.
Currently, that part is still hard-coded (only ad hoc
foldings are open to user customization, so far).
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 16:29 ` Artur Malabarba
@ 2015-11-28 17:27 ` Eli Zaretskii
2015-11-28 17:44 ` Eli Zaretskii
1 sibling, 0 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-28 17:27 UTC (permalink / raw)
To: bruce.connor.am; +Cc: emacs-devel
> Date: Sat, 28 Nov 2015 16:29:00 +0000
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: emacs-devel <emacs-devel@gnu.org>
>
> It is now pushed.
Thanks!
> - 'ff' (two characters) matches 'ff' (the ligature)
> - 'ff' (the ligature) does NOT match 'ff' (two characters). This is by
> choice, because the decomposition of 'ff' is actually (compat f f). We
> can change this choice if desired.
I think the last one should also match. It is very hard to explain
this asymmetry to users.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 16:29 ` Artur Malabarba
2015-11-28 17:27 ` Eli Zaretskii
@ 2015-11-28 17:44 ` Eli Zaretskii
2015-11-28 18:31 ` Artur Malabarba
1 sibling, 1 reply; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-28 17:44 UTC (permalink / raw)
To: bruce.connor.am; +Cc: emacs-devel
> Date: Sat, 28 Nov 2015 16:29:00 +0000
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: emacs-devel <emacs-devel@gnu.org>
>
> It is now pushed. I changed quite a bit of the logic, so please do
> look out for regressions.
Two of the tests are failing for me:
Test character-fold--test-consistency condition:
(invalid-regexp "Regular expression too big")
FAILED 1/4 character-fold--test-consistency
passed 2/4 character-fold--test-fold-to-regexp
Test character-fold--test-lax-whitespace condition:
(invalid-regexp "Regular expression too big")
FAILED 3/4 character-fold--test-lax-whitespace
passed 4/4 character-fold--test-some-defaults
Let me know if I can provide more information.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 17:44 ` Eli Zaretskii
@ 2015-11-28 18:31 ` Artur Malabarba
2015-11-28 18:57 ` Eli Zaretskii
0 siblings, 1 reply; 94+ messages in thread
From: Artur Malabarba @ 2015-11-28 18:31 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 919 bytes --]
On 28 Nov 2015 5:44 pm, "Eli Zaretskii" <eliz@gnu.org> wrote:
> Two of the tests are failing for me:
>
> Test character-fold--test-consistency condition:
> (invalid-regexp "Regular expression too big")
> FAILED 1/4 character-fold--test-consistency
> passed 2/4 character-fold--test-fold-to-regexp
> Test character-fold--test-lax-whitespace condition:
> (invalid-regexp "Regular expression too big")
> FAILED 3/4 character-fold--test-lax-whitespace
> passed 4/4 character-fold--test-some-defaults
>
> Let me know if I can provide more information.
Yes, I was getting this too. I reduced the length of the random strings in
the test from 100 to 50 in order to stop getting this. But it looks like
your system wants it to be even lower.
Can you try reducing it a bit more?
Sadly, if you're forced to make it too small, then we'll have to think of
another way to handle this.
[-- Attachment #2: Type: text/html, Size: 1216 bytes --]
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: character folding future [was: Questions about isearch]
2015-11-28 16:48 ` character folding future [was: Questions about isearch] Drew Adams
@ 2015-11-28 18:34 ` Artur Malabarba
2015-12-01 11:34 ` Artur Malabarba
1 sibling, 0 replies; 94+ messages in thread
From: Artur Malabarba @ 2015-11-28 18:34 UTC (permalink / raw)
To: Drew Adams; +Cc: Eli Zaretskii, Stephen Berman, emacs-devel
2015-11-28 16:48 GMT+00:00 Drew Adams <drew.adams@oracle.com>:
>
> A description is here:
> http://www.emacswiki.org/emacs/CharacterFoldPlus.
> The code is here:
> http://www.emacswiki.org/emacs/download/character-fold%2b.el
Thanks for the links Drew. I'll have a look at your code to see how
you tackled these items.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 18:31 ` Artur Malabarba
@ 2015-11-28 18:57 ` Eli Zaretskii
2015-11-28 20:00 ` Artur Malabarba
0 siblings, 1 reply; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-28 18:57 UTC (permalink / raw)
To: bruce.connor.am; +Cc: emacs-devel
> Date: Sat, 28 Nov 2015 18:31:58 +0000
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: emacs-devel <emacs-devel@gnu.org>
>
> > Test character-fold--test-consistency condition:
> > (invalid-regexp "Regular expression too big")
> > FAILED 1/4 character-fold--test-consistency
> > passed 2/4 character-fold--test-fold-to-regexp
> > Test character-fold--test-lax-whitespace condition:
> > (invalid-regexp "Regular expression too big")
> > FAILED 3/4 character-fold--test-lax-whitespace
> > passed 4/4 character-fold--test-some-defaults
> >
> > Let me know if I can provide more information.
>
> Yes, I was getting this too. I reduced the length of the random strings in the
> test from 100 to 50 in order to stop getting this. But it looks like your
> system wants it to be even lower.
>
> Can you try reducing it a bit more?
This works for me:
diff --git a/test/automated/character-fold-tests.el b/test/automated/character-fold-tests.el
index 3a288b9..cf19584 100644
--- a/test/automated/character-fold-tests.el
+++ b/test/automated/character-fold-tests.el
@@ -37,13 +37,13 @@ character-fold--test-search-with-contents
\f
(ert-deftest character-fold--test-consistency ()
- (dotimes (n 50)
+ (dotimes (n 30)
(let ((w (character-fold--random-word n)))
;; A folded string should always match the original string.
(character-fold--test-search-with-contents w w))))
(ert-deftest character-fold--test-lax-whitespace ()
- (dotimes (n 50)
+ (dotimes (n 40)
(let ((w1 (character-fold--random-word n))
(w2 (character-fold--random-word n))
(search-spaces-regexp "\\s-+"))
^ permalink raw reply related [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 18:57 ` Eli Zaretskii
@ 2015-11-28 20:00 ` Artur Malabarba
2015-11-28 20:08 ` Artur Malabarba
0 siblings, 1 reply; 94+ messages in thread
From: Artur Malabarba @ 2015-11-28 20:00 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
Ok. I think that's still plausible for interactive uses. I'll add a
comment to the docstring warning about the danger of long regexps, and
I'll make sure isearch acts gracefully if such a situation is ever
encountered.
Still, this function can probably be optimized. I'll try to revisit it
before release.
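
One way search code could stay graceful when a folded pattern grows too
large is to trap the `invalid-regexp' error shown in the test output.
This is a minimal sketch under that assumption, not the actual isearch
change; `my-safe-regexp-search' is a hypothetical name.

```elisp
;; Hypothetical wrapper (not part of Emacs): search for REGEXP, but
;; return nil instead of signaling when the regexp compiler rejects
;; the pattern, e.g. with "Regular expression too big".
(defun my-safe-regexp-search (regexp)
  "Search forward for REGEXP from point.
Return the position after the match, or nil if there is no match
or if REGEXP is rejected as invalid."
  (condition-case err
      (re-search-forward regexp nil t)
    (invalid-regexp
     ;; The error data carries the compiler's explanation as a string.
     (message "Search pattern rejected: %s" (cadr err))
     nil)))
```

A caller could use the nil result to fall back to a literal search
rather than aborting the command.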
2015-11-28 18:57 GMT+00:00 Eli Zaretskii <eliz@gnu.org>:
>> Date: Sat, 28 Nov 2015 18:31:58 +0000
>> From: Artur Malabarba <bruce.connor.am@gmail.com>
>> Cc: emacs-devel <emacs-devel@gnu.org>
>>
>> > Test character-fold--test-consistency condition:
>> > (invalid-regexp "Regular expression too big")
>> > FAILED 1/4 character-fold--test-consistency
>> > passed 2/4 character-fold--test-fold-to-regexp
>> > Test character-fold--test-lax-whitespace condition:
>> > (invalid-regexp "Regular expression too big")
>> > FAILED 3/4 character-fold--test-lax-whitespace
>> > passed 4/4 character-fold--test-some-defaults
>> >
>> > Let me know if I can provide more information.
>>
>> Yes, I was getting this too. I reduced the length of the random strings in the
>> test from 100 to 50 in order to stop getting this. But it looks like your
>> system wants it to be even lower.
>>
>> Can you try reducing it a bit more?
>
> This works for me:
>
> diff --git a/test/automated/character-fold-tests.el b/test/automated/character-fold-tests.el
> index 3a288b9..cf19584 100644
> --- a/test/automated/character-fold-tests.el
> +++ b/test/automated/character-fold-tests.el
> @@ -37,13 +37,13 @@ character-fold--test-search-with-contents
>
>
> (ert-deftest character-fold--test-consistency ()
> - (dotimes (n 50)
> + (dotimes (n 30)
> (let ((w (character-fold--random-word n)))
> ;; A folded string should always match the original string.
> (character-fold--test-search-with-contents w w))))
>
> (ert-deftest character-fold--test-lax-whitespace ()
> - (dotimes (n 50)
> + (dotimes (n 40)
> (let ((w1 (character-fold--random-word n))
> (w2 (character-fold--random-word n))
> (search-spaces-regexp "\\s-+"))
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 20:00 ` Artur Malabarba
@ 2015-11-28 20:08 ` Artur Malabarba
2015-11-28 20:47 ` Eli Zaretskii
0 siblings, 1 reply; 94+ messages in thread
From: Artur Malabarba @ 2015-11-28 20:08 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
2015-11-28 20:00 GMT+00:00 Artur Malabarba <bruce.connor.am@gmail.com>:
> Ok. I think that's still plausible for interactive uses. I'll add a
> comment to the docstring warning about the danger of long regexps, and
> I'll make sure isearch acts gracefully if such a situation is ever
> encountered.
Or is there a fixed limit I can use for the max size of a regexp?
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 20:08 ` Artur Malabarba
@ 2015-11-28 20:47 ` Eli Zaretskii
0 siblings, 0 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-28 20:47 UTC (permalink / raw)
To: bruce.connor.am; +Cc: emacs-devel
> Date: Sat, 28 Nov 2015 20:08:52 +0000
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: emacs-devel <emacs-devel@gnu.org>
>
> 2015-11-28 20:00 GMT+00:00 Artur Malabarba <bruce.connor.am@gmail.com>:
> > Ok. I think that's still plausible for interactive uses. I'll add a
> > comment to the docstring warning about the danger of long regexps, and
> > I'll make sure isearch acts gracefully if such a situation is ever
> > encountered.
>
> Or is there a fixed limit I can use for the max size of a regexp?
AFAIU, it's MAX_BUF_SIZE in regex.c.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 10:23 ` Artur Malabarba
` (2 preceding siblings ...)
2015-11-28 16:48 ` character folding future [was: Questions about isearch] Drew Adams
@ 2015-11-29 6:03 ` Richard Stallman
2015-11-29 15:48 ` Eli Zaretskii
2015-11-29 9:39 ` Andreas Röhler
4 siblings, 1 reply; 94+ messages in thread
From: Richard Stallman @ 2015-11-29 6:03 UTC (permalink / raw)
To: bruce.connor.am; +Cc: eliz, stephen.berman, emacs-devel
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
Would you please set it up so that ^J in search
does not match anything but a newline?
I often used to search for a blank line with C-s C-j C-j.
I often used to search for WORD at the start of a line
with C-s C-j WORD. They are both broken now.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 10:03 ` Artur Malabarba
2015-11-27 10:29 ` Eli Zaretskii
@ 2015-11-29 9:08 ` Andreas Röhler
1 sibling, 0 replies; 94+ messages in thread
From: Andreas Röhler @ 2015-11-29 9:08 UTC (permalink / raw)
To: bruce.connor.am, Eli Zaretskii; +Cc: emacs-devel
Am 27.11.2015 um 11:03 schrieb Artur Malabarba:
> On 27 Nov 2015 8:57 am, "Eli Zaretskii" <eliz@gnu.org
> <mailto:eliz@gnu.org>> wrote:
> > > Considering language special cases worldwide at core will run into
> infinity.
> >
> > The number of languages is finite.
> >
> > > Would expect support of unicode-characters. Mapping them should be the
> > > task of special language-modes built upon, i.e. a text-norwegian etc.
> >
> > The question I asked is should we do that _in_general_? If the answer
> > is YES, then the language-specific rules might tell _how_ to do that
> > in each case. But that's a different issue.
>
> I think this topic goes beyond isearch, and people not reading the
> current thread might be interested in it. Maybe we should start a
> new thread just for this special "language support". Starting by listing
> points of what would be the goals of such a feature.
>
As isearch accepts a regexp as an argument, it should be possible to
implement mode-specific commands.
Probably not a task for the core.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-28 10:23 ` Artur Malabarba
` (3 preceding siblings ...)
2015-11-29 6:03 ` Questions about isearch Richard Stallman
@ 2015-11-29 9:39 ` Andreas Röhler
2015-11-29 15:52 ` Eli Zaretskii
2015-11-30 16:05 ` Paul Eggert
4 siblings, 2 replies; 94+ messages in thread
From: Andreas Röhler @ 2015-11-29 9:39 UTC (permalink / raw)
To: emacs-devel; +Cc: Eli Zaretskii, Artur Malabarba
Am 28.11.2015 um 11:23 schrieb Artur Malabarba:
> Ok. I'm going to work on the char-folding a little bit more today to
> implement support for multi-char matches and to combine it with
> case-folding. Hopefully that will iron out the final inconsistencies.
As mentioned earlier, this runs into the infinite.
Not only do new languages arise every day. Think also of ancient languages.
Think of math and new symbolic languages. The possibilities of combining
known and still unknown characters tend to be infinite.
Char-folding is an indo-european centric sledge-hammer: heavy and limited.
>
> 2015-11-28 9:46 GMT+00:00 Eli Zaretskii <eliz@gnu.org>:
>>> From: Marcin Borkowski <mbork@mbork.pl>
>>> Date: Sat, 28 Nov 2015 09:40:06 +0100
>>> Cc: eliz@gnu.org, Stephen Berman <stephen.berman@gmx.net>,
>>> bruce.connor.am@gmail.com, emacs-devel@gnu.org
>>>
>>> Watching this discussion, I'm just astonished that no-one complained
>>> (yet?) that searching for "et" does not find "&" (and/or vice versa).
>>
>> Why complain? Emacs lets you customize this feature to do that as
>> well.
>
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-29 6:03 ` Questions about isearch Richard Stallman
@ 2015-11-29 15:48 ` Eli Zaretskii
0 siblings, 0 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-29 15:48 UTC (permalink / raw)
To: rms; +Cc: stephen.berman, bruce.connor.am, emacs-devel
> From: Richard Stallman <rms@gnu.org>
> CC: eliz@gnu.org, stephen.berman@gmx.net, emacs-devel@gnu.org
> Date: Sun, 29 Nov 2015 01:03:40 -0500
>
> Would you please set it up so that ^J in search
> does not match anything but a newline?
>
> I often used to search for a blank line with C-s C-j C-j.
> I often used to search for WORD at the start of a line
> with C-s C-j WORD. They are both broken now.
Both of these work for me, on the emacs-25 branch and on master. When
did you last update your repository?
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-29 9:39 ` Andreas Röhler
@ 2015-11-29 15:52 ` Eli Zaretskii
2015-11-30 9:39 ` Andreas Röhler
2015-11-30 16:05 ` Paul Eggert
1 sibling, 1 reply; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-29 15:52 UTC (permalink / raw)
To: Andreas Röhler; +Cc: bruce.connor.am, emacs-devel
> Cc: Eli Zaretskii <eliz@gnu.org>, Artur Malabarba
> <bruce.connor.am@gmail.com>, Marcin Borkowski <mbork@mbork.pl>
> From: Andreas Röhler <andreas.roehler@online.de>
> Date: Sun, 29 Nov 2015 10:39:06 +0100
>
> Not only do new languages arise every day. Think also of ancient languages.
Emacs's search capabilities are language-agnostic. So the number of
languages, whether finite or infinite, doesn't affect the issues being
discussed.
Generally, with very few exceptions, letters and symbols that belong
to some script are not folded to or with characters of other scripts,
because the Unicode database precludes that. So each new language and
script simply adds more assigned codepoints for its characters, but
has no effect whatsoever on character folding or on Emacs search
capabilities.
> Think of math and new symbolic languages.
Are you saying that searching for א should not find ℵ? Or that
looking for π should not find ℼ? Or 1 shouldn't find 𝟏? Not even as
an option? Why should we deprive Emacs users of such an important
feature? Those who don't want it can always customize their Emacs not
to do that.
> Char-folding is an indo-european centric sledge-hammer: heavy and limited.
This is nothing but unfounded name-calling. Please don't.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-29 15:52 ` Eli Zaretskii
@ 2015-11-30 9:39 ` Andreas Röhler
2015-11-30 15:53 ` Eli Zaretskii
0 siblings, 1 reply; 94+ messages in thread
From: Andreas Röhler @ 2015-11-30 9:39 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: bruce.connor.am, emacs-devel
Am 29.11.2015 um 16:52 schrieb Eli Zaretskii:
>> Cc: Eli Zaretskii <eliz@gnu.org>, Artur Malabarba
>> <bruce.connor.am@gmail.com>, Marcin Borkowski <mbork@mbork.pl>
>> From: Andreas Röhler <andreas.roehler@online.de>
>> Date: Sun, 29 Nov 2015 10:39:06 +0100
>>
>> Not only do new languages arise every day. Think also of ancient languages.
>
> Emacs's search capabilities are language-agnostic.
AFAIU the notion of case-folding refers to the idea of upper- or lowercase,
which makes sense only with a couple of languages.
If case-folding is off by default, there should be no harm.
So let's wait for the next reports or feature requests WRT folding.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-30 9:39 ` Andreas Röhler
@ 2015-11-30 15:53 ` Eli Zaretskii
0 siblings, 0 replies; 94+ messages in thread
From: Eli Zaretskii @ 2015-11-30 15:53 UTC (permalink / raw)
To: Andreas Röhler; +Cc: bruce.connor.am, emacs-devel
> Cc: emacs-devel@gnu.org, bruce.connor.am@gmail.com, mbork@mbork.pl
> From: Andreas Röhler <andreas.roehler@online.de>
> Date: Mon, 30 Nov 2015 10:39:45 +0100
>
> Emacs's search capabilities are language-agnostic.
>
> AFAIU the notion of case-folding refers to the idea of upper- or lowercase, which makes sense only with a couple of languages.
That appears to be incorrect, because UCD, the Unicode Character
Database, is not tailored to any language in particular, and yet it
does specify letter-case pairs for many characters beyond ASCII. The
language-specific variations to this basic data are then provided by
further databases, such as CLDR.
Emacs currently supports only the language-independent part of case
folding, character equivalences, and other related features. Addition
of new languages does not and cannot affect that.
> If case-folding is off by default, there should be no harm.
Case folding was ON by default in Emacs since about forever. It is
natural to many, and can be easily turned off by those who don't like
it. That is why, IMO, we hear almost no complaints about it.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-29 9:39 ` Andreas Röhler
2015-11-29 15:52 ` Eli Zaretskii
@ 2015-11-30 16:05 ` Paul Eggert
1 sibling, 0 replies; 94+ messages in thread
From: Paul Eggert @ 2015-11-30 16:05 UTC (permalink / raw)
To: Andreas Röhler, emacs-devel
On 11/29/2015 01:39 AM, Andreas Röhler wrote:
> Char-folding is an indo-european centric sledge-hammer
While that may be true, it is a very commonly-used sledgehammer.
Sometimes sledgehammers are good tools to use.
As Eli says, case-folding has been on by default in Emacs for ages.
Case-folding is also Indo-European-centric, but that has been OK. As
char-folding by and large does not affect unicase alphabets I don't see
why readers and writers of non-Indo-European text would care about it
one way or another.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: character folding future [was: Questions about isearch]
2015-11-28 16:48 ` character folding future [was: Questions about isearch] Drew Adams
2015-11-28 18:34 ` Artur Malabarba
@ 2015-12-01 11:34 ` Artur Malabarba
2015-12-01 15:48 ` Drew Adams
1 sibling, 1 reply; 94+ messages in thread
From: Artur Malabarba @ 2015-12-01 11:34 UTC (permalink / raw)
To: Drew Adams; +Cc: Eli Zaretskii, Stephen Berman, Richard Stallman, emacs-devel
2015-11-28 16:48 GMT+00:00 Drew Adams <drew.adams@oracle.com>:
> 1. An option, `char-fold-ad-hoc', for the ad hoc char foldings.
> Default value: the same ad hoc foldings as vanilla Emacs
> (quotation marks).
Thanks for the code again. A list of ad-hoc foldings for the user to
customize (your `char-fold-ad-hoc') is something I want too (ideally,
as soon as 25.1). The reason I didn't include it initially is that the
character-fold-table can take many seconds to generate, so it's pretty
important that it be generated at compile time.
I suppose one solution is to make it a defcustom with a :set property
that updates the char-fold-table, and clearly state in the docstring
that editing this variable can add several seconds to emacs startup
time.
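
The defcustom-with-:set idea could look roughly like this. It is only a
sketch: every `my-fold-...' name is hypothetical, and a small hash table
stands in for the real `character-fold-table', whose construction is
much more involved.

```elisp
;; All names below are hypothetical; this is not the Emacs implementation.
(defgroup my-fold nil
  "Illustrative group for the ad hoc folding sketch."
  :group 'matching)

(defvar my-fold-table nil
  "Derived lookup table, rebuilt whenever `my-fold-ad-hoc' is set.")

(defun my-fold-rebuild (entries)
  "Return a hash table mapping each key char in ENTRIES to a regexp.
Each element of ENTRIES has the form (CHAR VARIANT...), where each
VARIANT is a string that CHAR should also match."
  (let ((table (make-hash-table :test 'eql)))
    (dolist (entry entries table)
      (puthash (car entry)
               (regexp-opt (cons (string (car entry)) (cdr entry)))
               table))))

(defcustom my-fold-ad-hoc '((?\" "\u201C" "\u201D"))
  "Alist of ad hoc foldings, each of the form (CHAR VARIANT...).
Setting this option rebuilds `my-fold-table', which takes extra time."
  :type '(alist :key-type character :value-type (repeat string))
  ;; The :set function runs at definition time and on every
  ;; customization, so the derived table never goes stale.
  :set (lambda (symbol value)
         (set-default symbol value)
         (setq my-fold-table (my-fold-rebuild value)))
  :group 'my-fold)
```

With this shape, changing the option through Customize or
`customize-set-variable' refreshes the table, whereas a plain `setq'
would bypass the :set function and leave the table unchanged.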
^ permalink raw reply [flat|nested] 94+ messages in thread
* RE: character folding future [was: Questions about isearch]
2015-12-01 11:34 ` Artur Malabarba
@ 2015-12-01 15:48 ` Drew Adams
2015-12-03 23:54 ` Artur Malabarba
0 siblings, 1 reply; 94+ messages in thread
From: Drew Adams @ 2015-12-01 15:48 UTC (permalink / raw)
To: bruce.connor.am
Cc: Eli Zaretskii, Stephen Berman, Richard Stallman, emacs-devel
> > 1. An option, `char-fold-ad-hoc', for the ad hoc char foldings.
> > Default value: the same ad hoc foldings as vanilla Emacs
> > (quotation marks).
>
> Thanks for the code again. A list of ad-hoc foldings for the user to
> customize (your `char-fold-ad-hoc') is something I want too (ideally,
> as soon as 25.1). The reason I didn't include it initially is that the
> character-fold-table can take many seconds to generate, so it's pretty
> important that it be generated at compile time.
>
> I suppose one solution is to make it a defcustom with a :set property
> that updates the char-fold-table, and clearly state in the docstring
> that editing this variable can add several seconds to emacs startup
> time.
1. Do you really see that "character-fold-table can take many
seconds to generate"? I don't see that, AFAICT.
2. Have you tried it? What difference do you see in the
generation time? Did you really see that it "can add
several seconds"?
3. I do it now in `character-fold+.el'. (Did it for
`char-fold-symmetric' from the outset, but just now added
it also for `char-fold-ad-hoc'.) And AFAICT there is no
noticeable time difference in initializing, and none for
updating `character-fold-table' after a user customizes
`char-fold-ad-hoc'.
Not noticeable is quite different from "many seconds" or
"several seconds". Maybe this is platform dependent?
I'm using MS Windows 7 on an average laptop that is a
few years old (nothing special wrt memory or CPU).
Do you see the same thing I see, in terms of time, if
you try `character-fold+.el'?
4. There _is_ a noticeable delay when a user customizes
`char-fold-symmetric', of course - that does a lot more
work. But there is no delay for that initially. It is
off by default, so `update-char-fold-table' does nothing
with it, except when a user customizes or toggles it on.
5. I think it makes sense in any case to factor out the
code that creates/updates the table (as in my function
`update-char-fold-table').
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: character folding future [was: Questions about isearch]
2015-12-01 15:48 ` Drew Adams
@ 2015-12-03 23:54 ` Artur Malabarba
0 siblings, 0 replies; 94+ messages in thread
From: Artur Malabarba @ 2015-12-03 23:54 UTC (permalink / raw)
To: Drew Adams; +Cc: Eli Zaretskii, Stephen Berman, Richard Stallman, emacs-devel
Drew Adams <drew.adams@oracle.com> writes:
> 1. Do you really see that "character-fold-table can take many
> seconds to generate"? I don't see that, AFAICT.
>
> 2. ...
No, you're right. I guess I was still carrying my memories from the
initial implementations, which did take a few seconds. The current
version takes ~0.3 sec on my machine if byte-compiled.
While that's far from pleasant (for a lot of people, +0.3 sec of startup
time would be noticeable), I guess it's reasonable enough. After all,
the user will be warned in the docstring about this caveat.
> 5. I think it makes sense in any case to factor out the
> code that creates/updates the table (as in my function
> `update-char-fold-table').
I agree.
^ permalink raw reply [flat|nested] 94+ messages in thread
* Re: Questions about isearch
2015-11-27 16:50 ` Per Starbäck
2015-11-27 18:10 ` Artur Malabarba
2015-11-27 21:33 ` raman
@ 2016-02-28 0:27 ` Mathias Dahl
2016-02-28 15:58 ` Eli Zaretskii
2 siblings, 1 reply; 94+ messages in thread
From: Mathias Dahl @ 2016-02-28 0:27 UTC (permalink / raw)
To: Per Starbäck, emacs-devel@gnu.org; +Cc: Eli Zaretskii, Artur Malabarba
On Fri, Nov 27, 2015 at 5:50 PM, Per Starbäck <per.starback@gmail.com>
wrote:
> One answer I got was that it's possible to turn this off. Yes, it is,
> but defaults are important for what impression you give. I haven't
> been active on the list for some time, but when I have expressed
> opinions on Emacs here before it has often been not thinking about
> myself, but thinking about the students that I teach Emacs, so that
> *I* can change settings is not enough for my consideration.
>
> Also character folding is a great feature! I don't want to turn it
> off! It's just that it's bad to fold characters that are in no way
> seen as variants but totally different letters.
>
I agree with Per that this new feature is problematic. I have used Emacs
for almost 20 years and up until now, if I search for an "a" I find only
"a". In my view, suddenly finding "ä" or "å" as well would be to find
"false hits". Surely one could argue that case folding has the same
problem, but I think those cases are fewer, it has been the default for as
long as I have used Emacs, and I think it is common in most programs to
have this behavior by default. This new feature, however, I cannot
remember seeing anywhere, so it cannot be that important to have it turned
on by default.
I am sure the new feature is useful to some, but for me it will just be
annoying. I have "ä" and "å" keys on my keyboard so I have no problem
inputting them. When I visit other countries where people do not have
such keyboards I simply turn on the Swedish input method swedish-postfix
to enter these letters.
I think having this feature on by default risks annoying more users
than it will benefit. I'm quite certain that, if I tried to get a Swedish
colleague to try out Emacs, they would comment on such a feature as
being quite strange. I do not agree it would be the same as finding a "u"
when searching for "v", but still...
Now that I know about this feature I will turn it off and enable it only
when I need it, but I wish it had been the other way around, so that
users who need it would have to enable it.
Sorry for coming late to the party on this one...
/Mathias
PS. Per mentioned that the scenario with "ide" matching "idé" would be
okay. I'm divided on that one. "é" is not an official part of the
Swedish alphabet, like "å" and "ä", so from some perspective it would be
okay, but it feels like a very slippery slope... Probably, as some have
advocated here, if there were a way to express the language for a
buffer or region of text, a feature like this *might* fit better.
* Re: Questions about isearch
2016-02-28 0:27 ` Mathias Dahl
@ 2016-02-28 15:58 ` Eli Zaretskii
2016-02-28 17:52 ` Mathias Dahl
0 siblings, 1 reply; 94+ messages in thread
From: Eli Zaretskii @ 2016-02-28 15:58 UTC (permalink / raw)
To: Mathias Dahl; +Cc: per.starback, bruce.connor.am, emacs-devel
> From: Mathias Dahl <mathias.dahl@gmail.com>
> Date: Sun, 28 Feb 2016 01:27:10 +0100
> Cc: Eli Zaretskii <eliz@gnu.org>, Artur Malabarba <bruce.connor.am@gmail.com>
>
> I agree with Per that this new feature is problematic. I have used Emacs
> for soon 20 years and up until now, if I search for an "a" I find only
> "a". From my view, suddenly finding "ä" or "å" as well would, in my
> view, be to find "false hits".
What about finding "ä" (a 2-character sequence) when looking for "ä", or
finding "å" (1 character) when looking for "å" (2 characters) -- would
you consider these false hits as well?
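
The distinction in this question is between a precomposed character and a base letter followed by a combining mark. As an illustration only (this is a Python sketch using Unicode normalization, not how Emacs implements character folding, and the helper names are made up for the example), the two cases under discussion can be separated like this:

```python
import unicodedata

precomposed = "\u00e4"  # "ä" as a single code point
decomposed = "a\u0308"  # "a" followed by COMBINING DIAERESIS


def same_after_nfc(left, right):
    """True if both strings are canonically equivalent (compared in NFC)."""
    return unicodedata.normalize("NFC", left) == unicodedata.normalize("NFC", right)


def strip_marks(text):
    """Drop combining marks after decomposing; a much more aggressive fold."""
    decomposed_text = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed_text if not unicodedata.combining(ch))


# The two spellings differ as raw strings but are canonically equivalent.
print(precomposed == decomposed)
print(same_after_nfc(precomposed, decomposed))
# Stripping marks is the step that makes "a" match "ä" at all.
print(strip_marks("id\u00e9"))
```

Comparing under normalization only equates the two spellings of the same letter, while stripping marks is what additionally makes "a" match "ä"; the objections in this subthread concern the second kind of match.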
> Surely one could argue that case folding has the same problem but I
> think those are less and it has been the default for as long as I
> have used Emacs and I think it is common in most programs to have
> this behavior by default. This new feature however I cannot remember
> seeing anywhere so it cannot be that important to have it turned on
> by default.
Emacs has many features on by default that are not anywhere else, or
weren't when Emacs introduced them. So I don't think this argument
should guide our decisions.
* Re: Questions about isearch
2016-02-28 15:58 ` Eli Zaretskii
@ 2016-02-28 17:52 ` Mathias Dahl
2016-02-28 18:02 ` Eli Zaretskii
0 siblings, 1 reply; 94+ messages in thread
From: Mathias Dahl @ 2016-02-28 17:52 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Per Starbäck, Artur Malabarba, emacs-devel
>
> > I agree with Per that this new feature is problematic. I have used Emacs
> > for soon 20 years and up until now, if I search for an "a" I find only
> > "a". From my view, suddenly finding "ä" or "å" as well would, in my
> > view, be to find "false hits".
>
> What about finding "ä" (a 2-character sequence) when looking for "ä", or
> finding "å" (1 character) when looking for "å" (2 characters) -- would
> you consider these false hits as well?
>
I have not thought about that scenario (in fact, I did not know there was
a difference), but since the two look the same I would probably be
surprised not to find the former when searching for the latter. It is a
scenario that I would think is extremely unlikely to happen for "ä-users"
like me, though I guess that is just anecdotal evidence.
> > Surely one could argue that case folding has the same problem but I
> > think those are less and it has been the default for as long as I
> > have used Emacs and I think it is common in most programs to have
> > this behavior by default. This new feature however I cannot remember
> > seeing anywhere so it cannot be that important to have it turned on
> > by default.
>
> Emacs has many features on by default that are not anywhere else, or
> weren't when Emacs introduced them. So I don't think this argument
> should guide our decisions.
>
I don't agree. That this is not common elsewhere should not be the sole
argument for such a decision, but I definitely think it can *guide* us,
together with other arguments.
Much better, of course, would be a poll among users. Since I came late to
this discussion I don't know if such a poll was done. I have not heard
about the use cases for this change either. In what scenarios is this
useful, and do those scenarios happen often enough to motivate having such
a feature on by default (and do they outnumber the cases where it causes
problems)? I might possibly use this feature myself sometime, but it will
not be the normal case.
I view this a bit like the difference between a normal and a regexp
isearch, with the difference that I would use this much less often than I
use regexp isearch. Or "word isearch", which I never use (possibly because
I don't have much need for it).
* Re: Questions about isearch
2016-02-28 17:52 ` Mathias Dahl
@ 2016-02-28 18:02 ` Eli Zaretskii
2016-02-29 13:32 ` Richard Stallman
0 siblings, 1 reply; 94+ messages in thread
From: Eli Zaretskii @ 2016-02-28 18:02 UTC (permalink / raw)
To: Mathias Dahl; +Cc: per.starback, bruce.connor.am, emacs-devel
> From: Mathias Dahl <mathias.dahl@gmail.com>
> Date: Sun, 28 Feb 2016 18:52:38 +0100
> Cc: Per Starbäck <per.starback@gmail.com>,
> Artur Malabarba <bruce.connor.am@gmail.com>, emacs-devel@gnu.org
>
> Much better, of course, would be a poll among users. Since I came late to
> this discussion I don't know if such a poll was done.
The discussions were right here, you can simply read them (provided
that you have enough time ;-):
http://lists.gnu.org/archive/html/emacs-devel/2016-02/msg00089.html
http://lists.gnu.org/archive/html/emacs-devel/2016-02/msg00506.html
* Re: Questions about isearch
2016-02-28 18:02 ` Eli Zaretskii
@ 2016-02-29 13:32 ` Richard Stallman
2016-02-29 16:04 ` Eli Zaretskii
0 siblings, 1 reply; 94+ messages in thread
From: Richard Stallman @ 2016-02-29 13:32 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: per.starback, emacs-devel, bruce.connor.am, mathias.dahl
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> > Much better, of course, would be a poll among users. Since I came late to
> > this discussion I don't know if such a poll was done.
> The discussions were right here, you can simply read them (provided
> that you have enough time ;-):
A discussion here is the first step, but not a substitute for a poll
of users.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Re: Questions about isearch
2016-02-29 13:32 ` Richard Stallman
@ 2016-02-29 16:04 ` Eli Zaretskii
2016-03-01 16:52 ` Richard Stallman
0 siblings, 1 reply; 94+ messages in thread
From: Eli Zaretskii @ 2016-02-29 16:04 UTC (permalink / raw)
To: rms; +Cc: per.starback, emacs-devel, bruce.connor.am, mathias.dahl
> From: Richard Stallman <rms@gnu.org>
> CC: mathias.dahl@gmail.com, per.starback@gmail.com,
> bruce.connor.am@gmail.com, emacs-devel@gnu.org
> Date: Mon, 29 Feb 2016 08:32:05 -0500
>
> > > Much better, of course, would be a poll among users. Since I came late to
> > > this discussion I don't know if such a poll was done.
>
> > The discussions were right here, you can simply read them (provided
> > that you have enough time ;-):
>
> A discussion here is the first step, but not a substitute for a poll
> of users.
That discussion is the closest approximation to a poll we had, so
reading it should probably be useful for someone who missed it.
* Re: Questions about isearch
2016-02-29 16:04 ` Eli Zaretskii
@ 2016-03-01 16:52 ` Richard Stallman
0 siblings, 0 replies; 94+ messages in thread
From: Richard Stallman @ 2016-03-01 16:52 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: per.starback, emacs-devel, bruce.connor.am, mathias.dahl
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> > A discussion here is the first step, but not a substitute for a poll
> > of users.
> That discussion is the closest approximation to a poll we had, so
> reading it should probably be useful for someone who missed it.
It is probably pertinent reading, but if it was only on this list,
it doesn't come close to a poll of the users.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
Thread overview: 94+ messages
2015-11-25 18:41 Questions about isearch Eli Zaretskii
2015-11-25 19:20 ` Rasmus
2015-11-25 20:02 ` Steinar Bang
2015-11-26 14:46 ` Richard Stallman
2015-11-26 16:22 ` Eli Zaretskii
2015-11-26 20:46 ` Per Starbäck
2015-11-26 21:02 ` Eli Zaretskii
2015-11-26 21:35 ` Marcin Borkowski
2015-11-27 7:43 ` Eli Zaretskii
2015-11-27 6:38 ` Richard Stallman
2015-11-27 8:53 ` Eli Zaretskii
2015-11-27 16:21 ` raman
2015-11-26 23:18 ` Rasmus
2015-11-27 7:46 ` Eli Zaretskii
2015-11-27 6:37 ` Richard Stallman
2015-11-27 8:39 ` Eli Zaretskii
2015-11-25 20:10 ` Eli Zaretskii
2015-11-25 20:41 ` Mike Kupfer
2015-11-25 20:56 ` Eli Zaretskii
2015-11-25 20:14 ` Artur Malabarba
2015-11-25 20:30 ` Marcin Borkowski
2015-11-25 20:38 ` Eli Zaretskii
2015-11-25 21:58 ` Artur Malabarba
2015-11-25 23:04 ` Mike Kupfer
2015-11-26 3:40 ` Eli Zaretskii
2015-11-27 19:50 ` Mike Kupfer
2015-11-27 20:06 ` Eli Zaretskii
2015-11-27 23:57 ` Artur Malabarba
2015-11-28 1:36 ` Mike Kupfer
2015-11-28 9:28 ` Eli Zaretskii
2015-11-26 13:28 ` Steinar Bang
2015-11-25 20:36 ` Eli Zaretskii
2015-11-25 21:49 ` Artur Malabarba
2015-11-26 3:34 ` Eli Zaretskii
2015-11-27 12:03 ` Artur Malabarba
2015-11-27 14:36 ` Eli Zaretskii
2015-11-27 16:50 ` Per Starbäck
2015-11-27 18:10 ` Artur Malabarba
2015-11-27 18:42 ` Per Starbäck
2015-11-27 21:33 ` raman
2016-02-28 0:27 ` Mathias Dahl
2016-02-28 15:58 ` Eli Zaretskii
2016-02-28 17:52 ` Mathias Dahl
2016-02-28 18:02 ` Eli Zaretskii
2016-02-29 13:32 ` Richard Stallman
2016-02-29 16:04 ` Eli Zaretskii
2016-03-01 16:52 ` Richard Stallman
2015-11-27 16:55 ` Artur Malabarba
2015-11-27 17:52 ` Eli Zaretskii
2015-11-27 21:18 ` Stephen Berman
2015-11-28 0:04 ` Artur Malabarba
2015-11-28 7:49 ` Eli Zaretskii
2015-11-28 16:14 ` Stephen Berman
2015-11-28 5:36 ` Richard Stallman
2015-11-28 8:33 ` Eli Zaretskii
2015-11-28 8:40 ` Marcin Borkowski
2015-11-28 9:46 ` Eli Zaretskii
2015-11-28 10:23 ` Artur Malabarba
2015-11-28 11:14 ` Eli Zaretskii
2015-11-28 14:41 ` Eli Zaretskii
2015-11-28 15:41 ` Artur Malabarba
2015-11-28 16:29 ` Artur Malabarba
2015-11-28 17:27 ` Eli Zaretskii
2015-11-28 17:44 ` Eli Zaretskii
2015-11-28 18:31 ` Artur Malabarba
2015-11-28 18:57 ` Eli Zaretskii
2015-11-28 20:00 ` Artur Malabarba
2015-11-28 20:08 ` Artur Malabarba
2015-11-28 20:47 ` Eli Zaretskii
2015-11-28 16:48 ` character folding future [was: Questions about isearch] Drew Adams
2015-11-28 18:34 ` Artur Malabarba
2015-12-01 11:34 ` Artur Malabarba
2015-12-01 15:48 ` Drew Adams
2015-12-03 23:54 ` Artur Malabarba
2015-11-29 6:03 ` Questions about isearch Richard Stallman
2015-11-29 15:48 ` Eli Zaretskii
2015-11-29 9:39 ` Andreas Röhler
2015-11-29 15:52 ` Eli Zaretskii
2015-11-30 9:39 ` Andreas Röhler
2015-11-30 15:53 ` Eli Zaretskii
2015-11-30 16:05 ` Paul Eggert
2015-11-26 16:08 ` Rasmus
2015-11-25 23:15 ` Mike Kupfer
2015-11-26 14:45 ` Richard Stallman
2015-11-27 0:43 ` Juri Linkov
2015-11-27 8:07 ` Eli Zaretskii
2015-11-27 23:24 ` Juri Linkov
2015-11-28 8:09 ` Eli Zaretskii
2015-11-27 8:02 ` Andreas Röhler
2015-11-27 8:57 ` Eli Zaretskii
2015-11-27 10:03 ` Artur Malabarba
2015-11-27 10:29 ` Eli Zaretskii
2015-11-27 10:47 ` Artur Malabarba
2015-11-29 9:08 ` Andreas Röhler