* bug#22038: 25.1.50; Character folding issues with isearch
@ 2015-11-28 16:07 Stephen Berman
2015-11-28 16:39 ` Eli Zaretskii
2015-11-29 0:06 ` bug#22038: 25.1.50; Character folding issues with isearch Artur Malabarba
0 siblings, 2 replies; 11+ messages in thread
From: Stephen Berman @ 2015-11-28 16:07 UTC (permalink / raw)
To: 22038
Issue 1: Please support having multiple characters match a single
string in searches, so that e.g. "ss" can match the German letter "ß".
Issue 2: The current implementation of character folding based on
character decomposition often yields surprising results when searching:
e.g. the search string "f" matches not only "ﬀ" but also "㎙" and "ﬄ",
but the search strings "m" and "fm" fail to match the former and the
search strings "l" and "fl" fail to match the latter. I think all
recognizable characters in a composite character should be matchable by
searching.
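Stephen's Issue 2 examples fit a table that files each character under the first character of its Unicode decomposition, and that lets each single query character match anything filed under it. The Python sketch below models that assumption. It is not Emacs's code, and `expansion`, `first_char_table`, `fold_regexp` and the three-character sample are hypothetical. Decompositions come from the standard `unicodedata` module, with the formatting tag (`<compat>`, `<square>`, ...) dropped.

```python
import re
import unicodedata

def expansion(ch):
    """Return CH's single-level Unicode decomposition as a string, or ''.

    The formatting tag (e.g. <compat>, <font>, <square>) is dropped."""
    fields = unicodedata.decomposition(ch).split()
    if fields and fields[0].startswith("<"):
        fields = fields[1:]
    return "".join(chr(int(f, 16)) for f in fields)

def first_char_table(chars):
    """Map the first character of each expansion to the characters it folds."""
    table = {}
    for ch in chars:
        exp = expansion(ch)
        if exp:
            table.setdefault(exp[0], set()).add(ch)
    return table

def fold_regexp(query, table):
    """Let each single query character also match the characters filed under it."""
    parts = []
    for c in query:
        alts = [c] + sorted(table.get(c, ()))
        parts.append("(?:" + "|".join(map(re.escape, alts)) + ")")
    return "".join(parts)

# Sample: U+FB00 LATIN SMALL LIGATURE FF, U+3399 SQUARE FM,
# U+FB04 LATIN SMALL LIGATURE FFL.
TABLE = first_char_table("\ufb00\u3399\ufb04")

for query, text in [("f", "\u3399"), ("f", "\ufb04"),
                    ("m", "\u3399"), ("fm", "\u3399"),
                    ("l", "\ufb04"), ("fl", "\ufb04")]:
    print(repr(query), repr(text), bool(re.search(fold_regexp(query, TABLE), text)))
```

Run as is, it reports a match for "f" against ㎙ and ﬄ, and no match for "m", "fm", "l" or "fl": the same asymmetry described above. Under this model the asymmetry comes from indexing only the first character of each expansion and from folding one query character at a time.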
Issue 3: Character folding does not respect case-folding in searches,
e.g. "f" fails to match "ℱ" and "℻" (with case-folding enabled), whereas
"F" does match them, but fails to match "ﬀ".
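Under the same first-character model (again an assumption about the mechanism, with hypothetical helper names), Issue 3 follows if the table keys keep the case of the decomposition: ℱ (U+2131) and ℻ (U+213B) expand to "F" and "FAX", while ﬀ (U+FB00) expands to "ff". This sketch always searches case-insensitively and compares a table whose keys keep their case with one whose keys and lookups are lower-cased:

```python
import re
import unicodedata

def expansion(ch):
    """CH's single-level decomposition with any <tag> dropped, or ''."""
    fields = unicodedata.decomposition(ch).split()
    if fields and fields[0].startswith("<"):
        fields = fields[1:]
    return "".join(chr(int(f, 16)) for f in fields)

def fold_matches(query, text, chars, case_aware_table):
    """Case-insensitive search for QUERY in TEXT, with CHARS folded under the
    first character of their expansion.

    If CASE_AWARE_TABLE is false, table keys keep their original case (the
    behaviour reported in Issue 3). If true, keys and the query characters
    used to look them up are lower-cased."""
    norm = str.lower if case_aware_table else (lambda s: s)
    table = {}
    for ch in chars:
        exp = expansion(ch)
        if exp:
            table.setdefault(norm(exp[0]), set()).add(ch)
    parts = []
    for c in query:
        alts = [c] + sorted(table.get(norm(c), ()))
        parts.append("(?:" + "|".join(map(re.escape, alts)) + ")")
    return re.search("".join(parts), text, re.IGNORECASE) is not None

# U+2131 SCRIPT CAPITAL F, U+213B FACSIMILE SIGN, U+FB00 LATIN SMALL LIGATURE FF.
SAMPLE = "\u2131\u213b\ufb00"
```

With `case_aware_table=False`, "f" misses ℱ and "F" misses ﬀ, because Python's `re.IGNORECASE` does not relate those characters to "f"/"F". With `case_aware_table=True`, both queries reach all three characters.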
In GNU Emacs 25.1.50.1 (x86_64-suse-linux-gnu, GTK+ Version 3.14.15)
of 2015-11-20
Repository revision: 5c81fd58e32d965c2551663622e084f2800e1e90
Windowing system distributor 'The X.Org Foundation', version 11.0.11601000
System Description: openSUSE 13.2 (Harlequin) (x86_64)
Configured using:
'configure 'CFLAGS=-Og -g3''
Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND DBUS GCONF GSETTINGS NOTIFY
GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS
GTK3 X11
Important settings:
value of $LANG: en_US.UTF-8
value of $XMODIFIERS: @im=ibus
locale-coding-system: utf-8-unix
^ permalink raw reply [flat|nested] 11+ messages in thread
* bug#22038: 25.1.50; Character folding issues with isearch
2015-11-28 16:07 bug#22038: 25.1.50; Character folding issues with isearch Stephen Berman
@ 2015-11-28 16:39 ` Eli Zaretskii
2015-11-28 17:10 ` Stephen Berman
2015-11-29 0:06 ` bug#22038: 25.1.50; Character folding issues with isearch Artur Malabarba
1 sibling, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2015-11-28 16:39 UTC (permalink / raw)
To: Stephen Berman; +Cc: 22038
> From: Stephen Berman <stephen.berman@gmx.net>
> Date: Sat, 28 Nov 2015 17:07:22 +0100
>
> Issue 1: Please support having multiple characters match a single
> string in searches, so that e.g. "ss" can match the German letter "ß".
You mean, allow equivalent strings to be of different lengths, I believe.
(That's the only way I could parse "multiple characters matching a
single string".) We will have that, but it won't allow "ss" to match
"ß", unless you customize character-fold-table to include that. The
reason is that "ß" doesn't have any decompositions in the Unicode
database, so the default character-fold-table doesn't include any
expansions for it.
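Eli's point about "ß" can be checked against the Unicode data that ships with Python's `unicodedata` module: "ä" has a canonical decomposition, while "ß" (and "æ") have none. The sketch also shows how equivalents of different lengths might be compiled into a search regexp, assuming a customization could map a multi-character search string to a character. `CUSTOM_EQUIVALENTS` and `fold_regexp` are hypothetical; the dictionary only stands in for a user customization of `character-fold-table` and does not reproduce that variable's real format.

```python
import re
import unicodedata

# "ä" decomposes to "a" + COMBINING DIAERESIS; "ß" and "æ" have no entry.
for ch in "\u00e4\u00df\u00e6":
    print(ch, repr(unicodedata.decomposition(ch)))

# Hypothetical customization: search strings that should also match a
# character of a different length.
CUSTOM_EQUIVALENTS = {"ss": ["\u00df"], "ae": ["\u00e6"]}

def fold_regexp(query, equivalents):
    """Build a regexp in which each listed substring of QUERY may also match
    its equivalents. Keys are tried longest first, scanning left to right."""
    keys = sorted(equivalents, key=len, reverse=True)
    parts, i = [], 0
    while i < len(query):
        for key in keys:
            if query.startswith(key, i):
                alts = [key] + equivalents[key]
                parts.append("(?:" + "|".join(map(re.escape, alts)) + ")")
                i += len(key)
                break
        else:
            # No multi-character equivalent starts here; match literally.
            parts.append(re.escape(query[i]))
            i += 1
    return "".join(parts)

print(bool(re.search(fold_regexp("strasse", CUSTOM_EQUIVALENTS), "stra\u00dfe")))
```

Without the customization (an empty mapping), "strasse" does not match "straße". This agrees with Eli's description of the default behaviour.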
* bug#22038: 25.1.50; Character folding issues with isearch
2015-11-28 16:39 ` Eli Zaretskii
@ 2015-11-28 17:10 ` Stephen Berman
2015-11-28 17:40 ` Eli Zaretskii
0 siblings, 1 reply; 11+ messages in thread
From: Stephen Berman @ 2015-11-28 17:10 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 22038
On Sat, 28 Nov 2015 18:39:36 +0200 Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Stephen Berman <stephen.berman@gmx.net>
>> Date: Sat, 28 Nov 2015 17:07:22 +0100
>>
>> Issue 1: Please support having multiple characters match a single
>> string in searches, so that e.g. "ss" can match the German letter "ß".
>
> You mean, allow equivalent strings to be of different lengths, I believe.
Yes.
> (That's the only way I could parse "multiple characters matching a
> single string".) We will have that, but it won't allow "ss" to match
> "ß", unless you customize character-fold-table to include that. The
> reason is that "ß" doesn't have any decompositions in the Unicode
> database, so the default character-fold-table doesn't include any
> expansions for it.
This suggests to me that basing character folding solely on character
decomposition is insufficient. From a user's point of view I see no
reason why the search string "a" under character-folding matches "ä" but
not e.g. "æ". Requiring a customization to get the latter strikes me as
a user-unfriendly crutch to work around a deficient implementation. (I
don't know if it's easy to improve, I'm just giving my impression as a
user.)
Steve Berman
* bug#22038: 25.1.50; Character folding issues with isearch
2015-11-28 17:10 ` Stephen Berman
@ 2015-11-28 17:40 ` Eli Zaretskii
2015-11-28 18:26 ` Stephen Berman
0 siblings, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2015-11-28 17:40 UTC (permalink / raw)
To: Stephen Berman; +Cc: 22038
> From: Stephen Berman <stephen.berman@gmx.net>
> Cc: 22038@debbugs.gnu.org
> Date: Sat, 28 Nov 2015 18:10:53 +0100
>
> > (That's the only way I could parse "multiple characters matching a
> > single string".) We will have that, but it won't allow "ss" to match
> > "ß", unless you customize character-fold-table to include that. The
> > reason is that "ß" doesn't have any decompositions in the Unicode
> > database, so the default character-fold-table doesn't include any
> > expansions for it.
>
> This suggests to me that basing character folding solely on character
> decomposition is insufficient. From a user's point of view I see no
> reason why the search string "a" under character-folding matches "ä" but
> not e.g. "æ". Requiring a customization to get the latter strikes me as
> a user-unfriendly crutch to work around a deficient implementation. (I
> don't know if it's easy to improve, I'm just giving my impression as a
> user.)
Easiness is not the most important issue here: there's a more basic
problem involved. Both "ß" vs "ss" and "æ" vs "a" (or "ae") are
language-specific: they are only valid matches in the context of
specific languages. AFAIU, that is why they are not in the Unicode
database. And we don't yet have language-specific text processing
capabilities and infrastructure (well, string-collate-lessp and
string-collate-equalp are a beginning, but only that). So allowing
those by default risks running afoul of what users want.
There are more language-specific foldings possible, outside of the
European languages. For example, folding of Arabic positional forms
of the same letter. These are at times much more important than the
above ligatures, and yet we don't support them yet, either.
In this initial release of such functionality I think it is prudent to
go by the standard, because we don't yet have any real-life experience
to build upon. That doesn't cover every possible use case where a
more radical folding would be useful, but we had nothing in Emacs 24,
so this is still a large step in the right direction, IMO. Let's not
bite more than we can chew.
* bug#22038: 25.1.50; Character folding issues with isearch
2015-11-28 17:40 ` Eli Zaretskii
@ 2015-11-28 18:26 ` Stephen Berman
2015-11-28 18:51 ` Eli Zaretskii
2015-11-29 6:04 ` Richard Stallman
0 siblings, 2 replies; 11+ messages in thread
From: Stephen Berman @ 2015-11-28 18:26 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 22038
On Sat, 28 Nov 2015 19:40:26 +0200 Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Stephen Berman <stephen.berman@gmx.net>
>> Cc: 22038@debbugs.gnu.org
>> Date: Sat, 28 Nov 2015 18:10:53 +0100
>>
>> > (That's the only way I could parse "multiple characters matching a
>> > single string".) We will have that, but it won't allow "ss" to match
>> > "ß", unless you customize character-fold-table to include that. The
>> > reason is that "ß" doesn't have any decompositions in the Unicode
>> > database, so the default character-fold-table doesn't include any
>> > expansions for it.
>>
>> This suggests to me that basing character folding solely on character
>> decomposition is insufficient. From a user's point of view I see no
>> reason why the search string "a" under character-folding matches "ä" but
>> not e.g. "æ". Requiring a customization to get the latter strikes me as
>> a user-unfriendly crutch to work around a deficient implementation. (I
>> don't know if it's easy to improve, I'm just giving my impression as a
>> user.)
>
> Easiness is not the most important issue here: there's a more basic
> problem involved. Both "ß" vs "ss" and "æ" vs "a" (or "ae") are
> language-specific: they are only valid matches in the context of
> specific languages. AFAIU, that is why they are not in the Unicode
> database. And we don't yet have language-specific text processing
> capabilities and infrastructure (well, string-collate-lessp and
> string-collate-equalp are a beginning, but only that). So allowing
> those by default risks running afoul of what users want.
I'm not sure what you mean by "only valid matches in the context of
specific languages", but it sounds like what Per Starbäck said about "ä"
being considered a completely separate character from "a" in Swedish,
unlike in German. Yet if this is a language-specific difference, Emacs
doesn't respect it by default, since "a" does match "ä" under
character-folding. (Or does it fail to do so when
current-language-environment is Swedish? I suspect it doesn't.)
But I know nothing about the Unicode specifications; maybe you are
referring to a more subtle issue, which may be unrelated to my point,
which is simply that I think it should be just as convenient for a user
whose keyboard may lack "ß" or "æ" to match these characters by
searching with "s" or "a" (or "e" or "ae") as it is to match "ﬀ" by
searching with "f". This is not a language-specific issue AFAICS.
Steve Berman
* bug#22038: 25.1.50; Character folding issues with isearch
2015-11-28 18:26 ` Stephen Berman
@ 2015-11-28 18:51 ` Eli Zaretskii
2015-11-29 6:04 ` Richard Stallman
1 sibling, 0 replies; 11+ messages in thread
From: Eli Zaretskii @ 2015-11-28 18:51 UTC (permalink / raw)
To: Stephen Berman; +Cc: 22038
> From: Stephen Berman <stephen.berman@gmx.net>
> Cc: 22038@debbugs.gnu.org
> Date: Sat, 28 Nov 2015 19:26:20 +0100
>
> > Easiness is not the most important issue here: there's a more basic
> > problem involved. Both "ß" vs "ss" and "æ" vs "a" (or "ae") are
> > language-specific: they are only valid matches in the context of
> > specific languages. AFAIU, that is why they are not in the Unicode
> > database. And we don't yet have language-specific text processing
> > capabilities and infrastructure (well, string-collate-lessp and
> > string-collate-equalp are a beginning, but only that). So allowing
> > those by default risks running afoul of what users want.
>
> I'm not sure what you mean by "only valid matches in the context of
> specific languages", but it sounds like what Per Starbäck said about "ä"
> being considered a completely separate character from "a" in Swedish,
> unlike in German. Yet if this is a language-specific difference, Emacs
> doesn't respect it by default, since "a" does match "ä" under
> character-folding. (Or does it fail to do so when
> current-language-environment is Swedish? I suspect it doesn't.)
We simply go by the decompositions that the (language-agnostic)
Unicode database specified, and do not augment that by anything that
is only valid or relevant in the context of specific languages.
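A language-agnostic rule of the kind Eli describes can be approximated with Python's standard library by applying Unicode compatibility decomposition (NFKD) and then dropping combining marks. This is a swapped-in technique for illustration (normalizing both sides rather than building a fold table), not how Emacs implements character folding, and `fold_key` is a hypothetical name. The function takes no language argument, so "ä" folds to "a" in every language environment, while characters without a decomposition, such as "æ" and "ß", are left alone.

```python
import unicodedata

def fold_key(s):
    """Language-agnostic fold: NFKD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# U+00E4 ä, U+00E6 æ, U+00DF ß, U+FB04 LATIN SMALL LIGATURE FFL
for ch in ["\u00e4", "\u00e6", "\u00df", "\ufb04"]:
    print(repr(ch), "->", repr(fold_key(ch)))
```

Two strings would be treated as equivalent when their `fold_key` values are equal. Nothing in the data distinguishes Swedish, where "ä" is a separate letter, from German, where it is not.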
> But I know nothing about the Unicode specifications; maybe you are
> referring to a more subtle issue, which may be unrelated to my point,
> which is simply that I think it should be just as convenient for a user
> whose keyboard may lack "ß" or "æ" to match these characters by
> searching with "s" or "a" (or "e" or "ae") as it is to match "ﬀ" by
> searching with "f". This is not a language-specific issue AFAICS.
The character folding feature is primarily a feature of search, not a
convenience feature for typing characters, although it might double as
that.
* bug#22038: 25.1.50; Character folding issues with isearch
2015-11-28 18:26 ` Stephen Berman
2015-11-28 18:51 ` Eli Zaretskii
@ 2015-11-29 6:04 ` Richard Stallman
2015-11-29 12:33 ` Character folding issues with isearch (was: bug#22038: 25.1.50; ...) Stephen Berman
1 sibling, 1 reply; 11+ messages in thread
From: Richard Stallman @ 2015-11-29 6:04 UTC (permalink / raw)
To: Stephen Berman; +Cc: 22038
[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]
> But I know nothing about the Unicode specifications;
We don't have an obligation to do as Unicode says, but it is certainly
worth paying attention to what Unicode says.
> maybe you are
> referring to a more subtle issue, which may be unrelated to my point,
> which is simply that I think it should be just as convenient for a user
> whose keyboard may lack "ß" or "æ" to match these characters by
> searching with "s" or "a" (or "e" or "ae") as it is to match "ﬀ" by
> searching with "f". This is not a language-specific issue AFAICS.
When a character has a standard equivalent character series in a
certain language, as "ß" does in German, that is a specific powerful
argument for isearch to treat the two as equivalent. But this
argument doesn't apply in a language-independent way.
--
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.
* Character folding issues with isearch (was: bug#22038: 25.1.50; ...)
2015-11-29 6:04 ` Richard Stallman
@ 2015-11-29 12:33 ` Stephen Berman
0 siblings, 0 replies; 11+ messages in thread
From: Stephen Berman @ 2015-11-29 12:33 UTC (permalink / raw)
To: Richard Stallman; +Cc: emacs-devel
On Sun, 29 Nov 2015 01:04:58 -0500 Richard Stallman <rms@gnu.org> wrote:
>
> > maybe you are
> > referring to a more subtle issue, which may be unrelated to my point,
> > which is simply that I think it should be just as convenient for a user
> > whose keyboard may lack "ß" or "æ" to match these characters by
> > searching with "s" or "a" (or "e" or "ae") as it is to match "ﬀ" by
> > searching with "f". This is not a language-specific issue AFAICS.
>
> When a character has a standard equivalent character series in a
> certain language, as "ß" does in German, that is a specific powerful
> argument for isearch to treat the two as equivalent. But this
> argument doesn't apply in a language-independent way.
I think I conflated two issues better treated separately. What seems to
me really to be language-independent is being able to match characters
which as written obviously (i.e., whether you know the language or not)
are a combination of characters including the one(s) used in the search
string (whether or not such "compound" characters are Unicode composite
characters); e.g. using "a" (or "e" or "ae") to match "æ" (this includes
not just ligatures but also such composite characters as "℻"). Such
cases I think Emacs should ideally (i.e. WIBNI) handle by default. In
contrast, I now agree that the case of "ß" really is language-specific,
since it is not obvious (unless you know German or the history of this
character) that it is a "compound" including "s". Such cases could
ideally be handled by a language-specific setting or, as a workaround,
by customization.
Steve Berman
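Stephen's two-tier proposal could be modeled as a single expansion step with layered tables. Unicode compatibility decompositions apply to every character, an always-on table covers "obvious" compounds that Unicode does not decompose, and an optional language-specific table is consulted first. The sketch below expands both the text and the query, searches the expanded text, and maps a hit back to positions in the original. Because every component of a compound becomes searchable, "m" finds ㎙ and "fl" finds ﬄ. `OBVIOUS_COMPOUNDS`, `LANGUAGE_EXPANSIONS`, the "German" key and both functions are hypothetical, and case is not folded.

```python
import unicodedata

# Hypothetical language-independent expansions for compounds that the
# Unicode database does not decompose.
OBVIOUS_COMPOUNDS = {"\u00e6": "ae", "\u00c6": "AE"}
# Hypothetical opt-in expansions, keyed by a language name.
LANGUAGE_EXPANSIONS = {"German": {"\u00df": "ss"}}

def expand(text, language=None):
    """Expand TEXT; return (expanded, origin), where origin[i] is the index
    in TEXT of the character that produced expanded[i]."""
    extra = LANGUAGE_EXPANSIONS.get(language, {})
    pieces, origin = [], []
    for i, ch in enumerate(text):
        piece = extra.get(ch) or OBVIOUS_COMPOUNDS.get(ch)
        if piece is None:
            # Compatibility decomposition minus combining marks: "ä" -> "a",
            # U+FB04 -> "ffl", U+3399 -> "fm".
            piece = "".join(c for c in unicodedata.normalize("NFKD", ch)
                            if not unicodedata.combining(c))
        pieces.append(piece)
        origin.extend([i] * len(piece))
    return "".join(pieces), origin

def fold_search(query, text, language=None):
    """Return the (start, end) span of TEXT matched by QUERY, or None."""
    q, _ = expand(query, language)
    if not q:
        return None
    expanded, origin = expand(text, language)
    pos = expanded.find(q)
    if pos < 0:
        return None
    return origin[pos], origin[pos + len(q) - 1] + 1
```

Here "ae" finds "æ" by default, while "ss" finds "ß" only when `language="German"` is passed. A hit that covers only part of a compound, such as "m" inside ㎙, is reported as the whole character. That is one design choice among several.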
* bug#22038: 25.1.50; Character folding issues with isearch
2015-11-28 16:07 bug#22038: 25.1.50; Character folding issues with isearch Stephen Berman
2015-11-28 16:39 ` Eli Zaretskii
@ 2015-11-29 0:06 ` Artur Malabarba
2015-11-29 9:52 ` Andreas Röhler
1 sibling, 1 reply; 11+ messages in thread
From: Artur Malabarba @ 2015-11-29 0:06 UTC (permalink / raw)
To: Stephen Berman; +Cc: 22038-done
All these items are now implemented. Thanks for filing the issue, Stephen.
We can discuss whether we want to add any ad hoc rules. But I agree with
Eli that I want to wait and see. This is a brand new feature, and I'd
prefer that its first iteration be just a solid foundation.
* bug#22038: 25.1.50; Character folding issues with isearch
2015-11-29 0:06 ` bug#22038: 25.1.50; Character folding issues with isearch Artur Malabarba
@ 2015-11-29 9:52 ` Andreas Röhler
2015-11-29 15:52 ` Eli Zaretskii
0 siblings, 1 reply; 11+ messages in thread
From: Andreas Röhler @ 2015-11-29 9:52 UTC (permalink / raw)
To: 22038
Am 29.11.2015 um 01:06 schrieb Artur Malabarba:
>
> All these items are now implemented. Thanks for filing the issue,
> Stephen.
>
> We can discuss whether we want to add any ad hoc rules. But I agree
> with Eli that I want to wait and see. This is a brand new feature, and
> I'd prefer that its first iteration be just a solid foundation.
>
It would make sense to restrict character folding to ASCII, or at least
to state that in the documentation. In other languages, the behaviour
will remain hazardous.