unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#22038: 25.1.50; Character folding issues with isearch
@ 2015-11-28 16:07 Stephen Berman
  2015-11-28 16:39 ` Eli Zaretskii
  2015-11-29  0:06 ` Artur Malabarba
  0 siblings, 2 replies; 10+ messages in thread
From: Stephen Berman @ 2015-11-28 16:07 UTC
  To: 22038

Issue 1: Please support having multiple characters match a single
string in searches, so that e.g. "ss" can match the German letter "ß".

Issue 2: The current implementation of character folding based on
character decomposition often yields surprising results when searching:
e.g. the search string "f" matches not only "ﬀ" but also "㎙" and "ﬄ",
but the search strings "m" and "fm" fail to match the former and the
search strings "l" and "fl" fail to match the latter.  I think all
recognizable characters in a composite character should be matchable by
searching.
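
The expansions behind Issue 2 can be inspected with a minimal Python sketch. It assumes the standard `unicodedata` module as a stand-in for the Unicode database; it is not Emacs's implementation, and the helpers `expansion` and `matches_anywhere` are hypothetical names. The second helper illustrates the behaviour requested here, where a search string matches if it occurs anywhere in the expansion.

```python
import unicodedata

LIGATURE_FF = "\ufb00"   # LATIN SMALL LIGATURE FF
SQUARE_FM = "\u3399"     # SQUARE FM
LIGATURE_FFL = "\ufb04"  # LATIN SMALL LIGATURE FFL


def expansion(ch: str) -> str:
    """Return the compatibility decomposition (NFKD) of one character."""
    return unicodedata.normalize("NFKD", ch)


def matches_anywhere(search: str, ch: str) -> bool:
    """Hypothetical rule: SEARCH matches CH if it occurs anywhere in CH's expansion."""
    return search in expansion(ch)


for ch in (LIGATURE_FF, SQUARE_FM, LIGATURE_FFL):
    # Show what each composite character expands to.
    print(unicodedata.name(ch), "->", expansion(ch))
```

Under this rule "m" and "fm" would match SQUARE FM, and "l" and "fl" would match the FFL ligature, which is what the report asks for.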

Issue 3: Character folding does not respect case-folding in searches,
e.g. "f" fails to match "ℱ" and "℻" (with case-folding enabled), whereas
"F" does match them, but fails to match "ﬀ".
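
One way to picture the interaction asked for in Issue 3 is to apply case folding after decomposition on both sides of the comparison. The following is only a sketch under that assumption, written in Python with the standard `unicodedata` module; `fold` and `prefix_matches` are hypothetical helpers, not functions from Emacs.

```python
import unicodedata

SCRIPT_CAPITAL_F = "\u2131"  # SCRIPT CAPITAL F
FACSIMILE_SIGN = "\u213b"    # FACSIMILE SIGN
LIGATURE_FF = "\ufb00"       # LATIN SMALL LIGATURE FF


def fold(text: str) -> str:
    """Decompose (NFKD) first, then case-fold the result."""
    return unicodedata.normalize("NFKD", text).casefold()


def prefix_matches(search: str, ch: str) -> bool:
    """Hypothetical rule: the folded search string is a prefix of the folded character."""
    return fold(ch).startswith(fold(search))


for ch in (SCRIPT_CAPITAL_F, FACSIMILE_SIGN, LIGATURE_FF):
    print(unicodedata.name(ch), "->", fold(ch))
```

With both steps applied, a lowercase "f" reaches the two capital-based symbols and an uppercase "F" reaches the lowercase ligature.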


In GNU Emacs 25.1.50.1 (x86_64-suse-linux-gnu, GTK+ Version 3.14.15)
 of 2015-11-20
Repository revision: 5c81fd58e32d965c2551663622e084f2800e1e90
Windowing system distributor 'The X.Org Foundation', version 11.0.11601000
System Description:	openSUSE 13.2 (Harlequin) (x86_64)

Configured using:
 'configure 'CFLAGS=-Og -g3''

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND DBUS GCONF GSETTINGS NOTIFY
GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS
GTK3 X11

Important settings:
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix






* bug#22038: 25.1.50; Character folding issues with isearch
  2015-11-28 16:07 bug#22038: 25.1.50; Character folding issues with isearch Stephen Berman
@ 2015-11-28 16:39 ` Eli Zaretskii
  2015-11-28 17:10   ` Stephen Berman
  2015-11-29  0:06 ` Artur Malabarba
  1 sibling, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2015-11-28 16:39 UTC
  To: Stephen Berman; +Cc: 22038

> From: Stephen Berman <stephen.berman@gmx.net>
> Date: Sat, 28 Nov 2015 17:07:22 +0100
> 
> Issue 1: Please support having multiple characters match a single
> string in searches, so that e.g. "ss" can match the German letter "ß".

You mean, allow equivalent strings to be of different lengths, I believe.
(That's the only way I could parse "multiple characters matching a
single string".)  We will have that, but it won't allow "ss" to match
"ß", unless you customize character-fold-table to include that.  The
reason is that "ß" doesn't have any decompositions in the Unicode
database, so the default character-fold-table doesn't include any
expansions for it.
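
The missing decomposition can be checked with a short Python sketch that uses the standard `unicodedata` module as a stand-in for the Unicode database. The `expansion` helper and its `user_table` argument are hypothetical; they only illustrate the idea of a user-supplied entry overriding the default, and do not reproduce how character-fold-table is actually customized.

```python
import unicodedata

SHARP_S = "\u00df"      # LATIN SMALL LETTER SHARP S
A_DIAERESIS = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS

# The raw decomposition field is an empty string when the database has none.
print(repr(unicodedata.decomposition(A_DIAERESIS)))
print(repr(unicodedata.decomposition(SHARP_S)))


def expansion(ch: str, user_table: dict) -> str:
    """Prefer a user-supplied expansion; otherwise fall back to NFKD."""
    if ch in user_table:
        return user_table[ch]
    return unicodedata.normalize("NFKD", ch)


# Without customization the sharp s expands only to itself.
print(expansion(SHARP_S, {}))
# With a (hypothetical) user entry it expands to "ss".
print(expansion(SHARP_S, {SHARP_S: "ss"}))
```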






* bug#22038: 25.1.50; Character folding issues with isearch
  2015-11-28 16:39 ` Eli Zaretskii
@ 2015-11-28 17:10   ` Stephen Berman
  2015-11-28 17:40     ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Stephen Berman @ 2015-11-28 17:10 UTC
  To: Eli Zaretskii; +Cc: 22038

On Sat, 28 Nov 2015 18:39:36 +0200 Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Stephen Berman <stephen.berman@gmx.net>
>> Date: Sat, 28 Nov 2015 17:07:22 +0100
>> 
>> Issue 1: Please support having multiple characters match a single
>> string in searches, so that e.g. "ss" can match the German letter "ß".
>
> You mean, allow equivalent strings to be of different lengths, I believe.

Yes.

> (That's the only way I could parse "multiple characters matching a
> single string".)  We will have that, but it won't allow "ss" to match
> "ß", unless you customize character-fold-table to include that.  The
> reason is that "ß" doesn't have any decompositions in the Unicode
> database, so the default character-fold-table doesn't include any
> expansions for it.

This suggests to me that basing character folding solely on character
decomposition is insufficient.  From a user's point of view I see no
reason why the search string "a" under character-folding matches "ä" but
not e.g. "æ".  Requiring a customization to get the latter strikes me as
a user-unfriendly crutch to work around a deficient implementation.  (I
don't know if it's easy to improve, I'm just giving my impression as a
user.)

Steve Berman






* bug#22038: 25.1.50; Character folding issues with isearch
  2015-11-28 17:10   ` Stephen Berman
@ 2015-11-28 17:40     ` Eli Zaretskii
  2015-11-28 18:26       ` Stephen Berman
  0 siblings, 1 reply; 10+ messages in thread
From: Eli Zaretskii @ 2015-11-28 17:40 UTC
  To: Stephen Berman; +Cc: 22038

> From: Stephen Berman <stephen.berman@gmx.net>
> Cc: 22038@debbugs.gnu.org
> Date: Sat, 28 Nov 2015 18:10:53 +0100
> 
> > (That's the only way I could parse "multiple characters matching a
> > single string".)  We will have that, but it won't allow "ss" to match
> > "ß", unless you customize character-fold-table to include that.  The
> > reason is that "ß" doesn't have any decompositions in the Unicode
> > database, so the default character-fold-table doesn't include any
> > expansions for it.
> 
> This suggests to me that basing character folding solely on character
> decomposition is insufficient.  From a user's point of view I see no
> reason why the search string "a" under character-folding matches "ä" but
> not e.g. "æ".  Requiring a customization to get the latter strikes me as
> a user-unfriendly crutch to work around a deficient implementation.  (I
> don't know if it's easy to improve, I'm just giving my impression as a
> user.)

Easiness is not the most important issue here: there's a more basic
problem involved.  Both "ß" vs "ss" and "æ" vs "a" (or "ae") are
language-specific: they are only valid matches in the context of
specific languages.  AFAIU, that is why they are not in the Unicode
database.  And we don't yet have language-specific text processing
capabilities and infrastructure (well, string-collate-lessp and
string-collate-equalp are a beginning, but only that).  So allowing
those by default risks running afoul of what users want.

There are more language-specific foldings possible, outside of the
European languages.  For example, folding of Arabic positional forms
of the same letter.  These are at times much more important than the
above ligatures, and yet we don't support them yet, either.

In this initial release of such functionality I think it is prudent to
go by the standard, because we don't yet have any real-life experience
to build upon.  That doesn't cover every possible use case where a
more radical folding would be useful, but we had nothing in Emacs 24,
so this is still a large step in the right direction, IMO.  Let's not
bite off more than we can chew.
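
The distinction between language-agnostic defaults and language-specific equivalences can be sketched as two layers. This is a hypothetical design written in Python for illustration only: the `LANGUAGE_FOLDS` table, its language keys, and the `fold_char` and `fold_text` helpers are all assumptions, and nothing here reflects infrastructure that exists in Emacs.

```python
import unicodedata
from typing import Optional

# Hypothetical per-language additions; keys and entries are illustrative only.
LANGUAGE_FOLDS = {
    "de": {"\u00df": "ss"},  # LATIN SMALL LETTER SHARP S
    "da": {"\u00e6": "ae"},  # LATIN SMALL LETTER AE
}


def fold_char(ch: str, language: Optional[str] = None) -> str:
    """Fold one character, consulting language-specific entries first."""
    extra = LANGUAGE_FOLDS.get(language, {})
    if ch in extra:
        return extra[ch]
    # Language-agnostic default: canonical decomposition minus combining marks.
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def fold_text(text: str, language: Optional[str] = None) -> str:
    """Fold every character of TEXT."""
    return "".join(fold_char(ch, language) for ch in text)


print(fold_text("Stra\u00dfe"))        # defaults only
print(fold_text("Stra\u00dfe", "de"))  # with the hypothetical German layer
```

Keeping the language layer separate means the default behaviour stays tied to the standard, while any additional equivalence is an explicit, per-language choice.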






* bug#22038: 25.1.50; Character folding issues with isearch
  2015-11-28 17:40     ` Eli Zaretskii
@ 2015-11-28 18:26       ` Stephen Berman
  2015-11-28 18:51         ` Eli Zaretskii
  2015-11-29  6:04         ` Richard Stallman
  0 siblings, 2 replies; 10+ messages in thread
From: Stephen Berman @ 2015-11-28 18:26 UTC
  To: Eli Zaretskii; +Cc: 22038

On Sat, 28 Nov 2015 19:40:26 +0200 Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Stephen Berman <stephen.berman@gmx.net>
>> Cc: 22038@debbugs.gnu.org
>> Date: Sat, 28 Nov 2015 18:10:53 +0100
>> 
>> > (That's the only way I could parse "multiple characters matching a
>> > single string".)  We will have that, but it won't allow "ss" to match
>> > "ß", unless you customize character-fold-table to include that.  The
>> > reason is that "ß" doesn't have any decompositions in the Unicode
>> > database, so the default character-fold-table doesn't include any
>> > expansions for it.
>> 
>> This suggests to me that basing character folding solely on character
>> decomposition is insufficient.  From a user's point of view I see no
>> reason why the search string "a" under character-folding matches "ä" but
>> not e.g. "æ".  Requiring a customization to get the latter strikes me as
>> a user-unfriendly crutch to work around a deficient implementation.  (I
>> don't know if it's easy to improve, I'm just giving my impression as a
>> user.)
>
> Easiness is not the most important issue here: there's a more basic
> problem involved.  Both "ß" vs "ss" and "æ" vs "a" (or "ae") are
> language-specific: they are only valid matches in the context of
> specific languages.  AFAIU, that is why they are not in the Unicode
> database.  And we don't yet have language-specific text processing
> capabilities and infrastructure (well, string-collate-lessp and
> string-collate-equalp are a beginning, but only that).  So allowing
> those by default risks running afoul of what users want.

I'm not sure what you mean by "only valid matches in the context of
specific languages", but it sounds like what Per Starbäck said about "ä"
being considered a completely separate character from "a" in Swedish,
unlike in German.  Yet if this is a language-specific difference, Emacs
doesn't respect it by default, since "a" does match "ä" under
character-folding.  (Or does it fail to do so when
current-language-environment is Swedish?  I suspect it doesn't.)

But I know nothing about the Unicode specifications; maybe you are
referring to a more subtle issue, which may be unrelated to my point,
which is simply that I think it should be just as convenient for a user
whose keyboard may lack "ß" or "æ" to match these characters by
searching with "s" or "a" (or "e" or "ae") as it is to match "ﬀ" by
searching with "f".  This is not a language-specific issue AFAICS.

Steve Berman






* bug#22038: 25.1.50; Character folding issues with isearch
  2015-11-28 18:26       ` Stephen Berman
@ 2015-11-28 18:51         ` Eli Zaretskii
  2015-11-29  6:04         ` Richard Stallman
  1 sibling, 0 replies; 10+ messages in thread
From: Eli Zaretskii @ 2015-11-28 18:51 UTC
  To: Stephen Berman; +Cc: 22038

> From: Stephen Berman <stephen.berman@gmx.net>
> Cc: 22038@debbugs.gnu.org
> Date: Sat, 28 Nov 2015 19:26:20 +0100
> 
> > Easiness is not the most important issue here: there's a more basic
> > problem involved.  Both "ß" vs "ss" and "æ" vs "a" (or "ae") are
> > language-specific: they are only valid matches in the context of
> > specific languages.  AFAIU, that is why they are not in the Unicode
> > database.  And we don't yet have language-specific text processing
> > capabilities and infrastructure (well, string-collate-lessp and
> > string-collate-equalp are a beginning, but only that).  So allowing
> > those by default risks running afoul of what users want.
> 
> I'm not sure what you mean by "only valid matches in the context of
> specific languages", but it sounds like what Per Starbäck said about "ä"
> being considered a completely separate character from "a" in Swedish,
> unlike in German.  Yet if this is a language-specific difference, Emacs
> doesn't respect it by default, since "a" does match "ä" under
> character-folding.  (Or does it fail to do so when
> current-language-environment is Swedish?  I suspect it doesn't.)

We simply go by the decompositions that the (language-agnostic)
Unicode database specified, and do not augment that by anything that
is only valid or relevant in the context of specific languages.

> But I know nothing about the Unicode specifications; maybe you are
> referring to a more subtle issue, which may be unrelated to my point,
> which is simply that I think it should be just as convenient for a user
> whose keyboard may lack "ß" or "æ" to match these characters by
> searching with "s" or "a" (or "e" or "ae") as it is to match "ﬀ" by
> searching with "f".  This is not a language-specific issue AFAICS.

The character folding feature is primarily a feature of search, not a
convenience feature for typing characters, although it might double as
that.






* bug#22038: 25.1.50; Character folding issues with isearch
  2015-11-28 16:07 bug#22038: 25.1.50; Character folding issues with isearch Stephen Berman
  2015-11-28 16:39 ` Eli Zaretskii
@ 2015-11-29  0:06 ` Artur Malabarba
  2015-11-29  9:52   ` Andreas Röhler
  1 sibling, 1 reply; 10+ messages in thread
From: Artur Malabarba @ 2015-11-29  0:06 UTC
  To: Stephen Berman; +Cc: 22038-done


All these items are now implemented. Thanks for filing the issue, Stephen.

We can discuss whether we want to add any ad hoc rules. But I agree with
Eli that I want to wait and see. This is a brand new feature, and I'd
prefer that its first iteration be just a solid foundation.



* bug#22038: 25.1.50; Character folding issues with isearch
  2015-11-28 18:26       ` Stephen Berman
  2015-11-28 18:51         ` Eli Zaretskii
@ 2015-11-29  6:04         ` Richard Stallman
  1 sibling, 0 replies; 10+ messages in thread
From: Richard Stallman @ 2015-11-29  6:04 UTC
  To: Stephen Berman; +Cc: 22038

[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > But I know nothing about the Unicode specifications;

We don't have an obligation to do as Unicode says, but it is certainly
worth paying attention to what Unicode says.

  > maybe you are
  > referring to a more subtle issue, which may be unrelated to my point,
  > which is simply that I think it should be just as convenient for a user
  > whose keyboard may lack "ß" or "æ" to match these characters by
  > searching with "s" or "a" (or "e" or "ae") as it is to match "ﬀ" by
  > searching with "f".  This is not a language-specific issue AFAICS.

When a character has a standard equivalent character series in a
certain language, as "ß" does in German, that is a specific powerful
argument for isearch to treat the two as equivalent.  But this
argument doesn't apply in a language-independent way.
-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.







* bug#22038: 25.1.50; Character folding issues with isearch
  2015-11-29  0:06 ` Artur Malabarba
@ 2015-11-29  9:52   ` Andreas Röhler
  2015-11-29 15:52     ` Eli Zaretskii
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Röhler @ 2015-11-29  9:52 UTC
  To: 22038



Am 29.11.2015 um 01:06 schrieb Artur Malabarba:
>
> All these items are now implemented. Thanks for filing the issue, 
> Stephen.
>
> We can discuss whether we want to add any ad hoc rules. But I agree 
> with Eli that I want to wait and see. This is a brand new feature, and 
> I'd prefer that its first iteration be just a solid foundation.
>

It would make sense to restrict character folding to ASCII.  At least 
announce that in the documentation. In other languages the behaviour 
will remain hazardous.







* bug#22038: 25.1.50; Character folding issues with isearch
  2015-11-29  9:52   ` Andreas Röhler
@ 2015-11-29 15:52     ` Eli Zaretskii
  0 siblings, 0 replies; 10+ messages in thread
From: Eli Zaretskii @ 2015-11-29 15:52 UTC
  To: Andreas Röhler; +Cc: 22038

> From: Andreas Röhler <andreas.roehler@easy-emacs.de>
> Date: Sun, 29 Nov 2015 10:52:30 +0100
> 
> It would make sense to restrict character folding to ASCII.

Most characters indeed fold into ASCII.  But not all of them, and
there's no reason to limit this only to ASCII, IMO.
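
A few decompositions whose base character is not ASCII can be listed with a minimal Python sketch, again assuming the standard `unicodedata` module as a stand-in for the Unicode database. The `base_char` helper is a hypothetical name, and the sample characters are chosen here for illustration; they do not come from the thread.

```python
import unicodedata


def base_char(ch: str) -> str:
    """Return the first character of the canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", ch)[0]


SAMPLES = (
    "\u00e9",  # LATIN SMALL LETTER E WITH ACUTE
    "\u03ac",  # GREEK SMALL LETTER ALPHA WITH TONOS
    "\u0439",  # CYRILLIC SMALL LETTER SHORT I
)

for ch in SAMPLES:
    base = base_char(ch)
    # Report the base character and whether it lies in the ASCII range.
    print(unicodedata.name(ch), "->", unicodedata.name(base),
          "| ASCII:", base.isascii())
```

The Latin sample folds to an ASCII letter, while the Greek and Cyrillic samples fold to base letters in their own scripts, so an ASCII-only restriction would leave those cases out.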





