* Single quotes in Info
@ 2015-01-23 23:17 Marcin Borkowski
  2015-01-23 23:53 ` Drew Adams
                   ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Marcin Borkowski @ 2015-01-23 23:17 UTC (permalink / raw)
  To: Help Gnu Emacs mailing list

Hello all,

I'm not sure about it, but it seems that after upgrading from 24.3 to
25.0.50.1, the Info buffer is a bit uglified.  First, it uses some face
I don't like for variable and function names – but if this annoys me too
much, I can change it easily.  Worse, instead of e.g. `t' it now says
‘t’, for instance (i.e., it uses Unicode single quotation marks).

This is extremely annoying, since it makes incremental searching for
single-quoted strings much harder.

I apropos'ed the "Info-" variables and grepped the list for "quot",
"unicode" and "single", all to no avail, and ran out of ideas.  Is this
behavior customizable?  How to get back to ASCII quotes?

TIA,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University




* RE: Single quotes in Info
  2015-01-23 23:17 Single quotes in Info Marcin Borkowski
@ 2015-01-23 23:53 ` Drew Adams
  2015-01-24 17:01   ` Marcin Borkowski
  2015-01-24  8:38 ` Eli Zaretskii
       [not found] ` <mailman.18484.1422057224.1147.help-gnu-emacs@gnu.org>
  2 siblings, 1 reply; 40+ messages in thread
From: Drew Adams @ 2015-01-23 23:53 UTC (permalink / raw)
  To: Marcin Borkowski, Help Gnu Emacs mailing list

> I'm not sure about it, but it seems that after upgrading from 24.3 to
> 25.0.50.1, the Info buffer is a bit uglified.  First, it uses some face
> I don't like for variable and function names – but if this annoys me too
> much, I can change it easily.  Worse, instead of e.g. `t' it now says
> ‘t’, for instance (i.e., it uses Unicode single quotation marks).
> 
> This is extremely annoying, since it makes incremental searching for
> single-quoted strings much harder.
> 
> I apropos'ed the "Info-" variables and grepped the list for "quot",
> "unicode" and "single", all to no avail, and ran out of ideas.  Is this
> behavior customizable?  How to get back to ASCII quotes?

Oh boy, you'll have fun reading about this in the bug threads:

#16292 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16292
         info docs now contain single straight quotes instead of `'

#13131 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13131
         Allow curly quotes to be found by searching for straight quotes?

#16439 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16439
         Highlighting of strings within Info buffers

#13228 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13228
         Request for highlighting back-quote/quote pair notation

Enjoy!

(Info+ can at least help by highlighting quoted names etc.
http://www.emacswiki.org/emacs/InfoPlus)




* Re: Single quotes in Info
  2015-01-23 23:17 Single quotes in Info Marcin Borkowski
  2015-01-23 23:53 ` Drew Adams
@ 2015-01-24  8:38 ` Eli Zaretskii
  2015-01-24 15:11   ` Drew Adams
       [not found] ` <mailman.18484.1422057224.1147.help-gnu-emacs@gnu.org>
  2 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2015-01-24  8:38 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Marcin Borkowski <mbork@wmi.amu.edu.pl>
> Date: Sat, 24 Jan 2015 00:17:47 +0100
> 
> I'm not sure about it, but it seems that after upgrading from 24.3 to
> 25.0.50.1, the Info buffer is a bit uglified.  First, it uses some face
> I don't like for variable and function names

Not sure what you mean here, because there is no such face in Info.
Maybe you mean Info-quoted, which is used for quoted strings?  (You
can use "M-x describe-text-properties" to show the face at point.)

> Worse, instead of e.g. `t' it now says ‘t’, for instance (i.e., it
> uses Unicode single quotation marks).

I don't think this has anything to do with Emacs.  These characters
come from the Info file itself, and are produced by the new 'makeinfo'
command.  That's "progress" for you: many people nowadays no longer
want to see ASCII quotes, they want to see those fancy characters
Unicode introduced.

Or maybe the reason is that in Emacs 24 we actively prevent 'makeinfo'
from doing that, whereas in Emacs 25 we don't.

> This is extremely annoying, since it makes incremental searching for
> single-quoted strings much harder.

Doesn't M-C-s allow you to find that by a suitable regexp?

Anyway, we should revive bug #13131, and provide an easier solution
for this particular issue.

> How to get back to ASCII quotes?

I think you need to regenerate the Info docs, using the levers we did
in Emacs 24 to disallow Unicode quotes.  Or customize 'makeinfo' to
produce ASCII characters instead (search for OPEN_QUOTE_SYMBOL in the
Texinfo manual), and then regenerate the docs.  Or install an older
'makeinfo', which didn't produce these quotes, and then regenerate the
docs.





* RE: Single quotes in Info
  2015-01-24  8:38 ` Eli Zaretskii
@ 2015-01-24 15:11   ` Drew Adams
  2015-01-24 15:19     ` Eli Zaretskii
                       ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Drew Adams @ 2015-01-24 15:11 UTC (permalink / raw)
  To: Eli Zaretskii, help-gnu-emacs

> Anyway, we should revive bug #13131, and provide an easier solution
> for this particular issue.

I agree.  For this particular (search) issue.

This is conceptually related to, but it need not necessarily be
extended to, discussion about being able to Isearch abstracting from
diacritical marks etc.  (E.g. bug #13041:
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13041.)

IOW, being able to easily specify equivalence classes of chars for
search (and other) purposes, and preferably being able to quickly
choose whether to make use of them (this one or that one) - e.g.,
as we can do now for case-sensitivity (`a' ~ `A').

The easily-search-for-curly-or-not-curly problem reminds us that
Info is not only about display: One needs to be able to easily
search for (and perhaps even type directly) the chars that are
displayed.  Chars ` and ' correspond to keys on most keyboards.
‘ and ’ do not.

Some of those who propose curly-quote etc. display as a "modernization"
of Emacs might not take sufficiently into account how Emacs users
interact with the text.  "Modern" appearance is nice (even important),
but Emacs is not *only* about display.

> > How to get back to ASCII quotes?
> 
> I think you need to regenerate the Info docs, using the levers we did
> in Emacs 24 to disallow Unicode quotes.  Or customize 'makeinfo' to
> produce ASCII characters instead (search for OPEN_QUOTE_SYMBOL in the
> Texinfo manual), and then regenerate the docs.  Or install an older
> 'makeinfo', which didn't produce these quotes, and then regenerate the
> docs.

As I know you are aware, Eli, this return-to-the-source is not a real
solution.  (Ideally) Emacs users themselves should be able (somehow)
to choose which chars are used for such display.  Remaking Info should
not be our only (i.e., final) answer, even if it is such today.




* Re: Single quotes in Info
  2015-01-24 15:11   ` Drew Adams
@ 2015-01-24 15:19     ` Eli Zaretskii
       [not found]     ` <<838ugsrysw.fsf@gnu.org>
  2015-01-24 17:00     ` Marcin Borkowski
  2 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2015-01-24 15:19 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Sat, 24 Jan 2015 07:11:05 -0800 (PST)
> From: Drew Adams <drew.adams@oracle.com>
> 
> > > How to get back to ASCII quotes?
> > 
> > I think you need to regenerate the Info docs, using the levers we did
> > in Emacs 24 to disallow Unicode quotes.  Or customize 'makeinfo' to
> > produce ASCII characters instead (search for OPEN_QUOTE_SYMBOL in the
> > Texinfo manual), and then regenerate the docs.  Or install an older
> > 'makeinfo', which didn't produce these quotes, and then regenerate the
> > docs.
> 
> As I know you are aware, Eli, this return-to-the-source is not a real
> solution.

I was enumerating solutions that are available to the OP now.  This
list is about helping users do whatever they want, not about telling
Emacs developers what future features they should work on ;-)




* RE: Single quotes in Info
       [not found]     ` <<838ugsrysw.fsf@gnu.org>
@ 2015-01-24 15:54       ` Drew Adams
  2015-01-24 16:45         ` Marcin Borkowski
  0 siblings, 1 reply; 40+ messages in thread
From: Drew Adams @ 2015-01-24 15:54 UTC (permalink / raw)
  To: Eli Zaretskii, help-gnu-emacs

> > As I know you are aware, Eli, this return-to-the-source is not a
> > real solution.  (Ideally) Emacs users themselves should be able
> > (somehow) to choose which chars are used for such display.
> > Remaking Info should not be our only (i.e., final) answer, even
> > if it is such today.
> 
> I was enumerating solutions that are available to the OP now.  This
> list is about helping users do whatever they want, not about telling
> Emacs developers what future features they should work on ;-)

My message was an endorsement reply to your own development-oriented
statement:

 ez> Anyway, we should revive bug #13131, and provide an easier
 ez> solution for this particular issue.

And I think it does not hurt users to be reminded that curly
quotes are not as easy to type as straight quotes (with many/most
keyboards), and that Info is about things like search and not
only about display.  Thanks to Marcin for reminding us all.

It is perfectly legitimate to discuss possible new features on
this list, as well as current limitations & possible workarounds.




* Re: Single quotes in Info
  2015-01-24 15:54       ` Drew Adams
@ 2015-01-24 16:45         ` Marcin Borkowski
  0 siblings, 0 replies; 40+ messages in thread
From: Marcin Borkowski @ 2015-01-24 16:45 UTC (permalink / raw)
  To: Eli Zaretskii, help-gnu-emacs


On 2015-01-24, at 16:54, Drew Adams <drew.adams@oracle.com> wrote:

> And I think it does not hurt users to be reminded that curly
> quotes are not as easy to type as straight quotes (with many/most
> keyboards), and that Info is about things like search and not
> only about display.  Thanks to Marcin for reminding us all.

You're welcome.

BTW, I love the Info system.  Only recently I learned to use its index
(and not only isearch), and it's even better with that.

My particular use case was with the info page on interactive codes.
I wanted to search for the string "`p'", and I could enter curly quotes
using M-e and editing the search query with some unicode-aware things
(C-x 8 RET, for instance), but this is a nuisance.  (Also, in my case,
isearch-forward-regexp wouldn't help.)

Please note that I do appreciate typographical niceties like proper
quotes and such.  In this case, however, usability is more important
than aesthetics imho.

> It is perfectly legitimate to discuss possible new features on
> this list, as well as current limitations & possible workarounds.

That's good to know! :-)

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University




* Re: Single quotes in Info
  2015-01-24 15:11   ` Drew Adams
  2015-01-24 15:19     ` Eli Zaretskii
       [not found]     ` <<838ugsrysw.fsf@gnu.org>
@ 2015-01-24 17:00     ` Marcin Borkowski
  2015-01-27 16:27       ` Artur Malabarba
  2 siblings, 1 reply; 40+ messages in thread
From: Marcin Borkowski @ 2015-01-24 17:00 UTC (permalink / raw)
  To: Eli Zaretskii, help-gnu-emacs


On 2015-01-24, at 16:11, Drew Adams <drew.adams@oracle.com> wrote:

> This is conceptually related to, but it need not necessarily be
> extended to, discussion about being able to Isearch abstracting from
> diacritical marks etc.  (E.g. bug #13041:
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13041.)
>
> IOW, being able to easily specify equivalence classes of chars for
> search (and other) purposes, and preferably being able to quickly
> choose whether to make use of them (this one or that one) - e.g.,
> as we can do now for case-sensitivity (`a' ~ `A').

This is a great idea.  Maybe even not only for isearch.

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University




* Re: Single quotes in Info
  2015-01-23 23:53 ` Drew Adams
@ 2015-01-24 17:01   ` Marcin Borkowski
  0 siblings, 0 replies; 40+ messages in thread
From: Marcin Borkowski @ 2015-01-24 17:01 UTC (permalink / raw)
  To: Help Gnu Emacs mailing list


On 2015-01-24, at 00:53, Drew Adams <drew.adams@oracle.com> wrote:

>> I'm not sure about it, but it seems that after upgrading from 24.3 to
>> 25.0.50.1, the Info buffer is a bit uglified.  First, it uses some face
>> I don't like for variable and function names – but if this annoys me too
>> much, I can change it easily.  Worse, instead of e.g. `t' it now says
>> ‘t’, for instance (i.e., it uses Unicode single quotation marks).
>> 
>> This is extremely annoying, since it makes incremental searching for
>> single-quoted strings much harder.
>> 
>> I apropos'ed the "Info-" variables and grepped the list for "quot",
>> "unicode" and "single", all to no avail, and ran out of ideas.  Is this
>> behavior customizable?  How to get back to ASCII quotes?
>
> Oh boy, you'll have fun reading about this in the bug threads:
>
> #16292 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16292
>          info docs now contain single straight quotes instead of `'
>
> #13131 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13131
>          Allow curly quotes to be found by searching for straight quotes?
>
> #16439 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16439
>          Highlighting of strings within Info buffers
>
> #13228 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13228
>          Request for highlighting back-quote/quote pair notation
>
> Enjoy!

Thanks, I'll look at these.

Best,

-- 
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University




* Unicode in emacs (was Single quotes in Info)
       [not found] ` <mailman.18484.1422057224.1147.help-gnu-emacs@gnu.org>
@ 2015-01-26  3:26   ` Rusi
  0 siblings, 0 replies; 40+ messages in thread
From: Rusi @ 2015-01-26  3:26 UTC (permalink / raw)
  To: help-gnu-emacs

On Saturday, January 24, 2015 at 5:23:46 AM UTC+5:30, Drew Adams wrote:
> > I'm not sure about it, but it seems that after upgrading from 24.3 to
> > 25.0.50.1, the Info buffer is a bit uglified.  First, it uses some face
> > I don't like for variable and function names – but if this annoys me too
> > much, I can change it easily.  Worse, instead of e.g. `t' it now says
> > ‘t’, for instance (i.e., it uses Unicode single quotation marks).
> > 
> > This is extremely annoying, since it makes incremental searching for
> > single-quoted strings much harder.
> > 
> > I apropos'ed the "Info-" variables and grepped the list for "quot",
> > "unicode" and "single", all to no avail, and ran out of ideas.  Is this
> > behavior customizable?  How to get back to ASCII quotes?
> 
> Oh boy, you'll have fun reading about this in the bug threads:
> 
> #16292 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16292
>          info docs now contain single straight quotes instead of `'
> 
> #13131 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13131
>          Allow curly quotes to be found by searching for straight quotes?
> 
> #16439 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16439
>          Highlighting of strings within Info buffers
> 
> #13228 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13228
>          Request for highlighting back-quote/quote pair notation
> 
> Enjoy!
> 
> (Info+ can at least help by highlighting quoted names etc.
> http://www.emacswiki.org/emacs/InfoPlus)

Just some (very laymanish) thoughts about unicode.
Uni-code has two aspects:
1. Uni-fying the tower of babel that is human languages
2. Uni-versality of a common core

Historically, the 1st is the driver why unicode caught on at all
[The world is a bit larger than the two sides of the atlantic!]

However the 2nd probably holds more hope for reducing babel-ish bedlam.

Some of the more universal sides of unicode:
1. ASCII (for historical reasons alone)
2. Math
3. Typography  (which this thread is about)

[Note this will not technically hold up. I am talking more sociologically
ie
"2+3" is more likely to universalize than "Add two and three"
]

Further expanded in this post
http://blog.languager.org/2015/01/unicode-and-universe.html

Also a plea for programming languages to start getting more unicoded
[Not to be taken too seriously - just a possible direction]
http://blog.languager.org/2014/04/unicoded-python.html



* Re: Single quotes in Info
  2015-01-24 17:00     ` Marcin Borkowski
@ 2015-01-27 16:27       ` Artur Malabarba
  2015-01-27 17:37         ` Stefan Monnier
  2015-01-27 18:04         ` Eli Zaretskii
  0 siblings, 2 replies; 40+ messages in thread
From: Artur Malabarba @ 2015-01-27 16:27 UTC (permalink / raw)
  To: Marcin Borkowski, emacs-devel; +Cc: Eli Zaretskii, help-gnu-emacs

2015-01-24 15:00 GMT-02:00 Marcin Borkowski <mbork@wmi.amu.edu.pl>:
>
> On 2015-01-24, at 16:11, Drew Adams <drew.adams@oracle.com> wrote:
>
>> This is conceptually related to, but it need not necessarily be
>> extended to, discussion about being able to Isearch abstracting from
>> diacritical marks etc.  (E.g. bug #13041:
>> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13041.)
>>
>> IOW, being able to easily specify equivalence classes of chars for
>> search (and other) purposes, and preferably being able to quickly
>> choose whether to make use of them (this one or that one) - e.g.,
>> as we can do now for case-sensitivity (`a' ~ `A').
>
> This is a great idea.  Maybe even not only for isearch.
>

I also really like this idea, so much so that I've gone ahead and
implemented it. It is implemented on the branch
`scratch/isearch-character-group-folding'. I called it group-folding,
but we can call it class folding or whatever sounds more intuitive to
most people.

The implementation is very much up for debate. Currently, what it does
is use regexps (behind the scenes) so that a plain double quote
matches all those unicode double quotes, and the same for a hard
single quote. The way it is written, it is trivial to add more groups
by adding entries to `isearch-groups-alist'.
Of course, other characters are appropriately regexp-quoted behind the
scenes, so that everything else works as expected. The surface is
exactly like regular isearch, except for these two characters.

The set of groups is defined by `isearch-groups-alist', and the
folding only happens if `isearch-fold-groups' is non-nil.
Other groups that maybe should be added are latin accented letters.
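
As a rough illustration of the regexp approach described above, here is a
Python sketch. It is not the Emacs Lisp code on the branch: the names
GROUPS and fold_to_regexp are made up for this example, and the character
lists are deliberately incomplete.

```python
import re

# Hypothetical stand-in for `isearch-groups-alist': each plain key maps to
# the set of characters it should also match (illustrative entries only).
GROUPS = {
    '"': '"“”„‟',
    "'": "'‘’‚‛",
}

def fold_to_regexp(query):
    """Turn a literal search string into a regexp with folded groups."""
    parts = []
    for ch in query:
        if ch in GROUPS:
            # Escape each member so the character class stays literal.
            members = "".join(re.escape(c) for c in GROUPS[ch])
            parts.append("[" + members + "]")
        else:
            # Everything else is regexp-quoted and matches only itself.
            parts.append(re.escape(ch))
    return "".join(parts)

pattern = re.compile(fold_to_regexp("'p'"))
print(pattern.search("the code ‘p’ means prefix arg").group(0))
```

Adding a group in this sketch is one more dictionary entry, which mirrors
the claim that extending the alist is trivial.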

Cheers to all,




* Re: Single quotes in Info
  2015-01-27 16:27       ` Artur Malabarba
@ 2015-01-27 17:37         ` Stefan Monnier
  2015-01-27 18:09           ` Eli Zaretskii
  2015-01-27 19:49           ` Artur Malabarba
  2015-01-27 18:04         ` Eli Zaretskii
  1 sibling, 2 replies; 40+ messages in thread
From: Stefan Monnier @ 2015-01-27 17:37 UTC (permalink / raw)
  To: Artur Malabarba
  Cc: Eli Zaretskii, emacs-devel, help-gnu-emacs, Marcin Borkowski

> The implementation is very much up for debate. Currently, what it does
> is use regexps (behind the scenes) so that a plain double quote
> matches all those unicode double quotes, and the same for a hard
> single quote.

Why not use the case-fold machinery instead?


        Stefan




* Re: Single quotes in Info
  2015-01-27 16:27       ` Artur Malabarba
  2015-01-27 17:37         ` Stefan Monnier
@ 2015-01-27 18:04         ` Eli Zaretskii
  2015-01-27 18:39           ` Drew Adams
  2015-01-27 20:24           ` Artur Malabarba
  1 sibling, 2 replies; 40+ messages in thread
From: Eli Zaretskii @ 2015-01-27 18:04 UTC (permalink / raw)
  To: bruce.connor.am; +Cc: help-gnu-emacs, emacs-devel, mbork

> Date: Tue, 27 Jan 2015 14:27:45 -0200
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, help-gnu-emacs <help-gnu-emacs@gnu.org>
> 
> I also really like this idea, so much so that I've gone ahead and
> implemented it. It is implemented on the branch
> `scratch/isearch-character-group-folding'. I called it group-folding,
> but we can call it class folding or whatever sounds more intuitive to
> most people.

I didn't yet have time to look at the source, so apologies if what's
below is off the mark.

> The implementation is very much up for debate. Currently, what it does
> is use regexps (behind the scenes) so that a plain double quote
> matches all those unicode double quotes, and the same for a hard
> single quote. The way it is written, it is trivial to add more groups
> by adding entries to `isearch-groups-alist'.
> Of course, other characters are appropriately regexp-quoted behind the
> scenes, so that everything else works as expected. The surface is
> exactly like regular isearch, except for these two characters.

If this is implemented in isearch, then IMO doing it for quotes alone
makes very little sense.  It would make a lot of sense if it were
implemented in info.el, for searching Info manuals (in which case it
should also support the other Unicode characters produced by makeinfo
that have ASCII equivalents, like ⇒ vs =>).  (Note that this is not
character-for-character equivalence anymore.)

For a general-purpose search feature, we'd need a much more
general-purpose and versatile implementation.

> The set of groups is defined by `isearch-groups-alist', and the
> folding only happens if `isearch-fold-groups' is non-nil.
> Other groups that maybe should be added are latin accented letters.

If we do this via our private database, that database is going to be
huge.  I suggest to explore an alternative implementation, which uses
canonical equivalence.  We already have infrastructure for that, see
the description of the 'decomposition' character property in the ELisp
manual.





* Re: Single quotes in Info
  2015-01-27 17:37         ` Stefan Monnier
@ 2015-01-27 18:09           ` Eli Zaretskii
  2015-01-27 19:00             ` Stefan Monnier
  2015-01-27 19:49           ` Artur Malabarba
  1 sibling, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2015-01-27 18:09 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: help-gnu-emacs, emacs-devel, bruce.connor.am, mbork

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Marcin Borkowski <mbork@wmi.amu.edu.pl>,  emacs-devel <emacs-devel@gnu.org>,  Eli Zaretskii <eliz@gnu.org>,  help-gnu-emacs <help-gnu-emacs@gnu.org>
> Date: Tue, 27 Jan 2015 12:37:31 -0500
> 
> > The implementation is very much up for debate. Currently, what it does
> > is use regexps (behind the scenes) so that a plain double quote
> > matches all those unicode double quotes, and the same for a hard
> > single quote.
> 
> Why not use the case-fold machinery instead?

That will work only for character-for-character replacements, won't
it?




* RE: Single quotes in Info
  2015-01-27 18:04         ` Eli Zaretskii
@ 2015-01-27 18:39           ` Drew Adams
  2015-01-27 20:24           ` Artur Malabarba
  1 sibling, 0 replies; 40+ messages in thread
From: Drew Adams @ 2015-01-27 18:39 UTC (permalink / raw)
  To: emacs-devel

FWIW, I suggest that help-gnu-emacs be removed from this thread from now on.




* Re: Single quotes in Info
  2015-01-27 18:09           ` Eli Zaretskii
@ 2015-01-27 19:00             ` Stefan Monnier
  2015-01-27 19:15               ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2015-01-27 19:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: help-gnu-emacs, emacs-devel, bruce.connor.am, mbork

>> Why not use the case-fold machinery instead?
> That will work only for character-for-character replacements, won't
> it?

That's right.  But it will work a lot more efficiently (and reliably,
e.g. if you have one of those characters in a character-range) for those.
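
To make the trade-off concrete, here is a Python sketch of one-to-one
folding. It only imitates the idea of a case table; FOLD and folded_find
are hypothetical names, and this is not how Emacs implements case folding.

```python
# Hypothetical one-to-one folding table (one character -> one character),
# in the spirit of a case table; the entries are illustrative only.
FOLD = str.maketrans({"‘": "'", "’": "'", "“": '"', "”": '"'})

def folded_find(text, query):
    """Return the index of the first folded match of query in text, or -1."""
    # Each character maps to exactly one character, so an index into the
    # folded text is also a valid index into the original text.
    return text.translate(FOLD).find(query.translate(FOLD))

print(folded_find("Use ‘t’ here", "'t'"))
print(folded_find("a ⇒ b", "=>"))
```

The second call finds nothing: a table of this shape has no way to say
that the single character ⇒ should match the two characters =>.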


        Stefan




* Re: Single quotes in Info
  2015-01-27 19:00             ` Stefan Monnier
@ 2015-01-27 19:15               ` Eli Zaretskii
  0 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2015-01-27 19:15 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel, bruce.connor.am, mbork

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: bruce.connor.am@gmail.com,  mbork@wmi.amu.edu.pl,  emacs-devel@gnu.org,  help-gnu-emacs@gnu.org
> Date: Tue, 27 Jan 2015 14:00:49 -0500
> 
> >> Why not use the case-fold machinery instead?
> > That will work only for character-for-character replacements, won't
> > it?
> 
> That's right.  But it will work a lot more efficiently (and reliably,
> e.g. if you have one of those characters in a character-range) for those.

But then someone else will come up complaining about the other Unicode
characters emitted by makeinfo 5.x.  There's about a dozen of them.




* Re: Single quotes in Info
  2015-01-27 17:37         ` Stefan Monnier
  2015-01-27 18:09           ` Eli Zaretskii
@ 2015-01-27 19:49           ` Artur Malabarba
  2015-01-27 20:30             ` Stefan Monnier
  1 sibling, 1 reply; 40+ messages in thread
From: Artur Malabarba @ 2015-01-27 19:49 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel, help-gnu-emacs, Marcin Borkowski

2015-01-27 15:37 GMT-02:00 Stefan Monnier <monnier@iro.umontreal.ca>:
>> The implementation is very much up for debate. Currently, what it does
>> is use regexps (behind the scenes) so that a plain double quote
>> matches all those unicode double quotes, and the same for a hard
>> single quote.
>
> Why not use the case-fold machinery instead?

Because, IIUC, this is done in C code. While I know C, I can't say I
know Emacs' C. So that implementation will take longer (something on
the order of weeks).




* Re: Single quotes in Info
  2015-01-27 18:04         ` Eli Zaretskii
  2015-01-27 18:39           ` Drew Adams
@ 2015-01-27 20:24           ` Artur Malabarba
  2015-01-27 21:18             ` Eli Zaretskii
  1 sibling, 1 reply; 40+ messages in thread
From: Artur Malabarba @ 2015-01-27 20:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, Marcin Borkowski

> If this is implemented in isearch, then IMO doing it for quotes alone
> makes very little sense.

The quotes are just proof of concept. Adding other equivalency classes is
easy from here, and I do agree it makes sense to add others.

> It would make a lot of sense if it were
> implemented in info.el, for searching Info manuals

There are ways to do that too if people prefer, but info manuals are not
the only ones that contain such characters.
For instance, lots of people use round quotes in org-mode files.

> (in which case it
> should also support the other Unicode characters produced by makeinfo
> that have ASCII equivalents, like ⇒ vs =>). (Note that this is not
> character-for-character equivalence anymore.)

I agree with the idea, but it will be trickier. Translating a character
to any regexp is easy right now. Translating multiple characters into a
single one is more complicated, but I can do that. I'm worried about the
performance of that, though.

> If we do this via our private database, that database is going to be
> huge.

Is it? I would expect something on the order of 50 lines. That would be
large, but not huge. Each entry relates a key from a simple keyboard to a
set of possible characters that are not represented in simple keyboards.
But maybe I'm just being naive.

> I suggest to explore an alternative implementation, which uses
> canonical equivalence.

I'd love that.

> We already have infrastructure for that, see
> the description of the 'decomposition' character property in the ELisp
> manual.

Building this on preexisting infrastructure would be great, but does that
go the right way? Does it relate a simple character to all its complex
equivalents? Or does it relate each complex character to a simple
alternative?


* Re: Single quotes in Info
  2015-01-27 19:49           ` Artur Malabarba
@ 2015-01-27 20:30             ` Stefan Monnier
  2015-01-28  3:48               ` Stefan Monnier
  0 siblings, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2015-01-27 20:30 UTC (permalink / raw)
  To: Artur Malabarba; +Cc: emacs-devel, help-gnu-emacs, Marcin Borkowski

>> Why not use the case-fold machinery instead?
> Because, IIUC, this is done in C code. While I know C, I can't say I
> know Emacs' C. So that implementation will take longer (something on
> the order of weeks).

It's configured in Lisp, tho.  Try:

   C-h f *case-table TAB

for a start.


        Stefan




* Re: Single quotes in Info
  2015-01-27 20:24           ` Artur Malabarba
@ 2015-01-27 21:18             ` Eli Zaretskii
  2015-01-28  1:15               ` Artur Malabarba
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2015-01-27 21:18 UTC (permalink / raw)
  To: bruce.connor.am; +Cc: emacs-devel, mbork

> Date: Tue, 27 Jan 2015 18:24:09 -0200
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: Marcin Borkowski <mbork@wmi.amu.edu.pl>, emacs-devel <emacs-devel@gnu.org>
> 
> > If this is implemented in isearch, then IMO doing it for quotes alone
> > makes very little sense.
> 
> The quotes are just proof of concept.

Yes, but what concept is that?  Does it scale up to a general-purpose
feature of the kind that suits isearch.el?  Just replacing one
character for another doesn't, IMO.

> > If we do this via our private database, that database is going to be
> > huge.
> 
> Is it? I would expect something on the order of 50 lines.

There are more than 5000 characters in the Unicode database that have
equivalence and canonical decompositions.  (Look for entries in
UnicodeData.txt whose 6th field is non-empty.)
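
For readers without UnicodeData.txt at hand, the count can be approximated
with Python's bundled copy of the Unicode database. This is only a sketch:
the exact figure depends on the Unicode version shipped with the
interpreter, which is why it is printed alongside the count.

```python
import sys
import unicodedata

# Count code points whose decomposition mapping is non-empty in the
# Unicode data bundled with this Python build.
count = sum(
    1
    for cp in range(sys.maxunicode + 1)
    if unicodedata.decomposition(chr(cp))
)
print(unicodedata.unidata_version, count)
```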

> > We already have infrastructure for that, see
> > the description of the 'decomposition' character property in the ELisp
> > manual.
> 
> Building this on preexisting infrastructure would be great, but does that go
> the right way? Does it relate a simple character to all its complex
> equivalents? Or does it relate each complex character to a simple alternative?

The latter.  Read paragraph 1.1 of UAX #15 for the starting point, and
also section 3.7 of the Unicode Standard.
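
The direction of the mapping (complex character to simpler form) can be
seen in a small Python sketch built on the standard unicodedata module.
The function name fold is made up here, and stripping combining marks
after NFKD is one possible folding policy, not something Emacs does.

```python
import unicodedata

def fold(text):
    """Map each character to a simpler form via compatibility decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks (accents) left over after decomposition.
    return "".join(c for c in decomposed if not unicodedata.combining(c))

print(fold("caf\u00e9"))        # accented letter folds to its base letter
print(fold("\ufb01le"))         # the fi ligature folds to two letters
print(fold("\u2018t\u2019"))    # curly single quotes around t
```

A search built this way would fold both the query and the text. Note that
the curly single quotes U+2018 and U+2019 have no decomposition mapping, so
this technique alone leaves them unchanged and quotes would still need
separate handling.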




* Re: Single quotes in Info
  2015-01-27 21:18             ` Eli Zaretskii
@ 2015-01-28  1:15               ` Artur Malabarba
  2015-01-28 15:24                 ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Artur Malabarba @ 2015-01-28  1:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

Eli, if I may ask, did you get a chance to see the code? (it's quite short)
The last couple emails give me the impression we're not quite on the same
page.

On 27 Jan 2015 19:18, "Eli Zaretskii" <eliz@gnu.org> wrote:
>
> > Date: Tue, 27 Jan 2015 18:24:09 -0200
> > From: Artur Malabarba <bruce.connor.am@gmail.com>
> > Cc: Marcin Borkowski <mbork@wmi.amu.edu.pl>, emacs-devel <emacs-devel@gnu.org>
> >
> > > If this is implemented in isearch, then IMO doing it for quotes alone
> > > makes very little sense.
> >
> > The quotes are just proof of concept.
>
> Yes, but what concept is that?  Does it scale up to a general-purpose
> feature of the kind that suits isearch.el?  Just replacing one
> character for another doesn't, IMO.

No. It replaces one character with an arbitrary regexp. In the quotes case
that's used to match about a dozen different quotation characters, but it's
not limited to that. You can also use that to implement lax-whi

> > > If we do this via our private database, that database is going to be
> > > huge.
> >
> > Is it? I would expect something on the order of 50 lines.
>
> There are more than 5000 characters in the Unicode database that have
> equivalence and canonical decompositions.  (Look for entries in
> UnicodeData.txt whose 6th field is non-empty.)

The purpose of this is to allow the user to search for complex characters
(such as curly quotes or any of these "“””„⹂〞‟‟❞❝❠“„〝〟🙷🙶🙸) by typing a
simple character available on simple keyboards (such as the plain double
quote "). Each simple character needs an entry in the
`isearch-groups-alist' variable. The max number of entries we'll ever need
on this alist (in the very worst possible scenario) is the number of simple
characters in a simple keyboard (which is way less than 5000 last I
checked).
This might be easier to understand looking at the code.
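
The idea can be sketched roughly as follows, in Python rather than Elisp
and only as an illustration.  `GROUPS` is a hypothetical stand-in for
`isearch-groups-alist', and the character lists are examples, not the
ones from the actual patch:

```python
import re

# Hypothetical stand-in for `isearch-groups-alist': each plain character
# maps to a regexp matching it and its fancier relatives.
GROUPS = {
    '"': '["“”„‟❝❞〝〞〟]',
    "'": "['‘’‚‛❛❜`]",
}


def to_regexp(search_string):
    """Translate a typed search string into a regexp, one character at a time."""
    return "".join(GROUPS.get(ch, re.escape(ch)) for ch in search_string)


if __name__ == "__main__":
    # Typing ASCII quotes finds the curly ones.
    print(bool(re.search(to_regexp("'t'"), "it now says ‘t’")))
```

Only the search string is transformed here, so typing a curly quote does
not find an ASCII one.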

>
> > > We already have infrastructure for that, see
> > > the description of the 'decomposition' character property in the ELisp
> > > manual.
> >
> > Building this on preexisting infrastructure would be great, but does that go
> > the right way? Does it relate a simple character to all its complex
> > equivalents? Or does it relate each complex character to a simple alternative?
> The latter.  Read paragraph 1.1 of UAX #15 for the starting point, and
> also section 3.7 of the Unicode Standard.

If it's the latter, then it's the wrong way for us to do an automated
approach. What we need is to know the whole set of Unicode characters which
is equivalent to a given ASCII character. Of course we can build this table
from the Unicode Standard (that's exactly what the `isearch-groups-alist'
variable is meant to do), I'm just saying an automated approach probably
isn't viable here.


* Re: Single quotes in Info
  2015-01-27 20:30             ` Stefan Monnier
@ 2015-01-28  3:48               ` Stefan Monnier
  2015-01-28 21:42                 ` Artur Malabarba
  0 siblings, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2015-01-28  3:48 UTC (permalink / raw)
  To: Artur Malabarba
  Cc: Eli Zaretskii, emacs-devel, help-gnu-emacs, Marcin Borkowski

> It's configured in C, tho.  Try:
                    ^^^
                   Elisp
Duh!


        Stefan




* Re: Single quotes in Info
  2015-01-28  1:15               ` Artur Malabarba
@ 2015-01-28 15:24                 ` Eli Zaretskii
  2015-01-28 16:10                   ` Yuri Khan
  2015-01-28 21:38                   ` Artur Malabarba
  0 siblings, 2 replies; 40+ messages in thread
From: Eli Zaretskii @ 2015-01-28 15:24 UTC (permalink / raw)
  To: bruce.connor.am; +Cc: emacs-devel

> Date: Tue, 27 Jan 2015 23:15:22 -0200
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
> Eli, if I may ask, did you get a chance to see the code? (it's quite short)
> The last couple emails give me the impression we're not quite on the same page.

I did just now, and I don't think I was on a different page.

> The purpose of this is to allow the user to search for complex characters (such as curly quotes or any of these "“””„⹂〞‟‟❞❝❠“„〝〟🙷🙶🙸) by typing a simple character available on simple keyboards (such as the plain double quote "). 

But that's exactly where it falls short of supporting a more general
feature, which allows finding text that is "equivalent" to the one you
search for.  The limitation to "simple characters available on simple
keyboards" might seem a no-brainer for predominantly ASCII text, but
it _is_ a serious limitation for any non-ASCII script, certainly for
complex scripts, which Emacs has supported for years.

> Each simple character needs an entry in the `isearch-groups-alist' variable. The max number of entries we'll ever need on this alist (in the very worst possible scenario) is the number of simple characters in a simple keyboard (which is way less than 5000 last I checked).

You seem to forget that modern keyboards and input methods support
much more than what meets the eye on the keyboard.  Even Latin locales
provide non-ASCII characters such as á and å.  It is also not uncommon
to copy/paste a search string from some text, in which case the search
string could include the "complex" characters, but you'd still want to
find their "simple" equivalents; your code, which transforms only the
search string, cannot support this use case.  Moreover, CJK locales
use input methods that can produce thousands of characters, and for
people in those cultures such input is "simple" because they can use
nothing simpler.

Using a database that maps ASCII characters to regexps doesn't scale
for supporting these use cases.  It doesn't even scale to the
above-mentioned Latin characters, because á has a sequence of 2
characters "a ́" as its canonical decomposition, so when I type á, I
expect to find both á and "a ́", and vice versa.  More complex scripts
have several forms of the same letter, such as the "final" form used
in Arabic and Hebrew for the last letter in a word -- typing one of
these forms should find any other form.  Etc. etc. -- there's a huge
complexity behind all this, and we need to support it if we want to be
respected as a text editor.

The way to support this is similar to how we support case-insensitive
search: we "fold" each character, both in the search string and in the
text being searched, using case tables, and then compare the "folded"
characters.  Similarly, to support equivalence, we need to produce a
canonical/equivalent decomposition from each character on both sides
of the comparison, and then compare the results.

As I said before, we already have all the necessary data in the
'decomposition' property of each character, we just need to use it in
a way that is similar to case tables, just slightly more complex
(because we are no longer talking single characters).

> > > Does it relate a simple character to all its complex
> > > equivalents? Or does it relate each complex character to a simple alternative?
> > The latter.  Read paragraph 1.1 of UAX #15 for the starting point, and
> > also section 3.7 of the Unicode Standard.
> If it's the latter, then it's the wrong way for us to do an automated approach. What we need is to know the whole set of Unicode characters which is equivalent to a given ASCII character. Of course we can build this table from the Unicode Standard (that's exactly what the `isearch-groups-alist' variable is meant to do), I'm just saying an automated approach probably isn't viable here.

I don't see why it won't be viable, or maybe I don't understand what
you mean by "automated" here.  I certainly don't think we should limit
ourselves to "simple characters", not for something as general-purpose
as text search.  This might be okay for Info only, but not if we want
it in isearch.el.

My idea is to use the 'decomposition' property to decompose each
character in the search string and in the text being searched, when
they need to be compared.  Exactly like we do with case-folding.
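
A minimal sketch of that two-sided folding, in Python for illustration
only.  The names `fold` and `folded_find` are made up, NFD stands in for
the 'decomposition' property, and a real implementation would work
inside the search primitives rather than on whole strings:

```python
import unicodedata


def fold(text, form="NFD"):
    """Decompose text (canonically by default), then case-fold it."""
    return unicodedata.normalize(form, text).casefold()


def folded_find(needle, haystack):
    """Return True if needle occurs in haystack after folding both sides."""
    return fold(needle) in fold(haystack)


if __name__ == "__main__":
    precomposed = "\u00e1"   # á as a single code point
    decomposed = "a\u0301"   # a followed by COMBINING ACUTE ACCENT
    print(folded_find(precomposed, decomposed + "rbol"))
    print(folded_find(decomposed, "\u00c1rbol"))
```

A plain substring test like this also lets a bare "a" match the start of
a decomposed "á"; whether that is wanted is a separate decision.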





* Re: Single quotes in Info
  2015-01-28 15:24                 ` Eli Zaretskii
@ 2015-01-28 16:10                   ` Yuri Khan
  2015-01-28 17:22                     ` Eli Zaretskii
  2015-01-28 21:38                   ` Artur Malabarba
  1 sibling, 1 reply; 40+ messages in thread
From: Yuri Khan @ 2015-01-28 16:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: bruce.connor.am, Emacs developers

On Wed, Jan 28, 2015 at 9:24 PM, Eli Zaretskii <eliz@gnu.org> wrote:

> As I said before, we already have all the necessary data in the
> 'decomposition' property of each character, we just need to use it in
> a way that is similar to case tables, just slightly more complex
> (because we are no longer talking single characters).

Proper case folding is not about single characters either, because ß.
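
For illustration (Python here, not Elisp), full Unicode case conversion
changes the length of a string containing ß:

```python
# Case conversion can map one character to two.
word = "Straße"

print(word.upper())                  # STRASSE
print(word.casefold())               # strasse
print(len(word), len(word.upper()))  # 6 7
```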




* Re: Single quotes in Info
  2015-01-28 16:10                   ` Yuri Khan
@ 2015-01-28 17:22                     ` Eli Zaretskii
  0 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2015-01-28 17:22 UTC (permalink / raw)
  To: Yuri Khan; +Cc: bruce.connor.am, emacs-devel

> From: Yuri Khan <yuri.v.khan@gmail.com>
> Date: Wed, 28 Jan 2015 23:10:32 +0700
> Cc: bruce.connor.am@gmail.com, Emacs developers <emacs-devel@gnu.org>
> 
> On Wed, Jan 28, 2015 at 9:24 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > As I said before, we already have all the necessary data in the
> > 'decomposition' property of each character, we just need to use it in
> > a way that is similar to case tables, just slightly more complex
> > (because we are no longer talking single characters).
> 
> Proper case folding is not about single characters either, because ß.

Which we don't yet support for the same reasons.





* Re: Single quotes in Info
  2015-01-28 15:24                 ` Eli Zaretskii
  2015-01-28 16:10                   ` Yuri Khan
@ 2015-01-28 21:38                   ` Artur Malabarba
  2015-01-29  3:44                     ` Eli Zaretskii
  1 sibling, 1 reply; 40+ messages in thread
From: Artur Malabarba @ 2015-01-28 21:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel

I've been looking into what you suggest, but it seems the
decomposition property won't be enough. It does give us the necessary
information for things like á and ç, but it doesn't say anything about
the quotes (which was the whole initial point), nor about characters
like ⇒ (which I think someone else on this thread suggested).

Furthermore, the point here would be to have "a" and "á" match each
other, but the decomposition of "á" gives us two characters (as would
be expected). How are we to programmatically know which of these two
characters is to be considered equivalent to "a with acute"? Is it
safe to assume it's the first character?
Otherwise, if we demand that the user types a´ to be able to match the
á letter, then this feature seems kind of moot.
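
The difference is easy to see from the Unicode data itself.  A small
Python illustration follows; the helper name `decomposition_of` is made
up, and `unicodedata.decomposition` returns the same field that the
'decomposition' property is built from:

```python
import unicodedata


def decomposition_of(ch):
    """Return the raw decomposition field for ch, or "" if it has none."""
    return unicodedata.decomposition(ch)


if __name__ == "__main__":
    for ch in "áç‘’⇒":
        field = decomposition_of(ch) or "(none)"
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {field}")
```

The accented letters report a base letter plus a combining mark, while
the curly quotes and the arrow report nothing.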

2015-01-28 13:24 GMT-02:00 Eli Zaretskii <eliz@gnu.org>:
>> Date: Tue, 27 Jan 2015 23:15:22 -0200
>> From: Artur Malabarba <bruce.connor.am@gmail.com>
>> Cc: emacs-devel <emacs-devel@gnu.org>
>>
>> Eli, if I may ask, did you get a chance to see the code? (it's quite short)
>> The last couple emails give me the impression we're not quite on the same page.
>
> I did just now, and I don't think I was on a different page.
>
>> The purpose of this is to allow the user to search for complex characters (such as curly quotes or any of these "“””„⹂〞‟‟❞❝❠“„〝〟🙷🙶🙸) by typing a simple character available on simple keyboards (such as the plain double quote ").
>
> But that's exactly where it falls short of supporting a more general
> feature, which allows finding text that is "equivalent" to the one you
> search for.  The limitation to "simple characters available on simple
> keyboards" might seem a no-brainer for predominantly ASCII text, but
> it _is_ a serious limitation for any non-ASCII script, certainly for
> complex scripts, which Emacs has supported for years.
>
>> Each simple character needs an entry in the `isearch-groups-alist' variable. The max number of entries we'll ever need on this alist (in the very worst possible scenario) is the number of simple characters in a simple keyboard (which is way less than 5000 last I checked).
>
> You seem to forget that modern keyboards and input methods support
> much more than what meets the eye on the keyboard.  Even Latin locales
> provide non-ASCII characters such as á and å.  It is also not uncommon
> to copy/paste a search string from some text, in which case the search
> string could include the "complex" characters, but you'd still want to
> find their "simple" equivalents; your code, which transforms only the
> search string, cannot support this use case.  Moreover, CJK locales
> use input methods that can produce thousands of characters, and for
> people in those cultures such input is "simple" because they can use
> nothing simpler.
>
> Using a database that maps ASCII characters to regexps doesn't scale
> for supporting these use cases.  It doesn't even scale to the
> above-mentioned Latin characters, because á has a sequence of 2
> characters "a ́" as its canonical decomposition, so when I type á, I
> expect to find both á and "a ́", and vice versa.  More complex scripts
> have several forms of the same letter, such as the "final" form used
> in Arabic and Hebrew for the last letter in a word -- typing one of
> these forms should find any other form.  Etc. etc. -- there's a huge
> complexity behind all this, and we need to support it if we want to be
> respected as a text editor.
>
> The way to support this is similar to how we support case-insensitive
> search: we "fold" each character, both in the search string and in the
> text being searched, using case tables, and then compare the "folded"
> characters.  Similarly, to support equivalence, we need to produce a
> canonical/equivalent decomposition from each character on both sides
> of the comparison, and then compare the results.
>
> As I said before, we already have all the necessary data in the
> 'decomposition' property of each character, we just need to use it in
> a way that is similar to case tables, just slightly more complex
> (because we are no longer talking single characters).
>
>> > > Does it relate a simple character to all its complex
>> > > equivalents? Or does it relate each complex character to a simple alternative?
>> > The latter.  Read paragraph 1.1 of UAX #15 for the starting point, and
>> > also section 3.7 of the Unicode Standard.
>> If it's the latter, then it's the wrong way for us to do an automated approach. What we need is to know the whole set of Unicode characters which is equivalent to a given ASCII character. Of course we can build this table from the Unicode Standard (that's exactly what the `isearch-groups-alist' variable is meant to do), I'm just saying an automated approach probably isn't viable here.
>
> I don't see why it won't be viable, or maybe I don't understand what
> you mean by "automated" here.  I certainly don't think we should limit
> ourselves to "simple characters", not for something as general-purpose
> as text search.  This might be okay for Info only, but not if we want
> it in isearch.el.
>
> My idea is to use the 'decomposition' property to decompose each
> character in the search string and in the text being searched, when
> they need to be compared.  Exactly like we do with case-folding.




* Re: Single quotes in Info
  2015-01-28  3:48               ` Stefan Monnier
@ 2015-01-28 21:42                 ` Artur Malabarba
  2015-01-28 22:23                   ` Stefan Monnier
  0 siblings, 1 reply; 40+ messages in thread
From: Artur Malabarba @ 2015-01-28 21:42 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: help-gnu-emacs, emacs-devel, Marcin Borkowski

Ok, I'll be getting on a 10 hour flight now, so I'll be looking into
the case-fold machinery.
I did have a brief look already and it doesn't seem horribly absurd.

Any other pointers that might be useful before I jump into no-internet land? :-)

2015-01-28 1:48 GMT-02:00 Stefan Monnier <monnier@iro.umontreal.ca>:
>> It's configured in C, tho.  Try:
>                     ^^^
>                    Elisp
> Duh!
>
>
>         Stefan




* Re: Single quotes in Info
  2015-01-28 21:42                 ` Artur Malabarba
@ 2015-01-28 22:23                   ` Stefan Monnier
  2015-01-29 14:31                     ` Artur Malabarba
  0 siblings, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2015-01-28 22:23 UTC (permalink / raw)
  To: Artur Malabarba; +Cc: help-gnu-emacs, emacs-devel, Marcin Borkowski

> Ok, I'll be getting on a 10 hour flight now, so I'll be looking into
> the case-fold machinery.
> I did have a brief look already and it doesn't seem horribly absurd.

> Any other pointers that might be useful before I jump into no-internet
>  land? :-)

Just a warning: the case-tables are threatened.  They should be replaced by
Unicode-aware (locale-dependent?) case folding for the 99.99% of the
cases, the only remaining case is the "ASCII upcase/downcase" operation
used in sendmail.el (IIRC), which we can hopefully solve some other way.


        Stefan




* Re: Single quotes in Info
  2015-01-28 21:38                   ` Artur Malabarba
@ 2015-01-29  3:44                     ` Eli Zaretskii
  2015-01-29  6:01                       ` Drew Adams
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2015-01-29  3:44 UTC (permalink / raw)
  To: bruce.connor.am; +Cc: emacs-devel

> Date: Wed, 28 Jan 2015 19:38:08 -0200
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
> I've been looking into what you suggest, but it seems the
> decomposition property won't be enough. It does give us the necessary
> information for things like á and ç, but it doesn't say anything about
> the quotes (which was the whole initial point), nor about characters
> like ⇒ (which I think someone else on this thread suggested).

These are specific to Emacs, and should be added.

> Furthermore, the point here would be to have "a" and "á" match each
> other, but the decomposition of "á" gives us two characters (as would
> be expected). How are we to programmatically know which of these two
> characters is to be considered equivalent to "a with acute"? Is it
> safe to assume it's the first character?

I'm not at all sure we should compare a and á equal.  It's an
additional feature anyway.  If we do want them to compare equal in
some cases, then yes, you take only the first character of the
decomposition (the so-called "base character").

> Otherwise, if we demand that the user types a´ to be able to match the
> á letter, then this feature seems kind of moot.

As I explained, the user can type the decomposed character instead.

Again, this is not necessarily about easier typing, this is about
comparing equivalent text equal.
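
Taking only the base character could look roughly like this (a Python
sketch with hypothetical helper names; it assumes `base_char` is given a
single character, and it treats "dropping combining marks after NFD" as
equivalent to "keeping the base character", which holds for the Latin
examples here):

```python
import unicodedata


def base_char(ch):
    """Return the first character of the canonical decomposition of ch."""
    return unicodedata.normalize("NFD", ch)[0]


def strip_to_base(text):
    """Decompose text and drop the combining marks, keeping base characters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(base_char("á"))           # a
    print(strip_to_base("façade"))  # facade
```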





* RE: Single quotes in Info
  2015-01-29  3:44                     ` Eli Zaretskii
@ 2015-01-29  6:01                       ` Drew Adams
  2015-01-29 16:03                         ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Drew Adams @ 2015-01-29  6:01 UTC (permalink / raw)
  To: Eli Zaretskii, bruce.connor.am; +Cc: emacs-devel

> I'm not at all sure we should compare a and á equal.  It's an
> additional feature anyway.

I get the impression that you are talking only about a built-in
(more or less hard-coded, predefined) set of equivalence classes
of chars, whatever that set might be defined as.

Is that right, or would users be able to define the equivalence
classes you are thinking of?

If they would not then a separate but desirable (IMO) feature
would be for users to be able to easily define their own such
equivalence classes.

It could be OK if this feature did not have the same efficiency
as the built-in classes.




* Re: Single quotes in Info
  2015-01-28 22:23                   ` Stefan Monnier
@ 2015-01-29 14:31                     ` Artur Malabarba
  0 siblings, 0 replies; 40+ messages in thread
From: Artur Malabarba @ 2015-01-29 14:31 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel, Marcin Borkowski

Ok, here's how I can see this being done:
1. define a new field for the buffer object which is a char-table.
2. Populate this table in lisp code.
3. Use it instead of the case folding table as the translation table for
searches, if some given variable is non-nil.

Would that be desirable?

We could also use the equivalence class folding table in addition to the
case folding table. But that would (at the very least) involve changing the
C search functions to take an additional argument.

On 28 Jan 2015 20:23, "Stefan Monnier" <monnier@iro.umontreal.ca> wrote:

> > Ok, I'll be getting on a 10 hour flight now, so I'll be looking into
> > the case-fold machinery.
> > I did have a brief look already and it doesn't seem horribly absurd.
>
> > Any other pointers that might be useful before I jump into no-internet
> >  land? :-)
>
> Just a warning: the case-tables are threatened.  They should be replaced by
> Unicode-aware (locale-dependent?) case folding for the 99.99% of the
> cases, the only remaining case is the "ASCII upcase/downcase" operation
> used in sendmail.el (IIRC), which we can hopefully solve some other way.
>
>
>         Stefan
>


* Re: Single quotes in Info
  2015-01-29  6:01                       ` Drew Adams
@ 2015-01-29 16:03                         ` Eli Zaretskii
  2015-01-29 16:24                           ` Drew Adams
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2015-01-29 16:03 UTC (permalink / raw)
  To: Drew Adams; +Cc: bruce.connor.am, emacs-devel

> Date: Wed, 28 Jan 2015 22:01:09 -0800 (PST)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: emacs-devel@gnu.org
> 
> > I'm not at all sure we should compare a and á equal.  It's an
> > additional feature anyway.
> 
> I get the impression that you are talking only about a built-in
> (more or less hard-coded, predefined) set of equivalence classes
> of chars, whatever that set might be defined as.

We certainly should have predefined equivalence support based on the
Unicode Standard's recommendations.  That is the state of the art
these days, and any respectable text editor should include such
support.

> Is that right, or would users be able to define the equivalence
> classes you are thinking of?

We should first provide users with a set of sensible optional
behaviors that they are likely to expect in various situations.  Each
option will invoke a certain predefined behavior, such as whether or
not equivalence classes are at all considered, whether or not a and á
compare equal, etc.  There are important use cases for each one of
those, exactly like there are important use cases for both case-sensitive
and case-insensitive search.

Once we have that in place, we can add user-defined additions.  I
expect them to be relatively minor and mostly mode-specific, such as
the special treatment of quotes and other special characters in Info
buffers.  Why minor? because Unicode already thought out and defined
almost any imaginable feature in this regard, so chances that some
user might need something in addition are small.

Mode-specific additions could be just alists that map characters or
strings to their equivalents.  Since I don't expect those to become
large, there's no need for anything fancier, IMO.

> If they would not then a separate but desirable (IMO) feature
> would be for users to be able to easily define their own such
> equivalence classes.

I wouldn't call them equivalence classes.  Users are not expected to
be experts in Unicode features, its various data tables, and their
implementation in Emacs.  We should instead provide easy-to-customize
option variables to select out of an array of predefined features
based on Unicode tables we already have.  User additions should be
some simple data structures that don't require any special expertise.





* RE: Single quotes in Info
  2015-01-29 16:03                         ` Eli Zaretskii
@ 2015-01-29 16:24                           ` Drew Adams
  2015-01-29 16:57                             ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Drew Adams @ 2015-01-29 16:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: bruce.connor.am, emacs-devel

> > > I'm not at all sure we should compare a and á equal.  It's an
> > > additional feature anyway.
> >
> > I get the impression that you are talking only about a built-in
> > (more or less hard-coded, predefined) set of equivalence classes
> > of chars, whatever that set might be defined as.
> 
> We certainly should have predefined equivalence support based on the
> Unicode Standard's recommendations.  That is the state of the art
> these days, and any respectable text editor should include such
> support.
> 
> > Is that right, or would users be able to define the equivalence
> > classes you are thinking of?
> 
> We should first provide users with a set of sensible optional
> behaviors that they are likely to expect in various situations.  Each
> option will invoke a certain predefined behavior, such as whether or
> not equivalence classes are at all considered, whether or not a and á
> compare equal, etc.  There are important use cases for each one of
> those, exactly like there are important use cases for both case-sensitive
> and case-insensitive search.
> 
> Once we have that in place, we can add user-defined additions.  I
> expect them to be relatively minor and mostly mode-specific, such as
> the special treatment of quotes and other special characters in Info
> buffers.  Why minor? because Unicode already thought out and defined
> almost any imaginable feature in this regard, so chances that some
> user might need something in addition are small.
> 
> Mode-specific additions could be just alists that map characters or
> strings to their equivalents.  Since I don't expect those to become
> large, there's no need for anything fancier, IMO.

Glad to see all of those specific replies.  It all sounds good to me,
including the proposed development priorities.

> > If they would not then a separate but desirable (IMO) feature
> > would be for users to be able to easily define their own such
> > equivalence classes.
> 
> I wouldn't call them equivalence classes.  Users are not expected to
> be experts in Unicode features, its various data tables, and their
> implementation in Emacs.  We should instead provide easy-to-customize
> option variables to select out of an array of predefined features
> based on Unicode tables we already have.  User additions should be
> some simple data structures that don't require any special expertise.

I don't care what you call them.  In the interest of brevity I also
did not explicitly mention the possibility of associating multiple-char
sequences with other such or with single chars (e.g., associating "=>"
with ⇒ or "ss" with ß, though those two would presumably be predefined).

To me, each set of such associations constitutes an equivalence class,
but I don't care what nomenclature is used to describe it, as long
as it is clear.

My point was for users to eventually be able to specify their own
such associations, in addition to those (e.g. Unicode) that would be
predefined.

And it would be good to be able to use these not only for search but
also for easy replacement (in either direction of such an equivalence),
etc.  E.g., have easy access to such pairs via `M-%' - be able to
input one of such a class (char or char sequence) and then pick from
its defined equivalences for the replacement.
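
Such user-defined associations, including multi-character ones, might be
sketched like this.  It is Python for illustration only; `USER_CLASSES`
and the helper names are hypothetical and not an existing Emacs
interface:

```python
import re

# Hypothetical user-defined classes; each is a list of interchangeable strings.
USER_CLASSES = [
    ["=>", "⇒"],
    ["ss", "ß"],
]


def class_of(token):
    """Return the class containing token, or a singleton class."""
    for members in USER_CLASSES:
        if token in members:
            return members
    return [token]


def search_regexp(token):
    """Regexp matching any member of token's class, longest member first."""
    members = sorted(class_of(token), key=len, reverse=True)
    return "(?:" + "|".join(re.escape(m) for m in members) + ")"


def replacement_candidates(token):
    """Other members of token's class, to offer as replacements."""
    return [m for m in class_of(token) if m != token]


if __name__ == "__main__":
    print(bool(re.search(search_regexp("=>"), "a ⇒ b")))
    print(replacement_candidates("ß"))
```

The same table serves both directions: searching for either member finds
the other, and either can be offered as the replacement.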




* Re: Single quotes in Info
  2015-01-29 16:24                           ` Drew Adams
@ 2015-01-29 16:57                             ` Eli Zaretskii
  0 siblings, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2015-01-29 16:57 UTC (permalink / raw)
  To: Drew Adams; +Cc: bruce.connor.am, emacs-devel

> Date: Thu, 29 Jan 2015 08:24:02 -0800 (PST)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: bruce.connor.am@gmail.com, emacs-devel@gnu.org
> 
> To me, each set of such associations constitutes an equivalence class,
> but I don't care what nomenclature is used to describe it, as long
> as it is clear.

My point was that I don't think it would be wise to ask users to mess
with Unicode tables to customize this.  We should instead provide a
way to add simple data structures that add to the predefined
equivalence classes.

> And it would be good to be able to use these not only for search but
> also for easy replacement (in either direction of such an equivalence),
> etc.

I agree.




* RE: Single quotes in Info
       [not found]                             ` <<83mw51msnz.fsf@gnu.org>
@ 2015-01-29 17:05                               ` Drew Adams
  2015-01-29 17:24                                 ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Drew Adams @ 2015-01-29 17:05 UTC (permalink / raw)
  To: Eli Zaretskii, Drew Adams; +Cc: bruce.connor.am, emacs-devel

> > To me, each set of such associations constitutes an equivalence class,
> > but I don't care what nomenclature is used to describe it, as long
> > as it is clear.
> 
> My point was that I don't think it would be wise to ask users to mess
> with Unicode tables to customize this.

I agree with that (without a lot of understanding of the implications).
Users should have a simple way to define such a class of equivalences
(choose your own term).  Something as simple as an alist, perhaps.

> We should instead provide a way to add simple data structures that
> add to the predefined equivalence classes.

Not sure what you mean, but if you mean that users would only be able
to add their own associations (equivalences) to existing classes then
that is not what I would like to see as the only possibility.

I would like to see the ability for users to define classes, and to
"activate" (enable the use of; turn on) or "deactivate" (turn off) a
particular class of equivalences as a whole, including any of the
predefined classes.
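
One possible shape for that, sketched in Python for illustration.  The
registry, the class names and the functions are all hypothetical; this
only shows whole classes being switched on and off:

```python
# Hypothetical registry: class name -> enabled flag and member strings.
classes = {
    "quotes": {"enabled": True, "members": ["'", "‘", "’", "`"]},
    "arrows": {"enabled": False, "members": ["=>", "⇒"]},
}


def set_enabled(name, flag):
    """Turn a whole class of equivalences on or off."""
    classes[name]["enabled"] = bool(flag)


def equivalents(token):
    """All strings equivalent to token under the currently enabled classes."""
    result = {token}
    for cls in classes.values():
        if cls["enabled"] and token in cls["members"]:
            result.update(cls["members"])
    return result


if __name__ == "__main__":
    print(equivalents("=>"))   # only "=>" while "arrows" is disabled
    set_enabled("arrows", True)
    print(equivalents("=>"))   # now includes "⇒" as well
```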




* Re: Single quotes in Info
  2015-01-29 17:05                               ` Single quotes in Info Drew Adams
@ 2015-01-29 17:24                                 ` Eli Zaretskii
  2015-01-29 18:34                                   ` Drew Adams
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2015-01-29 17:24 UTC (permalink / raw)
  To: Drew Adams; +Cc: bruce.connor.am, emacs-devel

> Date: Thu, 29 Jan 2015 09:05:38 -0800 (PST)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: bruce.connor.am@gmail.com, emacs-devel@gnu.org
> 
> I would like to see the ability for users to define classes, and to
> "activate" (enable the use of; turn on) or "deactivate" (turn off) a
> particular class of equivalences as a whole, including any of the
> predefined classes.

This would require modifying the Unicode tables.  They are just large
char-tables, so someone who knows what they are doing should be able
to do that.

But that's not for the faint of heart, and I don't see why users would
like to disable or replace portions of those tables.  I do understand
why in some use cases certain equivalence classes are inappropriate,
but they are inappropriate _as_a_whole_.  Doing that for a part of a
class doesn't make sense to me.  E.g., why would you want to make 2
and ② equivalent, but not 2 and ²?  So this kind of customization
doesn't have to be easy, IMO, and it's okay to ask such users to know
what they are doing.




^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: Single quotes in Info
  2015-01-29 17:24                                 ` Eli Zaretskii
@ 2015-01-29 18:34                                   ` Drew Adams
  2015-01-29 18:54                                     ` Eli Zaretskii
  0 siblings, 1 reply; 40+ messages in thread
From: Drew Adams @ 2015-01-29 18:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: bruce.connor.am, emacs-devel

> > I would like to see the ability for users to define classes, and to
> > "activate" (enable the use of; turn on) or "deactivate" (turn off) a
> > particular class of equivalences as a whole, including any of the
> > predefined classes.
> 
> This would require modifying the Unicode tables.  They are just large
> char-tables, so someone who knows what they are doing should be able
> to do that.

The point is to let ordinary users define such classes, and use them
selectively.

> But that's not for the faint of heart

Then fiddling at that level is not the (only) answer.  If changes at
that level are ultimately required, then perhaps a user-friendly layer
can be added above such low-level changes.

> and I don't see why users would
> like to disable or replace portions of those tables.

That's putting it wrong, putting it already in terms of implementation.
Ordinary users would certainly not *want* to "disable or replace portions
of those tables".  That is, they would not want to, and should not need
to, think in terms of such tables.  Whether such tables get changed
under the covers when they want to define a new class of chars should
not be something they need concern themselves with (I hope).

What (some) ordinary users are liable to want to be able to do is define
a class of chars that they can use in place of each other etc., and to
choose among such classes, via Lisp or interactively, enabling/disabling
the equivalences they define.

> I do understand why in some use cases certain equivalence classes
> are inappropriate, but they are inappropriate _as_a_whole_.  Doing
> that for a part of a class doesn't make sense to me.

I did not say anything about enabling some of the equivalences of a
class but not others.  What I suggested was being able to specify a
set of associations as a new, user-level equivalence class, and then
being able to enable/disable that class as a whole.  Whether the
members of that class also belong to a larger, predefined class is
not relevant here. 

> E.g., why would you want to make 2 and ② equivalent, but not 2 and ²?

Why not?  Why not be able to define your own class that includes
2 = ②, 3 = ③, etc., but not 2 = ² etc.?  What you want to consider
equivalent can depend on your particular context/needs.

The fact that there are natural, predefined Unicode equivalences
in general does not mean that only those equivalences make sense for
a given user in a given context.
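
For reference, the Unicode character database records both mappings as compatibility decompositions to the digit 2, each with a different tag (`<circle>` and `<super>`). The sketch below uses Python's standard `unicodedata` module to show this, and how a fold could be restricted to chosen tags. The helpers `compat_mapping` and `fold` are hypothetical names for illustration, not anything Emacs provides.

```python
import unicodedata

def compat_mapping(ch):
    """Return (tag, base) for a tagged one-character decomposition, else None."""
    parts = unicodedata.decomposition(ch).split()
    if len(parts) == 2 and parts[0].startswith("<"):
        return parts[0].strip("<>"), chr(int(parts[1], 16))
    return None

def fold(ch, allowed_tags):
    """Replace CH by its base character only if its tag is in ALLOWED_TAGS."""
    mapping = compat_mapping(ch)
    if mapping is not None and mapping[0] in allowed_tags:
        return mapping[1]
    return ch

print(compat_mapping("②"))  # ('circle', '2')
print(compat_mapping("²"))  # ('super', '2')

# Fold circled digits but leave superscripts alone.
print(fold("②", {"circle"}))  # 2
print(fold("²", {"circle"}))  # ²
```

This only handles single-character tagged decompositions. Canonical decompositions (such as an accented letter into base plus combining mark) are deliberately left alone.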

> So this kind of customization doesn't have to be easy, IMO, and
> it's okay to ask such users to know what they are doing.

I disagree.  But I'm talking user-level and wishlist.  I have nothing
to say about the difficulty of providing what I am suggesting.

I am hoping that it *will* be easy for a user to both (a) define
an equivalence class (set of associations) of chars and (b) enable
or disable the use of that class.  For search and for other purposes.
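
For the search case, here is a rough sketch of how a set of associations could drive matching, again in Python rather than Emacs Lisp. The function `folding_pattern` and the group format (one string per set of interchangeable characters) are assumptions made for this example.

```python
import re

def folding_pattern(query, groups):
    """Compile a regex matching QUERY, letting each char match its group.

    GROUPS is a list of strings; the characters in each string are
    treated as interchangeable.  If a character appears in several
    groups, the last group listed wins.
    """
    lookup = {}
    for group in groups:
        for ch in group:
            lookup[ch] = group

    parts = []
    for ch in query:
        group = lookup.get(ch)
        if group is None:
            parts.append(re.escape(ch))
        else:
            # A bracket expression accepting any member of the group.
            parts.append("[" + "".join(re.escape(c) for c in group) + "]")
    return re.compile("".join(parts))


text = "instead of e.g. ‘t’ it now says"
quotes = ["'’", "`‘"]  # user-defined associations

# With the class enabled, typing ASCII quotes finds the curved ones.
print(folding_pattern("`t'", quotes).search(text).group())  # ‘t’

# With no classes enabled, the same query finds nothing.
print(folding_pattern("`t'", []).search(text))  # None
```

Enabling or disabling a class then amounts to including or omitting its groups when the pattern is built.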



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: Single quotes in Info
  2015-01-29 18:34                                   ` Drew Adams
@ 2015-01-29 18:54                                     ` Eli Zaretskii
  2015-01-29 19:35                                       ` Drew Adams
  0 siblings, 1 reply; 40+ messages in thread
From: Eli Zaretskii @ 2015-01-29 18:54 UTC (permalink / raw)
  To: Drew Adams; +Cc: bruce.connor.am, emacs-devel

> Date: Thu, 29 Jan 2015 10:34:58 -0800 (PST)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: bruce.connor.am@gmail.com, emacs-devel@gnu.org
> 
> > > I would like to see the ability for users to define classes, and to
> > > "activate" (enable the use of; turn on) or "deactivate" (turn off) a
> > > particular class of equivalences as a whole, including any of the
> > > predefined classes.
> > 
> > This would require modifying the Unicode tables.  They are just large
> > char-tables, so someone who knows what they are doing should be able
> > to do that.
> 
> The point is to let ordinary users define such classes, and use them
> selectively.

They should be able to.  But I was talking about _un_defining existing
classes.

> > and I don't see why users would
> > like to disable or replace portions of those tables.
> 
> That's putting it wrong, putting it already in terms of implementation.

No, it's not.  I just used these words, that's all.  The intent was to
say that disabling portions of a certain class makes no sense.

> Ordinary users would certainly not *want* to "disable or replace portions
> of those tables".  That is, they would not want to, and should not need
> to, think in terms of such tables.

Red herring.  I was using these words to make the issue clear.

> What (some) ordinary users are liable to want to be able to do is define
> a class of chars that they can use in place of each other etc., and to
> choose among such classes, via Lisp or interactively, enabling/disabling
> the equivalences they define.

Replacing existing classes would need modifications of the Unicode
tables.  Again, not easy, and should be.

> > E.g., why would you want to make 2 and ② equivalent, but not 2 and ²?
> 
> Why not?  Why not be able to define your own class that includes
> 2 = ②, 3 = ③, etc., but not 2 = ² etc.?

Because it makes no sense.  This isn't some game we are playing here;
these equivalences have deep meaning in some contexts.  If they don't,
they should not be used as a whole.

> > So this kind of customization doesn't have to be easy, IMO, and
> > it's okay to ask such users to know what they are doing.
> 
> I disagree.

Then we will have to agree to disagree.

However, this is all highly theoretical, since the real decision will
be made by whoever develops this.




^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: Single quotes in Info
  2015-01-29 18:54                                     ` Eli Zaretskii
@ 2015-01-29 19:35                                       ` Drew Adams
  0 siblings, 0 replies; 40+ messages in thread
From: Drew Adams @ 2015-01-29 19:35 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: bruce.connor.am, emacs-devel

> Replacing existing classes would need modifications of the Unicode
> tables.  Again, not easy, and should be.

I didn't say anything about replacing existing classes.

> > > E.g., why would you want to make 2 and ② equivalent, but not 2 and ²?
> >
> > Why not?  Why not be able to define your own class that includes
> > 2 = ②, 3 = ③, etc., but not 2 = ² etc.?
> 
> Because it makes no sense.  This isn't some game we are playing here;
> these equivalences have deep meaning in some contexts.  If they don't,
> they should not be used as a whole.

I give up.  To me, it should be possible to allow user & use-case
choices - arbitrary equivalence classes, not just
only-predefined-correspondences-can-possibly-make-sense.

User-defined does not imply silly game-playing or any necessary lack
of "deep meaning".



^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2015-01-29 19:35 UTC | newest]

Thread overview: 40+ messages
2015-01-23 23:17 Single quotes in Info Marcin Borkowski
2015-01-23 23:53 ` Drew Adams
2015-01-24 17:01   ` Marcin Borkowski
2015-01-24  8:38 ` Eli Zaretskii
2015-01-24 15:11   ` Drew Adams
2015-01-24 15:19     ` Eli Zaretskii
     [not found]     ` <<838ugsrysw.fsf@gnu.org>
2015-01-24 15:54       ` Drew Adams
2015-01-24 16:45         ` Marcin Borkowski
2015-01-24 17:00     ` Marcin Borkowski
2015-01-27 16:27       ` Artur Malabarba
2015-01-27 17:37         ` Stefan Monnier
2015-01-27 18:09           ` Eli Zaretskii
2015-01-27 19:00             ` Stefan Monnier
2015-01-27 19:15               ` Eli Zaretskii
2015-01-27 19:49           ` Artur Malabarba
2015-01-27 20:30             ` Stefan Monnier
2015-01-28  3:48               ` Stefan Monnier
2015-01-28 21:42                 ` Artur Malabarba
2015-01-28 22:23                   ` Stefan Monnier
2015-01-29 14:31                     ` Artur Malabarba
2015-01-27 18:04         ` Eli Zaretskii
2015-01-27 18:39           ` Drew Adams
2015-01-27 20:24           ` Artur Malabarba
2015-01-27 21:18             ` Eli Zaretskii
2015-01-28  1:15               ` Artur Malabarba
2015-01-28 15:24                 ` Eli Zaretskii
2015-01-28 16:10                   ` Yuri Khan
2015-01-28 17:22                     ` Eli Zaretskii
2015-01-28 21:38                   ` Artur Malabarba
2015-01-29  3:44                     ` Eli Zaretskii
2015-01-29  6:01                       ` Drew Adams
2015-01-29 16:03                         ` Eli Zaretskii
2015-01-29 16:24                           ` Drew Adams
2015-01-29 16:57                             ` Eli Zaretskii
     [not found] ` <mailman.18484.1422057224.1147.help-gnu-emacs@gnu.org>
2015-01-26  3:26   ` Unicode in emacs (was Single quotes in Info) Rusi
     [not found] <<87twzhgk84.fsf@wmi.amu.edu.pl>
     [not found] ` <<83lhksshdm.fsf@gnu.org>
     [not found]   ` <<9ee0c895-a178-40e1-b1c8-ed2b97071c6b@default>
     [not found]     ` <<87h9vgglkz.fsf@wmi.amu.edu.pl>
     [not found]       ` <<CAAdUY-J4s+1_C7bj32Xk5x8d01fe9baPCYmwd+0KU=QorO7wZg@mail.gmail.com>
     [not found]         ` <<83h9vcp0bq.fsf@gnu.org>
     [not found]           ` <<CAAdUY-Kck6moHTRJshbXJdRVQ6gK6Q24f_PD7SuEaZ7hURpdQw@mail.gmail.com>
     [not found]             ` <<83y4onorcc.fsf@gnu.org>
     [not found]               ` <<CAAdUY-+ooLydD-qPtiEvv-01TGxX5E-cf6asvs+Jn+eR_=38ig@mail.gmail.com>
     [not found]                 ` <<83vbjrnd1f.fsf@gnu.org>
     [not found]                   ` <<CAAdUY-JwX-p-ZzdExm9+cKs5pC0SUoLLs8ppA9esuXsRuHRdng@mail.gmail.com>
     [not found]                     ` <<83386untcd.fsf@gnu.org>
     [not found]                       ` <<ee612423-67bf-42d0-a0ef-0dad11605c49@default>
     [not found]                         ` <<83vbjpmv4w.fsf@gnu.org>
     [not found]                           ` <<6164d89d-23ac-46bf-9f84-154cc0e6c6e4@default>
     [not found]                             ` <<83mw51msnz.fsf@gnu.org>
2015-01-29 17:05                               ` Single quotes in Info Drew Adams
2015-01-29 17:24                                 ` Eli Zaretskii
2015-01-29 18:34                                   ` Drew Adams
2015-01-29 18:54                                     ` Eli Zaretskii
2015-01-29 19:35                                       ` Drew Adams
