* Single quotes in Info
From: Marcin Borkowski @ 2015-01-23 23:17 UTC
To: Help Gnu Emacs mailing list

Hello all,

I'm not sure about it, but it seems that after upgrading from 24.3 to
25.0.50.1, the Info buffer is a bit uglified. First, it uses some face
I don't like for variable and function names – but if this annoys me too
much, I can change it easily. Worse, instead of e.g. `t' it now says
‘t’, for instance (i.e., it uses Unicode single quotation marks).

This is extremely annoying, since it makes incremental searching for
single-quoted strings much harder.

I apropos'ed the "Info-" variables and grepped the list for "quot",
"unicode" and "single", all to no avail, and ran out of ideas. Is this
behavior customizable? How to get back to ASCII quotes?

TIA,

--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University

^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: Single quotes in Info
From: Drew Adams @ 2015-01-23 23:53 UTC
To: Marcin Borkowski, Help Gnu Emacs mailing list

> I'm not sure about it, but it seems that after upgrading from 24.3 to
> 25.0.50.1, the Info buffer is a bit uglified. First, it uses some face
> I don't like for variable and function names – but if this annoys me too
> much, I can change it easily. Worse, instead of e.g. `t' it now says
> ‘t’, for instance (i.e., it uses Unicode single quotation marks).
>
> This is extremely annoying, since it makes incremental searching for
> single-quoted strings much harder.
>
> I apropos'ed the "Info-" variables and grepped the list for "quot",
> "unicode" and "single", all to no avail, and ran out of ideas. Is this
> behavior customizable? How to get back to ASCII quotes?

Oh boy, you'll have fun reading about this in the bug threads:

#16292 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16292
info docs now contain single straight quotes instead of `'

#13131 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13131
Allow curly quotes to be found by searching for straight quotes?

#16439 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16439
Highlighting of strings within Info buffers

#13228 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13228
Request for highlighting back-quote/quote pair notation

Enjoy!

(Info+ can at least help by highlighting quoted names etc.
http://www.emacswiki.org/emacs/InfoPlus)
* Re: Single quotes in Info
From: Marcin Borkowski @ 2015-01-24 17:01 UTC
To: Help Gnu Emacs mailing list

On 2015-01-24, at 00:53, Drew Adams <drew.adams@oracle.com> wrote:

>> I'm not sure about it, but it seems that after upgrading from 24.3 to
>> 25.0.50.1, the Info buffer is a bit uglified. First, it uses some face
>> I don't like for variable and function names – but if this annoys me too
>> much, I can change it easily. Worse, instead of e.g. `t' it now says
>> ‘t’, for instance (i.e., it uses Unicode single quotation marks).
>>
>> This is extremely annoying, since it makes incremental searching for
>> single-quoted strings much harder.
>>
>> I apropos'ed the "Info-" variables and grepped the list for "quot",
>> "unicode" and "single", all to no avail, and ran out of ideas. Is this
>> behavior customizable? How to get back to ASCII quotes?
>
> Oh boy, you'll have fun reading about this in the bug threads:
>
> #16292 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16292
> info docs now contain single straight quotes instead of `'
>
> #13131 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13131
> Allow curly quotes to be found by searching for straight quotes?
>
> #16439 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16439
> Highlighting of strings within Info buffers
>
> #13228 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13228
> Request for highlighting back-quote/quote pair notation
>
> Enjoy!

Thanks, I'll look at these.

Best,

--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University
* Re: Single quotes in Info
From: Eli Zaretskii @ 2015-01-24 8:38 UTC
To: help-gnu-emacs

> From: Marcin Borkowski <mbork@wmi.amu.edu.pl>
> Date: Sat, 24 Jan 2015 00:17:47 +0100
>
> I'm not sure about it, but it seems that after upgrading from 24.3 to
> 25.0.50.1, the Info buffer is a bit uglified. First, it uses some face
> I don't like for variable and function names

Not sure what you mean here, because there is no such face in Info.
Maybe you mean Info-quoted, which is used for quoted strings? (You
can use "M-x describe-text-properties" to show the face at point.)

> Worse, instead of e.g. `t' it now says ‘t’, for instance (i.e., it
> uses Unicode single quotation marks).

I don't think this has anything to do with Emacs. These characters
come from the Info file itself, and are produced by the new 'makeinfo'
command. That's "progress" for you: many people nowadays no longer
want to see ASCII quotes, they want to see those fancy characters
Unicode introduced.

Or maybe the reason is that in Emacs 24 we actively prevent 'makeinfo'
from doing that, whereas in Emacs 25 we don't.

> This is extremely annoying, since it makes incremental searching for
> single-quoted strings much harder.

Doesn't M-C-s allow you to find that by a suitable regexp?

Anyway, we should revive bug #13131, and provide an easier solution
for this particular issue.

> How to get back to ASCII quotes?

I think you need to regenerate the Info docs, using the levers we did
in Emacs 24 to disallow Unicode quotes. Or customize 'makeinfo' to
produce ASCII characters instead (search for OPEN_QUOTE_SYMBOL in the
Texinfo manual), and then regenerate the docs. Or install an older
'makeinfo', which didn't produce these quotes, and then regenerate the
docs.
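Eli's regexp suggestion can be illustrated outside Emacs. The sketch below uses Python's re module, not Emacs's regexp engine, and the helper name quoted_symbol_pattern is hypothetical. It builds bracket expressions that accept the old `NAME' style, a straight 'NAME', and the Unicode ‘NAME’ form. The same bracket-expression idea should carry over to a regexp typed at C-M-s, but that is an assumption here, not something this sketch tests.

```python
import re


def quoted_symbol_pattern(name: str) -> "re.Pattern[str]":
    """Match NAME quoted as `NAME' (old makeinfo), 'NAME', or ‘NAME’ (Unicode)."""
    # Opening quote: grave accent, ASCII apostrophe, or U+2018 LEFT SINGLE QUOTATION MARK.
    # Closing quote: ASCII apostrophe or U+2019 RIGHT SINGLE QUOTATION MARK.
    return re.compile("[`'\u2018]" + re.escape(name) + "['\u2019]")


old_style = "The value `t' means true."
new_style = "The value \u2018t\u2019 means true."
pattern = quoted_symbol_pattern("t")
print(bool(pattern.search(old_style)), bool(pattern.search(new_style)))  # True True
```

An unquoted "t" in running text is not matched, so the pattern stays specific to quoted names.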
* RE: Single quotes in Info
From: Drew Adams @ 2015-01-24 15:11 UTC
To: Eli Zaretskii, help-gnu-emacs

> Anyway, we should revive bug #13131, and provide an easier solution
> for this particular issue.

I agree. For this particular (search) issue.

This is conceptually related to, but it need not necessarily be
extended to, discussion about being able to Isearch abstracting from
diacritical marks etc. (E.g. bug #13041:
http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13041.)

IOW, being able to easily specify equivalence classes of chars for
search (and other) purposes, and preferably being able to quickly
choose whether to make use of them (this one or that one) - e.g.,
as we can do now for case-sensitivity (`a' ~ `A').

The easily-search-for-curly-or-not-curly problem reminds us that
Info is not only about display: One needs to be able to easily
search for (and perhaps even type directly) the chars that are
displayed. Chars ` and ' correspond to keys on most keyboards.
‘ and ’ do not.

Some of those who propose curly-quote etc. display as a
"modernization" of Emacs might not take sufficiently into account
how Emacs users interact with the text. "Modern" appearance is nice
(even important), but Emacs is not *only* about display.

> > How to get back to ASCII quotes?
>
> I think you need to regenerate the Info docs, using the levers we did
> in Emacs 24 to disallow Unicode quotes. Or customize 'makeinfo' to
> produce ASCII characters instead (search for OPEN_QUOTE_SYMBOL in the
> Texinfo manual), and then regenerate the docs. Or install an older
> 'makeinfo', which didn't produce these quotes, and then regenerate the
> docs.

As I know you are aware, Eli, this return-to-the-source is not a real
solution. (Ideally) Emacs users themselves should be able (somehow)
to choose which chars are used for such display. Remaking Info
should not be our only (i.e., final) answer, even if it is such today.
* Re: Single quotes in Info
From: Eli Zaretskii @ 2015-01-24 15:19 UTC
To: help-gnu-emacs

> Date: Sat, 24 Jan 2015 07:11:05 -0800 (PST)
> From: Drew Adams <drew.adams@oracle.com>
>
> > > How to get back to ASCII quotes?
> >
> > I think you need to regenerate the Info docs, using the levers we did
> > in Emacs 24 to disallow Unicode quotes. Or customize 'makeinfo' to
> > produce ASCII characters instead (search for OPEN_QUOTE_SYMBOL in the
> > Texinfo manual), and then regenerate the docs. Or install an older
> > 'makeinfo', which didn't produce these quotes, and then regenerate the
> > docs.
>
> As I know you are aware, Eli, this return-to-the-source is not a real
> solution.

I was enumerating solutions that are available to the OP now. This
list is about helping users do whatever they want, not about telling
Emacs developers what future features they should work on ;-)
* RE: Single quotes in Info
From: Drew Adams @ 2015-01-24 15:54 UTC
To: Eli Zaretskii, help-gnu-emacs

> > As I know you are aware, Eli, this return-to-the-source is not a
> > real solution. (Ideally) Emacs users themselves should be able
> > (somehow) to choose which chars are used for such display.
> > Remaking Info should not be our only (i.e., final) answer, even
> > if it is such today.
>
> I was enumerating solutions that are available to the OP now. This
> list is about helping users do whatever they want, not about telling
> Emacs developers what future features they should work on ;-)

My message was an endorsement reply to your own development-oriented
statement:

ez> Anyway, we should revive bug #13131, and provide an easier
ez> solution for this particular issue.

And I think it does not hurt users to be reminded that curly
quotes are not as easy to type as straight quotes (with many/most
keyboards), and that Info is about things like search and not
only about display. Thanks to Marcin for reminding us all.

It is perfectly legitimate to discuss possible new features on
this list, as well as current limitations & possible workarounds.
* Re: Single quotes in Info
From: Marcin Borkowski @ 2015-01-24 16:45 UTC
To: Eli Zaretskii, help-gnu-emacs

On 2015-01-24, at 16:54, Drew Adams <drew.adams@oracle.com> wrote:

> And I think it does not hurt users to be reminded that curly
> quotes are not as easy to type as straight quotes (with many/most
> keyboards), and that Info is about things like search and not
> only about display. Thanks to Marcin for reminding us all.

You're welcome. BTW, I love the Info system. Only recently did I learn
to use its index (and not only isearch), and it's even better with
that.

My particular use case was with the Info page on interactive codes.
I wanted to search for the string "`p'", and I could enter curly
quotes using M-e and editing the search query with some Unicode-aware
commands (C-x 8 RET, for instance), but this is a nuisance. (Also, in
my case, isearch-forward-regexp wouldn't help.)

Please note that I do appreciate typographical niceties like proper
quotes and such. In this case, however, usability is more important
than aesthetics imho.

> It is perfectly legitimate to discuss possible new features on
> this list, as well as current limitations & possible workarounds.

That's good to know! :-)

--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University
* Re: Single quotes in Info
From: Marcin Borkowski @ 2015-01-24 17:00 UTC
To: Eli Zaretskii, help-gnu-emacs

On 2015-01-24, at 16:11, Drew Adams <drew.adams@oracle.com> wrote:

> This is conceptually related to, but it need not necessarily be
> extended to, discussion about being able to Isearch abstracting from
> diacritical marks etc. (E.g. bug #13041:
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13041.)
>
> IOW, being able to easily specify equivalence classes of chars for
> search (and other) purposes, and preferably being able to quickly
> choose whether to make use of them (this one or that one) - e.g.,
> as we can do now for case-sensitivity (`a' ~ `A').

This is a great idea. Maybe even not only for isearch.

Best,

--
Marcin Borkowski
http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University
* Re: Single quotes in Info
From: Artur Malabarba @ 2015-01-27 16:27 UTC
To: Marcin Borkowski, emacs-devel
Cc: Eli Zaretskii, help-gnu-emacs

2015-01-24 15:00 GMT-02:00 Marcin Borkowski <mbork@wmi.amu.edu.pl>:
>
> On 2015-01-24, at 16:11, Drew Adams <drew.adams@oracle.com> wrote:
>
>> This is conceptually related to, but it need not necessarily be
>> extended to, discussion about being able to Isearch abstracting from
>> diacritical marks etc. (E.g. bug #13041:
>> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13041.)
>>
>> IOW, being able to easily specify equivalence classes of chars for
>> search (and other) purposes, and preferably being able to quickly
>> choose whether to make use of them (this one or that one) - e.g.,
>> as we can do now for case-sensitivity (`a' ~ `A').
>
> This is a great idea. Maybe even not only for isearch.
>

I also really like this idea, so much so that I've gone ahead and
implemented it. It is implemented on the branch
`scratch/isearch-character-group-folding'. I called it group-folding,
but we can call it class folding or whatever sounds more intuitive to
most people.

The implementation is very much up for debate. Currently, what it does
is use regexps (behind the scenes) so that a plain double quote
matches all those Unicode double quotes, and the same for a hard
single quote. The way it is written, it is trivial to add more groups
by adding entries to `isearch-groups-alist'.
Of course, other characters are appropriately regexp-quoted behind the
scenes, so that everything else works as expected. The surface is
exactly like regular isearch, except for these two characters.

The set of groups is defined by `isearch-groups-alist', and the
folding only happens if `isearch-fold-groups' is non-nil.
Other groups that maybe should be added are Latin accented letters.

Cheers to all,
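The group-folding approach Artur describes (one typed character expands to a regexp, everything else is regexp-quoted) can be sketched as follows. This is Python, not the Emacs Lisp on the branch: GROUPS is a hypothetical stand-in for `isearch-groups-alist' and fold_groups for `isearch-fold-groups'. The shape of the table and its exact membership are assumptions made for illustration.

```python
import re

# Hypothetical stand-in for `isearch-groups-alist': each key is a character
# typed on a plain keyboard, each value a regexp bracket expression matching
# every character in its group. The exact membership is illustrative.
GROUPS = {
    '"': '["\u201c\u201d\u201e\u201f]',
    "'": "['\u2018\u2019\u201a\u201b]",
}


def search_regexp(query: str, fold_groups: bool = True) -> str:
    """Translate a literal QUERY into a regexp, expanding grouped characters."""
    parts = []
    for ch in query:
        if fold_groups and ch in GROUPS:
            parts.append(GROUPS[ch])
        else:
            parts.append(re.escape(ch))  # all other characters match literally
    return "".join(parts)


text = "See \u2018p\u2019 and \u201cquoted\u201d text."
print(re.search(search_regexp("'p'"), text).group())             # ‘p’
print(re.search(search_regexp("'p'", fold_groups=False), text))  # None
```

Because every non-grouped character passes through re.escape, a query such as "a.b" still matches only the literal text "a.b".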
* Re: Single quotes in Info
From: Stefan Monnier @ 2015-01-27 17:37 UTC
To: Artur Malabarba
Cc: Eli Zaretskii, emacs-devel, help-gnu-emacs, Marcin Borkowski

> The implementation is very much up for debate. Currently, what it does
> is use regexps (behind the scenes) so that a plain double quote
> matches all those Unicode double quotes, and the same for a hard
> single quote.

Why not use the case-fold machinery instead?


        Stefan
* Re: Single quotes in Info
From: Eli Zaretskii @ 2015-01-27 18:09 UTC
To: Stefan Monnier
Cc: help-gnu-emacs, emacs-devel, bruce.connor.am, mbork

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Marcin Borkowski <mbork@wmi.amu.edu.pl>, emacs-devel <emacs-devel@gnu.org>, Eli Zaretskii <eliz@gnu.org>, help-gnu-emacs <help-gnu-emacs@gnu.org>
> Date: Tue, 27 Jan 2015 12:37:31 -0500
>
> > The implementation is very much up for debate. Currently, what it does
> > is use regexps (behind the scenes) so that a plain double quote
> > matches all those Unicode double quotes, and the same for a hard
> > single quote.
>
> Why not use the case-fold machinery instead?

That will work only for character-for-character replacements, won't
it?
* Re: Single quotes in Info
From: Stefan Monnier @ 2015-01-27 19:00 UTC
To: Eli Zaretskii
Cc: help-gnu-emacs, emacs-devel, bruce.connor.am, mbork

>> Why not use the case-fold machinery instead?
> That will work only for character-for-character replacements, won't
> it?

That's right. But it will work a lot more efficiently (and reliably,
e.g. if you have one of those characters in a character-range) for those.


        Stefan
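The character-for-character restriction Eli and Stefan discuss can be shown with a small analogy. The Python sketch below uses str.translate as a stand-in for a case table; it is not how Emacs implements case folding, and the FOLD table is hypothetical. Because every entry maps one character to exactly one character, offsets in the folded text line up with offsets in the original, so a match position can be reported directly without building a regexp.

```python
# Hypothetical one-to-one folding table, analogous to a case table.
FOLD = str.maketrans({
    "\u2018": "'", "\u2019": "'",  # curly single quotes -> straight
    "\u201c": '"', "\u201d": '"',  # curly double quotes -> straight
})


def folded_find(haystack: str, needle: str) -> int:
    """Index of NEEDLE in HAYSTACK after folding both strings, or -1."""
    return haystack.translate(FOLD).find(needle.translate(FOLD))


line = "Code \u2018p\u2019: prefix argument"
print(folded_find(line, "'p'"))  # 5
```

Adding an entry such as ⇒ -> "=>" would make the folded text longer than the original, so the returned index would no longer point at the right place in the unfolded buffer. That is one concrete form of the limitation Eli raises.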
* Re: Single quotes in Info
From: Eli Zaretskii @ 2015-01-27 19:15 UTC
To: Stefan Monnier
Cc: emacs-devel, bruce.connor.am, mbork

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: bruce.connor.am@gmail.com, mbork@wmi.amu.edu.pl, emacs-devel@gnu.org, help-gnu-emacs@gnu.org
> Date: Tue, 27 Jan 2015 14:00:49 -0500
>
> >> Why not use the case-fold machinery instead?
> > That will work only for character-for-character replacements, won't
> > it?
>
> That's right. But it will work a lot more efficiently (and reliably,
> e.g. if you have one of those characters in a character-range) for those.

But then someone else will come up complaining about the other Unicode
characters emitted by makeinfo 5.x. There are about a dozen of them.
* Re: Single quotes in Info
From: Artur Malabarba @ 2015-01-27 19:49 UTC
To: Stefan Monnier
Cc: emacs-devel, help-gnu-emacs, Marcin Borkowski

2015-01-27 15:37 GMT-02:00 Stefan Monnier <monnier@iro.umontreal.ca>:
>> The implementation is very much up for debate. Currently, what it does
>> is use regexps (behind the scenes) so that a plain double quote
>> matches all those Unicode double quotes, and the same for a hard
>> single quote.
>
> Why not use the case-fold machinery instead?

Because, IIUC, this is done in C code. While I know C, I can't say I
know Emacs' C. So that implementation will take longer (something on
the order of weeks).
* Re: Single quotes in Info
From: Stefan Monnier @ 2015-01-27 20:30 UTC
To: Artur Malabarba
Cc: emacs-devel, help-gnu-emacs, Marcin Borkowski

>> Why not use the case-fold machinery instead?
> Because, IIUC, this is done in C code. While I know C, I can't say I
> know Emacs' C. So that implementation will take longer (something on
> the order of weeks).

It's configured in C, tho. Try:

   C-h f *case-table TAB

for a start.


        Stefan
* Re: Single quotes in Info
From: Stefan Monnier @ 2015-01-28 3:48 UTC
To: Artur Malabarba
Cc: Eli Zaretskii, emacs-devel, help-gnu-emacs, Marcin Borkowski

> It's configured in C, tho. Try:
                    ^^^
                   Elisp
Duh!


        Stefan
* Re: Single quotes in Info
From: Artur Malabarba @ 2015-01-28 21:42 UTC
To: Stefan Monnier
Cc: help-gnu-emacs, emacs-devel, Marcin Borkowski

Ok, I'll be getting on a 10-hour flight now, so I'll be looking into
the case-fold machinery.
I did have a brief look already and it doesn't seem horribly absurd.

Any other pointers that might be useful before I jump into no-internet
land? :-)

2015-01-28 1:48 GMT-02:00 Stefan Monnier <monnier@iro.umontreal.ca>:
>> It's configured in C, tho. Try:
>                    ^^^
>                   Elisp
> Duh!
>
>
>         Stefan
* Re: Single quotes in Info
From: Stefan Monnier @ 2015-01-28 22:23 UTC
To: Artur Malabarba
Cc: help-gnu-emacs, emacs-devel, Marcin Borkowski

> Ok, I'll be getting on a 10-hour flight now, so I'll be looking into
> the case-fold machinery.
> I did have a brief look already and it doesn't seem horribly absurd.

> Any other pointers that might be useful before I jump into no-internet
> land? :-)

Just a warning: the case-tables are threatened. They should be replaced
by Unicode-aware (locale-dependent?) case folding for 99.99% of the
cases; the only remaining case is the "ASCII upcase/downcase" operation
used in sendmail.el (IIRC), which we can hopefully solve some other way.


        Stefan
* Re: Single quotes in Info
From: Artur Malabarba @ 2015-01-29 14:31 UTC
To: Stefan Monnier
Cc: emacs-devel, Marcin Borkowski

Ok, here's how I can see this being done:

1. Define a new field for the buffer object which is a char-table.
2. Populate this table in Lisp code.
3. Use it instead of the case folding table as the translation table
   for searches, if some given variable is non-nil.

Would that be desirable?

We could also use the equivalence class folding table in addition to
the case folding table. But that would (at the very least) involve
changing the C search functions to take an additional argument.

On 28 Jan 2015 20:23, "Stefan Monnier" <monnier@iro.umontreal.ca> wrote:
> > Ok, I'll be getting on a 10-hour flight now, so I'll be looking into
> > the case-fold machinery.
> > I did have a brief look already and it doesn't seem horribly absurd.
>
> > Any other pointers that might be useful before I jump into no-internet
> > land? :-)
>
> Just a warning: the case-tables are threatened. They should be replaced
> by Unicode-aware (locale-dependent?) case folding for 99.99% of the
> cases; the only remaining case is the "ASCII upcase/downcase" operation
> used in sendmail.el (IIRC), which we can hopefully solve some other way.
>
>
>         Stefan
* Re: Single quotes in Info
From: Eli Zaretskii @ 2015-01-27 18:04 UTC
To: bruce.connor.am
Cc: help-gnu-emacs, emacs-devel, mbork

> Date: Tue, 27 Jan 2015 14:27:45 -0200
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, help-gnu-emacs <help-gnu-emacs@gnu.org>
>
> I also really like this idea, so much so that I've gone ahead and
> implemented it. It is implemented on the branch
> `scratch/isearch-character-group-folding'. I called it group-folding,
> but we can call it class folding or whatever sounds more intuitive to
> most people.

I didn't yet have time to look at the source, so apologies if what's
below is off the mark.

> The implementation is very much up for debate. Currently, what it does
> is use regexps (behind the scenes) so that a plain double quote
> matches all those Unicode double quotes, and the same for a hard
> single quote. The way it is written, it is trivial to add more groups
> by adding entries to `isearch-groups-alist'.
> Of course, other characters are appropriately regexp-quoted behind the
> scenes, so that everything else works as expected. The surface is
> exactly like regular isearch, except for these two characters.

If this is implemented in isearch, then IMO doing it for quotes alone
makes very little sense. It would make a lot of sense if it were
implemented in info.el, for searching Info manuals (in which case it
should also support the other Unicode characters produced by makeinfo
that have ASCII equivalents, like ⇒ vs =>). (Note that this is not
character-for-character equivalence anymore.)

For a general-purpose search feature, we'd need a much more
general-purpose and versatile implementation.

> The set of groups is defined by `isearch-groups-alist', and the
> folding only happens if `isearch-fold-groups' is non-nil.
> Other groups that maybe should be added are Latin accented letters.

If we do this via our private database, that database is going to be
huge. I suggest exploring an alternative implementation, which uses
canonical equivalence. We already have infrastructure for that, see
the description of the 'decomposition' character property in the ELisp
manual.
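Eli's pointer to canonical equivalence can be illustrated with Python's unicodedata module, which exposes data from the same Unicode Character Database; it is not Emacs's own table. The sketch folds a string by canonical decomposition (NFD) and drops combining marks, so accented Latin letters match their base letters. It also shows that U+2018 has no decomposition mapping, so this mechanism alone would not cover curly quotes.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Fold TEXT via canonical decomposition (NFD), dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_marks("caf\u00e9 \u00e0 la cr\u00e8me"))  # cafe a la creme
print(unicodedata.decomposition("\u00e9"))            # 0065 0301
print(repr(unicodedata.decomposition("\u2018")))      # '' (no mapping)
```

Folding both the query and the text this way is a one-way, many-to-one mapping, which matches the direction of the decomposition data.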
* RE: Single quotes in Info
From: Drew Adams @ 2015-01-27 18:39 UTC
To: emacs-devel

FWIW, I suggest that help-gnu-emacs be removed from this thread from
now on.
* Re: Single quotes in Info
From: Artur Malabarba @ 2015-01-27 20:24 UTC
To: Eli Zaretskii
Cc: emacs-devel, Marcin Borkowski

> If this is implemented in isearch, then IMO doing it for quotes alone
> makes very little sense.

The quotes are just proof of concept. Adding other equivalency classes
is easy from here, and I do agree it makes sense to add others.

> It would make a lot of sense if it were
> implemented in info.el, for searching Info manuals

There are ways to do that too if people prefer, but Info manuals are
not the only ones that contain such characters. For instance, lots of
people use round quotes in org-mode files.

> (in which case it
> should also support the other Unicode characters produced by makeinfo
> that have ASCII equivalents, like ⇒ vs =>). (Note that this is not
> character-for-character equivalence anymore.)

I agree with the idea, but it will be more tricky. Translating a
character to any regexp is easy right now. Translating multiple
characters into a single one is more complicated, but I can do that.
But I'm worried about the performance of that.

> If we do this via our private database, that database is going to be
> huge.

Is it? I would expect something on the order of 50 lines. That would
be large, but not huge. Each entry relates a key from a simple
keyboard to a set of possible characters that are not represented on
simple keyboards. But maybe I'm just being naive.

> I suggest exploring an alternative implementation, which uses
> canonical equivalence.

I'd love that.

> We already have infrastructure for that, see
> the description of the 'decomposition' character property in the ELisp
> manual.

Building this on preexisting infrastructure would be great, but does
that go the right way? Does it relate a simple character to all its
complex equivalents? Or does it relate each complex character to a
simple alternative?
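Artur's concern about mapping a multi-character sequence such as => onto a single character like ⇒ can be sketched by tokenizing the query greedily and emitting a regexp alternation for each known sequence. Again in Python rather than Emacs Lisp. The EQUIVALENTS table is hypothetical: the ⇒ entry comes from Eli's message and the quote entry from earlier in the thread. Trying the longest sequence first keeps "=>" from being split into "=" and ">".

```python
import re

# Hypothetical table: what the user types -> characters that may appear instead.
EQUIVALENTS = {
    "=>": ["\u21d2"],           # RIGHTWARDS DOUBLE ARROW
    "'": ["\u2018", "\u2019"],  # curly single quotes
}


def query_to_regexp(query: str) -> str:
    """Turn a literal QUERY into a regexp that also matches known equivalents."""
    keys = sorted(EQUIVALENTS, key=len, reverse=True)  # try the longest sequence first
    parts, i = [], 0
    while i < len(query):
        for key in keys:
            if query.startswith(key, i):
                alternatives = [key] + EQUIVALENTS[key]
                parts.append("(?:" + "|".join(map(re.escape, alternatives)) + ")")
                i += len(key)
                break
        else:  # no table entry starts here: match this character literally
            parts.append(re.escape(query[i]))
            i += 1
    return "".join(parts)


pattern = query_to_regexp("'t' => nil")
print(bool(re.search(pattern, "\u2018t\u2019 \u21d2 nil")))  # True
print(bool(re.search(pattern, "'t' => nil")))                # True
```

Each alternation adds backtracking work for the regexp engine, which is one plausible source of the performance cost Artur mentions; this sketch does not measure it.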
* Re: Single quotes in Info
From: Eli Zaretskii @ 2015-01-27 21:18 UTC
To: bruce.connor.am
Cc: emacs-devel, mbork

> Date: Tue, 27 Jan 2015 18:24:09 -0200
> From: Artur Malabarba <bruce.connor.am@gmail.com>
> Cc: Marcin Borkowski <mbork@wmi.amu.edu.pl>, emacs-devel <emacs-devel@gnu.org>
>
> > If this is implemented in isearch, then IMO doing it for quotes alone
> > makes very little sense.
>
> The quotes are just proof of concept.

Yes, but what concept is that? Does it scale up to a general-purpose
feature of the kind that suits isearch.el? Just replacing one
character for another doesn't, IMO.

> > If we do this via our private database, that database is going to be
> > huge.
>
> Is it? I would expect something on the order of 50 lines.

There are more than 5000 characters in the Unicode database that have
equivalence and canonical decompositions. (Look for entries in
UnicodeData.txt whose 6th field is non-empty.)

> > We already have infrastructure for that, see
> > the description of the 'decomposition' character property in the ELisp
> > manual.
>
> Building this on preexisting infrastructure would be great, but does that go
> the right way? Does it relate a simple character to all its complex
> equivalents? Or does it relate each complex character to a simple alternative?

The latter. Read paragraph 1.1 of UAX #15 for the starting point, and
also section 3.7 of the Unicode Standard.
* Re: Single quotes in Info 2015-01-27 21:18 ` Eli Zaretskii @ 2015-01-28 1:15 ` Artur Malabarba 2015-01-28 15:24 ` Eli Zaretskii 0 siblings, 1 reply; 40+ messages in thread From: Artur Malabarba @ 2015-01-28 1:15 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel [-- Attachment #1: Type: text/plain, Size: 2843 bytes --] Eli, if I may ask, did you get a chance to see the code? (it's quite short) The last couple emails give me the impression we're not quite on the same page. On 27 Jan 2015 19:18, "Eli Zaretskii" <eliz@gnu.org> wrote: > > > Date: Tue, 27 Jan 2015 18:24:09 -0200 > > From: Artur Malabarba <bruce.connor.am@gmail.com> > > Cc: Marcin Borkowski <mbork@wmi.amu.edu.pl>, emacs-devel < emacs-devel@gnu.org> > > > > > If this is implemented in isearch, then IMO doing it for quotes alone > > > makes very little sense. > > > > The quotes are just proof of concept. > > Yes, but what concept is that? Does it scale up to a general-purpose > feature of the kind that suits isearch.el? Just replacing one > character for another doesn't, IMO. No. It replaces one character with an arbitrary regexp. In the quotes case that's used to match about a dozen different quotation characters, but it's not limited to that. You can also use that to implement lax-whi > > > If we do this via our private database, that database is going to be > > > huge. > > > > Is it? I would expect something on the order of 50 lines. > > There are more than 5000 characters in the Unicode database that have > equivalence and canonical decompositions. (Look for entries in > UnicodeData.txt whose 6th field is non-empty.) The purpose of this is to allow the user to search for complex characters (such as curly quotes or any of these "“””„⹂〞‟‟❞❝❠“„〝〟🙷🙶🙸) by typing a simple character available on simple keyboards (such as the plain double quote "). Each simple character, needs an entry on the `isearch-groups-alist' variable. 
The max number of entries we'll ever need on this alist (in the very worst possible scenario) is the number of simple characters in a simple keyboard (which is way less than 5000 last I checked). This might be easier to understand looking at the code. > > > > We already have infrastructure for that, see > > > the description of the 'decomposition' character property in the ELisp > > > manual. > > > > Building this on preexisting infrastructure would be great, but does that go > > the right way? Does it relate a simple character to all its complex > > equivalents? Or does it relate each complex character to a simple alternative? > The latter. Read paragraph 1.1 of UAX #15 for the starting point, and > also section 3.7 of the Unicode Standard. If it's the latter, then it's the wrong way for us to do an automated approach. What we need is to know the whole set of Unicode characters which is equivalent to a given ASCII character. Of course we can build this table from the Unicode Standard (that's exactly what the `isearch-groups-alist' variable is meant to do), I'm just saying an automated approach probably isn't viable here. ^ permalink raw reply [flat|nested] 40+ messages in thread
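In Artur's approach, each plain keyboard character expands into a regexp that matches its "complex" look-alikes, and only the search string is transformed. It can be sketched in Python as follows. The table is a hypothetical illustration, not the contents of the actual `isearch-groups-alist' patch.

```python
import re

# Hypothetical table in the spirit of `isearch-groups-alist': each plain
# keyboard character expands to a character class of look-alikes.
# These entries are illustrative, not the contents of the actual patch.
GROUPS = {
    '"': '["\u201c\u201d\u201e\u201f\u00ab\u00bb]',
    "'": "['`\u2018\u2019\u201a\u201b]",
}

def search_string_to_regexp(needle):
    """Expand each character of NEEDLE; only the search string changes."""
    return "".join(GROUPS.get(ch, re.escape(ch)) for ch in needle)

def find(needle, haystack):
    """Return the index of the first match of NEEDLE in HAYSTACK, or -1."""
    match = re.search(search_string_to_regexp(needle), haystack)
    return match.start() if match else -1

print(find("'t'", "the value \u2018t\u2019 means true"))  # 10
print(find("\u2018t\u2019", "the value 't' means true"))  # -1
```

The mapping works in one direction only. The second call uses a pasted curly-quoted needle and does not find the straight-quoted text, because nothing maps ‘ back to '. Eli raises exactly this copy/paste case in his reply.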
* Re: Single quotes in Info 2015-01-28 1:15 ` Artur Malabarba @ 2015-01-28 15:24 ` Eli Zaretskii 2015-01-28 16:10 ` Yuri Khan 2015-01-28 21:38 ` Artur Malabarba 0 siblings, 2 replies; 40+ messages in thread From: Eli Zaretskii @ 2015-01-28 15:24 UTC (permalink / raw) To: bruce.connor.am; +Cc: emacs-devel > Date: Tue, 27 Jan 2015 23:15:22 -0200 > From: Artur Malabarba <bruce.connor.am@gmail.com> > Cc: emacs-devel <emacs-devel@gnu.org> > > Eli, if I may ask, did you get a chance to see the code? (it's quite short) > The last couple emails give me the impression we're not quite on the same page. I did just now, and I don't think I was on a different page. > The purpose of this is to allow the user to search for complex characters (such as curly quotes or any of these "“””„⹂〞‟‟❞❝❠“„〝〟🙷🙶🙸) by typing a simple character available on simple keyboards (such as the plain double quote "). But that's exactly where it falls short of supporting a more general feature, one that lets you find text that is "equivalent" to the one you search for. The limitation to "simple characters available on simple keyboards" might seem a no-brainer for predominantly ASCII text, but it _is_ a serious limitation for any non-ASCII script, certainly for complex scripts, which Emacs has supported for years. > Each simple character needs an entry in the `isearch-groups-alist' variable. The max number of entries we'll ever need on this alist (in the very worst possible scenario) is the number of simple characters in a simple keyboard (which is way less than 5000 last I checked). You seem to forget that modern keyboards and input methods support much more than what meets the eye on the keyboard. Even Latin locales provide non-ASCII characters such as á and å. 
It is also not uncommon to copy/paste a search string from some text, in which case the search string could include the "complex" characters, but you'd still want to find their "simple" equivalents; your code, which transforms only the search string, cannot support this use case. Moreover, CJK locales use input methods that can produce thousands of characters, and for people in those cultures such input is "simple" because they can use nothing simpler. Using a database that maps ASCII characters to regexps doesn't scale for supporting these use cases. It doesn't even scale to the above-mentioned Latin characters, because á has a sequence of 2 characters "a ́" as its canonical decomposition, so when I type á, I expect to find both á and "a ́", and vice versa. More complex scripts have several forms of the same letter, such as the "final" form used in Arabic and Hebrew for the last letter in a word -- typing one of these forms should find any other form. Etc. etc. -- there's a huge complexity behind all this, and we need to support it if we want to be respected as a text editor. The way to support this is similar to how we support case-insensitive search: we "fold" each character, both in the search string and in the text being searched, using case tables, and then compare the "folded" characters. Similarly, to support equivalence, we need to produce a canonical/equivalent decomposition from each character on both sides of the comparison, and then compare the results. As I said before, we already have all the necessary data in the 'decomposition' property of each character, we just need to use it in a way that is similar to case tables, just slightly more complex (because we are no longer talking single characters). > > > Does it relate a simple character to all its complex > > > equivalents? Or does it relate each complex character to a simple alternative? > > The latter. 
Read paragraph 1.1 of UAX #15 for the starting point, and > > also section 3.7 of the Unicode Standard. > If it's the latter, then it's the wrong way for us to do an automated approach. What we need is to know the whole set of Unicode characters which is equivalent to a given ASCII character. Of course we can build this table from the Unicode Standard (that's exactly what the `isearch-groups-alist' variable is meant to do), I'm just saying an automated approach probably isn't viable here. I don't see why it won't be viable, or maybe I don't understand what you mean by "automated" here. I certainly don't think we should limit ourselves to "simple characters", not for something as general-purpose as text search. This might be okay for Info only, but not if we want it in isearch.el. My idea is to use the 'decomposition' property to decompose each character in the search string and in the text being searched, when they need to be compared. Exactly like we do with case-folding. ^ permalink raw reply [flat|nested] 40+ messages in thread
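Eli's folding idea is to decompose both the search string and the text being searched, case-fold them as well, and then compare the results. A minimal Python sketch follows. It uses Python's unicodedata.normalize as an assumed stand-in for Emacs's 'decomposition' character property. Canonical decomposition (NFD) is the default, and compatibility decomposition (NFKD) is available as an option.

```python
import unicodedata

def fold(text, form="NFD"):
    """Decompose TEXT (canonically by default, or with "NFKD" for
    compatibility decomposition) and case-fold it. The same folding is
    applied to both sides of a comparison."""
    decomposed = unicodedata.normalize(form, text)
    return unicodedata.normalize(form, decomposed.casefold())

def equivalent(a, b, form="NFD"):
    """True if A and B compare equal after folding both of them."""
    return fold(a, form) == fold(b, form)

print(equivalent("\u00e1", "a\u0301"))         # True: precomposed vs decomposed
print(equivalent("\u00c1", "a\u0301"))         # True: case folding as well
print(equivalent("a", "\u00e1"))               # False: base letter alone
print(equivalent("2", "\u2461"))               # False: circled 2 has only a compatibility mapping
print(equivalent("2", "\u2461", form="NFKD"))  # True
```

The NFD, casefold, NFD sequence follows the Unicode Standard's definition of canonical caseless matching. Comparing whole strings sidesteps the harder problem of mapping match positions back to the unfolded buffer text.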
* Re: Single quotes in Info 2015-01-28 15:24 ` Eli Zaretskii @ 2015-01-28 16:10 ` Yuri Khan 2015-01-28 17:22 ` Eli Zaretskii 2015-01-28 21:38 ` Artur Malabarba 1 sibling, 1 reply; 40+ messages in thread From: Yuri Khan @ 2015-01-28 16:10 UTC (permalink / raw) To: Eli Zaretskii; +Cc: bruce.connor.am, Emacs developers On Wed, Jan 28, 2015 at 9:24 PM, Eli Zaretskii <eliz@gnu.org> wrote: > As I said before, we already have all the necessary data in the > 'decomposition' property of each character, we just need to use it in > a way that is similar to case tables, just slightly more complex > (because we are no longer talking single characters). Proper case folding is not about single characters either, because ß. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Single quotes in Info 2015-01-28 16:10 ` Yuri Khan @ 2015-01-28 17:22 ` Eli Zaretskii 0 siblings, 0 replies; 40+ messages in thread From: Eli Zaretskii @ 2015-01-28 17:22 UTC (permalink / raw) To: Yuri Khan; +Cc: bruce.connor.am, emacs-devel > From: Yuri Khan <yuri.v.khan@gmail.com> > Date: Wed, 28 Jan 2015 23:10:32 +0700 > Cc: bruce.connor.am@gmail.com, Emacs developers <emacs-devel@gnu.org> > > On Wed, Jan 28, 2015 at 9:24 PM, Eli Zaretskii <eliz@gnu.org> wrote: > > > As I said before, we already have all the necessary data in the > > 'decomposition' property of each character, we just need to use it in > > a way that is similar to case tables, just slightly more complex > > (because we are no longer talking single characters). > > Proper case folding is not about single characters either, because ß. Which we don't yet support for the same reasons. ^ permalink raw reply [flat|nested] 40+ messages in thread
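Yuri's point is easy to check in Python, whose str.casefold implements Unicode full case folding. Folding can change the length of a string, so it cannot be expressed as a table that maps one character to one character.

```python
sharp_s = "\u00df"  # ß
folded = sharp_s.casefold()

print(folded, len(sharp_s), len(folded))                 # ss 1 2
print(sharp_s.lower() == sharp_s)                        # True: lower() keeps ß
print("stra\u00dfe".casefold() == "STRASSE".casefold())  # True
```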
* Re: Single quotes in Info 2015-01-28 15:24 ` Eli Zaretskii 2015-01-28 16:10 ` Yuri Khan @ 2015-01-28 21:38 ` Artur Malabarba 2015-01-29 3:44 ` Eli Zaretskii 1 sibling, 1 reply; 40+ messages in thread From: Artur Malabarba @ 2015-01-28 21:38 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel I've been looking into what you suggest, but it seems the decomposition property won't be enough. It does give us the necessary information for things like á and ç, but it doesn't say anything about the quotes (which was the whole initial point), nor about characters like ⇒ (which I think someone else on this thread suggested). Furthermore, the point here would be to have "a" and "á" match each other, but the decomposition of "á" gives us two characters (as would be expected). How are we to programmatically know which of these two characters is to be considered equivalent to "a with acute"? Is it safe to assume it's the first character? Otherwise, if we demand that the user types a´ to be able to match the á letter, then this feature seems kind of moot. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Single quotes in Info 2015-01-28 21:38 ` Artur Malabarba @ 2015-01-29 3:44 ` Eli Zaretskii 2015-01-29 6:01 ` Drew Adams 0 siblings, 1 reply; 40+ messages in thread From: Eli Zaretskii @ 2015-01-29 3:44 UTC (permalink / raw) To: bruce.connor.am; +Cc: emacs-devel > Date: Wed, 28 Jan 2015 19:38:08 -0200 > From: Artur Malabarba <bruce.connor.am@gmail.com> > Cc: emacs-devel <emacs-devel@gnu.org> > > I've been looking into what you suggest, but it seems the > decomposition property won't be enough. It does give us the necessary > information for things like á and ç, but it doesn't say anything about > the quotes (which was the whole initial point), nor about characters > like ⇒ (which I think someone else on this thread suggested). These are specific to Emacs, and should be added. > Furthermore, the point here would be to have "a" and "á" match each > other, but the decomposition of "á" gives us two characters (as would > be expected). How are we to programmatically know which of these two > characters is to be considered equivalent to "a with acute"? Is it > safe to assume it's the first character? I'm not at all sure we should compare a and á equal. It's an additional feature anyway. If we do want them to compare equal in some cases, then yes, you take only the first character of the decomposition (the so-called "base character"). > Otherwise, if we demand that the user types a´ to be able to match the > á letter, then this feature seems kind of moot. As I explained, the user can type the decomposed character instead. Again, this is not necessarily about easier typing, this is about comparing equivalent text equal. ^ permalink raw reply [flat|nested] 40+ messages in thread
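A short Python sketch can check both Eli's "base character" rule and Artur's observation that the quotes and ⇒ carry no decomposition. The sketch assumes that unicodedata.normalize("NFD", ...) is a fair stand-in for applying Emacs's 'decomposition' property recursively. The helper name is hypothetical.

```python
import unicodedata

def base_character(ch):
    """Return the first character of CH's full canonical decomposition
    (its "base character"), or CH itself when it does not decompose."""
    return unicodedata.normalize("NFD", ch)[0]

for ch in ["\u00e1", "\u00e7", "\u01fb", "\u2019", "\u21d2"]:
    print(ch, repr(unicodedata.decomposition(ch)), base_character(ch))
```

unicodedata.decomposition shows only one level (ǻ maps to å plus a combining acute), which is why the sketch uses full NFD. The curly quote and the arrow have empty decompositions, so they map to themselves. Nothing in the Unicode data relates them to ASCII, which is why Eli says such mappings have to be added for Emacs.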
* RE: Single quotes in Info 2015-01-29 3:44 ` Eli Zaretskii @ 2015-01-29 6:01 ` Drew Adams 2015-01-29 16:03 ` Eli Zaretskii 0 siblings, 1 reply; 40+ messages in thread From: Drew Adams @ 2015-01-29 6:01 UTC (permalink / raw) To: Eli Zaretskii, bruce.connor.am; +Cc: emacs-devel > I'm not at all sure we should compare a and á equal. It's an > additional feature anyway. I get the impression that you are talking only about a built-in (more or less hard-coded, predefined) set of equivalence classes of chars, whatever that set might be defined as. Is that right, or would users be able to define the equivalence classes you are thinking of? If they would not then a separate but desirable (IMO) feature would be for users to be able to easily define their own such equivalence classes. It could be OK if this feature did not have the same efficiency as the built-in classes. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Single quotes in Info 2015-01-29 6:01 ` Drew Adams @ 2015-01-29 16:03 ` Eli Zaretskii 2015-01-29 16:24 ` Drew Adams 0 siblings, 1 reply; 40+ messages in thread From: Eli Zaretskii @ 2015-01-29 16:03 UTC (permalink / raw) To: Drew Adams; +Cc: bruce.connor.am, emacs-devel > Date: Wed, 28 Jan 2015 22:01:09 -0800 (PST) > From: Drew Adams <drew.adams@oracle.com> > Cc: emacs-devel@gnu.org > > > I'm not at all sure we should compare a and á equal. It's an > > additional feature anyway. > > I get the impression that you are talking only about a built-in > (more or less hard-coded, predefined) set of equivalence classes > of chars, whatever that set might be defined as. We certainly should have predefined equivalence support based on the Unicode Standard's recommendations. That is the state of the art these days, and any respectable text editor should include such support. > Is that right, or would users be able to define the equivalence > classes you are thinking of? We should first provide users with a set of sensible optional behaviors that they are likely to expect in various situations. Each option will invoke a certain predefined behavior, such as whether or not equivalence classes are at all considered, whether or not a and á compare equal, etc. There are important use cases for each one of those, exactly like there are important use cases for both case-sensitive and case-insensitive search. Once we have that in place, we can add user-defined additions. I expect them to be relatively minor and mostly mode-specific, such as the special treatment of quotes and other special characters in Info buffers. Why minor? Because Unicode has already thought out and defined almost any imaginable feature in this regard, so chances that some user might need something in addition are small. Mode-specific additions could be just alists that map characters or strings to their equivalents. Since I don't expect those to become large, there's no need for anything fancier, IMO. 
> If they would not then a separate but desirable (IMO) feature > would be for users to be able to easily define their own such > equivalence classes. I wouldn't call them equivalence classes. Users are not expected to be experts in Unicode features, its various data tables, and their implementation in Emacs. We should instead provide easy-to-customize option variables to select out of an array of predefined features based on Unicode tables we already have. User additions should be simple data structures that don't require any special expertise. ^ permalink raw reply [flat|nested] 40+ messages in thread
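Eli suggests layering small mode-specific alists on top of the Unicode data. One way to picture this is the following Python sketch. INFO_MODE_EXTRAS and the helper names are hypothetical. Because both the search string and the text are folded, a pasted curly quote finds a straight one and vice versa.

```python
import unicodedata

# Hypothetical mode-specific additions for Info buffers: characters the
# Unicode data does not decompose, mapped to a plain equivalent.
INFO_MODE_EXTRAS = {
    "\u2018": "'", "\u2019": "'",   # left/right single quotation marks
    "\u201c": '"', "\u201d": '"',   # left/right double quotation marks
}

def fold(text, extras=None):
    """Canonical decomposition and case folding, then the extra table."""
    text = unicodedata.normalize("NFD", unicodedata.normalize("NFD", text).casefold())
    if extras:
        text = "".join(extras.get(ch, ch) for ch in text)
    return text

def occurs(needle, haystack, extras=None):
    """Fold both sides, then do a plain substring test.

    Caveat: on decomposed text a bare base letter also matches the start
    of an accented one (a inside a + U+0301); a real implementation would
    have to respect combining-character boundaries."""
    return fold(needle, extras) in fold(haystack, extras)

print(occurs("'t'", "the value \u2018t\u2019", INFO_MODE_EXTRAS))  # True
print(occurs("\u2018t\u2019", "say 't'", INFO_MODE_EXTRAS))        # True
print(occurs("'t'", "the value \u2018t\u2019"))                    # False
```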
* RE: Single quotes in Info 2015-01-29 16:03 ` Eli Zaretskii @ 2015-01-29 16:24 ` Drew Adams 2015-01-29 16:57 ` Eli Zaretskii 0 siblings, 1 reply; 40+ messages in thread From: Drew Adams @ 2015-01-29 16:24 UTC (permalink / raw) To: Eli Zaretskii; +Cc: bruce.connor.am, emacs-devel > > > I'm not at all sure we should compare a and á equal. It's an > > > additional feature anyway. > > > > I get the impression that you are talking only about a built-in > > (more or less hard-coded, predefined) set of equivalence classes > > of chars, whatever that set might be defined as. > > We certainly should have predefined equivalence support based on the > Unicode Standard's recommendations. That is the state of the art > these days, and any respectable text editor should include such > support. > > > Is that right, or would users be able to define the equivalence > > classes you are thinking of? > > We should first provide users with a set of sensible optional > behaviors that they are likely to expect in various situations. Each > option will invoke a certain predefined behavior, such as whether or > not equivalence classes are at all considered, whether or not a and á > compare equal, etc. There are important use cases for each one of > those, exactly like there are important use cases for both case-sensitive > and case-insensitive search. > > Once we have that in place, we can add user-defined additions. I > expect them to be relatively minor and mostly mode-specific, such as > the special treatment of quotes and other special characters in Info > buffers. Why minor? Because Unicode has already thought out and defined > almost any imaginable feature in this regard, so chances that some > user might need something in addition are small. > > Mode-specific additions could be just alists that map characters or > strings to their equivalents. Since I don't expect those to become > large, there's no need for anything fancier, IMO. Glad to see all of those specific replies. 
It all sounds good to me, including the proposed development priorities. > > If they would not then a separate but desirable (IMO) feature > > would be for users to be able to easily define their own such > > equivalence classes. > > I wouldn't call them equivalence classes. Users are not expected to > be experts in Unicode features, its various data tables, and their > implementation in Emacs. We should instead provide easy-to-customize > option variables to select out of an array of predefined features > based on Unicode tables we already have. User additions should be > simple data structures that don't require any special expertise. I don't care what you call them. In the interest of brevity I also did not explicitly mention the possibility of associating multiple-char sequences with other such sequences or with single chars (e.g., associating "=>" with ⇒ or "ss" with ß, though those two would presumably be predefined). To me, each set of such associations constitutes an equivalence class, but I don't care what nomenclature is used to describe it, as long as it is clear. My point was for users to eventually be able to specify their own such associations, in addition to those (e.g. Unicode) that would be predefined. And it would be good to be able to use these not only for search but also for easy replacement (in either direction of such an equivalence), etc. E.g., have easy access to such pairs via `M-%': be able to input one member of such a class (char or char sequence) and then pick from its defined equivalences for the replacement. ^ permalink raw reply [flat|nested] 40+ messages in thread
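Drew describes multi-character associations ("=>" with ⇒, "ss" with ß) and replacement in either direction. A Python sketch of this turns a class of interchangeable strings into a regexp alternation. The class list and function names are hypothetical and are not an existing Emacs API.

```python
import re

# Hypothetical user-defined classes: strings in the same set are
# interchangeable, whether single characters or sequences.
USER_CLASSES = [
    {"=>", "\u21d2"},   # => and the rightwards double arrow
    {"ss", "\u00df"},   # ss and sharp s
]

def equivalents_regexp(s, classes):
    """Return a regexp matching S or any string in the same class as S."""
    for members in classes:
        if s in members:
            # Longer members first, so a sequence wins over a prefix of it.
            ordered = sorted(members, key=lambda m: (-len(m), m))
            return "(?:" + "|".join(re.escape(m) for m in ordered) + ")"
    return re.escape(s)

def replace_equivalents(text, source, target, classes):
    """Replace SOURCE and every string equivalent to it with TARGET."""
    # A function replacement keeps TARGET literal (no backslash escapes).
    return re.sub(equivalents_regexp(source, classes), lambda _m: target, text)

print(replace_equivalents("a => b \u21d2 c", "=>", "\u21d2", USER_CLASSES))
print(replace_equivalents("Stra\u00dfe, Strasse", "\u00df", "ss", USER_CLASSES))
```

A bare "ss" member also matches inside words such as "class". That is one reason a class like this is something a user would want to enable only when needed.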
* Re: Single quotes in Info 2015-01-29 16:24 ` Drew Adams @ 2015-01-29 16:57 ` Eli Zaretskii 0 siblings, 0 replies; 40+ messages in thread From: Eli Zaretskii @ 2015-01-29 16:57 UTC (permalink / raw) To: Drew Adams; +Cc: bruce.connor.am, emacs-devel > Date: Thu, 29 Jan 2015 08:24:02 -0800 (PST) > From: Drew Adams <drew.adams@oracle.com> > Cc: bruce.connor.am@gmail.com, emacs-devel@gnu.org > > To me, each set of such associations constitutes an equivalence class, > but I don't care what nomenclature is used to describe it, as long > as it is clear. My point was that I don't think it would be wise to ask users to mess with Unicode tables to customize this. We should instead provide a way to add simple data structures that add to the predefined equivalence classes. > And it would be good to be able to use these not only for search but > also for easy replacement (in either direction of such an equivalence), > etc. I agree. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Unicode in emacs (was Single quotes in Info) [not found] ` <mailman.18484.1422057224.1147.help-gnu-emacs@gnu.org> @ 2015-01-26 3:26 ` Rusi 0 siblings, 0 replies; 40+ messages in thread From: Rusi @ 2015-01-26 3:26 UTC (permalink / raw) To: help-gnu-emacs On Saturday, January 24, 2015 at 5:23:46 AM UTC+5:30, Drew Adams wrote: > > I'm not sure about it, but it seems that after upgrading from 24.3 to > > 25.0.50.1, the Info buffer is a bit uglified. First, it uses some face > > I don't like for variable and function names - but if this annoys me too > > much, I can change it easily. Worse, instead of e.g. `t' it now says > > ‘t’, for instance (i.e., it uses Unicode single quotation marks). > > > > This is extremely annoying, since it makes incremental searching for > > single-quoted strings much harder. > > > > I apropos'ed the "Info-" variables and grepped the list for "quot", > > "unicode" and "single", all to no avail, and ran out of ideas. Is this > > behavior customizable? How to get back to ASCII quotes? > > Oh boy, you'll have fun reading about this in the bug threads: > > #16292 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16292 > info docs now contain single straight quotes instead of `' > > #13131 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13131 > Allow curly quotes to be found by searching for straight quotes? > > #16439 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16439 > Highlighting of strings within Info buffers > > #13228 - http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13228 > Request for highlighting back-quote/quote pair notation > > Enjoy! > > (Info+ can at least help by highlighting quoted names etc. > http://www.emacswiki.org/emacs/InfoPlus) Just some (very laymanish) thoughts about unicode. Uni-code has two aspects: 1. Uni-fying the tower of babel that is human languages 2. Uni-versality of a common core Historically, the 1st is the reason unicode caught on at all [The world is a bit larger than the two sides of the Atlantic!] 
However, the 2nd probably holds more hope for reducing babel-ish bedlam. Some of the more universal sides of unicode: 1. ASCII (for historical reasons alone) 2. Math 3. Typography (which this thread is about) [Note: this will not hold up technically. I am talking more sociologically, i.e., "2+3" is more likely to universalize than "Add two and three".] This is expanded further in this post: http://blog.languager.org/2015/01/unicode-and-universe.html Also a plea for programming languages to start getting more unicoded [Not to be taken too seriously - just a possible direction] http://blog.languager.org/2014/04/unicoded-python.html ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: Single quotes in Info [not found] ` <<83mw51msnz.fsf@gnu.org> @ 2015-01-29 17:05 ` Drew Adams 2015-01-29 17:24 ` Eli Zaretskii 0 siblings, 1 reply; 40+ messages in thread From: Drew Adams @ 2015-01-29 17:05 UTC (permalink / raw) To: Eli Zaretskii, Drew Adams; +Cc: bruce.connor.am, emacs-devel > > To me, each set of such associations constitutes an equivalence class, > > but I don't care what nomenclature is used to describe it, as long > > as it is clear. > > My point was that I don't think it would be wise to ask users to mess > with Unicode tables to customize this. I agree with that (without a lot of understanding of the implications). Users should have a simple way to define such a class of equivalences (choose your own term). Something as simple as an alist, perhaps. > We should instead provide a way to add simple data structures that > add to the predefined equivalence classes. Not sure what you mean, but if you mean that users would only be able to add their own associations (equivalences) to existing classes then that is not what I would like to see as the only possibility. I would like to see the ability for users to define classes, and to "activate" (enable the use of; turn on) or "deactivate" (turn off) a particular class of equivalences as a whole, including any of the predefined classes. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: Single quotes in Info 2015-01-29 17:05 ` Single quotes in Info Drew Adams @ 2015-01-29 17:24 ` Eli Zaretskii 2015-01-29 18:34 ` Drew Adams 0 siblings, 1 reply; 40+ messages in thread From: Eli Zaretskii @ 2015-01-29 17:24 UTC (permalink / raw) To: Drew Adams; +Cc: bruce.connor.am, emacs-devel > Date: Thu, 29 Jan 2015 09:05:38 -0800 (PST) > From: Drew Adams <drew.adams@oracle.com> > Cc: bruce.connor.am@gmail.com, emacs-devel@gnu.org > > I would like to see the ability for users to define classes, and to > "activate" (enable the use of; turn on) or "deactivate" (turn off) a > particular class of equivalences as a whole, including any of the > predefined classes. This would require modifying the Unicode tables. They are just large char-tables, so someone who knows what they are doing should be able to do that. But that's not for the faint of heart, and I don't see why users would like to disable or replace portions of those tables. I do understand why in some use cases certain equivalence classes are inappropriate, but they are inappropriate _as_a_whole_. Doing that for a part of a class doesn't make sense to me. E.g., why would you want to make 2 and ② equivalent, but not 2 and ²? So this kind of customization doesn't have to be easy, IMO, and it's okay to ask such users to know what they are doing. ^ permalink raw reply [flat|nested] 40+ messages in thread
* RE: Single quotes in Info 2015-01-29 17:24 ` Eli Zaretskii @ 2015-01-29 18:34 ` Drew Adams 2015-01-29 18:54 ` Eli Zaretskii 0 siblings, 1 reply; 40+ messages in thread From: Drew Adams @ 2015-01-29 18:34 UTC (permalink / raw) To: Eli Zaretskii; +Cc: bruce.connor.am, emacs-devel > > I would like to see the ability for users to define classes, and to > > "activate" (enable the use of; turn on) or "deactivate" (turn off) a > > particular class of equivalences as a whole, including any of the > > predefined classes. > > This would require modifying the Unicode tables. They are just large > char-tables, so someone who knows what they are doing should be able > to do that. The point is to let ordinary users define such classes, and use them selectively. > But that's not for the faint of heart Then fiddling at that level is not the (only) answer. If changes at that level are ultimately required, then perhaps a user-friendly layer can be added above such low-level changes. > and I don't see why users would > like to disable or replace portions of those tables. That's putting it wrong, putting it already in terms of implementation. Ordinary users would certainly not *want* to "disable or replace portions of those tables". That is, they would not want to, and should not need to, think in terms of such tables. Whether such tables get changed under the covers when they want to define a new class of chars should not be something they need to concern themselves with (I hope). What (some) ordinary users are liable to want to be able to do is define a class of chars that they can use in place of each other etc., and to choose among such classes, via Lisp or interactively, enabling/disabling the equivalences they define. > I do understand why in some use cases certain equivalence classes > are inappropriate, but they are inappropriate _as_a_whole_. Doing > that for a part of a class doesn't make sense to me. 
I did not say anything about enabling some of the equivalences of a class but not others. What I suggested was being able to specify a set of associations as a new, user-level equivalence class, and then being able to enable/disable that class as a whole. Whether the members of that class also belong to a larger, predefined class is not relevant here. > E.g., why would you want to make 2 and ② equivalent, but not 2 and ²? Why not? Why not be able to define your own class that includes 2 = ②, 3 = ③, etc., but not 2 = ² etc.? What you want to consider equivalent can depend on your particular context/needs. The fact that there are natural, predefined Unicode equivalences in general does not mean that only those equivalences make sense for a given user in a given context. > So this kind of customization doesn't have to be easy, IMO, and > it's okay to ask such users to know what they are doing. I disagree. But I'm talking user-level and wishlist. I have nothing to say about the difficulty of providing what I am suggesting. I am hoping that it *will* be easy for a user to both (a) define an equivalence class (set of associations) of chars and (b) enable or disable the use of that class. For search and for other purposes. ^ permalink raw reply [flat|nested] 40+ messages in thread
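Drew's example (2 and ② equivalent, but not 2 and ²) corresponds to the tags in Unicode compatibility decompositions: ② is tagged <circle> and ² is tagged <super>. Here is a hypothetical Python sketch of named classes that are switched on or off as a whole. It is not an Emacs API, and it assumes Python's unicodedata as the source of the Unicode data.

```python
import sys
import unicodedata

def compat_class(tag):
    """Build a table from the Unicode data: each character whose
    compatibility decomposition carries TAG maps to its NFKD form."""
    table = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.decomposition(ch).startswith(tag + " "):
            table[ch] = unicodedata.normalize("NFKD", ch)
    return table

# Hypothetical registry of named classes; each is enabled as a whole.
CLASSES = {
    "circled": compat_class("<circle>"),      # circled 2 -> 2
    "superscript": compat_class("<super>"),   # superscript 2 -> 2
}

def fold(text, enabled):
    """Map characters through every enabled class."""
    for name in enabled:
        table = CLASSES[name]
        text = "".join(table.get(ch, ch) for ch in text)
    return text

def equivalent(a, b, enabled):
    """True if A and B fold to the same string under the ENABLED classes."""
    return fold(a, enabled) == fold(b, enabled)

print(equivalent("2", "\u2461", ["circled"]))                 # True
print(equivalent("2", "\u00b2", ["circled"]))                 # False
print(equivalent("2", "\u00b2", ["circled", "superscript"]))  # True
```

A user-defined class would simply be another dict registered in CLASSES, for example one that maps only ① through ⑨ to the digits.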
* Re: Single quotes in Info
From: Eli Zaretskii @ 2015-01-29 18:54 UTC
To: Drew Adams; +Cc: bruce.connor.am, emacs-devel

> Date: Thu, 29 Jan 2015 10:34:58 -0800 (PST)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: bruce.connor.am@gmail.com, emacs-devel@gnu.org
>
> > > I would like to see the ability for users to define classes, and to
> > > "activate" (enable the use of; turn on) or "deactivate" (turn off) a
> > > particular class of equivalences as a whole, including any of the
> > > predefined classes.
> >
> > This would require modifying the Unicode tables. They are just large
> > char-tables, so someone who knows what they are doing should be able
> > to do that.
>
> The point is to let ordinary users define such classes, and use them
> selectively.

They should be able to. But I was talking about _un_defining existing
classes.

> > and I don't see why users would
> > like to disable or replace portions of those tables.
>
> That's putting it wrong, putting it already in terms of implementation.

No, it's not. I just used these words, that's all. The intent was to
say that disabling portions of a certain class makes no sense.

> Ordinary users would certainly not *want* to "disable or replace portions
> of those tables". That is, they would not want to, and should not need
> to, think in terms of such tables.

Red herring. I was using these words to make the issue clear.

> What (some) ordinary users are liable to want to be able to do is define
> a class of chars that they can use in place of each other etc., and to
> choose among such classes, via Lisp or interactively, enabling/disabling
> the equivalences they define.

Replacing existing classes would need modifications of the Unicode
tables. Again, not easy, and should be.

> > E.g., why would you want to make 2 and ② equivalent, but not 2 and ²?
>
> Why not? Why not be able to define your own class that includes
> 2 = ②, 3 = ③, etc., but not 2 = ² etc.?

Because it makes no sense. This isn't some game we are playing here;
these equivalences have deep meaning in some contexts. If they don't,
they should not be used as a whole.

> > So this kind of customization doesn't have to be easy, IMO, and
> > it's okay to ask such users to know what they are doing.
>
> I disagree.

Then we will have to agree to disagree. However, this is all highly
theoretical, since the real decision will be made by whoever develops
this.
* RE: Single quotes in Info
From: Drew Adams @ 2015-01-29 19:35 UTC
To: Eli Zaretskii; +Cc: bruce.connor.am, emacs-devel

> Replacing existing classes would need modifications of the Unicode
> tables. Again, not easy, and should be.

I didn't say anything about replacing existing classes.

> > > E.g., why would you want to make 2 and ② equivalent, but not 2 and ²?
> >
> > Why not? Why not be able to define your own class that includes
> > 2 = ②, 3 = ③, etc., but not 2 = ² etc.?
>
> Because it makes no sense. This isn't some game we are playing here;
> these equivalences have deep meaning in some contexts. If they don't,
> they should not be used as a whole.

I give up. To me, it should be possible to allow user & use-case
choices - arbitrary equivalence classes, not just
only-predefined-correspondences-can-possibly-make-sense. User-defined
does not imply silly game-playing or any necessary lack of "deep
meaning".