* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
From: Aaron Ecay @ 2012-01-13 8:40 UTC
To: 10494

This bug relates to setting a non-ASCII punctuation character
(U+2019, which is ’) to have word syntax, and using word-motion
commands. Here’s a recipe from emacs -Q:

M-x text-mode
don't
C-a M-f
-> (as expected, the cursor moves to the end of the line)
RET RET
don M-x ucs-insert 2019 t
-> (text in buffer: "don’t")
C-a M-f
-> (cursor is on the quotation mark, as expected)
M-: (modify-syntax-entry ?’ "w" text-mode-syntax-table)
C-a M-f
-> (BUG: cursor is on quotation mark, which should count as part of the word)

If you re-run the experiment substituting - for ’ everywhere, there is a
difference in behavior – the cursor moves to the end of the line after
the call to modify-syntax-entry, as expected. This leads me to think
that the problem has to do with ’ being outside the ASCII charset.

This is with a recent-ish bzr trunk build, btw. The most recent commit
is:

revno: 106824 [merge]
committer: Chong Yidong <cyd@gnu.org>
branch nick: trunk
timestamp: Mon 2012-01-09 13:48:13 +0800
message:
  Merge changes from emacs-23 branch

In GNU Emacs 24.0.92.1 (i386-apple-darwin10.8.0, NS apple-appkit-1038.36)
 of 2012-01-09 on awe
Windowing system distributor `Apple', version 10.3.1038
configured using `configure '--with-ns' '--without-gnutls''

-- 
Aaron Ecay
* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
From: Andreas Schwab @ 2012-01-13 10:45 UTC
To: Aaron Ecay; +Cc: 10494

Aaron Ecay <aaronecay@gmail.com> writes:

> M-x text-mode
> don't
> C-a M-f
> -> (as expected, the cursor moves to the end of the line)
> RET RET
> don M-x ucs-insert 2019 t
> -> (text in buffer: "don’t")
> C-a M-f
> -> (cursor is on the quotation mark, as expected)
> M-: (modify-syntax-entry ?’ "w" text-mode-syntax-table)
> C-a M-f
> -> (BUG: cursor is on quotation mark, which should count as part of the word)

?’ isn't of the same script as the surrounding characters, so there are
word boundaries before and after it. See also
word-combining-categories.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
From: Aaron Ecay @ 2012-01-13 17:04 UTC
To: Andreas Schwab; +Cc: 10494

On Fri, 13 Jan 2012 11:45:21 +0100, Andreas Schwab <schwab@linux-m68k.org> wrote:
> ?’ isn't of the same script as the surrounding characters, so there are
> word boundaries before and after it. See also
> word-combining-categories.

What does that mean? I assume it has something to do with the
“category:” line in the output of describe-char. For ?’ this gives:
“.:Base, c:Chinese, h:Korean, j:Japanese”; for ?' it is “.:Base,
a:ASCII, l:Latin, r:Roman”. So, what does ?’ have to do with CJK
scripts?

More specifically, I would like to use ?’ as an apostrophe in writing
text, so I’d like for word-motion commands to treat it as part of a
word, just as they do ?'. How might this be accomplished?

Thanks,

-- 
Aaron Ecay
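The “category:” line that describe-char prints can also be read from
Lisp, which may make the comparison above easier to repeat. The sketch
below is only an illustration: it relies on the built-in functions
char-category-set and category-set-mnemonics, and the helper name
my-char-categories is made up for this example.

```elisp
;; Hypothetical helper (not part of Emacs): return the category
;; mnemonics of CHAR as a string.  A fresh temporary buffer is used so
;; that the lookup goes through the category table new buffers start with.
(defun my-char-categories (char)
  "Return the category mnemonics of CHAR as a string."
  (with-temp-buffer
    (category-set-mnemonics (char-category-set char))))

;; Compare the ASCII apostrophe with U+2019.
(list (my-char-categories ?')
      (my-char-categories #x2019))
```

Each letter in the returned strings is one category mnemonic, the same
letters that appear before the colons in the describe-char output.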
* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
From: npostavs @ 2016-08-11 0:29 UTC
To: Aaron Ecay; +Cc: 10494

tags 10494 confirmed
found 10494 25.1
quit

I confirm this is still the case in 25.1-rc1.

Aaron Ecay <aaronecay@gmail.com> writes:
>
> This bug relates to setting a non-ASCII punctuation character
> (U+2019, which is ’) to have word syntax, and using word-motion
> commands. Here’s a recipe from emacs -Q:
>
> M-x text-mode
> don't
> C-a M-f
> -> (as expected, the cursor moves to the end of the line)
> RET RET
> don M-x ucs-insert 2019 t

This should now use insert-char (C-x 8 RET) instead of ucs-insert.

> -> (text in buffer: "don’t")
> C-a M-f
> -> (cursor is on the quotation mark, as expected)
> M-: (modify-syntax-entry ?’ "w" text-mode-syntax-table)
> C-a M-f
> -> (BUG: cursor is on quotation mark, which should count as part of the word)
>
> If you re-run the experiment substituting - for ’ everywhere, there is a
> difference in behavior – the cursor moves to the end of the line after
> the call to modify-syntax-entry, as expected. This leads me to think
> that the problem has to do with ’ being outside the ASCII charset.
* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
From: Eli Zaretskii @ 2016-08-11 15:24 UTC
To: npostavs; +Cc: aaronecay, 10494

> From: npostavs@users.sourceforge.net
> Date: Wed, 10 Aug 2016 20:29:05 -0400
> Cc: 10494@debbugs.gnu.org
>
> I confirm this is still the case in 25.1-rc1.
>
> Aaron Ecay <aaronecay@gmail.com> writes:
> >
> > This bug relates to setting a non-ASCII punctuation character
> > (U+2019, which is ’) to have word syntax, and using word-motion
> > commands. Here’s a recipe from emacs -Q:
> >
> > M-x text-mode
> > don't
> > C-a M-f
> > -> (as expected, the cursor moves to the end of the line)
> > RET RET
> > don M-x ucs-insert 2019 t
>
> This should now use insert-char (C-x 8 RET) instead of ucs-insert.
>
> > -> (text in buffer: "don’t")
> > C-a M-f
> > -> (cursor is on the quotation mark, as expected)
> > M-: (modify-syntax-entry ?’ "w" text-mode-syntax-table)
> > C-a M-f
> > -> (BUG: cursor is on quotation mark, which should count as part of the word)
> >
> > If you re-run the experiment substituting - for ’ everywhere, there is a
> > difference in behavior – the cursor moves to the end of the line after
> > the call to modify-syntax-entry, as expected. This leads me to think
> > that the problem has to do with ’ being outside the ASCII charset.

Indeed. This is a feature: we don't let word-movement commands cross
into a different script. IOW, if (aref char-script-table C1) and
(aref char-script-table C2) return different values, then we decide
that there's a word boundary between C1 and C2. See the function
word_boundary_p, which is called from scan_words.

Maybe we should document this somewhere, like the ELisp manual.
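The comparison described in the message above can be tried directly
from Lisp. What follows is a simplified sketch, not the actual
implementation: the real test is the C function word_boundary_p, and
the thread also points at word-combining-categories, which this sketch
ignores. The name my-script-boundary-p is made up for illustration.

```elisp
;; Simplified illustration only.  `my-script-boundary-p' is a
;; hypothetical name; Emacs itself performs this test in C and also
;; consults character categories, which are not modelled here.
(defun my-script-boundary-p (c1 c2)
  "Return non-nil if C1 and C2 have different `char-script-table' entries."
  (not (eq (aref char-script-table c1)
           (aref char-script-table c2))))

;; "n" followed by U+2019, as in "don’t".
(my-script-boundary-p ?n #x2019)

;; Two ASCII letters, for comparison.
(my-script-boundary-p ?d ?o)
```

Evaluating (aref char-script-table CHAR) on its own shows which script
symbol a given character is assigned.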
* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
From: npostavs @ 2016-08-12 22:37 UTC
To: Eli Zaretskii; +Cc: 10494, aaronecay

[-- Attachment #1: Type: text/plain, Size: 373 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

> Indeed. This is a feature:

Ah, so doing

    (modify-syntax-entry ?’ "w" text-mode-syntax-table)
    (aset char-script-table ?’ 'latin)

does let word motion skip over ’ as OP wanted.

>
> Maybe we should document this somewhere, like the ELisp manual.

`(elisp) Word Motion' looks like a good place for it:

[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 2177 bytes --]

From 03dbee2bf6bae29b21ea36ff3d73bce773458f78 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Fri, 12 Aug 2016 18:33:17 -0400
Subject: [PATCH v1] Document char-script-table's effect on word motion

* doc/lispref/positions.texi (Word Motion): Talk about
char-script-table (Bug #10494).
---
 doc/lispref/positions.texi | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/doc/lispref/positions.texi b/doc/lispref/positions.texi
index 1d748b8..3359ced 100644
--- a/doc/lispref/positions.texi
+++ b/doc/lispref/positions.texi
@@ -192,8 +192,8 @@ Word Motion
 @subsection Motion by Words
 
 The functions for parsing words described below use the syntax table
-to decide whether a given character is part of a word. @xref{Syntax
-Tables}.
+and @code{char-script-table} to decide whether a given character is
+part of a word. @xref{Syntax Tables} and @xref{Character Properties}.
 
 @deffn Command forward-word &optional count
 This function moves point forward @var{count} words (or backward if
@@ -207,11 +207,13 @@ Word Motion
 that begin and end words, known as @dfn{word boundaries}, are defined
 by the current buffer's syntax table (@pxref{Syntax Class Table}), but
 modes can override that by setting up a suitable
-@code{find-word-boundary-function-table}, described below. In any
-case, this function cannot move point past the boundary of the
-accessible portion of the buffer, or across a field boundary
-(@pxref{Fields}). The most common case of a field boundary is the end
-of the prompt in the minibuffer.
+@code{find-word-boundary-function-table}, described below. Characters
+that belong to a different script (as defined by
+@code{char-syntax-table}), also mark a word boundary (@pxref{Character
+Properties}). In any case, this function cannot move point past the
+boundary of the accessible portion of the buffer, or across a field
+boundary (@pxref{Fields}). The most common case of a field boundary
+is the end of the prompt in the minibuffer.
 
 If it is possible to move @var{count} words, without being stopped
 prematurely by the buffer boundary or a field boundary, the value is
-- 
2.9.2
* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
From: Eli Zaretskii @ 2016-08-13 6:56 UTC
To: npostavs; +Cc: 10494, aaronecay

> From: npostavs@users.sourceforge.net
> Cc: 10494@debbugs.gnu.org, aaronecay@gmail.com
> Date: Fri, 12 Aug 2016 18:37:56 -0400
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > Indeed. This is a feature:
>
> Ah, so doing
>
>     (modify-syntax-entry ?’ "w" text-mode-syntax-table)
>     (aset char-script-table ?’ 'latin)
>
> does let word motion skip over ’ as OP wanted.

Yes. But I don't recommend such a "solution", because that would most
probably bite elsewhere, when we do want that character to behave as a
symbol.

> `(elisp) Word Motion' looks like a good place for it:

Right, thanks.

> The functions for parsing words described below use the syntax table
> -to decide whether a given character is part of a word. @xref{Syntax
> -Tables}.
> +and @code{char-script-table} to decide whether a given character is
> +part of a word. @xref{Syntax Tables} and @xref{Character Properties}.

@xref generates a capitalized "See", so is inappropriate in the middle
of a sentence. Please use "see @ref" instead.

> +@code{find-word-boundary-function-table}, described below. Characters
> +that belong to a different script (as defined by

I'd say "belong to different scripts", otherwise the text begs the
question "different from what?".

> +@code{char-syntax-table}), also mark a word boundary (@pxref{Character

"define a word boundary" sounds better to me.

Otherwise, LGTM, thanks. Please push to emacs-25.
* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
From: npostavs @ 2016-08-13 13:21 UTC
To: Eli Zaretskii; +Cc: 10494, aaronecay

[-- Attachment #1: Type: text/plain, Size: 1801 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

>> From: npostavs@users.sourceforge.net
>> Cc: 10494@debbugs.gnu.org, aaronecay@gmail.com
>> Date: Fri, 12 Aug 2016 18:37:56 -0400
>>
>> Eli Zaretskii <eliz@gnu.org> writes:
>>
>> > Indeed. This is a feature:
>>
>> Ah, so doing
>>
>>     (modify-syntax-entry ?’ "w" text-mode-syntax-table)
>>     (aset char-script-table ?’ 'latin)
>>
>> does let word motion skip over ’ as OP wanted.
>
> Yes. But I don't recommend such a "solution", because that would most
> probably bite elsewhere, when we do want that character to behave as a
> symbol.

Sure, but it could be made local to text-mode:

    (modify-syntax-entry ?’ "w" text-mode-syntax-table)
    (defconst my-text-char-script-table
      (let ((table (copy-sequence char-script-table)))
        (aset table ?’ 'latin)
        table))

    (defun my-text-mode-hook ()
      (set (make-local-variable 'char-script-table)
           my-text-char-script-table))
    (add-hook 'text-mode-hook 'my-text-mode-hook)

>
>> `(elisp) Word Motion' looks like a good place for it:
>
> Right, thanks.
>
>> The functions for parsing words described below use the syntax table
>> -to decide whether a given character is part of a word. @xref{Syntax
>> -Tables}.
>> +and @code{char-script-table} to decide whether a given character is
>> +part of a word. @xref{Syntax Tables} and @xref{Character Properties}.
>
> @xref generates a capitalized "See", so is inappropriate in the middle
> of a sentence. Please use "see @ref" instead.

Uff, I find these multiple variants of reference very confusing, I also
got a complaint from makeinfo that I was missing punctuation after the
first @xref. Does it look okay now? (I made the other wording fixes
too)

[-- Attachment #2: patch v2 --]
[-- Type: text/plain, Size: 2185 bytes --]

From e18a6dc7be2aa245767d00ac69a0e13605fc4440 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Fri, 12 Aug 2016 18:33:17 -0400
Subject: [PATCH v2] Document char-script-table's effect on word motion

* doc/lispref/positions.texi (Word Motion): Talk about
char-script-table (Bug #10494).
---
 doc/lispref/positions.texi | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/doc/lispref/positions.texi b/doc/lispref/positions.texi
index 1d748b8..b6133dc 100644
--- a/doc/lispref/positions.texi
+++ b/doc/lispref/positions.texi
@@ -192,8 +192,9 @@ Word Motion
 @subsection Motion by Words
 
 The functions for parsing words described below use the syntax table
-to decide whether a given character is part of a word. @xref{Syntax
-Tables}.
+and @code{char-script-table} to decide whether a given character is
+part of a word. @xref{Syntax Tables}, and see @ref{Character
+Properties}.
 
 @deffn Command forward-word &optional count
 This function moves point forward @var{count} words (or backward if
@@ -207,11 +208,13 @@ Word Motion
 that begin and end words, known as @dfn{word boundaries}, are defined
 by the current buffer's syntax table (@pxref{Syntax Class Table}), but
 modes can override that by setting up a suitable
-@code{find-word-boundary-function-table}, described below. In any
-case, this function cannot move point past the boundary of the
-accessible portion of the buffer, or across a field boundary
-(@pxref{Fields}). The most common case of a field boundary is the end
-of the prompt in the minibuffer.
+@code{find-word-boundary-function-table}, described below. Characters
+that belong to different scripts (as defined by
+@code{char-syntax-table}), also define a word boundary
+(@pxref{Character Properties}). In any case, this function cannot
+move point past the boundary of the accessible portion of the buffer,
+or across a field boundary (@pxref{Fields}). The most common case of
+a field boundary is the end of the prompt in the minibuffer.
 
 If it is possible to move @var{count} words, without being stopped
 prematurely by the buffer boundary or a field boundary, the value is
-- 
2.9.2
* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
From: Eli Zaretskii @ 2016-08-13 13:33 UTC
To: npostavs; +Cc: 10494, aaronecay

> From: npostavs@users.sourceforge.net
> Cc: 10494@debbugs.gnu.org, aaronecay@gmail.com
> Date: Sat, 13 Aug 2016 09:21:54 -0400
>
> >>     (modify-syntax-entry ?’ "w" text-mode-syntax-table)
> >>     (aset char-script-table ?’ 'latin)
> >>
> >> does let word motion skip over ’ as OP wanted.
> >
> > Yes. But I don't recommend such a "solution", because that would most
> > probably bite elsewhere, when we do want that character to behave as a
> > symbol.
>
> Sure, but it could be made local to text-mode:
>
>     (modify-syntax-entry ?’ "w" text-mode-syntax-table)
>     (defconst my-text-char-script-table
>       (let ((table (copy-sequence char-script-table)))
>         (aset table ?’ 'latin)
>         table))
>
>     (defun my-text-mode-hook ()
>       (set (make-local-variable 'char-script-table)
>            my-text-char-script-table))
>     (add-hook 'text-mode-hook 'my-text-mode-hook)

Are you sure nothing in text-mode will ever want to use \s_ in any
regexp?

> > @xref generates a capitalized "See", so is inappropriate in the middle
> > of a sentence. Please use "see @ref" instead.
>
> Uff, I find these multiple variants of reference very confusing, I also
> got a complaint from makeinfo that I was missing punctuation after the
> first @xref. Does it look okay now? (I made the other wording fixes
> too)

Yes, looks good, thanks.
* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
From: npostavs @ 2016-08-13 14:19 UTC
To: Eli Zaretskii; +Cc: 10494, aaronecay

Eli Zaretskii <eliz@gnu.org> writes:

>> From: npostavs@users.sourceforge.net
>> Cc: 10494@debbugs.gnu.org, aaronecay@gmail.com
>> Date: Sat, 13 Aug 2016 09:21:54 -0400
>>
>> >>     (modify-syntax-entry ?’ "w" text-mode-syntax-table)
>> >>     (aset char-script-table ?’ 'latin)
>> >>
>> >> does let word motion skip over ’ as OP wanted.
>> >
>> > Yes. But I don't recommend such a "solution", because that would most
>> > probably bite elsewhere, when we do want that character to behave as a
>> > symbol.
>>
>> Sure, but it could be made local to text-mode:
>>
>>     (modify-syntax-entry ?’ "w" text-mode-syntax-table)
>>     (defconst my-text-char-script-table
>>       (let ((table (copy-sequence char-script-table)))
>>         (aset table ?’ 'latin)
>>         table))
>>
>>     (defun my-text-mode-hook ()
>>       (set (make-local-variable 'char-script-table)
>>            my-text-char-script-table))
>>     (add-hook 'text-mode-hook 'my-text-mode-hook)
>
> Are you sure nothing in text-mode will ever want to use \s_ in any
> regexp?

Did you mean \> (word boundary) or \s. (punctuation)? \s_ doesn't match
’ regardless because its syntax class is punctuation, not symbol.

If the user wants ’ to be part of a word, then surely it's correct for
regexps to treat it as such.

>
>> > @xref generates a capitalized "See", so is inappropriate in the middle
>> > of a sentence. Please use "see @ref" instead.
>>
>> Uff, I find these multiple variants of reference very confusing, I also
>> got a complaint from makeinfo that I was missing punctuation after the
>> first @xref. Does it look okay now? (I made the other wording fixes
>> too)
>
> Yes, looks good, thanks.

Pushed as 8342e748
* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
From: Eli Zaretskii @ 2016-08-13 14:31 UTC
To: npostavs; +Cc: 10494, aaronecay

> From: npostavs@users.sourceforge.net
> Cc: 10494@debbugs.gnu.org, aaronecay@gmail.com
> Date: Sat, 13 Aug 2016 10:19:34 -0400
>
> >>     (modify-syntax-entry ?’ "w" text-mode-syntax-table)
> >>     (defconst my-text-char-script-table
> >>       (let ((table (copy-sequence char-script-table)))
> >>         (aset table ?’ 'latin)
> >>         table))
> >>
> >>     (defun my-text-mode-hook ()
> >>       (set (make-local-variable 'char-script-table)
> >>            my-text-char-script-table))
> >>     (add-hook 'text-mode-hook 'my-text-mode-hook)
> >
> > Are you sure nothing in text-mode will ever want to use \s_ in any
> > regexp?
>
> Did you mean \> (word boundary) or \s. (punctuation)? \s_ doesn't match
> ’ regardless because its syntax class is punctuation, not symbol.

Sorry, I guess I was thinking of \cl. It will not match ’, although
it might be expected.

Anyway, my point is that these char-tables should really be treated as
read-only by Lisp applications.

> Pushed as 8342e748

Thanks.
* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
From: Eli Zaretskii @ 2016-08-13 14:55 UTC
To: npostavs; +Cc: 10494, aaronecay

> Date: Sat, 13 Aug 2016 17:31:48 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 10494@debbugs.gnu.org, aaronecay@gmail.com
>
> > From: npostavs@users.sourceforge.net
> > Cc: 10494@debbugs.gnu.org, aaronecay@gmail.com
> > Date: Sat, 13 Aug 2016 10:19:34 -0400
> >
> > >>     (modify-syntax-entry ?’ "w" text-mode-syntax-table)
> > >>     (defconst my-text-char-script-table
> > >>       (let ((table (copy-sequence char-script-table)))
> > >>         (aset table ?’ 'latin)
> > >>         table))
> > >>
> > >>     (defun my-text-mode-hook ()
> > >>       (set (make-local-variable 'char-script-table)
> > >>            my-text-char-script-table))
> > >>     (add-hook 'text-mode-hook 'my-text-mode-hook)
> > >
> > > Are you sure nothing in text-mode will ever want to use \s_ in any
> > > regexp?
> >
> > Did you mean \> (word boundary) or \s. (punctuation)? \s_ doesn't match
> > ’ regardless because its syntax class is punctuation, not symbol.
>
> Sorry, I guess I was thinking of \cl. It will not match ’, although
> it might be expected.
>
> Anyway, my point is that these char-tables should really be treated as
> read-only by Lisp applications.

Btw, some believe that using ’ as an apostrophe is wrong. They say
U+02BC should be used instead; see, for example, this discussion:

  http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0047.html

That character already is word-constituent.
* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
From: npostavs @ 2016-08-13 18:14 UTC
To: Eli Zaretskii; +Cc: 10494, aaronecay

Eli Zaretskii <eliz@gnu.org> writes:

>> From: npostavs@users.sourceforge.net
>> Cc: 10494@debbugs.gnu.org, aaronecay@gmail.com
>> Date: Sat, 13 Aug 2016 10:19:34 -0400
>>
>> >>     (modify-syntax-entry ?’ "w" text-mode-syntax-table)
>> >>     (defconst my-text-char-script-table
>> >>       (let ((table (copy-sequence char-script-table)))
>> >>         (aset table ?’ 'latin)
>> >>         table))
>> >>
>> >>     (defun my-text-mode-hook ()
>> >>       (set (make-local-variable 'char-script-table)
>> >>            my-text-char-script-table))
>> >>     (add-hook 'text-mode-hook 'my-text-mode-hook)
>> >
>> > Are you sure nothing in text-mode will ever want to use \s_ in any
>> > regexp?
>>
>> Did you mean \> (word boundary) or \s. (punctuation)? \s_ doesn't match
>> ’ regardless because its syntax class is punctuation, not symbol.
>
> Sorry, I guess I was thinking of \cl. It will not match ’, although
> it might be expected.

Which could be fixed by (modify-category-entry ?’ ?l).

I would suggest this additional docstring patch, because I was confused
at first as to what CATEGORY was supposed to be (I looked around a bit
for how to create some kind of "category object"):

diff --git i/src/category.c w/src/category.c
index 4397f66..31ac2ec 100644
--- i/src/category.c
+++ w/src/category.c
@@ -336,6 +336,7 @@ DEFUN ("modify-category-entry", Fmodify_category_entry,
 the current buffer's category table.
 CHARACTER can be either a single character or a cons representing the
 lower and upper ends of an inclusive character range to modify.
+CATEGORY must be a category name (a character between ` ' and `~').
 If optional fourth argument RESET is non-nil,
 then delete CATEGORY from the category set instead of adding it. */)
   (Lisp_Object character, Lisp_Object category, Lisp_Object table, Lisp_Object reset)

>
> Anyway, my point is that these char-tables should really be treated as
> read-only by Lisp applications.

Right, but I think this bug is about the user modifying stuff.
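To make the CATEGORY argument discussed above concrete, here is a
small sketch of the modify-category-entry call applied to a private
copy of the category table, so that the standard table is left alone
(in line with the caution voiced earlier in the thread). The function
name my-latin-category-demo is made up for this example; the other
calls are the built-in copy-category-table, set-category-table and
char-category-set.

```elisp
;; Hypothetical demo function; the name is an assumption of this sketch.
(defun my-latin-category-demo ()
  "Return non-nil if U+2019 gets category `l' in a private table copy."
  (with-temp-buffer
    ;; Work on a copy so the standard category table stays untouched.
    (set-category-table (copy-category-table))
    ;; CATEGORY is passed as a character, here ?l (Latin).
    (modify-category-entry #x2019 ?l)
    ;; A category set is a bool-vector indexed by category character.
    (aref (char-category-set #x2019) ?l)))

(my-latin-category-demo)
```

The change is confined to the temporary buffer's table, so other
buffers keep seeing the original categories for U+2019.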
* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
From: Eli Zaretskii @ 2016-08-13 18:35 UTC
To: npostavs; +Cc: 10494, aaronecay

> From: npostavs@users.sourceforge.net
> Cc: 10494@debbugs.gnu.org, aaronecay@gmail.com
> Date: Sat, 13 Aug 2016 14:14:49 -0400
>
> > Sorry, I guess I was thinking of \cl. It will not match ’, although
> > it might be expected.
>
> Which could be fixed by (modify-category-entry ?’ ?l).

This is Emacs, right? But the fact that you can do this doesn't yet
mean you should want to, or that we should encourage it.

> I would suggest this additional docstring patch, because I was confused
> at first as to what CATEGORY was supposed to be (I looked around a bit
> for how to create some kind of "category object"):
>
> diff --git i/src/category.c w/src/category.c
> index 4397f66..31ac2ec 100644
> --- i/src/category.c
> +++ w/src/category.c
> @@ -336,6 +336,7 @@ DEFUN ("modify-category-entry", Fmodify_category_entry,
>  the current buffer's category table.
>  CHARACTER can be either a single character or a cons representing the
>  lower and upper ends of an inclusive character range to modify.
> +CATEGORY must be a category name (a character between ` ' and `~').
>  If optional fourth argument RESET is non-nil,
>  then delete CATEGORY from the category set instead of adding it. */)
>    (Lisp_Object character, Lisp_Object category, Lisp_Object table, Lisp_Object reset)

How about mentioning describe-categories as well?

> > Anyway, my point is that these char-tables should really be treated as
> > read-only by Lisp applications.
>
> Right, but I think this bug is about the user modifying stuff.

Which is even less recommendable, IMO. How many users really
understand the implications?
* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
From: npostavs @ 2016-08-13 21:42 UTC
To: Eli Zaretskii; +Cc: 10494, aaronecay

Eli Zaretskii <eliz@gnu.org> writes:

>
> How about mentioning describe-categories as well?

Okay, so:

diff --git i/src/category.c w/src/category.c
index 4397f66..8315797 100644
--- i/src/category.c
+++ w/src/category.c
@@ -336,6 +336,8 @@ DEFUN ("modify-category-entry", Fmodify_category_entry,
 the current buffer's category table.
 CHARACTER can be either a single character or a cons representing the
 lower and upper ends of an inclusive character range to modify.
+CATEGORY must be a category name (a character between ` ' and `~').
+Use `describe-categories' to see existing category names.
 If optional fourth argument RESET is non-nil,
 then delete CATEGORY from the category set instead of adding it. */)
   (Lisp_Object character, Lisp_Object category, Lisp_Object table, Lisp_Object reset)

>
>> > Anyway, my point is that these char-tables should really be treated as
>> > read-only by Lisp applications.
>>
>> Right, but I think this bug is about the user modifying stuff.
>
> Which is even less recommendable, IMO. How many users really
> understand the implications?

Well, they might find what the implications are by trying it :)

Anyway, do you think there is anything else to do about this bug?
* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
From: Eli Zaretskii @ 2016-08-14 2:32 UTC
To: npostavs; +Cc: 10494, aaronecay

> From: npostavs@users.sourceforge.net
> Cc: 10494@debbugs.gnu.org, aaronecay@gmail.com
> Date: Sat, 13 Aug 2016 17:42:03 -0400
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> >
> > How about mentioning describe-categories as well?
>
> Okay, so:
>
> diff --git i/src/category.c w/src/category.c
> index 4397f66..8315797 100644
> --- i/src/category.c
> +++ w/src/category.c
> @@ -336,6 +336,8 @@ DEFUN ("modify-category-entry", Fmodify_category_entry,
>  the current buffer's category table.
>  CHARACTER can be either a single character or a cons representing the
>  lower and upper ends of an inclusive character range to modify.
> +CATEGORY must be a category name (a character between ` ' and `~').
> +Use `describe-categories' to see existing category names.
>  If optional fourth argument RESET is non-nil,
>  then delete CATEGORY from the category set instead of adding it. */)
>    (Lisp_Object character, Lisp_Object category, Lisp_Object table, Lisp_Object reset)

LGTM, thanks.

> Anyway, do you think there is anything else to do about this bug?

Not that I can see, no.
* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
From: npostavs @ 2016-08-14 2:58 UTC
To: Eli Zaretskii; +Cc: 10494, aaronecay

tags 10494 notabug
close 10494
quit

Eli Zaretskii <eliz@gnu.org> writes:

>
> LGTM, thanks.

Pushed as 8d681476 "Document CATEGORY arg to modify-category-entry"

>
>> Anyway, do you think there is anything else to do about this bug?
>
> Not that I can see, no.

Okay, I'm closing it.