bug#10494: 24.0.92; Syntax table and non-ASCII character interaction

unofficial mirror of bug-gnu-emacs@gnu.org 

* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
@ 2012-01-13  8:40 Aaron Ecay
  2012-01-13 10:45 ` Andreas Schwab
  2016-08-11  0:29 ` npostavs
  0 siblings, 2 replies; 17+ messages in thread
From: Aaron Ecay @ 2012-01-13  8:40 UTC (permalink / raw)
  To: 10494

This bug relates to setting a non-ASCII punctuation character
(U+2019, which is ’) to have word syntax, and using word-motion
commands.  Here’s a recipe from emacs -Q:

M-x text-mode
don't
C-a M-f
  -> (as expected, the cursor moves to the end of the line)
RET RET
don M-x ucs-insert 2019 t
  -> (text in buffer: "don’t")
C-a M-f
  -> (cursor is on the quotation mark, as expected)
M-: (modify-syntax-entry ?’ "w" text-mode-syntax-table)
C-a M-f
  -> (BUG: cursor is on quotation mark, which should count as part of the word)

If you re-run the experiment substituting - for ’ everywhere, there is a
difference in behavior – the cursor moves to the end of the line after
the call to modify-syntax-entry, as expected.  This leads me to think
that the problem has to do with ’ being outside the ASCII charset.

This is with a recent-ish bzr trunk build, btw.  The most recent commit
is:
revno: 106824 [merge]
committer: Chong Yidong <cyd@gnu.org>
branch nick: trunk
timestamp: Mon 2012-01-09 13:48:13 +0800
message:
  Merge changes from emacs-23 branch


In GNU Emacs 24.0.92.1 (i386-apple-darwin10.8.0, NS apple-appkit-1038.36)
 of 2012-01-09 on awe
Windowing system distributor `Apple', version 10.3.1038
configured using `configure  '--with-ns' '--without-gnutls''

-- 
Aaron Ecay


* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
  2012-01-13  8:40 bug#10494: 24.0.92; Syntax table and non-ASCII character interaction Aaron Ecay
@ 2012-01-13 10:45 ` Andreas Schwab
  2012-01-13 17:04   ` Aaron Ecay
  2016-08-11  0:29 ` npostavs
  1 sibling, 1 reply; 17+ messages in thread
From: Andreas Schwab @ 2012-01-13 10:45 UTC (permalink / raw)
  To: Aaron Ecay; +Cc: 10494

Aaron Ecay <aaronecay@gmail.com> writes:

> M-x text-mode
> don't
> C-a M-f
>   -> (as expected, the cursor moves to the end of the line)
> RET RET
> don M-x ucs-insert 2019 t
>   -> (text in buffer: "don’t")
> C-a M-f
>   -> (cursor is on the quotation mark, as expected)
> M-: (modify-syntax-entry ?’ "w" text-mode-syntax-table)
> C-a M-f
>   -> (BUG: cursor is on quotation mark, which should count as part of the word)

?’ isn't of the same script as the surrounding characters, so there are
word boundaries before and after it.  See also
word-combining-categories.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."






* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
  2012-01-13 10:45 ` Andreas Schwab
@ 2012-01-13 17:04   ` Aaron Ecay
  0 siblings, 0 replies; 17+ messages in thread
From: Aaron Ecay @ 2012-01-13 17:04 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: 10494

On Fri, 13 Jan 2012 11:45:21 +0100, Andreas Schwab <schwab@linux-m68k.org> wrote:
> ?’ isn't of the same script as the surrounding characters, so there are
> word boundaries before and after it.  See also
> word-combining-categories.

What does that mean?  I assume it has something to do with the
“category:” line in the output of describe-char.  For ?’ this gives:
“.:Base, c:Chinese, h:Korean, j:Japanese”; for ?' it is “.:Base,
a:ASCII, l:Latin, r:Roman”.  So, what does ?’ have to do with CJK
scripts?

More specifically, I would like to use ?’ as an apostrophe in writing
text, so I’d like for word-motion commands to treat it as part of a
word, just as they do ?'.  How might this be accomplished?

Thanks,

-- 
Aaron Ecay


* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
  2012-01-13  8:40 bug#10494: 24.0.92; Syntax table and non-ASCII character interaction Aaron Ecay
  2012-01-13 10:45 ` Andreas Schwab
@ 2016-08-11  0:29 ` npostavs
  2016-08-11 15:24   ` Eli Zaretskii
  1 sibling, 1 reply; 17+ messages in thread
From: npostavs @ 2016-08-11  0:29 UTC (permalink / raw)
  To: Aaron Ecay; +Cc: 10494

tags 10494 confirmed
found 10494 25.1
quit

I confirm this is still the case in 25.1-rc1.

Aaron Ecay <aaronecay@gmail.com> writes:
>
> This bug relates to setting a non-ASCII punctuation character
> (U+2019, which is ’) to have word syntax, and using word-motion
> commands.  Here’s a recipe from emacs -Q:
>
> M-x text-mode
> don't
> C-a M-f
>   -> (as expected, the cursor moves to the end of the line)
> RET RET
> don M-x ucs-insert 2019 t

This should now use insert-char (C-x 8 RET) instead of ucs-insert.

>   -> (text in buffer: "don’t")
> C-a M-f
>   -> (cursor is on the quotation mark, as expected)
> M-: (modify-syntax-entry ?’ "w" text-mode-syntax-table)
> C-a M-f
>   -> (BUG: cursor is on quotation mark, which should count as part of the word)
>
> If you re-run the experiment substituting - for ’ everywhere, there is a
> difference in behavior – the cursor moves to the end of the line after
> the call to modify-syntax-entry, as expected.  This leads me to think
> that the problem has to do with ’ being outside the ASCII charset.







* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
  2016-08-11  0:29 ` npostavs
@ 2016-08-11 15:24   ` Eli Zaretskii
  2016-08-12 22:37     ` npostavs
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2016-08-11 15:24 UTC (permalink / raw)
  To: npostavs; +Cc: aaronecay, 10494

> From: npostavs@users.sourceforge.net
> Date: Wed, 10 Aug 2016 20:29:05 -0400
> Cc: 10494@debbugs.gnu.org
> 
> I confirm this is still the case in 25.1-rc1.
> 
> Aaron Ecay <aaronecay@gmail.com> writes:
> >
> > This bug relates to setting a non-ASCII punctuation character
> > (U+2019, which is ’) to have word syntax, and using word-motion
> > commands.  Here’s a recipe from emacs -Q:
> >
> > M-x text-mode
> > don't
> > C-a M-f
> >   -> (as expected, the cursor moves to the end of the line)
> > RET RET
> > don M-x ucs-insert 2019 t
> 
> This should now use insert-char (C-x 8 RET) instead of ucs-insert.
> 
> >   -> (text in buffer: "don’t")
> > C-a M-f
> >   -> (cursor is on the quotation mark, as expected)
> > M-: (modify-syntax-entry ?’ "w" text-mode-syntax-table)
> > C-a M-f
> >   -> (BUG: cursor is on quotation mark, which should count as part of the word)
> >
> > If you re-run the experiment substituting - for ’ everywhere, there is a
> > difference in behavior – the cursor moves to the end of the line after
> > the call to modify-syntax-entry, as expected.  This leads me to think
> > that the problem has to do with ’ being outside the ASCII charset.

Indeed.  This is a feature: we don't let word-movement commands
cross into a different script.  IOW, if

  (aref char-script-table C1)

and

  (aref char-script-table C2)

return different values, then we decide that there's a word boundary
between C1 and C2.  See the function word_boundary_p, which is called
from scan_words.
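
For anyone who wants to check this from Lisp, here is a minimal sketch
of the comparison described above.  `my-same-script-p' is a
hypothetical helper, not an Emacs function, and it covers only the
script comparison; it ignores the category checks (see
word-combining-categories) that word_boundary_p also performs.

```elisp
;; Hypothetical helper (not part of Emacs): non-nil if characters C1
;; and C2 have the same entry in `char-script-table'.
(defun my-same-script-p (c1 c2)
  "Return non-nil if C1 and C2 map to the same script symbol."
  (eq (aref char-script-table c1)
      (aref char-script-table c2)))

;; Inspect the scripts on either side of the apostrophe in "don’t".
;; ?\u2019 is RIGHT SINGLE QUOTATION MARK.
(list (aref char-script-table ?n)
      (aref char-script-table ?\u2019)
      (my-same-script-p ?n ?\u2019))
```

Evaluating the last form in *scratch* shows the two script symbols and
whether they compare equal.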

Maybe we should document this somewhere, like the ELisp manual.






* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
  2016-08-11 15:24   ` Eli Zaretskii
@ 2016-08-12 22:37     ` npostavs
  2016-08-13  6:56       ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: npostavs @ 2016-08-12 22:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 10494, aaronecay

[-- Attachment #1: Type: text/plain, Size: 373 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

> Indeed.  This is a feature:

Ah, so doing

    (modify-syntax-entry ?’ "w" text-mode-syntax-table)
    (aset char-script-table ?’ 'latin)

does let word motion skip over ’ as OP wanted.
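
A rough way to try this without touching the global tables is
sketched here.  `my-forward-word-end' is a made-up name, and the
sketch assumes that forward-word honors a let-binding of
char-script-table to a modified copy.

```elisp
;; Hypothetical test helper, not part of Emacs.
(defun my-forward-word-end (patch-script)
  "Return point after `forward-word' from the start of \"don’t\".
U+2019 gets word syntax in a temporary syntax table.  When
PATCH-SCRIPT is non-nil, U+2019 is also marked as `latin' in a
temporary copy of `char-script-table'."
  (let ((char-script-table (copy-sequence char-script-table))
        (table (make-syntax-table text-mode-syntax-table)))
    (modify-syntax-entry ?\u2019 "w" table)
    (when patch-script
      (aset char-script-table ?\u2019 'latin))
    (with-temp-buffer
      (set-syntax-table table)
      (insert "don\u2019t")
      (goto-char (point-min))
      (forward-word 1)
      (point))))

;; Compare the stopping positions with and without the script change.
(list (my-forward-word-end nil)
      (my-forward-word-end t))
```

If the second value is larger than the first, the script entry is what
was stopping word motion at the apostrophe.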

>
> Maybe we should document this somewhere, like the ELisp manual.

`(elisp) Word Motion' looks like a good place for it:


[-- Attachment #2: patch --]
[-- Type: text/plain, Size: 2177 bytes --]

From 03dbee2bf6bae29b21ea36ff3d73bce773458f78 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Fri, 12 Aug 2016 18:33:17 -0400
Subject: [PATCH v1] Document char-script-table's effect on word motion

* doc/lispref/positions.texi (Word Motion): Talk about
char-script-table (Bug #10494).
---
 doc/lispref/positions.texi | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/doc/lispref/positions.texi b/doc/lispref/positions.texi
index 1d748b8..3359ced 100644
--- a/doc/lispref/positions.texi
+++ b/doc/lispref/positions.texi
@@ -192,8 +192,8 @@ Word Motion
 @subsection Motion by Words
 
   The functions for parsing words described below use the syntax table
-to decide whether a given character is part of a word.  @xref{Syntax
-Tables}.
+and @code{char-script-table} to decide whether a given character is
+part of a word.  @xref{Syntax Tables} and @xref{Character Properties}.
 
 @deffn Command forward-word &optional count
 This function moves point forward @var{count} words (or backward if
@@ -207,11 +207,13 @@ Word Motion
 that begin and end words, known as @dfn{word boundaries}, are defined
 by the current buffer's syntax table (@pxref{Syntax Class Table}), but
 modes can override that by setting up a suitable
-@code{find-word-boundary-function-table}, described below.  In any
-case, this function cannot move point past the boundary of the
-accessible portion of the buffer, or across a field boundary
-(@pxref{Fields}).  The most common case of a field boundary is the end
-of the prompt in the minibuffer.
+@code{find-word-boundary-function-table}, described below.  Characters
+that belong to a different script (as defined by
+@code{char-script-table}), also mark a word boundary (@pxref{Character
+Properties}).  In any case, this function cannot move point past the
+boundary of the accessible portion of the buffer, or across a field
+boundary (@pxref{Fields}).  The most common case of a field boundary
+is the end of the prompt in the minibuffer.
 
 If it is possible to move @var{count} words, without being stopped
 prematurely by the buffer boundary or a field boundary, the value is
-- 
2.9.2



* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
  2016-08-12 22:37     ` npostavs
@ 2016-08-13  6:56       ` Eli Zaretskii
  2016-08-13 13:21         ` npostavs
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2016-08-13  6:56 UTC (permalink / raw)
  To: npostavs; +Cc: 10494, aaronecay

> From: npostavs@users.sourceforge.net
> Cc: 10494@debbugs.gnu.org,  aaronecay@gmail.com
> Date: Fri, 12 Aug 2016 18:37:56 -0400
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Indeed.  This is a feature:
> 
> Ah, so doing
> 
>     (modify-syntax-entry ?’ "w" text-mode-syntax-table)
>     (aset char-script-table ?’ 'latin)
> 
> does let word motion skip over ’ as OP wanted.

Yes.  But I don't recommend such a "solution", because that would most
probably bite elsewhere, when we do want that character to behave as a
symbol.

> `(elisp) Word Motion' looks like a good place for it:

Right, thanks.

>    The functions for parsing words described below use the syntax table
> -to decide whether a given character is part of a word.  @xref{Syntax
> -Tables}.
> +and @code{char-script-table} to decide whether a given character is
> +part of a word.  @xref{Syntax Tables} and @xref{Character Properties}.

@xref generates a capitalized "See", so is inappropriate in the middle
of a sentence.  Please use "see @ref" instead.

> +@code{find-word-boundary-function-table}, described below.  Characters
> +that belong to a different script (as defined by

I'd say "belong to different scripts", otherwise the text begs the
question "different from what?".

> +@code{char-script-table}), also mark a word boundary (@pxref{Character

"define a word boundary" sounds better to me.

Otherwise, LGTM, thanks.  Please push to emacs-25.






* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
  2016-08-13  6:56       ` Eli Zaretskii
@ 2016-08-13 13:21         ` npostavs
  2016-08-13 13:33           ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: npostavs @ 2016-08-13 13:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 10494, aaronecay

[-- Attachment #1: Type: text/plain, Size: 1801 bytes --]

Eli Zaretskii <eliz@gnu.org> writes:

>> From: npostavs@users.sourceforge.net
>> Cc: 10494@debbugs.gnu.org,  aaronecay@gmail.com
>> Date: Fri, 12 Aug 2016 18:37:56 -0400
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > Indeed.  This is a feature:
>> 
>> Ah, so doing
>> 
>>     (modify-syntax-entry ?’ "w" text-mode-syntax-table)
>>     (aset char-script-table ?’ 'latin)
>> 
>> does let word motion skip over ’ as OP wanted.
>
> Yes.  But I don't recommend such a "solution", because that would most
> probably bite elsewhere, when we do want that character to behave as a
> symbol.

Sure, but it could be made local to text-mode:

    (modify-syntax-entry ?’ "w" text-mode-syntax-table)
    (defconst my-text-char-script-table
      (let ((table (copy-sequence char-script-table)))
        (aset table ?’ 'latin)
        table))

    (defun my-text-mode-hook ()
      (set (make-local-variable 'char-script-table)
           my-text-char-script-table))
    (add-hook 'text-mode-hook 'my-text-mode-hook)

>
>> `(elisp) Word Motion' looks like a good place for it:
>
> Right, thanks.
>
>>    The functions for parsing words described below use the syntax table
>> -to decide whether a given character is part of a word.  @xref{Syntax
>> -Tables}.
>> +and @code{char-script-table} to decide whether a given character is
>> +part of a word.  @xref{Syntax Tables} and @xref{Character Properties}.
>
> @xref generates a capitalized "See", so is inappropriate in the middle
> of a sentence.  Please use "see @ref" instead.

Uff, I find these multiple variants of reference very confusing, I also
got a complaint from makeinfo that I was missing punctuation after the
first @xref.  Does it look okay now? (I made the other wording fixes
too)


[-- Attachment #2: patch v2 --]
[-- Type: text/plain, Size: 2185 bytes --]

From e18a6dc7be2aa245767d00ac69a0e13605fc4440 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs@gmail.com>
Date: Fri, 12 Aug 2016 18:33:17 -0400
Subject: [PATCH v2] Document char-script-table's effect on word motion

* doc/lispref/positions.texi (Word Motion): Talk about
char-script-table (Bug #10494).
---
 doc/lispref/positions.texi | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/doc/lispref/positions.texi b/doc/lispref/positions.texi
index 1d748b8..b6133dc 100644
--- a/doc/lispref/positions.texi
+++ b/doc/lispref/positions.texi
@@ -192,8 +192,9 @@ Word Motion
 @subsection Motion by Words
 
   The functions for parsing words described below use the syntax table
-to decide whether a given character is part of a word.  @xref{Syntax
-Tables}.
+and @code{char-script-table} to decide whether a given character is
+part of a word.  @xref{Syntax Tables}, and see @ref{Character
+Properties}.
 
 @deffn Command forward-word &optional count
 This function moves point forward @var{count} words (or backward if
@@ -207,11 +208,13 @@ Word Motion
 that begin and end words, known as @dfn{word boundaries}, are defined
 by the current buffer's syntax table (@pxref{Syntax Class Table}), but
 modes can override that by setting up a suitable
-@code{find-word-boundary-function-table}, described below.  In any
-case, this function cannot move point past the boundary of the
-accessible portion of the buffer, or across a field boundary
-(@pxref{Fields}).  The most common case of a field boundary is the end
-of the prompt in the minibuffer.
+@code{find-word-boundary-function-table}, described below.  Characters
+that belong to different scripts (as defined by
+@code{char-script-table}), also define a word boundary
+(@pxref{Character Properties}).  In any case, this function cannot
+move point past the boundary of the accessible portion of the buffer,
+or across a field boundary (@pxref{Fields}).  The most common case of
+a field boundary is the end of the prompt in the minibuffer.
 
 If it is possible to move @var{count} words, without being stopped
 prematurely by the buffer boundary or a field boundary, the value is
-- 
2.9.2



* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
  2016-08-13 13:21         ` npostavs
@ 2016-08-13 13:33           ` Eli Zaretskii
  2016-08-13 14:19             ` npostavs
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2016-08-13 13:33 UTC (permalink / raw)
  To: npostavs; +Cc: 10494, aaronecay

> From: npostavs@users.sourceforge.net
> Cc: 10494@debbugs.gnu.org,  aaronecay@gmail.com
> Date: Sat, 13 Aug 2016 09:21:54 -0400
> 
> >>     (modify-syntax-entry ?’ "w" text-mode-syntax-table)
> >>     (aset char-script-table ?’ 'latin)
> >> 
> >> does let word motion skip over ’ as OP wanted.
> >
> > Yes.  But I don't recommend such a "solution", because that would most
> > probably bite elsewhere, when we do want that character to behave as a
> > symbol.
> 
> Sure, but it could be made local to text-mode:
> 
>     (modify-syntax-entry ?’ "w" text-mode-syntax-table)
>     (defconst my-text-char-script-table
>       (let ((table (copy-sequence char-script-table)))
>         (aset table ?’ 'latin)
>         table))
> 
>     (defun my-text-mode-hook ()
>       (set (make-local-variable 'char-script-table)
>            my-text-char-script-table))
>     (add-hook 'text-mode-hook 'my-text-mode-hook)

Are you sure nothing in text-mode will ever want to use \s_ in any
regexp?

> > @xref generates a capitalized "See", so is inappropriate in the middle
> > of a sentence.  Please use "see @ref" instead.
> 
> Uff, I find these multiple variants of reference very confusing, I also
> got a complaint from makeinfo that I was missing punctuation after the
> first @xref.  Does it look okay now? (I made the other wording fixes
> too)

Yes, looks good, thanks.






* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
  2016-08-13 13:33           ` Eli Zaretskii
@ 2016-08-13 14:19             ` npostavs
  2016-08-13 14:31               ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: npostavs @ 2016-08-13 14:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 10494, aaronecay

Eli Zaretskii <eliz@gnu.org> writes:

>> From: npostavs@users.sourceforge.net
>> Cc: 10494@debbugs.gnu.org,  aaronecay@gmail.com
>> Date: Sat, 13 Aug 2016 09:21:54 -0400
>> 
>> >>     (modify-syntax-entry ?’ "w" text-mode-syntax-table)
>> >>     (aset char-script-table ?’ 'latin)
>> >> 
>> >> does let word motion skip over ’ as OP wanted.
>> >
>> > Yes.  But I don't recommend such a "solution", because that would most
>> > probably bite elsewhere, when we do want that character to behave as a
>> > symbol.
>> 
>> Sure, but it could be made local to text-mode:
>> 
>>     (modify-syntax-entry ?’ "w" text-mode-syntax-table)
>>     (defconst my-text-char-script-table
>>       (let ((table (copy-sequence char-script-table)))
>>         (aset table ?’ 'latin)
>>         table))
>> 
>>     (defun my-text-mode-hook ()
>>       (set (make-local-variable 'char-script-table)
>>            my-text-char-script-table))
>>     (add-hook 'text-mode-hook 'my-text-mode-hook)
>
> Are you sure nothing in text-mode will ever want to use \s_ in any
> regexp?

Did you mean \> (word boundary) or \s. (punctuation)?  \s_ doesn't match
’ regardless because its syntax class is punctuation, not symbol.

If the user wants ’ to be part of a word, then surely it's correct for
regexps to treat it as such.

>
>> > @xref generates a capitalized "See", so is inappropriate in the middle
>> > of a sentence.  Please use "see @ref" instead.
>> 
>> Uff, I find these multiple variants of reference very confusing, I also
>> got a complaint from makeinfo that I was missing punctuation after the
>> first @xref.  Does it look okay now? (I made the other wording fixes
>> too)
>
> Yes, looks good, thanks.

Pushed as 8342e748






* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
  2016-08-13 14:19             ` npostavs
@ 2016-08-13 14:31               ` Eli Zaretskii
  2016-08-13 14:55                 ` Eli Zaretskii
  2016-08-13 18:14                 ` npostavs
  0 siblings, 2 replies; 17+ messages in thread
From: Eli Zaretskii @ 2016-08-13 14:31 UTC (permalink / raw)
  To: npostavs; +Cc: 10494, aaronecay

> From: npostavs@users.sourceforge.net
> Cc: 10494@debbugs.gnu.org,  aaronecay@gmail.com
> Date: Sat, 13 Aug 2016 10:19:34 -0400
> 
> >>     (modify-syntax-entry ?’ "w" text-mode-syntax-table)
> >>     (defconst my-text-char-script-table
> >>       (let ((table (copy-sequence char-script-table)))
> >>         (aset table ?’ 'latin)
> >>         table))
> >> 
> >>     (defun my-text-mode-hook ()
> >>       (set (make-local-variable 'char-script-table)
> >>            my-text-char-script-table))
> >>     (add-hook 'text-mode-hook 'my-text-mode-hook)
> >
> > Are you sure nothing in text-mode will ever want to use \s_ in any
> > regexp?
> 
> Did you mean \> (word boundary) or \s. (punctuation)?  \s_ doesn't match
> ’ regardless because its syntax class is punctuation, not symbol.

Sorry, I guess I was thinking of \cl.  It will not match ’, although
it might be expected.

Anyway, my point is that these char-tables should really be treated as
read-only by Lisp applications.

> Pushed as 8342e748

Thanks.






* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
  2016-08-13 14:31               ` Eli Zaretskii
@ 2016-08-13 14:55                 ` Eli Zaretskii
  2016-08-13 18:14                 ` npostavs
  1 sibling, 0 replies; 17+ messages in thread
From: Eli Zaretskii @ 2016-08-13 14:55 UTC (permalink / raw)
  To: npostavs; +Cc: 10494, aaronecay

> Date: Sat, 13 Aug 2016 17:31:48 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: 10494@debbugs.gnu.org, aaronecay@gmail.com
> 
> > From: npostavs@users.sourceforge.net
> > Cc: 10494@debbugs.gnu.org,  aaronecay@gmail.com
> > Date: Sat, 13 Aug 2016 10:19:34 -0400
> > 
> > >>     (modify-syntax-entry ?’ "w" text-mode-syntax-table)
> > >>     (defconst my-text-char-script-table
> > >>       (let ((table (copy-sequence char-script-table)))
> > >>         (aset table ?’ 'latin)
> > >>         table))
> > >> 
> > >>     (defun my-text-mode-hook ()
> > >>       (set (make-local-variable 'char-script-table)
> > >>            my-text-char-script-table))
> > >>     (add-hook 'text-mode-hook 'my-text-mode-hook)
> > >
> > > Are you sure nothing in text-mode will ever want to use \s_ in any
> > > regexp?
> > 
> > > Did you mean \> (word boundary) or \s. (punctuation)?  \s_ doesn't match
> > ’ regardless because its syntax class is punctuation, not symbol.
> 
> Sorry, I guess I was thinking of \cl.  It will not match ’, although
> it might be expected.
> 
> Anyway, my point is that these char-tables should really be treated as
> read-only by Lisp applications.

Btw, some believe that using ’ as an apostrophe is wrong.  They say
U+02BC should be used instead; see, for example, this discussion:

  http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0047.html

That character already is word-constituent.
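
To compare the two characters side by side, something like the
following sketch could be evaluated.  `my-char-word-info' is a
hypothetical helper introduced only for illustration; it reports the
syntax class from the standard syntax table and the entry in
char-script-table, and makes no claim about what those values are on
any particular Emacs version.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-char-word-info (char)
  "Return a plist with the syntax class and script of CHAR.
The syntax class is looked up in the standard syntax table."
  (with-syntax-table (standard-syntax-table)
    (list :syntax (string (char-syntax char))
          :script (aref char-script-table char))))

(list (my-char-word-info ?\u2019)   ; RIGHT SINGLE QUOTATION MARK
      (my-char-word-info ?\u02BC))  ; MODIFIER LETTER APOSTROPHE
```

A :syntax value of "w" means the character is word-constituent in the
standard syntax table.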






* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
  2016-08-13 14:31               ` Eli Zaretskii
  2016-08-13 14:55                 ` Eli Zaretskii
@ 2016-08-13 18:14                 ` npostavs
  2016-08-13 18:35                   ` Eli Zaretskii
  1 sibling, 1 reply; 17+ messages in thread
From: npostavs @ 2016-08-13 18:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 10494, aaronecay

Eli Zaretskii <eliz@gnu.org> writes:

>> From: npostavs@users.sourceforge.net
>> Cc: 10494@debbugs.gnu.org,  aaronecay@gmail.com
>> Date: Sat, 13 Aug 2016 10:19:34 -0400
>> 
>> >>     (modify-syntax-entry ?’ "w" text-mode-syntax-table)
>> >>     (defconst my-text-char-script-table
>> >>       (let ((table (copy-sequence char-script-table)))
>> >>         (aset table ?’ 'latin)
>> >>         table))
>> >> 
>> >>     (defun my-text-mode-hook ()
>> >>       (set (make-local-variable 'char-script-table)
>> >>            my-text-char-script-table))
>> >>     (add-hook 'text-mode-hook 'my-text-mode-hook)
>> >
>> > Are you sure nothing in text-mode will ever want to use \s_ in any
>> > regexp?
>> 
>> Did you mean \> (word boundary) or \s. (punctuation)?  \s_ doesn't match
>> ’ regardless because its syntax class is punctuation, not symbol.
>
> Sorry, I guess I was thinking of \cl.  It will not match ’, although
> it might be expected.

Which could be fixed by (modify-category-entry ?’ ?l).
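
As a minimal sketch of what that call changes, the following tries the
\cl regexp against ’ in a temporary buffer that uses its own copy of
the standard category table, so nothing global is modified.
`my-apostrophe-matches-cl-p' is a hypothetical name used only here.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-apostrophe-matches-cl-p (add-category)
  "Return non-nil if the regexp \\cl matches U+2019 in a scratch buffer.
When ADD-CATEGORY is non-nil, first add category `l' to U+2019 in a
buffer-local copy of the standard category table."
  (with-temp-buffer
    ;; Work on a copy so the standard category table stays untouched.
    (set-category-table (copy-category-table (standard-category-table)))
    (when add-category
      (modify-category-entry ?\u2019 ?l))
    (insert "\u2019")
    (goto-char (point-min))
    (looking-at-p "\\cl")))

;; Compare the match result before and after adding the category.
(list (my-apostrophe-matches-cl-p nil)
      (my-apostrophe-matches-cl-p t))
```

Without a TABLE argument, modify-category-entry acts on the current
buffer's category table, which is why the copy is installed first.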

I would suggest this additional docstring patch, because I was confused
at first as to what CATEGORY was supposed to be (I looked around a bit
for how to create some kind of "category object"):

diff --git i/src/category.c w/src/category.c
index 4397f66..31ac2ec 100644
--- i/src/category.c
+++ w/src/category.c
@@ -336,6 +336,7 @@ DEFUN ("modify-category-entry", Fmodify_category_entry,
 the current buffer's category table.
 CHARACTER can be either a single character or a cons representing the
 lower and upper ends of an inclusive character range to modify.
+CATEGORY must be a category name (a character between ` ' and `~').
 If optional fourth argument RESET is non-nil,
 then delete CATEGORY from the category set instead of adding it.  */)
   (Lisp_Object character, Lisp_Object category, Lisp_Object table, Lisp_Object reset)

>
> Anyway, my point is that these char-tables should really be treated as
> read-only by Lisp applications.

Right, but I think this bug is about the user modifying stuff.






* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
  2016-08-13 18:14                 ` npostavs
@ 2016-08-13 18:35                   ` Eli Zaretskii
  2016-08-13 21:42                     ` npostavs
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2016-08-13 18:35 UTC (permalink / raw)
  To: npostavs; +Cc: 10494, aaronecay

> From: npostavs@users.sourceforge.net
> Cc: 10494@debbugs.gnu.org,  aaronecay@gmail.com
> Date: Sat, 13 Aug 2016 14:14:49 -0400
> 
> > Sorry, I guess I was thinking of \cl.  It will not match ’, although
> > it might be expected.
> 
> Which could be fixed by (modify-category-entry ?’ ?l).

This is Emacs, right?

But the fact that you can do this doesn't yet mean you should want to,
or that we should encourage it.

> I would suggest this additional docstring patch, because I was confused
> at first as to what CATEGORY was supposed to be (I looked around a bit
> for how to create some kind of "category object"):
> 
> diff --git i/src/category.c w/src/category.c
> index 4397f66..31ac2ec 100644
> --- i/src/category.c
> +++ w/src/category.c
> @@ -336,6 +336,7 @@ DEFUN ("modify-category-entry", Fmodify_category_entry,
>  the current buffer's category table.
>  CHARACTER can be either a single character or a cons representing the
>  lower and upper ends of an inclusive character range to modify.
> +CATEGORY must be a category name (a character between ` ' and `~').
>  If optional fourth argument RESET is non-nil,
>  then delete CATEGORY from the category set instead of adding it.  */)
>    (Lisp_Object character, Lisp_Object category, Lisp_Object table, Lisp_Object reset)

How about mentioning describe-categories as well?

> > Anyway, my point is that these char-tables should really be treated as
> > read-only by Lisp applications.
> 
> Right, but I think this bug is about the user modifying stuff.

Which is even less recommendable, IMO.  How many users really
understand the implications?






* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
  2016-08-13 18:35                   ` Eli Zaretskii
@ 2016-08-13 21:42                     ` npostavs
  2016-08-14  2:32                       ` Eli Zaretskii
  0 siblings, 1 reply; 17+ messages in thread
From: npostavs @ 2016-08-13 21:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 10494, aaronecay

Eli Zaretskii <eliz@gnu.org> writes:
>
> How about mentioning describe-categories as well?

Okay, so:

diff --git i/src/category.c w/src/category.c
index 4397f66..8315797 100644
--- i/src/category.c
+++ w/src/category.c
@@ -336,6 +336,8 @@ DEFUN ("modify-category-entry", Fmodify_category_entry,
 the current buffer's category table.
 CHARACTER can be either a single character or a cons representing the
 lower and upper ends of an inclusive character range to modify.
+CATEGORY must be a category name (a character between ` ' and `~').
+Use `describe-categories' to see existing category names.
 If optional fourth argument RESET is non-nil,
 then delete CATEGORY from the category set instead of adding it.  */)
   (Lisp_Object character, Lisp_Object category, Lisp_Object table, Lisp_Object reset)


>
>> > Anyway, my point is that these char-tables should really be treated as
>> > read-only by Lisp applications.
>> 
>> Right, but I think this bug is about the user modifying stuff.
>
> Which is even less recommendable, IMO.  How many users really
> understand the implications?

Well, they might find what the implications are by trying it :)

Anyway, do you think there is anything else to do about this bug?






* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
  2016-08-13 21:42                     ` npostavs
@ 2016-08-14  2:32                       ` Eli Zaretskii
  2016-08-14  2:58                         ` npostavs
  0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2016-08-14  2:32 UTC (permalink / raw)
  To: npostavs; +Cc: 10494, aaronecay

> From: npostavs@users.sourceforge.net
> Cc: 10494@debbugs.gnu.org,  aaronecay@gmail.com
> Date: Sat, 13 Aug 2016 17:42:03 -0400
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> >
> > How about mentioning describe-categories as well?
> 
> Okay, so:
> 
> diff --git i/src/category.c w/src/category.c
> index 4397f66..8315797 100644
> --- i/src/category.c
> +++ w/src/category.c
> @@ -336,6 +336,8 @@ DEFUN ("modify-category-entry", Fmodify_category_entry,
>  the current buffer's category table.
>  CHARACTER can be either a single character or a cons representing the
>  lower and upper ends of an inclusive character range to modify.
> +CATEGORY must be a category name (a character between ` ' and `~').
> +Use `describe-categories' to see existing category names.
>  If optional fourth argument RESET is non-nil,
>  then delete CATEGORY from the category set instead of adding it.  */)
>    (Lisp_Object character, Lisp_Object category, Lisp_Object table, Lisp_Object reset)

LGTM, thanks.

> Anyway, do you think there is anything else to do about this bug?

Not that I can see, no.






* bug#10494: 24.0.92; Syntax table and non-ASCII character interaction
  2016-08-14  2:32                       ` Eli Zaretskii
@ 2016-08-14  2:58                         ` npostavs
  0 siblings, 0 replies; 17+ messages in thread
From: npostavs @ 2016-08-14  2:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 10494, aaronecay

tags 10494 notabug
close 10494 
quit

Eli Zaretskii <eliz@gnu.org> writes:
>
> LGTM, thanks.

Pushed as 8d681476 "Document CATEGORY arg to modify-category-entry"

>
>> Anyway, do you think there is anything else to do about this bug?
>
> Not that I can see, no.

Okay, I'm closing it.





