bug#1726: 23.0.60; end-of-sentence and non-breaking space

unofficial mirror of bug-gnu-emacs@gnu.org 

* bug#1726: 23.0.60; end-of-sentence and non-breaking space
@ 2008-12-29 10:23 Richard M Stallman
  2011-09-11 18:43 ` Lars Magne Ingebrigtsen
  0 siblings, 1 reply; 24+ messages in thread
From: Richard M Stallman @ 2008-12-29 10:23 UTC (permalink / raw)
  To: emacs-pretest-bug

forward-sentence does not treat non-breaking space as a space
for purposes of sentence ends.

I will fix this as soon as I know a way to put non-breaking space
into a string constant.

In GNU Emacs 23.0.60.15 (mipsel-unknown-linux-gnu, GTK+ Version 2.12.11)
 of 2008-12-22 on lemote-yeeloong
configured using `configure  'CFLAGS=-O0 -g -Wno-pointer-sign' 'mipsel-unknown-linux-gnu' 'build_alias=mipsel-unknown-linux-gnu' 'host_alias=mipsel-unknown-linux-gnu' 'target_alias=mipsel-unknown-linux-gnu''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

Major mode: Mail

Minor modes in effect:
  gpm-mouse-mode: t
  tooltip-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  global-auto-composition-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t
  abbrev-mode: t

Recent input:
r e SPC s h o u l d SPC b e SPC a SPC C-x C-f ESC p 
ESC DEL ESC DEL l r e a d . c RET C-s C-j r e a d _ 
s c DEL DEL e s c C-v ESC v C-v C-v C-v C-v C-v C-v 
C-u C-x m C-a C-k P e r h a p s SPC \ N SPC s h o u 
d l SPC g i v e ESC b C-b C-b C-t C-e SPC n o n - b 
r e a k i n g SPC s o p a c e . ESC b C-f C-d C-c C-c 
C-x b o u t g TAB RET g C-p C-p C-n C-n R o u t - 2 
5 RET C-p C-p e C-u C-u C-n C-u C-u C-n C-p C-p C-o 
C-o I SPC n e e d SPC t h i s SPC ESC DEL t o SPC s 
o l v e SPC t h i s SPC p r o b l e m SPC i n SPC o 
r d e r SPC t o SPC i DEL f i x SPC p a r a g r a p 
h s . e l RET t o SPC h a b d l e C-b C-b C-b DEL n 
C-p C-e ESC DEL ESC DEL f o r w a r d - s e n t e n 
c e C-n SPC n o n ESC / ESC SPC ESC / . C-x C-s C-a 
C-p C-p C-@ C-n C-n C-n C-w C-x C-s ESC x r e p o r 
t SPC e m a v s SPC DEL DEL c s SPC b u g RET

Recent messages:
Auto save file for draft message exists; consider M-x mail-recover
Sending...
Wrote /home/rms/outgoing/out-24
Sending...done
Move: 1 of 1
Move: 1 file
Wrote /home/rms/outgoing/out-24
Mark set
Auto-saving...done
Wrote /home/rms/outgoing/out-24

^ permalink raw reply	[flat|nested] 24+ messages in thread

* bug#1726: 23.0.60; end-of-sentence and non-breaking space
@ 2009-01-01  3:47 Chong Yidong
  0 siblings, 0 replies; 24+ messages in thread
From: Chong Yidong @ 2009-01-01  3:47 UTC (permalink / raw)
  To: emacs-devel; +Cc: 1726, rms, 1727

From bug#1726 and bug#1727:

> forward-sentence does not treat non-breaking space as a space for
> purposes of sentence ends.
...
> When I type C-x = at a non-breaking space, it tells me that it
> has code 160, hex a0.  But when I execute (insert "\xa0"),
> it inserts something that displays as `\240' and for which C-x =
> displays this:
>
>    Char:   (4194208, #o17777640, #x3fffa0, raw-byte) point=198 of 211
>    (93%) column=5
>
> Is that a bug?  It seems quite confusing to me.

ISTR that there was an extended discussion about classifying
non-breaking spaces on this list a while back.  But I can't find it now.
Does anyone remember the details?







* bug#1726: 23.0.60; end-of-sentence and non-breaking space
       [not found] <87y6xv7kmc.fsf@cyd.mit.edu>
  2009-01-02  1:25 ` bug#1727: 23.0.60; end-of-sentence and non-breaking space Richard M Stallman
@ 2009-01-02  1:25 ` Richard M Stallman
  2009-01-02  2:38 ` bug#1727: " Drew Adams
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Richard M Stallman @ 2009-01-02  1:25 UTC (permalink / raw)
  To: Chong Yidong; +Cc: 1726, 1727, emacs-devel

    > When I type C-x = at a non-breaking space, it tells me that it
    > has code 160, hex a0.  But when I execute (insert "\xa0"),
    > it inserts something that displays as `\240' and for which C-x =
    > displays this:

    >    Char:   (4194208, #o17777640, #x3fffa0, raw-byte) point=198 of 211
    >    (93%) column=5
    >
    > Is that a bug?  It seems quite confusing to me.

    ISTR that there was an extended discussion about classifying
    non-breaking spaces on this list a while back.  But I can't find it now.
    Does anyone remember the details?

I am not sure we are talking about the same question.
The issue I am raising is not one of classifying it,
it is that these two different character codes get used
and I don't see an explanation of what's going on.







* bug#1726: bug#1727: 23.0.60; end-of-sentence and non-breaking space
       [not found] <87y6xv7kmc.fsf@cyd.mit.edu>
                   ` (2 preceding siblings ...)
  2009-01-02  2:38 ` bug#1727: " Drew Adams
@ 2009-01-02  2:38 ` Drew Adams
  2009-01-02  4:11 ` bug#1726: " Stefan Monnier
  2009-01-02  4:11 ` bug#1727: " Stefan Monnier
  5 siblings, 0 replies; 24+ messages in thread
From: Drew Adams @ 2009-01-02  2:38 UTC (permalink / raw)
  To: 'Chong Yidong', 1727, emacs-devel; +Cc: 1726, rms

> ISTR that there was an extended discussion about classifying
> non-breaking spaces on this list a while back.  But I can't 
> find it now. Does anyone remember the details?

Dunno if this is what you were thinking of, but there was this discussion about
treating (classifying) nonbreaking space as whitespace:

http://lists.gnu.org/archive/html/emacs-devel/2007-06/msg01089.html







* bug#1726: 23.0.60; end-of-sentence and non-breaking space
       [not found] <87y6xv7kmc.fsf@cyd.mit.edu>
                   ` (3 preceding siblings ...)
  2009-01-02  2:38 ` bug#1726: " Drew Adams
@ 2009-01-02  4:11 ` Stefan Monnier
  2009-01-02  4:11 ` bug#1727: " Stefan Monnier
  5 siblings, 0 replies; 24+ messages in thread
From: Stefan Monnier @ 2009-01-02  4:11 UTC (permalink / raw)
  To: Chong Yidong; +Cc: 1726, 1727, rms, emacs-devel

>> has code 160, hex a0.  But when I execute (insert "\xa0"),
>> it inserts something that displays as `\240' and for which C-x =
>> displays this:
>> 
>> Char:   (4194208, #o17777640, #x3fffa0, raw-byte) point=198 of 211
>> (93%) column=5
>> 
>> Is that a bug?  It seems quite confusing to me.

This raw-byte char is what used to be called an eight-bit-control (or
eight-bit-graphic depending on the actual value) char.

I.e. "\xa0" is treated as a string that contains the \xa0 byte (i.e. an
eight-bit-* (aka raw-byte) char) rather than the \xa0 char (a latin-1
non-breaking space).


        Stefan
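
The distinction described above can be sketched as follows.  This is a
minimal illustration, assuming the behavior of recent Emacs releases
(the 23.0.60 pretest may differ in detail); the nbsp-demo-* names are
made up for this example and do not come from the Emacs sources.

```elisp
;; Illustrative names only; not taken from the Emacs sources.
(defvar nbsp-demo-raw "\xa0"
  "String literal written with a two-digit hex escape.")
(defvar nbsp-demo-char "\u00a0"
  "String literal written with a Unicode escape.")

;; The hex-escape literal reads as a unibyte string holding the byte #xa0.
(multibyte-string-p nbsp-demo-raw)            ; nil
;; The Unicode-escape literal reads as a multibyte string holding U+00A0.
(multibyte-string-p nbsp-demo-char)           ; t

;; Converting the unibyte string to multibyte turns the byte into a
;; raw-byte character, the same code that C-x = reported in this thread.
(aref (string-to-multibyte nbsp-demo-raw) 0)  ; 4194208, i.e. #x3fffa0
(aref nbsp-demo-char 0)                       ; 160, i.e. #xa0
```

So the two literals look alike in source but denote different things: one
is a byte, the other is the no-break space character.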







* bug#1726: 23.0.60; end-of-sentence and non-breaking space
       [not found] <jwvljtupcw3.fsf-monnier+emacsbugreports@gnu.org>
@ 2009-01-02 17:13 ` Richard M Stallman
       [not found] ` <E1LInaw-0006om-DT@fencepost.gnu.org>
  1 sibling, 0 replies; 24+ messages in thread
From: Richard M Stallman @ 2009-01-02 17:13 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: cyd, 1726, emacs-devel

    This raw-byte char is what used to be called an eight-bit-control (or
    eight-bit-graphic depending on the actual value) char.

    I.e. "\xa0" is treated as a string that contains the \xa0 byte (i.e. an
    eight-bit-* (aka raw-byte) char) rather than the \xa0 char (a latin-1
    non-breaking space).

1. Is that the right thing for \xa0 in a string to mean?
Or should it mean the character with code xa0?

2. I find it hard to think about that question since I don't see any
documentation explaining how this ought to work.  That documentation
is essential.







* bug#1726: 23.0.60; end-of-sentence and non-breaking space
       [not found] ` <E1LInaw-0006om-DT@fencepost.gnu.org>
@ 2009-01-03  3:06   ` Stefan Monnier
       [not found]   ` <jwv7i5dnlcy.fsf-monnier+emacsbugreports@gnu.org>
  1 sibling, 0 replies; 24+ messages in thread
From: Stefan Monnier @ 2009-01-03  3:06 UTC (permalink / raw)
  To: rms; +Cc: cyd, 1726, emacs-devel

>     This raw-byte char is what used to be called an eight-bit-control (or
>     eight-bit-graphic depending on the actual value) char.

>     I.e. "\xa0" is treated as a string that contains the \xa0 byte (i.e. an
>     eight-bit-* (aka raw-byte) char) rather than the \xa0 char (a latin-1
>     non-breaking space).

> 1. Is that the right thing for \xa0 in a string to mean?
> Or should it mean the character with code xa0?

> 2. I find it hard to think about that question since I don't see any
> documentation explaining how this ought to work.  That documentation
> is essential.

Good point.  Especially because I think this changed from Emacs-20 to
Emacs-21, and I think it also changed now from Emacs-22 to Emacs-23.

IIUC if you want the character with code #xa0, then using \u00a0 would
seem like the most unambiguous option (I notice that "\ua0" gives
a weird error "Non-hex digit used for Unicode escape").

Not sure what \NNN or \xMM should do.


        Stefan







* bug#1726: 23.0.60; end-of-sentence and non-breaking space
       [not found]   ` <jwv7i5dnlcy.fsf-monnier+emacsbugreports@gnu.org>
@ 2009-01-03  9:54     ` Eli Zaretskii
  2009-01-03 15:21     ` Richard M Stallman
       [not found]     ` <E1LJ8K6-0003dL-UD@fencepost.gnu.org>
  2 siblings, 0 replies; 24+ messages in thread
From: Eli Zaretskii @ 2009-01-03  9:54 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: cyd, emacs-devel, rms, 1726

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Fri, 02 Jan 2009 22:06:21 -0500
> Cc: cyd@stupidchicken.com, 1726@emacsbugs.donarmstrong.com, emacs-devel@gnu.org
> 
> Good point.  Especially because I think this changed from Emacs-20 to
> Emacs-21, and I think it also changed now from Emacs-22 to Emacs-23.

I think the change in Emacs 23 is OK, but it needs to be consistent in
characters and strings.

> IIUC if you want the character with code #xa0, then using \u00a0 would
> seem like the most unambiguous option

Agreed.  Using \uNNNN is an unambiguous way of saying you want a Unicode
character whose codepoint is NNNN in hex.

> Not sure what \NNN or \xMM should do.

I think they should insert a raw byte with that code, and I think they
should do that both in characters and in strings, so that the
inconsistent behavior reported by Richard will become consistent.







* bug#1726: 23.0.60; end-of-sentence and non-breaking space
       [not found]   ` <jwv7i5dnlcy.fsf-monnier+emacsbugreports@gnu.org>
  2009-01-03  9:54     ` Eli Zaretskii
@ 2009-01-03 15:21     ` Richard M Stallman
       [not found]     ` <E1LJ8K6-0003dL-UD@fencepost.gnu.org>
  2 siblings, 0 replies; 24+ messages in thread
From: Richard M Stallman @ 2009-01-03 15:21 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: cyd, 1726, emacs-devel

    IIUC if you want the character with code #xa0, then using \u00a0 would
    seem like the most unambiguous option (I notice that "\ua0" gives
    a weird error "Non-hex digit used for Unicode escape").

I expected \xa0 to give me that character.  It still seems strange
that it would do anything else.

When I read the documentation of \u, I thought it meant "unicode" as
opposed to "Emacs's internal code".  Since I knew that Emacs now
follows unicode for these characters, I saw no reason to consider
using \u.








* bug#1726: 23.0.60; end-of-sentence and non-breaking space
       [not found]     ` <E1LJ8K6-0003dL-UD@fencepost.gnu.org>
@ 2009-01-03 16:44       ` Eli Zaretskii
       [not found]       ` <utz8gmj9t.fsf@gnu.org>
  1 sibling, 0 replies; 24+ messages in thread
From: Eli Zaretskii @ 2009-01-03 16:44 UTC (permalink / raw)
  To: rms; +Cc: cyd, emacs-devel, 1726

> From: Richard M Stallman <rms@gnu.org>
> Date: Sat, 03 Jan 2009 10:21:58 -0500
> Cc: cyd@stupidchicken.com, 1726@emacsbugs.donarmstrong.com, emacs-devel@gnu.org
> 
>     IIUC if you want the character with code #xa0, then using \u00a0 would
>     seem like the most unambiguous option (I notice that "\ua0" gives
>     a weird error "Non-hex digit used for Unicode escape").
> 
> I expected \xa0 to give me that character.  It still seems strange
> that it would do anything else.

We need some way of inserting raw 8-bit bytes, because otherwise code
that encodes and decodes text in Lisp will not work.  For inserting
characters, we have the \u alternative; but I don't think there's an
alternative for raw bytes except inserting \xNN.







* bug#1726: 23.0.60; end-of-sentence and non-breaking space
       [not found]       ` <utz8gmj9t.fsf@gnu.org>
@ 2009-01-04  2:16         ` Richard M Stallman
       [not found]         ` <E1LJIXN-0008Vg-Pe@fencepost.gnu.org>
  2009-01-05  6:37         ` Kenichi Handa
  2 siblings, 0 replies; 24+ messages in thread
From: Richard M Stallman @ 2009-01-04  2:16 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: cyd, 1726, emacs-devel

    We need some way of inserting raw 8-bit bytes, because otherwise code
    that encodes and decodes text in Lisp will not work.  For inserting
    characters, we have the \u alternative; but I don't think there's an
    alternative for raw bytes except inserting \xNN.

Maybe that is a valid reason for the current behavior, but that
doesn't alter the need for the manual to document the behavior.

Meanwhile, the Chinese and Chinese-derived character codes
do not follow Unicode.  So you can't enter them with \u.
What is the way to enter them?







* bug#1726: 23.0.60; end-of-sentence and non-breaking space
       [not found]         ` <E1LJIXN-0008Vg-Pe@fencepost.gnu.org>
@ 2009-01-04  4:18           ` Eli Zaretskii
  2009-01-04  4:29           ` Jason Rumney
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread
From: Eli Zaretskii @ 2009-01-04  4:18 UTC (permalink / raw)
  To: rms; +Cc: cyd, 1726, emacs-devel

> From: Richard M Stallman <rms@gnu.org>
> CC: cyd@stupidchicken.com, emacs-devel@gnu.org,
> 	monnier@iro.umontreal.ca, 1726@emacsbugs.donarmstrong.com
> Date: Sat, 03 Jan 2009 21:16:21 -0500
> 
>     We need some way of inserting raw 8-bit bytes, because otherwise code
>     that encodes and decodes text in Lisp will not work.  For inserting
>     characters, we have the \u alternative; but I don't think there's an
>     alternative for raw bytes except inserting \xNN.
> 
> Maybe that is a valid reason for the current behavior, but that
> doesn't alter the need for the manual to document the behavior.

That was an attempt to explain the reasons, not a claim that they
don't need to be documented.

> Meanwhile, the Chinese and Chinese-derived character codes
> do not follow Unicode.  So you can't enter them with \u.
> What is the way to enter them?

The problem at hand exists only for codes that are less than FF hex.







* bug#1726: 23.0.60; end-of-sentence and non-breaking space
       [not found]         ` <E1LJIXN-0008Vg-Pe@fencepost.gnu.org>
  2009-01-04  4:18           ` Eli Zaretskii
@ 2009-01-04  4:29           ` Jason Rumney
       [not found]           ` <49603B11.5040101@gnu.org>
  2009-01-05  7:11           ` Kenichi Handa
  3 siblings, 0 replies; 24+ messages in thread
From: Jason Rumney @ 2009-01-04  4:29 UTC (permalink / raw)
  To: rms, 1726; +Cc: cyd, emacs-devel

Richard M Stallman wrote:
> Meanwhile, the Chinese and Chinese-derived character codes
> do not follow Unicode.

They do in Emacs 23, though I think if you enter \x1234, it will be 
treated the same as \u1234, as characters with more than 8 bits are 
clearly not eight-bit raw bytes.








* bug#1726: 23.0.60; end-of-sentence and non-breaking space
       [not found]           ` <49603B11.5040101@gnu.org>
@ 2009-01-04 16:45             ` Eli Zaretskii
  0 siblings, 0 replies; 24+ messages in thread
From: Eli Zaretskii @ 2009-01-04 16:45 UTC (permalink / raw)
  To: Jason Rumney; +Cc: cyd, 1726, rms, emacs-devel

> Date: Sun, 04 Jan 2009 12:29:05 +0800
> From: Jason Rumney <jasonr@gnu.org>
> Cc: Eli Zaretskii <eliz@gnu.org>, cyd@stupidchicken.com, emacs-devel@gnu.org
> 
> Richard M Stallman wrote:
> > Meanwhile, the Chinese and Chinese-derived character codes
> > do not follow Unicode.
> 
> They do in Emacs 23

Not all of them, I think.







* bug#1726: 23.0.60; end-of-sentence and non-breaking space
       [not found] <umye7n1pt.fsf@gnu.org>
@ 2009-01-04 21:42 ` Richard M Stallman
  0 siblings, 0 replies; 24+ messages in thread
From: Richard M Stallman @ 2009-01-04 21:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: cyd, emacs-devel, 1726

    > Meanwhile, the Chinese and Chinese-derived character codes
    > do not follow Unicode.  So you can't enter them with \u.
    > What is the way to enter them?

    The problem at hand exists only for codes that are less than FF hex.

Maybe, but isn't there a similar problem for Chinese-derived
characters?  How does one specify these codes in a string constant?
Shouldn't there be some way?

Maybe there is never a need to do it; maybe we don't need to add a
feature for it.  But if we don't, we should document that there is
currently no way.


* bug#1726: 23.0.60; end-of-sentence and non-breaking space
       [not found]       ` <utz8gmj9t.fsf@gnu.org>
  2009-01-04  2:16         ` Richard M Stallman
       [not found]         ` <E1LJIXN-0008Vg-Pe@fencepost.gnu.org>
@ 2009-01-05  6:37         ` Kenichi Handa
  2 siblings, 0 replies; 24+ messages in thread
From: Kenichi Handa @ 2009-01-05  6:37 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: cyd, 1726, rms, emacs-devel

In article <utz8gmj9t.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > From: Richard M Stallman <rms@gnu.org>
> > Date: Sat, 03 Jan 2009 10:21:58 -0500
> > Cc: cyd@stupidchicken.com, 1726@emacsbugs.donarmstrong.com, emacs-devel@gnu.org
> > 
> >     IIUC if you want the character with code #xa0, then using \u00a0 would
> >     seem like the most unambiguous option (I notice that "\ua0" gives
> >     a weird error "Non-hex digit used for Unicode escape").
> > 
> > I expected \xa0 to give me that character.  It still seems strange
> > that it would do anything else.

> We need some way of inserting raw 8-bit bytes, because otherwise code
> that encodes and decodes text in Lisp will not work.  For inserting
> characters, we have the \u alternative; but I don't think there's an
> alternative for raw bytes except inserting \xNN.

I modified read_escape to treat "\xXX" as a raw-byte code
but treat "\xXXX.." as a character code U+XXX...  As far as
I remember, this is to keep backward compatibility.

And we do have an alternative for raw bytes: the octal form,
something like "\240".

---
Kenichi Handa
handa@m17n.org
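
The rule described above can be checked with a few forms.  This is only
a sketch, assuming a recent Emacs; it exercises the Lisp reader from the
outside and says nothing about how read_escape is implemented.

```elisp
;; Two hex digits: the literal is a unibyte string holding one raw byte.
(multibyte-string-p "\xa0")    ; nil
(aref "\xa0" 0)                ; 160

;; Three or more hex digits: the literal holds the character with that code.
(multibyte-string-p "\x3b1")   ; t
(aref "\x3b1" 0)               ; 945, i.e. #x3b1 (Greek small alpha)

;; The octal form is another way to write the same raw byte.
(string= "\xa0" "\240")        ; t
```

Note that a hex escape consumes every hex digit that follows it, so a
raw byte followed by a literal hex digit needs a terminator such as "\ "
(backslash, space) between the two.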







* bug#1726: 23.0.60; end-of-sentence and non-breaking space
       [not found]         ` <E1LJIXN-0008Vg-Pe@fencepost.gnu.org>
                             ` (2 preceding siblings ...)
       [not found]           ` <49603B11.5040101@gnu.org>
@ 2009-01-05  7:11           ` Kenichi Handa
  3 siblings, 0 replies; 24+ messages in thread
From: Kenichi Handa @ 2009-01-05  7:11 UTC (permalink / raw)
  To: rms; +Cc: emacs-devel, cyd, 1726

In article <E1LJIXN-0008Vg-Pe@fencepost.gnu.org>, Richard M Stallman <rms@gnu.org> writes:

>     We need some way of inserting raw 8-bit bytes, because otherwise code
>     that encodes and decodes text in Lisp will not work.  For inserting
>     characters, we have the \u alternative; but I don't think there's an
>     alternative for raw bytes except inserting \xNN.

> Maybe that is a valid reason for the current behavior, but that
> doesn't alter the need for the manual to document the behavior.

> Meanwhile, the Chinese and Chinese-derived character codes
> do not follow Unicode.  So you can't enter them with \u.
> What is the way to enter them?

Most Chinese and Chinese-derived character codes are
unified into the Unicode area.  Only a few codes can't be
unified with Unicode, and are thus decoded into the character
space over #x110000.  But, in that sense, Chinese and
Chinese-derived character codes are not special.  There
exist several non-Chinese character sets (e.g. Tibetan)
containing characters that don't exist in Unicode, and
they are decoded into the character space over #x110000 too.

But, all of them can be accessed by "\U00XXXXXX".

---
Kenichi Handa
handa@m17n.org


* Re: bug#1726: 23.0.60; end-of-sentence and non-breaking space
       [not found] <E1LJjcR-0001uN-1l@etlken.m17n.org>
@ 2009-01-06  0:01 ` Richard M Stallman
  2009-01-06  4:08   ` Eli Zaretskii
  0 siblings, 1 reply; 24+ messages in thread
From: Richard M Stallman @ 2009-01-06  0:01 UTC (permalink / raw)
  To: Kenichi Handa, 1726
  Cc: cyd, bug-submit-list, bug-gnu-emacs, 1726, emacs-devel

      There
    exist several non-Chinese character sets (e.g. tibetan)
    containing characters that don't exist in Unicode, and
    they are decoded into the character space over #x110000 too.

    But, all of them can be accessed by "\U00XXXXXX".

Can you please document this (and the rest of what we have discussed
in this thread)?





* Re: bug#1726: 23.0.60; end-of-sentence and non-breaking space
  2009-01-06  0:01 ` bug#1726: " Richard M Stallman
@ 2009-01-06  4:08   ` Eli Zaretskii
  0 siblings, 0 replies; 24+ messages in thread
From: Eli Zaretskii @ 2009-01-06  4:08 UTC (permalink / raw)
  To: rms; +Cc: 1726, emacs-devel, bug-gnu-emacs, cyd, handa

> From: Richard M Stallman <rms@gnu.org>
> Date: Mon, 05 Jan 2009 19:01:02 -0500
> Cc: cyd@stupidchicken.com, bug-submit-list@donarmstrong.com,
> 	bug-gnu-emacs@gnu.org, 1726@emacsbugs.donarmstrong.com, emacs-devel@gnu.org
> 
>       There
>     exist several non-Chinese character sets (e.g. tibetan)
>     containing characters that don't exist in Unicode, and
>     they are decoded into the character space over #x110000 too.
> 
>     But, all of them can be accessed by "\U00XXXXXX".
> 
> Can you please document this (and the rest of what we have discussed
> in this thread)?

You already asked me to do this, and it's on my TODO.





* bug#1726: 23.0.60; end-of-sentence and non-breaking space
  2008-12-29 10:23 Richard M Stallman
@ 2011-09-11 18:43 ` Lars Magne Ingebrigtsen
  2012-03-27 23:27   ` Glenn Morris
  0 siblings, 1 reply; 24+ messages in thread
From: Lars Magne Ingebrigtsen @ 2011-09-11 18:43 UTC (permalink / raw)
  To: rms; +Cc: 1726

Richard M Stallman <rms@gnu.org> writes:

> forward-sentence does not treat non-breaking space as a space
> for purposes of sentence ends.
>
> I will fix this as soon as I know a way to put non-breaking space
> into a string constant.

The discussion then turned to the various string literal syntaxes.

Has this issue been resolved so that the bug report can be closed?

-- 
(domestic pets only, the antidote for overdose, milk.)
  bloggy blog http://lars.ingebrigtsen.no/






* bug#1726: 23.0.60; end-of-sentence and non-breaking space
  2011-09-11 18:43 ` Lars Magne Ingebrigtsen
@ 2012-03-27 23:27   ` Glenn Morris
  0 siblings, 0 replies; 24+ messages in thread
From: Glenn Morris @ 2012-03-27 23:27 UTC (permalink / raw)
  To: 1726-done

Version: 23.1

Lars Magne Ingebrigtsen wrote:

> Richard M Stallman <rms@gnu.org> writes:
>
>> forward-sentence does not treat non-breaking space as a space
>> for purposes of sentence ends.
[...]
> The discussion then turned to the various string literal syntaxes.
>
> Has this issue been resolved so that the bug report can be closed?

I see the issue in 22.3 but not 23.1 and later, so I assume it was fixed
ages ago.

2009-01-16  Richard M Stallman  <rms at gnu.org>

 * textmodes/paragraphs.el (sentence-end): Accept non-break space.
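
What "accept non-break space" means for a sentence-end pattern can be
sketched like this.  The regexp below is a hypothetical, much simplified
stand-in written for this example; it is not the pattern from
paragraphs.el, which handles more punctuation and several user options.

```elisp
;; Hypothetical, simplified pattern; not the real `sentence-end' regexp.
(defvar nbsp-demo-sentence-end "[.?!][ \u00a0][ \u00a0]"
  "Terminator followed by two spaces, each of which may be a no-break space.")

;; Ordinary spaces after the period match at the period (index 5).
(string-match-p nbsp-demo-sentence-end "First.  Second")            ; 5
;; No-break spaces after the period now match as well.
(string-match-p nbsp-demo-sentence-end "First.\u00a0\u00a0Second")  ; 5
;; A pattern that lists only the plain space misses the same text.
(string-match-p "[.?!]  " "First.\u00a0\u00a0Second")               ; nil
```

The no-break space is written as \u00a0 here, the spelling the thread
settled on as unambiguous for a string constant.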





