* Re: 23.0.60; end-of-sentence and non-breaking space
@ 2009-01-01 3:47 Chong Yidong
2009-01-02 1:25 ` Richard M Stallman
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Chong Yidong @ 2009-01-01 3:47 UTC (permalink / raw)
To: emacs-devel; +Cc: 1726, rms, 1727
From bug#1726 and bug#1727:
> forward-sentence does not treat non-breaking space as a space for
> purposes of sentence ends.
...
> When I type C-x = at a non-breaking space, it tells me that it
> has code 160, hex a0. But when I execute (insert "\xa0"),
> it inserts something that displays as `\240' and for which C-x =
> displays this:
>
> Char: (4194208, #o17777640, #x3fffa0, raw-byte) point=198 of 211
> (93%) column=5
>
> Is that a bug? It seems quite confusing to me.
ISTR that there was an extended discussion about classifying
non-breaking spaces on this list a while back. But I can't find it now.
Does anyone remember the details?
* Re: 23.0.60; end-of-sentence and non-breaking space
2009-01-01 3:47 23.0.60; end-of-sentence and non-breaking space Chong Yidong
@ 2009-01-02 1:25 ` Richard M Stallman
2009-01-02 2:38 ` bug#1727: " Drew Adams
2009-01-02 4:11 ` Stefan Monnier
2 siblings, 0 replies; 18+ messages in thread
From: Richard M Stallman @ 2009-01-02 1:25 UTC (permalink / raw)
To: Chong Yidong; +Cc: 1726, 1727, emacs-devel
>> When I type C-x = at a non-breaking space, it tells me that it
>> has code 160, hex a0. But when I execute (insert "\xa0"),
>> it inserts something that displays as `\240' and for which C-x =
>> displays this:
>> Char: (4194208, #o17777640, #x3fffa0, raw-byte) point=198 of 211
>> (93%) column=5
>>
>> Is that a bug? It seems quite confusing to me.
> ISTR that there was an extended discussion about classifying
> non-breaking spaces on this list a while back. But I can't find it now.
> Does anyone remember the details?
I am not sure we are talking about the same question.
The issue I am raising is not one of classifying it;
it is that these two different character codes get used,
and I don't see an explanation of what's going on.
* RE: bug#1727: 23.0.60; end-of-sentence and non-breaking space
2009-01-01 3:47 23.0.60; end-of-sentence and non-breaking space Chong Yidong
2009-01-02 1:25 ` Richard M Stallman
@ 2009-01-02 2:38 ` Drew Adams
2009-01-02 4:11 ` Stefan Monnier
2 siblings, 0 replies; 18+ messages in thread
From: Drew Adams @ 2009-01-02 2:38 UTC (permalink / raw)
To: 'Chong Yidong', 1727, emacs-devel; +Cc: 1726, rms
> ISTR that there was an extended discussion about classifying
> non-breaking spaces on this list a while back. But I can't
> find it now. Does anyone remember the details?
Dunno if this is what you were thinking of, but there was this discussion about
treating (classifying) nonbreaking space as whitespace:
http://lists.gnu.org/archive/html/emacs-devel/2007-06/msg01089.html
* Re: 23.0.60; end-of-sentence and non-breaking space
2009-01-01 3:47 23.0.60; end-of-sentence and non-breaking space Chong Yidong
2009-01-02 1:25 ` Richard M Stallman
2009-01-02 2:38 ` bug#1727: " Drew Adams
@ 2009-01-02 4:11 ` Stefan Monnier
2009-01-02 17:13 ` Richard M Stallman
2 siblings, 1 reply; 18+ messages in thread
From: Stefan Monnier @ 2009-01-02 4:11 UTC (permalink / raw)
To: Chong Yidong; +Cc: 1726, 1727, rms, emacs-devel
>> has code 160, hex a0. But when I execute (insert "\xa0"),
>> it inserts something that displays as `\240' and for which C-x =
>> displays this:
>>
>> Char: (4194208, #o17777640, #x3fffa0, raw-byte) point=198 of 211
>> (93%) column=5
>>
>> Is that a bug? It seems quite confusing to me.
This raw-byte char is what used to be called an eight-bit-control (or
eight-bit-graphic depending on the actual value) char.
I.e. "\xa0" is treated as a string that contains the \xa0 byte (i.e. an
eight-bit-* (aka raw-byte) char) rather than the \xa0 char (a latin-1
non-breaking space).
Stefan
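The distinction described above can be checked directly. What follows is only a minimal sketch, assuming a current Emacs and a multibyte buffer; `nbsp-demo-inserted-code` is a hypothetical helper name introduced for illustration, not an Emacs function.

```elisp
;; Hypothetical helper (not part of Emacs): insert THING, a string or a
;; character, into a fresh multibyte buffer and return the code of the
;; character that ends up just before point.
(defun nbsp-demo-inserted-code (thing)
  (with-temp-buffer
    (set-buffer-multibyte t)
    (insert thing)
    (char-before)))

;; "\xa0" is read as a unibyte string holding the single byte #xa0 ...
(multibyte-string-p "\xa0")        ; => nil
;; ... so inserting it produces the raw-byte char reported in the bug.
(nbsp-demo-inserted-code "\xa0")   ; => 4194208 (#x3fffa0)
```

The value 4194208 is the same one that C-x = printed in the original report, which suggests the report shows the raw byte rather than a Latin-1 non-breaking space.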
* Re: 23.0.60; end-of-sentence and non-breaking space
2009-01-02 4:11 ` Stefan Monnier
@ 2009-01-02 17:13 ` Richard M Stallman
2009-01-03 3:06 ` Stefan Monnier
0 siblings, 1 reply; 18+ messages in thread
From: Richard M Stallman @ 2009-01-02 17:13 UTC (permalink / raw)
To: Stefan Monnier; +Cc: cyd, 1726, emacs-devel
> This raw-byte char is what used to be called an eight-bit-control (or
> eight-bit-graphic depending on the actual value) char.
> I.e. "\xa0" is treated as a string that contains the \xa0 byte (i.e. an
> eight-bit-* (aka raw-byte) char) rather than the \xa0 char (a latin-1
> non-breaking space).
1. Is that the right thing for \xa0 in a string to mean?
Or should it mean the character with code xa0?
2. I find it hard to think about that question since I don't see any
documentation explaining how this ought to work. That documentation
is essential.
* Re: 23.0.60; end-of-sentence and non-breaking space
2009-01-02 17:13 ` Richard M Stallman
@ 2009-01-03 3:06 ` Stefan Monnier
2009-01-03 9:54 ` Eli Zaretskii
2009-01-03 15:21 ` Richard M Stallman
0 siblings, 2 replies; 18+ messages in thread
From: Stefan Monnier @ 2009-01-03 3:06 UTC (permalink / raw)
To: rms; +Cc: cyd, 1726, emacs-devel
> This raw-byte char is what used to be called an eight-bit-control (or
> eight-bit-graphic depending on the actual value) char.
> I.e. "\xa0" is treated as a string that contains the \xa0 byte (i.e. an
> eight-bit-* (aka raw-byte) char) rather than the \xa0 char (a latin-1
> non-breaking space).
> 1. Is that the right thing for \xa0 in a string to mean?
> Or should it mean the character with code xa0?
> 2. I find it hard to think about that question since I don't see any
> documentation explaining how this ought to work. That documentation
> is essential.
Good point. Especially because I think this changed from Emacs-20 to
Emacs-21, and I think it also changed now from Emacs-22 to Emacs-23.
IIUC if you want the character with code #xa0, then using \u00a0 would
seem like the most unambiguous option (I notice that "\ua0" gives
a weird error "Non-hex digit used for Unicode escape").
Not sure what \NNN or \xMM should do.
Stefan
* Re: 23.0.60; end-of-sentence and non-breaking space
2009-01-03 3:06 ` Stefan Monnier
@ 2009-01-03 9:54 ` Eli Zaretskii
2009-01-03 15:21 ` Richard M Stallman
1 sibling, 0 replies; 18+ messages in thread
From: Eli Zaretskii @ 2009-01-03 9:54 UTC (permalink / raw)
To: Stefan Monnier; +Cc: cyd, emacs-devel, rms, 1726
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Fri, 02 Jan 2009 22:06:21 -0500
> Cc: cyd@stupidchicken.com, 1726@emacsbugs.donarmstrong.com, emacs-devel@gnu.org
>
> Good point. Especially because I think this changed from Emacs-20 to
> Emacs-21, and I think it also changed now from Emacs-22 to Emacs-23.
I think the change in Emacs 23 is OK, but it needs to be consistent in
characters and strings.
> IIUC if you want the character with code #xa0, then using \u00a0 would
> seem like the most unambiguous option
Agreed. Using \uNNNN is an unambiguous way of saying you want a Unicode
character whose codepoint is NNNN in hex.
> Not sure what \NNN or \xMM should do.
I think they should insert a raw byte with that code, and I think they
should do that both in characters and in strings, so that the
inconsistent behavior reported by Richard will become consistent.
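The inconsistency between characters and strings can be seen side by side. This is a rough sketch, assuming a current Emacs where a character literal `?\xa0` still denotes character 160; `nbsp-demo-codes` is a hypothetical helper name, not an Emacs function.

```elisp
;; Hypothetical helper (not part of Emacs): insert each of ARGS into its
;; own multibyte buffer and collect the resulting character codes.
(defun nbsp-demo-codes (&rest args)
  (mapcar (lambda (arg)
            (with-temp-buffer
              (set-buffer-multibyte t)
              (insert arg)
              (char-before)))
          args))

;; Character literal, \u string escape, \x string escape:
(nbsp-demo-codes ?\xa0 "\u00a0" "\xa0")
;; => (160 160 4194208)

;; A \u escape always makes the string multibyte.
(multibyte-string-p "\u00a0")   ; => t
```

Only the `\x` escape inside a string ends up as a raw byte; the character literal and the `\u` form both give the non-breaking space.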
* Re: 23.0.60; end-of-sentence and non-breaking space
2009-01-03 3:06 ` Stefan Monnier
2009-01-03 9:54 ` Eli Zaretskii
@ 2009-01-03 15:21 ` Richard M Stallman
2009-01-03 16:44 ` Eli Zaretskii
1 sibling, 1 reply; 18+ messages in thread
From: Richard M Stallman @ 2009-01-03 15:21 UTC (permalink / raw)
To: Stefan Monnier; +Cc: cyd, 1726, emacs-devel
> IIUC if you want the character with code #xa0, then using \u00a0 would
> seem like the most unambiguous option (I notice that "\ua0" gives
> a weird error "Non-hex digit used for Unicode escape").
I expected \xa0 to give me that character. It still seems strange
that it would do anything else.
When I read the documentation of \u, I thought it meant "unicode" as
opposed to "Emacs's internal code". Since I knew that Emacs now
follows unicode for these characters, I saw no reason to consider
using \u.
* Re: 23.0.60; end-of-sentence and non-breaking space
2009-01-03 15:21 ` Richard M Stallman
@ 2009-01-03 16:44 ` Eli Zaretskii
2009-01-04 2:16 ` Richard M Stallman
2009-01-05 6:37 ` Kenichi Handa
0 siblings, 2 replies; 18+ messages in thread
From: Eli Zaretskii @ 2009-01-03 16:44 UTC (permalink / raw)
To: rms; +Cc: cyd, emacs-devel, monnier, 1726
> From: Richard M Stallman <rms@gnu.org>
> Date: Sat, 03 Jan 2009 10:21:58 -0500
> Cc: cyd@stupidchicken.com, 1726@emacsbugs.donarmstrong.com, emacs-devel@gnu.org
>
> IIUC if you want the character with code #xa0, then using \u00a0 would
> seem like the most unambiguous option (I notice that "\ua0" gives
> a weird error "Non-hex digit used for Unicode escape").
>
> I expected \xa0 to give me that character. It still seems strange
> that it would do anything else.
We need some way of inserting raw 8-bit bytes, because otherwise code
that encodes and decodes text in Lisp will not work. For inserting
characters, we have the \u alternative; but I don't think there's an
alternative for raw bytes other than \xNN.
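To illustrate why encoding and decoding code needs raw bytes in string constants, here is a small sketch, assuming the standard `utf-8` and `iso-latin-1` coding systems of a current Emacs. Encoded text is a unibyte string, so the expected values have to be written with byte escapes.

```elisp
;; Encoding turns characters into bytes; the results are unibyte strings.
(equal (encode-coding-string "\u00a0" 'utf-8) "\xc2\xa0")      ; => t
(equal (encode-coding-string "\u00a0" 'iso-latin-1) "\xa0")    ; => t

;; Decoding goes the other way: the byte #xa0 becomes the character
;; U+00A0 when it is interpreted as Latin-1.
(equal (decode-coding-string "\xa0" 'iso-latin-1) "\u00a0")    ; => t
```

If `"\xa0"` meant the character rather than the byte, the first two comparisons would have no simple way to spell their right-hand sides with a hex escape.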
* Re: 23.0.60; end-of-sentence and non-breaking space
2009-01-03 16:44 ` Eli Zaretskii
@ 2009-01-04 2:16 ` Richard M Stallman
2009-01-04 4:18 ` Eli Zaretskii
` (2 more replies)
2009-01-05 6:37 ` Kenichi Handa
1 sibling, 3 replies; 18+ messages in thread
From: Richard M Stallman @ 2009-01-04 2:16 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cyd, 1726, monnier, emacs-devel
> We need some way of inserting raw 8-bit bytes, because otherwise code
> that encodes and decodes text in Lisp will not work. For inserting
> characters, we have the \u alternative; but I don't think there's an
> alternative for raw bytes other than \xNN.
Maybe that is a valid reason for the current behavior, but that
doesn't alter the need for the manual to document the behavior.
Meanwhile, the Chinese and Chinese-derived character codes
do not follow Unicode. So you can't enter them with \u.
What is the way to enter them?
* Re: 23.0.60; end-of-sentence and non-breaking space
2009-01-04 2:16 ` Richard M Stallman
@ 2009-01-04 4:18 ` Eli Zaretskii
2009-01-04 21:42 ` Richard M Stallman
2009-01-04 4:29 ` bug#1726: " Jason Rumney
2009-01-05 7:11 ` Kenichi Handa
2 siblings, 1 reply; 18+ messages in thread
From: Eli Zaretskii @ 2009-01-04 4:18 UTC (permalink / raw)
To: rms; +Cc: cyd, 1726, monnier, emacs-devel
> From: Richard M Stallman <rms@gnu.org>
> CC: cyd@stupidchicken.com, emacs-devel@gnu.org,
> monnier@iro.umontreal.ca, 1726@emacsbugs.donarmstrong.com
> Date: Sat, 03 Jan 2009 21:16:21 -0500
>
> We need some way of inserting raw 8-bit bytes, because otherwise code
> that encodes and decodes text in Lisp will not work. For inserting
> characters, we have the \u alternative; but I don't think there's an
> alternative for raw bytes other than \xNN.
>
> Maybe that is a valid reason for the current behavior, but that
> doesn't alter the need for the manual to document the behavior.
That was an attempt to explain the reasons, not a claim that the
behavior doesn't need to be documented.
> Meanwhile, the Chinese and Chinese-derived character codes
> do not follow Unicode. So you can't enter them with \u.
> What is the way to enter them?
The problem at hand exists only for codes that are less than FF hex.
* Re: bug#1726: 23.0.60; end-of-sentence and non-breaking space
2009-01-04 2:16 ` Richard M Stallman
2009-01-04 4:18 ` Eli Zaretskii
@ 2009-01-04 4:29 ` Jason Rumney
2009-01-04 16:45 ` Eli Zaretskii
2009-01-05 7:11 ` Kenichi Handa
2 siblings, 1 reply; 18+ messages in thread
From: Jason Rumney @ 2009-01-04 4:29 UTC (permalink / raw)
To: rms, 1726; +Cc: Eli Zaretskii, cyd, emacs-devel
Richard M Stallman wrote:
> Meanwhile, the Chinese and Chinese-derived character codes
> do not follow Unicode.
They do in Emacs 23, though I think if you enter \x1234, it will be
treated the same as \u1234, as characters with more than 8 bits are
clearly not eight-bit raw bytes.
* Re: bug#1726: 23.0.60; end-of-sentence and non-breaking space
2009-01-04 4:29 ` bug#1726: " Jason Rumney
@ 2009-01-04 16:45 ` Eli Zaretskii
0 siblings, 0 replies; 18+ messages in thread
From: Eli Zaretskii @ 2009-01-04 16:45 UTC (permalink / raw)
To: Jason Rumney; +Cc: cyd, 1726, rms, emacs-devel
> Date: Sun, 04 Jan 2009 12:29:05 +0800
> From: Jason Rumney <jasonr@gnu.org>
> Cc: Eli Zaretskii <eliz@gnu.org>, cyd@stupidchicken.com, emacs-devel@gnu.org
>
> Richard M Stallman wrote:
> > Meanwhile, the Chinese and Chinese-derived character codes
> > do not follow Unicode.
>
> They do in Emacs 23
Not all of them, I think.
* Re: 23.0.60; end-of-sentence and non-breaking space
2009-01-04 4:18 ` Eli Zaretskii
@ 2009-01-04 21:42 ` Richard M Stallman
0 siblings, 0 replies; 18+ messages in thread
From: Richard M Stallman @ 2009-01-04 21:42 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cyd, emacs-devel, 1726, monnier
> > Meanwhile, the Chinese and Chinese-derived character codes
> > do not follow Unicode. So you can't enter them with \u.
> > What is the way to enter them?
> The problem at hand exists only for codes that are less than FF hex.
Maybe, but isn't there a similar problem for Chinese-derived
characters? How does one specify these codes in a string constant?
Shouldn't there be some way?
Maybe there is never a need to do it; maybe we don't need to add a
feature for it. But if we don't, we should document that there is
currently no way.
* Re: 23.0.60; end-of-sentence and non-breaking space
2009-01-03 16:44 ` Eli Zaretskii
2009-01-04 2:16 ` Richard M Stallman
@ 2009-01-05 6:37 ` Kenichi Handa
1 sibling, 0 replies; 18+ messages in thread
From: Kenichi Handa @ 2009-01-05 6:37 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: cyd, 1726, rms, monnier, emacs-devel
In article <utz8gmj9t.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> > From: Richard M Stallman <rms@gnu.org>
> > Date: Sat, 03 Jan 2009 10:21:58 -0500
> > Cc: cyd@stupidchicken.com, 1726@emacsbugs.donarmstrong.com, emacs-devel@gnu.org
> >
> > IIUC if you want the character with code #xa0, then using \u00a0 would
> > seem like the most unambiguous option (I notice that "\ua0" gives
> > a weird error "Non-hex digit used for Unicode escape").
> >
> > I expected \xa0 to give me that character. It still seems strange
> > that it would do anything else.
> We need some way of inserting raw 8-bit bytes, because otherwise code
> that encodes and decodes text in Lisp will not work. For inserting
> characters, we have the \u alternative; but I don't think there's an
> alternative for raw bytes other than \xNN.
I modified read_escape to treat "\xXX" as a raw-byte code
but to treat "\xXXX.." as the character code U+XXX... As far as
I remember, this is to keep backward compatibility.
And we do have an alternative for raw bytes: the octal form,
something like "\240".
---
Kenichi Handa
handa@m17n.org
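The reader behavior described above can be sketched as follows. This assumes a current Emacs, and it sticks to cases where a rule based on the number of hex digits and a rule based on the numeric value (below 256 or not) would give the same answer.

```elisp
;; Two hex digits with a value below 256: a raw byte, so the string
;; is unibyte.
(multibyte-string-p "\xa0")     ; => nil

;; A longer hex escape names a character, so the string is multibyte
;; and matches the \u spelling of the same code point.
(multibyte-string-p "\x1234")   ; => t
(aref "\x1234" 0)               ; => 4660 (#x1234)
(equal "\x1234" "\u1234")       ; => t

;; The octal form is another way to write the same raw byte.
(equal "\240" "\xa0")           ; => t
```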
* Re: 23.0.60; end-of-sentence and non-breaking space
2009-01-04 2:16 ` Richard M Stallman
2009-01-04 4:18 ` Eli Zaretskii
2009-01-04 4:29 ` bug#1726: " Jason Rumney
@ 2009-01-05 7:11 ` Kenichi Handa
2009-01-06 0:01 ` bug#1726: " Richard M Stallman
2 siblings, 1 reply; 18+ messages in thread
From: Kenichi Handa @ 2009-01-05 7:11 UTC (permalink / raw)
To: rms; +Cc: eliz, emacs-devel, cyd, monnier, 1726
In article <E1LJIXN-0008Vg-Pe@fencepost.gnu.org>, Richard M Stallman <rms@gnu.org> writes:
> We need some way of inserting raw 8-bit bytes, because otherwise code
> that encodes and decodes text in Lisp will not work. For inserting
> characters, we have the \u alternative; but I don't think there's an
> alternative for raw bytes other than \xNN.
> Maybe that is a valid reason for the current behavior, but that
> doesn't alter the need for the manual to document the behavior.
> Meanwhile, the Chinese and Chinese-derived character codes
> do not follow Unicode. So you can't enter them with \u.
> What is the way to enter them?
Most Chinese and Chinese-derived character codes are
unified into the Unicode area. Only a few codes can't be
unified with Unicode, and are thus decoded into the character
space over #x110000. But, in that sense, Chinese and
Chinese-derived character codes are not special. There
exist several non-Chinese character sets (e.g. tibetan)
containing characters that don't exist in Unicode, and
they are decoded into the character space over #x110000 too.
But, all of them can be accessed by "\U00XXXXXX".
---
Kenichi Handa
handa@m17n.org
* Re: bug#1726: 23.0.60; end-of-sentence and non-breaking space
2009-01-05 7:11 ` Kenichi Handa
@ 2009-01-06 0:01 ` Richard M Stallman
2009-01-06 4:08 ` Eli Zaretskii
0 siblings, 1 reply; 18+ messages in thread
From: Richard M Stallman @ 2009-01-06 0:01 UTC (permalink / raw)
To: Kenichi Handa, 1726
Cc: cyd, bug-submit-list, bug-gnu-emacs, 1726, emacs-devel
> There
> exist several non-Chinese character sets (e.g. tibetan)
> containing characters that don't exist in Unicode, and
> they are decoded into the character space over #x110000 too.
> But, all of them can be accessed by "\U00XXXXXX".
Can you please document this (and the rest of what we have discussed
in this thread)?
* Re: bug#1726: 23.0.60; end-of-sentence and non-breaking space
2009-01-06 0:01 ` bug#1726: " Richard M Stallman
@ 2009-01-06 4:08 ` Eli Zaretskii
0 siblings, 0 replies; 18+ messages in thread
From: Eli Zaretskii @ 2009-01-06 4:08 UTC (permalink / raw)
To: rms; +Cc: 1726, emacs-devel, bug-gnu-emacs, cyd, handa
> From: Richard M Stallman <rms@gnu.org>
> Date: Mon, 05 Jan 2009 19:01:02 -0500
> Cc: cyd@stupidchicken.com, bug-submit-list@donarmstrong.com,
> bug-gnu-emacs@gnu.org, 1726@emacsbugs.donarmstrong.com, emacs-devel@gnu.org
>
> There
> exist several non-Chinese character sets (e.g. tibetan)
> containing characters that don't exist in Unicode, and
> they are decoded into the character space over #x110000 too.
>
> But, all of them can be accessed by "\U00XXXXXX".
>
> Can you please document this (and the rest of what we have discussed
> in this thread)?
You already asked me to do this, and it's on my TODO.