* bug#9608: 24.0.50; Emacs lisp reader thinks no-break space is 0x08a0 (should be 0x00a0)
@ 2011-09-27 0:00 David M. Cooke
2011-09-27 8:44 ` Andreas Schwab
0 siblings, 1 reply; 2+ messages in thread
From: David M. Cooke @ 2011-09-27 0:00 UTC
To: 9608
After reading through lread.c (I was writing an Emacs Lisp lexer for
syntax highlighting in Pygments), I discovered that it treats the Unicode
character U+08A0 as whitespace (with the comment "NBSP"). I believe this
was meant to be U+00A0 (NO-BREAK SPACE), as the code point U+08A0 has no
character assigned to it yet (it lies between the Samaritan and the
Devanagari blocks).
You can also see this by evaluating the following Lisp code:

    (mapcar (lambda (sym) (string-as-unibyte (symbol-name sym)))
            (read "(a b c\u00a0d e\u08a0f g \u00a0 h i \u08a0 j)"))

This gives the result

    ("a" "b" "c\302\240d" "e" "f" "g" "\302\240" "h" "i" "j")

where we can see that U+00A0 (UTF-8: "\302\240") is treated as a
symbol-constituent character, whereas U+08A0 is treated as whitespace.
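
The result above can be reproduced outside Emacs with a rough model of the reader. This is only a minimal Python sketch, not the lread.c logic: it assumes the delimiter set is just ASCII space plus U+08A0, it ignores the surrounding parentheses, and the names `READER_WHITESPACE`, `split_symbols` and `as_octal_bytes` are hypothetical helpers made up for the illustration.

```python
# Assumption: the reader's delimiters are modelled as ASCII space plus
# U+08A0 (the code point lread.c labels "NBSP"). U+00A0 is deliberately absent.
READER_WHITESPACE = " \u08a0"


def split_symbols(text, whitespace=READER_WHITESPACE):
    """Split text into symbol names on any character in `whitespace`."""
    tokens, current = [], []
    for ch in text:
        if ch in whitespace:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def as_octal_bytes(name):
    """Show non-ASCII characters as octal-escaped UTF-8 bytes."""
    return "".join(
        chr(b) if b < 0x80 else "\\%o" % b
        for b in name.encode("utf-8")
    )


source = "a b c\u00a0d e\u08a0f g \u00a0 h i \u08a0 j"
symbols = [as_octal_bytes(s) for s in split_symbols(source)]
print(symbols)
```

Note that Python's own `str.split()` with no arguments would not do here, since it treats U+00A0 as whitespace; that is why the sketch splits by hand.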
The changes to the whitespace handling were introduced in bzr revision
78902 (on 2007-07-30, which is a few weeks after a discussion about
handling NO-BREAK SPACE on the mailing list). I'm guessing using 0x8a0
was just a thinko.
cheers,
David M. Cooke <david.m.cooke@gmail.com>
In GNU Emacs 24.0.50.2 (x86_64-apple-darwin10.7.0, NS apple-appkit-1038.35)
of 2011-05-27 on mars.lan
Windowing system distributor `Apple', version 10.3.1138
configured using `configure '--with-ns''
Important settings:
value of $LC_ALL: nil
value of $LC_COLLATE: nil
value of $LC_CTYPE: nil
value of $LC_MESSAGES: nil
value of $LC_MONETARY: nil
value of $LC_NUMERIC: nil
value of $LC_TIME: nil
value of $LANG: en_CA.UTF-8
value of $XMODIFIERS: nil
locale-coding-system: utf-8-unix
default enable-multibyte-characters: t
Major mode: Lisp Interaction
Minor modes in effect:
tooltip-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
line-number-mode: t
transient-mark-mode: t
Recent input:
( s <backspace> m a p c a r SPC ' s y m b o l - n a
m e SPC ( e v a l SPC " ( a SPC b SPC c \ u 0 0 a 0
d SPC e \ u 0 8 a 0 d <backspace> f ) " ) ) C-j q <down-mouse-1>
<mouse-1> # <down-mouse-1> <mouse-1> C-j q <down-mouse-1>
<mouse-1> ' C-e C-j q <backspace> <left> <left> <left>
<left> <left> <left> <left> <left> <left> <left> <left>
<left> <left> <left> <left> <left> <left> <left> <left>
<left> <left> <left> <left> <left> <left> <left> <backspace>
<backspace> " <left> <left> <backspace> <backspace>
<backspace> <backspace> r e a d C-e C-j <up> <left>
<left> <left> <left> <left> SPC g SPC \ u 0 0 a 0 SPC
h SPC i SPC \ u 0 8 a 0 SPC j C-e C-j <escape> x r
e m p o r <backspace> <backspace> <backspace> <backspace>
p o r <tab> <return>
Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Entering debugger...
Back to top level.
Entering debugger...
Back to top level.
Entering debugger...
Back to top level.
Load-path shadows:
None found.
Features:
(shadow sort gnus-util time-date mail-extr message format-spec rfc822
mml mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 rfc2047
rfc2045 ietf-drums mm-util mail-prsvr mailabbrev mail-utils gmm-utils
mailheader emacsbug help-mode easymenu view debug tooltip ediff-hook
vc-hooks lisp-float-type mwheel ns-win tool-bar dnd fontset image fringe
lisp-mode register page menu-bar rfn-eshadow timer select scroll-bar
mouse jit-lock font-lock syntax facemenu font-core frame cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev
minibuffer loaddefs button faces cus-face files text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote make-network-process dbusbind ns
multi-tty emacs)
* bug#9608: 24.0.50; Emacs lisp reader thinks no-break space is 0x08a0 (should be 0x00a0)
2011-09-27 0:00 bug#9608: 24.0.50; Emacs lisp reader thinks no-break space is 0x08a0 (should be 0x00a0) David M. Cooke
@ 2011-09-27 8:44 ` Andreas Schwab
0 siblings, 0 replies; 2+ messages in thread
From: Andreas Schwab @ 2011-09-27 8:44 UTC
To: David M. Cooke; +Cc: 9608-done
"David M. Cooke" <david.m.cooke@gmail.com> writes:
> The changes to the whitespace handling were introduced in bzr revision
> 78902 (on 2007-07-30, which is a few weeks after a discussion about
> handling NO-BREAK SPACE on the mailing list).
That was before the Unicode merge.
> I'm guessing using 0x8a0 was just a thinko.
No, it was the correct number at that time, when Emacs used the mule
encoding internally.
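
For readers unfamiliar with the old representation, the arithmetic can be sketched as follows. This is a reconstruction from memory of how the pre-Unicode internal code was formed for one-dimensional charsets, not code taken from this thread: the charset id 0x81 for latin-iso8859-1, the 0x70 offset, and the function name `mule_char_dim1` are all assumptions and should be checked against the old charset.h.

```python
# Assumption: charset id of latin-iso8859-1 in the pre-Unicode (mule)
# internal representation, as recalled from the old charset.h.
LATIN_ISO8859_1_CHARSET_ID = 0x81


def mule_char_dim1(charset_id, byte):
    """Sketch of the old internal code for a one-dimensional charset.

    Assumed formula: ((charset_id - 0x70) << 7) | (byte & 0x7F).
    """
    return ((charset_id - 0x70) << 7) | (byte & 0x7F)


# NO-BREAK SPACE is byte 0xA0 in ISO 8859-1.
nbsp = mule_char_dim1(LATIN_ISO8859_1_CHARSET_ID, 0xA0)
print(hex(nbsp))
```

Under these assumptions the old internal code for NO-BREAK SPACE comes out as 0x8a0, which is the constant that was left behind in lread.c after the switch to Unicode code points.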
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."