[Unicode-2] `read' always returns multibyte symbol

* [Unicode-2] `read' always returns multibyte symbol
@ 2007-11-13  9:41 Katsumi Yamaoka
  2007-11-13 12:55 ` Kenichi Handa
  2007-11-13 15:07 ` Stefan Monnier
From: Katsumi Yamaoka @ 2007-11-13  9:41 UTC
  To: emacs-devel; +Cc: ding

Hi,

The following Lisp snippet emulates what Gnus does when reading
active data for the local.テスト newsgroup.  The buffer contains
data that has been retrieved from the NNTP server.  Note that
the newsgroup name contains non-ASCII characters, which the
server has encoded in utf-8.

--8<---------------cut here---------------start------------->8---
(let ((string (encode-coding-string "local.テスト" 'utf-8)))
  (with-temp-buffer
    (set-buffer-multibyte t)
    (insert (string-to-multibyte string))
    (goto-char (point-min))
    (multibyte-string-p (symbol-name (read (current-buffer))))))
--8<---------------cut here---------------end--------------->8---

While Emacs trunk returns nil for this, Emacs Unicode-2 returns t.

If this is not intentional, I hope `read' can be made to behave
as it does in Emacs trunk.  Otherwise, is there a way to make
`read' return a unibyte symbol (without slowing down)?

Inside Gnus, non-ASCII group names are all treated as unibyte
strings, namely the byte sequences that the server has encoded
with some coding system.  Because of the present behavior of
`read' in Emacs Unicode-2, Gnus doesn't work properly with such
newsgroups.  You can find the actual code in gnus-start.el as
follows:

--8<---------------cut here---------------start------------->8---
;; Read an active file and place the results in `gnus-active-hashtb'.
(defun gnus-active-to-gnus-format (&optional method hashtb ignore-errors
					     real-active)
[...]
	      ;; group gets set to a symbol interned in the hash table
	      ;; (what a hack!!) - jwz
	      (setq group (let ((obarray hashtb)) (read cur)))
--8<---------------cut here---------------end--------------->8---

As you can see, it needs to be fast because there might be a
lot of newsgroups.  So, if possible, I'd rather not change it
to:

--8<---------------cut here---------------start------------->8---
 (setq group (intern (mm-string-as-unibyte (symbol-name (read cur))) hashtb))
--8<---------------cut here---------------end--------------->8---
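
For reference, here is a minimal, self-contained sketch of that
workaround which does not depend on Gnus.  The helper name
`my-read-unibyte-symbol' is hypothetical, and `string-to-unibyte'
stands in for `mm-string-as-unibyte'; this assumes an Emacs that
provides `string-to-unibyte' and that the symbol name contains only
ASCII and raw-byte characters.

```elisp
;; Hypothetical helper, not part of Gnus: read one symbol at point in
;; the current buffer and intern it in HASHTB under a unibyte name.
(defun my-read-unibyte-symbol (hashtb)
  "Read a symbol at point and intern its unibyte name in HASHTB."
  (let ((name (symbol-name (read (current-buffer)))))
    (intern (if (multibyte-string-p name)
                ;; Assumption: NAME holds only ASCII and raw-byte
                ;; characters, as it does after `string-to-multibyte';
                ;; `string-to-unibyte' signals an error otherwise.
                (string-to-unibyte name)
              name)
            hashtb)))

;; Usage, mirroring the first snippet in this message.
(let ((hashtb (make-vector 7 0))
      (bytes (encode-coding-string "local.テスト" 'utf-8)))
  (with-temp-buffer
    (set-buffer-multibyte t)
    (insert (string-to-multibyte bytes))
    (goto-char (point-min))
    (multibyte-string-p
     (symbol-name (my-read-unibyte-symbol hashtb)))))
```

As in the one-liner above, `read' here still interns the multibyte
symbol in the global obarray first, so every group name costs an
extra string conversion and a second intern.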

Regards,
