* [Unicode-2] `read' always returns multibyte symbol
@ 2007-11-13 9:41 Katsumi Yamaoka
2007-11-13 12:55 ` Kenichi Handa
2007-11-13 15:07 ` Stefan Monnier
0 siblings, 2 replies; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-13 9:41 UTC (permalink / raw)
To: emacs-devel; +Cc: ding
Hi,
The following Lisp snippet emulates what Gnus does when reading
active data for the local.テスト newsgroup. The buffer contains
data which have been retrieved from the nntp server. Note that
the newsgroup name contains non-ASCII characters, which have been
encoded in utf-8 by the server.
--8<---------------cut here---------------start------------->8---
(let ((string (encode-coding-string "local.テスト" 'utf-8)))
  (with-temp-buffer
    (set-buffer-multibyte t)
    (insert (string-to-multibyte string))
    (goto-char (point-min))
    (multibyte-string-p (symbol-name (read (current-buffer))))))
--8<---------------cut here---------------end--------------->8---
While Emacs trunk returns nil for this, Emacs Unicode-2 returns t.
If it is not intentional, I hope `read' behaves just like it does
in Emacs trunk. Otherwise, is there a way to make `read' return
a unibyte symbol (without slowing down)?
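As a rough illustration of the two string representations involved
here, the sketch below compares the unibyte result of
`encode-coding-string' with its `string-to-multibyte' counterpart.
The helper name `my-describe-string' is hypothetical and is not part
of Emacs or Gnus.

```elisp
;; Hypothetical helper for illustration only; not part of Emacs or Gnus.
(defun my-describe-string (string)
  "Return a list (MULTIBYTE-P LENGTH) describing STRING."
  (list (multibyte-string-p string) (length string)))

(let* ((raw (encode-coding-string "local.テスト" 'utf-8)) ; unibyte, one element per byte
       (multi (string-to-multibyte raw)))                 ; multibyte, each byte kept as a raw-byte char
  (list (my-describe-string raw)
        (my-describe-string multi)
        ;; Converting back recovers the original bytes.
        (equal (string-to-unibyte multi) raw)))
;; => ((nil 15) (t 15) t)
```

Both strings have 15 elements (6 ASCII bytes plus 9 utf-8 bytes); only
the multibyteness flag and the internal storage differ.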
Inside Gnus, non-ASCII group names are all treated as
unibyte strings, i.e. the byte sequences that the server has encoded
with some coding system. Because of the present behavior of
`read' in Emacs Unicode-2, Gnus doesn't work with such newsgroups
perfectly. You can find the actual code in gnus-start.el as
follows:
--8<---------------cut here---------------start------------->8---
;; Read an active file and place the results in `gnus-active-hashtb'.
(defun gnus-active-to-gnus-format (&optional method hashtb ignore-errors
real-active)
[...]
;; group gets set to a symbol interned in the hash table
;; (what a hack!!) - jwz
(setq group (let ((obarray hashtb)) (read cur)))
--8<---------------cut here---------------end--------------->8---
As you can see, it needs to work fast because there might be a
lot of newsgroups. So, if possible, I don't want to modify it
into:
--8<---------------cut here---------------start------------->8---
(setq group (intern (mm-string-as-unibyte (symbol-name (read cur))) hashtb))
--8<---------------cut here---------------end--------------->8---
Regards,
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-13 9:41 [Unicode-2] `read' always returns multibyte symbol Katsumi Yamaoka
@ 2007-11-13 12:55 ` Kenichi Handa
2007-11-13 15:10 ` Stefan Monnier
2007-11-14 3:56 ` [Unicode-2] `read' always returns multibyte symbol Katsumi Yamaoka
2007-11-13 15:07 ` Stefan Monnier
1 sibling, 2 replies; 40+ messages in thread
From: Kenichi Handa @ 2007-11-13 12:55 UTC (permalink / raw)
To: Katsumi Yamaoka; +Cc: ding, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 3483 bytes --]
In article <b4moddycwjv.fsf@jpl.org>, Katsumi Yamaoka <yamaoka@jpl.org> writes:
> The following Lisp snippet emulates what Gnus does when reading
> active data for the local.テスト newsgroup. The buffer contains
> data which have been retrieved from the nntp server. Note that
> the newsgroup name contains non-ASCII characters, which has been
> encoded by utf-8 in the server.
> --8<---------------cut here---------------start------------->8---
> (let ((string (encode-coding-string "local.テスト" 'utf-8)))
> (with-temp-buffer
> (set-buffer-multibyte t)
> (insert (string-to-multibyte string))
> (goto-char (point-min))
> (multibyte-string-p (symbol-name (read (current-buffer))))))
> --8<---------------cut here---------------end--------------->8---
> While Emacs trunk returns nil for this, Emacs Unicode-2 returns t.
That is because `read' decides whether the name is unibyte
or multibyte by checking whether the name is a valid
multibyte sequence. In the trunk, a utf-8 byte sequence is
not a valid multibyte sequence, but in emacs-unicode-2 it is.
> If it is not intentional, I hope `read' behaves just like it does
> in Emacs trunk.
The relevant code for `read' is very complicated and I want
to avoid touching it if there's another way.
In addition, I think it is the right thing that the above
code return t; i.e. any symbol created by reading a
multibyte buffer should have a multibyte string name. The
bug to fix is that the following code also returns t in
emacs-unicode-2.
< --8<---------------cut here---------------start------------->8---
< (let ((string (encode-coding-string "local.テスト" 'utf-8)))
< (with-temp-buffer
< (set-buffer-multibyte nil)
< (insert string)
< (goto-char (point-min))
< (multibyte-string-p (symbol-name (read (current-buffer))))))
< --8<---------------cut here---------------end--------------->8---
> Otherwise, is there a way to make `read' return a unibyte
> symbol (without slowing down)?
A replacement for the above code is as simple as this:
(multibyte-string-p (symbol-name (intern (encode-coding-string "local.テスト" 'utf-8))))
But, hmmm, it seems that we can't use such code in gnus...
> In the inside of Gnus, non-ASCII group names are all treated as
> unibyte strings, that are the ones that the server has encoded
> with certain coding systems. Because of the present behavior of
> `read' in Emacs Unicode-2, Gnus doesn't work with such newsgroups
> perfectly. You can find the actual code in gnus-start.el as
> follows:
> --8<---------------cut here---------------start------------->8---
> ;; Read an active file and place the results in `gnus-active-hashtb'.
> (defun gnus-active-to-gnus-format (&optional method hashtb ignore-errors
> real-active)
> [...]
> ;; group gets set to a symbol interned in the hash table
> ;; (what a hack!!) - jwz
> (setq group (let ((obarray hashtb)) (read cur)))
> --8<---------------cut here---------------end--------------->8---
How about this?
(setq group
      (let ((obarray hashtb) pos)
        (skip-syntax-forward "^w_")
        (setq pos (point))
        (skip-syntax-forward "w_")
        (intern (buffer-substring pos (point)))))
I think the overhead is just a few more function calls. The
actual task (searching for a range of symbol constituents,
making a string from them, and interning it) is almost the same.
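A minimal sketch of the same idea in a self-contained form follows.
It is a variation, not the code proposed above: it assumes the group
name is the first whitespace-delimited token on the line, so it skips
by characters rather than by syntax class, and it passes the table to
`intern' explicitly instead of binding `obarray'. The function names
are hypothetical, and `obarray-make' needs Emacs 25.1 or later.

```elisp
;; Hypothetical helper; the name and the whitespace-delimited parsing
;; are assumptions made for this sketch.
(defun my-intern-group-at-point (table)
  "Intern the whitespace-delimited token at point in TABLE.
Leave point just after the token and return the symbol."
  (let ((start (point)))
    (skip-chars-forward "^ \t\n")
    (intern (buffer-substring-no-properties start (point)) table)))

(defun my-demo-intern-group ()
  "Parse one made-up active-file line in a unibyte buffer."
  (let ((table (obarray-make)))
    (with-temp-buffer
      (set-buffer-multibyte nil)
      ;; The line layout (group name followed by other fields) is illustrative.
      (insert (encode-coding-string "local.テスト 5 1 y\n" 'utf-8))
      (goto-char (point-min))
      (let ((group (my-intern-group-at-point table)))
        (list (symbolp group)
              ;; Interning the same name again yields the same symbol.
              (eq group (intern (symbol-name group) table))
              ;; Point is left at the start of the remaining fields.
              (looking-at " 5 1 y"))))))

(my-demo-intern-group)
;; => (t t t)
```

Because the text never passes through `read', the string handed to
`intern' is exactly what `buffer-substring-no-properties' returned from
the unibyte buffer.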
---
Kenichi Handa
handa@ni.aist.go.jp
[-- Attachment #2: Type: text/plain, Size: 142 bytes --]
_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-13 9:41 [Unicode-2] `read' always returns multibyte symbol Katsumi Yamaoka
2007-11-13 12:55 ` Kenichi Handa
@ 2007-11-13 15:07 ` Stefan Monnier
1 sibling, 0 replies; 40+ messages in thread
From: Stefan Monnier @ 2007-11-13 15:07 UTC (permalink / raw)
To: Katsumi Yamaoka; +Cc: ding, emacs-devel
> --8<---------------cut here---------------start------------->8---
> (let ((string (encode-coding-string "local.テスト" 'utf-8)))
> (with-temp-buffer
> (set-buffer-multibyte t)
> (insert (string-to-multibyte string))
> (goto-char (point-min))
> (multibyte-string-p (symbol-name (read (current-buffer))))))
> --8<---------------cut here---------------end--------------->8---
I'm not sure what Emacs should do in such a case, but in the example
above, using a multibyte buffer is asking for trouble.
Can't Gnus use a unibyte buffer in its corresponding code? That would
speed things up, save you the use of string-to-multibyte, and make it
crystal clear that the result should be unibyte.
Stefan "trying hard not to say that the use of a multibyte
buffer here is a plain bug ;-)"
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-13 12:55 ` Kenichi Handa
@ 2007-11-13 15:10 ` Stefan Monnier
2007-11-14 4:53 ` Kenichi Handa
2007-11-14 3:56 ` [Unicode-2] `read' always returns multibyte symbol Katsumi Yamaoka
1 sibling, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2007-11-13 15:10 UTC (permalink / raw)
To: Kenichi Handa; +Cc: Katsumi Yamaoka, ding, emacs-devel
> That is because `read' decides the name is unibyte or multibyte by
> whether the name is a valid multibyte sequence or not.
Yuck.
> The bug to fix is that the following code also returns t in
> emacs-unicode-2.
> < --8<---------------cut here---------------start------------->8---
> < (let ((string (encode-coding-string "local.テスト" 'utf-8)))
> < (with-temp-buffer
> < (set-buffer-multibyte nil)
> < (insert string)
> < (goto-char (point-min))
> < (multibyte-string-p (symbol-name (read (current-buffer))))))
> < --8<---------------cut here---------------end--------------->8---
Yes, that's a clear bug.
Stefan
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-13 12:55 ` Kenichi Handa
2007-11-13 15:10 ` Stefan Monnier
@ 2007-11-14 3:56 ` Katsumi Yamaoka
2007-11-14 11:39 ` Katsumi Yamaoka
1 sibling, 1 reply; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-14 3:56 UTC (permalink / raw)
To: Kenichi Handa; +Cc: ding, emacs-devel
>>>>> Kenichi Handa <handa@ni.aist.go.jp> wrote:
> In addition, I think it is the right thing that the above
> code return t; i.e. any symbol created by reading a
> multibyte buffer should have a multibyte string name.
I agree with that behavior.
> The bug to fix is that the following code also returns t in
> emacs-unicode-2.
> < --8<---------------cut here---------------start------------->8---
> < (let ((string (encode-coding-string "local.テスト" 'utf-8)))
> < (with-temp-buffer
> < (set-buffer-multibyte nil)
> < (insert string)
> < (goto-char (point-min))
> < (multibyte-string-p (symbol-name (read (current-buffer))))))
> < --8<---------------cut here---------------end--------------->8---
Sure. I'll try using a unibyte buffer to parse active data
(after the bug is fixed).
Regards,
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-13 15:10 ` Stefan Monnier
@ 2007-11-14 4:53 ` Kenichi Handa
2007-11-14 7:06 ` [Unicode-2] `C-h f' error (was Re: `read' always returns multibyte symbol) Katsumi Yamaoka
0 siblings, 1 reply; 40+ messages in thread
From: Kenichi Handa @ 2007-11-14 4:53 UTC (permalink / raw)
To: Stefan Monnier; +Cc: yamaoka, ding, emacs-devel
In article <jwvoddykwt5.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
> > The bug to fix is that the following code also returns t in
> > emacs-unicode-2.
> > < --8<---------------cut here---------------start------------->8---
> > < (let ((string (encode-coding-string "local.テスト" 'utf-8)))
> > < (with-temp-buffer
> > < (set-buffer-multibyte nil)
> > < (insert string)
> > < (goto-char (point-min))
> > < (multibyte-string-p (symbol-name (read (current-buffer))))))
> > < --8<---------------cut here---------------end--------------->8---
> Yes, that's a clear bug.
I've just installed a fix.
---
Kenichi Handa
handa@ni.aist.go.jp
* [Unicode-2] `C-h f' error (was Re: `read' always returns multibyte symbol)
2007-11-14 4:53 ` Kenichi Handa
@ 2007-11-14 7:06 ` Katsumi Yamaoka
2007-11-14 13:01 ` Kenichi Handa
0 siblings, 1 reply; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-14 7:06 UTC (permalink / raw)
To: Kenichi Handa; +Cc: emacs-devel
>>>>> Kenichi Handa <handa@ni.aist.go.jp> wrote:
> I've just installed a fix.
Thanks.
Isn't it a side effect of this change? The `C-h f' command causes
the following error, though it can be solved by reloading "help".
Debugger entered--Lisp error: (setting-constant :validate)
function-called-at-point()
[...]
call-interactively(describe-function)
It seems that the `with-syntax-table' macro, which
`function-called-at-point' uses, was not expanded properly when
dumping Emacs:
(disassemble 'function-called-at-point)
0 constant syntax-table
1 call 0
2 current-buffer
3 varbind :validate
4 varbind setup-function
5 constant (<byte code>...)
0 save-current-buffer
1 varref :validate
2 set-buffer
Regards,
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-14 3:56 ` [Unicode-2] `read' always returns multibyte symbol Katsumi Yamaoka
@ 2007-11-14 11:39 ` Katsumi Yamaoka
2007-11-14 14:52 ` Stefan Monnier
2007-11-15 10:20 ` Katsumi Yamaoka
0 siblings, 2 replies; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-14 11:39 UTC (permalink / raw)
To: Kenichi Handa; +Cc: ding, emacs-devel
[-- Attachment #1: Type: text/plain, Size: 509 bytes --]
>>>>> Katsumi Yamaoka wrote:
> I'll try using a unibyte buffer to parse active data (after the bug
> is fixed).
Handa-san, thank you for the fix in Unicode-2. I've also made a
change in the Gnus CVS trunk so that it may use a unibyte buffer.
Now it works not only with Emacs 23.0.60 but also with Emacs 22.1,
22.1.50, and 23.0.50.
BTW, I found another problem with Emacs 21 (Gnus still supports
Emacs 21, IIUC). So, I'll go on looking into it further.
The diff between Gnus trunk and Unicode-2 is here:
[-- Attachment #2: Type: text/x-patch, Size: 1379 bytes --]
*** gnus-start.el~ Sun Nov 11 21:51:22 2007
--- gnus-start.el Wed Nov 14 11:32:28 2007
***************
*** 2106,2112 ****
(if (equal method gnus-select-method)
(gnus-make-hashtable
(count-lines (point-min) (point-max)))
! (gnus-make-hashtable 4096)))))))
;; Delete unnecessary lines.
(goto-char (point-min))
(cond
--- 2106,2113 ----
(if (equal method gnus-select-method)
(gnus-make-hashtable
(count-lines (point-min) (point-max)))
! (gnus-make-hashtable 4096))))))
! group max min)
;; Delete unnecessary lines.
(goto-char (point-min))
(cond
***************
*** 2141,2148 ****
(insert prefix)
(zerop (forward-line 1)))))))
;; Store the active file in a hash table.
! (goto-char (point-min))
! (let (group max min)
(while (not (eobp))
(condition-case ()
(progn
--- 2142,2153 ----
(insert prefix)
(zerop (forward-line 1)))))))
;; Store the active file in a hash table.
! ;; Use a unibyte buffer in order to make `read' read non-ASCII
! ;; group names (which have been encoded) as unibyte strings.
! (mm-with-unibyte-buffer
! (insert-buffer-substring cur)
! (setq cur (current-buffer))
! (goto-char (point-min))
(while (not (eobp))
(condition-case ()
(progn
[-- Attachment #3: Type: text/plain, Size: 9 bytes --]
Regards,
* Re: [Unicode-2] `C-h f' error (was Re: `read' always returns multibyte symbol)
2007-11-14 7:06 ` [Unicode-2] `C-h f' error (was Re: `read' always returns multibyte symbol) Katsumi Yamaoka
@ 2007-11-14 13:01 ` Kenichi Handa
2007-11-15 2:06 ` [Unicode-2] `C-h f' error Katsumi Yamaoka
0 siblings, 1 reply; 40+ messages in thread
From: Kenichi Handa @ 2007-11-14 13:01 UTC (permalink / raw)
To: Katsumi Yamaoka; +Cc: emacs-devel
In article <b4mejete26t.fsf_-_@jpl.org>, Katsumi Yamaoka <yamaoka@jpl.org> writes:
>>>>>> Kenichi Handa <handa@ni.aist.go.jp> wrote:
> > I've just installed a fix.
> Thanks.
> Isn't it a side effect of this change? The `C-h f' command causes
> the following error, though it can be solved by reloading "help".
> Debugger entered--Lisp error: (setting-constant :validate)
> function-called-at-point()
> [...]
> call-interactively(describe-function)
I can't reproduce that bug. Perhaps because I did "make
bootstrap". How about you?
> It seems that the `with-syntax-table' macro, that
> `function-called-at-point' uses, was not expanded properly when
> dumping Emacs:
> (disassemble 'function-called-at-point)
> 0 constant syntax-table
> 1 call 0
> 2 current-buffer
> 3 varbind :validate
> 4 varbind setup-function
> 5 constant (<byte code>...)
> 0 save-current-buffer
> 1 varref :validate
> 2 set-buffer
I got this in *Disassemble* buffer.
byte code for function-called-at-point:
doc: Return a function around point or else called by the list containing point. ...
args: nil
0 constant syntax-table
1 call 0
2 current-buffer
3 varbind required-features
4 varbind standard-display-european-internal
5 constant (<byte code>...)
0 save-current-buffer
1 varref required-features
2 set-buffer
3 discard
4 constant set-syntax-table
5 varref standard-display-european-internal
6 call 1
7 discard
8 unbind 1
9 constant set-syntax-table
10 return
6 unwind-protect
7 constant set-syntax-table
8 varref emacs-lisp-mode-syntax-table
9 call 1
10 discard
11 constant nil
12 constant <byte code>
0 save-excursion
1 constant zerop
2 constant skip-syntax-backward
3 constant "_w"
4 call 1
5 call 1
6 goto-if-nil 1
9 following-char
10 char-syntax
11 constant 119
12 eq
13 goto-if-not-nil 1
16 following-char
17 char-syntax
18 constant 95
19 eq
20 goto-if-not-nil 1
23 constant forward-sexp
24 constant -1
25 call 1
26 discard
27:1 constant "'"
28 constant nil
29 skip-chars-forward
30 discard
31 constant read
32 current-buffer
33 call 1
34 dup
35 varbind obj
36 symbolp
37 goto-if-nil-else-pop 2
40 constant fboundp
41 varref obj
42 call 1
43 goto-if-nil-else-pop 2
46 varref obj
47:2 unbind 2
48 return
13 constant ((error))
14 condition-case
15 goto-if-not-nil-else-pop 1
18 constant nil
19 constant <byte code>
0 save-excursion
1 save-restriction
2 point-min
3 point
4 constant 1000
5 diff
6 max
7 point-max
8 narrow-to-region
9 discard
10 constant backward-up-list
11 constant 1
12 call 1
13 discard
14 constant 1
15 forward-char
16 discard
17 constant looking-at
18 constant "[ ]"
19 call 1
20 goto-if-nil 1
23 constant error
24 constant "Probably not a Lisp function call"
25 call 1
26 discard
27:1 constant read
28 current-buffer
29 call 1
30 dup
31 varbind obj
32 symbolp
33 goto-if-nil-else-pop 2
36 constant fboundp
37 varref obj
38 call 1
39 goto-if-nil-else-pop 2
42 varref obj
43:2 unbind 3
44 return
20 constant ((error))
21 condition-case
22:1 unbind 3
23 goto-if-not-nil-else-pop 6
26 constant find-tag-default
27 call 0
28 dup
29 varbind str
30 goto-if-nil-else-pop 2
33 constant intern-soft
34 varref str
35 call 1
36:2 dup
37 varbind sym
38 goto-if-nil 3
41 constant fboundp
42 varref sym
43 call 1
44 goto-if-nil 3
47 varref sym
48 goto 5
51:3 constant match-data
52 call 0
53 varbind save-match-data-internal
54 constant (<byte code>...)
0 constant set-match-data
1 varref save-match-data-internal
2 constant evaporate
3 call 2
4 return
55 unwind-protect
56 varref str
57 goto-if-nil-else-pop 4
60 constant string-match
61 constant "\\`\\W*\\(.*?\\)\\W*\\'"
62 varref str
63 call 2
64 goto-if-nil-else-pop 4
67 constant intern-soft
68 constant match-string
69 constant 1
70 varref str
71 call 2
72 call 1
73 varset sym
74 constant fboundp
75 varref sym
76 call 1
77 goto-if-nil-else-pop 4
80 varref sym
81:4 unbind 2
82:5 unbind 2
83:6 return
---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-14 11:39 ` Katsumi Yamaoka
@ 2007-11-14 14:52 ` Stefan Monnier
2007-11-14 23:52 ` Katsumi Yamaoka
2007-11-15 10:20 ` Katsumi Yamaoka
1 sibling, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2007-11-14 14:52 UTC (permalink / raw)
To: Katsumi Yamaoka; +Cc: Kenichi Handa, ding, emacs-devel
> ! ;; Use a unibyte buffer in order to make `read' read non-ASCII
> ! ;; group names (which have been encoded) as unibyte strings.
> ! (mm-with-unibyte-buffer
> ! (insert-buffer-substring cur)
Why is `cur' a multibyte buffer? Since it contains encoded strings, I'd
expect it would be better (more robust and convenient) to use a unibyte
buffer for it.
Stefan
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-14 14:52 ` Stefan Monnier
@ 2007-11-14 23:52 ` Katsumi Yamaoka
2007-11-15 1:15 ` Stefan Monnier
0 siblings, 1 reply; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-14 23:52 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Kenichi Handa, ding, emacs-devel
>>>>> Stefan Monnier wrote:
>> ! ;; Use a unibyte buffer in order to make `read' read non-ASCII
>> ! ;; group names (which have been encoded) as unibyte strings.
>> ! (mm-with-unibyte-buffer
>> ! (insert-buffer-substring cur)
> Why is `cur' a multibyte buffer? Since it contains encoded strings, I'd
> expect it would be better (more robust and convenient) to use a unibyte
> buffer for it.
Good point. `cur' is `nntp-server-buffer' (" *nntpd*") or
`gnus-work-buffer' (" *gnus work*") as the case may be. Gnus uses
those buffers for various purposes. Although there seems to be no
situation where they need to hold multibyte data, as far as
I can observe, Gnus explicitly sets them as multibyte buffers (see
`nnheader-init-server-buffer' and `gnus-set-work-buffer'). I
believe the reason is to avoid breaking data
when copying it to another multibyte buffer (IIUC, copying data
from a multibyte buffer to a unibyte buffer causes no problem).
So, I didn't modify those buffers' multibyteness. If I introduced
a new unibyte work buffer (such as " *gnus binary work*"),
`gnus-read-active-file-2' would have to bind `nntp-server-buffer'
to it, for example. That buffer is used by all the back ends, and I'm
not sure that would never cause a problem with any of them.
Regards,
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-14 23:52 ` Katsumi Yamaoka
@ 2007-11-15 1:15 ` Stefan Monnier
2007-11-15 3:01 ` Katsumi Yamaoka
0 siblings, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2007-11-15 1:15 UTC (permalink / raw)
To: Katsumi Yamaoka; +Cc: Kenichi Handa, ding, emacs-devel
>>> ! ;; Use a unibyte buffer in order to make `read' read non-ASCII
>>> ! ;; group names (which have been encoded) as unibyte strings.
>>> ! (mm-with-unibyte-buffer
>>> ! (insert-buffer-substring cur)
>> Why is `cur' a multibyte buffer? Since it contains encoded strings, I'd
>> expect it would be better (more robust and convenient) to use a unibyte
>> buffer for it.
> Good point. The `cur' is `nntp-server-buffer' (" *nntpd*") or
> `gnus-work-buffer' (" *gnus work*") as the case may be.
Don't know about gnus-work-buffer, but nntp-server-buffer should only
ever contain unibyte data AFAICT, so it would be better to put it in
unibyte mode.
> Gnus uses those buffers for various purposes. Although there looks no
> situation where it is necessary to have multibyte data as far as I can
> observe, Gnus explicitly sets them as multibyte buffers (see
> `nnheader-init-server-buffer' and `gnus-set-work-buffer').
> I believe the reason they do so is to prevent from breaking data when
> copying them to another multibyte buffer (IIUC, copying data from
> a multibyte buffer to a unibyte buffer causes no problem).
I'm not sure I understand: copying data from a multibyte buffer to
a unibyte buffer is exactly the case that can cause problems.
Stefan
* Re: [Unicode-2] `C-h f' error
2007-11-14 13:01 ` Kenichi Handa
@ 2007-11-15 2:06 ` Katsumi Yamaoka
2007-11-19 8:31 ` Katsumi Yamaoka
0 siblings, 1 reply; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-15 2:06 UTC (permalink / raw)
To: Kenichi Handa; +Cc: emacs-devel
>>>>> Kenichi Handa wrote:
> In article <b4mejete26t.fsf_-_@jpl.org>,
> Katsumi Yamaoka <yamaoka@jpl.org> writes:
>> Debugger entered--Lisp error: (setting-constant :validate)
>> function-called-at-point()
>> [...]
>> call-interactively(describe-function)
> I can't reproduce that bug. Perhaps because I did "make
> bootstrap". How about you?
I always do "make bootstrap" with a copy of the source checked
out from CVS. I did it again but it made no difference. Hm.
Though I'm not sure it is useful, the function definition of
`function-called-at-point' that has been dumped in the Emacs
executable is here (it differs from the one in help.elc):
(fset 'function-called-at-point (read (base64-decode-string "\
I1tuaWwgIlwzMDYgcBgZXDMwN1wyMTZcMzEwCiFcMjEwXDMxMVwzMTJcMzEzXDIxN1wyMDYWAFwz
MTFcMzE0XDMxNVwyMTcrXDIwNlMAXDMxNiBcMjExG1wyMDUkAFwzMTcLIVwyMTEcXDIwMzMAXDMy
MAwhXDIwMzMADFwyMDJSAFwzMjEgHVwzMjJcMjE2C1wyMDVRAFwzMjNcMzI0C1wiXDIwNVEAXDMx
N1wzMjVcMzI2C1wiIRRcMzIwDCFcMjA1UQAMKipcMjA3IiBbOnZhbGlkYXRlIHNldHVwLWZ1bmN0
aW9uIGVtYWNzLWxpc3AtbW9kZS1zeW50YXgtdGFibGUgc3RyIHN5bSBzYXZlLW1hdGNoLWRhdGEt
aW50ZXJuYWwgc3ludGF4LXRhYmxlICgoYnl0ZS1jb2RlICJyCHFcMjEwXDMwMgkhXDIxMClcMzAy
XDIwNyIgWzp2YWxpZGF0ZSBzZXR1cC1mdW5jdGlvbiBzZXQtc3ludGF4LXRhYmxlXSAyKSkgc2V0
LXN5bnRheC10YWJsZSBuaWwgKGJ5dGUtY29kZSAiXDIxMlwzMDFcMzAyXDMwMyEhXDIwMxsAZ3pc
MzA0PVwyMDQbAGd6XDMwNT1cMjA0GwBcMzA2XDMwNyFcMjEwXDMxMFwzMTF3XDIxMFwzMTJwIVwy
MTEYOVwyMDUvAFwzMTMIIVwyMDUvAAgqXDIwNyIgW29iaiB6ZXJvcCBza2lwLXN5bnRheC1iYWNr
d2FyZCAiX3ciIDExOSA5NSBmb3J3YXJkLXNleHAgLTEgIiciIG5pbCByZWFkIGZib3VuZHBdIDQp
ICgoZXJyb3IpKSAoYnl0ZS1jb2RlICJcMjEyXDIxNGVgXDMwMVpdZH1cMjEwXDMwMlwzMDMhXDIx
MFwzMDN1XDIxMFwzMDRcMzA1IVwyMDMbAFwzMDZcMzA3IVwyMTBcMzEwcCFcMjExGDlcMjA1KwBc
MzExCCFcMjA1KwAIK1wyMDciIFtvYmogMTAwMCBiYWNrd2FyZC11cC1saXN0IDEgbG9va2luZy1h
dCAiWyAJXSIgZXJyb3IgIlByb2JhYmx5IG5vdCBhIExpc3AgZnVuY3Rpb24gY2FsbCIgcmVhZCBm
Ym91bmRwXSA0KSAoKGVycm9yKSkgZmluZC10YWctZGVmYXVsdCBpbnRlcm4tc29mdCBmYm91bmRw
IG1hdGNoLWRhdGEgKChieXRlLWNvZGUgIlwzMDEIXDMwMlwiXDIwNyIgW3NhdmUtbWF0Y2gtZGF0
YS1pbnRlcm5hbCBzZXQtbWF0Y2gtZGF0YSBldmFwb3JhdGVdIDMpKSBzdHJpbmctbWF0Y2ggIlxc
YFxcVypcXCguKj9cXClcXFcqXFwnIiBtYXRjaC1zdHJpbmcgMV0gNSA5MDkyNDRd")))
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-15 1:15 ` Stefan Monnier
@ 2007-11-15 3:01 ` Katsumi Yamaoka
2007-11-15 3:39 ` Stefan Monnier
0 siblings, 1 reply; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-15 3:01 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Kenichi Handa, ding, emacs-devel
>>>>> Stefan Monnier wrote:
> Don't know about gnus-work-buffer, but nntp-server-buffer should only
> ever contain unibyte data AFAICT, so it would be better to put it in
> unibyte mode.
I think it's better, too. However, there might be code that
copies data from nntp-server-buffer to a multibyte buffer. I'm
not able to check all the Gnus code.
>> (IIUC, copying data from a multibyte buffer to a unibyte buffer
>> causes no problem).
> I'm not sure I understand: copying data from a multibyte buffer to
> a unibyte buffer is exactly the case that can cause problems.
I agree that's generally true. But in Gnus' case, data in a
multibyte work buffer are the multibyte version of binary data.
I don't know the proper words to explain it, sorry. In other words,
they are what `string-to-multibyte' produces from binary
data. For example:
(with-temp-buffer
  (set-buffer-multibyte t)
  (insert (string-to-multibyte (encode-coding-string "日本語" 'utf-8)))
  (let ((buffer (current-buffer)))
    (with-temp-buffer
      (set-buffer-multibyte nil)
      (insert-buffer-substring buffer)
      (decode-coding-string (buffer-string) 'utf-8))))
=> "日本語"
I'm not sure it works with any data, though.
Regards,
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-15 3:01 ` Katsumi Yamaoka
@ 2007-11-15 3:39 ` Stefan Monnier
0 siblings, 0 replies; 40+ messages in thread
From: Stefan Monnier @ 2007-11-15 3:39 UTC (permalink / raw)
To: Katsumi Yamaoka; +Cc: Kenichi Handa, ding, emacs-devel
> I think it's better, too. However, there might be a code that
> copies data from nntp-server-buffer to a multibyte buffer. I'm
> not capable to check all the Gnus code.
I understand the desire to avoid changing code, but I think in the long
run it'll pay off.
>>> (IIUC, copying data from a multibyte buffer to a unibyte buffer
>>> causes no problem).
>> I'm not sure I understand: copying data from a multibyte buffer to
>> a unibyte buffer is exactly the case that can cause problems.
> I agree that's generally true. But in Gnus' case, data in a
> multibyte work buffer are the multibyte version of binary data.
> I don't know proper words to explain it, sorry. In other words,
> they are the one which `string-to-multibyte' converted binary
> data to. For example:
> (with-temp-buffer
> (set-buffer-multibyte t)
> (insert (string-to-multibyte (encode-coding-string "日本語" 'utf-8)))
> (let ((buffer (current-buffer)))
> (with-temp-buffer
> (set-buffer-multibyte nil)
> (insert-buffer-substring buffer)
> (decode-coding-string (buffer-string) 'utf-8))))
> => "日本語"
> I'm not sure it works with any data, though.
I'm not sure what you're saying. But IIUC the source buffer in your
example would be nntp-server-buffer, in which case turning it into
unibyte will not introduce any problem. On the contrary, it'll make it
more obviously correct.
Stefan
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-14 11:39 ` Katsumi Yamaoka
2007-11-14 14:52 ` Stefan Monnier
@ 2007-11-15 10:20 ` Katsumi Yamaoka
2007-11-15 11:08 ` Kenichi Handa
1 sibling, 1 reply; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-15 10:20 UTC (permalink / raw)
To: ding; +Cc: emacs-devel
>>>>> Katsumi Yamaoka wrote:
> BTW, I found another problem with Emacs 21 (Gnus still supports
> Emacs 21, IIUC). So, I'll go on looking into it further.
I realized that a network process created by
`open-network-stream' in Emacs 21 breaks encoded non-ASCII group
names if the process buffer is in multibyte mode, even if the
process coding system is binary. It behaves as if
`toggle-enable-multibyte-characters' modified binary data when
turning on the multibyteness of a buffer. So, I made changes in
nntp.el in the Gnus trunk so that it makes the process buffer
unibyte. I also modified the nntp functions that copy data from
a unibyte buffer to a multibyte buffer.
Regards,
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-15 10:20 ` Katsumi Yamaoka
@ 2007-11-15 11:08 ` Kenichi Handa
2007-11-15 11:41 ` Katsumi Yamaoka
2007-11-15 15:22 ` Stefan Monnier
0 siblings, 2 replies; 40+ messages in thread
From: Kenichi Handa @ 2007-11-15 11:08 UTC (permalink / raw)
To: Katsumi Yamaoka; +Cc: ding, emacs-devel
In article <b4m1war6ca5.fsf@jpl.org>, Katsumi Yamaoka <yamaoka@jpl.org> writes:
>>>>>> Katsumi Yamaoka wrote:
> > BTW, I found another problem with Emacs 21 (Gnus still supports
> > Emacs 21, IIUC). So, I'll go on looking into it further.
> I realized a network process that is created by
> `open-network-stream' in Emacs 21 breaks encoded non-ASCII group
> names if the process buffer is in the multibyte mode even if the
> process coding system is binary. It behaves as if
> `toggle-enable-multibyte-characters' modifies binary data when
> turning on the multibyteness of a buffer.
If "modifies" means that 8-bit bytes are converted to
multibyte characters as string-as-multibyte does, that is
expected behaviour.
Long ago I proposed a facility that turns on the
multibyteness of a buffer while converting 8-bit bytes to
multibyte characters as string-to-multibyte does, but
it was not accepted.
---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-15 11:08 ` Kenichi Handa
@ 2007-11-15 11:41 ` Katsumi Yamaoka
2007-11-15 14:41 ` Kenichi Handa
2007-11-15 15:22 ` Stefan Monnier
1 sibling, 1 reply; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-15 11:41 UTC (permalink / raw)
To: Kenichi Handa; +Cc: ding, emacs-devel
>>>>> Kenichi Handa wrote:
> In article <b4m1war6ca5.fsf@jpl.org>,
> Katsumi Yamaoka <yamaoka@jpl.org> writes:
>> I realized a network process that is created by
>> `open-network-stream' in Emacs 21 breaks encoded non-ASCII group
>> names if the process buffer is in the multibyte mode even if the
>> process coding system is binary. It behaves as if
>> `toggle-enable-multibyte-characters' modifies binary data when
>> turning on the multibyteness of a buffer.
(The changes that I made in nntp.el have been archived at
<URL:http://article.gmane.org/gmane.emacs.gnus.commits/5519>.)
> If "modifies" means that 8-bit bytes are converted to
> multibyte characters as what string-as-multibyte does, it's
> an expected behaviour.
What I observed was different. The group name "テスト" is
encoded in utf-8 by the nntp server as:
"\343\203\206\343\202\271\343\203\210"
After it is transferred to Gnus, it is modified in the nntp
process buffer into:
"\343\203XY\343\203\210"
where X is (make-char 'greek-iso8859-7 99)
and Y is (make-char 'latin-iso8859-2 57).
Since Gnus treats a group name as a unibyte string, it finally
becomes:
"\343\203\343\271\343\203\210"
> I long ago proposed a facility that turns on the
> multibyteness of a buffer while converting 8-bit bytes to
> multibyte characters as what string-to-multibyte does, but
> not accepted.
But modern Emacsen do so, don't they?
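The byte sequence quoted above can be checked directly. The following
is a minimal sketch for a current Emacs, not the code Gnus uses; the
`example-' function names are hypothetical. It shows that encoding the
group name in utf-8 gives a unibyte string, and that a
`string-to-multibyte' / `string-to-unibyte' round trip leaves every
byte intact, unlike the `string-as-multibyte' path discussed here.

```elisp
(defun example-group-name-bytes (name)
  "Return NAME encoded in utf-8 as a unibyte string of raw bytes."
  (encode-coding-string name 'utf-8))

(defun example-byte-round-trip-p (bytes)
  "Return non-nil if unibyte BYTES survive a multibyte round trip.
`string-to-multibyte' maps each 8-bit byte to a raw-byte character
instead of reinterpreting byte pairs as characters."
  (let ((multi (string-to-multibyte bytes)))
    (and (multibyte-string-p multi)
         (equal (string-to-unibyte multi) bytes))))

;; Example use:
;; (example-byte-round-trip-p (example-group-name-bytes "テスト"))
```

The helpers only touch strings, so no buffer has its multibyteness
changed along the way.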
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-15 11:41 ` Katsumi Yamaoka
@ 2007-11-15 14:41 ` Kenichi Handa
2007-11-15 23:31 ` Katsumi Yamaoka
0 siblings, 1 reply; 40+ messages in thread
From: Kenichi Handa @ 2007-11-15 14:41 UTC (permalink / raw)
To: Katsumi Yamaoka; +Cc: ding, emacs-devel
In article <b4moddv20sy.fsf@jpl.org>, Katsumi Yamaoka <yamaoka@jpl.org> writes:
> > If "modifies" means that 8-bit bytes are converted to
> > multibyte characters as what string-as-multibyte does, it's
> > an expected behaviour.
> What I observed was different. The group name "テスト" is
> encoded by utf-8 by the nntp server into:
> "\343\203\206\343\202\271\343\203\210"
> After it is transferred to Gnus, in the nntp process buffer it is
> modified into:
> "\343\203XY\343\203\210"
> Where X is (make-char 'greek-iso8859-7 99)
> and Y is (make-char 'latin-iso8859-2 57).
That is exactly what string-as-multibyte does. \206\343 and
\202\271 are valid multibyte forms in the current Emacs,
thus are treated as multibyte characters.
> Since Gnus treats a group name as a unibyte string, finally it
> is made into:
> "\343\203\343\271\343\203\210"
It seems that Gnus makes "\343\203XY\343\203\210" unibyte
by converting it with string-make-unibyte.
Please try this:
(string-make-unibyte
(string-as-multibyte "\343\203\206\343\202\271\343\203\210"))
You'll get the above result, ... yes, very weird.
On the other hand,
(string-as-unibyte
(string-as-multibyte "\343\203\206\343\202\271\343\203\210"))
=> "\343\203\206\343\202\271\343\203\210"
> > I long ago proposed a facility that turns on the
> > multibyteness of a buffer while converting 8-bit bytes to
> > multibyte characters as what string-to-multibyte does, but
> > not accepted.
> But the modern Emacsen does do so, doesn't it?
No.
---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-15 11:08 ` Kenichi Handa
2007-11-15 11:41 ` Katsumi Yamaoka
@ 2007-11-15 15:22 ` Stefan Monnier
2007-11-16 0:29 ` Kenichi Handa
2007-11-16 10:50 ` Eli Zaretskii
1 sibling, 2 replies; 40+ messages in thread
From: Stefan Monnier @ 2007-11-15 15:22 UTC (permalink / raw)
To: Kenichi Handa; +Cc: Katsumi Yamaoka, ding, emacs-devel
> If "modifies" means that 8-bit bytes are converted to
> multibyte characters as what string-as-multibyte does, it's
> an expected behaviour.
99% of the uses of string-as-multibyte are bugs.
Stefan
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-15 14:41 ` Kenichi Handa
@ 2007-11-15 23:31 ` Katsumi Yamaoka
2007-11-16 0:51 ` Kenichi Handa
0 siblings, 1 reply; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-15 23:31 UTC (permalink / raw)
To: Kenichi Handa; +Cc: ding, emacs-devel
>>>>> Kenichi Handa <handa@ni.aist.go.jp> wrote:
> In article <b4moddv20sy.fsf@jpl.org>,
> Katsumi Yamaoka <yamaoka@jpl.org> writes:
>> What I observed was different.
> That is exactly what string-as-multibyte does. \206\343 and
> \202\271 are valid multibyte forms in the current Emacs,
> thus are treated as multibyte characters.
Now I understand why such readable characters suddenly appeared.
[...]
> Please try this:
> (string-make-unibyte
> (string-as-multibyte "\343\203\206\343\202\271\343\203\210"))
> You'll get the above result, ... yes, very weird.
Oh, that surprised me a bit. But I often see this kind of thing
while playing with unibyte and multibyte strings, and it always
confuses me.
> On the other hand,
> (string-as-unibyte
> (string-as-multibyte "\343\203\206\343\202\271\343\203\210"))
> => "\343\203\206\343\202\271\343\203\210"
>>> I long ago proposed a facility that turns on the
>>> multibyteness of a buffer while converting 8-bit bytes to
>>> multibyte characters as what string-to-multibyte does, but
>>> not accepted.
>> But the modern Emacsen does do so, doesn't it?
> No.
Oops. I misunderstood the reason why Emacs 22 and 23 don't
break 8-bit data while it is being fed into a multibyte buffer
from a network process whose process coding system is binary.
So, for the present, maybe the best approach is still to use a
unibyte buffer for unibyte data and a multibyte buffer for
multibyte data, and to use a string, not a buffer, to encode
and decode data if the multibyteness of the data will change,
like:
(insert (prog1
(decode-coding-string (buffer-string) 'coding)
(erase-buffer)
(set-buffer-multibyte t)))
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-15 15:22 ` Stefan Monnier
@ 2007-11-16 0:29 ` Kenichi Handa
2007-11-16 10:50 ` Eli Zaretskii
1 sibling, 0 replies; 40+ messages in thread
From: Kenichi Handa @ 2007-11-16 0:29 UTC (permalink / raw)
To: Stefan Monnier; +Cc: yamaoka, ding, emacs-devel
In article <jwvd4ubttyx.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:
> > If "modifies" means that 8-bit bytes are converted to
> > multibyte characters as what string-as-multibyte does, it's
> > an expected behaviour.
> 99% of the uses of string-as-multibyte are bugs.
Sure.
---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-15 23:31 ` Katsumi Yamaoka
@ 2007-11-16 0:51 ` Kenichi Handa
2007-11-16 1:24 ` Katsumi Yamaoka
0 siblings, 1 reply; 40+ messages in thread
From: Kenichi Handa @ 2007-11-16 0:51 UTC (permalink / raw)
To: Katsumi Yamaoka; +Cc: ding, emacs-devel
In article <b4my7czxeze.fsf@jpl.org>, Katsumi Yamaoka <yamaoka@jpl.org> writes:
> Oops. I misunderstood that the reason why Emacs 22 and 23 don't
> break 8-bit data while they are being fed into a multibyte buffer
> from a network process of which the process coding system is
> binary. So, maybe the best ways for the present are still to
> use a unibyte buffer for unibyte data and to use a multibyte
> buffer for multibyte data. And use a string, not a buffer, to
> encode and decode data if the multibyteness of data will change,
> like:
> (insert (prog1
> (decode-coding-string (buffer-string) 'coding)
> (erase-buffer)
> (set-buffer-multibyte t)))
The best approach is to decide a buffer's multibyteness just
after it is created, and not to change it later.
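One way to follow this advice, sketched under the assumption that the
raw data already sits in a unibyte buffer: leave that buffer alone and
put the decoded text into a second buffer that is multibyte from the
start. `example-decode-into-new-buffer' is a hypothetical helper, not
something in Gnus or Emacs.

```elisp
(defun example-decode-into-new-buffer (raw-buffer coding)
  "Decode the bytes in unibyte RAW-BUFFER with CODING into a new buffer.
RAW-BUFFER is left untouched; the returned buffer is set to multibyte
before it receives any text."
  (let ((decoded (with-current-buffer raw-buffer
                   (decode-coding-string (buffer-string) coding)))
        (out (generate-new-buffer " *example-decoded*")))
    (with-current-buffer out
      ;; Fix the multibyteness before inserting anything.
      (set-buffer-multibyte t)
      (insert decoded))
    out))
```

The caller owns the returned buffer and should kill it when done;
neither buffer ever has its multibyteness toggled while it holds data.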
---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-16 0:51 ` Kenichi Handa
@ 2007-11-16 1:24 ` Katsumi Yamaoka
2007-11-16 2:51 ` Stefan Monnier
0 siblings, 1 reply; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-16 1:24 UTC (permalink / raw)
To: Kenichi Handa; +Cc: ding, emacs-devel
>>>>> Kenichi Handa wrote:
> In article <b4my7czxeze.fsf@jpl.org>,
> Katsumi Yamaoka <yamaoka@jpl.org> writes:
>> (insert (prog1
>> (decode-coding-string (buffer-string) 'coding)
>> (erase-buffer)
>> (set-buffer-multibyte t)))
> The best is to decide buffer's multibyteness just after it
> is created, and don't change the multibyteness later.
I see. In relation to this, I've been wanting to exterminate
the `mm-with-unibyte-current-buffer' macro that Gnus uses here
and there (if you have time, please look at how evil it is, in
mm-util.el).
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-16 1:24 ` Katsumi Yamaoka
@ 2007-11-16 2:51 ` Stefan Monnier
0 siblings, 0 replies; 40+ messages in thread
From: Stefan Monnier @ 2007-11-16 2:51 UTC (permalink / raw)
To: Katsumi Yamaoka; +Cc: Kenichi Handa, ding, emacs-devel
> I see. In relation to this, I've been wanting to exterminate
> the `mm-with-unibyte-current-buffer' macro that Gnus uses here
> and there (if you have time, please look at how it is evil, in
> mm-util.el).
Yes, I spotted it a while ago already (I'm using a few local hacks to
try and catch some multi/unibyte abuses so I tend to bump into bugs
a bit earlier than in normal use).
I think a mistake of Emacs's handling of encoding issues is that we use
"unibyte" and "multibyte" rather than "bytes" and "chars".
Stefan
PS: Here are some hunks from my local changes.
@@ -1034,16 +1068,18 @@
(defmacro mm-with-unibyte-buffer (&rest forms)
"Create a temporary buffer, and evaluate FORMS there like `progn'.
Use unibyte mode for this."
- `(let (default-enable-multibyte-characters)
- (with-temp-buffer ,@forms)))
+ `(with-temp-buffer
+ (mm-disable-multibyte)
+ ,@forms))
(put 'mm-with-unibyte-buffer 'lisp-indent-function 0)
(put 'mm-with-unibyte-buffer 'edebug-form-spec '(body))
(defmacro mm-with-multibyte-buffer (&rest forms)
"Create a temporary buffer, and evaluate FORMS there like `progn'.
Use multibyte mode for this."
- `(let ((default-enable-multibyte-characters t))
- (with-temp-buffer ,@forms)))
+ `(with-temp-buffer
+ (mm-enable-multibyte)
+ ,@forms))
(put 'mm-with-multibyte-buffer 'lisp-indent-function 0)
(put 'mm-with-multibyte-buffer 'edebug-form-spec '(body))
@@ -1058,24 +1094,29 @@
harmful since it is likely to modify existing data in the buffer.
For instance, it converts \"\\300\\255\" into \"\\255\" in
Emacs 23 (unicode)."
- (let ((multibyte (make-symbol "multibyte"))
- (buffer (make-symbol "buffer")))
- `(if mm-emacs-mule
- (let ((,multibyte enable-multibyte-characters)
- (,buffer (current-buffer)))
- (unwind-protect
- (let (default-enable-multibyte-characters)
- (set-buffer-multibyte nil)
- ,@forms)
- (set-buffer ,buffer)
- (set-buffer-multibyte ,multibyte)))
- (let (default-enable-multibyte-characters)
- ,@forms))))
+ (message "Braindeadly defined macro: mm-with-unibyte-current-buffer")
+ ;; (let ((multibyte (make-symbol "multibyte"))
+ ;; (buffer (make-symbol "buffer")))
+ ;; `(if mm-emacs-mule
+ ;; (let ((,multibyte enable-multibyte-characters)
+ ;; (,buffer (current-buffer)))
+ ;; (unwind-protect
+ ;; (let (default-enable-multibyte-characters)
+ ;; (set-buffer-multibyte nil)
+ ;; ,@forms)
+ ;; (set-buffer ,buffer)
+ ;; (set-buffer-multibyte ,multibyte)))
+ ;; (let (default-enable-multibyte-characters)
+ ;; ,@forms)))
+ `(progn (assert (not enable-multibyte-characters))
+ ,@forms)
+ )
(put 'mm-with-unibyte-current-buffer 'lisp-indent-function 0)
(put 'mm-with-unibyte-current-buffer 'edebug-form-spec '(body))
(defmacro mm-with-unibyte (&rest forms)
"Eval the FORMS with the default value of `enable-multibyte-characters' nil."
+ (message "Braindead macro: mm-with-unibyte")
`(let (default-enable-multibyte-characters)
,@forms))
(put 'mm-with-unibyte 'lisp-indent-function 0)
@@ -1083,6 +1124,7 @@
(defmacro mm-with-multibyte (&rest forms)
"Eval the FORMS with the default value of `enable-multibyte-characters' t."
+ (message "Braindead macro: mm-with-multibyte")
`(let ((default-enable-multibyte-characters t))
,@forms))
(put 'mm-with-multibyte 'lisp-indent-function 0)
* Re: [Unicode-2] `read' always returns multibyte symbol
2007-11-15 15:22 ` Stefan Monnier
2007-11-16 0:29 ` Kenichi Handa
@ 2007-11-16 10:50 ` Eli Zaretskii
1 sibling, 0 replies; 40+ messages in thread
From: Eli Zaretskii @ 2007-11-16 10:50 UTC (permalink / raw)
To: Stefan Monnier; +Cc: yamaoka, handa, ding, emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Thu, 15 Nov 2007 10:22:12 -0500
> Cc: Katsumi Yamaoka <yamaoka@jpl.org>, ding@gnus.org, emacs-devel@gnu.org
>
> 99% of the uses of string-as-multibyte are bugs.
Should we emit a warning from the byte compiler about that? (Sorry if
we already do: I didn't have time to look.)
* Re: [Unicode-2] `C-h f' error
2007-11-15 2:06 ` [Unicode-2] `C-h f' error Katsumi Yamaoka
@ 2007-11-19 8:31 ` Katsumi Yamaoka
2007-11-20 11:09 ` CHENG Gao
0 siblings, 1 reply; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-19 8:31 UTC (permalink / raw)
To: Kenichi Handa; +Cc: emacs-devel
>>>>> Katsumi Yamaoka wrote:
>>> Debugger entered--Lisp error: (setting-constant :validate)
>>> function-called-at-point()
>>> [...]
>>> call-interactively(describe-function)
I think I have found the real cause of this problem. Though
it may happen only to me, I've tested it on two machines running
different OSes (Fedora 8 and RHL 9). The necessary conditions
for it to happen are:
A function is dumped into the Emacs executable.
It uses a macro in which uninterned symbols are used in `let'.
In that case, the uninterned symbols seem to be replaced with
interned ones when the function is dumped into Emacs. The way I
reproduced it is:
1. Make the /tmp/test.el file (attached below) and byte compile it.
2. Modify the lisp/loadup.el file as follows:
--8<---------------cut here---------------start------------->8---
*** loadup.el~ Sun Nov 11 21:51:19 2007
--- loadup.el Mon Nov 19 08:14:23 2007
***************
*** 85,88 ****
--- 85,89 ----
(load "simple")
+ (load "/tmp/test")
(load "help")
--8<---------------cut here---------------end--------------->8---
3. Dump Emacs in this way:
$ cd src
$ ./temacs -batch -l loadup dump
4. Run Emacs as:
$ ./emacs -batch -Q -eval '(foo)'
I got:
set-display-table-and-terminal-coding-system reset-language-environment English
5. Run Emacs as:
$ ./emacs -batch -Q -l /tmp/test -eval '(foo)'
I got:
foo bar baz
The test.el file is here:
--8<---------------cut here---------------start------------->8---
(defmacro foo-macro nil
(let ((foo (make-symbol "foo"))
(bar (make-symbol "bar"))
(baz (make-symbol "baz")))
`(message "%s %s %s" ',foo ',bar ',baz)))
(defun foo nil
(foo-macro))
--8<---------------cut here---------------end--------------->8---
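Why the output in step 4 points at symbol identity can be sketched as
follows. This only illustrates how uninterned symbols normally behave
and does not reproduce the dumping bug; `example-symbol-identity' is a
hypothetical name. A symbol made by `make-symbol' shares its name with
the interned symbol but is a distinct object, which is why a macro can
safely bind it in `let'.

```elisp
(defun example-symbol-identity (name)
  "Compare a fresh uninterned symbol called NAME with the interned one.
Return a list (SAME-NAME SAME-OBJECT)."
  (let ((uninterned (make-symbol name))
        (interned (intern name)))
    (list (string= (symbol-name uninterned) (symbol-name interned))
          (eq uninterned interned))))

;; (example-symbol-identity "foo") evaluates to (t nil).
```

If dumping swaps such a symbol for an unrelated interned one, a `let'
that was harmless before may end up trying to bind a constant.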
Regards,
* Re: [Unicode-2] `C-h f' error
2007-11-19 8:31 ` Katsumi Yamaoka
@ 2007-11-20 11:09 ` CHENG Gao
2007-11-21 10:55 ` Katsumi Yamaoka
0 siblings, 1 reply; 40+ messages in thread
From: CHENG Gao @ 2007-11-20 11:09 UTC (permalink / raw)
To: emacs-devel
*On Mon, 19 Nov 2007 17:31:02 +0900
* Also sprach Katsumi Yamaoka <yamaoka@jpl.org>:
> I think I have reached to the real cause of this problem. Though
> it may happen only to me, I've tested it with two machines running
> different OS (Fedora 8 and RHL 9). The necessary conditions to
> make it happen are:
I reported this problem yesterday, and have just read that you already
found it. This confirms that the problem does not happen only to you.
--
Vivere est cogitare
* Re: [Unicode-2] `C-h f' error
2007-11-20 11:09 ` CHENG Gao
@ 2007-11-21 10:55 ` Katsumi Yamaoka
2007-11-21 12:14 ` Kenichi Handa
0 siblings, 1 reply; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-21 10:55 UTC (permalink / raw)
To: CHENG Gao; +Cc: emacs-devel
>>>>> CHENG Gao wrote:
> *On Mon, 19 Nov 2007 17:31:02 +0900
> * Also sprach Katsumi Yamaoka <yamaoka@jpl.org>:
>> I think I have reached to the real cause of this problem. Though
>> it may happen only to me, I've tested it with two machines running
>> different OS (Fedora 8 and RHL 9). The necessary conditions to
>> make it happen are:
> Yesterday I reported this problem. I just read you already found this.
> This is to confirm this problem does not only happen to you.
Thanks for the confirmation. I tried bootstrapping Unicode-2
with the lread.c from before Handa-san's change. It works
normally. For the Lisp form
(aref (symbol-function (function function-called-at-point)) 2)
it returns
[buffer table emacs-lisp-mode-syntax-table str sym ...
while the latest Unicode-2 returns:
[:validate setup-function emacs-lisp-mode-syntax-table str sym ...
Regards,
* Re: [Unicode-2] `C-h f' error
2007-11-21 10:55 ` Katsumi Yamaoka
@ 2007-11-21 12:14 ` Kenichi Handa
2007-11-21 12:28 ` Katsumi Yamaoka
` (2 more replies)
0 siblings, 3 replies; 40+ messages in thread
From: Kenichi Handa @ 2007-11-21 12:14 UTC (permalink / raw)
To: Katsumi Yamaoka; +Cc: emacs-devel, chenggao
In article <b4mbq9nlvf3.fsf@jpl.org>, Katsumi Yamaoka <yamaoka@jpl.org> writes:
> Thanks for the confirmation. I tried bootstrapping Unicode-2
> with lread.c before Handa-san changed. It works normal. For
> the Lisp form
> (aref (symbol-function (function function-called-at-point)) 2)
> it returns
> [buffer table emacs-lisp-mode-syntax-table str sym ...
> while the latest Unicode-2 returns:
> [:validate setup-function emacs-lisp-mode-syntax-table str sym ...
Thank you for investigating this problem. But as I don't
have time to work on it at the moment, I fixed lread.c so
that it works as before.
---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [Unicode-2] `C-h f' error
2007-11-21 12:14 ` Kenichi Handa
@ 2007-11-21 12:28 ` Katsumi Yamaoka
2007-11-22 2:27 ` Richard Stallman
2007-11-23 15:20 ` Johan Bockgård
2 siblings, 0 replies; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-21 12:28 UTC (permalink / raw)
To: Kenichi Handa; +Cc: emacs-devel, chenggao
>>>>> Kenichi Handa wrote:
> Thank you for investigating this problem. But, as I don't
> have a time to work on it at the moment, I fixed lread.c so
> that it works as previously.
No need to fix it in a hurry (at least for me), please take
your time. Thanks.
* Re: [Unicode-2] `C-h f' error
2007-11-21 12:14 ` Kenichi Handa
2007-11-21 12:28 ` Katsumi Yamaoka
@ 2007-11-22 2:27 ` Richard Stallman
2007-11-22 4:51 ` Kenichi Handa
2007-11-23 15:20 ` Johan Bockgård
2 siblings, 1 reply; 40+ messages in thread
From: Richard Stallman @ 2007-11-22 2:27 UTC (permalink / raw)
To: Kenichi Handa; +Cc: yamaoka, chenggao, emacs-devel
Thank you for investigating this problem. But, as I don't
have a time to work on it at the moment, I fixed lread.c so
that it works as previously.
What is the lread.c behavior that you changed?
What is the new behavior?
* Re: [Unicode-2] `C-h f' error
2007-11-22 2:27 ` Richard Stallman
@ 2007-11-22 4:51 ` Kenichi Handa
2007-11-22 16:22 ` Richard Stallman
0 siblings, 1 reply; 40+ messages in thread
From: Kenichi Handa @ 2007-11-22 4:51 UTC (permalink / raw)
To: rms; +Cc: yamaoka, emacs-devel, chenggao
In article <E1Iv1mx-0003q8-0I@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
> Thank you for investigating this problem. But, as I don't
> have a time to work on it at the moment, I fixed lread.c so
> that it works as previously.
> What is the lread.c behavior that you changed?
I made the Lisp reader generate a symbol with a unibyte name
when it is read from a unibyte buffer. Previously, the
multibyteness of a symbol name was determined by the byte
sequence (by using make_string).
> What is the new behavior?
The following phenomenon was reported.
Katsumi Yamaoka <yamaoka@jpl.org> writes:
> Isn't it a side effect of this change? The `C-h f' command causes
> the following error, though it can be solved by reloading "help".
> Debugger entered--Lisp error: (setting-constant :validate)
> function-called-at-point()
> [...]
> call-interactively(describe-function)
> It seems that the `with-syntax-table' macro, that
> `function-called-at-point' uses, was not expanded properly when
> dumping Emacs:
> (disassemble 'function-called-at-point)
> 0 constant syntax-table
> 1 call 0
> 2 current-buffer
> 3 varbind :validate
> 4 varbind setup-function
> 5 constant (<byte code>...)
> 0 save-current-buffer
> 1 varref :validate
> 2 set-buffer
---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [Unicode-2] `C-h f' error
2007-11-22 4:51 ` Kenichi Handa
@ 2007-11-22 16:22 ` Richard Stallman
0 siblings, 0 replies; 40+ messages in thread
From: Richard Stallman @ 2007-11-22 16:22 UTC (permalink / raw)
To: Kenichi Handa; +Cc: yamaoka, chenggao, emacs-devel
Make the Lisp reader to generate a symbol of unibyte name
when it is read from a unibyte buffer. Previously, the
multibyteness of a symbol name is determined by the byte
sequence (by using make_string).
Please add a comment explaining the bad results that happened
when we tried the other way. (Unless you already did so.)
It is very important to explain this IN the source file.
* Re: [Unicode-2] `C-h f' error
2007-11-21 12:14 ` Kenichi Handa
2007-11-21 12:28 ` Katsumi Yamaoka
2007-11-22 2:27 ` Richard Stallman
@ 2007-11-23 15:20 ` Johan Bockgård
2007-11-25 12:35 ` Kenichi Handa
2007-11-25 12:39 ` Kenichi Handa
2 siblings, 2 replies; 40+ messages in thread
From: Johan Bockgård @ 2007-11-23 15:20 UTC (permalink / raw)
To: emacs-devel
Kenichi Handa <handa@ni.aist.go.jp> writes:
> Thank you for investigating this problem. But, as I don't
> have a time to work on it at the moment, I fixed lread.c so
> that it works as previously.
Your change replaced make_symbol with Fmake_symbol (and intern with
Fintern), and make_symbol does
Fmake_symbol ((!NILP (Vpurify_flag)
? make_pure_string (str, len, len, 0)
: make_string (str, len)));
In the make_symbol/Fmake_symbol pair of functions, the Vpurify_flag
check is in the former (so is not done after the change); but in the
intern/Fintern pair it is in the latter. Isn't this the problem?
--
Johan Bockgård
* Re: [Unicode-2] `C-h f' error
2007-11-23 15:20 ` Johan Bockgård
@ 2007-11-25 12:35 ` Kenichi Handa
2007-12-02 21:27 ` Richard Stallman
2007-11-25 12:39 ` Kenichi Handa
1 sibling, 1 reply; 40+ messages in thread
From: Kenichi Handa @ 2007-11-25 12:35 UTC (permalink / raw)
To: Johan Bockgård; +Cc: emacs-devel
In article <yoijve7tvvis.fsf@remote1.student.chalmers.se>, bojohan+news@dd.chalmers.se (Johan Bockgård) writes:
> Your change replaced make_symbol with Fmake_symbol (and intern with
> Fintern), and make_symbol does
> Fmake_symbol ((!NILP (Vpurify_flag)
> ? make_pure_string (str, len, len, 0)
> : make_string (str, len)));
> In the make_symbol/Fmake_symbol pair of functions, the Vpurify_flag
> check is in the former (so is not done after the change); but in the
> intern/Fintern pair it is in the latter. Isn't this the problem?
Ah! Perhaps. But I don't understand the reason for always
calling make_pure_string with the last arg, multibyte, as 0.
Richard, do you remember anything? It seems that this
part was last modified by you about 10 years ago.
* Re: [Unicode-2] `C-h f' error
2007-11-23 15:20 ` Johan Bockgård
2007-11-25 12:35 ` Kenichi Handa
@ 2007-11-25 12:39 ` Kenichi Handa
1 sibling, 0 replies; 40+ messages in thread
From: Kenichi Handa @ 2007-11-25 12:39 UTC (permalink / raw)
To: Johan Bockgård; +Cc: emacs-devel
In article <yoijve7tvvis.fsf@remote1.student.chalmers.se>, bojohan+news@dd.chalmers.se (Johan Bockgård) writes:
> Your change replaced make_symbol with Fmake_symbol (and intern with
> Fintern), and make_symbol does
> Fmake_symbol ((!NILP (Vpurify_flag)
> ? make_pure_string (str, len, len, 0)
> : make_string (str, len)));
> In the make_symbol/Fmake_symbol pair of functions, the Vpurify_flag
> check is in the former (so is not done after the change); but in the
> intern/Fintern pair it is in the latter. Isn't this the problem?
Ah! Good point! Perhaps you are right.
But I don't understand the reason for always calling
make_pure_string with the last arg, multibyte, as 0.
It isn't consistent with the case where Vpurify_flag is nil
(letting make_string determine the multibyteness).
Richard, do you remember anything? It seems that this
part was last modified by you about 10 years ago.
---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [Unicode-2] `C-h f' error
2007-11-25 12:35 ` Kenichi Handa
@ 2007-12-02 21:27 ` Richard Stallman
2007-12-05 5:11 ` Kenichi Handa
0 siblings, 1 reply; 40+ messages in thread
From: Richard Stallman @ 2007-12-02 21:27 UTC (permalink / raw)
To: Kenichi Handa; +Cc: emacs-devel, bojohan+news
> Fmake_symbol ((!NILP (Vpurify_flag)
> ? make_pure_string (str, len, len, 0)
> : make_string (str, len)));
> In the make_symbol/Fmake_symbol pair of functions, the Vpurify_flag
> check is in the former (so is not done after the change); but in the
> intern/Fintern pair it is in the latter. Isn't this the problem?
Ah! Perhaps. But, I don't understand the reason of calling
make_pure_string always with the last arg multibyte as 0.
Richard, don't you remember anything?
Not any more.
Sorry.
* Re: [Unicode-2] `C-h f' error
2007-12-02 21:27 ` Richard Stallman
@ 2007-12-05 5:11 ` Kenichi Handa
2007-12-05 11:26 ` Katsumi Yamaoka
0 siblings, 1 reply; 40+ messages in thread
From: Kenichi Handa @ 2007-12-05 5:11 UTC (permalink / raw)
To: rms; +Cc: yamaoka, bojohan+news, emacs-devel
In article <E1IywL8-0006cL-L4@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:
> In the make_symbol/Fmake_symbol pair of functions, the Vpurify_flag
> check is in the former (so is not done after the change); but in the
> intern/Fintern pair it is in the latter. Isn't this the problem?
> Ah! Perhaps. But, I don't understand the reason of calling
> make_pure_string always with the last arg multibyte as 0.
> Richard, don't you remember anything?
> Not any more.
> Sorry.
OK, I fixed the code by checking Vpurify_flag. Please check
whether the C-h f problem is fixed.
---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [Unicode-2] `C-h f' error
2007-12-05 5:11 ` Kenichi Handa
@ 2007-12-05 11:26 ` Katsumi Yamaoka
0 siblings, 0 replies; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-12-05 11:26 UTC (permalink / raw)
To: Kenichi Handa; +Cc: emacs-devel, rms, bojohan+news
>>>>> Kenichi Handa wrote:
> OK, I fixed the code by checking Vpurify_flag. Please check
> whether the C-h f problem is fixed.
It works as expected. Thank you.
Thread overview: 40+ messages
2007-11-13 9:41 [Unicode-2] `read' always returns multibyte symbol Katsumi Yamaoka
2007-11-13 12:55 ` Kenichi Handa
2007-11-13 15:10 ` Stefan Monnier
2007-11-14 4:53 ` Kenichi Handa
2007-11-14 7:06 ` [Unicode-2] `C-h f' error (was Re: `read' always returns multibyte symbol) Katsumi Yamaoka
2007-11-14 13:01 ` Kenichi Handa
2007-11-15 2:06 ` [Unicode-2] `C-h f' error Katsumi Yamaoka
2007-11-19 8:31 ` Katsumi Yamaoka
2007-11-20 11:09 ` CHENG Gao
2007-11-21 10:55 ` Katsumi Yamaoka
2007-11-21 12:14 ` Kenichi Handa
2007-11-21 12:28 ` Katsumi Yamaoka
2007-11-22 2:27 ` Richard Stallman
2007-11-22 4:51 ` Kenichi Handa
2007-11-22 16:22 ` Richard Stallman
2007-11-23 15:20 ` Johan Bockgård
2007-11-25 12:35 ` Kenichi Handa
2007-12-02 21:27 ` Richard Stallman
2007-12-05 5:11 ` Kenichi Handa
2007-12-05 11:26 ` Katsumi Yamaoka
2007-11-25 12:39 ` Kenichi Handa
2007-11-14 3:56 ` [Unicode-2] `read' always returns multibyte symbol Katsumi Yamaoka
2007-11-14 11:39 ` Katsumi Yamaoka
2007-11-14 14:52 ` Stefan Monnier
2007-11-14 23:52 ` Katsumi Yamaoka
2007-11-15 1:15 ` Stefan Monnier
2007-11-15 3:01 ` Katsumi Yamaoka
2007-11-15 3:39 ` Stefan Monnier
2007-11-15 10:20 ` Katsumi Yamaoka
2007-11-15 11:08 ` Kenichi Handa
2007-11-15 11:41 ` Katsumi Yamaoka
2007-11-15 14:41 ` Kenichi Handa
2007-11-15 23:31 ` Katsumi Yamaoka
2007-11-16 0:51 ` Kenichi Handa
2007-11-16 1:24 ` Katsumi Yamaoka
2007-11-16 2:51 ` Stefan Monnier
2007-11-15 15:22 ` Stefan Monnier
2007-11-16 0:29 ` Kenichi Handa
2007-11-16 10:50 ` Eli Zaretskii
2007-11-13 15:07 ` Stefan Monnier