* [Unicode-2] `read' always returns multibyte symbol
@ 2007-11-13 9:41 Katsumi Yamaoka
  2007-11-13 12:55 ` Kenichi Handa
  2007-11-13 15:07 ` Stefan Monnier
  0 siblings, 2 replies; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-13 9:41 UTC
  To: emacs-devel; +Cc: ding

Hi,

The following Lisp snippet emulates what Gnus does when reading
active data for the local.テスト newsgroup.  The buffer contains
data which have been retrieved from the nntp server.  Note that
the newsgroup name contains non-ASCII characters, which have been
encoded in utf-8 by the server.

--8<---------------cut here---------------start------------->8---
(let ((string (encode-coding-string "local.テスト" 'utf-8)))
  (with-temp-buffer
    (set-buffer-multibyte t)
    (insert (string-to-multibyte string))
    (goto-char (point-min))
    (multibyte-string-p (symbol-name (read (current-buffer))))))
--8<---------------cut here---------------end--------------->8---

While Emacs trunk returns nil for this, Emacs Unicode-2 returns t.
If it is not intentional, I hope `read' behaves just like it does
in Emacs trunk.  Otherwise, is there a way to make `read' return a
unibyte symbol (without slowing down)?

Inside Gnus, non-ASCII group names are all treated as unibyte
strings, i.e. the ones that the server has encoded with certain
coding systems.  Because of the present behavior of `read' in
Emacs Unicode-2, Gnus doesn't work perfectly with such newsgroups.
You can find the actual code in gnus-start.el as follows:

--8<---------------cut here---------------start------------->8---
;; Read an active file and place the results in `gnus-active-hashtb'.
(defun gnus-active-to-gnus-format (&optional method hashtb ignore-errors
                                             real-active)
[...]
            ;; group gets set to a symbol interned in the hash table
            ;; (what a hack!!) - jwz
            (setq group (let ((obarray hashtb)) (read cur)))
--8<---------------cut here---------------end--------------->8---

As you can see, it needs to work fast because there might be a
lot of newsgroups.  So, if possible, I don't want to modify it
into:

--8<---------------cut here---------------start------------->8---
            (setq group (intern (mm-string-as-unibyte
                                 (symbol-name (read cur)))
                                hashtb))
--8<---------------cut here---------------end--------------->8---

Regards,
* Re: [Unicode-2] `read' always returns multibyte symbol
  2007-11-13 9:41 [Unicode-2] `read' always returns multibyte symbol Katsumi Yamaoka
@ 2007-11-13 12:55 ` Kenichi Handa
  2007-11-13 15:10 ` Stefan Monnier
  2007-11-14 3:56 ` [Unicode-2] `read' always returns multibyte symbol Katsumi Yamaoka
  2007-11-13 15:07 ` Stefan Monnier
  1 sibling, 2 replies; 40+ messages in thread
From: Kenichi Handa @ 2007-11-13 12:55 UTC
  To: Katsumi Yamaoka; +Cc: ding, emacs-devel

In article <b4moddycwjv.fsf@jpl.org>, Katsumi Yamaoka <yamaoka@jpl.org> writes:

> The following Lisp snippet emulates what Gnus does when reading
> active data for the local.テスト newsgroup.  The buffer contains
> data which have been retrieved from the nntp server.  Note that
> the newsgroup name contains non-ASCII characters, which has been
> encoded by utf-8 in the server.

> --8<---------------cut here---------------start------------->8---
> (let ((string (encode-coding-string "local.テスト" 'utf-8)))
>   (with-temp-buffer
>     (set-buffer-multibyte t)
>     (insert (string-to-multibyte string))
>     (goto-char (point-min))
>     (multibyte-string-p (symbol-name (read (current-buffer))))))
> --8<---------------cut here---------------end--------------->8---

> While Emacs trunk returns nil for this, Emacs Unicode-2 returns t.

That is because `read' decides whether the name is unibyte or
multibyte by whether the name is a valid multibyte sequence or
not.  In the trunk, a utf-8 byte sequence is not a valid multibyte
sequence, but in emacs-unicode-2, it is valid.

> If it is not intentional, I hope `read' behaves just like it does
> in Emacs trunk.

The relevant code for `read' is very complicated and I want
to avoid touching it if there's another way.

In addition, I think it is the right thing that the above
code returns t; i.e. any symbol created by reading a
multibyte buffer should have a multibyte string name.

The bug to fix is that the following code also returns t in
emacs-unicode-2.

< --8<---------------cut here---------------start------------->8---
< (let ((string (encode-coding-string "local.テスト" 'utf-8)))
<   (with-temp-buffer
<     (set-buffer-multibyte nil)
<     (insert string)
<     (goto-char (point-min))
<     (multibyte-string-p (symbol-name (read (current-buffer))))))
< --8<---------------cut here---------------end--------------->8---

> Otherwise, is there a way to make `read' return a unibyte
> symbol (without slowing down)?

The replacement for the above code is as simple as this:

(multibyte-string-p
 (symbol-name (intern (encode-coding-string "local.テスト" 'utf-8))))

But, hmmm, it seems that we can't use such code in gnus...

> In the inside of Gnus, non-ASCII group names are all treated as
> unibyte strings, that are the ones that the server has encoded
> with certain coding systems.  Because of the present behavior of
> `read' in Emacs Unicode-2, Gnus doesn't work with such newsgroups
> perfectly.  You can find the actual code in gnus-start.el as
> follows:

> --8<---------------cut here---------------start------------->8---
> ;; Read an active file and place the results in `gnus-active-hashtb'.
> (defun gnus-active-to-gnus-format (&optional method hashtb ignore-errors
>                                              real-active)
> [...]
>             ;; group gets set to a symbol interned in the hash table
>             ;; (what a hack!!) - jwz
>             (setq group (let ((obarray hashtb)) (read cur)))
> --8<---------------cut here---------------end--------------->8---

How about this?

            (setq group (let ((obarray hashtb) pos)
                          (skip-syntax-forward "^w_")
                          (setq pos (point))
                          (skip-syntax-forward "w_")
                          (intern (buffer-substring pos (point)))))

I think the overhead is just several more function calls.
The actual task (searching for a range of symbol
constituents, making a string from them, and interning it) is
almost the same.

---
Kenichi Handa
handa@ni.aist.go.jp

_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel
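The `skip-syntax-forward' suggestion above can be sketched as a
self-contained function.  This is only a sketch under assumptions
that are not part of the thread: the `my-' names are hypothetical,
`obarray-make' comes from Emacs versions later than the ones
discussed here, and the Emacs Lisp syntax table is used so that `.'
counts as a symbol constituent.

```elisp
;; Sketch only; `my-intern-next-token' is a hypothetical name, not Gnus code.
(defun my-intern-next-token (table)
  "Intern the next run of word/symbol characters after point into TABLE.
Return the interned symbol and leave point after the token."
  (skip-syntax-forward "^w_")           ; skip leading non-symbol characters
  (let ((start (point)))
    (skip-syntax-forward "w_")          ; move over the token itself
    (intern (buffer-substring-no-properties start (point)) table)))

;; Usage: pick the group name out of one ASCII line of active-file data.
(defvar my-demo-table (obarray-make))   ; assumption: Emacs 25.1 or later

(defvar my-demo-group
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert "comp.lang.lisp 100 1 y\n")
    (goto-char (point-min))
    ;; Assumption: a syntax table in which `.' is a symbol constituent.
    (with-syntax-table emacs-lisp-mode-syntax-table
      (my-intern-next-token my-demo-table))))
```

Note that the result depends on the syntax table in effect: with a
table where `.' is punctuation, the scan would stop at the first dot
of the group name.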
* Re: [Unicode-2] `read' always returns multibyte symbol
  2007-11-13 12:55 ` Kenichi Handa
@ 2007-11-13 15:10 ` Stefan Monnier
  2007-11-14 4:53 ` Kenichi Handa
  2007-11-14 3:56 ` [Unicode-2] `read' always returns multibyte symbol Katsumi Yamaoka
  1 sibling, 1 reply; 40+ messages in thread
From: Stefan Monnier @ 2007-11-13 15:10 UTC
  To: Kenichi Handa; +Cc: Katsumi Yamaoka, ding, emacs-devel

> That is because `read' decides the name is unibyte or multibyte by
> whether the name is a valid multibyte sequence or not.

Yuck.

> The bug to fix is that the following code also returns t in
> emacs-unicode-2.

> < --8<---------------cut here---------------start------------->8---
> < (let ((string (encode-coding-string "local.テスト" 'utf-8)))
> <   (with-temp-buffer
> <     (set-buffer-multibyte nil)
> <     (insert string)
> <     (goto-char (point-min))
> <     (multibyte-string-p (symbol-name (read (current-buffer))))))
> < --8<---------------cut here---------------end--------------->8---

Yes, that's a clear bug.


        Stefan
* Re: [Unicode-2] `read' always returns multibyte symbol
  2007-11-13 15:10 ` Stefan Monnier
@ 2007-11-14 4:53 ` Kenichi Handa
  2007-11-14 7:06 ` [Unicode-2] `C-h f' error (was Re: `read' always returns multibyte symbol) Katsumi Yamaoka
  0 siblings, 1 reply; 40+ messages in thread
From: Kenichi Handa @ 2007-11-14 4:53 UTC
  To: Stefan Monnier; +Cc: yamaoka, ding, emacs-devel

In article <jwvoddykwt5.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > The bug to fix is that the following code also returns t in
> > emacs-unicode-2.

> > < --8<---------------cut here---------------start------------->8---
> > < (let ((string (encode-coding-string "local.テスト" 'utf-8)))
> > <   (with-temp-buffer
> > <     (set-buffer-multibyte nil)
> > <     (insert string)
> > <     (goto-char (point-min))
> > <     (multibyte-string-p (symbol-name (read (current-buffer))))))
> > < --8<---------------cut here---------------end--------------->8---

> Yes, that's a clear bug.

I've just installed a fix.

---
Kenichi Handa
handa@ni.aist.go.jp
* [Unicode-2] `C-h f' error (was Re: `read' always returns multibyte symbol)
  2007-11-14 4:53 ` Kenichi Handa
@ 2007-11-14 7:06 ` Katsumi Yamaoka
  2007-11-14 13:01 ` Kenichi Handa
  0 siblings, 1 reply; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-14 7:06 UTC
  To: Kenichi Handa; +Cc: emacs-devel

>>>>> Kenichi Handa <handa@ni.aist.go.jp> wrote:

> I've just installed a fix.

Thanks.

Isn't this a side effect of that change?  The `C-h f' command causes
the following error, though it can be solved by reloading "help".

Debugger entered--Lisp error: (setting-constant :validate)
  function-called-at-point()
  [...]
  call-interactively(describe-function)

It seems that the `with-syntax-table' macro, which
`function-called-at-point' uses, was not expanded properly when
dumping Emacs:

(disassemble 'function-called-at-point)

0       constant  syntax-table
1       call      0
2       current-buffer
3       varbind   :validate
4       varbind   setup-function
5       constant  (<byte code>...)
      0     save-current-buffer
      1     varref    :validate
      2     set-buffer

Regards,
* Re: [Unicode-2] `C-h f' error (was Re: `read' always returns multibyte symbol)
  2007-11-14 7:06 ` [Unicode-2] `C-h f' error (was Re: `read' always returns multibyte symbol) Katsumi Yamaoka
@ 2007-11-14 13:01 ` Kenichi Handa
  2007-11-15 2:06 ` [Unicode-2] `C-h f' error Katsumi Yamaoka
  0 siblings, 1 reply; 40+ messages in thread
From: Kenichi Handa @ 2007-11-14 13:01 UTC
  To: Katsumi Yamaoka; +Cc: emacs-devel

In article <b4mejete26t.fsf_-_@jpl.org>, Katsumi Yamaoka <yamaoka@jpl.org> writes:

>>>>>> Kenichi Handa <handa@ni.aist.go.jp> wrote:
> > I've just installed a fix.

> Thanks.

> Isn't it a side effect of this change?  The `C-h f' command causes
> the following error, though it can be solved by reloading "help".

> Debugger entered--Lisp error: (setting-constant :validate)
>   function-called-at-point()
>   [...]
>   call-interactively(describe-function)

I can't reproduce that bug.  Perhaps because I did "make
bootstrap".  How about you?

> It seems that the `with-syntax-table' macro, that
> `function-called-at-point' uses, was not expanded properly when
> dumping Emacs:

> (disassemble 'function-called-at-point)

> 0       constant  syntax-table
> 1       call      0
> 2       current-buffer
> 3       varbind   :validate
> 4       varbind   setup-function
> 5       constant  (<byte code>...)
>       0     save-current-buffer
>       1     varref    :validate
>       2     set-buffer

I got this in *Disassemble* buffer.

byte code for function-called-at-point:
  doc:  Return a function around point or else called by the list containing point. ...
  args: nil
0       constant  syntax-table
1       call      0
2       current-buffer
3       varbind   required-features
4       varbind   standard-display-european-internal
5       constant  (<byte code>...)
      0     save-current-buffer
      1     varref    required-features
      2     set-buffer
      3     discard
      4     constant  set-syntax-table
      5     varref    standard-display-european-internal
      6     call      1
      7     discard
      8     unbind    1
      9     constant  set-syntax-table
      10    return
6       unwind-protect
7       constant  set-syntax-table
8       varref    emacs-lisp-mode-syntax-table
9       call      1
10      discard
11      constant  nil
12      constant  <byte code>
      0     save-excursion
      1     constant  zerop
      2     constant  skip-syntax-backward
      3     constant  "_w"
      4     call      1
      5     call      1
      6     goto-if-nil 1
      9     following-char
      10    char-syntax
      11    constant  119
      12    eq
      13    goto-if-not-nil 1
      16    following-char
      17    char-syntax
      18    constant  95
      19    eq
      20    goto-if-not-nil 1
      23    constant  forward-sexp
      24    constant  -1
      25    call      1
      26    discard
      27:1  constant  "'"
      28    constant  nil
      29    skip-chars-forward
      30    discard
      31    constant  read
      32    current-buffer
      33    call      1
      34    dup
      35    varbind   obj
      36    symbolp
      37    goto-if-nil-else-pop 2
      40    constant  fboundp
      41    varref    obj
      42    call      1
      43    goto-if-nil-else-pop 2
      46    varref    obj
      47:2  unbind    2
      48    return
13      constant  ((error))
14      condition-case
15      goto-if-not-nil-else-pop 1
18      constant  nil
19      constant  <byte code>
      0     save-excursion
      1     save-restriction
      2     point-min
      3     point
      4     constant  1000
      5     diff
      6     max
      7     point-max
      8     narrow-to-region
      9     discard
      10    constant  backward-up-list
      11    constant  1
      12    call      1
      13    discard
      14    constant  1
      15    forward-char
      16    discard
      17    constant  looking-at
      18    constant  "[ ]"
      19    call      1
      20    goto-if-nil 1
      23    constant  error
      24    constant  "Probably not a Lisp function call"
      25    call      1
      26    discard
      27:1  constant  read
      28    current-buffer
      29    call      1
      30    dup
      31    varbind   obj
      32    symbolp
      33    goto-if-nil-else-pop 2
      36    constant  fboundp
      37    varref    obj
      38    call      1
      39    goto-if-nil-else-pop 2
      42    varref    obj
      43:2  unbind    3
      44    return
20      constant  ((error))
21      condition-case
22:1    unbind    3
23      goto-if-not-nil-else-pop 6
26      constant  find-tag-default
27      call      0
28      dup
29      varbind   str
30      goto-if-nil-else-pop 2
33      constant  intern-soft
34      varref    str
35      call      1
36:2    dup
37      varbind   sym
38      goto-if-nil 3
41      constant  fboundp
42      varref    sym
43      call      1
44      goto-if-nil 3
47      varref    sym
48      goto      5
51:3    constant  match-data
52      call      0
53      varbind   save-match-data-internal
54      constant  (<byte code>...)
      0     constant  set-match-data
      1     varref    save-match-data-internal
      2     constant  evaporate
      3     call      2
      4     return
55      unwind-protect
56      varref    str
57      goto-if-nil-else-pop 4
60      constant  string-match
61      constant  "\\`\\W*\\(.*?\\)\\W*\\'"
62      varref    str
63      call      2
64      goto-if-nil-else-pop 4
67      constant  intern-soft
68      constant  match-string
69      constant  1
70      varref    str
71      call      2
72      call      1
73      varset    sym
74      constant  fboundp
75      varref    sym
76      call      1
77      goto-if-nil-else-pop 4
80      varref    sym
81:4    unbind    2
82:5    unbind    2
83:6    return

---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [Unicode-2] `C-h f' error 2007-11-14 13:01 ` Kenichi Handa @ 2007-11-15 2:06 ` Katsumi Yamaoka 2007-11-19 8:31 ` Katsumi Yamaoka 0 siblings, 1 reply; 40+ messages in thread From: Katsumi Yamaoka @ 2007-11-15 2:06 UTC (permalink / raw) To: Kenichi Handa; +Cc: emacs-devel >>>>> Kenichi Handa wrote: > In article <b4mejete26t.fsf_-_@jpl.org>, > Katsumi Yamaoka <yamaoka@jpl.org> writes: >> Debugger entered--Lisp error: (setting-constant :validate) >> function-called-at-point() >> [...] >> call-interactively(describe-function) > I can't reproduce that bug. Perhaps because I did "make > bootstrap". How about you? I always do "make bootstrap" with a copy of the source checked out from CVS. I did it again but it made no difference. Hm. Though I'm not sure it is useful, the function definition of `function-called-at-point' that has been dumped in the Emacs executable is here (it differs from the one in help.elc): (fset 'function-called-at-point (read (base64-decode-string "\ I1tuaWwgIlwzMDYgcBgZXDMwN1wyMTZcMzEwCiFcMjEwXDMxMVwzMTJcMzEzXDIxN1wyMDYWAFwz MTFcMzE0XDMxNVwyMTcrXDIwNlMAXDMxNiBcMjExG1wyMDUkAFwzMTcLIVwyMTEcXDIwMzMAXDMy MAwhXDIwMzMADFwyMDJSAFwzMjEgHVwzMjJcMjE2C1wyMDVRAFwzMjNcMzI0C1wiXDIwNVEAXDMx N1wzMjVcMzI2C1wiIRRcMzIwDCFcMjA1UQAMKipcMjA3IiBbOnZhbGlkYXRlIHNldHVwLWZ1bmN0 aW9uIGVtYWNzLWxpc3AtbW9kZS1zeW50YXgtdGFibGUgc3RyIHN5bSBzYXZlLW1hdGNoLWRhdGEt aW50ZXJuYWwgc3ludGF4LXRhYmxlICgoYnl0ZS1jb2RlICJyCHFcMjEwXDMwMgkhXDIxMClcMzAy XDIwNyIgWzp2YWxpZGF0ZSBzZXR1cC1mdW5jdGlvbiBzZXQtc3ludGF4LXRhYmxlXSAyKSkgc2V0 LXN5bnRheC10YWJsZSBuaWwgKGJ5dGUtY29kZSAiXDIxMlwzMDFcMzAyXDMwMyEhXDIwMxsAZ3pc MzA0PVwyMDQbAGd6XDMwNT1cMjA0GwBcMzA2XDMwNyFcMjEwXDMxMFwzMTF3XDIxMFwzMTJwIVwy MTEYOVwyMDUvAFwzMTMIIVwyMDUvAAgqXDIwNyIgW29iaiB6ZXJvcCBza2lwLXN5bnRheC1iYWNr d2FyZCAiX3ciIDExOSA5NSBmb3J3YXJkLXNleHAgLTEgIiciIG5pbCByZWFkIGZib3VuZHBdIDQp ICgoZXJyb3IpKSAoYnl0ZS1jb2RlICJcMjEyXDIxNGVgXDMwMVpdZH1cMjEwXDMwMlwzMDMhXDIx MFwzMDN1XDIxMFwzMDRcMzA1IVwyMDMbAFwzMDZcMzA3IVwyMTBcMzEwcCFcMjExGDlcMjA1KwBc 
MzExCCFcMjA1KwAIK1wyMDciIFtvYmogMTAwMCBiYWNrd2FyZC11cC1saXN0IDEgbG9va2luZy1h dCAiWyAJXSIgZXJyb3IgIlByb2JhYmx5IG5vdCBhIExpc3AgZnVuY3Rpb24gY2FsbCIgcmVhZCBm Ym91bmRwXSA0KSAoKGVycm9yKSkgZmluZC10YWctZGVmYXVsdCBpbnRlcm4tc29mdCBmYm91bmRw IG1hdGNoLWRhdGEgKChieXRlLWNvZGUgIlwzMDEIXDMwMlwiXDIwNyIgW3NhdmUtbWF0Y2gtZGF0 YS1pbnRlcm5hbCBzZXQtbWF0Y2gtZGF0YSBldmFwb3JhdGVdIDMpKSBzdHJpbmctbWF0Y2ggIlxc YFxcVypcXCguKj9cXClcXFcqXFwnIiBtYXRjaC1zdHJpbmcgMV0gNSA5MDkyNDRd"))) ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Unicode-2] `C-h f' error
  2007-11-15 2:06 ` [Unicode-2] `C-h f' error Katsumi Yamaoka
@ 2007-11-19 8:31 ` Katsumi Yamaoka
  2007-11-20 11:09 ` CHENG Gao
  0 siblings, 1 reply; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-19 8:31 UTC
  To: Kenichi Handa; +Cc: emacs-devel

>>>>> Katsumi Yamaoka wrote:

>>> Debugger entered--Lisp error: (setting-constant :validate)
>>>   function-called-at-point()
>>>   [...]
>>>   call-interactively(describe-function)

I think I have found the real cause of this problem.  Though
it may happen only to me, I've tested it on two machines running
different OSes (Fedora 8 and RHL 9).  The necessary conditions to
make it happen are:

  A function is dumped into the Emacs executable.
  It uses a macro in which uninterned symbols are used in `let'.

In that case, uninterned symbols seem to be replaced with
interned ones when dumping into Emacs.  The way I reproduced it is:

1. Make the /tmp/test.el file (attached below) and byte compile it.

2. Modify the lisp/loadup.el file as follows:

--8<---------------cut here---------------start------------->8---
*** loadup.el~	Sun Nov 11 21:51:19 2007
--- loadup.el	Mon Nov 19 08:14:23 2007
***************
*** 85,88 ****
--- 85,89 ----
  (load "simple")
+ (load "/tmp/test")
  (load "help")
--8<---------------cut here---------------end--------------->8---

3. Dump Emacs in this way:

   $ cd src
   $ ./temacs -batch -l loadup dump

4. Run Emacs as:

   $ ./emacs -batch -Q -eval '(foo)'

   I got:

   set-display-table-and-terminal-coding-system reset-language-environment English

5. Run Emacs as:

   $ ./emacs -batch -Q -l /tmp/test -eval '(foo)'

   I got:

   foo bar baz

The test.el file is here:

--8<---------------cut here---------------start------------->8---
(defmacro foo-macro nil
  (let ((foo (make-symbol "foo"))
        (bar (make-symbol "bar"))
        (baz (make-symbol "baz")))
    `(message "%s %s %s" ',foo ',bar ',baz)))

(defun foo nil
  (foo-macro))
--8<---------------cut here---------------end--------------->8---

Regards,
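The distinction that matters in the report above is between an
uninterned symbol (from `make-symbol') and the interned symbol of
the same name.  A minimal sketch with hypothetical `my-' variable
names; it does not reproduce the dumping problem itself, only the
identity that the dumped byte code apparently failed to preserve:

```elisp
;; Sketch only; the `my-' names are hypothetical.
(defvar my-uninterned (make-symbol "foo")) ; fresh symbol, in no obarray
(defvar my-interned (intern "foo"))        ; the symbol `foo' in `obarray'

(defvar my-comparison
  (list (eq my-uninterned my-interned)            ; two different objects
        (string= (symbol-name my-uninterned)
                 (symbol-name my-interned))       ; same print name
        (eq (intern-soft "foo") my-interned)))    ; lookup finds the interned one
;; `my-comparison' evaluates to (nil t t)
```

Because the two symbols print identically, a mix-up between them is
visible only through `eq' or through behavior such as the
`setting-constant' error quoted earlier.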
* Re: [Unicode-2] `C-h f' error
  2007-11-19 8:31 ` Katsumi Yamaoka
@ 2007-11-20 11:09 ` CHENG Gao
  2007-11-21 10:55 ` Katsumi Yamaoka
  0 siblings, 1 reply; 40+ messages in thread
From: CHENG Gao @ 2007-11-20 11:09 UTC
  To: emacs-devel

*On Mon, 19 Nov 2007 17:31:02 +0900
* Also sprach Katsumi Yamaoka <yamaoka@jpl.org>:

> I think I have reached to the real cause of this problem.  Though
> it may happen only to me, I've tested it with two machines running
> different OS (Fedora 8 and RHL 9).  The necessary conditions to
> make it happen are:

Yesterday I reported this problem.  I just read you already found this.
This is to confirm this problem does not only happen to you.

--
Vivere est cogitare
* Re: [Unicode-2] `C-h f' error
  2007-11-20 11:09 ` CHENG Gao
@ 2007-11-21 10:55 ` Katsumi Yamaoka
  2007-11-21 12:14 ` Kenichi Handa
  0 siblings, 1 reply; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-21 10:55 UTC
  To: CHENG Gao; +Cc: emacs-devel

>>>>> CHENG Gao wrote:
> *On Mon, 19 Nov 2007 17:31:02 +0900
> * Also sprach Katsumi Yamaoka <yamaoka@jpl.org>:

>> I think I have reached to the real cause of this problem.  Though
>> it may happen only to me, I've tested it with two machines running
>> different OS (Fedora 8 and RHL 9).  The necessary conditions to
>> make it happen are:

> Yesterday I reported this problem.  I just read you already found this.
> This is to confirm this problem does not only happen to you.

Thanks for the confirmation.  I tried bootstrapping Unicode-2
with lread.c as it was before Handa-san's change.  It works
normally.  For the Lisp form

(aref (symbol-function (function function-called-at-point)) 2)

it returns

[buffer table emacs-lisp-mode-syntax-table str sym ...

while the latest Unicode-2 returns:

[:validate setup-function emacs-lisp-mode-syntax-table str sym ...

Regards,
* Re: [Unicode-2] `C-h f' error
  2007-11-21 10:55 ` Katsumi Yamaoka
@ 2007-11-21 12:14 ` Kenichi Handa
  2007-11-21 12:28 ` Katsumi Yamaoka
  ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Kenichi Handa @ 2007-11-21 12:14 UTC
  To: Katsumi Yamaoka; +Cc: emacs-devel, chenggao

In article <b4mbq9nlvf3.fsf@jpl.org>, Katsumi Yamaoka <yamaoka@jpl.org> writes:

> Thanks for the confirmation.  I tried bootstrapping Unicode-2
> with lread.c before Handa-san changed.  It works normal.  For
> the Lisp form

> (aref (symbol-function (function function-called-at-point)) 2)

> it returns

> [buffer table emacs-lisp-mode-syntax-table str sym ...

> while the latest Unicode-2 returns:

> [:validate setup-function emacs-lisp-mode-syntax-table str sym ...

Thank you for investigating this problem.  But, as I don't
have time to work on it at the moment, I fixed lread.c so
that it works as before.

---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [Unicode-2] `C-h f' error
  2007-11-21 12:14 ` Kenichi Handa
@ 2007-11-21 12:28 ` Katsumi Yamaoka
  2007-11-22 2:27 ` Richard Stallman
  2007-11-23 15:20 ` Johan Bockgård
  2 siblings, 0 replies; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-21 12:28 UTC
  To: Kenichi Handa; +Cc: emacs-devel, chenggao

>>>>> Kenichi Handa wrote:

> Thank you for investigating this problem.  But, as I don't
> have a time to work on it at the moment, I fixed lread.c so
> that it works as previously.

No need to fix it in a hurry (at least for me), please take your
time.  Thanks.
* Re: [Unicode-2] `C-h f' error
  2007-11-21 12:14 ` Kenichi Handa
  2007-11-21 12:28 ` Katsumi Yamaoka
@ 2007-11-22 2:27 ` Richard Stallman
  2007-11-22 4:51 ` Kenichi Handa
  2007-11-23 15:20 ` Johan Bockgård
  2 siblings, 1 reply; 40+ messages in thread
From: Richard Stallman @ 2007-11-22 2:27 UTC
  To: Kenichi Handa; +Cc: yamaoka, chenggao, emacs-devel

    Thank you for investigating this problem.  But, as I don't
    have a time to work on it at the moment, I fixed lread.c so
    that it works as previously.

What is the lread.c behavior that you changed?
What is the new behavior?
* Re: [Unicode-2] `C-h f' error
  2007-11-22 2:27 ` Richard Stallman
@ 2007-11-22 4:51 ` Kenichi Handa
  2007-11-22 16:22 ` Richard Stallman
  0 siblings, 1 reply; 40+ messages in thread
From: Kenichi Handa @ 2007-11-22 4:51 UTC
  To: rms; +Cc: yamaoka, emacs-devel, chenggao

In article <E1Iv1mx-0003q8-0I@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

>     Thank you for investigating this problem.  But, as I don't
>     have a time to work on it at the moment, I fixed lread.c so
>     that it works as previously.

> What is the lread.c behavior that you changed?

I made the Lisp reader generate a symbol with a unibyte name
when it is read from a unibyte buffer.  Previously, the
multibyteness of a symbol name was determined by the byte
sequence (by using make_string).

> What is the new behavior?

The following phenomenon was reported.

Katsumi Yamaoka <yamaoka@jpl.org> writes:

> Isn't it a side effect of this change?  The `C-h f' command causes
> the following error, though it can be solved by reloading "help".

> Debugger entered--Lisp error: (setting-constant :validate)
>   function-called-at-point()
>   [...]
>   call-interactively(describe-function)

> It seems that the `with-syntax-table' macro, that
> `function-called-at-point' uses, was not expanded properly when
> dumping Emacs:

> (disassemble 'function-called-at-point)

> 0       constant  syntax-table
> 1       call      0
> 2       current-buffer
> 3       varbind   :validate
> 4       varbind   setup-function
> 5       constant  (<byte code>...)
>       0     save-current-buffer
>       1     varref    :validate
>       2     set-buffer

---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [Unicode-2] `C-h f' error
  2007-11-22 4:51 ` Kenichi Handa
@ 2007-11-22 16:22 ` Richard Stallman
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Stallman @ 2007-11-22 16:22 UTC
  To: Kenichi Handa; +Cc: yamaoka, chenggao, emacs-devel

    Make the Lisp reader to generate a symbol of unibyte name
    when it is read from a unibyte buffer.  Previously, the
    multibyteness of a symbol name is determined by the byte
    sequence (by using make_string).

Please add a comment explaining the bad results that happened when we
tried the other way.  (Unless you already did so.)  It is very
important to explain this IN the source file.
* Re: [Unicode-2] `C-h f' error
  2007-11-21 12:14 ` Kenichi Handa
  2007-11-21 12:28 ` Katsumi Yamaoka
  2007-11-22 2:27 ` Richard Stallman
@ 2007-11-23 15:20 ` Johan Bockgård
  2007-11-25 12:35 ` Kenichi Handa
  2007-11-25 12:39 ` Kenichi Handa
  2 siblings, 2 replies; 40+ messages in thread
From: Johan Bockgård @ 2007-11-23 15:20 UTC
  To: emacs-devel

Kenichi Handa <handa@ni.aist.go.jp> writes:

> Thank you for investigating this problem.  But, as I don't
> have a time to work on it at the moment, I fixed lread.c so
> that it works as previously.

Your change replaced make_symbol with Fmake_symbol (and intern with
Fintern), and make_symbol does

  Fmake_symbol ((!NILP (Vpurify_flag)
                 ? make_pure_string (str, len, len, 0)
                 : make_string (str, len)));

In the make_symbol/Fmake_symbol pair of functions, the Vpurify_flag
check is in the former (so is not done after the change); but in the
intern/Fintern pair it is in the latter.  Isn't this the problem?

--
Johan Bockgård
* Re: [Unicode-2] `C-h f' error
  2007-11-23 15:20 ` Johan Bockgård
@ 2007-11-25 12:35 ` Kenichi Handa
  2007-12-02 21:27 ` Richard Stallman
  2007-11-25 12:39 ` Kenichi Handa
  1 sibling, 1 reply; 40+ messages in thread
From: Kenichi Handa @ 2007-11-25 12:35 UTC
  To: Johan Bockgård; +Cc: emacs-devel

In article <yoijve7tvvis.fsf@remote1.student.chalmers.se>, bojohan+news@dd.chalmers.se (Johan Bockgård) writes:

> Your change replaced make_symbol with Fmake_symbol (and intern with
> Fintern), and make_symbol does

>   Fmake_symbol ((!NILP (Vpurify_flag)
>                  ? make_pure_string (str, len, len, 0)
>                  : make_string (str, len)));

> In the make_symbol/Fmake_symbol pair of functions, the Vpurify_flag
> check is in the former (so is not done after the change); but in the
> intern/Fintern pair it is in the latter.  Isn't this the problem?

Ah!  Perhaps.  But, I don't understand the reason for always
calling make_pure_string with the last arg (multibyte) as 0.
Richard, don't you remember anything?  It seems that this part
was last modified by you about 10 years ago.
* Re: [Unicode-2] `C-h f' error
  2007-11-25 12:35 ` Kenichi Handa
@ 2007-12-02 21:27 ` Richard Stallman
  2007-12-05 5:11 ` Kenichi Handa
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Stallman @ 2007-12-02 21:27 UTC
  To: Kenichi Handa; +Cc: emacs-devel, bojohan+news

    >   Fmake_symbol ((!NILP (Vpurify_flag)
    >                  ? make_pure_string (str, len, len, 0)
    >                  : make_string (str, len)));

    > In the make_symbol/Fmake_symbol pair of functions, the Vpurify_flag
    > check is in the former (so is not done after the change); but in the
    > intern/Fintern pair it is in the latter.  Isn't this the problem?

    Ah!  Perhaps.  But, I don't understand the reason of calling
    make_pure_string always with the last arg multibyte as 0.
    Richard, don't you remember anything?

Not any more.
Sorry.
* Re: [Unicode-2] `C-h f' error
  2007-12-02 21:27 ` Richard Stallman
@ 2007-12-05 5:11 ` Kenichi Handa
  2007-12-05 11:26 ` Katsumi Yamaoka
  0 siblings, 1 reply; 40+ messages in thread
From: Kenichi Handa @ 2007-12-05 5:11 UTC
  To: rms; +Cc: yamaoka, bojohan+news, emacs-devel

In article <E1IywL8-0006cL-L4@fencepost.gnu.org>, Richard Stallman <rms@gnu.org> writes:

>     > In the make_symbol/Fmake_symbol pair of functions, the Vpurify_flag
>     > check is in the former (so is not done after the change); but in the
>     > intern/Fintern pair it is in the latter.  Isn't this the problem?

>     Ah!  Perhaps.  But, I don't understand the reason of calling
>     make_pure_string always with the last arg multibyte as 0.
>     Richard, don't you remember anything?

> Not any more.
> Sorry.

Ok, I fixed the code by checking Vpurify_flag.  Please check
whether the C-h f problem is fixed or not.

---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [Unicode-2] `C-h f' error
  2007-12-05 5:11 ` Kenichi Handa
@ 2007-12-05 11:26 ` Katsumi Yamaoka
  0 siblings, 0 replies; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-12-05 11:26 UTC
  To: Kenichi Handa; +Cc: emacs-devel, rms, bojohan+news

>>>>> Kenichi Handa wrote:

> Ok, I fixed the code by checking Vpurify_flag.  Please check
> if the C-h f problem is fixed or not.

It works as expected.  Thank you.
* Re: [Unicode-2] `C-h f' error
  2007-11-23 15:20 ` Johan Bockgård
  2007-11-25 12:35 ` Kenichi Handa
@ 2007-11-25 12:39 ` Kenichi Handa
  1 sibling, 0 replies; 40+ messages in thread
From: Kenichi Handa @ 2007-11-25 12:39 UTC
  To: Johan Bockgård; +Cc: emacs-devel

In article <yoijve7tvvis.fsf@remote1.student.chalmers.se>, bojohan+news@dd.chalmers.se (Johan Bockgård) writes:

> Your change replaced make_symbol with Fmake_symbol (and intern with
> Fintern), and make_symbol does

>   Fmake_symbol ((!NILP (Vpurify_flag)
>                  ? make_pure_string (str, len, len, 0)
>                  : make_string (str, len)));

> In the make_symbol/Fmake_symbol pair of functions, the Vpurify_flag
> check is in the former (so is not done after the change); but in the
> intern/Fintern pair it is in the latter.  Isn't this the problem?

Ah!  Good point!  Perhaps you are right.  But, I don't
understand the reason for always calling make_pure_string with
the last arg (multibyte) as 0.  It isn't consistent with the
case where Vpurify_flag is nil (letting make_string determine
the multibyteness).

Richard, don't you remember anything?  It seems that this
part was last modified by you about 10 years ago.

---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [Unicode-2] `read' always returns multibyte symbol
  2007-11-13 12:55 ` Kenichi Handa
  2007-11-13 15:10 ` Stefan Monnier
@ 2007-11-14 3:56 ` Katsumi Yamaoka
  2007-11-14 11:39 ` Katsumi Yamaoka
  1 sibling, 1 reply; 40+ messages in thread
From: Katsumi Yamaoka @ 2007-11-14 3:56 UTC
  To: Kenichi Handa; +Cc: ding, emacs-devel

>>>>> Kenichi Handa <handa@ni.aist.go.jp> wrote:

> In addition, I think it is the right thing that the above
> code return t; i.e. any symbol created by reading a
> multibyte buffer should have a multibyte string name.

I agree with that behavior.

> The bug to fix is that the following code also returns t in
> emacs-unicode-2.

> < --8<---------------cut here---------------start------------->8---
> < (let ((string (encode-coding-string "local.テスト" 'utf-8)))
> <   (with-temp-buffer
> <     (set-buffer-multibyte nil)
> <     (insert string)
> <     (goto-char (point-min))
> <     (multibyte-string-p (symbol-name (read (current-buffer))))))
> < --8<---------------cut here---------------end--------------->8---

Sure.  I'll try using a unibyte buffer to parse active data (after
the bug is fixed).

Regards,
* Re: [Unicode-2] `read' always returns multibyte symbol
From: Katsumi Yamaoka @ 2007-11-14 11:39 UTC
To: Kenichi Handa; +Cc: ding, emacs-devel

>>>>> Katsumi Yamaoka wrote:

> I'll try using a unibyte buffer to parse active data (after the bug
> is fixed).

Handa-san, thank you for the fix in Unicode-2.  I've also made a
change in the Gnus CVS trunk so that it may use a unibyte buffer.
Now it works not only with Emacs 23.0.60 but also with Emacs 22.1,
22.1.50, and 23.0.50.

BTW, I found another problem with Emacs 21 (Gnus still supports
Emacs 21, IIUC).  So, I'll go on looking into it further.

The diff between Gnus trunk and Unicode-2 is here:

*** gnus-start.el~	Sun Nov 11 21:51:22 2007
--- gnus-start.el	Wed Nov 14 11:32:28 2007
***************
*** 2106,2112 ****
  		  (if (equal method gnus-select-method)
  		      (gnus-make-hashtable
  		       (count-lines (point-min) (point-max)))
! 		    (gnus-make-hashtable 4096)))))))
      ;; Delete unnecessary lines.
      (goto-char (point-min))
      (cond
--- 2106,2113 ----
  		  (if (equal method gnus-select-method)
  		      (gnus-make-hashtable
  		       (count-lines (point-min) (point-max)))
! 		    (gnus-make-hashtable 4096))))))
! 	group max min)
      ;; Delete unnecessary lines.
      (goto-char (point-min))
      (cond
***************
*** 2141,2148 ****
  		 (insert prefix)
  		 (zerop (forward-line 1)))))))
      ;; Store the active file in a hash table.
!     (goto-char (point-min))
!     (let (group max min)
        (while (not (eobp))
  	(condition-case ()
  	    (progn
--- 2142,2153 ----
  		 (insert prefix)
  		 (zerop (forward-line 1)))))))
      ;; Store the active file in a hash table.
!     ;; Use a unibyte buffer in order to make `read' read non-ASCII
!     ;; group names (which have been encoded) as unibyte strings.
!     (mm-with-unibyte-buffer
!       (insert-buffer-substring cur)
!       (setq cur (current-buffer))
!       (goto-char (point-min))
        (while (not (eobp))
  	(condition-case ()
  	    (progn

Regards,
* Re: [Unicode-2] `read' always returns multibyte symbol
From: Stefan Monnier @ 2007-11-14 14:52 UTC
To: Katsumi Yamaoka; +Cc: Kenichi Handa, ding, emacs-devel

> !     ;; Use a unibyte buffer in order to make `read' read non-ASCII
> !     ;; group names (which have been encoded) as unibyte strings.
> !     (mm-with-unibyte-buffer
> !       (insert-buffer-substring cur)

Why is `cur' a multibyte buffer?  Since it contains encoded strings, I'd
expect it would be better (more robust and convenient) to use a unibyte
buffer for it.


        Stefan
* Re: [Unicode-2] `read' always returns multibyte symbol
From: Katsumi Yamaoka @ 2007-11-14 23:52 UTC
To: Stefan Monnier; +Cc: Kenichi Handa, ding, emacs-devel

>>>>> Stefan Monnier wrote:

>> !     ;; Use a unibyte buffer in order to make `read' read non-ASCII
>> !     ;; group names (which have been encoded) as unibyte strings.
>> !     (mm-with-unibyte-buffer
>> !       (insert-buffer-substring cur)

> Why is `cur' a multibyte buffer?  Since it contains encoded strings, I'd
> expect it would be better (more robust and convenient) to use a unibyte
> buffer for it.

Good point.  The `cur' is `nntp-server-buffer' (" *nntpd*") or
`gnus-work-buffer' (" *gnus work*") as the case may be.  Gnus uses
those buffers for various purposes.  Although there looks no
situation where it is necessary to have multibyte data as far as
I can observe, Gnus explicitly sets them as multibyte buffers (see
`nnheader-init-server-buffer' and `gnus-set-work-buffer').
I believe the reason they do so is to prevent from breaking data
when copying them to another multibyte buffer (IIUC, copying data
from a multibyte buffer to a unibyte buffer causes no problem).
So, I didn't modify those buffers' multibyteness.

If I introduced a new unibyte work buffer (such as
" *gnus binary work*"), it required that `gnus-read-active-file-2'
binds `nntp-server-buffer' to it for example.  It is used by all
the back ends but I'm not sure it never causes a problem with them
all.

Regards,
* Re: [Unicode-2] `read' always returns multibyte symbol
From: Stefan Monnier @ 2007-11-15 1:15 UTC
To: Katsumi Yamaoka; +Cc: Kenichi Handa, ding, emacs-devel

>>> !     ;; Use a unibyte buffer in order to make `read' read non-ASCII
>>> !     ;; group names (which have been encoded) as unibyte strings.
>>> !     (mm-with-unibyte-buffer
>>> !       (insert-buffer-substring cur)

>> Why is `cur' a multibyte buffer?  Since it contains encoded strings, I'd
>> expect it would be better (more robust and convenient) to use a unibyte
>> buffer for it.

> Good point.  The `cur' is `nntp-server-buffer' (" *nntpd*") or
> `gnus-work-buffer' (" *gnus work*") as the case may be.

Don't know about gnus-work-buffer, but nntp-server-buffer should only
ever contain unibyte data AFAICT, so it would be better to put it in
unibyte mode.

> Gnus uses those buffers for various purposes.  Although there looks no
> situation where it is necessary to have multibyte data as far as I can
> observe, Gnus explicitly sets them as multibyte buffers (see
> `nnheader-init-server-buffer' and `gnus-set-work-buffer').
> I believe the reason they do so is to prevent from breaking data when
> copying them to another multibyte buffer (IIUC, copying data from
> a multibyte buffer to a unibyte buffer causes no problem).

I'm not sure I understand: copying data from a multibyte buffer to
a unibyte buffer is exactly the case that can cause problems.


        Stefan
* Re: [Unicode-2] `read' always returns multibyte symbol
From: Katsumi Yamaoka @ 2007-11-15 3:01 UTC
To: Stefan Monnier; +Cc: Kenichi Handa, ding, emacs-devel

>>>>> Stefan Monnier wrote:

> Don't know about gnus-work-buffer, but nntp-server-buffer should only
> ever contain unibyte data AFAICT, so it would be better to put it in
> unibyte mode.

I think it's better, too.  However, there might be a code that
copies data from nntp-server-buffer to a multibyte buffer.  I'm
not capable to check all the Gnus code.

>> (IIUC, copying data from a multibyte buffer to a unibyte buffer
>> causes no problem).

> I'm not sure I understand: copying data from a multibyte buffer to
> a unibyte buffer is exactly the case that can cause problems.

I agree that's generally true.  But in Gnus' case, data in a
multibyte work buffer are the multibyte version of binary data.
I don't know proper words to explain it, sorry.  In other words,
they are the one which `string-to-multibyte' converted binary
data to.  For example:

(with-temp-buffer
  (set-buffer-multibyte t)
  (insert (string-to-multibyte (encode-coding-string "日本語" 'utf-8)))
  (let ((buffer (current-buffer)))
    (with-temp-buffer
      (set-buffer-multibyte nil)
      (insert-buffer-substring buffer)
      (decode-coding-string (buffer-string) 'utf-8))))
 => "日本語"

I'm not sure it works with any data, though.

Regards,
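[Editorial sketch] The open question in the message above ("I'm not sure it works with any data") can be checked mechanically. The following is a minimal sketch, not code from Gnus: the names `example-bytes-survive-copy-p' and `example-all-bytes' are hypothetical, and it assumes an Emacs recent enough to provide `unibyte-string' (Emacs 23 or later). It pushes every byte value through the same multibyte-to-unibyte buffer copy used in the example.

```elisp
;; Sketch: does data produced by `string-to-multibyte' survive a copy
;; from a multibyte buffer into a unibyte buffer, byte for byte?
;; `example-bytes-survive-copy-p' is a hypothetical helper name.
(defun example-bytes-survive-copy-p (bytes)
  "Return non-nil if unibyte string BYTES survives the buffer round trip."
  (with-temp-buffer
    (set-buffer-multibyte t)
    ;; Each raw byte becomes one eight-bit character in the multibyte buffer.
    (insert (string-to-multibyte bytes))
    (let ((source (current-buffer)))
      (with-temp-buffer
        (set-buffer-multibyte nil)
        ;; Copying into the unibyte buffer converts the text back to bytes.
        (insert-buffer-substring source)
        (string= (buffer-string) bytes)))))

;; A unibyte string holding every byte value from 0 to 255.
(defvar example-all-bytes
  (apply #'unibyte-string (number-sequence 0 255)))

(example-bytes-survive-copy-p example-all-bytes)
```

If the last form returns nil on some Emacs version, the assumption that such data can be copied back unchanged does not hold there. The check says nothing about multibyte text that was produced some other way, for example by `string-as-multibyte'.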
* Re: [Unicode-2] `read' always returns multibyte symbol
From: Stefan Monnier @ 2007-11-15 3:39 UTC
To: Katsumi Yamaoka; +Cc: Kenichi Handa, ding, emacs-devel

> I think it's better, too.  However, there might be a code that
> copies data from nntp-server-buffer to a multibyte buffer.  I'm
> not capable to check all the Gnus code.

I understand the desire to avoid changing code, but I think in the long
run it'll pay off.

>>> (IIUC, copying data from a multibyte buffer to a unibyte buffer
>>> causes no problem).

>> I'm not sure I understand: copying data from a multibyte buffer to
>> a unibyte buffer is exactly the case that can cause problems.

> I agree that's generally true.  But in Gnus' case, data in a
> multibyte work buffer are the multibyte version of binary data.
> I don't know proper words to explain it, sorry.  In other words,
> they are the one which `string-to-multibyte' converted binary
> data to.  For example:

> (with-temp-buffer
>   (set-buffer-multibyte t)
>   (insert (string-to-multibyte (encode-coding-string "日本語" 'utf-8)))
>   (let ((buffer (current-buffer)))
>     (with-temp-buffer
>       (set-buffer-multibyte nil)
>       (insert-buffer-substring buffer)
>       (decode-coding-string (buffer-string) 'utf-8))))
>  => "日本語"

> I'm not sure it works with any data, though.

I'm not sure what you're saying.  But IIUC the source buffer in your
example would be nntp-server-buffer, in which case turning it into
unibyte will not introduce any problem.  On the contrary, it'll make it
more obviously correct.


        Stefan
* Re: [Unicode-2] `read' always returns multibyte symbol
From: Katsumi Yamaoka @ 2007-11-15 10:20 UTC
To: ding; +Cc: emacs-devel

>>>>> Katsumi Yamaoka wrote:

> BTW, I found another problem with Emacs 21 (Gnus still supports
> Emacs 21, IIUC).  So, I'll go on looking into it further.

I realized a network process that is created by
`open-network-stream' in Emacs 21 breaks encoded non-ASCII group
names if the process buffer is in the multibyte mode even if the
process coding system is binary.  It behaves as if
`toggle-enable-multibyte-characters' modifies binary data when
turning on the multibyteness of a buffer.

So, I made changes in nntp.el in the Gnus trunk so that it makes
a process buffer unibyte.  I also modified the nntp functions that
copy data from a unibyte buffer to a multibyte buffer.

Regards,
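[Editorial sketch] The actual nntp.el change is not shown in this message, so the following only illustrates the general idea of giving a process a unibyte buffer before any data arrives. The helper name `example-make-byte-buffer', the buffer name, and the host in the commented usage are all hypothetical.

```elisp
;; Sketch: create a buffer whose multibyteness is fixed to unibyte
;; before it is handed to a network process.
;; `example-make-byte-buffer' is a hypothetical helper name.
(defun example-make-byte-buffer (name)
  "Create and return a unibyte buffer named NAME for raw server data."
  (let ((buffer (generate-new-buffer name)))
    (with-current-buffer buffer
      ;; Switch to unibyte while the buffer is still empty, so no
      ;; existing data can be reinterpreted.
      (set-buffer-multibyte nil))
    buffer))

;; Hypothetical usage (needs a reachable server, so left commented out):
;; (let* ((buffer (example-make-byte-buffer " *example nntp*"))
;;        (process (open-network-stream "example" buffer
;;                                      "news.example.org" 119)))
;;   ;; Ask Emacs not to decode or encode the traffic.
;;   (set-process-coding-system process 'binary 'binary))
```

Setting the multibyteness while the buffer is empty avoids the conversion problem described above, because there is nothing in the buffer yet to convert.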
* Re: [Unicode-2] `read' always returns multibyte symbol
From: Kenichi Handa @ 2007-11-15 11:08 UTC
To: Katsumi Yamaoka; +Cc: ding, emacs-devel

In article <b4m1war6ca5.fsf@jpl.org>, Katsumi Yamaoka <yamaoka@jpl.org> writes:

>>>>>> Katsumi Yamaoka wrote:
> > BTW, I found another problem with Emacs 21 (Gnus still supports
> > Emacs 21, IIUC).  So, I'll go on looking into it further.

> I realized a network process that is created by
> `open-network-stream' in Emacs 21 breaks encoded non-ASCII group
> names if the process buffer is in the multibyte mode even if the
> process coding system is binary.  It behaves as if
> `toggle-enable-multibyte-characters' modifies binary data when
> turning on the multibyteness of a buffer.

If "modifies" means that 8-bit bytes are converted to
multibyte characters as what string-as-multibyte does, it's
an expected behaviour.

I long ago proposed a facility that turns on the
multibyteness of a buffer while converting 8-bit bytes to
multibyte characters as what string-to-multibyte does, but
not accepted.

---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [Unicode-2] `read' always returns multibyte symbol
From: Katsumi Yamaoka @ 2007-11-15 11:41 UTC
To: Kenichi Handa; +Cc: ding, emacs-devel

>>>>> Kenichi Handa wrote:
> In article <b4m1war6ca5.fsf@jpl.org>,
>	Katsumi Yamaoka <yamaoka@jpl.org> writes:

>> I realized a network process that is created by
>> `open-network-stream' in Emacs 21 breaks encoded non-ASCII group
>> names if the process buffer is in the multibyte mode even if the
>> process coding system is binary.  It behaves as if
>> `toggle-enable-multibyte-characters' modifies binary data when
>> turning on the multibyteness of a buffer.

(The changes that I made in nntp.el have been archived in
<URL:http://article.gmane.org/gmane.emacs.gnus.commits/5519>.)

> If "modifies" means that 8-bit bytes are converted to
> multibyte characters as what string-as-multibyte does, it's
> an expected behaviour.

What I observed was different.  The group name "テスト" is
encoded by utf-8 by the nntp server into:

"\343\203\206\343\202\271\343\203\210"

After it is transferred to Gnus, in the nntp process buffer it is
modified into:

"\343\203XY\343\203\210"

Where X is (make-char 'greek-iso8859-7 99)
  and Y is (make-char 'latin-iso8859-2 57).

Since Gnus treats a group name as a unibyte string, finally it
is made into:

"\343\203\343\271\343\203\210"

> I long ago proposed a facility that turns on the
> multibyteness of a buffer while converting 8-bit bytes to
> multibyte characters as what string-to-multibyte does, but
> not accepted.

But the modern Emacsen does do so, doesn't it?
* Re: [Unicode-2] `read' always returns multibyte symbol
From: Kenichi Handa @ 2007-11-15 14:41 UTC
To: Katsumi Yamaoka; +Cc: ding, emacs-devel

In article <b4moddv20sy.fsf@jpl.org>, Katsumi Yamaoka <yamaoka@jpl.org> writes:

> > If "modifies" means that 8-bit bytes are converted to
> > multibyte characters as what string-as-multibyte does, it's
> > an expected behaviour.

> What I observed was different.  The group name "テスト" is
> encoded by utf-8 by the nntp server into:

> "\343\203\206\343\202\271\343\203\210"

> After it is transferred to Gnus, in the nntp process buffer it is
> modified into:

> "\343\203XY\343\203\210"

> Where X is (make-char 'greek-iso8859-7 99)
>   and Y is (make-char 'latin-iso8859-2 57).

That is exactly what string-as-multibyte does.  \206\343 and
\202\271 are valid multibyte forms in the current Emacs,
thus are treated as multibyte characters.

> Since Gnus treats a group name as a unibyte string, finally it
> is made into:

> "\343\203\343\271\343\203\210"

It seems that gnus treats "\343\203XY\343\203\210" as
unibyte by converting it by string-make-unibyte.

Please try this:

(string-make-unibyte
  (string-as-multibyte "\343\203\206\343\202\271\343\203\210"))

You'll get the above result, ... yes, very weird.

On the other hand,

(string-as-unibyte
  (string-as-multibyte "\343\203\206\343\202\271\343\203\210"))
  => "\343\203\206\343\202\271\343\203\210"

> > I long ago proposed a facility that turns on the
> > multibyteness of a buffer while converting 8-bit bytes to
> > multibyte characters as what string-to-multibyte does, but
> > not accepted.

> But the modern Emacsen does do so, doesn't it?

No.

---
Kenichi Handa
handa@ni.aist.go.jp
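[Editorial sketch] The distinction drawn above (keeping each byte as a byte versus reinterpreting byte sequences as characters) can be seen without `string-as-multibyte', whose result depends on the internal representation of the Emacs version at hand. This is a minimal sketch with a hypothetical function name, `example-byte-representations'; it assumes `string-to-unibyte' is available (Emacs 23 or later).

```elisp
;; Sketch: two ways of turning encoded bytes into a multibyte string.
;; `example-byte-representations' is a hypothetical name.
(defun example-byte-representations ()
  "Return facts about the UTF-8 bytes of \"テスト\" in three forms."
  (let* ((bytes (encode-coding-string "テスト" 'utf-8)) ; unibyte, 9 bytes
         (raw   (string-to-multibyte bytes))            ; one element per byte
         (text  (decode-coding-string bytes 'utf-8)))   ; real characters
    (list (multibyte-string-p bytes)                ; is the encoded form multibyte?
          (length raw)                              ; elements after string-to-multibyte
          (length text)                             ; characters after decoding
          (string= (string-to-unibyte raw) bytes)))) ; can the bytes be recovered?

(example-byte-representations)
;; => (nil 9 3 t)
```

`string-to-multibyte' never merges adjacent bytes into a character, which is why the original bytes can be recovered from its result; decoding is the only step here that produces the three characters of the group name.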
* Re: [Unicode-2] `read' always returns multibyte symbol
From: Katsumi Yamaoka @ 2007-11-15 23:31 UTC
To: Kenichi Handa; +Cc: ding, emacs-devel

>>>>> Kenichi Handa <handa@ni.aist.go.jp> wrote:
> In article <b4moddv20sy.fsf@jpl.org>,
>	Katsumi Yamaoka <yamaoka@jpl.org> writes:

>> What I observed was different.

> That is exactly what string-as-multibyte does.  \206\343 and
> \202\271 are valid multibyte forms in the current Emacs,
> thus are treated as multibyte characters.

I understood why such readable characters appeared abruptly.

[...]

> Please try this:

> (string-make-unibyte
>   (string-as-multibyte "\343\203\206\343\202\271\343\203\210"))

> You'll get the above result, ... yes, very weird.

Oh, it made me surprised a bit.  But I often view such a scene
while playing with unibyte and multibyte things, and it always
confuses me.

> On the other hand,

> (string-as-unibyte
>   (string-as-multibyte "\343\203\206\343\202\271\343\203\210"))
>   => "\343\203\206\343\202\271\343\203\210"

>>> I long ago proposed a facility that turns on the
>>> multibyteness of a buffer while converting 8-bit bytes to
>>> multibyte characters as what string-to-multibyte does, but
>>> not accepted.

>> But the modern Emacsen does do so, doesn't it?

> No.

Oops.  I misunderstood that the reason why Emacs 22 and 23 don't
break 8-bit data while they are being fed into a multibyte buffer
from a network process of which the process coding system is
binary.  So, maybe the best ways for the present are still to
use a unibyte buffer for unibyte data and to use a multibyte
buffer for multibyte data.  And use a string, not a buffer, to
encode and decode data if the multibyteness of data will change,
like:

(insert (prog1
	    (decode-coding-string (buffer-string) 'coding)
	  (erase-buffer)
	  (set-buffer-multibyte t)))
* Re: [Unicode-2] `read' always returns multibyte symbol
From: Kenichi Handa @ 2007-11-16 0:51 UTC
To: Katsumi Yamaoka; +Cc: ding, emacs-devel

In article <b4my7czxeze.fsf@jpl.org>, Katsumi Yamaoka <yamaoka@jpl.org> writes:

> Oops.  I misunderstood that the reason why Emacs 22 and 23 don't
> break 8-bit data while they are being fed into a multibyte buffer
> from a network process of which the process coding system is
> binary.  So, maybe the best ways for the present are still to
> use a unibyte buffer for unibyte data and to use a multibyte
> buffer for multibyte data.  And use a string, not a buffer, to
> encode and decode data if the multibyteness of data will change,
> like:

> (insert (prog1
> 	    (decode-coding-string (buffer-string) 'coding)
> 	  (erase-buffer)
> 	  (set-buffer-multibyte t)))

The best is to decide buffer's multibyteness just after it
is created, and don't change the multibyteness later.

---
Kenichi Handa
handa@ni.aist.go.jp
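[Editorial sketch] One way to follow the advice above is to keep the bytes and the decoded text in two separate buffers, each given its multibyteness while still empty, and to let the data cross between them only as a string. This is a sketch under that assumption, not code from Gnus; `example-decode-into-new-buffer' and the buffer name are hypothetical.

```elisp
;; Sketch: never toggle the multibyteness of a buffer that holds data.
;; Bytes live in a unibyte buffer, decoded text in a multibyte one.
;; `example-decode-into-new-buffer' is a hypothetical helper name.
(defun example-decode-into-new-buffer (bytes coding-system)
  "Decode unibyte string BYTES with CODING-SYSTEM into a new multibyte buffer.
Return the new buffer; the caller is responsible for killing it."
  (let ((text (with-temp-buffer
                ;; Unibyte from the start, before any data is inserted.
                (set-buffer-multibyte nil)
                (insert bytes)
                ;; The representation changes on a string, not in place.
                (decode-coding-string (buffer-string) coding-system)))
        (out (generate-new-buffer " *example decoded*")))
    (with-current-buffer out
      ;; Multibyte from the start; a no-op if that is already the default.
      (set-buffer-multibyte t)
      (insert text))
    out))
```

Compared with the `prog1' form quoted above, this costs one extra buffer, but neither buffer ever has its multibyteness changed after data has been inserted.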
* Re: [Unicode-2] `read' always returns multibyte symbol
From: Katsumi Yamaoka @ 2007-11-16 1:24 UTC
To: Kenichi Handa; +Cc: ding, emacs-devel

>>>>> Kenichi Handa wrote:
> In article <b4my7czxeze.fsf@jpl.org>,
>	Katsumi Yamaoka <yamaoka@jpl.org> writes:

>> (insert (prog1
>> 	    (decode-coding-string (buffer-string) 'coding)
>> 	  (erase-buffer)
>> 	  (set-buffer-multibyte t)))

> The best is to decide buffer's multibyteness just after it
> is created, and don't change the multibyteness later.

I see.  In relation to this, I've been wanting to exterminate
the `mm-with-unibyte-current-buffer' macro that Gnus uses here
and there (if you have time, please look at how it is evil, in
mm-util.el).
* Re: [Unicode-2] `read' always returns multibyte symbol
From: Stefan Monnier @ 2007-11-16 2:51 UTC
To: Katsumi Yamaoka; +Cc: Kenichi Handa, ding, emacs-devel

> I see.  In relation to this, I've been wanting to exterminate
> the `mm-with-unibyte-current-buffer' macro that Gnus uses here
> and there (if you have time, please look at how it is evil, in
> mm-util.el).

Yes, I spotted it a while ago already (I'm using a few local hacks to
try and catch some multi/unibyte abuses so I tend to bump into bugs
a bit earlier than in normal use).

I think a mistake of Emacs's handling of encoding issues is that we use
"unibyte" and "multibyte" rather than "bytes" and "chars".


        Stefan


PS: Here are some hunks from my local changes.


@@ -1034,16 +1068,18 @@
 (defmacro mm-with-unibyte-buffer (&rest forms)
   "Create a temporary buffer, and evaluate FORMS there like `progn'.
 Use unibyte mode for this."
-  `(let (default-enable-multibyte-characters)
-     (with-temp-buffer ,@forms)))
+  `(with-temp-buffer
+     (mm-disable-multibyte)
+     ,@forms))
 (put 'mm-with-unibyte-buffer 'lisp-indent-function 0)
 (put 'mm-with-unibyte-buffer 'edebug-form-spec '(body))
 
 (defmacro mm-with-multibyte-buffer (&rest forms)
   "Create a temporary buffer, and evaluate FORMS there like `progn'.
 Use multibyte mode for this."
-  `(let ((default-enable-multibyte-characters t))
-     (with-temp-buffer ,@forms)))
+  `(with-temp-buffer
+     (mm-enable-multibyte)
+     ,@forms))
 (put 'mm-with-multibyte-buffer 'lisp-indent-function 0)
 (put 'mm-with-multibyte-buffer 'edebug-form-spec '(body))
 
@@ -1058,24 +1094,29 @@
 harmful since it is likely to modify existing data in the buffer.
 For instance, it converts \"\\300\\255\" into \"\\255\" in
 Emacs 23 (unicode)."
-  (let ((multibyte (make-symbol "multibyte"))
-	(buffer (make-symbol "buffer")))
-    `(if mm-emacs-mule
-	 (let ((,multibyte enable-multibyte-characters)
-	       (,buffer (current-buffer)))
-	   (unwind-protect
-	       (let (default-enable-multibyte-characters)
-		 (set-buffer-multibyte nil)
-		 ,@forms)
-	     (set-buffer ,buffer)
-	     (set-buffer-multibyte ,multibyte)))
-       (let (default-enable-multibyte-characters)
-	 ,@forms))))
+  (message "Braindeadly defined macro: mm-with-unibyte-current-buffer")
+  ;; (let ((multibyte (make-symbol "multibyte"))
+  ;;       (buffer (make-symbol "buffer")))
+  ;;   `(if mm-emacs-mule
+  ;;        (let ((,multibyte enable-multibyte-characters)
+  ;;              (,buffer (current-buffer)))
+  ;;          (unwind-protect
+  ;;              (let (default-enable-multibyte-characters)
+  ;;                (set-buffer-multibyte nil)
+  ;;                ,@forms)
+  ;;            (set-buffer ,buffer)
+  ;;            (set-buffer-multibyte ,multibyte)))
+  ;;      (let (default-enable-multibyte-characters)
+  ;;        ,@forms)))
+  `(progn (assert (not enable-multibyte-characters))
+          ,@forms)
+  )
 (put 'mm-with-unibyte-current-buffer 'lisp-indent-function 0)
 (put 'mm-with-unibyte-current-buffer 'edebug-form-spec '(body))
 
 (defmacro mm-with-unibyte (&rest forms)
   "Eval the FORMS with the default value of `enable-multibyte-characters' nil."
+  (message "Braindead macro: mm-with-unibyte")
   `(let (default-enable-multibyte-characters)
      ,@forms))
 (put 'mm-with-unibyte 'lisp-indent-function 0)
@@ -1083,6 +1124,7 @@
 
 (defmacro mm-with-multibyte (&rest forms)
   "Eval the FORMS with the default value of `enable-multibyte-characters' t."
+  (message "Braindead macro: mm-with-multibyte")
   `(let ((default-enable-multibyte-characters t))
      ,@forms))
 (put 'mm-with-multibyte 'lisp-indent-function 0)
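[Editorial sketch] The first hunk above replaces a binding of `default-enable-multibyte-characters' with an explicit switch made inside the freshly created temporary buffer. A standalone version of that idea, written without the Gnus helper `mm-disable-multibyte', could look like the following; the macro name `example-with-unibyte-buffer' is hypothetical and this is not the Gnus definition.

```elisp
;; Sketch: a temporary buffer that is made unibyte right after creation,
;; while it is still empty.  `example-with-unibyte-buffer' is a
;; hypothetical name standing in for the patched Gnus macro.
(defmacro example-with-unibyte-buffer (&rest forms)
  "Create a temporary unibyte buffer and evaluate FORMS there like `progn'."
  (declare (indent 0) (debug (body)))
  `(with-temp-buffer
     (set-buffer-multibyte nil)
     ,@forms))

;; The encoded group name occupies nine positions, one per byte.
(example-with-unibyte-buffer
  (insert (encode-coding-string "テスト" 'utf-8))
  (list enable-multibyte-characters (buffer-size)))
;; => (nil 9)
```

Because the switch happens before any text is inserted, it cannot alter existing data, which is the hazard the docstring of `mm-with-unibyte-current-buffer' warns about.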
* Re: [Unicode-2] `read' always returns multibyte symbol
From: Stefan Monnier @ 2007-11-15 15:22 UTC
To: Kenichi Handa; +Cc: Katsumi Yamaoka, ding, emacs-devel

> If "modifies" means that 8-bit bytes are converted to
> multibyte characters as what string-as-multibyte does, it's
> an expected behaviour.

99% of the uses of string-as-multibyte are bugs.


        Stefan
* Re: [Unicode-2] `read' always returns multibyte symbol
From: Kenichi Handa @ 2007-11-16 0:29 UTC
To: Stefan Monnier; +Cc: yamaoka, ding, emacs-devel

In article <jwvd4ubttyx.fsf-monnier+emacs@gnu.org>, Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > If "modifies" means that 8-bit bytes are converted to
> > multibyte characters as what string-as-multibyte does, it's
> > an expected behaviour.

> 99% of the uses of string-as-multibyte are bugs.

Sure.

---
Kenichi Handa
handa@ni.aist.go.jp
* Re: [Unicode-2] `read' always returns multibyte symbol
From: Eli Zaretskii @ 2007-11-16 10:50 UTC
To: Stefan Monnier; +Cc: yamaoka, handa, ding, emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Thu, 15 Nov 2007 10:22:12 -0500
> Cc: Katsumi Yamaoka <yamaoka@jpl.org>, ding@gnus.org, emacs-devel@gnu.org
>
> 99% of the uses of string-as-multibyte are bugs.

Should we emit a warning from the byte compiler about that?  (Sorry if
we already do: I didn't have time to look.)
* Re: [Unicode-2] `read' always returns multibyte symbol
From: Stefan Monnier @ 2007-11-13 15:07 UTC
To: Katsumi Yamaoka; +Cc: ding, emacs-devel

> --8<---------------cut here---------------start------------->8---
> (let ((string (encode-coding-string "local.テスト" 'utf-8)))
>   (with-temp-buffer
>     (set-buffer-multibyte t)
>     (insert (string-to-multibyte string))
>     (goto-char (point-min))
>     (multibyte-string-p (symbol-name (read (current-buffer))))))
> --8<---------------cut here---------------end--------------->8---

I'm not sure what Emacs should do in such a case, but in the example
above, using a multibyte buffer is asking for trouble.  Can't Gnus use
a unibyte buffer in its corresponding code?  That would speed things up,
save you the use of string-to-multibyte, and make it crystal clear that
the result should be unibyte.


        Stefan "trying hard not to say that the use of a multibyte buffer
                here is a plain bug ;-)"