unofficial mirror of emacs-devel@gnu.org 
* 23.0.50; utf7-decode failed with non latin-1 charactor
@ 2007-11-01 17:29 Topia
  2007-11-02 12:40 ` Jason Rumney
  0 siblings, 1 reply; 9+ messages in thread
From: Topia @ 2007-11-01 17:29 UTC
  To: emacs-pretest-bug

I want to use IMAP's internationalized (Japanese) folder names
from Wanderlust, but I get the error "Unable to convert from Unicode"
from utf7-u16-latin1-char-converter.

You can reproduce it with this code:
   ;; "Sent Mail" in Japanese, imap's UTF-7
   (utf7-decode "&kAFP4W4IMH8w4TD8MOs-" 'imap)

I found that the problem is that utf7-utf-16-coding-system is nil, because:
 * Emacs 23 has utf-16-be, but this coding system emits a BOM.
 * utf-16-be-nosig does not exist.

I found a BOM-free variant of utf-16-be, namely utf-16be. Evaluating the
code above after (setq utf7-utf-16-coding-system 'utf-16be) raises no error.

Could you modify utf7-utf-16-coding-system to add utf-16be?

 (defconst utf7-utf-16-coding-system
   (cond ((mm-coding-system-p 'utf-16-be-no-signature) ; Mule-UCS
 	 'utf-16-be-no-signature)
 	((and (mm-coding-system-p 'utf-16-be) ; Emacs 21.3, Emacs 22
 	      ;; Avoid versions with BOM.
 	      (= 2 (length (encode-coding-string "a" 'utf-16-be))))
 	 'utf-16-be)
+	((and (mm-coding-system-p 'utf-16be) ; Emacs 23?
+	      ;; Avoid versions with BOM.
+	      (= 2 (length (encode-coding-string "a" 'utf-16be))))
+	 'utf-16be)
 	((mm-coding-system-p 'utf-16-be-nosig) ; ?
 	 'utf-16-be-nosig))
   "Coding system which encodes big endian UTF-16 without a BOM signature.")
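
The BOM test used in the patch can be tried on its own. This is a
minimal sketch, assuming an Emacs 23 or later where `utf-16be' and
`utf-16be-with-signature' are defined; the helper name
`my-utf16-bom-free-p' is hypothetical and not part of utf7.el.

```elisp
;; Hypothetical helper: a big-endian UTF-16 coding system without a BOM
;; encodes the single ASCII character "a" as exactly two bytes.
(defun my-utf16-bom-free-p (coding-system)
  "Return non-nil if CODING-SYSTEM exists and encodes \"a\" as two bytes."
  (and (coding-system-p coding-system)
       (= 2 (length (encode-coding-string "a" coding-system)))))

;; Expected to be non-nil where `utf-16be' is BOM-free.
(my-utf16-bom-free-p 'utf-16be)
;; Expected to be nil: the signature adds two more bytes.
(my-utf16-bom-free-p 'utf-16be-with-signature)
```

A length check like this avoids relying on coding-system names, which
differ between Mule-UCS, Emacs 21/22 and Emacs 23.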

In GNU Emacs 23.0.50.2 (x86_64-unknown-linux-gnu, GTK+ Version 2.12.0)
 of 2007-11-01 on undine
configured using `configure  '--prefix=/usr/opt/emacs/23.0.50' '--with-x-toolkit=gtk' '--with-x' '--with-xpm' '--with-jpeg' '--with-tiff' '--with-gif' '--with-png' '--with-kerberos5''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: C
  value of $LANG: ja_JP.UTF-8
  locale-coding-system: utf-8
  default-enable-multibyte-characters: t

Major mode: Emacs-Lisp

Regards,
-- 
Topia <topia@clovery.jp>

* Re: 23.0.50; utf7-decode failed with non latin-1 charactor
  2007-11-01 17:29 23.0.50; utf7-decode failed with non latin-1 charactor Topia
@ 2007-11-02 12:40 ` Jason Rumney
  2007-11-03  3:58   ` Richard Stallman
  2007-11-05  7:02   ` Kenichi Handa
  0 siblings, 2 replies; 9+ messages in thread
From: Jason Rumney @ 2007-11-02 12:40 UTC
  To: Topia; +Cc: emacs-pretest-bug

Topia wrote:
> I want to use IMAP's internationalized (Japanese) folder names
> from Wanderlust, but I get the error "Unable to convert from Unicode"
> from utf7-u16-latin1-char-converter.
>
> You can reproduce it with this code:
>    ;; "Sent Mail" in Japanese, imap's UTF-7
>    (utf7-decode "&kAFP4W4IMH8w4TD8MOs-" 'imap)
>   
There appear to be two different implementations of utf-7 in Emacs, one
in lisp/international/utf-7.el, and one in lisp/gnus/utf7.el

The former seems to work for decoding, but always returns nil on
encoding without changing the buffer contents (the correctly encoded
text is in a buffer called " *temp*" however).

The latter only seems to work on Latin-1 text (as documented in the
commentary) and returns results from the encoder that are inconsistent
with iconv.

Probably lisp/international/utf-7.el should be fixed, and the Gnus one
dropped.

* Re: 23.0.50; utf7-decode failed with non latin-1 charactor
  2007-11-02 12:40 ` Jason Rumney
@ 2007-11-03  3:58   ` Richard Stallman
  2007-11-05  7:02   ` Kenichi Handa
  1 sibling, 0 replies; 9+ messages in thread
From: Richard Stallman @ 2007-11-03  3:58 UTC
  To: Jason Rumney, handa; +Cc: emacs-pretest-bug, topia

    There appear to be two different implementations of utf-7 in Emacs, one
    in lisp/international/utf-7.el, and one in lisp/gnus/utf7.el

    The former seems to work for decoding, but always returns nil on
    encoding without changing the buffer contents (the correctly encoded
    text is in a buffer called " *temp*" however).

    The latter only seems to work on Latin-1 text (as documented in the
    commentary) and returns results from the encoder that are inconsistent
    with iconv.

    Probably lisp/international/utf-7.el should be fixed, and the Gnus one
    dropped.

Handa, can you fix international/utf-7.el?

* Re: 23.0.50; utf7-decode failed with non latin-1 charactor
  2007-11-02 12:40 ` Jason Rumney
  2007-11-03  3:58   ` Richard Stallman
@ 2007-11-05  7:02   ` Kenichi Handa
  2007-11-06 19:53     ` imap.el: international/utf-7.el vs. gnus/utf7.el (was: 23.0.50; utf7-decode failed with non latin-1 charactor) Reiner Steib
  2007-11-07 12:12     ` 23.0.50; utf7-decode failed with non latin-1 charactor Jason Rumney
  1 sibling, 2 replies; 9+ messages in thread
From: Kenichi Handa @ 2007-11-05  7:02 UTC
  To: Jason Rumney; +Cc: emacs-pretest-bug, topia

In article <472B1AD5.3090006@gnu.org>, Jason Rumney <jasonr@gnu.org> writes:
[...]
> There appear to be two different implementations of utf-7 in Emacs, one
> in lisp/international/utf-7.el, and one in lisp/gnus/utf7.el

> The former seems to work for decoding, but always returns nil on
> encoding without changing the buffer contents (the correctly encoded
> text is in a buffer called " *temp*" however).
[...]
> Probably lisp/international/utf-7.el should be fixed, and the Gnus one
> dropped.

It seems that the functions utf-7-decode and utf-7-encode are
designed to be called only as pre-write/post-read functions
of the coding system utf-7 (and the commented-out coding system
utf-7-imap) in lisp/international/utf-7.el.

I think the right thing is to uncomment all the code for
utf-7-imap in utf-7.el, and modify Gnus to use the normal
encode/decode-coding-region/string with utf-7-imap.

I've just committed the former change.  Could someone do the
latter change?

---
Kenichi Handa
handa@ni.aist.go.jp

* imap.el: international/utf-7.el vs. gnus/utf7.el (was: 23.0.50; utf7-decode failed with non latin-1 charactor)
  2007-11-05  7:02   ` Kenichi Handa
@ 2007-11-06 19:53     ` Reiner Steib
  2007-11-07  0:36       ` Kenichi Handa
  2007-11-07 12:12     ` 23.0.50; utf7-decode failed with non latin-1 charactor Jason Rumney
  1 sibling, 1 reply; 9+ messages in thread
From: Reiner Steib @ 2007-11-06 19:53 UTC
  To: Kenichi Handa, ding, emacs-devel; +Cc: topia, Jason Rumney

On Mon, Nov 05 2007, Kenichi Handa wrote:

> Jason Rumney <jasonr@gnu.org> writes:
>> Probably lisp/international/utf-7.el should be fixed, and the Gnus one
>> dropped.

Please keep in mind that we want to keep the Gnus versions in
Emacs/trunk (Gnus 5.13) and Gnus/trunk (aka No Gnus) in sync.  We want
to keep No Gnus compatible with Emacs 21+ (and XEmacs 21.4+).  So we
need to add some compatibility code.

> I think the right thing is to uncomment all the code for
> utf-7-imap in utf-7.el, and modify Gnus to use the normal
> encode/decode-coding-region/string with utf-7-imap.
>
> I've just committed the former change.  

As far as I can see, `utf-7-encode' also accepts a string as FROM, but
this is not documented.  Can we rely on this?  Could you document it,
please?

> Could someone do the latter change?

A grep through Gnus' sources suggests that it only uses the functions
`utf7-encode' and `utf7-decode' in `imap.el'.

How about the following patch to gnus/utf7.el (untested)?  Could you
suggest a better test than `(<= 23 emacs-major-version)'?

Instead of -OLD and -NEW we should use more suitable names or include
the defuns directly.

--8<---------------cut here---------------start------------->8---
--- utf7.el	04 Aug 2007 20:36:34 +0200	1.12
+++ utf7.el	06 Nov 2007 20:42:08 +0100	
@@ -207,7 +207,7 @@
   (mm-decode-coding-region (point-min) (point-max) 'iso-8859-1)
   (mm-enable-multibyte))
 
-(defun utf7-encode (string &optional for-imap)
+(defun utf7-encode-OLD (string &optional for-imap)
   "Encode UTF-7 STRING.  Use IMAP modification if FOR-IMAP is non-nil."
   (let ((default-enable-multibyte-characters t))
     (with-temp-buffer
@@ -215,7 +215,7 @@
       (utf7-encode-internal for-imap)
       (buffer-string))))
 
-(defun utf7-decode (string &optional for-imap)
+(defun utf7-decode-OLD (string &optional for-imap)
   "Decode UTF-7 STRING.  Use IMAP modification if FOR-IMAP is non-nil."
   (let ((default-enable-multibyte-characters nil))
     (with-temp-buffer
@@ -224,6 +224,31 @@
       (mm-enable-multibyte)
       (buffer-string))))
 
+(defun utf7-encode-NEW (string &optional for-imap)
+  (with-temp-buffer
+    ;; (utf-7-encode FROM TO IMAP)
+    ;;
+    ;; `utf-7-encode' also accepts that FROM is a string, but it's not
+    ;; documented.
+    (utf-7-encode string nil for-imap)
+    (buffer-string)))
+
+(defun utf7-decode-NEW (string &optional for-imap)
+  (with-temp-buffer
+    (insert string)
+    (goto-char (point-min))
+    ;; (utf-7-decode LEN IMAP)
+    (utf-7-decode (buffer-size) for-imap)
+    (buffer-string)))
+
+(if (and (require 'utf-7 nil t) ;; Additional test for XEmacs?
+	 (<= 23 emacs-major-version)) ;; A feature test would be better
+    (progn
+      (defalias 'utf7-encode 'utf7-encode-NEW)
+      (defalias 'utf7-decode 'utf7-decode-NEW))
+  (defalias 'utf7-encode 'utf7-encode-OLD)
+  (defalias 'utf7-decode 'utf7-decode-OLD))
+
 (provide 'utf7)
 
 ;;; arch-tag: 96078b55-85c7-4161-aed2-932c24b282c7
--8<---------------cut here---------------end--------------->8---

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/

* Re: imap.el: international/utf-7.el vs. gnus/utf7.el (was: 23.0.50; utf7-decode failed with non latin-1 charactor)
  2007-11-06 19:53     ` imap.el: international/utf-7.el vs. gnus/utf7.el (was: 23.0.50; utf7-decode failed with non latin-1 charactor) Reiner Steib
@ 2007-11-07  0:36       ` Kenichi Handa
  2007-11-20 21:08         ` imap.el: international/utf-7.el vs. gnus/utf7.el Reiner Steib
  0 siblings, 1 reply; 9+ messages in thread
From: Kenichi Handa @ 2007-11-07  0:36 UTC
  To: Reiner Steib; +Cc: jasonr, topia, ding, emacs-devel

In article <v9prynjgm3.fsf_-_@marauder.physik.uni-ulm.de>, Reiner Steib <reinersteib+gmane@imap.cc> writes:

> > I think the right thing is to uncomment all the code for
> > utf-7-imap in utf-7.el, and modify Gnus to use the normal
> > encode/decode-coding-region/string with utf-7-imap.
> >
> > I've just committed the former change.  

> As far as I can see, `utf-7-encode' also accepts a string as FROM, but
> this is not documented.  Can we rely on this?  Could you document it,
> please?

No, don't use utf-7-encode directly; use
encode-coding-string instead.

For instance, this:

> +(defun utf7-encode-NEW (string &optional for-imap)
> +  (with-temp-buffer
> +    ;; (utf-7-encode FROM TO IMAP)
> +    ;;
> +    ;; `utf-7-encode' also accepts that FROM is a string, but it's not
> +    ;; documented.
> +    (utf-7-encode string nil for-imap)
> +    (buffer-string)))

can simply be:

(defun utf7-encode-NEW (string &optional for-imap)
  (encode-coding-string string (if for-imap 'utf-7-imap 'utf-7)))

And the test for availability is:

(and (coding-system-p 'utf-7) (coding-system-p 'utf-7-imap))
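
The decoding direction can presumably be written the same way. The
following is only a sketch under that assumption: it uses
`decode-coding-string' with the same two coding systems, guarded by the
availability test above. The name `my-utf7-decode' is hypothetical, and
the sample string is the one from the original report.

```elisp
;; Hypothetical counterpart to the encoder above; not the actual Gnus code.
(defun my-utf7-decode (string &optional for-imap)
  "Decode UTF-7 STRING.  Use the IMAP variant if FOR-IMAP is non-nil."
  (decode-coding-string string (if for-imap 'utf-7-imap 'utf-7)))

;; Only try it where both coding systems are defined.
(when (and (coding-system-p 'utf-7) (coding-system-p 'utf-7-imap))
  ;; The folder name from the report ("Sent Mail" in Japanese).
  (my-utf7-decode "&kAFP4W4IMH8w4TD8MOs-" t))
```

Where the coding systems are missing, a caller would still need the
older Gnus implementation as a fallback.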

---
Kenichi Handa
handa@ni.aist.go.jp

* Re: 23.0.50; utf7-decode failed with non latin-1 charactor
  2007-11-05  7:02   ` Kenichi Handa
  2007-11-06 19:53     ` imap.el: international/utf-7.el vs. gnus/utf7.el (was: 23.0.50; utf7-decode failed with non latin-1 charactor) Reiner Steib
@ 2007-11-07 12:12     ` Jason Rumney
  2007-11-07 12:51       ` Kenichi Handa
  1 sibling, 1 reply; 9+ messages in thread
From: Jason Rumney @ 2007-11-07 12:12 UTC
  To: Kenichi Handa; +Cc: emacs-pretest-bug, topia

Kenichi Handa wrote:
> It seems that the functions utf-7-decode and utf-7-encode are
> designed to be called only as pre-write/post-read functions
> of the coding system utf-7 (and the commented-out coding system
> utf-7-imap) in lisp/international/utf-7.el.
>   

Is it a requirement for such functions to return nil?
If not, can we return the encoded string instead of nil, to make the
undocumented string FROM argument useful (as a drop-in replacement for
the Gnus utf7-encode)?

* Re: 23.0.50; utf7-decode failed with non latin-1 charactor
  2007-11-07 12:12     ` 23.0.50; utf7-decode failed with non latin-1 charactor Jason Rumney
@ 2007-11-07 12:51       ` Kenichi Handa
  0 siblings, 0 replies; 9+ messages in thread
From: Kenichi Handa @ 2007-11-07 12:51 UTC
  To: Jason Rumney; +Cc: emacs-pretest-bug, topia

In article <4731ABC9.4080604@gnu.org>, Jason Rumney <jasonr@gnu.org> writes:

> Kenichi Handa wrote:
> > It seems that the functions utf-7-decode and utf-7-encode are
> > designed to be called only as pre-write/post-read functions
> > of the coding system utf-7 (and the commented-out coding system
> > utf-7-imap) in lisp/international/utf-7.el.
> >   

> Is it a requirement for such functions to return nil?
> If not, can we return the encoded string instead of nil, 

utf-7-encode is called from the pre-write functions
utf-7-pre-write-conversion and
utf-7-imap-pre-write-conversion, and they expect
utf-7-encode to put the encoded result in a new buffer.

So, it's possible to make utf-7-encode return a string, but
it's inefficient to make a string that is just ignored when
called from utf-7-pre-write-conversion.

> to make the
> undocumented string FROM argument useful (as a drop-in replacement for
> the Gnus utf7-encode).

Why is that necessary?  We can use encode-coding-string.

I want to restrict the entry points for encoding and decoding
to the functions decode/encode-coding-region/string.
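
For text that already lives in a buffer, the region variant is the
matching entry point. A minimal sketch, assuming the `utf-7' and
`utf-7-imap' coding systems are defined; the function name
`my-utf7-encode-via-region' is hypothetical.

```elisp
;; Hypothetical helper showing `encode-coding-region' instead of calling
;; `utf-7-encode' directly.
(defun my-utf7-encode-via-region (text &optional for-imap)
  "Return TEXT encoded as UTF-7, using `encode-coding-region'.
Use the IMAP variant if FOR-IMAP is non-nil."
  (with-temp-buffer
    (insert text)
    ;; The coding system's pre-write conversion runs behind this call,
    ;; and the region is replaced by the encoded text.
    (encode-coding-region (point-min) (point-max)
                          (if for-imap 'utf-7-imap 'utf-7))
    (buffer-string)))
```

Callers then never see the temporary buffer that utf-7-encode fills.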

---
Kenichi Handa
handa@ni.aist.go.jp

* Re: imap.el: international/utf-7.el vs. gnus/utf7.el
  2007-11-07  0:36       ` Kenichi Handa
@ 2007-11-20 21:08         ` Reiner Steib
  0 siblings, 0 replies; 9+ messages in thread
From: Reiner Steib @ 2007-11-20 21:08 UTC
  To: Kenichi Handa; +Cc: jasonr, topia, ding, emacs-devel

On Wed, Nov 07 2007, Kenichi Handa wrote:

>> > I think the right thing is to uncomment all the code for
>> > utf-7-imap in utf-7.el, and modify Gnus to use the normal
>> > encode/decode-coding-region/string with utf-7-imap.
>> >
>> > I've just committed the former change.  

I have modified gnus/utf7.el accordingly:

	* utf7.el (utf7-encode, utf7-decode): Use coding system
	`utf-7'/`utf-7-imap' from `utf-7.el' if available.

> No, don't use utf-7-encode directly but use encode-coding-string.
>
> For instance, this[...] can simply be:
>
> (defun utf7-encode-NEW (string &optional for-imap)
>   (encode-coding-string string (if for-imap 'utf-7-imap 'utf-7)))
>
> And the test for the availability is:
>
> (and (coding-system-p 'utf-7) (coding-system-p 'utf-7-imap))

Thanks for the hints.

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/


