23.0.50; utf7-decode failed with non-Latin-1 character


* 23.0.50; utf7-decode failed with non-Latin-1 character
From: Topia @ 2007-11-01 17:29 UTC
  To: emacs-pretest-bug

I want to use IMAP's internationalized (Japanese) folder names
from Wanderlust, but I get the error "Unable to convert from Unicode"
from utf7-u16-latin1-char-converter.

You can reproduce it with this code:
   ;; "Sent Mail" in Japanese, imap's UTF-7
   (utf7-decode "&kAFP4W4IMH8w4TD8MOs-" 'imap)

I found that the problem is that utf7-utf-16-coding-system is nil, because:
 * Emacs 23 has utf-16-be, but this coding system emits a BOM.
 * utf-16-be-nosig is not found.

I found a version of utf-16-be without a BOM, utf-16be. Evaluating the
above code after (setq utf7-utf-16-coding-system 'utf-16be) gives no error.

Could you modify utf7-utf-16-coding-system to add utf-16be?

 (defconst utf7-utf-16-coding-system
   (cond ((mm-coding-system-p 'utf-16-be-no-signature) ; Mule-UCS
 	 'utf-16-be-no-signature)
 	((and (mm-coding-system-p 'utf-16-be) ; Emacs 21.3, Emacs 22
 	      ;; Avoid versions with BOM.
 	      (= 2 (length (encode-coding-string "a" 'utf-16-be))))
 	 'utf-16-be)
+	((and (mm-coding-system-p 'utf-16be) ; Emacs 23?
+	      ;; Avoid versions with BOM.
+	      (= 2 (length (encode-coding-string "a" 'utf-16be))))
+	 'utf-16be)
 	((mm-coding-system-p 'utf-16-be-nosig) ; ?
 	 'utf-16-be-nosig))
   "Coding system which encodes big endian UTF-16 without a BOM signature.")
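
The length test used in the patch can be tried on its own. The following is a minimal sketch, assuming an Emacs in which utf-16be is defined; my-utf-16-bom-free-p and my-decode-sent-mail-payload are hypothetical helpers written for illustration and are not part of utf7.el. The second helper shows why a BOM-less big-endian UTF-16 coding system is needed: the text between "&" and "-" in the folder name is base64 (padded here with "=") of raw UTF-16BE bytes.

```elisp
(defun my-utf-16-bom-free-p (coding-system)
  "Return non-nil if CODING-SYSTEM exists and encodes without a BOM.
Hypothetical helper; it applies the same test as the patch above."
  (and (coding-system-p coding-system)
       ;; "a" takes 2 bytes in BOM-less UTF-16 and 4 bytes with a BOM.
       (= 2 (length (encode-coding-string "a" coding-system)))))

(defun my-decode-sent-mail-payload ()
  "Decode the base64 payload of \"&kAFP4W4IMH8w4TD8MOs-\" as UTF-16BE.
Hypothetical helper; the payload is padded with \"=\" to 20 characters."
  (decode-coding-string (base64-decode-string "kAFP4W4IMH8w4TD8MOs=")
                        'utf-16be))
```

On the Emacs 23 build described in this report, (my-utf-16-bom-free-p 'utf-16be) should return t and (my-utf-16-bom-free-p 'utf-16-be) should return nil, which is why the cond above falls through to nil without the added clause.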

In GNU Emacs 23.0.50.2 (x86_64-unknown-linux-gnu, GTK+ Version 2.12.0)
 of 2007-11-01 on undine
configured using `configure  '--prefix=/usr/opt/emacs/23.0.50' '--with-x-toolkit=gtk' '--with-x' '--with-xpm' '--with-jpeg' '--with-tiff' '--with-gif' '--with-png' '--with-kerberos5''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: C
  value of $LANG: ja_JP.UTF-8
  locale-coding-system: utf-8
  default-enable-multibyte-characters: t

Major mode: Emacs-Lisp

Regards,
-- 
Topia <topia@clovery.jp>


* 23.0.50; utf7-decode failed with non-Latin-1 character
From: Topia @ 2007-11-01 17:45 UTC
  To: bug-gnu-emacs

I want to use IMAP's internationalized (Japanese) folder names
from Wanderlust, but I get the error "Unable to convert from Unicode"
from utf7-u16-latin1-char-converter.

You can reproduce it with this code:
   ;; "Sent Mail" in Japanese, imap's UTF-7
   (utf7-decode "&kAFP4W4IMH8w4TD8MOs-" 'imap)

Tested with lisp/gnus/utf7.el:
   ;;; arch-tag: 96078b55-85c7-4161-aed2-932c24b282c7

I found that the problem is that utf7-utf-16-coding-system is nil, because:
 * Emacs 23 has utf-16-be, but this coding system emits a BOM.
 * utf-16-be-nosig is not found.

I found a version of utf-16-be without a BOM, utf-16be. Evaluating the
above code after (setq utf7-utf-16-coding-system 'utf-16be) gives no error.

Could you modify utf7-utf-16-coding-system to add utf-16be?

 (defconst utf7-utf-16-coding-system
   (cond ((mm-coding-system-p 'utf-16-be-no-signature) ; Mule-UCS
 	 'utf-16-be-no-signature)
 	((and (mm-coding-system-p 'utf-16-be) ; Emacs 21.3, Emacs 22 (BOM?)
 	      ;; Avoid versions with BOM.
 	      (= 2 (length (encode-coding-string "a" 'utf-16-be))))
 	 'utf-16-be)
+	((and (mm-coding-system-p 'utf-16be) ; Emacs 22 and later
+	      ;; Avoid versions with BOM.
+	      (= 2 (length (encode-coding-string "a" 'utf-16be))))
+	 'utf-16be)
 	((mm-coding-system-p 'utf-16-be-nosig) ; ?
 	 'utf-16-be-nosig))
   "Coding system which encodes big endian UTF-16 without a BOM signature.")

In GNU Emacs 23.0.50.2 (x86_64-unknown-linux-gnu, GTK+ Version 2.12.0)
 of 2007-11-01 on undine
configured using `configure  '--prefix=/usr/opt/emacs/23.0.50' '--with-x-toolkit=gtk' '--with-x' '--with-xpm' '--with-jpeg' '--with-tiff' '--with-gif' '--with-png' '--with-kerberos5''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: C
  value of $LANG: ja_JP.UTF-8
  locale-coding-system: utf-8
  default-enable-multibyte-characters: t

Major mode: Emacs-Lisp

Sorry for bad English.

Regards,
-- 
Topia <topia@clovery.jp>





* Re: 23.0.50; utf7-decode failed with non-Latin-1 character
From: Jason Rumney @ 2007-11-02 12:40 UTC
  To: Topia; +Cc: emacs-pretest-bug

Topia wrote:
> I want to use IMAP's internationalized (Japanese) folder names
> from Wanderlust, but I get the error "Unable to convert from Unicode"
> from utf7-u16-latin1-char-converter.
>
> You can reproduce it with this code:
>    ;; "Sent Mail" in Japanese, imap's UTF-7
>    (utf7-decode "&kAFP4W4IMH8w4TD8MOs-" 'imap)
>   
There appear to be two different implementations of utf-7 in Emacs, one
in lisp/international/utf-7.el, and one in lisp/gnus/utf7.el

The former seems to work for decoding, but always returns nil on
encoding without changing the buffer contents (the correctly encoded
text is in a buffer called " *temp*" however).

The latter only seems to work on Latin-1 text (as documented in the
commentary) and returns results from the encoder that are inconsistent
with iconv.

Probably lisp/international/utf-7.el should be fixed, and the Gnus one
dropped.


* Re: 23.0.50; utf7-decode failed with non-Latin-1 character
From: Richard Stallman @ 2007-11-03  3:58 UTC
  To: Jason Rumney, handa; +Cc: emacs-pretest-bug, topia

    There appear to be two different implementations of utf-7 in Emacs, one
    in lisp/international/utf-7.el, and one in lisp/gnus/utf7.el

    The former seems to work for decoding, but always returns nil on
    encoding without changing the buffer contents (the correctly encoded
    text is in a buffer called " *temp*" however).

    The latter only seems to work on Latin-1 text (as documented in the
    commentary) and returns results from the encoder that are inconsistent
    with iconv.

    Probably lisp/international/utf-7.el should be fixed, and the Gnus one
    dropped.

Handa, can you fix international/utf-7.el?


* Re: 23.0.50; utf7-decode failed with non-Latin-1 character
From: Kenichi Handa @ 2007-11-05  7:02 UTC
  To: Jason Rumney; +Cc: emacs-pretest-bug, topia

In article <472B1AD5.3090006@gnu.org>, Jason Rumney <jasonr@gnu.org> writes:
[...]
> There appear to be two different implementations of utf-7 in Emacs, one
> in lisp/international/utf-7.el, and one in lisp/gnus/utf7.el

> The former seems to work for decoding, but always returns nil on
> encoding without changing the buffer contents (the correctly encoded
> text is in a buffer called " *temp*" however).
[...]
> Probably lisp/international/utf-7.el should be fixed, and the Gnus one
> dropped.

It seems that the functions utf-7-decode and utf-7-encode are
designed to be called only as the pre-write/post-read functions
of the coding system utf-7 (and the commented-out coding system
utf-7-imap) in lisp/international/utf-7.el.

I think the right thing is to uncomment all the code for
utf-7-imap in utf-7.el, and modify Gnus to use the normal
encode/decode-coding-region/string with utf-7-imap.

I've just committed the former change.  Could someone do the
latter change?
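
As a rough illustration of the latter change, and assuming the utf-7-imap coding system is available once the code in utf-7.el is uncommented, callers could go through the generic string functions. my-imap-folder-decode and my-imap-folder-encode are hypothetical names for this sketch, not functions that exist in Gnus:

```elisp
(defun my-imap-folder-decode (name)
  "Decode NAME, an IMAP modified UTF-7 folder name, to a Lisp string.
Hypothetical wrapper; assumes the `utf-7-imap' coding system exists."
  (decode-coding-string name 'utf-7-imap))

(defun my-imap-folder-encode (name)
  "Encode the Lisp string NAME as an IMAP modified UTF-7 folder name.
Hypothetical wrapper; assumes the `utf-7-imap' coding system exists."
  (encode-coding-string name 'utf-7-imap))

;; The folder name from the original report would then be decoded with:
;;   (my-imap-folder-decode "&kAFP4W4IMH8w4TD8MOs-")
```

With this approach no Latin-1 special case or separate UTF-16 coding system lookup is needed on the Gnus side; the conversion is left to the coding system.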

---
Kenichi Handa
handa@ni.aist.go.jp


* Re: 23.0.50; utf7-decode failed with non-Latin-1 character
From: Jason Rumney @ 2007-11-07 12:12 UTC
  To: Kenichi Handa; +Cc: emacs-pretest-bug, topia

Kenichi Handa wrote:
> It seems that the functions utf-7-decode and utf-7-encode are
> designed to be called only as the pre-write/post-read functions
> of the coding system utf-7 (and the commented-out coding system
> utf-7-imap) in lisp/international/utf-7.el.
>   

Is it a requirement for such functions to return nil?
If not, can we return the encoded string instead of nil, to make the
undocumented string FROM argument useful (as a drop-in replacement for
the Gnus utf7-encode)?


* Re: 23.0.50; utf7-decode failed with non-Latin-1 character
From: Kenichi Handa @ 2007-11-07 12:51 UTC
  To: Jason Rumney; +Cc: emacs-pretest-bug, topia

In article <4731ABC9.4080604@gnu.org>, Jason Rumney <jasonr@gnu.org> writes:

> Kenichi Handa wrote:
> > It seems that the functions utf-7-decode and utf-7-encode are
> > designed to be called only as the pre-write/post-read functions
> > of the coding system utf-7 (and the commented-out coding system
> > utf-7-imap) in lisp/international/utf-7.el.
> >   

> Is it a requirement for such functions to return nil?
> If not, can we return the encoded string instead of nil, 

utf-7-encode is called from the pre-write functions
utf-7-pre-write-conversion and
utf-7-imap-pre-write-conversion, and they expect
utf-7-encode to put the encoded result in a new buffer.

So, it's possible to make utf-7-encode return a string, but
it's inefficient to make a string that is just ignored when
called from utf-7-pre-write-conversion.

> to make the
> undocumented string FROM argument useful (as a drop-in replacement for
> the Gnus utf7-encode).

Why is that necessary?  We can use encode-coding-string.

I want to limit the entry points for encoding and decoding
to the functions decode/encode-coding-region/string.
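
For text that is already in a buffer, the region variants are the corresponding entry points. A minimal sketch, assuming the utf-7-imap coding system is defined; my-utf-7-imap-decode-in-buffer is a hypothetical name used only for this example:

```elisp
(defun my-utf-7-imap-decode-in-buffer (text)
  "Insert TEXT in a temporary buffer and decode it in place.
Hypothetical helper; returns the decoded buffer contents as a string.
Assumes the `utf-7-imap' coding system exists."
  (with-temp-buffer
    (insert text)
    ;; The coding system's post-read conversion does the UTF-7 work;
    ;; the caller never calls utf-7-decode directly.
    (decode-coding-region (point-min) (point-max) 'utf-7-imap)
    (buffer-string)))
```

The caller only names the coding system, so the internal functions in utf-7.el stay free to keep whatever return convention the pre-write/post-read machinery needs.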

---
Kenichi Handa
handa@ni.aist.go.jp
