all messages for Emacs-related lists mirrored at yhetil.org
* utf-7 encoding in imap.el is applied to already encoded byte sequences
@ 2007-12-12 19:13 Stefan Monnier
  2007-12-13  2:15 ` Katsumi Yamaoka
  0 siblings, 1 reply; 2+ messages in thread
From: Stefan Monnier @ 2007-12-12 19:13 UTC
  To: bugs; +Cc: emacs-devel


It seems that the utf7-encode call in imap.el is often (always?) applied
to unibyte data (i.e. streams of bytes, a.k.a. already-encoded text).

This happens because, when reading newsrc.eld, Gnus calls
mm-string-as-unibyte (lisp/gnus/gnus-start.el:2420).  It also happens
because Gnus pre-encodes the names when they're read from the keyboard in
gnus-read-move-group-name (lisp/gnus/gnus-sum.el:11785).

I see 3 problems here:
1 - The use of mm-string-as-unibyte (I consider any use of
    string-as-unibyte to be wrong, unless it is accompanied by a comment
    that explains why it is right).
2 - Inconsistent encoding: gnus-sum.el apparently uses utf-8 (at least
    that's what (gnus-group-name-charset to-method to-newsgroup) returned
    in my tests, tho maybe it's because of my locale), whereas
    gnus-start.el uses emacs-mule (implicitly, via mm-string-as-unibyte).
3 - imap.el tries to re-encode in utf-7 folder names that have already
    been encoded (with emacs-mule or utf-8).
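
Problems 2 and 3 can be illustrated outside Emacs.  The following is a
minimal Python sketch, not code from Gnus or imap.el.  It rests on several
assumptions: euc-jp stands in for emacs-mule (which Python does not
provide), Python's standard "utf-7" codec stands in for the IMAP variant,
decoding bytes as latin-1 imitates treating each byte of a unibyte string
as a character, and the group name is made up.

```python
# Illustration only: none of these names come from Gnus or imap.el.
name = "日本語"  # hypothetical non-ASCII folder name

# Problem 2: two code paths that pick different encodings produce
# different byte strings for the same name, so they never compare equal.
as_utf8 = name.encode("utf-8")
as_other = name.encode("euc-jp")  # stand-in for emacs-mule

# Problem 3: encoding the text directly vs. encoding bytes that were
# already encoded.  latin-1 maps each byte to one character, which is
# roughly what happens when a byte string is taken for text.
direct = name.encode("utf-7")
double = as_utf8.decode("latin-1").encode("utf-7")

print(as_utf8 == as_other)          # the two pre-encodings disagree
print(direct == double)             # and so do the two utf-7 results
print(double.decode("utf-7") == name)
```

Decoding the doubly encoded value gives back the utf-8 bytes viewed as
characters, not the original name, so the server would see a different
folder.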


        Stefan

* Re: utf-7 encoding in imap.el is applied to already encoded byte sequences
  2007-12-12 19:13 utf-7 encoding in imap.el is applied to already encoded byte sequences Stefan Monnier
@ 2007-12-13  2:15 ` Katsumi Yamaoka
  0 siblings, 0 replies; 2+ messages in thread
From: Katsumi Yamaoka @ 2007-12-13  2:15 UTC
  To: Stefan Monnier; +Cc: bugs, ding, emacs-devel

>>>>> Stefan Monnier wrote:

> It seems that the utf7-encode call in imap.el is often (always?) applied
> to unibyte data (i.e. streams of bytes, a.k.a. already-encoded text).

> The reason this is so, is because when reading newsrc.eld, Gnus calls
> mm-string-as-unibyte (lisp/gnus/gnus-start.el:2420).

Gnus saves a newsgroup name in the ~/.newsrc.eld file as a string
like this:

(prin1-to-string (encode-coding-string "NAME" 'CODING-SYSTEM))

If this is an nntp group, what actually encodes the name is the news
server.  For instance, news.newsfan.net uses gb2312 (or possibly
gbk).  Gnus reads the name over the network and uses it as-is internally.
The newsgroup name is decoded only when it is displayed to the user,
according to `gnus-group-name-charset-method-alist' or
`gnus-group-name-charset-group-alist'.  Gnus does the same for groups
based on the other back ends, too.  But please note that Gnus
trunk supports non-ASCII newsgroup names[1] only for the nntp, nnml
(including nnagent), and nnrss back ends.  In those cases,
Gnus encodes the newsgroup names by itself.
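
The "bytes internally, decode only for display" scheme described above
can be sketched as follows.  This is a rough Python illustration, not
Gnus code: `CHARSET_BY_SERVER' is a hypothetical stand-in for
`gnus-group-name-charset-method-alist', and the group name and article
range are made up.

```python
# Hypothetical stand-in for `gnus-group-name-charset-method-alist'.
CHARSET_BY_SERVER = {"news.newsfan.net": "gb2312"}


def display_name(server: str, raw_name: bytes) -> str:
    """Decode a raw group name for display; the raw bytes stay the key."""
    charset = CHARSET_BY_SERVER.get(server, "ascii")
    return raw_name.decode(charset, errors="replace")


# The name as the server would send it (made-up group name).
raw = "测试".encode("gb2312")

# Internal tables are keyed by the raw bytes, exactly as received.
active = {raw: (1, 100)}

print(raw in active)                              # lookup by raw bytes
print("测试".encode("utf-8") in active)            # other encoding: no match
print(display_name("news.newsfan.net", raw))      # decoded only for display
```

The second lookup fails because the same name encoded differently is a
different key, which is the kind of mismatch with the active data
mentioned in the next paragraph.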

When reading such an encoded newsgroup name from the ~/.newsrc.eld
file, Gnus uses `read' and `eval' (gnus-start.el:2391).  At one point,
both Emacs trunk and Emacs Unicode-2 read it as a multibyte string,
so it didn't match the one in the active data.  That is
why I added `mm-string-as-unibyte'.  Though it seems to be
unnecessary nowadays, it behaves as a no-op for a unibyte string,
doesn't it?

> It's also because Gnus pre-encodes the names when they're read
> from the keyboard in gnus-read-move-group-name (lisp/gnus/gnus-sum.el:11785).

That is because a non-ASCII group name should be an encoded unibyte
string for internal use.  But there should be no non-ASCII names in
the nnimap groups, and pre-encoding doesn't affect those newsgroup
names (am I wrong?).

> I see 3 problems here:
> 1 - The use of mm-string-as-unibyte (I consider any use of
>     string-as-unibyte to be wrong, unless it is accompanied by a comment
>     that explains why it is right).
> 2 - Inconsistent encoding: gnus-sum.el apparently uses utf-8 (at least
>     that's what (gnus-group-name-charset to-method to-newsgroup) returned
>     in my tests, tho maybe it's because of my locale), whereas
>     gnus-start.el uses emacs-mule (implicitly, via mm-string-as-unibyte).
> 3 - imap.el tries to re-encode in utf-7 folder names that have already
>     been encoded (with emacs-mule or utf-8).

I'm sorry, I'm not familiar with IMAP and don't know what the utf-7
encoding is for.  But re-encoding ASCII newsgroup names makes no
difference, does it?  Although I'm not able to improve
nnimap.el myself, it might have to decode encoded newsgroup names
according to `gnus-group-name-charset-method-alist' or
`gnus-group-name-charset-group-alist' before re-encoding them in utf-7.
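
The "decode first, then encode in utf-7" idea could look roughly like the
Python sketch below.  It is not nnimap.el code.  `imap_utf7_encode' and
`reencode_for_imap' are hypothetical helpers, the former written from the
modified UTF-7 rules in RFC 3501 section 5.1.3, and the charset is
assumed to be known to the caller.

```python
import base64


def imap_utf7_encode(name: str) -> str:
    """Encode text as IMAP modified UTF-7 (RFC 3501, section 5.1.3)."""
    out = []
    pending = []  # non-ASCII characters waiting to be base64-encoded

    def flush():
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")
            out.append("&" + b64.replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if ch == "&":
            flush()
            out.append("&-")       # a literal ampersand is escaped
        elif 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append(ch)         # printable ASCII passes through
        else:
            pending.append(ch)
    flush()
    return "".join(out)


def reencode_for_imap(raw_name: bytes, charset: str) -> str:
    """Decode an already-encoded name, then apply modified UTF-7."""
    return imap_utf7_encode(raw_name.decode(charset))


print(reencode_for_imap("日本語".encode("utf-8"), "utf-8"))
print(reencode_for_imap(b"INBOX", "ascii"))
```

A pure-ASCII name such as INBOX comes out unchanged, which matches the
observation that re-encoding ASCII names makes no difference.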

[1] (info "(gnus)Non-ASCII Group Names")

