From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann)
Newsgroups: gmane.emacs.devel
Subject: Re: Rmail changes for Emacs 22
Date: Mon, 21 Oct 2002 18:37:23 +0200
Organization: University of Dortmund, Germany
Message-ID: <84ptu33fzw.fsf@crybaby.cs.uni-dortmund.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Gnus/5.090008 (Oort Gnus v0.08) Emacs/21.3.50 (i686-pc-linux-gnu)

Dave Love writes:

> Eli Zaretskii writes:
>
>> Personally, I think emacs-mule is not a good idea in this case, since
>> mbox is not Emacs-private format, so some other software should be able
>> to read it.
>
> I don't see how that follows, but any file that has to represent the
> full range of Emacs characters has to be stored in the internal
> encoding. I don't know what the rationale is for any of this, or why
> rmail uses emacs-mule now.

Well, mbox files usually contain data that arrived via email. So it
would be safe to just keep the data as it arrived, unmodified.

So most messages won't contain characters that only Emacs knows
about. So there is a pretty good chance that an mbox file contains
only charsets that other programs also grok.

But what do other programs do? Convert all incoming messages to
Unicode? If they read from /var/mail, that might be difficult to do.
Or do other programs just grok multiple charsets (encodings?) in the
same file?

It would, however, be slightly difficult to keep messages encoded in
ascii and utf-16 in the same file. Hm.
But if one keeps Content-Length headers, say, then one would know
that one is looking at the From_ line. Therefore, one could tell
whether those five characters are encoded in something that looks
like ascii or whether it looks like utf-16. That might be sufficient
to find the Content-type header to be really sure what the
charset/encoding is.

>> A good alternative would be to encode each message as what
>> the charset= header says (and add/fix such a header if there is none, or
>> if the one that's there lies).
>
> I doubt you should do anything to them, especially as you have no
> assurance any headers are correct.

Maybe it would be useful to offer the user a command so that they can
say "this message is encoded in Big5" and the like. Then RMAIL could
store this information in a header (in the Content-Type header?) and
subsequent views of the message would automatically use the "right"
charset/encoding.

Presumably, the user just tries a number of possible charsets and
then they can just look at the message to see whether their guess was
right.

And if they are like me who can't distinguish a GB2312 encoded
Chinese text from a Big5 encoded one, then choosing the wrong charset
won't be much of a loss as they won't be able to read it anyhow :-)

kai
-- 
~/.signature is: umop ap!sdn       (Frank Nobis)
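
P.S. To make the From_ line idea above a bit more concrete, here is a
rough sketch in Python. It is only an illustration, not anything RMAIL
does. The function name sniff_from_line and the list of candidate
encodings are made up for this example, and it assumes the byte offset
of the next message is already known (from Content-Length, say).

```python
import codecs
from typing import Optional

# Candidate encodings to try for the five characters "From ".
# This list is an assumption for illustration only. Note that "ascii"
# here really means "ASCII-compatible": Latin-1 or UTF-8 data would
# match it too.
_CANDIDATES = ("ascii", "utf-16-le", "utf-16-be")


def sniff_from_line(data: bytes, offset: int = 0) -> Optional[str]:
    """Guess how the From_ line starting at `offset` is encoded.

    Returns the name of the first candidate encoding whose encoding of
    "From " matches the bytes at `offset`, or None if none matches.
    """
    # Skip a UTF-16 byte order mark if one precedes the From_ line.
    for bom in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        if data.startswith(bom, offset):
            offset += len(bom)
            break
    # Compare the bytes at `offset` with "From " in each candidate.
    for name in _CANDIDATES:
        if data.startswith("From ".encode(name), offset):
            return name
    return None


if __name__ == "__main__":
    first = b"From kai Mon Oct 21 18:37:23 2002\n\nhello\n"
    second = "From dave Mon Oct 21 19:00:00 2002\n".encode("utf-16-be")
    mbox = first + second
    print(sniff_from_line(mbox))              # first message
    print(sniff_from_line(mbox, len(first)))  # second message
```

The result is only a hint about the byte layout of the From_ line. One
would still want to decode the headers with that guess and then look at
the Content-Type header to find out the actual charset.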