From: Simon Josefsson
Newsgroups: gmane.emacs.bugs
Subject: Re: bad rfc2047 encoding
Date: Tue, 20 Aug 2002 19:22:47 +0200
To: Dave Love
Cc: bugs@gnus.org, bug-gnu-emacs@gnu.org
In-Reply-To: (Dave Love's message of "20 Aug 2002 18:02:59 +0100")
User-Agent: Gnus/5.090008 (Oort Gnus v0.08) Emacs/21.3.50 (i686-pc-linux-gnu)

Dave Love writes:

> Simon Josefsson writes:
>
>> This was fixed in Oort some time ago
>
> Does that mean that Gnus 5.9 isn't being maintained?

That wasn't what I meant.  I don't know the answer.

>> (rev 6.5 of rfc2047.el in Gnus
>> CVS), patch modified against work with 21.3:
>
> It doesn't solve the problem as far as I can tell.  I'd have thought
> that obeying the RFC means parsing the header, since it concerns
> comment fields.

Is that necessary?  Encoded words are allowed inside comments; they
simply must not contain the character ), which the patch fixes.  Your
example

(with-temp-buffer
  (insert "To: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann)
")
  (rfc2047-encode-message-header)
  (buffer-string))

evaluates to

"To: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai =?iso-8859-1?q?Gro=DFjohann?=)
"

with the patch, which seems valid to me.  Compare an example in the
RFC:

From: Nathaniel Borenstein (=?iso-8859-8?b?7eXs+SDv4SDp7Oj08A==?=)

> I've restored bug-gnu-Emacs to the Cc since this is something I think
> is important for a release.

I agree.  (I'm reading the gnus bugs list from quimby.gnus.org, which
removes To/Cc so when I reply it only goes to the author and
bugs@gnus.org.)

Suggested patch (against Emacs 21.3 RC) included again below.

2000-11-19 12:00:00  ShengHuo ZHU

	* rfc2047.el (rfc2047-q-encoding-alist): Match Resent-.
	(rfc2047-header-encoding-alist): Addresses are different from text.
	(rfc2047-encode-message-header): Ditto.
	(rfc2047-dissect-region): Extra parameter.
	(rfc2047-encode-region): Ditto.
	(rfc2047-encode-string): Ditto.

Index: rfc2047.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/gnus/rfc2047.el,v
retrieving revision 1.10
diff -u -p -u -w -r1.10 rfc2047.el
--- rfc2047.el	15 Jul 2001 17:42:53 -0000	1.10
+++ rfc2047.el	16 Aug 2002 19:23:17 -0000
@@ -41,6 +41,8 @@
 (defvar rfc2047-header-encoding-alist
   '(("Newsgroups" . nil)
     ("Message-ID" . nil)
+    ("\\(Resent-\\)?\\(From\\|Cc\\|To\\|Bcc\\|Reply-To\\|Sender\\)" .
+     "-A-Za-z0-9!*+/=_")
     (t . mime))
   "*Header/encoding method alist.
 The list is traversed sequentially.  The keys can either be
@@ -52,7 +54,8 @@ The values can be:
 2) `mime', in which case the header will be encoded according to RFC2047;
 3) a charset, in which case it will be encoded as that charset;
 4) `default', in which case the field will be encoded as the rest
-   of the article.")
+   of the article.
+5) a string, like `mime', expect for using it as word-chars.")
 
 (defvar rfc2047-charset-encoding-alist
   '((us-ascii . nil)
@@ -87,7 +90,8 @@ Valid encodings are nil, `Q' and `B'.")
   "Alist of RFC2047 encodings to encoding functions.")
 
 (defvar rfc2047-q-encoding-alist
-  '(("\\(From\\|Cc\\|To\\|Bcc\||Reply-To\\):" . "-A-Za-z0-9!*+/")
+  '(("\\(Resent-\\)?\\(From\\|Cc\\|To\\|Bcc\\|Reply-To\\|Sender\\):" 
+     . "-A-Za-z0-9!*+/" )
     ;; = (\075), _ (\137), ? (\077) are used in the encoded word.
     ;; Avoid using 8bit characters.
     ;; Equivalent to "^\000-\007\011\013\015-\037\200-\377=_?"
@@ -142,6 +146,8 @@ Should be called narrowed to the head of
                     (setq alist nil
                           method (cdr elem))))
             (cond
+             ((stringp method)
+              (rfc2047-encode-region (point-min) (point-max) method))
              ((eq method 'mime)
               (rfc2047-encode-region (point-min) (point-max)))
              ((eq method 'default)
@@ -179,11 +185,12 @@ The buffer may be narrowed."
             (setq found t)))
     found))
 
-(defun rfc2047-dissect-region (b e)
+(defun rfc2047-dissect-region (b e &optional word-chars)
   "Dissect the region between B and E into words."
-  (let ((word-chars "-A-Za-z0-9!*+/")
-        ;; Not using ietf-drums-specials-token makes life simple.
-        mail-parse-mule-charset
+  (unless word-chars
+    ;; Anything except most CTLs, WSP
+    (setq word-chars "\010\012\014\041-\177"))
+  (let (mail-parse-mule-charset
         words point current
         result word)
     (save-restriction
@@ -233,9 +240,9 @@ The buffer may be narrowed."
           (setq word (pop words))))
       result))
 
-(defun rfc2047-encode-region (b e)
-  "Encode all encodable words in region B to E."
-  (let ((words (rfc2047-dissect-region b e)) word)
+(defun rfc2047-encode-region (b e &optional word-chars)
+  "Encode all encodable words in REGION."
+  (let ((words (rfc2047-dissect-region b e word-chars)) word)
     (save-restriction
       (narrow-to-region b e)
       (delete-region (point-min) (point-max))
@@ -255,11 +262,11 @@ The buffer may be narrowed."
                   (cdr word))))
       (rfc2047-fold-region (point-min) (point-max)))))
 
-(defun rfc2047-encode-string (string)
+(defun rfc2047-encode-string (string &optional word-chars)
   "Encode words in STRING."
   (with-temp-buffer
     (insert string)
-    (rfc2047-encode-region (point-min) (point-max))
+    (rfc2047-encode-region (point-min) (point-max) word-chars)
     (buffer-string)))
 
 (defun rfc2047-encode (b e charset)
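
As a cross-check outside Emacs, the two encoded words quoted in this message can be decoded with Python's standard email.header module. This is only a sketch under stated assumptions: decode_encoded_word and has_no_parens are hypothetical helper names, not part of Gnus or rfc2047.el, and the parenthesis test covers only the comment restriction discussed in the message, not full RFC 2047 validation.

```python
from email.header import decode_header


def decode_encoded_word(word):
    """Decode one RFC 2047 encoded word into a Python string.

    Hypothetical helper for illustration; it assumes WORD is a single
    encoded word such as "=?iso-8859-1?q?Gro=DFjohann?=".
    """
    # decode_header returns a list of (payload, charset) pairs; a lone
    # encoded word yields exactly one pair with a bytes payload.
    (raw, charset), = decode_header(word)
    if isinstance(raw, bytes):
        return raw.decode(charset or "ascii")
    return raw


def has_no_parens(word):
    """Return True if WORD contains no "(" or ")".

    An encoded word with a literal parenthesis would end or nest the
    surrounding header comment. Checking "(" as well as ")" is an
    assumption made here for symmetry.
    """
    return "(" not in word and ")" not in word


if __name__ == "__main__":
    patched = "=?iso-8859-1?q?Gro=DFjohann?="       # output with the patch
    rfc_example = "=?iso-8859-8?b?7eXs+SDv4SDp7Oj08A==?="  # RFC example
    for w in (patched, rfc_example):
        print(w, "->", decode_encoded_word(w), "| comment-safe:", has_no_parens(w))
```

Both words decode cleanly and contain no parentheses, which agrees with the reading that the patched output is a valid encoded word inside a comment.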