From mboxrd@z Thu Jan 1 00:00:00 1970 Path: main.gmane.org!not-for-mail From: Dave Love Newsgroups: gmane.emacs.bugs Subject: changes for rfc2047 Date: 04 Sep 2002 14:13:48 +0100 Sender: bug-gnu-emacs-admin@gnu.org Message-ID: NNTP-Posting-Host: localhost.gmane.org Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="colonel-subversive-BLU-97-A/B-HAMASMOIS-computer-terrorism" X-Trace: main.gmane.org 1031145160 3942 127.0.0.1 (4 Sep 2002 13:12:40 GMT) X-Complaints-To: usenet@main.gmane.org NNTP-Posting-Date: Wed, 4 Sep 2002 13:12:40 +0000 (UTC) Cc: bug-gnu-emacs@gnu.org Return-path: Original-Received: from monty-python.gnu.org ([199.232.76.173]) by main.gmane.org with esmtp (Exim 3.35 #1 (Debian)) id 17mZxU-00011G-00 for ; Wed, 04 Sep 2002 15:12:36 +0200 Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org) by monty-python.gnu.org with esmtp (Exim 4.10) id 17mZz5-000507-00; Wed, 04 Sep 2002 09:14:15 -0400 Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.10) id 17mZyq-0004sr-00 for bug-gnu-emacs@gnu.org; Wed, 04 Sep 2002 09:14:00 -0400 Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.10) id 17mZyk-0004sa-00 for bug-gnu-emacs@gnu.org; Wed, 04 Sep 2002 09:14:00 -0400 Original-Received: from albion.dl.ac.uk ([148.79.80.39]) by monty-python.gnu.org with esmtp (Exim 4.10) id 17mZyj-0004sW-00 for bug-gnu-emacs@gnu.org; Wed, 04 Sep 2002 09:13:53 -0400 Original-Received: from fx by albion.dl.ac.uk with local (Exim 3.35 #1 (Debian)) id 17mZyg-0000HV-00; Wed, 04 Sep 2002 14:13:50 +0100 Original-To: bugs@gnus.org Original-Lines: 496 User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2 Errors-To: bug-gnu-emacs-admin@gnu.org X-BeenThere: bug-gnu-emacs@gnu.org X-Mailman-Version: 2.0.11 Precedence: bulk List-Help: List-Post: List-Subscribe: , List-Id: Bug reports for GNU Emacs, the Swiss army knife of text editors List-Unsubscribe: , List-Archive: Xref: main.gmane.org gmane.emacs.bugs:3411 X-Report-Spam: http://spam.gmane.org/gmane.emacs.bugs:3411 --colonel-subversive-BLU-97-A/B-HAMASMOIS-computer-terrorism Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable After my complaint about rfc2047, here's an attempt to DTRT by minimally parsing address headers as rfc2822 and removing mention of Emacs charsets. (It also B-encodes Greek and Hebrew, as appropriate for non-Latin script, presumably.) The patch is against Emacs 21.2. I haven't tested it extensively yet. Can anyone see problems with it? Here's a contrived example. [Note that I typed =CE=B1 and =CE=B2 with a Un= icode input method, but they're unified to 8859-7 in the first case, since greek-iso-8bit has higher priority than utf-8 in this session. That isn't ideal, but I don't think it's easy to fix, and has some advantage in this case since it's also not ideal that utf-8 is B-encoded generally, even if the text is mostly ASCII.] (rfc2047-encode-string=20 " \"Fred Bl=C3=B6ggs\" fred <\"f. bloggs\"@bloggs.net> (=CE=B1 (=CE=B2) = \"...\")") =3D> " =3D?iso-8859-1?q?Fred_Bl=3DF6ggs?=3D fred (=3D?iso-8859-7?q?=3DE1?=3D (=3D?iso-8859-7?q?=3DE2?=3D) ...)" =20=20 (let (rfc2047-special-chars) (rfc2047-encode-string=20 " \"Fred Bl=C3=B6ggs\" fred <\"f. bloggs\"@bloggs.net> (=CE=B1 (=CE=B2= ) \"...\")")) =3D> "=3D?utf-8?b?ICJGcmVkIEJsw7ZnZ3MiIA=3D=3D?=3D =3D?utf-8?b?ZnJlZCA8ImYuIGJsb2dn?=3D =3D?utf-8?b?cyJAYmxvZ2dzLm5ldD4g?=3D =3D?utf-8?b?KM6xICjOsikgIi4uLiIp?=3D" =20=20=20=20 I don't understand why `rfc2047-encode-region' is written the way it is. Is there a good reason to cons up a list of words and then insert them, rather than encoding in place? Also, why are there apparently redundant efforts to insert newlines? Why isn't it enough just to call `rfc2047-fold-region' at the end? --colonel-subversive-BLU-97-A/B-HAMASMOIS-computer-terrorism Content-Type: text/x-patch Content-Disposition: inline 2002-09-04 Dave Love * rfc2047.el (message-posting-charset): defvar when compiling. (rfc2047-header-encoding-alist): Add `address-mime' part. (rfc2047-charset-encoding-alist): Use B for iso-8859-7, iso-8859-8. Doc fix. (rfc2047-q-encoding-alist): Augment header list. (rfc2047-encodable-p): Use mm-find-mime-charset-region. (rfc2047-special-chars, rfc2047-non-special-chars): New. (rfc2047-dissect-region, rfc2047-encode-region, rfc2047-encode): Rewritten to avoid charset stuff and to take account of rfc2822 tokens. (rfc2047-encode-message-header): Don't include header name field in encoding. Add `address-mime' case and bind rfc2047-special-chars for `mime' case. *** rfc2047.el.~1.10.~ Sun Jul 15 18:42:53 2001 --- rfc2047.el Wed Sep 4 13:52:49 2002 *************** *** 1,5 **** ;;; rfc2047.el --- functions for encoding and decoding rfc2047 messages ! ;; Copyright (C) 1998, 1999, 2000 Free Software Foundation, Inc. ;; Author: Lars Magne Ingebrigtsen ;; MORIOKA Tomohiko --- 1,5 ---- ;;; rfc2047.el --- functions for encoding and decoding rfc2047 messages ! ;; Copyright (C) 1998, 1999, 2000, 2002 Free Software Foundation, Inc. ;; Author: Lars Magne Ingebrigtsen ;; MORIOKA Tomohiko *************** *** 27,33 **** ;;; Code: ! (eval-when-compile (require 'cl)) (require 'qp) (require 'mm-util) --- 27,35 ---- ;;; Code: ! (eval-when-compile ! (require 'cl) ! (defvar message-posting-charset)) (require 'qp) (require 'mm-util) *************** *** 41,46 **** --- 43,50 ---- (defvar rfc2047-header-encoding-alist '(("Newsgroups" . nil) ("Message-ID" . nil) + ("\\(Resent-\\)?\\(From\\|Cc\\|To\\|Bcc\\|Reply-To\\|Sender\\)" . + address-mime) (t . mime)) "*Header/encoding method alist. The list is traversed sequentially. The keys can either be *************** *** 50,57 **** 1) nil, in which case no encoding is done; 2) `mime', in which case the header will be encoded according to RFC2047; ! 3) a charset, in which case it will be encoded as that charset; ! 4) `default', in which case the field will be encoded as the rest of the article.") (defvar rfc2047-charset-encoding-alist --- 54,63 ---- 1) nil, in which case no encoding is done; 2) `mime', in which case the header will be encoded according to RFC2047; ! 3) `address-mime', like `mime', but takes account of the rules for address ! fields (where quoted strings and comments must be treated separately); ! 4) a charset, in which case it will be encoded as that charset; ! 5) `default', in which case the field will be encoded as the rest of the article.") (defvar rfc2047-charset-encoding-alist *************** *** 62,69 **** (iso-8859-4 . Q) (iso-8859-5 . B) (koi8-r . B) ! (iso-8859-7 . Q) ! (iso-8859-8 . Q) (iso-8859-9 . Q) (iso-8859-14 . Q) (iso-8859-15 . Q) --- 68,75 ---- (iso-8859-4 . Q) (iso-8859-5 . B) (koi8-r . B) ! (iso-8859-7 . B) ! (iso-8859-8 . B) (iso-8859-9 . Q) (iso-8859-14 . Q) (iso-8859-15 . Q) *************** *** 78,84 **** (iso-2022-jp-2 . B) (iso-2022-int-1 . B)) "Alist of MIME charsets to RFC2047 encodings. ! Valid encodings are nil, `Q' and `B'.") (defvar rfc2047-encoding-function-alist '((Q . rfc2047-q-encode-region) --- 84,91 ---- (iso-2022-jp-2 . B) (iso-2022-int-1 . B)) "Alist of MIME charsets to RFC2047 encodings. ! Valid encodings are nil, `Q' and `B'. These indicate binary (no) encoding, ! quoted-printable and base64 respectively.") (defvar rfc2047-encoding-function-alist '((Q . rfc2047-q-encode-region) *************** *** 87,93 **** "Alist of RFC2047 encodings to encoding functions.") (defvar rfc2047-q-encoding-alist ! '(("\\(From\\|Cc\\|To\\|Bcc\||Reply-To\\):" . "-A-Za-z0-9!*+/") ;; = (\075), _ (\137), ? (\077) are used in the encoded word. ;; Avoid using 8bit characters. ;; Equivalent to "^\000-\007\011\013\015-\037\200-\377=_?" --- 94,101 ---- "Alist of RFC2047 encodings to encoding functions.") (defvar rfc2047-q-encoding-alist ! '(("\\(Resent-\\)?\\(From\\|Cc\\|To\\|Bcc\\|Reply-To\\|Sender\\):" ! . "-A-Za-z0-9!*+/" ) ;; = (\075), _ (\137), ? (\077) are used in the encoded word. ;; Avoid using 8bit characters. ;; Equivalent to "^\000-\007\011\013\015-\037\200-\377=_?" *************** *** 141,161 **** (eq (car elem) t)) (setq alist nil method (cdr elem)))) (cond ((eq method 'mime) ! (rfc2047-encode-region (point-min) (point-max))) ((eq method 'default) (if (and (featurep 'mule) (if (boundp 'default-enable-multibyte-characters) default-enable-multibyte-characters) mail-parse-charset) ! (mm-encode-coding-region (point-min) (point-max) mail-parse-charset))) ((mm-coding-system-p method) (if (and (featurep 'mule) (if (boundp 'default-enable-multibyte-characters) default-enable-multibyte-characters)) ! (mm-encode-coding-region (point-min) (point-max) method))) ;; Hm. (t))) (goto-char (point-max))))))) --- 149,174 ---- (eq (car elem) t)) (setq alist nil method (cdr elem)))) + (goto-char (point-min)) + (re-search-forward "^[^:]+: *" nil t) (cond + ((eq method 'address-mime) + (rfc2047-encode-region (point) (point-max))) ((eq method 'mime) ! (let (rfc2047-special-chars) ! (rfc2047-encode-region (point) (point-max)))) ((eq method 'default) (if (and (featurep 'mule) (if (boundp 'default-enable-multibyte-characters) default-enable-multibyte-characters) mail-parse-charset) ! (mm-encode-coding-region (point) (point-max) mail-parse-charset))) ((mm-coding-system-p method) (if (and (featurep 'mule) (if (boundp 'default-enable-multibyte-characters) default-enable-multibyte-characters)) ! (mm-encode-coding-region (point) (point-max) method))) ;; Hm. (t))) (goto-char (point-max))))))) *************** *** 169,247 **** The buffer may be narrowed." (require 'message) ; for message-posting-charset (let ((charsets ! (mapcar ! 'mm-mime-charset ! (mm-find-charset-region (point-min) (point-max)))) ! (cs (list 'us-ascii (car message-posting-charset))) ! found) ! (while charsets ! (unless (memq (pop charsets) cs) ! (setq found t))) ! found)) (defun rfc2047-dissect-region (b e) ! "Dissect the region between B and E into words." ! (let ((word-chars "-A-Za-z0-9!*+/") ! ;; Not using ietf-drums-specials-token makes life simple. ! mail-parse-mule-charset ! words point current ! result word) ! (save-restriction ! (narrow-to-region b e) ! (goto-char (point-min)) ! (skip-chars-forward "\000-\177") ! (while (not (eobp)) ! (setq point (point)) ! (skip-chars-backward word-chars b) ! (unless (eq b (point)) ! (push (cons (buffer-substring b (point)) nil) words)) ! (setq b (point)) ! (goto-char point) ! (setq current (mm-charset-after)) ! (forward-char 1) ! (skip-chars-forward word-chars) ! (while (and (not (eobp)) ! (eq (mm-charset-after) current)) ! (forward-char 1) ! (skip-chars-forward word-chars)) ! (unless (eq b (point)) ! (push (cons (buffer-substring b (point)) current) words)) ! (setq b (point)) ! (skip-chars-forward "\000-\177")) ! (unless (eq b (point)) ! (push (cons (buffer-substring b (point)) nil) words))) ! ;; merge adjacent words ! (setq word (pop words)) ! (while word ! (if (and (cdr word) ! (caar words) ! (not (cdar words)) ! (not (string-match "[^ \t]" (caar words)))) ! (if (eq (cdr (nth 1 words)) (cdr word)) ! (progn ! (setq word (cons (concat ! (car (nth 1 words)) (caar words) ! (car word)) ! (cdr word))) ! (pop words) ! (pop words)) ! (push (cons (concat (caar words) (car word)) (cdr word)) ! result) ! (pop words) ! (setq word (pop words))) ! (push word result) ! (setq word (pop words)))) ! result)) (defun rfc2047-encode-region (b e) ! "Encode all encodable words in region B to E." (let ((words (rfc2047-dissect-region b e)) word) (save-restriction (narrow-to-region b e) (delete-region (point-min) (point-max)) ! (while (setq word (pop words)) ! (if (not (cdr word)) ! (insert (car word)) (rfc2047-fold-region (gnus-point-at-bol) (point)) (goto-char (point-max)) (if (> (- (point) (save-restriction --- 182,256 ---- The buffer may be narrowed." (require 'message) ; for message-posting-charset (let ((charsets ! (mm-find-mime-charset-region (point-min) (point-max)))) ! (and charsets (not (equal charsets (list message-posting-charset)))))) ! ! ;; ietf-drums-specials-token less \ . @ ! (defconst rfc2047-special-chars (append "()<>[]:;,\"" nil) ! "List of characters treated as special when rfc207-encoding address fields. ! When encoding other sorts of fields, bin it to nil to avoid treating ! RFC 2822 quoted words and comments specially.") ! ! (defconst rfc2047-non-special-chars (concat "^" rfc2047-special-chars)) (defun rfc2047-dissect-region (b e) ! "Dissect the region between B and E into tokens. ! The tokens comprise sequences of atoms, quoted strings, special ! characters and whitespace." ! (save-restriction ! (narrow-to-region b e) ! (if (null rfc2047-special-chars) ! ;; simple `mime' case -- no need to tokenize ! (list (buffer-substring b e)) ! ;; `address-mime' case -- take care of quoted words, comments ! (with-syntax-table ietf-drums-syntax-table ! (let ((start (point)) ! words) ! (goto-char (point-min)) ! (condition-case nil ; in case of unbalanced specials ! ;; Dissect into: sequences of atoms, quoted strings, ! ;; specials, whitespace. (Specials mustn't be encoded.) ! (while (not (eobp)) ! (setq start (point)) ! (unless (= 0 (skip-chars-forward ietf-drums-wsp-token)) ! (push (buffer-substring start (point)) words) ! (setq start (point))) ! (cond ! ((memq (char-after) rfc2047-special-chars) ! ;; Grab string or special char. ! (if (eq ?\" (char-after)) ! (progn ! (forward-sexp) ! (push (buffer-substring start (point)) words)) ! (push (string (char-after)) words) ! (forward-char))) ! ((not (char-after))) ; eob ! (t ; normal token/whitespace sequence ! (skip-chars-forward rfc2047-non-special-chars) ! (skip-chars-backward ietf-drums-wsp-token) ! (push (buffer-substring start (point)) words)))) ! (error (error "Invalid data for rfc2047 encoding: %s" ! (buffer-substring b e)))) ! (nreverse words)))))) + ;; Fixme: why does this cons a list of words and insert them, rather + ;; than encoding in place? (defun rfc2047-encode-region (b e) ! "Encode all encodable words in region B to E. ! By default, the region is treated as containing addresses (see ! `rfc2047-special-chars')." (let ((words (rfc2047-dissect-region b e)) word) (save-restriction (narrow-to-region b e) (delete-region (point-min) (point-max)) ! (dolist (word words) ! ;; Quoted strings can't contain encoded words. Strip the ! ;; quotes. ! (if rfc2047-special-chars ! (if (eq ?\" (aref word 0)) ! (setq word (substring word 1 -1)))) ! (if (string-match "\\`[\0-\177]*\\'" word) ; including whitespace ! (insert word) (rfc2047-fold-region (gnus-point-at-bol) (point)) (goto-char (point-max)) (if (> (- (point) (save-restriction *************** *** 251,301 **** ;; Insert blank between encoded words (if (eq (char-before) ?=) (insert " ")) (rfc2047-encode (point) ! (progn (insert (car word)) (point)) ! (cdr word)))) (rfc2047-fold-region (point-min) (point-max))))) (defun rfc2047-encode-string (string) ! "Encode words in STRING." (with-temp-buffer (insert string) (rfc2047-encode-region (point-min) (point-max)) (buffer-string))) ! (defun rfc2047-encode (b e charset) ! "Encode the word in the region B to E with CHARSET." ! (let* ((mime-charset (mm-mime-charset charset)) ! (cs (mm-charset-to-coding-system mime-charset)) ! (encoding (or (cdr (assq mime-charset rfc2047-charset-encoding-alist)) ! 'B)) (start (concat "=?" (downcase (symbol-name mime-charset)) "?" (downcase (symbol-name encoding)) "?")) (first t)) ! (save-restriction ! (narrow-to-region b e) ! (when (eq encoding 'B) ! ;; break into lines before encoding ! (goto-char (point-min)) ! (while (not (eobp)) ! (goto-char (min (point-max) (+ 15 (point)))) ! (unless (eobp) ! (insert "\n")))) ! (if (and (mm-multibyte-p) ! (mm-coding-system-p cs)) ! (mm-encode-coding-region (point-min) (point-max) cs)) ! (funcall (cdr (assq encoding rfc2047-encoding-function-alist)) ! (point-min) (point-max)) ! (goto-char (point-min)) ! (while (not (eobp)) ! (unless first ! (insert " ")) ! (setq first nil) ! (insert start) ! (end-of-line) ! (insert "?=") ! (forward-line 1))))) (defun rfc2047-fold-region (b e) "Fold long lines in region B to E." --- 260,322 ---- ;; Insert blank between encoded words (if (eq (char-before) ?=) (insert " ")) (rfc2047-encode (point) ! (progn (insert word) (point))))) (rfc2047-fold-region (point-min) (point-max))))) (defun rfc2047-encode-string (string) ! "Encode words in STRING. ! By default, the string is treated as containing addresses (see ! `rfc2047-special-chars')." (with-temp-buffer (insert string) (rfc2047-encode-region (point-min) (point-max)) (buffer-string))) ! (defun rfc2047-encode (b e) ! "Encode the word(s) in the region B to E. ! By default, the region is treated as containing addresses (see ! `rfc2047-special-chars')." ! (let* ((mime-charset (mm-find-mime-charset-region b e)) ! (cs (if (> (length mime-charset) 1) ! ;; Fixme: instead of this, try to break region into ! ;; parts that can be encoded separately. ! (error "Can't rfc2047-encode `%s'" ! (buffer-substring b e)) ! (setq mime-charset (car mime-charset)) ! (mm-charset-to-coding-system mime-charset))) ! (encoding (if (assq mime-charset ! rfc2047-charset-encoding-alist) ! (cdr (assq mime-charset rfc2047-charset-encoding-alist)) ! 'B)) (start (concat "=?" (downcase (symbol-name mime-charset)) "?" (downcase (symbol-name encoding)) "?")) (first t)) ! (if mime-charset ! (save-restriction ! (narrow-to-region b e) ! (when (eq encoding 'B) ! ;; break into lines before encoding ! (goto-char (point-min)) ! (while (not (eobp)) ! (goto-char (min (point-max) (+ 15 (point)))) ! (unless (eobp) ! (insert "\n")))) ! (if (and (mm-multibyte-p) ! (mm-coding-system-p cs)) ! (mm-encode-coding-region (point-min) (point-max) cs)) ! (funcall (cdr (assq encoding rfc2047-encoding-function-alist)) ! (point-min) (point-max)) ! (goto-char (point-min)) ! (while (not (eobp)) ! (unless first ! (insert " ")) ! (setq first nil) ! (insert start) ! (end-of-line) ! (insert "?=") ! (forward-line 1)))))) (defun rfc2047-fold-region (b e) "Fold long lines in region B to E." --colonel-subversive-BLU-97-A/B-HAMASMOIS-computer-terrorism--