From mboxrd@z Thu Jan 1 00:00:00 1970 Path: main.gmane.org!not-for-mail From: Richard Stallman Newsgroups: gmane.emacs.devel Subject: [d.love@dl.ac.uk: rfc2047.el] Date: Sat, 26 Oct 2002 16:14:09 -0400 Sender: emacs-devel-admin@gnu.org Message-ID: Reply-To: rms@gnu.org NNTP-Posting-Host: main.gmane.org X-Trace: main.gmane.org 1035663501 12555 80.91.224.249 (26 Oct 2002 20:18:21 GMT) X-Complaints-To: usenet@main.gmane.org NNTP-Posting-Date: Sat, 26 Oct 2002 20:18:21 +0000 (UTC) Return-path: Original-Received: from quimby.gnus.org ([80.91.224.244]) by main.gmane.org with esmtp (Exim 3.35 #1 (Debian)) id 185XO0-0003GN-00 for ; Sat, 26 Oct 2002 22:18:20 +0200 Original-Received: from monty-python.gnu.org ([199.232.76.173]) by quimby.gnus.org with esmtp (Exim 3.12 #1 (Debian)) id 185XRI-0001ya-00 for ; Sat, 26 Oct 2002 22:21:44 +0200 Original-Received: from localhost ([127.0.0.1] helo=monty-python.gnu.org) by monty-python.gnu.org with esmtp (Exim 4.10) id 185XMS-00063l-00; Sat, 26 Oct 2002 16:16:44 -0400 Original-Received: from list by monty-python.gnu.org with tmda-scanned (Exim 4.10) id 185XKA-00050h-00 for emacs-devel@gnu.org; Sat, 26 Oct 2002 16:14:22 -0400 Original-Received: from mail by monty-python.gnu.org with spam-scanned (Exim 4.10) id 185XJy-0004gy-00 for emacs-devel@gnu.org; Sat, 26 Oct 2002 16:14:20 -0400 Original-Received: from fencepost.gnu.org ([199.232.76.164]) by monty-python.gnu.org with esmtp (Exim 4.10) id 185XJx-0004gs-00 for emacs-devel@gnu.org; Sat, 26 Oct 2002 16:14:09 -0400 Original-Received: from rms by fencepost.gnu.org with local (Exim 4.10) id 185XJx-0002Es-00; Sat, 26 Oct 2002 16:14:09 -0400 Original-To: emacs-devel@gnu.org Errors-To: emacs-devel-admin@gnu.org X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.0.11 Precedence: bulk List-Help: List-Post: List-Subscribe: , List-Id: Emacs development discussions. List-Unsubscribe: , List-Archive: Xref: main.gmane.org gmane.emacs.devel:8816 X-Report-Spam: http://spam.gmane.org/gmane.emacs.devel:8816 I think a larger group of people should see this patch. Can people please test this? ------- Start of forwarded message ------- Envelope-to: emacs-pretest-bug@gnu.org Delivery-date: Fri, 25 Oct 2002 13:37:43 -0400 To: emacs-pretest-bug@gnu.org Subject: rfc2047.el From: Dave Love Date: 25 Oct 2002 18:37:03 +0100 X-Spam-Status: No, hits=-1.2 required=5.0 tests=PATCH_CONTEXT_DIFF,SPAM_PHRASE_00_01,TO_HAS_SPACES, USER_AGENT version=2.41 X-Spam-Level: - --benelux-Jiang-Zemin-Qaddafi-MD5-subversive This is the best I can do on minimal changes to rfc2047.el to fix the problems with it generating invalid encodings for 21.3. It's not well tested yet, but survived a few attempts to break it. - --benelux-Jiang-Zemin-Qaddafi-MD5-subversive Content-Type: text/x-patch Content-Disposition: inline 2002-10-25 Dave Love * rfc2047.el (ietf-drums): Don't require. (message-posting-charset): defvar when compiling. (rfc2047-header-encoding-alist): Add `address-mime' part. (rfc2047-charset-encoding-alist): Use B for iso-8859-7, iso-8859-8. Doc fix. (rfc2047-q-encoding-alist): Augment header list. (rfc2047-encoding-type, rfc2047-syntax-table): New. (rfc2047-encode-message-header): Account for address-mime method. (rfc2047-dissect-region): Rewritten for RFC2822 rules. Now just return list of strings. (rfc2047-encode-region): Change test for need to encode word. Change rfc2047-encode call. (rfc2047-encode): Remove CHARSET arg and decide encoding from region contents. Index: rfc2047.el =================================================================== RCS file: /cvsroot/emacs/emacs/lisp/gnus/rfc2047.el,v retrieving revision 1.10 diff -u -p -c -r1.10 rfc2047.el cvs server: conflicting specifications of output style *** rfc2047.el 15 Jul 2001 17:42:53 -0000 1.10 - --- rfc2047.el 25 Oct 2002 15:01:44 -0000 *************** *** 1,5 **** ;;; rfc2047.el --- functions for encoding and decoding rfc2047 messages ! ;; Copyright (C) 1998, 1999, 2000 Free Software Foundation, Inc. ;; Author: Lars Magne Ingebrigtsen ;; MORIOKA Tomohiko - --- 1,5 ---- ;;; rfc2047.el --- functions for encoding and decoding rfc2047 messages ! ;; Copyright (C) 1998, 1999, 2000, 2002 Free Software Foundation, Inc. ;; Author: Lars Magne Ingebrigtsen ;; MORIOKA Tomohiko *************** *** 27,37 **** ;;; Code: ! (eval-when-compile (require 'cl)) (require 'qp) (require 'mm-util) - - (require 'ietf-drums) (require 'mail-prsvr) (require 'base64) ;; Fixme: Avoid this (for gnus-point-at-...) mm dependence on gnus. - --- 27,38 ---- ;;; Code: ! (eval-when-compile ! (require 'cl) ! (defvar message-posting-charset)) (require 'qp) (require 'mm-util) (require 'mail-prsvr) (require 'base64) ;; Fixme: Avoid this (for gnus-point-at-...) mm dependence on gnus. *************** *** 41,46 **** - --- 42,49 ---- (defvar rfc2047-header-encoding-alist '(("Newsgroups" . nil) ("Message-ID" . nil) + ("\\(Resent-\\)?\\(From\\|Cc\\|To\\|Bcc\\|Reply-To\\|Sender\\)" . + address-mime) (t . mime)) "*Header/encoding method alist. The list is traversed sequentially. The keys can either be *************** The values can be: *** 50,57 **** 1) nil, in which case no encoding is done; 2) `mime', in which case the header will be encoded according to RFC2047; ! 3) a charset, in which case it will be encoded as that charset; ! 4) `default', in which case the field will be encoded as the rest of the article.") (defvar rfc2047-charset-encoding-alist - --- 53,62 ---- 1) nil, in which case no encoding is done; 2) `mime', in which case the header will be encoded according to RFC2047; ! 3) `address-mime', like `mime', but takes account of the rules for address ! fields (where quoted strings and comments must be treated separately); ! 4) a charset, in which case it will be encoded as that charset; ! 5) `default', in which case the field will be encoded as the rest of the article.") (defvar rfc2047-charset-encoding-alist *************** The values can be: *** 62,69 **** (iso-8859-4 . Q) (iso-8859-5 . B) (koi8-r . B) ! (iso-8859-7 . Q) ! (iso-8859-8 . Q) (iso-8859-9 . Q) (iso-8859-14 . Q) (iso-8859-15 . Q) - --- 67,74 ---- (iso-8859-4 . Q) (iso-8859-5 . B) (koi8-r . B) ! (iso-8859-7 . B) ! (iso-8859-8 . B) (iso-8859-9 . Q) (iso-8859-14 . Q) (iso-8859-15 . Q) *************** The values can be: *** 78,84 **** (iso-2022-jp-2 . B) (iso-2022-int-1 . B)) "Alist of MIME charsets to RFC2047 encodings. ! Valid encodings are nil, `Q' and `B'.") (defvar rfc2047-encoding-function-alist '((Q . rfc2047-q-encode-region) - --- 83,90 ---- (iso-2022-jp-2 . B) (iso-2022-int-1 . B)) "Alist of MIME charsets to RFC2047 encodings. ! Valid encodings are nil, `Q' and `B'. These indicate binary (no) encoding, ! quoted-printable and base64 respectively.") (defvar rfc2047-encoding-function-alist '((Q . rfc2047-q-encode-region) *************** Valid encodings are nil, `Q' and `B'.") *** 87,93 **** "Alist of RFC2047 encodings to encoding functions.") (defvar rfc2047-q-encoding-alist ! '(("\\(From\\|Cc\\|To\\|Bcc\||Reply-To\\):" . "-A-Za-z0-9!*+/") ;; = (\075), _ (\137), ? (\077) are used in the encoded word. ;; Avoid using 8bit characters. ;; Equivalent to "^\000-\007\011\013\015-\037\200-\377=_?" - --- 93,100 ---- "Alist of RFC2047 encodings to encoding functions.") (defvar rfc2047-q-encoding-alist ! '(("\\(Resent-\\)?\\(From\\|Cc\\|To\\|Bcc\\|Reply-To\\|Sender\\):" ! . "-A-Za-z0-9!*+/" ) ;; = (\075), _ (\137), ? (\077) are used in the encoded word. ;; Avoid using 8bit characters. ;; Equivalent to "^\000-\007\011\013\015-\037\200-\377=_?" *************** Valid encodings are nil, `Q' and `B'.") *** 112,117 **** - --- 119,130 ---- (point-max)))) (goto-char (point-min))) + (defvar rfc2047-encoding-type 'address-mime + "The type of encoding done by `rfc2047-encode-region'. + This should be dynamically bound around calls to + `rfc2047-encode-region' to either `mime' or `address-mime'. See + `rfc2047-header-encoding-alist', for definitions.") + (defun rfc2047-encode-message-header () "Encode the message header according to `rfc2047-header-encoding-alist'. Should be called narrowed to the head of the message." *************** Should be called narrowed to the head of *** 142,149 **** (setq alist nil method (cdr elem)))) (cond ! ((eq method 'mime) ! (rfc2047-encode-region (point-min) (point-max))) ((eq method 'default) (if (and (featurep 'mule) (if (boundp 'default-enable-multibyte-characters) - --- 155,165 ---- (setq alist nil method (cdr elem)))) (cond ! ((or (eq method 'address-mime) (eq method 'mime)) ! (goto-char (point-min)) ! (re-search-forward "^[^:]+: *" nil t) ! (let ((rfc2047-encoding-type method)) ! (rfc2047-encode-region (point) (point-max)))) ((eq method 'default) (if (and (featurep 'mule) (if (boundp 'default-enable-multibyte-characters) *************** The buffer may be narrowed." *** 179,258 **** (setq found t))) found)) (defun rfc2047-dissect-region (b e) ! "Dissect the region between B and E into words." ! (let ((word-chars "-A-Za-z0-9!*+/") ! ;; Not using ietf-drums-specials-token makes life simple. ! mail-parse-mule-charset ! words point current ! result word) ! (save-restriction ! (narrow-to-region b e) ! (goto-char (point-min)) ! (skip-chars-forward "\000-\177") ! (while (not (eobp)) ! (setq point (point)) ! (skip-chars-backward word-chars b) ! (unless (eq b (point)) ! (push (cons (buffer-substring b (point)) nil) words)) ! (setq b (point)) ! (goto-char point) ! (setq current (mm-charset-after)) ! (forward-char 1) ! (skip-chars-forward word-chars) ! (while (and (not (eobp)) ! (eq (mm-charset-after) current)) ! (forward-char 1) ! (skip-chars-forward word-chars)) ! (unless (eq b (point)) ! (push (cons (buffer-substring b (point)) current) words)) ! (setq b (point)) ! (skip-chars-forward "\000-\177")) ! (unless (eq b (point)) ! (push (cons (buffer-substring b (point)) nil) words))) ! ;; merge adjacent words ! (setq word (pop words)) ! (while word ! (if (and (cdr word) ! (caar words) ! (not (cdar words)) ! (not (string-match "[^ \t]" (caar words)))) ! (if (eq (cdr (nth 1 words)) (cdr word)) ! (progn ! (setq word (cons (concat ! (car (nth 1 words)) (caar words) ! (car word)) ! (cdr word))) ! (pop words) ! (pop words)) ! (push (cons (concat (caar words) (car word)) (cdr word)) ! result) ! (pop words) ! (setq word (pop words))) ! (push word result) ! (setq word (pop words)))) ! result)) (defun rfc2047-encode-region (b e) ! "Encode all encodable words in region B to E." (let ((words (rfc2047-dissect-region b e)) word) (save-restriction (narrow-to-region b e) (delete-region (point-min) (point-max)) (while (setq word (pop words)) ! (if (not (cdr word)) ! (insert (car word)) ! (rfc2047-fold-region (gnus-point-at-bol) (point)) (goto-char (point-max)) - - (if (> (- (point) (save-restriction - - (widen) - - (gnus-point-at-bol))) 76) - - (insert "\n ")) ;; Insert blank between encoded words (if (eq (char-before) ?=) (insert " ")) (rfc2047-encode (point) ! (progn (insert (car word)) (point)) ! (cdr word)))) (rfc2047-fold-region (point-min) (point-max))))) (defun rfc2047-encode-string (string) - --- 195,314 ---- (setq found t))) found)) + ;; Use this syntax table when parsing into regions that may need + ;; encoding. Double quotes are string delimiters, backslash is + ;; character quoting, and all other RFC 2822 special characters are + ;; treated as punctuation so we can use forward-sexp/forward-word to + ;; skip to the end of regions appropriately. Nb. ietf-drums does + ;; things differently. + (defconst rfc2047-syntax-table + (let ((table (make-char-table 'syntax-table '(2)))) + (modify-syntax-entry ?\\ "\\" table) + (modify-syntax-entry ?\" "\"" table) + (modify-syntax-entry ?\( "." table) + (modify-syntax-entry ?\) "." table) + (modify-syntax-entry ?\< "." table) + (modify-syntax-entry ?\> "." table) + (modify-syntax-entry ?\[ "." table) + (modify-syntax-entry ?\] "." table) + (modify-syntax-entry ?: "." table) + (modify-syntax-entry ?\; "." table) + (modify-syntax-entry ?, "." table) + (modify-syntax-entry ?@ "." table) + table)) + (defun rfc2047-dissect-region (b e) ! "Dissect the region between B and E into tokens. ! The tokens comprise sequences of atoms, quoted strings, special ! characters and whitespace." ! (save-restriction ! (narrow-to-region b e) ! (if (eq 'mime rfc2047-encoding-type) ! ;; Simple case -- no need to tokenize. ! (list (buffer-substring b e)) ! ;; `address-mime' case -- take care of quoted words, comments. ! (with-syntax-table rfc2047-syntax-table ! (let ((start (point)) ! words last-encoded end) ! (goto-char (point-min)) ! (condition-case nil ; in case of unbalanced specials ! ;; Look for rfc2822-style: sequences of atoms, quoted ! ;; strings, specials, whitespace. (Specials mustn't be ! ;; encoded.) ! (while (not (eobp)) ! (setq start (point)) ! (cond ! ((not (char-after))) ; eob ! ((or (eq ?\ (char-after)) ! (eq ?\t (char-after))) ! (skip-chars-forward " \t") ! (push (buffer-substring start (point)) words)) ! ;; else token start ! ((eq ?\" (char-syntax (char-after))) ! ;; Quoted word. ! (forward-sexp) ! (setq end (point)) ! ;; Does it need encoding? ! (goto-char start) ! (skip-chars-forward "\000-\177" end) ! (if (= end (point)) ! ;; It doesn't need encoding. ! (progn (push (buffer-substring start end) words) ! (setq last-encoded nil)) ! ;; It needs encoding. Strip the quotes first, ! ;; since encoded words can't occur in quotes. ! (goto-char end) ! (setq end (1- end) ! start (1+ start)) ! (if (and last-encoded ! (string-match "^[ \t]+\\'" (car-safe words))) ! ;; There was a preceding quoted word followed by ! ;; whitespace. Include the whitespace in this ! ;; word to avoid it getting lost. ! (push (concat (pop words) (buffer-substring start end)) ! words) ! (push (buffer-substring start end) words)) ! (setq last-encoded t))) ; record that it was encoded ! ((eq ?. (char-syntax (char-after))) ! ;; Skip other delimiters, but record that they've ! ;; potentially separated quoted words. ! (forward-char) ! (push (string (char-before)) words) ! (setq last-encoded nil)) ! (t ; normal token/whitespace sequence ! ;; Find the end. ! (forward-word 1) ! (skip-chars-backward " \t") ! (setq end (point)) ! ;; Deal with encoding and leading space as for ! ;; quoted words. ! (goto-char start) ! (skip-chars-forward "\000-\177" end) ! (setq last-encoded (/= end (point))) ! (goto-char end) ! (push (buffer-substring start end) words)))) ! (error (error "Invalid data for rfc2047 encoding: %s" ! (buffer-substring b e)))) ! (nreverse words)))))) + ;; Fixme: why does this cons a list of words and insert them, rather + ;; than encoding in place? (defun rfc2047-encode-region (b e) ! "Encode all encodable words in region B to E. ! By default, the region is treated as containing addresses (see ! `rfc2047-special-chars')." (let ((words (rfc2047-dissect-region b e)) word) (save-restriction (narrow-to-region b e) (delete-region (point-min) (point-max)) (while (setq word (pop words)) ! (if (string-match "\\`[\0-\177]*\\'" word) ; including whitespace ! (insert word) (goto-char (point-max)) ;; Insert blank between encoded words (if (eq (char-before) ?=) (insert " ")) (rfc2047-encode (point) ! (progn (insert word) (point))))) (rfc2047-fold-region (point-min) (point-max))))) (defun rfc2047-encode-string (string) *************** The buffer may be narrowed." *** 262,301 **** (rfc2047-encode-region (point-min) (point-max)) (buffer-string))) ! (defun rfc2047-encode (b e charset) ! "Encode the word in the region B to E with CHARSET." ! (let* ((mime-charset (mm-mime-charset charset)) ! (cs (mm-charset-to-coding-system mime-charset)) ! (encoding (or (cdr (assq mime-charset rfc2047-charset-encoding-alist)) ! 'B)) (start (concat "=?" (downcase (symbol-name mime-charset)) "?" (downcase (symbol-name encoding)) "?")) (first t)) ! (save-restriction ! (narrow-to-region b e) ! (when (eq encoding 'B) ! ;; break into lines before encoding ! (goto-char (point-min)) ! (while (not (eobp)) ! (goto-char (min (point-max) (+ 15 (point)))) ! (unless (eobp) ! (insert "\n")))) ! (if (and (mm-multibyte-p) ! (mm-coding-system-p cs)) ! (mm-encode-coding-region (point-min) (point-max) cs)) ! (funcall (cdr (assq encoding rfc2047-encoding-function-alist)) ! (point-min) (point-max)) ! (goto-char (point-min)) ! (while (not (eobp)) ! (unless first ! (insert " ")) ! (setq first nil) ! (insert start) ! (end-of-line) ! (insert "?=") ! (forward-line 1))))) (defun rfc2047-fold-region (b e) "Fold long lines in region B to E." - --- 318,366 ---- (rfc2047-encode-region (point-min) (point-max)) (buffer-string))) ! (defun rfc2047-encode (b e) ! "Encode the word(s) in the region B to E." ! (let* ((mime-charset (mm-find-mime-charset-region b e)) ! (cs (if (> (length mime-charset) 1) ! ;; Fixme: instead of this, try to break region into ! ;; parts that can be encoded separately. ! (error "Can't rfc2047-encode `%s'" ! (buffer-substring b e)) ! (setq mime-charset (car mime-charset)) ! (mm-charset-to-coding-system mime-charset))) ! (encoding (if (assq mime-charset ! rfc2047-charset-encoding-alist) ! (cdr (assq mime-charset rfc2047-charset-encoding-alist)) ! 'B)) (start (concat "=?" (downcase (symbol-name mime-charset)) "?" (downcase (symbol-name encoding)) "?")) (first t)) ! (if mime-charset ! (save-restriction ! (narrow-to-region b e) ! (when (eq encoding 'B) ! ;; break into lines before encoding ! (goto-char (point-min)) ! (while (not (eobp)) ! (goto-char (min (point-max) (+ 15 (point)))) ! (unless (eobp) ! (insert "\n")))) ! (if (and (mm-multibyte-p) ! (mm-coding-system-p cs)) ! (mm-encode-coding-region (point-min) (point-max) cs)) ! (funcall (cdr (assq encoding rfc2047-encoding-function-alist)) ! (point-min) (point-max)) ! (goto-char (point-min)) ! (while (not (eobp)) ! (unless first ! (insert " ")) ! (setq first nil) ! (insert start) ! (end-of-line) ! (insert "?=") ! (forward-line 1)))))) (defun rfc2047-fold-region (b e) "Fold long lines in region B to E." - --benelux-Jiang-Zemin-Qaddafi-MD5-subversive-- ------- End of forwarded message -------