unofficial mirror of emacs-devel@gnu.org 
* New mail-related routines
@ 2004-10-18 21:57 Alexander Pohoyda
  2004-10-18 22:12 ` Stefan Monnier
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Alexander Pohoyda @ 2004-10-18 21:57 UTC


I've developed a set of functions that I find very useful.  They
are basic functions for dealing with header fields in mail messages.
A great deal of code in the "mail" directory could eventually be
simplified by using them.

I know that some of this functionality is very similar to what is
found in the lisp/mail/mailheader.el file, but my small library is
more powerful (it parses structured header fields) and is closer to
ordinary text manipulation routines (header field searching, sorting,
other processing, folding/unfolding).

As you can see, I have moved (and re-implemented) the functions
`mail-text-start' and `mail-header-end' from the sendmail.el file, and
the function `rfc822-goto-eoh' from the simple.el file.  I think they
are general-purpose mail functions and belong in the mail-utils.el file.


I would very much like to hear comments on this code.


Index: mail-utils.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/mail/mail-utils.el,v
retrieving revision 1.57
diff -u -r1.57 mail-utils.el
--- mail-utils.el	4 Mar 2004 17:02:13 -0000	1.57
+++ mail-utils.el	18 Oct 2004 21:15:03 -0000
@@ -352,7 +352,12 @@
 		    "\\|"
 		    (substring labels (match-end 0))))))
   labels)
+
 \f
+;;;
+;;; Date/Time
+;;;
+
 (defun mail-rfc822-time-zone (time)
   (let* ((sec (or (car (current-time-zone time)) 0))
 	 (absmin (/ (abs sec) 60)))
@@ -368,6 +373,353 @@
 	    (substring s (match-beginning 3) (match-end 3)) " "
 	    (mail-rfc822-time-zone time))))
 
+\f
+;;;
+;;; Some variables
+;;;
+
+;;; The -hf suffix means Header Field.
+
+(defconst mail-wsp-regexp "[\040\011]")
+(defconst mail-crlf-regexp "[\015]?[\012]")
+
+;; Header fields must be unfolded before using these regexps.  This
+;; agrees with the RFC 2822, section 2.2.3, last paragraph.
+
+;; Unstructured header fields
+(defconst mail-hf-name-regexp "[\041-\071\073-\176]+")
+(defconst mail-hf-body-regexp "[^\015\012]*")
+(defconst mail-hf-regexp
+  (format "^\\(%s\\)%s*:%s*\\(%s\\)%s*\\(%s\\)?"
+	  mail-hf-name-regexp mail-wsp-regexp mail-wsp-regexp
+	  mail-hf-body-regexp mail-wsp-regexp mail-crlf-regexp))
+
+;; Structured header fields
+(defconst mail-hf-value-itself-regexp "[^;\040\011]*")
+(defconst mail-hf-value-regexp
+  (format "\\(%s\\)%s*"
+	  mail-hf-value-itself-regexp mail-wsp-regexp))
+
+(defconst mail-hf-param-name-regexp "[^=]+")
+(defconst mail-hf-param-value-regexp "\"\\([^\"]*\\)\"\\|\\([^\";\040\011]*\\)")
+(defconst mail-hf-param-regexp
+  (format ";%s*\\(%s\\)=\\(%s\\)"
+	  mail-wsp-regexp
+	  mail-hf-param-name-regexp mail-hf-param-value-regexp))
+
+;; Not used
+(defconst mail-hf-structured-regexp
+  (format "^\\(%s\\)%s*:%s*\\(%s\\)%s*\\(%s\\)*\\(%s\\)?"
+	  mail-hf-name-regexp mail-wsp-regexp mail-wsp-regexp
+	  mail-hf-value-itself-regexp mail-wsp-regexp
+	  mail-hf-param-regexp mail-crlf-regexp))
+
+\f
+;;;
+;;; General-purpose mail functions
+;;;
+
+;; Moved from sendmail.el
+(defun mail-text-start ()
+  "Return the buffer location of the start of text, as a number."
+  (save-restriction
+    (widen)
+    (mail-body-start-position)))
+
+(defun mail-body-start-position (&optional from to)
+  "Return a position where the body of a message starts.
+
+If called without arguments, the current buffer is assumed to be
+narrowed to exactly one message.
+
+This function may also be used to get the body start position of
+a MIME entity in the region between FROM and TO."
+  (let ((from (or from (point-min)))
+	(to (or to (point-max))))
+    (save-excursion
+      (goto-char from)
+      (save-match-data
+	(if (or (search-forward (concat "\n" mail-header-separator "\n") to t)
+		(search-forward "\n\n" to t))
+	    (point)
+	  ;; TODO: Shouldn't we return nil instead?
+	  (message "This entity has no body")
+	  to)))))
+
+;; Moved from simple.el
+(defun rfc822-goto-eoh ()
+  "Go to header delimiter line in a mail message, following RFC822 rules."
+  (goto-char (mail-header-end-position)))
+
+(defalias 'mail-rfc822-goto-eoh 'rfc822-goto-eoh)
+
+;; Moved from sendmail.el
+(defun mail-header-end ()
+  "Return the buffer location of the end of headers, as a number."
+  (save-restriction
+    (widen)
+    (mail-header-end-position)))
+
+(defun mail-header-end-position (&optional from to)
+  "Return a position where the header of a message ends.
+
+If called without arguments, the current buffer is assumed to be
+narrowed to exactly one message.
+
+This function may also be used to get the header end position of
+a MIME entity in the region between FROM and TO."
+  (save-excursion
+    (goto-char (mail-body-start-position from to))
+    (forward-line -1)
+    (point)))
+
+;; TODO: to be refined and extended
+(defun mail-token-p (candidate)
+  "Return t if the CANDIDATE is a valid token."
+  (not (or (string-match mail-wsp-regexp candidate)
+	   (string-match "[=?]" candidate))))
+
+\f
+;;;
+;;; Header field functions
+;;;
+
+(defsubst mail-make-hf (name body)
+  "Return \"NAME: BODY\" string."
+  (when name (concat name ": " body)))
+
+(defsubst mail-insert-hf (header-field)
+  (when header-field (insert header-field "\n")))
+
+(defun mail-make-hf-param (attribute value)
+  "Return an \"ATTRIBUTE=VALUE\" string.
+The VALUE is quoted if it contains SPACE, CTLs, or TSPECIALs."
+  (if (mail-token-p attribute)
+      ;; valid ATTRIBUTE
+      (if (mail-token-p value)
+	  ;; the VALUE is a token
+	  (concat attribute "=" value)
+	;; the VALUE must be quoted
+	(concat attribute "=" (format "%S" value)))
+    ;; the ATTRIBUTE contains invalid characters
+    (error "Invalid attribute.")))
+
+(defun mail-parse-hf (header-field)
+  "Parse the HEADER-FIELD and return a list of type
+\(HF-NAME (HF-VALUE ((HF-ATTR1-NAME . HF-ATTR1-VALUE) (...))))
+if a header field is structured, or
+\(HF-NAME (HF-BODY nil))
+for unstructured header field."
+  (when header-field
+    (let ((name (mail-get-hf-name header-field))
+	  (body (mail-get-hf-body header-field)))
+      (when name
+	(list name
+	      (when (and body (string-match mail-hf-value-regexp body))
+		(list (match-string 1 body)
+		      (mail-parse-hf-parameters
+		       (substring body (match-end 1))))))))))
+
+(defun mail-parse-hf-parameters (header-field)
+  "Parse the HEADER-FIELD and return a list of type
+\((HF-ATTR1-NAME . HF-ATTR1-VALUE) (...))."
+  (when (and header-field
+	     (string-match mail-hf-param-regexp header-field))
+    (cons (cons (match-string 1 header-field)
+		(or (match-string 3 header-field)
+		    (match-string 2 header-field)))
+	  (mail-parse-hf-parameters
+	   (substring header-field (match-end 2))))))
+
+(defun mail-recreate-hf (hf-list)
+  "Return a header field recreated from the HF-LIST."
+  (when hf-list
+    (mail-make-hf
+     (car hf-list)
+     (let ((body (caar (cdr hf-list)))
+	   (hf-params (cadr (cadr hf-list))))
+       (dolist (part hf-params body)
+	 (let ((attribute (car-safe part))
+	       (value (cdr-safe part)))
+	   (setq body
+		 (concat body "; "
+			 (mail-make-hf-param attribute value)))))))))
+
+(defun mail-search-hf (name &optional from to)
+  "Find a header field named NAME in the message header.
+Set point at the beginning of the field found, and return point.
+If the header field is not found, do not move the point and return nil.
+
+The argument FROM defaults to `point-min' and the argument TO is
+set to be the message header end."
+  (let ((found nil)
+	(case-fold-search t)
+	(from (or from (point-min)))
+	(to (or to (mail-header-end-position from (point-max)))))
+    (save-excursion
+      (goto-char from)
+      (save-match-data
+	(when (re-search-forward (concat "^" name ":") to t)
+	  (setq found (point-at-bol)))))
+    (when found (goto-char found))))
+
+(defun mail-hf-body-position ()
+  "Return a position where the current header field body starts."
+  (save-excursion
+    (save-match-data
+      (re-search-forward (format ":\\(%s*\\)" mail-wsp-regexp) nil t))))
+
+(defun mail-hf-end-position ()
+  "Return a position where the current header field ends."
+  (save-excursion
+    (save-match-data
+      (while (progn
+	       (forward-line)
+	       (looking-at (format "%s+" mail-wsp-regexp))))
+      (point))))
+
+(defun mail-get-hf-at-point ()
+  "Return the header field at point."
+  (buffer-substring-no-properties (point) (mail-hf-end-position)))
+
+(defun mail-get-hf (name &optional from to)
+  "Return the whole header field called NAME as a string.
+
+The argument FROM defaults to `point-min' and the argument TO is
+set to be the message header end.
+
+The trailing CRLF is also included."
+  (save-excursion
+    (when (mail-search-hf name from to)
+      (mail-get-hf-at-point))))
+
+(defun mail-get-hf-name (header-field)
+  "Return the name of the HEADER-FIELD."
+  (when header-field
+    (save-match-data
+      (setq header-field (mail-unfold-hf header-field))
+      (when (string-match mail-hf-regexp header-field)
+	(match-string-no-properties 1 header-field)))))
+
+(defun mail-get-hf-body (header-field)
+  "Return the body of the HEADER-FIELD."
+  (when header-field
+    (save-match-data
+      (setq header-field (mail-unfold-hf header-field))
+      (when (string-match mail-hf-regexp header-field)
+	(match-string-no-properties 2 header-field)))))
+
+(defun mail-get-hf-value (header-field)
+  "Return the value of the HEADER-FIELD."
+  (when header-field
+    (caar (cdr (mail-parse-hf header-field)))))
+
+(defun mail-get-hf-attribute (header-field attr-name)
+  "Return the attribute ATTR-NAME from the HEADER-FIELD."
+  (when header-field
+    (let ((attribute-list (cadr (cadr (mail-parse-hf header-field))))
+	  attribute)
+      (while (and (setq attribute (car attribute-list))
+		  (not (string-equal (upcase attr-name)
+				     (upcase (car attribute)))))
+	(setq attribute-list (cdr attribute-list)))
+      (cdr attribute))))
+
+(defun mail-process-hfs-in-region (from to func)
+  "Enumerate all header fields in the region between FROM and TO and
+call FUNC on them."
+  (save-excursion
+    (goto-char from)
+    (save-restriction
+      (narrow-to-region from to)
+      ;; RFC 2822, section 2.2.3.
+      (while (re-search-forward "^[^ \t]+:" nil t)
+	(beginning-of-line)
+	;;(message "Processing `%s' header..."
+	;;	 (mail-get-hf-name (mail-get-hf-at-point)))
+	(funcall func (point) (mail-hf-end-position))
+	;; Goto next header field
+	(goto-char (mail-hf-end-position)))
+      (- (point-max) from))))
+
+(defun mail-sort-hfs-in-region (from to sort-list)
+  "Sort header fields in the region between FROM and TO, using
+SORT-LIST as a sequence."
+  (save-excursion
+    (goto-char from)
+    (save-restriction
+      (narrow-to-region from to)
+      ;; Do the job.
+      (let ((my-pos (point))
+	    my-hf)
+	(dolist (sorted-hf sort-list)
+	  ;;(message "Sorting `%s' header..." sorted-hf)
+	  (when (mail-search-hf sorted-hf)
+	    (setq my-hf (mail-get-hf-at-point))
+	    (delete-region (point) (mail-hf-end-position))
+	    (goto-char my-pos)
+	    (insert my-hf)
+	    (setq my-pos (point))))))))
+
+(defun mail-fold-hf (header-field)
+  (when header-field
+    (with-temp-buffer
+      ;;(message "Header to fold:\n%s" header-field)
+      (insert header-field)
+      (mail-fold-region (point-min) (point-max))
+      (buffer-string))))
+
+(defun mail-fold-region (from to &optional limit)
+  "Fold header fields in the region between FROM and TO,
+as defined by RFC 2822.
+LIMIT defaults to 76."
+  (save-excursion
+    (goto-char from)
+    (save-restriction
+      (narrow-to-region from to)
+      (let ((limit (or limit 76))
+	    start)
+	(while (not (eobp))
+	  (setq start (point))
+	  (goto-char (min (+ (point) (- limit (current-column)))
+			  (point-at-eol)))
+	  (if (and (>= (current-column) limit)
+		   (re-search-backward "[ \t]" start t)
+		   (not (looking-at "\n[ \t]")))
+	      ;; Insert line break
+	      (progn
+		(delete-char 1)
+		(insert-char ?\n 1)         ;; CRLF
+		(insert-char ?\t 1))        ;; WSP
+	    (if (re-search-backward "[ \t]" start t)
+		(forward-line)
+	      ;; Token is too long, so we skip it
+	      (re-search-forward "[ \t]" nil t)
+	      (backward-char)
+	      (delete-char 1)
+	      (insert-char ?\n 1)
+	      (insert-char ?\t 1))))))))
+
+(defun mail-unfold-hf (header-field)
+  (when header-field
+    (with-temp-buffer
+      ;;(message "Header to unfold:\n%s" header-field)
+      (insert header-field)
+      (mail-unfold-region (point-min) (point-max))
+      (buffer-string))))
+
+(defun mail-unfold-region (from to)
+  "Unfold header fields in the region between FROM and TO, 
+as defined by RFC 2822."
+  (save-excursion
+    (goto-char from)
+    (save-restriction
+      (narrow-to-region from to)
+      (save-match-data
+	(while (re-search-forward
+		(format "%s%s+" mail-crlf-regexp mail-wsp-regexp) nil t)
+	  (replace-match " " nil t))))))
+
 (provide 'mail-utils)
 
 ;;; arch-tag: b24aec2f-fd65-4ceb-9e39-3cc2827036fd



-- 
Alexander Pohoyda <alexander.pohoyda@gmx.net>
PGP Key fingerprint: 7F C9 CC 5A 75 CD 89 72  15 54 5F 62 20 23 C6 44

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: New mail-related routines
  2004-10-18 21:57 New mail-related routines Alexander Pohoyda
@ 2004-10-18 22:12 ` Stefan Monnier
  2004-10-19  7:06   ` Alexander Pohoyda
  2004-10-19 12:32 ` Reiner Steib
  2004-10-24 12:03 ` Simon Josefsson
  2 siblings, 1 reply; 16+ messages in thread
From: Stefan Monnier @ 2004-10-18 22:12 UTC
  Cc: emacs-devel

> +  (let ((from (or from (point-min)))
> +	(to (or to (point-max))))

I'd recommend

  (unless from (setq from (point-min)))
  (unless to (setq to (point-max)))

it saves a bit of indentation and is marginally more efficient.

> +	(if (or (search-forward (concat "\n" mail-header-separator "\n") to t)
> +		(search-forward "\n\n" to t))

This is less robust than what rfc822-goto-eoh uses, in the case where the
mail-header-separator is modified.

> +	    (point)
> +	  ;; TODO: Shouldn't we return nil instead?
> +	  (message "This entity has no body")
> +	  to)))))

I'd argue we should return `to' because the whole thing is the header.
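Stefan's two suggestions (defaulting the optional arguments with `unless'/`setq', and returning TO when there is no body) might combine roughly as follows.  This is a minimal sketch, not the patch itself: the name `my-mail-body-start-position' is hypothetical, and it only searches for an empty line, leaving out the `mail-header-separator' case.

```elisp
;; Hypothetical helper illustrating the review comments; not part of the patch.
(defun my-mail-body-start-position (&optional from to)
  "Return the position where the body starts between FROM and TO.
FROM defaults to `point-min', TO to `point-max'.  If there is no
empty line, the whole region is header, so return TO."
  ;; Default the optional arguments in place instead of rebinding
  ;; them with `let', as suggested in the review.
  (unless from (setq from (point-min)))
  (unless to (setq to (point-max)))
  (save-excursion
    (goto-char from)
    (save-match-data
      (if (search-forward "\n\n" to t)
          ;; Point is now just after the empty line.
          (point)
        ;; No body: everything up to TO belongs to the header.
        to))))
```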


        Stefan

* Re: New mail-related routines
  2004-10-18 22:12 ` Stefan Monnier
@ 2004-10-19  7:06   ` Alexander Pohoyda
  2004-10-19 12:51     ` Stefan Monnier
  2004-10-19 18:37     ` Alexander Pohoyda
  0 siblings, 2 replies; 16+ messages in thread
From: Alexander Pohoyda @ 2004-10-19  7:06 UTC
  Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > +  (let ((from (or from (point-min)))
> > +	(to (or to (point-max))))
> 
> I'd recommend
> 
>   (unless from (setq from (point-min)))
>   (unless to (setq to (point-max)))
> 
> it saves a bit of indentation and is marginally more efficient.

Good point, I'll change this everywhere.


> > +	(if (or (search-forward (concat "\n" mail-header-separator "\n") to t)
> > +		(search-forward "\n\n" to t))
> 
> This is less robust than what rfc822-goto-eoh uses, in the case where the
> mail-header-separator is modified.

Yes, you're right.  I'll reuse the original regexp from
rfc822-goto-eoh, but I need a function that returns the position
without moving point.
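One way to get the position that `rfc822-goto-eoh' finds without moving point is simply to wrap it in `save-excursion'.  A minimal sketch: `my-mail-header-end-position' is a hypothetical name, and it assumes the message occupies the whole (possibly narrowed) buffer, because `rfc822-goto-eoh' always starts searching from `point-min'.

```elisp
;; Hypothetical wrapper around the existing `rfc822-goto-eoh' (simple.el).
(defun my-mail-header-end-position ()
  "Return the position of the header delimiter line without moving point."
  (save-excursion
    (save-match-data
      ;; `rfc822-goto-eoh' moves point and clobbers the match data,
      ;; so both are restored on exit.
      (rfc822-goto-eoh)
      (point))))
```

This keeps the original regexp logic in one place instead of copying it.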


> > +	    (point)
> > +	  ;; TODO: Shouldn't we return nil instead?
> > +	  (message "This entity has no body")
> > +	  to)))))
> 
> I'd argue we should return `to' because the whole thing is the header.

Sorry, I don't clearly understand you here.
Do you agree with returning `to'?

Thank you very much!


-- 
Alexander Pohoyda <alexander.pohoyda@gmx.net>
PGP Key fingerprint: 7F C9 CC 5A 75 CD 89 72  15 54 5F 62 20 23 C6 44

* Re: New mail-related routines
  2004-10-18 21:57 New mail-related routines Alexander Pohoyda
  2004-10-18 22:12 ` Stefan Monnier
@ 2004-10-19 12:32 ` Reiner Steib
  2004-10-19 17:47   ` Alexander Pohoyda
  2004-10-24 12:03 ` Simon Josefsson
  2 siblings, 1 reply; 16+ messages in thread
From: Reiner Steib @ 2004-10-19 12:32 UTC
  Cc: Emacs development

On Mon, Oct 18 2004, Alexander Pohoyda wrote:

> +;;; The -hf suffix means Header Field.

IIRC, the coding conventions (in Emacs Lisp) say not to abbreviate
function and variable names.

> +(defun mail-unfold-region (from to)
> +  "Unfold header fields in the region between FROM and TO, 
> +as defined by RFC 2822."
[...]
> +	(while (re-search-forward
> +		(format "%s%s+" mail-crlf-regexp mail-wsp-regexp) nil t)
> +	  (replace-match " " nil t))))))

I didn't look at the other functions, but this one is incorrect,
AFAICS:

,----[ rfc2822 / 2.2.3. Long Header Fields ]
|    The process of moving from this folded multiple-line representation
|    of a header field to its single line representation is called
|    "unfolding". Unfolding is accomplished by simply removing any CRLF
|    that is immediately followed by WSP.  Each header field should be
|    treated in its unfolded form for further syntactic and semantic
|    evaluation.
`----

(with-temp-buffer
  (insert "Subject: foo\n  bar")
  (mail-unfold-region (point-min) (point-max))
  (buffer-string))
==> "Subject: foo bar"

Your function collapses the line break and all following whitespace
into a single space, so the extra spaces are lost.  The result with
`rfc2047-unfold-region' is correct:

(with-temp-buffer
  (insert "Subject: foo\n  bar")
  (rfc2047-unfold-region (point-min) (point-max))
  (buffer-string))
==> "Subject: foo  bar"
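Following the RFC text quoted above, strict unfolding deletes only the line break and keeps the whitespace after it.  A minimal sketch; the name `my-mail-unfold-region-strictly' is hypothetical, and it also accepts a bare LF in place of CRLF, since Emacs buffers usually contain LF line ends.

```elisp
(defun my-mail-unfold-region-strictly (from to)
  "Unfold header fields between FROM and TO as in RFC 2822, section 2.2.3.
Only a CRLF (or bare LF) immediately followed by WSP is removed;
the WSP itself is kept."
  (save-excursion
    (save-restriction
      (narrow-to-region from to)
      (goto-char (point-min))
      (save-match-data
        (while (re-search-forward "\r?\n\\([ \t]\\)" nil t)
          ;; Replace the line break plus the first WSP character
          ;; with that WSP character alone (LITERAL is nil so that
          ;; \\1 refers to the captured group).
          (replace-match "\\1" t nil))))))
```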

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/

* Re: New mail-related routines
  2004-10-19  7:06   ` Alexander Pohoyda
@ 2004-10-19 12:51     ` Stefan Monnier
  2004-10-19 18:37     ` Alexander Pohoyda
  1 sibling, 0 replies; 16+ messages in thread
From: Stefan Monnier @ 2004-10-19 12:51 UTC
  Cc: emacs-devel

>> > +	    (point)
>> > +	  ;; TODO: Shouldn't we return nil instead?
>> > +	  (message "This entity has no body")
>> > +	  to)))))
>> 
>> I'd argue we should return `to' because the whole thing is the header.

> Sorry, I don't clearly understand you here.
> Do you agree with returning `to'?

Yes I agree.  I was arguing against the "TODO" comment.


        Stefan

* Re: New mail-related routines
  2004-10-19 12:32 ` Reiner Steib
@ 2004-10-19 17:47   ` Alexander Pohoyda
  2004-10-19 20:02     ` Reiner Steib
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Pohoyda @ 2004-10-19 17:47 UTC
  Cc: Emacs development

Reiner Steib <reinersteib+gmane@imap.cc> writes:

> > +;;; The -hf suffix means Header Field.
> 
> IIRC, the coding conventions (in Emacs Lisp) say not to abbreviate
> function and variable names.

I have no problems with that.  If required, I will expand all names.


> > +(defun mail-unfold-region (from to)
> > +  "Unfold header fields in the region between FROM and TO, 
> > +as defined by RFC 2822."
> [...]
> > +	(while (re-search-forward
> > +		(format "%s%s+" mail-crlf-regexp mail-wsp-regexp) nil t)
> > +	  (replace-match " " nil t))))))
> 
> I didn't look at the other functions, but this one is incorrect,
> AFAICS:

Yes, you're right, the function does not conform strictly.  However,
many MUAs insert either a TAB or a few SPACE characters during header
field folding, so this kind of "loose" unfolding is also desirable, I
think.  I'll add an optional argument to control this behaviour.


>                                               The result with
> `rfc2047-unfold-region' is correct:

Funny that RFC 2047 itself does not define header field
folding/unfolding, so rfc2047-unfold-region is a rather confusing name
for this function.

However, let's talk about RFC 2047 functions later :-)


Thank you very much for your comments!


-- 
Alexander Pohoyda <alexander.pohoyda@gmx.net>
PGP Key fingerprint: 7F C9 CC 5A 75 CD 89 72  15 54 5F 62 20 23 C6 44

* Re: New mail-related routines
  2004-10-19  7:06   ` Alexander Pohoyda
  2004-10-19 12:51     ` Stefan Monnier
@ 2004-10-19 18:37     ` Alexander Pohoyda
  2004-10-19 19:29       ` Stefan Monnier
  1 sibling, 1 reply; 16+ messages in thread
From: Alexander Pohoyda @ 2004-10-19 18:37 UTC
  Cc: emacs-devel

Alexander Pohoyda <alexander.pohoyda@gmx.net> writes:

> > > +	(if (or (search-forward (concat "\n" mail-header-separator "\n") to t)
> > > +		(search-forward "\n\n" to t))
> > 
> > This is less robust than what rfc822-goto-eoh uses, in the case where the
> > mail-header-separator is modified.
> 
> Yes, you're right.  I'll reuse the original regexp from
> rfc822-goto-eoh, ...

There's a small problem with this approach.  In mbox file format,
messages start with "From blah-blah..", so the original
rfc822-goto-eoh stops at this line.

Wouldn't it be better to use a plain "\n\n" search, as defined by the
RFC, and make this a good general-purpose function?  I lean towards
this solution now.
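An alternative that leaves `rfc822-goto-eoh' unchanged is for the mbox-aware caller to skip the "From " envelope line itself and run `rfc822-goto-eoh' on a narrowed region.  A rough sketch, assuming point is at the start of one message and the envelope line begins with "From "; `my-mbox-header-end-position' is a hypothetical name.

```elisp
(defun my-mbox-header-end-position ()
  "Return the header end of the mbox message starting at point.
A leading \"From \" envelope line is skipped first."
  (save-excursion
    (save-restriction
      (save-match-data
        ;; The mbox envelope line is not a header field, so skip it.
        (when (looking-at "From ")
          (forward-line 1))
        ;; `rfc822-goto-eoh' starts from `point-min', so narrow first.
        (narrow-to-region (point) (point-max))
        (rfc822-goto-eoh)
        ;; Buffer positions stay absolute even while narrowed.
        (point)))))
```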


-- 
Alexander Pohoyda <alexander.pohoyda@gmx.net>
PGP Key fingerprint: 7F C9 CC 5A 75 CD 89 72  15 54 5F 62 20 23 C6 44

* Re: New mail-related routines
  2004-10-19 18:37     ` Alexander Pohoyda
@ 2004-10-19 19:29       ` Stefan Monnier
  2004-10-19 23:56         ` Alexander Pohoyda
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Monnier @ 2004-10-19 19:29 UTC
  Cc: emacs-devel

> There's a small problem with this approach.  In mbox file format,
> messages start with "From blah-blah..", so the original
> rfc822-goto-eoh stops at this line.

That's the problem when you try to merge several "identical" functions
into one.  There's a reason why they're not 100% identical.


        Stefan

* Re: New mail-related routines
  2004-10-19 17:47   ` Alexander Pohoyda
@ 2004-10-19 20:02     ` Reiner Steib
  2004-10-20  0:03       ` Alexander Pohoyda
  0 siblings, 1 reply; 16+ messages in thread
From: Reiner Steib @ 2004-10-19 20:02 UTC
  Cc: Emacs development

On Tue, Oct 19 2004, Alexander Pohoyda wrote:

> Reiner Steib <reinersteib+gmane@imap.cc> writes:
[...]
>> > +(defun mail-unfold-region (from to)
>> > +  "Unfold header fields in the region between FROM and TO, 
>> > +as defined by RFC 2822."
[...]
>> I didn't look at the other functions, but this one is incorrect,
>> AFAICS:
>
> Yes, you're right, the function does not conform strictly.  

Well, the doc string says "as defined by RFC 2822".

> However, many MUAs insert either TAB

IIRC, using "\n\t" is correct.  (Disclaimer: I'm not an expert on
this.)

> or few SPACE characters during header field folding, so this kind of
> "loose" unfolding is also desired, I think.

I'd call it "broken".  Some programs also convert "foo bar" to
"foobar" because of incorrect unfolding.

> I'll add an optional argument to control this behaviour.

Please make the _strict_ behavior the default.
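For a header field already extracted as a string, an optional argument with the strict behaviour as the default could look like this.  A minimal sketch; `my-mail-unfold-string' and its LOOSE argument are hypothetical, not part of the posted patch.

```elisp
(defun my-mail-unfold-string (header-field &optional loose)
  "Return HEADER-FIELD unfolded.
By default unfold strictly (RFC 2822, section 2.2.3): remove each
line break that is followed by WSP and keep the WSP.  With LOOSE
non-nil, replace the line break and all following WSP with one space."
  (if loose
      ;; Loose: collapse the fold into a single space (REP is literal).
      (replace-regexp-in-string "\r?\n[ \t]+" " " header-field t t)
    ;; Strict: keep the first WSP character, drop only the line break.
    (replace-regexp-in-string "\r?\n\\([ \t]\\)" "\\1" header-field t)))
```

Making LOOSE opt-in means a plain call always gets the RFC behaviour.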

>> The result with `rfc2047-unfold-region' is correct:
>
> Funny that the RFC 2047 itself does not define header field
> folding/unfolding, 

Section 8 of RFC 2047 contains examples for unfolding.

> so rfc2047-unfold-region is rather confusing name for this function.

As soon as the MUA or news client claims to support MIME, it has to
unfold headers according to the MIME rules.

> However, let's talk about RFC 2047 functions later :-)

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo---  |  PGP key available  |  http://rsteib.home.pages.de/

* Re: New mail-related routines
  2004-10-19 19:29       ` Stefan Monnier
@ 2004-10-19 23:56         ` Alexander Pohoyda
  0 siblings, 0 replies; 16+ messages in thread
From: Alexander Pohoyda @ 2004-10-19 23:56 UTC
  Cc: emacs-devel

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> > There's a small problem with this approach.  In mbox file format,
> > messages start with "From blah-blah..", so the original
> > rfc822-goto-eoh stops at this line.
> 
> That's the problem when you try to merge several "identical" functions
> into one.  There's a reason why they're not 100% identical.

Right.  So the rfc822-goto-eoh function stays unchanged.

-- 
Alexander Pohoyda <alexander.pohoyda@gmx.net>
PGP Key fingerprint: 7F C9 CC 5A 75 CD 89 72  15 54 5F 62 20 23 C6 44

* Re: New mail-related routines
  2004-10-19 20:02     ` Reiner Steib
@ 2004-10-20  0:03       ` Alexander Pohoyda
  0 siblings, 0 replies; 16+ messages in thread
From: Alexander Pohoyda @ 2004-10-20  0:03 UTC
  Cc: Emacs development

Reiner Steib <reinersteib+gmane@imap.cc> writes:

> "foobar" because of incorrect unfolding.
> 
> > I'll add an optional argument to control this behaviour.
> 
> Please make the _strict_ behavior the default.

Yes, I did.


> >> The result with `rfc2047-unfold-region' is correct:
> >
> > Funny that the RFC 2047 itself does not define header field
> > folding/unfolding, 
> 
> Section 8 of RFC 2047 contains examples for unfolding.

Please note that those examples contain encoded words, not ordinary
tokens.  Here's the definition:

   When displaying a particular header field that contains multiple
   'encoded-word's, any 'linear-white-space' that separates a pair of
   adjacent 'encoded-word's is ignored.
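The quoted rule can be illustrated on an already unfolded header body: linear white space is dropped only when it separates two adjacent encoded-words.  A rough sketch, assuming an encoded-word can be recognized just by its closing "?=" and opening "=?" delimiters; real decoders such as the one in rfc2047.el parse encoded-words far more carefully.  `my-drop-ws-between-encoded-words' is a hypothetical name.

```elisp
(defun my-drop-ws-between-encoded-words (string)
  "Remove linear white space between adjacent encoded-words in STRING.
Only whitespace preceded by \"?=\" and followed by \"=?\" is removed;
whitespace next to ordinary tokens is preserved."
  ;; Group 1 is the end of one encoded-word, group 2 the start of the next.
  (replace-regexp-in-string "\\(\\?=\\)[ \t]+\\(=\\?\\)" "\\1\\2" string t))
```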


> > so rfc2047-unfold-region is rather confusing name for this function.
> 
> As soon as the MUA or news client claims to support MIME, it has to
> unfold headers according to the MIME rules.

Sure, but we are not talking about MIME yet.


-- 
Alexander Pohoyda <alexander.pohoyda@gmx.net>
PGP Key fingerprint: 7F C9 CC 5A 75 CD 89 72  15 54 5F 62 20 23 C6 44

* Re: New mail-related routines
  2004-10-18 21:57 New mail-related routines Alexander Pohoyda
  2004-10-18 22:12 ` Stefan Monnier
  2004-10-19 12:32 ` Reiner Steib
@ 2004-10-24 12:03 ` Simon Josefsson
  2004-10-25 22:15   ` Alexander Pohoyda
  2004-10-25 22:43   ` Alexander Pohoyda
  2 siblings, 2 replies; 16+ messages in thread
From: Simon Josefsson @ 2004-10-24 12:03 UTC


Is there an updated version of your patch to review?  I think I agree
with all comments raised so far.  It would be easier to review your
code if there is an updated version to look at.

I think it is important to improve documentation regarding these new
functions.  Right now, similar functions are implemented many times in
many places in Emacs.  It would be good if these could be fixed to use
only one correct implementation.  If there is no guidance, people will
just write new functions for the same purpose again, or use one of the
existing but inelegant interfaces.

Perhaps a new texinfo manual could be created, "Emacs Lisp Mail
Functions" (or something), to document the recommended mail APIs to
use in Emacs Lisp.  It should be in a non-MUA specific (Gnus, RMAIL
etc) way.  Perhaps extending emacs-mime.texi is the right thing; it
already contains some non-MIME, general-purpose mail functions.

* Re: New mail-related routines
  2004-10-24 12:03 ` Simon Josefsson
@ 2004-10-25 22:15   ` Alexander Pohoyda
  2004-10-25 22:43   ` Alexander Pohoyda
  1 sibling, 0 replies; 16+ messages in thread
From: Alexander Pohoyda @ 2004-10-25 22:15 UTC


Simon Josefsson <jas@extundo.com> writes:

> Is there an updated version of your patch to review?  I think I agree
> with all comments raised so far.  It would be easier to review your
> code if there is an updated version to look at.

I'll post an updated version soon.  Meanwhile, I have realized that
some functions from my patch are not as general as I wanted them to
be, so they don't belong in the mail-utils.el file.


> I think it is important to improve documentation regarding these new
> functions.  Right now, similar functions are implemented many times in
> many places in Emacs.  It would be good if these could be fixed to use
> only one correct implementation.  If there is no guidance, people will
> just write new functions for the same purpose again, or use one of the
> existing but inelegant interfaces.

Yes, absolutely.  I hope we will agree on some "correct
implementation" so that we have something to document and use.


> Perhaps a new texinfo manual could be created, "Emacs Lisp Mail
> Functions" (or something), to document the recommended mail APIs to
> use in Emacs Lisp.  It should be in a non-MUA specific (Gnus, RMAIL
> etc) way.  Perhaps extending emacs-mime.texi is the right thing; it
> already contains some non-MIME, general-purpose mail functions.

Yes, I have this in my TODO list.

Thank you very much for your input!


-- 
Alexander Pohoyda <alexander.pohoyda@gmx.net>
PGP Key fingerprint: 7F C9 CC 5A 75 CD 89 72  15 54 5F 62 20 23 C6 44

* Re: New mail-related routines
  2004-10-24 12:03 ` Simon Josefsson
  2004-10-25 22:15   ` Alexander Pohoyda
@ 2004-10-25 22:43   ` Alexander Pohoyda
  2004-10-26 23:16     ` Kevin Rodgers
  1 sibling, 1 reply; 16+ messages in thread
From: Alexander Pohoyda @ 2004-10-25 22:43 UTC


Please comment on this code.  Thank you!

;;; The -hf suffix means Header Field.

(defconst mail-wsp-regexp "[\040\011]")
(defconst mail-crlf-regexp "[\015]?[\012]")

;; Header fields must be unfolded before using these regexps.  This
;; agrees with the RFC 2822, section 2.2.3, last paragraph.

;; Unstructured header fields
(defconst mail-hf-name-regexp "[\041-\071\073-\176]+")
(defconst mail-hf-body-regexp "[^\015\012]*")
(defconst mail-hf-regexp
  (format "^\\(%s\\)%s*:%s*\\(%s\\)%s*\\(%s\\)?"
	  mail-hf-name-regexp mail-wsp-regexp mail-wsp-regexp
	  mail-hf-body-regexp mail-wsp-regexp mail-crlf-regexp))

\f
;;;
;;; General-purpose mail functions
;;;

;; Merging this function with `rfc822-goto-eoh' failed, because
;; mbox-formatted messages start with "From name@example.org...",
;; which is neither a valid header field, nor the end of header.
(defun mail-body-start-position (&optional from to)
  "Return a position where the body of a message starts.

If called without arguments, the current buffer is assumed to be
narrowed to exactly one message.

This function may also be used to get the body start position of
a MIME entity in the region between FROM and TO."
  (save-excursion
    (goto-char (or from (point-min)))
    (save-match-data
      (if (or (search-forward (concat "\n" mail-header-separator "\n") to t)
	      (search-forward "\n\n" to t))
	  (point)
	(message "This entity has no body")
	(or to (point-max))))))

(defun mail-header-end-position (&optional from to)
  "Return a position where the header of a message ends.

If called without arguments, the current buffer is assumed to be
narrowed to exactly one message.

This function may also be used to get the header end position of
a MIME entity in the region between FROM and TO."
  (save-excursion
    (goto-char (mail-body-start-position (or from (point-min))
					 (or to (point-max))))
    (forward-line -1)
    (point)))

\f
;;;
;;; Header field functions
;;;

(defsubst mail-make-hf (name body)
  "Return \"NAME: BODY\" string."
  (when name (concat name ": " body)))

(defsubst mail-insert-hf (header-field)
  "Insert the HEADER-FIELD created by `mail-make-hf' function at point."
  (when header-field (insert header-field "\n")))

(defun mail-search-hf (name &optional from to)
  "Find a header field named NAME in the message header.
Set point at the beginning of the field found, and return point.
If the header field is not found, do not move point and return nil.
The search starts at FROM, which defaults to `point-min', and is
bounded by TO; if TO is nil, the search is not bounded."
  (let ((found nil)
	(case-fold-search t))
    (save-excursion
      (goto-char (or from (point-min)))
      (save-match-data
	(when (re-search-forward (concat "^" (regexp-quote name) ":") to t)
	  (setq found (point-at-bol)))))
    (when found (goto-char found))))

(defun mail-hf-body-position ()
  "Return the position where the body of the header field at point starts.
Point should be at the beginning of the header field."
  (save-excursion
    (save-match-data
      (re-search-forward (format ":\\(%s*\\)" mail-wsp-regexp) nil t))))

(defun mail-hf-end-position ()
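  "Return the position where the header field at point ends.
This is the beginning of the next line that is not a continuation
line, so the trailing newline is included."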
  (save-excursion
    (save-match-data
      (while (progn
	       (forward-line)
	       (looking-at (format "%s+" mail-wsp-regexp))))
      (point))))

(defun mail-get-hf-at-point ()
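  "Return the header field at point as a string.
Point should be at the beginning of the field.  Continuation lines
and the trailing newline are included."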
  (buffer-substring-no-properties (point) (mail-hf-end-position)))

(defun mail-get-hf (name &optional from to)
  "Return the whole header field called NAME as a string.

The arguments FROM and TO limit the search as in `mail-search-hf'.

The trailing CRLF is also included."
  (save-excursion
    (when (mail-search-hf name from to)
      (mail-get-hf-at-point))))

(defun mail-get-hf-name (header-field)
  "Return the name of the HEADER-FIELD string."
  (when header-field
    (setq header-field (mail-unfold-hf header-field))
    (save-match-data
      (when (string-match mail-hf-regexp header-field)
	(match-string-no-properties 1 header-field)))))

(defun mail-get-hf-body (header-field)
  "Return the body of the HEADER-FIELD string."
  (when header-field
    (setq header-field (mail-unfold-hf header-field))
    (save-match-data
      (when (string-match mail-hf-regexp header-field)
	(match-string-no-properties 2 header-field)))))

(defun mail-process-hfs-in-region (from to function)
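  "Call FUNCTION on each header field in the region between FROM and TO.
FUNCTION is called with two arguments, the start and end positions
of the field.  Return the length of the region after processing."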
  (save-excursion
    (goto-char from)
    (save-restriction
      (narrow-to-region from to)
      ;; RFC 2822, section 2.2.3.
      (while (re-search-forward "^[^ \t]+:" nil t)
	(beginning-of-line)
	;;(message "Processing `%s' header..."
	;;	 (mail-get-hf-name (mail-get-hf-at-point)))
	(funcall function (point) (mail-hf-end-position))
	;; Go to the next header field.
	(goto-char (mail-hf-end-position)))
      (- (point-max) from))))

(defun mail-sort-hfs-in-region (from to sort-list)
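  "Sort header fields in the region between FROM and TO.
For each name in SORT-LIST, in order, move the first header field
with that name to just after the fields already moved, so that the
listed fields end up at the start of the region.  Fields not named
in SORT-LIST keep their relative order after them."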
  (save-excursion
    (goto-char from)
    (save-restriction
      (narrow-to-region from to)
      ;; Do the job.
      (let ((my-pos (point))
	    my-hf)
	(dolist (sorted-hf sort-list)
	  ;;(message "Sorting `%s' header..." sorted-hf)
	  (when (mail-search-hf sorted-hf)
	    (setq my-hf (mail-get-hf-at-point))
	    (delete-region (point) (mail-hf-end-position))
	    (goto-char my-pos)
	    (insert my-hf)
	    (setq my-pos (point))))))))

(defun mail-fold-hf (header-field)
  "Return HEADER-FIELD folded as described in `mail-fold-region'."
  (when header-field
    (with-temp-buffer
      ;;(message "Header to fold:\n%s" header-field)
      (insert header-field)
      (mail-fold-region (point-min) (point-max))
      (buffer-string))))

(defun mail-fold-region (from to &optional limit)
  "Fold header fields in the region between FROM and TO.
Folding follows RFC 2822: a line break is inserted before a WSP
character so that lines stay within LIMIT columns where possible.
LIMIT defaults to 76."
  (save-excursion
    (goto-char from)
    (save-restriction
      (narrow-to-region from to)
      (let ((limit (or limit 76))
	    start)
	(while (not (eobp))
	  (setq start (point))
	  (goto-char (min (+ (point) (- limit (current-column)))
			  (point-at-eol)))
	  (if (and (>= (current-column) limit)
		   (re-search-backward mail-wsp-regexp start t)
		   (not (looking-at (format "\n%s" mail-wsp-regexp))))
	      ;; Insert line break
	      (progn
		(insert "\n")
		(forward-char))
	    (if (re-search-backward mail-wsp-regexp start t)
		(forward-line)
	      ;; Token is too long: break before the next WSP after it.
	      (re-search-forward mail-wsp-regexp nil t)
	      (backward-char)
	      (insert "\n")
	      (forward-char))))))))

(defun mail-unfold-hf (header-field &optional loose)
  "Return HEADER-FIELD unfolded as described in `mail-unfold-region'.
LOOSE has the same meaning as there."
  (when header-field
    (with-temp-buffer
      ;;(message "Header to unfold:\n%s" header-field)
      (insert header-field)
      (mail-unfold-region (point-min) (point-max) loose)
      (buffer-string))))

(defun mail-unfold-region (from to &optional loose)
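  "Unfold header fields in the region between FROM and TO.
Unfolding follows RFC 2822: each line break that is immediately
followed by WSP is removed.

If LOOSE is non-nil, also replace the WSP characters that follow
each such line break, so that the break and the leading WSP become
a single SPACE."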
  (save-excursion
    (goto-char from)
    (save-restriction
      (narrow-to-region from to)
      (save-match-data
	(while (re-search-forward
		(format "\\(%s\\)%s+" mail-crlf-regexp mail-wsp-regexp) nil t)
	  (if loose
	      (replace-match " " nil t)
	    (replace-match "" nil t nil 1)))))))

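To try the parsing regexps without loading the whole patch, here is a small self-contained check. It copies the regexp definitions from the patch verbatim under a hypothetical `my-' prefix; `my-split-hf' is likewise hypothetical and only mimics what `mail-get-hf-name' and `mail-get-hf-body' extract from an already unfolded field.

```elisp
;; Verbatim copies of the patch's regexps, renamed with a
;; hypothetical `my-' prefix so they do not clash with the real ones.
(defconst my-wsp-regexp "[\040\011]")
(defconst my-crlf-regexp "[\015]?[\012]")
(defconst my-hf-name-regexp "[\041-\071\073-\176]+")
(defconst my-hf-body-regexp "[^\015\012]*")
(defconst my-hf-regexp
  (format "^\\(%s\\)%s*:%s*\\(%s\\)%s*\\(%s\\)?"
          my-hf-name-regexp my-wsp-regexp my-wsp-regexp
          my-hf-body-regexp my-wsp-regexp my-crlf-regexp))

(defun my-split-hf (header-field)
  "Return (NAME . BODY) for the unfolded HEADER-FIELD, or nil."
  (save-match-data
    (when (string-match my-hf-regexp header-field)
      (cons (match-string 1 header-field)
            (match-string 2 header-field)))))

;; (my-split-hf "Subject:  Hello world\n")
;;   => ("Subject" . "Hello world")
;; An mbox "From " line is rejected even though it contains colons:
;; (my-split-hf "From name@example.org Mon Oct 18 21:57:00 2004")
;;   => nil
```

One thing the check makes visible: because `mail-hf-body-regexp' is greedy, trailing WSP stays in the body, so (my-split-hf "X-Foo: bar  \n") returns ("X-Foo" . "bar  ").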

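The folding loop in `mail-fold-region' is the subtlest part of the patch. As a point of comparison only (this is not the patch's algorithm), here is a much simpler string-based sketch; `my-fold-hf' and `my-unfold-hf' are hypothetical names. It assumes a single unfolded field without a trailing newline and breaks lines only before a SPACE, so strict unfolding, which removes every line break that is immediately followed by WSP, restores the original field exactly.

```elisp
(defun my-fold-hf (header-field &optional limit)
  "Fold HEADER-FIELD before spaces so lines fit in LIMIT columns.
LIMIT defaults to 76.  Tokens longer than LIMIT are left intact."
  (let* ((limit (or limit 76))
         ;; Explicit separator: empty tokens from runs of spaces are kept.
         (tokens (split-string header-field " "))
         (line (car tokens))
         (lines nil))
    (dolist (token (cdr tokens))
      (if (<= (+ (length line) 1 (length token)) limit)
          (setq line (concat line " " token))
        ;; Start a continuation line; it begins with the space we broke at.
        (push line lines)
        (setq line (concat " " token))))
    (mapconcat #'identity (nreverse (cons line lines)) "\n")))

(defun my-unfold-hf (header-field &optional loose)
  "Remove each line break in HEADER-FIELD that is followed by WSP.
With LOOSE, the break and all following WSP become one SPACE."
  (if loose
      (replace-regexp-in-string "\r?\n[ \t]+" " " header-field)
    ;; Keep the WSP character captured in group 1; drop only the break.
    (replace-regexp-in-string "\r?\n\\([ \t]\\)" "\\1" header-field)))

;; (my-fold-hf "Subject: a b c" 10)   => "Subject: a\n b c"
;; (my-unfold-hf "Subject: a\n b c")  => "Subject: a b c"
;; (my-unfold-hf "To: x,\n\t  y" t)   => "To: x, y"
```

A token longer than LIMIT is left on an over-long line rather than split, because RFC 2822 only allows folding at whitespace.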
-- 
Alexander Pohoyda <alexander.pohoyda@gmx.net>
PGP Key fingerprint: 7F C9 CC 5A 75 CD 89 72  15 54 5F 62 20 23 C6 44

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: New mail-related routines
  2004-10-25 22:43   ` Alexander Pohoyda
@ 2004-10-26 23:16     ` Kevin Rodgers
  2004-10-27 16:04       ` Alexander Pohoyda
  0 siblings, 1 reply; 16+ messages in thread
From: Kevin Rodgers @ 2004-10-26 23:16 UTC


Alexander Pohoyda wrote:
> Please comment on this code.  Thank you!
> 
> ;;; The -hf suffix means Header Field.
> 
> (defconst mail-wsp-regexp "[\040\011]")
> (defconst mail-crlf-regexp "[\015]?[\012]")

What is the point of putting a single character in [...]?
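The point can be checked directly. A bracket expression holding one character matches exactly what that character matches on its own, so the posted `mail-crlf-regexp' and the bracket-free "\015?\012" find the same line ends; `mail-wsp-regexp' is a different case, since its class holds two characters (SPACE and TAB). The constant and function names below are hypothetical, for this check only.

```elisp
;; The regexp as posted, and a bracket-free equivalent.
(defconst my-crlf-bracketed "[\015]?[\012]")
(defconst my-crlf-plain "\015?\012")

(defun my-same-match-p (string)
  "Non-nil if both regexps match STRING at the same start and end."
  (let ((a (and (string-match my-crlf-bracketed string)
                (list (match-beginning 0) (match-end 0))))
        (b (and (string-match my-crlf-plain string)
                (list (match-beginning 0) (match-end 0)))))
    (equal a b)))

;; (my-same-match-p "a\r\nb") => t   ; both match "\r\n" at 1..3
;; (my-same-match-p "abc")    => t   ; neither matches
```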

-- 
Kevin Rodgers


* Re: New mail-related routines
  2004-10-26 23:16     ` Kevin Rodgers
@ 2004-10-27 16:04       ` Alexander Pohoyda
  0 siblings, 0 replies; 16+ messages in thread
From: Alexander Pohoyda @ 2004-10-27 16:04 UTC
  Cc: emacs-devel

Kevin Rodgers <ihs_4664@yahoo.com> writes:

> Alexander Pohoyda wrote:
> > Please comment on this code.  Thank you!
> > ;;; The -hf suffix means Header Field.
> > (defconst mail-wsp-regexp "[\040\011]")
> > (defconst mail-crlf-regexp "[\015]?[\012]")
> 
> What is the point of putting a single character in [...]?

Thank you!  Fixed.

-- 
Alexander Pohoyda <alexander.pohoyda@gmx.net>
PGP Key fingerprint: 7F C9 CC 5A 75 CD 89 72  15 54 5F 62 20 23 C6 44


end of thread, other threads:[~2004-10-27 16:04 UTC | newest]

Thread overview: 16+ messages
2004-10-18 21:57 New mail-related routines Alexander Pohoyda
2004-10-18 22:12 ` Stefan Monnier
2004-10-19  7:06   ` Alexander Pohoyda
2004-10-19 12:51     ` Stefan Monnier
2004-10-19 18:37     ` Alexander Pohoyda
2004-10-19 19:29       ` Stefan Monnier
2004-10-19 23:56         ` Alexander Pohoyda
2004-10-19 12:32 ` Reiner Steib
2004-10-19 17:47   ` Alexander Pohoyda
2004-10-19 20:02     ` Reiner Steib
2004-10-20  0:03       ` Alexander Pohoyda
2004-10-24 12:03 ` Simon Josefsson
2004-10-25 22:15   ` Alexander Pohoyda
2004-10-25 22:43   ` Alexander Pohoyda
2004-10-26 23:16     ` Kevin Rodgers
2004-10-27 16:04       ` Alexander Pohoyda

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
