unofficial mirror of emacs-devel@gnu.org 
* automatic MIME decoding in rmail
@ 2006-03-23 16:32 Evil Boris
  2006-03-26  0:21 ` Richard Stallman
  0 siblings, 1 reply; 9+ messages in thread
From: Evil Boris @ 2006-03-23 16:32 UTC (permalink / raw)



I have been caught several times by the following.  Email comes with
the content-type message header

---------
Content-Type: text/plain;
        format=flowed;
        charset="koi8-r";
        reply-type=original
---------

Notice that the CHARSET specifier does not immediately follow
"text/plain": there is a FORMAT specification in between.
For example, hotmail.com and several other services send such
messages.  (I have not been able to determine from reading the spec
whether this is allowed by the standard.  Not sure how relevant that is,
though.)  Emacs does not decode the charset correctly, because
of the following:

(defvar rmail-mime-charset-pattern
  "^content-type:[ ]*text/plain;[ \t\n]*charset=\"?\\([^ \t\n\";]+\\)\"?"
  "Regexp to match MIME-charset specification in a header of message.
The first parenthesized expression should match the MIME-charset name.")

I guess the fix is to replace [ \t\n]* with a pattern matching any
number of intervening specifications.  Not sure what form they should
take, so will leave that for an expert.
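
The mismatch can be reproduced outside of Rmail.  The following is only
a sketch: `my-charset-from-header' and `my-old-pattern' are names made up
for the illustration (they are not in rmail.el), and `case-fold-search'
is bound explicitly because the pattern is written in lower case.

```elisp
;; Hypothetical helper for this sketch only; it is not part of rmail.el.
(defun my-charset-from-header (pattern header)
  "Return the first group of PATTERN matched against HEADER, or nil."
  (let ((case-fold-search t))          ; the pattern is lower case
    (and (string-match pattern header)
         (match-string 1 header))))

(defvar my-old-pattern
  "^content-type:[ ]*text/plain;[ \t\n]*charset=\"?\\([^ \t\n\";]+\\)\"?"
  "Copy of the pattern quoted above.")

;; charset directly after text/plain: the name is found.
(my-charset-from-header my-old-pattern
                        "Content-Type: text/plain; charset=\"koi8-r\"")
;; => "koi8-r"

;; format= comes first: the pattern does not match at all.
(my-charset-from-header
 my-old-pattern
 (concat "Content-Type: text/plain;\n"
         "        format=flowed;\n"
         "        charset=\"koi8-r\";\n"
         "        reply-type=original\n"))
;; => nil
```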

Thanks,
        --Boris

* Re: automatic MIME decoding in rmail
  2006-03-23 16:32 automatic MIME decoding in rmail Evil Boris
@ 2006-03-26  0:21 ` Richard Stallman
  2006-03-27 23:18   ` Stefan Monnier
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Stallman @ 2006-03-26  0:21 UTC (permalink / raw)
  Cc: emacs-devel

Does this fix it?

*** rmail.el	18 Mar 2006 13:28:06 -0500	1.422
--- rmail.el	25 Mar 2006 18:19:10 -0500	
***************
*** 622,628 ****
  
  ;;;###autoload
  (defvar rmail-mime-charset-pattern
!   "^content-type:[ ]*text/plain;[ \t\n]*charset=\"?\\([^ \t\n\";]+\\)\"?"
    "Regexp to match MIME-charset specification in a header of message.
  The first parenthesized expression should match the MIME-charset name.")
  
--- 622,629 ----
  
  ;;;###autoload
  (defvar rmail-mime-charset-pattern
!   (concat "^content-type:[ ]*text/plain;\\(?:[ \t\n]*format=[a-z]+;\\)?"
! 	  "[ \t\n]*charset=\"?\\([^ \t\n\";]+\\)\"?")
    "Regexp to match MIME-charset specification in a header of message.
  The first parenthesized expression should match the MIME-charset name.")

* Re: automatic MIME decoding in rmail
  2006-03-26  0:21 ` Richard Stallman
@ 2006-03-27 23:18   ` Stefan Monnier
  2006-03-28 19:33     ` Richard Stallman
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Monnier @ 2006-03-27 23:18 UTC (permalink / raw)
  Cc: emacs-devel, Evil Boris

> !   (concat "^content-type:[ ]*text/plain;\\(?:[ \t\n]*format=[a-z]+;\\)?"
> ! 	  "[ \t\n]*charset=\"?\\([^ \t\n\";]+\\)\"?")

As far as I know, it is perfectly valid to add any random number of
arbitrary non-standard args.  So "format=[a-z]+" is too restrictive.
We'd probably want something more like:

   (concat "^content-type:[ ]*text/plain;"
           "\\(?:[ \t\n]*[-a-z]+=\\(?:[^\";]+\\|\"[^\"]+\"\\);\\)*"
 	   "[ \t\n]*charset=\"?\\([^ \t\n\";]+\\)\"?")

although I can't remember the exact BNF rules for this MIME header, so it
can probably do with a bit more work.
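
A quick way to try the more general expression, as a sketch only:
`my-general-pattern' is a copy of the expression above,
`my-general-charset' is a made-up helper, and the `x-made-up' parameter
is invented purely to exercise the quoted-value branch.

```elisp
;; Sketch only: these names are hypothetical and not part of rmail.el.
(defvar my-general-pattern
  (concat "^content-type:[ ]*text/plain;"
          "\\(?:[ \t\n]*[-a-z]+=\\(?:[^\";]+\\|\"[^\"]+\"\\);\\)*"
          "[ \t\n]*charset=\"?\\([^ \t\n\";]+\\)\"?")
  "Copy of the general expression suggested above.")

(defun my-general-charset (header)
  "Return the charset named in HEADER according to `my-general-pattern'."
  (let ((case-fold-search t))
    (and (string-match my-general-pattern header)
         (match-string 1 header))))

;; Two leading parameters, one bare and one quoted, are both skipped.
(my-general-charset
 "Content-Type: text/plain; format=flowed; x-made-up=\"a b\"; charset=koi8-r")
;; => "koi8-r"
```

One thing the sketch makes visible: a parameter name containing a digit
(say `x2=1;') is not skipped by `[-a-z]+', so the name part may be among
the bits that need more work.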


        Stefan

* Re: automatic MIME decoding in rmail
  2006-03-27 23:18   ` Stefan Monnier
@ 2006-03-28 19:33     ` Richard Stallman
  2006-04-09 21:32       ` Evil Boris
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Stallman @ 2006-03-28 19:33 UTC (permalink / raw)
  Cc: emacs-devel, evilborisnet

    As far as I know, it is perfectly valid to add any random number of
    arbitrary non-standard args.  So "format=[a-z]+" is too restrictive.

It is an easy fix that may work in practice.

    We'd probably want something more like:

       (concat "^content-type:[ ]*text/plain;"
	       "\\(?:[ \t\n]*[-a-z]+=\\(?:[^\";]+\\|\"[^\"]+\"\\);\\)*"
	       "[ \t\n]*charset=\"?\\([^ \t\n\";]+\\)\"?")

I don't mind installing this, but it may not be really correct,
any more than my quick fix is.

Trying to solve the problem in a fully general way could be a lot
harder, however.

    although I can't remember the exact BNF rules for this MIME header, so it
    can probably do with a bit more work.

Yes, that's what could make a fully correct solution hard.

* Re: automatic MIME decoding in rmail
  2006-03-28 19:33     ` Richard Stallman
@ 2006-04-09 21:32       ` Evil Boris
  2006-04-10  3:26         ` Richard Stallman
  0 siblings, 1 reply; 9+ messages in thread
From: Evil Boris @ 2006-04-09 21:32 UTC (permalink / raw)



> Stefan Monnier writes:
>     As far as I know, it is perfectly valid to add any random number of
>     arbitrary non-standard args.  So "format=[a-z]+" is too restrictive.

The RFC (3676) that defines format=fixed or format=flowed also
mentions delsp=yes or delsp=no.  Looking around, but not too
carefully, I have not found any other allowed arguments, for
text/plain.

I think there are two alternatives.  One is to enumerate the
allowed args explicitly.  (Please do not forget to surround the
argument values with optional quotes, as they are commonly used,
i.e., allow both format=flowed and format="flowed".  In other words,
modify Richard's expression by allowing quotes and delsp=... .)

The alternative is some sensible expression that matches a general set
of arguments, such as what Stefan suggests.

I tried the first alternative with expression:

 (concat "^content-type:[ ]*text/plain;\\(?:[ \t\n]*format=\"?[a-z]+\"?;\\)?"
                 "[ \t\n]*charset=\"?\\([^ \t\n\";]+\\)\"?")

and had no trouble so far, though I receive very few msgs in this
format and perhaps the regex only got exercised once or twice for the
class of msgs we are discussing.  A more general expression including
delsp would look something like this:

 (concat "^content-type:[ ]*text/plain;"
         "\\(?:[ \t\n]*\\(?:format\\|delsp\\)=\"?[a-z]+\"?;\\)?"
         "[ \t\n]*charset=\"?\\([^ \t\n\";]+\\)\"?")

One could imagine even specifying the two legal values for format and
delsp, but this seems like overkill...
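
For what it is worth, here is a sketch of how the delsp expression
behaves when a header carries both format= and delsp=.  The names are
made up for the illustration; `my-repeat-pattern' is an assumed variant
that differs from the expression above only in using `*' where the
original has `?'.

```elisp
;; Sketch only: hypothetical names, not part of rmail.el.
(defvar my-once-pattern
  (concat "^content-type:[ ]*text/plain;"
          "\\(?:[ \t\n]*\\(?:format\\|delsp\\)=\"?[a-z]+\"?;\\)?"
          "[ \t\n]*charset=\"?\\([^ \t\n\";]+\\)\"?")
  "The expression above: at most one format/delsp parameter.")

(defvar my-repeat-pattern
  (concat "^content-type:[ ]*text/plain;"
          "\\(?:[ \t\n]*\\(?:format\\|delsp\\)=\"?[a-z]+\"?;\\)*"
          "[ \t\n]*charset=\"?\\([^ \t\n\";]+\\)\"?")
  "Assumed variant: the same expression with `*' in place of `?'.")

(defun my-match-charset (pattern header)
  "Return the first group of PATTERN matched against HEADER, or nil."
  (let ((case-fold-search t))
    (and (string-match pattern header)
         (match-string 1 header))))

(let ((header
       "Content-Type: text/plain; format=flowed; delsp=yes; charset=utf-8"))
  (list (my-match-charset my-once-pattern header)
        (my-match-charset my-repeat-pattern header)))
;; => (nil "utf-8")
```

With `?' only one of the two parameters can precede charset=; with `*'
any number of them can.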

Should one of the above versions be incorporated in CVS?  If it already
has been, my apologies...

     --Boris

* Re: automatic MIME decoding in rmail
  2006-04-09 21:32       ` Evil Boris
@ 2006-04-10  3:26         ` Richard Stallman
  2006-05-06  0:05           ` Evil Boris
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Stallman @ 2006-04-10  3:26 UTC (permalink / raw)
  Cc: emacs-devel

I will make it handle delsp also.

Thanks.

* Re: automatic MIME decoding in rmail
  2006-04-10  3:26         ` Richard Stallman
@ 2006-05-06  0:05           ` Evil Boris
  2006-05-06 23:36             ` Richard Stallman
  0 siblings, 1 reply; 9+ messages in thread
From: Evil Boris @ 2006-05-06  0:05 UTC (permalink / raw)



Richard Stallman <rms@gnu.org> writes:

> I will make it handle delsp also.

Thanks.  Today, however, having updated from CVS, I had very strange
things happen.  To make a long story short, I got caught by
loaddefs.el not regenerating properly.  (Not sure if it was something I
did, or if "cvs update" followed by "make" followed by "recompile"
followed by "make" is insufficient.  I was stuck with the new version of
rmail.el but the old version of rmail-mime-charset-pattern, which broke
terribly; see below.)  Once I fixed that, I see that

==========
rmail-mime-charset-pattern is a variable defined in `rmail.el'.
Its value is
"^content-type:[ ]*text/plain;\\(?:[ \t\n]*\\(format\\|delsp\\)=\"?[-a-z0-9]+\"?;\\)*[ \t\n]*charset=\"?\\([^ \t\n\";]+\\)\"?"
==========

Which I thought was good.  But then I see
===========
revision 1.425
date: 2006-04-19 09:55:40 +0000;  author: rfrancoise;  state: Exp;  lines: +2 -2
(rmail-convert-to-babyl-format): Use second group from
`rmail-mime-charset-pattern'.
===========

which sounded odd, until I realized that there is a "?:" missing in
front of "format\\|".   So perhaps one should change the regexp and
put (match-string 1) back in?
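
To spell out the group numbering, a small sketch (the `my-' names are
made up for the illustration; the first pattern is the value shown
above, the second is the same with "?:" added):

```elisp
;; Sketch only: hypothetical names, not part of rmail.el.
(defvar my-capturing-pattern
  (concat "^content-type:[ ]*text/plain;"
          "\\(?:[ \t\n]*\\(format\\|delsp\\)=\"?[-a-z0-9]+\"?;\\)*"
          "[ \t\n]*charset=\"?\\([^ \t\n\";]+\\)\"?")
  "Pattern in which the format/delsp alternation is a numbered group.")

(defvar my-shy-pattern
  (concat "^content-type:[ ]*text/plain;"
          "\\(?:[ \t\n]*\\(?:format\\|delsp\\)=\"?[-a-z0-9]+\"?;\\)*"
          "[ \t\n]*charset=\"?\\([^ \t\n\";]+\\)\"?")
  "Same pattern with the alternation made a shy (unnumbered) group.")

(defun my-group (pattern header n)
  "Return group N of PATTERN matched against HEADER, or nil."
  (let ((case-fold-search t))
    (and (string-match pattern header)
         (match-string n header))))

(let ((header "Content-Type: text/plain; format=flowed; charset=koi8-r"))
  (list (my-group my-capturing-pattern header 1)
        (my-group my-capturing-pattern header 2)
        (my-group my-shy-pattern header 1)))
;; => ("format" "koi8-r" "koi8-r")
```

So with the numbered group the charset moves to group 2, while with
"?:" it stays in group 1, as the docstring promises.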

      --Boris

* Re: automatic MIME decoding in rmail
  2006-05-06  0:05           ` Evil Boris
@ 2006-05-06 23:36             ` Richard Stallman
  2006-05-16 14:55               ` Evil Boris
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Stallman @ 2006-05-06 23:36 UTC (permalink / raw)
  Cc: emacs-devel

    which sounded odd, until I realized that there is a "?:" missing in
    front of "format\\|".   So perhaps one should change the regexp and
    put (match-string 1) back in?

Ok.

* Re: automatic MIME decoding in rmail
  2006-05-06 23:36             ` Richard Stallman
@ 2006-05-16 14:55               ` Evil Boris
  0 siblings, 0 replies; 9+ messages in thread
From: Evil Boris @ 2006-05-16 14:55 UTC (permalink / raw)



Richard Stallman <rms@gnu.org> writes:

>     which sounded odd, until I realized that there is a "?:" missing in
>     front of "format\\|".   So perhaps one should change the regexp and
>     put (match-string 1) back in?
>
> Ok.

Was the change ever made?  It does not seem to be in CVS.  

Reminder: I am referring to undoing the change

----------------------------
revision 1.425
date: 2006-04-19 09:55:40 +0000;  author: rfrancoise;  state: Exp;  lines: +2 -2
(rmail-convert-to-babyl-format): Use second group from
`rmail-mime-charset-pattern'.
----------------------------
in rmail.el and changing the regexp

=====
;;;###autoload
(defvar rmail-mime-charset-pattern
  (concat "^content-type:[ ]*text/plain;"
          "\\(?:[ \t\n]*\\(format\\|delsp\\)=\"?[-a-z0-9]+\"?;\\)*"
          "[ \t\n]*charset=\"?\\([^ \t\n\";]+\\)\"?")
  "Regexp to match MIME-charset specification in a header of message.
The first parenthesized expression should match the MIME-charset name.")
======

to 

======
;;;###autoload
(defvar rmail-mime-charset-pattern
  (concat "^content-type:[ ]*text/plain;"
          "\\(?:[ \t\n]*\\(?:format\\|delsp\\)=\"?[-a-z0-9]+\"?;\\)*"
          "[ \t\n]*charset=\"?\\([^ \t\n\";]+\\)\"?")
  "Regexp to match MIME-charset specification in a header of message.
The first parenthesized expression should match the MIME-charset name.")
=======

[just added "?:" in front of "format"]

Thanks,

      --Boris
