all messages for Emacs-related lists mirrored at yhetil.org
* bad rfc2047 encoding
@ 2002-08-15 22:00 Dave Love
       [not found] ` <ilu8z36li3m.fsf@latte.josefsson.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Love @ 2002-08-15 22:00 UTC
  Cc: bugs

In Emacs 21.2, I see

(with-temp-buffer
  (insert "To: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann)
")
  (rfc2047-encode-message-header)
  (buffer-string))
  => "To: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai =?iso-8859-1?q?Gro=DFjohann)?=
"

which is an invalid encoding according to §5 of rfc2047, and Exim
refuses to deliver the result:

  Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai =?iso-8859-1?q?Gro=DFjohann)?=: malformed address: ?= may not follow Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai =?iso-8859-1?q?Gro=DFjohann)

I'm surprised this hasn't been noticed and fixed before, but I think
it needs fixing for 21.3 by someone more familiar with the standards
and the logic in rfc2047.el.
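
For reference, the rule in question is RFC 2047 §5(2): a Q-encoded word that appears inside a comment must not contain "(", ")" or a double quote. The following is a minimal Python sketch of that check, not anything taken from rfc2047.el. The regex and the name `bad_comment_words` are illustrative assumptions, and the sketch does not parse comments: it flags every Q-encoded word in the value.

```python
import re

# Rough pattern for an RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")

def bad_comment_words(header_value):
    """Return Q-encoded words whose encoded-text contains ( ) or a double quote."""
    bad = []
    for match in ENCODED_WORD.finditer(header_value):
        encoding, text = match.group(2), match.group(3)
        if encoding in "qQ" and any(c in text for c in '()"'):
            bad.append(match.group(0))
    return bad

# The value from the report above, and the same value with ")" kept
# outside the encoded-word.
before = "Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai =?iso-8859-1?q?Gro=DFjohann)?="
after = "Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai =?iso-8859-1?q?Gro=DFjohann?=)"
print(bad_comment_words(before))
print(bad_comment_words(after))
```

The first call should report the word that swallowed the closing parenthesis; the second should report nothing.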


* Re: bad rfc2047 encoding
       [not found] ` <ilu8z36li3m.fsf@latte.josefsson.org>
@ 2002-08-20 17:02   ` Dave Love
  2002-08-20 17:22     ` Simon Josefsson
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Love @ 2002-08-20 17:02 UTC
  Cc: bugs, bug-gnu-emacs

Simon Josefsson <jas@extundo.com> writes:

> This was fixed in Oort some time ago

Does that mean that Gnus 5.9 isn't being maintained?

> (rev 6.5 of rfc2047.el in Gnus
> CVS), patch modified to work against 21.3:

It doesn't solve the problem as far as I can tell.  I'd have thought
that obeying the RFC means parsing the header, since it concerns
comment fields.

I've restored bug-gnu-Emacs to the Cc since this is something I think
is important for a release.


* Re: bad rfc2047 encoding
  2002-08-20 17:02   ` Dave Love
@ 2002-08-20 17:22     ` Simon Josefsson
  2002-08-21 16:54       ` Dave Love
       [not found]       ` <hvofbxtmpd.fsf@rasputin.ws.nextra.no>
  0 siblings, 2 replies; 14+ messages in thread
From: Simon Josefsson @ 2002-08-20 17:22 UTC
  Cc: bugs, bug-gnu-emacs

Dave Love <d.love@dl.ac.uk> writes:

> Simon Josefsson <jas@extundo.com> writes:
>
>> This was fixed in Oort some time ago
>
> Does that mean that Gnus 5.9 isn't being maintained?

That wasn't what I meant.  I don't know the answer.

>> (rev 6.5 of rfc2047.el in Gnus
>> CVS), patch modified to work against 21.3:
>
> It doesn't solve the problem as far as I can tell.  I'd have thought
> that obeying the RFC means parsing the header, since it concerns
> comment fields.

Is that necessary?  Encoded words are allowed inside comments, they
must simply not contain the character ).  Which the patch fixes.

Your example

(with-temp-buffer
  (insert "To: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai Großjohann)
")
  (rfc2047-encode-message-header)
  (buffer-string))

evaluates to

"To: Kai.Grossjohann@CS.Uni-Dortmund.DE (Kai =?iso-8859-1?q?Gro=DFjohann?=)
"

with the patch, which seems valid to me.  Compare an example in the RFC:

   From: Nathaniel Borenstein <nsb@thumper.bellcore.com>
         (=?iso-8859-8?b?7eXs+SDv4SDp7Oj08A==?=)

> I've restored bug-gnu-Emacs to the Cc since this is something I think
> is important for a release.

I agree.  (I'm reading the gnus bugs list from quimby.gnus.org, which
removes To/Cc so when I reply it only goes to the author and
bugs@gnus.org.)

Suggested patch (against Emacs 21.3 RC) included again below.

2000-11-19 12:00:00  ShengHuo ZHU  <zsh@cs.rochester.edu>

	* rfc2047.el (rfc2047-q-encoding-alist): Match Resent-.
	(rfc2047-header-encoding-alist): Addresses are different from text.
	(rfc2047-encode-message-header): Ditto.
	(rfc2047-dissect-region): Extra parameter.
	(rfc2047-encode-region): Ditto.
	(rfc2047-encode-string): Ditto.

Index: rfc2047.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/gnus/rfc2047.el,v
retrieving revision 1.10
diff -u -p -u -w -r1.10 rfc2047.el
--- rfc2047.el	15 Jul 2001 17:42:53 -0000	1.10
+++ rfc2047.el	16 Aug 2002 19:23:17 -0000
@@ -41,6 +41,8 @@
 (defvar rfc2047-header-encoding-alist
   '(("Newsgroups" . nil)
     ("Message-ID" . nil)
+    ("\\(Resent-\\)?\\(From\\|Cc\\|To\\|Bcc\\|Reply-To\\|Sender\\)" .
+     "-A-Za-z0-9!*+/=_")
     (t . mime))
   "*Header/encoding method alist.
 The list is traversed sequentially.  The keys can either be
@@ -52,7 +54,8 @@ The values can be:
 2) `mime', in which case the header will be encoded according to RFC2047;
 3) a charset, in which case it will be encoded as that charset;
 4) `default', in which case the field will be encoded as the rest
-   of the article.")
+   of the article.
+5) a string, like `mime', except for using it as word-chars.")
 
 (defvar rfc2047-charset-encoding-alist
   '((us-ascii . nil)
@@ -87,7 +90,8 @@ Valid encodings are nil, `Q' and `B'.")
   "Alist of RFC2047 encodings to encoding functions.")
 
 (defvar rfc2047-q-encoding-alist
-  '(("\\(From\\|Cc\\|To\\|Bcc\||Reply-To\\):" . "-A-Za-z0-9!*+/")
+  '(("\\(Resent-\\)?\\(From\\|Cc\\|To\\|Bcc\\|Reply-To\\|Sender\\):" 
+     . "-A-Za-z0-9!*+/" )
     ;; = (\075), _ (\137), ? (\077) are used in the encoded word.
     ;; Avoid using 8bit characters.
     ;; Equivalent to "^\000-\007\011\013\015-\037\200-\377=_?"
@@ -142,6 +146,8 @@ Should be called narrowed to the head of
 		(setq alist nil
 		      method (cdr elem))))
 	    (cond
+	     ((stringp method)
+	      (rfc2047-encode-region (point-min) (point-max) method))
 	     ((eq method 'mime)
 	      (rfc2047-encode-region (point-min) (point-max)))
 	     ((eq method 'default)
@@ -179,11 +185,12 @@ The buffer may be narrowed."
 	(setq found t)))
     found))
 
-(defun rfc2047-dissect-region (b e)
+(defun rfc2047-dissect-region (b e &optional word-chars)
   "Dissect the region between B and E into words."
-  (let ((word-chars "-A-Za-z0-9!*+/")
-	;; Not using ietf-drums-specials-token makes life simple.
-	mail-parse-mule-charset
+  (unless word-chars
+    ;; Anything except most CTLs, WSP
+    (setq word-chars "\010\012\014\041-\177"))
+  (let (mail-parse-mule-charset
 	words point current
 	result word)
     (save-restriction
@@ -233,9 +240,9 @@ The buffer may be narrowed."
 	(setq word (pop words))))
     result))
 
-(defun rfc2047-encode-region (b e)
-  "Encode all encodable words in region B to E."
-  (let ((words (rfc2047-dissect-region b e)) word)
+(defun rfc2047-encode-region (b e &optional word-chars)
+  "Encode all encodable words in REGION."
+  (let ((words (rfc2047-dissect-region b e word-chars)) word)
     (save-restriction
       (narrow-to-region b e)
       (delete-region (point-min) (point-max))
@@ -255,11 +262,11 @@ The buffer may be narrowed."
 			  (cdr word))))
       (rfc2047-fold-region (point-min) (point-max)))))
 
-(defun rfc2047-encode-string (string)
+(defun rfc2047-encode-string (string &optional word-chars)
   "Encode words in STRING."
   (with-temp-buffer
     (insert string)
-    (rfc2047-encode-region (point-min) (point-max))
+    (rfc2047-encode-region (point-min) (point-max) word-chars)
     (buffer-string)))
 
 (defun rfc2047-encode (b e charset)


* Re: bad rfc2047 encoding
  2002-08-20 17:22     ` Simon Josefsson
@ 2002-08-21 16:54       ` Dave Love
  2002-08-21 17:06         ` Simon Josefsson
       [not found]       ` <hvofbxtmpd.fsf@rasputin.ws.nextra.no>
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Love @ 2002-08-21 16:54 UTC
  Cc: bugs, bug-gnu-emacs

Simon Josefsson <jas@extundo.com> writes:

> > Does that mean that Gnus 5.9 isn't being maintained?
> 
> That wasn't what I meant.  I don't know the answer.

Well, it looks that way.

> Is that necessary?

I don't know for sure, which is why I wanted someone to address it
who's more familiar with the standards than I am.

> Encoded words are allowed inside comments,

Sure.

> they must simply not contain the character ).  Which the patch fixes.

It didn't seem to, but I think I fell for a defvar not getting
replaced, since the horrible mess of require'ments means you can't
unload the rfc2047 feature.  [rfc2047.el really shouldn't depend on
`message-posting-charset', for instance, even if people are set
against providing a general MIME library.]

In a fresh Emacs the patched version does do the trick, thanks.
However, clearly the result for the typical form of address used in
the microshaft world is bogus:

(with-temp-buffer
  (insert "To: \"Großjohann, K (Kai)\" <Kai.Grossjohann@CS.Uni-Dortmund.DE>
")
  (rfc2047-encode-message-header)
  (buffer-string))
  => "To: \"=?iso-8859-1?q?Gro=DFjohann,_K_(Kai)\"_<Kai.Grossjohann@CS.Uni-Dort?=
 =?iso-8859-1?q?mund.DE>?=
"

Sorry, I don't have time to grovel the RFCs.  Is one simply not
allowed to use such addresses?  If so, Gnus should warn rather than
completely mangling it.

> I agree.  (I'm reading the gnus bugs list from quimby.gnus.org, which
> removes To/Cc so when I reply it only goes to the author and
> bugs@gnus.org.)

That seems unfortunate.  People reporting bugs will potentially miss
discussion.


* Re: bad rfc2047 encoding
  2002-08-21 16:54       ` Dave Love
@ 2002-08-21 17:06         ` Simon Josefsson
  2002-08-21 17:40           ` Reiner Steib
  2002-08-22 11:47           ` Dave Love
  0 siblings, 2 replies; 14+ messages in thread
From: Simon Josefsson @ 2002-08-21 17:06 UTC
  Cc: bugs, bug-gnu-emacs

Dave Love <d.love@dl.ac.uk> writes:

> However, clearly the result for the typical form of address used in
> the microshaft world is bogus:
>
> (with-temp-buffer
>   (insert "To: \"Großjohann, K (Kai)\" <Kai.Grossjohann@CS.Uni-Dortmund.DE>
> ")
>   (rfc2047-encode-message-header)
>   (buffer-string))
>   => "To: \"=?iso-8859-1?q?Gro=DFjohann,_K_(Kai)\"_<Kai.Grossjohann@CS.Uni-Dort?=
>  =?iso-8859-1?q?mund.DE>?=
> "
>
> Sorry, I don't have time to grovel the RFCs.  Is one simply not
> allowed to use such addresses?  If so, Gnus should warn rather than
> completely mangling it.

I can't reproduce this, I get the output below which seems fine.

"To: \"=?iso-8859-1?q?Gro=DFjohann?=, K (Kai)\"
  <Kai.Grossjohann@CS.Uni-Dortmund.DE>
 "

This is with EMACS_21_1_RC.  The output you get seems invalid to me.

>> I agree.  (I'm reading the gnus bugs list from quimby.gnus.org, which
>> removes To/Cc so when I reply it only goes to the author and
>> bugs@gnus.org.)
>
> That seems unfortunate.  People reporting bugs will potentially miss
> discussion.

It appears I was wrong, quimby.gnus.org does not alter headers, at
least the last few messages CC bug-gnu-emacs properly, but the
original submission did not.


* Re: bad rfc2047 encoding
       [not found]       ` <hvofbxtmpd.fsf@rasputin.ws.nextra.no>
@ 2002-08-21 17:14         ` Simon Josefsson
  2002-08-22 12:20           ` Dave Love
       [not found]           ` <rzq4rdncdta.fsf@albion.dl.ac.uk>
  0 siblings, 2 replies; 14+ messages in thread
From: Simon Josefsson @ 2002-08-21 17:14 UTC
  Cc: Dave Love, bugs, bug-gnu-emacs

Bjørn Mork <bmork@dod.no> writes:

> And
>
> (with-temp-buffer
>   (insert "To: <Kai.Großjohann@CS.Uni-Dortmund.DE>")
>   (rfc2047-encode-message-header)
>   (buffer-string))
>
> evaluates to
>
> "To: <Kai.=?iso-8859-1?q?Gro=DFjohann?=@CS.Uni-Dortmund.DE>"
>
> OK, it's an illegal local part, but still... such local parts _do_
> exist.  Gnus is handling this in an ugly and non-compliant way IMHO.
> RFC2047 says: "An 'encoded-word' MUST NOT appear in any portion of an
> 'addr-spec'." It does not say that an illegal local part cancels this
> requirement. 
>
> I agree with Dave that the headers should be parsed to ensure that
> only comments, text and words within phrases are encoded. Other parts
> of the headers should never be encoded no matter which characters they
> contain. Gnus should instead probably warn the user about the illegal
> header content.
>
> Another example where Gnus fails:
>
> (with-temp-buffer
>   (insert "From: \"Bjørn Mork\" <bmork@dod.no>")
>   (rfc2047-encode-message-header)
>   (buffer-string))
> "From: \"=?iso-8859-1?q?Bj=F8rn?= Mork\" <bmork@dod.no>"
>
> RFC2047: 
> "An 'encoded-word' MUST NOT appear within a 'quoted-string'."

Yup, I now agree parsing the header is required.  RFC 2047 is only to
be used in some parts of the header.  I wonder if rfc822.el is up to
this though, I remember it didn't cope with non-ASCII properly at all.
Anyone want to work on it?

> In this example the quoted-string should probably be unquoted and
> then encoded. Or would that break something?

It could perhaps encode the quotes too.


* Re: bad rfc2047 encoding
  2002-08-21 17:06         ` Simon Josefsson
@ 2002-08-21 17:40           ` Reiner Steib
  2002-08-22 11:47           ` Dave Love
  1 sibling, 0 replies; 14+ messages in thread
From: Reiner Steib @ 2002-08-21 17:40 UTC
  Cc: bug-gnu-emacs

On Wed, Aug 21 2002, Simon Josefsson wrote:

> Dave Love <d.love@dl.ac.uk> writes:
[...]
>>> (I'm reading the gnus bugs list from quimby.gnus.org, which
>>> removes To/Cc so when I reply it only goes to the author and
>>> bugs@gnus.org.)
>>
>> That seems unfortunate.  People reporting bugs will potentially miss
>> discussion.
>
> It appears I was wrong, quimby.gnus.org does not alter headers, at
> least the last few messages CC bug-gnu-emacs properly, but the
> original submission did not.

Dave's first message had bug-gnu-emacs@gnu.org in the To-Header, not
in the Cc as the next ones. The posting script on quimby changes the
"To:" to "Original-To:":

| From: Dave Love <d.love@dl.ac.uk>
| Newsgroups: gnus.gnus-bug
| Cc: bugs@gnus.org
| Original-To: bug-gnu-emacs@gnu.org

The reason for this was ...

,----
| From: Lars Magne Ingebrigtsen <larsi@gnus.org>
| Subject: Re: gnus-summary-edit-article and wrapped From-line
| Newsgroups: gnus.ding
| Date: Sun, 30 Dec 2001 22:37:09 +0100
| Message-ID: <m3bsggjsoq.fsf@quimbies.gnus.org>
| Original-To: ding@gnus.org
| 
| Reiner Steib <reiner.steib@gmx.de> writes:
| 
| > BTW, the parent article <m37kr5ci6y.fsf@quimbies.gnus.org> is not
| > available on news.gnus.org:
| 
| Looking at the logs, it was rejected because I inserted a To header,
| leading to the article getting two To headers.  Would it be more
| reasonable if `C-c C-t' inserted a Cc header instead?
| 
| Anyway, I've now had the posting script rename To to Original-To.
`----

Bye, Reiner.
-- 
       ,,,
      (o o)
---ooO-(_)-Ooo--- PGP key available via WWW   http://rsteib.home.pages.de/


* Re: bad rfc2047 encoding
  2002-08-21 17:06         ` Simon Josefsson
  2002-08-21 17:40           ` Reiner Steib
@ 2002-08-22 11:47           ` Dave Love
  2002-08-22 17:48             ` Simon Josefsson
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Love @ 2002-08-22 11:47 UTC
  Cc: bugs, bug-gnu-emacs

Simon Josefsson <jas@extundo.com> writes:

> "To: \"=?iso-8859-1?q?Gro=DFjohann?=, K (Kai)\"
>   <Kai.Grossjohann@CS.Uni-Dortmund.DE>
>  "

Well, that's invalid too.

> This is with EMACS_21_1_RC.

I was using your patch (in a non-vanilla 21.2).  It looks as though
some of the insufficiently-tested changes I made for Emacs 22 were
responsible for the differences, sorry.  I think that just indicates
in another way that rfc2047.el doesn't DTRT.  I guess the use of Emacs
charsets saves it from going badly wrong in this case by chance, but
it's not generally right.  It's coding systems which are
(more-or-less) equivalent to MIME charsets and need to be checked, not
Emacs charsets.


* Re: bad rfc2047 encoding
  2002-08-21 17:14         ` Simon Josefsson
@ 2002-08-22 12:20           ` Dave Love
       [not found]           ` <rzq4rdncdta.fsf@albion.dl.ac.uk>
  1 sibling, 0 replies; 14+ messages in thread
From: Dave Love @ 2002-08-22 12:20 UTC
  Cc: Bjørn Mork, bugs, bug-gnu-emacs

[Re-sent due to encoding problems :-/.]

Simon Josefsson <jas@extundo.com> writes:

> Bjørn Mork <bmork@dod.no> writes:

[In something that hasn't reached here...]

> Yup, I now agree parsing the header is required.  RFC 2047 is only to
> be used in some parts of the header.  I wonder if rfc822.el is up to
> this though,

I don't think it does the right job.

> I remember it didn't cope with non-ASCII properly at all.

In what way?  I went through checking for such problems long ago.  I
doubtless missed some, and maybe some have been added since, but they
should be easy to fix.  (Emacs 21 only -- I don't know how you'd do it
properly in XEmacs.)

> > In this example the quoted-string should probably be unquoted and
> > then encoded. Or would that break something?
> 
> It could perhaps encode the quotes too.

The spirit of the RFC and examples seem to suggest unquoting the
string, encoding it, and (perhaps) re-quoting encoded word sequences
on decoding.  Is the correct behaviour really not explicitly specified
somewhere?


* Re: bad rfc2047 encoding
       [not found]           ` <rzq4rdncdta.fsf@albion.dl.ac.uk>
@ 2002-08-22 13:50             ` Bjørn Mork
  2002-08-30 17:59               ` Dave Love
  2002-08-22 17:55             ` Simon Josefsson
  1 sibling, 1 reply; 14+ messages in thread
From: Bjørn Mork @ 2002-08-22 13:50 UTC
  Cc: Simon Josefsson, bugs, bug-gnu-emacs

Dave Love <d.love@dl.ac.uk> writes:
> Simon Josefsson <jas@extundo.com> writes:
>
>> Bjørn Mork <bmork@dod.no> writes:
>
> [In something that hasn't reached here...]

Sorry about that. I am also reading bugs@gnus.org via the nntp gateway
and forgot to add a Cc-header. BTW, your Cc-header did not look good:

Cc: =?iso-8859-1?q?Bj=F8rn_Mork_<bmork@dod.no>, __bugs@gnus.org,
   __bug-gn?=.=?iso-8859-1?q?u-emacs@gnu.org?=

I guess this is the same problem you demonstrated in a previous
example. Never seen Gnus behave as badly as this before, though...

>> > In this example the quoted-string should probably be unquoted and
>> > then encoded. Or would that break something?
>> 
>> It could perhaps encode the quotes too.
>
> The spirit of the RFC and examples seem to suggest unquoting the
> string, encoding it, and (perhaps) re-quoting encoded word sequences
> on decoding.  Is the correct behaviour really not explicitly specified
> somewhere?

I am no expert on this, but trying to read RFC2822 and RFC2047 makes
me think that Simon is right. A 'quoted-string' is a 'word', and may
as such be replaced by an 'encoded-word'. The DQUOTE is part of the
'quoted-string' and should therefore be encoded with it.

So the proper way to encode

  From: "Bjørn Mork" <bmork@dod.no>

would be

  From: =?iso-8859-1?q?=22Bj=F8rn_Mork=22?= <bmork@dod.no>

I guess. An example of proper handling of a 'quoted-string' in RFC2047
would have been nice. But as far as I can see, this is the only way
you can do compliant encoding without losing any information.


Bjørn


* Re: bad rfc2047 encoding
  2002-08-22 11:47           ` Dave Love
@ 2002-08-22 17:48             ` Simon Josefsson
  2002-08-30 18:08               ` Dave Love
  0 siblings, 1 reply; 14+ messages in thread
From: Simon Josefsson @ 2002-08-22 17:48 UTC
  Cc: bugs, bug-gnu-emacs

Dave Love <d.love@dl.ac.uk> writes:

> Simon Josefsson <jas@extundo.com> writes:
>
>> "To: \"=?iso-8859-1?q?Gro=DFjohann?=, K (Kai)\"
>>   <Kai.Grossjohann@CS.Uni-Dortmund.DE>
>>  "
>
> Well, that's invalid too.

Ouch, bad.

>> This is with EMACS_21_1_RC.
>
> I was using your patch (in a non-vanilla 21.2).  It looks as though
> some of the insufficiently-tested changes I made for Emacs 22 were
> responsible for the differences, sorry.  I think that just indicates
> in another way that rfc2047.el doesn't DTRT.  I guess the use of Emacs
> charsets saves it from going badly wrong in this case by chance, but
> it's not generally right.  It's coding systems which are
> (more-or-less) equivalent to MIME charsets and need to be checked, not
> Emacs charsets.

Yes.  It would be nice to have it do the right thing, but it seems to
require at least a fairly working rfc 2822 parser/decoder.  Lots of
work, I think.  Anyone want to work on it?


* Re: bad rfc2047 encoding
       [not found]           ` <rzq4rdncdta.fsf@albion.dl.ac.uk>
  2002-08-22 13:50             ` Bjørn Mork
@ 2002-08-22 17:55             ` Simon Josefsson
  1 sibling, 0 replies; 14+ messages in thread
From: Simon Josefsson @ 2002-08-22 17:55 UTC
  Cc: Bjørn Mork, bugs, bug-gnu-emacs

Dave Love <d.love@dl.ac.uk> writes:

>> I remember it didn't cope with non-ASCII properly at all.
>
> In what way?  I went through checking for such problems long ago.  I
> doubtless missed some, and maybe some have been added since, but they
> should be easy to fix.  (Emacs 21 only -- I don't know how you'd do it
> properly in XEmacs.)

I don't have a failing test case, but I remember some discussion of
this recently.

>> > In this example the quoted-string should probably be unquoted and
>> > then encoded. Or would that break something?
>> 
>> It could perhaps encode the quotes too.
>
> The spirit of the RFC and examples seem to suggest unquoting the
> string, encoding it, and (perhaps) re-quoting encoded word sequences
> on decoding.  Is the correct behaviour really not explicitly specified
> somewhere?

I dunno.  Ask on some IETF list?


* Re: bad rfc2047 encoding
  2002-08-22 13:50             ` Bjørn Mork
@ 2002-08-30 17:59               ` Dave Love
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Love @ 2002-08-30 17:59 UTC
  Cc: Simon Josefsson, bugs, bug-gnu-emacs

Bjørn Mork <bmork@dod.no> writes:

> I am no expert on this, but trying to read RFC2822 and RFC2047 makes
> me think that Simon is right. A 'quoted-string' is a 'word', and may
> as such be replaced by an 'encoded-word'.

Sure.

> The DQUOTE is part of the
> 'quoted-string' and should therefore be encoded with it.

Why should they be there at all?

> So the proper way to encode
> 
>   From: "Bjørn Mork" <bmork@dod.no>
> 
> would be
> 
>   From: =?iso-8859-1?q?=22Bj=F8rn_Mork=22?= <bmork@dod.no>

I'm not sure that's the right way to look at it, which is probably why
it isn't treated explicitly in the RFC.  As far as I remember, the RFC
implies that your name should be encoded as
`=?iso-8859-1?q?Bj=F8rn_Mork?='.  Why should quotes be introduced at
all?  RFC2822 only deals with ASCII, and defines what ASCII characters
need quoting -- it's silent on non-ASCII isn't it?

> I guess. An example of proper handling of a 'quoted-string' in RFC2047
> would have been nice. But as far as I can see, this is the only way
> you can do compliant encoding without losing any information.

As I understand it, the quotes aren't information content, just
syntax.  I'd think they should be stripped when the word is non-ASCII
and the contents should then be rfc2047-encoded.  Do any
specifications forbid that?

[Are there any more recent RFCs that are relevant, by any chance?]


* Re: bad rfc2047 encoding
  2002-08-22 17:48             ` Simon Josefsson
@ 2002-08-30 18:08               ` Dave Love
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Love @ 2002-08-30 18:08 UTC
  Cc: bugs, bug-gnu-emacs

Simon Josefsson <jas@extundo.com> writes:

> Yes.  It would be nice to have it do the right thing, but it seems to
> require at least a fairly working rfc 2822 parser/decoder.  Lots of
> work, I think.

I don't think it should be so much work, at least for the fields
relevant here.  I guess I'll have to do it if it means the difference
between mail getting delivered or not, but I've got other things to
do...

The thing is that there are already three or four different parsing
packages, including rfc822, ietf-drums (which should presumably be
rfc2822), mail-extr, mail-utils, and presumably things like supercite,
but none of them actually have an interface for tokenizing addresses
and it's not clear how correct they all are.  Someone should sort this
out...
