unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#13328: 24.2; Rmail does not properly decode MIME messages containing "From " lines or save such attachments correctly
@ 2013-01-01 21:34 Mark Lillibridge
  2020-12-06 19:30 ` Lars Ingebrigtsen
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Lillibridge @ 2013-01-01 21:34 UTC (permalink / raw)
  To: 13328


[|'s added to quote From lines]

    Run Rmail (emacs -q) on the following mbox file (remove |'s at line
start first):

|From mdl  Wed Oct 26 10:18:58 2011
|X-RMAIL-ATTRIBUTES: --EF----
|X-RMAIL-KEYWORDS: *
|X-Spam-Status: No, score=-4.7 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED,
|	RP_MATCHES_RCVD autolearn=unavailable version=3.3.1
|From: Mark Lillibridge <mark.lillibridge@hp.com>
|To: <bug-gnu-emacs@gnu.org>
|Subject: 23.3; rmail-edit-current-message adds extra '>'s to Froms...
|Date: Sat, 22 Oct 2011 12:05:57 -0700
|Reply-To: <mark.lillibridge@hp.com>
|
|
|Start with a Rmail message (this one will do) not containing MIME that
|contains lines like the following:
|
|>From level 1
|>>From level 2
|>>>From level 3
|
|From mdl  Tue Jan  1 12:27:17 2013
|Delivered-To: lillibridge@gmail.com
|Received: from gmail-pop.l.google.com [74.125.25.108]
|	by foil.strangled.net with POP3 (fetchmail-6.3.18)
|	for <mdl@localhost> (single-drop); Tue, 01 Jan 2013 12:27:17 -0800 (PST)
|Received: by 10.49.49.9 with SMTP id q9csp680843qen;
|        Tue, 1 Jan 2013 12:27:08 -0800 (PST)
|X-Received: by 10.52.69.201 with SMTP id g9mr59443696vdu.98.1357072028236;
|        Tue, 01 Jan 2013 12:27:08 -0800 (PST)
|Return-Path: <mark.lillibridge@hp.com>
|Received: from alum-mailsec-relay-12.mit.edu (ALUM-MAILSEC-RELAY-12.MIT.EDU. [18.7.68.32])
|        by mx.google.com with ESMTPS id o8si46143160vdv.77.2013.01.01.12.27.08
|        (version=TLSv1/SSLv3 cipher=OTHER);
|        Tue, 01 Jan 2013 12:27:08 -0800 (PST)
|Received: from alum-mailsec-scanner-7.mit.edu (ALUM-MAILSEC-SCANNER-7.MIT.EDU [18.7.68.19])
|	by alum-mailsec-relay-12.mit.edu (8.13.8/8.12.8) with ESMTP id r01KR7kv027035
|	for <mdl@alum.mit.edu>; Tue, 1 Jan 2013 15:27:07 -0500
|Received: from madara.hpl.hp.com (madara.hpl.hp.com [192.6.19.124])
|	by alum-mailsec-scanner-7.mit.edu (Symantec Messaging Gateway) with SMTP id A5.53.02235.B9643E05; Tue,  1 Jan 2013 15:27:07 -0500 (EST)
|Received: from postal.hpl.hp.com (postal.hpl.hp.com [15.25.115.25])
|	by madara.hpl.hp.com (8.14.4/8.14.4/HPL-PA Relay) with ESMTP id r01KR5Ae021394
|	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT)
|	for <mdl@alum.mit.edu>; Tue, 1 Jan 2013 12:27:06 -0800
|Received: from build-debian-1.u.hpl.hp.com (build-debian-1.u.hpl.hp.com [15.8.240.3])
|	by postal.hpl.hp.com (8.14.3/8.14.3/HPL-PA Hub) with ESMTP id r01KR4Df022085;
|	Tue, 1 Jan 2013 12:27:04 -0800
|From: Mark Lillibridge <mark.lillibridge@hp.com>
|To: mdl@alum.mit.edu
|Subject: test message containing From's in an attachment
|Reply-to: mark.lillibridge@hp.com
|Date: Tue, 01 Jan 2013 12:27:04 -0800
|Message-ID: <qmhip7gmyxz.fsf@build-debian-1.u.hpl.hp.com>
|MIME-Version: 1.0
|Content-Type: text/plain
|Content-Disposition: attachment; filename=froms.txt
|X-RMAIL-ATTRIBUTES: ------U-
|
|>From level 1
|>>From level 2
|>>>From level 3
|
|From mdl  Tue Jan  1 13:04:30 2013
|Delivered-To: lillibridge@gmail.com
|Received: from gmail-pop.l.google.com [74.125.25.108]
|	by foil.strangled.net with POP3 (fetchmail-6.3.18)
|	for <mdl@localhost> (single-drop); Tue, 01 Jan 2013 13:04:30 -0800 (PST)
|Received: by 10.49.49.9 with SMTP id q9csp681694qen;
|        Tue, 1 Jan 2013 13:04:13 -0800 (PST)
|X-Received: by 10.52.75.100 with SMTP id b4mr59369212vdw.52.1357074253041;
|        Tue, 01 Jan 2013 13:04:13 -0800 (PST)
|Return-Path: <mark.lillibridge@hp.com>
|Received: from alum-mailsec-relay-12.mit.edu (ALUM-MAILSEC-RELAY-12.MIT.EDU. [18.7.68.32])
|        by mx.google.com with ESMTPS id u1si24918746vdh.137.2013.01.01.13.04.12
|        (version=TLSv1/SSLv3 cipher=OTHER);
|        Tue, 01 Jan 2013 13:04:13 -0800 (PST)
|Received: from alum-mailsec-scanner-3.mit.edu (ALUM-MAILSEC-SCANNER-3.MIT.EDU [18.7.68.14])
|	by alum-mailsec-relay-12.mit.edu (8.13.8/8.12.8) with ESMTP id r01L47Pu030367
|	for <mdl@alum.mit.edu>; Tue, 1 Jan 2013 16:04:12 -0500
|Authentication-Results: symauth.service.identifier
|Received: from madara.hpl.hp.com (madara.hpl.hp.com [192.6.19.124])
|	by alum-mailsec-scanner-3.mit.edu (Symantec Messaging Gateway) with SMTP id E2.ED.02222.B4F43E05; Tue,  1 Jan 2013 16:04:12 -0500 (EST)
|Received: from masterns.hpl.hp.com (masterns.hpl.hp.com [15.0.48.4])
|	by madara.hpl.hp.com (8.14.4/8.14.4/HPL-PA Relay) with ESMTP id r01L4Auq022776
|	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT)
|	for <mdl@alum.mit.edu>; Tue, 1 Jan 2013 13:04:10 -0800
|Received: from build-debian-1.u.hpl.hp.com (build-debian-1.u.hpl.hp.com [15.8.240.3])
|	by masterns.hpl.hp.com (8.14.3/8.14.3/HPL-PA Hub) with ESMTP id r01L47j3018514;
|	Tue, 1 Jan 2013 13:04:08 -0800
|From: Mark Lillibridge <mark.lillibridge@hp.com>
|To: mdl@alum.mit.edu
|Subject: test message containing From's in an attachment
|Reply-to: mark.lillibridge@hp.com
|Date: Tue, 01 Jan 2013 13:04:07 -0800
|Message-ID: <qmhhan0mx88.fsf@build-debian-1.u.hpl.hp.com>
|MIME-Version: 1.0
|Content-Type: multipart/mixed; boundary="=-=-="
|X-RMAIL-ATTRIBUTES: --------
|
|--=-=-=
|Content-Type: text/plain
|
|
|body part
|
|
|--=-=-=
|Content-Type: text/plain
|Content-Disposition: attachment; filename=froms.txt
|
|>From level 1
|>>From level 2
|>>>From level 3
|
|--=-=-=--
|

    The result should be 3 messages, the first message (non-MIME) of
which looks like:

|X-Spam-Status: No, score=-4.7 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED,
|	RP_MATCHES_RCVD autolearn=unavailable version=3.3.1
|From: Mark Lillibridge <mark.lillibridge@hp.com>
|To: <bug-gnu-emacs@gnu.org>
|Subject: 23.3; rmail-edit-current-message adds extra '>'s to Froms...
|Date: Sat, 22 Oct 2011 12:05:57 -0700
|Reply-To: <mark.lillibridge@hp.com>
|
|
|Start with a Rmail message (this one will do) not containing MIME that
|contains lines like the following:
|
|From level 1
|>From level 2
|>>From level 3
|

This is properly decoded assuming mboxrd except for the extra blank line
at the end (bug #10080).  The second message, however, is incorrectly
decoded (toggle Show/Hide to see the froms.txt contents):

|From: Mark Lillibridge <mark.lillibridge@hp.com>
|To: mdl@alum.mit.edu
|Subject: test message containing From's in an attachment
|Reply-to: mark.lillibridge@hp.com
|Date: Tue, 01 Jan 2013 12:27:04 -0800
|Content-Type: text/plain
|Content-Disposition: attachment; filename=froms.txt
|
|
|[text/plain Hide Save:froms.txt (46B)]
|
|>From level 1
|>>From level 2
|>>>From level 3
|

The From lines should look identical to the first message.  The third
message demonstrates this problem with a non-in-line attachment:

|Authentication-Results: symauth.service.identifier
|From: Mark Lillibridge <mark.lillibridge@hp.com>
|To: mdl@alum.mit.edu
|Subject: test message containing From's in an attachment
|Reply-to: mark.lillibridge@hp.com
|Date: Tue, 01 Jan 2013 13:04:07 -0800
|Content-Type: multipart/mixed; boundary="=-=-="
|
|
|[1:text/plain Hide]
|
|
|body part
|
|
|[2:text/plain Hide Save:froms.txt (45B)]
|
|>From level 1
|>>From level 2
|>>>From level 3


    If I understand the resolution of bug 6574 correctly, it added a
variable for switching between mboxrd and mboxo (a broken but common
variant).  For reference, with mboxo all three sets of From lines should
read:

|From level 1
|>>From level 2
|>>>From level 3
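
The difference between the two expected results can be sketched in
Python.  This is an illustration only, not Rmail's code: the function
names are hypothetical, and the mboxo rule shown (strip a single ">"
only from lines starting with exactly ">From ") is an assumption chosen
to match the expected output listed above.

```python
import re

def unquote_mboxrd(line):
    # mboxrd: strip one ">" from any line of the form ">...>From ".
    return line[1:] if re.match(r">+From ", line) else line

def unquote_mboxo(line):
    # mboxo (assumed rule): only a single ">" directly before "From "
    # is treated as quoting; deeper levels are left untouched.
    return line[1:] if line.startswith(">From ") else line

stored = [">From level 1", ">>From level 2", ">>>From level 3"]
mboxrd_view = [unquote_mboxrd(line) for line in stored]
mboxo_view = [unquote_mboxo(line) for line in stored]
print(mboxrd_view)
print(mboxo_view)
```

Under mboxrd every level loses one ">"; under mboxo only the first line
changes.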


    Saving one of the attachments also gives incorrect results:

froms.txt as saved:
|>From level 1
|>>From level 2
|>>>From level 3

- Mark






* bug#13328: 24.2; Rmail does not properly decode MIME messages containing "From " lines or save such attachments correctly
  2013-01-01 21:34 bug#13328: 24.2; Rmail does not properly decode MIME messages containing "From " lines or save such attachments correctly Mark Lillibridge
@ 2020-12-06 19:30 ` Lars Ingebrigtsen
  2020-12-06 19:54   ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Lars Ingebrigtsen @ 2020-12-06 19:30 UTC (permalink / raw)
  To: Mark Lillibridge; +Cc: 13328

Mark Lillibridge <mdl@alum.mit.edu> writes:

> |>From level 1
> |>>From level 2
> |>>>From level 3
> |
>
> The From lines should look identical to the first message.

(This bug report unfortunately got no response at the time.)

Thanks for the easy recipe to reproduce this behaviour.

The issue is: In non-MIME messages, rmail strips one level of ">" from
all lines that start with ">+From ".  In MIME messages, it doesn't do
that.

I think the latter behaviour sounds correct, though.  It's common for
mailers to prepend a ">" to lines that start with "From " -- but usually
only when that actually matches an mbox delimiter, which "From level 1"
does not.

Furthermore, it's really uncommon for anything to prepend a ">" when the
line starts with ">From ".

So I think the bug here is that it does the stripping at all in these
cases.

Opinions?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#13328: 24.2; Rmail does not properly decode MIME messages containing "From " lines or save such attachments correctly
  2020-12-06 19:30 ` Lars Ingebrigtsen
@ 2020-12-06 19:54   ` Eli Zaretskii
  2020-12-07 14:47     ` Lars Ingebrigtsen
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2020-12-06 19:54 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: mdl, 13328

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Date: Sun, 06 Dec 2020 20:30:08 +0100
> Cc: 13328@debbugs.gnu.org
> 
> Furthermore, it's really uncommon for anything to prepend a ">" when the
> line starts with ">From ".
> 
> So I think the bug here is that it does the stripping at all in these
> cases.
> 
> Opinions?

If it doesn't strip, but keeps adding ">", these additions will
accumulate.  Therefore, I think that it can only stop stripping if it
also stops adding ">".  And that cannot be done, AFAIU, if we want to
be able to preserve the mbox format (for example, the FCC feature in
Rmail saves messages in mbox files).

Am I missing something?






* bug#13328: 24.2; Rmail does not properly decode MIME messages containing "From " lines or save such attachments correctly
  2020-12-06 19:54   ` Eli Zaretskii
@ 2020-12-07 14:47     ` Lars Ingebrigtsen
  2020-12-07 16:04       ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Lars Ingebrigtsen @ 2020-12-07 14:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mdl, 13328

Eli Zaretskii <eliz@gnu.org> writes:

>> Furthermore, it's really uncommon for anything to prepend a ">" when the
>> line starts with ">From ".
>> 
>> So I think the bug here is that it does the stripping at all in these
>> cases.
>> 
>> Opinions?
>
> If it doesn't strip, but keeps adding ">", these additions will
> accumulate.

No -- there's no accumulation.

Other mailers only add a ">" to the front of lines that match mbox
separators.  ">From" does not match any mbox separator, so there's no
accumulation.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#13328: 24.2; Rmail does not properly decode MIME messages containing "From " lines or save such attachments correctly
  2020-12-07 14:47     ` Lars Ingebrigtsen
@ 2020-12-07 16:04       ` Eli Zaretskii
  2020-12-08 13:34         ` Lars Ingebrigtsen
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2020-12-07 16:04 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: mdl, 13328

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: mdl@alum.mit.edu,  13328@debbugs.gnu.org
> Date: Mon, 07 Dec 2020 15:47:03 +0100
> 
> Other mailers only add a ">" to the front of lines that match mbox
> separators.  ">From" does not match any mbox separator, so there's no
> accumulation.

But Rmail uses "^From:?" as the regexp that identifies the beginning
of another message in an mbox file, so it must quote other "From"s to
avoid catastrophic failures later.  Or am I missing something?






* bug#13328: 24.2; Rmail does not properly decode MIME messages containing "From " lines or save such attachments correctly
  2020-12-07 16:04       ` Eli Zaretskii
@ 2020-12-08 13:34         ` Lars Ingebrigtsen
  2020-12-08 15:45           ` Eli Zaretskii
  0 siblings, 1 reply; 9+ messages in thread
From: Lars Ingebrigtsen @ 2020-12-08 13:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mdl, 13328

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Lars Ingebrigtsen <larsi@gnus.org>
>> Cc: mdl@alum.mit.edu,  13328@debbugs.gnu.org
>> Date: Mon, 07 Dec 2020 15:47:03 +0100
>> 
>> Other mailers only add a ">" to the front of lines that match mbox
>> separators.  ">From" does not match any mbox separator, so there's no
>> accumulation.
>
> But Rmail uses "^From:?" as the regexp that identifies the beginning
> of another message in an mbox file, so it must quote other "From"s to
> avoid catastrophic failures later.  Or am I missing something?

If rmail uses that as the regexp, then I guess rmail will have to
continue quoting all "From:?" lines.  But that's a pretty odd regexp to
use to split an mbox file -- first of all, "^From:" is not an mbox
separator at all, and neither is "Fromage".

The normal way to split an mbox is to use something like
`message-unix-mail-delimiter'.
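
To make the distinction concrete, here is a rough Python sketch of a
strict delimiter test.  The pattern is hypothetical and simplified (a
sender followed by a ctime-style date); it is not the actual value of
`message-unix-mail-delimiter', only an approximation of the idea.

```python
import re

# Hypothetical, simplified pattern: "From ", a sender, then a
# ctime-style date such as "Wed Oct 26 10:18:58 2011".
STRICT_FROM = re.compile(
    r"^From \S+ +"
    r"(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +"
    r"\d{1,2} \d{2}:\d{2}(:\d{2})? \d{4}$"
)

def is_strict_delimiter(line):
    """Return True if LINE looks like a full mbox separator line."""
    return STRICT_FROM.match(line) is not None

for line in ["From mdl  Wed Oct 26 10:18:58 2011",
             "From level 1",
             ">From level 1",
             "Fromage"]:
    print(repr(line), is_strict_delimiter(line))
```

With a test like this, only the first line counts as a separator, so a
body line such as "From level 1" would not need quoting at all.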

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#13328: 24.2; Rmail does not properly decode MIME messages containing "From " lines or save such attachments correctly
  2020-12-08 13:34         ` Lars Ingebrigtsen
@ 2020-12-08 15:45           ` Eli Zaretskii
  2020-12-08 16:18             ` Lars Ingebrigtsen
  0 siblings, 1 reply; 9+ messages in thread
From: Eli Zaretskii @ 2020-12-08 15:45 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: mdl, 13328

> From: Lars Ingebrigtsen <larsi@gnus.org>
> Cc: mdl@alum.mit.edu,  13328@debbugs.gnu.org
> Date: Tue, 08 Dec 2020 14:34:09 +0100
> 
> > But Rmail uses "^From:?" as the regexp that identifies the beginning
> > of another message in an mbox file, so it must quote other "From"s to
> > avoid catastrophic failures later.  Or am I missing something?
> 
> If rmail uses that as the regexp, then I guess rmail will have to
> continue quoting all "From:?" lines.  But that's a pretty odd regexp to
> use to split an mbox file -- first of all, "^From:" is not an mbox
> separator at all, and neither is "Fromage".

AFAIK, mbox format requires that every new message begins with a line
that starts with "From " (sorry, I forgot the space in my previous
message).  See, for example, this page:

  https://www.loc.gov/preservation/digital/formats/fdd/fdd000383.shtml






* bug#13328: 24.2; Rmail does not properly decode MIME messages containing "From " lines or save such attachments correctly
  2020-12-08 15:45           ` Eli Zaretskii
@ 2020-12-08 16:18             ` Lars Ingebrigtsen
  2022-10-10 18:00               ` Mark Lillibridge
  0 siblings, 1 reply; 9+ messages in thread
From: Lars Ingebrigtsen @ 2020-12-08 16:18 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mdl, 13328

Eli Zaretskii <eliz@gnu.org> writes:

> AFAIK, mbox format requires that every new message begins with a line
> that starts with "From " (sorry, I forgot the space in my previous
> message).  See, for example, this page:
>
>   https://www.loc.gov/preservation/digital/formats/fdd/fdd000383.shtml

The mbox format is more restrictive than just that.
`message-unix-mail-delimiter' is a regexp that matches these lines.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no






* bug#13328: 24.2; Rmail does not properly decode MIME messages containing "From " lines or save such attachments correctly
  2020-12-08 16:18             ` Lars Ingebrigtsen
@ 2022-10-10 18:00               ` Mark Lillibridge
  0 siblings, 0 replies; 9+ messages in thread
From: Mark Lillibridge @ 2022-10-10 18:00 UTC (permalink / raw)
  To: Lars Ingebrigtsen; +Cc: eliz, 13328

[-- Attachment #1: Type: text/plain, Size: 3234 bytes --]


>  From: Lars Ingebrigtsen <larsi@gnus.org>
>  To: Eli Zaretskii <eliz@gnu.org>
>  Cc: mdl@alum.mit.edu,  13328@debbugs.gnu.org
>  Subject: Re: bug#13328: 24.2; Rmail does not properly decode MIME messages
>   containing "From " lines or save such attachments correctly
>  Date: Tue, 08 Dec 2020 17:18:36 +0100
>  In-Reply-To: <83lfe8714d.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 08 Dec
>  	2020 17:45:22 +0200")
>  
>  Eli Zaretskii <eliz@gnu.org> writes:
>  
>  > AFAIK, mbox format requires that every new message begins with a line
>  > that starts with "From " (sorry, I forgot the space in my previous
>  > message).  See, for example, this page:
>  >
>  >   https://www.loc.gov/preservation/digital/formats/fdd/fdd000383.shtml
>  
>  The mbox format is more restrictive than just that.
>  `message-unix-mail-delimiter' is a regexp that matches these lines.

    First, I wanted to report that this bug is still present in Emacs
28.1.  Start emacs 28.1 -q, find-file the gunzipped version of the
attachment in this message (it should be the same mbox as the one quoted
in the original report), then do M-x rmail-mode.  You will see the last 2
messages with extra >'s on the Froms, which persist when you save the
attachments.


    Back to the discussion.  I think the confusion here is that there
are multiple levels of quoting.  At the outermost level there is quoting
of messages to store them in the mbox file.  This is what adds the >'s
in front of from lines.  Without this quoting, a single message
containing From lines would be decoded as multiple messages.  
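
A small Python sketch of that failure mode, assuming a deliberately
naive reader that starts a new message at every line beginning with
"From " (the helper name is hypothetical, and real readers also look at
the preceding blank line):

```python
def split_mbox(text):
    """Naively split mbox text at every line starting with "From "."""
    messages = []
    for line in text.splitlines():
        if line.startswith("From ") or not messages:
            messages.append([])
        messages[-1].append(line)
    return messages

header = "From mdl  Tue Jan  1 12:27:17 2013\nSubject: test\n\n"
unquoted = header + "From level 1\n>From level 2\n"
quoted = header + ">From level 1\n>>From level 2\n"

print(len(split_mbox(unquoted)))  # body line is mistaken for a separator
print(len(split_mbox(quoted)))    # quoting keeps the message in one piece
```

The unquoted text is read as two messages, the quoted text as one.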

    Because messages in mbox files are separated by a blank line
followed by a line starting with "From ", the traditional quoting on
Unix for mbox (e.g., fetchmail, getmail) adds a > in front of any line
matching the regexp "^>*From ".  When extracting the messages, un-quoting
must be done by removing one > from each line in the message matching
that regular expression.

    Why can there be multiple >'s?  Because we have to safely quote message
lines like ">From ".

    There are some other formats of mbox that do not do this quoting in
a data preserving manner.  They should not be relevant except when
importing mbox files (as opposed to receiving messages) from other
mailing systems.

   MIME may additionally quote parts of its messages internally.


    Note that the mbox quoting is not part of the messages and is never
supposed to be seen by the user or exposed when sending messages --
ideally, it should be removed whenever a message is read from the mbox
reader component.  Unfortunately, due to bugs, Rmail does not do this
properly.  What should happen is that the current message is unquoted and
copied to a separate view buffer, which is then displayed, possibly
after decoding various MIME or other encodings (e.g., base 64).  What I'm
guessing is happening for MIME is that the region of the mbox containing
the message body is being passed directly to the MIME decoder.  This
likely explains bug #10080 as well, since the blank line of the message
separator is not being correctly excluded.
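
The intended order of operations can be sketched in Python, using the
standard library's email parser as a stand-in for Rmail's MIME decoder.
This is only an illustration of "unquote first, decode second"; the
helper names are hypothetical and the sample is a trimmed version of
the second test message.

```python
import re
from email import message_from_string

def unquote_mboxrd(text):
    """Remove one ">" from every line matching "^>+From "."""
    return re.sub(r"^>(>*From )", r"\1", text, flags=re.MULTILINE)

def attachment_text(raw_message):
    """Unquote the stored message, then hand it to the MIME parser."""
    msg = message_from_string(unquote_mboxrd(raw_message))
    for part in msg.walk():
        if part.get_filename():
            return part.get_payload()
    return None

# Message as stored in the mbox, without its "From " separator line.
raw = (
    "From: Mark Lillibridge <mark.lillibridge@hp.com>\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain\n"
    "Content-Disposition: attachment; filename=froms.txt\n"
    "\n"
    ">From level 1\n"
    ">>From level 2\n"
    ">>>From level 3\n"
)
print(attachment_text(raw))
```

Passing the stored text straight to the parser, without the unquoting
step, would return the attachment with the extra >'s still in place,
which matches the behaviour reported here.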

   You can look at https://en.wikipedia.org/wiki/Mbox for more on mbox
formats; we are talking about mboxrd format in that article's terms.

- Mark

[-- Attachment #2: mbox_13328.gz --]
[-- Type: application/gzip, Size: 1585 bytes --]

