Re: base64 behavior is not MIME compliant

unofficial mirror of emacs-devel@gnu.org 

* Re: base64 behavior is not MIME compliant
       [not found] <t53zmt4dce3.fsf@central-air-conditioning.toybox.cambridge.ma.us>
@ 2005-07-03 20:43 ` Richard M. Stallman
  2005-07-03 21:09   ` Nic Ferrier
  2005-07-04  4:59   ` Marc Horowitz
  0 siblings, 2 replies; 11+ messages in thread
From: Richard M. Stallman @ 2005-07-03 20:43 UTC (permalink / raw)
  Cc: bugs, emacs-devel

    RFC 3548 has this to say about characters not part of the encoding
    alphabet:

	Implementations MUST reject the encoding if it contains characters
	outside the base alphabet when interpreting base encoded data, unless
	the specification referring to this document explicitly states
	otherwise.  Such specifications may, as MIME does, instead state that
	characters outside the base encoding alphabet should simply be ignored
	when interpreting data ("be liberal in what you accept").

Words such as "must" claim an authority we do not recognize in the GNU
Project.  We do not _obey_ standards--rather, we see what they have to
say, consider their recommendations, then do what seems best.

    I believe the best fix is for base64-decode-region to take an optional
    argument which specifies how liberal it should be about its input,
    defaulting to the current behavior, and for Gnus to use this argument.

To decide whether to do this, we need to know the answers to three questions:

Is there some situation in which the current behavior of
base64-decode-region causes an actual problem or confusion for users?

Is there some situation in which the current behavior provides an
advantage?

Also, how does the current development Emacs handle these things?
Your report is based on 21.4; the current sources may be different.

      Of course, Gnus
    can fix this independently by using an external base64 implementation
    which is MIME-compliant.

Theoretically it could, but that's a very undesirable thing to do, so
we won't do that.  If this calls for fixing, we should fix it in
base64-decode-region; if not, we shouldn't fix what isn't broken,
not in Gnus or anywhere else.

* Re: base64 behavior is not MIME compliant
  2005-07-03 20:43 ` base64 behavior is not MIME compliant Richard M. Stallman
@ 2005-07-03 21:09   ` Nic Ferrier
  2005-07-04  4:59   ` Marc Horowitz
  1 sibling, 0 replies; 11+ messages in thread
From: Nic Ferrier @ 2005-07-03 21:09 UTC (permalink / raw)
  Cc: Marc Horowitz, bugs, emacs-devel

"Richard M. Stallman" <rms@gnu.org> writes:

>     RFC 3548 has this to say about characters not part of the encoding
>     alphabet:
>
> 	Implementations MUST reject the encoding if it contains characters
> 	outside the base alphabet when interpreting base encoded data, unless
> 	the specification referring to this document explicitly states
> 	otherwise.  Such specifications may, as MIME does, instead state that
> 	characters outside the base encoding alphabet should simply be ignored
> 	when interpreting data ("be liberal in what you accept").
>
> Words such as "must" claim an authority we do not recognize in the GNU
> Project.  We do not _obey_ standards--rather, we see what they have to
> say, consider their recommendations, then do what seems best.
>
>     I believe the best fix is for base64-decode-region to take an optional
>     argument which specifies how liberal it should be about its input,
>     defaulting to the current behavior, and for Gnus to use this argument.
>
> To decide whether to do this, we need to know the answers to three questions:
>
> Is there some situation in which the current behavior of
> base64-decode-region causes an actual problem or confusion for
> users?

I use base64-decode-region in my own email client written in elisp.

I have *never* had a problem with it decoding a file so I couldn't
read it.


Nic

* Re: base64 behavior is not MIME compliant
  2005-07-03 20:43 ` base64 behavior is not MIME compliant Richard M. Stallman
  2005-07-03 21:09   ` Nic Ferrier
@ 2005-07-04  4:59   ` Marc Horowitz
  2005-07-05  4:35     ` Richard M. Stallman
  2005-07-05 22:52     ` Arne Jørgensen
  1 sibling, 2 replies; 11+ messages in thread
From: Marc Horowitz @ 2005-07-04  4:59 UTC (permalink / raw)
  Cc: bugs, emacs-devel

"Richard M. Stallman" <rms@gnu.org> writes:

>> Is there some situation in which the current behavior of
>> base64-decode-region causes an actual problem or confusion for users?

I never would have noticed this had it not caused me a problem.

I received a piece of email which passed through an older MTA.  This
MTA inserted a ! and a newline after every 1000 characters of a very
long line of base64-encoded data, which used to be common behavior.
When Gnus tried to display this email, it failed, because the !
characters were not recognized as valid base64 encoding.
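
The failure described above can be reproduced in a few lines. This is a
minimal sketch in Python rather than Emacs Lisp, so it illustrates the
idea and says nothing definitive about base64-decode-region itself.
`mta_wrap` is a hypothetical helper that imitates the old MTA, and
Python's `validate=True` flag stands in for a decoder that rejects
characters outside the alphabet.

```python
import base64
import binascii


def mta_wrap(line: str, width: int = 1000) -> str:
    """Hypothetical stand-in for the old MTA: break a long line every
    `width` characters, marking each break with '!' plus a newline."""
    chunks = [line[i:i + width] for i in range(0, len(line), width)]
    return "!\n".join(chunks)


def strict_decode(text: str) -> bytes:
    # validate=True makes Python reject any character outside the alphabet.
    return base64.b64decode(text, validate=True)


payload = bytes(range(256)) * 12                 # 3072 bytes of sample data
encoded = base64.b64encode(payload).decode("ascii")
damaged = mta_wrap(encoded)                      # one very long line, broken up

try:
    strict_decode(damaged)
    strict_ok = True
except binascii.Error:
    strict_ok = False                            # the '!' characters are rejected

# Python's default mode discards non-alphabet characters before decoding.
lenient = base64.b64decode(damaged)
```

With the strict decoder the message is lost; with the lenient one the
original payload comes back unchanged.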

>> Is there some situation in which the current behavior provides an
>> advantage?

The only case I can think of is if a program or the user tries to
base64 decode something which is not base64 encoded, they will receive
an error, instead of some other, possibly confusing behavior.
However, I believe this case is less common than non-transparent MTAs
making small changes to base64-encoded data.

>> Also, how does the current development Emacs handle these things?
>> Your report is based on 21.4; the current sources may be different.

I do not know.

                Marc

* Re: base64 behavior is not MIME compliant
  2005-07-04  4:59   ` Marc Horowitz
@ 2005-07-05  4:35     ` Richard M. Stallman
  2005-07-05 21:35       ` Marc Horowitz
  2005-07-05 22:52     ` Arne Jørgensen
  1 sibling, 1 reply; 11+ messages in thread
From: Richard M. Stallman @ 2005-07-05  4:35 UTC (permalink / raw)
  Cc: bugs, emacs-devel

    I received a piece of email which passed through an older MTA.  This
    MTA inserted a ! and a newline after every 1000 characters of a very
    long line of base64-encoded data, which used to be common behavior.
    When Gnus tried to display this email, it failed, because the !
    characters were not recognized as valid base64 encoding.

Maybe I misunderstood what you were asking for.  I thought you
were asking us to make base64-decode-region signal errors in
additional cases, where currently it does not.  But now it looks
like you are asking for it to accept input that now gives
an error.

Could you give a self-contained description of the change that you are
requesting in the behavior of this function?

* Re: base64 behavior is not MIME compliant
  2005-07-05  4:35     ` Richard M. Stallman
@ 2005-07-05 21:35       ` Marc Horowitz
  2005-07-05 22:10         ` Nic Ferrier
  0 siblings, 1 reply; 11+ messages in thread
From: Marc Horowitz @ 2005-07-05 21:35 UTC (permalink / raw)
  Cc: bugs, emacs-devel

"Richard M. Stallman" <rms@gnu.org> writes:

>>     I received a piece of email which passed through an older MTA.  This
>>     MTA inserted a ! and a newline after every 1000 characters of a very
>>     long line of base64-encoded data, which used to be common behavior.
>>     When Gnus tried to display this email, it failed, because the !
>>     characters were not recognized as valid base64 encoding.
>
>> Maybe I misunderstood what you were asking for.  I thought you
>> were asking us to make base64-decode-region signal errors in
>> additional cases, where currently it does not.  But now it looks
>> like you are asking for it to accept input that now gives
>> an error.

I'm sorry I was confusing.  To quote my earlier email:

    I believe the best fix is for base64-decode-region to take an optional
    argument which specifies how liberal it should be about its input,
    defaulting to the current behavior, and for Gnus to use this argument.

Defaulting to the current behavior should certainly not mean
signalling errors in new cases.  You are correct that I'm asking for a
behavior variant which would result in more input being accepted.  If
this variant became the default, I would not object, but I would
certainly understand if you did not want to change the default
behavior.

>> Could you give a self-contained description of the change that you are
>> requesting in the behavior of this function?

For the purposes of reading mail, it is valuable to ignore all
characters not part of the base64 character set when decoding.  So, my
minimum proposal would be for base64-decode-region to ignore all
unknown characters, instead of signalling errors in this case.

It would be more generally useful to provide three forms of the
base64-decode-region function, either by having three functions, or
one with an optional argument:

    Form 1: all characters not part of the base64 character set would
    be ignored.

    Form 2: any character not part of the base64 character set would
    cause an error to be signalled.

    Form 3: any character not part of the union of the base64
    character set and the whitespace characters would cause an error
    to be signalled.

Form 3 is the current observed behavior.  I believe there is a need
for Form 1, to make mail reading work more smoothly.  Form 2 mainly
exists for completeness.
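
The three forms can be sketched as one function with a mode argument.
This is an illustration in Python, not a proposed patch to Emacs: the
name `decode_base64`, the integer `form` argument, and the exact
whitespace set are all assumptions made for the example.

```python
import base64
import string

# Base64 alphabet plus the '=' padding character.
ALPHABET = set(string.ascii_letters + string.digits + "+/=")
# Assumed whitespace set; the set Emacs actually tolerates may differ.
WHITESPACE = set(" \t\r\n\f")


def decode_base64(text: str, form: int) -> bytes:
    """Decode `text` according to one of the three forms in the message."""
    if form == 1:
        # Form 1: silently drop everything outside the alphabet.
        kept = "".join(c for c in text if c in ALPHABET)
    elif form == 2:
        # Form 2: nothing is dropped, so any stray character is an error.
        kept = text
    elif form == 3:
        # Form 3: drop whitespace only; other stray characters are errors.
        kept = "".join(c for c in text if c not in WHITESPACE)
    else:
        raise ValueError("form must be 1, 2 or 3")
    # validate=True raises binascii.Error on any remaining stray character.
    return base64.b64decode(kept, validate=True)
```

For example, `decode_base64("aGVs!\nbG8=", 1)` yields `b"hello"`, while
the same input raises an error under Form 2 or Form 3.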

                Marc

* Re: base64 behavior is not MIME compliant
  2005-07-05 21:35       ` Marc Horowitz
@ 2005-07-05 22:10         ` Nic Ferrier
  2005-07-05 23:55           ` Marc Horowitz
  2005-07-06  1:15           ` Ken Raeburn
  0 siblings, 2 replies; 11+ messages in thread
From: Nic Ferrier @ 2005-07-05 22:10 UTC (permalink / raw)
  Cc: emacs-devel, rms, bugs

Marc Horowitz <marc@mit.edu> writes:

> "Richard M. Stallman" <rms@gnu.org> writes:
>
>>>     I received a piece of email which passed through an older MTA.  This
>>>     MTA inserted a ! and a newline after every 1000 characters of a very
>>>     long line of base64-encoded data, which used to be common behavior.
>>>     When Gnus tried to display this email, it failed, because the !
>>>     characters were not recognized as valid base64 encoding.
>>
>>> Maybe I misunderstood what you were asking for.  I thought you
>>> were asking us to make base64-decode-region signal errors in
>>> additional cases, where currently it does not.  But now it looks
>>> like you are asking for it to accept input that now gives
>>> an error.
>
> I'm sorry I was confusing.  To quote my earlier email:
>
>     I believe the best fix is for base64-decode-region to take an optional
>     argument which specifies how liberal it should be about its input,
>     defaulting to the current behavior, and for Gnus to use this argument.
>
> Defaulting to the current behavior should certainly not mean
> signalling errors in new cases.  You are correct that I'm asking for a
> behavior variant which would result in more input being accepted.  If
> this variant became the default, I would not object, but I would
> certainly understand if you did not want to change the default
> behavior.
>
>>> Could you give a self-contained description of the change that you are
>>> requesting in the behavior of this function?
>
> For the purposes of reading mail, it is valuable to ignore all
> characters not part of the base64 character set when decoding.  So, my
> minimum proposal would be for base64-decode-region to ignore all
> unknown characters, instead of signalling errors in this case.
>
> It would be more generally useful to provide three forms of the
> base64-decode-region function, either by having three functions, or
> one with an optional argument:
>
>     Form 1: all characters not part of the base64 character set would
>     be ignored.
>
>     Form 2: any character not part of the base64 character set would
>     cause an error to be signalled.
>
>     Form 3: any character not part of the union of the base64
>     character set and the whitespace characters would cause an error
>     to be signalled.
>
> Form 3 is the current observed behavior.  I believe there is a need
> for Form 1, to make mail reading work more smoothly.  Form 2 mainly
> exists for completeness.

Why can't you just pre-parse the data passed to the base64 decoder? I
believe that's the correct behaviour. A base64 decoder should decode
base64, not "base64 but also it does this extra trick if you wave your
hand in the air"


Nic

* Re: base64 behavior is not MIME compliant
  2005-07-04  4:59   ` Marc Horowitz
  2005-07-05  4:35     ` Richard M. Stallman
@ 2005-07-05 22:52     ` Arne Jørgensen
  1 sibling, 0 replies; 11+ messages in thread
From: Arne Jørgensen @ 2005-07-05 22:52 UTC (permalink / raw)

Marc Horowitz <marc@mit.edu> writes:

> "Richard M. Stallman" <rms@gnu.org> writes:
>
>>> Is there some situation in which the current behavior of
>>> base64-decode-region causes an actual problem or confusion for users?
>
> I never would have noticed this had it not caused me a problem.
>
> I received a piece of email which passed through an older MTA.  This
> MTA inserted a ! and a newline after every 1000 characters of a very
> long line of base64-encoded data, which used to be common behavior.
> When Gnus tried to display this email, it failed, because the !
> characters were not recognized as valid base64 encoding.

MIME puts a limit on the line length at 76 characters. So in most
cases this will in itself be broken behavior. (Base64 can probably
be used outside MIME too, of course.)
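
For comparison, this is what MIME-style line wrapping looks like. The
sketch uses Python's `base64.encodebytes`, which breaks its output into
lines of at most 76 characters; the sample payload is arbitrary.

```python
import base64

payload = bytes(range(100))                      # arbitrary sample data

# encodebytes emits newline-terminated lines of at most 76 characters.
wrapped = base64.encodebytes(payload).decode("ascii")
lines = wrapped.splitlines()
longest = max(len(line) for line in lines)

# The line breaks are not part of the data; decoding restores the payload.
restored = base64.decodebytes(wrapped.encode("ascii"))
```

A 1000-character line of base64 therefore would not come from an
encoder that follows this limit.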

>>> Is there some situation in which the current behavior provides an
>>> advantage?
>
> The only case I can think of is if a program or the user tries to
> base64 decode something which is not base64 encoded, they will receive
> an error, instead of some other, possibly confusing behavior.
> However, I believe this case is less common than non-transparent MTAs
> making small changes to base64-encoded data.

I actually wrote some code for No Gnus recently that depended on
base64-decode-string to throw an error on strings that were not
base64 encoded.

Then it turned out that XEmacs' implementation of
base64-decode-region/base64-decode-string _does_ ignore illegal
characters in the base64 encoding.

The problem is that we have no other way to detect if a region/string
is base64 encoded or not. I thought about doing a decode and then
encode and compare the before and after strings, but I finally found
another way to recognize the data (it was a PEM encoded X509 certificate
and should therefore begin with "MII").
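
Both ideas can be sketched briefly. The code below is Python, not the
No Gnus code, and `looks_like_base64` is a hypothetical name. It
decodes leniently, re-encodes, and compares, so it does not rely on the
decoder signalling an error for stray characters. The "MII" prefix
follows from DER: a certificate normally starts with the bytes 0x30
0x82 (a SEQUENCE with a two-byte length), and for the usual sizes the
next byte is small enough that the first three base64 characters come
out as "MII". The sample bytes are made up for the demonstration.

```python
import base64
import binascii


def looks_like_base64(text: str) -> bool:
    """Round-trip check: decode, re-encode, compare (whitespace ignored)."""
    compact = "".join(text.split())
    try:
        # Lenient decode: Python drops stray characters here, but it can
        # still fail on impossible lengths or padding.
        decoded = base64.b64decode(compact)
    except binascii.Error:
        return False
    # If anything was dropped, the re-encoded text no longer matches.
    return base64.b64encode(decoded).decode("ascii") == compact


# First bytes of a made-up DER structure: SEQUENCE, two-byte length 0x0355.
der_prefix = bytes([0x30, 0x82, 0x03, 0x55])
pem_start = base64.b64encode(der_prefix).decode("ascii")
```

The prefix test is cheaper, but it only works when the kind of data is
known in advance; the round trip is the general check.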

>>> Also, how does the current development Emacs handle these things?
>>> Your report is based on 21.4; the current sources may be different.
>
> I do not know.

I think the behavior is unchanged.

Kind regards,
-- 
Arne Jørgensen <http://arnested.dk/>

* Re: base64 behavior is not MIME compliant
  2005-07-05 22:10         ` Nic Ferrier
@ 2005-07-05 23:55           ` Marc Horowitz
  2005-07-06  1:06             ` Nic Ferrier
  2005-07-06  1:15           ` Ken Raeburn
  1 sibling, 1 reply; 11+ messages in thread
From: Marc Horowitz @ 2005-07-05 23:55 UTC (permalink / raw)
  Cc: emacs-devel, rms, bugs

Nic Ferrier <nferrier@tapsellferrier.co.uk> writes:

>> Why can't you just pre-parse the data passed to the base64 decoder? I
>> believe that's the correct behaviour. A base64 decoder should decode
>> base64, not "base64 but also it does this extra trick if you wave your
>> hand in the air"

You could do that, but then the pre-parser needs to know what base64
character set looks like, so it's not a very clean abstraction.  The
decoder already knows everything it needs to know.  It's also likely
that other apps which want to do base64 decoding will want this same
functionality, so repeating it makes little sense.

But in the end, I don't care strongly if the code is in emacs or in
gnus, as long as it's somewhere.

                Marc

* Re: base64 behavior is not MIME compliant
  2005-07-05 23:55           ` Marc Horowitz
@ 2005-07-06  1:06             ` Nic Ferrier
  0 siblings, 0 replies; 11+ messages in thread
From: Nic Ferrier @ 2005-07-06  1:06 UTC (permalink / raw)
  Cc: rms, bugs, emacs-devel

Marc Horowitz <marc@mit.edu> writes:

> Nic Ferrier <nferrier@tapsellferrier.co.uk> writes:
>
>>> Why can't you just pre-parse the data passed to the base64 decoder? I
>>> believe that's the correct behaviour. A base64 decoder should decode
>>> base64, not "base64 but also it does this extra trick if you wave your
>>> hand in the air"
>
> You could do that, but then the pre-parser needs to know what base64
> character set looks like, so it's not a very clean abstraction.

I disagree. Base64 is a well documented and understood
encoding. Particularly the acceptable characters.

A function base64-clean or clean-for-base64 would be better than
changing the base64 decoder to accept nonstandard characters.
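
A separate cleaning step of that kind could look roughly like this.
The sketch is in Python, with `base64_clean` named after the suggestion
above and `mime_base64_decode` as a hypothetical wrapper; neither is an
existing Emacs function.

```python
import base64
import re

# Anything that is not an alphabet character or '=' padding.
_NON_BASE64 = re.compile(r"[^A-Za-z0-9+/=]")


def base64_clean(text: str) -> str:
    """First pass: throw away every character outside the alphabet."""
    return _NON_BASE64.sub("", text)


def mime_base64_decode(text: str) -> bytes:
    """Second pass: hand the cleaned text to a strict decoder."""
    return base64.b64decode(base64_clean(text), validate=True)
```

The decoder stays strict, and callers that want leniency opt in by
calling the cleaner first.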



> decoder already knows everything it needs to know.  It's also likely
> that other apps which want to do base64 decoding will want this same
> functionality, so repeating it makes little sense.
>
> But in the end, I don't care strongly if the code is in emacs or in
> gnus, as long as it's somewhere.

Can you go over again where you had the problem? Was it a particular
message from some particular MTA? Or was it something more general?


Nic

* Re: base64 behavior is not MIME compliant
  2005-07-05 22:10         ` Nic Ferrier
  2005-07-05 23:55           ` Marc Horowitz
@ 2005-07-06  1:15           ` Ken Raeburn
  2005-07-06  1:48             ` Nic Ferrier
  1 sibling, 1 reply; 11+ messages in thread
From: Ken Raeburn @ 2005-07-06  1:15 UTC (permalink / raw)
  Cc: Marc Horowitz, rms, bugs, emacs-devel

On Jul 5, 2005, at 18:10, Nic Ferrier wrote:
> Why can't you just pre-parse the data passed to the base64 decoder? I
> believe that's the correct behaviour. A base64 decoder should decode
> base64, not "base64 but also it does this extra trick if you wave your
> hand in the air"

Except "this extra trick" is specifically outlined as an option in the 
base64 spec (RFC 3548), and MIME invokes that option.  So proper "MIME 
base64 decoding" would require this extra step of throwing away 
characters that are not part of a base64 encoding, and then making a 
second pass with the strict base64 decoder.  In fact, as I read RFC 
3548 section 2.3, the CR/LF line break sequences in MIME messages are 
not part of the base64 alphabet, and therefore 
fns.c:IS_BASE64_IGNORABLE already implements a limited form of what 
Marc is asking for.

Ken

* Re: base64 behavior is not MIME compliant
  2005-07-06  1:15           ` Ken Raeburn
@ 2005-07-06  1:48             ` Nic Ferrier
  0 siblings, 0 replies; 11+ messages in thread
From: Nic Ferrier @ 2005-07-06  1:48 UTC (permalink / raw)
  Cc: Marc Horowitz, rms, bugs, emacs-devel

Ken Raeburn <raeburn@raeburn.org> writes:

> On Jul 5, 2005, at 18:10, Nic Ferrier wrote:
>> Why can't you just pre-parse the data passed to the base64 decoder? I
>> believe that's the correct behaviour. A base64 decoder should decode
>> base64, not "base64 but also it does this extra trick if you wave your
>> hand in the air"
>
> Except "this extra trick" is specifically outlined as an option in the 
> base64 spec (RFC 3548), and MIME invokes that option.  So proper "MIME 
> base64 decoding" would require this extra step of throwing away 
> characters that are not part of a base64 encoding, and then making a 
> second pass with the strict base64 decoder.  In fact, as I read RFC 
> 3548 section 2.3, the CR/LF line break sequences in MIME messages are 
> not part of the base64 alphabet, and therefore 
> fns.c:IS_BASE64_IGNORABLE already implements a limited form of what 
> Marc is asking for.

You're quite right - my view of what a base64 decoder should do is
based on previous implementation rather than what the spec says (mea
culpa).

Anyway - 2045 (the last MIME/base64 spec I was aware of) says this:

   The encoded output stream must be represented in lines of no more
   than 76 characters each.  All line breaks or other characters not
   found in Table 1 [the acceptable alphabet table] must be ignored by
   decoding software.  In base64 data, characters other than those in
   Table 1, line breaks, and other white space probably indicate a
   transmission error, about which a warning message or even a message
   rejection might be appropriate under some circumstances.

So this spec suggests that the base64 decoder should optionally error
or throw an exception or call a supplied handler or something.
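
One way to read that is a decoder that always ignores stray characters
but lets the caller hear about the suspicious ones. The following is a
sketch in Python under that reading, not an interpretation the RFC
mandates; `decode_with_report` and its `on_suspect` callback are
hypothetical names.

```python
import base64
import string
from typing import Callable, Optional

# Base64 alphabet plus '=' padding, and the whitespace treated as harmless.
ALPHABET = set(string.ascii_letters + string.digits + "+/=")
WHITESPACE = set(" \t\r\n")


def decode_with_report(
    text: str,
    on_suspect: Optional[Callable[[int, str], None]] = None,
) -> bytes:
    """Ignore non-alphabet characters; report the non-whitespace ones."""
    kept = []
    for index, char in enumerate(text):
        if char in ALPHABET:
            kept.append(char)
        elif char not in WHITESPACE and on_suspect is not None:
            # Probably a transmission error: let the caller warn or raise.
            on_suspect(index, char)
    return base64.b64decode("".join(kept), validate=True)
```

A mail reader might pass a callback that logs a warning, while a
stricter caller could pass one that raises.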

How about I write some advice to just throw unwanted characters away
for base64-decode-xx?

Different advice could maybe be used when errors are needed.

I'm happy to do this if someone wants it - advice is fairly easy to
include in stuff without it necessarily being in Emacs.

I'll do it on my train journey to work tomorrow morning.

Nic
