* Re: base64 behavior is not MIME compliant
       [not found] <t53zmt4dce3.fsf@central-air-conditioning.toybox.cambridge.ma.us>
From: Richard M. Stallman @ 2005-07-03 20:43 UTC
Cc: bugs, emacs-devel

    RFC 3548 has this to say about characters not part of the encoding
    alphabet:

       Implementations MUST reject the encoding if it contains characters
       outside the base alphabet when interpreting base encoded data, unless
       the specification referring to this document explicitly states
       otherwise.  Such specifications may, as MIME does, instead state that
       characters outside the base encoding alphabet should simply be ignored
       when interpreting data ("be liberal in what you accept").

Words such as "must" claim an authority we do not recognize in the GNU
Project.  We do not _obey_ standards--rather, we see what they have to
say, consider their recommendations, then do what seems best.

    I believe the best fix is for base64-decode-region to take an optional
    argument which specifies how liberal it should be about its input,
    defaulting to the current behavior, and for Gnus to use this argument.

To decide whether to do this, we need to know the answers to three questions:

Is there some situation in which the current behavior of
base64-decode-region causes an actual problem or confusion for
users?

Is there some situation in which the current behavior provides an
advantage?

Also, how does the current development Emacs handle these things?
Your report is based on 21.4; the current sources may be different.

    Of course, Gnus can fix this independently by using an external base64
    implementation which is MIME-compliant.

Theoretically it could, but that's a very undesirable thing to do, so
we won't do that.  If this calls for fixing, we should fix it in
base64-decode-region; if not, we shouldn't fix what isn't broken, not
in Gnus or anywhere else.

* Re: base64 behavior is not MIME compliant
From: Nic Ferrier @ 2005-07-03 21:09 UTC
Cc: Marc Horowitz, bugs, emacs-devel

"Richard M. Stallman" <rms@gnu.org> writes:

>     RFC 3548 has this to say about characters not part of the encoding
>     alphabet:
>
>        Implementations MUST reject the encoding if it contains characters
>        outside the base alphabet when interpreting base encoded data, unless
>        the specification referring to this document explicitly states
>        otherwise.  Such specifications may, as MIME does, instead state that
>        characters outside the base encoding alphabet should simply be ignored
>        when interpreting data ("be liberal in what you accept").
>
> Words such as "must" claim an authority we do not recognize in the GNU
> Project.  We do not _obey_ standards--rather, we see what they have to
> say, consider their recommendations, then do what seems best.
>
>     I believe the best fix is for base64-decode-region to take an optional
>     argument which specifies how liberal it should be about its input,
>     defaulting to the current behavior, and for Gnus to use this argument.
>
> To decide whether to do this, we need to know the answers to three questions:
>
> Is there some situation in which the current behavior of
> base64-decode-region causes an actual problem or confusion for
> users?

I use base64-decode-region in my own email client written in elisp.

I have *never* had it fail to decode a file in a way that left me
unable to read it.

Nic

* Re: base64 behavior is not MIME compliant
From: Marc Horowitz @ 2005-07-04 4:59 UTC
Cc: bugs, emacs-devel

"Richard M. Stallman" <rms@gnu.org> writes:

>> Is there some situation in which the current behavior of
>> base64-decode-region causes an actual problem or confusion for users?

I never would have noticed this had it not caused me a problem.

I received a piece of email which passed through an older MTA.  This
MTA inserted a ! and a newline after every 1000 characters of a very
long line of base64-encoded data, which used to be common behavior.
When Gnus tried to display this email, it failed, because the !
characters were not recognized as valid base64 encoding.

>> Is there some situation in which the current behavior provides an
>> advantage?

The only case I can think of is if a program or the user tries to
base64 decode something which is not base64 encoded, they will receive
an error, instead of some other, possibly confusing behavior.
However, I believe this case is less common than non-transparent MTAs
making small changes to base64-encoded data.

>> Also, how does the current development Emacs handle these things?
>> Your report is based on 21.4; the current sources may be different.

I do not know.

                Marc
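
The failure described in this message can be reproduced outside Emacs.
What follows is a minimal Python sketch, not Emacs code.  The helper
fold_like_old_mta is hypothetical and only approximates the MTA: it
places a "!" and a newline between 1000-character chunks.  The sketch
relies on the documented behavior of Python's base64.b64decode, which
discards non-alphabet characters by default and rejects them when
validate=True is passed.

```python
import base64
import binascii


def fold_like_old_mta(line, width=1000):
    """Hypothetical stand-in for the old MTA: split a long line into
    width-sized chunks and join them with '!' plus a newline."""
    chunks = [line[i:i + width] for i in range(0, len(line), width)]
    return "!\n".join(chunks)


payload = bytes(range(256)) * 12                      # 3072 arbitrary bytes
encoded = base64.b64encode(payload).decode("ascii")   # one 4096-character line
mangled = fold_like_old_mta(encoded)

# Liberal decoding (Python's default) skips the stray '!' and newlines.
liberal_result = base64.b64decode(mangled)

# Strict decoding rejects input containing characters outside the alphabet.
try:
    base64.b64decode(mangled, validate=True)
    strict_failed = False
except binascii.Error:
    strict_failed = True
```

The liberal call recovers the original payload, while the strict call
fails in the same way the message reports for Gnus.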

* Re: base64 behavior is not MIME compliant
From: Richard M. Stallman @ 2005-07-05 4:35 UTC
Cc: bugs, emacs-devel

    I received a piece of email which passed through an older MTA.  This
    MTA inserted a ! and a newline after every 1000 characters of a very
    long line of base64-encoded data, which used to be common behavior.
    When Gnus tried to display this email, it failed, because the !
    characters were not recognized as valid base64 encoding.

Maybe I misunderstood what you were asking for.  I thought you
were asking us to make additional base64-decode-region signal
errors in cases where currently it does not.  But now it looks
like you are asking for it to accept input that now gives
an error.

Could you give a self-contained description of the change that you are
requesting in the behavior of this function?

* Re: base64 behavior is not MIME compliant
From: Marc Horowitz @ 2005-07-05 21:35 UTC
Cc: bugs, emacs-devel

"Richard M. Stallman" <rms@gnu.org> writes:

>>     I received a piece of email which passed through an older MTA.  This
>>     MTA inserted a ! and a newline after every 1000 characters of a very
>>     long line of base64-encoded data, which used to be common behavior.
>>     When Gnus tried to display this email, it failed, because the !
>>     characters were not recognized as valid base64 encoding.
>>
>> Maybe I misunderstood what you were asking for.  I thought you
>> were asking us to make additional base64-decode-region signal
>> errors in cases where currently it does not.  But now it looks
>> like you are asking for it to accept input that now gives
>> an error.

I'm sorry I was confusing.  To quote my earlier email:

    I believe the best fix is for base64-decode-region to take an optional
    argument which specifies how liberal it should be about its input,
    defaulting to the current behavior, and for Gnus to use this argument.

Defaulting to the current behavior should certainly not mean
signalling errors in new cases.  You are correct that I'm asking for a
behavior variant which would result in more input being accepted.  If
this variant became the default, I would not object, but I would
certainly understand if you did not want to change the default
behavior.

>> Could you give a self-contained description of the change that you are
>> requesting in the behavior of this function?

For the purposes of reading mail, it is valuable to ignore all
characters not part of the base64 character set when decoding.  So, my
minimum proposal would be for base64-decode-region to ignore all
unknown characters, instead of signalling errors in this case.

It would be more generally useful to provide three forms of the
base64-decode-region function, either by having three functions, or
one with an optional argument:

    Form 1: all characters not part of the base64 character set would
    be ignored.

    Form 2: any character not part of the base64 character set would
    cause an error to be signalled.

    Form 3: any character not part of the union of the base64
    character set and the whitespace characters would cause an error
    to be signalled.

Form 3 is the current observed behavior.  I believe there is a need
for Form 1, to make mail reading work more smoothly.  Form 2 mainly
exists for completeness.

                Marc
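
The three forms proposed above can be sketched as a single decoder with
a mode argument.  This is an illustrative Python sketch under stated
assumptions, not the Emacs implementation: the function name
decode_base64, the mode names, and the exact whitespace set are all
hypothetical choices made for the example.

```python
import base64
import string

# Standard base64 alphabet plus the '=' padding character.
B64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")
# Assumed whitespace set for Form 3; the real decoder's set may differ.
WHITESPACE = set(" \t\r\n\f")


def decode_base64(text, mode="whitespace"):
    """Decode text in one of three modes (names are hypothetical):

    'liberal'    (Form 1): ignore every non-alphabet character.
    'strict'     (Form 2): any non-alphabet character is an error.
    'whitespace' (Form 3): ignore whitespace, reject everything else.
    """
    if mode not in ("liberal", "strict", "whitespace"):
        raise ValueError("unknown mode: %r" % (mode,))
    kept = []
    for ch in text:
        if ch in B64_ALPHABET:
            kept.append(ch)
        elif mode == "liberal":
            continue                      # Form 1: silently drop it
        elif mode == "whitespace" and ch in WHITESPACE:
            continue                      # Form 3: whitespace is tolerated
        else:
            raise ValueError("character %r is not valid base64" % (ch,))
    # Only alphabet characters remain, so a strict decode is safe here.
    return base64.b64decode("".join(kept), validate=True)
```

Defaulting the mode to the Form 3 behavior mirrors the request that the
existing behavior stay the default and that callers such as a mail
reader opt in to the liberal form.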

* Re: base64 behavior is not MIME compliant
From: Nic Ferrier @ 2005-07-05 22:10 UTC
Cc: emacs-devel, rms, bugs

Marc Horowitz <marc@mit.edu> writes:

> "Richard M. Stallman" <rms@gnu.org> writes:
>
>>>     I received a piece of email which passed through an older MTA.  This
>>>     MTA inserted a ! and a newline after every 1000 characters of a very
>>>     long line of base64-encoded data, which used to be common behavior.
>>>     When Gnus tried to display this email, it failed, because the !
>>>     characters were not recognized as valid base64 encoding.
>>>
>>> Maybe I misunderstood what you were asking for.  I thought you
>>> were asking us to make additional base64-decode-region signal
>>> errors in cases where currently it does not.  But now it looks
>>> like you are asking for it to accept input that now gives
>>> an error.
>
> I'm sorry I was confusing.  To quote my earlier email:
>
>     I believe the best fix is for base64-decode-region to take an optional
>     argument which specifies how liberal it should be about its input,
>     defaulting to the current behavior, and for Gnus to use this argument.
>
> Defaulting to the current behavior should certainly not mean
> signalling errors in new cases.  You are correct that I'm asking for a
> behavior variant which would result in more input being accepted.  If
> this variant became the default, I would not object, but I would
> certainly understand if you did not want to change the default
> behavior.
>
>>> Could you give a self-contained description of the change that you are
>>> requesting in the behavior of this function?
>
> For the purposes of reading mail, it is valuable to ignore all
> characters not part of the base64 character set when decoding.  So, my
> minimum proposal would be for base64-decode-region to ignore all
> unknown characters, instead of signalling errors in this case.
>
> It would be more generally useful to provide three forms of the
> base64-decode-region function, either by having three functions, or
> one with an optional argument:
>
>     Form 1: all characters not part of the base64 character set would
>     be ignored.
>
>     Form 2: any character not part of the base64 character set would
>     cause an error to be signalled.
>
>     Form 3: any character not part of the union of the base64
>     character set and the whitespace characters would cause an error
>     to be signalled.
>
> Form 3 is the current observed behavior.  I believe there is a need
> for Form 1, to make mail reading work more smoothly.  Form 2 mainly
> exists for completeness.

Why can't you just pre-parse the data passed to the base64 decoder?  I
believe that's the correct behaviour.  A base64 decoder should decode
base64, not "base64 but also it does this extra trick if you wave your
hand in the air".

Nic

* Re: base64 behavior is not MIME compliant
From: Marc Horowitz @ 2005-07-05 23:55 UTC
Cc: emacs-devel, rms, bugs

Nic Ferrier <nferrier@tapsellferrier.co.uk> writes:

>> Why can't you just pre-parse the data passed to the base64 decoder?  I
>> believe that's the correct behaviour.  A base64 decoder should decode
>> base64, not "base64 but also it does this extra trick if you wave your
>> hand in the air"

You could do that, but then the pre-parser needs to know what the
base64 character set looks like, so it's not a very clean abstraction.
The decoder already knows everything it needs to know.  It's also
likely that other apps which want to do base64 decoding will want this
same functionality, so repeating it makes little sense.

But in the end, I don't care strongly if the code is in emacs or in
gnus, as long as it's somewhere.

                Marc

* Re: base64 behavior is not MIME compliant
From: Nic Ferrier @ 2005-07-06 1:06 UTC
Cc: rms, bugs, emacs-devel

Marc Horowitz <marc@mit.edu> writes:

> Nic Ferrier <nferrier@tapsellferrier.co.uk> writes:
>
>>> Why can't you just pre-parse the data passed to the base64 decoder?  I
>>> believe that's the correct behaviour.  A base64 decoder should decode
>>> base64, not "base64 but also it does this extra trick if you wave your
>>> hand in the air"
>
> You could do that, but then the pre-parser needs to know what the
> base64 character set looks like, so it's not a very clean abstraction.

I disagree.  Base64 is a well documented and understood encoding,
particularly its set of acceptable characters.

A function base64-clean or clean-for-base64 would be better than
changing the base64 decoder to accept nonstandard characters.

> The decoder already knows everything it needs to know.  It's also likely
> that other apps which want to do base64 decoding will want this same
> functionality, so repeating it makes little sense.
>
> But in the end, I don't care strongly if the code is in emacs or in
> gnus, as long as it's somewhere.

Can you go over again where you had the problem?  Was it a particular
message from some particular MTA?  Or was it something more general?

Nic
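
The separate cleaning step suggested in this message can be sketched as
a small pre-filter that runs before an unchanged strict decoder.  This
is a Python illustration only, assuming the standard base64 alphabet
with "=" padding.  The name base64_clean echoes the suggestion above;
decode_mime_base64 is a hypothetical wrapper, and neither is an existing
Emacs function.

```python
import base64
import re

# Matches anything outside the standard alphabet and the '=' pad character.
_NOT_BASE64 = re.compile(r"[^A-Za-z0-9+/=]")


def base64_clean(text):
    """Return text with every non-base64 character removed."""
    return _NOT_BASE64.sub("", text)


def decode_mime_base64(text):
    """Two passes: clean first, then hand the result to a strict decoder."""
    return base64.b64decode(base64_clean(text), validate=True)
```

Keeping the filter separate leaves the decoder strict for callers that
want errors, at the cost that the filter has to carry its own copy of
the alphabet, which is the trade-off debated in this exchange.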

* Re: base64 behavior is not MIME compliant
From: Ken Raeburn @ 2005-07-06 1:15 UTC
Cc: Marc Horowitz, rms, bugs, emacs-devel

On Jul 5, 2005, at 18:10, Nic Ferrier wrote:
> Why can't you just pre-parse the data passed to the base64 decoder?  I
> believe that's the correct behaviour.  A base64 decoder should decode
> base64, not "base64 but also it does this extra trick if you wave your
> hand in the air"

Except "this extra trick" is specifically outlined as an option in the
base64 spec (RFC 3548), and MIME invokes that option.  So proper "MIME
base64 decoding" would require this extra step of throwing away
characters that are not part of a base64 encoding, and then making a
second pass with the strict base64 decoder.  In fact, as I read RFC
3548 section 2.3, the CR/LF line break sequences in MIME messages are
not part of the base64 alphabet, and therefore
fns.c:IS_BASE64_IGNORABLE already implements a limited form of what
Marc is asking for.

Ken

* Re: base64 behavior is not MIME compliant
From: Nic Ferrier @ 2005-07-06 1:48 UTC
Cc: Marc Horowitz, rms, bugs, emacs-devel

Ken Raeburn <raeburn@raeburn.org> writes:

> On Jul 5, 2005, at 18:10, Nic Ferrier wrote:
>> Why can't you just pre-parse the data passed to the base64 decoder?  I
>> believe that's the correct behaviour.  A base64 decoder should decode
>> base64, not "base64 but also it does this extra trick if you wave your
>> hand in the air"
>
> Except "this extra trick" is specifically outlined as an option in the
> base64 spec (RFC 3548), and MIME invokes that option.  So proper "MIME
> base64 decoding" would require this extra step of throwing away
> characters that are not part of a base64 encoding, and then making a
> second pass with the strict base64 decoder.  In fact, as I read RFC
> 3548 section 2.3, the CR/LF line break sequences in MIME messages are
> not part of the base64 alphabet, and therefore
> fns.c:IS_BASE64_IGNORABLE already implements a limited form of what
> Marc is asking for.

You're quite right - my view of what a base64 decoder should do is
based on previous implementation rather than what the spec says (mea
culpa).

Anyway - RFC 2045 (the last MIME/base64 spec I was aware of) says this:

    The encoded output stream must be represented in lines of no more
    than 76 characters each.  All line breaks or other characters not
    found in Table 1 [the acceptable alphabet table] must be ignored by
    decoding software.  In base64 data, characters other than those in
    Table 1, line breaks, and other white space probably indicate a
    transmission error, about which a warning message or even a
    message rejection might be appropriate under some circumstances.

So this spec suggests that the base64 decoder should optionally error
or throw an exception or call a supplied handler or something.

How about I write some advice to just throw unwanted characters away
for base64-decode-xx?  Different advice could maybe be used when
errors are needed.

I'm happy to do this if someone wants it - advice is fairly easy to
include in stuff without it necessarily being in Emacs.  I'll do it on
my train journey to work tomorrow morning.

Nic

* Re: base64 behavior is not MIME compliant
From: Arne Jørgensen @ 2005-07-05 22:52 UTC

Marc Horowitz <marc@mit.edu> writes:

> "Richard M. Stallman" <rms@gnu.org> writes:
>
>>> Is there some situation in which the current behavior of
>>> base64-decode-region causes an actual problem or confusion for users?
>
> I never would have noticed this had it not caused me a problem.
>
> I received a piece of email which passed through an older MTA.  This
> MTA inserted a ! and a newline after every 1000 characters of a very
> long line of base64-encoded data, which used to be common behavior.
> When Gnus tried to display this email, it failed, because the !
> characters were not recognized as valid base64 encoding.

MIME limits the line length to 76 characters, so in most cases a
1000-character line is in itself broken behavior.  (Base64 can
probably be used outside MIME too, of course.)

>>> Is there some situation in which the current behavior provides an
>>> advantage?
>
> The only case I can think of is if a program or the user tries to
> base64 decode something which is not base64 encoded, they will receive
> an error, instead of some other, possibly confusing behavior.
> However, I believe this case is less common than non-transparent MTAs
> making small changes to base64-encoded data.

I actually wrote some code for No Gnus recently that depended on
base64-decode-string to throw an error on strings that were not base64
encoded.

Then it turned out that XEmacs' implementation of
base64-decode-region/base64-decode-string _does_ ignore illegal
characters in the base64 encoding.

The problem is that we have no other way to detect if a region/string
is base64 encoded or not.

I thought about doing a decode and then an encode and comparing the
before and after strings, but I finally found another way to recognize
the data (it was a PEM encoded X509 certificate and should therefore
begin with "MII").

>>> Also, how does the current development Emacs handle these things?
>>> Your report is based on 21.4; the current sources may be different.
>
> I do not know.

I think the behavior is unchanged.

Kind regards,
-- 
Arne Jørgensen <http://arnested.dk/>
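
The two detection ideas mentioned in this message (a decode and
re-encode comparison, and the "MII" prefix test) can be sketched as
follows.  This is a Python illustration with hypothetical function
names, not the No Gnus code.  The round-trip check is only a heuristic:
short ordinary words such as "abcd" are also well-formed base64.  The
prefix check rests on the fact that a DER SEQUENCE with a two-byte
length starts with the bytes 0x30 0x82, which base64-encode to "MII"
whenever the top two bits of the following length byte are zero.

```python
import base64
import binascii


def looks_like_base64(text):
    """Heuristic: strict-decode, re-encode, and compare with the input
    (ignoring whitespace).  Well-formed input is necessary, not sufficient."""
    compact = "".join(text.split())
    if not compact:
        return False
    try:
        decoded = base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError):
        return False
    return base64.b64encode(decoded).decode("ascii") == compact


def looks_like_der_certificate_body(text):
    """Cheap prefix test for base64-encoded DER data as found in PEM files."""
    return text.lstrip().startswith("MII")


# 0x30 = SEQUENCE tag, 0x82 = "length follows in two bytes", then 0x0300.
der_prefix_b64 = base64.b64encode(bytes([0x30, 0x82, 0x03, 0x00])).decode("ascii")
```

The prefix test is far cheaper and does not depend on the decoder
signalling errors, which is why it survives a switch to a liberal
decoder.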