unofficial mirror of bug-gnu-emacs@gnu.org 
* base64-decode-region inserts carriage-returns
@ 2002-06-08 20:42 Eric Hanchrow
  2002-06-08 21:06 ` Eric Hanchrow
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Eric Hanchrow @ 2002-06-08 20:42 UTC


In GNU Emacs 21.2.1 (i386-debian-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2002-05-18 on offby1, modified by Debian
configured using `configure  i386-debian-linux-gnu --prefix=/usr --sharedstatedir=/var/lib --libexecdir=/usr/lib --localstatedir=/var/lib --infodir=/usr/share/info --mandir=/usr/share/man --with-pop=yes --with-x=yes --with-x-toolkit=athena --without-gif'
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: nil
  locale-coding-system: nil
  default-enable-multibyte-characters: nil

Using Bash, create a binary file containing eight bytes in two lines:

        bash$ echo -n $'\001\002\003\n\001\002\003\n' > /tmp/bin

Double-check that the file contains what we think it does:

        bash$ od -c /tmp/bin

        you'll see 0000000 001 002 003  \n 001 002 003  \n
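To reproduce this without Bash, here is a small Python sketch. It is an assumption of this write-up and not part of the original report. It writes the same eight bytes to a fresh temporary file and prints them in a simplified imitation of `od -c`. The helper name `od_c` is hypothetical.

```python
import os
import tempfile

# The eight bytes produced by: echo -n $'\001\002\003\n\001\002\003\n'
ORIGINAL = b"\x01\x02\x03\n\x01\x02\x03\n"


def od_c(data: bytes) -> str:
    """Hypothetical helper: a one-line, simplified imitation of `od -c`.

    LF and CR print as \\n and \\r, other printable ASCII prints as itself,
    and everything else prints as 3-digit octal. The offset column that
    real `od` prints is omitted.
    """
    escapes = {0x0A: "\\n", 0x0D: "\\r"}
    cells = []
    for b in data:
        if b in escapes:
            cells.append(escapes[b])
        elif 0x20 <= b < 0x7F:
            cells.append(chr(b))
        else:
            cells.append(f"{b:03o}")
    return " ".join(cells)


# Use a fresh temporary file rather than /tmp/bin to avoid clobbering anything.
fd, path = tempfile.mkstemp(suffix=".bin")
with os.fdopen(fd, "wb") as f:  # binary mode: no newline translation
    f.write(ORIGINAL)
with open(path, "rb") as f:
    data = f.read()
os.remove(path)

print(od_c(data))  # prints: 001 002 003 \n 001 002 003 \n
```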

Start Emacs with -q --no-site-file.

Visit that file in Emacs:

        M-x find-file-literally RET /tmp/bin RET

Base64-encode it:

        C-x h M-x base64-encode-region RET

Put a carriage-return-linefeed pair at the end of the single line:

        M-> C-q C-m RET

Save the encoded version:

        C-x C-w bin.b64 RET

Revisit the file, thus setting the buffer to use the MS-DOS line
ending convention:

        C-x C-v RET
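The encoding steps can be modelled at the byte level. This Python sketch makes two assumptions. First, base64-encode-region emits the 12-character result with no trailing newline, since it only breaks lines longer than 76 characters. Second, DOS EOL decoding on revisit is modelled as simply turning CRLF into LF.

```python
import base64

ORIGINAL = b"\x01\x02\x03\n\x01\x02\x03\n"

# C-x h M-x base64-encode-region: eight bytes encode to one 12-char line.
encoded = base64.b64encode(ORIGINAL)
print(encoded)  # b'AQIDCgECAwo='

# M-> C-q C-m RET appends a literal CR and then a LF; C-x C-w saves that.
on_disk = encoded + b"\r\n"

# C-x C-v: the file is detected as DOS-style, so (in this simplified model)
# each CRLF becomes a bare LF in the buffer text.
buffer_text = on_disk.replace(b"\r\n", b"\n")
print(buffer_text)  # b'AQIDCgECAwo=\n'
```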

Base64-decode the file:

        C-x h M-x base64-decode-region RET

Save the decoded version to a different file for comparison with the
original:

        C-x C-w bin.again RET

Now examine the newly-saved version with od back at the shell:

        od -c /tmp/bin.again 

        you'll now see 0000000 001 002 003  \r  \n 001 002 003  \r  \n
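Here is a byte-level model of the last two steps. It assumes, as the replies in this thread argue, that the decoder itself is innocent. After decoding, the buffer holds bare LFs, and the CRs appear only when the buffer is written with a DOS EOL convention. The CRLF substitution is a simplification of what Emacs does on save.

```python
import base64

ORIGINAL = b"\x01\x02\x03\n\x01\x02\x03\n"

# Buffer text after revisiting bin.b64 with DOS EOL detection (CR removed).
buffer_text = b"AQIDCgECAwo=\n"

# C-x h M-x base64-decode-region: the trailing LF is not base64 and is
# ignored, so the buffer now holds exactly the original eight bytes.
decoded = base64.b64decode(buffer_text)
assert decoded == ORIGINAL  # no CRs yet

# C-x C-w bin.again: the buffer's EOL convention is still DOS, so the save
# turns every LF into CRLF (simplified model of Emacs's EOL encoding).
saved = decoded.replace(b"\n", b"\r\n")
print(saved)  # b'\x01\x02\x03\r\n\x01\x02\x03\r\n'
```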

Thus the binary file has had some carriage-returns inserted into it,
which is a Bad Thing, since those carriage-returns were not present in
the encoded data.

RFC 2045 says both

        All line breaks or other characters not found in Table 1 must
        be ignored by decoding software.

and

        Any characters outside of the base64 alphabet are to be
        ignored in base64-encoded data.
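Python's standard `base64.b64decode` follows this lenient reading of RFC 2045 by default. With `validate=False` it discards bytes outside the base64 alphabet, and with `validate=True` it raises `binascii.Error` instead. The sketch below applies both modes to the CRLF-terminated file contents from the recipe. It only illustrates the RFC rule and says nothing about Emacs internals.

```python
import base64
import binascii

# Contents of bin.b64 on disk, including the CR LF typed at its end.
on_disk = b"AQIDCgECAwo=\r\n"

# Default behaviour (validate=False): characters outside the base64
# alphabet are discarded, as RFC 2045 asks of decoders.
lenient = base64.b64decode(on_disk)
print(lenient)  # b'\x01\x02\x03\n\x01\x02\x03\n'

# validate=True treats them as an error instead.
try:
    base64.b64decode(on_disk, validate=True)
    strict_accepted = True
except binascii.Error:
    strict_accepted = False
print(strict_accepted)  # False
```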


If this is indeed a bug (as opposed to my misunderstanding how
base64-decode-region is supposed to work) then a possible fix would be
to have base64-decode-region, after it's done its work, do
(set-buffer-file-coding-system 'raw-text-unix) or something similar.
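The suggested fix amounts to writing the decoded bytes without any EOL translation. As a rough Python analogue (an assumption, not Emacs code), opening a file in binary mode plays the role of `no-conversion` or `raw-text-unix`, and the bytes round-trip unchanged.

```python
import base64
import os
import tempfile

ORIGINAL = b"\x01\x02\x03\n\x01\x02\x03\n"

# The decoded buffer contents: the original eight bytes, LFs only.
decoded = base64.b64decode(b"AQIDCgECAwo=\n")

# Binary mode writes the bytes verbatim, the rough counterpart of saving
# with no EOL conversion (no-conversion / raw-text-unix in Emacs terms).
fd, path = tempfile.mkstemp(suffix=".again")
with os.fdopen(fd, "wb") as f:
    f.write(decoded)
with open(path, "rb") as f:
    round_trip = f.read()
os.remove(path)

print(round_trip == ORIGINAL)  # True
```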

-- 
PGP Fingerprint: 3E7B A3F3 96CA 8958 ACC5  C8BD 6337 0041 C01C 5276


* Re: base64-decode-region inserts carriage-returns
From: Eric Hanchrow @ 2002-06-08 21:06 UTC


I think 'no-conversion is a better suggestion than 'raw-text-unix.

And if the Emacs maintainers decide not to change Emacs, I can get
along just fine by putting the following in my .emacs:

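    ;; Keep a handle on the primitive before redefining the command.
    (fset 'builtin-base64-decode-region (symbol-function 'base64-decode-region))

    (defun base64-decode-region (beg end)
      "Just like `builtin-base64-decode-region', but avoids putting
    spurious carriage-returns in the output."
      (interactive "r")
      ;; Return the primitive's value (the length of the decoded text) so
      ;; Lisp callers still get it, then turn off EOL conversion on save.
      (prog1 (builtin-base64-decode-region beg end)
        (set-buffer-file-coding-system 'no-conversion)))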


* Re: base64-decode-region inserts carriage-returns
From: Eli Zaretskii @ 2002-06-09  9:13 UTC
  Cc: bug-gnu-emacs


On 8 Jun 2002, Eric Hanchrow wrote:

> Save the decoded version to a different file for comparison with the
> original:
> 
>         C-x C-w bin.again RET
> 
> Now examine the newly-saved version with od back at the shell:
> 
>         od -c /tmp/bin.again 
> 
>         you'll now see 0000000 001 002 003  \r  \n 001 002 003  \r  \n
> 
> Thus the binary file has had some carriage-returns inserted into it,

This happens because the buffer where you performed the conversion has 
undecided-dos as the value of its buffer-file-coding-system variable.
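The `-dos` half of `undecided-dos` comes from EOL detection when the file is visited. The function below is a deliberately simplified, hypothetical model of that detection. Emacs's real logic is written in C and handles more cases. The model classifies raw bytes by their first line ending.

```python
def detect_eol(raw: bytes) -> str:
    """Hypothetical, simplified EOL detector (not Emacs's algorithm).

    Classifies raw file bytes by the first line ending found:
    CR LF -> "dos", LF -> "unix", lone CR -> "mac", none -> "undecided".
    """
    for i, byte in enumerate(raw):
        if byte == 0x0A:  # LF seen before any CR
            return "unix"
        if byte == 0x0D:  # CR: DOS if followed by LF, otherwise Mac
            return "dos" if raw[i + 1:i + 2] == b"\n" else "mac"
    return "undecided"


# bin.b64 ends in the CR LF typed with "C-q C-m RET", so it is seen as DOS.
print(detect_eol(b"AQIDCgECAwo=\r\n"))  # dos
print(detect_eol(b"\x01\x02\x03\n\x01\x02\x03\n"))  # unix
```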

> which is a Bad Thing, since those carriage-returns were not present in
> the encoded data.

I'm not sure your conclusion is right.  base64-decode-region is a
primitive which acts on a region.  It has no idea what the caller
wants to do with the result of decoding, and IMHO it would be wrong
to change buffer-file-coding-system because of something you did
with a portion of the buffer's text.

For example, imagine that some program source sent as a base64-encoded 
attachment is being decoded on a Windows system.  In that case, I think 
there's nothing wrong with saving the result with DOS EOLs; users might 
even expect that.

More generally, how the buffer should be saved is something the user
or higher-level features should determine.  That is, it is up to the
caller of base64-decode-region to decide whether or not to change the
way the buffer is encoded when written.  Primitives that operate on a
region should not change that.

I'd say that a user-friendly interface to base64-decode-region should
decode the text in a scratch buffer, and then insert the result into
the user buffer.  That way, the value of buffer-file-coding-system in
the scratch buffer doesn't matter, and the encoding of the user buffer
is not affected by a primitive.


* Re: base64-decode-region inserts carriage-returns
From: Richard Stallman @ 2002-06-10 10:14 UTC
  Cc: bug-gnu-emacs

    Base64-decode the file:

	    C-x h M-x base64-decode-region

At this point you should have a buffer with newlines in it.

    Save the decoded version to a different file for comparison with the
    original:

	    C-x C-w bin.again RET

Since you already arranged to use MSDOS encoding, the file will
be encoded that way.  That seems correct to me.

    Now examine the newly-saved version with od back at the shell:

	    od -c /tmp/bin.again 

	    you'll now see 0000000 001 002 003  \r  \n 001 002 003  \r  \n

I see no bug here.  base64-decode-region produced the right results;
if you don't want that encoded in MSDOS when you save the file,
you should do something to specify otherwise.


Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
