unofficial mirror of bug-gnu-emacs@gnu.org 
From: Juri Linkov <juri@linkov.net>
To: Andreas Schwab <schwab@linux-m68k.org>
Cc: Lars Ingebrigtsen <larsi@gnus.org>, 38587@debbugs.gnu.org
Subject: bug#38587: base64-decode-region breaks encoding
Date: Mon, 16 Dec 2019 00:40:55 +0200
Message-ID: <87zhft9rl4.fsf@mail.linkov.net>
In-Reply-To: <87eex66k7h.fsf@hase.home> (Andreas Schwab's message of "Sun, 15 Dec 2019 09:56:18 +0100")

>> Maybe an additional CODING arg for base64-decode-region?
>
> BASE64 is defined on a sequence of bytes.  It doesn't make sense to
> apply it to characters.

But isn't UTF-8 a multibyte encoding represented by a sequence of bytes
(e.g. when saved to a file)?

Then why couldn't base64-encode-region use the buffer's coding system
to convert the region to a sequence of bytes?
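
To make the point about bytes concrete, here is a minimal sketch in Ruby
(the variable names are only illustrative, and this shows nothing about how
Emacs itself works): a single character has a three-byte UTF-8
representation, and Base64 is applied to those bytes.

```ruby
require 'base64'

snowman = "\u2603"                 # SNOWMAN, one character (U+2603)
utf8_bytes = snowman.bytes         # its UTF-8 representation: three bytes
encoded = Base64.strict_encode64(snowman)  # Base64 works on those bytes

puts utf8_bytes.map { |b| format('%02X', b) }.join(' ')  # E2 98 83
puts encoded                                              # 4piD
```

So the character-to-bytes step is separate from the Base64 step, which is
what I would expect the coding system to supply.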

Also, why does base64-encode-region accept only characters
from the charsets ‘eight-bit-control’ and ‘eight-bit-graphic’,
but not other UTF-8 characters?

> The input of base64-encode-region needs to be encoded into bytes and the
> output of base64-decode-region needs to be decoded into characters.  If
> you do that, you get a full reversible operation.

I guess base64-encode-region already encodes the region into bytes,
but only partially: it signals an error on some characters, and
I don't understand why it can't encode all of them.

>> Or it would be enough to use the coding system of the
>> output buffer?
>
> The coding system of the output buffer has nothing to do with the coding
> of the data produced by base64-decode-region, just like
> process-coding-system is independent from the coding system of the
> process buffer.

It's understandable that the coding system of the output buffer
is not necessarily the same as the one expected for the output of
base64-decode-region.

But is it still possible to tell base64-decode-region
about the expected output coding system?  Maybe using
a prefix arg: C-u M-x base64-decode-region could ask
for a coding system, defaulting to the buffer's coding system.

For example, in Ruby

  require 'base64'
  Base64.decode64(Base64.encode64("☃"))
  => "\xE2\x98\x83"

indeed outputs a binary (ASCII-8BIT) string that is not decoded as UTF-8.
But it's possible to force the encoding with:

  Base64.decode64(Base64.encode64("☃")).force_encoding('UTF-8')
  => "☃"
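
As a hedged sketch of what force_encoding does here (variable names are
illustrative): it only relabels the decoded bytes as UTF-8 and performs no
conversion, so the bytes stay the same before and after.

```ruby
require 'base64'

decoded = Base64.decode64(Base64.encode64("\u2603"))  # SNOWMAN, U+2603

# decode64 returns raw bytes, tagged with the binary encoding.
puts decoded.encoding == Encoding::BINARY   # true
bytes_before = decoded.bytes

# force_encoding relabels the string in place; no bytes are converted.
decoded.force_encoding('UTF-8')
puts decoded.encoding == Encoding::UTF_8    # true
puts decoded.bytes == bytes_before          # true
puts decoded == "\u2603"                    # true
```

This is the kind of "interpret these bytes as UTF-8" step I am looking for.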

Is there an equivalent of force_encoding('UTF-8') in Emacs?
I tried calling this on the output of base64-decode-region:

  (decode-coding-region (point-min) (point-max) 'binary)

but it doesn't work, and neither does this:

  (encode-coding-region (point-min) (point-max) 'utf-8)

This doesn't work on the string output either:

  (decode-coding-string (base64-decode-string (base64-encode-string "ä"))
                        'utf-8)

Maybe I'm doing something wrong?