From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#38587: base64-decode-region breaks encoding
Date: Mon, 16 Dec 2019 17:58:29 +0200
Message-ID: <835zig5kka.fsf@gnu.org>
References: <87blsdhzeb.fsf@mail.linkov.net> <87pngtndhd.fsf@gnus.org> <87v9qieb6t.fsf@mail.linkov.net> <87eex66k7h.fsf@hase.home> <87zhft9rl4.fsf@mail.linkov.net>
In-reply-to: <87zhft9rl4.fsf@mail.linkov.net> (message from Juri Linkov on Mon, 16 Dec 2019 00:40:55 +0200)
To: Juri Linkov
Cc: larsi@gnus.org, schwab@linux-m68k.org, 38587@debbugs.gnu.org

> From: Juri Linkov
> Date: Mon, 16 Dec 2019 00:40:55 +0200
> Cc: Lars Ingebrigtsen, 38587@debbugs.gnu.org
>
> > BASE64 is defined on a sequence of bytes.  It doesn't make sense to
> > apply it to characters.
>
> But isn't UTF-8 a multibyte encoding represented by a sequence of bytes
> (e.g. when saved to a file)?

When saved to a file, yes.

> Then why base64-encode-region couldn't use the buffer's coding
> to convert the region to a sequence of bytes?

Because it isn't guaranteed that the buffer's encoding is indeed the
right one for this job.

> Also why base64-encode-region accepts region's characters
> only from the charsets ‘eight-bit-control’ and ‘eight-bit-graphic’,
> but not other UTF-8 characters?

Because it wants raw bytes, and only eight-bit charsets fit that
condition.  Eight-bit charset is the charset of raw bytes in a
multibyte buffer or string.  (base64-encode-region can also work on
unibyte buffers and strings, in which case "charset" of such "text"
has no meaning.)

> > The input of base64-encode-region needs to be encoded into bytes and the
> > output of base64-decode-region needs to be decoded into characters.  If
> > you do that, you get a full reversible operation.
>
> I guess base64-encode-region already encodes the region into bytes,
> but only partially - it signals an error on some characters,
> I don't understand why it can't encode all of them.

Once again, because it wants to process only raw bytes.

> But is it still possible to tell base64-decode-region
> about the expected output coding system?  Maybe using
> a prefix arg: C-u M-x base64-decode-region could ask
> for a coding, defaulting to the buffer's coding.

If we want to make such a change, then "C-x RET c" is a better prefix
command, as it is consistent with other commands that accept
coding-system overrides.

> Is there an equivalent of force_encoding('UTF-8') in Emacs?

"C-x RET c utf-8 RET M-x SOME-COMMAND RET"

> Also this doesn't work on the string output:
>
> (decode-coding-string (base64-decode-string (base64-encode-string "ä"))
>                       'utf-8)

It will work if you encode "ä" first:

  (decode-coding-string
   (base64-decode-string
    (base64-encode-string (encode-coding-string "ä" 'utf-8)))
   'utf-8)
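
The round trip discussed in this message (encode characters to bytes, base64 the bytes, then reverse both steps) could be wrapped in two small helpers. This is only a minimal sketch: the `my/base64-encode-text` and `my/base64-decode-text` names and the `utf-8` default are assumptions made for illustration, not anything Emacs provides.

```elisp
;; Hypothetical helpers for illustration; the my/ names are not part of Emacs.

(defun my/base64-encode-text (text &optional coding-system)
  "Encode TEXT to bytes with CODING-SYSTEM, then base64-encode the bytes.
CODING-SYSTEM defaults to `utf-8' (an assumption of this sketch)."
  (base64-encode-string
   ;; Characters -> raw bytes (a unibyte string).
   (encode-coding-string text (or coding-system 'utf-8))
   ;; Non-nil NO-LINE-BREAK: keep the output on one line.
   t))

(defun my/base64-decode-text (base64 &optional coding-system)
  "Base64-decode BASE64 to bytes, then decode the bytes with CODING-SYSTEM.
CODING-SYSTEM defaults to `utf-8' (an assumption of this sketch)."
  (decode-coding-string
   ;; Base64 text -> raw bytes (a unibyte string).
   (base64-decode-string base64)
   (or coding-system 'utf-8)))

;; The same character yields different bytes, and so different base64
;; text, depending on the coding system chosen for the first step.
(my/base64-encode-text "ä")            ; => "w6Q="  (bytes C3 A4)
(my/base64-encode-text "ä" 'latin-1)   ; => "5A=="  (byte E4)

;; Decoding with the same coding system restores the original text.
(my/base64-decode-text (my/base64-encode-text "ä"))   ; => "ä"
```

The two different results for "ä" show why the coding system has to be chosen explicitly rather than assumed: base64 only ever sees the bytes, so whoever decodes must know which coding system produced them.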