From mboxrd@z Thu Jan 1 00:00:00 1970
From: Juri Linkov
Newsgroups: gmane.emacs.bugs
Subject: bug#38587: base64-decode-region breaks encoding
Date: Mon, 16 Dec 2019 00:40:55 +0200
Organization: LINKOV.NET
Message-ID: <87zhft9rl4.fsf@mail.linkov.net>
References: <87blsdhzeb.fsf@mail.linkov.net> <87pngtndhd.fsf@gnus.org> <87v9qieb6t.fsf@mail.linkov.net> <87eex66k7h.fsf@hase.home>
In-Reply-To: <87eex66k7h.fsf@hase.home> (Andreas Schwab's message of "Sun, 15 Dec 2019 09:56:18 +0100")
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (x86_64-pc-linux-gnu)
To: Andreas Schwab
Cc: Lars Ingebrigtsen, 38587@debbugs.gnu.org
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"

>> Maybe an additional CODING arg for base64-decode-region?
>
> BASE64 is defined on a sequence of bytes.  It doesn't make sense to
> apply it to characters.

But isn't UTF-8 a multibyte encoding represented by a sequence of bytes
(e.g. when saved to a file)?  Then why couldn't base64-encode-region
use the buffer's coding system to convert the region to a sequence of
bytes?

Also, why does base64-encode-region accept only characters from the
charsets ‘eight-bit-control’ and ‘eight-bit-graphic’, but not other
UTF-8 characters?

> The input of base64-encode-region needs to be encoded into bytes and the
> output of base64-decode-region needs to be decoded into characters.
> If you do that, you get a full reversible operation.

I guess base64-encode-region already encodes the region into bytes,
but only partially: it signals an error on some characters, and I
don't understand why it can't encode all of them.

>> Or it would be enough to use the coding system of the
>> output buffer?
>
> The coding system of the output buffer has nothing to do with the coding
> of the data produced by base64-decode-region, just like
> process-coding-system is independent from the coding system of the
> process buffer.

It's understandable that the coding system of the output buffer is not
necessarily the one expected for the output of base64-decode-region.
But is it still possible to tell base64-decode-region which output
coding system to expect?  Maybe with a prefix arg:
C-u M-x base64-decode-region could ask for a coding system, defaulting
to the buffer's.

For example, in Ruby

  require 'base64'
  Base64.decode64(Base64.encode64("☃"))
  => "\xE2\x98\x83"

indeed outputs raw bytes, not a string decoded as UTF-8.  But it's
possible to force the encoding with:

  Base64.decode64(Base64.encode64("☃")).force_encoding('UTF-8')
  => "☃"

Is there an equivalent of force_encoding('UTF-8') in Emacs?

I tried calling this on the output of base64-decode-region:

  (decode-coding-region (point-min) (point-max) 'binary)

but it doesn't work, and neither does this:

  (encode-coding-region (point-min) (point-max) 'utf-8)

This doesn't work on the string output either:

  (decode-coding-string (base64-decode-string (base64-encode-string "ä")) 'utf-8)

Maybe I'm doing something wrong?
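
PS: For reference, the Ruby behaviour described above can be reproduced
with a self-contained sketch like the one below.  The helper name
`roundtrip` is hypothetical, and the snowman is written as the escape
"\u2603" only so that the script does not depend on the encoding of the
source file.  It shows just the Ruby side of the comparison and says
nothing about how Emacs should behave.

```ruby
require 'base64'

# Hypothetical helper: Base64-encode a string, then decode it again.
def roundtrip(text)
  Base64.decode64(Base64.encode64(text))
end

snowman = "\u2603"   # "☃" as a UTF-8 string, bytes E2 98 83
decoded = roundtrip(snowman)

# The bytes survive the round trip, but the result is tagged as binary.
puts decoded.bytes == snowman.bytes
puts decoded.encoding == Encoding::ASCII_8BIT
puts decoded == snowman   # same bytes, different encoding tag

# force_encoding only relabels the existing bytes; it converts nothing.
relabeled = decoded.dup.force_encoding('UTF-8')
puts relabeled == snowman
puts relabeled.valid_encoding?
```

The point of the sketch is that the decoded bytes are already right and
only the encoding label is missing, which is what I would like to be
able to supply in Emacs too.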