From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
Date: Fri, 02 Dec 2016 17:45:02 +0200
Message-ID: <83k2bimj29.fsf@gnu.org>
In-reply-to: (message from Yuri Khan on Fri, 2 Dec 2016 21:53:16 +0700)
To: Yuri Khan
Cc: p.stephani2@gmail.com, emacs-devel@gnu.org, kentaro.nakazawa@nifty.com, larsi@gnus.org, dgutov@yandex.ru

> From: Yuri Khan
> Date: Fri, 2 Dec 2016 21:53:16 +0700
> Cc: Dmitry Gutov, Philipp Stephani,
>
> It is really unfortunate that we talk about ASCII strings, unibyte
> strings, multibyte strings, as if that was a meaningful
> classification.

It is meaningful when you work on Emacs code.

> The real dichotomy is between text (aka strings) and MIME-type-tagged
> byte arrays.

That might be so in the context of HTTP, but in general, byte arrays
("raw bytes" in Emacs parlance) are not limited to MIME types.
Moreover, there are very frequent use cases where Emacs code needs to
work with a byte array whose type is unknown, or even cannot be known
at all, because it doesn't come with any meta-data of any kind.

> In order to send a string over HTTP, one must encode it
> to a byte array and tag it as "text/plain; charset=utf-8" or
> "text/html; charset=utf-8" or application/json (no charset parameter
> because json must always be encoded in one of utf-* for transmission).
> Conversely, a byte array received over HTTP can, MIME type allowing,
> decoded into a string.
>
> The fact that there exist strings for which encoding and decoding are
> identity transforms should be regarded only as an implementation
> detail.

You are talking generalities here, whereas this discussion is about
Emacs-specific internal issues.  In Emacs, a plain-ASCII string is
indistinguishable from a "byte array" whose bytes are all below 128.
They have the same representation.  To muddy the water even more, a
plain-ASCII string can be "marked" as multibyte (again, internally),
but it should be clear that such a "mark" has no meaning at all for
ASCII text.

From the Lisp application POV, whether a plain-ASCII string it
receives or processes is marked as unibyte or multibyte is entirely
random.  So if some ASCII text is accepted by an Emacs API involved in
sending HTTP requests, while an identical ASCII string is rejected, it
could be a source of surprises and bug reports.  That is the core of
the issues discussed here.

> Attempts by libraries and frameworks to silently DTRT for this
> subset lead to applications neglecting to properly encode or tag
> strings, leading, in turn, to breakage in presence of multilingual
> text.

Based on Emacs experience of dealing with multibyte text and its
encoding/decoding, the conclusion was that it is better to silently
DTRT where we can be sure we know how.  Making a point of educating
users by harsh measures such as signaling errors where Emacs could
easily proceed, is generally not welcome.  We will see if this case is
any different.
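
For readers who want the quoted encode-and-tag round trip in concrete
form, here is a minimal sketch.  It is written in Python, not Emacs
Lisp, and it illustrates only the general HTTP point from the quoted
text, not Emacs internals.  The helper names encode_for_http and
decode_from_http are hypothetical, invented for this illustration; the
only library call used is the standard email.message.Message class,
which parses the charset parameter.

```python
from email.message import Message


def encode_for_http(text, subtype="plain"):
    """Encode text as UTF-8 and return (bytes, Content-Type tag).

    Hypothetical helper: the name and signature are assumptions made
    for this illustration only.
    """
    body = text.encode("utf-8")
    return body, "text/%s; charset=utf-8" % subtype


def decode_from_http(body, content_type):
    """Decode bytes back to text using the charset in the tag.

    Raises ValueError when the tag carries no charset, i.e. when the
    bytes come with no information about how to read them as text.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()
    if charset is None:
        raise ValueError("no charset in %r" % content_type)
    return body.decode(charset)


# ASCII text: encoding leaves every code point as the same byte value,
# so the "identity transform" case from the quoted text applies.
ascii_body, ascii_tag = encode_for_http("hello")
ascii_is_identity = list(ascii_body) == [ord(c) for c in "hello"]

# Non-ASCII text: the byte sequence differs from the code points, so
# the tag is needed to recover the original string.
accented = "na\u00efve"
accented_body, accented_tag = encode_for_http(accented)
round_trip_ok = decode_from_http(accented_body, accented_tag) == accented
```

In this sketch the ASCII case passes through unchanged, which is why a
missing encode or tag step goes unnoticed until non-ASCII text shows
up.  Untagged bytes are the case where decode_from_http has nothing to
go on and raises an error.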