From: Eli Zaretskii
To: Philipp Stephani
Cc: larsi@gnus.org, dgutov@yandex.ru, kentaro.nakazawa@nifty.com, emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: bug#23750: 25.0.95; bug in url-retrieve or json.el
Date: Wed, 28 Dec 2016 20:34:28 +0200
Message-ID: <83lguzvr63.fsf@gnu.org>
In-reply-to: (message from Philipp Stephani on Wed, 28 Dec 2016 18:18:25 +0000)

> From: Philipp Stephani
> Date: Wed, 28 Dec 2016 18:18:25 +0000
> Cc: larsi@gnus.org, emacs-devel@gnu.org, kentaro.nakazawa@nifty.com,
>     dgutov@yandex.ru
>
> > > That's right -- why should any code care? Yet url.el does.
> >
> > No, it doesn't, not if the string is plain ASCII.
> >
> > But in that case it isn't, it's morally a byte array.
>
> Yes, because the internal representation of characters in Emacs is a
> superset of UTF-8.
>
> That has nothing to do with characters. A byte array is conceptually different from a character string.

In Emacs, they are both implemented using very similar objects.

> > What Emacs lacks is good support for byte arrays.
>
> Unibyte strings are byte arrays. What do you think we lack in that regard?
>
> If unibyte strings should be used for byte arrays, then the URL functions should indeed signal an error
> whenever url-request-data is a multibyte string, as HTTP requests are conceptually byte arrays, not character
> strings.

Which is what we do now.

> > For HTTP, process-send-string shouldn't need to deal
> > with encoding or EOL conversion, it should just accept a byte array and send that, unmodified.
>
> I disagree. Handling unibyte strings is a nuisance, so Emacs allows
> most applications be oblivious about them, and just handle
> human-readable text.
>
> That is the wrong approach (byte arrays and character strings are fundamentally different types, and mixing
> them together only causes pain), and it cannot work when implementing network protocols. HTTP requests
> are *not* human-readable text, they are byte arrays. Attempting to handle Unicode strings can't work because
> we wouldn't know the number of encoded bytes.

You are arguing against a long and quite painful history of non-ASCII
strings in Emacs. What we have now is based on a lot of experience
and at least two very large refactoring jobs. Going back would be a
very bad idea indeed, as we've been there already, and users didn't
like that. Some of us are old enough to remember the notorious \201
bytes creeping into text files and mail messages, due to that. Never
again.

Our experience is that we should keep use of unibyte strings in Lisp
application code to the absolute minimum, ideally zero. Once we
arrived at that conclusion, we've been living happily ever after.
This minor issue we are discussing here is certainly not worth
repeating past mistakes for which we paid plenty in sweat and blood.
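
To make the point about keeping unibyte strings at the edges concrete,
here is a minimal sketch; it is not code from url.el. It assumes the
request body is meant to be UTF-8, and `my-encode-request-body' is a
hypothetical helper name used only for illustration. The application
keeps ordinary multibyte text and converts it to a unibyte string only
when handing it to the URL library, at which point `length' also gives
the number of encoded bytes:

```elisp
;; Hypothetical helper, not part of url.el: encode request text at the
;; protocol boundary so the value handed to the URL library is unibyte.
;; UTF-8 is an assumption here; use whatever the server expects.
(defun my-encode-request-body (text)
  "Return TEXT encoded as UTF-8, as a unibyte string."
  (encode-coding-string text 'utf-8))

(let* ((text "naïve café")                   ; ordinary multibyte text
       (body (my-encode-request-body text))) ; unibyte, ready to send
  (list (multibyte-string-p text)   ; non-nil: a character string
        (multibyte-string-p body)   ; nil: a byte array
        (length text)               ; number of characters
        (length body)))             ; number of bytes after encoding
```

Binding `url-request-data' to the encoded value rather than to the
original text should keep the request unibyte, and this last step is
then the only place where application code touches a unibyte string.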