From: David Kastrup
Newsgroups: gmane.emacs.devel
Subject: Re: Emacs Lisp's future
Date: Mon, 13 Oct 2014 09:41:55 +0200
Message-ID: <87iojoza9o.fsf@fencepost.gnu.org>
In-Reply-To: <871tqcy8k9.fsf@netris.org> (Mark H. Weaver's message of "Sun, 12 Oct 2014 23:04:06 -0400")
To: Mark H Weaver
Cc: rms@gnu.org, dmantipov@yandex.ru, emacs-devel@gnu.org, handa@gnu.org, monnier@iro.umontreal.ca, Eli Zaretskii, stephen@xemacs.org

Mark H Weaver writes:

> David Kastrup writes:
>
>> The conceptual lack of separation between internal and external utf-8
>> encoding leads to strangenesses like
>>
>> scheme@(guile-user)> (with-input-from-string "\ufeff!" read-char)
>> $8 = #\!
>>
>> Yes, this is a string->string operation losing a byte order mark in
>> spite of no indication that I would like to get encodings involved in
>> any manner.
>
> Byte Order Marks are an ugly corner of Unicode, and I spent a lot of
> effort to try to do the right thing here.  What we do in Guile is
> described here:
>
>   https://www.gnu.org/software/guile/manual/html_node/BOM-Handling.html
>
> I agree that we should inhibit BOM handling for string ports.
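
For readers without a Guile at hand, the effect quoted above can be
imitated in Python.  This is only an analogy: it assumes that a string
port behaves like a byte stream that is encoded to UTF-8 and read back
through a BOM-aware decoder.  The helper names are made up for
illustration, and nothing here is Guile's actual implementation.

```python
import io

# U+FEFF followed by "!", the same input as in the Guile example.
TEXT = "\ufeff!"


def first_char_via_bytes(text: str) -> str:
    """Encode to UTF-8, then read back through a BOM-aware decoder.

    Assumption: this stands in for a string port implemented as a
    byte stream with an encoding attached.
    """
    raw = text.encode("utf-8")
    # "utf-8-sig" silently drops a leading byte order mark.
    stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig")
    return stream.read(1)


def first_char_direct(text: str) -> str:
    """Read straight from the string; no encoding step is involved."""
    return io.StringIO(text).read(1)


if __name__ == "__main__":
    print(ascii(first_char_via_bytes(TEXT)))
    print(ascii(first_char_direct(TEXT)))
```

The first helper returns "!" because the decoder swallows the mark; the
second returns the U+FEFF character itself, which is what one would
expect from a pure string->string operation.
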
>
>> And when I can say "let's see where this kind of thinking will lead" and
>> find a hole to poke within a minute,
>
> BTW, your claim that you found this hole "within a minute" is a
> bald-faced lie and you know it.
> In , I stated my belief that our internal
> use of UTF-8 in string ports was not visible to the application as
> long as you didn't manually change the encoding for the string port or
> use seek/ftell.  That was on Sept 24th.

Uh, my claim was not that I found this problem a minute after first
thinking about GUILE's string handling.  It was more about how long it
took me after deciding to look for an example for _this_ discussion.

Now my above description may not be accurate since "let's see where this
kind of thinking will lead" is obviously not something that occurred to
me just these days, or even these years.  So it applied to the more
concrete case of reading in the GUILE manual about its BOM handling,
making the connection to string ports, thinking "now that's likely to be
another half-baked bean", and finding that issue by experiment.

To the best of my memory, this _was_ the first time I read about BOM
handling in GUILE.  That does not mean that I can vouch for this page
never having been on-screen before, or even me having skimmed through
it.  But it definitely is the first time I remember having read it now.

> You spent a *lot* of time arguing with us in that bug report, and this
> is exactly the observation you could have used to bolster your
> argument, but you never found it until now.

Because I did not look for it before.  At any rate, in relation to that
bug report I had a different actual example exposed in (for which I
provided a patch in ).
Here the bug was in an attempt to create an open-coded fast path to
speed up a few gratuitous conversions when reading numbers from a string
port (encode to UTF-8 because string ports are implemented as byte
streams, decode when reading, reencode when ungetting the non-digit read
after the last digit, redecode when reading it again...).  I think it
was more or less sorted into the "one bug does not demonstrate a
problem" category.

That bug jumped out at me not when I was searching for a redecoding
problem but rather when I looked at the code in ports.c (which that
issue was about) after musing "how are they going to unread in a string
port?".  And the open-coded conversion was there to avoid calling the
apparently slow libunistring (yes, libunistring) function
u32_conv_to_encoding.  Bugs happen.  But code that is not called in the
first place can cause no bug.

At any rate, when looking for a snappy "this might not work well with
reencoding" example for this Emacs Lisp thread, I first looked at
surrogate words.  Well, (integer->char #xd800) throws an out-of-range
error.  So one is not even allowed to talk about surrogate words at the
character/word level, look for them with regular expressions and so on.
I have some choice words for that as well, but it's not a bug.  It's
pretty much a necessary consequence of the design that does not give
representation to input outside of the proper UTF-8 range.

Since "not practical" was already cried down as a consideration in this
discussion, I wanted an actual bug rather than just a refusal to work
with things defined as invalid.  So I looked in the GUILE manual to see
whether I could find something about surrogate words and instead chanced
upon "BOM" which apparently _was_ allowed into strings, so I just
thought "oh, that could be an equally bad can of worms".  And
admittedly, my first try was using the string port in the other
direction, namely with-output-to-string.
From the description I'd have expected _that_ to blow up rather than the
other way round.

And the time from "oh, this one could be bad as well" to finding the
problem (I am not even sure it is a bug rather than a particularly
jarring but logical consequence of the way string ports are defined in
GUILE as a byte stream with encoding) was not more than a few minutes at
most.  A fix will likely be equally fast to do, and there is a school of
thought that any sufficiently patched-up software is indistinguishable
from design.

So that's the history of this bald-faced lie of mine.  I am sure that I
offer better opportunities for ad hominem attacks than that.

-- 
David Kastrup