From: David Kastrup
Newsgroups: gmane.lisp.guile.devel
Subject: Re: The empty string and other empty strings
Date: Fri, 13 Jan 2012 18:36:24 +0100
Message-ID: <87k44v7bdz.fsf@fencepost.gnu.org>
To: guile-devel@gnu.org

Mark H Weaver writes:

> David Kastrup writes:
>
>> ludo@gnu.org (Ludovic Courtès) writes:
>>
>>> Hi Mark,
>>>
>>> Mark H Weaver skribis:
>>>
>>>> What do other people think?
>>>
>>> As you said, R5RS makes it clear that there can be several (in the sense
>>> of eq?) empty strings, so I think what you did is the right thing.
>>
>> Since it uses the same verbiage with regard to '(), could you please
>> point out _where_ R5RS states that "freshly allocated" means "not
>> eq?"?
>
> Section 3.4 (Storage model) of the R5RS states:
>
>   Whenever this report speaks of storage being allocated for a variable
>   or object, what is meant is that an appropriate number of locations
>   are chosen from the set of locations that are not in use, and the
>   chosen locations are marked to indicate that they are now in use
>   before the variable or object is made to denote them.

And that's perfectly fine for the characters of a string.  However,
(string) has no characters.  Like (list) has no list members.  (list)
does not need _any_ allocation, and neither would (string).

For me it makes sense to make the fundamental building block of a type a
self-contained value.  For multi-value non-composite types (like
numerical types) that is not necessarily feasible.  For composite types
with a single elementary non-composite value, it makes sense for me to
make this value a basic cell value.

Since empty strings are valid substrings of both mutable and non-mutable
strings, I don't see that it makes sense to apply either property to
them since it is impossible to change any character through them.

So there are a number of operations which should for consistency's sake
be able to check for this special value efficiently.  Reserving a cell
value for it seems like the straightforward thing to do, and that is
what is done with lists also.

>> For me it means "does not contain any component in common with
>> previously allocated material".
>> The fixed constant '() or (list)
>> (the neutral element with regard to list concatenation) not
>> containing any allocated pairs meets that description, and the fixed
>> constant "" or (string) (the neutral element with regard to string
>> concatenation) not containing any allocated characters meets that
>> description.
>
> I think this is a very reasonable interpretation, but this is not in
> accordance with the standard.

Are you saying that (eq? (list) (list)) is not in accordance with the
standard since the standard specifies that a freshly allocated list is
to be returned?

>> So why treat them differently?  What does it buy us except trouble?
>
> I don't see how our current behavior buys us _any_ trouble.  We've
> voluntarily opted-out of a (marginal) optimization opportunity, and
> that's all.
>
> In your proposed behavior: in _almost_ all cases, `scm_from_stringn'
> (et al) would return an object that is not `eq?' to any other existing
> object.  However, in a single edge case, you'd have it return
> something that _is_ `eq?' to other existing objects.  This is the kind
> of behavior that could easily buy us trouble.

Why?  You can't change any other value _through_ it.  Do you want to use
(string) as a not-eq-to-anything sentinel like Lisp people do with
(list nil) sometimes?  It is known that (list) will not do for that
purpose (in spite of the standard saying that list will return a freshly
allocated list), so do you really think people will expect (string) to
do?

> To my mind, if the optimization is insignificant (and I suspect that
> it is), then it is safer to treat the edge cases the same as the
> common case, for the sake of simplifying the semantics.

You'll find yourself checking for "" more often in connection with
strings than for 0 in connection with numbers because "" is special in
that it contains no characters or other members.  So for me "" is a
prime candidate for a single-cell constant.
We can live with other objects like 0 not being eq to equal values, so
we certainly can with this one.

> However, my mind is not set in stone on this.  Does anyone else here
> agree with David?  Should we defend the legitimacy of this
> optimization, and ask the R7RS working group to include explicit
> language specifying that empty strings/vectors need not be freshly
> allocated?

They don't specify that empty lists need not be freshly allocated,
either, so it would be strange to make a difference here.

I think it makes more sense to define "freshly allocated" instead, as
"no pre-existing object can be modified through any operation on it".
That means that any single-cell constant is by definition "freshly
allocated".  And indeed, its _cell_ is freshly allocated even though
that cell _value_ may be eq? to that of other cells.

-- 
David Kastrup