From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Ken Raeburn <raeburn@raeburn.org>
Newsgroups: gmane.lisp.guile.devel
Subject: Re: redoing SCM representation in 2.2
Date: Sun, 15 May 2011 05:00:07 -0400
Message-ID: <2932B3D9-7CE6-46B9-8A1E-51702E417D53@raeburn.org>
References: <m3fwok2pez.fsf@unquote.localdomain>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
X-Trace: dough.gmane.org 1305450023 5344 80.91.229.12 (15 May 2011 09:00:23 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Sun, 15 May 2011 09:00:23 +0000 (UTC)
Cc: guile-devel <guile-devel@gnu.org>
To: Andy Wingo <wingo@pobox.com>
Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Sun May 15 11:00:17 2011
Return-path: <guile-devel-bounces+guile-devel=m.gmane.org@gnu.org>
Envelope-to: guile-devel@m.gmane.org
Original-Received: from lists.gnu.org ([140.186.70.17])
	by lo.gmane.org with esmtp (Exim 4.69)
	(envelope-from <guile-devel-bounces+guile-devel=m.gmane.org@gnu.org>)
	id 1QLXBQ-0007mf-Ni
	for guile-devel@m.gmane.org; Sun, 15 May 2011 11:00:16 +0200
Original-Received: from localhost ([::1]:50559 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <guile-devel-bounces+guile-devel=m.gmane.org@gnu.org>)
	id 1QLXBQ-0003M9-BW
	for guile-devel@m.gmane.org; Sun, 15 May 2011 05:00:16 -0400
Original-Received: from eggs.gnu.org ([140.186.70.92]:42145)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <raeburn@raeburn.org>) id 1QLXBN-0003JX-4M
	for guile-devel@gnu.org; Sun, 15 May 2011 05:00:14 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <raeburn@raeburn.org>) id 1QLXBL-00015f-V2
	for guile-devel@gnu.org; Sun, 15 May 2011 05:00:13 -0400
Original-Received: from mail-qw0-f41.google.com ([209.85.216.41]:60039)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <raeburn@raeburn.org>) id 1QLXBL-00015T-RG
	for guile-devel@gnu.org; Sun, 15 May 2011 05:00:11 -0400
Original-Received: by qwa26 with SMTP id 26so2533765qwa.0
	for <guile-devel@gnu.org>; Sun, 15 May 2011 02:00:09 -0700 (PDT)
Original-Received: by 10.224.40.211 with SMTP id l19mr2516120qae.46.1305450009856;
	Sun, 15 May 2011 02:00:09 -0700 (PDT)
Original-Received: from [10.0.0.158] (c-24-128-48-142.hsd1.ma.comcast.net
	[24.128.48.142])
	by mx.google.com with ESMTPS id l10sm2497301qck.26.2011.05.15.02.00.08
	(version=TLSv1/SSLv3 cipher=OTHER);
	Sun, 15 May 2011 02:00:08 -0700 (PDT)
In-Reply-To: <m3fwok2pez.fsf@unquote.localdomain>
X-Mailer: Apple Mail (2.1084)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2)
X-Received-From: 209.85.216.41
X-BeenThere: guile-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Developers list for Guile,
	the GNU extensibility library" <guile-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/guile-devel>,
	<mailto:guile-devel-request@gnu.org?subject=unsubscribe>
List-Archive: </archive/html/guile-devel>
List-Post: <mailto:guile-devel@gnu.org>
List-Help: <mailto:guile-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/guile-devel>,
	<mailto:guile-devel-request@gnu.org?subject=subscribe>
Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org
Original-Sender: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.lisp.guile.devel:12493
Archived-At: <http://permalink.gmane.org/gmane.lisp.guile.devel/12493>

On May 12, 2011, at 06:17, Andy Wingo wrote:
> I'm looking at new SCM representation and tagging possibilities in 2.2.
> Read the whole mail please, as it's a little complicated.

Innnnteresting....

> I would like to revisit the SCM representation and tagging scheme in
> 2.2.  In particular, I would like to look at NaN-boxing.  I explain the
> plan a bit below, but if you like to get your information depth-first,
> check out:

So... Guile 2.2 won't work on the VAXstation in my basement, which
doesn't do IEEE math? :-(
(Not that I've powered it up in some time...)
Guess I hadn't thought about that before; we've got code that refers to
IEEE floating point already, so does that mean we require IEEE floating
point already?

On 64-bit SPARC and perhaps some other architectures, we'd be dependent
on the OS only effectively using 48 bits worth of address space even if
the hardware supports more.  I'd be surprised if we encounter a program
that needs more storage than that, and I expect most current OSes will
tend to have a couple of regions growing toward each other rather than
scatter stuff all over the address space, but I could imagine a
particularly naïve or aggressive form of address space layout
randomization trying to take advantage of all 64 bits by scattering
mapped memory throughout all 2**64 addresses (minus whatever the kernel
uses), for libraries or heap allocation or both.
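
A minimal sketch of the constraint being discussed, assuming a 48-bit
payload: an address can be stored as an immediate only if nothing above
bit 47 is set.  The names PAYLOAD_BITS, fits_in_payload and
pointer_fits_in_payload are hypothetical, not Guile API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the tagging scheme leaves 48 bits for a pointer payload. */
#define PAYLOAD_BITS 48

/* True if ADDR has no bits set above the 48-bit payload. */
static bool
fits_in_payload (uint64_t addr)
{
  return (addr >> PAYLOAD_BITS) == 0;
}

/* Same check for a C pointer, going through uintptr_t. */
static bool
pointer_fits_in_payload (const void *p)
{
  return fits_in_payload ((uint64_t) (uintptr_t) p);
}
```

Where mapped addresses actually land is up to the OS, so a check like
this could only detect the problem at run time, not prevent it.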


> Basically I think it's OK to restrict the Scheme heap to be within a
> 48-bit space, at least for the next decade or so.  But given that the
> total address space is more than 48 bits on many architectures,
> arbitrary immediate foreign pointers may not be possible on
> e.g. Sparc64.

Nor the full range of (u)int64_t values we might get from a library.

Though, I'll just throw these out there:

If the 64-bit SCM type isn't required to represent the full range of
integer values the machine can support as immediate values, does it
really have to encompass the full range of "double" values?  Is that
really what we should be optimizing the encoding for?  Maybe for really
large or really tiny values it would be okay to use heap storage as we
do for bignums, and steal an exponent bit to use as a tag?  Or if you
steal a few mantissa bits, you lose a little precision but keep all the
exponent bits.  So you don't need to waste 13 bits on saying "this is
not a floating point value" all the time, and you can widen the range
permissible for immediate integer and pointer values.
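
For reference, here is a rough sketch of where those 13 bits come from
in a plain NaN-boxing layout, assuming "double" is IEEE binary64: sign
bit, eleven exponent bits and the quiet-NaN bit all set mean "not a
double", which leaves 51 payload bits.  This is only an illustration;
the type name "word", the masks and the helper names are made up here
and are not the JSC or Guile encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t word;  /* hypothetical stand-in for a 64-bit SCM */

#define TAG_MASK      0xfff8000000000000ULL  /* top 13 bits */
#define PAYLOAD_MASK  0x0007ffffffffffffULL  /* low 51 bits */
#define CANONICAL_NAN 0x7ff8000000000000ULL  /* positive quiet NaN */

/* Store a double unchanged, except that every NaN is collapsed to one
   canonical bit pattern so it cannot collide with the tag.  */
static word
box_double (double d)
{
  word w;
  if (d != d)
    return CANONICAL_NAN;
  memcpy (&w, &d, sizeof w);
  return w;
}

/* Anything without all 13 tag bits set is a double.  */
static bool
is_double (word w)
{
  return (w & TAG_MASK) != TAG_MASK;
}

static double
unbox_double (word w)
{
  double d;
  assert (is_double (w));
  memcpy (&d, &w, sizeof d);
  return d;
}

/* Store up to 51 bits of non-double data under the 13-bit tag.  */
static word
box_payload (uint64_t bits)
{
  assert ((bits & ~PAYLOAD_MASK) == 0);
  return TAG_MASK | bits;
}

static uint64_t
unbox_payload (word w)
{
  assert (!is_double (w));
  return w & PAYLOAD_MASK;
}
```

Stealing an exponent or mantissa bit instead would shrink TAG_MASK and
widen PAYLOAD_MASK, at the cost of sending some doubles to the heap or
dropping low-order precision.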

How much range and precision do we need in floating point values,
anyways?  Is there a reason to use "double" and not "float" or "long
double"?  If "float" is acceptable (which I assume it's probably not;
I'm just exploring the idea "out loud" as it were), we could just encode
an intact "float" and a bunch of tag bits together in a 64-bit value, on
any machine where "float" is 32 bits, and it'd probably have the range
needed for a lot of everyday use.  Or, combine the ideas -- on a 32-bit
machine, use a 32-bit type, one bit indicates "this is a 'float' with
one exponent bit stolen", otherwise more tag bits indicate other
immediate or non-immediate types, and one of the non-immediate ones
encodes a full "double" when the wider range is needed.
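
The first variant (an intact 32-bit "float" plus tag bits in a 64-bit
word) could look roughly like the sketch below.  It assumes "float" is
32 bits wide; the tag value, the choice to keep the tag in the upper 32
bits, and all the names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t word;  /* hypothetical stand-in for a 64-bit SCM */

#define TAG_SHIFT 32
#define TAG_FLOAT 0x1u  /* hypothetical tag value for "immediate float" */

/* Low 32 bits hold the float's bits untouched; upper 32 hold the tag.  */
static word
box_float (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return ((word) TAG_FLOAT << TAG_SHIFT) | bits;
}

static bool
is_float (word w)
{
  return (uint32_t) (w >> TAG_SHIFT) == TAG_FLOAT;
}

static float
unbox_float (word w)
{
  uint32_t bits = (uint32_t) w;
  float f;
  assert (is_float (w));
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

Every "float" value, NaNs and infinities included, survives the round
trip because none of its bits are reused for tagging.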


> I think we need to do the JSC way, as it appears to be the only way to
> work with the BDW GC, currently anyway.  We will need some integration
> with the GC to ensure the 48-bit space, but that should be doable.

Don't we have some objects now which can be initialized statically by
the compiler, and for which the addresses get encoded directly into the
resulting SCM objects?  That means the mapping of executable and library
images would have to fit in the 48-bit address space, and that's
generally up to the OS kernel; having BDW-GC do some magic at allocation
time wouldn't be enough.

Ken