From: Ken Raeburn
Newsgroups: gmane.lisp.guile.devel
Subject: Re: redoing SCM representation in 2.2
Date: Sun, 15 May 2011 05:00:07 -0400
Message-ID: <2932B3D9-7CE6-46B9-8A1E-51702E417D53@raeburn.org>
To: Andy Wingo
Cc: guile-devel

On May 12, 2011, at 06:17, Andy Wingo wrote:

> I'm looking at new SCM representation and tagging possibilities in 2.2.
> Read the whole mail please, as it's a little complicated.

Innnnteresting....

> I would like to revisit the SCM representation and tagging scheme in
> 2.2.  In particular, I would like to look at NaN-boxing.  I explain the
> plan a bit below, but if you like to get your information depth-first,
> check out:

So... Guile 2.2 won't work on the VAXstation in my basement, which
doesn't do IEEE math? :-(
(Not that I've powered it up in some time...)

Guess I hadn't thought about that before; we've got code that refers to
IEEE floating point already, so does that mean we require IEEE floating
point already?

On 64-bit SPARC and perhaps some other architectures, we'd be dependent
on the OS only effectively using 48 bits worth of address space even if
the hardware supports more.
I'd be surprised if we encounter a program that needs more storage than
that, and I expect most current OSes will tend to have a couple of
regions growing toward each other rather than scatter stuff all over the
address space, but I could imagine a particularly naïve or aggressive
form of address space layout randomization trying to take advantage of
all 64 bits by scattering mapped memory throughout all 2**64 addresses
(minus whatever the kernel uses), for libraries or heap allocation or
both.

> Basically I think it's OK to restrict the Scheme heap to be within a
> 48-bit space, at least for the next decade or so.  But given that the
> total address space is more than 48 bits on many architectures,
> arbitrary immediate foreign pointers may not be possible on
> e.g. Sparc64.

Nor the full range of (u)int64_t values we might get from a library.

Though, I'll just throw these out there:

If the 64-bit SCM type isn't required to represent the full range of
integer values the machine can support as immediate values, does it
really have to encompass the full range of "double" values?  Is that
really what we should be optimizing the encoding for?  Maybe for really
large or really tiny values it would be okay to use heap storage as we
do for bignums, and steal an exponent bit to use as a tag?  Or if you
steal a few mantissa bits, you lose a little precision but keep all the
exponent bits.  So you don't need to waste 13 bits on saying "this is
not a floating point value" all the time, and you can widen the range
permissible for immediate integer and pointer values.

How much range and precision do we need in floating point values,
anyways?  Is there a reason to use "double" and not "float" or "long
double"?
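
To make the "steal a few mantissa bits" variant above a bit more
concrete, here's a rough sketch in C.  Everything in it is made up for
illustration: val_t, the tag values, and the box_*/unbox_* helpers are
hypothetical names, not anything in Guile or in Andy's proposal.  It
assumes an 8-byte IEEE "double" and 8-byte-aligned heap objects, and it
simply truncates the three low mantissa bits rather than rounding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-bit value type; NOT Guile's actual SCM definition. */
typedef uint64_t val_t;

/* The low three bits are the tag.  Tag 0 means "a double whose three
   least-significant mantissa bits have been stolen". */
enum { TAG_BITS = 3, TAG_DOUBLE = 0, TAG_FIXNUM = 1, TAG_POINTER = 2 };
#define TAG_MASK ((uint64_t)((1u << TAG_BITS) - 1))

static unsigned tag_of(val_t v) { return (unsigned)(v & TAG_MASK); }

/* Keep sign, all exponent bits and the top 49 mantissa bits; the low
   three mantissa bits are truncated (not rounded).  Caveat: a NaN
   whose only set mantissa bits are the low three would turn into an
   infinity here; a real design would have to special-case that. */
static val_t box_double(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return (u & ~TAG_MASK) | TAG_DOUBLE;
}

static double unbox_double(val_t v)
{
    uint64_t u = v & ~TAG_MASK;
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* Immediate integers get 61 bits instead of the 32 or so left over
   by NaN-boxing. */
static val_t box_fixnum(int64_t n)
{
    return ((uint64_t)n << TAG_BITS) | TAG_FIXNUM;
}

static int64_t unbox_fixnum(val_t v)
{
    /* Clear the tag, then divide; the value is an exact multiple of 8,
       so this avoids right-shifting a negative number. */
    return (int64_t)(v & ~TAG_MASK) / (1 << TAG_BITS);
}

/* Pointers keep all their significant bits, so no 48-bit assumption
   is needed, only 8-byte alignment. */
static val_t box_pointer(void *p)
{
    uintptr_t a = (uintptr_t)p;
    assert((a & TAG_MASK) == 0);
    return (val_t)a | TAG_POINTER;
}

static void *unbox_pointer(val_t v)
{
    return (void *)(uintptr_t)(v & ~TAG_MASK);
}
```

With this layout a value like 1.5 survives a round trip exactly, while
0.1 comes back a couple of ulps low; that's the "little precision" you
give up in exchange for the wider integer and pointer range.
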
If "float" is acceptable (which I assume it's probably not; I'm just
exploring the idea "out loud" as it were), we could just encode an
intact "float" and a bunch of tag bits together in a 64-bit value, on
any machine where "float" is 32 bits, and it'd probably have the range
needed for a lot of everyday use.  Or, combine the ideas -- on a 32-bit
machine, use a 32-bit type, one bit indicates "this is a 'float' with
one exponent bit stolen", otherwise more tag bits indicate other
immediate or non-immediate types, and one of the non-immediate ones
encodes a full "double" when the wider range is needed.

> I think we need to do the JSC way, as it appears to be the only way to
> work with the BDW GC, currently anyway.  We will need some integration
> with the GC to ensure the 48-bit space, but that should be doable.

Don't we have some objects now which can be initialized statically by
the compiler, and for which the addresses get encoded directly into the
resulting SCM objects?  That means the mapping of executable and library
images would have to fit in the 48-bit address space, and that's
generally up to the OS kernel; having BDW-GC do some magic at allocation
time wouldn't be enough.

Ken