From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Ken Raeburn <raeburn@raeburn.org>
Newsgroups: gmane.lisp.guile.devel
Subject: Re: redoing SCM representation in 2.2
Date: Thu, 19 May 2011 03:58:31 -0400
Message-ID: <B4447FDD-E0DD-453F-A255-4C099BAB18EC@raeburn.org>
References: <m3fwok2pez.fsf@unquote.localdomain>
	<2932B3D9-7CE6-46B9-8A1E-51702E417D53@raeburn.org>
	<m3r580aruq.fsf@unquote.localdomain>
	<72BFF702-0847-45A6-A433-4BAC00F5950B@raeburn.org>
	<87oc31p2zd.fsf@netris.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
X-Trace: dough.gmane.org 1305791924 22214 80.91.229.12 (19 May 2011 07:58:44 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Thu, 19 May 2011 07:58:44 +0000 (UTC)
Cc: Andy Wingo <wingo@pobox.com>, guile-devel <guile-devel@gnu.org>
To: Mark H Weaver <mhw@netris.org>
Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Thu May 19 09:58:40 2011
Return-path: <guile-devel-bounces+guile-devel=m.gmane.org@gnu.org>
Envelope-to: guile-devel@m.gmane.org
Original-Received: from lists.gnu.org ([140.186.70.17])
	by lo.gmane.org with esmtp (Exim 4.69)
	(envelope-from <guile-devel-bounces+guile-devel=m.gmane.org@gnu.org>)
	id 1QMy7z-0000Rp-Qa
	for guile-devel@m.gmane.org; Thu, 19 May 2011 09:58:40 +0200
Original-Received: from localhost ([::1]:37677 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <guile-devel-bounces+guile-devel=m.gmane.org@gnu.org>)
	id 1QMy7z-0007fy-DR
	for guile-devel@m.gmane.org; Thu, 19 May 2011 03:58:39 -0400
Original-Received: from eggs.gnu.org ([140.186.70.92]:35004)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <raeburn@raeburn.org>) id 1QMy7x-0007fi-4g
	for guile-devel@gnu.org; Thu, 19 May 2011 03:58:38 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <raeburn@raeburn.org>) id 1QMy7v-0001Qc-EX
	for guile-devel@gnu.org; Thu, 19 May 2011 03:58:37 -0400
Original-Received: from mail-qw0-f41.google.com ([209.85.216.41]:51392)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <raeburn@raeburn.org>) id 1QMy7v-0001QI-9l
	for guile-devel@gnu.org; Thu, 19 May 2011 03:58:35 -0400
Original-Received: by qwa26 with SMTP id 26so1654598qwa.0
	for <guile-devel@gnu.org>; Thu, 19 May 2011 00:58:33 -0700 (PDT)
Original-Received: by 10.229.37.79 with SMTP id w15mr2156655qcd.180.1305791913694;
	Thu, 19 May 2011 00:58:33 -0700 (PDT)
Original-Received: from [10.0.0.158] (c-24-128-48-142.hsd1.ma.comcast.net
	[24.128.48.142])
	by mx.google.com with ESMTPS id s9sm1471897qco.0.2011.05.19.00.58.32
	(version=TLSv1/SSLv3 cipher=OTHER);
	Thu, 19 May 2011 00:58:32 -0700 (PDT)
In-Reply-To: <87oc31p2zd.fsf@netris.org>
X-Mailer: Apple Mail (2.1084)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2)
X-Received-From: 209.85.216.41
X-BeenThere: guile-devel@gnu.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "Developers list for Guile,
	the GNU extensibility library" <guile-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/guile-devel>,
	<mailto:guile-devel-request@gnu.org?subject=unsubscribe>
List-Archive: </archive/html/guile-devel>
List-Post: <mailto:guile-devel@gnu.org>
List-Help: <mailto:guile-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/guile-devel>,
	<mailto:guile-devel-request@gnu.org?subject=subscribe>
Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org
Original-Sender: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.lisp.guile.devel:12504
Archived-At: <http://permalink.gmane.org/gmane.lisp.guile.devel/12504>

On May 17, 2011, at 14:59, Mark H Weaver wrote:
>> But the question I was after was, even if we want to use the full
>> range of the values, do we need the entire set to be representable =
*in
>> immediate form*?  I doubt the very largest and smallest values are
>> used often, so making just those values use heap storage probably
>> wouldn't be too costly in space.
>=20
> In principle, I agree with you that it might be reasonable to cut the
> exponent range of immediate doubles in half in order to gain an extra
> bit of tagging information.  Unfortunately, it seems to me that the
> details of the IEEE 754 floating-point formats would make the required
> checks rather expensive.
>=20
> The exponent field of an IEEE 754 binary floating-point number is not
> stored in twos-complement format, but is instead "biased", meaning =
that
> a constant is added to the true exponent of two, such that 1 =
represents
> the smallest representable exponent, and 2^EXPBITS-2 represents the
> largest representable exponent.  The worse problem is that a 0 in the
> exponent field means that the number is a (signed) zero, and =
2^EXPBITS-1
> means that the number is either infinity of a NaN.

So the most interesting exponent values to keep expressible as =
immediates would include (hex) 000, 7ff, and values near 3ff and 400, =
right?  One way of looking at that is, in the top 3 exponent bits, four =
of the eight possible patterns (000, 111, 011, 100) represent the most =
interesting floating point values, and the rest can be relegated to heap =
storage.  If we leave it at that, actually, some of the really-small and =
really-large numbers could be encoded immediately, but values of medium =
magnitudes could not.

Hmm... we'd want to map field values of 0, 7, 3, and 4 to one value, and =
1, 2, 5, and 6 to another.  How about: "(field+1) & 2"?  I think that =
gives you 0 for 0/inf/nan and values with large positive or negative =
exponents, and for values closer to 1 in magnitude, and gives 2 for =
finite values with medium-range magnitudes.  (I threw some code together =
testing some powers of 10; it looks like 1.0e-231 and smaller, 1.0e-76 =
through 1.0e77, and 1.0e232 and larger, along with 0/nan/inf, would get =
classified together that way.)

Make the "field+1" part of the encoding, and all those values encode in =
a form that has that specific bit clear (or bits could be shuffled to =
put it elsewhere); then values with the bit set would indicate =
everything other than immediate doubles.

A bit more arithmetic trickery might let you swap exponent 000 with 200 =
and 7ff with 5ff, giving you only 0x3ff exponent values you'd want to =
store immediately, same as above, but with only two of the exponent =
values "stolen" for 0/inf/nan.

This is all still only somewhat IEEE-754 dependent, in that it doesn't =
require the existence of inf or nan values, it just twists things around =
a bit so that the IEEE-754 encodings of those values would be among =
those encoded as immediate values.  If a machine like the Vax doesn't =
support them, no big deal; the basic idea still works.


But, what's the case we should optimize for?  I'm guessing that in cases =
where lots of floating point values are encoded and decoded, we're doing =
lots of math or lots of I/O, and I suspect that 0/inf/nan are not nearly =
as common as finite nonzero values in some truncated range.  (And how =
big a range of exponents do most applications need?)  So a variant =
encoding that uses magic values for 0/inf/nan that can be easily tested =
for collectively, much like we do for #t/#f/#nil now, and heap encoding =
for the largest-magnitude exponents, cuts in half the number of values =
we need to be able to encode in immediate form using lots of bits, at =
the cost of a few more magic constants, and a slightly slower =
classify/encode/decode process because of the use of three encodings.  =
And we may be able to streamline usage slightly with a macro or inline =
function to express "tell me if this is a double and if it is also =
extract the value into this variable".

I guess one important question is how important efficient use and =
encoding of the full range of floating point values is compared to other =
types.  I think I'd rather have the wider space for integers and =
pointers, personally, but I'm sure other people's work may have =
different demands.  And for that matter, I don't know if all that much =
of the 64-bit integer data I ever deal with really is outside the 48-bit =
range, though I could imagine issues coming up with bit masks.


> Given this, I don't see an obvious way to steal a bit or two from the
> exponent field without making the associated checks too expensive to =
be
> worth the benefits.  I think at least two conditional branches would =
be
> required.  Do you see a clever way around this?  If so, can you please
> post the details?

Correctly predicted branches are generally cheap on most modern =
architectures, I believe.  So I'm not sure how costly it would really =
be, compared to all the other stuff we do.  It would cost a little more =
than in Andy's proposal, I think.

If the test you care about is, "is this a double", then with the first =
form above I guess it would be implemented as "is bit X clear; if not, =
does the tag indicate that it's a double stored in the heap".  So, looks =
like two tests to determine that it isn't a double, and they may or may =
not be easy to combine into one test.  The second form preserves more of =
the value space for non-double values, but would probably take three =
tests (bit X; 0/nan/inf magic constants; heap-double tag) to determine =
that a value isn't a double.  For a value that is a double, hopefully =
the first test would catch it, most of the time.

Ken=