From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Ken Raeburn Newsgroups: gmane.lisp.guile.devel Subject: Re: redoing SCM representation in 2.2 Date: Thu, 19 May 2011 03:58:31 -0400 Message-ID: References: <2932B3D9-7CE6-46B9-8A1E-51702E417D53@raeburn.org> <72BFF702-0847-45A6-A433-4BAC00F5950B@raeburn.org> <87oc31p2zd.fsf@netris.org> NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable X-Trace: dough.gmane.org 1305791924 22214 80.91.229.12 (19 May 2011 07:58:44 GMT) X-Complaints-To: usenet@dough.gmane.org NNTP-Posting-Date: Thu, 19 May 2011 07:58:44 +0000 (UTC) Cc: Andy Wingo , guile-devel To: Mark H Weaver Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Thu May 19 09:58:40 2011 Return-path: Envelope-to: guile-devel@m.gmane.org Original-Received: from lists.gnu.org ([140.186.70.17]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1QMy7z-0000Rp-Qa for guile-devel@m.gmane.org; Thu, 19 May 2011 09:58:40 +0200 Original-Received: from localhost ([::1]:37677 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QMy7z-0007fy-DR for guile-devel@m.gmane.org; Thu, 19 May 2011 03:58:39 -0400 Original-Received: from eggs.gnu.org ([140.186.70.92]:35004) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QMy7x-0007fi-4g for guile-devel@gnu.org; Thu, 19 May 2011 03:58:38 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QMy7v-0001Qc-EX for guile-devel@gnu.org; Thu, 19 May 2011 03:58:37 -0400 Original-Received: from mail-qw0-f41.google.com ([209.85.216.41]:51392) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QMy7v-0001QI-9l for guile-devel@gnu.org; Thu, 19 May 2011 03:58:35 -0400 Original-Received: by qwa26 with SMTP id 26so1654598qwa.0 for ; Thu, 19 May 2011 00:58:33 -0700 (PDT) Original-Received: by 10.229.37.79 with SMTP id w15mr2156655qcd.180.1305791913694; Thu, 19 May 2011 00:58:33 -0700 (PDT) Original-Received: from [10.0.0.158] (c-24-128-48-142.hsd1.ma.comcast.net [24.128.48.142]) by mx.google.com with ESMTPS id s9sm1471897qco.0.2011.05.19.00.58.32 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 19 May 2011 00:58:32 -0700 (PDT) In-Reply-To: <87oc31p2zd.fsf@netris.org> X-Mailer: Apple Mail (2.1084) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Received-From: 209.85.216.41 X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Original-Sender: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.lisp.guile.devel:12504 Archived-At: On May 17, 2011, at 14:59, Mark H Weaver wrote: >> But the question I was after was, even if we want to use the full >> range of the values, do we need the entire set to be representable = *in >> immediate form*? I doubt the very largest and smallest values are >> used often, so making just those values use heap storage probably >> wouldn't be too costly in space. >=20 > In principle, I agree with you that it might be reasonable to cut the > exponent range of immediate doubles in half in order to gain an extra > bit of tagging information. Unfortunately, it seems to me that the > details of the IEEE 754 floating-point formats would make the required > checks rather expensive. >=20 > The exponent field of an IEEE 754 binary floating-point number is not > stored in twos-complement format, but is instead "biased", meaning = that > a constant is added to the true exponent of two, such that 1 = represents > the smallest representable exponent, and 2^EXPBITS-2 represents the > largest representable exponent. The worse problem is that a 0 in the > exponent field means that the number is a (signed) zero, and = 2^EXPBITS-1 > means that the number is either infinity of a NaN. So the most interesting exponent values to keep expressible as = immediates would include (hex) 000, 7ff, and values near 3ff and 400, = right? One way of looking at that is, in the top 3 exponent bits, four = of the eight possible patterns (000, 111, 011, 100) represent the most = interesting floating point values, and the rest can be relegated to heap = storage. If we leave it at that, actually, some of the really-small and = really-large numbers could be encoded immediately, but values of medium = magnitudes could not. Hmm... we'd want to map field values of 0, 7, 3, and 4 to one value, and = 1, 2, 5, and 6 to another. How about: "(field+1) & 2"? I think that = gives you 0 for 0/inf/nan and values with large positive or negative = exponents, and for values closer to 1 in magnitude, and gives 2 for = finite values with medium-range magnitudes. (I threw some code together = testing some powers of 10; it looks like 1.0e-231 and smaller, 1.0e-76 = through 1.0e77, and 1.0e232 and larger, along with 0/nan/inf, would get = classified together that way.) Make the "field+1" part of the encoding, and all those values encode in = a form that has that specific bit clear (or bits could be shuffled to = put it elsewhere); then values with the bit set would indicate = everything other than immediate doubles. A bit more arithmetic trickery might let you swap exponent 000 with 200 = and 7ff with 5ff, giving you only 0x3ff exponent values you'd want to = store immediately, same as above, but with only two of the exponent = values "stolen" for 0/inf/nan. This is all still only somewhat IEEE-754 dependent, in that it doesn't = require the existence of inf or nan values, it just twists things around = a bit so that the IEEE-754 encodings of those values would be among = those encoded as immediate values. If a machine like the Vax doesn't = support them, no big deal; the basic idea still works. But, what's the case we should optimize for? I'm guessing that in cases = where lots of floating point values are encoded and decoded, we're doing = lots of math or lots of I/O, and I suspect that 0/inf/nan are not nearly = as common as finite nonzero values in some truncated range. (And how = big a range of exponents do most applications need?) So a variant = encoding that uses magic values for 0/inf/nan that can be easily tested = for collectively, much like we do for #t/#f/#nil now, and heap encoding = for the largest-magnitude exponents, cuts in half the number of values = we need to be able to encode in immediate form using lots of bits, at = the cost of a few more magic constants, and a slightly slower = classify/encode/decode process because of the use of three encodings. = And we may be able to streamline usage slightly with a macro or inline = function to express "tell me if this is a double and if it is also = extract the value into this variable". I guess one important question is how important efficient use and = encoding of the full range of floating point values is compared to other = types. I think I'd rather have the wider space for integers and = pointers, personally, but I'm sure other people's work may have = different demands. And for that matter, I don't know if all that much = of the 64-bit integer data I ever deal with really is outside the 48-bit = range, though I could imagine issues coming up with bit masks. > Given this, I don't see an obvious way to steal a bit or two from the > exponent field without making the associated checks too expensive to = be > worth the benefits. I think at least two conditional branches would = be > required. Do you see a clever way around this? If so, can you please > post the details? Correctly predicted branches are generally cheap on most modern = architectures, I believe. So I'm not sure how costly it would really = be, compared to all the other stuff we do. It would cost a little more = than in Andy's proposal, I think. If the test you care about is, "is this a double", then with the first = form above I guess it would be implemented as "is bit X clear; if not, = does the tag indicate that it's a double stored in the heap". So, looks = like two tests to determine that it isn't a double, and they may or may = not be easy to combine into one test. The second form preserves more of = the value space for non-double values, but would probably take three = tests (bit X; 0/nan/inf magic constants; heap-double tag) to determine = that a value isn't a double. For a value that is a double, hopefully = the first test would catch it, most of the time. Ken=