* redoing SCM representation in 2.2
@ 2011-05-12 10:17 Andy Wingo
  2011-05-12 11:12 ` nalaginrut
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Andy Wingo @ 2011-05-12 10:17 UTC (permalink / raw)
  To: guile-devel

Hello list,

I'm looking at new SCM representation and tagging possibilities in 2.2.
Read the whole mail please, as it's a little complicated.

The current system has a couple of problems:

  1) You can only tell a pair by dereferencing the pointer and checking
     the cell's low bits, and then we need complicated tricks to set the
     pair's bits on the value that is in the car.

     (This strategy was necessary with the old GC, but is not needed
     with the BDW GC.)

  2) You can only tell a procedure by dereferencing the pointer and
     checking the cell's word.

  3) Doubles are allocated on the heap.  This is, as my grandmother
     would say, turrible.

I would like to revisit the SCM representation and tagging scheme in
2.2.  In particular, I would like to look at NaN-boxing.  I explain the
plan a bit below, but if you'd like to get your information depth-first,
check out:

  http://blog.mozilla.com/rob-sayre/2010/08/02/mozillas-new-javascript-value-representation/

The concrete implementation strategy that I would like to use is
JSValue, from WebKit's JavaScriptCore (JSC):

  http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/runtime/JSValue.h
  http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/runtime/JSValueInlineMethods.h

(Almost makes you want C++, doesn't it? :)

                              *   *   *

Recall, a double is laid out like this:

  <- msb                                                      lsb ->

  sign: 1 bit
  | exponent: 11 bits
  | |           mantissa: 52 bits
  ------------------------------------------------------------------
  0 00000000000 0000000000000000000000000000000000000000000000000000

This particular value represents 0.0.  The special values +inf.0 and
-inf.0 are, respectively:

  0 11111111111 0000000000000000000000000000000000000000000000000000
  1 11111111111 0000000000000000000000000000000000000000000000000000

Any other bit pattern with all bits set in the exponent is a NaN.  But
in practice, hardware and software will only generate one particular NaN
value:

  > (number->string (bytevector-u64-native-ref #f64(+nan.0) 0) 2)
  $8 = "111111111111000000000000000000000000000000000000000000000000000"

Which is:

  0 11111111111 100000000000000000000000000000000000000000000000000

So in practice, we have 51 bits into which we can stuff data, if we
set the MSB of a NaN.  If we assume a 48-bit payload, then:

  1 11111111111 1XXX_______________________________________________

So 8 tags there.

We also have any other combination with the NaN prefix that sets at
least one mantissa bit.  If we assume a 48-bit payload, that gives us

  0 11111111111 XXXX_______________________________________________

with the values 1000 (the canonical NaN) and 0000 (infinity) reserved,
so 14 usable tags there.
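
To make the layout concrete, here is a rough C sketch of this kind of
encoding.  For simplicity it only uses the eight tags in the sign=1
quiet-NaN space; the tag values, helper names and canonicalization
strategy are made up for illustration -- this is not Guile's actual
scm_t_bits layout, just the idea described above:

  #include <stdint.h>
  #include <string.h>
  #include <stdbool.h>
  #include <math.h>

  typedef uint64_t scm_word;

  /* Sign bit + exponent all ones + quiet bit: the sign=1 quiet-NaN
     space that gives the 8 tags counted above.  */
  #define QNAN_PREFIX   0xFFF8000000000000ULL
  #define TAG_SHIFT     48
  #define PAYLOAD_MASK  0x0000FFFFFFFFFFFFULL   /* low 48 bits */

  static bool word_is_double (scm_word w)
  {
    /* Everything outside the tagged space is an ordinary double,
       including the one canonical NaN the hardware produces.  */
    return (w & QNAN_PREFIX) != QNAN_PREFIX;
  }

  static scm_word double_to_word (double d)
  {
    scm_word w;
    if (d != d)
      d = (double) NAN;   /* canonicalize every NaN to the positive quiet
                             NaN so no stored double lands in the tagged
                             space */
    memcpy (&w, &d, sizeof w);
    return w;
  }

  static double word_to_double (scm_word w)
  {
    double d;
    memcpy (&d, &w, sizeof d);   /* doubles are stored unmodified */
    return d;
  }

  static scm_word make_immediate (unsigned tag, uint64_t payload)
  {
    /* tag in 0..7, payload in the low 48 bits */
    return QNAN_PREFIX | ((scm_word) tag << TAG_SHIFT)
           | (payload & PAYLOAD_MASK);
  }

  static unsigned word_tag (scm_word w)   /* only if !word_is_double (w) */
  {
    return (unsigned) ((w >> TAG_SHIFT) & 0x7);
  }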

                           *   *   *

Now, if you're like me, you probably have two big questions that end in
exclamation marks:

 1) 64 bits?  That's wasting too much memory on 32-bit systems!

 2) 48 bits?  That's not enough for a pointer on 64-bit systems!

With regards to (1), I can only agree.  However, as it gives us more
tags, we can represent more things as immediates (besides doubles,
foreign pointers would be one example), and we can shave off the initial
word from some other structures, so perhaps that's not all bad.

Additionally, we can limit the payload to 32 bits on 32-bit systems, so
that a type check is a simple 32-bit load, and payload extraction is
also a 32-bit load, so that's probably faster than the current thing.
(This is the JSVALUE32_64 case in JSValue, linked above.  It has been
found that comparing 8-bit tags actually takes more time than comparing
32-bit tags; see https://bugzilla.mozilla.org/show_bug.cgi?id=549143 for
the gory details).
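
A rough sketch of what that split could look like on a 32-bit,
little-endian target (the tag constants are invented, JSC's actual
values differ, and a big-endian target would swap the two words):

  #include <stdint.h>
  #include <stdbool.h>

  /* The 64-bit cell viewed as a 32-bit tag word plus a 32-bit payload,
     so the type check and the payload extraction are each one 32-bit
     load.  Doubles keep their natural encoding; non-double tags live
     above any high word a (canonicalized) double can have.  */
  typedef union {
    double as_double;
    struct { uint32_t payload; uint32_t tag; } s;
  } scm_value32;

  #define TAG32_INT      0xFFFFFFFFu
  #define TAG32_POINTER  0xFFFFFFFEu
  #define TAG32_LOWEST   0xFFFFFFFEu

  static bool value32_is_double (scm_value32 v)
  {
    return v.s.tag < TAG32_LOWEST;        /* one 32-bit compare */
  }

  static scm_value32 value32_from_int (int32_t i)
  {
    scm_value32 v;
    v.s.tag = TAG32_INT;
    v.s.payload = (uint32_t) i;
    return v;
  }

  static int32_t value32_to_int (scm_value32 v)
  {
    return (int32_t) v.s.payload;         /* one 32-bit load */
  }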

Regarding (2), may I commend the following diagram to your perusal:

  http://en.wikipedia.org/wiki/X86-64#Virtual_address_space_details

Basically, current x86-64 chips restrict addresses to a 48-bit space.
Other architectures are not so restrictive:

  http://developers.sun.com/solaris/articles/solaris_memory.html

Note that on x86-64, Solaris will give user addresses in the
0xffff... range, unlike Linux or Windows, which only allocate the bottom
47 bits to userspace.  On Sparc64, there is no architecture-imposed VMA
hole as there is in x86-64.

Basically I think it's OK to restrict the Scheme heap to be within a
48-bit space, at least for the next decade or so.  But given that the
total address space is more than 48 bits on many architectures,
arbitrary immediate foreign pointers may not be possible on
e.g. Sparc64.

See also:

  https://bugzilla.mozilla.org/show_bug.cgi?id=577056


                           *   *   *

The difference between JSC's "nan-boxing" and SpiderMonkey's (SM, the
thing from Mozilla) "nun-boxing" is that JSC prefers the pointer
representation, and SM prefers the double representation.

So for the value 0.0, SM has

  0 00000000000 0000000000000000000000000000000000000000000000000000

And for a pointer to 0x0, SM has

  1 11111111111 1000000000000000000000000000000000000000000000000000

or something like that.  JSC on the other hand rotates the double space,
so that a pointer to 0x0 is

  0 00000000000 0000000000000000000000000000000000000000000000000000

and the double for 0.0 is something more like:

  1 10000000000 1000000000000000000000000000000000000000000000000000

I think we need to do the JSC way, as it appears to be the only way to
work with the BDW GC, currently anyway.  We will need some integration
with the GC to ensure the 48-bit space, but that should be doable.
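
Here is a sketch of the "rotated" encoding, using an illustrative
offset rather than JSC's exact constants.  Pointers are stored
unmodified, so a conservative collector scanning these words sees real
addresses; doubles are displaced upward so that no encoded double can
look like a 48-bit pointer:

  #include <stdint.h>
  #include <string.h>
  #include <stdbool.h>
  #include <math.h>

  typedef uint64_t scm_word;

  #define DOUBLE_OFFSET 0x0001000000000000ULL   /* 2^48, illustrative */

  static scm_word encode_pointer (void *p)
  {
    /* Stored as-is; the heap must be confined to the low 48 bits. */
    return (scm_word) (uintptr_t) p;
  }

  static bool word_is_pointer (scm_word w)
  {
    return w < DOUBLE_OFFSET;             /* high 16 bits all zero */
  }

  static scm_word encode_double (double d)
  {
    scm_word bits;
    if (d != d)
      d = (double) NAN;                   /* canonicalize NaNs first */
    memcpy (&bits, &d, sizeof bits);
    return bits + DOUBLE_OFFSET;          /* 0.0 no longer looks like NULL */
  }

  static double decode_double (scm_word w)
  {
    double d;
    scm_word bits = w - DOUBLE_OFFSET;
    memcpy (&d, &bits, sizeof d);
    return d;
  }

  static bool word_is_double (scm_word w)
  {
    /* Shifted doubles occupy high halves 0x0001..0xFFF1, leaving
       0xFFF2..0xFFFF free for other immediates (fixnums, say).  */
    return w >= DOUBLE_OFFSET && w <= 0xFFF1000000000000ULL;
  }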

With this encoding we might also be able to allocate an all-bits-set tag
to fixnums so that we can use the native overflow flags when doing
fxops.
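
One way to read the overflow-flag remark (a hypothetical sketch, not a
settled design): if fixnums carry an all-bits-set tag in the high 16
bits and the 48-bit payload is shifted to the top of the word for
arithmetic, then the CPU's ordinary 64-bit signed-overflow check
doubles as the 48-bit range check:

  #include <stdint.h>
  #include <stdbool.h>

  typedef uint64_t scm_word;

  #define FIXNUM_TAG 0xFFFF000000000000ULL   /* made-up tag value */

  static int64_t fixnum_value (scm_word w)
  {
    /* Shift the 48-bit payload up, then arithmetically back down to
       sign-extend it (assumes the usual two's-complement behaviour). */
    return ((int64_t) (w << 16)) >> 16;
  }

  static bool fixnum_add (scm_word a, scm_word b, scm_word *result)
  {
    int64_t ha = (int64_t) (a << 16);   /* payload in the top 48 bits */
    int64_t hb = (int64_t) (b << 16);
    int64_t hr;
    if (__builtin_add_overflow (ha, hb, &hr))
      return false;                     /* overflowed 48 bits: take the
                                           slow (bignum) path */
    *result = FIXNUM_TAG | (((uint64_t) hr >> 16) & 0x0000FFFFFFFFFFFFULL);
    return true;
  }

On x86-64, __builtin_add_overflow typically compiles to an add followed
by a test of the overflow flag, which is presumably the kind of thing
meant above.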

Note that for pointers to the heap, we also still have the lower three
bits to play with.

                           *   *   *

The plan is basically to give it a go and see what happens, in a
branch.  But I'm putting this mail out there so that our architectural
gurus can give it a look-over and point out any portability issues this
strategy might present.

Thanks for reading, and looking forward to the feedback!

Andy
-- 
http://wingolog.org/




* Re: redoing SCM representation in 2.2
  2011-05-12 10:17 redoing SCM representation in 2.2 Andy Wingo
@ 2011-05-12 11:12 ` nalaginrut
  2011-05-12 12:50 ` Stefan Israelsson Tampe
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: nalaginrut @ 2011-05-12 11:12 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

So we'll see a 48-bit solution in 2.2?
Sorry, I can't say I'm clear on it, since it's a long article.

-- 
GNU Powered it
GPL Protected it
GOD Blessed it

HFG - NalaGinrut

--hacker key--
v4sw7CUSMhw6ln6pr8OSFck4ma9u8MLSOFw3WDXGm7g/l8Li6e7t4TNGSb8AGORTDLMen6g6RASZOGCHPa28s1MIr4p-x hackerkey.com
---end key---





* Re: redoing SCM representation in 2.2
  2011-05-12 10:17 redoing SCM representation in 2.2 Andy Wingo
  2011-05-12 11:12 ` nalaginrut
@ 2011-05-12 12:50 ` Stefan Israelsson Tampe
  2011-05-13 22:08 ` Mark H Weaver
  2011-05-15  9:00 ` Ken Raeburn
  3 siblings, 0 replies; 13+ messages in thread
From: Stefan Israelsson Tampe @ 2011-05-12 12:50 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel


Looks interesting.

I would like to reserve an object for the stack machine I use.  Actually,
I would be fine modelling the Prolog engine on top of your suggestion if
it were not for one idea.

So you have a layout of this:
(X Tag14 Data Tag8)

Consider the following tag in Tag14, frame-tag.  Then Data is a frame
number and the next 8 bytes are an SCM value.
The idea is to use a set!-like function, like

(frame-set! FrameObj Val)

and use it like
(let ((fr (new-frame)))
  (frame-set! *fluid* 1)
  ...
  (frame-set! *fluid* 2)
  ...
  (frame-set! *fluid* 3)
  (unwind fr))


(define (new-frame)
  (set! *fr* (+ *fr* 1))
  *fr*)

So with mem layout,
*fluid* : [Fr SCM]
Fr      : [X frame-tag Data Tag8]
SCM     : [X Tag14     Data Tag8]

(define (frame-set! d val)
  ;; d.Fr.Data etc. are pseudocode for the fields in the layout above.
  (if (not (eq? *fr* d.Fr.Data))
      (begin
        (set! d.Fr.Data *fr*)
        (add-to-stack d)))
  (set! d.SCM val))

That is, we can manage an automatic selection between set-fluid! and
with-fluid semantics, and control it by using new-frame and unwind, which
is more logical in the Prolog context.  The tagging is useful because we
will then know directly that we are hitting an element of meta-information,
and the actual data can be reached in the bytes that follow.  How about
reserving one tag in Tag14 to represent (X Tag14-xtra TagX Data), where
Data is 32 bits, and allowing it to be used by applications?

/Stefan



* Re: redoing SCM representation in 2.2
  2011-05-12 10:17 redoing SCM representation in 2.2 Andy Wingo
  2011-05-12 11:12 ` nalaginrut
  2011-05-12 12:50 ` Stefan Israelsson Tampe
@ 2011-05-13 22:08 ` Mark H Weaver
  2011-05-14  9:47   ` Andy Wingo
  2011-05-15  9:00 ` Ken Raeburn
  3 siblings, 1 reply; 13+ messages in thread
From: Mark H Weaver @ 2011-05-13 22:08 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

Hi Andy,

Andy Wingo <wingo@pobox.com> writes:
> I'm looking at new SCM representation and tagging possibilities in 2.2.
> Read the whole mail please, as it's a little complicated.

Unfortunately I don't have time to write a proper response right now,
but on 32-bit architectures, I expect that this will nearly double the
memory usage for typical programs, where pointers are by far the most
common object.

Let's keep in mind that on modern architectures, efficient memory usage
is becoming increasingly important for good performance.  It is often
far more important than other optimizations.

NaN-boxing is probably a somewhat more reasonable choice for Javascript
implementations in web browsers for two reasons: (1) the only
first-class numbers supported by Javascript are floating-point numbers,
and (2) web browsers are already memory hogs, so doubling the memory
used by Javascript's first-class values won't be noticed as much.

> The current system has a couple of problems:
>
>   1) You can only tell a pair by dereferencing the pointer and checking
>      the cell's low bits, and then we need complicated tricks to set the
>      pair's bits on the value that is in the car.
>
>      (This strategy was necessary with the old GC, but is not needed
>      with the BDW GC.)
>
>   2) You can only tell a procedure by dereferencing the pointer and
>      checking the cell's word.
>
>   3) Doubles are allocated on the heap.  This is, as my grandmother
>      would say, turrible.

Of the three problems mentioned above, I find only the third to be
compelling, but still not sufficiently compelling to justify nearly
doubling memory usage of typical programs.

     Best,
      Mark




* Re: redoing SCM representation in 2.2
  2011-05-13 22:08 ` Mark H Weaver
@ 2011-05-14  9:47   ` Andy Wingo
  2011-05-15  9:02     ` Ken Raeburn
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Wingo @ 2011-05-14  9:47 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-devel

Hi Mark,

On Sat 14 May 2011 00:08, Mark H Weaver <mhw@netris.org> writes:

> Andy Wingo <wingo@pobox.com> writes:
>> I'm looking at new SCM representation and tagging possibilities in 2.2.
>> Read the whole mail please, as it's a little complicated.
>
> Unfortunately I don't have time to write a proper response right now,
> but on 32-bit architectures, I expect that this will nearly double the
> memory usage for typical programs, where pointers are by far the most
> common object.

You are right that it would be a negative point, though I don't think
that it's as bad as you say; there are potential savings in immediate
foreign pointers, shaving off type words from some heap values, etc.

However, I realized that this isn't going to work on 32-bit, and for an
unexpected reason: GC.  The problem is that the low 32 bits can be
interpreted as a pointer, so you need to tag those bits to make the
payloads of immediate values like integers or characters not confusable
with pointers, and that takes away any potential advantage (wider fixnum
range for example).

So, bummer.  NaN-boxing is probably only a good fit for 64-bit machines,
though.

Andy
-- 
http://wingolog.org/




* Re: redoing SCM representation in 2.2
  2011-05-12 10:17 redoing SCM representation in 2.2 Andy Wingo
                   ` (2 preceding siblings ...)
  2011-05-13 22:08 ` Mark H Weaver
@ 2011-05-15  9:00 ` Ken Raeburn
  2011-05-15 15:47   ` Andy Wingo
  3 siblings, 1 reply; 13+ messages in thread
From: Ken Raeburn @ 2011-05-15  9:00 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

On May 12, 2011, at 06:17, Andy Wingo wrote:
> I'm looking at new SCM representation and tagging possibilities in 2.2.
> Read the whole mail please, as it's a little complicated.

Innnnteresting....

> I would like to revisit the SCM representation and tagging scheme in
> 2.2.  In particular, I would like to look at NaN-boxing.  I explain the
> plan a bit below, but if you like to get your information depth-first,
> check out:

So... Guile 2.2 won't work on the VAXstation in my basement, which doesn't do IEEE math? :-(
(Not that I've powered it up in some time...)
Guess I hadn't thought about that before; we've got code that refers to IEEE floating point already, so does that mean we require IEEE floating point already?

On 64-bit SPARC and perhaps some other architectures, we'd be dependent on the OS only effectively using 48 bits worth of address space even if the hardware supports more.  I'd be surprised if we encounter a program that needs more storage than that, and I expect most current OSes will tend to have a couple of regions growing toward each other rather than scatter stuff all over the address space, but I could imagine a particularly naïve or aggressive form of address space layout randomization trying to take advantage of all 64 bits by scattering mapped memory throughout all 2**64 addresses (minus whatever the kernel uses), for libraries or heap allocation or both.


> Basically I think it's OK to restrict the Scheme heap to be within a
> 48-bit space, at least for the next decade or so.  But given that the
> total address space is more than 48 bits on many architectures,
> arbitrary immediate foreign pointers may not be possible on
> e.g. Sparc64.

Nor the full range of (u)int64_t values we might get from a library.

Though, I'll just throw these out there:

If the 64-bit SCM type isn't required to represent the full range of integer values the machine can support as immediate values, does it really have to encompass the full range of "double" values?  Is that really what we should be optimizing the encoding for?  Maybe for really large or really tiny values it would be okay to use heap storage as we do for bignums, and steal an exponent bit to use as a tag?  Or if you steal a few mantissa bits, you lose a little precision but keep all the exponent bits.  So you don't need to waste 13 bits on saying "this is not a floating point value" all the time, and you can widen the range permissible for immediate integer and pointer values.

How much range and precision do we need in floating point values, anyways?  Is there a reason to use "double" and not "float" or "long double"?  If "float" is acceptable (which I assume it's probably not; I'm just exploring the idea "out loud" as it were), we could just encode an intact "float" and a bunch of tag bits together in a 64-bit value, on any machine where "float" is 32 bits, and it'd probably have the range needed for a lot of everyday use.  Or, combine the ideas -- on a 32-bit machine, use a 32-bit type, one bit indicates "this is a 'float' with one exponent bit stolen", otherwise more tag bits indicate other immediate or non-immediate types, and one of the non-immediate ones encodes a full "double" when the wider range is needed.
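
Just to make that last thought experiment concrete, here is a
hypothetical encoding that keeps an intact 32-bit float plus a tag in
one 64-bit word (names and tag values are invented, and this is not a
proposal for Guile as-is):

  #include <stdint.h>
  #include <string.h>
  #include <stdbool.h>

  typedef uint64_t scm_word;

  #define TAG_FLONUM32 0x0001000000000000ULL   /* made-up tag */

  static scm_word encode_float32 (float f)
  {
    uint32_t bits;
    memcpy (&bits, &f, sizeof bits);      /* keep the float intact */
    return TAG_FLONUM32 | bits;
  }

  static bool word_is_float32 (scm_word w)
  {
    return (w >> 48) == 0x0001;
  }

  static float decode_float32 (scm_word w)
  {
    uint32_t bits = (uint32_t) w;
    float f;
    memcpy (&f, &bits, sizeof f);
    return f;
  }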


> I think we need to do the JSC way, as it appears to be the only way to
> work with the BDW GC, currently anyway.  We will need some integration
> with the GC to ensure the 48-bit space, but that should be doable.

Don't we have some objects now which can be initialized statically by the compiler, and for which the addresses get encoded directly into the resulting SCM objects?  That means the mapping of executable and library images would have to fit in the 48-bit address space, and that's generally up to the OS kernel; having BDW-GC do some magic at allocation time wouldn't be enough.

Ken



* Re: redoing SCM representation in 2.2
  2011-05-14  9:47   ` Andy Wingo
@ 2011-05-15  9:02     ` Ken Raeburn
  2011-05-15 15:35       ` Andy Wingo
  0 siblings, 1 reply; 13+ messages in thread
From: Ken Raeburn @ 2011-05-15  9:02 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Mark H Weaver, guile-devel

On May 14, 2011, at 05:47, Andy Wingo wrote:
> However, I realized that this isn't going to work on 32-bit, and for an
> unexpected reason: GC.  The problem is that the low 32-bits can be
> interpreted as a pointer, so you need to tag those bits to make the
> payloads of immediate values like integers or characters not confusable
> with pointers, and that takes away any potential advantage (wider fixnum
> range for example).

Is that really any more of an issue this way than with the current encoding -- if not for SCM, then for heap data structures including both SCM objects and integers or characters?  I thought the GC code already had to cope with things looking like they could be pointers but not actually corresponding to anything allocated via the GC library.

Ken



* Re: redoing SCM representation in 2.2
  2011-05-15  9:02     ` Ken Raeburn
@ 2011-05-15 15:35       ` Andy Wingo
  0 siblings, 0 replies; 13+ messages in thread
From: Andy Wingo @ 2011-05-15 15:35 UTC (permalink / raw)
  To: Ken Raeburn; +Cc: Mark H Weaver, guile-devel

On Sun 15 May 2011 11:02, Ken Raeburn <raeburn@raeburn.org> writes:

> Is that really any more of an issue this way than with the current
> encoding -- if not for SCM, then for heap data structures including both
> SCM objects and integers or characters?  I thought the GC code already
> had to cope with things looking like they could be pointers but not
> actually corresponding to anything allocated via the GC library.

It's not an issue for SCM values.  The Scheme stack has only Scheme
values, so there's no problem there either; and the C stack is small,
mostly ephemeral, and there is good blacklisting, so no problem there
either, AFAIK.  Heap values which contain no GC-controlled pointers are
allocated "atomically", which means the GC doesn't trace them.  So the
issue would only be with structures which are traced by GC and include
both, and there are very few of those.
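
For reference, a small illustration of the distinction being drawn
here, using libgc's public allocation macros (the struct types are
made-up examples, not Guile's):

  #include <gc.h>

  typedef struct { double value; } boxed_double;   /* no pointers inside */
  typedef struct { void *car, *cdr; } pair_cell;   /* must be traced */

  static boxed_double *alloc_boxed_double (double d)
  {
    /* "Atomic": the collector never scans this memory for pointers,
       so the raw double bits cannot be mistaken for a reference.  */
    boxed_double *p = GC_MALLOC_ATOMIC (sizeof *p);
    p->value = d;
    return p;
  }

  static pair_cell *alloc_pair (void *car, void *cdr)
  {
    /* Traced: the collector scans car and cdr and keeps whatever they
       point to alive.  */
    pair_cell *p = GC_MALLOC (sizeof *p);
    p->car = car;
    p->cdr = cdr;
    return p;
  }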

Regards,

Andy
-- 
http://wingolog.org/




* Re: redoing SCM representation in 2.2
  2011-05-15  9:00 ` Ken Raeburn
@ 2011-05-15 15:47   ` Andy Wingo
  2011-05-15 20:43     ` Ken Raeburn
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Wingo @ 2011-05-15 15:47 UTC (permalink / raw)
  To: Ken Raeburn; +Cc: guile-devel

On Sun 15 May 2011 11:00, Ken Raeburn <raeburn@raeburn.org> writes:

> So... Guile 2.2 won't work on the VAXstation in my basement, which
> doesn't do IEEE math? :-(
> (Not that I've powered it up in some time...)

I have no idea.

However... note that GCC obsoleted all vax ports but openbsd and netbsd
in 4.3, removed them in 4.4, and just obsoleted it on netbsd recently I
think.  Just saying :)

> Guess I hadn't thought about that before; we've got code that refers to
> IEEE floating point already, so does that mean we require IEEE floating
> point already?

Probably?  I hope so? :)  I think I'm too young to know about pre-IEEE
FP :)

> On 64-bit SPARC and perhaps some other architectures, we'd be dependent
> on the OS only effectively using 48 bits worth of address space even if
> the hardware supports more.  I'd be surprised if we encounter a program
> that needs more storage than that, and I expect most current OSes will
> tend to have a couple of regions growing toward each other rather than
> scatter stuff all over the address space, but I could imagine a
> particularly naïve or aggressive form of address space layout
> randomization trying to take advantage of all 64 bits by scattering
> mapped memory throughout all 2**64 addresses (minus whatever the kernel
> uses), for libraries or heap allocation or both.

Yes, indeed.

> If the 64-bit SCM type isn't required to represent the full range of
> integer values the machine can support as immediate values, does it
> really have to encompass the full range of "double" values?

No, but the nice thing about doubles is that it's a closed set.  Any
operation on a double produces a double.  Subsets do not have that
property.

>> I think we need to do the JSC way, as it appears to be the only way to
>> work with the BDW GC, currently anyway.  We will need some integration
>> with the GC to ensure the 48-bit space, but that should be doable.
>
> Don't we have some objects now which can be initialized statically by
> the compiler, and for which the addresses get encoded directly into the
> resulting SCM objects?

Yes, though this is optional.  We could whitelist some set of platforms
for which this can work.

FWIW I plan on moving objcode to be ELF in 2.2, which will mean we write
our own loader for ELF, so we would have similar concerns about mapping
the file in the right address range.

Regards,

Andy
-- 
http://wingolog.org/




* Re: redoing SCM representation in 2.2
  2011-05-15 15:47   ` Andy Wingo
@ 2011-05-15 20:43     ` Ken Raeburn
  2011-05-16  9:40       ` Andy Wingo
  2011-05-17 18:59       ` Mark H Weaver
  0 siblings, 2 replies; 13+ messages in thread
From: Ken Raeburn @ 2011-05-15 20:43 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-devel

On May 15, 2011, at 11:47, Andy Wingo wrote:
> However... note that GCC obsoleted all vax ports but openbsd and netbsd
> in 4.3, removed them in 4.4, and just obsoleted it on netbsd recently I
> think.  Just saying :)

I knew a bunch of ancient OSes for Vax were made obsolete in gcc, but last I looked, the source files were still in the tree.

The latest NetBSD release, NetBSD 5.1, included updates to the vax port.  They just introduced a 3-tier system for the ports; Vax is in the second tier -- not the main project focus, but still alive and community-supported.

> Probably?  I hope so? :)  I think I'm too young to know about pre-ieee
> FP :)

What are they teaching you kids today? :)

Seriously, at some point it probably is reasonable to ditch the really old platforms no one cares about any more (e.g., try to find free or open-source software for a machine without 8-bit bytes), but at the same time, there is a part of the community that wants to use some of the old hardware out there with modern software.  For some it may just be nostalgia, or the challenge of it, or maybe some have real needs for old hardware (supporting old but still-important devices via old bus architectures?).  For me, it's partly nostalgia (4.2/4.3BSD was how I learned UNIX in school), and partly trying to keep awareness of general porting issues.

>> If the 64-bit SCM type isn't required to represent the full range of
>> integer values the machine can support as immediate values, does it
>> really have to encompass the full range of "double" values?
> 
> No, but the nice thing about doubles is that it's a closed set.  Any
> operation on a double produces a double.  Subsets do not have that
> property.

Well, I think it's also a subset of "long double", but if you define the operations as overflowing rather than giving results in the long double range, then you've defined it as a closed set, sure.

But the question I was after was, even if we want to use the full range of the values, do we need the entire set to be representable *in immediate form*?  I doubt the very largest and smallest values are used often, so making just those values use heap storage probably wouldn't be too costly in space.  (The code might be a bit annoying, but like I said, it gets us lots of bits back for pointer/integer values.)  And if those largest and smallest values are used more than I suspect, that brings up the question of whether we might want the "long double" range available instead.

> FWIW I plan on moving objcode to be ELF in 2.2, which will mean we write
> our own loader for ELF, so we would have similar concerns about mapping
> the file in the right address range.

Ooh, also very interesting.  Though, I would think the win of using ELF would be the ability to treat it as a native object file and use native support, which only works on ELF systems.  Not Windows, not Mac OS X, probably not on some UNIX system still being sold, almost certainly not on some UNIX system still being used.  Why "ELF everywhere" rather than "native object file formats"?  Scheme is wonderful and all, but I think calling "dlopen" is probably much nicer than writing it from scratch in any language. :-)

If you have to write your own loader anyways, we could pick up some lessons from ELF, and maybe use something modeled on it, but I'm not sure ELF itself is best; it's probably got a lot of stuff we don't need.

Ken



* Re: redoing SCM representation in 2.2
  2011-05-15 20:43     ` Ken Raeburn
@ 2011-05-16  9:40       ` Andy Wingo
  2011-05-17 18:59       ` Mark H Weaver
  1 sibling, 0 replies; 13+ messages in thread
From: Andy Wingo @ 2011-05-16  9:40 UTC (permalink / raw)
  To: Ken Raeburn; +Cc: guile-devel

Hi Ken,

On Sun 15 May 2011 22:43, Ken Raeburn <raeburn@raeburn.org> writes:

>> FWIW I plan on moving objcode to be ELF in 2.2, which will mean we
>> write our own loader for ELF, so we would have similar concerns about
>> mapping the file in the right address range.
>
> Ooh, also very interesting.  Though, I would think the win of using ELF
> would be the ability to treat it as a native object file and use native
> support, which only works on ELF systems.

Actually that's only part of it.  The point I'm mainly interested in is
its extensibility.  The fact that it can interoperate with native tools
on GNU systems is a plus, but I want to use ELF even for bytecode.

> If you have to write your own loader anyways, we could pick up some
> lessons from ELF, and maybe use something modeled on it, but I'm not
> sure ELF itself is best; it's probably got a lot of stuff we don't need.

True, true; definitely something to figure out as we go.

Regards,

Andy
-- 
http://wingolog.org/




* Re: redoing SCM representation in 2.2
  2011-05-15 20:43     ` Ken Raeburn
  2011-05-16  9:40       ` Andy Wingo
@ 2011-05-17 18:59       ` Mark H Weaver
  2011-05-19  7:58         ` Ken Raeburn
  1 sibling, 1 reply; 13+ messages in thread
From: Mark H Weaver @ 2011-05-17 18:59 UTC (permalink / raw)
  To: Ken Raeburn; +Cc: Andy Wingo, guile-devel

Ken Raeburn <raeburn@raeburn.org> writes:
> On May 15, 2011, at 11:47, Andy Wingo wrote:
>> No, but the nice thing about doubles is that it's a closed set.  Any
>> operation on a double produces a double.  Subsets do not have that
>> property.
>
> Well, I think it's also a subset of "long double", but if you define
> the operations as overflowing rather than giving results in the long
> double range, then you've defined it as a closed set, sure.
>
> But the question I was after was, even if we want to use the full
> range of the values, do we need the entire set to be representable *in
> immediate form*?  I doubt the very largest and smallest values are
> used often, so making just those values use heap storage probably
> wouldn't be too costly in space.

In principle, I agree with you that it might be reasonable to cut the
exponent range of immediate doubles in half in order to gain an extra
bit of tagging information.  Unfortunately, it seems to me that the
details of the IEEE 754 floating-point formats would make the required
checks rather expensive.

The exponent field of an IEEE 754 binary floating-point number is not
stored in twos-complement format, but is instead "biased", meaning that
a constant is added to the true exponent of two, such that 1 represents
the smallest representable exponent, and 2^EXPBITS-2 represents the
largest representable exponent.  The worse problem is that a 0 in the
exponent field means that the number is a (signed) zero, and 2^EXPBITS-1
means that the number is either infinity or a NaN.
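
For concreteness, a little C snippet that prints the raw 11-bit
exponent field of a few doubles, showing the biased encoding just
described (0x000 for zero and subnormals, 0x3ff for 1.0, 0x7ff for
infinities and NaNs):

  #include <stdint.h>
  #include <string.h>
  #include <stdio.h>
  #include <math.h>

  static unsigned exponent_field (double d)
  {
    uint64_t bits;
    memcpy (&bits, &d, sizeof bits);
    return (unsigned) ((bits >> 52) & 0x7FF);   /* 11-bit exponent */
  }

  int main (void)
  {
    printf ("0.0  -> 0x%03x\n", exponent_field (0.0));       /* 0x000 */
    printf ("1.0  -> 0x%03x\n", exponent_field (1.0));       /* 0x3ff */
    printf ("2.0  -> 0x%03x\n", exponent_field (2.0));       /* 0x400 */
    printf ("+inf -> 0x%03x\n", exponent_field (INFINITY));  /* 0x7ff */
    printf ("nan  -> 0x%03x\n", exponent_field (NAN));       /* 0x7ff */
    return 0;
  }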

Given this, I don't see an obvious way to steal a bit or two from the
exponent field without making the associated checks too expensive to be
worth the benefits.  I think at least two conditional branches would be
required.  Do you see a clever way around this?  If so, can you please
post the details?

    Best,
     Mark




* Re: redoing SCM representation in 2.2
  2011-05-17 18:59       ` Mark H Weaver
@ 2011-05-19  7:58         ` Ken Raeburn
  0 siblings, 0 replies; 13+ messages in thread
From: Ken Raeburn @ 2011-05-19  7:58 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Andy Wingo, guile-devel

On May 17, 2011, at 14:59, Mark H Weaver wrote:
>> But the question I was after was, even if we want to use the full
>> range of the values, do we need the entire set to be representable *in
>> immediate form*?  I doubt the very largest and smallest values are
>> used often, so making just those values use heap storage probably
>> wouldn't be too costly in space.
> 
> In principle, I agree with you that it might be reasonable to cut the
> exponent range of immediate doubles in half in order to gain an extra
> bit of tagging information.  Unfortunately, it seems to me that the
> details of the IEEE 754 floating-point formats would make the required
> checks rather expensive.
> 
> The exponent field of an IEEE 754 binary floating-point number is not
> stored in twos-complement format, but is instead "biased", meaning that
> a constant is added to the true exponent of two, such that 1 represents
> the smallest representable exponent, and 2^EXPBITS-2 represents the
> largest representable exponent.  The worse problem is that a 0 in the
> exponent field means that the number is a (signed) zero, and 2^EXPBITS-1
> means that the number is either infinity or a NaN.

So the most interesting exponent values to keep expressible as immediates would include (hex) 000, 7ff, and values near 3ff and 400, right?  One way of looking at that is, in the top 3 exponent bits, four of the eight possible patterns (000, 111, 011, 100) represent the most interesting floating point values, and the rest can be relegated to heap storage.  If we leave it at that, actually, some of the really-small and really-large numbers could be encoded immediately, but values of medium magnitudes could not.

Hmm... we'd want to map field values of 0, 7, 3, and 4 to one value, and 1, 2, 5, and 6 to another.  How about: "(field+1) & 2"?  I think that gives you 0 for 0/inf/nan and values with large positive or negative exponents, and for values closer to 1 in magnitude, and gives 2 for finite values with medium-range magnitudes.  (I threw some code together testing some powers of 10; it looks like 1.0e-231 and smaller, 1.0e-76 through 1.0e77, and 1.0e232 and larger, along with 0/nan/inf, would get classified together that way.)

Make the "field+1" part of the encoding, and all those values encode in a form that has that specific bit clear (or bits could be shuffled to put it elsewhere); then values with the bit set would indicate everything other than immediate doubles.

A bit more arithmetic trickery might let you swap exponent 000 with 200 and 7ff with 5ff, giving you only 0x3ff exponent values you'd want to store immediately, same as above, but with only two of the exponent values "stolen" for 0/inf/nan.

This is all still only somewhat IEEE-754 dependent, in that it doesn't require the existence of inf or nan values, it just twists things around a bit so that the IEEE-754 encodings of those values would be among those encoded as immediate values.  If a machine like the Vax doesn't support them, no big deal; the basic idea still works.
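
A small sketch of the classification described above (helper names
invented): take the top three bits of the exponent field and compute
(field+1)&2; 0 means the value is one of the "interesting" ones that
would stay immediate, 2 means it would be boxed on the heap:

  #include <stdint.h>
  #include <string.h>

  static unsigned top3_exponent_bits (double d)
  {
    uint64_t bits;
    memcpy (&bits, &d, sizeof bits);
    return (unsigned) ((bits >> 60) & 0x7);   /* bits 60..62: top of the
                                                 11-bit exponent field */
  }

  static unsigned classify (double d)
  {
    /* 0 for exponent top bits 000, 011, 100, 111 (zero/subnormals,
       magnitudes near 1, inf/NaN): candidates for immediate encoding.
       2 for 001, 010, 101, 110: the medium-range magnitudes that would
       go to the heap.  */
    return (top3_exponent_bits (d) + 1) & 2;
  }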


But, what's the case we should optimize for?  I'm guessing that in cases where lots of floating point values are encoded and decoded, we're doing lots of math or lots of I/O, and I suspect that 0/inf/nan are not nearly as common as finite nonzero values in some truncated range.  (And how big a range of exponents do most applications need?)  So a variant encoding that uses magic values for 0/inf/nan that can be easily tested for collectively, much like we do for #t/#f/#nil now, and heap encoding for the largest-magnitude exponents, cuts in half the number of values we need to be able to encode in immediate form using lots of bits, at the cost of a few more magic constants, and a slightly slower classify/encode/decode process because of the use of three encodings.  And we may be able to streamline usage slightly with a macro or inline function to express "tell me if this is a double and if it is also extract the value into this variable".

I guess one important question is how important efficient use and encoding of the full range of floating point values is compared to other types.  I think I'd rather have the wider space for integers and pointers, personally, but I'm sure other people's work may have different demands.  And for that matter, I don't know if all that much of the 64-bit integer data I ever deal with really is outside the 48-bit range, though I could imagine issues coming up with bit masks.


> Given this, I don't see an obvious way to steal a bit or two from the
> exponent field without making the associated checks too expensive to be
> worth the benefits.  I think at least two conditional branches would be
> required.  Do you see a clever way around this?  If so, can you please
> post the details?

Correctly predicted branches are generally cheap on most modern architectures, I believe.  So I'm not sure how costly it would really be, compared to all the other stuff we do.  It would cost a little more than in Andy's proposal, I think.

If the test you care about is, "is this a double", then with the first form above I guess it would be implemented as "is bit X clear; if not, does the tag indicate that it's a double stored in the heap".  So, looks like two tests to determine that it isn't a double, and they may or may not be easy to combine into one test.  The second form preserves more of the value space for non-double values, but would probably take three tests (bit X; 0/nan/inf magic constants; heap-double tag) to determine that a value isn't a double.  For a value that is a double, hopefully the first test would catch it, most of the time.

Ken


