unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* Immediate doubles (up to 2^256) and rationals coming to Guile 3
@ 2019-06-06  9:40 Mark H Weaver
  2019-06-06 12:56 ` Mark H Weaver
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Mark H Weaver @ 2019-06-06  9:40 UTC (permalink / raw)
  To: guile-devel

I've found a way to efficiently support both immediate IEEE binary-64
doubles up to ~1.158e77 (with larger ones transparently allocated on the
heap), and also immediate exact rationals with up to 54 binary digits
(~16 decimal digits), without restricting the 64-bit pointer space at
all, and without any loss of arithmetic precision.

I have a working draft implementation that roughly doubles the speed of
a simple "substract 1.0 until negative" loop for inexact reals less than
2^256, compared with current 'master' (near 2.9.2).  The same loop for
exact rationals runs in ~70% of the time compared with current master.
More importantly, no heap allocation is required when working with these
immediate values.

More precisely, large finite doubles with magnitude >= 2^256
(1.157920892373162e77) must be heap allocated.  All other doubles,
including all Inf/NaNs, are immediate floats, or "iflos".

I call this approach "high-exponent boxing", because instead of using
the space of ~2^53 NaNs for non-flonum purposes, I use the space of
finite doubles with binary exponent larger than 255.  That's 3/8ths of
the total space of IEEE doubles.  This gives us a total of (3 * 2^61)
SCM values to use for other things.

To convert a double into an SCM, just add (2^60 + 2^52) to its unsigned
64-bit representation, and rotate left 4 bits.  This maps the largest
finite doubles to SCM values with 000, 110, or 111 in the low 3 bits.

* * *

I've also implemented immediate exact rationals, or "fixrats".  The
precise rule is that on a 64-bit system, an exact non-integer rational
is immediate if and only if it can be written with 54 binary digits or
less.  In terms of decimal digits, any rational that can be written with
16 decimal digits is immediate, and some that require 17 or 18 decimal
digits are immediate.

Here's the format of fixrats on 64-bit systems:

  Srrrrrrxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx0111
  |\____/\___________________________________________________/\__/
  |  |                          |                              |
  | rank (6 bits)             data (53 bits)                  tag
 sign

The 53-bit data field contains both the numerator and denominator, with
the denominator in the highest bits.  The most significant bit of the
denominator is implicit, i.e. not explicitly represented.  That's why we
can represent up to 54 binary digits in a 53-bit field.  The remaining
bits of the denominator occupy the most significant data bits, and the
magnitude of the numerator occupy the least significant data bits.

The 6-bit rank is 2 less than the integer length of the denominator,
i.e. 1 less than the number of data bits allocated to the denominator.

I chose this representation because it allows us to leverage existing
floating-point hardware to efficiently pack fixrats.  Simply convert the
denominator to an IEEE double, and now the rank will be in the low bits
of the IEEE exponent field, immediately adjacent to the denominator with
its high bit removed.  This simplifies the packing operation.

There's also a nice way to extract the denominator from a fixrat: mask
out the sign bit, shift right 5 bits, and interpret it as an IEEE
double.  The denominator will be the integer part of the resulting
value, with the numerator in the fraction bits.  Simply cast this double
to an integer to discard the numerator bits.

* * *

Here are the tags used in my draft implementation:

;; /* with iflos:   xxx:  iflo (000 < xxx < 110)
;;    (64-bit)     1111:  fixnum
;;                 0111:  fixrat
;;
;;                  000:  heap object
;;             tttt0110:  immediate non-number
;;                 1110:  [NOT_SCM]
;;                11110:  [NOT_SCM] struct tag
;;          ttttt101110:  [NOT_SCM] non-pair non-struct non-smob tag
;;     ttttttttx1001110:  [NOT_SCM] smob

and here's my plan for 32-bit systems, not yet implemented:

;; without iflos:     1:  fixnum
;;    (32-bit)       10:  fixrat
;;
;;                  000:  heap object
;;             tttt0100:  immediate non-number
;;                 1100:  [NOT_SCM]
;;                11100:  [NOT_SCM] struct tag
;;          ttttt101100:  [NOT_SCM] non-pair non-struct non-smob tag
;;     ttttttttx1001100:  [NOT_SCM] smob                              */

I'll have more to say about all of this in future messages, and of
course I welcome your comments and suggestions.  For now, I've pushed my
preliminary work to the 'wip-new-tagging' branch on Savannah.

  https://git.savannah.gnu.org/cgit/guile.git/log/?h=wip-new-tagging

Give it a try and let me know what you think.

      Mark



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
  2019-06-06  9:40 Immediate doubles (up to 2^256) and rationals coming to Guile 3 Mark H Weaver
@ 2019-06-06 12:56 ` Mark H Weaver
  2019-06-06 19:37 ` Arne Babenhauserheide
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Mark H Weaver @ 2019-06-06 12:56 UTC (permalink / raw)
  To: guile-devel

Earlier I wrote:

> There's also a nice way to extract the denominator from a fixrat: mask
> out the sign bit, shift right 5 bits, and interpret it as an IEEE
> double.  The denominator will be the integer part of the resulting
> value, with the numerator in the fraction bits.  Simply cast this double
> to an integer to discard the numerator bits.

Sorry, I forgot a step in the description above.  You must also set the
most significant bit of the biased exponent field after shifting right.

> Here are the tags used in my draft implementation:
>
> ;; /* with iflos:   xxx:  iflo (000 < xxx < 110)
> ;;    (64-bit)     1111:  fixnum
> ;;                 0111:  fixrat
> ;;
> ;;                  000:  heap object
> ;;             tttt0110:  immediate non-number
> ;;                 1110:  [NOT_SCM]
> ;;                11110:  [NOT_SCM] struct tag
> ;;          ttttt101110:  [NOT_SCM] non-pair non-struct non-smob tag
> ;;     ttttttttx1001110:  [NOT_SCM] smob

I should also mention that although the current patch set adds about 4
bits to the size of all heap tags, e.g. changing most tc7 tags to tc11,
I can see a way to avoid this: we could tag pairs in the low bits of
their pointers, instead of in their CARs.

More concretely, 1110 would become the "pair pointer" tag, thus
eliminating the need for the 1110 "NOT_SCM" tag (a.k.a. the non-pair
heap object tag).  This would eliminate the need to change the heap tags
at all, and dramatically reduce the size of my patch set.  Moreover, the
low bit would no longer need to be 1, so we would have many more tc7
tags to work with.

       Mark



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
  2019-06-06  9:40 Immediate doubles (up to 2^256) and rationals coming to Guile 3 Mark H Weaver
  2019-06-06 12:56 ` Mark H Weaver
@ 2019-06-06 19:37 ` Arne Babenhauserheide
  2019-06-12  0:03   ` Mark H Weaver
  2019-06-06 20:09 ` tomas
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Arne Babenhauserheide @ 2019-06-06 19:37 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 2609 bytes --]


Mark H Weaver <mhw@netris.org> writes:
> I have a working draft implementation that roughly doubles the speed of
> a simple "substract 1.0 until negative" loop for inexact reals less than
> 2^256, compared with current 'master' (near 2.9.2).  The same loop for
> exact rationals runs in ~70% of the time compared with current master.
> More importantly, no heap allocation is required when working with these
> immediate values.

> More precisely, large finite doubles with magnitude >= 2^256
> (1.157920892373162e77) must be heap allocated.  All other doubles,
> including all Inf/NaNs, are immediate floats, or "iflos".
>
> I call this approach "high-exponent boxing", because instead of using
> the space of ~2^53 NaNs for non-flonum purposes, I use the space of
> finite doubles with binary exponent larger than 255.  That's 3/8ths of
> the total space of IEEE doubles.  This gives us a total of (3 * 2^61)
> SCM values to use for other things.
> I've also implemented immediate exact rationals, or "fixrats".  The
> precise rule is that on a 64-bit system, an exact non-integer rational
> is immediate if and only if it can be written with 54 binary digits or
> less.  In terms of decimal digits, any rational that can be written with
> 16 decimal digits is immediate, and some that require 17 or 18 decimal
> digits are immediate.

Wow, that’s awesome!

> I should also mention that although the current patch set adds about 4
> bits to the size of all heap tags, e.g. changing most tc7 tags to tc11,
> I can see a way to avoid this: we could tag pairs in the low bits of
> their pointers, instead of in their CARs.
>
> More concretely, 1110 would become the "pair pointer" tag, thus
> eliminating the need for the 1110 "NOT_SCM" tag (a.k.a. the non-pair
> heap object tag).  This would eliminate the need to change the heap tags
> at all, and dramatically reduce the size of my patch set.  Moreover, the
> low bit would no longer need to be 1, so we would have many more tc7
> tags to work with.

I don’t understand the implications of this, but I realized that I would
love to have info about the sizes of different datastructures in Guile.

I recently started measuring how much memory is required for a record
compared to a list, so I could give people concrete guidelines which
datastructures to use for which algorithm, and for that such low-level
details would be great to have! (or just to know where to read them up —
or which keywords to look for)

Best wishes,
Arne
--
Unpolitisch sein
heißt politisch sein
ohne es zu merken

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 1076 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
  2019-06-06  9:40 Immediate doubles (up to 2^256) and rationals coming to Guile 3 Mark H Weaver
  2019-06-06 12:56 ` Mark H Weaver
  2019-06-06 19:37 ` Arne Babenhauserheide
@ 2019-06-06 20:09 ` tomas
  2019-06-07 19:46 ` Ludovic Courtès
  2019-06-08  1:13 ` Mark H Weaver
  4 siblings, 0 replies; 18+ messages in thread
From: tomas @ 2019-06-06 20:09 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 500 bytes --]

On Thu, Jun 06, 2019 at 05:40:39AM -0400, Mark H Weaver wrote:
> I've found a way to efficiently support both immediate IEEE binary-64
> doubles up to ~1.158e77 (with larger ones transparently allocated on the
> heap), and also immediate exact rationals with up to 54 binary digits
> (~16 decimal digits), without restricting the 64-bit pointer space at
> all, and without any loss of arithmetic precision.

[...]

Impressive bitfiddling feat. Thanks for the enjoyable explanation, too!

Cheers
-- t

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
  2019-06-06  9:40 Immediate doubles (up to 2^256) and rationals coming to Guile 3 Mark H Weaver
                   ` (2 preceding siblings ...)
  2019-06-06 20:09 ` tomas
@ 2019-06-07 19:46 ` Ludovic Courtès
  2019-06-09 16:56   ` Mark H Weaver
  2019-06-08  1:13 ` Mark H Weaver
  4 siblings, 1 reply; 18+ messages in thread
From: Ludovic Courtès @ 2019-06-07 19:46 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Andy Wingo, guile-devel

Hi Mark,

Mark H Weaver <mhw@netris.org> skribis:

> I've found a way to efficiently support both immediate IEEE binary-64
> doubles up to ~1.158e77 (with larger ones transparently allocated on the
> heap), and also immediate exact rationals with up to 54 binary digits
> (~16 decimal digits), without restricting the 64-bit pointer space at
> all, and without any loss of arithmetic precision.

Woow, impressive, and really clever!

As Dave Thompson wrote on IRC yesterday, this can make a significant
difference for applications such as graphics and game engines.  I hadn’t
read the message yet and thought “hey, why not make instructions for
things like trigonometric functions so you get unboxed floats” but
obviously, as Dave pointed out, that wouldn’t scale well.  :-)

> Here's the format of fixrats on 64-bit systems:
>
>   Srrrrrrxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx0111
>   |\____/\___________________________________________________/\__/
>   |  |                          |                              |
>   | rank (6 bits)             data (53 bits)                  tag
>  sign

[...]

> I chose this representation because it allows us to leverage existing
> floating-point hardware to efficiently pack fixrats.  Simply convert the
> denominator to an IEEE double, and now the rank will be in the low bits
> of the IEEE exponent field, immediately adjacent to the denominator with
> its high bit removed.  This simplifies the packing operation.

Fun.  :-)

Fixrats can make rationals more practical in applications where one
would have avoided them for performance.

IIUC, your plan is to have a different tagging on 32-bit platforms,
without fixflos, right?  I’m curious to see how much complexity would
entail from that.

I don’t know what Andy thinks, but if there’s a good way forward for
both 32-bit and 64-bit, it’d be a nice bonus for 3.0!

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
  2019-06-06  9:40 Immediate doubles (up to 2^256) and rationals coming to Guile 3 Mark H Weaver
                   ` (3 preceding siblings ...)
  2019-06-07 19:46 ` Ludovic Courtès
@ 2019-06-08  1:13 ` Mark H Weaver
  2019-06-08  8:07   ` Arne Babenhauserheide
  4 siblings, 1 reply; 18+ messages in thread
From: Mark H Weaver @ 2019-06-08  1:13 UTC (permalink / raw)
  To: guile-devel

Mark H Weaver <mhw@netris.org> writes:

> Here's the format of fixrats on 64-bit systems:
>
>   Srrrrrrxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx0111
>   |\____/\___________________________________________________/\__/
>   |  |                          |                              |
>   | rank (6 bits)             data (53 bits)                  tag
>  sign

I've since thought of another way to represent fixrats, which allow
simpler packing and unpacking operations, although it would also mean
sacrificing one bit of precision:

   rrrrrrxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxS00111
   \____/\__________________________________________________/|\___/
     |                          |                            |  |
    rank (6 bits)             data (52 bits)              sign  tag

In this new representation, the positions of the numerator and
denominator are swapped.  Now, the numerator occupies the
most-significant data bits (with its most-significant bit removed), and
the denominator occupies the lowest data bits.  The 6-bit rank field is
now 1 less than the integer-length of the numerator, i.e. the rank is
equal to the number of data bits allocated to the numerator.

This allows for an elegant packing operation:

  if (SCM_I_INUMP (numerator) && SCM_I_INUMP (denominator))
    {
      union { double f; uint64_t u; } u;
      u.f  = SCM_I_INUM (numerator);
      u.u += SCM_I_INUM (denominator);
      if ((scm_t_inum) u.f == SCM_I_INUM (numerator))
        {
          u.u += 0xc010000000000000;
          u.u = (u.u << 6) | (u.u >> 58);
          if ((u.u & 0x1f) == 0)
            return SCM_PACK (u.u | scm_fixrat_tag);
        }
    }

We start by converting the numerator into an IEEE double.  We then use
unsigned integer addition to add the denominator to the 64-bit
representation of that double.  We now check that this new IEEE double
has integer part equal to the numerator.  If it doesn't, this indicates
either that the numerator is too large to be represented exactly, or
that the denominator is too large to fit in the remaining bits.

The only thing that remains to be done at this point is to rotate left 6
bits, so that the 5 highest exponent bits become the lowest 5 bits,
where the fixrat tag will go, and to add a value which adjusts the IEEE
biased exponent field to be 0 when the numerator is 1 or -1.

In the code above, I make one final check to make sure the low 5 bits
are zeroes, before applying the 5-bit tag.  However, it's possible that
this final test might be safely omitted.  I'll need to take a careful
look at that.

On 32-bit systems, we can use the same approach, but using IEEE singles
(C floats) instead of doubles, and with a 3-bit tag.  This would allow
fixrats that can be written with 24 binary digits or less.

I went ahead and implemented this, and it is _marginally_ faster than
the earlier version, but not as much as I hoped for.  So, I'm not yet
sure whether to make the switch, but I wanted to post about it anyway.

       Mark



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
  2019-06-08  1:13 ` Mark H Weaver
@ 2019-06-08  8:07   ` Arne Babenhauserheide
  2019-06-08  9:08     ` Chris Vine
  0 siblings, 1 reply; 18+ messages in thread
From: Arne Babenhauserheide @ 2019-06-08  8:07 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 1960 bytes --]


Mark H Weaver <mhw@netris.org> writes:

> Mark H Weaver <mhw@netris.org> writes:
>
>> Here's the format of fixrats on 64-bit systems:
> This allows for an elegant packing operation:
>
>   if (SCM_I_INUMP (numerator) && SCM_I_INUMP (denominator))
>     {
>       union { double f; uint64_t u; } u;
>       u.f  = SCM_I_INUM (numerator);
>       u.u += SCM_I_INUM (denominator);

Wow, I didn’t know that you could do that.

However: "The details of that allocation are implementation-defined, and
it's undefined behavior to read from the member of the union that wasn't
most recently written." https://en.cppreference.com/w/cpp/language/union

Can you guarantee that this works?

>       if ((scm_t_inum) u.f == SCM_I_INUM (numerator))
>         {
>           u.u += 0xc010000000000000;
>           u.u = (u.u << 6) | (u.u >> 58);
>           if ((u.u & 0x1f) == 0)
>             return SCM_PACK (u.u | scm_fixrat_tag);
>         }
>     }
>
> We start by converting the numerator into an IEEE double.  We then use
> unsigned integer addition to add the denominator to the 64-bit
> representation of that double.  We now check that this new IEEE double
> has integer part equal to the numerator.  If it doesn't, this indicates
> either that the numerator is too large to be represented exactly, or
> that the denominator is too large to fit in the remaining bits.
>
> The only thing that remains to be done at this point is to rotate left 6
> bits, so that the 5 highest exponent bits become the lowest 5 bits,
> where the fixrat tag will go, and to add a value which adjusts the IEEE
> biased exponent field to be 0 when the numerator is 1 or -1.

It is really cool to read these deep details — is there a chance that
when this lands you could re-vamp the emails you wrote here into a
blog-post we can easily link to?

Best wishes,
Arne
--
Unpolitisch sein
heißt politisch sein
ohne es zu merken

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 1076 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
  2019-06-08  8:07   ` Arne Babenhauserheide
@ 2019-06-08  9:08     ` Chris Vine
  2019-06-08  9:46       ` Arne Babenhauserheide
  2019-06-08 13:12       ` Hans Åberg
  0 siblings, 2 replies; 18+ messages in thread
From: Chris Vine @ 2019-06-08  9:08 UTC (permalink / raw)
  To: guile-devel

On Sat, 08 Jun 2019 10:07:45 +0200
Arne Babenhauserheide <arne_bab@web.de> wrote:
[snip]
> Wow, I didn’t know that you could do that.
> 
> However: "The details of that allocation are implementation-defined, and
> it's undefined behavior to read from the member of the union that wasn't
> most recently written." https://en.cppreference.com/w/cpp/language/union
> 
> Can you guarantee that this works?

This is C and not C++ and the provision to which you refer does not
apply.

Reading from a member of a union other than the one last written to is
implementation defined in C89/90, and defined in C99 (with Technical
Corrigendum 3) and above, although it might include a trap
representation (but wouldn't on any platform supported by guile).  You
might want to see in particular footnote 95 of C11 (which isn't
normative but is intended to explain the provisions of §6.2.6.1 which
are).

gcc and clang have always supported type punning through unions.

Chris



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
  2019-06-08  9:08     ` Chris Vine
@ 2019-06-08  9:46       ` Arne Babenhauserheide
  2019-06-08 10:24         ` Chris Vine
  2019-06-08 13:12       ` Hans Åberg
  1 sibling, 1 reply; 18+ messages in thread
From: Arne Babenhauserheide @ 2019-06-08  9:46 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 873 bytes --]


Chris Vine <vine35792468@gmail.com> writes:

> On Sat, 08 Jun 2019 10:07:45 +0200
> Arne Babenhauserheide <arne_bab@web.de> wrote:
> [snip]
>> Wow, I didn’t know that you could do that.
>> 
>> However: "The details of that allocation are implementation-defined, and
>> it's undefined behavior to read from the member of the union that wasn't
>> most recently written." https://en.cppreference.com/w/cpp/language/union
>> 
>> Can you guarantee that this works?
>
> This is C and not C++ and the provision to which you refer does not
> apply.
>
> Reading from a member of a union other than the one last written to is
> implementation defined in C89/90, and defined in C99 (with Technical
> Corrigendum 3) and above

Thank you for the correction and explanation!

Best wishes,
Arne
-- 
Unpolitisch sein
heißt politisch sein
ohne es zu merken

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 1076 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
  2019-06-08  9:46       ` Arne Babenhauserheide
@ 2019-06-08 10:24         ` Chris Vine
  0 siblings, 0 replies; 18+ messages in thread
From: Chris Vine @ 2019-06-08 10:24 UTC (permalink / raw)
  To: guile-devel

On Sat, 08 Jun 2019 11:46:10 +0200
Arne Babenhauserheide <arne_bab@web.de> wrote:
> Chris Vine <vine35792468@gmail.com> writes:
> > On Sat, 08 Jun 2019 10:07:45 +0200
> > Arne Babenhauserheide <arne_bab@web.de> wrote:
> > [snip]
> >> Wow, I didn’t know that you could do that.
> >> 
> >> However: "The details of that allocation are implementation-defined, and
> >> it's undefined behavior to read from the member of the union that wasn't
> >> most recently written." https://en.cppreference.com/w/cpp/language/union
> >> 
> >> Can you guarantee that this works?
> >
> > This is C and not C++ and the provision to which you refer does not
> > apply.
> >
> > Reading from a member of a union other than the one last written to is
> > implementation defined in C89/90, and defined in C99 (with Technical
> > Corrigendum 3) and above
> 
> Thank you for the correction and explanation!

You have a good point though if visible type transformations were to
appear in a header rather than a *.c file, because guile headers are
(at the moment) intended to be in the common subset of C and C++ so that
libguile.h can be included in a C++ programme.

Having said that, gcc and clang support type punning through unions in
C++ as well as C.  I don't know if guile is supposed to compile with
other compilers nowadays: but frankly it would be perverse for some
other compiler which supports both C and C++ to invoke different
behaviour for unions in such cases.

Chris



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
  2019-06-08  9:08     ` Chris Vine
  2019-06-08  9:46       ` Arne Babenhauserheide
@ 2019-06-08 13:12       ` Hans Åberg
  1 sibling, 0 replies; 18+ messages in thread
From: Hans Åberg @ 2019-06-08 13:12 UTC (permalink / raw)
  To: Chris Vine; +Cc: guile-devel


> On 8 Jun 2019, at 11:08, Chris Vine <vine35792468@gmail.com> wrote:
> 
> On Sat, 08 Jun 2019 10:07:45 +0200
> Arne Babenhauserheide <arne_bab@web.de> wrote:
> [snip]
>> Wow, I didn’t know that you could do that.
>> 
>> However: "The details of that allocation are implementation-defined, and
>> it's undefined behavior to read from the member of the union that wasn't
>> most recently written." https://en.cppreference.com/w/cpp/language/union
>> 
>> Can you guarantee that this works?
> 
> This is C and not C++ and the provision to which you refer does not
> apply.
> 
> Reading from a member of a union other than the one last written to is
> implementation defined in C89/90, and defined in C99 (with Technical
> Corrigendum 3) and above, although it might include a trap
> representation (but wouldn't on any platform supported by guile).  You
> might want to see in particular footnote 95 of C11 (which isn't
> normative but is intended to explain the provisions of §6.2.6.1 which
> are).
> 
> gcc and clang have always supported type punning through unions.

FYI, the site mentioned above also covers C (there is a link the bottom of the C++ page above):
  https://en.cppreference.com/w/c/language/union





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
  2019-06-07 19:46 ` Ludovic Courtès
@ 2019-06-09 16:56   ` Mark H Weaver
  2019-06-09 17:30     ` Mark H Weaver
  2019-06-11  8:39     ` Ludovic Courtès
  0 siblings, 2 replies; 18+ messages in thread
From: Mark H Weaver @ 2019-06-09 16:56 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Andy Wingo, guile-devel

Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:
> As Dave Thompson wrote on IRC yesterday, this can make a significant
> difference for applications such as graphics and game engines.

Yes, it's especially important to be able to avoid heap allocation in
games, where GC pauses can be painful.  However, any application that
needs to do a lot of inexact arithmetic should benefit from the fact
that it's a lot cheaper to pack these immediate floats than to allocate
and later reclaim a heap-allocated cell.

> I hadn’t
> read the message yet and thought “hey, why not make instructions for
> things like trigonometric functions so you get unboxed floats” but
> obviously, as Dave pointed out, that wouldn’t scale well.  :-)

The unboxing done by our compiler is still the best option where it can
be done.  However, there are many cases where floats must be boxed, and
that seems unavoidable in a language like Scheme.

> Fixrats can make rationals more practical in applications where one
> would have avoided them for performance.

Yes, exactly, that's my motivation for fixrats.  I probably should have
said so explicitly in my original message, so thanks for mentioning it.
I've certainly avoided exact rationals in the past for performance
reasons, although they would have made the code much clearer.  I suspect
I'm not alone.  My hope is that in a future, we might be able to use
exact rationals more frequently.

> IIUC, your plan is to have a different tagging on 32-bit platforms,
> without fixflos, right?  I’m curious to see how much complexity would
> entail from that.

Yes, although I'm avoiding the term "fixflos" because IEEE doubles are
also fixed width, and thus the term "fixflos" wouldn't adequately
distinguish them from IEEE doubles.

Anyway, I agree that it's inconvenient to have different tags on
different targets, and I've been working to minimize the differences.

At present, I'm currently implementing an alternative strategy where
pairs are tagged in their pointers instead of in their CARs, which
enables us to separate the heap tags and immediate tags into two
independent spaces.

In this new approach, the heap tags are left unchanged, and the only
tags that vary with target word size are the fixints, fixrats, iflos,
and pair pointers.  All other tags will be uniform across targets,
including the non-number immediates.  Here's the new version:

;; /* with iflos:   xxx:  iflo (000 < xxx < 110)
;;    (64-bit)     0111:  fixrat
;;                 1111:  fixnum
;;                 0110:  pair
;;                  000:  tagged heap object (thob)
;;             tttt1110:  other immediate
;;
;; without iflos:     1:  fixnum
;;    (32-bit)      010:  fixrat
;;                  100:  pair
;;                  000:  tagged heap object (thob)
;;             tttt1110:  other immediate

This new approach brings its own complications, mainly two:

(1) It breaks the long-standing assumptions in Guile that all
    non-immediates have a tag in their first word and that pointers are
    always untagged.  In my preliminary patch, I introduce a new concept
    called a "tagged heap object" or "thob", and most existing checks
    for SCM_NIMP or !SCM_IMP must be changed to use SCM_THOB_P.

(2) Our existing VM instructions almost invariably specify offsets with
    a granularity of whole words.  To support tagged pair pointers with
    good performance, I think we need a few new instructions that
    specify byte offsets, to avoid the expensive extra step of removing
    the tag before accessing the CAR or CDR of a pair.

I've already implemented these changes, and am now in another painful
round of debugging.  I suspect it will be fine, and likely preferable to
my earlier approach of changing all tc7 tags to tc11.  I'll report back
when I have it working.

      Thanks,
        Mark



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
  2019-06-09 16:56   ` Mark H Weaver
@ 2019-06-09 17:30     ` Mark H Weaver
  2019-06-11  8:39     ` Ludovic Courtès
  1 sibling, 0 replies; 18+ messages in thread
From: Mark H Weaver @ 2019-06-09 17:30 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Andy Wingo, guile-devel

I should mention that I'm very open to suggestions Andy might have about
any of this.  The new approach I'm currently working on with tagged pair
pointers requires changes to both the VM and the compiler, and I'm not
confident that I've made the best choices there.  I've made some
preliminary choices for now in order to get something working ASAP, but
I fully expect that the final version will look different based on input
from Andy and others.  I also don't know whether the new approach (with
tagged pair pointers) will be preferable to the earlier one (with most
tc7 tags changed to tc11).  My goal is to present a couple of working
alternatives, and then we can decide among them.

For now, I went ahead and pushed my new (not yet working) branch
'wip-new-tagging-bis-broken', in case you want to see the details of
this new approach-in-progress.  The new branch doesn't yet include the
iflo- or fixrat-enabling patches, which I'll add later.

       Regards,
         Mark



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
  2019-06-09 16:56   ` Mark H Weaver
  2019-06-09 17:30     ` Mark H Weaver
@ 2019-06-11  8:39     ` Ludovic Courtès
  2019-06-11 10:58       ` Mark H Weaver
  2019-06-11 11:34       ` Mark H Weaver
  1 sibling, 2 replies; 18+ messages in thread
From: Ludovic Courtès @ 2019-06-11  8:39 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Andy Wingo, guile-devel

Hello,

Mark H Weaver <mhw@netris.org> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:

[...]

>> IIUC, your plan is to have a different tagging on 32-bit platforms,
>> without fixflos, right?  I’m curious to see how much complexity would
>> entail from that.
>
> Yes, although I'm avoiding the term "fixflos" because IEEE doubles are
> also fixed width, and thus the term "fixflos" wouldn't adequately
> distinguish them from IEEE doubles.

Right!

> Anyway, I agree that it's inconvenient to have different tags on
> different targets, and I've been working to minimize the differences.
>
> At present, I'm currently implementing an alternative strategy where
> pairs are tagged in their pointers instead of in their CARs, which
> enables us to separate the heap tags and immediate tags into two
> independent spaces.

At first this sounds rather radical :-), but maybe it’s preferable this
way.

> In this new approach, the heap tags are left unchanged, and the only
> tags that vary with target word size are the fixints, fixrats, iflos,
> and pair pointers.  All other tags will be uniform across targets,
> including the non-number immediates.  Here's the new version:
>
> ;; /* with iflos:   xxx:  iflo (000 < xxx < 110)
> ;;    (64-bit)     0111:  fixrat
> ;;                 1111:  fixnum
> ;;                 0110:  pair
> ;;                  000:  tagged heap object (thob)
> ;;             tttt1110:  other immediate
> ;;
> ;; without iflos:     1:  fixnum
> ;;    (32-bit)      010:  fixrat
> ;;                  100:  pair
> ;;                  000:  tagged heap object (thob)
> ;;             tttt1110:  other immediate
>
> This new approach brings its own complications, mainly two:
>
> (1) It breaks the long-standing assumptions in Guile that all
>     non-immediates have a tag in their first word and that pointers are
>     always untagged.  In my preliminary patch, I introduce a new concept
>     called a "tagged heap object" or "thob", and most existing checks
>     for SCM_NIMP or !SCM_IMP must be changed to use SCM_THOB_P.

Though an immediate, like a fixnum or an iflo, is still something
different from a tagged heap object like a pair, right?  So I would
expect SCM_THOB_P to be a different test, not a drop-in replacement for
SCM_NIMP, is that correct?

> (2) Our existing VM instructions almost invariably specify offsets with
>     a granularity of whole words.  To support tagged pair pointers with
>     good performance, I think we need a few new instructions that
>     specify byte offsets, to avoid the expensive extra step of removing
>     the tag before accessing the CAR or CDR of a pair.

So instead of a pointer dereference, SCM_CAR becomes mask + dereference,
right?

I think we disable GC “interior pointer” scanning.  With this scheme, an
SCM for a pair would actually point in the middle of a pair; could this
be an issue for GC?

Thank you!

Ludo’.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
  2019-06-11  8:39     ` Ludovic Courtès
@ 2019-06-11 10:58       ` Mark H Weaver
  2019-06-11 12:21         ` Ludovic Courtès
  2019-06-11 11:34       ` Mark H Weaver
  1 sibling, 1 reply; 18+ messages in thread
From: Mark H Weaver @ 2019-06-11 10:58 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Andy Wingo, guile-devel

Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:
> Though an immediate, like a fixnum or an iflo, is still something
> different from a tagged heap object like a pair, right?  So I would
> expect SCM_THOB_P to be a different test, not a drop-in replacement for
> SCM_NIMP, is that correct?

That's right.  It's not possible to create a drop-in replacement for
SCM_NIMP, because it is being used to answer two different questions
which used to be effectively equivalent, but no longer are:

(1) Is X a pointer to a heap object with a heap tag in the first word?
(2) Is X a reference to a heap object?

Test (1) needs to be done before checking the heap tag, to implement
type predicates for heap objects.  Test (2) is needed in relatively few
places, e.g. to decide whether to register disappearing links when
adding an entry to a weak hash table.

Actually, in my current branch I've removed the SCM_IMP and SCM_NIMP
macros outright, because it seems to me they are likely to be misused.

SCM_THOB_P implements test (1) and SCM_HEAP_OBJECT_P implements test (2).

>> (2) Our existing VM instructions almost invariably specify offsets with
>>     a granularity of whole words.  To support tagged pair pointers with
>>     good performance, I think we need a few new instructions that
>>     specify byte offsets, to avoid the expensive extra step of removing
>>     the tag before accessing the CAR or CDR of a pair.
>
> So instead of a pointer dereference, SCM_CAR becomes mask + dereference,
> right?

There's no masking involved.  Rather, it is subtracted from the pointer,
which allows the tag to be fused with the field offset.  For example, on
x86-64, whereas CAR and CDR were previously:

     1c0:   48 8b 07                mov    (%rdi),%rax      ;old car
and:
     1d0:   48 8b 47 08             mov    0x8(%rdi),%rax   ;old cdr

Now they become:

     1e0:   48 8b 47 fa             mov    -0x6(%rdi),%rax  ;new car
and:
     1f0:   48 8b 47 02             mov    0x2(%rdi),%rax   ;new cdr

In the VM, I've added four new instructions:

  make-tagged-non-immediate dst:12 tag:12 offset:32
  tagged-allocate-words/immediate dst:8 count:8 tag:8
  tagged-scm-ref/immediate dst:8 obj:8 byte-offset:8
  tagged-scm-set!/immediate obj:8 byte-offset:8 val:8

The last two instructions above are like 'scm-ref/immediate' and
'scm-set!/immediate' except that they accept byte offsets instead of
word offsets.  CAR and CDR become:

     (tagged-scm-ref/immediate DST SRC -6)    ;CAR
and:
     (tagged-scm-ref/immediate DST SRC 2)     ;CDR

(although at present the -6 prints as 250 in the disassembler).

> I think we disable GC “interior pointer” scanning.

I agree.

> With this scheme, an SCM for a pair would actually point in the middle
> of a pair; could this be an issue for GC?

It is already the case that Guile has tagged pointers in the first word
of every struct.  The first word of a struct contains a pointer to the
vtable, with scm_tc3_struct added.

Fortunately, BDW-GC provides GC_REGISTER_DISPLACEMENT, which allows us
to register a small offset K, such that BDW-GC should recognize pointers
that point K bytes into a heap block.  We've been using this in both 2.0
and 2.2 from scm_init_struct (), and in 'master' it's done in
scm_storage_prehistory ().

This new approach entails registering one additional displacement.

In 2.0 and 2.2, we register displacements 0, 16, and 17.  The last two
are for struct vtables, which point 2 words into a heap block, even
before adding scm_tc3_struct (which is 1).

In current master, we register 0 and 1 (scm_tc3_struct).  With this new
approach, we would register 0, 1, and 6 (scm_pair_tag).

What do you think?

      Thanks,
        Mark



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
  2019-06-11  8:39     ` Ludovic Courtès
  2019-06-11 10:58       ` Mark H Weaver
@ 2019-06-11 11:34       ` Mark H Weaver
  1 sibling, 0 replies; 18+ messages in thread
From: Mark H Weaver @ 2019-06-11 11:34 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Andy Wingo, guile-devel

Ludovic Courtès <ludo@gnu.org> writes:
> Though an immediate, like a fixnum or an iflo, is still something
> different from a tagged heap object like a pair, right?

I should clarify that in this new approach, a pair is *not* a tagged
heap object.  Tagged heap objects are those which have a tag in their
first word.  In this new approach, pairs are not tagged in this sense,
and it is no longer possible to distinguish a pair from a non-pair by
looking at their raw heap blocks.

Tagged heap objects (thobs) have 000 in the low 3 bits of the SCM.
Pairs have 0110 in the low 4 bits of the SCM.

    Regards,
      Mark



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
  2019-06-11 10:58       ` Mark H Weaver
@ 2019-06-11 12:21         ` Ludovic Courtès
  0 siblings, 0 replies; 18+ messages in thread
From: Ludovic Courtès @ 2019-06-11 12:21 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Andy Wingo, guile-devel

Hi,

Mark H Weaver <mhw@netris.org> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>> Though an immediate, like a fixnum or an iflo, is still something
>> different from a tagged heap object like a pair, right?  So I would
>> expect SCM_THOB_P to be a different test, not a drop-in replacement for
>> SCM_NIMP, is that correct?
>
> That's right.  It's not possible to create a drop-in replacement for
> SCM_NIMP, because it is being used to answer two different questions
> which used to be effectively equivalent, but no longer are:
>
> (1) Is X a pointer to a heap object with a heap tag in the first word?
> (2) Is X a reference to a heap object?
>
> Test (1) needs to be done before checking the heap tag, to implement
> type predicates for heap objects.  Test (2) is needed in relatively few
> places, e.g. to decide whether to register disappearing links when
> adding an entry to a weak hash table.
>
> Actually, in my current branch I've removed the SCM_IMP and SCM_NIMP
> macros outright, because it seems to me they are likely to be misused.
>
> SCM_THOB_P implements test (1) and SCM_HEAP_OBJECT_P implements test (2).

I see.

> There's no masking involved.  Rather, it is subtracted from the pointer,
> which allows the tag to be fused with the field offset.  For example, on
> x86-64, whereas CAR and CDR were previously:
>
>      1c0:   48 8b 07                mov    (%rdi),%rax      ;old car
> and:
>      1d0:   48 8b 47 08             mov    0x8(%rdi),%rax   ;old cdr
>
> Now they become:
>
>      1e0:   48 8b 47 fa             mov    -0x6(%rdi),%rax  ;new car
> and:
>      1f0:   48 8b 47 02             mov    0x2(%rdi),%rax   ;new cdr

Looks reasonable.  :-)

> Fortunately, BDW-GC provides GC_REGISTER_DISPLACEMENT, which allows us
> to register a small offset K, such that BDW-GC should recognize pointers
> that point K bytes into a heap block.  We've been using this in both 2.0
> and 2.2 from scm_init_struct (), and in 'master' it's done in
> scm_storage_prehistory ().
>
> This new approach entails registering one additional displacement.

Cool; if it’s just one displacement, that’s OK.

> What do you think?

It all looks perfectly reasonable to me!

Thanks for explaining,
Ludo’.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Immediate doubles (up to 2^256) and rationals coming to Guile 3
  2019-06-06 19:37 ` Arne Babenhauserheide
@ 2019-06-12  0:03   ` Mark H Weaver
  0 siblings, 0 replies; 18+ messages in thread
From: Mark H Weaver @ 2019-06-12  0:03 UTC (permalink / raw)
  To: Arne Babenhauserheide; +Cc: guile-devel

Hi Arne,

Arne Babenhauserheide <arne_bab@web.de> writes:

> I don’t understand the implications of this, but I realized that I would
> love to have info about the sizes of different datastructures in Guile.

Sure.

> I recently started measuring how much memory is required for a record
> compared to a list, so I could give people concrete guidelines which
> datastructures to use for which algorithm, and for that such low-level
> details would be great to have! (or just to know where to read them up —
> or which keywords to look for)

For core data structures implemented in C, the place to look is in the
corresponding *.h and *.c files in libguile, for example struct.[ch],
strings.[ch], numbers.[ch], etc.  However, pairs receive special
handling, and only part of their implementation is in pairs.[ch].

I will summarize the costs of some important structures below:

In the following, a "word" is the size of a pointer (4 bytes on a 32-bit
system, or 8 bytes on a 64-bit system).

I will _not_ include the SCM word itself in the accounting of the size
of the object represented by that SCM.  That's because for
heap-allocated objects, the SCM is simply a pointer to the object, and
there can be many pointers to the same object.  The cost of SCM words
will instead be included in the cost of the aggregate objects that
contain them.

In this method of accounting, immediate objects cost nothing, because
they are entirely stored within the SCM itself.

Pairs cost 2 words.  Lists cost 2 words per element.  The empty list is
immediate, and therefore costs nothing.

In current 'master', records cost 1 word per field, plus another word,
rounded up to the nearest multiple of 2 words.  In 2.0 and 2.2, they
cost 1 word per field, plus another _2_ words, rounded up

Vectors cost 1 word per element, plus another word, rounded up to the
nearest multiple of 2 words.

Bytevectors cost 4 words plus the data itself, rounded up to the nearest
multiple of 2 words.  As a special case, empty bytevectors cost nothing.

Non-immediate inexact real numbers cost 16 bytes.  Non-immediate complex
numbers cost 24 bytes on 32-bit systems, or 32 bytes on 64-bit systems.

In the common case, strings currently cost 6 words plus the character
data itself, rounded up to the nearest multiple of 2 words.  The
character data costs either 1 byte per character (if all code points are
less than 256, i.e. the string is latin-1), or 4 bytes per character.
(However, I may propose a new string representation for 3.0).

As a special case, the empty string costs nothing.

Finally, I should mention that the ability to share substructure in
lists and pairs make them potentially more memory efficient than
vectors, depending on the use case.  For example, if you need a
structure with 4 fields, and a common operation in your application will
involve copying the structure except for one field which is changed,
then you might be able save memory by using lists, even though a
4-element list requires more memory than a 4-element vector or struct.
If you put the commonly-changed field in the CAR, then you can update
that field by allocating only a single pair (2 words) and sharing the
rest of the structure with the old version.  This is not possible with
vectors or structs, where you must allocate a fresh new object of 5 or 6
words to perform the same operation.

     Regards,
       Mark



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-06-12  0:03 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-06-06  9:40 Immediate doubles (up to 2^256) and rationals coming to Guile 3 Mark H Weaver
2019-06-06 12:56 ` Mark H Weaver
2019-06-06 19:37 ` Arne Babenhauserheide
2019-06-12  0:03   ` Mark H Weaver
2019-06-06 20:09 ` tomas
2019-06-07 19:46 ` Ludovic Courtès
2019-06-09 16:56   ` Mark H Weaver
2019-06-09 17:30     ` Mark H Weaver
2019-06-11  8:39     ` Ludovic Courtès
2019-06-11 10:58       ` Mark H Weaver
2019-06-11 12:21         ` Ludovic Courtès
2019-06-11 11:34       ` Mark H Weaver
2019-06-08  1:13 ` Mark H Weaver
2019-06-08  8:07   ` Arne Babenhauserheide
2019-06-08  9:08     ` Chris Vine
2019-06-08  9:46       ` Arne Babenhauserheide
2019-06-08 10:24         ` Chris Vine
2019-06-08 13:12       ` Hans Åberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).