unofficial mirror of guile-devel@gnu.org 
* reader musings
@ 2022-06-07 23:07 Stefan Israelsson Tampe
  2022-06-08  7:49 ` Maxime Devos
  0 siblings, 1 reply; 3+ messages in thread
From: Stefan Israelsson Tampe @ 2022-06-07 23:07 UTC (permalink / raw)
  To: guile-devel

I have finished a fast writer that balances the use of C and Scheme in such
a way that you do not need to spend a long time in C, and all custom hooks
are on the Scheme level, meaning a much more customizable and hackable, as
well as performant, reader/writer.

Why not write everything in Scheme? Well, C code can be dramatically faster,
even if we JIT and do some assembly magic, because in Scheme we do not yet
have much to play with for taking advantage of SIMD instructions. So in this
project we make hefty use of C for that. For example, if we want to write
Latin-1 as UTF-8, we know that in most cases all letters are ASCII, with
some texts containing a few special symbols outside the ASCII set, and some
texts where all letters are outside it. If we have ASCII text, we can read
in 16 bytes and check in one instruction whether all of them are ASCII; if
not, we go to a special loop and spend time there until it looks like we
are back in ASCII text. That is basically it. The tricky thing is to handle
alignment and end of stream (or shorter strings) while still being smart
enough to get some speed increase. With this we can write more than 2G
ASCII characters / s, which is kind of cool.
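
A minimal sketch of the 16-bytes-at-a-time idea described above, not the
code from guile-persist. The names all_ascii16 and latin1_to_utf8 are
hypothetical, the SSE2 path is used only when the compiler defines
__SSE2__, and alignment is sidestepped by using unaligned loads. The slow
path here is simplified: it handles at most one 16-byte block bytewise and
then retries the fast path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: nonzero if all 16 bytes at p have the high bit clear. */
static int all_ascii16(const unsigned char *p)
{
#if defined(__SSE2__)
  /* Unaligned load, then collect the 16 high bits into one integer mask. */
  __m128i v = _mm_loadu_si128((const __m128i *) p);
  return _mm_movemask_epi8(v) == 0;
#else
  /* Portable fallback: two 64-bit words instead of one SIMD register. */
  uint64_t w[2];
  memcpy(w, p, sizeof w);
  return ((w[0] | w[1]) & UINT64_C(0x8080808080808080)) == 0;
#endif
}

/* Convert n Latin-1 bytes to UTF-8.  dst must have room for 2*n bytes.
   Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *src, size_t n, unsigned char *dst)
{
  size_t i = 0, o = 0;
  assert(src != NULL && dst != NULL);
  while (i < n) {
    /* Fast path: copy whole 16-byte blocks while they are pure ASCII. */
    while (n - i >= 16 && all_ascii16(src + i)) {
      memcpy(dst + o, src + i, 16);
      i += 16;
      o += 16;
    }
    /* Slow path: one block (or the tail) byte by byte, then retry. */
    size_t stop = (n - i < 16) ? n : i + 16;
    for (; i < stop; i++) {
      unsigned char c = src[i];
      if (c < 0x80) {
        dst[o++] = c;
      } else {
        /* Latin-1 0x80..0xFF maps to a two-byte UTF-8 sequence. */
        dst[o++] = (unsigned char) (0xC0 | (c >> 6));
        dst[o++] = (unsigned char) (0x80 | (c & 0x3F));
      }
    }
  }
  return o;
}
```

The end of stream is handled by the same bytewise loop, so no read ever
goes past src + n.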

Tons of similar optimizations make this write tool speedy as hell. For
example, a great amount of effort was put into writing real numbers not
just in decimal form, but also in hex, octal and binary representations
similar to what you get from number->string. The end result is that more
than 20M reals can be printed per second. Efforts have also been made to
write out lists and vectors very fast if they contain mostly atoms. For
deeper tree-like structures with just a small number of leaves at each
node, the writer is just on par with Guile's internal writer, which is
written entirely in C. Now we have the advanced features in Scheme, and
mostly the atoms (like numbers) are C-ified. What's still missing is some
code to handle bigints, for which we make use of the number->string
function.

Guile's internal writer does not handle floats well: it just converts the
float to a double and then prints it using the double printer. In this
write tool we put effort into writing out proper representations of floats,
meaning that 1.2 will not be written as something like 1.200003734676,
which leads to ugly, imprecise and inefficient handling of floats.
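
To illustrate the float issue, here is a small sketch that finds the
shortest decimal string that reads back as the same single-precision
value. It deliberately uses a slow snprintf/strtof round-trip search
rather than whatever fast algorithm guile-persist implements, and the name
format_float_shortest is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Write the shortest "%g"-style decimal string that reads back as exactly f.
   Returns the string length, or -1 if buf is too small. */
int format_float_shortest(float f, char *buf, size_t size)
{
  int len = -1;
  assert(buf != NULL && size > 0);
  /* 9 significant digits always suffice to round-trip an IEEE single,
     so the loop stops there even for values (such as NaN) that never
     compare equal. */
  for (int prec = 1; prec <= 9; prec++) {
    len = snprintf(buf, size, "%.*g", prec, (double) f);
    if (len < 0 || (size_t) len >= size)
      return -1;
    if (strtof(buf, NULL) == f)
      break;                    /* this precision already round-trips */
  }
  return len;
}
```

Printing (double) 1.2f with 17 significant digits would instead expose the
digits of the nearest single to 1.2, which is the ugly output the
paragraph above complains about.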

Over to the reader. Reading reals in decimal form is 3X faster than Guile's
string->number function, and reading a bytevector of reals is 5X faster
(because Guile has a high dispatch overhead). But we also added a flag to
the writer so that we can specify that all numbers, including doubles and
floats, are printed in hex form. The reader can read those values back in,
and reading a bytevector in that encoding is 50X faster than Guile's
vanilla reader. Not only that, we do not lose precision by writing and
reading numbers. This is of limited use, as a binary representation is
usually a better alternative, but still, if you just want to dump a data
structure, you get quite an improvement.
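
The lossless round trip through a hex form can be sketched in plain C with
the C99 "%a" conversion and strtod. This is only an illustration under the
assumption that C's hexadecimal floating-point notation is acceptable; the
hex syntax that guile-persist actually writes may differ, and the function
names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Write d in C99 hexadecimal floating-point notation (e.g. 0x1.8p+1).
   Returns the length snprintf reports. */
int write_hex_double(double d, char *buf, size_t size)
{
  assert(buf != NULL && size > 0);
  return snprintf(buf, size, "%a", d);
}

/* Read the value back; strtod accepts the hexadecimal form since C99. */
double read_hex_double(const char *s)
{
  assert(s != NULL);
  return strtod(s, NULL);
}
```

Because every hex digit maps directly onto four mantissa bits, no decimal
rounding is involved in either direction.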

Code is at:
https://gitlab.com/tampe/guile-persist/-/tree/master/ice-9
https://gitlab.com/tampe/guile-persist/-/tree/master/src/write

I'm wondering if I should make a C library for other projects to take
advantage of this work.

Happy hacking

* Re: reader musings
  2022-06-07 23:07 reader musings Stefan Israelsson Tampe
@ 2022-06-08  7:49 ` Maxime Devos
  2022-06-08 12:52   ` Stefan Israelsson Tampe
  0 siblings, 1 reply; 3+ messages in thread
From: Maxime Devos @ 2022-06-08  7:49 UTC (permalink / raw)
  To: Stefan Israelsson Tampe, guile-devel

Stefan Israelsson Tampe wrote on Wed 08-06-2022 at 01:07 [+0200]:
> https://gitlab.com/tampe/guile-persist/-/tree/master/ice-9
> https://gitlab.com/tampe/guile-persist/-/tree/master/src/write
> 
> I'm wondering if I should make a C library for other projects to take
> advantage of this work.

Could they be integrated in Guile itself?  That would reach the most
people I think.


  int      exp = (((*((uint64_t *) &d)) >> 52) & ((1L<<11)-1)) - 1023L;
  uint64_t man = ((*((uint64_t *)  &d)) & ((1L<<52)-1L)) + (1L << 52);

Doubles aren't uint64_t, so this may be a strict aliasing violation and
hence undefined behaviour.  If so, maybe use -fno-strict-aliasing, or use
type punning through a union?

Also, this assumes IEEE doubles, so maybe do some checks whether things
are actually IEEE (see m4/fpieee.m4, and maybe some checks like
sizeof(double)=sizeof(uint64_t) and alignof(double)=sizeof(uint64_t)
and check DBL_DIG, DBL_MANT_DIG, DBL_MAX_EXP, ...)?

That line also assumes 'long' = 'uint64_t' (or at least, that they have
the same size), which to me seems a bold assumption to make in the
general case.
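
One way to address the three points above, as a sketch only: copy the bits
out with memcpy instead of casting the pointer, build the masks from
UINT64_C constants instead of 'long' literals, and check the IEEE
assumptions at compile time. The name split_double is hypothetical, and
like the two quoted lines it only makes sense for normal (non-zero,
non-subnormal, finite) values.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Refuse to compile where a double is not a 64-bit binary64 value. */
static_assert(sizeof(double) == sizeof(uint64_t), "double is not 64 bits");
static_assert(DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024,
              "double is not IEEE binary64");

/* Extract the unbiased exponent and the 53-bit mantissa (with the implicit
   leading one restored) of a normal double. */
void split_double(double d, int *exp, uint64_t *man)
{
  uint64_t bits;
  memcpy(&bits, &d, sizeof bits);   /* well-defined, unlike the cast */
  *exp = (int) ((bits >> 52) & ((UINT64_C(1) << 11) - 1)) - 1023;
  *man = (bits & ((UINT64_C(1) << 52) - 1)) + (UINT64_C(1) << 52);
}
```

Compilers typically turn the memcpy into a single register move, so this
should cost nothing compared to the cast.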

Also, more generally, there was some paper on subtle errors that can
easily happen when printing floating point numbers and how to test for
them and avoid them, though I cannot find it anymore, and the
implementation isn't documented and doesn't seem to have automatic
tests.

Greetings,
Maxime.


* Re: reader musings
  2022-06-08  7:49 ` Maxime Devos
@ 2022-06-08 12:52   ` Stefan Israelsson Tampe
  0 siblings, 0 replies; 3+ messages in thread
From: Stefan Israelsson Tampe @ 2022-06-08 12:52 UTC (permalink / raw)
  To: Maxime Devos; +Cc: guile-devel

I found a better way to find the mantissa and exponent, namely frexp, and
when constructing the double/float, ldexp is great. That makes the code
more portable. These functions are really fast and the call overhead is
not terrible.
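
A sketch of what the frexp/ldexp approach can look like, assuming a
radix-2 double and finite input; the function names are hypothetical and
this is not taken from the guile-persist sources.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdint.h>

/* Split a finite double into an integer mantissa and a power-of-two
   exponent such that d == man * 2^exp. */
void split_double_portable(double d, int *exp, int64_t *man)
{
  int e;
  assert(isfinite(d));
  double m = frexp(d, &e);                 /* d == m * 2^e, 0.5 <= |m| < 1 */
  *man = (int64_t) ldexp(m, DBL_MANT_DIG); /* scale m to an exact integer */
  *exp = e - DBL_MANT_DIG;
}

/* Rebuild the double from the two parts. */
double join_double(int64_t man, int exp)
{
  return ldexp((double) man, exp);
}
```

No bit layout, type size or aliasing assumption is left; only the standard
library's view of the value is used.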
