unofficial mirror of guile-devel@gnu.org 
* reader musings
@ 2022-06-07 23:07 Stefan Israelsson Tampe
  2022-06-08  7:49 ` Maxime Devos
  0 siblings, 1 reply; 3+ messages in thread
From: Stefan Israelsson Tampe @ 2022-06-07 23:07 UTC (permalink / raw)
  To: guile-devel


I have finished a fast writer that balances the use of C and Scheme in such
a way that you do not need to spend a long time in C. All custom hooks are
at the Scheme level, which makes the tool much more customizable and
hackable, as well as performant.

Why not do everything in Scheme? Well, C code can be dramatically faster.
Even though we JIT and do some assembly magic, we do not yet have much to
play with when it comes to taking advantage of SIMD instructions, so this
project makes hefty use of them in C. For example, when we write Latin-1
as UTF-8, we know that in most cases all letters are ASCII, with some
texts containing a few special symbols outside the ASCII set, and some
texts in which all letters are outside it. If we have ASCII text, we can
read in 16 bytes and check in one instruction whether all of them are
ASCII; if not, we go to a special loop and spend time there until it looks
like we are back in ASCII text. That is basically it. The tricky parts are
handling alignment and end of stream (or shorter strings) while still
being smart enough to get some speed increase. With this we can write more
than 2G ASCII characters/s, which is kind of cool.
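
To make the 16-byte idea concrete, here is a minimal sketch in C. It is not
the code from guile-persist: the names all_ascii16 and latin1_to_utf8 are
made up for illustration, the sketch assumes SSE2 (with a plain 64-bit
fallback elsewhere), and it sidesteps the alignment tricks by using
unaligned loads and a byte loop for the tail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: nonzero if all 16 bytes at p have the high bit clear. */
static int all_ascii16(const unsigned char *p)
{
#if defined(__SSE2__)
    /* Unaligned load, then gather the top bit of every byte into a mask. */
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    return _mm_movemask_epi8(v) == 0;
#else
    /* Portable fallback: test two 64-bit words against a high-bit mask. */
    uint64_t a, b;
    memcpy(&a, p, 8);
    memcpy(&b, p + 8, 8);
    return ((a | b) & UINT64_C(0x8080808080808080)) == 0;
#endif
}

/* Hypothetical converter: writes n Latin-1 bytes from src as UTF-8 to dst.
   dst must have room for 2 * n bytes. Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t i = 0, o = 0;
    while (i < n) {
        /* Fast path: copy whole 16-byte blocks while they are pure ASCII. */
        while (n - i >= 16 && all_ascii16(src + i)) {
            memcpy(dst + o, src + i, 16);
            i += 16;
            o += 16;
        }
        /* Slow path: one mixed block (or the short tail), byte by byte. */
        size_t stop = (n - i >= 16) ? i + 16 : n;
        for (; i < stop; i++) {
            unsigned char c = src[i];
            if (c < 0x80) {
                dst[o++] = c;
            } else {
                /* Latin-1 0x80..0xFF becomes a two-byte UTF-8 sequence. */
                dst[o++] = (unsigned char)(0xC0 | (c >> 6));
                dst[o++] = (unsigned char)(0x80 | (c & 0x3F));
            }
        }
    }
    return o;
}
```

A real implementation would likely stay in the slow loop for longer runs of
non-ASCII text instead of re-testing every 16 bytes; this sketch only shows
the shape of the fast path and the fallback.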

Tons of similar optimizations make this write tool speedy as hell. For
example, a great amount of effort was put into writing real numbers not
just in decimal form, but also in hex, octal and binary representations,
similar to what you get from number->string. The end result is that more
than 20M reals can be printed per second. Efforts have also been made to
write out lists and vectors very fast if they contain mostly atoms. For
deeper tree-like structures with just a small number of leaves at each
node, the writer is just on par with Guile's internal writer, which is
written entirely in C. So now we have the advanced features in Scheme, and
mostly the atoms (like numbers) are C-ified. What is still missing is some
code to handle bigints, for which we currently use the number->string
function.

Guile's internal writer does not handle floats well: it just converts the
float to a double and then prints it using the double printer. In this
write tool we put effort into writing out proper representations of
floats, meaning that 1.2 will not be written as something like
1.2000000476837158, which leads to ugly, imprecise and inefficient
management of floats.
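
The float issue can be illustrated with a small C sketch. The nearest
single-precision float to 1.2 widens to the double 1.2000000476837158, so a
double printer shows all those digits. One simple (and slow) way to get the
short form back is to try increasing precisions until the text reads back as
the same float; 9 significant digits are always enough for an IEEE single.
This is only an illustration of the idea, not the algorithm used in the
write tool, and format_float_shortest is a made-up name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write the shortest "%g"-style decimal string that
   reads back as exactly f. Returns the string length, or -1 if buf is
   too small. */
int format_float_shortest(float f, char *buf, size_t size)
{
    int len = 0;
    for (int prec = 1; prec <= 9; prec++) {
        len = snprintf(buf, size, "%.*g", prec, (double)f);
        if (len < 0 || (size_t)len >= size)
            return -1;
        /* Stop at the first precision that round-trips through strtof. */
        if (strtof(buf, NULL) == f)
            break;
    }
    return len;
}
```

With this, format_float_shortest(1.2f, buf, sizeof buf) leaves "1.2" in buf,
whereas printing (double)1.2f with "%.17g" gives "1.2000000476837158".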

Over to the reader. Reading reals in decimal form is 3X faster than Guile's
string->number function, and reading a bytevector of reals is 5X faster
(because Guile has a high dispatch overhead). We also added a flag to the
writer so that all numbers, including doubles and floats, are printed in
hex form. The reader can read those values back in, and it does so in such
a way that reading a bytevector with this encoding is 50X faster than
Guile's vanilla reader. Not only that, we do not lose precision by writing
and reading numbers. This is of limited use, as a binary representation is
usually a better alternative, but still, if you just want to dump a data
structure, you get quite an improvement.
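
Why hex form loses no precision can be shown with plain C99, where "%a"
prints a double as a hexadecimal significand and a binary exponent, and
strtod reads that form back exactly. The sketch below uses the C library's
format as a stand-in; the hex syntax that the writer and reader in
guile-persist actually use may well differ, and the function names are made
up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: print x in C99 hexadecimal floating-point form.
   Returns the string length, or -1 if buf is too small. */
int write_double_hex(double x, char *buf, size_t size)
{
    /* "%a" with no precision prints enough hex digits to be exact. */
    int len = snprintf(buf, size, "%a", x);
    return (len < 0 || (size_t)len >= size) ? -1 : len;
}

/* Hypothetical helper: parse the hex form back; no decimal rounding occurs. */
double read_double_hex(const char *s)
{
    return strtod(s, NULL);
}
```

Every hex digit maps to exactly four bits of the significand, so parsing is
mostly shifting and masking, with none of the rounding work that decimal
input needs, and a value written this way reads back bit for bit.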

Code is at:
https://gitlab.com/tampe/guile-persist/-/tree/master/ice-9
https://gitlab.com/tampe/guile-persist/-/tree/master/src/write

I'm wondering if I should make a C library for other projects to take
advantage of this work.

Happy hacking

