I have finished a fast writer that balances the use of C and Scheme so that you do not need to spend a long time in C, and all custom hooks are at the Scheme level, which makes for a much more customizable and hackable tool that is also performant.

Why not do everything in Scheme? Well, C code can be dramatically faster. Even if we JIT and do some assembly magic, we do not yet have much to play with when it comes to taking advantage of SIMD instructions, so this project makes hefty use of C for that. For example, when writing latin1 as utf8 we know that in most cases all letters are ASCII, that some texts contain a few special symbols outside the ASCII set, and that some texts have all letters outside it. With ASCII text we can read in 16 bytes and check in one instruction whether all of them are ASCII; if not, we go to a special loop and stay there until it looks like we are back in ASCII text. That is basically it. The tricky part is handling alignment and end of stream (or shorter strings) while still being smart enough to gain some speed. With this we can write more than 2G ASCII characters/s, which is kind of cool.
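
To make the idea concrete, here is a minimal sketch of the 16-byte ASCII check. It is not the code from guile-persist: the names all_ascii16 and latin1_to_utf8 are made up for illustration, it uses unaligned loads instead of doing any alignment handling, and the slow path simply processes one 16-byte block byte by byte before retrying the fast path. On machines without SSE2 it falls back to testing two 64-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Returns nonzero if all 16 bytes at p are ASCII (high bit clear). */
static int all_ascii16(const unsigned char *p)
{
#if defined(__SSE2__)
    /* Unaligned load; movemask collects the high bit of every byte. */
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    return _mm_movemask_epi8(v) == 0;
#else
    /* Portable fallback: test the high bits of two 64-bit words. */
    uint64_t w[2];
    memcpy(w, p, 16);
    return ((w[0] | w[1]) & UINT64_C(0x8080808080808080)) == 0;
#endif
}

/* Hypothetical helper: convert n latin1 bytes to UTF-8.
   dst must have room for 2*n bytes. Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL);
    while (i < n) {
        /* Fast path: copy whole 16-byte blocks while they are pure ASCII. */
        while (n - i >= 16 && all_ascii16(src + i)) {
            memcpy(dst + o, src + i, 16);
            i += 16;
            o += 16;
        }
        /* Slow path: at most 16 bytes (or the tail), one byte at a time. */
        size_t stop = (n - i < 16) ? n : i + 16;
        for (; i < stop; i++) {
            unsigned char c = src[i];
            if (c < 0x80) {
                dst[o++] = c;
            } else {
                /* latin1 0x80..0xFF becomes a two-byte UTF-8 sequence. */
                dst[o++] = (unsigned char)(0xC0 | (c >> 6));
                dst[o++] = (unsigned char)(0x80 | (c & 0x3F));
            }
        }
    }
    return o;
}
```

With this sketch the four latin1 bytes 'c' 'a' 'f' 0xE9 come out as five UTF-8 bytes ending in 0xC3 0xA9, and a pure ASCII input is copied through unchanged.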

Tons of similar optimizations make this write tool speedy as hell. For example, a great amount of effort was put into writing real numbers not just in decimal form, but also in hex, octal and binary representations, similar to what you get from number->string. The end result is that more than 20M reals can be printed per second. Efforts have also been made to write out lists and vectors very fast if they contain mostly atoms. For deeper tree-like structures with just a small number of leaves at each node, the writer is just on par with guile's internal writer, which is written entirely in C. So now we have the advanced features in Scheme, and mostly the atoms (like numbers) are C-ified. What is still missing is code to handle bigints, for which we currently make use of the number->string function.

Guile's internal writer does not handle floats well; it just converts the float to a double and then prints it using the double printer. In this write tool we put effort into writing out proper representations of floats, meaning that 1.2 will not be written as something like 1.2000000476837158, which leads to ugly, imprecise and inefficient handling of floats.
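
As an illustration of what a proper representation means here, the sketch below finds the shortest decimal string that reads back to exactly the same float. It is a deliberately simple trial loop over precisions and not the algorithm used in guile-persist (which is surely much faster); the name float_to_shortest is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write the shortest "%g"-style decimal string that
   reads back to exactly the same float. buf must hold at least 32 bytes.
   Returns the length of the string. */
int float_to_shortest(float x, char *buf, size_t size)
{
    int len = 0;

    assert(size >= 32);
    /* 9 significant digits always round-trip an IEEE single, so the
       loop ends there at the latest. */
    for (int prec = 1; prec <= 9; prec++) {
        len = snprintf(buf, size, "%.*g", prec, (double)x);
        if (strtof(buf, NULL) == x)
            break;
    }
    return len;
}
```

For the float 1.2 this gives "1.2", while printing the same float widened to a double with 17 significant digits gives 1.2000000476837158.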

Over to the reader. Reading reals in decimal form is 3X faster than guile's string->number function, and reading a bytevector of reals is 5X faster (because guile has a high dispatch overhead). We also added a flag to the writer so that we can specify that all numbers, including doubles and floats, are printed in hex form. The reader can read those values back in, and reading a bytevector encoded this way is 50X faster than guile's vanilla reader. Not only that, we do not lose precision by writing and reading numbers. This is of limited use, as a binary representation is usually a better alternative, but still, if you just want to dump a data structure, you get quite an improvement.
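
The reason hex form loses no precision can be shown with a small sketch in plain C. Note the assumptions: this uses the C99 hex float notation ("%a" and strtod), which is not necessarily the textual format guile-persist emits, and the helper names double_to_hex and hex_to_double are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write x in C99 hex float notation.
   buf must hold at least 32 bytes. Returns the length of the string. */
int double_to_hex(double x, char *buf, size_t size)
{
    assert(size >= 32);
    /* "%a" prints the mantissa bits as hex digits, so nothing is rounded. */
    return snprintf(buf, size, "%a", x);
}

/* Hypothetical helper: read a hex float string back into a double. */
double hex_to_double(const char *s)
{
    /* strtod accepts the hex float notation since C99. */
    return strtod(s, NULL);
}
```

Writing 0.1 with double_to_hex and reading it back with hex_to_double yields a double that compares equal to the original, because every hex digit maps onto exactly four mantissa bits and no decimal rounding is involved in either direction.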

Code is at:
https://gitlab.com/tampe/guile-persist/-/tree/master/ice-9
https://gitlab.com/tampe/guile-persist/-/tree/master/src/write

I'm wondering if I should make a C library for other projects to take advantage of this work.

Happy hacking