unofficial mirror of guile-user@gnu.org 
* [ANN] guile-wiredtiger 0.8.0
From: Amirouche @ 2019-05-16 21:05 UTC
  To: guile-user gnu

I am pleased to announce the release of guile-wiredtiger 0.8.0.

You can find it at:

   https://framagit.org/a-guile-mind/guile-wiredtiger/

Or using my guix channel:

   $ cat ~/.config/guix/channels.scm
   (cons (channel
           (name 'amz3)
           (url "https://git.sr.ht/~amz3/guix-amz3-channel"))
         %default-channels)
   $ guix pull
   $ guix package -i guile-wiredtiger@0.8.0

Here is the list of changes:

- add support for single bytes column
- fix bug in rollback
- rely on guile-r7rs
- remove null byte added in strings
- add some benchmarks...
- add session-reset
- rename cursor-search to cursor-search?
- improve cursor-search-near to return a symbol or #f
- rename cursor-next and cursor-prev to cursor-next? and cursor-prev?
- %key-not-found is not public anymore, no need.

The main additions are:

- add (wiredtiger pack): lexicographic packing of Scheme objects
- add (wiredtiger okvs) SRFI-167
- add (wiredtiger nstore) SRFI-168
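
To illustrate what lexicographic packing means: the packed bytes must
compare, byte by byte, in the same order as the original values, so that a
store ordered on raw bytes iterates over keys in a meaningful order. Below is
a minimal sketch for non-negative integers below 2^64. It is not the actual
(wiredtiger pack) encoding, and pack-u64 and bytevector<? are hypothetical
names used only for this example:

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: pack a non-negative integer (< 2^64) as 8 bytes,
;; big-endian, so that the most significant byte is compared first.
(define (pack-u64 n)
  (let ((bv (make-bytevector 8 0)))
    (bytevector-u64-set! bv 0 n (endianness big))
    bv))

;; Hypothetical helper: byte-wise (lexicographic) comparison, which is
;; how an ordered key-value store compares raw keys.
(define (bytevector<? a b)
  (let ((la (bytevector-length a))
        (lb (bytevector-length b)))
    (let loop ((i 0))
      (cond
       ((= i la) (< la lb))  ; a is exhausted: smaller only if b is longer
       ((= i lb) #f)         ; b is a strict prefix of a
       ((< (bytevector-u8-ref a i) (bytevector-u8-ref b i)) #t)
       ((> (bytevector-u8-ref a i) (bytevector-u8-ref b i)) #f)
       (else (loop (+ i 1)))))))

;; Same order as (< 2 256):
(bytevector<? (pack-u64 2) (pack-u64 256))  ; => #t
```

With a little-endian encoding the same comparison would put 256 before 2,
which is why the byte order matters here.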

I eventually figured out what went wrong. I faced two issues:

- wiredtiger raising WT_ROLLBACK with a single application thread
   and a single session. This was because the cache was not big enough
   to hold the whole transaction in memory. It is solved by setting the
   okvs 'cache key to a "reasonable" value. With gotofish I set it to
   1GB. That does not mean that a transaction can be 1GB big; it means
   that wiredtiger will use at most 1GB to execute a transaction.

- my program leaking memory. I am not sure, but it is unlikely that the
   Guile part of the code leaks memory [...] AND I experimented with both
   Chez Scheme and Python; they both seem to leak memory. The latter takes
   more time but in the end the result is the same. I don't have
   confirmation from mongodb, but to my mind it is again due to a
   configuration problem. The default configuration of wiredtiger uses one
   thread for cache eviction. That is, there is a single thread dedicated
   to fighting the growth of the cache using some Least Recently Used
   algorithm, IIRC. Anyway, setting okvs 'eviction-trigger to 85% (i.e.
   eviction is triggered when 85% of the cache is filled) and using 4
   threads for eviction allows gotofish.scm to complete its mission.
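
For reference, the settings described above correspond roughly to the
following WiredTiger configuration string. This is only a sketch:
make-wiredtiger-config is a hypothetical helper, not part of
guile-wiredtiger, and how okvs translates its 'cache and 'eviction-trigger
keys internally is an assumption. The option names (cache_size,
eviction_trigger, eviction=(threads_min,threads_max)) are the ones
documented for wiredtiger_open:

```scheme
;; Hypothetical helper, not part of guile-wiredtiger: build the tuning
;; part of the configuration string accepted by wiredtiger_open.
(define (make-wiredtiger-config cache-size eviction-trigger eviction-threads)
  (string-append
   "cache_size=" cache-size
   ",eviction_trigger=" (number->string eviction-trigger)
   ",eviction=(threads_min=" (number->string eviction-threads)
   ",threads_max=" (number->string eviction-threads) ")"))

;; The values used with gotofish: 1GB of cache, eviction triggered at
;; 85% of the cache, 4 eviction threads.
(make-wiredtiger-config "1GB" 85 4)
;; => "cache_size=1GB,eviction_trigger=85,eviction=(threads_min=4,threads_max=4)"
```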

The key word is fine-tuning. That is what makes the database work.

So if you read the above carefully, you figured out that gotofish can
index wikipedia vital articles level 3, which is 500MB big, in two hours.
Let's try it:

   $ time guile -L . gotofish.scm search GNU

   ** 0.09737717752984928: data/wikipedia-vital-articles-level-3/Mathematics/Arithmetic/Division_%28mathematics%29
   ** 0.07194504699927694: data/wikipedia-vital-articles-level-3/Mathematics/Geometry/Trigonometry
   ** 0.06146528292562392: data/wikipedia-vital-articles-level-3/Mathematics/Other/Probability
   ** 0.03677014042867702: data/wikipedia-vital-articles-level-3/Society and social sciences/Language/Cyrillic_script
   ** 0.03422772617819057: data/wikipedia-vital-articles-level-3/Technology/Food and health/Medical_imaging
   ** 0.021683228730822873: data/wikipedia-vital-articles-level-3/Technology/Computing and information technology/Computer

   real	0m3.760s
   user	0m1.680s
   sys	0m2.090s

Less than four seconds is not bad since it includes the time necessary
to open the database. Also, it is running on a USB SSD. By the way, the
database behaves better on SSD without encryption...

The gotofish code will probably end up in the guile-wiredtiger repository
as an example. In the meantime, it is available at:

   https://git.sr.ht/~amz3/guile-gotofish

Last but not least, there is still a non-deterministic error about the
locale failing to be set; I don't know where it is coming from.

# What the future will bring

Regarding guile-wiredtiger, I hope to keep the interface as is. What I
plan to do is:

- drop the use of guile-bytestructures OR somehow add support for
   function pointers in C structs.

- Optimize for the single bytes column using a dedicated set of
   procedures

- Improve the support of Scheme objects in (wiredtiger pack)

By the way, I did some testing using guile-next from guix and nothing
weird happened.

I also tried the sqlite3 lsm extension but it is (also!) leaking memory.
So, I will not take that route right now. There is also the possibility
of using rocksdb or even postgresql. Anyway, I prefer to continue building
datae [0], then redo the benchmarks and switch database when I see it must
be done. Since SRFI 168 relies on SRFI 167, it is easy to switch backends;
I tried with foundationdb. Once you have the okvs interface tested, you
just have to drop nstore.scm in your project and re-run the tests for
nstore.scm. That is the magic of abstractions :)
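
A toy sketch of why the switch is cheap: the triple store only talks to the
backend through a small interface, so changing database means supplying
another implementation of that interface. Everything below is hypothetical
(the <backend> record, the in-memory backend, store-add! and store-ask?) and
is much smaller than the real SRFI 167 and SRFI 168 interfaces:

```scheme
(use-modules (srfi srfi-9))

;; Hypothetical, minimal "key-value store" interface: one procedure to
;; read a key and one to write it.
(define-record-type <backend>
  (make-backend getter setter)
  backend?
  (getter backend-getter)
  (setter backend-setter))

;; Toy in-memory backend built on a Guile hash table (keys compared
;; with equal?). A wiredtiger or foundationdb backend would provide the
;; same two procedures.
(define (make-memory-backend)
  (let ((table (make-hash-table)))
    (make-backend
     (lambda (key) (hash-ref table key #f))
     (lambda (key value) (hash-set! table key value)))))

;; Toy triple store written only against the interface above.
(define (store-add! backend subject predicate object)
  ((backend-setter backend) (list subject predicate object) #t))

(define (store-ask? backend subject predicate object)
  ((backend-getter backend) (list subject predicate object)))

(define backend (make-memory-backend))
(store-add! backend 'guile 'is-a 'scheme)
(store-ask? backend 'guile 'is-a 'scheme)  ; => #t
(store-ask? backend 'guile 'is-a 'python)  ; => #f
```

The store procedures never mention the hash table, so the same tests can be
re-run unchanged against any other backend record.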


Happy hacking!


[0] https://github.com/awesome-data-distribution/datae




* Re: [ANN] guile-wiredtiger 0.8.0
From: Amirouche @ 2019-05-17 10:11 UTC
  To: guile-user gnu; +Cc: guile-user

On 2019-05-16 23:05, Amirouche wrote:
> 
> - my program leaking memory. I am not sure but it is unlikely that guile
>   part of the code leaks memory [...] AND I experimented with both Chez
>   Scheme and Python, they both seems to leak memory. The latter takes
>   more time but in the end the result is the same. I don't have mongodb
>   confirmation, to my mind it is again due to a configuration problem.
>   The default configuration of wiredtiger use one thread for cache
>   eviction. That is there is a single thread dedicated to fighting the
>   growth of the cache using some Least Recently Used algorithm IIRC.
>   Anyway, setting okvs 'eviction-trigger to 85% (aka. triggers eviction
>   when 85% of the cache is filled) and using 4 threads for eviction
>   itself, allows gotofish.scm to complete its mission.

Here is a memory plot of the gotofish run:
https://i.paste.pics/5959b4e53f8197af3253a812e86a43c7.png

As I tried to explain, my understanding is that wiredtiger is not supposed
to sustain a lot of writes for a long period of time.

> 
> The key word is fine-tuning. That is what makes the database works.
> 




* Re: [ANN] guile-wiredtiger 0.8.0
From: Nala Ginrut @ 2019-05-17 11:24 UTC
  To: Amirouche; +Cc: guile-user gnu, guile-user

Congrats!
Do you think it could be a standalone NoSQL database and be integrated into Artanis?


* Re: [ANN] guile-wiredtiger 0.8.0
From: Amirouche @ 2019-05-17 12:17 UTC
  To: Nala Ginrut; +Cc: guile-user gnu, guile-user

On 2019-05-17 13:24, Nala Ginrut wrote:
> Congrats!
> Do you think it could be standalone NOSQL database and integrated to 
> Artanis?
> 

Thanks!

My plan is to work with it embedded in the Scheme process. So no, it is
not standalone for the time being.




* Re: [ANN] guile-wiredtiger 0.8.1
From: Amirouche @ 2019-05-17 19:10 UTC
  To: guile-user gnu; +Cc: guile-user

On 2019-05-17 12:11, Amirouche wrote:
>> 
>> The key word is fine-tuning. That is what makes the database works.
>> 

I made a quick fix that I had planned but forgotten about, and made a new
release.

Here is the diff:

   https://framagit.org/a-guile-mind/guile-wiredtiger/commit/64f33033e85bcd970d1599c9f54f26dfe462c55e

On a related note, I turned gotofish.scm into a web app running on
guile-next from guix and indexing a bunch of Scheme-related websites. I
will publish the URL once the indexing is finished.


Happy week-end!


