unofficial mirror of guix-devel@gnu.org 
* Performance issues with using local-file/interned-file/add-to-store
@ 2017-09-17 16:42 Christopher Baines
  2017-09-17 20:22 ` Ludovic Courtès
  0 siblings, 1 reply; 4+ messages in thread
From: Christopher Baines @ 2017-09-17 16:42 UTC (permalink / raw)
  To: guix-devel

Hey,

So I've been playing around with managing some data files with Guix. On
the whole, this is working quite nicely so far, but I'm having some
performance issues with using the local-file gexp.

Using it for large files (~1 to ~4 GB in my case) causes the
guix-daemon to use a large amount of CPU and memory. This isn't a
problem, but as the data files I'm working with don't change, once the
file has been added to the store, it doesn't need to be added again.

The slow performance means that even if nothing needs adding to
the store or building, it can take a while to work that out, as all the
big files that you are using have to be added to the store again anyway.

It would be good for me to have the option to cache the result of
running add-to-store through the local-file gexp, perhaps using the
full filename, or the hash as the cache key?

Is anyone else encountering similar issues, or have ideas about
speeding this up?

Thanks,

Chris

* Re: Performance issues with using local-file/interned-file/add-to-store
  2017-09-17 16:42 Performance issues with using local-file/interned-file/add-to-store Christopher Baines
@ 2017-09-17 20:22 ` Ludovic Courtès
  2017-09-17 20:52   ` Christopher Baines
  0 siblings, 1 reply; 4+ messages in thread
From: Ludovic Courtès @ 2017-09-17 20:22 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

Hello!

Christopher Baines <mail@cbaines.net> skribis:

> So I've been playing around with managing some data files with Guix. On
> the whole, this is working quite nicely so far, but I'm having some
> performance issues with using the local-file gexp.
>
> Using it for large files (~1 to ~4 GB in my case) causes the
> guix-daemon to use a large amount of CPU and memory. This isn't a
> problem, but as the data files I'm working with don't change, once the
> file has been added to the store, it doesn't need to be added again.

The memory issue in guix-daemon is a Known Problem, see
<https://bugs.gnu.org/23666>.

I guess it wasn’t a pressing issue until now because almost all our use
cases were dealing with small files.

> The slow performance means that even if nothing needs adding to
> the store or building, it can take a while to work that out, as all the
> big files that you are using have to be added to the store again anyway.
>
> It would be good for me to have the option to cache the result of
> running add-to-store through the local-file gexp, perhaps using the
> full filename, or the hash as the cache key?

Internally, (guix store) has a cache (see ‘add-to-store-cache’) to make
sure that, during a session, the ‘add-to-store’ RPC for a given file is
done once only.

If you have large files, that single RPC can already be a lot, though.
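
As a rough illustration of what a per-session cache buys (and what it
does not), here is a minimal Python sketch.  It is not the (guix store)
code: the `Session` class, the `_add_to_store_rpc` placeholder and the
choice of cache key are assumptions made up for this example.

```python
import os


class Session:
    """Toy stand-in for a daemon connection; NOT the (guix store) API."""

    def __init__(self):
        self.rpc_calls = 0
        self._cache = {}  # lives only as long as this session object

    def _add_to_store_rpc(self, path):
        # Hypothetical placeholder for the expensive RPC that sends the
        # whole file to the daemon.  The returned name is a dummy.
        self.rpc_calls += 1
        return "/gnu/store/<hash>-" + os.path.basename(path)

    def add_to_store(self, path):
        st = os.lstat(path)
        # Assumed cache key: file identity plus size and mtime, so a
        # modified file is sent again.
        key = (os.path.abspath(path), st.st_dev, st.st_ino,
               st.st_size, st.st_mtime_ns)
        if key not in self._cache:
            self._cache[key] = self._add_to_store_rpc(path)
        return self._cache[key]
```

Within one `Session`, a second `add_to_store` call for the same
unchanged file costs nothing, but every new session starts with an empty
cache and pays for the RPC again.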

In the “RPC pipelining” thread¹, I proposed a patch that allows us to
avoid actually making the ‘add-to-store’ RPC.  The patch changes
‘add-to-store’ to compute the resulting store file name locally, without
actually making the RPC, and makes that RPC at a later time.

This approach is not helpful performance-wise for small files and when
talking to a local daemon.  However, it could serve as a trick for large
files, where we could do something like:

  1. Compute store file name for large file on the client side;

  2. Call ‘add-temp-root’ for that store item; if that works, that means
     it’s already in store, otherwise we need to do ‘add-to-store’.

The downside is that #1 requires traversing the whole file, so maybe it
doesn’t help.
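
The two steps above can be sketched roughly as follows, in Python.  This
is only an illustration under stated assumptions:
`hypothetical_store_name` is NOT how Guix derives store file names, the
`store` set stands in for "`add-temp-root` succeeded", and `add_to_store`
is whatever callable performs the real upload.

```python
import hashlib
import os


def hypothetical_store_name(path, chunk_size=1 << 20):
    """Stand-in for step 1.  Reads the whole file in chunks; the real
    store file name computation is different."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() + "-" + os.path.basename(path)


def ensure_in_store(path, store, add_to_store):
    """Step 2: only call ADD_TO_STORE when the name is not in STORE.

    STORE is a set of names already present (assumed stand-in for a
    successful add-temp-root).  Returns (name, uploaded?)."""
    name = hypothetical_store_name(path)
    if name in store:
        return name, False
    add_to_store(path)      # the expensive part, skipped when possible
    store.add(name)
    return name, True
```

Note that step 1 still reads every byte of the file, which is exactly
the downside mentioned above; what is saved is sending the contents to
the daemon.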

Hmm…

Of course we could also have a cache in ~/.cache/guix for all this, as a
last resort.
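
A minimal sketch of what such an on-disk cache could look like, in
Python.  Everything here is an assumption for illustration: the JSON
format, the function names, and the choice of absolute file name plus
size and mtime as the validity check.  The cache file location is
passed in by the caller.

```python
import json
import os


def _load(cache_file):
    try:
        with open(cache_file) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def cache_lookup(cache_file, path):
    """Return the cached store name for PATH, or None if the entry is
    missing or the file's size or mtime changed since it was recorded."""
    entry = _load(cache_file).get(os.path.abspath(path))
    st = os.stat(path)
    if (entry and entry["size"] == st.st_size
            and entry["mtime_ns"] == st.st_mtime_ns):
        return entry["store_name"]
    return None


def cache_store(cache_file, path, store_name):
    """Record STORE_NAME as the result of adding PATH to the store."""
    cache = _load(cache_file)
    st = os.stat(path)
    cache[os.path.abspath(path)] = {"size": st.st_size,
                                    "mtime_ns": st.st_mtime_ns,
                                    "store_name": store_name}
    os.makedirs(os.path.dirname(cache_file) or ".", exist_ok=True)
    tmp = cache_file + ".tmp"
    with open(tmp, "w") as f:
        json.dump(cache, f)
    os.replace(tmp, cache_file)  # replace the old cache atomically
```

A hit avoids reading the file at all, but the cached store item may have
been garbage-collected in the meantime, so a check such as the
‘add-temp-root’ call in step 2 would still be needed before trusting it.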

Thoughts?

Ludo’.

¹ https://lists.gnu.org/archive/html/guix-devel/2017-07/msg00135.html

* Re: Performance issues with using local-file/interned-file/add-to-store
  2017-09-17 20:22 ` Ludovic Courtès
@ 2017-09-17 20:52   ` Christopher Baines
  2017-09-18  7:58     ` Ludovic Courtès
  0 siblings, 1 reply; 4+ messages in thread
From: Christopher Baines @ 2017-09-17 20:52 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On Sun, 17 Sep 2017 22:22:32 +0200
ludo@gnu.org (Ludovic Courtès) wrote:

> Hello!
> 
> Christopher Baines <mail@cbaines.net> skribis:
> 
> > So I've been playing around with managing some data files with
> > Guix. On the whole, this is working quite nicely so far, but I'm
> > having some performance issues with using the local-file gexp.
> >
> > Using it for large files (~1 to ~4 GB in my case) causes the
> > guix-daemon to use a large amount of CPU and memory. This isn't a
> > problem, but as the data files I'm working with don't change, once
> > the file has been added to the store, it doesn't need to be added
> > again.  
> 
> The memory issue in guix-daemon is a Known Problem, see
> <https://bugs.gnu.org/23666>.
> 
> I guess it wasn’t a pressing issue until now because almost all our
> use cases were dealing with small files.

Ah, OK, that makes sense then. Maybe this can be improved on with the
Guile implementation of guix-daemon...

> > The slow performance means that even if nothing needs adding to
> > the store or building, it can take a while to work that out, as all
> > the big files that you are using have to be added to the store
> > again anyway.
> >
> > It would be good for me to have the option to cache the result of
> > running add-to-store through the local-file gexp, perhaps using the
> > full filename, or the hash as the cache key?  
> 
> Internally, (guix store) has a cache (see ‘add-to-store-cache’) to
> make sure that, during a session, the ‘add-to-store’ RPC for a given
> file is done once only.

I did see this, but didn't connect it with the kind of caching I had in
mind. It does cache the result, but only for the duration of a session,
and with the way I'm working at the moment that isn't long enough.

> If you have large files, that single RPC can already be a lot, though.
> 
> In the “RPC pipelining” thread¹, I proposed a patch that allows us to
> avoid actually making the ‘add-to-store’ RPC.  The patch changes
> ‘add-to-store’ to compute the resulting store file name locally,
> without actually making the RPC, and makes that RPC at a later time.
> 
> This approach is not helpful performance-wise for small files and when
> talking to a local daemon.  However, it could serve as a trick for
> large files, where we could do something like:
> 
>   1. Compute store file name for large file on the client side;
> 
>   2. Call ‘add-temp-root’ for that store item; if that works, that
>      means it’s already in store, otherwise we need to do
>      ‘add-to-store’.
> 
> The downside is that #1 requires traversing the whole file, so maybe
> it doesn’t help.

I'll go and have a read of that thread again, but my impression is that
it might open up the opportunity to forgo the add-to-store if you can
confirm that the relevant store already has the item in it.

Regarding the small/large file tradeoff, a file size threshold could be
hardcoded and used to decide which method to use.
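
As a sketch of that idea in Python (the 100 MiB value and the names are
arbitrary assumptions for illustration, not anything Guix defines):

```python
import os

# Arbitrary example value; a real threshold would need measuring.
LARGE_FILE_THRESHOLD = 100 * 1024 * 1024  # 100 MiB


def choose_method(path, threshold=LARGE_FILE_THRESHOLD):
    """Pick a strategy from the file size alone (one stat call, so the
    file contents are not read to make the decision)."""
    if os.path.getsize(path) >= threshold:
        return "check-then-add"   # compute name locally, check the store
    return "add-directly"         # small file: just do add-to-store
```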

> Hmm…
> 
> Of course we could also have a cache in ~/.cache/guix for all this,
> as a last resort.

That was something I was thinking of, caching things with the key being
the filename or hash, but not doing this by default, as it probably
wouldn't improve the general use case.

> Thoughts?

There does seem to be an intersection between the pipelining
requirements for computing the store file name, and the potential to
use that name in working out if the add-to-store even needs to happen.

Maybe computing the store file name for large files, and using that to
avoid expensive add-to-store calls is a good starting point?

* Re: Performance issues with using local-file/interned-file/add-to-store
  2017-09-17 20:52   ` Christopher Baines
@ 2017-09-18  7:58     ` Ludovic Courtès
  0 siblings, 0 replies; 4+ messages in thread
From: Ludovic Courtès @ 2017-09-18  7:58 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

Morning!

Christopher Baines <mail@cbaines.net> skribis:

> There does seem to be an intersection between the pipelining
> requirements for computing the store file name, and the potential to
> use that name in working out if the add-to-store even needs to happen.
>
> Maybe computing the store file name for large files, and using that to
> avoid expensive add-to-store calls is a good starting point?

Yes, though we’d have to check whether that really improves performance
since it’d have to read the whole file in order to compute its hash.

Ludo’.
