From: Roel Janssen
Subject: Re: Performance on NFS
Date: Sun, 18 Jun 2017 01:15:42 +0200
Message-ID: <87poe22okh.fsf@gnu.org>
In-reply-to: <87lgoq8ch8.fsf@gnu.org>
To: Ludovic Courtès
Cc: guix-devel@gnu.org

Ludovic Courtès writes:

> Hi!
>
> Roel Janssen skribis:
>
>> I applied the patch, and here are the results:
>>
>> [roel@hpcguix guix]$ time guixr environment --ad-hoc coreutils --pure -- true
>> The following derivations will be built:
>>    /gnu/store/0hz8g844432b5h9zbqr9cpsjy0brg15h-profile.drv
>>    /gnu/store/wkksb7bbx3jr0p6p5cj4kkphbwday0yd-info-dir.drv
>>    /gnu/store/cd2mwx9qprdy23p7j3pik2zs14nifn36-manual-database.drv
>> Creating manual page database for 1 packages... done in 1.816 s
>>
>> real    1m14.686s
>> user    0m5.761s
>> sys     0m0.498s
>> [roel@hpcguix guix]$ time guixr environment --ad-hoc coreutils --pure -- true
>>
>> real    0m34.100s
>> user    0m5.599s
>> sys     0m0.414s
>> [roel@hpcguix guix]$ time guixr environment --ad-hoc coreutils --pure -- true
>>
>> real    0m33.821s
>> user    0m5.140s
>> sys     0m0.432s
>
> You’re telling me it’s just as bad as before, right?

Sorry for my somewhat hasty response.  Before the patch, the timings
varied quite a bit; now they are fairly stable at around 30 to 35
seconds with grafting, and around 15 to 20 seconds without.  I really
appreciate the optimization effort, and I do feel it is improving.
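To back that up with more than a handful of manual runs, a small Bash
loop could collect a larger sample (an untested sketch; it just repeats
the same measurement):

    # Run the measurement ten times; Bash's `time' keyword prints
    # real/user/sys for each iteration.
    for i in $(seq 1 10); do
        time guixr environment --ad-hoc coreutils --pure -- true
    done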
>
>> [roel@hpcguix guix]$ time guixr environment --ad-hoc coreutils --pure --no-substitutes --no-grafts -- true
>> The following derivations will be built:
>>    /gnu/store/rvh0imjdimwm90nzr0fmr5gmp97lyiix-profile.drv
>>    /gnu/store/5hm3v4afjf9gix92ixqzv9bwc11a608s-fonts-dir.drv
>>
>> real    0m37.200s
>> user    0m3.408s
>> sys     0m0.284s
>> [roel@hpcguix guix]$ time guixr environment --ad-hoc coreutils --pure --no-substitutes --no-grafts -- true
>>
>> real    0m19.415s
>> user    0m3.466s
>> sys     0m0.306s
>> [roel@hpcguix guix]$ time guixr environment --ad-hoc coreutils --pure --no-substitutes --no-grafts -- true
>>
>> real    0m18.850s
>> user    0m3.536s
>> sys     0m0.346s
>> [roel@hpcguix guix]$ time guixr environment --ad-hoc coreutils --pure --no-grafts -- true
>>
>> real    0m16.003s
>> user    0m3.246s
>> sys     0m0.301s
>> [roel@hpcguix guix]$ time guixr environment --ad-hoc coreutils --pure --no-grafts -- true
>>
>> real    0m18.205s
>> user    0m3.470s
>> sys     0m0.314s
>> [roel@hpcguix guix]$ time guixr environment --ad-hoc coreutils --pure --no-substitutes -- true
>>
>> real    0m33.731s
>> user    0m5.111s
>> sys     0m0.428s
>> [roel@hpcguix guix]$ time guixr environment --ad-hoc coreutils --pure --no-substitutes -- true
>>
>> real    0m30.993s
>> user    0m5.049s
>> sys     0m0.458s
>>
>> Why is grafting so slow, even if it doesn't have to graft anything?
>
> Grafting leads to a bunch of additional RPCs:
>
> --8<---------------cut here---------------start------------->8---
> $ GUIX_PROFILING=rpc ./pre-inst-env guix build coreutils
> /gnu/store/mskh7zisxa313anqv68c5lr4hajldjc5-coreutils-8.27-debug
> /gnu/store/xbvwxf4k5njnb3hn93xwqlppjkiz4hdv-coreutils-8.27
> Remote procedure call summary: 379 RPCs
>   build-things                   ...   1
>   built-in-builders              ...   1
>   valid-path?                    ...   5
>   query-substitutable-path-infos ...   8
>   query-references               ...  22
>   query-valid-derivers           ...  48
>   add-text-to-store              ... 294
> $ GUIX_PROFILING=rpc ./pre-inst-env guix build coreutils --no-grafts
> /gnu/store/mskh7zisxa313anqv68c5lr4hajldjc5-coreutils-8.27-debug
> /gnu/store/xbvwxf4k5njnb3hn93xwqlppjkiz4hdv-coreutils-8.27
> Remote procedure call summary: 294 RPCs
>   built-in-builders              ...   1
>   query-substitutable-path-infos ...   1
>   build-things                   ...   1
>   valid-path?                    ...   5
>   add-text-to-store              ... 286
> --8<---------------cut here---------------end--------------->8---
>
> So the problem is probably not NFS in this case but rather RPC
> performance.
>
> However, I can’t help with this until you drop ‘guixr’ and use
> GUIX_DAEMON_SOCKET=guix:// instead.  Hint hint.  ;-)

This is what guixr is already doing; see:
https://github.com/UMCUGenetics/guix-additions/blob/master/umcu/packages/guix.scm#L95

So I went a little further and did this:

[roel@hpcguix ~]$ export GUIX_DAEMON_SOCKET="/gnu/daemon-socket/socket"
[roel@hpcguix ~]$ export NIX_STATE_DIR=/gnu

This means that when I run "guix" on the same machine where guix-daemon
is running, communicating over the UNIX socket, I should not experience
a performance problem other than a little NFS latency.  Here are the
timings:

[roel@hpcguix ~]$ time guix environment --ad-hoc coreutils --pure --no-grafts -- true

real    0m16.293s
user    0m2.550s
sys     0m0.274s
[roel@hpcguix ~]$ time guix environment --ad-hoc coreutils --pure --no-grafts -- true

real    0m15.746s
user    0m2.411s
sys     0m0.260s
[roel@hpcguix ~]$ time guix environment --ad-hoc coreutils --pure -- true

real    0m32.821s
user    0m4.342s
sys     0m0.375s
[roel@hpcguix ~]$ time guix environment --ad-hoc coreutils --pure -- true

real    0m31.140s
user    0m4.361s
sys     0m0.312s

That looks pretty much the same as going through guixr.
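A logical next step would be to repeat the RPC profiling you showed
above, but over the local UNIX socket, to see which RPCs the remaining
~30 seconds are spent on (I have not run this yet):

    # Count the RPCs issued with grafting enabled...
    GUIX_PROFILING=rpc guix environment --ad-hoc coreutils --pure -- true
    # ...and compare against the no-grafts case.
    GUIX_PROFILING=rpc guix environment --ad-hoc coreutils --pure --no-grafts -- true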
Now, the only thing that could be a problem, other than NFS's own
overhead, is that the daemon socket is on the NFS mount as well.  I
don't think that matters, because once the connection is established
the socket is not an actual file from the kernel's perspective, so
there shouldn't be any NFS involvement in the data transfer.
Nevertheless, I will try mounting the /gnu/daemon-socket directory on a
tmpfs (in memory) for maximum speed.

The "nfsstat" tool can display some NFS statistics.  Here's what the
client side looks like (run on hpcguix):

[roel@hpcguix ~]$ nfsstat -c
Client rpc stats:
calls      retrans    authrefrsh
21090135   0          21097679

Client nfs v3:
null         getattr      setattr      lookup       access       readlink
0         0% 17580080 83% 500076    2% 842458    3% 930443    4% 1142      0%
read         write        create       mkdir        symlink      mknod
153231    0% 313668    1% 232301    1% 60846     0% 10604     0% 80        0%
remove       rmdir        rename       link         readdir      readdirplus
181436    0% 53549     0% 74060     0% 75984     0% 278       0% 76912     0%
fsstat       fsinfo       pathconf     commit
1402      0% 6         0% 3         0% 2481      0%

The high number of getattr calls is probably related to the attribute
cache.  I also tried disabling the attribute cache entirely with the
"noac" mount option; that led to stable timings of around 2 minutes
(with grafts enabled), so the cache clearly helps.

What worries me in these numbers is the authrefrsh count, which is
almost equal to the number of calls.  I think this means that for
almost every call, the client also checks whether it is still
authorized to get data from the NFS server, effectively doubling the
number of network packets and thus the network overhead.  What I can
try here is to disable authentication.  I am not sure whether that is a
good solution, but at least it should give us some numbers on the
performance impact.  I will report back with my findings.

> Thanks for your feedback,
> Ludo’.

Thanks a lot for your efforts in speeding things up!

Kind regards,
Roel Janssen
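P.S.: For the tmpfs experiment, I am thinking of something along these
lines (an untested sketch; the size is an arbitrary guess, and
guix-daemon would have to be restarted afterwards so that it re-creates
its socket under the new mount):

    # Put the daemon's socket directory on an in-memory file system so
    # that connecting to the socket never touches NFS.
    sudo mount -t tmpfs -o size=1M,mode=0755 tmpfs /gnu/daemon-socket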