Caleb Ristvedt writes:

>> Still though, I'd like to hear what people think about which direction
>> the implementation should go, and what features they'd like to see. Even
>> if those are not essential to make the Guile implementation viable, it
>> still might inform the direction to take.
>
> Okay, brain dump time:
>
> I think that using fibers has a lot of potential, but there are
> obstacles that need to be worked around. In the single-threaded case,
> we risk a big slowdown if multiple clients are active at once, since
> we're doing what used to be done by n processes with one single thread.
> It would be especially noticeable during big disk reads and writes,
> since those basically ignore O_NONBLOCK, and most procedures that act on
> entire files at once would therefore probably not hit many yield points.
> The worst situation would be where multiple worker fibers are attempting
> to do reference scanning at the same time. Posix asynchronous disk IO
> could be used, but glibc currently implements it... using a thread pool.
> There is the RWF_NOWAIT flag to preadv2, though it's only available on
> newer linuxes and has bugs in 5.9 and 5.10.
>
> Additionally, being single-threaded means use of F_SETLKW is a no-go, so
> you're stuck with polling there. Granted, that's not such a big issue if
> in 99% of use cases there is only one process doing the locking, so it
> can all be managed internally.

I think a thread pool is essential for using SQLite, so we're already
going to have thread pools interacting with fibers; if more cases come
up where this is needed, then I think it's an option.

> Speaking of file locks, the risk of accidental clobbering of locks jumps
> way up once it's all moved in to one process, and IIRC we already have
> bugs with accidental clobbering of locks. You can get a half-decent
> interface by doing what sqlite does, which is a combination of
> intra-process locking and holding on to open file descriptors until all
> locks on the underlying file are released. There are some subtle
> pathological cases there that are a lot more likely in the guix daemon
> than in sqlite, though. For example, suppose you open a file twice to
> get ports p1 and p2, acquire read locks on both of them, then close p1,
> then open the file again to get p3, acquire a read lock on it, close p2,
> get p4, acquire a read lock on it, close p3, get p5... and so on. This
> will cause unbounded file descriptor usage, and eventually you'll run
> out. There is no workaround in this model other than "hope that usage
> pattern doesn't come up much". Additionally, you need to ensure that
> every close of a potentially-locked file goes through a special
> close-wrapper.
>
> I'm actually in the middle of working on a solution for this that
> involves a separate locker process that gets passed file descriptors to
> lock via a unix socket.
>
> Speaking of file descriptors, running the entire daemon in one process
> is going to mean much higher pressure on file descriptor resource usage.
> IIRC, while building a derivation, the closure of its inputs needs to be
> locked, and that means a file descriptor for each and every store item
> in its input closure, simultaneously. The separate locker process would
> make it possible to retain those locks while not having them open in the
> main process.

Maybe that will still need to happen, however I think it might be
possible to replace the IPC through locking files with inter-fiber
communication inside of the daemon. Backwards compatibility is the
priority though, so this can only happen in cases where that's
unaffected.
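To illustrate the kind of thing I mean, here's a rough, untested sketch
of how locks on store items could be coordinated through a single
lock-manager fiber rather than through lock files. The procedure names
(spawn-lock-manager, with-store-lock) are made up for the example, it
assumes it runs under run-fibers, and it only covers fibers in the same
process, not other lock users:

(use-modules (fibers)
             (fibers channels)
             (ice-9 match))

(define (spawn-lock-manager)
  "Start a fiber owning the lock table; return its request channel."
  (let ((requests (make-channel)))
    (spawn-fiber
     (lambda ()
       (let loop ((held (make-hash-table))   ;store item -> locked?
                  (waiting '()))             ;(item . reply-channel) pairs
         (match (get-message requests)
           (('acquire item reply)
            (if (hash-ref held item)
                ;; Item busy: remember the waiter instead of polling.
                (loop held (cons (cons item reply) waiting))
                (begin
                  (hash-set! held item #t)
                  (put-message reply 'locked)
                  (loop held waiting))))
           (('release item)
            (hash-remove! held item)
            ;; Hand the lock straight to one waiting fiber, if any.
            (let ((waiter (assoc item waiting)))
              (when waiter
                (hash-set! held item #t)
                (put-message (cdr waiter) 'locked))
              (loop held (if waiter (delq waiter waiting) waiting))))))))
    requests))

(define (with-store-lock requests item thunk)
  "Run THUNK with ITEM locked via the lock-manager fiber."
  (let ((reply (make-channel)))
    (put-message requests (list 'acquire item reply))
    (get-message reply)                      ;blocks this fiber only
    (dynamic-wind
      (const #t)
      thunk
      (lambda () (put-message requests (list 'release item))))))

Whether something like this can actually replace the lock files depends
on the backwards compatibility point above, since other processes would
still expect the on-disk locks to exist.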
> In the multithreaded case, fork() and clone() become concerns, since
> they can no longer be safely run from guile. One way around this would
> be to use posix_spawn to produce a single-threaded guile process, then
> have that do the fork or clone as necessary. The fork case shouldn't
> actually be necessary, though, as the child process can just exec
> directly. In the clone case, CLONE_PARENT can be used to make the
> resulting process a child of the original, main process, though I don't
> know how portable that is to hurd (really, I don't know how namespace
> setup works in general on hurd). Instead of doing this
> spawn-two-processes-to-spawn-one routine every time we want to set up a
> container, we could create a spawner-helper process once and just keep
> it around. If we can do that before any threads are created, we don't
> even need posix_spawn (though it is nice to have around, and I do have
> bindings to it). I remember reading that that's what the apache web
> server did.
>
> This would however mean some places would need to use interfaces like
> "eval-with-container" instead of "call-with-container", which is
> somewhat less convenient. But code staging shouldn't be a terribly
> foreign concept to most guixers.
>
> Another concern is child process management; a blocking waitpid will of
> course block the calling thread, so something like a SIGCHLD handler or
> a dedicated reaper thread would be needed in order to simulate a
> blocking waitpid. Personally I think it would be a good idea to go with
> something like Shepherd's process monitor approach, but with some
> changes. First, move child reaping into the process monitor itself, so
> that all the SIGCHLD handler does is send a notification to the process
> monitor (and it should do this via a condition variable, not a channel,
> so that it doesn't block, since asyncs run on whatever fiber happens to
> be current on that thread's scheduler at the time, and this means it is
> possible for a signal's handler to be run from within the process
> monitor fiber). Second, wrap all process-spawning procedures such that
> they now return <process> objects instead of PIDs. A <process> object
> contains a PID, a condition variable signaled when the process is
> terminated, and a slot for holding the exit status. Immediately before
> spawning a process, send a message to the process monitor temporarily
> disabling reaping, then spawn the process, create the <process> object,
> and register it with the process monitor, resuming reaping at the same
> time. Then a waitpid replacement can very easily operate on these
> process objects.

I think the work on the Shepherd has probably made things much easier
in this area. It's probably still going to be tricky though, as this
isn't something I've done much with before.

> Sqlite is yet another concern. I haven't yet looked at how you've
> handled this in the build coordinator, but I'm curious. Any blocking
> interface it has, such as a busy handler, isn't going to work very well.
> We could wrap the sqlite procedures with ones that retry with
> exponential backoff (which is what "PRAGMA busy_timeout = ..." does
> internally). That would work, though not optimally. I'm not sure of a
> better way, though - https://www.sqlite.org/c3ref/unlock_notify.html
> looks sort of right, but the documentation says that's just for "shared
> cache mode". It seems a bit of a shame to keep guessing at when the
> database "might" next be available when the other entities accessing the
> database may well be in the very same process and so could just give us
> an earliest-possible checking point directly.

As I say above, I think the way to use SQLite is a thread pool. So you
have a channel to talk to the thread pool, you put-message on it with
some query/lambda and a response channel, then get-message on the
response channel. A thread in the pool takes care of reading from and
writing to the channel, as well as calling into the SQLite code to
actually execute the query.
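Roughly, and just as an untested sketch with made-up names
(make-worker-thread-channel and call-with-database here aren't the
build coordinator's actual code), the shape is something like this,
assuming Fibers and the guile-sqlite3 bindings:

(use-modules (fibers)
             (fibers channels)
             (ice-9 match)
             (ice-9 threads)
             (sqlite3))

(define (make-worker-thread-channel db-file threads)
  "Spawn THREADS native threads, each with its own SQLite connection,
all reading work from the returned channel."
  (let ((channel (make-channel)))
    (for-each
     (lambda (_)
       (call-with-new-thread
        (lambda ()
          (let ((db (sqlite-open db-file)))
            (let loop ()
              (match (get-message channel)
                ((proc reply)
                 ;; Run the thunk against this thread's connection and
                 ;; send the result (or the exception) back.
                 (put-message
                  reply
                  (with-exception-handler
                      (lambda (exn) (cons 'exception exn))
                    (lambda () (cons 'result (proc db)))
                    #:unwind? #t))
                 (loop))))))))
     (iota threads))
    channel))

(define (call-with-database channel proc)
  "Run (PROC db) on a worker thread, blocking only the calling fiber."
  (let ((reply (make-channel)))
    (put-message channel (list proc reply))
    (match (get-message reply)
      (('result . value) value)
      (('exception . exn) (raise-exception exn)))))

Each worker thread keeps its own connection, so a slow SQLite call only
ever ties up that thread, and any busy/retry handling can live entirely
on the worker side rather than in the fibers.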
> Those are all the design concerns that I had off the top of my head, I
> might recall some more later on. Personally I think it would be prudent
> to design as if for multiple threads.
>
> On the subject of features, I would like it if downloaders (fixed-output
> derivations) had access to /gnu/store/.links so that they can easily
> look up whether a file with a given hash already exists, and copy it
> over if so. Often when writing a package definition I'll run "guix
> download" to get the hash, and that will put it in the store as a side
> effect, but then when it comes time to build the package it will
> re-download it all over again because the file name is different.
> Technically that should already be achievable just by tweaking
> chroot-dirs and the downloaders. It would also be nice if the same
> concept could be applied to directories, such as git repositories -
> perhaps a /gnu/store/.dirlinks with symbolic links? Of course, those
> wouldn't be used for deduplication, just for easy access for
> fixed-output derivations. A similar approach could also include e.g. a
> mounted DVD with The GNU Source Distribution on it.

I guess this would be nice, although you'd want to avoid coupling
fixed-output derivations to part of the daemon implementation. Maybe
the daemon could just look at the fixed-output derivation, work out
that it can do the work itself by grabbing the data and arranging it
properly, and avoid running the builder altogether.

> Oh, another feature that just occurred to me - the ability to
> automatically retry a failed derivation with fewer builds in parallel,
> or depending on system load or available memory. It's quite annoying to
> run a guix command that is supposed to take multiple days, and have it
> end up taking more than a week because it fails several times each day -
> often bringing down several hours worth of progress with it due to very
> large derivations - because of some small package's flaky tests that
> fail under high load.

I think there's a whole load of interesting features in this area. The
build coordinator has a way of doing automatic retries, and on the
agents it also has some tuned delays for starting new builds based on
the system load. There's more that could be done, I think, in terms of
packing builds on to a system to make good use of the hardware whilst
avoiding resource-based failures.

Thanks for your long email!

Chris