Re: System deployment commands hoarding all RAM

all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed

* Re: System deployment commands hoarding all RAM
@ 2024-06-02 22:10 Juliana Sims
  2024-06-10 18:02 ` Sergio Pastor Pérez
  0 siblings, 1 reply; 3+ messages in thread
From: Juliana Sims @ 2024-06-02 22:10 UTC (permalink / raw)
  To: sergio.pastorperez; +Cc: michal_atlas+gnu, guix-devel

Hi Sergio,

Continuing on from our out-of-band conversation about this topic, the 
most likely cause of your RAM issue is that you use recursion 
extensively, but not in tail position. This means that Guile cannot 
optimize this recursion to re-use stack frames and thus your memory 
usage is unbounded. I'll try to explain a bit more what that means in 
this email in the hopes that you'll be able to resolve it. Forgive me 
if I repeat things you already know; I want to make sure you have all 
the information you need to solve your problem.

You know what recursion is: calling a function from inside itself (or 
one of the functions it calls). You mentioned you've also heard of tail 
calls. I'll go ahead and describe tail calls a couple ways just in case 
these different descriptions help something click for you. I know I 
needed to hear several different explanations before I really 
understood them.

The common definition is that a tail call is a call made as the last 
thing a function does. I think of it as calling a function when there's 
no more work to do in the current function. Functions called in this 
way are said to be "in tail position." Tail recursion is simply making 
a recursive call in tail position. Here's a silly example defined first 
without tail recursion, then with:
```
(define (add-x n x)
  "Add @var{x} to @var{n} one at a time."
  (if (= x 0)
      n
      (+ 1 (add-x n (- x 1)))))

(define (add-x-tail n x)
  (if (= x 0)
      n
      (add-x-tail (+ n 1) (- x 1))))
```
An important note: while the recursive call is the last line of text in 
the tail-recursive function definition, this isn't necessarily 
required. That idea threw me off for quite some time. The following 
function is also tail recursive and produces the same output as the 
other two definitions:
```
(define (add-x-tail-2 n x)
  (if (not (= x 0))
      (add-x-tail-2 (+ n 1) (- x 1))
      n))
```

Let's return to the first definition for now. You may notice that the 
recursive call happens inside of a call to `+`. Because arguments have 
to be fully evaluated to values in order to be passed to functions, the 
`+` call cannot be fully evaluated without the result of the recursive 
call. We therefore have to keep around the outer `add-x` call so that 
we can fully evaluate `+` and return its value from `add-x`.

Let's walk through expanding a quick example, skipping some 
intermediary evaluation to save space and using ellipses to replace 
parts of the function we don't care about anymore but have to keep 
around anyway:

```
(add-x 3 2)
-> ;; replace add-x with its body, replacing variable names with values
(if (= 2 0)
    3
    (+ 1 (add-x 3 1)))
-> ;; repeat the above steps, replacing the inner call to add-x with 
its body and its variable names with their values
(... (+ 1 (if (= 1 0)
              3
              (+ 1 (add-x 3 0)))))
-> ;; and again
(... (+ 1 (... (+ 1 (if (= 0 0) 3))))) ;; we skip writing the second 
arm because we don't evaluate it
-> ;; now that everything is fully expanded, we can begin evaluating in 
ernest
(... (+ 1 (... (+ 1 3))))
->
(... (+ 1 4))
->
5
```

Compare that to the second function definition. In this definition, the 
`+` and `-` calls both happen inside the call to `add-x-tail`. These 
are operating on known values and can be fully evaluated before the 
call to `add-x-tail`. That is, there is no information in the first 
`add-x-tail` call that is required to use the result of the second (or 
third, fourth, etc) call to `add-x-tail`. This reveals another way to 
think of tail calls. Tail calls are calls for which all arguments are 
known and whose result is immediately returned by the calling function.

Here's where we introduce a new term: recursive tail call optimization. 
Tail calls and tail recursion just describe situations you find in 
code. In and of themselves, they only talk about what code does on a 
theoretical level. Tail call optimization is a way to take advantage of 
the theoretical aspects I've been discussing. I've been framing tail 
calls as calls which do not need any information from the function that 
calls them. Recursive tail call optimization allows the compiler to 
replace the arguments of the calling function with the new argument 
values and then simply re-evaluate that function in-place. If you're 
familiar with how computers operate at the level of machine code and 
the stack, recursive tail call optimization is usually implemented by 
moving the new arguments into the same argument registers as were used 
for the first call to a function, then jumping back to the beginning of 
the function's stack frame. Guile (and all Scheme implementations 
compliant with R5RS or later standards) does something very similar.

We'll step through evaluation of our tail recursive definition with 
this understanding in mind, as well as previous rules for making the 
steps shorter:
```
(add-x-tail 3 2)
-> ;; replace add-x-tail with its body and replace variable names with 
their values
(if (= 2 0)
    3
    (add-x-tail 4 1))
-> ;; we only need to replace variables this time
(if (= 1 0)
    4
    (add-x-tail 5 0))
-> ;; and now we can finish evaluation
(if (= 0 0) 5)
->
5
```

As you can see, we just replaced the arguments in-line and evaluated 
the function over. When deciding if you're using a recursive tail call, 
ask yourself: if I replace the arguments to the current call with the 
arguments to the next call and evaluate the function again with no 
other changes, will I get the result I want? If the answer is yes, then 
congratulations! That's a recursive tail call!

Now, that's all just to help you understand recursive tail calls, why 
they're useful, and how to recognize them. That doesn't actually help 
you write them. And even when you understand recursive tail calls well 
enough to write them reliably, it's often non-trivial to represent a 
particular computational task using only recursive tail calls. Still, 
hopefully this helps.

Here are some other explanations of tail calls and tail recursion which 
may also help:
* a brief description and demonstration from Computerphile on YouTube 
(I recommend a free software frontend): https://youtu.be/_JtPhF8MshA
* a more thorough treatment of the topic from "An Introduction to 
Scheme and Its Implementation": 
https://www.cs.utexas.edu/ftp/garbage/cs345/schintro-v14/schintro_127.html
* the entire sixth part of "How to Design Programs" talks about 
recursive tail calls and how to define functions using them (note that 
this section builds on the rest of the book so it may not be 
immediately obvious what it's saying without other context): 
https://htdp.org/2023-8-14/Book/part_six.html

I'm sure other seasoned Schemers on this list can help refine this 
explanation or point you to other resources, and they may even be able 
to point you towards tail-recursive solutions to your problem. I don't 
have the time to try my hand at it at the moment, unfortunately, but I 
wish you good luck and happy hacking!

-Juli

PS If you've never spent time with "How to Design Programs" (HtDP), I 
highly recommend it. It can be slow going if you're not new to 
programming, but the design recipe and the program design discipline it 
teaches are the most valuable education I've ever gotten in, well, 
designing programs (and the functions that compose them). HtDP is an 
incredibly good replacement for the program design parts of the 
legendary "Structure and Interpretation of Computer Programs" (SICP) 
because it is able to build upon and refine the lessons of SICP with 
further experience from decades more of teaching. Plus, its laser-focus 
on one specific subset of the many topics SICP covers allows it to 
progress more gently and explore concepts in more depth from more 
angles.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: System deployment commands hoarding all RAM
  2024-06-02 22:10 System deployment commands hoarding all RAM Juliana Sims
@ 2024-06-10 18:02 ` Sergio Pastor Pérez
  0 siblings, 0 replies; 3+ messages in thread
From: Sergio Pastor Pérez @ 2024-06-10 18:02 UTC (permalink / raw)
  To: AS8P251MB0337E3439FE24C98DA02D6BBF3F72; +Cc: michal_atlas+gnu, guix-devel

Hi Juliana,

Thanks for your help on the matter!

After reading your explanation about tail recursion I understand the
concept much better.

I still need to find a way to implement the problematic code in a
tail-recursive fashion. But, I'm not convinced that it will solve the
problem. Specially since issuing the `apply-to-record-type` over an
operating system record yields a short computation time:
--8<---------------cut here---------------start------------->8---
scheme@(pastor utils gpu-specification)> 
,time (begin ; Avoid printing the output
        (apply-to-record-type
         (lambda (pkg)
           pkg)
         (@ (guix packages) <package>)
         (@ (base-system) %base-gnome-system))
        #f)
$15 = #f
;; 0.007433s real time, 0.007428s run time.  0.000000s spent in GC.
--8<---------------cut here---------------end--------------->8---

I'm not able to use the profiler to display how deep the recursion
grows. In any case, I believe it should grow equally for the following
invocations:
--8<---------------cut here---------------start------------->8---
λ guix system build transformed-sheepbook.scm
/gnu/store/kpk9la4h9xwp6n7vabd5lfs6kbhb2f2d-system

λ guix system vm transformed-sheepbook.scm
Killed
--8<---------------cut here---------------end--------------->8---

Since the second one hoards the RAM, I'm guessing that the commands
operate differently over the resulting operating system record.

I'm not able to dig further since the `guix repl` does not support
tracing, and using a pure Guile REPL errors out whenever I try to use
trace points for this procedure:
--8<---------------cut here---------------start------------->8---
scheme@(pastor utils gpu-specification)> ,tracepoint apply-to-record-type
scheme@(pastor utils gpu-specification)> ((@ (guix scripts system) guix-system) "vm" "/home/pastor/.config/guix/systems/transformed-sheepbook.scm")
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
In procedure port-column: Wrong type argument in position 1: #<closed: string 7fa2d32dd000>
--8<---------------cut here---------------end--------------->8---

Interestingly, the traces work when using the `build` command instead of
the `vm` command:
--8<---------------cut here---------------start------------->8---
scheme@(pastor utils gpu-specification)> ((@ (guix scripts system) guix-system) "build" "/home/pastor/.config/guix/systems/transformed-sheepbook.scm")
Trap 0: (apply-to-record-type #<procedure 7fa2aed6e868 at transformed-sheep…> …)
Trap 0: |  (apply-to-record-type #<procedure 7fa2aed6e868 at transformed-sh…> …)
...
/gnu/store/6xxzsp3grp53x9w49wh513i8sjyal3zg-system
--8<---------------cut here---------------end--------------->8---

I think I've hit a road block.

Does any fellow Guix hacker knows a way to debug `guix system` commands
or why the `vm` subcommand does not make the same computation over the
record as the `build` subcommand does?


Thanks again for your time.
Regards,
Sergio.

PS: I appreciate the book recommendations!


^ permalink raw reply	[flat|nested] 3+ messages in thread

* System deployment commands hoarding all RAM
@ 2024-05-26 12:44 Sergio Pastor Pérez
  0 siblings, 0 replies; 3+ messages in thread
From: Sergio Pastor Pérez @ 2024-05-26 12:44 UTC (permalink / raw)
  To: guix-devel; +Cc: michal_atlas+gnu

Hello all.

Michal and I have been experimenting with a recursive function to apply
procedures to all elements that match a certain type within a
collection.

With this new procedure one could write something like this:
--8<---------------cut here---------------start------------->8---
(apply-to-record-type
  (lambda (_)
    (@ (gnu packages games) cowsay))
  (@ (guix packages) <package>)
  my-operating-system)
--8<---------------cut here---------------end--------------->8---

Which will replace every package on the system for the cowsay
package. Albeit a cowsified OS will not be very useful, other kinds of
transformations could greatly simplify OS definitions. It would make it
very convenient to deploy systems with grafts on important libraries
that are part of many packages. A more useful example would be this
syntax rule:
--8<---------------cut here---------------start------------->8---
(define-syntax-rule (custom-libc-operating-system exp ...)
  "Like 'operating-system' but graft 'libc' with the a custom 'libc'
package."
  (apply-to-record-type
    replace-libc
    (@ (guix packages) <package>)
    (operating-system exp ...)))
--8<---------------cut here---------------end--------------->8---

Which, if `replace-libc` grafts your custom libc using
`package-input-rewriting`, would allow you to define a system like so:
--8<---------------cut here---------------start------------->8---
(custom-libc-operating-system
  ...)
--8<---------------cut here---------------end--------------->8---

Instead of having to manually apply the function to every package in the
system which is very cumbersome to do for packages within services.

The function that provides this convenience is this one:
--8<---------------cut here---------------start------------->8---
(define (apply-to-record-type fn type var)
  "Recursing into child fields, apply FN to every element of VAR which holds a
value of TYPE."
  (let ((type-predicate (record-predicate type)))
    (cond ((type-predicate var)
           (fn var))
          ((procedure? var)
           (lambda args
             (apply values
                    (map
                     (lambda (var)
                       (apply-to-record-type fn type var))
                     (call-with-values
                         (lambda () (apply var args))
                       list)))))
          ((list? var)
           (map (lambda (elt)
                  (apply-to-record-type fn type elt))
                var))
          ((vector? var)
           (vector-map (lambda (vec elt)
                         (apply-to-record-type fn type elt))
                       var))
          ((and (record? var)
                (not (eq? (@ (gnu services) <service-type>)
                          (record-type-descriptor var))))
           (let* ((record-type (record-type-descriptor var))
                  (record-fields (record-type-fields record-type)))
             (apply
              (record-constructor record-type)
              (map (lambda (field)
                     (let* ((accessor (record-accessor record-type field))
                            (val (accessor var)))
                       (apply-to-record-type fn type val)))
                   record-fields))))
          (else
           var))))
--8<---------------cut here---------------end--------------->8---

During our testing, we have observed that it successfully applies the
function to every element of the record and it's child. Also, innocuous
transformations output the same system derivation. For example:
--8<---------------cut here---------------start------------->8---
(apply-to-record-type
   (lambda (var)
     var)
   (@ (guix packages) <package>)
   my-operating-system)
--8<---------------cut here---------------end--------------->8---

Should return the same OS applied or not, since it is not doing
anything. And, as expected, we see:
--8<---------------cut here---------------start------------->8---
λ guix system build sheepbook.scm
/gnu/store/kpk9la4h9xwp6n7vabd5lfs6kbhb2f2d-system

λ guix system build transformed-sheepbook.scm
/gnu/store/kpk9la4h9xwp6n7vabd5lfs6kbhb2f2d-system
--8<---------------cut here---------------end--------------->8---

All this is well and good but we have noticed this issue when trying to
deploy the system. Issuing the `reconfigure` or `vm` subcomands hoards
all the RAM of the system until the OS kills the process. This is the
output when trying to build the images:
--8<---------------cut here---------------start------------->8---
λ guix system vm sheepbook.scm
/gnu/store/c3pv8hwckbl01qacdpckn9yfwr74k629-run-vm.sh

λ guix system vm transformed-sheepbook.scm 
Killed
--8<---------------cut here---------------end--------------->8---

Monitoring the RAM usage of the second process, one can see that it will
take all the RAM available. Given that both system definitions produce
the same output, I would expect the deployment process to be identical.

Is this a bug on how Guix deploys the system? Or the transformation we
are applying introduces a cycle somewhere that only affects the
deployment commands?

Thanks for your time. Have a nice evening.
Sergio.


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-06-10 18:08 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-02 22:10 System deployment commands hoarding all RAM Juliana Sims
2024-06-10 18:02 ` Sergio Pastor Pérez
  -- strict thread matches above, loose matches on Subject: below --
2024-05-26 12:44 Sergio Pastor Pérez

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.