From: Christopher Baines <mail@cbaines.net>
To: Luciana Lima Brito <lubrito@posteo.net>
Cc: guix-devel@gnu.org
Subject: Re: Outreachy: Timeline tasks
Date: Wed, 28 Apr 2021 19:17:51 +0100
Message-ID: <87y2d2e0j4.fsf@cbaines.net>
In-Reply-To: <20210428145941.4bd0dd6f@lubrito>

Luciana Lima Brito <lubrito@posteo.net> writes:

> I was thinking about the timeline of tasks.
>
> The main tasks are:
>
> 1. Add instrumentation to identify the slow parts of processing
>   new revisions
>
> 2. Improve the performance of these slow parts
>
> I'm writing down some ideas for dividing the tasks into small steps;
> see what you think.
>
> Regarding the first task, I understand that the whole thing starts with
> identifying how the data for new revisions arrives in the Guix Data
> Service: the relevant queries and how the code processes them. Based
> on this, I would propose starting by mapping these queries and their
> uses, so I could run them locally and get their statistics.
>
> Once I have this information, I could identify the potentially
> problematic ones and work on them. If a process is slow but its query
> is not, the problem may be hidden in the code.

So, there's already some code for timing different parts of the data
loading process; if you look in the job output and search for ", took ",
you should see the timings printed out.
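Since the timings currently exist only as log lines, one cheap first step is to pull them out of a job log and rank them. The following is a minimal Python sketch, assuming each timing line ends with something like ", took 12 seconds"; that exact wording, the regex, and the sample lines are assumptions for illustration, not the Guix Data Service's actual output format.

```python
import re

# Assumed (hypothetical) log format: "<description>, took <N> seconds".
# The real log wording may differ; adjust the pattern to match it.
TOOK_RE = re.compile(r"^(?P<step>.*?), took (?P<secs>\d+(?:\.\d+)?) seconds?\s*$")


def parse_timings(lines):
    """Return (step, seconds) pairs for every line matching TOOK_RE."""
    timings = []
    for line in lines:
        m = TOOK_RE.match(line.strip())
        if m:
            timings.append((m.group("step"), float(m.group("secs"))))
    return timings


def slowest(timings, n=5):
    """Return the n slowest steps, slowest first."""
    return sorted(timings, key=lambda t: t[1], reverse=True)[:n]


if __name__ == "__main__":
    # Made-up sample lines, only to show the shape of the input.
    sample_log = [
        "loading package data, took 40 seconds",
        "some unrelated output",
        "inserting derivations, took 312.5 seconds",
        "computing lint warnings, took 95 seconds",
    ]
    for step, secs in slowest(parse_timings(sample_log)):
        print(f"{secs:8.1f}s  {step}")
```

In practice the input would be the lines of a downloaded job log, e.g. `parse_timings(open(path))`.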

Printing these timings does help, but having the information only in
the log doesn't make it easy to, for example, figure out which part is
the slowest.
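One way to make the timings easier to analyse is to record them as structured data (step name, start offset, duration) rather than only printing them. The Guix Data Service itself is written in Guile, so this is only a Python analogue of the idea; the `Timings` class and its `step` method are hypothetical names.

```python
import time
from contextlib import contextmanager


class Timings:
    """Collect (step, start offset, duration) records instead of only logging them."""

    def __init__(self):
        self.origin = time.perf_counter()
        self.records = []

    @contextmanager
    def step(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded on exit, so nested steps appear before their parent.
            end = time.perf_counter()
            self.records.append((name, start - self.origin, end - start))

    def slowest(self, n=5):
        """Return the n records with the longest duration, slowest first."""
        return sorted(self.records, key=lambda r: r[2], reverse=True)[:n]


if __name__ == "__main__":
    timings = Timings()
    with timings.step("fast step"):
        time.sleep(0.01)
    with timings.step("slow step"):
        time.sleep(0.05)
    for name, offset, duration in timings.slowest():
        print(f"{name}: started at {offset:.3f}s, took {duration:.3f}s")
```

Keeping the start offset as well as the duration is what would later allow drawing the steps on a timeline, not just ranking them.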

I also wouldn't consider this a "one-off" thing: the data loading code
will continue to change as Guix changes, and its performance will
probably change too.
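Because the numbers will drift as Guix changes, it may help to compare the timings of two runs and flag steps that got noticeably slower. A rough Python sketch; the `regressions` function, its ratio threshold, and the minimum-duration cut-off are illustrative assumptions to be tuned, not something the service currently does.

```python
def regressions(before, after, min_ratio=1.5, min_seconds=5.0):
    """Return (step, old, new) for steps that got at least min_ratio slower.

    before/after map step name -> seconds. Steps shorter than min_seconds
    in the new run are ignored so that noise in tiny steps isn't reported;
    steps missing from the earlier run are skipped.
    """
    flagged = []
    for step, new in after.items():
        old = before.get(step)
        if old is None or new < min_seconds:
            continue
        if old == 0 or new / old >= min_ratio:
            flagged.append((step, old, new))
    # Largest absolute slowdown first.
    return sorted(flagged, key=lambda r: r[2] - r[1], reverse=True)
```

For example, `regressions({"lint warnings": 90.0}, {"lint warnings": 200.0})` returns `[("lint warnings", 90.0, 200.0)]`, while a step going from 300 to 310 seconds would not be flagged.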

I've been wondering about visualisations. I remember systemd had a
feature to plot the system's boot as an image, which made it much easier
to see which parts are slow (here's an example [1]).

1: https://lizards.opensuse.org/wp-content/uploads/2012/07/plot001.gif
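A plot like the systemd one (the kind `systemd-analyze plot` produces) could be approximated even in plain text. This Python sketch draws a simple text Gantt chart from (step, seconds) pairs. It assumes the steps ran sequentially, so each bar starts where the previous one ended, which would not hold for nested or overlapping timings. `plot_timeline` is a hypothetical helper, and an SVG renderer could follow the same layout logic.

```python
def plot_timeline(timings, width=50):
    """Render sequential (step, seconds) timings as a text Gantt chart.

    Assumes steps ran one after another; nested or parallel timings
    would need real start offsets instead of a running total.
    """
    total = sum(secs for _, secs in timings)
    if total <= 0:
        return ""
    label_width = max(len(step) for step, _ in timings)
    lines = []
    elapsed = 0.0
    for step, secs in timings:
        # Clamp so a tiny final step still fits inside the chart.
        start_col = min(round(elapsed / total * width), width - 1)
        end_col = round((elapsed + secs) / total * width)
        bar = " " * start_col + "#" * max(1, end_col - start_col)
        lines.append(f"{step.ljust(label_width)} |{bar.ljust(width)}| {secs:.1f}s")
        elapsed += secs
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up durations, only to show the layout.
    print(plot_timeline([("load", 10.0), ("insert", 30.0), ("lint", 10.0)]))
```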

> Regarding improving the performance of the slow parts, it is still a
> bit abstract for me to see how to break this into smaller tasks. I do
> believe it would require reformulating some parts of the queries, and
> as their results may change a bit, the code may need tweaks too. My
> point is, how would I propose an approach to improvement if I don't
> even know exactly what needs to be improved? But I imagine that the
> second task is more demanding than the first and will take up most of
> the internship.

As I said before, this part depends on deciding where the areas for
improvement are. Maybe have a look through one of the job logs on
data.guix.gnu.org and see if you can spot some slow parts?
