Luciana Lima Brito writes:

> I was thinking about the timeline of tasks.
>
> The main tasks are:
>
> 1. Add instrumentation to identify the slow parts of processing
>    new revisions
>
> 2. Improve the performance of these slow parts
>
> I'm writing some ideas I have to divide the tasks in small steps, see
> what you think about it.
>
> About the first task I understand that the whole thing starts with
> identifying how the data for new revisions arrives on Guix Data
> Service: the relevant queries and their processing on the code. Based
> on it I would propose start with mapping these queries and their uses,
> so I could run them locally and get their statistics.
>
> Once I get this information I could identify which are the possible
> problematic ones and work on them. If the process is slow but the query
> is not, maybe the problem would be hidden in the code.

So, there's already some code for timing different parts of the data
loading process. If you look in the job output and search for ", took "
you should see timings printed out.

Having these timings printed out does help, but with the information
only in the log it isn't easy to figure out, for example, which part is
the slowest. I'd also not consider this a "one off" thing: the data
loading code will continue to change as Guix changes, and its
performance will probably change too.

I've been wondering about visualisations. I remember systemd had a
feature to plot the system's boot as an image, which made seeing which
parts are slow much easier (here's an example [1]).

1: https://lizards.opensuse.org/wp-content/uploads/2012/07/plot001.gif

> About the improvements on the performance of slow parts, it is a little
> bit abstract for me to see now how to break it in smaller tasks. I do
> believe that it would require to reformulate some parts of the queries,
> and as their result may change a bit, tweaks could be required on
> the code too. My point is, how would I propose an improvement approach
> if I don't even know what exactly is to be improved? But I imagine that
> work on this second task is more demanding than the first and will take
> most of the time of the internship.

As I said before, this part depends on deciding where the areas for
improvement are. Maybe have a look through one of the job logs on
data.guix.gnu.org and see if you can spot some slow parts?
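
To make spotting the slow parts in a job log a bit less manual, here is
a rough sketch in Python that pulls out the ", took " lines and sorts
them by duration. Only the ", took " marker is known to be in the job
output; the rest of the line shape ("<description>, took <number>
seconds"), the function name and the sample log are assumptions made up
for illustration, so the pattern will likely need adjusting against a
real log.

```python
import re

# Assumed line shape: "<description>, took <number> ...".
# Only the ", took " marker is known to appear in the job output;
# everything else about the format is a guess.
TOOK_RE = re.compile(r"^(?P<what>.*), took (?P<seconds>\d+(?:\.\d+)?)")


def slowest_steps(log_text, top=10):
    """Return up to `top` (seconds, description) pairs, slowest first."""
    timings = []
    for line in log_text.splitlines():
        match = TOOK_RE.search(line)
        if match:
            timings.append(
                (float(match.group("seconds")), match.group("what").strip())
            )
    # Sort by duration only, largest first.
    timings.sort(key=lambda item: item[0], reverse=True)
    return timings[:top]


# Made-up sample input, NOT real Guix Data Service output.
SAMPLE_LOG = """\
step-a, took 12 seconds
some unrelated line
step-b, took 340 seconds
step-c, took 3.5 seconds
"""

if __name__ == "__main__":
    for seconds, what in slowest_steps(SAMPLE_LOG):
        print(f"{seconds:8.1f}s  {what}")
```

Saving a job log to a file and passing its contents to slowest_steps
should give a first list of candidates. The same (seconds, description)
pairs could later feed a plot, along the lines of the systemd boot
chart.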