unofficial mirror of guix-devel@gnu.org 
* Outreachy: Timeline tasks
@ 2021-04-28 17:59 Luciana Lima Brito
  2021-04-28 18:17 ` Christopher Baines
  0 siblings, 1 reply; 16+ messages in thread
From: Luciana Lima Brito @ 2021-04-28 17:59 UTC (permalink / raw)
  To: mail, guix-devel

Hi,

I was thinking about the timeline of tasks.

The main tasks are:

1. Add instrumentation to identify the slow parts of processing
  new revisions

2. Improve the performance of these slow parts

I'm writing down some ideas I have for dividing the tasks into small
steps; let me know what you think.

About the first task, I understand that the whole thing starts with
identifying how the data for new revisions arrives in the Guix Data
Service: the relevant queries and their processing in the code. Based
on that, I would propose to start by mapping these queries and their
uses, so I could run them locally and get their statistics.

Once I have this information I could identify the potentially
problematic ones and work on them. If the process is slow but the query
is not, the problem may be hidden in the code.

About improving the performance of the slow parts, it is still a little
abstract for me to see how to break it into smaller tasks. I do
believe it would require reformulating some parts of the queries,
and as their results may change a bit, tweaks could be required in
the code too. My point is, how would I propose an improvement approach
if I don't yet know exactly what is to be improved? But I imagine that
the work on this second task is more demanding than the first and will
take most of the time of the internship.

I would appreciate it if you could clarify some of the ideas I mentioned.

-- 
Best Regards,

Luciana Lima Brito
MSc. in Computer Science


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Outreachy: Timeline tasks
  2021-04-28 17:59 Outreachy: Timeline tasks Luciana Lima Brito
@ 2021-04-28 18:17 ` Christopher Baines
  2021-04-28 19:20   ` Luciana Lima Brito
  0 siblings, 1 reply; 16+ messages in thread
From: Christopher Baines @ 2021-04-28 18:17 UTC (permalink / raw)
  To: Luciana Lima Brito; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 2396 bytes --]


Luciana Lima Brito <lubrito@posteo.net> writes:

> I was thinking about the timeline of tasks.
>
> The main tasks are:
>
> 1. Add instrumentation to identify the slow parts of processing
>   new revisions
>
> 2. Improve the performance of these slow parts
>
> I'm writing some ideas I have to divide the tasks in small steps, see
> what you think about it.
>
> About the first task I understand that the whole thing starts with
> identifying how the data for new revisions arrives on Guix Data
> Service: the relevant queries and their processing on the code. Based
> on it I would propose start with mapping these queries and their uses,
> so I could run them locally and get their statistics.
>
> Once I get this information I could identify which are the possible
> problematic ones and work on them. If the process is slow but the query
> is not, maybe the problem would be hidden in the code.

So, there's already some code for timing different parts of the data
loading process; if you look in the job output and search for ", took "
you should see timings printed out.

These timings being printed out does help, but having the information in
the log doesn't make it easy to figure out which part is the slowest,
for example.
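As a rough illustration, pulling these timings out of a log and ranking them could look like the following Python sketch (the log excerpt is made up; the exact phrasing in real job logs may differ, only the ", took " convention is taken from above):

```python
import re

# Hypothetical log excerpt -- the real Guix Data Service job log wording
# may differ; only the ", took " convention is assumed here.
log = """\
Fetching inferior package metadata, took 2476 seconds
Computing the channel derivations, took 636 seconds
Fetching inferior lint warnings, took 1488 seconds
"""

timings = []
for line in log.splitlines():
    m = re.match(r"(?P<name>.+), took (?P<secs>\d+) seconds", line)
    if m:
        timings.append((m.group("name"), int(m.group("secs"))))

# Rank slowest first, so the worst offender is immediately visible
timings.sort(key=lambda t: t[1], reverse=True)
for name, secs in timings:
    print(f"{secs:>6}s  {name}")
```

This only ranks what the log already contains; the point of proper instrumentation would be not having to scrape text at all.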

I'd also not consider this a "one off" thing: the data loading code will
continue to change as Guix changes, and its performance will probably
change too.

I've been wondering about visualisations; I remember systemd had a
feature to plot the system's boot as an image, which made seeing which
parts are slow much easier (here's an example [1]).

1: https://lizards.opensuse.org/wp-content/uploads/2012/07/plot001.gif

> About the improvements on the performance of slow parts, it is a little
> bit abstract for me to see now how to break it in smaller tasks. I do
> believe that it would require to reformulate some parts of the queries,
> and as their result may change a bit, tweaks could be required on
> the code too. My point is, how would I propose an improvement approach
> if I don't even know what exactly is to be improved? But I imagine that
> work on this second task is more demanding than the first and will take
> most of the time of the internship.

As I said before, this part is dependent on deciding where the areas for
improvement are. Maybe have a look through one of the job logs on
data.guix.gnu.org and see if you can spot some slow parts?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 987 bytes --]


* Re: Outreachy: Timeline tasks
  2021-04-28 18:17 ` Christopher Baines
@ 2021-04-28 19:20   ` Luciana Lima Brito
  2021-04-28 20:00     ` Christopher Baines
  0 siblings, 1 reply; 16+ messages in thread
From: Luciana Lima Brito @ 2021-04-28 19:20 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

On Wed, 28 Apr 2021 19:17:51 +0100
Christopher Baines <mail@cbaines.net> wrote:

> So, there's already some code for timing different parts of the data
> loading process, if you look in the job output and search for ", took
> " you should see timings printed out.
> 
> These timings being printed out does help, but having the information
> in the log doesn't make it easy to figure out which part is the
> slowest for example.
> 
> I'd also not consider this a "one off" thing, the data loading code
> will continue to change as Guix changes and it's performance will
> probably change too.
> 
> I've been wondering about visualisations, I remember systemd had a
> feature to plot the systems boot as a image which made seeing which
> parts are slow much easier (here's an example [1]).
> 
> 1: https://lizards.opensuse.org/wp-content/uploads/2012/07/plot001.gif

This is interesting! In fact, one of the things that attracted me was
the possibility of working with visualizations, as I saw that on the
roadmap there is a task related to providing statistics over time.
My master's degree is in Information Visualization, so I would very
much appreciate being able to help with that.

On this matter, we should determine what else, other than time, would
be interesting to see. The visualization should be clear about timing
but should also provide information about what could be related to the
delays, such as the size of the queries, their complexity, and the
results they return. So, first I think we should determine what
information we want to see; then, depending on the variables, we choose
a suitable way to present the visualization.

About implementing it: I'm fairly new to Guile and have never built a
visualization in it, so I don't know which libraries it would take to
build visual work like that. Depending on what is available,
interactions could be compromised, and we would instead have to work
with static charts. Can you tell me more about that?

And one last thing: a visualization can be simple or very complex. The
time for it should be carefully taken into account in order not to
impair the main goal, which is improving the slow parts.

> 
> > About the improvements on the performance of slow parts, it is a
> > little bit abstract for me to see now how to break it in smaller
> > tasks. I do believe that it would require to reformulate some parts
> > of the queries, and as their result may change a bit, tweaks could
> > be required on the code too. My point is, how would I propose an
> > improvement approach if I don't even know what exactly is to be
> > improved? But I imagine that work on this second task is more
> > demanding than the first and will take most of the time of the
> > internship.  
> 
> As I said before, this part is dependent on deciding where the areas
> for improvement are. Maybe have a look through one of the job logs on
> data.guix.gnu.org and see if you can spot some slow parts?

I'll look into that and get back to you.

-- 
Best Regards,

Luciana Lima Brito
MSc. in Computer Science



* Re: Outreachy: Timeline tasks
  2021-04-28 19:20   ` Luciana Lima Brito
@ 2021-04-28 20:00     ` Christopher Baines
  2021-04-29 16:02       ` lubrito
  0 siblings, 1 reply; 16+ messages in thread
From: Christopher Baines @ 2021-04-28 20:00 UTC (permalink / raw)
  To: Luciana Lima Brito; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 3555 bytes --]


Luciana Lima Brito <lubrito@posteo.net> writes:

> On Wed, 28 Apr 2021 19:17:51 +0100
> Christopher Baines <mail@cbaines.net> wrote:
>
>> So, there's already some code for timing different parts of the data
>> loading process, if you look in the job output and search for ", took
>> " you should see timings printed out.
>>
>> These timings being printed out does help, but having the information
>> in the log doesn't make it easy to figure out which part is the
>> slowest for example.
>>
>> I'd also not consider this a "one off" thing, the data loading code
>> will continue to change as Guix changes and it's performance will
>> probably change too.
>>
>> I've been wondering about visualisations, I remember systemd had a
>> feature to plot the systems boot as a image which made seeing which
>> parts are slow much easier (here's an example [1]).
>>
>> 1: https://lizards.opensuse.org/wp-content/uploads/2012/07/plot001.gif
>
> This is interesting! In fact, one of the things that attracted me was
> the possibility to work with visualizations, as I saw that on the
> roadmap there is one task related to provide statistics over time).
> My master degree is on Information Visualization, so I would appreciate
> very much if I could help with that.
>
> In this matter, we should determine what else, other than time, would
> be interesting to see. The visualization should be clear enough about
> timing but should also provide information about what could be related
> to the delays, such as size of the queries, complexity, the return it
> gives... So, first I think we should determine what information we want
> to see, then depending on the variables, we choose a suitable way to
> present the visualization.

I think time is the main thing, since that alone will help to identify
which are the slow parts.

> About implementing, I'm kind of new to guile and I never built a
> visualization in guile, so I don't know which libraries it would take to
> build a visual work like that. Depending on what we have,
> interactions could be compromised, and instead we would have to work
> with charts (static visualizations). Can you tell me more about that?

Given the Guix Data Service outputs HTML as well as JSON, it might be
possible to build something with HTML. The package history pages sort of
visualise the data by adding some grey bars to the table [1].

1: http://data.guix.gnu.org/repository/1/branch/master/package/guix

> And one last thing, a visualization can be simple or can be very
> complex.The time for that should be carefully taken into account in
> order to not impair the main goal which is the improvements of the slow
> parts.

Indeed, simplicity is a good thing to keep in mind.

>> > About the improvements on the performance of slow parts, it is a
>> > little bit abstract for me to see now how to break it in smaller
>> > tasks. I do believe that it would require to reformulate some parts
>> > of the queries, and as their result may change a bit, tweaks could
>> > be required on the code too. My point is, how would I propose an
>> > improvement approach if I don't even know what exactly is to be
>> > improved? But I imagine that work on this second task is more
>> > demanding than the first and will take most of the time of the
>> > internship.
>>
>> As I said before, this part is dependent on deciding where the areas
>> for improvement are. Maybe have a look through one of the job logs on
>> data.guix.gnu.org and see if you can spot some slow parts?
>
> I'll look into that and get back to you.

Great.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 987 bytes --]


* Re: Outreachy: Timeline tasks
  2021-04-28 20:00     ` Christopher Baines
@ 2021-04-29 16:02       ` lubrito
  2021-04-29 20:14         ` Christopher Baines
  0 siblings, 1 reply; 16+ messages in thread
From: lubrito @ 2021-04-29 16:02 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

Hi,

On Wed, 28 Apr 2021 21:00:44 +0100
Christopher Baines <mail@cbaines.net> wrote:


> I think time is the main thing, since that alone will help to identify
> which are the slow parts.

> Indeed, simplicity is a good thing to keep in mind.

I'll keep in mind what you've said; here are some ideas for breaking
down the tasks.

Task 1: Add instrumentation to identify the slow parts of processing
   new revisions

	- Implementing a chart over time to identify slow parts;
	- Selecting candidate parts to be analyzed;
	- Asking whether a part should really be taking all this time:
	  - Identifying whether the delay is caused by the code itself
	    or by a query.
	- Preparing a document organizing where each part should
	  be improved.

Task 2: Improve the performance of these slow parts

	- Implementing each improvement.

This Task 2 is really difficult for me to divide into smaller tasks;
I'm stuck for ideas.

I took a look at one of the logs as you suggested, and I found these
parts taking considerable time:

- Computing the channel derivations 636s
- Acquiring advisory transaction lock:
   loading-new-guix-revision-inserts 2534s *
- Getting derivation lint warnings 975s
- fetching inferior lint warnings 1488s
- Acquiring advisory transaction lock:
   loading-new-guix-revision-inserts 3226s *
- Querying the temp-package-metadata 1949s
- Fetching inferior package metadata 2476s

Is picking out these parts what the chart should facilitate? So I would
perform the analysis of task 1 and then move on to task 2.

By the way, I noticed that the ones I marked with * are performed twice.
Why is that?



-- 
Best Regards,

Luciana Lima Brito
MSc. in Computer Science




* Re: Outreachy: Timeline tasks
  2021-04-29 16:02       ` lubrito
@ 2021-04-29 20:14         ` Christopher Baines
  2021-04-30 15:44           ` Luciana Lima Brito
  0 siblings, 1 reply; 16+ messages in thread
From: Christopher Baines @ 2021-04-29 20:14 UTC (permalink / raw)
  To: lubrito; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 3277 bytes --]


lubrito@posteo.net writes:

> Hi,
>
> On Wed, 28 Apr 2021 21:00:44 +0100
> Christopher Baines <mail@cbaines.net> wrote:
>
>
>> I think time is the main thing, since that alone will help to identify
>> which are the slow parts.
>
>> Indeed, simplicity is a good thing to keep in mind.
>
> I'll keep in mind what you have said, so here are some ideas to
> break the tasks.
>
> Task 1: Add instrumentation to identify the slow parts of processing
>   new revisions
>
> 	- Implementing a chart over time to identify slow parts;

Great, can you add more detail to this bit? Given the instrumentation is
a really important part, it would be good to have some working ideas for
what this chart might look like, and what goes into making it (like
where the data comes from, and if and where it's stored).

> 	- Select candidate parts to be analyzed;
> 	- Should this part really be taking all this time?
> 	  - Identify if the delay is caused by the code itself or by a
> 	    query.
> 	- Prepare a document organizing where each part should
> 	  be improved.
>
Given these don't contribute to the instrumentation, I would see them as
part of "Task 2".

> Task 2: Improve the performance of these slow parts
>
> 	- implementing each improvements.
>
> This Task 2 is really difficult for me to divide in smaller tasks, I'm
> really stuck with ideas.

I'd try to set out more of a strategy: what might be the causes of
slowness, how would you investigate them, and what are common approaches
to making those things faster?

> I took a look at one of the logs as you told me, and I found this parts
> taking a considerable time:

Great :)

> - Computing the channel derivations 636s
> - Acquiring advisory transaction lock:
>   loading-new-guix-revision-inserts 2534s *
> - Getting derivation lint warnings 975s
> - fetching inferior lint warnings 1488s
> - Acquiring advisory transaction lock:
>   loading-new-guix-revision-inserts 3226s *
> - Querying the temp-package-metadata 1949s
> - Fetching inferior package metadata 2476s
>
> To pick this parts is what should be facilitated by the chart?
> So I would perform the analysis of the task 1 and then the ta

I would hope that the "instrumentation" would make finding out what
parts are slowest easier, and making it visual information rather than
text scattered through the long log will hopefully go a long way towards
that.

> By the way, I noticed that the ones I marked with * are performed twice.
> Why is that?

I'd try reading the code to find out. In case you haven't worked with
database locks much before, they are being used in this case to allow
multiple processes to interact with the database to load different
revisions at the same time. The locking is very coarse, so locks can be
held for a very long time. When the log says a process is acquiring a
lock and that takes a long time, what you're seeing is that the process
was waiting for another process to commit or roll back a transaction
while holding that same lock.

The technical details that are useful to know:

 - The locks here apply within transactions

 - The lock will be released when the transaction commits or rolls back

 - Advisory locks are user defined locks:
     https://www.postgresql.org/docs/13/explicit-locking.html#ADVISORY-LOCKS
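To illustrate the waiting behaviour, here's a rough analogy using Python threads (not PostgreSQL itself; the sleep durations are invented): acquiring the lock is cheap, but the acquire blocks for as long as another holder keeps its "transaction" open.

```python
import threading
import time

# Rough analogy only: an advisory lock behaves like a named mutex.
# "Acquiring" it is fast unless another session holds it inside a
# long-running transaction, in which case the acquire blocks until
# that transaction commits or rolls back.
lock = threading.Lock()   # stand-in for an advisory transaction lock
waited = {}

def long_transaction():
    with lock:            # lock held for the whole "transaction"
        time.sleep(0.2)   # ...lots of inserts would happen here...

def second_loader():
    start = time.monotonic()
    with lock:            # this is the "Acquiring ... lock" wait in the log
        waited["seconds"] = time.monotonic() - start

t1 = threading.Thread(target=long_transaction)
t2 = threading.Thread(target=second_loader)
t1.start()
time.sleep(0.05)          # ensure t1 grabs the lock first
t2.start()
t1.join(); t2.join()
print(f"second loader waited {waited['seconds']:.2f}s for the lock")
```

So a large "Acquiring advisory transaction lock" time usually measures another process's transaction, not work done by this one.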

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 987 bytes --]


* Re: Outreachy: Timeline tasks
  2021-04-29 20:14         ` Christopher Baines
@ 2021-04-30 15:44           ` Luciana Lima Brito
  2021-04-30 17:05             ` Christopher Baines
  0 siblings, 1 reply; 16+ messages in thread
From: Luciana Lima Brito @ 2021-04-30 15:44 UTC (permalink / raw)
  To: Christopher Baines, guix-devel

Hi,

On Thu, 29 Apr 2021 21:14:10 +0100
Christopher Baines <mail@cbaines.net> wrote:

> Great, can you add more detail to this bit? Given the instrumentation
> is a really important part, it would be good to have some working
> ideas for what this chart might look like, and what goes in to making
> it (like where the data comes from and if and where it's stored).

Task 1: Add instrumentation to identify the slow parts of processing
new revisions:

  - Implementing a chart over time to identify slow parts:
	- The chart should consider two aspects: the time taken by
	  each specific part, and its name, for identification purposes.
	  A bar chart is a good candidate for this task; it is simple
	  and can show the whole picture of the time taken by the
	  system to process a new revision;
	- The bars should be ordered along the X axis by the precedence
	  of the process, with each bar's height on the Y axis
	  determined by the time that part takes;
	- The charts should work as picture-logs for timing;
	- A chart should be generated for each new revision in real
	  time;
	- The time is already being computed for each part and is
	  shown in the logs, so the same data can be used to build the
	  charts. This way the data does not need to be stored, because
	  the chart can be built in real time, but the chart itself
	  needs to be stored, in the same way the logs already are.
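As a minimal sketch of such a chart, rendered as plain HTML so it could sit alongside the Guix Data Service's existing HTML pages (the timings are invented, and horizontal bars are used here purely for simplicity):

```python
# Minimal sketch of the proposed chart as an HTML table with sized bars,
# similar in spirit to the grey bars on the package history pages.
# The timing data below is made up for illustration.
timings = [
    ("Computing the channel derivations", 636),
    ("Fetching inferior lint warnings", 1488),
    ("Fetching inferior package metadata", 2476),
]

def bar_chart_html(timings):
    longest = max(secs for _, secs in timings)
    rows = []
    for name, secs in timings:   # rows keep the processing order
        width = 100 * secs // longest
        rows.append(
            f'<tr><td>{name}</td>'
            f'<td><div style="background:#ccc;width:{width}%">{secs}s</div>'
            f'</td></tr>')
    return "<table>\n" + "\n".join(rows) + "\n</table>"

print(bar_chart_html(timings))
```

Because the output is just HTML, no charting library is needed, which fits the concern above about available Guile libraries.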

> I'd try to set out more of a strategy, so what might be causes of
> slowness, how would you investigate them, and what are common
> approaches to making those things faster?

Task 2: Improve the performance of these slow parts:

  - Select candidate parts to be analyzed;
  - Identify the causes of slowness:
	- If a part uses queries, perform query analysis, for example
	  using EXPLAIN ANALYZE; get their statistics and improve the
	  queries where needed;
	- In the code, investigate the structure, for example using
	  tracing to discover whether a recursion is too deep and
	  whether it can be made tail recursive.
  - Implement the required improvements.
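The recursion concern can be illustrated as follows (in Python rather than Guile; Guile eliminates tail calls, so a tail-recursive rewrite runs in constant stack space, whereas Python does not, so the tail/accumulator form is expressed as a loop here):

```python
# Illustration of deep recursion vs. an accumulator ("tail") form.
# In Guile the tail-recursive version would genuinely reuse the stack
# frame; in Python we express the same shape as a loop.

def total_naive(durations):
    # Non-tail recursion: each element adds a stack frame.
    if not durations:
        return 0
    return durations[0] + total_naive(durations[1:])

def total_acc(durations):
    # Accumulator form: constant stack space.
    acc = 0
    for d in durations:
        acc += d
    return acc

small = [636, 1488, 2476]
assert total_naive(small) == total_acc(small) == 4600

big = list(range(100_000))   # deep enough to overflow the naive version
try:
    total_naive(big)
except RecursionError:
    print("naive recursion overflowed; accumulator form returns",
          total_acc(big))
```

The same reasoning applies to any per-package or per-derivation traversal that grows with the size of Guix.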

In fact, I have never performed improvements on queries or code this
way, but I am studying. Please tell me if I am missing any important
details, and let me know what you think.

-- 
Best Regards,

Luciana Lima Brito
MSc. in Computer Science



* Re: Outreachy: Timeline tasks
  2021-04-30 15:44           ` Luciana Lima Brito
@ 2021-04-30 17:05             ` Christopher Baines
  2021-04-30 21:19               ` Luciana Lima Brito
  0 siblings, 1 reply; 16+ messages in thread
From: Christopher Baines @ 2021-04-30 17:05 UTC (permalink / raw)
  To: Luciana Lima Brito; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 2605 bytes --]


Luciana Lima Brito <lubrito@posteo.net> writes:

> Hi,
>
> On Thu, 29 Apr 2021 21:14:10 +0100
> Christopher Baines <mail@cbaines.net> wrote:
>
>> Great, can you add more detail to this bit? Given the instrumentation
>> is a really important part, it would be good to have some working
>> ideas for what this chart might look like, and what goes in to making
>> it (like where the data comes from and if and where it's stored).
>
> Task 1: Add instrumentation to identify the slow parts of processing
> new revisions:
>
>   - Implementing a chart over time to identify slow parts:
> 	- The chart should consider two aspects, the time took by
> 	  each specific part, and its name, for identification purpose.
> 	  A bar chart is a good candidate for this task, it is simple
> 	  and can show the whole picture of the time taken by the
> 	  system to process a new revision;
> 	- The bars on the chart should be sorted in order of precedence
> 	  of the process in the X axis, and its height, which is
> 	  determined by the time it takes, measured in the Y
> 	  axis;

I'm not sure what you mean by "precedence of the process" here?

> 	- The charts should work as picture-logs for timing;
> 	- A chart should be generated for each new revision in real
> 	  time;
> 	- The time is already being computed for each part and it is
> 	  shown in the logs, the same data can be used to build the
> 	  charts. This way data does not need to be stored, because the
> 	  chart can be built in real time, but the chart itself needs
> 	  to be stored, in the same way as the logs already are.

It would be good to say explicitly how the chart will be stored, since
"the same way as the logs already are" is quite vague.

>> I'd try to set out more of a strategy, so what might be causes of
>> slowness, how would you investigate them, and what are common
>> approaches to making those things faster?
>
> Task 2: Improve the performance of these slow parts:
>
>   - Select candidate parts to be analyzed;
>   - Identify the causes of slowness:
> 	- If it uses queries, perform query analysis, for example using
> 	  EXPLAIN and ANALYZE, get its statistics and improve them when
> 	  needed;
> 	- On the code, investigate the structure, for example, using
> 	  Tracing to discover if a recursion is too long and if it can
> 	  be modified to be tail recursive.
>   - Implement the required improvements.
>
> In fact, I have never performed improvements on queries or code this
> way, but I am studying. Please, tell me if I am missing important
> details. See what you think about that.

Great, this is more like it.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 987 bytes --]


* Re: Outreachy: Timeline tasks
  2021-04-30 17:05             ` Christopher Baines
@ 2021-04-30 21:19               ` Luciana Lima Brito
  2021-05-01  8:16                 ` Christopher Baines
  0 siblings, 1 reply; 16+ messages in thread
From: Luciana Lima Brito @ 2021-04-30 21:19 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

On Fri, 30 Apr 2021 18:05:15 +0100
Christopher Baines <mail@cbaines.net> wrote:

> >
> > Task 1: Add instrumentation to identify the slow parts of processing
> > new revisions:
> >
> >   - Implementing a chart over time to identify slow parts:
> > 	- The chart should consider two aspects, the time took by
> > 	  each specific part, and its name, for identification
> >       purpose. A bar chart is a good candidate for this task, it is
> >       simple and can show the whole picture of the time taken by the
> > 	  system to process a new revision;
> > 	- The bars on the chart should be sorted in order of
> >       precedence of the process in the X axis, and its height, which
> >       is determined by the time it takes, measured in the Y
> > 	  axis;  
> 
> I'm not sure what you mean by "precedence of the process" here?

As we are concerned only with the time each process takes, and the bars
on the chart will be presented side by side, the order of the bars would
be the order in which the processes appear. If you prefer, they could be
sorted alphabetically, or by time taken.

> > 	- The charts should work as picture-logs for timing;
> > 	- A chart should be generated for each new revision in real
> > 	  time; 
> 
> It would be good to say explicitly how the chart will be stored, since
> "the same way as the logs already are" is quite vague.

I misunderstood this point, so this part about storing the chart got
completely muddled. But I think I've got it now. Let me correct the
last point and add some more:
  ...
	- The time is already being computed for each part and is
	  shown in the logs, so the same data can be used to build the
	  charts. The data to build the chart for each revision could
	  be stored in a new table in the database*, first in parts,
	  while a revision is still being processed, then combined once
	  processing is finished.
	- The new table in the database should store three pieces of
	  information: the job_id, the action, and the time taken.
	- Then, when one wants to see the chart, this table is queried
	  and the chart is rendered as HTML.

* Although the information is already in the log, it is stored as
  text, so it is harder to get the names of the actions and the time
  taken by each; I think creating a new table with only these values
  is more suitable.
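Such a table could be sketched like this, using SQLite in place of PostgreSQL; the table and column names (and the timing rows) are my own guesses for illustration, not the actual schema:

```python
import sqlite3

# Sketch of the proposed timings table -- SQLite stands in for
# PostgreSQL, and the table/column names are assumptions based on the
# description above (job_id, the action, and the time taken).
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE load_new_guix_revision_timings (
        job_id           INTEGER NOT NULL,
        action           TEXT    NOT NULL,
        duration_seconds INTEGER NOT NULL
    )""")
db.executemany(
    "INSERT INTO load_new_guix_revision_timings VALUES (?, ?, ?)",
    [(1, "Computing the channel derivations", 636),
     (1, "Fetching inferior lint warnings", 1488),
     (1, "Fetching inferior package metadata", 2476)])

# The query backing the chart for one job, slowest action first
rows = db.execute(
    """SELECT action, duration_seconds
         FROM load_new_guix_revision_timings
        WHERE job_id = ?
        ORDER BY duration_seconds DESC""", (1,)).fetchall()
for action, secs in rows:
    print(f"{secs:>6}s  {action}")
```

Rendering the chart then reduces to one indexed query per job, rather than parsing log text.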
> 
> Great, this is more like it.

:)

-- 
Best Regards,

Luciana Lima Brito
MSc. in Computer Science



* Re: Outreachy: Timeline tasks
  2021-04-30 21:19               ` Luciana Lima Brito
@ 2021-05-01  8:16                 ` Christopher Baines
  2021-05-01 13:48                   ` Luciana Lima Brito
  0 siblings, 1 reply; 16+ messages in thread
From: Christopher Baines @ 2021-05-01  8:16 UTC (permalink / raw)
  To: Luciana Lima Brito; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 2813 bytes --]


Luciana Lima Brito <lubrito@posteo.net> writes:

> On Fri, 30 Apr 2021 18:05:15 +0100
> Christopher Baines <mail@cbaines.net> wrote:
>
>> >
>> > Task 1: Add instrumentation to identify the slow parts of processing
>> > new revisions:
>> >
>> >   - Implementing a chart over time to identify slow parts:
>> > 	- The chart should consider two aspects, the time took by
>> > 	  each specific part, and its name, for identification
>> >       purpose. A bar chart is a good candidate for this task, it is
>> >       simple and can show the whole picture of the time taken by the
>> > 	  system to process a new revision;
>> > 	- The bars on the chart should be sorted in order of
>> >       precedence of the process in the X axis, and its height, which
>> >       is determined by the time it takes, measured in the Y
>> > 	  axis;
>>
>> I'm not sure what you mean by "precedence of the process" here?
>
> As we are concerned only with the time each process takes, and each bar
> on the chart will be presented side by side, the order of the bars would
> be the order that the processes appear. If you prefer, they could be
> sorted alphabetically, or by time taken.

Ok, I think I follow.

Currently the timing of various sections of the process includes timing
smaller sections, and that may complicate reading the chart, since it
won't convey which timed sections include other timed sections. Does
that make sense?

>> > 	- The charts should work as picture-logs for timing;
>> > 	- A chart should be generated for each new revision in real
>> > 	  time; 
>> 
>> It would be good to say explicitly how the chart will be stored, since
>> "the same way as the logs already are" is quite vague.
>
> I misunderstood this point so this part about storing the chart is
> completely messed. But I think now I got it. Let me correct the last
> point and add some more:
>   ...
> 	- The time is already being computed for each part and it
> 	  is shown in the logs, the same data can be used to build the charts.
> 	  The data to build the chart for each revision could be
> 	  stored as a new table in the database*, first in parts, for
> 	  when a revision is still being processed, then combining their
> 	  parts when the processing is finished.
> 	- The new table on the database should store three information:
> 	  the job_id, the action, and the time taken.
> 	- Then, when one wants to see the chart, this table is queried
> 	  and the chart is rendered as html.
>
> * Although the information is already in the log, it is stored as a
>   text, so it is harder to get the names of the actions and the time
>   taken by each, so I think that create a new table, with only these
>   values, is more suitable.

Great, this is a good amount of detail.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 987 bytes --]


* Re: Outreachy: Timeline tasks
  2021-05-01  8:16                 ` Christopher Baines
@ 2021-05-01 13:48                   ` Luciana Lima Brito
  2021-05-01 19:07                     ` Christopher Baines
  0 siblings, 1 reply; 16+ messages in thread
From: Luciana Lima Brito @ 2021-05-01 13:48 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

Hi,

On Sat, 01 May 2021 09:16:08 +0100
Christopher Baines <mail@cbaines.net> wrote:
 
> Currently the timing of various sections of the process includes
> timing smaller sections, and that may complicate reading the chart,
> since it won't convey which timed sections include other timed
> sections. Does that make sense?

Yes, I understand. But just to make sure: you're saying that the actions
we see in the logs are actually subsections of a bigger process? The
challenge here would be to clearly mark in the code which actions belong
to the same process. I'll take this into account in my planning.

For that I propose building two charts. The first would give the
macro view, what we call "overview first", showing the
sections (processes) and their total time taken. This way we could
see just what we are aiming for, which is to identify slowness.
The second chart would be what we call "details on demand", in which
the subsections (actions) would be shown. To indicate which
section (process) they belong to, we could use two meaningless
alternating colours (just to group the subsections of a section), and
they would follow the same order as the first chart.

The use of alternating colours could be applied to both charts to make
the correspondence clear. Both charts should appear at the same time,
one above the other, to ease comparison.

> Great, this is a good amount of detail.

I'll add this to the plan and to the final application, ok?

-- 
Best Regards,

Luciana Lima Brito
MSc. in Computer Science


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Outreachy: Timeline tasks
  2021-05-01 13:48                   ` Luciana Lima Brito
@ 2021-05-01 19:07                     ` Christopher Baines
  2021-05-01 23:17                       ` Luciana Lima Brito
  0 siblings, 1 reply; 16+ messages in thread
From: Christopher Baines @ 2021-05-01 19:07 UTC (permalink / raw)
  To: Luciana Lima Brito; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 2016 bytes --]


Luciana Lima Brito <lubrito@posteo.net> writes:

> On Sat, 01 May 2021 09:16:08 +0100
> Christopher Baines <mail@cbaines.net> wrote:
>
>> Currently the timing of various sections of the process includes
>> timing smaller sections, and that may complicate reading the chart,
>> since it won't convey which timed sections include other timed
>> sections. Does that make sense?
>
> Yes, I understand. But just to make sure: are you saying that the
> actions we see in the logs are actually subsections of a bigger
> process? The challenge here would be to clearly mark, in the code,
> which actions belong to the same process. I'll take this into account
> in my planning.

Take the lint warnings for example: currently the time to fetch all lint
warnings is timed, but the run of each individual linter is also
timed. Both bits of information are helpful.

> For that I propose building two charts. The first gives the macro
> view, what we call "overview first", showing the sections (processes)
> and the total time each one takes. This way we could see directly
> what we are aiming for, which is to identify slowness.
> The second chart would be what we call "details on demand", showing
> the subsections (actions). To distinguish which section (process)
> they belong to, we could use two semantically meaningless alternating
> colours (just to group the subsections of a section), and they would
> follow the same order as the first chart.
>
> Alternating colours could be applied in both charts to make the
> correspondence between them clear. Both charts should appear at the
> same time, one above the other, to ease comparison.

That sounds better, although I think a timeline, similar to what the
systemd-analyze example uses [1], might be a more natural representation
of the data; colour could then be used to represent how long each part
takes relative to the others.

1: https://lizards.opensuse.org/wp-content/uploads/2012/07/plot001.gif
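As a rough sketch of the timeline idea (the section names and durations
below are invented, not taken from the real service), consecutive
sections can be laid out on one axis by accumulating start offsets,
which is essentially what a systemd-analyze-style plot draws:

```python
# Hypothetical (name, duration-in-seconds) pairs for the timed sections;
# the real numbers would come from the Guix Data Service logs.
sections = [("load-new-guix-revision", 2.0),
            ("fetch-lint-warnings", 5.5),
            ("insert-packages", 1.5)]

def to_timeline(sections):
    """Turn durations into (name, start, end) segments on one axis."""
    segments, offset = [], 0.0
    for name, duration in sections:
        segments.append((name, offset, offset + duration))
        offset += duration
    return segments

for name, start, end in to_timeline(sections):
    print(f"{name}: {start:.1f}s -> {end:.1f}s")
```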

>> Great, this is a good amount of detail.
>
> I'll add this to the plan and to the final application, ok?

Yep.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 987 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Outreachy: Timeline tasks
  2021-05-01 19:07                     ` Christopher Baines
@ 2021-05-01 23:17                       ` Luciana Lima Brito
  2021-05-02  9:20                         ` Christopher Baines
  0 siblings, 1 reply; 16+ messages in thread
From: Luciana Lima Brito @ 2021-05-01 23:17 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

On Sat, 01 May 2021 20:07:56 +0100
Christopher Baines <mail@cbaines.net> wrote:

> Luciana Lima Brito <lubrito@posteo.net> writes:
> 
> > On Sat, 01 May 2021 09:16:08 +0100
> > Christopher Baines <mail@cbaines.net> wrote:
> >  
> >> Currently the timing of various sections of the process includes
> >> timing smaller sections, and that may complicate reading the chart,
> >> since it won't convey which timed sections include other timed
> >> sections. Does that make sense?  
> >
> > Yes, I understand. But just to make sure: are you saying that the
> > actions we see in the logs are actually subsections of a bigger
> > process? The challenge here would be to clearly mark, in the code,
> > which actions belong to the same process. I'll take this into
> > account in my planning.
> 
> Take the lint warnings for example: currently the time to fetch all
> lint warnings is timed, but the run of each individual linter is also
> timed. Both bits of information are helpful.

OK.
 
> > For that I propose building two charts. The first gives the macro
> > view, what we call "overview first", showing the sections
> > (processes) and the total time each one takes. This way we could
> > see directly what we are aiming for, which is to identify slowness.
> > The second chart would be what we call "details on demand", showing
> > the subsections (actions). To distinguish which section (process)
> > they belong to, we could use two semantically meaningless
> > alternating colours (just to group the subsections of a section),
> > and they would follow the same order as the first chart.
> >
> > Alternating colours could be applied in both charts to make the
> > correspondence between them clear. Both charts should appear at the
> > same time, one above the other, to ease comparison.
> 
> That sounds better, although I think a timeline, similar to what the
> systemd-analyze example uses [1], might be a more natural
> representation of the data; colour could then be used to represent
> how long each part takes relative to the others.
> 
> 1: https://lizards.opensuse.org/wp-content/uploads/2012/07/plot001.gif

Here I have two observations to discuss:
1 - Are the start and end times of each process important information
for determining its slowness? If not, maybe we should avoid the
timeline, in order to keep the chart cleaner. A timeline could impair
the comparison of bars, so I would recommend simple bar charts.

2 - About the colours representing how long each part takes, I am not
sure I understand. Do you mean one colour for slow parts and another
colour for normal parts?

Anyway, I think all this can be further discussed while the work is in
progress.

> > I'll add this to the plan and to the final application, ok?  
> 
> Yep.

-- 
Best Regards,

Luciana Lima Brito
MSc. in Computer Science


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Outreachy: Timeline tasks
  2021-05-01 23:17                       ` Luciana Lima Brito
@ 2021-05-02  9:20                         ` Christopher Baines
  2021-05-03 14:23                           ` Luciana Lima Brito
  0 siblings, 1 reply; 16+ messages in thread
From: Christopher Baines @ 2021-05-02  9:20 UTC (permalink / raw)
  To: Luciana Lima Brito; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 2147 bytes --]


Luciana Lima Brito <lubrito@posteo.net> writes:

> On Sat, 01 May 2021 20:07:56 +0100
> Christopher Baines <mail@cbaines.net> wrote:
>
>> Luciana Lima Brito <lubrito@posteo.net> writes:
>>
>> > For that I propose building two charts. The first gives the macro
>> > view, what we call "overview first", showing the sections
>> > (processes) and the total time each one takes. This way we could
>> > see directly what we are aiming for, which is to identify slowness.
>> > The second chart would be what we call "details on demand", showing
>> > the subsections (actions). To distinguish which section (process)
>> > they belong to, we could use two semantically meaningless
>> > alternating colours (just to group the subsections of a section),
>> > and they would follow the same order as the first chart.
>> >
>> > Alternating colours could be applied in both charts to make the
>> > correspondence between them clear. Both charts should appear at
>> > the same time, one above the other, to ease comparison.
>>
>> That sounds better, although I think a timeline, similar to what the
>> systemd-analyze example uses [1], might be a more natural
>> representation of the data; colour could then be used to represent
>> how long each part takes relative to the others.
>>
>> 1: https://lizards.opensuse.org/wp-content/uploads/2012/07/plot001.gif
>
> Here I have two observations to discuss:
> 1 - Are the start and end times of each process important information
> for determining its slowness? If not, maybe we should avoid the
> timeline, in order to keep the chart cleaner. A timeline could impair
> the comparison of bars, so I would recommend simple bar charts.

I think what things are happening when is relevant, but that's more
about understanding the hierarchy, rather than specific start and end
times.

> 2 - About the colours representing how long each part takes, I am not
> sure I understand. Do you mean one colour for slow parts and another
> colour for normal parts?

Basically, although using more colours (from a gradient, like white to
red) would probably convey more information than just two colours.
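A small sketch of the gradient idea (with assumed section names and
timings): map each duration to a fraction of the slowest one, which a
plotting library could then feed into a white-to-red colourmap.

```python
def relative_slowness(durations):
    """Scale each duration to [0, 1] so that 1.0 marks the slowest
    section; these fractions can index into a white-to-red gradient."""
    slowest = max(durations.values())
    return {name: d / slowest for name, d in durations.items()}

# Hypothetical timings, not taken from the real service.
durations = {"load-revision": 2.0,
             "fetch-lint-warnings": 5.0,
             "insert-packages": 1.0}
print(relative_slowness(durations))
# {'load-revision': 0.4, 'fetch-lint-warnings': 1.0, 'insert-packages': 0.2}
```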

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 987 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Outreachy: Timeline tasks
  2021-05-02  9:20                         ` Christopher Baines
@ 2021-05-03 14:23                           ` Luciana Lima Brito
  2021-05-03 15:29                             ` Christopher Baines
  0 siblings, 1 reply; 16+ messages in thread
From: Luciana Lima Brito @ 2021-05-03 14:23 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guix-devel

On Sun, 02 May 2021 10:20:56 +0100
Christopher Baines <mail@cbaines.net> wrote:

> I think what things are happening when is relevant, but that's more
> about understanding the hierarchy, rather than specific start and end
> times.

> Basically, although using more colours (from a gradient, like white to
> red) would probably convey more information than just two colours.

I submitted my final application. I added to it some notes on what we
have been discussing, but I think some specific things would be better
discussed visually, with a mock-up.

Don't you think now would be a good time for some new task? :)

-- 
Best Regards,

Luciana Lima Brito
MSc. in Computer Science


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Outreachy: Timeline tasks
  2021-05-03 14:23                           ` Luciana Lima Brito
@ 2021-05-03 15:29                             ` Christopher Baines
  0 siblings, 0 replies; 16+ messages in thread
From: Christopher Baines @ 2021-05-03 15:29 UTC (permalink / raw)
  To: Luciana Lima Brito; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 1142 bytes --]


Luciana Lima Brito <lubrito@posteo.net> writes:

> On Sun, 02 May 2021 10:20:56 +0100
> Christopher Baines <mail@cbaines.net> wrote:
>
>> I think what things are happening when is relevant, but that's more
>> about understanding the hierarchy, rather than specific start and end
>> times.
>
>> Basically, although using more colours (from a gradient, like white to
>> red) would probably convey more information than just two colours.
>
> I submitted my final application. I added to it some notes on what we
> have been discussing, but I think some specific things could be better
> discussed visually with a mock up.

Great :)

> Don't you think now would be a good time for some new task? :)

As the Outreachy contribution period has now ended, there's no specific
need.

If you really want something to look at, you could investigate the slow
package metadata query that Canan was looking into. There are some
details in this email [1] and then later on in that same thread as
well [2].

1: https://lists.gnu.org/archive/html/guix-devel/2021-04/msg00395.html
2: https://lists.gnu.org/archive/html/guix-devel/2021-04/msg00434.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 987 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-05-03 15:30 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-28 17:59 Outreachy: Timeline tasks Luciana Lima Brito
2021-04-28 18:17 ` Christopher Baines
2021-04-28 19:20   ` Luciana Lima Brito
2021-04-28 20:00     ` Christopher Baines
2021-04-29 16:02       ` lubrito
2021-04-29 20:14         ` Christopher Baines
2021-04-30 15:44           ` Luciana Lima Brito
2021-04-30 17:05             ` Christopher Baines
2021-04-30 21:19               ` Luciana Lima Brito
2021-05-01  8:16                 ` Christopher Baines
2021-05-01 13:48                   ` Luciana Lima Brito
2021-05-01 19:07                     ` Christopher Baines
2021-05-01 23:17                       ` Luciana Lima Brito
2021-05-02  9:20                         ` Christopher Baines
2021-05-03 14:23                           ` Luciana Lima Brito
2021-05-03 15:29                             ` Christopher Baines

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).