all messages for Guix-related lists mirrored at yhetil.org
* [Outreachy] [Guix Data Service]: Identify the slow parts of process
@ 2021-04-27 14:10 Canan Talayhan
  2021-04-27 18:26 ` Christopher Baines
  0 siblings, 1 reply; 4+ messages in thread
From: Canan Talayhan @ 2021-04-27 14:10 UTC
  To: guix-devel, Christopher Baines

[-- Attachment #1: Type: text/plain, Size: 728 bytes --]

Hi Chris,

I am writing to give you an update on the progress that I have made.

I've created a temporary table named temp_package_metadata[1] and
inserted a row from a revision that is already in my local
database[2]. Then, as you suggested, I ran the slow query with
EXPLAIN ANALYZE (screenshot attached). I think I now understand how
the slow query works.

[1] CREATE TEMPORARY TABLE temp_package_metadata (LIKE package_metadata
INCLUDING ALL)

[2] INSERT INTO temp_package_metadata (home_page, location_id,
license_set_id, package_description_set_id, package_synopsis_set_id)
VALUES ('https://zlib.net/', 9, 9, 2373, 1407)

Now I'm looking into a specific issue that arises when
insert-missing-data-and-return-all-ids is called.

Thanks,
Canan Talayhan

[-- Attachment #2: query_plan.png --]
[-- Type: image/png, Size: 70041 bytes --]


* Re: [Outreachy] [Guix Data Service]: Identify the slow parts of process
  2021-04-27 14:10 [Outreachy] [Guix Data Service]: Identify the slow parts of process Canan Talayhan
@ 2021-04-27 18:26 ` Christopher Baines
  2021-05-02 13:01   ` Canan Talayhan
  0 siblings, 1 reply; 4+ messages in thread
From: Christopher Baines @ 2021-04-27 18:26 UTC
  To: Canan Talayhan; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 977 bytes --]


Canan Talayhan <canan.t.talayhan@gmail.com> writes:

> I am writing to give you an update on the progress that I have made.

Great :)

> I've created a temporary table named temp_package_metadata[1] and
> inserted a row from a revision that is already in my local
> database[2]. Then, as you suggested, I ran the slow query with
> EXPLAIN ANALYZE (screenshot attached). I think I now understand how
> the slow query works.
>
> [1] CREATE TEMPORARY TABLE temp_package_metadata (LIKE package_metadata
> INCLUDING ALL)
>
> [2] INSERT INTO temp_package_metadata (home_page, location_id,
> license_set_id, package_description_set_id, package_synopsis_set_id)
> VALUES ('https://zlib.net/', 9, 9, 2373, 1407)

From this I'm guessing the temp_package_metadata table has only one
row. My understanding is that this table would normally have as many
rows as packages in the revision of Guix being processed. It might not
be possible to reproduce the slowness of the query without more rows.


* Re: [Outreachy] [Guix Data Service]: Identify the slow parts of process
  2021-04-27 18:26 ` Christopher Baines
@ 2021-05-02 13:01   ` Canan Talayhan
  2021-05-02 14:10     ` Christopher Baines
  0 siblings, 1 reply; 4+ messages in thread
From: Canan Talayhan @ 2021-05-02 13:01 UTC
  To: Christopher Baines; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 2271 bytes --]

> From this I'm guessing the temp_package_metadata table has only one
> row. My understanding is that this table would normally have as many
> rows as packages in the revision of Guix being processed. It might not
> be possible to reproduce the slowness of the query without more rows.

I inserted one row just as an example. As you said, the
temp_package_metadata table should have as many rows as
package_metadata. After populating temp_package_metadata with
500 rows from package_metadata, the query takes a long time,
as we expected.
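
A minimal sketch of this kind of setup, for illustration only. It uses
SQLite from Python's standard library as a stand-in for PostgreSQL, so
the temporary table is declared by hand (SQLite has no LIKE ... INCLUDING
ALL) and timing is done in Python rather than with EXPLAIN ANALYZE. The
generated rows are made up, and PLACEHOLDER_QUERY is a hypothetical join,
not the actual slow query from the Guix Data Service.

```python
import sqlite3
import time

# Column names taken from the INSERT statement in [2].
COLUMNS = (
    "home_page, location_id, license_set_id, "
    "package_description_set_id, package_synopsis_set_id"
)
COLUMN_TYPES = (
    "home_page TEXT, location_id INTEGER, license_set_id INTEGER, "
    "package_description_set_id INTEGER, package_synopsis_set_id INTEGER"
)

# Hypothetical stand-in for the slow query: look up existing ids for
# every row in the temporary table.
PLACEHOLDER_QUERY = (
    "SELECT package_metadata.id "
    "FROM temp_package_metadata AS t "
    "JOIN package_metadata "
    "  ON package_metadata.home_page = t.home_page "
    " AND package_metadata.location_id = t.location_id "
    " AND package_metadata.license_set_id = t.license_set_id"
)


def build_database(total_rows=2000):
    """Create an in-memory package_metadata table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE package_metadata (id INTEGER PRIMARY KEY, {COLUMN_TYPES})"
    )
    rows = [
        (f"https://example.org/{i}", i, i % 50, i, i) for i in range(total_rows)
    ]
    conn.executemany(
        f"INSERT INTO package_metadata ({COLUMNS}) VALUES (?, ?, ?, ?, ?)", rows
    )
    return conn


def populate_temp_table(conn, limit=500):
    """Copy `limit` rows into temp_package_metadata; return its row count."""
    conn.execute(f"CREATE TEMPORARY TABLE temp_package_metadata ({COLUMN_TYPES})")
    conn.execute(
        f"INSERT INTO temp_package_metadata ({COLUMNS}) "
        f"SELECT {COLUMNS} FROM package_metadata ORDER BY id LIMIT ?",
        (limit,),
    )
    return conn.execute("SELECT COUNT(*) FROM temp_package_metadata").fetchone()[0]


def time_query(conn, sql):
    """Run `sql`, returning the fetched rows and the elapsed wall-clock time."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start


conn = build_database()
copied = populate_temp_table(conn, limit=500)
rows, elapsed = time_query(conn, PLACEHOLDER_QUERY)
print(f"{copied} rows copied, {len(rows)} matches in {elapsed:.4f}s")
```

Varying `limit` shows how the elapsed time changes with the number of
rows in the temporary table, which a single-row table cannot show.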

I'm using flame graphs to visualize the slow paths in revision processing.
To start, I chose the slow path that I already know about.
However, I couldn't trigger the slow query by following the step below:

* Run the **guix-data-service-process-job** script from the
guix-data-service/scripts folder standalone, providing an existing
revision from my local db.

Am I on the right path for adding new jobs to my local db?

In addition, I've successfully generated a simple flame graph using Linux perf.
It only covers the data that was captured while I was browsing the
Guix Data Service page. Please find the SVG file attached.

Thanks,
Canan Talayhan

[-- Attachment #2: guix_data_service_perf_fg.svg --]
[-- Type: image/svg+xml, Size: 37307 bytes --]


* Re: [Outreachy] [Guix Data Service]: Identify the slow parts of process
  2021-05-02 13:01   ` Canan Talayhan
@ 2021-05-02 14:10     ` Christopher Baines
  0 siblings, 0 replies; 4+ messages in thread
From: Christopher Baines @ 2021-05-02 14:10 UTC
  To: Canan Talayhan; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 2394 bytes --]


Canan Talayhan <canan.t.talayhan@gmail.com> writes:

> > From this I'm guessing the temp_package_metadata table has only one
> > row. My understanding is that this table would normally have as many
> > rows as packages in the revision of Guix being processed. It might not
> > be possible to reproduce the slowness of the query without more rows.
>
> I inserted one row just as an example. As you said, the
> temp_package_metadata table should have as many rows as
> package_metadata.

"as many rows as packages in the revision of Guix being processed" is
only going to be similar to the number of rows in the package_metadata
table if there's only been one or a number of similar revisions
processed, since the package_metadata table has entries covering all
processed revisions.

> After populating temp_package_metadata with 500 rows from
> package_metadata, the query takes a long time, as we expected.

Great, being able to reproduce the problem in a way that makes trying
things out easy is a good step forward.

I'd pull on this thread further: now that you've got a slow query, how
can you make it faster?

> I'm using flame graphs to visualize the slow paths in revision processing.
> To start, I chose the slow path that I already know about.
> However, I couldn't trigger the slow query by following the step below:
>
> * Run the **guix-data-service-process-job** script from the
> guix-data-service/scripts folder standalone, providing an existing
> revision from my local db.
>
> Am I on the right path for adding new jobs to my local db?
>
> In addition, I've successfully generated a simple flame graph using Linux perf.
> It only covers the data that was captured while I was browsing the
> Guix Data Service page. Please find the SVG file attached.

If this relates to the query involving the temp_package_metadata table,
I'd focus on analyzing the slow query you're able to execute manually,
rather than processing an entire revision.

If you do, however, want to add more unprocessed jobs to your local
database, you can use the
guix-data-service-process-branch-updated-mbox script to do this. It
takes one argument, an mbox file (a file containing a bunch of
emails). You can download these files by month from [1]; you'll
probably want the month of, or the month after, the latest revision
your local database knows about.

1: https://lists.gnu.org/archive/mbox/guix-commits/
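
A minimal sketch for checking which emails a downloaded mbox file
contains before passing it to the script, assuming Python's standard
`mailbox` module. The helper name `summarize_mbox` is hypothetical, and
this only lists dates and subjects; it does not reproduce what
guix-data-service-process-branch-updated-mbox does with the file.

```python
import mailbox


def summarize_mbox(path):
    """Return (date, subject) pairs for every message in the mbox at `path`."""
    # create=False makes a missing file raise an error instead of
    # silently creating an empty mailbox.
    box = mailbox.mbox(path, create=False)
    try:
        return [
            (str(msg.get("Date", "")), str(msg.get("Subject", "")))
            for msg in box
        ]
    finally:
        box.close()
```

Calling `summarize_mbox` on a file saved from [1] and looking at the
first and last dates is one way to confirm that the file covers the
period just after the latest revision in the local database.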

