* Guix Data Service - September update
@ 2019-09-08 19:14 Christopher Baines
2019-09-11 20:29 ` Ludovic Courtès
0 siblings, 1 reply; 7+ messages in thread
From: Christopher Baines @ 2019-09-08 19:14 UTC (permalink / raw)
To: guix-devel
Hey,
I think I sent out the last update about the Guix Data Service back in May
[1], and quite a few things have changed since then. This is a summary of
changes since then, and a list of things that I'm interested in looking at
next.
1: https://lists.gnu.org/archive/html/guix-devel/2019-05/msg00332.html
More progress with the Guix Data Service (17th of May)
I ran out of disk space on the server I'd been using to run the Guix Data
Service [2] so I removed it. Thanks to a kind donation of a machine from
UNIMI-DI through Giovanni Biscuolo, I deployed the Guix Data Service to
milano-guix-1, which has a lot more resources. It was down recently due to a
disk failure, but it's back online now [3].
2: https://prototype-guix-data-service.cbaines.net/ (no longer used)
3: http://milano-guix-1.di.unimi.it:8765/
It's not currently running on the standard ports, as I haven't got around to
setting up nginx yet, but that's something I'm looking to work on soon. Quite
a few things have changed (and hopefully improved) in the last few months.
I've tried to give a summary below, but it's probably easier just to have a
look at it running on milano-guix-1 [3].
There have been many changes to the user interface: the index page now shows
a list of branches (rather than some revisions and jobs), there are more
links between pages, and some pages now link to cgit where useful. The 404
pages have been improved and cache headers are now set as well.
For processing jobs, the records in the database used to be deleted when the
job was completed, but now the records are kept and there's a page showing
jobs [4]. Also, the code now supports processing jobs without container
support for inferiors in Guix, and can process jobs in parallel, prioritising
the latest revision for each branch. Separate processes are used for each job
to allow concurrency, as well as improving memory management as those
processes exit when the job is finished. The log handling for jobs is also
more efficient.
4: http://milano-guix-1.di.unimi.it:8765/jobs
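The scheduling described above can be sketched roughly as follows. This is only an illustration in Python, not the Guix Data Service's own code: the `Job` record, its field names, and the `process` placeholder are all assumptions. It picks the most recently queued job of each branch first and runs every job in its own operating system process, so memory is released when that process exits.

```python
import multiprocessing
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    # Hypothetical job record; the field names are illustrative only.
    branch: str
    revision: str
    queued_at: int  # larger means queued more recently


def prioritise(jobs):
    """Put the most recently queued job of each branch first, then the rest."""
    latest = {}
    for job in jobs:
        current = latest.get(job.branch)
        if current is None or job.queued_at > current.queued_at:
            latest[job.branch] = job
    first = sorted(latest.values(), key=lambda j: j.queued_at, reverse=True)
    rest = sorted((j for j in jobs if latest[j.branch] is not j),
                  key=lambda j: j.queued_at, reverse=True)
    return first + rest


def process(job):
    # Placeholder for the real work of loading a revision.
    print(f"processing {job.branch} {job.revision}")


def run_all(jobs, max_parallel=2):
    """Run each job in its own process, at most max_parallel at a time."""
    pending = prioritise(jobs)
    running = []
    while pending or running:
        # Forget finished processes; their memory was released on exit.
        running = [p for p in running if p.is_alive()]
        while pending and len(running) < max_parallel:
            proc = multiprocessing.Process(target=process,
                                           args=(pending.pop(0),))
            proc.start()
            running.append(proc)
        if running:
            running[0].join(timeout=0.1)
```

When trying this out, call `run_all` from under an `if __name__ == "__main__":` guard, since `multiprocessing` may re-import the main module on some platforms.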
In terms of the Guix service, Sqitch is integrated into the guix-data-service
script that provides the web server, so database migrations can be run
automatically on startup. There's also an option to create a pid file,
which is useful as it prevents the jobs process from starting until migrations
have been applied to the database.
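One way to picture the pid file acting as a readiness signal is the sketch below. It is an assumption-laden illustration, not how the service or its Guix service definition is implemented: `start_web_server`, `wait_for_pid_file`, and the `run_migrations` hook are hypothetical names, and polling is just one possible way for a dependent process to wait.

```python
import os
import time
from pathlib import Path


def start_web_server(pid_file, run_migrations):
    """Apply migrations first, then advertise readiness via the pid file."""
    run_migrations()  # hypothetical hook, e.g. something that invokes Sqitch
    # Write to a temporary file and rename it, so that readers never
    # observe a pid file that exists but is still empty.
    tmp = Path(str(pid_file) + ".tmp")
    tmp.write_text(f"{os.getpid()}\n")
    os.replace(tmp, pid_file)


def wait_for_pid_file(pid_file, timeout=60.0, interval=0.5):
    """Block until pid_file exists, then return the pid it contains."""
    deadline = time.monotonic() + timeout
    while not Path(pid_file).exists():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{pid_file} did not appear within {timeout}s")
        time.sleep(interval)
    return int(Path(pid_file).read_text().strip())
```

Because the pid file is only written after `run_migrations` returns, anything that waits for the file is guaranteed to start against a migrated database.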
I investigated why some of the comparison functionality was broken. It turned
out that some unique constraints didn't work as intended for columns
containing NULL values, and the queries around inserting data failed in a
similar way, so duplicate values ended up in the database. This is now handled
properly, and there are migrations to remove the duplicate values from the
database.
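The NULL behaviour behind this kind of problem can be reproduced in a few lines. The sketch below uses SQLite only because it ships with Python; the `locations` table and its columns are made up for illustration and are not the service's schema. The same SQL rules (NULLs count as distinct under UNIQUE, and `= NULL` never matches) apply in PostgreSQL, where the NULL-safe comparison is spelled `IS NOT DISTINCT FROM` rather than SQLite's `IS`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locations (file TEXT, line INTEGER, UNIQUE (file, line))"
)

# UNIQUE treats NULLs as distinct from each other, so both inserts succeed.
conn.execute("INSERT INTO locations VALUES ('a.scm', NULL)")
conn.execute("INSERT INTO locations VALUES ('a.scm', NULL)")
duplicates = conn.execute("SELECT COUNT(*) FROM locations").fetchone()[0]

# "line = NULL" is never true, so a look-up before inserting finds nothing.
found_eq = conn.execute(
    "SELECT COUNT(*) FROM locations WHERE file = ? AND line = ?",
    ("a.scm", None),
).fetchone()[0]

# SQLite's IS operator compares NULLs as equal, so the rows are found.
found_is = conn.execute(
    "SELECT COUNT(*) FROM locations WHERE file = ? AND line IS ?",
    ("a.scm", None),
).fetchone()[0]

# Remove duplicates, keeping the first row of each group; GROUP BY puts
# all NULLs of a column in the same group.
conn.execute(
    "DELETE FROM locations WHERE rowid NOT IN "
    "(SELECT MIN(rowid) FROM locations GROUP BY file, line)"
)
remaining = conn.execute("SELECT COUNT(*) FROM locations").fetchone()[0]
```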
Most recently, lint warnings for lint checkers that don't require network
access are stored in the database. The warnings are displayed on the revision
page, and included on the compare page (for example [5]). This follows on from
the changes that I started talking about here [6].
5: http://milano-guix-1.di.unimi.it:8765/compare?base_commit=e1e3fe08480868f960eea3ec1584c0c12b022e25&target_commit=067ea2989fce98f3f3f115534e2e685cfc681039
6: https://lists.gnu.org/archive/html/guix-devel/2019-05/msg00127.html
Linting, and how to get the information in to the Guix Data Service (6th May)
Other smaller things:
- There are pages for the latest processed revision for a branch (e.g. [7])
- There's less code duplication for the code relating to inserting new data
into the database.
- Names are now associated with each database connection, so it's easier to
see what each connection is doing
- glibc-locales from the inferior Guix is used when loading data, which fixes
some locale issues
- I hacked some better NULL value support on top of guile-squee [8]
- I started changing the code to handle data in the natural type (e.g. number
for numbers), rather than using strings. This worked for a while as squee
always returned and expected strings, but to provide more adaptable code
for working with the database, being able to use the type information for
each value is really useful.
7: http://milano-guix-1.di.unimi.it:8765/repository/1/branch/master/latest-processed-revision
8: https://git.cbaines.net/guix/data-service/commit/?id=14419422008cc1ba42dea5ef90e6fb2762633064
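To illustrate the last point in the list above, here is a small sketch of converting string values coming back from a database driver into native values, based on each column's type. It is written in Python purely as an illustration; the type names, the `PARSERS` table, and the `"t"` text form for booleans are assumptions, not what guile-squee or the Guix Data Service actually does.

```python
from datetime import datetime

# Hypothetical mapping from a column's declared type to a parser.
PARSERS = {
    "integer": int,
    "boolean": lambda s: s == "t",  # assumes a "t"/"f" text representation
    "timestamp": lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
    "text": str,
}


def parse_row(column_types, raw_row):
    """Convert a row of strings (None standing for NULL) to native values."""
    return [
        None if raw is None else PARSERS[type_name](raw)
        for type_name, raw in zip(column_types, raw_row)
    ]
```

With the types known, NULL can stay distinct from the empty string and numbers can be compared as numbers rather than as text.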
In terms of what's next:
- I've started writing a proposal for the upcoming Outreachy round relating
to internationalisation in the Guix Data Service
- I want to get back to making progress on automating code review for Guix
patches, this was one of the main motivations for getting lint warnings in
the database and on to the comparison page
- I want to provide public dumps of the database from milano-guix-1, as well
as a small extract of that database. I think restoring a database locally
is a good way to get data for local development
- Relating to the Outreachy proposal but also generally, I want to write some
documentation on how to get the Guix Data Service running locally
- Currently the Git repository is on my personal Git server, and there are
discussions about moving it to Savannah
- The Guix package and service definitions haven't been merged, so I want to
look at that once the location for the Git repository is sorted out
Do let me know if you have any comments or questions, or are also interested
in this,
Chris
* Re: Guix Data Service - September update
2019-09-08 19:14 Guix Data Service - September update Christopher Baines
@ 2019-09-11 20:29 ` Ludovic Courtès
2019-09-14 10:51 ` Christopher Baines
2019-10-02 8:05 ` Christopher Baines
0 siblings, 2 replies; 7+ messages in thread
From: Ludovic Courtès @ 2019-09-11 20:29 UTC (permalink / raw)
To: Christopher Baines; +Cc: guix-devel
Hi Chris,
Christopher Baines <mail@cbaines.net> skribis:
> For processing jobs, the records in the database used to be deleted when the
> job was completed, but now the records are kept and there's a page showing
> jobs [4].
>
> 4: http://milano-guix-1.di.unimi.it:8765/jobs
Nice! The page only shows completed jobs, not queued jobs, right?
> Most recently, lint warnings for lint checkers that don't require network
> access are stored in the database. The warnings are displayed on the revision
> page, and included on the compare page (for example [5]). This follows on from
> the changes that I started talking about here [6].
Nice. The under-the-hood changes you mentioned above are also really
cool, it seems to be a solid base now.
> In terms of what's next:
>
> - I've started writing a proposal for the upcoming Outreachy round relating
> to internationalisation in the Guix Data Service
Yay!
> - I want to get back to making progress on automating code review for Guix
> patches, this was one of the main motivations for getting lint warnings in
> the database and on to the comparison page
That would be great. In the end, it seems to me that there are quite a
few services we could build around the Data Service. I’m not sure how
they should interact.
For instance, Mumi could talk to data.guix.gnu.org over an HTTP API, or
should we replicate the database at issues.guix.gnu.org so that Mumi can
tap directly into it?
Likewise, how should something like hpcguix-web (the package browser at
<https://hpc.guix.info/browse>) exploit available data, for instance to
show the history of package versions?
> - I want to provide public dumps of the database from milano-guix-1, as well
> as a small extract of that database. I think restoring a database locally
> is a good way to get data for local development
>
> - Relating to the Outreachy proposal but also generally, I want to write some
> documentation on how to get the Guix Data Service running locally
>
> - Currently the Git repository is on my personal Git server, and there are
> discussions about moving it to Savannah
>
> - The Guix package and service definitions haven't been merged, so I want to
> look at that once the location for the Git repository is sorted out
Looks like it’s done now:
<https://git.savannah.gnu.org/cgit/guix/data-service.git>. :-)
Are there specific areas where you’d like help? Would you encourage
people to start and hack tools or services that build upon the available
data?
Thank you!
Ludo’.
* Re: Guix Data Service - September update
2019-09-11 20:29 ` Ludovic Courtès
@ 2019-09-14 10:51 ` Christopher Baines
2019-09-16 15:10 ` zimoun
2019-10-02 8:05 ` Christopher Baines
1 sibling, 1 reply; 7+ messages in thread
From: Christopher Baines @ 2019-09-14 10:51 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guix-devel
Ludovic Courtès <ludo@gnu.org> writes:
> Hi Chris,
>
> Christopher Baines <mail@cbaines.net> skribis:
>
>> For processing jobs, the records in the database used to be deleted when the
>> job was completed, but now the records are kept and there's a page showing
>> jobs [4].
>>
>> 4: http://milano-guix-1.di.unimi.it:8765/jobs
>
> Nice! The page only shows completed jobs, not queued jobs, right?
It shows all jobs currently, queued, running and completed (based off
the events relating to the job in the database).
>> - I want to get back to making progress on automating code review for Guix
>> patches, this was one of the main motivations for getting lint warnings in
>> the database and on to the comparison page
>
> That would be great. In the end, it seems to me that there are quite a
> few services we could build around the Data Service. I’m not sure how
> they should interact.
>
> For instance, Mumi could talk to data.guix.gnu.org over an HTTP API, or
> should we replicate the database at issues.guix.gnu.org so that Mumi can
> tap directly into it?
It's probably better to use some standard interface like an HTTP API
rather than the database directly. What data were you thinking would be
useful for Mumi?
> Likewise, how should something like hpcguix-web (the package browser at
> <https://hpc.guix.info/browse>) exploit available data, for instance to
> show the history of package versions?
There are some URLs that can be used to access data, for example this
URL should return packages for the latest revision of the master branch
[1].
1: http://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/packages.json?all_results=on
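A client could build and fetch such a URL along these lines. This is a minimal sketch in Python: the function names are made up, the URL layout is copied from the example above, and nothing is assumed about the structure of the JSON that comes back. As the API may still change, treat the URL pattern as provisional.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://data.guix.gnu.org"


def latest_packages_url(repository_id=1, branch="master", all_results=True):
    """Build the packages.json URL for a branch's latest processed revision."""
    path = (f"/repository/{repository_id}/branch/{branch}"
            "/latest-processed-revision/packages.json")
    query = urlencode({"all_results": "on"}) if all_results else ""
    return BASE_URL + path + ("?" + query if query else "")


def fetch_json(url, timeout=30):
    """Fetch a URL and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(latest_packages_url())` would request the package list for the master branch.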
Accessing the history of package versions isn't possible yet, but this
is something I can look at adding, the information is there in the
database.
>> - The Guix package and service definitions haven't been merged, so I want to
>> look at that once the location for the Git repository is sorted out
>
> Looks like it’s done now:
> <https://git.savannah.gnu.org/cgit/guix/data-service.git>. :-)
Yep, and I've gone ahead and reconfigured milano-guix-1, so
http://data.guix.gnu.org/ is now the URL.
> Are there specific areas where you’d like help?
Yes, or rather I have far more ideas than time. I need to get my
thoughts in order though and write them down somewhere, maybe a ROADMAP
file in the repository, similar to the one in the main Guix
repository...
> Would you encourage people to start and hack tools or services that
> build upon the available data?
Yes, although I'm still unsure how stable the API will be, so that's an
important thing to keep in mind.
* Re: Guix Data Service - September update
2019-09-14 10:51 ` Christopher Baines
@ 2019-09-16 15:10 ` zimoun
0 siblings, 0 replies; 7+ messages in thread
From: zimoun @ 2019-09-16 15:10 UTC (permalink / raw)
To: Christopher Baines; +Cc: Guix Devel
Hi Chris,
On Sat, 14 Sep 2019 at 12:52, Christopher Baines <mail@cbaines.net> wrote:
> Accessing the history of package versions isn't possible yet, but this
> is something I can look at adding, the information is there in the
> database.
This would be awesome. :-)
The need often arises when one wants to reproduce the results of a
scientific paper that lists the versions of the tools used.
For example see [1].
Well, it is not easy [2]: you have to locate the .scm file in which the
package is defined, then check out the Guix repo and run `git log` on this
file. And it is even more error-prone if the package has moved to a
different .scm file (e.g., the recent haskell-xyz move).
[1] https://lists.gnu.org/archive/html/help-guix/2019-06/msg00094.html
[2] https://lists.gnu.org/archive/html/help-guix/2019-06/msg00098.html
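The manual approach described above can be sketched as a small Python helper around `git log`. The helper names are made up, and the example path `gnu/packages/gnuzilla.scm` is only an assumed location for a package definition. Note that `--follow` tracks renames of a whole file; it does not help when a single package definition is moved from one file to another, which is exactly the error-prone case.

```python
import subprocess


def file_history_command(path, follow=True):
    """Build the git command that lists the commits touching one file."""
    cmd = ["git", "log", "--oneline"]
    if follow:
        cmd.append("--follow")  # keep going across renames of the file
    cmd += ["--", path]
    return cmd


def file_history(repo_dir, path):
    """Run the command in a Guix checkout and return one line per commit."""
    result = subprocess.run(
        file_history_command(path),
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

Something like `file_history("/path/to/guix", "gnu/packages/gnuzilla.scm")` would then list the commits to inspect by hand.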
Well, access to this information will make the time travel easier. :-)
Thank you for this initiative, even if I am not sure I clearly
understand yet what the Guix Data Service is. :-)
All the best,
simon
* Re: Guix Data Service - September update
2019-09-11 20:29 ` Ludovic Courtès
2019-09-14 10:51 ` Christopher Baines
@ 2019-10-02 8:05 ` Christopher Baines
2019-10-02 10:49 ` Alex Sassmannshausen
1 sibling, 1 reply; 7+ messages in thread
From: Christopher Baines @ 2019-10-02 8:05 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guix-devel
Ludovic Courtès <ludo@gnu.org> writes:
> That would be great. In the end, it seems to me that there are quite a
> few services we could build around the Data Service. I’m not sure how
> they should interact.
>
> For instance, Mumi could talk to data.guix.gnu.org over an HTTP API, or
> should we replicate the database at issues.guix.gnu.org so that Mumi can
> tap directly into it?
>
> Likewise, how should something like hpcguix-web (the package browser at
> <https://hpc.guix.info/browse>) exploit available data, for instance to
> show the history of package versions?
So I've got an initial thing working for the version histories now. You
can construct a URL like [1], which will show a table about the known
versions of the package (icecat in this case) on the master branch.
1: http://data.guix.gnu.org/repository/1/branch/master/package/icecat
The same data is available in JSON [2], and that might work for getting
the data in the hpcguix-web service.
2: http://data.guix.gnu.org/repository/1/branch/master/package/icecat.json
Fetching the data for individual packages definitely won't work well for
all applications, so I'm definitely open to exposing the data in other
ways as well.
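For a consumer such as hpcguix-web, constructing these per-package URLs might look like the sketch below. It is an illustration in Python with made-up function names; the URL layout is taken from the two examples above, and percent-encoding the package name is an assumption about what the service expects.

```python
from urllib.parse import quote

BASE_URL = "http://data.guix.gnu.org"


def package_history_url(name, repository_id=1, branch="master",
                        as_json=False):
    """Build the version-history URL for one package on a branch."""
    path = (f"/repository/{repository_id}/branch/{branch}"
            f"/package/{quote(name, safe='')}")
    return BASE_URL + path + (".json" if as_json else "")


def package_history_urls(names, **kwargs):
    """Build one JSON URL per package: one request each, so slow in bulk."""
    return [package_history_url(n, as_json=True, **kwargs) for n in names]
```

The second helper makes the limitation concrete: covering many packages this way means one HTTP request per package.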
Chris
* Re: Guix Data Service - September update
2019-10-02 8:05 ` Christopher Baines
@ 2019-10-02 10:49 ` Alex Sassmannshausen
2019-10-06 9:52 ` Ludovic Courtès
0 siblings, 1 reply; 7+ messages in thread
From: Alex Sassmannshausen @ 2019-10-02 10:49 UTC (permalink / raw)
To: guix-devel
Christopher Baines <mail@cbaines.net> writes:
> Ludovic Courtès <ludo@gnu.org> writes:
>
>> That would be great. In the end, it seems to me that there are quite a
>> few services we could build around the Data Service. I’m not sure how
>> they should interact.
>>
>> For instance, Mumi could talk to data.guix.gnu.org over an HTTP API, or
>> should we replicate the database at issues.guix.gnu.org so that Mumi can
>> tap directly into it?
>>
>> Likewise, how should something like hpcguix-web (the package browser at
>> <https://hpc.guix.info/browse>) exploit available data, for instance to
>> show the history of package versions?
>
> So I've got an initial thing working for the version histories now. You
> can construct a URL like [1], which will show a table about the known
> versions of the package (icecat in this case) on the master branch.
>
> 1: http://data.guix.gnu.org/repository/1/branch/master/package/icecat
>
> The same data is available in JSON [2], and that might work for getting
> the data in the hpcguix-web service.
>
> 2: http://data.guix.gnu.org/repository/1/branch/master/package/icecat.json
This is incredibly cool.
Suddenly I understand how useful the Data Service could turn out to be!
Alex
* Re: Guix Data Service - September update
2019-10-02 10:49 ` Alex Sassmannshausen
@ 2019-10-06 9:52 ` Ludovic Courtès
0 siblings, 0 replies; 7+ messages in thread
From: Ludovic Courtès @ 2019-10-06 9:52 UTC (permalink / raw)
To: Alex Sassmannshausen; +Cc: guix-devel
Howdy!
Alex Sassmannshausen <alex.sassmannshausen@gmail.com> skribis:
> Christopher Baines <mail@cbaines.net> writes:
[...]
>> So I've got an initial thing working for the version histories now. You
>> can construct a URL like [1], which will show a table about the known
>> versions of the package (icecat in this case) on the master branch.
>>
>> 1: http://data.guix.gnu.org/repository/1/branch/master/package/icecat
>>
>> The same data is available in JSON [2], and that might work for getting
>> the data in the hpcguix-web service.
>>
>> 2: http://data.guix.gnu.org/repository/1/branch/master/package/icecat.json
>
> This is incredibly cool.
Seconded, and the timeline looks nice too!
That means we could build UIs along the lines of:
guix pull --when=icecat=60.5
Very cool!
Ludo’.