From: "Ludovic Courtès" <ludovic.courtes@inria.fr>
To: Timothy Sample <samplet@ngyro.com>
Cc: guix-devel <guix-devel@gnu.org>,
	 guix-sysadmin@gnu.org,
	 Simon Tournier <zimon.toutoune@gmail.com>
Subject: Re: Disarchive database synchronization
Date: Mon, 20 Mar 2023 10:14:41 +0100
Message-ID: <87v8ivokce.fsf@inria.fr>
In-Reply-To: <87sfe1lu0h.fsf@ngyro.com> (Timothy Sample's message of "Sat, 18 Mar 2023 13:49:34 -0600")

Howdy Timothy!

Timothy Sample <samplet@ngyro.com> skribis:

> Ludovic Courtès <ludovic.courtes@inria.fr> writes:

[...]

>> For the remaining entries, it’s trickier.  Sometimes it’s just the
>> gzip compression parameters that differ, which could be addressed with a
>> little bit more work:
>>
>> $ file ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz ../../disarchive/sha256/ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz
>> ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz:                         gzip compressed data, max compression, from Unix, original size modulo 2^32 446731
>> ../../disarchive/sha256/ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz: gzip compressed data, max speed, from Unix, original size modulo 2^32 446731
>
> I’m not sure getting the compressed files to match matters.

No, it certainly doesn’t matter; it’s just that it would have made it
easier to check for relevant differences between the two Disarchive
databases.
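
A minimal sketch of that kind of check, in Python, assuming each
specification is stored as a single gzip file in both databases: compare
the decompressed bytes, so that differing compression parameters (“max
compression” vs. “max speed”) do not show up as differences.  The
function name is hypothetical and not part of Disarchive.

```python
import gzip


def same_decompressed(a: bytes, b: bytes) -> bool:
    """Return True if two gzip blobs decompress to identical bytes."""
    return gzip.decompress(a) == gzip.decompress(b)


if __name__ == "__main__":
    # Same payload compressed with different levels, which mirrors the
    # "max compression" vs. "max speed" case reported by file(1).
    payload = b"(disarchive\n  (version 0))\n"
    best = gzip.compress(payload, compresslevel=9)
    fast = gzip.compress(payload, compresslevel=1)
    print(same_decompressed(best, fast))
```

This only filters out compression noise; differences inside the
specifications themselves still show up.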

>> Sometimes it’s trickier:
>>
>> # diff -u <(gunzip -d < 0001f025c1425ffe36270a81cb091eade87dd8d29ac773735ae47e1a8c8066c9.gz) <(gunzip -d < ../../disarchive/sha256/0001f025c1425ffe36270a81cb091eade87dd8d29ac773735ae47e1a8c8066c9.gz)
>> --- /dev/fd/63  2023-03-14 16:13:21.635733426 +0100
>> +++ /dev/fd/62  2023-03-14 16:13:21.635733426 +0100
>> @@ -1,7 +1,7 @@
>>  (disarchive
>>    (version 0)
>>    (gzip-member
>> -    (name "webview-sys-0.6.2.tar.gz")
>> +    (name "rust-webview-sys-0.6.2.tar.gz")

[...]

> The name field is not used for data reconstruction.  It’s for human
> consumption (and it may have made some early examples of use at the
> command line easier to explain).  Here, the difference is based on the
> fact that Crate URIs are weird, and the Preservation of Guix code does
> not keep the origin file name.  Hence, the PoG version extracts the
> Crate name alone from the URI, and the Cuirass version uses the Guix
> package name with the “rust-” prefix.

OK.  Again I was looking at this from the perspective of determining
whether there were “relevant” differences between the two Disarchive
databases.  Looks like it would be quite some work to determine that
automatically.
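
For the specific case above, a rough Python sketch could blank out the
human-facing name before comparing.  It assumes, based only on the diff
quoted above, that the first (name "...") form in a specification is the
one that is not used for reconstruction; other kinds of harmless
differences would each need their own rule, which is where the work
lies.  The helper names are hypothetical.

```python
import re

# Assumption: the first `(name "...")` form is the human-facing name of the
# top-level member, as in the diff quoted above.  Later `name` forms, if
# any, are left untouched.
_NAME_FIELD = re.compile(r'\(name\s+"(?:[^"\\]|\\.)*"\)')


def without_top_name(spec: str) -> str:
    """Blank out the first (name "...") form of a specification."""
    return _NAME_FIELD.sub('(name "")', spec, count=1)


def equal_ignoring_name(spec_a: str, spec_b: str) -> bool:
    """Compare two specifications, ignoring the top-level name field."""
    return without_top_name(spec_a) == without_top_name(spec_b)
```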

>> As Tim pointed out, Disarchive disassembly is not fully deterministic
>> and/or might change a bit over time as Disarchive evolves, and that’s
>> prolly what we’re seeing here.
>
> I honestly think this is a good thing.  My instincts tell me that we
> should excise all sources of ambiguity, like we’re trying to do in the
> big picture.  However, Disarchive will get better at describing things
> over time.  For instance, it doesn’t handle tar extension headers
> elegantly at the moment.  In the future, if I fix this, I might consider
> creating a “migrate” feature that improves existing specifications
> (e.g., converting the old, verbose representation of extension headers
> into the new representation).  In particular, I’ve left some warts in
> the software in order to ship it, and I would be sad to try and commit
> to those for the rest of time!

That makes a lot of sense!

> We might also add other resolver addresses besides SWHIDs....
>
> Maybe I’m missing some perspective, but I don’t think trying to commit
> to reproducible outputs for Disarchive makes sense.

Yes, I feel the same.

> P.S., we’ll have to do this dance again shortly, as I just computed
> 2,023 historical bzip2 specifications.  They’re not online yet, but
> they’ll be up when I publish the next PoG report – which should take less
> than a year this time!  :p

Wow, bzip2!  I was just now looking at a concrete disappearing-tarball
issue that involves bzip2:

  https://issues.guix.gnu.org/62071#8

Thank you!

Ludo’.


