unofficial mirror of guix-devel@gnu.org 
* Preservation of Guix 2021-10-22
@ 2021-10-23  1:09 Timothy Sample
  2021-10-23  7:52 ` zimoun
  0 siblings, 1 reply; 13+ messages in thread
From: Timothy Sample @ 2021-10-23  1:09 UTC
  To: guix-devel

Hey all,

As promised, here is the updated Preservation of Guix Report:

    https://ngyro.com/pog-reports/2021-10-22/

It takes into account the as yet unreleased Disarchive fix.  The results
are quite a bit better!  Note especially that for the most recent
commit, of the 72.8% that I could check, 97.8% are in the SWH archive.
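
Read together, the two percentages give a rough three-way split of all
sources at that commit.  The Python sketch below only redoes that
arithmetic; the variable names are illustrative, and it assumes that the
97.8% is measured relative to the checkable 72.8%, as the sentence above
suggests.

```python
# Figures quoted above for the most recent commit.
checkable = 72.8            # % of sources whose status could be checked
archived_of_checked = 97.8  # % of the checkable sources found in SWH

confirmed_archived = checkable * archived_of_checked / 100
confirmed_missing = checkable - confirmed_archived
unknown = 100 - checkable

print(f"confirmed archived: {confirmed_archived:.1f}%")
print(f"confirmed missing:  {confirmed_missing:.1f}%")
print(f"status unknown:     {unknown:.1f}%")
```

Under that reading, roughly 71.2% of all sources are confirmed archived,
1.6% are confirmed missing, and 27.2% could not be checked.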

I didn’t add the breakdowns that zimoun suggested (yet), but here is a
bit of extra information.  Of the missing fixed-output derivations, we
have:

    git      376
    tar+gz  3092
    total   3468

If we limit ourselves to the most recent commit (258a27e):

    git      217
    tar+gz    78
    total    295

My guess (and that’s all it is!) is that before “sources.json”, a lot of
tarballs slipped through the cracks.  Now, most of the tarballs are
getting archived via “sources.json”, but since the Git references are
not, there are several that are being missed.
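
The two tables can be compared by looking at the share of Git sources
among the missing fixed-output derivations.  This is a small Python
sketch using only the counts quoted above; the dictionary layout and the
function name are just for illustration.

```python
# Missing fixed-output derivations, as counted in the two tables above.
missing = {
    "all commits": {"git": 376, "tar+gz": 3092},
    "258a27e": {"git": 217, "tar+gz": 78},
}


def git_share(counts):
    """Return the percentage of missing sources that are Git checkouts."""
    total = sum(counts.values())
    return 100 * counts["git"] / total


for scope, counts in missing.items():
    print(f"{scope}: {git_share(counts):.1f}% of missing sources are Git")
```

Git goes from about 11% of the missing sources over all commits to about
74% at the most recent commit, which is at least consistent with the
guess above.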

I was curious about the tarballs in the most recent commit, so I took a
look.  There’s no clear pattern.  There were three old GNU packages
(from “commencement.scm”), which I thought was strange, because SWH has
a special GNU loader.  OK, let’s look at Gawk 3.0.0.  It’s there, but
the ID is different.  It turns out the SWH version is missing the
executable bit on all the scripts.  I wonder if they somehow made a
mistake when they first ingested it....  (At least it’s not *another*
Disarchive bug!)


-- Tim



* Re: Preservation of Guix 2021-10-22
  2021-10-23  1:09 Preservation of Guix 2021-10-22 Timothy Sample
@ 2021-10-23  7:52 ` zimoun
  2021-10-23 15:55   ` Timothy Sample
  0 siblings, 1 reply; 13+ messages in thread
From: zimoun @ 2021-10-23  7:52 UTC
  To: Timothy Sample, guix-devel

Hi Timothy,

On Fri, 22 Oct 2021 at 21:09, Timothy Sample <samplet@ngyro.com> wrote:

> As promised, here is the updated Preservation of Guix Report:
>
>     https://ngyro.com/pog-reports/2021-10-22/

Cool!


> I didn’t add the breakdowns that zimoun suggested (yet), but here is a
> bit of extra information.  Of the missing fixed-output derivations, we
> have:
>
>     git      376
>     tar+gz  3092
>     total   3468
>
> If we limit ourselves to the most recent commit (258a27e):
>
>     git      217
>     tar+gz    78
>     total    295

How can I get the list of these 376+217 packages?  It seems easy to me
to send a save request for them. :-)


> My guess (and that’s all it is!) is that before “sources.json”, a lot of
> tarballs slipped through the cracks.  Now, most of the tarballs are
> getting archived via “sources.json”, but since the Git references are
> not, there are several that are being missed.

Yeah, probably because “guix lint” is not systematically run by the
submitter or the reviewer.

The Data Service runs the linters [1] and one can explore the report
[2].  However, I do not remember whether the ’archival’ checker is run.

1: <https://data.guix.gnu.org/revision/4a0cd6297af35a36e9f492bb234fc110d6423a4d>
2: <https://data.guix.gnu.org/revision/4a0cd6297af35a36e9f492bb234fc110d6423a4d/lint-warnings> 



Cheers,
simon



* Re: Preservation of Guix 2021-10-22
  2021-10-23  7:52 ` zimoun
@ 2021-10-23 15:55   ` Timothy Sample
  2021-10-25  8:43     ` zimoun
  2021-10-25 21:51     ` zimoun
  0 siblings, 2 replies; 13+ messages in thread
From: Timothy Sample @ 2021-10-23 15:55 UTC
  To: zimoun; +Cc: guix-devel

Hello,

zimoun <zimon.toutoune@gmail.com> writes:

> How can I get the list of these 376+217 packages?  Because it appears to
> me easy to send a save request for them. :_)

Download the database (there’s a button at the bottom of the report),
and use SQLite to run the following queries.

For the 376:

    SELECT fod_id,
        swhid,
        reference
    FROM fods
        LEFT JOIN fod_references USING (fod_id)
    WHERE NOT is_in_swh
        AND reference LIKE '(git-reference%';

For the 217 (which is the best place to start):

    SELECT fod_id,
        swhid,
        reference
    FROM fods
        JOIN fod_commit_links USING (fod_id)
        JOIN commits USING (commit_id)
        LEFT JOIN fod_references USING (fod_id)
    WHERE commits.hash = '258a27eea9aab4f8af995f95743ccd264b5efcb5'
        AND NOT is_in_swh
        AND reference LIKE '(git-reference%';
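
For anyone who prefers scripting this over the sqlite3 prompt, here is a
hedged Python sketch of the second query.  It runs against a tiny
in-memory stand-in, not the real pog.db: the table and column names come
from the queries above, but which table owns which column, the shape of
the reference strings beyond their "(git-reference" prefix, and all the
rows are assumptions made up for the demonstration.

```python
import sqlite3

# Hypothetical miniature of pog.db.  Table and column names come from the
# queries above; which table owns which column is an assumption.
SCHEMA = """
CREATE TABLE fods (fod_id INTEGER PRIMARY KEY, swhid TEXT, is_in_swh INTEGER);
CREATE TABLE fod_references (fod_id INTEGER, reference TEXT);
CREATE TABLE commits (commit_id INTEGER PRIMARY KEY, hash TEXT);
CREATE TABLE fod_commit_links (fod_id INTEGER, commit_id INTEGER);
"""

# Same query as the second one above, with the commit hash as a parameter.
QUERY = """
SELECT fod_id, swhid, reference
FROM fods
    JOIN fod_commit_links USING (fod_id)
    JOIN commits USING (commit_id)
    LEFT JOIN fod_references USING (fod_id)
WHERE commits.hash = ?
    AND NOT is_in_swh
    AND reference LIKE '(git-reference%'
"""

COMMIT = "258a27eea9aab4f8af995f95743ccd264b5efcb5"


def missing_git_fods(conn, commit_hash):
    """Return (fod_id, swhid, reference) rows for unarchived Git sources."""
    return conn.execute(QUERY, (commit_hash,)).fetchall()


def toy_database():
    """Build an in-memory database filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO fods VALUES (?, ?, ?)", [
        (1, "swhid-1", 0),   # Git source, not in SWH
        (2, "swhid-2", 1),   # Git source, in SWH
        (3, "swhid-3", 0),   # tarball, not in SWH
        (4, "swhid-4", 0),   # Git source, not in SWH, other commit only
    ])
    conn.executemany("INSERT INTO fod_references VALUES (?, ?)", [
        (1, '(git-reference (url "https://example.org/a.git"))'),
        (2, '(git-reference (url "https://example.org/b.git"))'),
        (3, '"https://example.org/c.tar.gz"'),
        (4, '(git-reference (url "https://example.org/d.git"))'),
    ])
    conn.executemany("INSERT INTO commits VALUES (?, ?)",
                     [(1, COMMIT), (2, "0000000")])
    conn.executemany("INSERT INTO fod_commit_links VALUES (?, ?)",
                     [(1, 1), (2, 1), (3, 1), (4, 2)])
    return conn


rows = missing_git_fods(toy_database(), COMMIT)
print(rows)
```

Against the downloaded database, toy_database() would be replaced by
sqlite3.connect("pog.db"), assuming the file sits in the current
directory.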

Keep in mind that there is still a possibility of bugs or issues.  Some
of the results make sense to me, like if it’s from sourcehut.  However,
most of them are from GitHub, and they should have ended up in the
archive one way or another.  In short, sometimes you have to double
check the SWHID.  You can do this by searching by origin and finding the
tag.  If there’s a difference, it’s not necessarily a bug (like with the
Gawk 3.0.0 tarball), but it will have to be investigated.

I investigated one just to see: <https://github.com/libusb/hidapi>.  It
turns out that SWH just hasn’t visited it since September, so they
didn’t have the most recent tags.  I asked them to visit it, and now
they do.  It’s as simple as that!  I still think there might be more
interesting problems, but it’s nice that some of them are that simple.


-- Tim



* Re: Preservation of Guix 2021-10-22
  2021-10-23 15:55   ` Timothy Sample
@ 2021-10-25  8:43     ` zimoun
  2021-10-25  9:55       ` indieterminacy
  2021-10-25 20:49       ` zimoun
  2021-10-25 21:51     ` zimoun
  1 sibling, 2 replies; 13+ messages in thread
From: zimoun @ 2021-10-25  8:43 UTC
  To: Timothy Sample; +Cc: guix-devel

Hi,

On Sat, 23 Oct 2021 at 11:55, Timothy Sample <samplet@ngyro.com> wrote:

> Download the database (there’s a button at the bottom of the report),
> and use SQLite to run the following queries.

Cool!  Thanks.


If someone wants to help,

    $ wget https://ngyro.com/pog-reports/2021-10-22/pog.db
    $ guix environment --ad-hoc sqlite -- sqlite3
    sqlite> .open pog.db
    sqlite>

then copy/paste that:

> For the 376:
>
>     SELECT fod_id,
>         swhid,
>         reference
>     FROM fods
>         LEFT JOIN fod_references USING (fod_id)
>     WHERE NOT is_in_swh
>         AND reference LIKE '(git-reference%';

So I started with these.  After running the query, I used a quick Emacs
macro to keep only the URLs, then sorted them to see whether a pattern
emerges.  Nothing fancy.  For the record, I get 214 GitHub URLs and 32
GitLab (.com) ones.  Among other things, I also note that both of these
appear:

 "https://notabug.org/cwebber/guile-squee.git"
 "https://notabug.org/mothacehe/guile-squee.git"

because one is defined in (gnu packages guile-xyz) as guile-squee and
the other in (gnu packages ci) as guile-squee-dev.  Another remark: the
following Julia packages are listed:

 "https://github.com/JuliaArrays/OffsetArrays.jl"
 "https://github.com/JuliaArrays/StaticArrays.jl"
 "https://github.com/JuliaCI/BenchmarkTools.jl"
 "https://github.com/JuliaCollections/OrderedCollections.jl"
 "https://github.com/JuliaData/Parsers.jl"
 "https://github.com/JuliaDiff/ChainRules.jl"
 "https://github.com/JuliaDiff/ChainRulesCore.jl"
 "https://github.com/JuliaDiff/ChainRulesTestUtils.jl"
 "https://github.com/JuliaDiff/FiniteDifferences.jl"
 "https://github.com/JuliaGPU/Adapt.jl"
 "https://github.com/JuliaGraphics/ColorTypes.jl"
 "https://github.com/JuliaGraphics/Colors.jl"
 "https://github.com/JuliaLang/Compat.jl"
 "https://github.com/JuliaObjects/ConstructionBase.jl"
 "https://github.com/JuliaPackaging/JLLWrappers.jl"
 "https://github.com/JuliaWeb/URIs.jl"

even though I am sure I scheduled them a couple of days (weeks?) ago.
I have not yet investigated whether the archiving failed or whether PoG
is behind.

Another general remark: some URLs are duplicated, for instance:

 "https://codeberg.org/dnkl/fcft"
 "https://git.cbaines.net/git/guix/build-coordinator"
 "https://git.code.sf.net/p/wsjt/wsjtx.git"
 "https://git.code.sf.net/u/bsomervi/hamlib.git"
 "https://git.elephly.net/software/mumi.git"
 "https://git.mfiano.net/mfiano/golden-utils"
 "https://git.sr.ht/~bzg/org-contrib"
 "https://git.systemreboot.net/guile-email"
 "https://git.systemreboot.net/guile-xapian"
 "https://git.umaneti.net/flycheck-grammalecte/"
 "https://github.com/Eloston/ungoogled-chromium"
 etc.

I have not checked if several packages refer to the same URL.
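
The "keep the URLs, drop duplicates, count per forge" step can also be
scripted.  The Python sketch below assumes that each reference is a
Scheme-like string containing a (url "...") field; only the
"(git-reference" prefix is confirmed by the query, so the exact layout
and the placeholder commit values are assumptions.  The URLs themselves
are taken from the lists above.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumed shape of a reference: (git-reference (url "...") (commit "...") ...)
URL_RE = re.compile(r'\(url\s+"([^"]+)"\)')


def unique_urls(references):
    """Extract the URL of each reference, dropping duplicates, sorted."""
    urls = set()
    for ref in references:
        match = URL_RE.search(ref)
        if match:
            urls.add(match.group(1))
    return sorted(urls)


def count_by_host(urls):
    """Count URLs per host name."""
    return Counter(urlsplit(url).hostname for url in urls)


# The commit values are placeholders, not real commit IDs.
references = [
    '(git-reference (url "https://codeberg.org/dnkl/fcft") (commit "x"))',
    '(git-reference (url "https://codeberg.org/dnkl/fcft") (commit "y"))',
    '(git-reference (url "https://github.com/JuliaWeb/URIs.jl") (commit "z"))',
    '(git-reference (url "https://github.com/JuliaLang/Compat.jl") (commit "w"))',
    '(git-reference (url "https://notabug.org/cwebber/guile-squee.git") (commit "v"))',
]
urls = unique_urls(references)
hosts = count_by_host(urls)
print(urls)
print(hosts.most_common())
```

Deduplicating on the URL alone is a deliberate simplification: two
references to the same repository at different commits collapse into a
single entry, which is what matters when the goal is to request a visit
of the origin.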


Now, the core point.  Running ’save-origin’ from (guix swh) on the URLs,
I get for instance:

        https://code.divoplade.fr/mkdir-p.git   accepted	failed
        git://pumpa.branchable.com 	accepted	failed

for some of those I have checked.  Why they fail still needs to be
investigated.


Last, I failed to use TOKEN from “guix repl”.  I just do:

--8<---------------cut here---------------start------------->8---
(use-modules (guix swh)
             (srfi srfi-1))

(setenv "TOKEN"
        "eyJhb…"
         )

(define missings
  (list
   "git://pumpa.branchable.com/"

[...]

   "https://salsa.debian.org/installer-team/debootstrap.git"
   ))

(for-each
 (lambda (url)
   (save-origin url))
 missings)
--8<---------------cut here---------------end--------------->8---

but this fails.  What am I missing?  Does %swh-token need to be exported
and tweaked in the script?


Cheers,
simon



* Re: Preservation of Guix 2021-10-22
  2021-10-25  8:43     ` zimoun
@ 2021-10-25  9:55       ` indieterminacy
  2021-10-25 11:19         ` zimoun
  2021-10-25 20:49       ` zimoun
  1 sibling, 1 reply; 13+ messages in thread
From: indieterminacy @ 2021-10-25  9:55 UTC
  To: zimoun; +Cc: guix-devel

Hi Simon,

Would it be remiss to cross reference contributors to these identified
scripts against Guix contributors (commits, ML messages)?

With that it may be possible to email them a templated message,
featuring:
* The current activity and reason for outreach
* The advantages of providing improvements
* A polite request for them (or interested parties) to assist (whenever)
possible
* A reference to documentation or cookbooks to provide further context
and approaches for resolving

WDYT?


Jonathan





* Re: Preservation of Guix 2021-10-22
  2021-10-25  9:55       ` indieterminacy
@ 2021-10-25 11:19         ` zimoun
  0 siblings, 0 replies; 13+ messages in thread
From: zimoun @ 2021-10-25 11:19 UTC
  To: indieterminacy; +Cc: guix-devel

Hi Jonathan,

On Mon, 25 Oct 2021 at 11:55, indieterminacy@libre.brussels wrote:

> Would it be remiss to cross reference contributors to these identified
> scripts against Guix contributors (commits, ML messages)?

I do not understand what you are asking.

The only thing contributors can do is “./pre-inst-env guix lint
<package>” before they submit.  Then, «we» are investigating why the
coverage is not 100% and are trying to address the blockers (from Guix
infra first, then ask SWH).

It is an experimental work in progress.  It seems too early to
cross-reference anything – it is already enough work without digging
through commits or ML messages. :-)

Where people can help is by exploring the current pog.db file and
checking it against SWH, as Timothy and I are doing here and there.

People can also help by signing up for the Software Heritage
authentication service [1], getting a token, and systematically running:

   $ GUIX_SWH_TOKEN=$TOKEN guix lint -c archival

For sure, it is a poor man’s solution, but it helps improve the coverage
while we wait for a robust mechanism. :-) For instance, one can run,

--8<---------------cut here---------------start------------->8---
$ TOKEN=eyJh…
$ for p in $(guix package -p ~/.guix-profile -I | cut -f1);\
> do GUIX_SWH_TOKEN=$TOKEN guix lint -c archival $p \
> ;done
--8<---------------cut here---------------end--------------->8---

for all the profiles they use.  For instance, it should display
something like:

--8<---------------cut here---------------start------------->8---
gnu/packages/emacs-xyz.scm:21700:5: emacs-rust-mode@1.0.0: scheduled Software Heritage archival
gnu/packages/emacs-xyz.scm:18966:5: emacs-org-re-reveal@3.12.1: scheduled Software Heritage archival
gnu/packages/emacs-xyz.scm:12296:5: emacs-org-contrib@0.3: scheduled Software Heritage archival
gnu/packages/emacs-xyz.scm:12264:5: emacs-org@9.5: source not archived on Software Heritage and missing from the Disarchive database
gnu/packages/emacs-xyz.scm:786:5: emacs-magit@3.3.0: scheduled Software Heritage archival
gnu/packages/guile.scm:317:12: guile@3.0.7: source not archived on Software Heritage and missing from the Disarchive database
gnu/packages/emacs.scm:80:12: emacs@27.2: source not archived on Software Heritage and missing from the Disarchive database
gnu/packages/mail.scm:1352:12: notmuch@0.33.2: source not archived on Software Heritage and missing from the Disarchive database
gnu/packages/aspell.scm:115:12: aspell-dict-en@2020.12.07-0: source not archived on Software Heritage and missing from the Disarchive database
--8<---------------cut here---------------end--------------->8---

Then, people can go to

   <https://archive.softwareheritage.org/save/#requests>

and check whether the «scheduled» packages «succeeded».  Please report it
if the status is «failed», because that requires investigation.
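
To see at a glance which packages were scheduled and which are still
missing, the lint output can be grouped by message.  The Python sketch
below assumes that every line has the "location: name@version: message"
shape shown in the sample output above; the function name is
illustrative.

```python
from collections import defaultdict

# Sample lines copied from the output shown above.
LINT_OUTPUT = """\
gnu/packages/emacs-xyz.scm:21700:5: emacs-rust-mode@1.0.0: scheduled Software Heritage archival
gnu/packages/emacs-xyz.scm:12264:5: emacs-org@9.5: source not archived on Software Heritage and missing from the Disarchive database
gnu/packages/emacs-xyz.scm:786:5: emacs-magit@3.3.0: scheduled Software Heritage archival
gnu/packages/guile.scm:317:12: guile@3.0.7: source not archived on Software Heritage and missing from the Disarchive database
"""


def group_by_message(output):
    """Map each lint message to the package specs it was reported for."""
    groups = defaultdict(list)
    for line in output.splitlines():
        parts = line.split(": ", 2)   # location, name@version, message
        if len(parts) == 3:
            _location, spec, message = parts
            groups[message].append(spec)
    return dict(groups)


groups = group_by_message(LINT_OUTPUT)
for message, specs in groups.items():
    print(f"{len(specs)} x {message}: {', '.join(specs)}")
```

The «scheduled» group is the list to look up on the save-requests page.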

1: <https://archive.softwareheritage.org/oidc/login/>


Cheers,
simon



* Re: Preservation of Guix 2021-10-22
  2021-10-25  8:43     ` zimoun
  2021-10-25  9:55       ` indieterminacy
@ 2021-10-25 20:49       ` zimoun
  2021-10-29 14:25         ` Ludovic Courtès
  1 sibling, 1 reply; 13+ messages in thread
From: zimoun @ 2021-10-25 20:49 UTC
  To: Timothy Sample; +Cc: guix-devel

Hi,

On Mon, 25 Oct 2021 at 10:43, zimoun <zimon.toutoune@gmail.com> wrote:
> On Sat, 23 Oct 2021 at 11:55, Timothy Sample <samplet@ngyro.com> wrote:

>> For the 376:
>>
>>     SELECT fod_id,
>>         swhid,
>>         reference
>>     FROM fods
>>         LEFT JOIN fod_references USING (fod_id)
>>     WHERE NOT is_in_swh
>>         AND reference LIKE '(git-reference%';

Using the URLs reported by this query, I notice:

 1. https://git.code.sf.net is rejected by SWH, with messages such as
 «The origin url is not valid or does not reference a code repository» or
 «Error: The "save code now" request has been rejected because the
 provided origin url is blacklisted.»  This affects these 5:

   "https://git.code.sf.net/p/fldigi/flamp"
   "https://git.code.sf.net/p/fldigi/flrig"
   "https://git.code.sf.net/p/fldigi/flwrap"
   "https://git.code.sf.net/p/wsjt/wsjtx.git"
   "https://git.code.sf.net/u/bsomervi/hamlib.git"

 2. Even when saved manually, these systematically fail:

   "git://pumpa.branchable.com"

   "https://github.com/scikit-learn/scikit-learn"
   "https://git.minetest.land/Wuzzy/MineClone2"
   "https://git.savannah.gnu.org/git/gsequencer.git"


Cheers,
simon



* Re: Preservation of Guix 2021-10-22
  2021-10-23 15:55   ` Timothy Sample
  2021-10-25  8:43     ` zimoun
@ 2021-10-25 21:51     ` zimoun
  2021-10-26 13:12       ` Timothy Sample
  1 sibling, 1 reply; 13+ messages in thread
From: zimoun @ 2021-10-25 21:51 UTC
  To: Timothy Sample; +Cc: guix-devel

Hi Timothy,

On Sat, 23 Oct 2021 at 11:55, Timothy Sample <samplet@ngyro.com> wrote:
> zimoun <zimon.toutoune@gmail.com> writes:
>
>> How can I get the list of these 376+217 packages?  Because it appears to
>> me easy to send a save request for them. :-)

Done. :-)


> Download the database (there’s a button at the bottom of the report),
> and use SQLite to run the following queries.
>
> For the 376:
>
>     SELECT fod_id,
>         swhid,
>         reference
>     FROM fods
>         LEFT JOIN fod_references USING (fod_id)
>     WHERE NOT is_in_swh
>         AND reference LIKE '(git-reference%';
>
> For the 217 (which is the best place to start):
>
>     SELECT fod_id,
>         swhid,
>         reference
>     FROM fods
>         JOIN fod_commit_links USING (fod_id)
>         JOIN commits USING (commit_id)
>         LEFT JOIN fod_references USING (fod_id)
>     WHERE commits.hash = '258a27eea9aab4f8af995f95743ccd264b5efcb5'
>         AND NOT is_in_swh
>         AND reference LIKE '(git-reference%';

I have not checked one by one, but I guess the 217 are included in the
376.  As I said earlier, many of them are duplicates.

Silly me: regarding my previous question about the SWH token, I was not
using the right environment variable.  Arf!

Using something like this,

--8<---------------cut here---------------start------------->8---
(use-modules (guix swh)
             (srfi srfi-1))

(setenv "GUIX_SWH_TOKEN"
        "eyJhbG…"
        )

(define missings
  (list
   ;;"git://pumpa.branchable.com/"
   "http://genome-source.cse.ucsc.edu/samtabix.git"

[...]

   "https://salsa.debian.org/installer-team/debootstrap.git"
   ))

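;; Request archival only for the origins that SWH does not know yet.
(for-each
 (lambda (url)
   (unless (lookup-origin url)
     (pk url)                           ;print the URL being submitted
     (save-origin url)))
 missings)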
--8<---------------cut here---------------end--------------->8---

now all the missing URLs should be ingested by SWH, modulo a couple that
were rejected or failed*.  This is not bulletproof for the future (it
shows that submitters and reviewers do not systematically run “guix lint
<package>”), but it should improve the PoG report. :-)
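
The control flow of the script above (skip what the archive already
knows, submit the rest once) can be sketched independently of the
network.  In the Python sketch below, lookup and save are hypothetical
callables standing in for lookup-origin and save-origin from (guix swh);
nothing here talks to SWH, and the "already known" origin is made up for
the demonstration.

```python
def submit_missing(urls, lookup, save):
    """Call save(url) once for each URL that lookup(url) does not find.

    lookup and save are hypothetical stand-ins for lookup-origin and
    save-origin; lookup must return a false value for unknown origins.
    """
    submitted = []
    for url in dict.fromkeys(urls):      # drop duplicates, keep order
        if not lookup(url):
            save(url)
            submitted.append(url)
    return submitted


# Fake archive for the demonstration: pretend one origin is already known.
known = {"https://salsa.debian.org/installer-team/debootstrap.git"}
requests = []
submitted = submit_missing(
    [
        "http://genome-source.cse.ucsc.edu/samtabix.git",
        "http://genome-source.cse.ucsc.edu/samtabix.git",
        "https://salsa.debian.org/installer-team/debootstrap.git",
    ],
    lookup=lambda url: url in known,
    save=requests.append,
)
print(submitted)
```

Dropping duplicates before the lookup avoids sending the same save
request twice, which matters here because many of the URLs returned by
the query are duplicated.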

I expect to have 0 missing git-reference packages for the recent
revisions.  If not, please raise it!


*rejected or failed: I am going to ask the SWH folks.


On my side, the next steps I envision are:

 1. have sources.json produced by a derivation so it could be run by CI
 2. support SVN

I need help with #1 because I am not sure I understand how to do that.
As for #2, it concerns few packages, but many packages depend on some
TeX ones.

Cheers,
simon



* Re: Preservation of Guix 2021-10-22
  2021-10-25 21:51     ` zimoun
@ 2021-10-26 13:12       ` Timothy Sample
  2021-10-26 15:50         ` zimoun
  0 siblings, 1 reply; 13+ messages in thread
From: Timothy Sample @ 2021-10-26 13:12 UTC
  To: zimoun; +Cc: guix-devel

Hi zimoun,

zimoun <zimon.toutoune@gmail.com> writes:

>>> How can I get the list of these 376+217 packages?  Because it appears to
>>> me easy to send a save request for them. :-)
>
> Done. :-)

Nice!

> I have not checked one per one but I guess the 217 are included in the
> 376 ones.  As I said earlier, many of them are duplicates.

The 376 is the total across all commits, and the 217 only looks at one
commit.  In other words, yes, each of the 217 is included in the 376.

> I expect to have 0 missing git-reference package for the recent
> revisions.  If not, please raise!

Wow!  I really appreciate the work you’re putting into this.  I think
we’re really getting somewhere.

> *reject or failed: I am going to ask SWH folks.

Cool.  I noticed a few of these months ago and was wondering what could
be done about them.


-- Tim



* Re: Preservation of Guix 2021-10-22
  2021-10-26 13:12       ` Timothy Sample
@ 2021-10-26 15:50         ` zimoun
  2021-10-29 14:30           ` Ludovic Courtès
  0 siblings, 1 reply; 13+ messages in thread
From: zimoun @ 2021-10-26 15:50 UTC
  To: Timothy Sample; +Cc: guix-devel

Hi Timothy,

On Tue, 26 Oct 2021 at 09:12, Timothy Sample <samplet@ngyro.com> wrote:

>> I expect to have 0 missing git-reference package for the recent
>> revisions.  If not, please raise!
>
> Wow!  I really appreciate the work you’re putting into this.  I think
> we’re really getting somewhere.

I have been too fast. :-) Well, let me know if git-references are still
missing in the next PoG report, because I suspect that (guix swh), as
used by ’check-archival’, reports missing archives incorrectly, and IIUC
you use another entry point.  For instance, take a look at [1].

1: <https://lists.gnu.org/archive/html/guix-devel/2021-10/msg00250.html>


>> *reject or failed: I am going to ask SWH folks.
>
> Cool.  I noticed a few of these months ago and was wondering what could
> be done about them.

Some should be fixed soon.  Others are already fixed, but the fix has
not yet landed in production.  However, some are still open; for instance

--8<---------------cut here---------------start------------->8---
26/10/2021, 09:29:59 git https://github.com/scikit-learn/scikit-learn accepted failed
--8<---------------cut here---------------end--------------->8---

and the log says:

--8<---------------cut here---------------start------------->8---
swh.loader.git.converters.HashMismatch: Expected Directory hash to be
5475d108765c5591003210e70fc01bf2a77fca55, got 59d6219647cd84a027780487f0e021e4f57e93d6
--8<---------------cut here---------------end--------------->8---

and at the same time, see
<https://archive.softwareheritage.org/browse/origin/directory/?origin_url=https://github.com/scikit-learn/scikit-learn>.

It is probably also what is being hit in the report at [1].

Bah, complicated story… :-)


Cheers,
simon



* Re: Preservation of Guix 2021-10-22
  2021-10-25 20:49       ` zimoun
@ 2021-10-29 14:25         ` Ludovic Courtès
  0 siblings, 0 replies; 13+ messages in thread
From: Ludovic Courtès @ 2021-10-29 14:25 UTC
  To: zimoun; +Cc: guix-devel

Hello!

zimoun <zimon.toutoune@gmail.com> skribis:

> Using the the URLs reported by this query, I notice:
>
>  1, https://git.code.sf.net is rejected by SWH.  For instance, «The
>  origin url is not valid or does not reference a code repository» or
>  «Error: The "save code now" request has been rejected because the
>  provided origin url is blacklisted.»  It means these 5:
>
>    "https://git.code.sf.net/p/fldigi/flamp"
>    "https://git.code.sf.net/p/fldigi/flrig"
>    "https://git.code.sf.net/p/fldigi/flwrap"
>    "https://git.code.sf.net/p/wsjt/wsjtx.git"
>    "https://git.code.sf.net/u/bsomervi/hamlib.git"
>
>  2. Even saved manually, these systematically fail:
>
>    "git://pumpa.branchable.com"
>
>    "https://github.com/scikit-learn/scikit-learn"
>    "https://git.minetest.land/Wuzzy/MineClone2"
>    "https://git.savannah.gnu.org/git/gsequencer.git"

Did the SWH folks eventually provide you info as to why they fail?

Ludo’.



* Re: Preservation of Guix 2021-10-22
  2021-10-26 15:50         ` zimoun
@ 2021-10-29 14:30           ` Ludovic Courtès
  2021-10-30 16:00             ` zimoun
  0 siblings, 1 reply; 13+ messages in thread
From: Ludovic Courtès @ 2021-10-29 14:30 UTC
  To: zimoun; +Cc: guix-devel

zimoun <zimon.toutoune@gmail.com> skribis:

> Some should be fixed soon. Other are already fixed but the fix has not
> yet landed to production.  However, some are still open; for instance
>
> 26/10/2021, 09:29:59 git https://github.com/scikit-learn/scikit-learn accepted failed
>
>
> and the log says:
>
> swh.loader.git.converters.HashMismatch: Expected Directory hash to be
> 5475d108765c5591003210e70fc01bf2a77fca55, got 59d6219647cd84a027780487f0e021e4f57e93d6

I learned that SWH has a canonicalization problem with Git manifests
that’s similar to the canonicalization problem with tar headers that
Disarchive addresses:

  https://sympa.inria.fr/sympa/arc/swh-devel/2021-10/msg00038.html

Could it be that the scikit-learn repo contains “weird” (non-canonical)
Git objects?
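
To illustrate what "non-canonical" can mean for a Git tree, here is a
hedged Python sketch.  It computes Git object IDs by hand and shows that
the same logical directory entry, written once with the usual mode and
once with a zero-padded mode, yields two different tree hashes.
Zero-padded modes are only one known kind of oddity; whether this is
what affects scikit-learn is not established here, and the helper names
are illustrative.

```python
import hashlib


def git_object_id(kind, body):
    """Return the SHA-1 object ID Git assigns to an object of this kind."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_entry(mode, name, target_hex):
    """Serialize one tree entry: mode, space, name, NUL, raw 20-byte ID."""
    return mode + b" " + name + b"\0" + bytes.fromhex(target_hex)


EMPTY_TREE = git_object_id("tree", b"")

# The same logical entry (a subdirectory "src" pointing at the empty tree),
# written with the usual mode and with a zero-padded mode.
canonical = git_object_id("tree", tree_entry(b"40000", b"src", EMPTY_TREE))
padded = git_object_id("tree", tree_entry(b"040000", b"src", EMPTY_TREE))

print(EMPTY_TREE)
print(canonical)
print(padded)
```

A loader that parses such a tree into structured fields and then
re-serializes it in canonical form would compute the first hash while
the repository records the second, which is the kind of mismatch
reported in the log above.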

Ludo’.



* Re: Preservation of Guix 2021-10-22
  2021-10-29 14:30           ` Ludovic Courtès
@ 2021-10-30 16:00             ` zimoun
  0 siblings, 0 replies; 13+ messages in thread
From: zimoun @ 2021-10-30 16:00 UTC
  To: Ludovic Courtès; +Cc: guix-devel

Hi Ludo,

On Fri, 29 Oct 2021 at 16:30, Ludovic Courtès <ludo@gnu.org> wrote:

> Could it be that the scikit-learn repo contains “weird” (non-canonical)
> Git objects?

Yes.  That is my understanding from what olasd more or less explained on
#swh-devel.

Put differently, in some cases the source is not covered not because
Guix is failing to do its job but because SWH has a bug. :-)

Cheers,
simon


