unofficial mirror of guix-devel@gnu.org 
* (Re-) Designing extracting-downloader
@ 2022-02-23  8:57 Hartmut Goebel
  2022-02-23 10:52 ` pukkamustard
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Hartmut Goebel @ 2022-02-23  8:57 UTC (permalink / raw)
  To: guix-devel

Hi,

TL;DR: What do you think about the idea of an „extracting downloader“?

I'm about to pick up work on the „extracting downloader“ and the rebar build 
system (for Erlang); see <https://issues.guix.gnu.org/51061> for a first 
try. In the aforementioned issue some points came up regarding the basic 
design of the patch. Thus, before starting to write code, I'd like to 
agree on a basic design.

The basic idea behind the „extracting downloader“ is as follows: Packages 
provided by hex.pm (the distribution repository for Erlang and Elixir 
packages) are tar archives containing some metadata files and the 
actual source (contents.tar.gz); see the example below. So the idea was to 
store only contents.tar.gz (instead of requiring an additional 
unpacking step).

In some earlier discussion someone mentioned that this could be interesting 
for Ruby gems, too.

Storing only the inner archive would allow using that archive's hash as the 
"source" hash and allow for easy validation of the hash. However, much of 
the complexity of the current implementation (see issue 51061) is caused 
by this idea, since the code needs to postpone hashing until after the 
download.

Also, in some earlier discussion Ludo (AFAIR) raised the question of 
whether e.g. SWH would be able to provide a source package if it is hashed 
this way.

What do you think about the idea of an „extracting downloader“?


Example for a package from hex.pm:

$ wget https://repo.hex.pm/tarballs/getopt-1.0.2.tar
…
$ tar tvf getopt-1.0.2.tar
-rw-r--r-- 0/0               1 2000-01-01 01:00 VERSION
-rw-r--r-- 0/0              64 2000-01-01 01:00 CHECKSUM
-rw-r--r-- 0/0             451 2000-01-01 01:00 metadata.config
-rw-r--r-- 0/0           14513 2000-01-01 01:00 contents.tar.gz
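
The two candidate hashes discussed above can be compared with a minimal
sketch like the following. It assumes GNU tar and coreutils' sha256sum;
the function names outer_hash and inner_hash are made up for illustration
and are not part of the patch in issue 51061. The outer hash is available
as soon as the download finishes, whereas the inner one requires reading
into the archive first:

```shell
# Hash of the wrapper tarball exactly as served by hex.pm.
outer_hash() {
    sha256sum "$1" | cut -d' ' -f1
}

# Hash of the inner contents.tar.gz, streamed out of the wrapper
# without unpacking the other members.
inner_hash() {
    tar -xOf "$1" contents.tar.gz | sha256sum | cut -d' ' -f1
}
```

Calling both functions on getopt-1.0.2.tar would print two different
digests; which of them ends up as the "source" hash is exactly the design
question.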


-- 
Regards
Hartmut Goebel

| Hartmut Goebel          | h.goebel@crazy-compilers.com               |
| www.crazy-compilers.com | compilers which you thought are impossible |




* Re: (Re-) Designing extracting-downloader
  2022-02-23  8:57 (Re-) Designing extracting-downloader Hartmut Goebel
@ 2022-02-23 10:52 ` pukkamustard
  2022-02-24  8:50   ` Hartmut Goebel
  2022-02-23 12:30 ` Maxime Devos
       [not found] ` <beb0e29f-6066-d1b5-b560-22a3d0a98ad8@goebel-consult.de>
  2 siblings, 1 reply; 8+ messages in thread
From: pukkamustard @ 2022-02-23 10:52 UTC (permalink / raw)
  To: Hartmut Goebel; +Cc: guix-devel


Hi Hartmut,

Hartmut Goebel <h.goebel@crazy-compilers.com> writes:

> I'm about to pick up work on the „extracting downloader“ and the rebar build
> system (for Erlang),

I'm very much looking forward to this!

> The basic idea behind the „extracting downloader“ is as follows: Packages
> provided by hex.pm (the distribution repository for Erlang and Elixir
> packages) are tar archives containing some metadata files and the
> actual source (contents.tar.gz); see the example below. So the idea was
> to store only contents.tar.gz (instead of requiring an additional
> unpacking step).

Why use the source from hex.pm at all? Would it be possible to just
fetch the hex.pm archive when importing a package, read the
metadata.config file and then try and use upstream source (e.g. GitHub)?

The hex.pm metadata.config file does not seem to exactly specify the
upstream source. We would need some heuristics to figure this out. But
maybe we could find a heuristic that works well enough? This would solve
the double-archive problem.

For packages where the heuristic fails, we could fall back to the source
as provided by hex.pm (unextracted) and use an additional build phase
to do the double extraction? If this only affects a few packages then
storing the source double-archived does not seem so bad.
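
One possible shape for such a heuristic, as a rough sketch only: it
assumes GNU tar and grep, the function name guess_upstream is
hypothetical, and it makes no assumption about the structure of
metadata.config beyond grepping it for anything that looks like a GitHub
repository URL.

```shell
# Print the first GitHub repository URL mentioned anywhere in
# metadata.config, or nothing if there is none.
guess_upstream() {
    tar -xOf "$1" metadata.config \
        | grep -Eo 'https://github\.com/[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+' \
        | head -n 1
}
```

An importer could use the printed URL when there is one and fall back to
the hex.pm tarball when the output is empty.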

Thanks,
pukkamustard



* Re: (Re-) Designing extracting-downloader
  2022-02-23  8:57 (Re-) Designing extracting-downloader Hartmut Goebel
  2022-02-23 10:52 ` pukkamustard
@ 2022-02-23 12:30 ` Maxime Devos
  2022-02-23 12:35   ` Maxime Devos
       [not found] ` <beb0e29f-6066-d1b5-b560-22a3d0a98ad8@goebel-consult.de>
  2 siblings, 1 reply; 8+ messages in thread
From: Maxime Devos @ 2022-02-23 12:30 UTC (permalink / raw)
  To: Hartmut Goebel, guix-devel


Hartmut Goebel wrote on Wed 23-02-2022 at 09:57 [+0100]:
> TL;DR: What do you think about the idea of an „extracting downloader“?
> 
> I'm about to pick up work on the „extracting downloader“ and the rebar build 
> system (for Erlang); see <https://issues.guix.gnu.org/51061> for a first 
> try. In the aforementioned issue some points came up regarding the basic 
> design of the patch. Thus, before starting to write code, I'd like to 
> agree on a basic design.

Could the ‘extracting’ downloader be built on top of the regular
downloader?  More concretely:

(package
  (name "some-package-from-hex")
  (source
    (extract-from-hex
      (origin
        (method url-fetch)
        (uri "http://some-url-pointing-to-a-tarball-wrapped-in-a-tarball")
        (sha256 (base32 <hash of the wrapper tarball>)))))
  (build-system ...))

Here, 'extract-from-hex' would turn a file-like object into an
<extract-from-hex> object, which lowers to a derivation that extracts the
inner tarball from the wrapper tarball.  (guix upstream) might need to be
modified to support <extract-from-hex>.
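
Roughly, the work behind such a derivation can be sketched in shell. This
is only an illustration under stated assumptions: GNU tar and sha256sum
are available, extract_from_hex is a hypothetical shell function rather
than existing Guix code, and the hash is hexadecimal rather than Guix's
base32. The wrapper is checked against the declared hash first, as for
any regular origin, and only then is the inner tarball extracted:

```shell
# Usage: extract_from_hex WRAPPER EXPECTED_SHA256 OUTPUT
extract_from_hex() {
    wrapper=$1 expected=$2 output=$3

    # Step 1: verify the wrapper tarball against the declared hash.
    actual=$(sha256sum "$wrapper" | cut -d' ' -f1)
    if [ "$actual" != "$expected" ]; then
        echo "hash mismatch for $wrapper" >&2
        return 1
    fi

    # Step 2: extract only the inner source tarball.
    tar -xOf "$wrapper" contents.tar.gz > "$output"
}
```

Because the declared hash is that of the wrapper, nothing about hashing
has to be postponed until after extraction.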

Also, is there some fundamental reason that hex.pm wraps tars inside
tars and only provides the wrapped tars, or could hex.pm be convinced
to also serve the underlying tars directly?

A benefit of delegating the actual downloading to url-fetch, is that
(guix scripts perform-download) would be used, so connections could be
cached.

Greetings,
Maxime.


* Re: (Re-) Designing extracting-downloader
  2022-02-23 12:30 ` Maxime Devos
@ 2022-02-23 12:35   ` Maxime Devos
  2022-02-24  8:56     ` Hartmut Goebel
  0 siblings, 1 reply; 8+ messages in thread
From: Maxime Devos @ 2022-02-23 12:35 UTC (permalink / raw)
  To: Hartmut Goebel, guix-devel


Maxime Devos wrote on Wed 23-02-2022 at 13:30 [+0100]:
> A benefit of delegating the actual downloading to url-fetch, is that
> (guix scripts perform-download) would be used, so connections could
> be cached.

Never mind, this benefit is probably undone by the extra unpacking.
I still recommend delegating the downloading to url-fetch though, so
that if, say, a bug in (guix swh) has been fixed or (guix swh) has been
improved in other ways, time travellers to the past can still
benefit from the improved (guix swh).

Greetings,
Maxime.


* Re: (Re-) Designing extracting-downloader
  2022-02-23 10:52 ` pukkamustard
@ 2022-02-24  8:50   ` Hartmut Goebel
  0 siblings, 0 replies; 8+ messages in thread
From: Hartmut Goebel @ 2022-02-24  8:50 UTC (permalink / raw)
  To: pukkamustard; +Cc: guix-devel

On 23.02.22 at 11:52, pukkamustard wrote:
> Why use the source from hex.pm at all?

While issue 51061 is about the hex.pm importer and the rebar build 
system, this thread is only about the extracting downloader :-)

> The hex.pm metadata.config file does not seem to exactly specify the
> upstream source. We would need some heuristics to figure this out. But
> maybe we could find a heuristic that works well enough? This would solve
> the double-archive problem.

From my point of view, hex.pm is one important, valid distribution point 
for Erlang and Elixir packages, like PyPI is for Python and CPAN is for 
Perl. So we should support defining it as a package source, which can also 
be used for checking for updates much more easily than any git repository 
or git-based forge.

Some of the packages I've investigated so far are easier to build from 
hex.pm than from github. E.g. some github repos contain a „rebar“ binary 
(which needs to be deleted by a snippet when defining the source), while 
the corresponding hex.pm package can be used as-is.

Regarding heuristics: Since builds should be reproducible, a source 
definition must not use any heuristics. Anyhow, this might be useful for 
the hex.pm importer.

-- 
Regards
Hartmut Goebel

| Hartmut Goebel          | h.goebel@crazy-compilers.com               |
| www.crazy-compilers.com | compilers which you thought are impossible |




* Re: (Re-) Designing extracting-downloader
  2022-02-23 12:35   ` Maxime Devos
@ 2022-02-24  8:56     ` Hartmut Goebel
  0 siblings, 0 replies; 8+ messages in thread
From: Hartmut Goebel @ 2022-02-24  8:56 UTC (permalink / raw)
  To: Maxime Devos, guix-devel

On 23.02.22 at 13:35, Maxime Devos wrote:
> Never mind, this benefit is probably undone by the extra unpacking.

Probably.

Anyway, this is worth thinking about, as it would make the additional 
unpacking part of the source, and thus decouple unpacking from the 
build system (which was part of the idea behind the proposal).

After considering this for some time, I actually like your idea: it is 
explicit (which is better than implicit), flexible and simple (no 
extracting downloader required at all). And it also does not lead to any 
problems with content-addressed downloads like SWH. The only downside I 
can see at the moment is that it stores both the outer and the inner 
archive.

Let's see what others think about it.

-- 
Regards
Hartmut Goebel

| Hartmut Goebel          | h.goebel@crazy-compilers.com               |
| www.crazy-compilers.com | compilers which you thought are impossible |




* Designing importers (was: (Re-) Designing extracting-downloader)
       [not found]   ` <87h77ly1ja.fsf@gmail.com>
@ 2022-04-06 16:44     ` Hartmut Goebel
  2022-04-10 20:33       ` Designing importers Ludovic Courtès
  0 siblings, 1 reply; 8+ messages in thread
From: Hartmut Goebel @ 2022-04-06 16:44 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: guix-devel

On 26.03.22 at 01:56, Maxim Cournoyer wrote:

[Answering the question of how to design the extracting downloader I 
originally thought of using for hex.pm packages:]

> Is there a strong reason to want to use the archive instead of the
> sources from the project repository?

For the same reason you prefer to import from a PyPI package instead of 
the project git-repo: The metadata is easily available.

Anyhow, using the git repo could be a pro, since the hex.pm package 
might miss tests or test data. OTOH I discovered that some Erlang 
projects have the build-tool binary („rebar3“) committed in the 
git repo. So when using the git repo, this needs to be removed by a 
snippet (which would not be required when using the hex.pm archive).

So this is a more general discussion: Would it be better, also with 
regard to detecting new versions, to use the project's source repo or 
the package manager's repo?

Given the recent discussion about how to make packaging easier, maybe 
the hex.pm importer (and others) should become much more capable: e.g. 
the importer could fetch the metadata from hex.pm and then create a 
package definition pointing to GitHub (falling back to hex.pm). Then, 
to make life easy for packagers, it could check the repo for „rebar3“ 
and, if it is present, create a snippet for removing it.
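
The check itself could be as small as the following sketch, run on a
checkout of the repository. The function name find_bundled_rebar is
hypothetical, and matching on the file names „rebar“ and „rebar3“ is an
assumption about how such binaries are usually committed:

```shell
# Print the path of every file named rebar or rebar3 below DIR,
# relative to DIR, one per line.
find_bundled_rebar() {
    (cd "$1" && find . -type f \( -name rebar -o -name rebar3 \) \
        | sed 's|^\./||' | sort)
}
```

Each printed path would be a candidate for a delete-file form in the 
generated origin's snippet; empty output means no snippet is needed.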

-- 
Regards
Hartmut Goebel

| Hartmut Goebel          | h.goebel@crazy-compilers.com               |
| www.crazy-compilers.com | compilers which you thought are impossible |




* Re: Designing importers
  2022-04-06 16:44     ` Designing importers (was: (Re-) Designing extracting-downloader) Hartmut Goebel
@ 2022-04-10 20:33       ` Ludovic Courtès
  0 siblings, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2022-04-10 20:33 UTC (permalink / raw)
  To: Hartmut Goebel; +Cc: guix-devel, Maxim Cournoyer

Hi Hartmut,

Hartmut Goebel <h.goebel@crazy-compilers.com> writes:

> So this is a more general discussion: Would it be better, also with
> regard to detecting new versions, to use the project's source repo or 
> the package manager's repo?

I guess it depends on how the repository is managed.  It’s not uncommon
for PyPI and Rubygems to contain archives whose content differs from
what’s available upstream, for instance lacking tests, sometimes worse¹.

> Given the recent discussion about how to make packaging easier, maybe
> the hex.pm importer (and others) should become much more capable:
> e.g. the importer could fetch the metadata from hex.pm and then
> create a package definition pointing to GitHub (falling back to
> hex.pm). Then, to make life easy for packagers, it could check the
> repo for „rebar3“ and, if it is present, create a snippet for removing it.

If an importer can determine what the upstream repository is, then yes,
I guess it would be good to use that repo.

The PyPI importer sometimes has that information but it currently
doesn’t use it.  A good exercise would be to try and have it fetch code
from Git instead of pypi.org.

Thanks,
Ludo’.

¹ See for example the ‘LastPyMile’ paper:
  https://securitylab.disi.unitn.it/lib/exe/fetch.php?media=research_activities:experiments:esecfse2021.pdf



