* Internet Archive APIs useful as fallback?
@ 2018-12-19 5:26 swedebugia
2018-12-19 14:57 ` Ludovic Courtès
0 siblings, 1 reply; 8+ messages in thread
From: swedebugia @ 2018-12-19 5:26 UTC (permalink / raw)
To: guix-devel
Hi
I stumbled upon these on Clinton's blog and thought I would share them
here in case anybody is interested.
APIs for content other than the Wayback Machine:
https://blog.archive.org/2018/12/13/documentation-for-public-apis-at-the-internet-archive/
APIs for the Wayback Machine:
https://archive.org/help/wayback_api.php
Excerpt:
Wayback Availability JSON API
This simple API for Wayback is a test to see if a given url is archived
and currently accessible in the Wayback Machine. This API is useful for
providing a 404 or other error handler which checks Wayback to see if it
has an archived copy ready to display. The API can be used as follows:
http://archive.org/wayback/available?url=example.com
which might return:
{
  "archived_snapshots": {
    "closest": {
      "available": true,
      "url": "http://web.archive.org/web/20130919044612/http://example.com/",
      "timestamp": "20130919044612",
      "status": "200"
    }
  }
}
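For illustration, here is a minimal Python sketch of how a client might call this API and pick out the snapshot URL. The helper names (availability_query, closest_snapshot, lookup) are made up for this example, the endpoint is the one quoted above (written with https here), and the response shape is assumed to match the sample. It is not code from Guix or from the Internet Archive.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the excerpt above; https is an assumption of this sketch.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url):
    """Build the availability query URL for the given original URL."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": url})


def closest_snapshot(response):
    """Return the closest snapshot URL from a decoded response, or None.

    An empty "archived_snapshots" object is treated as "nothing archived".
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available") and closest.get("status") == "200":
        return closest["url"]
    return None


def lookup(url, timeout=30):
    """Query the API over the network and return a snapshot URL or None."""
    with urllib.request.urlopen(availability_query(url), timeout=timeout) as reply:
        return closest_snapshot(json.load(reply))
```

Keeping the parsing in closest_snapshot separate from the network call in lookup means the parsing can be checked against the sample response without contacting archive.org.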
--
Cheers Swedebugia
* Re: Internet Archive APIs useful as fallback?
2018-12-19 5:26 Internet Archive APIs useful as fallback? swedebugia
@ 2018-12-19 14:57 ` Ludovic Courtès
2018-12-19 17:18 ` swedebugia
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Ludovic Courtès @ 2018-12-19 14:57 UTC (permalink / raw)
To: swedebugia; +Cc: guix-devel
Hi!
swedebugia <swedebugia@riseup.net> skribis:
> I stumbled upon these on Clinton's blog and thought I would share them
> here in case anybody is interested.
>
> APIs for content other than the Wayback Machine:
> https://blog.archive.org/2018/12/13/documentation-for-public-apis-at-the-internet-archive/
>
> APIs for the Wayback Machine:
> https://archive.org/help/wayback_api.php
We added support to retrieve Git checkouts (and some tarballs) from
Software Heritage recently:
https://issues.guix.info/issue/33432
The Internet Archive is not in the business of archiving software, but
it’d be interesting to see if it archives tarballs that people put on
“random” web sites, in which case it could also be useful.
Thoughts?
Ludo’.
* Re: Internet Archive APIs useful as fallback?
2018-12-19 14:57 ` Ludovic Courtès
@ 2018-12-19 17:18 ` swedebugia
2018-12-20 7:47 ` Ludovic Courtès
2018-12-19 21:28 ` Björn Höfling
2018-12-20 3:00 ` bill-auger
2 siblings, 1 reply; 8+ messages in thread
From: swedebugia @ 2018-12-19 17:18 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guix-devel
On 2018-12-19 15:57, Ludovic Courtès wrote:
> Hi!
>
> swedebugia <swedebugia@riseup.net> skribis:
>
>> I stumbled upon these on Clinton's blog and thought I would share them
>> here in case anybody is interested.
>>
>> APIs for content other than the Wayback Machine:
>> https://blog.archive.org/2018/12/13/documentation-for-public-apis-at-the-internet-archive/
>>
>> APIs for the Wayback Machine:
>> https://archive.org/help/wayback_api.php
>
> We added support to retrieve Git checkouts (and some tarballs) from
> Software Heritage recently:
>
> https://issues.guix.info/issue/33432
>
> The Internet Archive is not in the business of archiving software, but
> it’d be interesting to see if it archives tarballs that people put on
> “random” web sites, in which case it could also be useful.
>
> Thoughts?
Thanks for the quick reply :)
Yes, thanks for working on SWH! I have not yet succeeded in downloading
from it, but I guess it has to do with the baking and all.
---
I tested 3 of the Quicklisp packages (there are ~1000) and these 2 worked:
https://web.archive.org/web/20170313123155/http://beta.quicklisp.org/archive/cl-moneris/2011-04-18/cl-moneris-20110418-git.tgz
https://web.archive.org/web/20170330204246/http://beta.quicklisp.org/archive/cl-modlisp/2015-09-23/cl-modlisp-20150923-git.tgz
:D
I also tested PyPI packages, and none of the 8 I tried was available.
To my knowledge the Wayback crawler now archives everything: tgz, pdf,
exe, you name it. It seems, though, that it does not archive these large
files on every pass.
I think the Quicklisp example above is reason enough to add it as a
last resort after trying SWH.
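A rough Python sketch of what "last resort after trying SWH" could look like as an ordered fallback chain. Everything named here is hypothetical: fetch_with_fallbacks and the fetcher callables are invented for illustration and do not correspond to Guix's actual download code. Each fetcher is assumed to return the file contents, return None when the source does not have the file, or raise OSError on network trouble.

```python
def fetch_with_fallbacks(url, fetchers):
    """Try each (name, fetch) pair in order.

    Returns (name, data) for the first source that yields data.
    Raises LookupError if every source fails or has nothing.
    """
    failures = []
    for name, fetch in fetchers:
        try:
            data = fetch(url)
        except OSError as err:
            # Network-level failure: remember it and move on to the next source.
            failures.append((name, str(err)))
            continue
        if data is not None:
            return name, data
        failures.append((name, "not found"))
    raise LookupError(f"no source could provide {url}: {failures}")
```

A caller would pass the sources in order of preference, for example [("upstream", fetch_upstream), ("swh", fetch_swh), ("wayback", fetch_wayback)], where those three functions are placeholders to be written separately.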
Thoughts?
--
Cheers Swedebugia
* Re: Internet Archive APIs useful as fallback?
2018-12-19 14:57 ` Ludovic Courtès
2018-12-19 17:18 ` swedebugia
@ 2018-12-19 21:28 ` Björn Höfling
2018-12-20 3:00 ` bill-auger
2 siblings, 0 replies; 8+ messages in thread
From: Björn Höfling @ 2018-12-19 21:28 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guix-devel
On Wed, 19 Dec 2018 15:57:04 +0100
Ludovic Courtès <ludo@gnu.org> wrote:
> We added support to retrieve Git checkouts (and some tarballs) from
> Software Heritage recently:
>
> https://issues.guix.info/issue/33432
>
> The Internet Archive is not in the business of archiving software, but
> it’d be interesting to see if it archives tarballs that people put on
> “random” web sites, in which case it could also be useful.
We sometimes use it to replace expired package or home-page URIs: in
some cases, the home page or even the tar file is in the archive. But
that is a manual step, of course. A well-known recent example is the
package bzip2, whose domain expired and was taken over.
Björn
* Re: Internet Archive APIs useful as fallback?
2018-12-19 14:57 ` Ludovic Courtès
2018-12-19 17:18 ` swedebugia
2018-12-19 21:28 ` Björn Höfling
@ 2018-12-20 3:00 ` bill-auger
2018-12-20 4:42 ` swedebugia
2 siblings, 1 reply; 8+ messages in thread
From: bill-auger @ 2018-12-20 3:00 UTC (permalink / raw)
To: guix-devel
On Wed, 19 Dec 2018 15:57:04 +0100 Ludovic Courtès wrote:
> The Internet Archive is not in the business of archiving software, but
> it’d be interesting to see if it archives tarballs that people put on
> “random” web sites
FWIW, the Internet Archive is not *in the business* of *anything*: it
is a charity. More importantly, this post refers to the "Wayback
Machine", which is not identical to "The Internet Archive" but is a
very specialized subset of the archive.org service. The Wayback Machine
differs from the main Internet Archive in both function and scope.
Namely, the Wayback Machine only caches web pages; it does so in a
semi-automated fashion, carries no metadata, and does not download any
external assets (such as images and tarballs or any external HTML in
frames) unless they are defined explicitly in the HTML of the base web
page (such as data-URI images and other blobs).
The larger Internet Archive is generally useful for anything that is
naturally suitable for archival, and it has a specific section/tags for
software, but that is entirely a manual process done by a registered
user, and all items are associated with that registered account.
* Re: Internet Archive APIs useful as fallback?
2018-12-20 3:00 ` bill-auger
@ 2018-12-20 4:42 ` swedebugia
0 siblings, 0 replies; 8+ messages in thread
From: swedebugia @ 2018-12-20 4:42 UTC (permalink / raw)
To: bill-auger, guix-devel
On 2018-12-20 04:00, bill-auger wrote:
snip
>
> namely, the Wayback Machine only caches web pages, it does so in a
> semi-automated fashion, carries no metadata, and does not download any
> external assets (such as images and tarballs or any external HTML in
> frames) unless they are defined explicitly in the HTML of the base web
> page (such as data-uri images and other blobs)
This is no longer the case. I found that most of the time it archives
blobs too.
As an aside, if we need it (and we do/did in the case of bzip2), we could
consider helping them raise funds (their yearly fundraiser is going on
now).
--
Cheers Swedebugia
* Re: Internet Archive APIs useful as fallback?
2018-12-19 17:18 ` swedebugia
@ 2018-12-20 7:47 ` Ludovic Courtès
2018-12-20 12:55 ` swedebugia
0 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2018-12-20 7:47 UTC (permalink / raw)
To: swedebugia; +Cc: guix-devel
Hi!
swedebugia <swedebugia@riseup.net> skribis:
> I tested 3 of the quicklisp (there are ~1000) packages and these 2 worked:
> https://web.archive.org/web/20170313123155/http://beta.quicklisp.org/archive/cl-moneris/2011-04-18/cl-moneris-20110418-git.tgz
> https://web.archive.org/web/20170330204246/http://beta.quicklisp.org/archive/cl-modlisp/2015-09-23/cl-modlisp-20150923-git.tgz
>
> :D
>
> I tested the pypi packages and none out of 8 I tried was available.
Interesting. I suppose one would have to use the API to find out the
URLs above?
Thanks,
Ludo’.
* Re: Internet Archive APIs useful as fallback?
2018-12-20 7:47 ` Ludovic Courtès
@ 2018-12-20 12:55 ` swedebugia
0 siblings, 0 replies; 8+ messages in thread
From: swedebugia @ 2018-12-20 12:55 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guix-devel
On 2018-12-20 08:47, Ludovic Courtès wrote:
> Hi!
>
> swedebugia <swedebugia@riseup.net> skribis:
>
>> I tested 3 of the quicklisp (there are ~1000) packages and these 2 worked:
>> https://web.archive.org/web/20170313123155/http://beta.quicklisp.org/archive/cl-moneris/2011-04-18/cl-moneris-20110418-git.tgz
>> https://web.archive.org/web/20170330204246/http://beta.quicklisp.org/archive/cl-modlisp/2015-09-23/cl-modlisp-20150923-git.tgz
>>
>> :D
>>
>> I tested the pypi packages and none out of 8 I tried was available.
>
> Interesting. I suppose one would have to use the API to find out the
> URLs above?
I tried
https://archive.org/wayback/available?url=http://beta.quicklisp.org/archive/cl-moneris/2011-04-18/cl-moneris-20110418-git.tgz
to get the url of the latest snapshot but it returned:
{
  "url": "http://beta.quicklisp.org/archive/cl-moneris/2011-04-18/cl-moneris-20110418-git.tgz",
  "archived_snapshots": {}
}
which I consider an error. :(
The same goes for the cl-modlisp above. So 2/2 failed to give us the
snapshot url.
So the API currently seems broken, which leaves us with the web interface.
I will report upstream...
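Both working links earlier in this thread share the shape https://web.archive.org/web/<14-digit timestamp>/<original URL>, so a snapshot URL can be assembled or taken apart without the availability API once a timestamp is known. The Python sketch below only encodes that observed pattern. The function names are invented for illustration, and how the Wayback Machine treats a timestamp with no exact snapshot is not verified here.

```python
import re

# Prefix as seen in the two working snapshot links in this thread.
WAYBACK_PREFIX = "https://web.archive.org/web"

_SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/([0-9]{14})/(.+)$")


def snapshot_url(original_url, timestamp):
    """Assemble a snapshot URL from an original URL and a YYYYMMDDhhmmss timestamp."""
    if not re.fullmatch(r"[0-9]{14}", timestamp):
        raise ValueError(f"expected a 14-digit timestamp, got {timestamp!r}")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


def split_snapshot_url(snapshot):
    """Return (timestamp, original_url) for a snapshot URL, or None if it does not match."""
    match = _SNAPSHOT_RE.match(snapshot)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

This does not discover timestamps; it only helps once one has been found by other means, such as through the web interface.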
--
Cheers Swedebugia