all messages for Guix-related lists mirrored at yhetil.org
* Internet Archive APIs useful as fallback?
@ 2018-12-19  5:26 swedebugia
  2018-12-19 14:57 ` Ludovic Courtès
  0 siblings, 1 reply; 8+ messages in thread
From: swedebugia @ 2018-12-19  5:26 UTC (permalink / raw)
  To: guix-devel

Hi

I stumbled over these at Clinton's blog and thought I would share them
here in case anybody is interested.

APIs for content other than the Wayback Machine:
https://blog.archive.org/2018/12/13/documentation-for-public-apis-at-the-internet-archive/

APIs for the Wayback Machine:
https://archive.org/help/wayback_api.php

Excerpt:
Wayback Availability JSON API

This simple API for Wayback is a test to see if a given URL is archived
and currently accessible in the Wayback Machine. This API is useful for
providing a 404 or other error handler which checks Wayback to see if it
has an archived copy ready to display. The API can be used as follows:
http://archive.org/wayback/available?url=example.com

which might return:

{
     "archived_snapshots": {
         "closest": {
             "available": true,
             "url": "http://web.archive.org/web/20130919044612/http://example.com/",
             "timestamp": "20130919044612",
             "status": "200"
         }
     }
}
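
As a rough sketch of how a client might consume such a response, the
Python below builds the query URL and picks out the closest snapshot.
The names availability_query and closest_snapshot are made up for
illustration; only the endpoint and the JSON field names are taken from
the excerpt above. Nothing here touches the network: fetching the
response is left to the caller.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Endpoint as quoted in the excerpt above.
AVAILABILITY_ENDPOINT = "http://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Return the availability query for `url`, percent-encoding it
    as is conventional for a value passed in a query string."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url})


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the snapshot URL from an availability response, or None
    when no available snapshot is reported."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")
```

An empty "archived_snapshots" object yields None, so a caller can treat
"nothing archived" as an ordinary miss rather than an exception.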


-- 
Cheers Swedebugia

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Internet Archive APIs useful as fallback?
  2018-12-19  5:26 Internet Archive APIs useful as fallback? swedebugia
@ 2018-12-19 14:57 ` Ludovic Courtès
  2018-12-19 17:18   ` swedebugia
                     ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Ludovic Courtès @ 2018-12-19 14:57 UTC (permalink / raw)
  To: swedebugia; +Cc: guix-devel

Hi!

swedebugia <swedebugia@riseup.net> skribis:

> I stumbled over these at Clinton's blog and thought I would share them
> here in case anybody is interested.
>
> APIs for content other than the Wayback Machine:
> https://blog.archive.org/2018/12/13/documentation-for-public-apis-at-the-internet-archive/
>
> APIs for the Wayback Machine:
> https://archive.org/help/wayback_api.php

We added support to retrieve Git checkouts (and some tarballs) from
Software Heritage recently:

  https://issues.guix.info/issue/33432

The Internet Archive is not in the business of archiving software, but
it’d be interesting to see if it archives tarballs that people put on
“random” web sites, in which case it could also be useful.

Thoughts?

Ludo’.

* Re: Internet Archive APIs useful as fallback?
  2018-12-19 14:57 ` Ludovic Courtès
@ 2018-12-19 17:18   ` swedebugia
  2018-12-20  7:47     ` Ludovic Courtès
  2018-12-19 21:28   ` Björn Höfling
  2018-12-20  3:00   ` bill-auger
  2 siblings, 1 reply; 8+ messages in thread
From: swedebugia @ 2018-12-19 17:18 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On 2018-12-19 15:57, Ludovic Courtès wrote:
> Hi!
> 
> swedebugia <swedebugia@riseup.net> skribis:
> 
>> I stumbled over these at Clinton's blog and thought I would share them
>> here in case anybody is interested.
>>
>> APIs for content other than the Wayback Machine:
>> https://blog.archive.org/2018/12/13/documentation-for-public-apis-at-the-internet-archive/
>>
>> APIs for the Wayback Machine:
>> https://archive.org/help/wayback_api.php
> 
> We added support to retrieve Git checkouts (and some tarballs) from
> Software Heritage recently:
> 
>    https://issues.guix.info/issue/33432
> 
> The Internet Archive is not in the business of archiving software, but
> it’d be interesting to see if it archives tarballs that people put on
> “random” web sites, in which case it could also be useful.
> 
> Thoughts?

Thanks for the quick reply :)

Yes, thanks for working on SWH! I have not yet succeeded in downloading
from it, but I guess that has to do with the baking and all.

---

I tested 3 of the ~1000 Quicklisp packages, and these 2 worked:
https://web.archive.org/web/20170313123155/http://beta.quicklisp.org/archive/cl-moneris/2011-04-18/cl-moneris-20110418-git.tgz
https://web.archive.org/web/20170330204246/http://beta.quicklisp.org/archive/cl-modlisp/2015-09-23/cl-modlisp-20150923-git.tgz

:D

I also tested PyPI packages, and none of the 8 I tried was available.

To my knowledge, the Wayback crawler now archives everything: tgz, pdf,
exe, you name it. It seems, though, that it does not archive these large
files on every pass.

I think the Quicklisp example above is reason enough to add it as a
last resort after trying SWH.
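
To make the "last resort" ordering concrete, here is a minimal Python
sketch of trying sources one after another. It is not how Guix
implements downloads; fetch_with_fallbacks, the fetcher callables and
the hex SHA-256 check are all assumptions made for illustration.

```python
import hashlib
from typing import Callable, Iterable, Optional, Tuple

# A fetcher takes the original URL and returns the file contents,
# or None when that source does not have the file.
Fetcher = Callable[[str], Optional[bytes]]


def fetch_with_fallbacks(
    url: str,
    expected_sha256: str,
    sources: Iterable[Tuple[str, Fetcher]],
) -> Tuple[Optional[str], Optional[bytes]]:
    """Try each (name, fetcher) pair in order and return the first
    result whose SHA-256 matches, or (None, None) if none does."""
    for name, fetch in sources:
        try:
            data = fetch(url)
        except OSError:
            # Network-level failure: treat it like a miss and move on.
            data = None
        if data is None:
            continue
        # Accept content only if the hash matches, whichever source served it.
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return name, data
    return None, None
```

With sources ordered as upstream, then SWH, then Wayback, the archive
is only consulted once everything before it has missed, and the hash
check keeps a stale or altered snapshot from being accepted.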

Thoughts?

-- 
Cheers Swedebugia

* Re: Internet Archive APIs useful as fallback?
  2018-12-19 14:57 ` Ludovic Courtès
  2018-12-19 17:18   ` swedebugia
@ 2018-12-19 21:28   ` Björn Höfling
  2018-12-20  3:00   ` bill-auger
  2 siblings, 0 replies; 8+ messages in thread
From: Björn Höfling @ 2018-12-19 21:28 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On Wed, 19 Dec 2018 15:57:04 +0100
Ludovic Courtès <ludo@gnu.org> wrote:


> We added support to retrieve Git checkouts (and some tarballs) from
> Software Heritage recently:
> 
>   https://issues.guix.info/issue/33432
> 
> The Internet Archive is not in the business of archiving software, but
> it’d be interesting to see if it archives tarballs that people put on
> “random” web sites, in which case it could also be useful.

We sometimes use it to replace expired package or home-page URIs: in
some cases, the home page or even the tarball is in the archive. But
that is a manual step, of course. A well-known recent example is the
bzip2 package, whose domain expired and was taken over.

Björn

* Re: Internet Archive APIs useful as fallback?
  2018-12-19 14:57 ` Ludovic Courtès
  2018-12-19 17:18   ` swedebugia
  2018-12-19 21:28   ` Björn Höfling
@ 2018-12-20  3:00   ` bill-auger
  2018-12-20  4:42     ` swedebugia
  2 siblings, 1 reply; 8+ messages in thread
From: bill-auger @ 2018-12-20  3:00 UTC (permalink / raw)
  To: guix-devel

On Wed, 19 Dec 2018 15:57:04 +0100 Ludovic Courtès wrote:
> The Internet Archive is not in the business of archiving software, but
> it’d be interesting to see if it archives tarballs that people put on
> “random” web sites

FWIW, The Internet Archive is not *in the business* of *anything* - it
is a charity - but more importantly, this post is referring to the
"Wayback Machine"; which is not identical to "The Internet Archive", but
is a very specialized subset of the archive.org service - the Wayback
Machine differs from the main Internet Archive both in function and
scope

namely, the Wayback Machine only caches web pages, it does so in a
semi-automated fashion, carries no metadata, and does not download any
external assets (such as images and tarballs or any external HTML in
frames) unless they are defined explicitly in the HTML of the base web
page (such as data-uri images and other blobs)

the larger Internet Archive is generally useful for anything that is
naturally suitable for archival and it has a specific section/tags for
software; but that is entirely a manual process done by a registered
user; and all items are associated with that registered account

* Re: Internet Archive APIs useful as fallback?
  2018-12-20  3:00   ` bill-auger
@ 2018-12-20  4:42     ` swedebugia
  0 siblings, 0 replies; 8+ messages in thread
From: swedebugia @ 2018-12-20  4:42 UTC (permalink / raw)
  To: bill-auger, guix-devel

On 2018-12-20 04:00, bill-auger wrote:

snip

> 
> namely, the Wayback Machine only caches web pages, it does so in a
> semi-automated fashion, carries no metadata, and does not download any
> external assets (such as images and tarballs or any external HTML in
> frames) unless they are defined explicitly in the HTML of the base web
> page (such as data-uri images and other blobs)

This is no longer the case. I found that most of the time it archives
blobs too.

As an aside, if we need it (and we do/did in the case of bzip2) we could 
consider helping them raise funds (they have their yearly fundraiser 
going now)

-- 
Cheers Swedebugia

* Re: Internet Archive APIs useful as fallback?
  2018-12-19 17:18   ` swedebugia
@ 2018-12-20  7:47     ` Ludovic Courtès
  2018-12-20 12:55       ` swedebugia
  0 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2018-12-20  7:47 UTC (permalink / raw)
  To: swedebugia; +Cc: guix-devel

Hi!

swedebugia <swedebugia@riseup.net> skribis:

> I tested 3 of the quicklisp (there are ~1000) packages and these 2 worked:
> https://web.archive.org/web/20170313123155/http://beta.quicklisp.org/archive/cl-moneris/2011-04-18/cl-moneris-20110418-git.tgz
> https://web.archive.org/web/20170330204246/http://beta.quicklisp.org/archive/cl-modlisp/2015-09-23/cl-modlisp-20150923-git.tgz
>
> :D
>
> I tested the pypi packages and none out of 8 I tried was available.

Interesting.  I suppose one would have to use the API to find out the
URLs above?

Thanks,
Ludo’.

* Re: Internet Archive APIs useful as fallback?
  2018-12-20  7:47     ` Ludovic Courtès
@ 2018-12-20 12:55       ` swedebugia
  0 siblings, 0 replies; 8+ messages in thread
From: swedebugia @ 2018-12-20 12:55 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On 2018-12-20 08:47, Ludovic Courtès wrote:
> Hi!
> 
> swedebugia <swedebugia@riseup.net> skribis:
> 
>> I tested 3 of the quicklisp (there are ~1000) packages and these 2 worked:
>> https://web.archive.org/web/20170313123155/http://beta.quicklisp.org/archive/cl-moneris/2011-04-18/cl-moneris-20110418-git.tgz
>> https://web.archive.org/web/20170330204246/http://beta.quicklisp.org/archive/cl-modlisp/2015-09-23/cl-modlisp-20150923-git.tgz
>>
>> :D
>>
>> I tested the pypi packages and none out of 8 I tried was available.
> 
> Interesting.  I suppose one would have to use the API to find out the
> URLs above?

I tried 
https://archive.org/wayback/available?url=http://beta.quicklisp.org/archive/cl-moneris/2011-04-18/cl-moneris-20110418-git.tgz
to get the url of the latest snapshot but it returned:

{"url": "http://beta.quicklisp.org/archive/cl-moneris/2011-04-18/cl-moneris-20110418-git.tgz", "archived_snapshots": {}}

which I consider an error. :(

The same goes for cl-modlisp above, so 2 out of 2 queries failed to give
us the snapshot URL.

So the API currently seems broken, which leaves us with the web interface.

I will report upstream...
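
For reference, the two working links quoted in this thread share one
shape: https://web.archive.org/web/<14-digit timestamp>/<original URL>.
A small Python sketch of building and splitting such URLs follows. The
helper names are hypothetical, and the code only manipulates strings;
it does not check that a snapshot actually exists.

```python
import re
from typing import Optional, Tuple

WAYBACK_PREFIX = "https://web.archive.org/web/"

# Matches the snapshot links seen in this thread (http or https).
_SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def snapshot_url(timestamp: str, original_url: str) -> str:
    """Build a snapshot URL from a YYYYMMDDhhmmss timestamp and a URL."""
    if not re.fullmatch(r"\d{14}", timestamp):
        raise ValueError("timestamp must be 14 digits (YYYYMMDDhhmmss)")
    return f"{WAYBACK_PREFIX}{timestamp}/{original_url}"


def split_snapshot_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (timestamp, original URL) for a snapshot URL, else None."""
    match = _SNAPSHOT_RE.match(url)
    if match is None:
        return None
    return match.group(1), match.group(2)
```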

-- 
Cheers Swedebugia

Thread overview: 8+ messages
2018-12-19  5:26 Internet Archive APIs useful as fallback? swedebugia
2018-12-19 14:57 ` Ludovic Courtès
2018-12-19 17:18   ` swedebugia
2018-12-20  7:47     ` Ludovic Courtès
2018-12-20 12:55       ` swedebugia
2018-12-19 21:28   ` Björn Höfling
2018-12-20  3:00   ` bill-auger
2018-12-20  4:42     ` swedebugia