unofficial mirror of guile-user@gnu.org 
From: Catonano <catonano@gmail.com>
To: guile-user@gnu.org
Subject: salutations and web scraping
Date: Fri, 30 Dec 2011 23:58:47 +0100
Message-ID: <CAJ98PDwaBDpcAP_L2nYzNjkNwF59fSrjt8gvcxK08TkcdTJbzg@mail.gmail.com>

Hello people,

Happy New Year.

I'm a beginner: I have never written a single line of Lisp or Scheme in my
life, and I'm here to ask for directions and suggestions.

I'm mulling over a pet project. I would like to scrape the web site of a
community radio station and grab the Flash-streamed content they publish.
The material is published under a Creative Commons license, so what I'm
planning is not illegal.

The reason they chose such an obtuse solution is that they are obtuse.
They started the station in the 70s and now they don't get this new
digital thing.

I read the web documentation. The client chapter suggests adopting an
architecture similar to that of the server for parallel scrapers, and
closes by floating the idea of threads and futures.

I don't see how I could use threads or futures (I'm not even sure what they
are), so I'll be bold and ask whether someone could write an example code
skeleton for me.

Also, I was thinking of writing a scraper in Guile Scheme that would parse
the HTML source for the relevant bits and then delegate the Flash stuff to
a Unix command such as wget, curl or something similar. Is this reasonable?
Is there any architectural glitch I'm missing here?

Don't worry, people: I know that the server setup and its internet
connection are not very strong, and I don't want to be hostile to the
server, so I guess at most 2 parallel connections are going to run.

Or, I was dreaming, I could try to integrate the thing with the GNOME
environment and make it available from the GNOME Shell JavaScript, so that
people in the community could use it to grab the footage themselves. I
don't know.

Thanks so much for ANY hint
Cato


