From: Catonano
Newsgroups: gmane.lisp.guile.user
To: guile-user@gnu.org
Subject: web scraping update and a new issue
Date: Sun, 5 Feb 2012 15:17:19 +0100

People,

should anyone be interested, here are some updates about my web scraping initiative.

First things first: I owe some apologies. I reported some issues privately to Ian, but I was blabbering. I was just confused. Those issues emerge with some sites only, and they do NOT emerge with my radio station site. I applied Ian's patch for chunked responses to my branch and it works like a charm.

Yesterday I had more fun than I'd had in a long time.

Now I'm running into a new issue: the web server puts a cookie in the response, and I suspect it expects to find that cookie in my subsequent request. I also suspect that it doesn't find it, so the subsequent response contains some incorrect results.

Not so when I test the same sequence with Firefox.

Is there anything about cookies I should be aware of?

Thanks for any hint