From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Teemu Likonen <tlikonen@iki.fi>
Newsgroups: gmane.emacs.help
Subject: Re: How to get title of web page by url?
Date: Wed, 28 Jul 2010 17:53:17 +0300
Message-ID: <87mxtbmzdu.fsf@mithlond.arda>
References: <AANLkTim0HqKDdYvFqBT3Giy+8n44cxnWtg+w92eA3muu@mail.gmail.com>
	<87vd802nx4.fsf@zemblan.newkuwait.org>
	<AANLkTi=D_M1Fw0Lnp6vRNhAaniwQya-kLBMs-k3+40vK@mail.gmail.com>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: Quoted-Printable
X-Trace: dough.gmane.org 1280328890 7058 80.91.229.12 (28 Jul 2010 14:54:50 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Wed, 28 Jul 2010 14:54:50 +0000 (UTC)
Cc: help-gnu-emacs@gnu.org, Thamer Mahmoud <thamer.mahmoud@gmail.com>
To: Deniz Dogan <deniz.a.m.dogan@gmail.com>
Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Wed Jul 28 16:54:49 2010
Return-path: <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org>
Envelope-to: geh-help-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.69)
	(envelope-from <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org>)
	id 1Oe81x-0004vi-53
	for geh-help-gnu-emacs@m.gmane.org; Wed, 28 Jul 2010 16:54:49 +0200
Original-Received: from localhost ([127.0.0.1]:49664 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1Oe81w-0004hm-LC
	for geh-help-gnu-emacs@m.gmane.org; Wed, 28 Jul 2010 10:54:48 -0400
Original-Received: from [140.186.70.92] (port=58170 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Oe80k-0004Az-Hd
	for help-gnu-emacs@gnu.org; Wed, 28 Jul 2010 10:53:35 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from <tlikonen@iki.fi>) id 1Oe80i-0002Fv-L1
	for help-gnu-emacs@gnu.org; Wed, 28 Jul 2010 10:53:34 -0400
Original-Received: from mta-out.inet.fi ([195.156.147.13]:37122 helo=jenni1.inet.fi)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <tlikonen@iki.fi>) id 1Oe80i-0002EV-9T
	for help-gnu-emacs@gnu.org; Wed, 28 Jul 2010 10:53:32 -0400
Original-Received: from mithlond.arda (84.251.132.215) by jenni1.inet.fi (8.5.122)
	id 4C299CCA00ED1C5A; Wed, 28 Jul 2010 17:53:18 +0300
Original-Received: from dtw by mithlond.arda with local (Exim 4.69)
	(envelope-from <tlikonen@iki.fi>)
	id 1Oe80T-0003bY-K0; Wed, 28 Jul 2010 17:53:17 +0300
In-Reply-To: <AANLkTi=D_M1Fw0Lnp6vRNhAaniwQya-kLBMs-k3+40vK@mail.gmail.com>
	(Deniz Dogan's message of "Wed, 28 Jul 2010 16:12:52 +0200")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2.50 (gnu/linux)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3)
X-BeenThere: help-gnu-emacs@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Users list for the GNU Emacs text editor <help-gnu-emacs.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/help-gnu-emacs>,
	<mailto:help-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/help-gnu-emacs>
List-Post: <mailto:help-gnu-emacs@gnu.org>
List-Help: <mailto:help-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/help-gnu-emacs>,
	<mailto:help-gnu-emacs-request@gnu.org?subject=subscribe>
Original-Sender: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org
Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.help:74319
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/74319>

* 2010-07-28 16:12 (+0200), Deniz Dogan wrote:

> 2010/7/28 Thamer Mahmoud <thamer.mahmoud@gmail.com>:
>>     (re-search-forward "<title>\\(.*\\)<[/]title>" nil t 1)

> By the way, this will not work in scenarios where the title is spread
> out across multiple lines:
>
> <title>
>   Hello
> </title>
>
> How would you solve this in Emacs Lisp?

Regexps can match whitespace too. Just make the pattern skip spaces,
tabs and newlines at the beginning and end of the title text. Also note
that the title text itself may contain newlines, so we should probably
replace them with spaces in the matched string.
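A minimal sketch of that regexp approach, assuming the HTML is already
in the current buffer. The function name `my-html-title` is
hypothetical. The pattern skips whitespace after <title> and before
</title>, lets the title span several lines with a non-greedy group
that matches any character including a newline (a plain `.` does not
match a newline in Emacs regexps), and then collapses each run of
whitespace into one space. Like any regexp, it will still be fooled by
things such as a <title> inside an HTML comment.

```elisp
(defun my-html-title ()
  "Return the title of the HTML document in the current buffer, or nil.
Whitespace (including newlines) around the title text is skipped, and
each internal run of whitespace is collapsed into a single space."
  (save-excursion
    (goto-char (point-min))
    ;; Match <title>, <TITLE>, etc.
    (let ((case-fold-search t))
      (when (re-search-forward
             ;; Leading whitespace, then the shortest text (newlines
             ;; allowed) that is followed by optional whitespace and
             ;; the closing tag.
             "<title>[ \t\n\r]*\\(\\(?:.\\|\n\\)*?\\)[ \t\n\r]*</title>"
             nil t)
        ;; Turn newlines and other whitespace runs into single spaces.
        (replace-regexp-in-string "[ \t\n\r]+" " " (match-string 1))))))
```

For the multi-line example quoted above, (my-html-title) returns
"Hello".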

The real solution for extracting the title from HTML text is not a
regular expression but a dedicated HTML parser. The Lisp way to write
such a parser would be to turn the document (or only its head part)
into nested lists and other s-expressions and then walk the lists to
find the title. Such parsers already exist for Common Lisp, but I'm not
sure about Emacs Lisp.
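The "dive into the lists" step can be sketched as a recursive walk over
such a tree. This assumes a tree shape in which each element is a list
(TAG ATTRIBUTES CHILD...) and each text node is a plain string. That is
the shape returned by `libxml-parse-html-region`, which appeared later,
in Emacs 24.1 builds compiled with libxml2. The function name
`my-html-tree-title` is hypothetical, and it takes an already-parsed
tree rather than doing any parsing itself.

```elisp
(defun my-html-tree-title (node)
  "Return the text of the first `title' element in NODE, or nil.
NODE is assumed to be a parse tree in which each element is a list
\(TAG ATTRIBUTES CHILD...) and each text node is a string."
  (when (consp node)
    (if (eq (car node) 'title)
        ;; Concatenate the string children, collapse whitespace runs
        ;; into single spaces, then drop a leading or trailing space.
        (let* ((text (mapconcat (lambda (child)
                                  (if (stringp child) child ""))
                                (cddr node) ""))
               (collapsed (replace-regexp-in-string "[ \t\n\r]+" " " text)))
          (replace-regexp-in-string "\\` \\| \\'" "" collapsed))
      ;; Not a title element: search the children depth-first and
      ;; stop at the first match.
      (let ((children (cddr node))
            (result nil))
        (while (and children (not result))
          (setq result (my-html-tree-title (car children))
                children (cdr children)))
        result))))
```

In an Emacs built with libxml2, something like
(my-html-tree-title (libxml-parse-html-region (point-min) (point-max)))
would apply it to the HTML in the current buffer.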