From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: =?ISO-8859-1?Q?Andreas_R=F6hler?= <andreas.roehler@easy-emacs.de>
Newsgroups: gmane.emacs.help
Subject: Re: How to get title of web page by url?
Date: Wed, 28 Jul 2010 18:03:58 +0200
Message-ID: <4C5054EE.4060106@easy-emacs.de>
References: <AANLkTim0HqKDdYvFqBT3Giy+8n44cxnWtg+w92eA3muu@mail.gmail.com>	<87vd802nx4.fsf@zemblan.newkuwait.org>	<AANLkTi=D_M1Fw0Lnp6vRNhAaniwQya-kLBMs-k3+40vK@mail.gmail.com>
	<87mxtbmzdu.fsf@mithlond.arda>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: dough.gmane.org 1280332846 23172 80.91.229.12 (28 Jul 2010 16:00:46 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Wed, 28 Jul 2010 16:00:46 +0000 (UTC)
Cc: help-gnu-emacs@gnu.org
To: Teemu Likonen <tlikonen@iki.fi>
Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Wed Jul 28 18:00:41 2010
Return-path: <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org>
Envelope-to: geh-help-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.69)
	(envelope-from <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org>)
	id 1Oe93g-0005Nl-0h
	for geh-help-gnu-emacs@m.gmane.org; Wed, 28 Jul 2010 18:00:40 +0200
Original-Received: from localhost ([127.0.0.1]:60744 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1Oe93f-0005Fc-60
	for geh-help-gnu-emacs@m.gmane.org; Wed, 28 Jul 2010 12:00:39 -0400
Original-Received: from [140.186.70.92] (port=45013 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Oe92f-0005EJ-S5
	for help-gnu-emacs@gnu.org; Wed, 28 Jul 2010 11:59:38 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from <andreas.roehler@easy-emacs.de>) id 1Oe92b-0004pj-Jz
	for help-gnu-emacs@gnu.org; Wed, 28 Jul 2010 11:59:37 -0400
Original-Received: from moutng.kundenserver.de ([212.227.126.171]:55034)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <andreas.roehler@easy-emacs.de>) id 1Oe92b-0004pD-6q
	for help-gnu-emacs@gnu.org; Wed, 28 Jul 2010 11:59:33 -0400
Original-Received: from [192.168.178.27] (p5DDB0A87.dip0.t-ipconnect.de [93.219.10.135])
	by mrelayeu.kundenserver.de (node=mreu0) with ESMTP (Nemesis)
	id 0Lm8NJ-1PDkly2OXp-00ZVn9; Wed, 28 Jul 2010 17:59:27 +0200
User-Agent: Mozilla/5.0 (X11; U; Linux i686; de;
	rv:1.9.1.11) Gecko/20100711 Thunderbird/3.0.6
In-Reply-To: <87mxtbmzdu.fsf@mithlond.arda>
X-Provags-ID: V02:K0:GeBisIUIcZR/FOtdDq1lMVZYxKIR/rzDN02EaZe7wGV
	45cUzkZqHmEcyxEwHsGDmdkafYL/JStZzgDp7CYve510P5Bsjt
	MywUGPxIuATUrSwG4ay7lhh2o3DWRTFMbk5FzHVHmHHPq/3Ofh
	PZsYtAVkksVltkrPZUKsrSVsOsqqrB+7PO1oewZ1x6Yds0RAtz
	DMzubIJwQk6dRu3hEngzE1YrUGcTKEEOV6V876v1nM=
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
	recognized.
X-BeenThere: help-gnu-emacs@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Users list for the GNU Emacs text editor <help-gnu-emacs.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/help-gnu-emacs>,
	<mailto:help-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/help-gnu-emacs>
List-Post: <mailto:help-gnu-emacs@gnu.org>
List-Help: <mailto:help-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/help-gnu-emacs>,
	<mailto:help-gnu-emacs-request@gnu.org?subject=subscribe>
Original-Sender: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org
Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.help:74324
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/74324>

[ ... ]

> The real solution for extracting the title from HTML text is not regular
> expressions but a dedicated HTML parser. The Lisp way to write such a
> parser would be to turn the document (or only the head part) into nested
> lists and other s-expressions and then dive into the list to find the
> title. Such parsers already exist for Common Lisp but I'm not sure about
> Emacs Lisp.
>
>
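To make the quoted idea concrete, here is a minimal sketch. It is written in Python rather than Emacs Lisp, purely as an illustration, and it is not code from any of the files mentioned in this thread. It uses the standard-library html.parser module to turn a document into nested lists of the form [tag, child, ...] and then walks the lists to find the title. The names TreeBuilder, html_to_tree and find_title are made up for this example, and attributes are simply dropped.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not open a nesting level.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Turn HTML into nested lists of the form [tag, child, child, ...]."""

    def __init__(self):
        super().__init__()
        self.root = ["document"]
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = [tag]
        self.stack[-1].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1].append(text)


def html_to_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_title(node):
    """Depth-first search for the first ["title", ...] node; return its text or None."""
    if node[0] == "title":
        return " ".join(c for c in node[1:] if isinstance(c, str))
    for child in node[1:]:
        if isinstance(child, list):
            found = find_title(child)
            if found is not None:
                return found
    return None
```

Calling find_title(html_to_tree(page_source)) returns the title text, or None when the page has no title element. The nested-list shape is chosen to mirror an s-expression, so the same walk would translate directly to Lisp.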

beg-end.el, at

http://bazaar.launchpad.net/~a-roehler/s-x-emacs-werkstatt

is an attempt at such a parser.

See also thing-at-point-markup.el, which serves markup languages such as
XML and HTML.

thing-at-point-utils.el offers functions to grab everything between
angle brackets, and it keeps track of nesting.

Try ar-angled-lesser-atpt, for example.
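
For readers who want to see what "keeping track of nesting" amounts to, here is a rough sketch in Python (not Emacs Lisp, and not the actual algorithm of thing-at-point-utils.el, which I am not reproducing here). The function name enclosing_span and its parameters are hypothetical. It treats pos like point in Emacs, that is, a position between characters, and scans outward in both directions with a depth counter so that inner balanced pairs are skipped.

```python
def enclosing_span(text, pos, open_ch="<", close_ch=">"):
    """Return (start, end) of the innermost open_ch ... close_ch pair
    that encloses position pos, or None if there is none.

    pos is a position between characters: text[:pos] lies before it.
    end is exclusive, so text[start:end] includes both delimiters.
    """
    # Scan backward for the opening delimiter, skipping balanced inner pairs.
    depth = 0
    start = None
    for i in range(pos - 1, -1, -1):
        ch = text[i]
        if ch == close_ch:
            depth += 1
        elif ch == open_ch:
            if depth == 0:
                start = i
                break
            depth -= 1
    if start is None:
        return None

    # Scan forward for the matching closing delimiter, again counting nesting.
    depth = 0
    for i in range(pos, len(text)):
        ch = text[i]
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            if depth == 0:
                return (start, i + 1)
            depth -= 1
    return None
```

With the text "a <b <c> d> e" and pos just before the "d", the result spans "<b <c> d>", because the inner "<c>" is counted and skipped. With pos just before the "c", it spans only "<c>".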

All of this needs thingatpt-utils-base.el, where the core routines reside.

Have a look at how the parser mentioned above is employed from there, via
beginning-of-form-base and end-of-form-base.


Andreas



--
https://code.launchpad.net/~a-roehler/python-mode
https://code.launchpad.net/s-x-emacs-werkstatt/