From: Teemu Likonen
Newsgroups: gmane.emacs.help
Subject: Re: How to get title of web page by url?
Date: Wed, 28 Jul 2010 17:53:17 +0300
Message-ID: <87mxtbmzdu.fsf@mithlond.arda>
References: <87vd802nx4.fsf@zemblan.newkuwait.org>
To: Deniz Dogan
Cc: help-gnu-emacs@gnu.org, Thamer Mahmoud

* 2010-07-28 16:12 (+0200), Deniz Dogan wrote:

> 2010/7/28 Thamer Mahmoud :
>>    (re-search-forward "\\(.*\\)<[/]title>" nil t 1)
>
> By the way, this will not work in scenarios where the title is spread
> out across multiple lines:
>
> <title>
> Hello
> </title>
>
> How would you solve this in Emacs Lisp?

Regexps can match whitespace too. Just leave out the spaces, tabs and
newlines at the beginning and end of the title text.

Also note that the title text itself may contain newlines. We should
probably replace newlines with spaces in the matched string.

The real solution for extracting the title from HTML text is not
regular expressions but a dedicated HTML parser. The Lisp way to write
such a parser would be to turn the document (or only the head part)
into nested lists and other s-expressions and then dive into the list
to find the title. Such parsers already exist for Common Lisp but I'm
not sure about Emacs Lisp.
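
As a rough illustration of the whitespace handling described above, here is a minimal sketch that works on a string rather than a buffer. The function name `my-html-title' is made up for this example and does not come from the thread. It assumes the title text contains no literal "<" character, so it is still only a regexp approximation and not a substitute for a real parser.

```elisp
;; Sketch only: `my-html-title' is a hypothetical helper, not code from
;; the thread.  It assumes the title text contains no literal "<".
(defun my-html-title (html)
  "Return the text of the first <title> element in the string HTML, or nil.
Leading and trailing whitespace is dropped, and each internal run of
whitespace (newlines included) is replaced by a single space."
  (let ((case-fold-search t))           ; accept <TITLE> as well as <title>
    ;; "[^<]*" also matches newlines, unlike ".", so a title spread
    ;; over several lines is still captured.
    (when (string-match "<title[^>]*>\\([^<]*\\)</title>" html)
      (let* ((raw (match-string 1 html))
             ;; Collapse spaces, tabs and newlines into single spaces.
             (one-line (replace-regexp-in-string "[ \t\r\n]+" " " raw)))
        ;; At most one space can remain at either end; strip it.
        (replace-regexp-in-string "\\` \\| \\'" "" one-line)))))
```

For example, (my-html-title "<title>\n  Hello\n</title>") should give "Hello", and a string without a title element gives nil.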