From: Lennart Borgman
Newsgroups: gmane.emacs.help
Subject: Re: How to get title of web page by url?
Date: Wed, 28 Jul 2010 17:44:45 +0200
References: <87vd802nx4.fsf@zemblan.newkuwait.org> <87ocdr39i7.fsf@zemblan.newkuwait.org>
In-Reply-To: <87ocdr39i7.fsf@zemblan.newkuwait.org>
To: Thamer Mahmoud
Cc: help-gnu-emacs@gnu.org

On Wed, Jul 28, 2010 at 5:34 PM, Thamer Mahmoud wrote:
> filebat Mark writes:
>
>> Thanks, Thamer. It works.
>>
>> Below is the code snippet.
>>
>> Well, I still have an encoding problem.
>> To get the title of "http://www.baidu.com", the title we get is displayed as
>> unrecognizable codes.
>>
>> I have tried to encode it, in the way of "(setq web_title_str
>> (encode-coding-string web_title_str 'utf-8-dos))", but it fails.
>
> I'm also new to Elisp (well sort of).
>
> But here is a modified version that should handle both charsets and
> newlines (and other issues noticed by Deniz Dogan. Thanks).
>
> (defun www-get-page-title (url)
>   (let ((title))
>     (with-current-buffer (url-retrieve-synchronously url)
>       (goto-char (point-min))
>       (re-search-forward "<title>\\([^<]*\\)</title>" nil t 1)
>       (setq title (match-string 1))
>       (goto-char (point-min))
>       (re-search-forward "charset=\\([-0-9a-zA-Z]*\\)" nil t 1)
>       (decode-coding-string title (intern (match-string 1))))))
>
> The robustness of this code would still depend on whether the HTML is
> well-formed, but it should be good enough I think.

Have a look at url-copy-file for how to get this correct. (Or
web-vcs-url-copy-file in nXhtml, which is a little bit more careful.)
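
As a rough illustration of where the quoted function can go wrong, here is a slightly more defensive sketch. It does not reproduce what url-copy-file or web-vcs-url-copy-file do. The names my-www-title-from-response and my-www-get-page-title are made up for this example, and falling back to utf-8 when no usable charset is found is an assumption, not something from the thread. The parsing is split from the download so it can be tried on a plain string.

```emacs-lisp
(require 'subr-x) ; for `string-trim' on older Emacs versions
(require 'url)

;; Hypothetical helper: works on the raw (undecoded) response text.
(defun my-www-title-from-response (response)
  "Return the decoded <title> text found in RESPONSE, or nil if none.
RESPONSE is the raw, undecoded text of an HTTP response or HTML page."
  (let (title charset)
    (with-temp-buffer
      ;; Work on raw bytes, as the buffer from `url-retrieve-synchronously' does.
      (set-buffer-multibyte nil)
      (insert response)
      ;; Match <TITLE> and CHARSET= as well as the lower-case forms.
      (let ((case-fold-search t))
        (goto-char (point-min))
        (when (re-search-forward "<title[^>]*>\\([^<]*\\)</title>" nil t)
          (setq title (match-string 1)))
        (goto-char (point-min))
        (when (re-search-forward "charset=[\"']?\\([-0-9a-zA-Z_]+\\)" nil t)
          (setq charset (intern (downcase (match-string 1)))))))
    (when title
      ;; Assumption: use utf-8 when the charset is missing or unknown to Emacs.
      ;; Newlines and runs of whitespace are collapsed to single spaces.
      (string-trim
       (replace-regexp-in-string
        "[ \t\r\n]+" " "
        (decode-coding-string
         title
         (if (and charset (coding-system-p charset)) charset 'utf-8)))))))

;; Hypothetical wrapper: fetch URL and clean up the response buffer afterwards.
(defun my-www-get-page-title (url)
  "Return the title of the page at URL, or nil if it cannot be found."
  (let ((buffer (url-retrieve-synchronously url)))
    (when buffer
      (unwind-protect
          (with-current-buffer buffer
            (my-www-title-from-response (buffer-string)))
        (kill-buffer buffer)))))
```

Compared with the quoted version, this returns nil instead of signalling an error when the title or charset is missing, and it kills the response buffer once the title has been extracted. It still finds the title with a regexp, so it remains dependent on reasonably well-formed HTML.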