From: filebat Mark
Newsgroups: gmane.emacs.help
Subject: Re: How to get title of web page by url?
Date: Wed, 28 Jul 2010 21:44:49 +0800
References: <87vd802nx4.fsf@zemblan.newkuwait.org>
In-Reply-To: <87vd802nx4.fsf@zemblan.newkuwait.org>
To: Thamer Mahmoud
Cc: help-gnu-emacs@gnu.org

Thanks, Thamer. It works.

The code snippet is below.

I still have an encoding problem, though. When I fetch the title of
"http://www.baidu.com", it is displayed as unreadable characters.

I tried to fix it with
"(setq web_title_str (encode-coding-string web_title_str 'utf-8-dos))",
but that does not help. I am new to Emacs encoding; could you please help
me find the problem?

;; -------------------------- separator --------------------------
(defun get-page-title()
  "Get title of web page, whose url can be found in current line"
  (interactive)
  ;; Get url from current line
  (copy-region-as-kill (re-search-backward "^") (re-search-forward "$"))
  (setq url (substring-no-properties (current-kill 0)))
  ;; Get title of web page, with the help of functions in url.el
  (with-current-buffer (url-retrieve-synchronously url)
    (goto-char 0)
    (re-search-forward "<title>\\(.*\\)<[/]title>" nil t 1)
    (setq web_title_str (match-string 1)))
  (setq web_title_str (encode-coding-string web_title_str 'utf-8-dos))
  ;; Insert the title in the next line
  (reindent-then-newline-and-indent)
  (insert web_title_str)
  )

On 7/28/10, Thamer Mahmoud <thamer.mahmoud@gmail.com> wrote:
>
> filebat Mark <filebat.mark@gmail.com> writes:
>
> > Such as, given "http://www.emacswiki.org/emacs/Git", we will get the title
> > of this web page, which is "EmacsWiki: Git:".
> >
> > Function of w3m-current-title is quite close, but a standalone lisp function
> > is much preferred.
>
> Using the url.el package,
>
> (defun www-get-page-title (url)
>   (with-current-buffer (url-retrieve-synchronously url)
>     (goto-char 0)
>     (re-search-forward "<title>\\(.*\\)<[/]title>" nil t 1)
>     (match-string 1)))
>
> (www-get-page-title "http://www.emacswiki.org/emacs/Git")
> => "EmacsWiki: Git"
>
> hth,
>
> Thamer
>

--
Thanks & Regards

Denny Zhang
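
A possible direction for the encoding question above, offered only as a sketch: as far as I understand, the buffer returned by url-retrieve-synchronously holds the raw bytes of the response, so the matched title probably needs to be decoded (bytes to characters) rather than encoded. The names my-decode-title and my-get-page-title-decoded are hypothetical, and the default coding system utf-8 is an assumption; it has to match whatever charset the page actually declares.

```elisp
(require 'url)

(defun my-decode-title (raw-title &optional coding)
  "Decode RAW-TITLE, a string of raw bytes, into characters.
CODING defaults to `utf-8' (an assumption; use the page's real charset)."
  (decode-coding-string raw-title (or coding 'utf-8)))

(defun my-get-page-title-decoded (url &optional coding)
  "Return the decoded <title> of URL, or nil if no title is found.
Hypothetical helper; CODING is passed on to `my-decode-title'."
  (with-current-buffer (url-retrieve-synchronously url)
    (goto-char (point-min))
    (when (re-search-forward "<title>\\(.*\\)</title>" nil t)
      ;; The match is taken from undecoded response bytes, so decode it
      ;; instead of calling `encode-coding-string' on it.
      (my-decode-title (match-string 1) coding))))
```

It might be called as (my-get-page-title-decoded "http://www.baidu.com" 'gb2312), where gb2312 is only a guess at that page's charset; the Content-Type header or the page's meta tag would be the place to confirm it.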