From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: filebat Mark <filebat.mark@gmail.com>
Newsgroups: gmane.emacs.help
Subject: Re: How to get title of web page by url?
Date: Thu, 29 Jul 2010 23:07:43 +0800
Message-ID: <AANLkTikdn5-goSA2eTz7iAX6PNs-nbXO9W7ejOeDvaR7@mail.gmail.com>
References: <AANLkTim0HqKDdYvFqBT3Giy+8n44cxnWtg+w92eA3muu@mail.gmail.com>
	<87vd802nx4.fsf@zemblan.newkuwait.org>
	<AANLkTin+3_2bt+Umn6cYt3b=rtOJb+RO=X180Vj3AVcs@mail.gmail.com>
	<87ocdr39i7.fsf@zemblan.newkuwait.org>
	<87k4of324m.fsf@zemblan.newkuwait.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary=001636e0b1caa623fd048c881809
X-Trace: dough.gmane.org 1280416109 20815 80.91.229.12 (29 Jul 2010 15:08:29 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Thu, 29 Jul 2010 15:08:29 +0000 (UTC)
Cc: help-gnu-emacs@gnu.org
To: Thamer Mahmoud <thamer.mahmoud@gmail.com>
Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org Thu Jul 29 17:08:27 2010
Return-path: <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org>
Envelope-to: geh-help-gnu-emacs@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.69)
	(envelope-from <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org>)
	id 1OeUid-0000hd-Tk
	for geh-help-gnu-emacs@m.gmane.org; Thu, 29 Jul 2010 17:08:24 +0200
Original-Received: from localhost ([127.0.0.1]:45593 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1OeUic-00084L-Jb
	for geh-help-gnu-emacs@m.gmane.org; Thu, 29 Jul 2010 11:08:22 -0400
Original-Received: from [140.186.70.92] (port=54384 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1OeUi8-00081g-0s
	for help-gnu-emacs@gnu.org; Thu, 29 Jul 2010 11:07:58 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from <filebat.mark@gmail.com>) id 1OeUi1-0003LB-Le
	for help-gnu-emacs@gnu.org; Thu, 29 Jul 2010 11:07:51 -0400
Original-Received: from mail-pv0-f169.google.com ([74.125.83.169]:59825)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <filebat.mark@gmail.com>) id 1OeUi1-0003Kt-BQ
	for help-gnu-emacs@gnu.org; Thu, 29 Jul 2010 11:07:45 -0400
Original-Received: by pvc30 with SMTP id 30so256172pvc.0
	for <help-gnu-emacs@gnu.org>; Thu, 29 Jul 2010 08:07:44 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:received:received:in-reply-to
	:references:date:message-id:subject:from:to:cc:content-type;
	bh=J1yHGT3QquPE3YgwAZ8iSghJzDM+WvKO2CQ2ayzjR3c=;
	b=s2pSeE+pdNvgBF75Jhk8OeSybWZoknxR++K/osvUYtoNCKdQv7AV9FCI0CM6jArViL
	9ij+C6YRXmt0cY++pRUuV61K40kdUp17NHJEUb38Emri0sCXnYifyXLPK8xOZVNyV/SY
	NJoFdZdrcZteyMMdXLVq7if77MTK+OMs5X7jE=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=tHnb8a+6HGB/BhBThg1VDl1Ua+b84liHBdqaNbKvr3fm5l9FHhJLfsnFip/Qec9jZh
	zj5+Nk+tV8ytfv0ymrl4pQq0FEGcAibJ+Ra4n3dKyK2Js/pEoiGF3UTAjmKZctG7Lvp3
	1anJl3CT0WYN0oazW19EqmK1GVsSnB1Ydx7rA=
Original-Received: by 10.143.37.18 with SMTP id p18mr247833wfj.46.1280416063824; Thu, 
	29 Jul 2010 08:07:43 -0700 (PDT)
Original-Received: by 10.229.43.4 with HTTP; Thu, 29 Jul 2010 08:07:43 -0700 (PDT)
In-Reply-To: <87k4of324m.fsf@zemblan.newkuwait.org>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2)
X-BeenThere: help-gnu-emacs@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Users list for the GNU Emacs text editor <help-gnu-emacs.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/help-gnu-emacs>,
	<mailto:help-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/help-gnu-emacs>
List-Post: <mailto:help-gnu-emacs@gnu.org>
List-Help: <mailto:help-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/help-gnu-emacs>,
	<mailto:help-gnu-emacs-request@gnu.org?subject=subscribe>
Original-Sender: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org
Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.help:74337
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/74337>

--001636e0b1caa623fd048c881809
Content-Type: text/plain; charset=ISO-8859-1

Thank you very much, Thamer! It serves my need very well.

Though an HTML parser would be more powerful, grepping the string is good
enough for my requirements.
Thank you all for the attention and valuable discussion.

Here is the complete Lisp function, in case someone else needs it.
;; -------------------------- separator --------------------------
(require 'url)

(defun get-page-title ()
  "Insert the title of the web page whose URL is on the current line."
  (interactive)
  ;; Read the URL from the current line without touching the kill ring.
  (let ((url (buffer-substring-no-properties
              (line-beginning-position) (line-end-position)))
        title charset)
    ;; Fetch the page with the help of functions in url.el.
    (with-current-buffer (url-retrieve-synchronously url)
      ;; Find the title by grepping the HTML code.
      (goto-char (point-min))
      (when (re-search-forward "<title>\\([^<]*\\)</title>" nil t)
        (setq title (match-string 1)))
      ;; Find the charset by grepping the HTML code.
      (goto-char (point-min))
      ;; Downcase the charset: UTF-8 is not accepted by Emacs, utf-8 is.
      (when (re-search-forward "charset=\\([-0-9a-zA-Z]*\\)" nil t)
        (setq charset (intern (downcase (match-string 1))))))
    (unless title
      (error "No <title> found in %s" url))
    ;; Decode the title only if the charset names a known coding system.
    (when (and charset (coding-system-p charset))
      (setq title (decode-coding-string title charset)))
    ;; Insert the title on the next line.
    (end-of-line)
    (reindent-then-newline-and-indent)
    (insert title)))
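
In case anyone wants to try the two grepping steps without fetching a
page, below is a minimal sketch that applies the same regexps to a plain
string. The names my-html-title and my-html-coding-system are made up
for illustration and are not part of url.el; treat it as a sketch, not
as a replacement for a real HTML parser.

```elisp
;; Hypothetical helper names; only `string-match', `match-string',
;; `intern', `downcase' and `coding-system-p' come from Emacs itself.

(defun my-html-title (html)
  "Return the text of the first <title> element in the string HTML, or nil."
  (when (string-match "<title>\\([^<]*\\)</title>" html)
    (match-string 1 html)))

(defun my-html-coding-system (html)
  "Return the coding system named by the first charset= in HTML, or nil.
The name is downcased first, since coding system symbols such as
`utf-8' are lowercase."
  (when (string-match "charset=\\([-0-9a-zA-Z]*\\)" html)
    (let ((sym (intern (downcase (match-string 1 html)))))
      ;; Return SYM only if Emacs knows it as a coding system.
      (and (coding-system-p sym) sym))))

;; Example calls on literal strings:
(my-html-title "<head><title>GNU Emacs</title></head>")
;; => "GNU Emacs"
(my-html-coding-system "Content-Type: text/html; charset=UTF-8")
;; => utf-8
```

Both helpers return nil when nothing matches, so the caller can decide
whether to skip decoding or to report an error.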


On Thu, Jul 29, 2010 at 2:14 AM, Thamer Mahmoud <thamer.mahmoud@gmail.com> wrote:

>
> > (defun www-get-page-title (url)
> >   (let ((title))
> >     (with-current-buffer (url-retrieve-synchronously url)
> >       (goto-char (point-min))
> >       (re-search-forward "<title>\\([^<]*\\)</title>" nil t 1)
> >       (setq title (match-string 1))
> >       (goto-char (point-min))
> >       (re-search-forward "charset=\\([-0-9a-zA-Z]*\\)" nil t 1)
> >       (decode-coding-string title (intern (match-string 1))))))
>
> Just did a test on a wikipedia page, and looks like
> `decode-coding-string' doesn't handle upper-case charsets, like UTF-8,
> only utf-8.
>
> So the last line should be:
>
> (decode-coding-string title (intern (downcase (match-string 1)))))))
>
> --
> Thamer
>
>
>


-- 
Thanks & Regards

Denny Zhang


--001636e0b1caa623fd048c881809--