From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Thamer Mahmoud <thamer.mahmoud@gmail.com>
Newsgroups: gmane.emacs.help
Subject: Re: How to get title of web page by url?
Date: Wed, 28 Jul 2010 18:34:56 +0300
Message-ID: <87ocdr39i7.fsf@zemblan.newkuwait.org>
References: <AANLkTim0HqKDdYvFqBT3Giy+8n44cxnWtg+w92eA3muu@mail.gmail.com>
	<87vd802nx4.fsf@zemblan.newkuwait.org>
	<AANLkTin+3_2bt+Umn6cYt3b=rtOJb+RO=X180Vj3AVcs@mail.gmail.com>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
To: help-gnu-emacs@gnu.org
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)
List-Id: Users list for the GNU Emacs text editor <help-gnu-emacs.gnu.org>
Xref: news.gmane.org gmane.emacs.help:74320
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/74320>

filebat Mark <filebat.mark@gmail.com> writes:

> Thanks, Thamer. It works.
>
> Below is the code snippet.
>
> Well, I still have an encoding problem.
> To get the title of "http://www.baidu.com", the title we get is displayed as
> unrecognizable codes.
>
> I have tried to encode it, in the way of "(setq web_title_str
> (encode-coding-string  web_title_str 'utf-8-dos))", but it fails.

I'm also new to Elisp (well, sort of).

But here is a modified version that should handle both charsets and
newlines, plus the other issues noticed by Deniz Dogan (thanks).

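(defun www-get-page-title (url)
  "Return the title of the web page at URL, decoded per its charset."
  (let ((buffer (url-retrieve-synchronously url))
        title coding)
    (when buffer
      (with-current-buffer buffer
        (goto-char (point-min))
        (when (re-search-forward "<title[^>]*>\\([^<]*\\)</title>" nil t)
          (setq title (match-string 1)))
        (goto-char (point-min))
        ;; Coding-system names are lowercase symbols, e.g. `utf-8'.
        (when (re-search-forward "charset=[\"']?\\([-0-9a-zA-Z_]+\\)" nil t)
          (setq coding (intern (downcase (match-string 1))))))
      (kill-buffer buffer)
      (if (and title (coding-system-p coding))
          (decode-coding-string title coding)
        title))))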

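How robust this is still depends on the HTML being well-formed: it
takes the first title element and the first "charset=" it finds,
whether that is in the HTTP headers or in a meta tag. It should be
good enough, I think.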
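
As for why encode-coding-string did not help: the buffer returned by
url-retrieve-synchronously holds the raw bytes of the response, so the
title has to be decoded (bytes to characters), not encoded (characters
to bytes). Here is a rough sketch you can try without a network
connection. The helper name my-title-from-raw-html is made up for this
example and is not part of url.el, and the GB2312 bytes are simulated
by encoding a string first.

```elisp
;; Hypothetical helper, not part of url.el: extract and decode the title
;; from raw (undecoded) page bytes held in a string.
(defun my-title-from-raw-html (raw coding)
  "Return the <title> text in RAW, decoded using CODING."
  (when (string-match "<title>\\([^<]*\\)</title>" raw)
    (decode-coding-string (match-string 1 raw) coding)))

;; Simulate the bytes of a GB2312 page; \u767e\u5ea6 are the two
;; characters of Baidu's name, written as escapes to keep this ASCII.
(let ((raw (encode-coding-string "<title>\u767e\u5ea6</title>" 'gb2312)))
  (my-title-from-raw-html raw 'gb2312))
```

Evaluating the last form should give back the two original characters
rather than a run of octal escapes.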

--
Thamer


> Since I am a newbie for emacs encoding, can you please help me to point what
> the problem is?


>
> ;; -------------------------- separator --------------------------
> (defun get-page-title()
>   "Get title of web page, whose url can be found in current line"
>   (interactive)
>   ;; Get url from current line
>   (copy-region-as-kill (re-search-backward "^") (re-search-forward "$"))
>   (setq url (substring-no-properties (current-kill 0)))
>   ;; Get title of web page, with the help of functions in url.el
>   (with-current-buffer (url-retrieve-synchronously url)
>     (goto-char 0)
>     (re-search-forward "<title>\\(.*\\)<[/]title>" nil t 1)
>     (setq web_title_str (match-string 1)))
>     (setq web_title_str (encode-coding-string web_title_str 'utf-8-dos))
>   ;; Insert the title in the next line
>   (reindent-then-newline-and-indent)
>   (insert web_title_str)
>   )
>
>
> On 7/28/10, Thamer Mahmoud <thamer.mahmoud@gmail.com> wrote:
>>
>> filebat Mark <filebat.mark@gmail.com> writes:
>>
>> > Such as, given "http://www.emacswiki.org/emacs/Git", we will get the title
>> > of this web page, which is "EmacsWiki: Git:".
>> >
>> > Function of w3m-current-title is quite close, but a standalone lisp function
>> > is much preferred.
>>
>>
>> Using the url.el package,
>>
>> (defun www-get-page-title (url)
>>   (with-current-buffer (url-retrieve-synchronously url)
>>     (goto-char 0)
>>     (re-search-forward "<title>\\(.*\\)<[/]title>" nil t 1)
>>     (match-string 1)))
>>
>> (www-get-page-title "http://www.emacswiki.org/emacs/Git")
>> => "EmacsWiki: Git"
>>
>> hth,
>>
>> Thamer
>>
>>
>>