From: filebat Mark <filebat.mark@gmail.com>
To: Thamer Mahmoud <thamer.mahmoud@gmail.com>
Cc: help-gnu-emacs@gnu.org
Subject: Re: How to get title of web page by url?
Date: Thu, 29 Jul 2010 23:07:43 +0800
Message-ID: <AANLkTikdn5-goSA2eTz7iAX6PNs-nbXO9W7ejOeDvaR7@mail.gmail.com>
In-Reply-To: <87k4of324m.fsf@zemblan.newkuwait.org>

Thank you very much, Thamer! It serves my needs very well.

Though an HTML parser would be more powerful, grepping the string is good
enough for my requirements.
Thank you all for your attention and the valuable discussion.

I'm posting the complete Lisp function here in case someone else needs it.
;; -------------------------- separator --------------------------
(defun get-page-title ()
  "Insert the title of the web page whose URL is on the current line.
The title is inserted on a new line below the URL."
  (interactive)
  ;; Get the URL from the current line without touching the kill ring.
  (let ((url (buffer-substring-no-properties
              (line-beginning-position) (line-end-position)))
        title charset)
    ;; Fetch the page with the help of the functions in url.el.
    (with-current-buffer (url-retrieve-synchronously url)
      ;; Find the title by grepping the HTML code.
      (goto-char (point-min))
      (when (re-search-forward "<title>\\([^<]*\\)</title>" nil t)
        (setq title (match-string 1)))
      ;; Find the charset by grepping the HTML code.
      (goto-char (point-min))
      ;; Downcase the charset: e.g. UTF-8 is not acceptable to Emacs,
      ;; while utf-8 is OK.
      (when (re-search-forward "charset=\\([-0-9a-zA-Z]*\\)" nil t)
        (setq charset (intern (downcase (match-string 1)))))
      ;; Decode the title string, but only with a known coding system.
      (when (and title (coding-system-p charset))
        (setq title (decode-coding-string title charset))))
    ;; Insert the title on the next line.
    (end-of-line)
    (reindent-then-newline-and-indent)
    (insert (or title ""))))
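
The charset step above (downcase the name, then intern it) can also be
pulled out into a small helper. This is only a sketch under my own
assumptions: the name `my-charset-to-coding-system' is made up for
illustration, and falling back to utf-8 when the charset is missing or
unknown is my choice, not something settled in this thread.

```elisp
(defun my-charset-to-coding-system (name &optional fallback)
  "Return the coding system named by the charset string NAME.
NAME is downcased first, so \"UTF-8\" maps to `utf-8'.  If NAME is
empty, not a string, or not a known coding system, return FALLBACK
\(default `utf-8')."
  (let ((cs (and (stringp name)
                 (not (string= name ""))
                 ;; Downcase before interning, as discussed in the thread.
                 (intern (downcase name)))))
    (if (and cs (coding-system-p cs))
        cs
      ;; Hypothetical default; adjust to taste.
      (or fallback 'utf-8))))
```

With it, the decoding line could become
(decode-coding-string title (my-charset-to-coding-system (match-string 1))),
which also avoids an error when the page declares a charset Emacs does
not know.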



On Thu, Jul 29, 2010 at 2:14 AM, Thamer Mahmoud <thamer.mahmoud@gmail.com> wrote:

>
> > (defun www-get-page-title (url)
> >   (let ((title))
> >     (with-current-buffer (url-retrieve-synchronously url)
> >       (goto-char (point-min))
> >       (re-search-forward "<title>\\([^<]*\\)</title>" nil t 1)
> >       (setq title (match-string 1))
> >       (goto-char (point-min))
> >       (re-search-forward "charset=\\([-0-9a-zA-Z]*\\)" nil t 1)
> >       (decode-coding-string title (intern (match-string 1))))))
>
> Just did a test on a wikipedia page, and looks like
> `decode-coding-string' doesn't handle upper-case charsets, like UTF-8,
> only utf-8.
>
> So the last line should be:
>
> (decode-coding-string title (intern (downcase (match-string 1)))))))
>
> --
> Thamer
>
>
>


-- 
Thanks & Regards

Denny Zhang
