unofficial mirror of help-gnu-emacs@gnu.org
 help / color / mirror / Atom feed
From: Jean Louis <bugs@gnu.support>
To: GNU Emacs Help <help-gnu-emacs@gnu.org>
Subject: To fetch URL, extract <title> element?
Date: Wed, 11 Nov 2020 12:28:05 +0300	[thread overview]
Message-ID: <X6uupXrgbwdITVW4@protected.rcdrun.com> (raw)

What is the standard built-in way to fetch the http[s] URL?

I need to get string to parse <title> and I think something like this
below:

(setq a
      (with-temp-buffer
	(url-retrieve "https://www.gnu.org" 'identity)
	(buffer-string)))


When researching `eww' I find this function here, which is chunk that
makes sure of parsing and punny code in the URL. I do not find it
useful as I cannot easily fetch URL without thinking of those details.

;;;###autoload
(defun eww (url &optional arg buffer)
  "Fetch URL and render the page.
If the input doesn't look like an URL or a domain name, the
word(s) will be searched for via `eww-search-prefix'.

If called with a prefix ARG, use a new buffer instead of reusing
the default EWW buffer.

If BUFFER, the data to be rendered is in that buffer.  In that
case, this function doesn't actually fetch URL.  BUFFER will be
killed after rendering."
  (interactive
   (let ((uris (eww-suggested-uris)))
     (list (read-string (format-prompt "Enter URL or keywords"
                                       (and uris (car uris)))
                        nil 'eww-prompt-history uris)
           (prefix-numeric-value current-prefix-arg))))
  (setq url (eww--dwim-expand-url url))
  (pop-to-buffer-same-window
   (cond
    ((eq arg 4)
     (generate-new-buffer "*eww*"))
    ((eq major-mode 'eww-mode)
     (current-buffer))
    (t
     (get-buffer-create "*eww*"))))
  (eww-setup-buffer)
  ;; Check whether the domain only uses "Highly Restricted" Unicode
  ;; IDNA characters.  If not, transform to punycode to indicate that
  ;; there may be funny business going on.
  (let ((parsed (url-generic-parse-url url)))
    (when (url-host parsed)
      (unless (puny-highly-restrictive-domain-p (url-host parsed))
        (setf (url-host parsed) (puny-encode-domain (url-host parsed)))))
    ;; When the URL is on the form "http://a/../../../g", chop off all
    ;; the leading "/.."s.
    (when (url-filename parsed)
      (while (string-match "\\`/[.][.]/" (url-filename parsed))
        (setf (url-filename parsed) (substring (url-filename parsed) 3))))
    (setq url (url-recreate-url parsed)))
  (plist-put eww-data :url url)
  (plist-put eww-data :title "")
  (eww-update-header-line-format)
  (let ((inhibit-read-only t))
    (insert (format "Loading %s..." url))
    (goto-char (point-min)))
  (let ((url-mime-accept-string eww-accept-content-types))
    (if buffer
        (let ((eww-buffer (current-buffer)))
          (with-current-buffer buffer
            (eww-render nil url nil eww-buffer)))
      (eww-retrieve url #'eww-render
                    (list url nil (current-buffer))))))



             reply	other threads:[~2020-11-11  9:28 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-11  9:28 Jean Louis [this message]
2020-11-11 15:27 ` To fetch URL, extract <title> element? Michael Heerdegen
2020-11-11 18:04   ` Jean Louis
2020-11-12 12:56     ` Michael Heerdegen
2020-11-12 13:20       ` Jean Louis
2020-11-12 14:49     ` Yuri Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=X6uupXrgbwdITVW4@protected.rcdrun.com \
    --to=bugs@gnu.support \
    --cc=help-gnu-emacs@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).