unofficial mirror of help-gnu-emacs@gnu.org
From: Jean Louis <bugs@gnu.support>
To: Michael Heerdegen <michael_heerdegen@web.de>
Cc: help-gnu-emacs@gnu.org
Subject: Re: To fetch URL, extract <title> element?
Date: Thu, 12 Nov 2020 16:20:46 +0300
Message-ID: <X602rpYV5gDglGSW@protected.rcdrun.com>
In-Reply-To: <87wnyqlomi.fsf@web.de>

* Michael Heerdegen <michael_heerdegen@web.de> [2020-11-12 15:57]:
> Jean Louis <bugs@gnu.support> writes:
> 
> > > If I understand what you want correctly, eww seems to get the title with
> > > `eww-tag-title'
> >
> > That somehow sounds easier to do. To get HTML or any text is first
> > priority.
> 
> I also only had looked at the eww code.  Maybe Lars wants to help
> more.

Some hyperlinks are copied from a browser and inserted into Emacs.

- Such links have no title or annotation, but they need one. The
  title has to be fetched automatically. That is an expensive
  process; I would like to fetch only the headers.

- Some WWW links expire, so their status has to be updated from time
  to time.

- It then becomes possible for the user to mark hyperlinks and update
  the titles for all of them.
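
One way to fetch only the HTTP headers is a HEAD request. The sketch
below assumes the server answers HEAD properly; the names
`my-http-parse-head' and `my-url-head' are made up for illustration.
It is enough for checking whether a link is still alive and whether
it is HTML, but not for the title, which lives in the body.

```elisp
(require 'url)  ; also makes `url-request-method' a special variable

(defun my-http-parse-head (headers)
  "Return (STATUS . CONTENT-TYPE) parsed from the HTTP response text HEADERS.
STATUS is an integer or nil; CONTENT-TYPE is a string or nil."
  (let ((case-fold-search t))           ; header names are case-insensitive
    (cons (when (string-match "\\`HTTP/[0-9.]+ \\([0-9]+\\)" headers)
            (string-to-number (match-string 1 headers)))
          (when (string-match "^content-type: *\\([^;\r\n]+\\)" headers)
            (match-string 1 headers)))))

(defun my-url-head (url)
  "Send a HEAD request to URL and return (STATUS . CONTENT-TYPE), or nil."
  (let* ((url-request-method "HEAD")    ; ask for headers only, no body
         (buffer (url-retrieve-synchronously url t)))
    (when buffer
      (unwind-protect
          (with-current-buffer buffer
            (my-http-parse-head (buffer-string)))
        (kill-buffer buffer)))))
```

A status of 200 with a content type of "text/html" would then be the
signal to go on and fetch the page for its title.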

I do not know how to use url-retrieve, but I found out how to use it
synchronously, and for now this works, though not elegantly:

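(defun hyperscope-url-to-string (url)
  "Fetch URL and return the raw response (HTTP headers and body) as a string.
Return nil if no response buffer is available."
  ;; Fetch once and reuse the response buffer.
  (let ((buffer (url-retrieve-synchronously url)))
    (when buffer
      (unwind-protect
          (with-current-buffer buffer
            (buffer-string))
        ;; Do not leave the response buffer behind.
        (kill-buffer buffer)))))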

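(require 'subr-x)                       ; for `string-trim'

(defun hyperscope-fetch-title (url)
  "Return the <title> of the page at URL, or URL itself if none is found."
  (let ((string (hyperscope-url-to-string url))
        ;; Match <TITLE> as well as <title>.
        (case-fold-search t))
    ;; "[^<]*" also matches newlines, unlike ".", and stops at </title>.
    (if (and string
             (string-match "<title[^>]*>\\([^<]*\\)</title>" string))
        (string-trim (match-string 1 string))
      url)))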

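(defun hyperscope-fetch-title-for-url (id)
  "Fetch the title for the hyperlink ID and store it as the link's name."
  (let* ((url (hlinks-link id))
         (title-or-url (hyperscope-fetch-title url)))
    (hlink-update-name-1 title-or-url id)))

(defun hyperscope-update-url-title ()
  "Update the title of the hyperlink on the current tabulated-list line."
  (interactive)
  (let ((id (tabulated-list-get-id)))
    (hyperscope-fetch-title-for-url id)))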
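
For comparison, a non-blocking variant with `url-retrieve' could look
roughly like the sketch below. The `my-' names are hypothetical, the
regular expression is the same simple-minded one, and the response is
not decoded, so non-ASCII titles may need extra handling.

```elisp
(require 'url)

(defun my-title-from-response (response)
  "Return the text of the first <title> element in RESPONSE, or nil."
  (let ((case-fold-search t))
    (when (string-match "<title[^>]*>\\([^<]*\\)</title>" response)
      (match-string 1 response))))

(defun my-fetch-title--handle (status url callback)
  "Handle a finished request; the current buffer holds the response.
STATUS is the plist passed by `url-retrieve'.  Call CALLBACK with the
title, or with URL when the request failed or no title was found."
  (let ((title (unless (plist-get status :error)
                 (my-title-from-response (buffer-string)))))
    (kill-buffer (current-buffer))
    (funcall callback (or title url))))

(defun my-fetch-title-async (url callback)
  "Fetch URL in the background, then call CALLBACK with its title or URL."
  ;; URL and CALLBACK travel as extra callback arguments, so this does
  ;; not depend on lexical closures.
  (url-retrieve url #'my-fetch-title--handle (list url callback) t))

;; Usage (performs a network request):
;; (my-fetch-title-async "https://www.gnu.org/"
;;                       (lambda (title) (message "Title: %s" title)))
```

Emacs stays responsive while the page is being fetched, which matters
when titles for many marked hyperlinks are updated in one go.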

> > That will help in Hyperscope to automatically update WWW links with
> > their titles provided that content-type is HTML.
> 
> I'm curious: what exactly are you doing?  (I don't know Hyperscope but
> see that it's easy to find infos about it in the Internet.)

It is a DKR, or Dynamic Knowledge Repository:
https://www.dougengelbart.org/content/view/190/163/
https://en.wikipedia.org/wiki/Dynamic_knowledge_repository

Hyperscope is a browsing tool that enables most of the viewing and
navigating features called for in Doug Engelbart's open hyperdocument
system framework (OHS) to support dynamic knowledge repositories
(DKRs) and rising Collective IQ.
https://www.dougengelbart.org/content/view/154/86/

This HyperScope for Emacs is similar to it. It may grow into a large
index, or it can be used only for bookmarking simple things. It is a
collection of hyperlinks to anything. Like the Emacs bookmark system,
it can link to any file, and to a place in a file by search or by
line number. It is not stored as text; it is backed by a database.

The emacs-libpq dynamic module for PostgreSQL is coming to GNU ELPA
soon. When it arrives, maybe I will get a production version out as
well.

As a result, it provides collective IQ, or easier access to the pieces
of information that a group may need to increase its efficiency.






