From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Jean Louis Newsgroups: gmane.emacs.help Subject: To fetch URL, extract element? Date: Wed, 11 Nov 2020 12:28:05 +0300 Message-ID: <X6uupXrgbwdITVW4@protected.rcdrun.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="35410"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Mutt/2.0 (3d08634) (2020-11-07) To: GNU Emacs Help <help-gnu-emacs@gnu.org> Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org Wed Nov 11 14:59:57 2020 Return-path: <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org> Envelope-to: geh-help-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org>) id 1kcqfF-000958-BI for geh-help-gnu-emacs@m.gmane-mx.org; Wed, 11 Nov 2020 14:59:57 +0100 Original-Received: from localhost ([::1]:39058 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org>) id 1kcqfE-0002K6-Ds for geh-help-gnu-emacs@m.gmane-mx.org; Wed, 11 Nov 2020 08:59:56 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:54358) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <bugs@gnu.support>) id 1kcqdT-0001P0-Aw for help-gnu-emacs@gnu.org; Wed, 11 Nov 2020 08:58:07 -0500 Original-Received: from static.rcdrun.com ([95.85.24.50]:45729) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <bugs@gnu.support>) id 1kcqdR-0000Tm-F8 for help-gnu-emacs@gnu.org; Wed, 11 Nov 2020 08:58:06 -0500 Original-Received: from localhost 
([::ffff:197.157.34.177]) (AUTH: PLAIN admin, TLS: TLS1.2,256bits,ECDHE_RSA_AES_256_GCM_SHA384) by static.rcdrun.com with ESMTPSA id 00000000002C0009.000000005FABEDEA.000037F2; Wed, 11 Nov 2020 13:58:02 +0000 Content-Disposition: inline Received-SPF: pass client-ip=95.85.24.50; envelope-from=bugs@gnu.support; helo=static.rcdrun.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/11/11 08:57:59 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -2 X-Spam_score: -0.3 X-Spam_bar: / X-Spam_report: (-0.3 / 5.0 requ) BAYES_00=-1.9, DATE_IN_PAST_03_06=1.592, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: help-gnu-emacs@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Users list for the GNU Emacs text editor <help-gnu-emacs.gnu.org> List-Unsubscribe: <https://lists.gnu.org/mailman/options/help-gnu-emacs>, <mailto:help-gnu-emacs-request@gnu.org?subject=unsubscribe> List-Archive: <https://lists.gnu.org/archive/html/help-gnu-emacs> List-Post: <mailto:help-gnu-emacs@gnu.org> List-Help: <mailto:help-gnu-emacs-request@gnu.org?subject=help> List-Subscribe: <https://lists.gnu.org/mailman/listinfo/help-gnu-emacs>, <mailto:help-gnu-emacs-request@gnu.org?subject=subscribe> Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: "help-gnu-emacs" <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org> Xref: news.gmane.io gmane.emacs.help:125226 Archived-At: <http://permalink.gmane.org/gmane.emacs.help/125226> What is the standard built-in way to fetch the http[s] URL? I need to get string to parse <title> and I think something like this below: (setq a (with-temp-buffer (url-retrieve "https://www.gnu.org" 'identity) (buffer-string))) When researching `eww' I find this function here, which is chunk that makes sure of parsing and punny code in the URL. 
I do not find it useful as I cannot easily fetch URL without thinking of those details. ;;;###autoload (defun eww (url &optional arg buffer) "Fetch URL and render the page. If the input doesn't look like an URL or a domain name, the word(s) will be searched for via `eww-search-prefix'. If called with a prefix ARG, use a new buffer instead of reusing the default EWW buffer. If BUFFER, the data to be rendered is in that buffer. In that case, this function doesn't actually fetch URL. BUFFER will be killed after rendering." (interactive (let ((uris (eww-suggested-uris))) (list (read-string (format-prompt "Enter URL or keywords" (and uris (car uris))) nil 'eww-prompt-history uris) (prefix-numeric-value current-prefix-arg)))) (setq url (eww--dwim-expand-url url)) (pop-to-buffer-same-window (cond ((eq arg 4) (generate-new-buffer "*eww*")) ((eq major-mode 'eww-mode) (current-buffer)) (t (get-buffer-create "*eww*")))) (eww-setup-buffer) ;; Check whether the domain only uses "Highly Restricted" Unicode ;; IDNA characters. If not, transform to punycode to indicate that ;; there may be funny business going on. (let ((parsed (url-generic-parse-url url))) (when (url-host parsed) (unless (puny-highly-restrictive-domain-p (url-host parsed)) (setf (url-host parsed) (puny-encode-domain (url-host parsed))))) ;; When the URL is on the form "http://a/../../../g", chop off all ;; the leading "/.."s. (when (url-filename parsed) (while (string-match "\\`/[.][.]/" (url-filename parsed)) (setf (url-filename parsed) (substring (url-filename parsed) 3)))) (setq url (url-recreate-url parsed))) (plist-put eww-data :url url) (plist-put eww-data :title "") (eww-update-header-line-format) (let ((inhibit-read-only t)) (insert (format "Loading %s..." 
url)) (goto-char (point-min))) (let ((url-mime-accept-string eww-accept-content-types)) (if buffer (let ((eww-buffer (current-buffer))) (with-current-buffer buffer (eww-render nil url nil eww-buffer))) (eww-retrieve url #'eww-render (list url nil (current-buffer)))))) From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Michael Heerdegen <michael_heerdegen@web.de> Newsgroups: gmane.emacs.help Subject: Re: To fetch URL, extract <title> element? Date: Wed, 11 Nov 2020 16:27:59 +0100 Message-ID: <87zh3nc3r4.fsf@web.de> References: <X6uupXrgbwdITVW4@protected.rcdrun.com> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="17641"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux) To: help-gnu-emacs@gnu.org Cancel-Lock: sha1:XGnTw91aXCRxugdEgELlRvUDmxI= Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org Wed Nov 11 16:29:11 2020 Return-path: <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org> Envelope-to: geh-help-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org>) id 1kcs3b-0004TR-Hn for geh-help-gnu-emacs@m.gmane-mx.org; Wed, 11 Nov 2020 16:29:11 +0100 Original-Received: from localhost ([::1]:38488 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org>) id 1kcs3a-0002dt-JH for geh-help-gnu-emacs@m.gmane-mx.org; Wed, 11 Nov 2020 10:29:10 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:51590) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <geh-help-gnu-emacs@m.gmane-mx.org>) id 1kcs2b-0002cE-7v 
for help-gnu-emacs@gnu.org; Wed, 11 Nov 2020 10:28:09 -0500 Original-Received: from static.214.254.202.116.clients.your-server.de ([116.202.254.214]:43832 helo=ciao.gmane.io) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <geh-help-gnu-emacs@m.gmane-mx.org>) id 1kcs2Z-0006xr-Jz for help-gnu-emacs@gnu.org; Wed, 11 Nov 2020 10:28:08 -0500 Original-Received: from list by ciao.gmane.io with local (Exim 4.92) (envelope-from <geh-help-gnu-emacs@m.gmane-mx.org>) id 1kcs2W-00036G-QK for help-gnu-emacs@gnu.org; Wed, 11 Nov 2020 16:28:04 +0100 X-Injected-Via-Gmane: http://gmane.org/ Received-SPF: pass client-ip=116.202.254.214; envelope-from=geh-help-gnu-emacs@m.gmane-mx.org; helo=ciao.gmane.io X-detected-operating-system: by eggs.gnu.org: First seen = 2020/11/11 05:55:22 X-ACL-Warn: Detected OS = Linux 2.2.x-3.x [generic] [fuzzy] X-Spam_score_int: -13 X-Spam_score: -1.4 X-Spam_bar: - X-Spam_report: (-1.4 / 5.0 requ) BAYES_00=-1.9, FREEMAIL_FORGED_FROMDOMAIN=0.249, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.249, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: help-gnu-emacs@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Users list for the GNU Emacs text editor <help-gnu-emacs.gnu.org> List-Unsubscribe: <https://lists.gnu.org/mailman/options/help-gnu-emacs>, <mailto:help-gnu-emacs-request@gnu.org?subject=unsubscribe> List-Archive: <https://lists.gnu.org/archive/html/help-gnu-emacs> List-Post: <mailto:help-gnu-emacs@gnu.org> List-Help: <mailto:help-gnu-emacs-request@gnu.org?subject=help> List-Subscribe: <https://lists.gnu.org/mailman/listinfo/help-gnu-emacs>, <mailto:help-gnu-emacs-request@gnu.org?subject=subscribe> Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: "help-gnu-emacs" <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org> Xref: news.gmane.io gmane.emacs.help:125230 Archived-At: 
<http://permalink.gmane.org/gmane.emacs.help/125230> Jean Louis <bugs@gnu.support> writes: > What is the standard built-in way to fetch the http[s] URL? `url-retrieve' sounds appropriate. > I need to get string to parse <title>> [...] If I understand what you want correctly, eww seems to get the title with `eww-tag-title'. > When researching `eww' I find this function here, which is chunk that > makes sure of parsing and punny code in the URL. I do not find it > useful as I cannot easily fetch URL without thinking of those details. > [...] Now I couldn't parse this. Michael. From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Jean Louis <bugs@gnu.support> Newsgroups: gmane.emacs.help Subject: Re: To fetch URL, extract <title> element? Date: Wed, 11 Nov 2020 21:04:38 +0300 Message-ID: <X6wntvKmc5LtYuiO@protected.rcdrun.com> References: <X6uupXrgbwdITVW4@protected.rcdrun.com> <87zh3nc3r4.fsf@web.de> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="23313"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Mutt/2.0 (3d08634) (2020-11-07) Cc: help-gnu-emacs@gnu.org To: Michael Heerdegen <michael_heerdegen@web.de> Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org Wed Nov 11 19:06:16 2020 Return-path: <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org> Envelope-to: geh-help-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org>) id 1kcuVb-0005xB-71 for geh-help-gnu-emacs@m.gmane-mx.org; Wed, 11 Nov 2020 19:06:15 +0100 Original-Received: from localhost ([::1]:38880 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from 
<help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org>) id 1kcuVa-0006I5-8J for geh-help-gnu-emacs@m.gmane-mx.org; Wed, 11 Nov 2020 13:06:14 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:37046) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <bugs@gnu.support>) id 1kcuV4-0006H5-JQ for help-gnu-emacs@gnu.org; Wed, 11 Nov 2020 13:05:42 -0500 Original-Received: from static.rcdrun.com ([95.85.24.50]:56497) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <bugs@gnu.support>) id 1kcuV2-0004Pe-L9 for help-gnu-emacs@gnu.org; Wed, 11 Nov 2020 13:05:42 -0500 Original-Received: from localhost ([::ffff:197.157.34.177]) (AUTH: PLAIN admin, TLS: TLS1.2,256bits,ECDHE_RSA_AES_256_GCM_SHA384) by static.rcdrun.com with ESMTPSA id 00000000002C0009.000000005FAC27F2.00006039; Wed, 11 Nov 2020 18:05:37 +0000 Content-Disposition: inline In-Reply-To: <87zh3nc3r4.fsf@web.de> Received-SPF: pass client-ip=95.85.24.50; envelope-from=bugs@gnu.support; helo=static.rcdrun.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/11/11 08:57:59 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: help-gnu-emacs@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Users list for the GNU Emacs text editor <help-gnu-emacs.gnu.org> List-Unsubscribe: <https://lists.gnu.org/mailman/options/help-gnu-emacs>, <mailto:help-gnu-emacs-request@gnu.org?subject=unsubscribe> List-Archive: <https://lists.gnu.org/archive/html/help-gnu-emacs> List-Post: <mailto:help-gnu-emacs@gnu.org> List-Help: <mailto:help-gnu-emacs-request@gnu.org?subject=help> List-Subscribe: <https://lists.gnu.org/mailman/listinfo/help-gnu-emacs>, 
<mailto:help-gnu-emacs-request@gnu.org?subject=subscribe>
Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org
Original-Sender: "help-gnu-emacs" <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org>
Xref: news.gmane.io gmane.emacs.help:125237
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/125237>

* Michael Heerdegen <michael_heerdegen@web.de> [2020-11-11 18:29]:
> Jean Louis <bugs@gnu.support> writes:
>
> > What is the standard built-in way to fetch the http[s] URL?
>
> `url-retrieve' sounds appropriate.

I am trying like this:

(defun wrs-fetch-title (url)
  (url-retrieve url #'wrs-get-title (list url)))

(defun wrs-get-title (status url)
  (message-any status))

(wrs-fetch-title "http://localhost")

At least I get status nil, but I do not know how to get the HTML
text. Yes, I am looking into eww but it is not enlightening.

> If I understand what you want correctly, eww seems to get the title with
> `eww-tag-title'

That somehow sounds easier to do. To get HTML or any text is first
priority.

That will help in Hyperscope to automatically update WWW links with
their titles provided that content-type is HTML.

From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Michael Heerdegen <michael_heerdegen@web.de>
Newsgroups: gmane.emacs.help
Subject: Re: To fetch URL, extract <title> element?
Date: Thu, 12 Nov 2020 13:56:53 +0100 Message-ID: <87wnyqlomi.fsf@web.de> References: <X6uupXrgbwdITVW4@protected.rcdrun.com> <87zh3nc3r4.fsf@web.de> <X6wntvKmc5LtYuiO@protected.rcdrun.com> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="29082"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux) Cc: help-gnu-emacs@gnu.org To: Jean Louis <bugs@gnu.support> Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org Thu Nov 12 13:57:53 2020 Return-path: <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org> Envelope-to: geh-help-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org>) id 1kdCAj-0007TI-5I for geh-help-gnu-emacs@m.gmane-mx.org; Thu, 12 Nov 2020 13:57:53 +0100 Original-Received: from localhost ([::1]:56476 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org>) id 1kdCAi-0002hY-6v for geh-help-gnu-emacs@m.gmane-mx.org; Thu, 12 Nov 2020 07:57:52 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:41984) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <michael_heerdegen@web.de>) id 1kdCAM-0002hK-8o for help-gnu-emacs@gnu.org; Thu, 12 Nov 2020 07:57:30 -0500 Original-Received: from mout.web.de ([212.227.17.11]:52753) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <michael_heerdegen@web.de>) id 1kdCAJ-0000Cj-Ah for help-gnu-emacs@gnu.org; Thu, 12 Nov 2020 07:57:30 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=web.de; s=dbaedf251592; t=1605185814; 
bh=NoSC4gPQSSAkkrYXJ1zLzTYX3Pu+BdHJQvwEYDjnUQU=; h=X-UI-Sender-Class:From:To:Cc:Subject:References:Date:In-Reply-To; b=THBn0X3rKiFHA1WXLx6Sfr0mAuJCOHjrGY8I805DA/gxkrmk3dCw+KEwNXZaYxFJD 3JW9BPKiCmuSqX0F6iRKij+QQXyWMwE+tx93+QMAJkvx2vFUYDLY4ZdToNPK2J9VlF 1y34kI2S5IAD5nh6BroSWPeo/MK0KX2gmmt8daX0= X-UI-Sender-Class: c548c8c5-30a9-4db5-a2e7-cb6cb037b8f9 Original-Received: from drachen.dragon ([94.218.215.213]) by smtp.web.de (mrweb101 [213.165.67.124]) with ESMTPSA (Nemesis) id 0M3Bhz-1kKz331Glj-00stAC; Thu, 12 Nov 2020 13:56:54 +0100 In-Reply-To: <X6wntvKmc5LtYuiO@protected.rcdrun.com> (Jean Louis's message of "Wed, 11 Nov 2020 21:04:38 +0300") X-Provags-ID: V03:K1:eTEf6DBFL5kvXiAt9bTwSk+3Lv6FZFpZggq6ppYdXFB1yuAAMbl nE86TfbCTaS8/YY8Nn7B1Sdif5g2Gd8Ig9cpgRXaExV9oMA858cgxxqdHI5r4P4LWXJ+v/l H74SoQMwh4ylLd4SMkN8CG73NU6h+hKviTpkHuDTP2gVo8Fn0wUBtiDK8TdmWaimAigMbEN Jo5tKuvU8bdP5bSJemBoA== X-UI-Out-Filterresults: notjunk:1;V03:K0:FgZzaYf3qyc=:RC7NZHB3VutvSiYEctbveS pbv7jBU+igTn8AOAprdwcNLmW7iWlwatGM/W6/K1HW81DQksi+Rx8K2RhZQfmTsMP3t8IRF78 QW3qZFoZGdgaVuW/yZt4XkWyFVusuKJ1Ks+UJjl7h0qOr4zQoGER1tBXtAAZZV8ACso6kCNFw VBaN1K2yNy7PUsfzYAbVFpFQojFFRsmBxJ3cakaaK+zP7e58YUqNGc4eJNEQz4AkL1MLyIv0r OjjJB9QQSFx0g8ZJUCkZg2AO02GfywvhGhSPEM2IqJGFE5fbdmDbDgZf8YcFQjNBUZWVBwT0q SvBM185uOlvJYvKcoyj1geKy2WVzzR48WBe6yNUUwf1iPM/3WraoAXa7eU9QQ+aAGEcDq0J00 a7Ew7qJmnYdpZAWF9QO0J1RRb2gzxFT4n+2qic1axUWrknVs1yuhkGKBDyB0kaj/6zOEjEuWS yJ5K09Cf0h3YpQxX3/gUx2B0Wm6exzJBqhBY5qdROhmkQc7Y0s9g/U6a1oN9ZFItCfn8xeOtK 0Z6gcO+wqvxYaTf65KDRRbv0vzbhth6i5lwWUbX2ptWHIGJoJMTQQikbvFeNhl82mUS+yi4Ue FUds7k6MiILD9vA3+9fWnaAjnW0MST7caemoDKdUfQqQzLF6l6bmWVKm4YOkviqv50TK4EgDH 4D25lSSEUF3XtrAcT3858zOPsWCs7Pu7CaOJnTTv7HIMlKbNLmuGA61jqr1PG1EIYpw19paaY GWXQbIt274ZevmNKXRsBx/jVHjGqxAvqFLr0jlyNbYY9WRi3SZF6b5e+jsnxdwqK2Mx8r2gp Received-SPF: pass client-ip=212.227.17.11; envelope-from=michael_heerdegen@web.de; helo=mout.web.de X-detected-operating-system: by eggs.gnu.org: First seen = 2020/11/12 07:57:25 X-ACL-Warn: Detected OS = Linux 
2.2.x-3.x [generic] [fuzzy] X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: help-gnu-emacs@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Users list for the GNU Emacs text editor <help-gnu-emacs.gnu.org> List-Unsubscribe: <https://lists.gnu.org/mailman/options/help-gnu-emacs>, <mailto:help-gnu-emacs-request@gnu.org?subject=unsubscribe> List-Archive: <https://lists.gnu.org/archive/html/help-gnu-emacs> List-Post: <mailto:help-gnu-emacs@gnu.org> List-Help: <mailto:help-gnu-emacs-request@gnu.org?subject=help> List-Subscribe: <https://lists.gnu.org/mailman/listinfo/help-gnu-emacs>, <mailto:help-gnu-emacs-request@gnu.org?subject=subscribe> Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: "help-gnu-emacs" <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org> Xref: news.gmane.io gmane.emacs.help:125249 Archived-At: <http://permalink.gmane.org/gmane.emacs.help/125249> Jean Louis <bugs@gnu.support> writes: > > If I understand what you want correctly, eww seems to get the title with > > `eww-tag-title' > > That somehow sounds easier to do. To get HTML or any text is first > priority. I also only had looked at the eww code. Maybe Lars wants to help more. > That will help in Hyperscope to automatically update WWW links with > their titles provided that content-type is HTML. I'm curious: what exactly are you doing? (I don't know Hyperscope but see that it's easy to find infos about it in the Internet.) Regards, Michael. 
From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Yuri Khan <yuri.v.khan@gmail.com> Newsgroups: gmane.emacs.help Subject: Re: To fetch URL, extract <title> element? Date: Thu, 12 Nov 2020 21:49:47 +0700 Message-ID: <CAP_d_8VupQ5p1s_F=zmhSogs7XZmbLnFr51T6qUjayns0PT57g@mail.gmail.com> References: <X6uupXrgbwdITVW4@protected.rcdrun.com> <87zh3nc3r4.fsf@web.de> <X6wntvKmc5LtYuiO@protected.rcdrun.com> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="1131"; mail-complaints-to="usenet@ciao.gmane.io" Cc: Michael Heerdegen <michael_heerdegen@web.de>, help-gnu-emacs <help-gnu-emacs@gnu.org> To: Jean Louis <bugs@gnu.support> Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org Thu Nov 12 15:53:54 2020 Return-path: <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org> Envelope-to: geh-help-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org>) id 1kdDyz-0000DL-O4 for geh-help-gnu-emacs@m.gmane-mx.org; Thu, 12 Nov 2020 15:53:53 +0100 Original-Received: from localhost ([::1]:37882 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org>) id 1kdDyy-0007Ln-FP for geh-help-gnu-emacs@m.gmane-mx.org; Thu, 12 Nov 2020 09:53:52 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:42424) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <yurivkhan@gmail.com>) id 1kdDvK-0004Kj-4n for help-gnu-emacs@gnu.org; Thu, 12 Nov 2020 09:50:06 -0500 Original-Received: from mail-vk1-xa29.google.com 
([2607:f8b0:4864:20::a29]:41143) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <yurivkhan@gmail.com>) id 1kdDvI-0006ag-4f for help-gnu-emacs@gnu.org; Thu, 12 Nov 2020 09:50:05 -0500 Original-Received: by mail-vk1-xa29.google.com with SMTP id e8so1365518vkk.8 for <help-gnu-emacs@gnu.org>; Thu, 12 Nov 2020 06:50:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=srb83tj2hJn/Ogf/knpeim63TMdIjT/tJqu4ARNL9XI=; b=hbsqZDRkANluk0W51pjp0HVX2cPss3nfEjw/BCmibqTjO7yIpfnrmvt3mHli7MfXMt t2+Dd17A/xaMHxdWtFChDfl3MEElKBgu1Qt9jow/jwYvCyI4q5FgVQUdQWGW2c3hW6OS oEuxjm7PKYhxh6k/OUX6aYlNlrNiBRl/oUix5Dq4u/bQkr/tSWMJLtOdkA7wQE/6Fs2C KGMWJ/Cc2Tf/yqY9l59l+Nc+OjaWqkpK3Yoefva9FlZaSC+wBZ1nscN2JkoPLXEhVJPh B69qNxjxofcy5Pdy6jW2qawvWlIvpiKTuFBm2xSobydnc22KsyMqvn/O7NFUte4WVSxA /Tyg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=srb83tj2hJn/Ogf/knpeim63TMdIjT/tJqu4ARNL9XI=; b=ufMMunaimLYcPS1govTOv0Y+UHzWU6dOfhEwPMN6yLuqZ2FMFHaAapCJHTcSfxnaa2 3psUFVlTbRsvkrMBlRepl/bu9IEBgMiXYNGuMiEHUzozyI5n092Jxy20HdDGwfFNKanT JV+5gCFaADfwcWtPoTHs0EYCOxSXpV/xhlfyb0kHZEyehzt1PRupciKLBEOx57o6F7ew mm0JpNw+hflozPt69WCaa4LD/Wxo+nkUM+bTKm1JSx/uPE2fggWC8X4U4tb+Zp0cm8E8 OhYlhEglYKCAsOQX4DMwfMmPVJ2+W/YC+BwBpbLqNZCGm/qAz/n7y4xwVzuL+aaIZ8hP gXOw== X-Gm-Message-State: AOAM530XnRDm5pL9gmhDCM8tIAtsFOrmDB3ub7I14TxxGqNpMBBok/z0 oeSxJToUsAgYhClMU5fAYpX53AILAAr4BPKAR3E= X-Google-Smtp-Source: ABdhPJyr0TZFNnGSqJ7ppPElqvGVtJKlFEai4xP9tuiYBTMTAYytgreF3n22tdAJfCOfPcqWbdwavXXEmWE5NTbQH2I= X-Received: by 2002:a1f:2389:: with SMTP id j131mr117766vkj.18.1605192600321; Thu, 12 Nov 2020 06:50:00 -0800 (PST) In-Reply-To: <X6wntvKmc5LtYuiO@protected.rcdrun.com> Received-SPF: pass 
client-ip=2607:f8b0:4864:20::a29; envelope-from=yurivkhan@gmail.com; helo=mail-vk1-xa29.google.com
X-detected-operating-system: by eggs.gnu.org: No matching host in p0f cache. That's all we know.
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: help-gnu-emacs@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Users list for the GNU Emacs text editor <help-gnu-emacs.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/help-gnu-emacs>, <mailto:help-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/help-gnu-emacs>
List-Post: <mailto:help-gnu-emacs@gnu.org>
List-Help: <mailto:help-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/help-gnu-emacs>, <mailto:help-gnu-emacs-request@gnu.org?subject=subscribe>
Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org
Original-Sender: "help-gnu-emacs" <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org>
Xref: news.gmane.io gmane.emacs.help:125250
Archived-At: <http://permalink.gmane.org/gmane.emacs.help/125250>

On Thu, 12 Nov 2020 at 01:05, Jean Louis <bugs@gnu.support> wrote:

> > > What is the standard built-in way to fetch the http[s] URL?
> >
> > `url-retrieve' sounds appropriate.
>
> I am trying like this:
>
> (defun wrs-fetch-title (url)
>   (url-retrieve url #'wrs-get-title (list url)))
>
> (defun wrs-get-title (status url)
>   (message-any status))
>
> (wrs-fetch-title "http://localhost")
>
> At least I get status nil, but I do not know how to get the HTML
> text.

Have you read the docstring of ‘url-retrieve’?

    CALLBACK is called when the object has been completely retrieved, with
    the current buffer containing the object, and any MIME headers associated
    with it.[…]

So probably:

    (defun wrs-fetch-title (url)
      (url-retrieve url #'wrs-get-title (list url)))

    (defun wrs-get-title (status url)
      (goto-char (point-min))
      (search-forward "\n\n") ; skip HTTP headers
      ;; "\\s-+" is one or more whitespace characters; the optional
      ;; group allows attributes on the opening tag.
      (if (search-forward-regexp "<title\\(?:\\s-+[^>]*\\)?>\\([^<]*\\)"
                                 nil 'noerror)
          (message "URL: %s Title: %s" url (match-string 1))))

    (wrs-fetch-title "https://gnu.org/")
    ⇒ URL: https://gnu.org/ Title: The GNU Operating System and the Free Software Movement

(For demonstration purposes, I’m overlooking error handling and MIME
type checks. In a real program, you ought to first make sure you got a
successful status, then check that the response you got has a
‘Content-Type’ of either ‘text/html’ or ‘application/xhtml+xml’ (with
possible parameters such as ‘charset’), and only then look for
HTML-specific … tags.)

From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Jean Louis <bugs@gnu.support>
Newsgroups: gmane.emacs.help
Subject: Re: To fetch URL, extract <title> element?
Date: Thu, 12 Nov 2020 16:20:46 +0300 Message-ID: <X602rpYV5gDglGSW@protected.rcdrun.com> References: <X6uupXrgbwdITVW4@protected.rcdrun.com> <87zh3nc3r4.fsf@web.de> <X6wntvKmc5LtYuiO@protected.rcdrun.com> <87wnyqlomi.fsf@web.de> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="38809"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Mutt/2.0 (3d08634) (2020-11-07) Cc: help-gnu-emacs@gnu.org To: Michael Heerdegen <michael_heerdegen@web.de> Original-X-From: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org Thu Nov 12 18:52:55 2020 Return-path: <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org> Envelope-to: geh-help-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org>) id 1kdGmD-0009xa-Hr for geh-help-gnu-emacs@m.gmane-mx.org; Thu, 12 Nov 2020 18:52:53 +0100 Original-Received: from localhost ([::1]:39084 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org>) id 1kdGmC-0005fD-Gx for geh-help-gnu-emacs@m.gmane-mx.org; Thu, 12 Nov 2020 12:52:52 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:35488) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <bugs@gnu.support>) id 1kdGlL-0005dE-7o for help-gnu-emacs@gnu.org; Thu, 12 Nov 2020 12:51:59 -0500 Original-Received: from static.rcdrun.com ([95.85.24.50]:35983) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <bugs@gnu.support>) id 1kdGlJ-0006cB-Ac for help-gnu-emacs@gnu.org; Thu, 12 Nov 2020 12:51:58 -0500 Original-Received: from localhost 
([::ffff:197.157.34.177]) (AUTH: PLAIN admin, TLS: TLS1.2,256bits,ECDHE_RSA_AES_256_GCM_SHA384) by static.rcdrun.com with ESMTPSA id 00000000002C0004.000000005FAD763A.0000268F; Thu, 12 Nov 2020 17:51:54 +0000 Content-Disposition: inline In-Reply-To: <87wnyqlomi.fsf@web.de> Received-SPF: pass client-ip=95.85.24.50; envelope-from=bugs@gnu.support; helo=static.rcdrun.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/11/12 12:51:55 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -2 X-Spam_score: -0.3 X-Spam_bar: / X-Spam_report: (-0.3 / 5.0 requ) BAYES_00=-1.9, DATE_IN_PAST_03_06=1.592, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: help-gnu-emacs@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Users list for the GNU Emacs text editor <help-gnu-emacs.gnu.org> List-Unsubscribe: <https://lists.gnu.org/mailman/options/help-gnu-emacs>, <mailto:help-gnu-emacs-request@gnu.org?subject=unsubscribe> List-Archive: <https://lists.gnu.org/archive/html/help-gnu-emacs> List-Post: <mailto:help-gnu-emacs@gnu.org> List-Help: <mailto:help-gnu-emacs-request@gnu.org?subject=help> List-Subscribe: <https://lists.gnu.org/mailman/listinfo/help-gnu-emacs>, <mailto:help-gnu-emacs-request@gnu.org?subject=subscribe> Errors-To: help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: "help-gnu-emacs" <help-gnu-emacs-bounces+geh-help-gnu-emacs=m.gmane-mx.org@gnu.org> Xref: news.gmane.io gmane.emacs.help:125251 Archived-At: <http://permalink.gmane.org/gmane.emacs.help/125251> * Michael Heerdegen <michael_heerdegen@web.de> [2020-11-12 15:57]: > Jean Louis <bugs@gnu.support> writes: > > > > If I understand what you want correctly, eww seems to get the title with > > > `eww-tag-title' > > > > That somehow sounds easier to do. To get HTML or any text is first > > priority. > > I also only had looked at the eww code. Maybe Lars wants to help > more. 
Some hyperlinks are captured by copying from any browser and inserted
into Emacs.

- As such they do not have a title or annotation, but they need
  one. The title has to be fetched automatically. It is an expensive
  process; I would like to fetch only the headers.

- Some WWW links expire, so their status has to be updated from time
  to time.

- Then it becomes possible for the user to mark hyperlinks and update
  the titles for all of them.

I do not know how to use url-retrieve, but I found out how to use it
synchronously, and for now this works, if inelegantly.

(defun hyperscope-url-to-string (url)
  "Fetch URL and return the whole response as a string."
  ;; Fetch once; the buffer holds the HTTP headers followed by the body.
  (let ((buffer (url-retrieve-synchronously url)))
    (with-current-buffer buffer
      (buffer-string))))

(defun hyperscope-fetch-title (url)
  "Return the title for URL, or URL itself if no title is found."
  (let ((string (hyperscope-url-to-string url)))
    ;; Group 1 is the text between the opening and closing title tags.
    (if (string-match "<title>\\([^<]*\\)</title>" string)
        (match-string 1 string)
      url)))

(defun hyperscope-fetch-title-for-url (id)
  "Fetch the title for the hyperlink ID and update the link's name."
  (let* ((url (hlinks-link id))
         (title-or-url (hyperscope-fetch-title url)))
    (hlink-update-name-1 title-or-url id)))

(defun hyperscope-update-url-title ()
  "Update the title of the hyperlink at point in the tabulated list."
  (interactive)
  (let ((id (tabulated-list-get-id)))
    (hyperscope-fetch-title-for-url id)))

> > That will help in Hyperscope to automatically update WWW links with
> > their titles provided that content-type is HTML.
>
> I'm curious: what exactly are you doing? (I don't know Hyperscope but
> see that it's easy to find infos about it in the Internet.)

It is a DKR, or Dynamic Knowledge Repository:

https://www.dougengelbart.org/content/view/190/163/
https://en.wikipedia.org/wiki/Dynamic_knowledge_repository

  Hyperscope is a browsing tool that enables most of the viewing and
  navigating features called for in Doug Engelbart's open
  hyperdocument system framework (OHS) to support dynamic knowledge
  repositories (DKRs) and rising Collective IQ.
https://www.dougengelbart.org/content/view/154/86/

This HyperScope for Emacs is similar to it. It may grow into a large
index, or it can be used only for bookmarking simple stuff. It is a
collection of hyperlinks to anything. Similarly to the Emacs
bookmarking system, it can hyperlink to any file, and to a place in a
file by search or by line number. It does not work as text, as it is
database backed.

The emacs-libpq dynamic module for the PostgreSQL database is coming
soon to GNU ELPA. When it arrives, maybe I will get a productive
version out as well.

As a result it gives collective IQ, or easier access to the pieces of
information that a group may need to accelerate its efficiency.
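
The synchronous approach discussed in this thread could be sketched
roughly as follows. This is only an illustration under stated
assumptions: the names `my-html-title' and `my-fetch-title' are
hypothetical and not part of Emacs or Hyperscope, the 10-second
timeout is an arbitrary choice, and the sketch does not check the HTTP
status or Content-Type, nor decode the response's character set.

```elisp
(require 'url)
(require 'subr-x)                       ; for `string-trim' on older Emacs

(defun my-html-title (html)
  "Return the text of the first <title> element in the string HTML, or nil.
Hypothetical helper for illustration; a regexp is not a full HTML parser."
  (let ((case-fold-search t))           ; match <TITLE> as well as <title>
    (when (string-match
           ;; Optional attributes on the opening tag, then the title text.
           "<title\\(?:[[:space:]][^>]*\\)?>\\([^<]*\\)</title>"
           html)
      (string-trim (match-string 1 html)))))

(defun my-fetch-title (url)
  "Fetch URL synchronously and return its title, or URL if none is found.
Hypothetical helper; status and Content-Type checks are left out."
  ;; Arguments after URL: SILENT, INHIBIT-COOKIES, TIMEOUT (seconds).
  (let ((buffer (url-retrieve-synchronously url t nil 10)))
    (if (not buffer)
        url                             ; retrieval failed or timed out
      (unwind-protect
          (with-current-buffer buffer
            (goto-char (point-min))
            ;; The response buffer starts with the HTTP headers; the
            ;; body begins after the first blank line.
            (or (and (re-search-forward "\r?\n\r?\n" nil t)
                     (my-html-title
                      (buffer-substring-no-properties (point) (point-max))))
                url))
        ;; Do not leave the response buffer behind.
        (kill-buffer buffer)))))
```

Splitting the parsing into a function that takes a string keeps it
testable without network access; `my-fetch-title' could then be called
in place of a title lookup such as (my-fetch-title "https://www.gnu.org/").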