unofficial mirror of help-gnu-emacs@gnu.org
* functions to download mailing list archives
@ 2022-06-12 22:30 GH
  2022-06-12 23:04 ` Óscar Fuentes
  2022-06-16  0:13 ` GH
  0 siblings, 2 replies; 11+ messages in thread
From: GH @ 2022-06-12 22:30 UTC (permalink / raw)
  To: Help GNU Emacs


I love mailing lists,

so I'm writing functions to interact with the ~lists.gnu.org~ HTTP interfaces.

For example, this function downloads a list's mbox archives:

#+begin_src elisp

(defun lists-mbox-recursive-download (url-head id date)
  "Download ID mailing lists archives as mbox files from server
URL-HEAD using a DATE filter.

URL-HEAD is the mailing list server url, example:
https://lists.gnu.org"
  (with-current-buffer "*eww*"
    (shr-next-link)
    (if (string-match (format "%s/archive/mbox/%s/%s\\(.+\\)" url-head id date)
		      (thing-at-point 'url))
	(eww-download))
    (if (save-excursion (shr-next-link))
	(lists-mbox-recursive-download url-head id date))))

#+end_src
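
The DATE filter above is spliced into a regexp together with URL-HEAD, so the dots in the server name act as wildcards, and a nil result from ~thing-at-point~ would make ~string-match~ signal an error. A minimal sketch of a stricter predicate, assuming a hypothetical helper name ~demo-mbox-url-p~ and a made-up sample URL:

#+begin_src elisp

(defun demo-mbox-url-p (url-head id date url)
  "Return non-nil if URL is an mbox archive of list ID starting with DATE.
URL-HEAD is the server url.  A nil or non-string URL yields nil."
  (and (stringp url)
       (string-match-p
        ;; `regexp-quote' makes the dots and slashes match literally.
        (concat "\\`"
                (regexp-quote
                 (format "%s/archive/mbox/%s/%s" url-head id date))
                ".+")
        url)))

;; Illustrative calls (the URL is an example, not taken from the server):
(demo-mbox-url-p "https://lists.gnu.org" "help-gnu-emacs" 2022
                 "https://lists.gnu.org/archive/mbox/help-gnu-emacs/2022-06")
(demo-mbox-url-p "https://lists.gnu.org" "help-gnu-emacs" 2022 nil)

#+end_src

The first call returns a non-nil value and the second returns nil instead of signalling an error.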

I call that function from this other function:

#+begin_src elisp

(defun lists-load-archive (url-head id)
  (if (y-or-n-p "Load as mbox files?")
      (prog1 (eww-browse-url (format "%s/archive/mbox/%s/" url-head id))
	(if (y-or-n-p "Download archive mboxes?")
	    (let ((date (read-number "Date filter: " 20))
		  (eww-download-directory (read-file-name "Download directory: ")))
	      (lists-mbox-recursive-download url-head id date))))
    (eww-browse-url (format "%s/archive/html/%s/" url-head id))))

#+end_src

Test it:

(lists-load-archive "https://lists.gnu.org" "help-gnu-emacs")


But sadly the recursion stops with this error:

file-local-name: Lisp nesting exceeds `max-lisp-eval-depth'

Any idea how to fix it?




* Re: functions to download mailing list archives
  2022-06-12 22:30 functions to download mailing list archives GH
@ 2022-06-12 23:04 ` Óscar Fuentes
  2022-06-13 12:43   ` GH
  2022-06-16  0:13 ` GH
  1 sibling, 1 reply; 11+ messages in thread
From: Óscar Fuentes @ 2022-06-12 23:04 UTC (permalink / raw)
  To: GH; +Cc: Help GNU Emacs

GH <project@gnuhacker.org> writes:

> so I'm writing functions to interact with the ~lists.gnu.org~ HTTP interfaces.
>
> For example, this function downloads a list's mbox archives:

[snip]

> But sadly the recursion stops with this error:
>
> file-local-name: Lisp nesting exceeds `max-lisp-eval-depth'
>
> Any idea how to fix it?

Instead of a recursive function use a recursive data structure: a list.

The outer function puts in the list the initial URL, and then, in a
loop, pops from the list a URL, downloads it, adds to the list the
referenced URLs and repeats until the list is empty or some other
condition is met.

Some complications may remain: for instance, you need to detect cycles
(the failure in your posted code probably comes from that). A list also
comes in handy for that, if you walk the list with dolist instead of
popping elements and add new ones with add-to-list, which omits elements
that are already in the list.
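
Besides cycles, another possible cause is the stop test itself. If the function used as the test reports "no more links" with ~message~ (as ~shr-next-link~ appears to do), the test is always non-nil, because ~message~ returns the string it displays. The sketch below reproduces that situation with hypothetical stand-ins (~demo-next-link~, ~demo-walk~) rather than the real eww buffer:

#+begin_src elisp

(defun demo-next-link (links)
  "Hypothetical link mover: report via `message' whether LINKS is empty."
  (let ((inhibit-message t))            ; keep the echo area quiet
    (if links
        (message "%s" (car links))
      (message "No next link"))))       ; still returns a non-nil string

(defun demo-walk (links)
  "Recurse over LINKS using `demo-next-link' as the (broken) stop test."
  (when (demo-next-link (cdr links))    ; never nil, so never stops
    (demo-walk (cdr links))))

(defvar demo-result
  (condition-case nil
      (demo-walk '("a" "b" "c"))
    ;; Exceeding `max-lisp-eval-depth' signals an error we can catch.
    (error 'nesting-exceeded)))

#+end_src

With only three links the walk still runs until the nesting limit is hit, so the number of links is not what matters here. A test that returns nil at the end of the links would end the recursion.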




* Re: functions to download mailing list archives
  2022-06-12 23:04 ` Óscar Fuentes
@ 2022-06-13 12:43   ` GH
  2022-06-13 16:46     ` Óscar Fuentes
  0 siblings, 1 reply; 11+ messages in thread
From: GH @ 2022-06-13 12:43 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: Help GNU Emacs

Óscar Fuentes <ofv@wanadoo.es> writes:

> Instead of a recursive function use a recursive data structure: a list.

> The outer function puts in the list the initial URL, and then, in a
> loop, pops from the list a URL, downloads it, adds to the list the
> referenced URLs and repeats until the list is empty or some other
> condition is met.

> Some complications may remain: for instance, you need to detect cycles
> (the failure in your posted code probably comes from that). A list also
> comes in handy for that, if you walk the list with dolist instead of
> popping elements and add new ones with add-to-list, which omits elements
> that are already in the list.

I don't know how. Maybe something like:

#+begin_src elisp

(defun lists-mbox-recursive-url-list (url-head id date)
  (with-current-buffer "*eww*"
    (shr-next-link)
    (let ((url (thing-at-point 'url)))
      (if (string-match (format "%s/archive/mbox/%s/%s\\(.+\\)" url-head id date)
			url)
	  (add-to-list url-list url)))
    (if (save-excursion (shr-next-link))
	(lists-mbox-recursive-url-list url-head id date)
      url-list)))


(let ((url-list '()))
  (with-current-buffer "*eww*"
    (beginning-of-buffer)
    (lists-mbox-recursive-url-list "https://lists.gnu.org" "help-gnu-emacs" 201)))

#+end_src

But it returns an error that I don't understand:

Debugger entered--Lisp error: (setting-constant nil)
  add-to-list(nil "https://lists.gnu.org/archive/mbox/help-gnu-emacs/...")
  (if (string-match (format "%s/archive/mbox/%s/%s\\(.+\\)" url-head id date) url) (add-to-list url-list url))
  (let ((url (thing-at-point 'url))) (if (string-match (format "%s/archive/mbox/%s/%s\\(.+\\)" url-head id date) url) (add-to-list url-list url)))
  (save-current-buffer (set-buffer "*eww*") (shr-next-link) (let ((url (thing-at-point 'url))) (if (string-match (format "%s/archive/mbox/%s/%s\\(.+\\)" url-head id date) url) (add-to-list url-list url))) (if (save-excursion (shr-next-link)) (lists-mbox-recursive-url-list url-head id date) url-list))
  lists-mbox-recursive-url-list("https://lists.gnu.org" "help-gnu-emacs" 201)

...

What does "error: (setting-constant nil)" mean?




* Re: functions to download mailing list archives
  2022-06-13 12:43   ` GH
@ 2022-06-13 16:46     ` Óscar Fuentes
  2022-06-13 18:41       ` GH
  0 siblings, 1 reply; 11+ messages in thread
From: Óscar Fuentes @ 2022-06-13 16:46 UTC (permalink / raw)
  To: GH; +Cc: Help GNU Emacs

GH <project@gnuhacker.org> writes:

> I don't know how. Maybe something like:
>
> #+begin_src elisp
>
> (defun lists-mbox-recursive-url-list (url-head id date)
>   (with-current-buffer "*eww*"
>     (shr-next-link)
>     (let ((url (thing-at-point 'url)))
>       (if (string-match (format "%s/archive/mbox/%s/%s\\(.+\\)" url-head id date)
> 			url)
> 	  (add-to-list url-list url)))
>     (if (save-excursion (shr-next-link))
> 	(lists-mbox-recursive-url-list url-head id date)
>       url-list)))
>
>
> (let ((url-list '()))
>   (with-current-buffer "*eww*"
>     (beginning-of-buffer)
>     (lists-mbox-recursive-url-list "https://lists.gnu.org" "help-gnu-emacs" 201)))
>
> #+end_src
>
> But it returns an error that I don't understand:
>
> Debugger entered--Lisp error: (setting-constant nil)
>   add-to-list(nil "https://lists.gnu.org/archive/mbox/help-gnu-emacs/...")
>   (if (string-match (format "%s/archive/mbox/%s/%s\\(.+\\)" url-head id date) url) (add-to-list url-list url))
>   (let ((url (thing-at-point 'url))) (if (string-match (format "%s/archive/mbox/%s/%s\\(.+\\)" url-head id date) url) (add-to-list url-list url)))
>   (save-current-buffer (set-buffer "*eww*") (shr-next-link) (let ((url (thing-at-point 'url))) (if (string-match (format "%s/archive/mbox/%s/%s\\(.+\\)" url-head id date) url) (add-to-list url-list url))) (if (save-excursion (shr-next-link)) (lists-mbox-recursive-url-list url-head id date) url-list))
>   lists-mbox-recursive-url-list("https://lists.gnu.org" "help-gnu-emacs" 201)
>
> ...
>
> What does "error: (setting-constant nil)" mean?

You are trying to mutate a constant (the symbol `nil').

You need to quote the list variable:

(add-to-list 'url-list url)
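
To make the difference concrete, here is a small sketch using a hypothetical special variable ~demo-url-list~. ~add-to-list~ is an ordinary function, so an unquoted argument is evaluated before the call, and a variable holding nil hands it the symbol ~nil~:

#+begin_src elisp

(defvar demo-url-list nil
  "Hypothetical special variable used only for this illustration.")

;; Quoted: `add-to-list' receives the symbol and updates its value.
(add-to-list 'demo-url-list "https://lists.gnu.org/a")
(add-to-list 'demo-url-list "https://lists.gnu.org/a") ; duplicate, ignored

;; Unquoted: the variable is evaluated first, so the function is asked
;; to set the constant `nil'.
(defvar demo-error
  (condition-case err
      (let ((empty nil))
        (add-to-list empty "https://lists.gnu.org/a"))
    (setting-constant err)))

;; demo-url-list => ("https://lists.gnu.org/a")
;; demo-error    => (setting-constant nil)

#+end_src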

Another problem with your code is that you are still using function
recursion.

What I was suggesting was something like this:

(defun lists-mbox-recursive-url-list (url-list)
  (dolist (url url-list)
    ;; get the url's content
    ;; do whatever you want with the url's content (save it, ...)
    ;; for each URL of interest inside the content:
      ;; The `t' at the end of add-to-list means to append the new element:
      (add-to-list 'url-list new-url t)))

(lists-mbox-recursive-url-list (list "https://lists.gnu.org/whatever"))

You need to adapt the above to your specific requirements (build the URL
depending on the mailing list, date, etc) but the general structure of
the task is there.

See how there is no function recursion, so no problem with
max-lisp-eval-depth. And add-to-list checks that the element you are
adding is not already in the list, so no problem with cyclic references:
you visit each URL only once.
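
One caveat, stated as an assumption rather than a certainty: when appending, add-to-list seems to build a new list instead of extending the old one in place, so a dolist that is already walking the original list may never see the added elements. A variant that avoids the question keeps an explicit queue and a separate list of seen URLs. In this sketch ~demo-crawl~, ~links-fn~ and ~demo-graph~ are hypothetical names, and the link graph is made up to stand in for real pages:

#+begin_src elisp

(defun demo-crawl (start links-fn)
  "Visit START and every URL reachable from it, each exactly once.
LINKS-FN is a hypothetical function that takes a URL and returns
the list of URLs found on that page.  Return URLs in visit order."
  (let ((queue (list start))   ; URLs still to process
        (seen (list start))    ; every URL ever queued, to break cycles
        (visited '()))
    (while queue
      (let ((url (pop queue)))
        (push url visited)
        (dolist (new (funcall links-fn url))
          (unless (member new seen)
            (push new seen)
            ;; Append so the queue is processed first-in, first-out.
            (setq queue (append queue (list new)))))))
    (nreverse visited)))

;; A made-up link graph with cycles (a -> b -> a, d -> a).
(defvar demo-graph '(("a" "b" "c") ("b" "a" "d") ("c") ("d" "a")))

(demo-crawl "a" (lambda (url) (cdr (assoc url demo-graph))))
;; => ("a" "b" "c" "d")

#+end_src

The loop ends when the queue is empty, so the nesting depth stays constant however many pages are visited.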




* Re: functions to download mailing list archives
  2022-06-13 16:46     ` Óscar Fuentes
@ 2022-06-13 18:41       ` GH
  2022-06-13 22:09         ` Óscar Fuentes
  0 siblings, 1 reply; 11+ messages in thread
From: GH @ 2022-06-13 18:41 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: Help GNU Emacs

Óscar Fuentes <ofv@wanadoo.es> writes:
> What I was suggesting was something like this:

> (defun lists-mbox-recursive-url-list (url-list)
>   (dolist (url url-list)
>     ;; get the url's content

Yes, but the problem now is getting the urls: all the mbox archive urls.

>     ;; do whatever you want with the url's content (save it, ...)
>     ;; for each URL of interest inside the content:
>       ;; The `t' at the end of add-to-list means to append the new element:
>       (add-to-list 'url-list new-url t)))

> (lists-mbox-recursive-url-list (list "https://lists.gnu.org/whatever"))

I use recursion to get urls




* Re: functions to download mailing list archives
  2022-06-13 18:41       ` GH
@ 2022-06-13 22:09         ` Óscar Fuentes
  2022-06-14  9:58           ` GH
  0 siblings, 1 reply; 11+ messages in thread
From: Óscar Fuentes @ 2022-06-13 22:09 UTC (permalink / raw)
  To: GH; +Cc: Help GNU Emacs

GH <project@gnuhacker.org> writes:

> Óscar Fuentes <ofv@wanadoo.es> writes:
>> What I was suggesting was something like this:
>
>> (defun lists-mbox-recursive-url-list (url-list)
>>   (dolist (url url-list)
>>     ;; get the url's content
>
> Yes, but the problem now is getting the urls: all the mbox archive urls.

You already solved that problem in your previous code, didn't you? In my
proposed variant it works the same way.

>>     ;; do whatever you want with the url's content (save it, ...)
>>     ;; for each URL of interest inside the content:
>>       ;; The `t' at the end of add-to-list means to append the new element:
>>       (add-to-list 'url-list new-url t)))
>
>> (lists-mbox-recursive-url-list (list "https://lists.gnu.org/whatever"))
>
> I use recursion to get urls

No, you use recursion to process the pages pointed to by the urls you got
from a given page.

Now, instead of recursion you add the urls to a list, which acts as a
queue, and get the urls to be processed from that list at the same time
you add new urls to the end of the list.




* Re: functions to download mailing list archives
  2022-06-13 22:09         ` Óscar Fuentes
@ 2022-06-14  9:58           ` GH
  2022-06-14 10:26             ` Emanuel Berg
  0 siblings, 1 reply; 11+ messages in thread
From: GH @ 2022-06-14  9:58 UTC (permalink / raw)
  To: Óscar Fuentes; +Cc: Help GNU Emacs

Óscar Fuentes <ofv@wanadoo.es> writes:
> GH <project@gnuhacker.org> writes:

> You already solved that problem in your previous code, didn't you? In my
> proposed variant it works the same way.

>> I use recursion to get urls

> No, you use recursion to process the pages pointed to by the urls you got
> from a given page.

> Now, instead of recursion you add the urls to a list, which acts as a
> queue, and get the urls to be processed from that list at the same time
> you add new urls to the end of the list.

I don't know how to do it without recursion. If you know how, please write the function.




* Re: functions to download mailing list archives
  2022-06-14  9:58           ` GH
@ 2022-06-14 10:26             ` Emanuel Berg
  0 siblings, 0 replies; 11+ messages in thread
From: Emanuel Berg @ 2022-06-14 10:26 UTC (permalink / raw)
  To: help-gnu-emacs

GH wrote:

>>> I use recursion to get urls
>>
>> No, you use recursion to process the pages pointed to by the
>> urls you got from a given page.
>>
>> Now, instead of recursion you add the urls to a list, which
>> acts as a queue, and get the urls to be processed from that
>> list at the same time you add new urls to the end of
>> the list.
>
> I don't know how to do it without recursion. If you know how,
> please write the function.

Iteration (a loop).

-- 
underground experts united
https://dataswamp.org/~incal





* Re: functions to download mailing list archives
  2022-06-12 22:30 functions to download mailing list archives GH
  2022-06-12 23:04 ` Óscar Fuentes
@ 2022-06-16  0:13 ` GH
  2022-06-16  4:23   ` Emanuel Berg
  1 sibling, 1 reply; 11+ messages in thread
From: GH @ 2022-06-16  0:13 UTC (permalink / raw)
  To: Help GNU Emacs


Fixed without recursion

#+begin_src elisp

;;; Code under GPLv3-or-later

(defun lists-mbox-url-list (url-head id date)
  (with-current-buffer "*eww*"
    (beginning-of-buffer)
    (let ((url-list '()))
      (while (save-excursion
	       (text-property-search-forward 'shr-url nil nil t))
	(shr-next-link)
	(let ((url (thing-at-point 'url)))
	  (if (and url
		   (string-match (format "%s/archive/mbox/%s/%s\\(.+\\)" url-head id date)
				 url))
	      (add-to-list 'url-list url))))
      url-list)))

(defun lists-mbox-download (url-head id)
  (let* ((date (read-number "Date filter: " 20))
	 (dir (read-file-name "Download directory: "))
	 (url-list (lists-mbox-url-list url-head id date)))
    (dolist (url url-list)
      (url-retrieve url #'eww-download-callback (list url dir)))))

(defun lists-load-archive (url-head id)
  (interactive (list "https://lists.gnu.org"
		     (read-from-minibuffer "Mailing list id: ")))
  (if (y-or-n-p "Load as mbox files?")
      (prog1 (eww-browse-url (format "%s/archive/mbox/%s/" url-head id))
	(if (y-or-n-p "Download archive mboxes?")
	    (lists-mbox-download url-head id)))
    (eww-browse-url (format "%s/archive/html/%s/" url-head id))))

#+end_src

Test it:

(lists-load-archive "https://lists.gnu.org" "help-gnu-emacs")

Or call it interactively:

M-x lists-load-archive RET help-gnu-emacs RET y y




* Re: functions to download mailing list archives
  2022-06-16  0:13 ` GH
@ 2022-06-16  4:23   ` Emanuel Berg
  2022-06-16 12:27     ` GH
  0 siblings, 1 reply; 11+ messages in thread
From: Emanuel Berg @ 2022-06-16  4:23 UTC (permalink / raw)
  To: help-gnu-emacs

GH wrote:

> Fixed without recursion

Uh oh, I smell trouble! And the byte compiler says ...

  Warning: ‘beginning-of-buffer’ is for interactive use only;
  use ‘(goto-char (point-min))’ instead.

  Error: ‘add-to-list’ can’t use lexical var ‘url-list’; use
  ‘push’ or ‘cl-pushnew’

  Warning: the function ‘eww-download-callback’ is not known
  to be defined.

  Warning: the function ‘shr-next-link’ is not known to
  be defined.

Actually I don't know if you were using lexical binding; anyway, you
should be, regardless ...

-- 
underground experts united
https://dataswamp.org/~incal





* Re: functions to download mailing list archives
  2022-06-16  4:23   ` Emanuel Berg
@ 2022-06-16 12:27     ` GH
  0 siblings, 0 replies; 11+ messages in thread
From: GH @ 2022-06-16 12:27 UTC (permalink / raw)
  To: help-gnu-emacs

Emanuel Berg <incal@dataswamp.org> writes:

> Uh oh, I smell trouble! And the byte compiler says ...

>   Warning: ‘beginning-of-buffer’ is for interactive use only;
>   use ‘(goto-char (point-min))’ instead.

>   Error: ‘add-to-list’ can’t use lexical var ‘url-list’; use
>   ‘push’ or ‘cl-pushnew’

> Actually I don't know if you were using lexical binding; anyway, you
> should be, regardless ...

Yes, I use lexical binding. Fixed, thanks.

#+begin_src elisp

;;; -*- lexical-binding: t; -*-

(defun lists-mbox-url-list (url-head id date)
  (with-current-buffer "*eww*"
    (goto-char (point-min))
    (let ((url-list '()))
      (while (save-excursion
	       (text-property-search-forward 'shr-url nil nil t))
	(shr-next-link)
	(let ((url (thing-at-point 'url)))
	  (if (and url
		   (string-match (format "%s/archive/mbox/%s/%s\\(.+\\)" url-head id date)
				 url))
	      (push url url-list))))
      url-list)))

#+end_src
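
Note that ~push~ differs from ~add-to-list~ in two ways: it does not skip duplicates, and the resulting list is in reverse order of collection. If that matters, the ~cl-pushnew~ form suggested by the byte compiler can be combined with ~nreverse~. A minimal sketch, with ~demo-collect-unique~ as a hypothetical helper name:

#+begin_src elisp

(require 'cl-lib)

(defun demo-collect-unique (urls)
  "Return URLS without duplicates, in their original order."
  (let ((result '()))
    (dolist (url urls)
      ;; `cl-pushnew' compares with `eql' by default; strings need `equal'.
      (cl-pushnew url result :test #'equal))
    (nreverse result)))

(demo-collect-unique '("u1" "u2" "u1" "u3"))
;; => ("u1" "u2" "u3")

#+end_src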

> Warning: the function ‘eww-download-callback’ is not known to be
> defined.

Huh? It is a vanilla function:

eww-download-callback is a compiled Lisp function in ‘eww.el’.

(eww-download-callback STATUS URL DIR)

> Warning: the function ‘shr-next-link’ is not known to be defined.

shr-next-link is an interactive compiled Lisp function in ‘shr.el’.

(shr-next-link)

Skip to the next link.



Now it compiles without warnings for me.
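
A likely explanation for the "not known to be defined" warnings, offered as an assumption: the byte compiler only knows functions from libraries that are loaded when it compiles the file, and a fresh session may not have loaded eww.el and shr.el yet. Requiring both at the top of the file is one way to make the result independent of the session:

#+begin_src elisp

;; Load the libraries at compile time and at run time, so the byte
;; compiler knows the functions they define.
(require 'shr)   ; defines `shr-next-link'
(require 'eww)   ; defines `eww-download-callback'

;; Both functions are now defined:
(list (fboundp 'shr-next-link) (fboundp 'eww-download-callback))

#+end_src

~declare-function~ is an alternative when loading the whole library at compile time is not wanted.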



