emacs-orgmode@gnu.org archives
From: Max Nikulin <manikulin@gmail.com>
To: emacs-orgmode@gnu.org
Subject: Re: Auto-checking dead links in the manual (was: http: links in the manual)
Date: Mon, 22 Aug 2022 22:29:10 +0700
Message-ID: <te07c9$obv$1@ciao.gmane.io>
In-Reply-To: <878rnhknaz.fsf@localhost>

On 22/08/2022 09:46, Ihor Radchenko wrote:
> Juan Manuel Macías writes:
> 
>> Maybe, instead of repairing the links manually, we could think of some
>> code that would do this work periodically, and also check the health of
>> the links, running a url request on each link and returning a list of
>> broken links. I don't know if it is possible to do something like that
>> in Elisp, as I don't have much experience with web and link issues. I
>> think there are also external tools, like Selenium Web Driver, but my
>> experience with it is very limited (I use Selenium from time to time
>> when I want to take a screenshot of a web page).
> 
> This is a good idea.
> 
> Selenium is probably an overkill since we should better not link JS-only
> websites from the manual anyway. What we can do instead is a make target
> that will use something like wget.
> 
> Patches are welcome!

I hope that Selenium is overkill for now. However, more sites are
starting to use anti-DDoS shields like Cloudflare, and an HTTP client may
be banned just because it does not fetch other resources such as JS
scripts.

I do not have a patch, just an idea: an export backend that ignores
everything besides links and either sends requests from Lisp code or
generates a file for another tool.
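
The "generate a file for another tool" variant can be illustrated with a
rough sketch. It is written in Python rather than as a real Org export
backend, and a regexp stands in for proper Org parsing, so it only sees
bracket links. The names collect_urls and write_url_list are hypothetical
and chosen for illustration only.

```python
import re

# Matches Org bracket links: [[target]] or [[target][description]].
# Assumption: descriptions contain no nested brackets; plain URLs that are
# not wrapped in [[...]] are ignored by this sketch.
ORG_LINK = re.compile(r"\[\[([^\]\[]+)\](?:\[[^\]]*\])?\]")


def collect_urls(org_text):
    """Return unique http(s) link targets in order of first appearance."""
    seen = []
    for target in ORG_LINK.findall(org_text):
        if target.startswith(("http://", "https://")) and target not in seen:
            seen.append(target)
    return seen


def write_url_list(org_text, path):
    """Write one URL per line so that an external checker can read the file."""
    urls = collect_urls(org_text)
    with open(path, "w", encoding="utf-8") as out:
        out.writelines(url + "\n" for url in urls)
    return urls
```

A file produced this way could then be handed to an external tool, for
example "wget --spider -i urls.txt". Links such as "info:" are skipped
here because their final URL is only known after export.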

#+attr_linklint: ...

may be used to specify a regexp that the target page is expected to
match. There are some complications, e.g. "info:" links have special code
that generates HTML with a URL derived from the original path. So it may
be more robust to parse the exported HTML document (without checking the
text of the linked documents).
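
A minimal sketch of the "parse the exported HTML" approach, in Python and
using only the standard library. The names LinkCollector, extract_links,
fetch_text and check_link are hypothetical. The optional "expected"
regexp is an assumption as well: in exported HTML it would have to be
supplied separately, since the #+attr_linklint value is not available
there.

```python
import re
import urllib.request
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute http(s) targets of <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip in-page anchors, relative links, mailto: and similar.
        if href and href.startswith(("http://", "https://")):
            self.links.append(href)


def extract_links(html_text):
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


def fetch_text(url, timeout=10):
    """Download a page; HTTP errors such as 404 raise an exception."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def check_link(url, expected=None, fetch=fetch_text):
    """Return None if the link looks alive, else a short problem description."""
    try:
        body = fetch(url)
    except Exception as err:  # broad on purpose: any failure marks the link
        return f"request failed: {err}"
    if expected is not None and not re.search(expected, body):
        return f"page does not match {expected!r}"
    return None
```

The fetch argument is injectable so that the checker can be tried without
network access. A plain client like this one may still be rejected by
sites behind anti-DDoS shields, so a reported failure does not always
mean that the link is dead.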



