To: emacs-orgmode@gnu.org
From: Max Nikulin
Subject: Re: Auto-checking dead links in the manual (was: http: links in the manual)
Date: Mon, 22 Aug 2022 22:29:10 +0700
References: <87iln2dckv.fsf@posteo.net> <87y1vx4p7g.fsf@localhost> <87sflwqnnk.fsf@posteo.net> <87tu67o42v.fsf@localhost> <87wnb1ly3b.fsf@posteo.net> <878rnhknaz.fsf@localhost>

On 22/08/2022 09:46, Ihor Radchenko wrote:
> Juan Manuel Macías writes:
>
>> Maybe, instead of repairing the links manually, we could think of some
>> code that would do this work periodically, and also check the health of
>> the links, running a url request on each link and returning a list of
>> broken links. I don't know if it is possible to do something like that
>> in Elisp, as I don't have much experience with web and link issues. I
>> think there are also external tools, like Selenium Web Driver, but my
>> experience with it is very limited (I use Selenium from time to time
>> when I want to take a screenshot of a web page).
>
> This is a good idea.
>
> Selenium is probably an overkill since we should better not link JS-only
> websites from the manual anyway. What we can do instead is a make target
> that will use something like wget.
>
> Patches are welcome!

I hope that Selenium is overkill for now. However, more sites are
starting to use anti-DDoS shields such as Cloudflare, and an HTTP client
may be banned just because it does not fetch other resources such as JS
scripts.

I do not have a patch, just an idea: an export backend that ignores
everything except links and either sends requests from Lisp code or
generates a file for another tool. "#+attr_linklint: ..." may be used to
specify a regexp that the target page is expected to contain.

There are some complications, e.g. "info:" links have special code that
generates HTML with a URL derived from the original path. So it may be
more robust to parse the exported HTML document (without checking the
text of the linked documents).
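
To make the "parse the exported HTML" variant more concrete, here is a
minimal sketch in Python using only the standard library. It is an
illustration under assumptions, not a proposed patch: the names
LinkCollector, extract_links, fetch and check_link are hypothetical, the
User-Agent string is made up, and the optional "expected" regexp only
stands in for what a "#+attr_linklint:" value might carry.

```python
import re
import urllib.request
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute http(s) targets of <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip in-page anchors, mailto: links and relative paths.
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_links(html_text):
    """Return the unique external links of an HTML document, in order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return list(dict.fromkeys(parser.links))


def fetch(url, timeout=20):
    """Fetch URL and return (status, decoded body). Hypothetical helper."""
    request = urllib.request.Request(url, headers={"User-Agent": "linklint-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.status, response.read().decode(charset, errors="replace")


def check_link(url, expected=None, fetcher=fetch):
    """Return None when the link looks healthy, else a short problem description.

    EXPECTED is an optional regexp the page body must match.
    FETCHER is injectable so the check can be tried without network access.
    """
    try:
        status, body = fetcher(url)
    except OSError as error:  # URLError, HTTPError and timeouts derive from OSError
        return f"request failed: {error}"
    if status != 200:
        return f"unexpected status {status}"
    if expected is not None and re.search(expected, body) is None:
        return f"page does not match {expected!r}"
    return None
```

A driver would read the exported manual, call extract_links on its text
and then check_link on each result, printing the non-None answers. A
reported failure still needs a human look, since a plain HTTP client like
this one may be rejected by the anti-DDoS shields mentioned above even
when the page is fine in a browser.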