From: Ihor Radchenko
To: Max Nikulin
Cc: emacs-orgmode@gnu.org
Subject: Re: Auto-checking dead links in the manual (was: http: links in the manual)
References: <87iln2dckv.fsf@posteo.net> <87y1vx4p7g.fsf@localhost> <87sflwqnnk.fsf@posteo.net> <87tu67o42v.fsf@localhost> <87wnb1ly3b.fsf@posteo.net> <878rnhknaz.fsf@localhost>
Date: Tue, 23 Aug 2022 10:53:10 +0800
Message-ID: <87k06z8ycp.fsf@localhost>

Max Nikulin writes:

> I hope that selenium is currently overkill, however more sites are
> starting to use anti-DDOS shields like cloudflare and HTTP client may be
> banned just because it does not fetch other resources like JS scripts.

Such links are to be considered dead for the purposes of the Org manual.
We must not link to websites that cannot be opened without running non-free
JS. This is in accordance with GNU Documentation Standards.

> I do not have a patch, just an idea: export backend that ignores
> everything besides link and either send requests from lisp code or
> generate file for another tool.
>
> #+attr_linklint: ...
>
> may be used to specify regexp that target page is expected to contain.
> There are some complications like e.g. "info:" links having special code
> to generate HTML with URL derived from original path. So it may be more
> robust to parse HTML document (without checking of linked document text).

Yes, the most robust way will be to simply extract links from the HTML
version of the manual and test them using whatever method is appropriate.

-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode, or support my work at https://liberapay.com/yantar92