From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mark H Weaver
To: Leo Famulari, 52338@debbugs.gnu.org
Subject: bug#52338: Crawler bots are downloading substitutes
Date: Fri, 10 Dec 2021 16:21:11 -0500
Message-ID: <87r1ak2m1p.fsf@netris.org>
MIME-Version: 1.0
Content-Type: text/plain
List-Id: Bug reports for GNU Guix

Hi Leo,

Leo Famulari writes:

> I noticed that some bots are downloading substitutes from
> ci.guix.gnu.org.
>
> We should add a robots.txt file to reduce this waste.
>
> Specifically, I see bots from Bing and Semrush:
>
> https://www.bing.com/bingbot.htm
> https://www.semrush.com/bot.html

For what it's worth: during the years that I administered Hydra, I found
that many bots disregarded the robots.txt file that was in place there.
In practice, I found that I needed to periodically scan the access logs
for bots and forcefully block their requests in order to keep Hydra from
becoming overloaded with expensive queries from bots.

     Regards,
       Mark
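
A minimal sketch of the kind of access-log scan described above, in
Python. It assumes the web server writes the usual "combined" log format
and that the bots identify themselves in the User-Agent header. The
signature list, the threshold, and the function names are illustrative
assumptions, not what was actually used on Hydra.

```python
import re
from collections import Counter

# Assumed "combined" log format:
#   ip ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical signature list, based on the two bots named in the report;
# extend it with whatever actually shows up in the logs.
BOT_PATTERN = re.compile(r"bingbot|semrushbot", re.IGNORECASE)


def count_bot_requests(lines):
    """Map each client IP to the number of requests sent with a bot User-Agent."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and BOT_PATTERN.search(match.group("agent")):
            hits[match.group("ip")] += 1
    return hits


def block_candidates(lines, threshold=100):
    """Return, sorted, the IPs whose bot request count reaches the threshold."""
    hits = count_bot_requests(lines)
    return sorted(ip for ip, count in hits.items() if count >= threshold)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical log path): python scan_bots.py < access.log
    for ip in block_candidates(sys.stdin):
        print(ip)
```

The addresses it prints would then be handed to whatever blocking
mechanism the server has (firewall rules, web server deny rules). That
step is left out because it depends on the deployment. Matching on
User-Agent only catches bots that announce themselves, so this
complements a robots.txt rather than replacing manual review of the logs.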