bug-guix
bug#52338: Crawler bots are downloading substitutes


From: Mark H Weaver
Subject: bug#52338: Crawler bots are downloading substitutes
Date: Fri, 10 Dec 2021 16:21:11 -0500

Hi Leo,

Leo Famulari <leo@famulari.name> writes:

> I noticed that some bots are downloading substitutes from
> ci.guix.gnu.org.
>
> We should add a robots.txt file to reduce this waste.
>
> Specifically, I see bots from Bing and Semrush:
>
> https://www.bing.com/bingbot.htm
> https://www.semrush.com/bot.html

For what it's worth: during the years that I administered Hydra, I found
that many bots disregarded the robots.txt file that was in place there.
In practice, I found that I needed to periodically scan the access logs
for bots and forcefully block their requests in order to keep Hydra from
becoming overloaded with expensive queries from bots.
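
As a rough illustration of that kind of periodic log scan, here is a minimal
Python sketch. It is not the procedure that was used on Hydra; it assumes the
web server writes the common "combined" log format (user agent as the last
quoted field), and the marker list and function name are hypothetical.

```python
import re
from collections import Counter

# Assumption: "combined" access log format, where the user agent is the
# last double-quoted field on each line.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')

# Hypothetical list of lowercase substrings that identify a crawler.
BOT_MARKERS = ("bingbot", "semrushbot")


def count_bot_requests(lines, markers=BOT_MARKERS):
    """Return a Counter mapping each matched marker to its request count."""
    counts = Counter()
    for line in lines:
        match = USER_AGENT_RE.search(line)
        if match is None:
            continue  # line does not end with a quoted field; skip it
        agent = match.group(1).lower()
        for marker in markers:
            if marker in agent:
                counts[marker] += 1
                break  # count each request at most once
    return counts
```

Passing an open log file to count_bot_requests gives a per-bot request
count, which can help decide which crawlers are worth blocking at the web
server or firewall. Bots that send a misleading user agent will not be
caught this way.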

     Regards,
       Mark




