bug#52338: Crawler bots are downloading substitutes
From: Mark H Weaver
Subject: bug#52338: Crawler bots are downloading substitutes
Date: Fri, 10 Dec 2021 16:21:11 -0500
Hi Leo,
Leo Famulari <leo@famulari.name> writes:
> I noticed that some bots are downloading substitutes from
> ci.guix.gnu.org.
>
> We should add a robots.txt file to reduce this waste.
>
> Specifically, I see bots from Bing and Semrush:
>
> https://www.bing.com/bingbot.htm
> https://www.semrush.com/bot.html
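
[For context, a minimal robots.txt along the lines Leo proposes could look like the following. Whether to disallow everything or only the substitute endpoints is a policy choice; this is a sketch, not the file actually deployed on ci.guix.gnu.org.]

```
# Sketch only: ask all well-behaved crawlers to skip the whole site.
User-agent: *
Disallow: /
```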
For what it's worth: during the years that I administered Hydra, I found
that many bots disregarded the robots.txt file that was in place there.
In practice, I found that I needed to periodically scan the access logs
for bots and forcefully block their requests in order to keep Hydra from
becoming overloaded with expensive queries from bots.
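
[The log-scanning step Mark describes can be sketched roughly as follows, assuming an nginx/Apache combined-format access log in which the user agent is the last quoted field. The log path and sample entries below are invented for illustration; they are not real Hydra logs.]

```shell
#!/bin/sh
# Hypothetical sketch: tally user agents in a combined-format access
# log to spot bots worth blocking at the firewall or web server.
LOG="access.log"

# Invented sample entries standing in for a real access log.
cat > "$LOG" <<'EOF'
1.2.3.4 - - [10/Dec/2021:16:21:11 -0500] "GET /nar/x HTTP/1.1" 200 123 "-" "bingbot/2.0"
5.6.7.8 - - [10/Dec/2021:16:21:12 -0500] "GET /nar/y HTTP/1.1" 200 456 "-" "SemrushBot/7"
9.9.9.9 - - [10/Dec/2021:16:21:13 -0500] "GET /nar/z HTTP/1.1" 200 789 "-" "bingbot/2.0"
EOF

# Split on double quotes; the next-to-last field is the user agent.
# Count requests per user agent, most frequent first.
awk -F'"' '{print $(NF-1)}' "$LOG" | sort | uniq -c | sort -rn
```

[Requesters that dominate the tally and ignore robots.txt would then be blocked explicitly, e.g. by IP range or user agent in the web server configuration.]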
Regards,
Mark