
One easy way to bypass this is to have an external service (archive.org, GitHub Actions, etc.) fetch robots.txt and cache it, then expose the cached copy to the actual scrape server through a separate API, a webhook, or a manual download.

The robots.txt file is usually small, so fetching it wouldn't raise any flags with those external services.
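
A minimal sketch of the idea, using the Wayback Machine's availability API (which is real) to retrieve a cached robots.txt instead of hitting the target server directly; the target URL and bot name here are hypothetical placeholders:

    # Fetch a site's robots.txt via the Wayback Machine instead of
    # requesting it from the target server, then parse the rules locally.
    import json
    import urllib.parse
    import urllib.request
    import urllib.robotparser

    TARGET = "https://example.com/robots.txt"  # hypothetical target

    # Ask archive.org for its most recent snapshot of the robots.txt URL.
    api = "https://archive.org/wayback/available?url=" + urllib.parse.quote(TARGET)
    with urllib.request.urlopen(api) as resp:
        snapshot = json.load(resp)["archived_snapshots"].get("closest")

    if snapshot and snapshot.get("available"):
        # Download the cached copy from archive.org, not from the target.
        with urllib.request.urlopen(snapshot["url"]) as resp:
            body = resp.read().decode("utf-8", errors="replace")

        # Parse the cached rules and check a path against them, so the
        # scrape server never touches the target's robots.txt itself.
        rp = urllib.robotparser.RobotFileParser()
        rp.parse(body.splitlines())
        print(rp.can_fetch("MyBot", "https://example.com/some/page"))

The tradeoff is staleness: the cached copy may lag behind the live file, so the scraper is obeying (or evading detection on) a possibly outdated set of rules.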


