What is robots.txt?
The robots.txt file is a plain-text file placed at the root of your website that tells web crawlers (robots) which pages or sections they may and may not access. It follows the Robots Exclusion Protocol (REP). When a well-behaved crawler such as Googlebot visits your site, it first fetches yourdomain.com/robots.txt and applies those directives before crawling any page. Compliance is voluntary, though: robots.txt is a request, not an access control, and it cannot stop a crawler that chooses to ignore it.
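For example, a minimal robots.txt might look like this sketch (the paths are placeholders for illustration):

    # Applies to all crawlers
    User-agent: *
    # Block this directory...
    Disallow: /private/
    # ...but allow one page inside it
    Allow: /private/overview.html

The User-agent line selects which crawlers the group applies to (* matches all), and each Disallow or Allow rule is matched against URL paths relative to the site root.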
A well-configured robots.txt helps you: prevent crawlers from wasting crawl budget on admin pages, staging URLs, or duplicate content; keep crawlers out of sensitive areas (though, as the FAQ below explains, this alone does not keep those URLs out of the index); and specify the location of your XML sitemap for faster discovery.
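Putting those uses together, a typical configuration might look like the following sketch (the directory names, domain, and sitemap path are illustrative assumptions, not fixed conventions):

    # Keep all crawlers out of admin, staging, and duplicate-content areas
    User-agent: *
    Disallow: /admin/
    Disallow: /staging/
    Disallow: /print/

    # Tell crawlers where to find the XML sitemap (must be an absolute URL)
    Sitemap: https://yourdomain.com/sitemap.xml

The Sitemap directive stands outside any User-agent group and applies to every crawler that supports it.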
FAQ
Does robots.txt prevent Google from indexing pages?
No! Disallowing a URL in robots.txt only prevents crawling, not indexing. If other pages link to a disallowed URL, Google may still index it without ever crawling it. To prevent indexing, use the <meta name="robots" content="noindex"> tag on the page itself, and make sure that page is not disallowed in robots.txt: Google has to crawl the page to see the noindex directive.
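In HTML, the tag belongs in the page's <head>; a minimal sketch:

    <head>
      <!-- Ask compliant search engines not to index this page -->
      <meta name="robots" content="noindex">
    </head>

For non-HTML resources such as PDFs, the equivalent X-Robots-Tag: noindex HTTP response header achieves the same effect.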
What is crawl-delay?
Crawl-delay instructs bots to wait a specified number of seconds between requests, which can reduce server load from aggressive crawlers. Note that Googlebot ignores crawl-delay in robots.txt; use Google Search Console to manage Google's crawl rate instead.
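For crawlers that do honor the directive (Bingbot, for example, supports it), it sits inside a User-agent group; the bot name and delay value below are illustrative:

    # Ask Bingbot to pause 10 seconds between requests
    User-agent: Bingbot
    Crawl-delay: 10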