Analyze robots.txt rules and test URL path access
Robots.txt is a plain text file placed at the root of a website (e.g., example.com/robots.txt) that tells search engine crawlers which pages or sections they may or may not crawl. It follows the Robots Exclusion Protocol (standardized in RFC 9309). It matters for SEO because incorrect rules can accidentally block important pages from being crawled, or let crawlers waste crawl budget on unimportant pages. Nearly every website benefits from a well-configured robots.txt file.
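For reference, a minimal robots.txt might look like the following (the paths and sitemap URL are illustrative, not recommendations):

```
# Applies to all crawlers not matched by a more specific block
User-agent: *
Disallow: /admin/
Allow: /admin/help.html

# Overrides the * block for Googlebot only
User-agent: Googlebot
Disallow: /drafts/

Sitemap: https://example.com/sitemap.xml
```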
The checker parses all User-agent blocks and their Allow/Disallow rules from your robots.txt content. When you enter a path and user-agent, it finds the matching User-agent block (or falls back to the * wildcard block). Then it evaluates all rules using standard robots.txt precedence: longer, more specific path patterns take priority over shorter ones. If both Allow and Disallow match with the same specificity, Allow wins per the Google specification.
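The precedence logic described above can be sketched as follows. This is a simplified illustration, not the checker's actual implementation: `rule_matches` and `is_allowed` are hypothetical helpers, and the sketch assumes rules for the relevant User-agent block have already been selected.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Match a robots.txt path pattern, supporting '*' wildcards
    and a trailing '$' end-of-URL anchor."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    # Patterns are anchored at the start of the path.
    return re.match(regex, path) is not None

def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """rules: (directive, pattern) pairs, e.g. ("Disallow", "/private/").
    Longest matching pattern wins; on a tie, Allow beats Disallow."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if not pattern or not rule_matches(pattern, path):
            continue
        if len(pattern) > best_len or (len(pattern) == best_len and directive == "Allow"):
            best_len, allowed = len(pattern), (directive == "Allow")
    return allowed

rules = [("Disallow", "/private/"),
         ("Allow", "/private/public-page.html")]
print(is_allowed(rules, "/private/secret.html"))       # False
print(is_allowed(rules, "/private/public-page.html"))  # True (longer Allow wins)
```

The longer Allow pattern (25 characters) outranks the shorter Disallow (9 characters), so the public page stays crawlable even inside a disallowed directory.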
Robots.txt prevents crawling, not indexing. A page blocked by robots.txt can still appear in search results if other pages link to it — Google may show the URL without a snippet. To truly prevent indexing, use a noindex meta tag or X-Robots-Tag HTTP header. Importantly, the page must be crawlable for Google to see the noindex directive, so do not block pages with robots.txt if you want to use noindex.
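The two noindex mechanisms mentioned above look like this (both fragments are illustrative):

```
<!-- Option 1: meta tag in the page's <head> -->
<meta name="robots" content="noindex">

Option 2: HTTP response header (useful for non-HTML files such as PDFs)
X-Robots-Tag: noindex
```

Either one works only if the crawler can fetch the page, which is why a noindex page must not also be blocked in robots.txt.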