Sitemap blocked by robots.txt


Is your website showing a "sitemap blocked by robots.txt" warning?

This guide is for you.

Blocked sitemap URLs are typically caused by an improperly configured robots.txt file.

Whenever you disallow anything, make sure you know exactly what the rule covers. Otherwise, this warning will appear and web crawlers may no longer be able to crawl your site.

A blocked-sitemap error can occur for several reasons, for example a Disallow rule in the robots.txt file that covers the sitemap, or a migration from HTTP to HTTPS.

Here at Ibmi Media, as part of our Server Management Services, we regularly help our customers fix robots.txt-related issues.

In this guide, we look at the steps to fix the "sitemap blocked by robots.txt" error.

Tips to fix the "sitemap blocked by robots.txt" error

To resolve your Website's Sitemap issue, apply the following tips.

1. Setting HTTPS in the robots.txt file

One of our customers recently contacted us about this error and reported that robots.txt had been working fine until an SSL certificate was installed on the website.

For any robots.txt problem, our Support Experts normally start by inspecting the robots.txt file itself.

We look for any Disallow rule that could be causing the error.
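This first check can also be scripted. The sketch below is a minimal example using Python's standard `urllib.robotparser` (the `site_maps()` method needs Python 3.8 or later). The helper name `blocked_sitemaps` and the sample rules are assumptions made for illustration and are not taken from the customer's site. It lists every Sitemap URL declared in a robots.txt whose path is disallowed for a given user agent.

```python
from urllib.robotparser import RobotFileParser


def blocked_sitemaps(robots_txt, user_agent="*"):
    """Return declared Sitemap URLs that the rules disallow for user_agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    sitemaps = parser.site_maps() or []
    return [url for url in sitemaps if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt in which a Disallow rule covers the sitemap itself.
SAMPLE = """\
Sitemap: https://www.mydomain.com/sitemap.xml
User-agent: *
Disallow: /sitemap.xml
Disallow: /app/
"""

if __name__ == "__main__":
    for url in blocked_sitemaps(SAMPLE):
        print("Blocked by robots.txt:", url)
```

Note that `can_fetch()` compares only the path of the URL, so this check would not flag a Sitemap line that still points to HTTP after a move to HTTPS.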

On checking, we found the following in the robots.txt file:

# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
# Website Sitemap
Sitemap: http://www.mydomain.com/sitemap.xml
# Crawlers Setup
User-agent: *
# Allowable Index
Allow: /index.php/blog/
# Directories
Disallow: /404/
Disallow: /app/

Here, the Sitemap URL still uses HTTP.

The customer had told us earlier that the website worked well before SSL was installed.

So, after the SSL installation, the Sitemap URL must also be updated to HTTPS.
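As a rough illustration of that change, the following Python sketch rewrites any `Sitemap:` directive that still uses `http://` so that it uses `https://`. The function name `upgrade_sitemap_urls` is hypothetical, and the sample text is a shortened copy of the file above. It assumes the site is already served over HTTPS at the same host and path.

```python
import re

# Matches the scheme of a "Sitemap:" directive at the start of a line.
# The directive name is case-insensitive, hence re.IGNORECASE.
_SITEMAP_HTTP = re.compile(
    r"^([ \t]*sitemap[ \t]*:[ \t]*)http://",
    re.IGNORECASE | re.MULTILINE,
)


def upgrade_sitemap_urls(robots_txt):
    """Rewrite http:// Sitemap directives to https://, leaving other lines as-is."""
    return _SITEMAP_HTTP.sub(r"\1https://", robots_txt)


BEFORE = """\
# Website Sitemap
Sitemap: http://www.mydomain.com/sitemap.xml
# Crawlers Setup
User-agent: *
Disallow: /404/
Disallow: /app/
"""

if __name__ == "__main__":
    print(upgrade_sitemap_urls(BEFORE))
```

Only lines that begin with the Sitemap directive are touched, so URLs mentioned in comments or in other rules stay unchanged.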

Once this was updated, we advised the customer to wait for the search engine to re-crawl the website.

Finally, this fixed the error.

2. Telling Google to re-crawl the website

We had another customer with the same problem but a different solution.

Let's take a look at it.

Our Support Experts again started by checking the robots.txt file, which contained:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

The file was fine, because the Disallow rule covered a different path from the one that had the problem.
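That conclusion can be double-checked by running the rules against the affected path. This is a minimal sketch with Python's standard `urllib.robotparser`. The sitemap URL and the `/wp-admin/options.php` path are assumed examples, since the customer's actual URLs are not shown.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Assumed sitemap location; it is outside /wp-admin/, so no rule matches it.
sitemap_allowed = parser.can_fetch("Googlebot", "https://www.mydomain.com/sitemap.xml")
# A path under /wp-admin/ is caught by the Disallow rule.
admin_allowed = parser.can_fetch("Googlebot", "https://www.mydomain.com/wp-admin/options.php")

print("sitemap crawlable:", sitemap_allowed)
print("wp-admin crawlable:", admin_allowed)
```

One caveat: Python's parser applies the first matching rule in file order, whereas Google applies the most specific matching rule, so the two can disagree where Allow and Disallow rules overlap (as with `admin-ajax.php` here). For paths outside `/wp-admin/`, no rule matches under either approach.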

We also confirmed with the customer that no Disallow rules had been set for the sitemap.

So we continued troubleshooting by manually asking Google to re-crawl the website.

We did this by navigating to the Search Console property >> Crawl >> Fetch as Google >> Add, entering the URL path that Google was warning about, and then clicking Fetch.

Once the page reloaded, we clicked Request Indexing >> Crawl only this URL.