Sitemap blocked by robots.txt








Is your website showing a "Sitemap blocked by robots.txt" warning?

This guide is for you.


Blocked sitemap URLs are typically caused by an improperly configured robots.txt file.

Whenever you disallow anything, make sure you know exactly what the rule covers. Otherwise this warning will appear, and web crawlers may no longer be able to crawl your site.

A blocked sitemap error can occur for several reasons, such as a disallow rule in the robots.txt file that covers the sitemap, or a migration from HTTP to HTTPS.

Here at Ibmi Media, as part of our Server Management Services, we regularly help our customers fix robots.txt-related issues.

In this context, we shall look into the steps to fix the sitemap blocked by robots.txt error.


Tips to fix the sitemap blocked by robots.txt error

To resolve your website's sitemap issue, apply the following tips.


1. Setting HTTPS in the robots.txt file

One of our customers recently contacted us with this error, reporting that robots.txt had been working fine until an SSL certificate was installed on the website.

For any robots.txt trouble, our Support Experts normally start troubleshooting by checking the robots.txt file itself.

We look for any disallow rule that could be causing the error.

On checking, we found the following in the robots.txt file:

# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
# Website Sitemap
Sitemap: http://www.mydomain.com/sitemap.xml
# Crawlers Setup
User-agent: *
# Allowable Index
Allow: /index.php/blog/
# Directories
Disallow: /404/
Disallow: /app/

Here, the sitemap URL still uses HTTP.

The customer had told us that the website worked well before SSL was installed.

Once SSL is installed and the site is served over HTTPS, the Sitemap URL must also be updated to HTTPS.

Once this was updated, we suggested that the customer wait for the search engine to re-crawl the website.

This fixed the error.
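
As a rough illustration of the fix above, the following Python sketch rewrites any Sitemap line that still points to http:// so that it uses https://. The helper name upgrade_sitemap_lines and the sample text are assumptions made for this example; they are not part of the customer's setup, and on a real site you would edit the robots.txt file itself.

```python
import re

def upgrade_sitemap_lines(robots_txt: str) -> str:
    """Switch 'Sitemap: http://...' lines to https://, leaving all other lines untouched."""
    # Hypothetical helper for illustration; matches the Sitemap directive case-insensitively.
    pattern = re.compile(r"^(\s*sitemap\s*:\s*)http://", re.IGNORECASE | re.MULTILINE)
    return pattern.sub(r"\1https://", robots_txt)

# Shortened version of the robots.txt shown above.
sample = """\
# Website Sitemap
Sitemap: http://www.mydomain.com/sitemap.xml
User-agent: *
Disallow: /404/
"""

fixed = upgrade_sitemap_lines(sample)
print(fixed)
```

Only the scheme of the Sitemap line changes; the Allow and Disallow rules are left as they are.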


2. Telling Google to re-crawl the website

We had another customer with the same problem but a different solution.

Let's take a look at it.

Our Support Experts started troubleshooting by checking the robots.txt file, which contained:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

The file was fine: the only disallow rule covered /wp-admin/, not the path that was reported as blocked.

We also confirmed with the customer that no disallow rule applied to the sitemap itself.
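
A check like this can be reproduced locally. The sketch below uses Python's standard urllib.robotparser module to test URLs against the rules shown above. The domain www.mydomain.com, the sample URLs and the helper is_blocked are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# The rules from the customer's robots.txt shown above.
rules = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

def is_blocked(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the parsed rules forbid `agent` from fetching `url`."""
    return not parser.can_fetch(agent, url)

# Hypothetical URLs on an assumed domain.
sitemap_blocked = is_blocked("https://www.mydomain.com/sitemap.xml")
admin_blocked = is_blocked("https://www.mydomain.com/wp-admin/options.php")
print("sitemap blocked:", sitemap_blocked)
print("wp-admin blocked:", admin_blocked)
```

Note that Python's parser applies the first matching rule in file order, while Google documents a most-specific-match approach, so results for a path such as /wp-admin/admin-ajax.php may differ between the two.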

So we continued troubleshooting by manually asking Google to crawl the website.

In the Search Console property, we went to Crawl >> Fetch as Google, entered the URL path that Google was warning about, and clicked Fetch.

Once the page reloaded, we clicked Request Indexing >> Crawl only this URL.


3. Robots.txt tester to fix sitemap blocked error

We also suggest that our customers use the robots.txt tester to check the warnings and error messages it generates.

It gives a detailed description of each error.
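
To give a feel for what such a tester reports, here is a deliberately simplified Python sketch that finds which Allow or Disallow rule matches a path. It is not the Search Console tester: it assumes a single User-agent group, ignores wildcards, and picks the longest matching prefix. The function name matching_rule is hypothetical.

```python
from typing import Optional, Tuple

def matching_rule(robots_txt: str, path: str) -> Optional[Tuple[str, str]]:
    """Return (directive, rule_path) for the longest rule prefix matching `path`, or None."""
    best = None
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        directive, value = (part.strip() for part in line.split(":", 1))
        directive = directive.lower()
        if directive not in ("allow", "disallow") or not value:
            continue
        # Simplification: plain prefix match, no '*' or '$' handling.
        if path.startswith(value) and (best is None or len(value) > len(best[1])):
            best = (directive, value)
    return best

wp_rules = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""

for test_path in ("/sitemap.xml", "/wp-admin/options.php", "/wp-admin/admin-ajax.php"):
    print(test_path, "->", matching_rule(wp_rules, test_path))
```

A result of None means that no rule touches the path, so a sitemap at that location would not be blocked by this file.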





Conclusion

This article covered tips to resolve the sitemap blocked by robots.txt error, which is generally caused by an improperly configured robots.txt file.

A sitemap is a blueprint of your website that helps search engines find, crawl and index all of your website's content. Sitemaps also tell search engines which pages on your site are most important.

A sitemap is vital for good SEO practices, and SEO is vital in bringing traffic and revenue to the website.

In other words, sitemaps are essential to having search engines crawl and index the website so that its content can be ranked in the search results.

The robots.txt file is usually the first place crawlers visit when accessing a website. Even if you want all robots to have access to every page on your website, it's still good practice to add a robots.txt file that allows this. Robots.txt files should also include the location of another very important file: the XML Sitemap.
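
As a minimal sketch of such a file, the Python snippet below parses an allow-all robots.txt that also declares a sitemap, then reads the sitemap location back with the standard urllib.robotparser module (site_maps() requires Python 3.8 or later). The domain and sitemap URL are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# An allow-all robots.txt: an empty Disallow value blocks nothing.
allow_all = """\
User-agent: *
Disallow:

Sitemap: https://www.mydomain.com/sitemap.xml
"""

allow_parser = RobotFileParser()
allow_parser.parse(allow_all.splitlines())

sitemaps = allow_parser.site_maps()  # list of declared sitemap URLs, or None
everything_allowed = allow_parser.can_fetch("*", "https://www.mydomain.com/any/page.html")
print(sitemaps, everything_allowed)
```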

Crawl-delay in robots.txt:

The Crawl-delay directive is an unofficial directive used to prevent overloading servers with too many requests.

If search engines are able to overload a server, adding Crawl-delay to your robots.txt file is only a temporary fix.
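
For reference, this is roughly how a Crawl-delay line looks and how a crawler written in Python could read it with the standard urllib.robotparser module. The 10-second value and the /app/ rule are illustrative assumptions; because the directive is unofficial, not every crawler honours it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt asking crawlers to wait 10 seconds between requests.
delay_rules = """\
User-agent: *
Crawl-delay: 10
Disallow: /app/
"""

delay_parser = RobotFileParser()
delay_parser.parse(delay_rules.splitlines())

delay = delay_parser.crawl_delay("*")  # requested delay in seconds, or None if not set
print("Crawl-delay:", delay)
```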

