About robots.txt allow and disallow


Are you trying to set up robots.txt allow and disallow rules for your website?

This guide is for you.

Robots.txt is the file defined by the Robots Exclusion Standard.

It is a plain text file that tells search engines how they should crawl the website.

The robots.txt file contains information about how search engines should crawl the site. The directives found there determine what a crawler does next on that particular site.

If the robots.txt file does not contain any directives that disallow a user-agent's activity (or if the site doesn't have a robots.txt file), the crawler proceeds to crawl the rest of the site.

Here at Ibmi Media, as part of our Server Management Services, we regularly help our customers set up robots.txt for their websites and fix related errors.

In this context, we shall explore more on robots.txt.

Robots.txt allow and disallow functionality

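Robots.txt works like a "No Trespassing" sign: it tells robots whether or not we want them to crawl the website.

It does not actually block access. Compliance is voluntary, so crawlers that ignore the standard can still fetch the pages.

The robots.txt file must be placed in the document root folder of the website, so that it is served at /robots.txt.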

Now, let's explore more about how to allow and disallow search engine access to website folders using robots.txt directives.

How to Disallow robots and search engines from crawling?

We can tell search engines which parts or folders of a website they must not access. This is done using the 'disallow' directive.

After the directive, we specify the path or folder name that search engines must not access. If no path or folder is specified, the directive blocks nothing and is effectively ignored.

Here is an example:

User-agent: *
Disallow: /wp-admin/
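To check how a rule like this is interpreted, it can be tested offline. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper build_parser and the sample paths are made up for illustration, and real search engines may interpret rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content given as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The rules from the example above.
blocking = build_parser("User-agent: *\nDisallow: /wp-admin/\n")
print(blocking.can_fetch("*", "/wp-admin/options.php"))  # False
print(blocking.can_fetch("*", "/blog/hello-world/"))     # True

# A Disallow line with no path blocks nothing.
empty = build_parser("User-agent: *\nDisallow:\n")
print(empty.can_fetch("*", "/wp-admin/options.php"))     # True
```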

How to Allow robots and search engines to crawl?

We can also tell search engines which folders or files they may access while crawling the website. This is done using the 'allow' directive.

Using the allow and disallow directives together, we can tell search engines to access only specific files or directories while the rest stays disallowed.

Here is an example:

User-agent: *
Allow: /blog/terms-and-condition.pdf
Disallow: /blog/

Here, search engines will not crawl anything in the /blog/ folder except the file terms-and-condition.pdf.
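The combined allow and disallow rules can be verified the same way. This is a rough sketch with Python's standard urllib.robotparser; the paths /blog/some-post/ and /contact/ are hypothetical. Note that this parser applies the first matching rule in file order, whereas RFC 9309 and Google pick the longest matching rule; for this particular file both approaches give the same answers.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /blog/terms-and-condition.pdf
Disallow: /blog/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from a string, no network access

# Report the verdict for the allowed file, another /blog/ URL, and an unrelated URL.
for path in ("/blog/terms-and-condition.pdf", "/blog/some-post/", "/contact/"):
    verdict = "allowed" if parser.can_fetch("*", path) else "blocked"
    print(f"{path}: {verdict}")
# /blog/terms-and-condition.pdf: allowed
# /blog/some-post/: blocked
# /contact/: allowed
```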

A few common mistakes made while creating robots.txt allow or disallow rules

1. Separate line for each directive while using allow or disallow

When specifying allow or disallow directives, each one must be on a separate line.

One of our customers had added the below code in robots.txt and it was not working:

User-agent: * Disallow: /directory-1/ Disallow: /directory-2/ Disallow: /directory-3/

The above is the incorrect way of mentioning the directives in robots.txt.

We corrected the file by replacing it with the code below:

User-agent: *
Disallow: /directory-1/
Disallow: /directory-2/
Disallow: /directory-3/

After this change, the robots.txt file started working as expected.
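The difference between the two files can be reproduced with a small check. This is only a sketch based on Python's standard urllib.robotparser, with a made-up helper is_allowed and a hypothetical test path; other parsers may handle the malformed line differently, but here the one-line version is read as a single User-agent line with no rules, so nothing is blocked.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, path: str) -> bool:
    """Return True if any crawler ("*") may fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", path)


# Everything on one line: the Disallow directives are swallowed by User-agent.
ONE_LINE = (
    "User-agent: * Disallow: /directory-1/ "
    "Disallow: /directory-2/ Disallow: /directory-3/"
)

# One directive per line: each Disallow is parsed as its own rule.
SEPARATE_LINES = """\
User-agent: *
Disallow: /directory-1/
Disallow: /directory-2/
Disallow: /directory-3/
"""

print(is_allowed(ONE_LINE, "/directory-1/page.html"))        # True (not blocked)
print(is_allowed(SEPARATE_LINES, "/directory-1/page.html"))  # False (blocked)
```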

2. Conflicting directives while using robots.txt

Recently, one of our customers had a robots.txt file with the below code in it:

User-agent: *
Allow: /directory
Disallow: /*.html

Here, the URL http://domain.com/directory.html matches both rules, so it is not clear to search engines whether they are allowed to access it.

So we made the disallow rule more precise by adding the $ end-of-URL anchor:

User-agent: *
Allow: /directory
Disallow: /*.html$

With the above code, the disallow rule applies only to URLs that end with .html.

However, a URL like https://example.com/page/html?lang=en is still accessible, as it doesn't end with .html.
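The effect of the $ anchor can be illustrated with a small matcher. Python's standard robotparser does not handle the * and $ wildcards, so the sketch below is a hand-written approximation that assumes the longest-match rule from RFC 9309 (the longest matching pattern wins, and allow wins a tie). The functions pattern_to_regex and is_allowed and the path /page.html?lang=en are hypothetical, chosen for illustration only.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Apply (directive, pattern) rules to `path` using longest-match.

    Assumption: the longest matching pattern wins, Allow wins a tie,
    and a path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern or not pattern_to_regex(pattern).match(path):
            continue  # empty patterns and non-matching rules are skipped
        is_allow = directive.lower() == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len, allowed = len(pattern), is_allow
    return allowed


ORIGINAL = [("Allow", "/directory"), ("Disallow", "/*.html")]
FIXED = [("Allow", "/directory"), ("Disallow", "/*.html$")]

print(is_allowed(ORIGINAL, "/page.html?lang=en"))  # False: ".html" appears in the URL
print(is_allowed(FIXED, "/page.html?lang=en"))     # True: URL does not end with .html
print(is_allowed(FIXED, "/about.html"))            # False: URL ends with .html
```

Under this longest-match assumption, /directory.html is still matched by both rules and the longer allow pattern wins, so crawlers that follow RFC 9309 would be permitted to fetch it. Crawlers that resolve conflicts differently may behave otherwise.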
