Google Crawling

What is Google crawling?

Crawling is the process by which robots browse websites, following links from page to page in order to discover and index the content they encounter. When Google's robot, named Googlebot, discovers a new page to explore, it first checks the directives in the site's robots.txt file to confirm that it is allowed to access the page. If it is authorized, it examines the page; otherwise, it ignores it.

Why is it important that Googlebot can access your page?

If Googlebot can't crawl a page, it won't know the page's content and will generally end up not indexing it, although the page may still be indexed through links pointing to it. In some cases, blocking crawling is actually desirable: when part of your site should not be explored, whether for security reasons or because it has no value in search results, it is important to block the crawling of those pages.
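For example, a site might keep its back-office out of crawling. A minimal sketch, using a hypothetical /admin/ path as the section to protect:

user-agent: googlebot

disallow: /admin/

With this rule in place, Googlebot will not explore any URL whose path starts with /admin/, while the rest of the site remains crawlable.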

How can I check whether Googlebot has permission to crawl my page?

You can check this directly in the robots.txt file. To allow or forbid Googlebot from crawling a page, add an "allow" or "disallow" directive under the "user-agent: googlebot" group in the robots.txt file.

Example 1: Allowing Googlebot to crawl a page

user-agent: googlebot 

allow: /my-page

Example 2: Forbidding Googlebot from crawling a page

user-agent: googlebot

disallow: /my-page
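Beyond reading robots.txt by hand, you can also test the rules programmatically. Here is a minimal sketch using Python's standard-library urllib.robotparser; the robots.txt content and the example.com URLs below are placeholders mirroring the two examples above:

```python
from urllib import robotparser

# Hypothetical robots.txt content combining the examples above.
ROBOTS_TXT = """\
user-agent: googlebot
allow: /my-page
disallow: /private/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch(user_agent, url) answers: may this agent crawl this URL?
print(parser.can_fetch("Googlebot", "https://example.com/my-page"))    # True
print(parser.can_fetch("Googlebot", "https://example.com/private/x"))  # False
```

In a real check you would point the parser at the live file with set_url("https://example.com/robots.txt") followed by read(), instead of parsing an inline string. Note that this shows how a standards-compliant parser interprets the rules; Googlebot uses Google's own parser, which follows the same robots.txt conventions.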