Lately we have often come across references to the robots meta tag, particularly after the changes Google made to the nofollow rel attribute, which also affect meta robots, as we discussed in our previous insights. It is therefore worth dwelling a bit more on these directives.

What are robots meta tags

The term robots meta tag refers to labels or instructions inside a page's HTML code that are addressed to search engine crawlers in order to control their behaviour, Googlebot's in particular. They are among the commands officially supported by Google and influence the way the crawler scans and indexes the content it discovers on web pages.

Robots meta tags and the X-Robots-Tag

Alongside the tags placed in the HTML page there is another kind of robots directive: the X-Robots-Tag, which the web server sends as an HTTP header in the response for a specific URL. Crawlers follow the instructions either way; the only thing that changes is the method used to communicate them. The X-Robots-Tag is nevertheless useful when the site contains non-HTML resources, such as images and PDFs, which cannot carry a meta tag.
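As a rough illustration of the header approach, the Python sketch below decides which extra HTTP header to attach to a response depending on the file type. The function name robots_headers, the list of extensions and the default directives are our own assumptions for the example, not part of any server or framework API; in practice the same result is usually obtained through the web server configuration.

```python
# Hypothetical helper: returns the extra HTTP headers to send for a given path.
# The extension list and default directives are illustrative assumptions.
NON_HTML_EXTENSIONS = (".pdf", ".png", ".jpg", ".jpeg", ".gif")


def robots_headers(path, directives=("noindex", "nofollow")):
    """Return an X-Robots-Tag header for non-HTML files, nothing otherwise."""
    if path.lower().endswith(NON_HTML_EXTENSIONS):
        # Multiple directives are comma-separated, as in the meta tag.
        return {"X-Robots-Tag": ", ".join(directives)}
    # HTML pages can carry the instructions in a meta tag instead.
    return {}


pdf_headers = robots_headers("/docs/report.pdf")
html_headers = robots_headers("/index.html")
```

Here pdf_headers holds the header that would accompany the PDF, while html_headers is empty because an HTML page can declare the same instructions in its own code.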

The difference between robots meta tags and robots.txt

At this point it is worth clarifying the difference between robots meta tags and the robots.txt file: the latter is a single document containing crawling instructions that cover individual pages or entire site folders, while the tag's instructions are specific to each page and its content, which makes them more targeted and precise.

In fact, since September 2019 the noindex rule in robots.txt has been definitively dropped: it was never officially supported by Google, yet it was very popular among webmasters. It was also clarified that Googlebot does not honour nofollow or crawl-delay rules in robots.txt. The recommended way to keep a page out of Google's index, or to remove its URL from it, is a noindex directive in the robots meta tag.
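To make the distinction more concrete, here is a minimal Python sketch based on the standard library's urllib.robotparser module. The robots.txt rules and the example.com URLs are invented for illustration, and the parser only approximates how a real crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: it can only allow or disallow crawling of paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# robots.txt answers one question: may this user-agent fetch this URL?
private_ok = parser.can_fetch("Googlebot", "https://example.com/private/page.html")
blog_ok = parser.can_fetch("Googlebot", "https://example.com/blog/post.html")

# Indexing, instead, is controlled page by page with a tag in the HTML.
NOINDEX_TAG = '<meta name="robots" content="noindex">'
```

In this sketch private_ok is False and blog_ok is True: the file governs crawling of whole paths, while the noindex instruction lives inside each page that should stay out of the index.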

How to use robots meta tags

The robots meta tag is part of a web page's HTML code and appears as an element inside the <head> section. It can contain generic instructions, aimed at all search engine crawlers, or be addressed to a specific user-agent, such as Googlebot. It is also possible to use multiple directives on a single page by separating them with commas, as long as they are addressed to the same robot.

If instead we want to give different commands to different search user-agents, we should use a separate tag for each bot.
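The sketch below shows what such tags can look like inside a page and how they can be read back with Python's standard html.parser module. The sample page, the class name RobotsMetaCollector and the short list of bot names are assumptions made for the example: one generic tag carries two comma-separated directives, and a second tag addresses Googlebot only.

```python
from html.parser import HTMLParser

# Sample page: one generic tag with two directives, one tag for Googlebot only.
PAGE = """\
<html>
<head>
  <title>Example page</title>
  <meta name="robots" content="noindex, nofollow">
  <meta name="googlebot" content="nosnippet">
</head>
<body></body>
</html>
"""

# Names we look for; a real audit would likely cover more user-agents.
BOT_NAMES = ("robots", "googlebot")


class RobotsMetaCollector(HTMLParser):
    """Collect the directives of each robots-related meta tag."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name not in BOT_NAMES:
            return
        # Directives addressed to the same bot are separated by commas.
        values = [v.strip().lower() for v in (attr.get("content") or "").split(",")]
        self.directives.setdefault(name, []).extend(v for v in values if v)


collector = RobotsMetaCollector()
collector.feed(PAGE)
rules = collector.directives
```

After parsing, rules maps each addressed bot to its own list of directives, which mirrors the one-tag-per-bot convention described above.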

What the directives are for

According to Google's guidelines for developers, robots meta tags allow us to “use a granular and specific page approach”, and more specifically to “control how a single page should be indexed and provided to users among Google search results”.

With this tool it is therefore possible to tell Google which resources should not be considered for indexing and ranking, because they serve no purpose for users or are published only for service reasons.

Directives for managing snippets

For some time now, webmasters have also been able to use these commands to “control the indexing and publishing” of Google snippets, that is, the brief text extracts appearing on the SERP that serve to “prove the relevance of a document to the user’s query”.

The robots meta tag directives

We will now try to give a concise but still comprehensive picture of all the instructions that can be entered in these tags. As we will see, with these commands we can not only steer the actual scanning of the site's pages, but also tell the bots how to treat outbound links or how many characters to use for snippets in search results.

  • all – the default value, for pages with no restrictions on indexing and publishing.
  • noindex – prevents the page from appearing in search results.
  • nofollow – tells the bot not to follow the links on the page. Google now treats this instruction more as a hint than as a directive, though.
  • none – equivalent to noindex and nofollow combined.
  • noarchive – prevents the “Cached” link from appearing in search results.
  • nosnippet – prevents a preview snippet from being shown on the SERP. It applies to every type of search result: classic web search, Google Images and Discover.
  • max-snippet:[number] – sets the maximum number of characters to use in a text snippet for this search result, without affecting image or video previews. The instruction is ignored if the number specified cannot be parsed. Two values have a special meaning:
      • 0 corresponds to nosnippet and prevents any snippet from appearing;
      • -1 indicates that there is no length limit for the snippet.

  • max-image-preview:[setting] – sets the maximum size of an image preview on the SERP. The command accepts three values:
      • none prevents any image preview;
      • standard allows a default-size preview;
      • large allows a larger preview, up to the width of the visible area.

  • max-video-preview:[number] – sets the maximum number of seconds of a video to use for a video snippet on the SERP. Two values have a special meaning:
      • 0: at most a static image may be used, in accordance with the max-image-preview setting;
      • -1: no limit whatsoever.

  • notranslate – prevents Google from offering a translation of the page in search results.
  • noimageindex – blocks the indexing of the images on the page.
  • unavailable_after: [date/time] – sets an “expiration date” for a page, which will no longer be displayed on the SERP after the specified date and time.
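To see how these values combine in a single tag, the following Python sketch splits a content string into simple flags (such as noindex) and key:value settings (such as max-snippet:50), then interprets the special values of max-snippet described above. The function names and the returned labels are our own assumptions, and the handling of unparseable or out-of-range numbers is a simplification of the rule that such values are ignored.

```python
def parse_directives(content):
    """Split a robots content string into flags and key:value settings."""
    flags, settings = set(), {}
    for raw in content.split(","):
        item = raw.strip()
        if not item:
            continue
        # Split on the first colon only, so date/time values stay intact.
        key, sep, value = item.partition(":")
        key = key.strip().lower()
        if sep:
            settings[key] = value.strip()
        else:
            flags.add(key)
    return flags, settings


def snippet_limit(settings):
    """Interpret max-snippet: 0 = no snippet, -1 = unlimited, N = N characters."""
    try:
        number = int(settings.get("max-snippet"))
    except (TypeError, ValueError):
        return "default"  # missing or unparseable: treated as ignored
    if number == 0:
        return "no snippet"
    if number == -1:
        return "unlimited"
    if number < -1:
        return "default"  # assumption: other negative values are ignored
    return f"up to {number} characters"


flags, settings = parse_directives(
    "noindex, max-snippet:50, max-image-preview:large, unavailable_after: 2025-12-31"
)
limit = snippet_limit(settings)
```

In this example flags contains noindex, settings holds the three limiting directives, and limit describes a 50-character cap. The date uses a plain ISO format because a format containing commas would clash with the comma separator in this simplified parser.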