Sitemaps play an important role in building a positive dialogue between our site and Googlebot, as we know by now. To limit errors – which can cost us a lot in terms of page indexing and site performance – we can use the tools in Google Search Console, and in particular the Sitemap report. That is what the new episode of the Google Search Console Training web series is about.

A guide to sitemaps

Our guide is – as always – Daniel Waisberg, who begins by describing what a sitemap is and highlighting the main aspects of this file: in short, “it is a signal of which URLs of your site you want Google to crawl”, and it can provide information about newly created or modified URLs.
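As a concrete illustration of the file being described (the URLs and dates below are placeholders, not from the video), a minimal XML sitemap listing two pages with their last-modified dates might look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- A minimal sitemap: one <url> entry per page we want Google to know about -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <!-- lastmod signals when the page was last changed -->
    <lastmod>2020-06-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/about</loc>
    <lastmod>2020-05-15</lastmod>
  </url>
</urlset>
```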

Google also supports four extended syntaxes through which we can provide additional information, useful for describing files and contents that are difficult to analyze and thus improving their indexing: we can describe a URL that includes images or a video, indicate alternative language or geolocalized versions with hreflang annotations, or (for news sites) use a particular variant that signals the latest updates.
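To make two of those extensions concrete, here is a sketch (with placeholder URLs) of a single sitemap entry that declares an embedded image via the image extension namespace and lists alternative language versions with hreflang annotations:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.example.com/page</loc>
    <!-- image extension: an image included in this page -->
    <image:image>
      <image:loc>https://www.example.com/photo.jpg</image:loc>
    </image:image>
    <!-- hreflang annotations: alternative language versions of the same page -->
    <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/page"/>
    <xhtml:link rel="alternate" hreflang="it" href="https://www.example.com/it/page"/>
  </url>
</urlset>
```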

Google and Sitemap

“If I don’t have a sitemap, can Google still find all the pages on my site?”. The Search Advocate also answers this frequent question, explaining that a sitemap may not be necessary if we have a relatively small site with appropriate internal linking between pages, since Googlebot should then be able to discover its content without any problems.

In certain cases, though, a sitemap is useful and even necessary to help Google decide what to crawl on your site and when:

  • If we have a very large site, the file lets us indicate which URLs to prioritize for crawling.
  • If pages are isolated or not well linked to each other.
  • If we have a new site or one whose content changes quickly.

However, the Googler reminds us, using a sitemap does not guarantee that all pages are actually crawled and indexed, although in most cases providing this file to the search engine’s bots brings benefits (and certainly no disadvantages). In addition, sitemaps do not replace normal crawling, and URLs not included in the file are not excluded from it.

How to build a sitemap

Ideally, the CMS that manages the site can produce sitemap files automatically, using plugins or extensions (and we recall the project to integrate sitemaps into WordPress by default); Google itself suggests finding a way to generate sitemaps automatically rather than building them by hand.

Sitemaps have two limits: a maximum number of URLs (50,000 per file) and a maximum size (50 MB uncompressed). If we need more room, we can create multiple sitemaps and submit them together in the form of a Sitemap Index file.
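A Sitemap Index is itself a small XML file that simply points to the individual sitemap files. A sketch with placeholder file names:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- A sitemap index: each <sitemap> entry references one sitemap file -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-posts.xml</loc>
    <lastmod>2020-06-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-pages.xml</loc>
    <lastmod>2020-05-20</lastmod>
  </sitemap>
</sitemapindex>
```

Submitting the index file is then enough for Google to find all the sitemaps it references.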

Daniel Waisberg explains the sitemap report

The Search Console Sitemap Report

To keep these resources under control we can use the Sitemap report in Search Console, which lets us submit a new sitemap for the property, display the submission history, view any errors found during processing, and remove files that are no longer relevant. Removal, however, deletes the sitemap only from Search Console, not from Google’s memory: to truly delete a sitemap we have to remove it from our site and return a 404; after several attempts, Googlebot will stop requesting the file and will no longer update the sitemap.

The tool lets us manage all of the site’s sitemaps, but only those submitted through Search Console: it does not show files discovered via robots.txt or other methods (though these can be submitted in GSC even if already detected).

The Sitemap report contains information on all submitted files, in particular: the URL of the file relative to the property root, its type or format (such as XML, text, RSS or Atom), the submission date, the date Google last read it, the status (of the submission or crawl), and the number of URLs discovered.

How to read the sitemap status

The report indicates three possible statuses for the submission or crawl of the sitemap.
  • Success is the ideal situation: the file was loaded and processed correctly, without errors, and all its URLs will be queued for crawling.
  • Has errors means the sitemap could be parsed but contains one or more errors; the URLs that could be read will still be queued for crawling. Clicking the row in the report table reveals more details about the problems and guidance on how to fix them.
  • Couldn’t fetch means something prevented Google from retrieving the file. To find the cause, we need to run a live test on the sitemap URL with the URL Inspection tool.

Call to action