The name is evocative and points to the main feature of these resources: orphan pages have no incoming references from any other page of the site. In other words, they are pages that receive no internal links and are practically isolated from the structure of the site and from its other pages. Even this brief summary makes it clear that a large number of such pages can be a problem for SEO, but finding and correcting orphan pages is not complicated, and various tools are available for the job.
The definition of orphan pages
In SEO terms, orphan pages are pages that exist on the site but have no link pointing to them from any other page. An orphan page can therefore be a URL or sub-page that physically exists but is essentially invisible to users browsing the site, because it is absent from the internal linking structure.
Be careful not to confuse them with dead-end pages: the latter lead nowhere else because they have no outbound links (but, unlike orphan pages, they do have inbound links).
SEO problems for orphan pages
Orphan pages are URLs that users cannot normally find and that, in some cases (if they are not in the sitemap), not even Googlebot can find, since its job is to follow external and internal links and work out the structure and shape of the site.
Their presence causes various SEO problems, such as a poorly maintained index, disturbances to the internal linking structure (if the orphan page has outgoing links to other resources), and difficulties with keyword targeting.
Causes of orphan pages
Several situations can lead to the appearance of these URLs: product pages for items no longer in stock, old news content that has been disabled, or deleted videos.
Other causes of orphan pages are incorrect use of the CMS when creating pages, a poorly managed migration, categories taken offline without a redirect, and test pages that were never deleted (for example, those used for A/B tests).
There are also two common technical causes that should be addressed and resolved immediately, because they essentially create duplicates of pages that should automatically and consistently redirect to a single URL. These are the handling of non-canonical HTTPS/HTTP and www/non-www versions, and the handling of the trailing slash, the final slash of the path.
Checking the variants of pages
To check that there are no errors, you can run a simple test: type the four variants of the home page into the browser (HTTP and HTTPS, each with and without www) and verify that all four automatically redirect to the same URL, which, for consistency, should declare itself as canonical.
If one of these variants does not redirect correctly, similar problems may exist on other pages of the site. Check other URLs for the offending variant to see whether the error is widespread, testing a sample of pages and reviewing the .htaccess file to make sure the redirects are set correctly.
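The same check can be scripted instead of typed by hand. What follows is a minimal Python sketch, not a definitive tool: `example.com` is a placeholder domain, the function names are illustrative, and `final_url` relies on the standard library's `urllib.request.urlopen`, which follows redirects by default. The `resolve` parameter is an assumption added here so the logic can be tried without touching the network.

```python
import urllib.request


def home_page_variants(domain):
    """Return the four scheme/host variants of a domain such as 'example.com'."""
    bare = domain[4:] if domain.startswith("www.") else domain
    return [
        f"{scheme}://{host}/"
        for scheme in ("http", "https")
        for host in (bare, f"www.{bare}")
    ]


def final_url(url, timeout=10):
    """Follow redirects and return the URL that finally answers.

    This performs a real network request and raises on HTTP errors.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()


def check_variants(domain, resolve=final_url):
    """Map each variant to its final URL and report whether they all agree."""
    destinations = {v: resolve(v) for v in home_page_variants(domain)}
    consistent = len(set(destinations.values())) == 1
    return consistent, destinations
```

Calling `check_variants("example.com")` against a live site returns a flag plus the destination of each variant, so a variant that fails to redirect is easy to spot in the returned dictionary.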
Verifying trailing slash paths
Another thing to pay attention to is the consistent use of the final slash, or trailing slash. A URL that ends with a slash and the same URL without it may return the same content, but they are not identical URLs.
To find out whether the settings are correct, spot-check a few pages of the site with and without the final slash, verifying that one version automatically redirects to the other and that the choice is consistent across the site.
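For a spot check across several pages, the slash variant of each URL can be generated automatically. This is a small Python sketch under stated assumptions: the function names are illustrative, `example.com` URLs are placeholders, and `resolve` is a hypothetical callable that returns the final URL after redirects (for instance, a wrapper around an HTTP client).

```python
from urllib.parse import urlsplit, urlunsplit


def toggle_trailing_slash(url):
    """Return the same URL with the final slash of its path added or removed."""
    parts = urlsplit(url)
    path = parts.path
    if path in ("", "/"):
        # The root path has no meaningful slash-less twin; leave it alone.
        return url
    new_path = path[:-1] if path.endswith("/") else path + "/"
    return urlunsplit(parts._replace(path=new_path))


def slash_handling_is_consistent(url, resolve):
    """True when the URL and its slash twin both end up at one final URL."""
    return resolve(url) == resolve(toggle_trailing_slash(url))
```

The query string is preserved when the path changes, so `https://example.com/blog?x=1` becomes `https://example.com/blog/?x=1`. Paths that end in a file name, such as `page.html`, may deserve to be skipped; the sketch does not make that distinction.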
Negative effects for SEO
In general, the link structure of a website should be organised uniformly to serve two goals: passing internal link juice to important pages and providing a good user experience.
Left as they are, orphan pages bring no value to the site and can even become harmful, especially when present in large numbers.
On the one hand, they create a frustrating user experience, because users cannot reach those pages through the natural structure of the site; any important or useful information on them is basically wasted.
On the other hand, they can affect crawl budget optimization and the quality of the site's visits and conversions: the crawler gathers little data about these pages and nothing that favors indexing, and in the long run this can hurt rankings by making the website appear to be of lower quality.
Moreover, without internal links they receive no link equity, and search engines have no semantic or structural context in which to evaluate the page: they cannot tell where the page fits within the site as a whole, which makes it harder to determine the queries for which the page is relevant.
The crawlers’ search for pages
Search engines, like Google, usually find new pages in two ways:
- The crawler follows a link from another page.
- The crawler finds the URL listed in the XML sitemap.
For Google to crawl and index a page, it must first be able to find it through links; with orphan pages this is not possible, so these URLs are often not indexed and therefore cannot appear in search results.
Even when they are listed in the XML sitemap, orphan pages therefore remain a problem for SEO, and you should try to locate and correct them.
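To see which URLs a sitemap actually declares, the file can be read programmatically. The sketch below is an illustration in Python using only the standard library; it assumes a plain `urlset` sitemap in the standard sitemaps.org namespace (not a sitemap index), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the set of <loc> values listed in a urlset sitemap document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }
```

The resulting set is one of the URL lists that can later be compared against a crawl of the site's links.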
How to find all the website’s orphan pages
The first step in solving the orphan pages problem is to identify the crawlable pages, that is, to create a complete list of the URLs that can currently be reached by crawling the site's links.
It is important to have a list of all active URLs (those that can receive hits from crawlers) and then exclude the pages that search engines cannot index, because they are marked noindex or blocked by a rule in robots.txt. The crawl should always start from the home page of the site and use the canonical URL, with the correct HTTPS or HTTP scheme and the correct www or non-www host.
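Dedicated crawling tools do this job, but the underlying idea is a breadth-first walk of internal links starting from the home page. The following Python sketch illustrates that idea only: the names are illustrative, `fetch_html` is a hypothetical callable that returns a page's HTML (or nothing if the page cannot be fetched), and the noindex and robots.txt filtering described above is deliberately left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def internal_links(page_url, html):
    """Return absolute, fragment-free links that stay on the page's host."""
    parser = LinkExtractor()
    parser.feed(html)
    host = urlsplit(page_url).netloc
    links = set()
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlsplit(absolute).netloc == host:
            links.add(absolute)
    return links


def crawl(start_url, fetch_html, max_pages=1000):
    """Breadth-first crawl; returns every URL discovered by following links.

    max_pages caps the number of pages fetched, as a safety limit.
    """
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch_html(url)
        fetched += 1
        if not html:
            continue
        for link in sorted(internal_links(url, html)):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen
```

A page that no other page links to never enters the queue, which is exactly why it will be missing from the crawl list and show up only in other sources.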
Comparing the URL lists to find out gaps
Once the crawl is complete, export the list of URLs to an Excel worksheet, pasting them into a column.
The next step is the gap analysis, which compares data from different sources in search of discrepancies: for instance, Google Analytics data, Search Console data, the sitemap, or the site's server log files.
What matters is having complete lists of URLs to analyze, so that the "missing" resources, the gaps, can be identified: for example, the MATCH formula looks up each URL from one list in another and flags the ones with no match, which are the orphan URLs.
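Outside a spreadsheet, the same comparison is a set difference: every URL known from some other source, minus the URLs reached by the crawl. The Python sketch below is a minimal illustration; the function name and the sample URLs are hypothetical, and the only clean-up applied is stripping stray whitespace, which pasted spreadsheet data often carries.

```python
def find_orphans(crawled_urls, *known_url_sources):
    """Return URLs present in any other source but not reached by the crawl."""
    crawled = {url.strip() for url in crawled_urls}
    known = set()
    for source in known_url_sources:
        known.update(url.strip() for url in source)
    return sorted(known - crawled)


# Placeholder data standing in for a crawl export, a sitemap and analytics.
crawled = ["https://example.com/", "https://example.com/a"]
sitemap = ["https://example.com/", "https://example.com/a", "https://example.com/old-promo"]
analytics = [" https://example.com/a ", "https://example.com/landing-2019"]

orphans = find_orphans(crawled, sitemap, analytics)
```

With this sample data, `orphans` contains the two URLs that appear in the sitemap or analytics lists but were never reached through links. In practice the lists should first be normalized to the same canonical form (scheme, host and trailing slash), otherwise variants of one page will be reported as gaps.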
How to tackle and fix orphan pages
After performing these steps and finding all the orphan pages, it is time to decide what to do with each of them, based on a few questions:
- Is the page relevant?
- Does it rank for some keywords despite everything?
- Does it generate visits?
- Does it receive backlinks from authoritative external sources?
- Does its existence make sense in the taxonomy of the site?
- Is it optimized?
If the answers are positive, the page should be brought into the internal link structure of the site, simply by linking to it from a regular existing page; to improve its performance further, you can also update and improve its content where necessary.
If, on the contrary, the page is useless and, what is more, has duplicate or near-duplicate content, the best option is to remove it by returning an HTTP 404 or 410 status code, which can also improve crawl budget efficiency.