To some extent, every website relies on Google. The premise is simple: Google indexes your pages, which makes it possible for people to find you. That’s how things should work.
However, that’s not always the case. Many pages never get indexed by Google.
If you’re working with a large website, you’ll notice that not every page of your site gets indexed, and many pages wait for weeks before Google picks them up.
Several factors contribute to this issue. Many are the same factors discussed in the context of ranking, such as content quality and links; others are far more technical. Modern websites that rely heavily on new web technologies have suffered from indexing issues in the past, and some still do.
Many SEOs still believe that technical problems are what prevent Google from indexing content. That’s a myth. While it’s true that Google may not index your pages if you don’t send consistent technical signals about which pages you want indexed, or if your crawl budget is insufficient, it’s just as important to be consistent about the quality of your content.
Reasons Why Google Isn’t Indexing Your Pages
Google Search Console reports several statuses for unindexed pages, such as “Crawled – currently not indexed” or “Discovered – currently not indexed”. While this information doesn’t fully explain the issue, it’s a good place to start your diagnostics.
The top indexing issues are:
- Crawled – currently not indexed
It means Google visited a page but decided not to index it. This usually comes down to content quality. With the e-commerce boom, Google has become more careful about quality, so if you see your pages marked “Crawled – currently not indexed”, make sure your content is uniquely valuable. Use unique titles, descriptions, and copy on all indexable pages; avoid copying product descriptions from external sources; use canonical tags to consolidate duplicate content; and block Google from crawling or indexing low-quality sections of your website with the robots.txt file or the noindex tag.
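As a sketch of what those signals look like in practice (the URLs below are hypothetical), a duplicate variant of a product page might point its canonical tag at the main version, while a thin, low-value page carries a noindex tag:

```html
<!-- On a duplicate variant page, e.g. /products/red-shirt?color=crimson,
     point Google at the main version (hypothetical URL): -->
<link rel="canonical" href="https://www.example.com/products/red-shirt" />

<!-- On a low-quality page you don't want indexed at all: -->
<meta name="robots" content="noindex" />
```

Note that Google must be able to crawl a page to see its noindex tag, so don’t block that same page in robots.txt at the same time.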
- “Discovered – currently not indexed”
It is our favourite issue. It encompasses everything from crawling problems to insufficient content quality. It’s a huge problem, particularly for big e-commerce stores, and we’ve seen it apply to millions of URLs on a single website.
There are two main reasons why Google may report e-commerce product pages as “Discovered – currently not indexed”:
- A crawl budget issue: there may be too many URLs in the crawl queue, and some may be crawled and indexed later.
- A quality issue: Google may think some pages on that domain aren’t worth crawling and decide not to visit them, based on patterns in their URLs.
Dealing with this problem requires some expertise. If your pages are reported as “Discovered – currently not indexed”, do the following:
- Check whether the pages that fall into this category follow a pattern. Maybe the problem relates to a specific category of products whose pages aren’t linked internally? Or maybe a large portion of your pages is simply waiting in the queue to be indexed?
- Optimize your crawl budget. Identify the low-quality pages that Google spends a lot of time crawling. The usual suspects are filtered category pages and internal search pages, which can add up to an enormous number of URLs on a typical e-commerce site. If Googlebot can crawl them freely, it may not have the resources to reach the valuable pages on your website and get them indexed.
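As an illustration, a few robots.txt rules could keep Googlebot out of the usual crawl-budget sinks. The path and parameter patterns below are hypothetical, so adjust them to your own URL structure:

```
# Hypothetical rules for a typical e-commerce site
User-agent: *
# Internal search result pages
Disallow: /search
# Filtered/faceted category URLs
Disallow: /*?sort=
Disallow: /*?color=
```

Remember that robots.txt controls crawling, not indexing: already-indexed URLs can remain in the index, and a page blocked here can never show Google a noindex tag.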
- “Duplicate Content”
Duplicate content may have several causes, such as:
- Language variations: if you have different versions of the same page targeted at different countries, some of these pages may end up unindexed.
- Content shared with competitors: this often happens in e-commerce, where several websites use the same product description provided by the manufacturer.
Apart from using rel=canonical, 301 redirects, or creating unique content, we would focus on providing unique value to users. Fast-growing-trees.com is an example: instead of boring descriptions and generic tips on planting and watering, the website offers a detailed FAQ for many products and lets you easily compare similar products. Every customer can also ask a detailed question about a plant and get an answer from the community.
How to check your website’s index coverage
You can check how many of your website’s pages aren’t indexed in the Index Coverage report in Google Search Console.
The first thing to check is the number of excluded pages. Then look for patterns: what types of pages don’t get indexed?
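One low-tech way to spot such patterns is to export the excluded URLs from the report and group them by their first path segment. A minimal sketch in Python (the URLs below are made up for illustration):

```python
from collections import Counter
from urllib.parse import urlparse

def pattern_counts(urls):
    """Count URLs by their first path segment to reveal patterns."""
    counts = Counter()
    for url in urls:
        path = urlparse(url).path
        segment = path.strip("/").split("/")[0] or "(root)"
        counts[segment] += 1
    return counts

# Hypothetical export of excluded URLs
excluded = [
    "https://example.com/products/red-shirt",
    "https://example.com/products/blue-shirt",
    "https://example.com/search?q=shirt",
    "https://example.com/blog/hello",
]
print(pattern_counts(excluded))
```

If one section (say, `products`) dominates the excluded list, that’s the place to start digging.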
If you own an e-commerce store, you’ll most likely find unindexed product pages. While that’s a warning sign, you can’t expect all of your product pages to be indexed, especially on a large website. For example, a large e-commerce store is bound to have multiple duplicate pages and expired or out-of-stock products, and these pages may lack the quality that would put them at the front of Google’s indexing queue.
Large e-commerce websites also face crawl budget issues. I’ve seen e-commerce stores with over a million products where 90% of them were classified as “Discovered – currently not indexed”. But if you see important pages being excluded from Google’s index, you should be deeply concerned.
How to Increase the Probability Google Will Index Your Pages
Every website is different and may face different indexing issues. However, there are some best practices that should help your pages get indexed:
- Avoid the “Soft 404” signals
You should ensure that your pages don’t contain anything that may wrongly indicate a soft 404 status. This includes anything from using “Not found” or “Not available” in the copy to having the number “404” in the URL.
- Use internal linking
Internal linking is one of the key signals telling Google that a particular page is an important part of the website and deserves to be indexed. Leave no orphan pages in your website’s structure, and remember to include all indexable pages in your sitemaps.
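One way to hunt for orphan pages is to compare the URLs in your sitemap against the set of URLs that at least one internal page links to. A toy sketch in Python, with made-up crawl data:

```python
def find_orphans(sitemap_urls, link_graph):
    """Return sitemap URLs that no internal page links to.

    link_graph maps each page URL to the set of URLs it links out to.
    """
    linked = set()
    for targets in link_graph.values():
        linked.update(targets)
    return sorted(set(sitemap_urls) - linked)

# Hypothetical sitemap and internal link graph from a crawl
sitemap = ["https://shop.test/a", "https://shop.test/b", "https://shop.test/c"]
links = {
    "https://shop.test/": {"https://shop.test/a", "https://shop.test/b"},
    "https://shop.test/a": {"https://shop.test/b"},
}
print(find_orphans(sitemap, links))  # /c has no internal links pointing to it
```

In practice you’d build `sitemap_urls` from your XML sitemaps and `link_graph` from a site crawl, but the set difference is the whole idea.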
- Implement a strong crawling strategy
Don’t let Google crawl cruft on your website. If too many resources are spent crawling the less important parts of your domain, it might take a long time for Google to get to the good stuff. Server log analysis can give you the full picture of what Googlebot crawls and how to optimize it.
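A minimal sketch of that kind of log analysis in Python, using a couple of fabricated access-log lines (a real analysis should also verify Googlebot via reverse DNS, since the user-agent string can be spoofed):

```python
import re
from collections import Counter

# Fabricated log lines in common log format; real logs will vary.
LOG = """\
66.249.66.1 - - [10/May/2024:10:00:01 +0000] "GET /products/red-shirt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"
66.249.66.1 - - [10/May/2024:10:00:02 +0000] "GET /search?q=shirt HTTP/1.1" 200 312 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"
203.0.113.5 - - [10/May/2024:10:00:03 +0000] "GET /products/red-shirt HTTP/1.1" 200 512 "-" "Mozilla/5.0"
"""

def googlebot_hits(log_text):
    """Count Googlebot requests per top-level path section."""
    counts = Counter()
    for line in log_text.splitlines():
        if "Googlebot" not in line:
            continue  # skip regular visitors
        m = re.search(r'"GET (\S+) HTTP', line)
        if m:
            section = m.group(1).lstrip("/").split("/")[0].split("?")[0] or "(root)"
            counts[section] += 1
    return counts

print(googlebot_hits(LOG))
```

If a low-value section like internal search eats a large share of Googlebot’s requests, that’s crawl budget you can reclaim for pages you actually want indexed.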
- Get rid of low-quality and duplicate content
Every large website ends up with some pages that shouldn’t be indexed. Make sure these pages don’t find their way into your sitemaps, and use the noindex tag and the robots.txt file where appropriate. If you let Google spend too much time in the worst parts of your site, it may underestimate the overall quality of your domain.
- Send consistent SEO signals