Technical SEO
Indexing
Indexing is how search engines store and organize your content so it can appear in search results; if a page is not indexed, it cannot be found through search. Understanding how indexing works helps you make sure important content gets indexed and keep content that shouldn't appear in search results out of the index.
How indexing works
After search engines crawl your site and discover content, they analyze it and decide whether to add it to their index—a massive database of web pages that can be searched. Only indexed content can appear in search results.
Not all crawled content gets indexed. Search engines may skip duplicate content, thin or low-quality pages, pages that violate their quality guidelines, or pages that are explicitly blocked from indexing. During indexing, each page is evaluated for quality, uniqueness, and relevance before it is added to the index.
You can check whether your content is indexed using Google Search Console's page indexing report (formerly the Coverage report). It shows which pages are indexed, which aren't, and the reason for each exclusion. Knowing your indexing status helps you identify and fix the issues that keep important content out of search results.
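For a quick manual spot check, you can also search Google with the site: operator; the URL below is a hypothetical example, so substitute a real page from your own site.

```
site:example.com/menu/espresso-drinks
```

If the page doesn't appear for a query like this, confirm its status in Search Console before assuming it is missing from the index.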
Ensuring content gets indexed
Help search engines index your content by:
- Submitting sitemaps: Provide an XML sitemap in Google Search Console so search engines can discover your content (see the example sitemap below)
- Internal linking: Link to important pages from other pages on your site so search engines can discover them
- Avoiding duplicate content: Ensure each page has unique, valuable content
- Proper site structure: Create a clear structure that makes it easy for search engines to navigate
- Quality content: Create content that meets quality guidelines and provides value
Common reasons content doesn't get indexed include pages blocked by robots.txt, pages marked noindex, duplicate content, low-quality content, and pages with no internal links pointing to them. Fix these issues to help important content get indexed.
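As a concrete illustration, here is a minimal XML sitemap sketch; the domain example.com, the URLs, and the lastmod dates are placeholders, not required values.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you want search engines to discover -->
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-05-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/menu/</loc>
    <lastmod>2024-04-18</lastmod>
  </url>
</urlset>
```

Host the file at a stable URL (commonly /sitemap.xml), submit that URL in Search Console, and optionally reference it from robots.txt with a Sitemap: line so other crawlers can find it too.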
Controlling what gets indexed
Sometimes you want to prevent content from being indexed. Use these methods:
- Noindex tags: Add a noindex meta tag to pages you don't want in search results, such as thank-you pages or internal tools (see the snippet after this list)
- Robots.txt: Block crawlers from accessing certain pages (though this doesn't guarantee they won't be indexed if linked from elsewhere)
- Password protection: Protect pages behind login to prevent indexing
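As a sketch, the noindex directive is a robots meta tag placed in the page's head; the page below is hypothetical, and the equivalent X-Robots-Tag HTTP header can be used for non-HTML files such as PDFs.

```html
<!-- Example: a thank-you page that should stay out of search results -->
<head>
  <meta name="robots" content="noindex">
  <title>Thanks for your order</title>
</head>
```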
Use noindex for pages you don't want in search results but still want reachable via direct links. Use robots.txt to keep crawlers away from pages or sections that shouldn't be crawled at all, such as internal search results or endless filter URLs. Be careful: a page blocked in robots.txt can't have its noindex tag read, and blocking important pages can keep them from being crawled and indexed properly, so don't combine the two on the same page.
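A minimal robots.txt sketch, assuming a hypothetical /internal-tools/ section you don't want crawled; the Sitemap line is optional and points to the sitemap discussed above.

```
# robots.txt, served from the site root
User-agent: *
Disallow: /internal-tools/

Sitemap: https://example.com/sitemap.xml
```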
Examples
Well-indexed content
Example: A coffee shop website where all important pages are indexed: homepage, menu pages, location pages, and blog posts. The site has a sitemap submitted to Search Console, important pages are linked from the homepage and navigation, and each page has unique, valuable content.
This site makes it easy for search engines to discover and index important content, which helps the site appear in search results.
Indexing problems
Example: A website where important pages aren't indexed because they have no internal links, are blocked by robots.txt, or are marked as noindex accidentally. The site has no sitemap, and search engines can't discover important content.
This prevents important content from appearing in search results, limiting the site's visibility and ability to serve customers through search.