Crawl Budget
The number of pages a search engine bot will crawl on your site within a given timeframe. Larger sites must manage this carefully.
Crawl budget is the number of pages Googlebot (or any search engine crawler) will request from your site within a given timeframe. It is determined by two factors: crawl rate limit (how fast Google can crawl without overloading your server) and crawl demand (how much Google wants to crawl your site based on popularity, freshness, and size).
Why it matters: for small sites (under 1,000 pages), crawl budget is rarely a concern. Google can easily crawl everything. But for large sites (e-commerce with millions of product pages, news sites publishing hundreds of articles daily, or SaaS platforms with user-generated content), crawl budget becomes critical. If Google cannot crawl your important pages, they will not be indexed, and they will not rank. You are essentially invisible for those pages.
How it works: Google does not publicly state your crawl budget as a number. But you can observe it through Google Search Console's Crawl Stats report, which shows how many pages Google crawled per day, the average response time, and which pages were crawled. If important pages are not being indexed despite being listed in your sitemap, crawl budget may be the issue.
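Beyond Search Console, your own server access logs show exactly what Googlebot requests. A minimal sketch of that idea (the log lines, format, and paths below are hypothetical examples; real logs vary by server, and a production version should also verify Googlebot IPs via reverse DNS, since the user-agent string can be spoofed):

```python
import re
from collections import Counter

# Hypothetical access-log lines; replace with your real log file.
LOG_LINES = [
    '66.249.66.1 - - [10/May/2025:06:01:00 +0000] "GET /shoes/red-sneaker HTTP/1.1" 200 "Googlebot/2.1"',
    '66.249.66.1 - - [10/May/2025:06:02:00 +0000] "GET /shoes?color=red HTTP/1.1" 200 "Googlebot/2.1"',
    '66.249.66.2 - - [11/May/2025:09:15:00 +0000] "GET /about HTTP/1.1" 200 "Googlebot/2.1"',
    '203.0.113.9 - - [11/May/2025:09:16:00 +0000] "GET /shoes HTTP/1.1" 200 "Mozilla/5.0"',
]

def googlebot_hits_per_day(lines):
    """Count requests per day where the user-agent claims to be Googlebot."""
    per_day = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue
        # Extract the date portion, e.g. "10/May/2025", from the timestamp.
        m = re.search(r"\[(\d{2}/\w{3}/\d{4})", line)
        if m:
            per_day[m.group(1)] += 1
    return per_day

print(googlebot_hits_per_day(LOG_LINES))
# Counter({'10/May/2025': 2, '11/May/2025': 1})
```

Trending this count over weeks, and grouping by URL pattern, reveals where your crawl budget is actually being spent.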
How to optimize: first, eliminate wasted crawl. Block crawlers from low-value pages (admin panels, search result pages, filtered/sorted variations, login pages) via robots.txt. Use noindex on pages that should not appear in search but do not need to be blocked from crawling. Ensure your XML sitemap only includes pages you actually want indexed. Keep your site fast, because slow response times reduce crawl rate. Fix redirect chains (A > B > C should be A > C). Return proper 404s for dead pages instead of soft 404s.
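The robots.txt portion of those steps might look like this (the paths are illustrative placeholders; substitute the low-value sections of your own site, and remember that robots.txt blocks crawling, not indexing):

```txt
# robots.txt — keep crawlers out of low-value sections
User-agent: *
Disallow: /admin/
Disallow: /login
Disallow: /search
Disallow: /*?sort=
Disallow: /*?filter=
```

For pages that should stay crawlable but not indexed, use a `<meta name="robots" content="noindex">` tag in the page's `<head>` instead; a page blocked in robots.txt can never have its noindex tag seen.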
Internal linking matters too: pages that are deeply buried (many clicks from the homepage) get crawled less frequently. Use a flat site architecture where important pages are reachable within 3 clicks. Internal links from high-authority pages pass crawl priority to linked pages.
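Click depth is just shortest-path distance from the homepage over your internal-link graph, so it is easy to audit. A minimal sketch with a hypothetical link graph (a real audit would build the graph from a crawl of your site):

```python
from collections import deque

# Hypothetical internal-link graph: page -> pages it links to.
LINKS = {
    "/": ["/category", "/about"],
    "/category": ["/category/page-2"],
    "/category/page-2": ["/deep-product"],
    "/about": [],
    "/deep-product": [],
}

def click_depths(graph, start="/"):
    """BFS from the homepage; depth = minimum clicks to reach each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

depths = click_depths(LINKS)
# /deep-product sits 3 clicks from the homepage — right at the edge of
# the "within 3 clicks" guideline above.
```

Pages missing from the result are orphans (no internal links at all), which is its own crawl problem worth fixing.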
Common mistakes: blocking important resources (CSS, JS) in robots.txt, which prevents Google from rendering your pages. Having millions of parameterized URLs that waste crawl budget (use canonical tags or robots.txt to handle these). Submitting a sitemap with URLs that redirect or return errors, which wastes crawl budget on non-productive requests.
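For the parameterized-URL case, a canonical tag on each variation points crawlers at the primary version. A sketch, using a hypothetical URL:

```html
<!-- Served on /shoes?color=red&size=10 (hypothetical URL):
     declare the unparameterized page as the primary version -->
<link rel="canonical" href="https://www.example.com/shoes" />
```

Note the trade-off: a canonical tag consolidates indexing signals but still lets Google crawl the variations, while a robots.txt block saves crawl budget but prevents Google from ever seeing the canonical tag. Pick one strategy per URL pattern.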
Practical example: an e-commerce site with 500,000 product pages notices that only 200,000 are indexed. Using Search Console's crawl stats, they find Google is spending 40% of crawl budget on faceted navigation URLs (/shoes?color=red&size=10). They add robots.txt rules to block faceted URLs and update their sitemap to only include canonical product pages. Within three months, indexed pages rise to 380,000, and organic traffic increases 28%.
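The rules that site might add could look like this (parameter names mirror the example and are illustrative; audit your own faceted parameters before blocking them):

```txt
# robots.txt — stop crawling faceted-navigation variations
User-agent: *
Disallow: /*?color=
Disallow: /*&color=
Disallow: /*?size=
Disallow: /*&size=
```

Each parameter needs both a `?` and an `&` pattern because it can appear first or later in the query string.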
Related terms
Canonical Tag: An HTML element that tells search engines which version of a page is the primary one, preventing duplicate content issues.
Organic Traffic: Website visitors who arrive through unpaid search engine results rather than ads, direct visits, or referral links.
SERP: Search Engine Results Page. The page displayed by a search engine in response to a query.
Core Web Vitals: A set of Google metrics (LCP, INP, CLS) that measure real-world page load speed, interactivity, and visual stability.