How to Find All Current and Archived URLs on a Website

There are many reasons you might need to find all the URLs on a website, but your exact goal will determine what you’re looking for. For instance, you may want to:

Identify every indexed URL to analyze issues like cannibalization or index bloat
Collect current and historic URLs Google has seen, especially for site migrations
Find all 404 URLs to recover from post-migration errors
In each scenario, a single tool won’t give you everything you need. Unfortunately, Google Search Console isn’t exhaustive, and a “site:example.com” search is limited and hard to extract data from.

In this post, I’ll walk you through some tools to build your URL list before deduplicating the data using a spreadsheet or Jupyter Notebook, depending on your site’s size.

Old sitemaps and crawl exports
If you’re looking for URLs that disappeared from the live site recently, there’s a chance someone on your team may have saved a sitemap file or a crawl export before the changes were made. If you haven’t already, check for these files; they can often provide what you need. But, if you’re reading this, you probably didn’t get so lucky.

Archive.org
Archive.org is an invaluable tool for SEO tasks, funded by donations. If you search for a domain and select the “URLs” option, you can access up to 10,000 listed URLs.

However, there are a few limitations:

URL limit: You can only retrieve up to 10,000 URLs, which is insufficient for larger sites.
Quality: Many URLs may be malformed or reference resource files (e.g., images or scripts).
No export option: There isn’t a built-in way to export the list.
To bypass the lack of an export button, use a browser scraping plugin like Dataminer.io. However, these limitations mean Archive.org may not provide a complete solution for larger sites. Also, Archive.org doesn’t indicate whether Google indexed a URL, but if Archive.org found it, there’s a good chance Google did, too.
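If you’d rather skip the scraping plugin entirely, the Wayback Machine also exposes its index through the CDX API, which isn’t capped at the 10,000-URL limit of the web interface. Here’s a minimal Python sketch, assuming the requests library is installed; example.com is a placeholder domain:

```python
import requests

# Query the Wayback Machine's CDX API for every captured URL on a domain.
# matchType=domain includes subdomains; fl=original returns only the URL
# field; collapse=urlkey deduplicates repeat captures of the same URL.
resp = requests.get(
    "https://web.archive.org/cdx/search/cdx",
    params={
        "url": "example.com",       # placeholder: your domain here
        "matchType": "domain",
        "fl": "original",
        "collapse": "urlkey",
    },
    timeout=60,
)
resp.raise_for_status()

urls = resp.text.splitlines()
print(f"Retrieved {len(urls)} archived URLs")
```

Expect the same quality caveat as the UI: the output will include resource files and malformed URLs, so plan to filter the list afterward.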

Moz Pro
While you’d typically use a link index to find external sites linking to you, these tools also discover URLs on your own site in the process.


How to use it:
Export your inbound links in Moz Pro to get a quick and easy list of target URLs from your site. If you’re dealing with a massive website, consider using the Moz API to export data beyond what’s manageable in Excel or Google Sheets.
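Once you have that export, a few lines of Python can pull out the unique target URLs. A minimal sketch, assuming pandas is installed and the export contains a “Target URL” column; the filename and column header are assumptions, so check them against your actual export:

```python
import pandas as pd

# Load an inbound-links export from Moz Pro. "Target URL" is an assumed
# column name; adjust it to match the header in your own CSV export.
df = pd.read_csv("moz_inbound_links.csv")

target_urls = df["Target URL"].dropna().unique()
print(f"{len(target_urls)} unique target URLs found on the site")
```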

It’s important to note that Moz Pro doesn’t confirm whether URLs are indexed or discovered by Google. However, since most sites apply the same robots.txt rules to Moz’s bots as they do to Google’s, this method generally works well as a proxy for Googlebot’s discoverability.

Google Search Console
Google Search Console offers several valuable sources for building your list of URLs.

Links reports:

Similar to Moz Pro, the Links section provides exportable lists of target URLs. Unfortunately, these exports are capped at 1,000 URLs each. You can apply filters for specific pages, but since filters don’t apply to the export, you might need to rely on browser scraping tools (limited to 500 filtered URLs at a time). Not ideal.

Performance → Search Results:


This export gives you a list of pages receiving search impressions. While the export itself is limited, you can use the Google Search Console API for larger datasets. There are also free Google Sheets plugins that simplify pulling more extensive data.
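Here’s a minimal sketch of paginating through the Search Console API in Python, assuming the google-api-python-client and google-auth packages, a service account with read access to the property, and placeholder dates, file path, and property name:

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Authenticate with a service-account key that has been added as a user
# on the Search Console property. The file path is a placeholder.
creds = service_account.Credentials.from_service_account_file(
    "service_account.json",
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
)
service = build("searchconsole", "v1", credentials=creds)

# Page through results 25,000 rows at a time (the API maximum per request)
pages, start_row = [], 0
while True:
    response = service.searchanalytics().query(
        siteUrl="sc-domain:example.com",  # placeholder property
        body={
            "startDate": "2024-01-01",    # placeholder date range
            "endDate": "2024-12-31",
            "dimensions": ["page"],
            "rowLimit": 25000,
            "startRow": start_row,
        },
    ).execute()
    rows = response.get("rows", [])
    pages += [row["keys"][0] for row in rows]
    if len(rows) < 25000:
        break
    start_row += 25000

print(f"{len(pages)} pages with search impressions")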

Indexing → Pages report:


This section provides exports filtered by issue type, though these are also limited in scope.

Google Analytics
The Engagement → Pages and Screens default report in GA4 is an excellent source for collecting URLs, with a generous limit of 100,000 URLs.


Even better, you can apply filters to create different URL lists, effectively surpassing the 100k limit. For example, if you want to export only blog URLs, follow these steps:

Step 1: Add a segment to the report

Step 2: Click “Create a new segment.”

Step 3: Define the segment with a narrower URL pattern, such as URLs containing /blog/


Note: URLs found in Google Analytics might not be discoverable by Googlebot or indexed by Google, but they offer valuable insights.
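If the UI limits get in the way, the GA4 Data API can pull page paths directly. A minimal sketch, assuming the google-analytics-data package, application-default credentials, and a placeholder property ID:

```python
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Metric, RunReportRequest,
)

# Credentials are read from the GOOGLE_APPLICATION_CREDENTIALS
# environment variable; the property ID below is a placeholder.
client = BetaAnalyticsDataClient()

request = RunReportRequest(
    property="properties/123456789",
    dimensions=[Dimension(name="pagePath")],
    metrics=[Metric(name="screenPageViews")],
    date_ranges=[DateRange(start_date="2024-01-01", end_date="today")],
    limit=100000,
)
response = client.run_report(request)

# Each row holds one page path plus its view count
paths = [row.dimension_values[0].value for row in response.rows]
print(f"{len(paths)} page paths retrieved")
```

Note these are paths, not full URLs, so prefix them with your domain before merging with the other lists.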

Server log files
Server or CDN log files are perhaps the ultimate tool at your disposal. These logs capture an exhaustive list of every URL path requested by users, Googlebot, or other bots during the recorded period.

Things to consider:

Data size: Log files can be massive, so many sites only retain the last two weeks of data.
Complexity: Analyzing log files can be challenging, but various tools are available to simplify the process (one simple approach is sketched below).
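For a basic extraction, you don’t need a dedicated tool. A minimal Python sketch, assuming an Apache/Nginx combined-format access log; the filename and regex are assumptions to adapt to your server’s configuration:

```python
import re
from urllib.parse import urlsplit

# Match the request portion of a combined-format log line,
# e.g. "GET /blog/post?utm=x HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

paths = set()
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        match = REQUEST_RE.search(line)
        if match:
            # Drop query strings so /page?a=1 and /page?a=2 dedupe together
            paths.add(urlsplit(match.group(1)).path)

print(f"{len(paths)} unique paths requested during the logged period")
```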
Combine, and good luck
Once you’ve gathered URLs from these sources, it’s time to combine them. If your site is small enough, use Excel or, for larger datasets, tools like Google Sheets or a Jupyter Notebook. Ensure all URLs are consistently formatted, then deduplicate the list.
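In a Jupyter Notebook, the combine-normalize-deduplicate step might look like this. The filenames are placeholders for the exports gathered above; note that GA4 and log exports contain bare paths, so prefix them with your domain first if you want fully qualified URLs throughout:

```python
import pandas as pd
from urllib.parse import urlsplit, urlunsplit

# Placeholder filenames: one URL or path per line, from each source above
sources = ["archive_org.txt", "moz_targets.txt", "gsc_pages.txt",
           "ga4_paths.txt", "log_paths.txt"]

urls = set()
for filename in sources:
    with open(filename, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            parts = urlsplit(line)
            # Normalize so near-duplicates collapse: lowercase the
            # scheme and host, strip trailing slashes, drop fragments
            urls.add(urlunsplit((
                parts.scheme.lower(),
                parts.netloc.lower(),
                parts.path.rstrip("/") or "/",
                parts.query,
                "",
            )))

pd.Series(sorted(urls)).to_csv("all_urls.csv", index=False, header=False)
print(f"{len(urls)} unique URLs written to all_urls.csv")
```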

And voilà: you now have a comprehensive list of current, old, and archived URLs. Good luck!
