How to Extract A Domain’s List Of URLs Using The Sitemap, Step by Step
There are a lot of reasons why you might need to document all the pages on your website: conducting an audit, revamping your site’s design, or keeping track of redirects, just to name a few. But if you don’t have access to fancy WordPress plugins or other licenses, how do you pull an exhaustive list of your URLs?
On today’s episode of the SEO Tips & Tricks Podcast, Tim Jennings walks us through a step-by-step approach to extracting your domain’s URLs using your sitemap.
Please note that I currently have no affiliation with any of the tools other than I love to use them. —Tim
What to Listen For
- 01:05 - Tools for Deciphering the Sitemap
- 02:03 - Finding the Sitemap
- 02:36 - Viewing the URL list
- 04:50 - Creating a Spreadsheet
Transcript
Hello and welcome to another episode of the Search Engine Tips and Tricks Podcast. My name is Tim Jennings and I am the VP of People and Technology at Soulheart. Soulheart exists to relentlessly problem solve and help brands make a positive impact on the world through digital marketing.
On today’s episode, I want to share a really neat tool that can save you hours of brain-numbing work. Today I’m going to talk about how you can easily extract the URLs from a sitemap.
Knowing the exact URLs that your site contains has many SEO benefits. The most common one is that a list of your domain’s URLs helps you get through a website overhaul or redesign project without leaving any broken links behind. But a raw sitemap file, whether you download it or view it in your web browser, can be very difficult to decipher. So let’s jump in!
As I mentioned earlier, there are many reasons to gather a list of your website’s URLs. There are a few ways to do this, and I would suggest using a couple of them to ensure accuracy. If you’re using WordPress, there are plugins that do this job perfectly. But what if you’re not using WordPress? If you still need to extract a list of URLs, Screaming Frog is probably the most popular tool for the job. And there’s a reason for that. It’s very powerful. But some people may not be able to afford the license that allows you to crawl more than 500 pages, or they may not have the technical ability to configure the settings the way they need them. So today I want to tell you about an awesome tool I found online that allows anyone to extract a website’s URL list from the sitemap.
The first thing you’re going to want to do is find your sitemap. If you don’t know where it’s located, usually you can find it by typing your domain followed by /sitemap.xml into the address bar of your web browser. Other common URL naming conventions you can try are /sitemap-index.xml and /sitemap1.xml.
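If you would rather generate those addresses than type them, here is a minimal Python sketch. It only builds the three candidate URLs named above; it does not check whether any of them exists. The function name `candidate_sitemap_urls` and the `example.com` domain are placeholders for illustration.

```python
from urllib.parse import urlsplit

# Paths mentioned in the episode; a given site may use a different name.
COMMON_SITEMAP_PATHS = ["/sitemap.xml", "/sitemap-index.xml", "/sitemap1.xml"]


def candidate_sitemap_urls(domain: str) -> list[str]:
    """Return the sitemap addresses worth trying for a domain or full URL."""
    # Accept a bare "example.com" as well as "https://example.com/some/page".
    if "://" not in domain:
        domain = "https://" + domain
    parts = urlsplit(domain)
    root = f"{parts.scheme}://{parts.netloc}"
    return [root + path for path in COMMON_SITEMAP_PATHS]


if __name__ == "__main__":
    for url in candidate_sitemap_urls("example.com"):
        print(url)
```

Paste each printed address into your browser until one of them loads a sitemap.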
Once you have the sitemap location it’s time to have some fun.
Now that you have your sitemap address you’re going to want to open a new tab on your browser. In that new tab enter https://robhammond.co/tools/xml-extract into the address bar. This tool is provided by Rob Hammond and is a great help to the web community! Thanks Rob!
Once you have the tool open, go back to your sitemap and copy its URL. Then in Rob’s tool, simply paste the sitemap’s URL and click start. The results will appear almost immediately. You will see an easy-to-digest table that clearly lists each URL in that sitemap.
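For anyone comfortable with a little scripting, the same kind of extraction can be approximated with Python’s standard library. This is a sketch of the general idea, not a description of how Rob’s tool works internally. It assumes the sitemap follows the standard sitemaps.org format, and the names `fetch_sitemap`, `extract_urls`, and `SAMPLE` are made up for this example.

```python
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def fetch_sitemap(url: str) -> bytes:
    """Download a sitemap and return its raw bytes (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def extract_urls(sitemap_xml) -> list[str]:
    """Return every <loc> value inside the <url> entries of a sitemap.

    Accepts either text or the raw bytes returned by fetch_sitemap.
    """
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# A tiny made-up sitemap so the example runs without a network connection.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_urls(SAMPLE):
        print(url)
```

To run it against a live site, pass the result of `fetch_sitemap` with your own sitemap address to `extract_urls` in place of `SAMPLE`.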
Note: if you have an extensive sitemap, you may have to enter several different sitemap URLs in order to get everything. For instance, I was working on a site whose owner wanted us to move it from Blogger to WordPress. Their Blogger sitemap only allowed for 150 pages, so they had a 44-page sitemap just for their web pages. That’s not an image sitemap or any other specialty sitemap – just the web pages.
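When a sitemap is split across several files, the sitemaps.org protocol lets a site publish a sitemap index that lists each of them. Whether your site has one is an assumption you will need to check. If it does, a sketch like the following lists the child sitemap addresses so you know which URLs to enter one by one. The name `list_child_sitemaps` and the sample addresses are placeholders.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_child_sitemaps(index_xml) -> list[str]:
    """Return the sitemap addresses listed in a <sitemapindex> document.

    A regular sitemap has no <sitemap> entries, so it yields an empty list.
    """
    root = ET.fromstring(index_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# A made-up index with two child sitemaps.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-pages-1.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-pages-2.xml</loc></sitemap>
</sitemapindex>"""

if __name__ == "__main__":
    for child in list_child_sitemaps(SAMPLE_INDEX):
        print(child)
```

An empty result simply means the file you loaded is an ordinary sitemap rather than an index.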
Now that you have gotten the list of URLs to display nicely you’ll want to create a spreadsheet that you can access quickly and easily.
Copy the URLs from Rob’s tool and paste them into the newly created spreadsheet. Continue to add all URLs found from your sitemaps until you have a list of all of the URLs your domain has.
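If you have collected URLs from several sitemaps, the copy-and-paste step can also be sketched in Python. The example below drops duplicates and writes a one-column CSV that any spreadsheet program can open. The function name `write_url_sheet`, the `URL` column header, and the sample addresses are assumptions made for this illustration.

```python
import csv
import io


def write_url_sheet(urls, out_file) -> int:
    """Write unique URLs, in first-seen order, as a one-column CSV.

    Returns the number of URL rows written (the header is not counted).
    """
    # dict.fromkeys keeps insertion order while dropping duplicates.
    unique = list(dict.fromkeys(u.strip() for u in urls if u.strip()))
    writer = csv.writer(out_file)
    writer.writerow(["URL"])
    writer.writerows([u] for u in unique)
    return len(unique)


if __name__ == "__main__":
    collected = [
        "https://example.com/",
        "https://example.com/about",
        "https://example.com/",  # duplicate picked up from a second sitemap
    ]
    buffer = io.StringIO()
    count = write_url_sheet(collected, buffer)
    print(f"{count} unique URLs")
    print(buffer.getvalue())
```

To produce a real file instead of printing, pass a file opened with `open("site-urls.csv", "w", newline="", encoding="utf-8")` as `out_file`, then import that file into your spreadsheet.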
And there you have it! An incredibly easy way to extract your website URLs from your sitemap! I hope this helps you in your SEO efforts.
As always, if you have any questions or if this just sounds incredibly tedious, I’m here to help! Feel free to reach out. You can email me at tim@soulheart.co or head over to our website, soulheart.co.
Have a great day everyone!
Episode Resources
Whether you’re performing a website overhaul, creating new content, or looking to boost your overall SEO, Soulheart has the SEO tools and expertise to help you. We’d love to learn how we can help you reach your marketing goals this year! Just sign up below to book a chat with us.