Better Sitemap Scraper is a quick and simple tool for scraping the sitemaps of large sites.
It was written because Scrapebox and many other sitemap scrapers do not handle nested sitemaps well.
It is a three-click solution that can extract millions of URLs from the sitemaps of even the largest sites and save them to a text file.
- Enter the website address to scrape (the full URL, including http://). You can also enter a direct link to the sitemap here.
- Choose the folder where your files will be saved.
- Hit start.
- Increase threads to speed up scraping.
- Use proxies to avoid your IP getting blocked.
- Reduce buffer size to run on low-power VPS or PC.
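The nested-sitemap handling described above can be sketched roughly like this. This is a minimal, single-threaded illustration using only the Python standard library, not the tool's actual code; a real run would add the threading, proxy, and buffering options listed above. It assumes sitemaps follow the standard sitemaps.org schema, where a `<sitemapindex>` nests further sitemaps and a `<urlset>` lists pages:

```python
# Sketch of a nested-sitemap crawler (illustrative only).
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def parse_sitemap(xml_text):
    """Return (child_sitemaps, page_urls) from one sitemap document."""
    root = ET.fromstring(xml_text)
    locs = [loc.text.strip() for loc in root.iter(NS + "loc")]
    if root.tag == NS + "sitemapindex":
        return locs, []      # nested sitemaps still to fetch
    return [], locs          # leaf sitemap: actual page URLs

def crawl(start_url, fetch):
    """Breadth-first walk of nested sitemaps.

    `fetch(url)` must return the XML text of the sitemap at `url`
    (e.g. via urllib or requests, possibly through a proxy)."""
    queue, seen, pages = [start_url], set(), []
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        children, urls = parse_sitemap(fetch(url))
        queue.extend(children)
        pages.extend(urls)
    return pages
```

Because `fetch` is passed in, the crawl logic stays independent of how requests are made, which is where threads and proxies would plug in.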
Method: Hunting for expired domains.
First, pick a high-quality authority site to use as your "seed site". Run Better Sitemap Scraper on it as above and get its list of URLs (this can run into the millions!).
Once you have your huge list of URLs from your seed site, run it through an external link extraction tool (e.g. Scrapebox). This will visit each URL in your list and build a new list of external web pages that your seed site links to.
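The link-extraction step might look like this in miniature. This is a standard-library sketch for a single page, and `external_links` is a hypothetical helper, not part of any tool mentioned here; Scrapebox does the same job at scale with threading, retries, and proxy support:

```python
# Sketch: collect links on a page that point off the page's own host.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkParser(HTMLParser):
    """Collects every href found in <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def external_links(page_url, html):
    """Return absolute URLs in `html` whose host differs from the page's host."""
    host = urlparse(page_url).netloc
    parser = LinkParser()
    parser.feed(html)
    out = []
    for href in parser.links:
        absolute = urljoin(page_url, href)   # resolve relative links
        if urlparse(absolute).netloc not in ("", host):
            out.append(absolute)
    return out
```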
That new list of external web pages first needs cleaning up: run it through Scrapebox once to trim each URL down to just its domain, and again to remove duplicates.
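The trim-and-dedupe cleanup amounts to something like the following sketch (a simplification: it only strips a leading `www.` rather than doing full registrable-domain handling, which would need a public-suffix list):

```python
# Sketch: reduce a URL list to unique domains, first-seen order preserved.
from urllib.parse import urlparse

def unique_domains(urls):
    seen, out = set(), []
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]          # crude www-stripping; see caveat above
        if host and host not in seen:
            seen.add(host)
            out.append(host)
    return out
```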
What you are now left with is a list of domain names that your seed site links to. Your final step is to send this list to a bulk availability checker and get back the list of domains that are free to register.
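As a rough pre-filter before the bulk availability check, you could drop domains that still resolve in DNS. To be clear, this is only a heuristic sketch, not a substitute for the availability checker: a domain with no DNS answer may still be registered, so every candidate it passes still needs a real WHOIS or registrar-API check.

```python
# Sketch: DNS-based pre-filter for possibly-unregistered domains.
import socket

def maybe_available(domain, resolve=socket.gethostbyname):
    """True if `domain` has no DNS answer (a *candidate* for registration).

    A resolving domain is definitely registered; a non-resolving one is
    only *maybe* available and must be confirmed with a real checker."""
    try:
        resolve(domain)
        return False
    except OSError:          # socket.gaierror subclasses OSError
        return True
```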
You should end up with a good list of expired domains that are available to register and are linked to from the authority site you chose as your seed.