How to Recover your Content from Wayback Machine (Internet Archive)

Table of Contents

Search for Archives
Copy Content Manually
Scrape Internet Archive Content
3rd Party Services

If your website was lost or hacked, you might have the unfortunate task of recovering the content. We always recommend making regular backups of your site, but if backups are not available, you have another option.

The Internet Archive’s Wayback Machine takes periodic snapshots of many sites across the internet and may have a copy of yours. Follow along and we’ll show you how to search for archives and recover your content from the Wayback Machine. You can then use these pieces to rebuild your site from scratch.

Building a new site? We recommend using WordPress with BoldGrid. It’s super easy to use and is included with our WordPress Hosting packages.

Search for Archives

Visit the Wayback Machine at https://archive.org/web.
Type your web address in the search field, then click the Browse History button. The results show how many times your site was saved over a time period, for example:
“Saved 34 times between November 9, 2008 and May 28, 2019.”
You will also see a timeline and a calendar. Click a year to see which dates your site was archived.
Click a date on the calendar to view the snapshot saved on that day. From there, you can navigate the archived site to view any available content. Keep in mind that it may not look exactly like your site, since its appearance depends on what was archived at the time.
We recommend checking each year and date, because different snapshots may have captured different pages.
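If your site has many captures, clicking through every year can be slow. The Wayback Machine also exposes a CDX API that lists captures as JSON. The following is a minimal Python sketch, assuming the public endpoint `https://web.archive.org/cdx/search/cdx` with its `url`, `output=json` and `fl` parameters; the function names are hypothetical and `example.com` is a placeholder for your own domain. It counts captures per year so you know which years on the calendar are worth checking.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Public CDX endpoint of the Wayback Machine (assumed; check archive.org docs).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(site: str) -> str:
    """Build a CDX query that lists every capture of `site`."""
    params = {
        "url": site,
        "output": "json",            # rows of JSON; the first row holds field names
        "fl": "timestamp,original",  # only return the fields we need
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(text: str) -> list[dict]:
    """Convert a CDX JSON response into dicts keyed by field name."""
    rows = json.loads(text) if text.strip() else []
    if not rows:
        return []
    header, *captures = rows
    return [dict(zip(header, row)) for row in captures]


def captures_per_year(captures: list[dict]) -> dict[str, int]:
    """Count captures per year; timestamps look like YYYYMMDDhhmmss."""
    counts = Counter(c["timestamp"][:4] for c in captures)
    return dict(sorted(counts.items()))


def fetch_captures(site: str) -> list[dict]:
    """Run the query against archive.org (makes a network request)."""
    with urlopen(build_cdx_url(site), timeout=60) as resp:
        return parse_cdx_json(resp.read().decode("utf-8"))


# Example (network access required):
# for year, count in captures_per_year(fetch_captures("example.com")).items():
#     print(year, count)
```

The 14-digit timestamp in each row is the same value that appears after `/web/` in a snapshot’s address, so you can paste it into a Wayback Machine URL to open that exact capture.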

Copy Content Manually

Now that you know how to search for and find your website snapshots, you can begin copying the text and images to your computer.

Navigate to each page of the site and copy the text, then paste it into a text editor or word processor such as Notepad, Google Docs, or Microsoft Word.
On each archived page, right-click any image you want to recover and save it to a folder on your computer.
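In some cases, you may also be able to recover some of the website code. Right-click the page, then select View page source to see its HTML. Keep in mind that the Wayback Machine adds its own toolbar and rewrites links in archived pages, so the source will not match your original files exactly. Copy the code into a text editor and save it for later use.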
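Copying page by page can also be partly scripted. A minimal Python sketch follows, assuming the Wayback Machine’s `id_` URL flag (appended to the timestamp, as in `https://web.archive.org/web/<timestamp>id_/<page URL>`), which returns the archived file without the toolbar and link rewriting. It uses only the standard library’s `html.parser` to pull out visible text and image addresses; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


def raw_snapshot_url(timestamp: str, page_url: str) -> str:
    """Build a Wayback URL using the (assumed) `id_` flag for unmodified content."""
    return f"https://web.archive.org/web/{timestamp}id_/{page_url}"


class PageExtractor(HTMLParser):
    """Collect visible text and image URLs from an archived HTML page."""

    SKIP = {"script", "style", "noscript"}  # tags whose contents are not page text

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.text: list[str] = []
        self.images: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths such as "/images/logo.png".
                self.images.append(urljoin(self.base_url, src))

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())


def extract(html: str, base_url: str) -> tuple[str, list[str]]:
    """Return the page text (one chunk per line) and its image URLs."""
    parser = PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.text), parser.images


def fetch_page(timestamp: str, page_url: str) -> tuple[str, list[str]]:
    """Download one snapshot (network request) and extract text and images."""
    with urlopen(raw_snapshot_url(timestamp, page_url), timeout=60) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return extract(html, page_url)
```

The image addresses point at your original domain. To download the archived copies, pass each one through `raw_snapshot_url()` with a timestamp; the Wayback Machine usually redirects to the nearest capture if that exact timestamp does not exist.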

Scrape Internet Archive Content

If you don’t have time to copy each page of the site you’re recovering by hand, another option is to pull, or scrape, all of the site content with a script. Several popular scripts are available for this. Keep in mind that they are often written by third parties or individuals and may require testing and troubleshooting before they work.
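As an illustration of what such scripts do (it is not one of the third-party tools this section refers to), here is a minimal Python sketch that asks the CDX API for every unique page captured under a domain and saves each capture locally. It assumes the CDX parameters `filter=statuscode:200` and `collapse=urlkey` (which keeps the first, usually oldest, capture of each URL) and the `id_` flag for raw files; `download_site()` and the other names are hypothetical. Expect to adjust it for your site.

```python
import json
import time
from pathlib import Path
from urllib.parse import urlencode, urlsplit
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"  # assumed public endpoint


def site_listing_url(domain: str) -> str:
    """One CDX query for every unique URL captured under `domain`."""
    params = {
        "url": f"{domain}/*",          # everything under the site root
        "output": "json",
        "fl": "timestamp,original",
        "filter": "statuscode:200",    # skip redirects and error pages
        "collapse": "urlkey",          # one capture per unique URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def local_path_for(original_url: str, out_dir: Path) -> Path:
    """Map an archived URL to a local file, e.g. /blog/ -> blog/index.html."""
    path = urlsplit(original_url).path or "/"
    if path.endswith("/"):
        path += "index.html"
    # Drop empty, "." and ".." segments so files stay inside out_dir.
    parts = [p for p in path.split("/") if p not in ("", ".", "..")]
    return out_dir.joinpath(*parts)


def download_site(domain: str, out_dir: Path, delay: float = 1.0) -> None:
    """Fetch every listed capture with the `id_` flag (network requests)."""
    with urlopen(site_listing_url(domain), timeout=120) as resp:
        body = resp.read().decode("utf-8")
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return
    header, *captures = rows
    for row in captures:
        capture = dict(zip(header, row))
        target = local_path_for(capture["original"], out_dir)
        raw = f"https://web.archive.org/web/{capture['timestamp']}id_/{capture['original']}"
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            with urlopen(raw, timeout=60) as page:
                target.write_bytes(page.read())
        except OSError as err:  # HTTPError and URLError are OSError subclasses
            print(f"skipped {capture['original']}: {err}")
        time.sleep(delay)  # pause between requests to go easy on archive.org


# Example (network access required):
# download_site("example.com", Path("recovered-site"))
```

Query strings are dropped when building file names, so pages such as `?page=2` overwrite each other, and links inside the saved HTML still point at the original domain. Gaps like these are a common source of broken links and 404 errors after recovery.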

3rd Party Services

Want to save time? You can pay a third-party service to scrape and recover your website for you. Some will even restore sites built with a CMS such as WordPress. Pricing and scope of service differ from site to site, so we recommend comparing several services to see which one best meets your needs.

Now that you know how to find and recover website content from the Wayback Machine (Internet Archive), you can begin rebuilding your site. Hopefully, your site will return to its former glory with help from the archived copy. We also recommend archiving your website with the Wayback Machine regularly so that up-to-date snapshots are available if you ever need them.
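Requesting new snapshots can be automated. The sketch below assumes the Wayback Machine’s Save Page Now endpoint accepts plain GET requests at `https://web.archive.org/save/<page URL>`; archive.org rate-limits this and also offers an authenticated API, so treat it as an illustration rather than a supported client. The function names, the User-Agent string and the 10-second delay are hypothetical choices.

```python
import time
from urllib.request import Request, urlopen


def save_page_now_url(page_url: str) -> str:
    """Build a Save Page Now request URL (assumed endpoint format)."""
    return f"https://web.archive.org/save/{page_url}"


def request_snapshots(pages: list[str], delay: float = 10.0) -> dict[str, str]:
    """Ask the Wayback Machine to capture each page (network requests).

    Returns the HTTP status code, or the error message, for each page.
    """
    results: dict[str, str] = {}
    for page in pages:
        req = Request(
            save_page_now_url(page),
            headers={"User-Agent": "site-archiver-sketch/0.1"},  # hypothetical name
        )
        try:
            with urlopen(req, timeout=120) as resp:
                results[page] = str(resp.status)
        except OSError as err:  # includes HTTPError (e.g. 429 Too Many Requests)
            results[page] = str(err)
        time.sleep(delay)  # space out requests to respect rate limits
    return results


# Example (network access required):
# print(request_snapshots(["https://example.com/", "https://example.com/about/"]))
```

Running something like this on a schedule after major site updates keeps the archive reasonably current, but it does not replace your own backups.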

12 thoughts on “How to Recover your Content from Wayback Machine (Internet Archive)”

I ended up just going to developer tools, view source. I didn’t have a massive amount of copy to copy, but still more than I wanted to just type out.

It is not possible to copy content from the pages. It must be blocked. I have tried it in Firefox, Chrome, and Brave: it’s a no go.

I reset my website and exported the data. After activating the WordPress theme (J News) again, I imported the posts, pages, and media. Everything was restored, but the images in the posts were not updated. How can I recover all the images in posts?

John-Paul Briones says:

September 15, 2022 at 8:55 am

In some cases, you will have to copy or download the content manually, if it is available in the Wayback Machine.

https://archivescraper.net/ is 404.

Ronnie says:

May 16, 2022 at 12:04 pm

Thanks for letting us know, Jim. Removing it for now.

I tried, but could not find my site’s posts. I have seen a lot of webmasters do this.

Alyssa Kordek says:

November 17, 2020 at 5:17 pm

Hi Alana,

It is possible that your site post was not archived by Wayback Machine, in which case you will likely need to restore from a backup.

Best Regards

Can you please share a tutorial for scraping with Python, which you mentioned in your post?

Alyssa Kordek says:

July 7, 2020 at 2:31 pm

Unfortunately we do not have any such tutorials available. I recommend reaching out to a Python developer for assistance with this task.

It really worked out for me, but there are still a lot of 404 errors in the scraped HTML.

Ronnie says:

April 24, 2020 at 11:46 am

I’m sorry to hear that you ran into trouble. Using the Wayback Machine to recover site data is really a method of last resort. You may need to track down individual URLs in the Wayback Machine manually after the initial recovery attempt, and even then not everything will be archived. I hope you’re able to find enough to get started, though!
