If your website was lost or hacked, you might have the unfortunate task of recovering the content. We always recommend making regular backups of your site, but if they are not available you have another option.
The Internet Archive’s Wayback Machine takes periodic snapshots of many sites across the internet and may have a copy of yours. Follow along and we’ll show you how to search its archives and recover your content. You can then use these pieces to rebuild your site from scratch.
Go to the Wayback Machine at https://web.archive.org, type your web address in the search field, then click the Browse History button. It will list how many times your site was saved over a time period. For example: “Saved 34 times between November 9, 2008 and May 28, 2019.”
You will also see a timeline and a calendar. Click a year to see which dates your site was archived.
Click the date on the calendar to view a snapshot of what was saved. You can try to navigate the site to view any available content. Keep in mind, it may not look exactly like your site since it depends on what was archived at the time.
We recommend checking each year and date to make sure you find all of the content.
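If you have many years to check, the same snapshot list can be pulled programmatically. The following is a minimal sketch, assuming the Wayback Machine's public CDX search endpoint and Python's standard library; the function names (`build_cdx_url`, `parse_cdx_rows`, `snapshot_url`, `list_snapshots`) and the `example.com` address are illustrative placeholders, not part of any official tool.

```python
import json
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(site, from_year=None, to_year=None):
    """Build a CDX query URL for successful (HTTP 200) captures of `site`."""
    params = {
        "url": site,
        "output": "json",
        "fl": "timestamp,original,statuscode",
        "filter": "statuscode:200",
        "collapse": "timestamp:8",  # keep at most one capture per day
    }
    if from_year:
        params["from"] = str(from_year)
    if to_year:
        params["to"] = str(to_year)
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_cdx_rows(rows):
    """Turn the CDX JSON (a header row followed by data rows) into dicts."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def snapshot_url(timestamp, original):
    """URL you can open in a browser to view one archived capture."""
    return f"https://web.archive.org/web/{timestamp}/{original}"


def list_snapshots(site, from_year=None, to_year=None):
    """Fetch and parse the capture list (requires network access)."""
    url = build_cdx_url(site, from_year, to_year)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_cdx_rows(json.load(resp))
```

Calling `list_snapshots("example.com", 2008, 2019)` should return one dictionary per capture, and passing its `timestamp` and `original` values to `snapshot_url` gives the address of that snapshot. Treat the query parameters as a starting point and adjust them to your site.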
Copy Content Manually
Now that you know how to search for and find your website snapshots, you can begin copying the text and images to your computer.
Navigate to each page of the site and copy the text, then paste it into a text editor such as Notepad, Google Docs, or MS Word.
Visit each page in the Internet Archive, then right-click any image you want to recover and save it to a folder on your computer.
In some cases, you may be able to recover some of the website code. Right-click then select View page source to access the site code. Save it to a text editor for later use.
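If you saved the page source for several pages, pulling the text and image addresses out by hand gets tedious. The sketch below is one possible way to do it with Python's built-in `html.parser`; the class name `PageContentExtractor` and the helper `extract_content` are our own illustrative names, and the approach assumes reasonably well-formed HTML.

```python
from html.parser import HTMLParser


class PageContentExtractor(HTMLParser):
    """Collect visible text and image URLs from one HTML page."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.image_urls = []
        self._skip_depth = 0  # > 0 while inside a tag whose text we ignore

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_content(html):
    """Return (text, image_urls) for the given HTML source string."""
    parser = PageContentExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.text_parts), parser.image_urls
```

Pass the saved page source to `extract_content` to get the page text plus a list of image addresses to download. Pages saved from a normal snapshot view will also contain the Wayback Machine's own toolbar markup, so expect to trim some extra text.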
Scrape Internet Archive Content
If you don’t have time to manually copy each page of the website you’re recovering, another option is to pull, or scrape, all the site content using a script. Several scraping tools and scripts exist for this. Keep in mind that these are often written by third parties or individuals and may require testing and troubleshooting before they work.
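To give a rough idea of what such a script does, here is a minimal sketch in Python using only the standard library. It assumes you already have a list of (timestamp, original URL) pairs for your captures, and it relies on the Wayback Machine convention that adding `id_` after the timestamp returns the page as originally served. The function names and the `recovered_site` folder layout are hypothetical choices for illustration, not an existing tool.

```python
import pathlib
import time
import urllib.parse
import urllib.request


def raw_snapshot_url(timestamp, original):
    """URL of the capture as originally served; the `id_` suffix asks the
    Wayback Machine not to inject its toolbar or rewrite links."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"


def local_path_for(original, root="recovered_site"):
    """Map an original URL to a file path under `root` (hypothetical layout).
    Query strings are ignored, so URLs differing only by query will collide."""
    parsed = urllib.parse.urlparse(original)
    path = parsed.path.lstrip("/")
    if not path or path.endswith("/"):
        path += "index.html"
    host = parsed.hostname or "unknown-host"  # hostname drops any :80 port
    return pathlib.Path(root) / host / path


def download_snapshots(snapshots, root="recovered_site", delay_seconds=1.0):
    """Save each capture to disk; `snapshots` is an iterable of
    (timestamp, original_url) pairs. Requires network access."""
    saved = []
    for timestamp, original in snapshots:
        target = local_path_for(original, root)
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            url = raw_snapshot_url(timestamp, original)
            with urllib.request.urlopen(url, timeout=30) as resp:
                target.write_bytes(resp.read())
            saved.append(target)
        except OSError as err:  # URLError and HTTPError are OSError subclasses
            print(f"skipped {original}: {err}")
        time.sleep(delay_seconds)  # be polite to the archive's servers
    return saved
```

For example, `download_snapshots([("20190528000000", "http://example.com/")])` would try to save that capture under `recovered_site/example.com/index.html`. Pages that were never archived are skipped with a message, so expect gaps and check the results page by page.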
Want to save time? You can pay a third-party service to scrape and recover your website for you. Some will even restore content from CMSs such as WordPress. Pricing and scope of service differ from site to site, so we recommend comparing them to see which one best meets your needs.
Now that you know how to find and recover website content from the Wayback Machine (Internet Archive), you can begin rebuilding your site. Hopefully, your site will return to its former glory with help from the archived copy. We recommend archiving your website with the Wayback Machine, so you will have updated snapshots.
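If you want to request fresh snapshots on a schedule, one option is the Wayback Machine's Save Page Now feature. The sketch below assumes that requesting `https://web.archive.org/save/` followed by a page address triggers a capture; the function names and the User-Agent string are placeholders of ours, and the service may rate-limit or reject requests.

```python
import urllib.request

SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_page_now_url(page_url):
    """Build the Save Page Now address for one page."""
    return SAVE_ENDPOINT + page_url


def request_capture(page_url, timeout=60):
    """Ask the Wayback Machine to capture `page_url` and return the HTTP
    status code of the response. Requires network access."""
    request = urllib.request.Request(
        save_page_now_url(page_url),
        headers={"User-Agent": "site-archiver-sketch/0.1"},  # placeholder
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

Running `request_capture("https://example.com/")` from a scheduled task for your most important pages is one way to keep recent copies available.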
John-Paul Briones, Content Writer II
John-Paul is an Electronics Engineer who spent most of his career in IT. He has been a Technical Writer for InMotion since 2013.
I ended up just going to developer tools, view source. I didn’t have a massive amount of copy to copy, but still more than I wanted to just type out.
It is not possible to copy content from the pages. It must be blocked. I have tried it on FF, Chrome, and Brave, and it’s a no-go.
I reset my website and exported the data. After activating the WordPress theme (J News) again, I imported the posts, pages, and media. Everything was restored, but the images are not updated in the posts. How can I recover all the images in the posts?
In some cases, you will have to copy or download the content manually, if it is available in the Wayback Machine.
https://archivescraper.net/ is 404.
Thanks for letting us know, Jim. Removing it for now.
I’m trying, but I can’t find my site’s posts. I have seen a lot of webmasters do that.
Hi Alana,
It is possible that your site post was not archived by Wayback Machine, in which case you will likely need to restore from a backup.
Best Regards
Can you please share a tutorial for the Python scraping that you mentioned in your post?
Unfortunately we do not have any such tutorials available. I recommend reaching out to a Python developer for assistance with this task.
It really worked out for me, but there are still a lot of 404 errors in the scraped HTML.
I’m sorry to hear that you ran into trouble. Using the Wayback Machine to recover site data is really a method of last resort. You may end up needing to manually track down individual URLs in the Wayback Machine after the initial attempt at recovery, and even so not everything will be archived. I hope you’re able to find enough to get started, though!