HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Trying to scrape ASAP
Author: CB
Date: 08/08/2021 20:24
Mercola's site is a tremendous loss. I am retired and, due to the challenges of
aging, my health is beginning to fail. I credit Mercola's old site with keeping
me alive, and I still have much more research to do, but it is gone now.

During the 48 hours before the website turned into a pumpkin, I spent hours
with various Linux tools trying to scrape the Mercola website. I ran into
the same roadblocks you did and did not have time to learn enough about
scraping (including programmable spiders) to archive the site.

Every page I scraped contained the email signup bribe page, so I believe
none of the actual content was captured. (ASP generates and serves a page that
looks like a popup on top of the content, but the page is actually a facade
with no real content behind it.)

With HTTrack, I think a user must import either POST data or cookies from
their browser to get ASP to serve the desired page, but I never figured out how
to do it. I believe that Dr. Mercola's old site was the most extensive and
valuable health resource on the web. Due to actions related to personal threats
that I do not understand, he has taken down his life's work!
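
For anyone attempting the cookie route, here is a rough sketch of one way it
might be done. As far as I know, HTTrack reads a Netscape-format cookies.txt
file placed in the project folder, so the idea is to write the cookies seen in
the browser into that format. The helper names make_cookie and
write_cookies_txt are my own, and the domain, cookie name, and value must come
from your own browser session; nothing here is specific to the Mercola site,
and I have not confirmed that cookies alone get past the signup page.

```python
import http.cookiejar
import time


def make_cookie(domain, name, value, path="/", secure=False, days_valid=30):
    """Build a cookie object from values copied out of the browser.

    All arguments are placeholders supplied by the caller.
    """
    expires = int(time.time()) + days_valid * 86400
    return http.cookiejar.Cookie(
        version=0,
        name=name,
        value=value,
        port=None,
        port_specified=False,
        domain=domain,
        domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path=path,
        path_specified=True,
        secure=secure,
        expires=expires,
        discard=False,
        comment=None,
        comment_url=None,
        rest={},
    )


def write_cookies_txt(filename, cookies):
    """Save the cookies in the Netscape cookies.txt format."""
    jar = http.cookiejar.MozillaCookieJar(filename)
    for cookie in cookies:
        jar.set_cookie(cookie)
    # Keep session and already-expired cookies so nothing is silently dropped.
    jar.save(ignore_discard=True, ignore_expires=True)
```

Calling write_cookies_txt("cookies.txt", [make_cookie(".example.com",
"session", "abc123")]) and copying the resulting file into the HTTrack project
folder before starting the mirror would be the intended use, with the example
values replaced by real ones.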

Much of the site is still here in this archive, though many of its new scrapes
result in redirected pages:

If either Admin or JD has had any success in archiving this life-saving
resource, would you be willing to share it with me in some manner?
I did manage to scrape some PDF files of his articles, but it's nowhere near
a comprehensive library of his work.