Re: Trying to scrape www.mercola.com ASAP - HTTrack Website Copier Forum

Subject: Re: Trying to scrape www.mercola.com ASAP

Author: CB

Date: 08/08/2021 20:24

Mercola's site is a tremendous loss. I am retired and, due to the challenges of
aging, my health is beginning to fail. I credit Mercola's old site with keeping
me alive, and I still have much more research to do, but it is gone now.

During the 48 hours before the website turned into a pumpkin, I spent hours
trying to scrape the Mercola website with various Linux tools. I ran into the
same roadblocks you did and did not have time to learn enough about scraping
(including programmable spiders) to archive the site.

Every page I scraped contained the email signup bribe page, so I believe none
of the actual content was captured. (ASP creates and serves a page that looks
like a popup on top of the content, but that page is actually a facade with no
real content behind it.)
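If anyone still has the files from a failed run, a rough way to confirm that they hold only the signup facade is to count how many saved pages contain text from that overlay. This is only a sketch: it assumes the mirror is a folder of saved .html/.htm files, and both the marker phrase and the function name `classify_pages` are placeholders of my own, not anything taken from the actual site.

```python
import sys
from pathlib import Path

# Hypothetical marker: replace it with a phrase that really appears on the
# signup overlay in your own saved pages.
SIGNUP_MARKER = "Subscribe"

def classify_pages(mirror_dir: str, marker: str = SIGNUP_MARKER) -> tuple[int, int]:
    """Return (pages_containing_marker, total_html_pages) under mirror_dir."""
    hits = 0
    total = 0
    for path in Path(mirror_dir).rglob("*"):
        # Only look at saved HTML pages, not images, scripts, or log files.
        if not path.is_file() or path.suffix.lower() not in (".html", ".htm"):
            continue
        total += 1
        # errors="ignore" tolerates pages saved in unexpected encodings.
        text = path.read_text(encoding="utf-8", errors="ignore")
        if marker in text:
            hits += 1
    return hits, total

if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    hits, total = classify_pages(folder)
    print(f"{hits} of {total} saved pages contain the signup marker")
```

Saved as, say, check_mirror.py (any name works), it would be run as `python check_mirror.py path/to/mirror`. If nearly every page matches, the run most likely captured only the overlay.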

With HTTrack, I think a user must import either POST data or cookies from
their browser to get ASP to serve the desired page, but I never figured out how
to do it. I believe that Dr. Mercola's old site was the most extensive and
valuable health resource on the web. Because of actions related to personal
threats that I do not understand, he has taken down his life's work!
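For anyone who wants to try the cookie route: as I understand HTTrack's documentation, it can read cookies from a Netscape-format cookies.txt file (the plain-text format older browsers exported) placed in the project folder. I have not tested this against the Mercola site, and whether a session cookie alone would get past the signup page is an assumption. The sketch below just turns cookie name/value pairs copied from a browser's developer tools into that file format; the function name `to_netscape_cookies` and the cookie name are placeholders.

```python
import time

def to_netscape_cookies(cookies: dict[str, str], domain: str,
                        path: str = "/", lifetime_s: int = 86400) -> str:
    """Render name/value pairs as the text of a Netscape-format cookies.txt.

    Each cookie line has 7 tab-separated fields: domain, subdomain flag,
    path, secure flag, expiry (Unix time), name, value.
    """
    expires = int(time.time()) + lifetime_s
    # A leading dot on the domain means the cookie also covers subdomains,
    # and the subdomain flag must agree with it.
    subdomains = "TRUE" if domain.startswith(".") else "FALSE"
    lines = ["# Netscape HTTP Cookie File"]
    for name, value in cookies.items():
        # "FALSE" in the secure column: the cookie is not restricted to HTTPS.
        lines.append("\t".join([domain, subdomains, path, "FALSE",
                                str(expires), name, value]))
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Hypothetical cookie: copy the real name/value from your browser's
    # developer tools after getting past the signup page by hand.
    browser_cookies = {"example_session": "paste-value-here"}
    with open("cookies.txt", "w", encoding="utf-8") as f:
        f.write(to_netscape_cookies(browser_cookies, ".mercola.com"))
```

The resulting cookies.txt would go in the HTTrack project folder before starting the mirror. Python's own `http.cookiejar.MozillaCookieJar` can load the file, which is a quick way to check that the format is valid.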

Much of the site is still available in this archive, though many of its more
recent captures are only redirect pages:
<https://archive.is/articles.mercola.com>

If either Admin or JD has had any success archiving this life-saving
resource, would you be willing to share it with me in some way?
I did manage to scrape some .pdf files of his articles, but they are nowhere
near a comprehensive library of his work.
