HTTrack Website Copier
Free software offline browser - FORUM
Subject: Can't scrape text only from Mic.com
Author: Nancy, fun with Data
Date: 08/26/2021 02:09
 
Hi everyone.

I'm trying to scrape Mic.com for a data mining exercise and all I want is the
CSS and the text of the articles, no images or video. 
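
For illustration, the selection described above (keep pages and CSS, skip images and video) can be sketched as a filter on URL extensions. This is only a minimal sketch of the intent, not an HTTrack setting: the function name, the extension list, and the example URLs are all hypothetical, and it assumes article URLs end in no extension or in .html.

```python
from urllib.parse import urlparse
import posixpath

# Hypothetical list for illustration: an empty extension covers
# article URLs that do not end in a file name.
KEEP_EXTENSIONS = {"", ".html", ".htm", ".css"}


def wanted(url: str) -> bool:
    """Return True for pages and stylesheets, False for everything else."""
    path = urlparse(url).path
    # splitext returns (root, ext); ext is "" when the path has no dot suffix.
    ext = posixpath.splitext(path)[1].lower()
    return ext in KEEP_EXTENSIONS


if __name__ == "__main__":
    # Made-up URLs, used only to show the filter's behaviour.
    for url in (
        "https://www.mic.com/articles/example-story",
        "https://www.mic.com/static/site.css",
        "https://www.mic.com/img/photo.JPG",
        "https://www.mic.com/video/clip.mp4",
    ):
        print(url, wanted(url))
```

Anything the filter rejects (images, video, scripts) would simply never be requested, which is the behaviour the options would need to reproduce.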

Mic.com has around 70,000 articles, but HTTrack grabs only about 1,700 HTML
files and 70,000 TMP files. Then at the end it deletes all the TMP files :(

I'm on Windows 10 with the latest version of WinHTTrack. After 2 days of
suffering through different options, I still get the same result: it keeps
deleting all the articles except for about 1,700 of them. I've tried everything.

WHAT am I supposed to put in the options to get this to work? Thank you so
much!
 


All articles

Subject Author Date
Can't scrape text only from Mic.com

08/26/2021 02:09
Re: Can't scrape text only from Mic.com

08/27/2021 05:12



