HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Can't scrape text only from Mic.com
Author: Nancy, fun with Data
Date: 08/27/2021 05:12
 
I looked through the log files, and for the majority of the article pages it
says:
"Warning: could not detect encoding for:
<https://www.mic.com/p/this-couples-gender-reveal-stunt-started-a-deadly-wildfire-theyre-now-facing-20-years-in-jail-82561477>"

So any time HTTrack cannot detect the encoding for a page, it just keeps it as a
TMP file and deletes it at the end. That is strange, because when I interrupted a
previous scan it turned a LOT of those TMP files into HTML files. So I think
there's a glitch in HTTrack, which is over 10 years old now. The devs keep
saying "those are temp files, they're supposed to be deleted", but that's not true.
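For anyone who wants to see how many pages are affected, here is a rough Python sketch that pulls the URLs out of the "could not detect encoding" warnings in a log. The function name urls_without_encoding is made up for illustration, and the pattern is based only on the warning text quoted above, so the exact log layout is an assumption and may need adjusting.

```python
import re

# Pattern modeled on the warning quoted in this post (an assumption about the
# log layout). The URL may follow on the same line or the next one, with or
# without angle brackets.
WARNING_RE = re.compile(
    r"could not detect encoding for:\s*<?(https?://[^\s>]+)>?",
    re.IGNORECASE,
)


def urls_without_encoding(log_text):
    """Return the unique URLs flagged by the warning, in first-seen order."""
    found = (match.group(1) for match in WARNING_RE.finditer(log_text))
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(found))


# Hypothetical usage, assuming the log is saved as "hts-log.txt" in the
# project folder:
#
#     with open("hts-log.txt", encoding="utf-8", errors="replace") as fh:
#         for url in urls_without_encoding(fh.read()):
#             print(url)
```

The resulting list could then be compared against the HTML files that actually ended up in the mirror folder, to check whether the missing pages are the same ones that triggered the warning.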
 





