Re: Can't scrape text only from Mic.com - HTTrack Website Copier Forum

Subject: Re: Can't scrape text only from Mic.com

Author: Nancy, fun with Data

Date: 08/27/2021 05:12

I looked through the log files, and for the majority of the article pages it
says:
"Warning: could not detect encoding for:
<https://www.mic.com/p/this-couples-gender-reveal-stunt-started-a-deadly-wildfire-theyre-now-facing-20-years-in-jail-82561477>"

So any time HTTrack cannot detect the encoding for a page, it just keeps it as a
TMP file and deletes it at the end. That's strange, because when I interrupted a
previous scan it turned a LOT of those TMP files into HTML files. So I think
there's a glitch in HTTrack, which is over 10 years old now. The devs keep
saying "those are temp files, they're supposed to be deleted", but that's not true.
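
In case it helps anyone check their own log, here is a rough Python sketch that pulls out every URL that got this warning. It assumes the log is the hts-log.txt file in the project folder and that the warning text looks like the line quoted above (the URL may or may not be in angle brackets, and may be on the next line). The function names are my own and are not part of HTTrack.

```python
import re

# Matches the warning text quoted above. The optional "<" and the URL sitting
# on the same or the following line are assumptions about the log layout.
WARNING = re.compile(r"could not detect encoding for:\s*<?(https?://[^\s>]+)")


def urls_without_encoding(log_text):
    """Return the URLs named in the warnings, in first-seen order, without duplicates."""
    return list(dict.fromkeys(WARNING.findall(log_text)))


def print_report(path="hts-log.txt"):
    """Print the affected URLs found in the log file at `path` (assumed location)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        urls = urls_without_encoding(fh.read())
    print(len(urls), "page(s) with undetected encoding")
    for url in urls:
        print(url)
```

Calling print_report() with the path to your own log gives a list you can compare against the TMP files left in the mirror folder.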

