HTTrack Website Copier
Free software offline browser - FORUM
Subject: Anti-HTTrack 404 Protection
Author: Will
Date: 09/28/2013 02:17
 
I've seen larger, more established websites do this: a page loads a file, and
you can also fetch that file directly by taking its URL from the page source.
However, when you try to access the directory that contains the file (and
others like it), the site returns a 404.

For example, NotDoppler has this protection: you can crawl
<http://i.notdoppler.com/files/strikeforceheroes2.swf>, but when you try to
spider <http://i.notdoppler.com/files/> or <http://i.notdoppler.com>, HTTrack
reports: Error:  "Not Found" (404) at link i.notdoppler.com/
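One way to picture this behavior (an assumption about the server, not something confirmed for NotDoppler): the server answers only for exact file paths it stores and never generates a directory listing, so "/files/" and "/" have nothing to return. A crawler like HTTrack can only fetch files that some page it downloaded links to, so without a listing there is nothing to enumerate. A minimal simulation, where `HOSTED_FILES`, `another_game.swf`, and `status_for` are hypothetical names for illustration:

```python
from urllib.parse import urlsplit

# Hypothetical set of file paths the server stores (illustrative only;
# "another_game.swf" is made up).
HOSTED_FILES = {
    "/files/strikeforceheroes2.swf",
    "/files/another_game.swf",
}


def status_for(url: str) -> int:
    """Return the status a listing-free server would plausibly send for `url`.

    Assumption: each request path is looked up as an exact stored file and no
    directory index is ever generated, so any other path (including "/" and
    "/files/") gets 404.
    """
    path = urlsplit(url).path or "/"  # a bare host has an empty path
    return 200 if path in HOSTED_FILES else 404


if __name__ == "__main__":
    for u in (
        "http://i.notdoppler.com/files/strikeforceheroes2.swf",
        "http://i.notdoppler.com/files/",
        "http://i.notdoppler.com",
    ):
        print(status_for(u), u)
```

Under this assumption, only the first URL gets 200. Real servers differ: some answer 403 rather than 404 when listings are disabled.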

Other similar sites, such as 1cup1coffee.com, do not have these restrictions and
allow downloading all the SWF content on the website.

HTTrack is set to ignore robots.txt, and entering the directory URLs above in a
browser also returns a 404. How can you download the data, and how does this
protection work?
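Since the individual file URLs do appear in the page source, one possible workaround is to collect them from the pages that embed the files and give them to HTTrack as explicit start URLs, instead of spidering the directory. Below is a minimal sketch. `find_swf_urls` and the sample page are hypothetical, and the regex is a rough heuristic: it catches quoted `.swf` strings in HTML attributes or inline JavaScript, but not URLs that scripts build at run time.

```python
import re
from urllib.parse import urljoin

# A quoted string ending in .swf, with an optional query string. This matches
# both HTML attributes (src="...") and JavaScript string literals.
SWF_PATTERN = re.compile(
    r"""["']([^"'\s]+?\.swf(?:\?[^"'\s]*)?)["']""", re.IGNORECASE
)


def find_swf_urls(page_source: str, page_url: str) -> list[str]:
    """Return absolute .swf URLs referenced in page_source, in first-seen order."""
    found: list[str] = []
    for match in SWF_PATTERN.finditer(page_source):
        # Resolve relative references against the page's own URL.
        absolute = urljoin(page_url, match.group(1))
        if absolute not in found:
            found.append(absolute)
    return found


if __name__ == "__main__":
    # Hypothetical page source; a real page would be saved or downloaded first.
    sample = (
        '<embed src="/files/strikeforceheroes2.swf">'
        '<script>swfobject.embedSWF("http://i.notdoppler.com/files/other.swf?v=2", "x");</script>'
    )
    for url in find_swf_urls(sample, "http://i.notdoppler.com/index.html"):
        print(url)
```

The printed URLs could then be listed as additional start URLs in an HTTrack project. This only retrieves files that some page actually references. Anything never linked stays undiscoverable without a directory listing.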
 







Created with FORUM 2.0.11