HTTrack Website Copier
Free software offline browser - FORUM
Subject: Problems grabbing files from websites
Author: DutchDude
Date: 11/15/2005 23:00

- The goal : Grab all PDF files from the site by Retro System

- The Problem : Because this site stores PDFs in different directories (so the
PDF names are the same but their content differs), the program is not able to
capture them correctly.

- The thought : I tried the file-renaming option '%n (%M).%t' to give each
grabbed PDF a unique name, but the program ignores it completely. All PDFs are
stored under their original names in a single directory, so some PDFs are lost.
I would even settle for the program creating a duplicate file whenever the hash
of the existing file and that of the newly downloaded one do not match.
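The hash-based fallback described above could be approximated with a small post-processing step outside HTTrack. The sketch below (names such as `save_unique`, `manual.pdf`, and the directory layout are illustrative assumptions, not HTTrack behaviour) keeps both copies of a name collision by appending a short content hash when the bytes differ:

```python
import hashlib
import os
import tempfile

def save_unique(dest_dir, name, data):
    """Save `data` under `name` in `dest_dir`. If a file with that name
    already exists but holds different content, append a short SHA-256
    suffix so neither copy is lost. Returns the path actually written."""
    path = os.path.join(dest_dir, name)
    if os.path.exists(path):
        with open(path, "rb") as f:
            existing = hashlib.sha256(f.read()).digest()
        if existing == hashlib.sha256(data).digest():
            return path  # identical content: nothing to do
        stem, ext = os.path.splitext(name)
        suffix = hashlib.sha256(data).hexdigest()[:8]
        path = os.path.join(dest_dir, f"{stem} ({suffix}){ext}")
    with open(path, "wb") as f:
        f.write(data)
    return path

# Demo: two different "PDFs" that share a name end up as two files.
d = tempfile.mkdtemp()
p1 = save_unique(d, "manual.pdf", b"first document")
p2 = save_unique(d, "manual.pdf", b"second document")
print(p1 != p2)  # distinct paths, both files kept
```

This mirrors the "duplicate file when hashes differ" behaviour the post asks for, without relying on the ignored renaming option.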

- The goal : Grab all PDF files from the site by System

- The Problem : This site has a flood-protection script. I have set the
connections/sec to 0.1 (which should be enough to prevent being banned), but
sometimes the program tilts and completely ignores the connections/sec
parameter, creating far more connections per second than the settings specify.
Perhaps this is caused by failed/retried/unanswered connection requests.
Anyhow, it should always honour the parameter as set; then the ban would not
occur. To prevent bans, I would also like an option to pause x seconds between
transfers (not after x bytes), which would likewise avoid the ban.
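The "pause x seconds between transfers" idea amounts to enforcing a minimum interval between transfer starts, counting retries and failed requests as well. A minimal sketch of that throttle (the `TransferThrottle` class is hypothetical, not part of HTTrack):

```python
import time

class TransferThrottle:
    """Enforce a minimum pause between transfer starts.

    Calling wait() before *every* request attempt, including retries and
    failed requests, keeps the effective rate bounded even when the
    server drops connections.
    """
    def __init__(self, min_interval_s):
        self.min_interval_s = min_interval_s
        self._last = 0.0

    def wait(self):
        now = time.monotonic()
        delay = self._last + self.min_interval_s - now
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()

# Demo: three "transfers" spaced at least 50 ms apart.
throttle = TransferThrottle(0.05)
start = time.monotonic()
for _ in range(3):
    throttle.wait()
print(time.monotonic() - start >= 0.09)  # at least two full pauses elapsed
```

Because the pause is measured between attempts rather than derived from bytes transferred, a burst of retries cannot exceed the configured rate, which is exactly the failure mode the post describes.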

