HTTrack Website Copier
Free software offline browser - FORUM
Subject: READ 1ST, instead. My settings.
Author: Charlieb000
Date: 12/28/2012 04:55
 
Hello,
I should probably start off by saying what I have told HTTrack to do, and say
it calmly rather than grumpily. I am using the latest version.

I started this project with the following settings (a rough command-line equivalent is sketched after this list):
<http://www.mnwelldir.org/>
**Scan rules:
ticked pictures
ticked archives
ticked media
manually added documents: +*.pdf +*.djvu (should also have added .doc, .docx,
etc.)
**limits
maximum external depth: changed to 0 (left blank on later tries).
increased max transfer rate to 70 kB/s.
**flow control
number of connections: 6
**links
UNTICKED attempt to detect all links (tags,js)
ticked Get html files first.
UNTICKED Get [external?] non-HTML files related to link, e.g. external
pictures/zips
**build
ticked no error pages
ticked no external pages
ticked do not purge old files
**Spider
UNTICKED parse java files
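
For reference, here is roughly what I believe the same setup looks like on the
command line. This is only a sketch: the output path is a placeholder, the
pictures/archives/media checkboxes add their own +*.gif/+*.zip/+*.mp3-style
filters that I have not listed here, and the option letters (--ext-depth, -p7,
-j0, -X0) should be double-checked against httrack --help before relying on
them:

  httrack "http://www.mnwelldir.org/" -O "/path/to/mirror" --ext-depth=0 -A70000 -c6 -p7 -j0 -X0 "+*.pdf" "+*.djvu" "+*.doc" "+*.docx"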



I commenced downloading, and right off the bat it was disobeying the settings.
I quit it and reconfigured, but it still started accessing external sites; the
extra directories totalled 105 and contributed at least 11% of the total
download.


I was so unhappy that it was downloading this much material from other sites
that I deleted it all except the directory containing the above website, and I
configured it to start again. I had ticked "do not purge old files", assuming
this meant that if a file already exists it won't be deleted; however, I saw it
had commenced overwriting the large MP3. The only thing I had left to try to
stop it from getting external sites was Java, so I unticked that, assuming
maybe the bug lay there - as written above, I disabled Java in both the Links
and Spider tabs. Still no fix.

The first step would be to see why it wants to download the "donate" images
from PayPal, even though "Get non-HTML files related to link" was not checked.
Perhaps it is because the image is embedded on the site? If so, then perhaps
two check boxes: "get embedded external resources: images (e.g. PayPal), SWF,
YouTube, etc." and one named "get linked external resources: zips, images,
etc." - but these are so similar you may as well make them one option, though
the question is, would people use them separately? (I don't think you will do
YouTube :P) A possible scan-rule workaround is sketched just below.
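
Since scan rules seem to be matched against the full URL, explicit exclusion
rules could be added for the external hosts - something like the following
(the host patterns are my guesses at the PayPal/YouTube hosts involved, not
something I have verified):

  -*paypal.com/* -*paypalobjects.com/* -*youtube.com/*

Alternatively, the file-type rules could be pinned to the one site, e.g.
+*mnwelldir.org/*.pdf instead of +*.pdf, so that the broad +*.gif/+*.jpg rules
added by the "pictures" checkbox do not match images on other hosts. I have
not tested either variant yet.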


BIGGEST PROBLEMS:
1: it downloads from other sites, and there is no option to stop it.
2: change the text "do not purge old files" to "keep old versions of updated
files", because of the misunderstanding described above (the tooltip shows
what it was meant to be).
3a: when the user has selected "continue interrupted download", do not
redownload and check for an update of any file, except the last active files
in case they were damaged. If this is not suitable, then how about:
3b: add a checkbox: "Redownload and check for updated files with the same file
size - under [textbox] bytes. Local and external files (of any size) with a
size mismatch will be updated." (tooltip: unchecked: redownload and check all
files). The second half of the message could be split off into a second
checkbox. (The continue/update shortcuts I found are sketched after this list.)
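
On 3a/3b: the command-line shortcuts I have found documented for resuming are
these (run from the project folder; a sketch only, as I have not verified
exactly which files they re-check):

  httrack --continue
  httrack --update

As far as I can tell, --continue is meant to resume an interrupted mirror and
--update to re-check an existing one, but whether they skip files of identical
size is exactly what 3b is asking for.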



SUGGESTIONS:
1: allow larger files to be downloaded later; keep them in the list so I can
use HTTrack to download them as I need them.
2: add a document group to the scan rules.
3: allow more websites to be added for the program to download while it is
already running.




 