HTTrack Website Copier
Free software offline browser - FORUM
Subject: Can't restrict 'spidering' to a specified site
Author: G.J.Ellis
Date: 11/14/2006 16:00
 
(I hope I'm not wasting time here: I searched and found related comments but
not this specific issue)

I've just downloaded your new full release in order to grab a small site of
important information.  I have used an older version to good effect (I think
it was 3.13 or thereabouts), so I thought I would be sufficiently familiar to
"go for it" without analysing the new doco.

[Using WinHTTrack front-end 3.40-2]
I need to download this technical reference site.  It is a Standards
information site, so it has many links to other official and academic
information sources.  The full site data is about 30+ MB; the bit I really
need is only 17+ MB, but the whole site is useful.  I ran it three times on a
sub-address within the domain, with different settings, and it tried
downloading over a gigabyte before I killed it -- each time it managed to
"break out" and try hoovering up the entire Internet, starting with several
universities, the whole of the BSI and the ISO, and several government
departments...

I have since tried giving it the whole domain and doing a "Download web
site(s) + questions" run so I could cancel every external site it links to,
but even then it copies the first few files from each site before skipping the
rest, and I now have >150 other site subdirectories containing >500 files.

I have recreated this job from scratch each time to ensure no settings or
caches are modifying the operation.

I can't find in this release a simple option to restrict spidering to the
current or explicitly specified address.  Such an option exists on the
"Experts Only" options page -- my memory may be failing me, but I thought
there was a "general"/"simple" option in previous versions.  I have tried this
option with both the default "stay on the same address" setting and "stay on
the same domain"; neither makes any difference to the spidering operation.
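
For what it's worth, the effect I am after is (I believe) the command-line
equivalent of something like the following, where the address and output
folder are placeholders rather than the real site:

    httrack "http://www.example.org/standards/" -O "C:\My Web Sites\standards" -a "-*" "+www.example.org/standards/*"

i.e. "-*" to exclude everything by default, "+www.example.org/standards/*" to
re-include only the branch I actually want, and -a for "stay on the same
address".  If the front-end's "Scan rules" tab takes the same +/- patterns,
perhaps that is the intended way to pin it down?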

I have also tried explicitly setting (in "Limits") the "Maximum external
depth" to "0", even though this is supposed to be the default, to no avail.

How do I tell the front-end to do what I want it to do, rather than what IT
wants?!  ;-/

Is there a bug in this release?
Thank you in advance for any help you can give to work around this issue.


oOo


Passing comment:
In this release, the Preferences menu only lets me load or save preferences,
not actually SET them!  I therefore cannot be certain exactly what I am
loading or saving, especially since I can only access any options at all when
actually *defining* a job, not before, during, or after running said job, so I
have no confidence about which (if any) of those options are being saved.

Otherwise, I thank you again for a spectacularly useful and (generally)
well-assembled piece of software.

Merci beaucoup...
G.J.E.
 