> First of all, I want to store a site offline obviously, for
> example, techtv.com. Now I want it to store everything
> that's in techtv.com like techtv/screensavers.com or
> techtv/screensavers/pc.com but no url that begins with any
> other name than techtv, so what do i do?
Use scan rules (Set Options / Scan rules):
-* +techtv.com/*
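If you drive httrack from the command line instead of the
interface, the same filters can be passed directly as
arguments. This is only a sketch: the output path
(C:\mirrors\techtv) and the exact starting URL are examples,
not taken from your setup:

  httrack http://www.techtv.com/ -O "C:\mirrors\techtv" "-*" "+techtv.com/*"

The "-*" rule first excludes everything, and "+techtv.com/*"
then re-allows only URLs under techtv.com; filters are applied
in the order you list them.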
> Second, after 3 links deep for some reason my pics won't
> show. I don't know if it's gotten to them yet or just won't
> download any pics 3 links deep but it left the main site
> and is storing all other sites as we speak so i figure it
> may be done with the main site.
Depth == 3 means "get all links reachable using three
clicks from the top index". The top index is the index
created by httrack that points to the given URLs.
Note that you generally should NOT use depth - default
settings and scan rules should be fine.
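If you really do need a depth limit, the command-line
equivalent is the rN option (for example -r3 for three levels
from the top index). Again, this is just a sketch with example
paths, and the default depth is normally what you want:

  httrack http://www.techtv.com/ -O "C:\mirrors\techtv" -r3 "-*" "+techtv.com/*"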
> And finally, i want to store everything in the site. And I
> mean everything, even downloading files. I want the site to
> where you can't even tell you're offline. Does it do that by
> default or do I have to put in a whole bunch of scan
> commands?
This is normally the default behaviour if you start
from /; please note however that:
- some folders might be missing due to robots.txt rules
- some files might be missing because they are located
  outside the domain (use "Get files near a link" to solve
  that)
- some javascript links might be bogus if the code is too
  complex for httrack
The command-line equivalents of these options are sketched
below.
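As a sketch (URL and output path are again only examples):
-s0 tells httrack to ignore robots.txt rules, and -n enables
"get files near a link" so that files hosted just outside the
domain, such as images, are fetched too:

  httrack http://www.techtv.com/ -O "C:\mirrors\techtv" -s0 -n "-*" "+techtv.com/*"

Ignoring robots.txt should be done with care, since some sites
use it to keep crawlers away from infinite or dynamic areas.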
Please also use bandwidth limits if you plan to mirror a
whole site! See 'Set Options'/'Limits' and 'Set
Options'/'Flow Control'. For large sites I suggest (see the
command-line sketch after this list):
- no more than 2 simultaneous connections
- no more than 20KB/s
- no more than 2 connections per second
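On the command line the same limits look roughly like this
(the numbers simply mirror the suggestions above): -c2 caps
simultaneous connections at 2, -A20000 caps the transfer rate
at 20000 bytes/s (about 20KB/s), and -%c2 caps new connections
at 2 per second:

  httrack http://www.techtv.com/ -O "C:\mirrors\techtv" -c2 -A20000 -%c2 "-*" "+techtv.com/*"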