Re: Trying to understand how httrack mirrors a site

Subject: Re: Trying to understand how httrack mirrors a site

Author: WHRoeder

Date: 03/26/2013 03:21

1) Always post the ACTUAL command line used (or log file line two) so we know
what the site is, what ALL your settings are, etc.
2) Always post the URLs you're not getting and the URL each one is
referenced from.
3) Always post anything USEFUL from the log file.
4) If you want everything, use the near flag (get non-html files related),
not filters.
5) I always run with A) No External Pages, so I know where the mirror ends;
B) the browser ID pulldown set to msie 6, as some sites don't like an HTT one;
C) Attempt to detect all links (for JS/CSS); and D) Timeout=60, retry=9, so
that temporary network interruptions don't delete files.
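
For anyone running from the command line instead of the GUI, here is a rough
sketch of how the settings in #5 might translate into httrack options. The
GUI-to-flag mapping is my reading of the httrack --help text and is an
assumption, so check it against your own version. The URL, output directory,
exact user-agent string, and the helper name build_httrack_args are all
made up for illustration.

```python
import shlex


def build_httrack_args(url, out_dir):
    """Assemble an httrack argument list approximating the settings in #5.

    The flag choices are assumptions based on the httrack --help text.
    """
    return [
        "httrack", url,
        "-O", out_dir,  # where the mirror is written
        "-x",           # A) replace external html links by error pages
        "-F", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)",  # B) browser ID (example string)
        "-%P",          # C) extended parsing, attempt to parse all links
        "-T60",         # D) timeout in seconds
        "-R9",          # D) number of retries
    ]


# Placeholder site and output directory; print the command instead of running it.
print(shlex.join(build_httrack_args("http://www.example.com/", "mirror")))
```

Printing the command first makes it easy to paste the exact line into a post,
which is what #1 asks for.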

> My question is after httrack goes into the first
> folder, Vol 1, does httrack download ALL the files
> in this folder or just on the first page and then

HTT does a breadth-first search.
<http://en.wikipedia.org/wiki/Breadth-first_search> Assuming links to page1
and page2 are visible in the starting URL, it gets those two pages first, then
it gets all the files they reference, including pages 3 and 4.
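
To make the ordering concrete, here is a minimal sketch of a breadth-first
walk over a made-up link graph. It is not HTT's actual scheduler, only the
general technique; the page names, the LINKS table, and the function name
are hypothetical.

```python
from collections import deque

# Hypothetical site: each page maps to the links found on it.
LINKS = {
    "index.html": ["page1.html", "page2.html"],
    "page1.html": ["page3.html", "img1.png"],
    "page2.html": ["page4.html", "img2.png"],
    "page3.html": [],
    "page4.html": [],
}


def breadth_first_order(start, links):
    """Return URLs in the order a level-by-level crawl would visit them."""
    seen = {start}          # URLs already queued, so nothing is fetched twice
    queue = deque([start])  # FIFO queue is what makes the walk breadth-first
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


print(breadth_first_order("index.html", LINKS))
```

Everything linked from the start page is visited before anything that is
only linked from page1 or page2, regardless of which folder it lives in.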

> folders? In other words does httrack download all
> files by level, or all files by folder? Not sure if
By level.

> way of changing this behavior so that all files are
Get html files first

> ask because this download has failed to download all
> the files on numerous occasions and even downloads
> files OUTSIDE of this directory which I have set
> httrack NOT to do but it still does it anyway.
No mind readers here. See #1 and #2 above.
