1) Always post the ACTUAL command line used (or line two of the log file) so we know
what the site is, what ALL your settings are, etc.
2) Always post the URLs you're not getting and the URL each one is
referenced from.
3) Always post anything USEFUL from the log file.
4) If you want everything, use the near flag (get non-html files related to a
page), not filters.
5) I always run with:
A) No External Pages, so I know where the mirror ends.
B) Browser ID=msie 6 from the pulldown, as some sites don't like the HTT one.
C) Attempt to detect all links (for JS/CSS).
D) Timeout=60, retry=9, so temporary network interruptions don't delete files.
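For anyone on the command line rather than the WinHTTrack GUI, the settings in #4 and #5 could be assembled roughly as below. This is only a sketch: the mapping from the GUI options to the long options is my assumption, the MSIE 6 user-agent string is a typical example rather than the exact pulldown value, and build_httrack_command is a made-up helper name. The start URL and output folder are taken from the quoted command line.

```python
import shlex


def build_httrack_command(start_url, output_dir):
    """Return an httrack argument list reflecting tips 4 and 5 (no +/- filters)."""
    return [
        "httrack", start_url,
        "-O", output_dir,
        "--near",              # tip 4: get non-html files related to a page
        "--replace-external",  # 5A: assumed equivalent of "No External Pages"
        # 5B: assumed MSIE 6 identity string, not copied from the pulldown
        "--user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
        "--extended-parsing",  # 5C: attempt to detect all links
        "--timeout=60",        # 5D
        "--retries=9",         # 5D
    ]


cmd = build_httrack_command("http://mynuface.com/", r"C:\My Web Sites\test")
# Print a copy-pasteable (POSIX-quoted) form of the command.
print(shlex.join(cmd))
```

Note there are no +*.png style filters in the list; the near flag is doing that job.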
> (winhttrack
> -qiC2%Ps2u1%s%uN0%I0p3DaK0H0%kf2A25000%f0#f -F
> "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"
> -%F "<!-- Mirrored from %s%s by HTTrack Website
> Copier/3.x [XR&CO'2013], %s -->" -%l "en, en, *"
> <http://www.mynuface.com> -O1 "C:\My Web Sites\test"
> +*.png +*.gif +*.jpg +*.css +*.js
> -ad.doubleclick.net/* -mime:application/foobar )
> 16:00:22 Warning: File has moved from
> www.mynuface.com/ to <http://mynuface.com/>
> 16:00:22 Info: No data seems to have been
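By default, HTT stays on the starting site only. Most of the time site.com
redirects to www.site.com, which counts as a different site, so HTT gets nothing.
Your case is the opposite: www.mynuface.com redirects to mynuface.com.
Drop the "www." from the start URL,
drop your filters (see #4), and
override robots.txt.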
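To see why the redirect matters, here is a simplified illustration of a "same host" test. It is not HTTrack's actual scope logic, just the general idea that a host with "www." and one without are different names; stays_on_site is a hypothetical helper.

```python
from urllib.parse import urlsplit


def stays_on_site(start_url, redirect_url):
    """True only if both URLs have exactly the same host (case-insensitive)."""
    return urlsplit(start_url).hostname == urlsplit(redirect_url).hostname


# Start URL from the quoted command line vs. the redirect target from the log.
print(stays_on_site("http://www.mynuface.com/", "http://mynuface.com/"))
# Start URL with the "www." dropped, as suggested.
print(stays_on_site("http://mynuface.com/", "http://mynuface.com/"))
```

With the "www." start URL the redirect target is on another host, so a mirror that stays on the starting host has nothing to fetch; starting from the bare host avoids that.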