> I have been trying to mirror the site
> I have disabled the
> robots.txt and set the browser to firefox without
Don't tell us what you think you did; always post the actual command line used
(or log file line two)
> (webhttrack -q -%i -iC2 ecg.bidmc.harvard.edu/
> ecg.bidmc.harvard.edu/maven/ -O
> "/home/craig/websites/ECG Maven" -n -%P -N0 -s0 -p7
> -D -a -K0 -c4 -%k -%e2 -A25000 -F "Mozilla/4.5
> (compatible; HTTrack 3.0x; Windows 98)" -%F "<!--
> Mirrored from %s%s by HTTrack Website Copier/3.x
> [XR&CO'2008], %s -->" +*.png +*.gif +*.jpg +*.css
> +*.js -ad.doubleclick.net/* +*.asp +*.asp*
> -www.provost.harvard.edu/* +*.pdf -bidmc.org/*
> -www.med-ed-online.org/* -%s -%u -k )
1) Some sites do not like an HTTrack browser ID. I only run with an msie6 ID.
2) There is no robots.txt so overriding it wouldn't help. -s0 says you
didn't.
3) +*.png +*.gif are unnecessary; if you want everything, use the near flag
(-n, get non-html files related)
4) The -www.... filters are unnecessary; by default HTT stays on the starting
site only.
5) On ecg.bidmc.harvard.edu/maven, the only links that stay on that site are
the ones in the box with "Please choose one of the following:", and all of
those lead to a form on the next page. HTT does not click on forms.
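
Putting points 1 to 4 together, a trimmed command might look roughly like the sketch below. The user-agent string is only an assumed example of an MSIE 6 browser ID (any similar one should do), and the output path is copied from the command you posted. The snippet only prints the arguments, one per line, so you can check them before running anything.

```shell
# Sketch only: start URL, output folder, near flag, and a non-HTTrack browser ID.
# The -F string below is an assumed MSIE 6 user agent, not one taken from this thread.
set -- \
  "ecg.bidmc.harvard.edu/maven/" \
  -O "/home/craig/websites/ECG Maven" \
  -n \
  -F "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

# Print the command one argument per line for review.
printf '%s\n' httrack "$@"

# To actually run it, uncomment the next line:
# httrack "$@"
```

This does nothing about point 5: the pages behind the forms will still not be fetched.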