I'm trying to mirror a site that has numbered directories off the root, with
PDF and ZIP files under each numbered directory. Those are the only two
file types I want.
If I simply process the root of the site, I get "Forbidden" even after I log on
to the main site (more details below).
The site's structure is as follows (these are example URLs, not the real
ones, but close):
You log on at <https://sitea.customerhub.net>
The PDFs are served from <https://s3.amazonaws.com/sitearesources/63/wcsneray.pdf>
The ZIP files are served from <http://sitearesources.s3.amazonaws.com/63/wcsneray.zip>
Other content varies by directory number and by the filename after the numbered
directory, e.g. <https://s3.amazonaws.com/sitearesources/110/mycontentb.pdf> and
<http://sitearesources.s3.amazonaws.com/110/mycontentb.zip>.
How can I have HTTrack walk the numbered folders from 1 on up and download the
content?
Thanks.
Kevin