HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Spidering URLs that are .exe-based?
Author: Xavier Roche
Date: 02/28/2013 10:58
 
> The URLs are presented via server-side .exe - see
> www.pcrecruiter.net/pcrbin/wisdombase.exe - and I
> think the fact that the URLs contain .exe rather
> than .htm is throwing the software off track.

No, it should not. You may even have links ending with ".gif" which are
actually HTML files, and httrack will rename the files as ".html" :)
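To illustrate the renaming idea, here is a minimal Python sketch. It assumes the local file name is chosen from the Content-Type the server sends rather than from the extension in the URL. This is not HTTrack's actual code: the EXTENSION_FOR_TYPE table, the local_name helper, and the /images/... paths are hypothetical and only there for the example.

```python
import posixpath

# Hypothetical, deliberately tiny MIME-to-extension table (illustration only).
EXTENSION_FOR_TYPE = {
    "text/html": ".html",
    "image/gif": ".gif",
    "image/png": ".png",
}

def local_name(url_path: str, content_type: str) -> str:
    """Return a local file name whose extension matches the served type.

    Falls back to the extension found in the URL when the type is unknown.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    base, ext = posixpath.splitext(posixpath.basename(url_path))
    return base + EXTENSION_FOR_TYPE.get(mime, ext)

# A server-side .exe that returns HTML is saved as an .html file.
print(local_name("/pcrbin/wisdombase.exe", "text/html; charset=utf-8"))
# A link ending in .gif that is really an HTML page is renamed too.
print(local_name("/images/banner.gif", "text/html"))
# A genuine GIF keeps its extension.
print(local_name("/images/logo.gif", "image/gif"))
```

So the ".exe" in the URL is just part of the name; what matters is what the server actually returns.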

> tried fiddling with the settings, but I can't get it
> to follow the URLs and download the rendered HTML
> pages.

The site's robots.txt rules may be preventing the pages from being downloaded. The log reports:

Note: due to www.pcrecruiter.net remote robots.txt rules, links beginning with
these path will be forbidden: /cgi-bin/, /images/, /mri/, /badv/, /adv/,
/oadv/, /gadv/, /iadv/, /img/, /presentation/, /sos/, /phone/, /clients/,
/RCM/, /overview/, /mailers/, /Templates/, s.htm (see in the options to
disable this)

You can disable the robots.txt check in the options (on the command line,
-s0 or --robots=0), but do so with care (if you are the site admin, I suppose
this is okay :p)
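Before turning the check off, it can help to see whether a given URL actually falls under one of the forbidden prefixes. The sketch below uses Python's standard urllib.robotparser on a robots.txt reconstructed from the log note, reading the wrapped entries as /cgi-bin/ and /mailers/. Two assumptions: the rules apply to all user agents, and the trailing "s.htm" entry is left out because it is not a path prefix. The /clients/index.htm and /images/logo.gif paths are made up for the test.

```python
from urllib.robotparser import RobotFileParser

# Path prefixes copied from the log note; "s.htm" is intentionally omitted.
FORBIDDEN_PREFIXES = [
    "/cgi-bin/", "/images/", "/mri/", "/badv/", "/adv/", "/oadv/",
    "/gadv/", "/iadv/", "/img/", "/presentation/", "/sos/", "/phone/",
    "/clients/", "/RCM/", "/overview/", "/mailers/", "/Templates/",
]

def build_parser(prefixes):
    """Build a parser from a reconstructed robots.txt.

    Assumption: the rules are declared for every user agent ("*").
    """
    lines = ["User-agent: *"] + [f"Disallow: {p}" for p in prefixes]
    parser = RobotFileParser()
    parser.parse(lines)
    return parser

def is_allowed(parser, path, host="www.pcrecruiter.net"):
    """Return True if the rules permit fetching the given path."""
    return parser.can_fetch("*", f"http://{host}{path}")

parser = build_parser(FORBIDDEN_PREFIXES)
for path in ("/pcrbin/wisdombase.exe", "/clients/index.htm", "/images/logo.gif"):
    print(path, "allowed" if is_allowed(parser, path) else "forbidden")
```

This only tests the prefixes quoted in the log; the site's real robots.txt may contain other rules, so fetching and checking the live file is the more reliable test.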
 





