I know about the priority option (-p1) for retrieving only HTML files.
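For reference, my understanding is that the priority-based approach would be something like this, assuming -p1 ("save only html files" priority mode) is the right setting:

    httrack www.httrack.com -p1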
As a test, however, I am attempting to retrieve only HTML files using filters. With the rules below, HTTrack processes only one file.
I place this .httrackrc in the output folder:
    clean
    robots 0
    deny *
    allow html
    allow mime:text/html
    allow mime:application/xml
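For comparison, what I am trying to express, if I understand the scan-rule syntax correctly, is roughly this command line (the patterns are quoted so the shell does not expand them, and -s0 disables robots.txt):

    httrack www.httrack.com -s0 "-*" "+*.html" "+mime:text/html" "+mime:application/xml"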
Should it not retrieve all HTML files? robots.txt processing is disabled, and I have tested this on multiple websites.
HTTrack is started as:

    httrack www.httrack.com
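To be explicit about where the configuration file sits, the sequence is effectively this (the output path is only an illustration):

    cd /path/to/output    # the folder containing the .httrackrc above
    httrack www.httrack.com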
From hts-log.txt:
    HTTrack3.43-5+libhtsjava.so.2
    HTTrack Website Copier/3.43-5 mirror complete in 1 seconds : 1 links scanned, 1 files written ... bytes transfered using HTTP compression in 1 files, ratio 37%
That single HTML page is retrieved successfully.