> Since I cared only about the texts, I used this rule
> set "-*.css -*.js -ad.doubleclick.net/*
> -mime:application/foobar -*.gif -*.jpg -*.png -*.tif
> -*.bmp -*.zip -*.tar -*.tgz -*.gz -*.rar -*.z ...
An alternative would be Options -> Experts Only -> scan mode = html only.
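
To get a feel for what a rule set like the quoted one excludes, here is a rough Python sketch. It is not HTTrack's own matcher: it assumes that `*` matches any run of characters, that a URL is accepted when no rule matches, and that the last matching rule wins. `mime:` rules are simply skipped, and the function name and sample URLs are made up for illustration.

```python
from fnmatch import fnmatchcase

# A subset of the quoted rule set (the original list is truncated with "...").
RULES = ["-*.css", "-*.js", "-ad.doubleclick.net/*",
         "-*.gif", "-*.jpg", "-*.png", "-*.zip"]

def is_allowed(url, rules=RULES):
    """Return True if the URL survives the +/- wildcard rules."""
    allowed = True  # assumption: accept unless a rule says otherwise
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if pattern.startswith("mime:"):
            continue  # MIME rules need the response headers; not modelled here
        if fnmatchcase(url, pattern):
            allowed = (sign == "+")  # assumption: a later match overrides an earlier one
    return allowed
```

With these assumptions, `is_allowed("www.example.com/style.css")` is rejected while `is_allowed("www.example.com/index.html")` is kept.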
> I also disabled the "parsing java files".
Java files are binary. You mean JavaScript, which need not be in a separate file.
> The problem was that httrack downloaded thousands of
> similar files, for example, a lot of default*.html.
Look at hts-cache\new.txt. You'll be able to see which URL created each file
and where that URL came from.
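
If the log is too long to read by eye, something like the following Python sketch can pull out the relevant rows. It assumes the log is tab-separated and that the last three columns of each row are the URL, the local file, and the "(from ...)" referrer; check the header line of your own new.txt before relying on that. The function name is hypothetical.

```python
def trace_files(log_text, name_part):
    """Return (url, local_file, referrer) for rows whose local file contains name_part."""
    hits = []
    for line in log_text.splitlines():
        cols = line.split("\t")
        if len(cols) < 3:
            continue  # too short to hold the three columns we assume
        url, local_file, referrer = cols[-3:]  # assumption: last three columns
        if name_part in local_file:
            hits.append((url, local_file, referrer))
    return hits

# Example use (the path is relative to the project folder):
# with open(r"hts-cache\new.txt", encoding="utf-8", errors="replace") as f:
#     for url, local_file, referrer in trace_files(f.read(), "default"):
#         print(local_file, "<-", url, referrer)
```

Sorting or counting the referrers in the result should show which page is generating the flood of default*.html links.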
> interesting is that, httrack was able to fill in the
> "from" entry (the city to depart from) with a city
> name (such as Cleveland), and the same to the "to"
> entry. How can httrack be so "smart" to know what
Httrack does not do forms. Period. What you're seeing in your browser is
either form defaults or entries from your cookies.