I've seen a page that has some escaped newlines and tabs in the URL. HTTrack
doesn't handle it well.
AFAIK, browsers ignore those extra characters.
HTTrack seems to read the URL with the newlines included (I guess), so the links
come back as 404 "Not Found".
In the cache the newline is written out too, so instead of taking 1 line the
entry spans several (which messed up my parser :/).
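To illustrate the browser behaviour mentioned above: as far as I know, the WHATWG URL Standard has browsers trim leading and trailing control characters and spaces, then drop every tab, LF and CR from the URL before parsing it. Below is a minimal Python sketch of that step. The helper name `clean_href` and the sample href are made up for illustration; this is not HTTrack code, and it assumes the newlines and tabs reach the parser as literal characters.

```python
import re

# Tab, LF and CR are removed from anywhere inside the URL
# (per the WHATWG URL Standard, as I understand it).
_TAB_OR_NEWLINE = re.compile(r"[\t\n\r]")

# Leading/trailing C0 control characters (0x00-0x1F) and space (0x20) are trimmed.
_C0_OR_SPACE = "".join(chr(c) for c in range(0x21))


def clean_href(raw: str) -> str:
    """Hypothetical helper: clean an href value roughly the way a browser would."""
    trimmed = raw.strip(_C0_OR_SPACE)
    return _TAB_OR_NEWLINE.sub("", trimmed)


# Made-up href with embedded newlines and tabs, for illustration only.
href = "concelho/\n\tinformacaogeografica/\r\n?wbcmode=presentationunpublished"
print(clean_href(href))
# prints: concelho/informacaogeografica/?wbcmode=presentationunpublished
```

Doing this kind of cleanup before the link is requested and before it is written to the cache would probably avoid both the 404s and the multi-line cache entries, but that is a guess on my side.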
The test URL for it is:
<http://www.cm-chamusca.pt/chamusca/concelho/informacaogeografica/?wbcmode=presentationunpublished>
A mirror with depth 2 shows the problem I've described.
Greetings,
Joao Luzio