I tried to download this URL:
// www.nifty.org/nifty/gay/military/
Many of the links on that page point to files that have no file extension, like:
// www.nifty.org/nifty/gay/military/some-file
But some do have an extension:
// www.nifty.org/nifty/gay/military/some-file.html
// www.nifty.org/nifty/gay/military/some-file.pdf
When I downloaded the site, each file without an extension was renamed to:
// some-filehtml.html
This causes a problem because these files are actually plain text, and when they are
opened in a browser the line breaks are not rendered (since the content is not HTML).
I have already set the options to never check the document type and to force old
HTTP/1.0 requests:
// Options > Spider > Check document type : Never
// Options > Spider > Force old HTTP/1.0 Requests : Check
But the problem still occurs. What are the right options to handle this?
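
As a stopgap while looking for the right setting, the mirror could be cleaned up after the download. Below is a minimal Python sketch of that idea, not a feature of the mirroring tool. The names `looks_like_html` and `fix_mirror` are hypothetical, and the sketch assumes that a file whose first 4 KB contain no `<!doctype>`, `<html>`, `<head>` or `<body>` tag is plain text.

```python
import re
from pathlib import Path

# Hypothetical heuristic (an assumption, not an option of the mirroring tool):
# a file counts as HTML if one of these tags appears in its first 4 KB.
HTML_MARKER = re.compile(rb"<\s*(!doctype|html|head|body)\b", re.IGNORECASE)
SUFFIX = "html.html"  # the ending that was appended to extension-less files


def looks_like_html(data: bytes) -> bool:
    """Return True if the leading bytes contain a common HTML tag."""
    return HTML_MARKER.search(data[:4096]) is not None


def fix_mirror(root: str) -> list[tuple[str, str]]:
    """Rename '*html.html' files that hold plain text to '*.txt'.

    Returns the (old name, new name) pairs that were renamed.
    """
    renamed = []
    for path in sorted(Path(root).rglob("*" + SUFFIX)):
        if not path.is_file() or looks_like_html(path.read_bytes()):
            continue  # leave real HTML pages untouched
        stem = path.name[: -len(SUFFIX)]
        if not stem:
            continue  # a file named exactly 'html.html' is left alone
        target = path.with_name(stem + ".txt")
        if target.exists():
            continue  # never overwrite an existing file
        path.rename(target)
        renamed.append((path.name, target.name))
    return renamed
```

Calling `fix_mirror("path/to/mirror")` on the download folder would turn `some-filehtml.html` into `some-file.txt`, so the browser shows the line breaks. Links inside the mirrored index pages would still point at the old names, so this only helps when the files are opened directly.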