From the FAQ "Why are files sometimes renamed?"
<http://www.httrack.com/html/faq.html#Q1b3>
"HTTrack tries to know the type of remote files. This is useful when links
like <http://www.someweb.com/foo.cgi?id=1> can be either HTML pages, images or
anything else. Locally, foo.cgi will not be recognized as an html page, or as
an image, by your browser. HTTrack has to rename the file as foo.html or
foo.gif so that it can be viewed."
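As I read the FAQ, the local name follows from the type the server reports. Here is a rough Python sketch of that idea. It is not HTTrack's actual code: `local_name` and the `EXTENSION_FOR_TYPE` table are made up for illustration, and the real type table is certainly larger.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical mapping, for illustration only.
EXTENSION_FOR_TYPE = {
    "text/html": ".html",
    "image/gif": ".gif",
    "image/jpeg": ".jpg",
}


def local_name(url, content_type):
    """Return a local file name for url, given the reported Content-Type."""
    # Keep only the last path component; the query string is ignored here.
    name = posixpath.basename(urlsplit(url).path) or "index"
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mime = content_type.split(";", 1)[0].strip().lower()
    ext = EXTENSION_FOR_TYPE.get(mime)
    if ext is None:
        # Unknown or missing type: keep the remote name unchanged.
        return name
    stem, _ = posixpath.splitext(name)
    return stem + ext


print(local_name("http://www.someweb.com/foo.cgi?id=1", "text/html; charset=utf-8"))  # foo.html
print(local_name("http://www.someweb.com/foo.cgi?id=1", "image/gif"))  # foo.gif
```

In this sketch a name is only rewritten when the type is known at the moment the name is chosen; otherwise the remote name is kept.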
When it works, it's a useful feature, but it's not working reliably for me on
one site. The site has a search.php page that always returns HTML when
requested.
Sometimes httrack renames the file to .html ("store text/html without scan:
search.html"):
"""
$ cd ~
$ httrack -g http://saa.gov.uk/search.php
HTTrack3.47-21+libhtsjava.so.2 launched on Mon, 10 Feb 2014 13:54:10 at
<http://saa.gov.uk/search.php>
(httrack -g <http://saa.gov.uk/search.php> )
Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may contain sensitive
information,
such as username/password authentication for websites mirrored in this
project
do not share these files/folders if you want these information to remain
private
Mirror launched on Mon, 10 Feb 2014 13:54:10 by HTTrack Website
Copier/3.47-21+libhtsjava.so.2 [XR&CO'2013]
mirroring <http://saa.gov.uk/search.php> with the wizard help..
13:54:35.gov.uk/Warning: p (1101Warning: store text/html without scan:
search.html
HTTrack Website Copier/3.47-21 mirror complete in 25 seconds : 1 links
scanned, 1 files written (11013 bytes overall) [3886 bytes received at 155
bytes/sec], 11013 bytes transferred using HTTP compression in 1 files, ratio
30%
(No errors, 1 warnings, 0 messages)
Done.
Thanks for using HTTrack!
$ ls
search.html search.html.readme
"""
And sometimes it leaves the file name as .php ("store text/html without scan:
search.php"):
"""
$ mkdir 1
$ cd 1
$ httrack -g http://saa.gov.uk/search.php
HTTrack3.47-21+libhtsjava.so.2 launched on Mon, 10 Feb 2014 13:55:13 at
<http://saa.gov.uk/search.php>
(httrack -g <http://saa.gov.uk/search.php> )
Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may contain sensitive
information,
such as username/password authentication for websites mirrored in this
project
do not share these files/folders if you want these information to remain
private
Mirror launched on Mon, 10 Feb 2014 13:55:13 by HTTrack Website
Copier/3.47-21+libhtsjava.so.2 [XR&CO'2013]
mirroring <http://saa.gov.uk/search.php> with the wizard help..
13:55:44.gov.uk/Warning: p (1101Warning: store text/html without scan:
search.php
HTTrack Website Copier/3.47-21 mirror complete in 31 seconds : 2 links
scanned, 1 files written (11013 bytes overall) [3886 bytes received at 125
bytes/sec], 11013 bytes transferred using HTTP compression in 1 files, ratio
30%
(No errors, 1 warnings, 0 messages)
Done.
Thanks for using HTTrack!
$ ls
search.php search.php.readme
"""
These fetches were within a minute of each other.
Is httrack, the server, or something else causing this?
How can I get consistent behavior?
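One way I could narrow this down is to check whether the server sends the same Content-Type header on every request. This is a minimal sketch, assuming the header is what matters here. The function names are mine, and it uses only Python's standard `urllib`.

```python
import urllib.request
from collections import Counter


def fetch_content_type(url):
    """Issue one GET request and return the raw Content-Type header, or None."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.headers.get("Content-Type")


def tally_content_types(url, attempts=5, fetch=fetch_content_type):
    """Request url several times and count each Content-Type value seen."""
    return Counter(fetch(url) for _ in range(attempts))
```

Calling `tally_content_types("http://saa.gov.uk/search.php")` and printing the result should show a single entry if the server answers consistently. More than one entry, or a `None` key, would point at the server rather than at httrack.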