HTTrack Website Copier
Free software offline browser - FORUM
Subject: Inconsistent renaming behavior (php/html)
Author: Iain Elder
Date: 02/10/2014 15:16
 
From the FAQ entry "Why are files sometimes renamed?":

<http://www.httrack.com/html/faq.html#Q1b3>

"HTTrack tries to know the type of remote files. This is useful when links
like <http://www.someweb.com/foo.cgi?id=1> can be either HTML pages, images or
anything else. Locally, foo.cgi will not be recognized as an html page, or as
an image, by your browser. HTTrack has to rename the file as foo.html or
foo.gif so that it can be viewed."
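To make the quoted behavior concrete, here is a rough sketch of the idea as I understand it from the FAQ. The function name local_name and the two type-to-extension mappings are made up for illustration; this is not HTTrack's actual code or its real decision logic.

```shell
# Hypothetical helper, NOT HTTrack's implementation: choose a local file
# name from the remote name and the Content-Type the server reported.
local_name() {
    name=$1
    mime=$2
    base=${name%.*}                   # drop the last extension: foo.cgi -> foo
    case $mime in
        text/html*) echo "$base.html" ;;
        image/gif*) echo "$base.gif" ;;
        *)          echo "$name" ;;   # unknown type: keep the remote name
    esac
}

local_name foo.cgi text/html    # prints foo.html
local_name foo.cgi image/gif    # prints foo.gif
```

Under this model the same URL with the same Content-Type should always get the same local name, which is why the results below surprised me.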

When it works, it's a useful feature, but it's not working reliably for me on
one site.

The site has a search.php page, and requesting it always returns HTML.

Sometimes httrack renames the file to .html ("store text/html without scan:
search.html"):

"""
$ cd ~
$ httrack -g http://saa.gov.uk/search.php
HTTrack3.47-21+libhtsjava.so.2 launched on Mon, 10 Feb 2014 13:54:10 at
http://saa.gov.uk/search.php
(httrack -g http://saa.gov.uk/search.php )

Information, Warnings and Errors reported for this mirror:
note:	the hts-log.txt file, and hts-cache folder, may contain sensitive
information,
	such as username/password authentication for websites mirrored in this
project
	do not share these files/folders if you want these information to remain
private

Mirror launched on Mon, 10 Feb 2014 13:54:10 by HTTrack Website
Copier/3.47-21+libhtsjava.so.2 [XR&CO'2013]
mirroring http://saa.gov.uk/search.php with the wizard help..
13:54:35.gov.uk/Warning: p (1101Warning: store text/html without scan:
search.html

HTTrack Website Copier/3.47-21 mirror complete in 25 seconds : 1 links
scanned, 1 files written (11013 bytes overall) [3886 bytes received at 155
bytes/sec], 11013 bytes transferred using HTTP compression in 1 files, ratio
30%
(No errors, 1 warnings, 0 messages)
Done.
Thanks for using HTTrack!
$ ls
search.html  search.html.readme
"""

And sometimes it leaves the file name as .php ("store text/html without scan:
search.php"):

"""
$ mkdir 1
$ cd 1
$ httrack -g http://saa.gov.uk/search.php
HTTrack3.47-21+libhtsjava.so.2 launched on Mon, 10 Feb 2014 13:55:13 at
http://saa.gov.uk/search.php
(httrack -g http://saa.gov.uk/search.php )

Information, Warnings and Errors reported for this mirror:
note:	the hts-log.txt file, and hts-cache folder, may contain sensitive
information,
	such as username/password authentication for websites mirrored in this
project
	do not share these files/folders if you want these information to remain
private

Mirror launched on Mon, 10 Feb 2014 13:55:13 by HTTrack Website
Copier/3.47-21+libhtsjava.so.2 [XR&CO'2013]
mirroring http://saa.gov.uk/search.php with the wizard help..
13:55:44.gov.uk/Warning: p (1101Warning: store text/html without scan:
search.php

HTTrack Website Copier/3.47-21 mirror complete in 31 seconds : 2 links
scanned, 1 files written (11013 bytes overall) [3886 bytes received at 125
bytes/sec], 11013 bytes transferred using HTTP compression in 1 files, ratio
30%
(No errors, 1 warnings, 0 messages)
Done.
Thanks for using HTTrack!
$ ls
search.php  search.php.readme
"""

The two runs were launched about a minute apart (13:54:10 and 13:55:13).

Is httrack, the server, or something else causing this?
How can I get consistent behavior?
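One way to check whether the server is involved might be to compare the Content-Type header it sends across repeated requests. Below is a minimal sketch; the helper name content_type is my own, and the sample headers fed to it are invented for the demonstration, not captured from the site.

```shell
# Hypothetical helper: read HTTP response headers on stdin and print the
# value of the Content-Type header, if there is one.
content_type() {
    # Strip carriage returns, then match the header name case-insensitively.
    tr -d '\r' | awk -F': *' 'tolower($1) == "content-type" { print $2; exit }'
}

# Demonstration with made-up headers (no network access needed):
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=UTF-8\r\n\r\n' |
    content_type    # prints text/html; charset=UTF-8

# Against the real site it could be used like this (not run here):
#   for i in 1 2 3; do curl -sI http://saa.gov.uk/search.php | content_type; done
```

If the value is the same on every request, the server is probably not the cause. Note that curl -I sends a HEAD request, so a server could in principle answer it differently from the GET that httrack sends.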
 