HTTrack Website Copier
Free software offline browser - FORUM
Subject: Problems with httrack page capture
Author: John
Date: 04/17/2007 15:12
 
When I run the following httrack command against a page that embeds images from
other sites, the images and the HTML are both archived, but the archived HTML file
still references the "live" remote images rather than the copies httrack saved.
So instead of displaying the captured, archived images, the archived page keeps
pulling current data from the remote sites.
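
For illustration (hypothetical file names), the saved page still contains absolute
references such as:

<img src="https://othersite.example/images/photo.jpg">

rather than a relative path into the archive directory.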

The command I'm using is:
/usr/local/bin/httrack https://[site]/[page].html -n -O . -P [proxy:port] \
  >> /var/log/httrack_monitor_log 2>&1
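
(Since I'm running this as root, and therefore getting the warning below, I could
also add the -%U option the warning suggests, with a hypothetical unprivileged user:

/usr/local/bin/httrack https://[site]/[page].html -n -O . -P [proxy:port] -%U httrack \
  >> /var/log/httrack_monitor_log 2>&1

but that should only change the userid, not the link-rewriting behaviour.)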

The only output that gets recorded is:
WARNING! You are running this program as root!
It might be a good idea to use the -%U option to change the userid:
Example: -%U smith

Mirror launched on Tue, 17 Apr 2007 08:55:01 by HTTrack Website
Copier/3.41+libhtsjava.so.2 [XR&CO'2007]
mirroring https://[site]/[page].html with the wizard help..
Done.
Thanks for using HTTrack!


The command appears to run successfully and returns no errors, and all of the
images are captured and stored in a subdirectory, but as I mentioned, the archived
HTML does not reference those local copies; it points instead to the remote,
linked versions.
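
To confirm this, I checked the saved page for absolute image URLs that were left
unrewritten (hypothetical path, since the exact layout depends on the -O setting):

grep -Eo '<img[^>]*src="https?://[^"]*"' ./[site]/[page].html

In my case this still prints the remote URLs, even though the image files
themselves are present in the subdirectory.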

During one run I noticed the "robots.txt" error that someone else had pointed out,
but that only seemed to happen once, and the site I'm capturing does not have that
file.
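
(I double-checked its absence with a quick HEAD request, e.g.:

curl -I https://[site]/robots.txt

which returns a 404 for this site.)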

Thanks,
~John
 