Problems with httrack page capture - HTTrack Website Copier Forum

Subject: Problems with httrack page capture

Author: John

Date: 04/17/2007 15:12

When I run the httrack command below on a page that embeds images from other
sites, both the page's HTML and the images are archived. However, the archived
HTML file still points to the "live" remote images instead of the copies that
httrack saved, so viewing the archived page shows current data from the remote
sites rather than the captured images.

The command I'm using is:
/usr/local/bin/httrack https://[site]/[page].html -n -O . -P [proxy:port] >> /var/log/httrack_monitor_log 2>&1

The only output that gets recorded is:
WARNING! You are running this program as root!
It might be a good idea to use the -%U option to change the userid:
Example: -%U smith

Mirror launched on Tue, 17 Apr 2007 08:55:01 by HTTrack Website Copier/3.41+libhtsjava.so.2 [XR&CO'2007]
mirroring https://[site]/[page].html with the wizard help..
Done.
Thanks for using HTTrack!


The command appears to run successfully and returns no errors. All the images
are captured and stored in a subdirectory, but as described above, the archived
HTML does not point to those local copies; it still references the remote,
linked versions.
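
One way to confirm which image references were left untouched is to scan the
archived HTML for img tags whose src is still an absolute http(s) URL. The
following is only a rough sketch: the function name list_remote_images is made
up for illustration, it relies on grep and sed text matching rather than a real
HTML parser, and it assumes each img tag sits on a single line with a quoted
src attribute.

```shell
#!/bin/sh
# Hypothetical helper (not part of httrack): print every <img> src value in an
# HTML file that is still an absolute http:// or https:// URL.
# Assumption: each <img ...> tag is on one line and its src value is quoted.
list_remote_images() {
    # $1: path to the archived HTML file to inspect
    # grep -o keeps only the matched text, from "<img" up to the end of the URL;
    # sed then drops everything up to the quote that opens the src value.
    grep -Eoi "<img[^>]*[[:space:]]src=[\"']https?://[^\"']+" "$1" \
        | sed "s/^.*[\"']//"
}
```

Run from the output directory against the saved page (for example
list_remote_images [page].html); any URLs printed are image references that
still point at remote hosts rather than at the archived copies.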

During one run I also saw the "robots.txt" error that someone else had pointed
out, but it appeared only once, and the site I'm capturing does not have a
robots.txt file.

Thanks,
~John
