HTTrack Website Copier
Free software offline browser - FORUM
Subject: Linked pages missing images??
Author: Bandit
Date: 01/13/2010 06:16
> I'm downloading a text file list of urls but am
> having trouble getting the images from those pages
> to download.

Sorry, but this doesn't make sense to me.  A "text file of URLs" would
contain only URLs, in text, hence the name "text file".  And you
wouldn't be "downloading" it, as my thought process goes, because it would
already be downloadED (i.e. saved) and presumed to be the starting point
(the URL list) of the project.

> If the urls are all at:
> <>

So to start with, am I correct in thinking that you have a text file saved (in
your project folder, perhaps) that has a list of URLs which refer to various
articles published at the location <{pagename}>,
with the {pagename} as the only difference between each of the URLs in the
list?
> I know the images are at:
> <>

So there is no correlation between the article "pagename" and the location of
the image files?
So you have this file, say "links.txt", that contains URLs such as <>,
and you know (by peeking ahead, I'm supposing) that each of those "pages"
contains images in the form of <img src=...> tags pointing at the "img" server, <>.

If that's the case, it looks like all is well to me.
(That is, until... "Maximum mirroring depth" - see below)

> So I set my filters to be:
> -*
> -*.pdf

Where did <> come from?  Also, for the middle three (or maybe it
should be two), I would use either just /* after the .com, or something like
this (if "this" is what you want):

+<>*[path]*.html
+<>*[path]*.jpg
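For illustration only - the real hostnames were stripped out of this post, so example.com is just a stand-in - a complete scan-rule set along those lines might look like:

```
-*
+www.example.com/*[path]*.html
+img.example.com/*[path]*.jpg
```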

> My mirroring depth and external depth are set to 0
> because I only want the pages in the list of URLs. I
> have checked the "Get non-HTML files related to a
> link" checkbox.

I wish I could explain "Why?", but I have NEVER been able to get even images
embedded in the starting URL's page unless I've had the max mirror depth blank
(the default) or set to at least 2 - even with the "get near" option selected.  I
dunno; maybe I'm doing something wrong myself, because to me, if I have max
depth set to 1, that logically means I want the starting page and everything
(per filters) needed to make it "mirrored" locally.  But using "-*" as the first
filter, and following up with exclusively what I want, keeps setting the
max depth to 2 from being a problem.
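As I understand it, HTTrack evaluates scan rules in order, with later matching rules overriding earlier ones - which is exactly why "-*" goes first and the "+" rules follow.  A rough sketch of that logic in Python (pattern semantics simplified to shell-style globbing; the rule list is made up):

```python
from fnmatch import fnmatch

def allowed(url, rules):
    # Start from the default decision (download), then let every rule
    # that matches the URL override it in order -- the LAST matching
    # rule wins, so "-*" first followed by "+" rules means: exclude
    # everything except what the "+" patterns re-include.
    decision = True
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatch(url, pattern):
            decision = (sign == '+')
    return decision

# Hypothetical rule set in the spirit of the one discussed above.
rules = ["-*", "+*.html", "+*.jpg"]
```

With those rules, an .html or .jpg URL comes back allowed, while anything else (a .pdf, say) is excluded by the leading "-*".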

If I'm not wrong about that, it seems pretty unintuitive to me.  (No offense
intended.)

The way I see it, the "get near files" selection just helps you out by saving
you from adding all the filters (+'s) you would otherwise need.  You still seem
to have to leave the max depth blank, or set it to something other than 0 or 1.
> There's nothing in the robots.txt that would
> preclude the images from downloading (although I'll
> ignore robots.txt next time I try to mirror anyway
> just to be sure).

Your log file will tell you near the top if there is a robots.txt restriction.
It's extremely clear when there is one...

> Why are the images from the pages not downloading?

Regardless of the fact that I was having trouble following what you were
trying to do, I think changing the Maximum Mirroring Depth (not the External
Depth) to 2 or higher will get you the images you've been looking for!
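For reference, roughly the same settings can be expressed on the command line.  This is only a sketch - the output folder and filter patterns are placeholders, since the real hosts were stripped from this post - using HTTrack's standard flags (-%L for a URL list file, -r for mirror depth, -n for "get near" files):

```shell
# Mirror every URL listed in links.txt, depth 2, grabbing "near"
# files such as images; the filter patterns here are placeholders.
httrack -%L links.txt -O ./mirror -r2 -n "-*" "+*.html" "+*.jpg"
```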

> Thanks,
> Ari
