> I'm downloading a text file list of urls but am
> having trouble getting the images from those pages
> to download.
Sorry, but this doesn't quite make sense to me. A "text file of URLs" would
contain URLs only, in plain text, which is why it is called a "text file". And
you wouldn't be "downloading" it, as my thought process goes, because it would
already be downloadED (i.e. saved) and presumed to be the starting point (the
URL list) of the mirror.
> If the urls are all at:
> <http://articles.site.com/article/pagename>
So to start with, am I correct in thinking that you have a text file saved (in
your project folder, perhaps) that has a list of URLs which refer to various
articles published at <http://articles.site.com/article/{pagename}>, with the
{pagename} as the only difference between each of the URLs in the list?
> I know the images are at:
> <http://img.site.com/img/pages/articles/subfolder/image-name.jpg>
So there is no correlation between the article "pagename" and the location of
the image files?
So you have this file, say "links.txt", that contains URLs such as
<http://articles.site.com/article/pageA1.html>
<http://articles.site.com/article/pageB1.html>
<http://articles.site.com/article/pageB2.html>
<http://articles.site.com/article/pageB3.html>
<http://articles.site.com/article/pageC1.html>
and you know (by peeking ahead, I'm supposing) that each of those "pages"
contains images in the form of <img src=...> tags pointing to the "img" server
of site.com, i.e. something like
<http://img.site.com/{something}/{something}/*.jpg>
If that's the case, it looks like all is well to me.
(That is, until... "Maximum mirroring depth" - see below)
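Just to make my assumption concrete, I'm picturing that each article page
contains markup roughly like this (the folder and file names here are only my
guesses based on the example URL you gave):

  <p>Some article text...</p>
  <img src="http://img.site.com/img/pages/articles/subfolder/image-name.jpg" alt="...">

In other words, the pages on articles.site.com merely reference images hosted
on img.site.com, so the engine has to be allowed (by the filters) and willing
(by the depth settings) to follow those references over to the other host.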
>
> So I set my filters to be:
> -*
> +articles.site.com/article/*.*
> +images.site.com/*.*
> +img.site.com/*.*
> -*.pdf
Where did images.site.com come from? Also, for the middle three filters (or
maybe it should only be two), I would use either just /* after the .com, or
something like this (if "this" is what you want):
+articles.site.com/*[path]*.html
+img.site.com/*[path]*.jpg
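Putting that together, the whole filter list I would try first looks something
like this (just a sketch - I'm reusing your -* and -*.pdf lines plus my
suggested + lines; add a similar + line for .png or .gif if the site uses
those formats too):

  -*
  +articles.site.com/*[path]*.html
  +img.site.com/*[path]*.jpg
  -*.pdf

Keep in mind that when more than one rule matches, the later one wins: the
leading -* throws everything out by default, the two + lines let the article
pages and their images back in, and the trailing -*.pdf keeps PDFs out even if
one of the + rules would otherwise let them through.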
>
> My mirroring depth and external depth are set to 0
> because I only want the pages in the list of URLs. I
> have checked the "Get non-HTML files related to a
> link" checkbox.
>
I wish I could explain "Why?", but I have NEVER been able to get even the
images embedded in the starting URL page unless I've left the maximum
mirroring depth blank (the default) or set it to at least 2 - even with the
"get near" option selected. I dunno; maybe I'm doing something wrong myself,
because to me, if I set the max depth to 1, that logically means I want the
starting page and everything on it (per the filters) mirrored locally. But
using "-*" as the first filter, and following up with exclusively what I want,
means that setting the max depth to 2 isn't a problem - nothing unwanted gets
pulled in.
If I'm not wrong about that, it seems pretty unintuitive to me. (No offense,
Xavier!)
The way I see it, the "get near files" selection just helps you out by saving
you from writing out all the + filters you would otherwise need. You still
seem to have to leave the max depth blank or set it to something other than 0
or 1.
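For what it's worth, here is roughly what the same setup looks like as a
command line, in case that's easier to experiment with. Treat it as a sketch:
"links.txt" and "./mirror" are placeholder names, the filters are the ones
discussed above, and you should double-check the option letters against
httrack --help (or the man page) before relying on them:

  httrack --list links.txt -O ./mirror -r2 -n -s0 "-*" "+articles.site.com/*[path]*.html" "+img.site.com/*[path]*.jpg"

Here -r2 is the maximum mirroring depth of 2, -n is the "get non-HTML files
related to a link" option, -s0 ignores robots.txt, and --list feeds in your
URL file; the external depth already defaults to 0, so I haven't touched it.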
> There's nothing in the robots.txt that would
> preclude the images from downloading (although I'll
> ignore robots.txt next time I try to mirror anyway
> just to be sure).
Your log file will tell you near the top if there is a robots.txt restriction.
It's extremely clear when there is one...
> Why are the images from the pages not downloading?
Regardless of the fact that I was having trouble following what you were
trying to do, I think changing the Maximum Mirror Depth (not External Depth)
to 2 or higher will get you the images you've been looking for!
> Thanks,
> Ari
HTH,
~Bp