HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Wildcards in URL's
Author: Bandit
Date: 11/24/2009 18:18
 
> > I'm trying to get all the images from a site that
> > follow this structure:
> > <http://www.example.com/download_example.php?file=*.jpg>
> > Where * is a wildcard that varies in size.
> 
> You can NOT use wildcards in urls. You MUST use a
> valid url in every call to a webserver.

I agree with that.
The filter(s) from my other post can only work if you start from the correct
page.  If, for example,
<http://www.example.com/download_example.php?file=1.jpg>
contains a link to 
<http://www.example.com/download_example.php?file=2.jpg>
and so on, each URL linking to the next until the list/site is exhausted,
then the filter should work using
<http://www.example.com/download_example.php?file=1.jpg>
as the starting URL.

If there are no direct links to 
<http://www.example.com/download_example.php?file=anything_else.jpg>
on
<http://www.example.com/download_example.php?file=starting_url.jpg>
then the filter I recommended would still fail, since HTTrack would have no
link to follow to reach those files.
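
One way to check which case applies before running a mirror is to look at the
links on the intended starting page.  The sketch below is only an illustration
in Python, not part of HTTrack: the names LinkCollector and find_image_links
are made up for this example, and it assumes you have already saved the page's
HTML into a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, parse_qs


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags and src values of <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            is_anchor = tag == "a" and name == "href"
            is_image = tag == "img" and name == "src"
            if value and (is_anchor or is_image):
                self.links.append(value)


def find_image_links(html, page_url, script="download_example.php"):
    """Return absolute URLs on the page of the form script?file=<name>.jpg."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for link in collector.links:
        absolute = urljoin(page_url, link)  # resolve relative links
        parts = urlparse(absolute)
        files = parse_qs(parts.query).get("file", [])
        points_at_script = parts.path.endswith("/" + script)
        if points_at_script and any(f.lower().endswith(".jpg") for f in files):
            found.append(absolute)
    return found
```

If find_image_links returns an empty list for the starting page, there is
nothing for the filter to match there, and another starting page is needed.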

Alternatively, if there exist pages that refer to the download_example.php
URLs, such as
<http://www.example.com/downloads.php?page=1>
then you would probably need to use something like the resource below to
generate the URLs of the form
<http://www.example.com/downloads.php?page=xyz>
for every "xyz" that you want, use those as the starting URLs, and then
include the filters from my other post.
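
As a rough illustration of generating those starting URLs, here is a small
Python sketch.  It assumes "xyz" is a plain number range; the function names,
the range and the output file name are all hypothetical, so adjust them to
whatever the real site uses.

```python
from urllib.parse import urlencode


def build_start_urls(first, last,
                     base="http://www.example.com/downloads.php",
                     param="page"):
    """Return one URL per page number from first to last, inclusive."""
    return [f"{base}?{urlencode({param: n})}" for n in range(first, last + 1)]


def write_url_list(urls, path):
    """Write the URLs to a text file, one per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(urls) + "\n")


if __name__ == "__main__":
    # Hypothetical range: pages 1 through 50.
    write_url_list(build_start_urls(1, 50), "start_urls.txt")
```

The resulting lines can be pasted into the project's list of web addresses.
HTTrack's command-line help also describes a -%L <file> option for reading
URLs from a text file; check the help output of your version before relying
on it.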

That should work if that is the case.  If you post the URL of the website you
are trying to mirror, it would be easier to tell you exactly how to do it, or
whether it is not (easily) doable...


> You must either spider the site to get valid urls or guess:
> Number sequences:
> How to mirror only files/URLs using a certain ID/number range -
> <http://httrack.kauler.com/help/URL_number_sequences>

Nice resource Bill!  Thanks!!!
~     --B^p
 





