> > I'm trying to get all the images from a site that
> > follow this structure:
> > <http://www.example.com/download_example.php?file=*.jpg>
> > Where * is a wildcard that varies in size.
>
> You can NOT use wildcards in urls. You MUST use a
> valid url in every call to a webserver.
I agree with that.
The filter(s) in my other post can only work with the correct starting page.
If, for example,
<http://www.example.com/download_example.php?file=1.jpg>
contains a link to
<http://www.example.com/download_example.php?file=2.jpg>
and so on, with each URL linking to the next until the list/site is exhausted,
then the filter may/should work using
<http://www.example.com/download_example.php?file=1.jpg>
as the starting URL.
If there are no direct links to
<http://www.example.com/download_example.php?file=anything_else.jpg>
on
<http://www.example.com/download_example.php?file=starting_url.jpg>
then it appears that the filtering approach I recommended would still fail.
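To make the two cases above concrete, here is a rough Python sketch of why the starting URL matters. It is not how HTTrack works internally; it only models a crawl that can reach a page when some already-visited page links to it. The link graphs (`chained`, `isolated`) and the function name `reachable` are made up for illustration, since I do not know what the real pages link to.

```python
from collections import deque

def reachable(start, links):
    """Return every URL a link-following crawl can reach from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        # Pages missing from the graph are treated as having no links.
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

BASE = "http://www.example.com/download_example.php?file="

# Hypothetical case 1: each image page links to the next one.
chained = {
    BASE + "1.jpg": [BASE + "2.jpg"],
    BASE + "2.jpg": [BASE + "3.jpg"],
}

# Hypothetical case 2: the starting page links to nothing else.
isolated = {BASE + "1.jpg": []}

print(len(reachable(BASE + "1.jpg", chained)))   # all three pages are found
print(len(reachable(BASE + "1.jpg", isolated)))  # only the starting page
```

In the first case the filter has something to match against, because every file URL is discovered by following links. In the second case no filter can help, because the other URLs are never seen.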
Alternatively, if there are pages that refer to the download_example.php
URLs, such as
<http://www.example.com/downloads.php?page=1>
then you would probably need to use something like the resources below to
generate the URLs of the form
<http://www.example.com/downloads.php?page=xyz>
for every "xyz" that you want, use those as the starting URLs, and then
include the filters from my other post.
That should work if that is the case. If you post the website URL that you are
trying to mirror, it would be easier to tell you exactly how to do it or if it
is not (easily) doable...
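If the index pages really are numbered, the list of starting URLs could be generated with a few lines of Python instead of typing them by hand. This is only a sketch: the page range 1 to 50 is a guess on my part, and `start_urls` is just a name I picked. Adjust both to whatever the real site uses.

```python
# Pattern taken from the example above; {} is replaced by the page number.
BASE = "http://www.example.com/downloads.php?page={}"

def start_urls(first, last):
    """Return one starting URL per page number, first..last inclusive."""
    return [BASE.format(n) for n in range(first, last + 1)]

if __name__ == "__main__":
    # Assumed range: pages 1 through 50. Change to match the real site.
    for url in start_urls(1, 50):
        print(url)
```

The printed list, one URL per line, can then be pasted in as the starting URLs for the project, with the filters from my other post applied on top.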
> You must either spider the site to get valid urls or guess:
> Number sequences:
> How to mirror only files/URLs using a certain ID/number range -
> <http://httrack.kauler.com/help/URL_number_sequences>
Nice resource Bill! Thanks!!!
~ --B^p