Re: Need advanced scanning rules for tricky site

Author: Jason

Date: 03/22/2013 01:25

For reference, here are the files I am trying to get. The only way to reach
them is through several links:

prefix.asite.com.au/a/b/c/[all-something, like a list]
prefix.asite.com.au/a/b/c/[a number]
asite.com.au/d/b/c/e/[name.pdf]    - These PDFs are what I want

[   ] indicates a directory that changes depending on which PDF I want to
access.
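The three URL levels above can be written down as patterns, which makes it easier to reason about which rules each level needs. This is only a sketch: `a/b/c` and `d/b/c/e` are the placeholder segments from the post, and the assumptions that list pages start with `all-` and that the numbered pages are plain digits are hypothetical.

```python
import re
from typing import Optional

# Hypothetical patterns for the three URL levels (scheme stripped).
# "a/b/c" and "d/b/c/e" are placeholder paths, not real directories.
LEVELS = [
    ("index", re.compile(r"^prefix\.asite\.com\.au/a/b/c/all-[^/]+$")),
    ("numbered", re.compile(r"^prefix\.asite\.com\.au/a/b/c/\d+$")),
    ("pdf", re.compile(r"^asite\.com\.au/d/b/c/e/[^/]+\.pdf$")),
]


def classify(url: str) -> Optional[str]:
    """Return the level name for a URL, or None if it fits no level."""
    # Strip the scheme so http:// and https:// URLs are treated alike.
    bare = re.sub(r"^https?://", "", url)
    for name, pattern in LEVELS:
        if pattern.match(bare):
            return name
    return None


if __name__ == "__main__":
    for u in [
        "http://prefix.asite.com.au/a/b/c/all-reports",
        "http://prefix.asite.com.au/a/b/c/1234",
        "http://asite.com.au/d/b/c/e/report.pdf",
        "http://asite.com.au/other/page.html",
    ]:
        print(u, "->", classify(u))
```

Note that the PDF pattern is anchored at `asite.com.au`, so it does not accidentally match the `prefix.` host.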


> There is no magic here. What part of forbidden
> wasn't clear?

It was clear to me; I was just mentioning it for clarity. I know most
websites don't allow directory access anyway.

> Either through the links or you guess the URLs:
> Number sequences: How to mirror only files/URLs
> using a certain ID/number range -
> <http://httrack.kauler.com/help/URL_number_sequences>
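The number-sequence idea quoted above boils down to generating the candidate URLs for an ID range up front. A minimal sketch, assuming the numbered pages live directly under the placeholder path `prefix.asite.com.au/a/b/c/`; the `numbered_urls` helper and its optional zero-padding are hypothetical, not part of any mirroring tool.

```python
from typing import Iterator

# Hypothetical base URL built from the post's placeholder path.
BASE = "http://prefix.asite.com.au/a/b/c/"


def numbered_urls(start: int, stop: int, base: str = BASE,
                  width: int = 0) -> Iterator[str]:
    """Yield base + n for every n from start to stop, inclusive.

    width > 0 zero-pads the number (width=4 turns 7 into 0007); whether
    the site pads its IDs at all is an assumption to check.
    """
    for n in range(start, stop + 1):
        yield f"{base}{str(n).zfill(width)}"


if __name__ == "__main__":
    # Print one URL per line, e.g. to paste in as start URLs.
    for url in numbered_urls(1, 5):
        print(url)
```

These numbered pages are not the PDFs themselves, so each generated URL would still need to be crawled for its links.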

That may or may not help (I might try to implement it). The problem is that
the links with the numbers aren't the final destination, and it doesn't help
with the change in domain from prefix.asite.com.au to asite.com.au.

> Filters do just that, they don't enable magic. If
> you want one file type: -* +*.html +*.XXX

And stop talking about magic. I'm asking precisely because the filters
don't 'do magic'; I know this is definitely going to be tricky (as the
subject of this thread says!).
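To check a rule set against the two-domain layout before running a crawl, the +/- filters can be emulated roughly. This is a simplified approximation, not HTTrack's own matcher: it assumes the last matching rule wins (the quoted `-* +*.html +*.XXX` example only works if later rules override `-*`), that patterns are matched against host plus path without the scheme, and that only `*` is a wildcard; the real syntax has extra forms this sketch ignores. `SITE_RULES` is a hypothetical rule list built from the placeholder paths.

```python
import re
from typing import List, Optional


def _glob_to_regex(glob: str) -> "re.Pattern[str]":
    # Only "*" is a wildcard here: any run of characters, including "/".
    return re.compile(".*".join(re.escape(part) for part in glob.split("*")))


def is_allowed(url: str, rules: List[str], default: bool = True) -> bool:
    """Apply +/- rules in order; the last rule that matches decides.

    `default` is used when no rule matches (an assumption; a real
    crawler's fallback depends on its other settings).
    """
    bare = re.sub(r"^https?://", "", url)
    decision: Optional[bool] = None
    for rule in rules:
        if not rule or rule[0] not in "+-":
            raise ValueError(f"rule must start with + or -: {rule!r}")
        if _glob_to_regex(rule[1:]).fullmatch(bare):
            decision = rule[0] == "+"
    return default if decision is None else decision


# Hypothetical rules: drop everything, then re-allow the HTML pages on
# the prefix host and the PDFs on the main host.
SITE_RULES = [
    "-*",
    "+prefix.asite.com.au/a/b/c/*",
    "+asite.com.au/d/b/c/e/*.pdf",
]

if __name__ == "__main__":
    for u in [
        "http://prefix.asite.com.au/a/b/c/1234",
        "http://asite.com.au/d/b/c/e/report.pdf",
        "http://asite.com.au/other/page.html",
    ]:
        print(u, "->", is_allowed(u, SITE_RULES))
```

The point of the second rule is that the HTML pages on prefix.asite.com.au stay in scope, so the crawler can follow their links across to the PDFs on asite.com.au.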

I read somewhere that, to be able to get the PDFs, it will need the HTML
pages that link to them anyway (so it can spider through those links).
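The reason the HTML pages are needed is that the PDF URLs are only discovered by parsing the pages that link to them. A minimal sketch of that step with the standard library's `html.parser`; it does not fetch anything, and the page content and URLs in the usage are hypothetical placeholders. Relative links are resolved against the page URL, so a link can cross from prefix.asite.com.au to asite.com.au.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> links whose path ends in .pdf."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.pdf_links: List[str] = []

    def handle_starttag(self, tag: str,
                        attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they appear on.
                absolute = urljoin(self.base_url, value)
                # Check only the path, case-insensitively, ignoring any
                # query string or #fragment.
                if urlparse(absolute).path.lower().endswith(".pdf"):
                    self.pdf_links.append(absolute)


def find_pdf_links(html: str, base_url: str) -> List[str]:
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.pdf_links


if __name__ == "__main__":
    page = ('<a href="http://asite.com.au/d/b/c/e/report.pdf">R</a>'
            '<a href="/a/b/c/1235">next</a>')
    print(find_pdf_links(page, "http://prefix.asite.com.au/a/b/c/1234"))
```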

I might as well reveal the website address if this is still not clear
enough.

Jason
