> In summary, I simply want to
> * crawl: site.com/a/*
> * download (and not parse): site.com/b/*
Because /a/index.html is your start URL, it's going to be included in the
project no matter what, so we don't need a rule for that.
If what you want is only in the "b" directory then you can use:
-* +site.com/b/*
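
To see why those two rules are enough, here is a rough Python sketch of how I understand the filters to be evaluated: rules are read left to right and the last rule that matches a URL decides. This is only a simplified model, not HTTrack's actual code. The names `is_allowed`, `in_project` and `START_URL` are made up for the illustration, and Python's `fnmatch` wildcards only approximate HTTrack's pattern syntax.

```python
from fnmatch import fnmatchcase

# Hypothetical model of the scan rules above; not HTTrack's real implementation.
RULES = ["-*", "+site.com/b/*"]
START_URL = "site.com/a/index.html"  # the start URL from the question


def is_allowed(url, rules, default=False):
    """Apply rules in order; the last matching rule decides the verdict."""
    verdict = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        # fnmatchcase's "*" matches any run of characters, including "/"
        if fnmatchcase(url, pattern):
            verdict = (sign == "+")
    return verdict


def in_project(url):
    """The start URL is always kept; everything else goes through the rules."""
    return url == START_URL or is_allowed(url, RULES)


if __name__ == "__main__":
    for url in ("site.com/a/index.html",
                "site.com/a/other.html",
                "site.com/b/file.zip"):
        print(url, in_project(url))
```

In this model `-*` rejects everything first, then `+site.com/b/*` re-admits anything under /b/, and the start page gets in without needing a rule of its own.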