Hello,
I've been trying to get my Scan Rules just right for about an hour now and I'm
having no luck.
There's a series of galleries on a website called baka-images.com, and I'm
trying to rip all of them. The actual JPEG files are located under:
<http://baka-images.com/forum/gallery>
An example of one JPEG is:
<http://baka-images.com/forum/gallery/16/453-051009100004.jpeg>
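So the image URLs all follow this general shape, where <number> varies per
gallery and <filename> is just a placeholder for the image name:
http://baka-images.com/forum/gallery/<number>/<filename>.jpeg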
The directory "16" above in the URL can be any number starting from "0", so it
is non-deterministic. So, to keep it simple, I've made this my rule:
+*/forum/gallery/*.jpg
Furthermore, there are thumbnail images, all prefixed with thumb_, that I want
to skip. So my rules now look like this:
-*thumb_*.jpg
+*/forum/gallery/*.jpg
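To make the intent concrete, here is how I expect those two rules to behave on
example URLs (the thumbnail path is my guess at where the thumb_ files live):
http://baka-images.com/forum/gallery/16/453-051009100004.jpeg -> downloaded
http://baka-images.com/forum/gallery/16/thumb_453-051009100004.jpeg -> skipped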
However, this does not work: it downloads the thumbnails anyway, along with
everything else (HTML, XML, and so on). The actual URL I'm using to begin the
crawl is:
<http://baka-images.com/forum/index.php?action=gallery;cat=12>
I start there because the base URL for all of the galleries is not directly
accessible, so HTTrack cannot read it; the URL above is the actual interface
into the gallery.
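For reference, this is roughly the equivalent command line for what I'm doing
(the output directory is just a placeholder, and the quotes are only there to
stop the shell from eating the * and ; characters):
httrack "http://baka-images.com/forum/index.php?action=gallery;cat=12" -O ./baka-gallery "-*thumb_*.jpg" "+*/forum/gallery/*.jpg"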
How can I adjust the scan rules so that I get only the JPG files? I don't want
the thumb_*.jpg files, and I don't want any HTML, XML, PHP, or other files.
Thanks in advance for your help.