Hello,
I've been trying to get my Scan Rules just right for about an hour now and I'm
having no luck.
There's a series of galleries on a website called baka-images.com, and I'm
trying to rip all of them. The actual JPEG files are located under:
<http://baka-images.com/forum/gallery>
An example of one JPEG is:
<http://baka-images.com/forum/gallery/16/453-051009100004.jpeg>
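So the image URLs all follow this general shape, where <number> varies per
gallery and <filename> is just a placeholder for the image name:
http://baka-images.com/forum/gallery/<number>/<filename>.jpeg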
The directory "16" above in the URL can be any number starting from "0", so it
is non-deterministic. So, to keep it simple, I've made this my rule:
+*/forum/gallery/*.jpg
Furthermore, there are thumbnail images, all prefixed with thumb_, that I want
to skip. So my rules now look like this:
-*thumb_*.jpg
+*/forum/gallery/*.jpg
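To make the intent concrete, here is how I expect those two rules to behave on
example URLs (the thumbnail path is my guess at where the thumb_ files live):
http://baka-images.com/forum/gallery/16/453-051009100004.jpeg -> downloaded
http://baka-images.com/forum/gallery/16/thumb_453-051009100004.jpeg -> skipped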
However, this does not work: it downloads the thumbnails anyway, along with
everything else (HTML, XML, and so on). The actual URL I'm using to begin the
crawl is:
<http://baka-images.com/forum/index.php?action=gallery;cat=12>
I start there because the base URL for all of the galleries is not directly
accessible, so HTTrack cannot read it; the URL above is the actual interface
into the gallery.
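For reference, this is roughly the equivalent command line for what I'm doing
(the output directory is just a placeholder, and the quotes are only there to
stop the shell from eating the * and ; characters):
httrack "http://baka-images.com/forum/index.php?action=gallery;cat=12" -O ./baka-gallery "-*thumb_*.jpg" "+*/forum/gallery/*.jpg"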
How can I adjust the scan rules so that I get only the JPG files? I don't want
the thumb_*.jpg files, and I don't want any HTML, XML, PHP, or other files.
Thanks in advance for your help.