HTTrack Website Copier
Free software offline browser - FORUM
Subject: Need help with Scan Rules
Author: Bob
Date: 10/06/2009 00:10
 
Hello,

I've been trying to get my Scan Rules just right for about an hour now and I'm
having no luck.

There's a chain of galleries on a website called baka-images.com, and I'm
trying to rip all of them. The actual JPEG files are located under:

<http://baka-images.com/forum/gallery>

An example of one JPEG is:

<http://baka-images.com/forum/gallery/16/453-051009100004.jpeg>

The directory "16" in the URL above can be any number from 0 upward, so it
isn't predictable. To keep things simple, I've made this my rule:

+*/forum/gallery/*.jpg

There are also thumbnail images, all prefixed with thumb_, so I now have:

-*thumb_*.jpg
+*/forum/gallery/*.jpg

However, this doesn't work: HTTrack still downloads the thumbnails and
everything else (HTML, XML, everything). The URL I'm using to start the crawl
is:

<http://baka-images.com/forum/index.php?action=gallery;cat=12>

I start there because the base URL for the galleries can't be accessed
directly, so HTTrack can't read it. The URL above is the actual interface to
the gallery.

How can I adjust my scan rules so that I only get the full-size JPEG files? I
don't want the thumb_*.jpg files, and I don't want any HTML, XML, PHP, or other
files.
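A small Python model of the rule matching may help show why these rules behave as they do. It is a sketch under stated assumptions, not HTTrack's actual code: it assumes, as I understand HTTrack's filter documentation, that when several rules match a URL the last one listed wins; it treats `*` as "any run of characters" and ignores HTTrack's special forms such as `*[file]`. The thumbnail URL and the reordered rule list are hypothetical, built only from the naming described above.

```python
import re
from typing import List, Optional


def wildcard_to_regex(pattern: str) -> "re.Pattern":
    # Simplified model: '*' matches any run of characters, including '/'.
    # HTTrack's special forms (e.g. *[file]) and size filters are NOT modeled.
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def decide(url: str, rules: List[str]) -> Optional[bool]:
    """Return True for accept (+), False for reject (-), None if no rule matches.

    Assumption: rules are checked in order and the LAST matching rule wins.
    """
    verdict: Optional[bool] = None
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if wildcard_to_regex(pattern).fullmatch(url):
            verdict = sign == "+"
    return verdict


FULL_SIZE = "http://baka-images.com/forum/gallery/16/453-051009100004.jpeg"
THUMB = "http://baka-images.com/forum/gallery/16/thumb_453-051009100004.jpg"  # hypothetical
START_PAGE = "http://baka-images.com/forum/index.php?action=gallery;cat=12"

ORIGINAL_RULES = ["-*thumb_*.jpg", "+*/forum/gallery/*.jpg"]
# Hypothetical ordering: accept both extensions first, exclude thumbnails last.
REORDERED_RULES = ["+*/forum/gallery/*.jpeg", "+*/forum/gallery/*.jpg", "-*thumb_*"]

if __name__ == "__main__":
    for name, rules in (("original", ORIGINAL_RULES), ("reordered", REORDERED_RULES)):
        for url in (FULL_SIZE, THUMB, START_PAGE):
            print(name, decide(url, rules), url)
```

Under this model, the original rules match neither rule for the full-size image (the example URL ends in .jpeg, not .jpg) and accept the thumbnail, because the + rule is listed after the - rule. The start page matches no rule at all, so HTTrack's default options decide whether it is fetched; since HTTrack has to read the gallery pages to find the image links, excluding every page would likely leave it nothing to follow.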

Thanks in advance for your help.