HTTrack Website Copier
Free software offline browser - FORUM
Subject: Feature Req: '++' in filters to ignore structure
Author: Rob
Date: 12/05/2014 13:47
 
Hi All,

I think it would be useful to have a feature that allows external files to be
stored within the base structure of the scrape.

E.g. I want to scrape a forum (mainly for the pictures) and store it in the
base structure for later cataloging.

The base site is www.somesite.com/forum/subforum4/

Within that are various posts:

www.somesite.com/forum/subforum4/subject1.html
www.somesite.com/forum/subforum4/subject2.html
www.somesite.com/forum/subforum4/subject3.html
www.somesite.com/forum/subforum4/subject4.html

These contain images hosted on various image-hosting sites:

www.somehost.com/get_image.php?image=12345
www.pichost.com/a/b/c/d/somepic.jpg

etc.

I'd like to be able to specify this in the filters as follows:

-*
+www.somesite.com/forum/subforum4/subject*.html
++www.somehost.com/get_image.php?image=*
++www.pichost.com/*/*/*/*/*.jpg
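To make the proposal concrete, here is a minimal Python sketch of how rules like these might be evaluated. It assumes `*` is the only wildcard and that the last matching rule wins, as with HTTrack's existing `+`/`-` scan rules. The `++` action, the `flatten` label and the helper names are hypothetical and not part of HTTrack. The page rule is written against subforum4, the base site from the example.

```python
from __future__ import annotations

import re


def compile_pattern(pattern: str) -> re.Pattern:
    # Only '*' is a wildcard; every other character (including '?') is literal.
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def parse_rule(rule: str) -> tuple[str, re.Pattern]:
    # Check '++' before '+' so the longer (hypothetical) prefix is recognised.
    for prefix, action in (("++", "flatten"), ("+", "include"), ("-", "exclude")):
        if rule.startswith(prefix):
            return action, compile_pattern(rule[len(prefix):])
    raise ValueError(f"rule must start with +, ++ or -: {rule!r}")


def classify(url: str, rules: list[str]) -> str:
    action = "exclude"  # assumed default when no rule matches
    for rule in rules:
        act, pat = parse_rule(rule)
        if pat.fullmatch(url):
            action = act  # a later match overrides earlier ones
    return action


rules = [
    "-*",
    "+www.somesite.com/forum/subforum4/subject*.html",
    "++www.somehost.com/get_image.php?image=*",
    "++www.pichost.com/*/*/*/*/*.jpg",
]
```

With these rules, the forum pages classify as `include`, both image URLs from the example classify as `flatten`, and anything else falls through to the leading `-*`.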

which would result in the files being saved like so:

www.somesite.com/forum/subforum4/subject23/abcde.jpg
www.somesite.com/forum/subforum4/subject23/get_image_9f34.jpg

etc.
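A similar sketch of the proposed save location for a `++` match, assuming the intended prefix is the forum host www.somesite.com and that the referring page's name minus `.html` becomes a folder. The 4-character MD5 suffix for URLs with a query string, and the `ext` override standing in for a file type taken from the server response, are assumptions for illustration; `flattened_path` is a hypothetical helper, and the `9f34` in the example is only a placeholder.

```python
from __future__ import annotations

import hashlib
import posixpath
from urllib.parse import urlsplit


def flattened_path(page_url: str, resource_url: str, ext: str | None = None) -> str:
    # URLs are written as in the filters (no scheme), so add one for parsing.
    page = urlsplit("http://" + page_url)
    # Folder named after the referring page, e.g. .../subject23.html -> .../subject23
    page_dir = page.netloc + posixpath.splitext(page.path)[0]

    res = urlsplit("http://" + resource_url)
    stem, url_ext = posixpath.splitext(posixpath.basename(res.path))
    if res.query:
        # Keep get_image.php?image=1 and ?image=2 apart with a short hash suffix.
        digest = hashlib.md5(resource_url.encode("utf-8")).hexdigest()[:4]
        stem = f"{stem}_{digest}"

    # Use the caller-supplied extension if given, otherwise the one in the URL.
    return f"{page_dir}/{stem}{ext if ext is not None else url_ext}"
```

For example, `flattened_path("www.somesite.com/forum/subforum4/subject23.html", "www.pichost.com/a/b/c/d/abcde.jpg")` returns `www.somesite.com/forum/subforum4/subject23/abcde.jpg`, so every picture ends up next to the post it was found in rather than under its own host's directory tree.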

Is this implementable? :)
 


All articles

Subject                                                   Date
Feature Req: '++' in filters to ignore structure          12/05/2014 13:47
Re: Feature Req: '++' in filters to ignore structure      12/10/2014 12:27





Created with FORUM 2.0.11