There is a website from which I would like to batch download images from
specific pages (usually 3-5 images per page, occasionally anywhere from 1 to
7). I can list the specific pages in the "Web Addresses" box of HTTrack, so
that solves one of the problems. The other problem
is how to filter the images out. They are hosted on a subdomain of a
different domain than the site they are displayed on, and they all follow a
specific format: ****_word.jpg
**** = four digits (0-9)
"word" is a placeholder, not the actual word the filenames use.
So, for example:
I want to extract img.notexample.com/images/0172_word.jpg,
img.notexample.com/images/0173_word.jpg and
img.notexample.com/images/0174_word.jpg from
example.com/category/subcategory/page100.html
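From what I understand of HTTrack's scan rules (the "Scan Rules" tab under Set Options), something like this might do it, since I believe the pages listed in "Web Addresses" are still fetched even with a catch-all exclusion (img.notexample.com and _word are of course placeholders for the real host and word):

    -*
    +img.notexample.com/images/*_word.jpg

If the digits need to be matched more strictly, I think HTTrack's character-class patterns allow something like +img.notexample.com/images/*[0-9]_word.jpg, but I'm not certain of the exact syntax, so corrections are welcome.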
I also want the directories for the ripped pages to be named after the page
URLs. Even better would be if there were a way to name the folders after an
individual part of the URL, e.g. "page100" instead of
"example.com/category/subcategory/page100.html".
Concrete examples can be provided, but I won't post them in public; I can,
however, PM or e-mail them to anyone who is kind enough to help me with this.