This time I tried to mirror the gallery of websites from cssbeauty.com. I set up
a fairly standard config. Since I wanted only the website gallery part of the
site (not the articles), the spider started from
<http://www.cssbeauty.com/archives/category/business/>. The options were: can go
down, can go outside the domain (whole web, because I wanted it to be able to
fetch the pages linked from the gallery), external depth 1 (only the first page
of every external website).
Filters:
-www.cssbeauty.com/* -cssbeauty.com/* (I don't want it to retrieve the whole of
cssbeauty, which is quite a big site)
+www.cssbeauty.com/archives/category/* (I want it to be able to go into every
subcategory, not only business; the links look like
<http://www.cssbeauty.com/archives/category/CATEGORY/>)
+cssbeauty.com/archives/category/* (for the same reason)
+*.png +*.gif +*.jpg +*.css +*.js -ad.doubleclick.net/* (standard)
Then... why is it downloading the whole of w3.org as I write this?
The www.w3.org directory is already 6 MB and is still growing.
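
To make the filter list above easier to check by hand, here is a rough Python sketch that replays it against a few URLs. It is not HTTrack's own matcher: it assumes that a later matching rule overrides an earlier one, that patterns are compared against the address without the `http://` prefix, and that `*` behaves like a shell wildcard (via `fnmatch`). The function name `classify` and the sample URLs are made up for illustration.

```python
from fnmatch import fnmatchcase

# The scan rules from the post, in the order they were given.
FILTERS = [
    "-www.cssbeauty.com/*",
    "-cssbeauty.com/*",
    "+www.cssbeauty.com/archives/category/*",
    "+cssbeauty.com/archives/category/*",
    "+*.png", "+*.gif", "+*.jpg", "+*.css", "+*.js",
    "-ad.doubleclick.net/*",
]


def classify(url, filters=FILTERS):
    """Return '+', '-' or None (no rule matched) for a URL.

    Assumption: every rule is tried in order and the last match wins.
    """
    # Drop the scheme so "http://www.x.com/a" is matched as "www.x.com/a".
    address = url.split("://", 1)[-1]
    verdict = None
    for rule in filters:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(address, pattern):
            verdict = sign  # a later match overrides an earlier one
    return verdict


if __name__ == "__main__":
    for url in [
        "http://www.cssbeauty.com/archives/category/business/",
        "http://www.cssbeauty.com/about/",
        "http://www.w3.org/TR/",
        "http://www.w3.org/Icons/w3c_home.png",
    ]:
        print(classify(url), url)
```

Under these assumptions, plain pages on www.w3.org match none of the rules, so they are left to the general options (can go outside the domain, external depth 1), while images, stylesheets and scripts on any host are explicitly accepted by the `+*.png`, `+*.css` and similar rules. The sketch does not model how HTTrack combines the filters with the external depth setting, so it only narrows down where to look.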