HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Unnecessary Wikimedia files copied with the site
Author: Matt
Date: 07/23/2017 21:30
>Please explain me what's the reason of
>this strange behaviour of your program.

You're not setting it up properly and relying solely on a start page, which forces
the program to ASSUME hundreds of things, and so, in your case, with that
specific site it didn't guess exactly what you wanted. (RTFM) The default
assumptions are for very simple sites or very small subsections of simpler
sites. They just didn't work for you.

But here is a quick rundown.

First, don't include the protocol.
So your start page should just be

It should be ONLY one; adding the second made HTTrack do a whole host of other
things.
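For illustration only, with a hypothetical placeholder address (your real start page will differ), a start page without the protocol looks like:

```
www.example.com/mysection/
```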

Go to Set Options -> Scan Rules
Delete everything there (those are the assumptions)

First add the line


This tells it to reject Everything from everywhere, including Wikipedia
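In HTTrack's scan-rule syntax, that reject-everything line is a single minus filter:

```
-*
```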

Then add rules to allow the content you want.

Now it will only get files that are in those two URL "directories" or lower.
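As a sketch, with www.example.com/section1/ and www.example.com/section2/ standing in for your two actual URL directories, the allow lines take the form:

```
+www.example.com/section1/*
+www.example.com/section2/*
```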

You may need to add more include filters (+stuff) if you find stuff missing.

Since your only issue is the multiple Wiki links (commons, media, Wikipedia),
you could leave everything as is and just add a rule/filter to block any
wiki-based URL.
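That blocking filter, in scan-rule syntax, is the one line:

```
-*wiki*
```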


Your site downloaded for me, set up as you originally had it, with only the added
filter -*wiki*, in about 2 hours.

It also grabbed

but each of those was less than 1MB.

Hope this gets you on your way.

All articles

Subject / Date
Unnecessary Wikimedia files copied with the site, 07/22/2017 23:31
Re: Unnecessary Wikimedia files copied with the site, 07/23/2017 21:30