HTTrack Website Copier
Free software offline browser - FORUM
Subject: WebHTTrack not staying on single site
Author: Lum
Date: 09/17/2014 23:46
 
I'm mirroring a forum, and three separate times now HTTrack has started to
mirror other forums that are linked in posts on the original site. I have it
set to stay on the same site. The only positive scan rules are file-type
patterns (media, documents, archives, CSS/JS); the rest are negative filters
that strip out posting, user profiles, and so on. Even so, I've had to add
those other sites to the filter list to stop them from being mirrored too. I
am not new to HTTrack: I've used it for years, and I know how to set
appropriate options. The only difference this time is the size of what I'm
mirroring (it's a large forum). I've had to keep a close eye on it because it
keeps picking up other sites, and I can't figure out why.

My scan rules:
=========

+*.wmv +*.wma +*.ac3 +*.vid +*.swf +*.qt +*.mkv +*.vob +*.wav +*.rm +*.mp2
+*.mp3 +*.asf +*.avi +*.mpeg +*.mpg +*.mov +*.7z +*.exe +*.rar +*.gz +*.tgz
+*.tar +*.zip +*.bmp +*.tif +*.ico +*.jpeg +*.ps +*.txt +*.rtf +*.docx +*.doc
+*.pdf +*.tex +*.odt +*.png +*.gif +*.jpg +*.css +*.js -ad.doubleclick.net/*
-*action=viewprofile* -*action=notify* -*action=sendtopic* -*action=print*
-*action=imsend* -*action=post* -*action=modify* -*action=im* -*action=search*
-*action=profile* -*action=mlall* -*action=shownotify* -*action=logout*
-*action=markasread* -*action=register* -*action=login* -*action=reminder*
-*action=recent* -*action=markallasread* -*board=cfnm* -*start=*[0-9]#*[0-9]*
-*start=*[] -*start=0*[] -*/help/* -*.justusboys.com/* -*.thesuperficial.com/*
-truth.darering.com/* 

=========

The last three filters are the ones I had to add to stop those sites from
being mirrored. I caught each one about 50 to 100+ pages into copying that
site.
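
To make it easier to reason about which URLs a rule list like the one above
lets through, here is a rough Python sketch. It rests on assumptions: that
rules are applied in order with the last matching rule winning, and that `*`
matches any run of characters. Python's fnmatch only approximates HTTrack's
wildcard syntax (bracket forms such as *[0-9] behave differently), so the
sketch uses plain `*` patterns only. The hostnames and the is_allowed helper
are hypothetical and not part of HTTrack.

```python
from fnmatch import fnmatchcase


def is_allowed(url, rules, default=False):
    """Return the verdict of the last rule that matches, else `default`.

    Assumption: later rules override earlier ones. Each rule is a string
    starting with '+' (allow) or '-' (deny) followed by a '*' wildcard pattern.
    """
    verdict = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url, pattern):
            verdict = (sign == "+")
    return verdict


# A shortened, hypothetical rule list in the same style as the one above.
rules = ["+*.jpg", "+*.css", "-*action=post*", "-*.example-other.com/*"]

# The file-type rules carry no hostname, so they match on any host.
print(is_allowed("board.example.com/img/a.jpg", rules))
print(is_allowed("cdn.example.net/img/a.jpg", rules))
# A host-specific negative rule placed later overrides the file-type rule.
print(is_allowed("www.example-other.com/img/a.jpg", rules))
```

Under these assumptions a pattern like +*.jpg is not tied to the starting
site at all, which is one thing to check when a mirror wanders off-site. It
does not by itself explain why whole pages on the other forums were fetched.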

Full command line:
=========

webhttrack -q -%i -iC1
<http://board.[website].com/vsBoard/cgi-bin/yabb/YaBB.cgi> -O
"/home/[user]/websites/[Site Name]" -n -%P -p7 -N0 -s2 -x -%x -%q -X0 -p7 -D
-a -K0 -c1 -%k -A25000 -F "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"
-%F "<!-- Mirrored from %s%s by HTTrack Website Copier/3.x [XR&CO'2013], %s
-->" +*.wmv +*.wma +*.ac3 +*.vid +*.swf +*.qt +*.mkv +*.vob +*.wav +*.rm
+*.mp2 +*.mp3 +*.asf +*.avi +*.mpeg +*.mpg +*.mov +*.7z +*.exe +*.rar +*.gz
+*.tgz +*.tar +*.zip +*.bmp +*.tif +*.ico +*.jpeg +*.ps +*.txt +*.rtf +*.docx
+*.doc +*.pdf +*.tex +*.odt +*.png +*.gif +*.jpg +*.css +*.js
-ad.doubleclick.net/* -*action=viewprofile* -*action=notify*
-*action=sendtopic* -*action=print* -*action=imsend* -*action=post*
-*action=modify* -*action=im* -*action=search* -*action=profile*
-*action=mlall* -*action=shownotify* -*action=logout* -*action=markasread*
-*action=register* -*action=login* -*action=reminder* -*action=recent*
-*action=markallasread* -*board=cfnm* -*start=*[0-9]#*[0-9]* -*start=*[]
-*start=0*[] -*/help/* -*.justusboys.com/* -*.thesuperficial.com/*
-truth.darering.com/* -%s -%u -f2

=========

Any help would be appreciated as I have a few more sites I want to mirror,
which are forums that have fallen into disuse and might disappear. Thanks.
 