HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Never ending mirroring
Author: Xavier Roche
Date: 10/12/2001 17:11
 
> Thanks for the reply,
> the mirror isn't ending as the total number of links 
> scanned is increasing all the time. Well its true that 
> I shall set a bigger number for bigger sites,but its 
> hard to gauge the depth as there will be missing 
> pictures or pages. 

Again, the BEST way is to find out why you get so many 
pages. Generally, even with the 'infinite' level, a mirror 
should be okay (because there isn't an infinite number 
of links in the website).

The problem here - I think - might be due to dynamic 
pages which generate dynamic links (always different), 
forcing the engine to catch new links forever.

Interrupt the mirror (after having downloaded many 
links), and look in the project folder, clicking on 
each folder to find WHERE all these numerous links 
are. Then you will be able to forbid this 
specific URL.

Example:
Imagine you mirrored www.foobar.com/smith/ in "Project 1".

Go to C:\My Web Sites\Project 1

Go to www.foobar.com, click on "smith", and ask for 
properties. You may see a huge number of files. 
Go into this folder - if you see hundreds of files 
with almost the same names (bar45F1.html, 
bar4587.html...), these files might be responsible for 
your problems. Otherwise, in this subfolder, you will see 
several folders - check the number of files inside 
them, and go into the one which has the biggest number.
Repeat these steps until you reach the "numerous" files 
with almost the same names.
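
If clicking through the folders by hand is tedious, the same search can be sketched with a small script. This is only a rough helper, not part of HTTrack: the function names are made up for the illustration, and the project path is the one assumed in the example above.

```python
import os
from collections import Counter

def count_files_per_folder(project_dir):
    """Map each folder under project_dir to the number of files directly inside it."""
    counts = Counter()
    for folder, _subdirs, files in os.walk(project_dir):
        counts[folder] = len(files)
    return counts

def busiest_folders(project_dir, top=5):
    """Return the `top` folders holding the most files, biggest first."""
    return count_files_per_folder(project_dir).most_common(top)

if __name__ == "__main__":
    # Assumed project location, taken from the example in this post.
    for folder, n in busiest_folders(r"C:\My Web Sites\Project 1"):
        print(f"{n:8d}  {folder}")
```

The folder at the top of the list is the first place to look for the files with almost the same names.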

At this point, you'll be in a folder - say 
"Project 1\www.foobar.com\smith\cgi\pages\" - and you'll 
see many files named "barXXX.html".

Then, you're going to exclude all barXXX files: 
restart the project, and in Options/Filters, add:

-www.foobar.com/smith/cgi/pages/bar*

(note the / instead of \)
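
The conversion from the folder on disk to the filter rule can also be sketched in a few lines. Again this is just an illustration: `exclusion_rule` is a hypothetical helper, and the paths and the "bar" prefix are the ones assumed in the example above.

```python
from pathlib import PureWindowsPath

def exclusion_rule(project_dir, folder, prefix=""):
    """Build a '-host/path/prefix*' exclusion rule from a folder inside the project."""
    # Drop the project folder part, keeping only host + path.
    rel = PureWindowsPath(folder).relative_to(PureWindowsPath(project_dir))
    # as_posix() turns the backslashes into forward slashes.
    return "-" + rel.as_posix() + "/" + prefix + "*"

rule = exclusion_rule(
    r"C:\My Web Sites\Project 1",
    r"C:\My Web Sites\Project 1\www.foobar.com\smith\cgi\pages",
    "bar",
)
print(rule)  # -www.foobar.com/smith/cgi/pages/bar*
```

The printed line is what goes into Options/Filters.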

You may find several paths to exclude; in this case, 
add one rule for each of them.

 

