HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Different query re: maximum file limit of v3.30
Author: Xavier Roche
Date: 10/25/2003 23:15
 
> I thank you, Mr. Roche,
> for bringing something very valuable to the world.

Thanks :)

>      Second thing: something completely unrelated to the
> topic. When I did a Google search to come back here to
> post in the forum, I ran across:
> <http://www.zylox.com/compare/httrack.php>

Oh yes, and they even bought some AdWords: when you 
search for "httrack", you get their nice ads and misleading 
comparison. I did not even take the time to respond to this 
publisher, as they would probably ignore my complaint. This is 
not really important: people can freely (free of charge 
and free as in freedom) test httrack and ask questions on the 
forum (I'm wondering whether their support service is as 
serious, humph :) ), and I hope they will form their own opinion.

> My problem is that I really only want the entries from
> the dictionaries.
> Thus, this means that just five or six dictionaries
> will completely fill the maximum limit of 10 million
> files within HTTrack.

Well, you can actually ask httrack for more, but I do not 
guarantee anything, and even 10,000,000 links is really big 
for the engine (the hashtables will be rather full, and the 
whole process may slow down and use more memory).

> All of the dictionary entry 
> pages are listed as, using the example from above:
> /index page/ENTRY.html?entry=##.######
> ...where '#' represents a numerical designation for
> each entry page.

>      I tried filtering using the initial option of 
> downloading like so:
> www.domainname.com/index page/(my authentication string)
> www.domainname.com/index page/ENTRY*.*

Err, is the authentication string placed IN the filename?!
Generally, authentication is done with, for example, HTTP 
authentication, and thus you would have:

<http://myusername:mypassword@www.domainname.com/indexpage>
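
As a small illustration of the URL form above, here is a minimal Python sketch that builds such a starting URL. The helper name build_auth_url is hypothetical, and percent-encoding the credentials is an assumption based on general URL syntax; whether httrack decodes encoded characters in the same way is not confirmed here.

```python
from urllib.parse import quote


def build_auth_url(username, password, host, path="/"):
    """Build an http://user:password@host/path URL (hypothetical helper).

    Characters such as '@', ':' and '/' in the credentials are
    percent-encoded so they cannot be confused with URL delimiters.
    """
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"http://{user}:{pwd}@{host}{path}"


print(build_auth_url("myusername", "mypassword",
                     "www.domainname.com", "/indexpage"))
```

With plain alphanumeric credentials the result is exactly the URL shown above; the encoding only matters when the username or password contains reserved characters.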

Also remember that you have to define a starting URL (the 
authenticated one); any other URLs listed alongside it will not 
be treated as authenticated. The best solution is to 
define the starting URL, and then use FILTERS (Scan rules) 
to define the mirror scope precisely:

-* +www.domainname.com/indexpage/ENTRY*.*
or maybe better
-* +www.domainname.com/indexpage/ENTRY*.* +*.gif +*.jpg 
+*.png +*.css +*.js

to capture related styles and images.
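
To get a feel for what these scan rules let through, here is a rough Python approximation. It is not httrack's actual matching code: it assumes that rules are checked in order and that the last matching rule wins, that '*' matches any run of characters, and that the rule is compared against the URL without its scheme. The function name is_allowed and the test URLs are made up for the example.

```python
from fnmatch import fnmatchcase

# The second rule set from the post, one rule per list entry.
RULES = [
    "-*",
    "+www.domainname.com/indexpage/ENTRY*.*",
    "+*.gif", "+*.jpg", "+*.png", "+*.css", "+*.js",
]


def is_allowed(url, rules=RULES):
    """Approximate whether a URL would be kept by the scan rules."""
    # Assumption: rules are matched against the URL without "http://".
    target = url.split("://", 1)[-1]
    allowed = True  # assumption: a URL matched by no rule is kept
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(target, pattern):
            # Assumption: a later matching rule overrides earlier ones.
            allowed = (sign == "+")
    return allowed


for url in (
    "http://www.domainname.com/indexpage/ENTRY.html?entry=12.345678",
    "http://www.domainname.com/indexpage/about.html",
    "http://www.domainname.com/img/logo.gif",
):
    print(url, is_allowed(url))
```

Under these assumptions the entry page and the image are kept, while the unrelated page is rejected by the leading -* rule.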

 