> I thank you, Mr. Roche,
> for bringing something very valuable to the world.
Thanks :)
> Second thing: something completely unrelated to the
> topic. When I did a Google search to come back here to
> post in the forum, I ran across:
> <http://www.zylox.com/compare/httrack.php>
Oh yes, and they even bought some AdWords: when you
search for "httrack", you get their nice ads and misleading
comparison. I did not even take the time to contact this
vendor, as they would probably ignore my complaint. This is
not really important - people can freely (as in free of
charge and as in freedom) test httrack, ask questions on the
forum (I wonder whether their support service is as serious,
humph :) ), and I hope they'll form their own opinion.
> My problem is that I really only want the entries from
> the dictionaries.
> Thus, this means that just five or six dictionaries
> will completely fill the maximum limit of 10 million
> files within HTTrack.
Well, you can actually ask httrack for more, but I cannot
guarantee anything, and even 10,000,000 links is really big
for the engine (the hash tables will be a bit full, and the
whole process may slow down and take some memory).
> All of the dictionary entry
> pages are listed as, using the example from above:
> /index page/ENTRY.html?entry=##.######
> ...where '#' represents a numerical designation for
> each entry page.
> I tried filtering using the initial option of
> downloading like so:
> www.domainname.com/index page/(my authentication string)
> www.domainname.com/index page/ENTRY*.*
Err, is the authentication string placed IN the filename?!
Generally, authentication is done with, for example, HTTP
authentication, and thus you would have:
<http://myusername:mypassword@www.domainname.com/indexpage>
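For illustration, here is a minimal Python sketch of how such a starting URL could be assembled. The helper name authenticated_url is hypothetical and not part of httrack. It percent-encodes the credentials so that characters such as '@' or ':' in a password are not mistaken for URL delimiters; whether httrack decodes percent-encoded credentials is an assumption you should test on your own site.

```python
from urllib.parse import quote


def authenticated_url(host, path, username, password, scheme="http"):
    """Build scheme://user:password@host/path (hypothetical helper)."""
    # Percent-encode reserved characters so '@', ':' or '/' inside the
    # credentials cannot be confused with the URL's own delimiters.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}/{path.lstrip('/')}"


if __name__ == "__main__":
    print(authenticated_url("www.domainname.com", "/indexpage",
                            "myusername", "mypassword"))
```

With plain alphanumeric credentials the result is the same URL as above; the encoding only matters when the username or password contains reserved characters.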
Also remember that you have to define a starting URL (the
authenticated one); any other URLs listed alongside it will
not be treated as authenticated. The best solution is to
define the starting URL, and then use FILTERS (Scan rules)
to define the mirror scope precisely:
-* +www.domainname.com/indexpage/ENTRY*.*
or maybe better
-* +www.domainname.com/indexpage/ENTRY*.* +*.gif +*.jpg
+*.png +*.css +*.js
to capture related styles and images.
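To get a rough feel for which URLs these scan rules would keep, the following Python sketch approximates them with the standard fnmatch module. This is not httrack's real matcher: httrack has its own wildcard syntax, and the function name is_allowed, the default value, and the "last matching rule wins" behaviour are assumptions made for this illustration.

```python
from fnmatch import fnmatchcase


def is_allowed(url, rules, default=True):
    """Approximate scan rules: each rule is '+pattern' or '-pattern'.

    Assumption: rules are read left to right and a later matching
    rule overrides an earlier one.
    """
    allowed = default
    for rule in rules:
        sign, pattern = rule[:1], rule[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with + or -: {rule!r}")
        if fnmatchcase(url, pattern):
            allowed = (sign == "+")
    return allowed


RULES = ["-*", "+www.domainname.com/indexpage/ENTRY*.*",
         "+*.gif", "+*.jpg", "+*.png", "+*.css", "+*.js"]

if __name__ == "__main__":
    for url in ("www.domainname.com/indexpage/ENTRY.html?entry=12.345678",
                "www.domainname.com/indexpage/other.html",
                "www.domainname.com/images/logo.png"):
        print(url, is_allowed(url, RULES))
```

Under these assumptions the entry page and the image are kept while other.html is rejected, which is the intended scope: everything is excluded first, then entries and related assets are added back.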