Hello,
Xavier, if you see this post, thank you very much for HTTrack. It's a
tremendously useful tool that I regularly use in my academic work to save
fan-generated websites about early arcade games before they disappear.
I'm currently trying to use it in a slightly different scenario; based on the
documentation this seems to be possible, but I'm not quite sure how some of
the options interact.
Here's what I would like to do:
1. Capture an entire website, plus one degree of out-links (i.e. a complete
mirror of the website, plus every URL that the website links to).
2. Do the same, but only capture the list of URLs.
Here's what I have so far, using my personal website as an example:
httrack http://www.ludist.com/ -O [path directory] -e -%e1 (for the full capture)
and
httrack http://www.ludist.com/ -O [path directory] -e -%e1 -p0 (for the URL scan)
As I understand it, those options are:
-e = search the whole web, rather than staying within the top-level domain
-%e1 = limit option; one degree of external links from the top-level domain
-p0 = switch from capture to scan (i.e. only list the URLs rather than saving files)
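To spell out how I expect those options to combine, here are the same two
commands again with each flag annotated; the output directory is just a
hypothetical placeholder for wherever the mirror should go.

# hypothetical output directory, standing in for [path directory] above
OUT="/path/to/output"

# full capture: crawl beyond the start domain (-e), but stop one degree
# of external links out from the mirrored site (-%e1)
httrack http://www.ludist.com/ -O "$OUT" -e -%e1

# URL scan: same crawl, but -p0 should switch it from saving files to
# just recording the URLs it finds
httrack http://www.ludist.com/ -O "$OUT" -e -%e1 -p0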
Is there an argument or something that the -%eN option requires? The problem
I'm running into is that it isn't limiting the crawl at all and tries to
download the entire internet. I've tried it without -e, and it still tried to
download the entire internet, oddly enough.
I saw someone else ask for similar help, and --near was suggested; based on
the documentation, I don't see how that would resolve the issue.
I deeply appreciate any help you can provide. Thanks again for HTTrack!
Best,
Tommy Rousse
JD-PhD student
Northwestern University | Media, Technology and Society