Hey guys. These days, with the rise of sites that use clean, extensionless URLs, it's hard to find a "normal" page ending in a clear format (i.e. .htm or .html). Instead, most URLs are left open-ended, such as www.foo.com/bar/foobar/. Reddit is a real example of a site built this way.
This presents a tough issue for HTTrack, since its scan rules need a file type to match against; otherwise you have to crawl everything. For example:
+www.foo.com/bar/foobar/*
(Turns any extensionless pages into index.html files, which is good, but it also crawls DEEP).
+www.foo.com/bar/foobar/*.html
(Doesn't work, because technically the page isn't an .htm, .html, or .shtml file).
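To make that concrete, here is roughly how a rule like the first one gets passed on the command line; the URL, output folder, and exact quoting are placeholders, so treat this as a sketch rather than a tested command:
httrack "http://www.foo.com/bar/foobar/" -O "./mirror" "+www.foo.com/bar/foobar/*"
(In WinHTTrack the same filters go into the Scan rules box under Set options, if I remember right.)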
Is there any way, besides setting an External Depth, to stop HTTrack from crawling beyond a certain point in a path?