HTTrack Website Copier
Free software offline browser - FORUM
Subject: How to log or show scanned links?
Author: Sergei Kulagin
Date: 12/09/2020 20:32
I'm downloading a website that has a wiki and a forum: <>.

It has a calendar: <>, which has an infinite number of pages. At first I didn't
add it to the filters, so it kept scanning indefinitely; I was getting calendar
pages for 2028.

Now I have added -*/event.php* to the filter list. But I could still be missing
something, because it has already downloaded 60000+ pages again and still isn't
finished.
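One way to check whether the filter list is missing something, assuming you can
get the URLs HTTrack has scanned into a plain list (one per line), is to drop
everything the current exclusion rules already cover and look at what remains.
This is a rough approximation, not HTTrack's own matcher: only `*` is treated as
a wildcard, and HTTrack's extra filter forms (such as `*[...]`), scheme handling
and rule precedence are not reproduced. The function names and sample URLs are
hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a simple '*' wildcard pattern into a compiled regex.

    Approximation only: '*' matches any run of characters and every
    other character is taken literally. HTTrack's *[...] forms are
    not handled.
    """
    parts = (re.escape(chunk) for chunk in pattern.split("*"))
    return re.compile(".*".join(parts))


def not_excluded(urls, exclude_patterns):
    """Yield the URLs that no exclusion pattern (e.g. '-*/event.php*') matches."""
    # Drop the leading '-' that marks an exclusion rule, then compile.
    regexes = [wildcard_to_regex(p.lstrip("-")) for p in exclude_patterns]
    for url in urls:
        url = url.strip()
        if url and not any(r.fullmatch(url) for r in regexes):
            yield url


if __name__ == "__main__":
    # Hypothetical sample of scanned URLs.
    scanned = [
        "http://example.com/forum/event.php?year=2028&month=1",
        "http://example.com/forum/showthread.php?threadid=134327",
        "http://example.com/wikimedia/index.php?title=Main_Page",
    ]
    for url in not_excluded(scanned, ["-*/event.php*"]):
        print(url)
```

Whatever survives the current filters and still shows up in large numbers is a
candidate for the next exclusion rule.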

It would help tremendously if I could get a log of visited links and see which
files were visited most, such as <>. Those would mostly be showthread.php with
an argument like ?threadid=134327, and it would be nice to see how many
showthread.php links it visited. There would also be pages I don't want, and the
infinite ones would most probably be at the top of such a list, so I'd spot them
very quickly and add them to the filter with the appropriate arguments.
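A per-file count like that can be approximated once the visited URLs are
available as a plain list, one URL per line (how to extract that list from
HTTrack's logs is left open here). This sketch groups URLs by their path,
ignoring the query string, so every showthread.php?threadid=... link counts
toward a single showthread.php entry, and sorts the result with the biggest
groups first. The function name and sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_by_path(urls):
    """Count visited URLs per path, ignoring the query string.

    Returns (path, count) pairs sorted from most to least visited.
    """
    counts = Counter()
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip blank lines
        counts[urlsplit(url).path] += 1
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical sample; real input would be one URL per line from a file.
    sample = [
        "http://example.com/forum/showthread.php?threadid=134327",
        "http://example.com/forum/showthread.php?threadid=134328",
        "http://example.com/forum/event.php?year=2028&month=1",
        "http://example.com/forum/event.php?year=2028&month=2",
        "http://example.com/forum/event.php?year=2028&month=3",
    ]
    for path, n in count_by_path(sample):
        print(f"{n:8d}  {path}")
```

An open text file can be passed directly, since iterating over it yields one
line at a time. URLs stored without a scheme (e.g. "example.com/forum/...")
would need "http://" prepended first, otherwise urlsplit treats the host as
part of the path.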

I'd also like to see the most popular arguments (queries) of files. Statistics
of the most popular queries would also help a lot to spot infinite page sets and
simply very popular links I don't want to archive, like the "Recent changes"
links on every single wiki page: <>. Every Recent Changes page (which always
starts with wikimedia/index.php?title=Special:RecentChangesLinked) on the wiki
has at least 640 combinations because of all of these buttons: <>. I added those
pages to the "no follow" filter as well.
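Query statistics of that kind can also be sketched, again assuming a plain list
of visited URLs. For each path it counts how often each parameter name and each
name=value pair occurs; a pair such as title=Special:RecentChangesLinked that
appears on a very large number of URLs points to a page family worth filtering.
The function name and the sample URLs (including the target and days
parameters) are illustrative only.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def query_stats(urls):
    """Count query parameters per path.

    Returns two Counters:
      names -- (path, parameter name) -> number of URLs using that name
      pairs -- (path, "name=value")   -> number of URLs using that pair
    """
    names, pairs = Counter(), Counter()
    for url in urls:
        url = url.strip()
        if not url:
            continue
        parts = urlsplit(url)
        # keep_blank_values=True so "?a=&b=1" still records parameter "a".
        for name, value in parse_qsl(parts.query, keep_blank_values=True):
            names[(parts.path, name)] += 1
            pairs[(parts.path, f"{name}={value}")] += 1
    return names, pairs


if __name__ == "__main__":
    # Hypothetical sample; real input would be the full list of visited URLs.
    visited = [
        "http://example.com/wikimedia/index.php?title=Special:RecentChangesLinked&target=Main_Page&days=7",
        "http://example.com/wikimedia/index.php?title=Special:RecentChangesLinked&target=Main_Page&days=30",
        "http://example.com/wikimedia/index.php?title=Main_Page",
    ]
    names, pairs = query_stats(visited)
    for (path, pair), n in pairs.most_common(5):
        print(f"{n:8d}  {path}?{pair}")
```

Calling most_common() on either Counter puts the heaviest parameters or
parameter values at the top, which is where infinite combinations tend to show
up first.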

But I'm sure I missed something and it will still scan indefinitely. Statistics
of scanned and visited pages, as well as statistics of queries, would help me
build all the filters needed to archive only the important pages and avoid
infinite loops.

Is there a way to show all visited pages in HTTrack, or log them to a file?

All articles

Subject                                  Date
How to log or show scanned links?        12/09/2020 20:32
Re: How to log or show scanned links?    12/11/2020 23:35
Re: How to log or show scanned links?    12/24/2020 10:09

