I started to use WinHTTrack one year ago. I reckon that WinHTTrack is the best of the site downloaders by now. But I am wondering how to handle the following situations:
1) Sites with session IDs.
(This has already been mentioned by Julio and Peter Drakes.) There seems to be no solution for this yet.
2) In the "Experts Only" option tab, the smallest downloading scope offered by the "Travel Mode" item is "Stay in same directory". Suppose a site has the following paging structure, one page per section:

<http://www.aaaa.com/news.asp>     (2-1)
<http://www.aaaa.com/history.asp>  (2-2)
<http://www.aaaa.com/product.asp>  (2-3)

and, within the "history" section, serial content pages like:

<http://www.aaaa.com/history.asp?subid=0001> (2-4)
<http://www.aaaa.com/history.asp?subid=0002> (2-5)
...

My question is: I only want the "history" part, i.e. its main page and each of its content pages: (2-2), (2-4), (2-5), and so on. By now, the only solution I know is to use "Exclude" in the "Scan Rules" option tab, where I have to list every link I do not want, e.g. excluding "*news*" and "*product*". If the site has a large number of sections, I will go crazy excluding all the others.
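For example, the scan rules I end up with look something like this (the section names are just the ones from my example above, and I am assuming the usual one-filter-per-line syntax):

  -*news*
  -*product*
  (plus one more "-*...*" line for every other section of the site)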
Do you have any good solutions?
3) Using 2)'s example: how can I get only one page, such as (2-2)? The current solution seems to be to set "Maximum mirroring depth" to 2 (a value of 1 gives only the HTML file, without pictures). What is the logic in this situation? Why does a "Maximum mirroring depth" of 4 not download two levels of pages? (It seems reasonable to guess that if one level corresponds to a value of 2, then two levels should correspond to a value of 4.)
4) Using 2)'s example again: how can I download only pages like (2-4) and (2-5) without downloading page (2-2)? WinHTTrack does not have the logic to "Include" links containing "history.asp?subid=" while at the same time "Exclude"-ing links containing "history.asp". How can this be improved? Currently (version 3.04), if I use the above logic, the result is NO pages at all.
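Written as scan rules, what I tried looks something like this (again assuming the usual "+"/"-" filter syntax):

  +*history.asp?subid=*
  -*history.asp*

My guess is that the exclude pattern also matches the subid URLs, so in the end everything is excluded.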
5) How can I download the content behind a linking page? E.g. if a page <http://www.aaaa.com/article_links.html> has various external links to articles on other sites, how can I download only those sites' articles, without the other content of the linked sites? I don't think using "Maximum mirroring depth" is a good solution here.
6) I found that if I use WinHTTrack to download dynamic web pages (ASP, JSP, PL, and so on, with a dynamic server behind them), it actually parses the pages one by one, which is very slow. The "Number of connections" setting in the "Flow control" option tab does not seem to work in this case. The problem is that each dynamic page's response time is normally quite long, so the whole download of such a dynamic site with WinHTTrack is very, very, very slow.
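If it helps to pin this down: I believe the same setting corresponds to the -cN option of the command-line version (this is my assumption from the documentation), e.g.:

  httrack http://www.aaaa.com/news.asp -c8

so the question is really whether several of those connections can fetch dynamic pages in parallel.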
I hope the points mentioned above can give you some ideas. If WinHTTrack can get through them, I think it would be the real king of site downloaders.
Alan