I started to use WinHTTrack one year ago. I reckon that WinHTTrack is the best of the site downloaders by now. But I am wondering how to handle the following situations:
1) Sites with session IDs.
(This has already been mentioned by Julio and Peter Drakes.) There seems to be no solution for this yet.
2) In the "Experts Only" option tab, the smallest downloading scope offered by the "Travel Mode" item is "Stay in same directory". Suppose a site has the following paging structure, one page per section:

<http://www.aaaa.com/news.asp>     (2-1)
<http://www.aaaa.com/history.asp>  (2-2)
<http://www.aaaa.com/product.asp>  (2-3)

and, within the "history" section, serial content pages like:

<http://www.aaaa.com/history.asp?subid=0001> (2-4)
<http://www.aaaa.com/history.asp?subid=0002> (2-5)
...

My question is: I only want the "history" part, i.e. its main page and each of its content pages: (2-2), (2-4), (2-5), and so on. By now, the only solution I know is to use "Exclude" in the "Scan Rules" option tab, where I have to list every link I do not want, e.g. excluding "*news*" and "*product*". If the site has a large number of sections, I will go crazy excluding all the others.
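For example, the scan rules I end up with look something like this (the section names are just the ones from my example above, and I am assuming the usual one-filter-per-line syntax):

  -*news*
  -*product*
  (plus one more "-*...*" line for every other section of the site)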
Do you have any good solutions?
3) Using 2)'s example: how can I get only one page, such as (2-2)? The current solution seems to be to set "Maximum mirroring depth" to 2 (a value of 1 gives only the HTML file, without pictures). What is the logic in this situation? Why does a "Maximum mirroring depth" of 4 not download two levels of pages? (It seems reasonable to guess that if one level corresponds to a value of 2, then two levels should correspond to a value of 4.)
4) Using 2)'s example again: how can I download only pages like (2-4) and (2-5) without downloading page (2-2)? WinHTTrack does not have the logic to "Include" links containing "history.asp?subid=" while at the same time "Exclude"-ing links containing "history.asp". How can this be improved? Currently (version 3.04), if I use the above logic, the result is NO pages at all.
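Written as scan rules, what I tried looks something like this (again assuming the usual "+"/"-" filter syntax):

  +*history.asp?subid=*
  -*history.asp*

My guess is that the exclude pattern also matches the subid URLs, so in the end everything is excluded.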
5) How can I download the content behind a linking page? E.g. if a page <http://www.aaaa.com/article_links.html> has various external links to articles on other sites, how can I download only those sites' articles, without the other content of the linked sites? I don't think using "Maximum mirroring depth" is a good solution here.
6) I found that if I use WinHTTrack to download dynamic web pages (ASP, JSP, PL, and so on, with a dynamic server behind them), it actually parses the pages one by one, which is very slow. The "Number of connections" setting in the "Flow control" option tab does not seem to work in this case. The problem is that each dynamic page's response time is normally quite long, so the whole download of such a dynamic site with WinHTTrack is very, very, very slow.
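If it helps to pin this down: I believe the same setting corresponds to the -cN option of the command-line version (this is my assumption from the documentation), e.g.:

  httrack http://www.aaaa.com/news.asp -c8

so the question is really whether several of those connections can fetch dynamic pages in parallel.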
I hope the points mentioned above can give you some ideas. If WinHTTrack can get through them, I think it would be the real king of site downloaders.
Alan