HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: More Logics and Functions Expected
Author: Xavier Roche
Date: 08/26/2001 10:24
 
> 1)    Site with session id.
>       (This has been already mentioned by Julio and Peter Drakes.)
>       It seems there is no solution for now.

Changing session IDs are really hard to handle. I will try to find a way
to avoid this problem, but due to the post-load system this isn't obvious
(see below).

> 2)    In the option 'Expert Only', item 'Travel Mode', the smallest
>       scope of downloading is 'Stay in same directory'.
>       Suppose a site has the following paging structure:
>
>       http://www.aaaa.com/news.asp                     (2-1)
>       http://www.aaaa.com/history.asp                  (2-2)
>       http://www.aaaa.com/product.asp                  (2-3)
>
>       Assume that there are several sections on that site. So, for the
>       'history' section, there are serial pages like:
>
>           http://www.aaaa.com/history.asp?subid=0001   (2-4)
>           http://www.aaaa.com/history.asp?subid=0002   (2-5)
>           ...
>
>       My question is: I only want the 'history' part, including its main
>       page and each content page: (2-2), (2-4), (2-5), ...
>       For now, the only solution is to use 'Exclude' in the 'Scan Rules'
>       option tab, in which I need to put every link I do not want, like
>       excluding '*news*' and '*product*'. If the site has a big number of
>       sections, I will go crazy excluding all the others.
>
>       Do you have any good solutions?

-* +www.aaaa.com/history.asp?* +*.gif +*.jpg +*.css +*.js

might do the trick?
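For reference, the same rules written as a command line (httrack also ships
as a command-line tool; C:\mirror is just a placeholder output folder, and
this is an untested sketch):

httrack http://www.aaaa.com/history.asp -O C:\mirror "-*" "+www.aaaa.com/history.asp?*" "+*.gif" "+*.jpg" "+*.css" "+*.js"
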
> 3)	Use 2)'s example.
> 	How can I get only one page, like (2-2)?

To only get the first-level pages (the ones typed into the URL list):
-*

To get other ones:
-* +www.aaaa.com/history.asp?subid=0002
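
To fetch just that one page from the command line (same placeholder output
folder, untested sketch):

httrack "http://www.aaaa.com/history.asp?subid=0002" -O C:\mirror "-*"

Add +*.gif +*.jpg +*.css as above if you also want the page's inline files.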

> 4)    Use 2)'s example.
>       How can I download only pages like (2-4) and (2-5) without
>       downloading page (2-2)? WinHTTrack does not have the logic:
>
>       'Include' the links containing 'history.asp?subid=',
>       but at the same time
>       'Exclude' the links containing 'history.asp'.

That is, -* +*history.asp?* -*history.asp*[] ?
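
Put into a (hypothetical, untested) command line, with C:\mirror again as a
placeholder output folder:

httrack http://www.aaaa.com/history.asp -O C:\mirror "-*" "+*history.asp?*" "-*history.asp*[]"

Note that the starting page itself is most likely still fetched, since it is
needed to discover the subid links.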

> 5)    How to download the content from a linking page?
>       E.g. if a page http://www.aaaa.com/article_links.html has various
>       external links to other sites for articles, how can I download
>       only the other sites' articles without the other content of the
>       linked sites?

Maybe by using an external depth of 1 or 2, but this isn't a very good idea,
as it will take all external links.
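
If I remember the option letter correctly, the external depth is set with %e
on the command line, so a rough (untested) sketch would be:

httrack http://www.aaaa.com/article_links.html -O C:\mirror -%e1

But again, this follows every external link on that page, not only the
article links.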

> 6)    I found that if I use WinHTTrack to download dynamic web pages
>       (like asp, jsp, pl, ... with a dynamic server behind), it actually
>       parses the pages one by one -- very slow.

Use the --assume option; that is, with 3.05, use the "MIME" tab in the
options to force MIME types (this will speed up the download!)
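
On the command line this would be something like the following (the
asp=text/html parameter form is from memory, so check the option list for
your version):

httrack http://www.aaaa.com/ -O C:\mirror --assume asp=text/html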

>       It seems the setting of 'Number of connections' in the
>       'Flow control' option tab does not work in this case. The problem
>       is that each dynamic page's response time is normally quite long,
>       so the whole download by WinHTTrack from such dynamic sites is
>       very, very, very slow.

Yes, with asp pages WITHOUT MIME definitions (the engine has to detect the
file type each time).
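
So the best bet is probably to force the MIME type as above and keep several
simultaneous connections; assuming -c still sets the connection count (8
here), a sketch would be:

httrack http://www.aaaa.com/ -O C:\mirror --assume asp=text/html -c8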

> I hope the above mentioned can give you some ideas. If WinHTTrack
> can get through them, I think it would be the real king of
> site downloaders.

Eheh :)

 