> > I use httrack for automated checking of links
> > on large websites. However, today I ran into a
> > problem which seems to be related to PHP scripts
> > used to download files. A typical URL which seems
> > to cause problems is:
> > ...../docroot/_dynamicscript/pdfdownload.php?file=%
> > 2Fnews1032939400%2asdasd.pdf
> > The result is that httrack does not stop mirroring
> > or checking this site until I kill it.
>
> Err, that is, you have MANY URLs generated? Or is the
> parsing very slow? Did you try to skip these files
> (-*pdfdownload.php*)? I didn't check, but loops are a
> common problem in dynamically generated links.
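Xavier's suggestion to skip the download script can be written out as a full command line. This is only a sketch: the site URL and the project name `mysite` are placeholders, not values from this thread, and the exclusion filter is the one he proposed:

```shell
# Sketch: mirror a site while excluding the PDF-download script entirely,
# so httrack never follows pdfdownload.php?file=... links.
# "https://www.example.com/" and "mysite" are placeholder values.
httrack "https://www.example.com/" -O mysite "-*pdfdownload.php*"
```

Quoting the filter keeps the shell from expanding the `*` wildcards before httrack sees them.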
Hi Xavier,
At first, I have to thank you for creating and *supporting*
this great piece of software!
The download is very fast (the server stands right next to
me :) ), but httrack runs into a loop, although the
dynamically created links are unique and do not change. I
tried to exclude -*#, as this calls the JavaScript which
opens the links. I also tried -*pdf, but all without
success. httrack saves the PDFs as HTML files, so I played
around with the MIME types, but that does not help either.
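For the MIME-type side of the problem, httrack has an `--assume` option that forces a given extension to be treated as a given MIME type. The mapping below is only a guess at what might help here, not a verified fix, and since `--assume` matches by extension it would affect every `.php` URL on the site, not just pdfdownload.php; the URL and project name are placeholders:

```shell
# Sketch: tell httrack to treat .php responses as PDF data, so the
# downloaded files are not saved and parsed as HTML.
# Placeholder URL/project; the extension mapping applies to ALL .php links.
httrack "https://www.example.com/" -O mysite \
  --assume "php=application/pdf"
```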
At what level does httrack check whether a file has already
been downloaded - on a filesystem basis or on a link basis? I
compared this with the behaviour of Xenu's Link Sleuth, which
does not run into a loop. But I personally prefer your tool,
which has many more options and runs under Linux :)
--martin.