> > I use httrack for automated checking of links
> > on large websites. However, today I ran into a
> > problem which seems to be related to PHP scripts
> > used to download files. A typical URL which seems
> > to cause problems is:
> > ...../docroot/_dynamicscript/pdfdownload.php?file=%
> > 2Fnews1032939400%2asdasd.pdf
> > The result is that httrack does not stop mirroring
> > or checking this site until I kill it.
>
> Err, that is, you have MANY URLs generated? Or is the
> parsing very slow? Did you try to skip these files
> (-*pdfdownload.php*)? I didn't check, but loops are a
> common problem in dynamically generated links.
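Xavier's suggestion to skip the download script can be written out as a full command line. This is only a sketch: the site URL and the project name `mysite` are placeholders, not values from this thread, and the exclusion filter is the one he proposed:

```shell
# Sketch: mirror a site while excluding the PDF-download script entirely,
# so httrack never follows pdfdownload.php?file=... links.
# "https://www.example.com/" and "mysite" are placeholder values.
httrack "https://www.example.com/" -O mysite "-*pdfdownload.php*"
```

Quoting the filter keeps the shell from expanding the `*` wildcards before httrack sees them.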
Hi Xavier,
At first, I have to thank you for creating and *supporting*
this great piece of software!
The download is very fast (the server stands right next to
me :) ), but httrack runs into a loop, although the
dynamically created links are unique and do not change. I
tried to exclude -*#, as this calls the JavaScript which
opens the links. I also tried -*pdf, but all without
success. httrack saves the PDFs as HTML files, so I played
around with the MIME types, but that does not help either.
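For the MIME-type side of the problem, httrack has an `--assume` option that forces a given extension to be treated as a given MIME type. The mapping below is only a guess at what might help here, not a verified fix, and since `--assume` matches by extension it would affect every `.php` URL on the site, not just pdfdownload.php; the URL and project name are placeholders:

```shell
# Sketch: tell httrack to treat .php responses as PDF data, so the
# downloaded files are not saved and parsed as HTML.
# Placeholder URL/project; the extension mapping applies to ALL .php links.
httrack "https://www.example.com/" -O mysite \
  --assume "php=application/pdf"
```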
At what level does httrack check whether a file has already
been downloaded - on a filesystem basis or on a link basis? I
compared this with the behaviour of Xenu's Link Sleuth, which
does not run into a loop. But I personally prefer your tool,
which has many more options and runs under Linux :)
--martin.