HTTrack Website Copier
Free software offline browser - FORUM
Subject: Please check other post
Author: Bandit
Date: 11/25/2009 19:18
 
> Info:  Note: due to www.thecoverproject.net remote
> robots.txt rules, links beginning with these paths
> will be forbidden: /includes, /images (see in the
> options to disable this)

Javier,

Please check my other post:
<http://forum.httrack.com/readmsg/22376/22373/index.html>

Referring to that, please try:
1. set the filters ("scan rules"); suggested:
-* +*thecoverproject.net/view.php?cover_id=*
+*thecoverproject.net/download_cover*.jpg

2. change the build structure:
either %h%p/%n%[cover_id:.ID=:::].%t%[file:.:::]
or %h/%[cover_id:View.ID=:.html::]%[file::::]
(NOTE: the line
     %h[cover_id:%n.%t.ID=:::][file:.:::]
in the earlier post was a copy/paste error, sorry)

3a. set the max mirroring depth to 2 (not critical)
3b. add the proper list of starting URLs,
e.g. <http://www.thecoverproject.net/view.php?cover_id=1>
up to however high a "cover_id=###" you want to test with
(a command-line sketch pulling all of this together follows below)
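
For reference, here is roughly the whole thing as one command line, using the
second structure above (the output folder is just a placeholder, and you would
list as many cover_id starting URLs as you want to test):

    # depth 2 (-r2), user-defined structure (-N), scan rules from step 1:
    httrack "http://www.thecoverproject.net/view.php?cover_id=1" \
            -O "./thecoverproject" \
            -r2 \
            -N "%h/%[cover_id:View.ID=:.html::]%[file::::]" \
            "-*" \
            "+*thecoverproject.net/view.php?cover_id=*" \
            "+*thecoverproject.net/download_cover*.jpg"

With that structure, view.php?cover_id=1 should be saved as
www.thecoverproject.net/View.ID=1.html, and %[file::::] should name each
cover after whatever the download URL's "file" parameter carries (assuming
that is in fact what the query string uses).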

I don't think you need to disable robots.txt; at least, nothing in the
screenshot you posted suggests you do.
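
If it ever did become necessary (say, if the covers were actually served from
under /images), the command-line switch for that option is -sN, e.g.:

    httrack ... -s0     (never follow robots.txt rules)

but nothing here points that way.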

I think the "Content-Type: application/x-download" that the server is
reporting for download_cover.php is preventing HTTrack from properly naming
the file as a .jpg on its own.  I haven't found a way to work around that
yet, other than setting up the user-defined structure.
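
If you want to double-check what the server is sending, a HEAD request will
show the header in question (the query string here is made up; substitute a
real download_cover.php link from the site):

    curl -I "http://www.thecoverproject.net/download_cover.php?cover_id=1"

The "Content-Type:" line in the response is the one to look at;
application/x-download gives HTTrack no hint to derive a .jpg extension from,
which is why the user-defined structure has to supply the name instead.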

HTH;
B^D

 