Hi :)
- Many thanks to all involved for this handy application.
- Using WinHTTrack Website Copier 3.41-2 w/ Firefox 1.5.0.11, in Windows XP
-> I am looking to download a very large message board thread (830 pages!), as
opposed to the entire website, and have hit a snag.
1) In the URL box of WinWC, I'm entering the URL of pg 1 out of 830. Is that
right?
Will it understand to download that thread and only that thread, or will it
continue into other areas of the board after page 830?
2) Following a short test dl lasting a couple of minutes, I've noticed that it's
grabbing all page elements nicely, with one *exception*: it's only showing
hyperlinked member-posted images (Imageshack, Photobucket, etc) and *not*
displaying "attached" images, that is, images uploaded directly from a member's
hard drive & hosted directly on the message board.
3) After looking through the FAQ, I narrowed the issue down to the message
board's robots.txt and found that one of its rules, "Disallow:
/forum/attachment.php", appears to be the culprit.
The images in question have a url that resembles this:
"example.com/forum/attachment.php?s=31f4cfd95b6a45d1fa25ccda9fecda39&postid=2435642"
(everything above is valid except "example".)*
4) I'm awaiting word from the site admin as to whether he'll allow not only
the entire thread dl but the robots rule bypass as well. Until then...
5) Please instruct me on how to tweak WinWC in order to dl the entire thread
and how to bypass that Robots rule (and only that rule, if possible, as
opposed to the entire 30+ rules found in his robots.txt) in order to ensure
user posted images will show up.
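
For what it's worth, here is a minimal Python sketch of how I understand the rule from item 3 to apply to the attachment URLs. It uses the standard library's robots.txt parser, not anything from WinWC itself. The "User-agent: *" line, the "http://" scheme, and the showthread.php thread URL are my own assumptions for illustration; only the Disallow rule and the attachment URL come from the actual board.

```python
from urllib.robotparser import RobotFileParser

# Only the Disallow line is taken from the board's robots.txt.
# The "User-agent: *" line is an assumption so the rule applies to every crawler.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /forum/attachment.php",
]

# Attachment URL in the form quoted in item 3 (scheme added as an assumption).
ATTACHMENT_URL = (
    "http://example.com/forum/attachment.php"
    "?s=31f4cfd95b6a45d1fa25ccda9fecda39&postid=2435642"
)

# Hypothetical thread page URL, used only for comparison.
THREAD_URL = "http://example.com/forum/showthread.php?t=1"


def is_allowed(url, robots_lines=ROBOTS_LINES, agent="*"):
    """Return True if a crawler that obeys these robots.txt lines may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print("attachment allowed:", is_allowed(ATTACHMENT_URL))
    print("thread page allowed:", is_allowed(THREAD_URL))
```

If my reading is right, the attachment URL is refused while an ordinary thread page is not, which would match what I saw in the test dl.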
Many thanks in advance for any help! :D
* I'm unfamiliar with the "'rel=nofollow' antispam attribute" - is it
acceptable to just remove any hypertext as I've done? If not, where do I
insert the phrase 'rel=nofollow'?