Actually, I had set the maximum depth to 3 and the external depth to 0. I chose 3
because it is enough to reach every entry by following the BLOG ARCHIVE on the
right side: entries are organized as Year > Month > Entry, so three levels suffice.
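For reference, here is a minimal Python sketch of the equivalent HTTrack command line. It assumes `httrack` is on the PATH and uses HTTrack's `-O` (output path), `-rN` (mirror depth) and `%eN` (external links depth) options. The blog URL, the output folder and the `httrack_args` helper are hypothetical placeholders, not part of the original setup.

```python
import shlex
from typing import List, Optional


def httrack_args(blog_url: str, out_dir: str, depth: int = 3,
                 ext_depth: int = 0,
                 filters: Optional[List[str]] = None) -> List[str]:
    """Build an httrack argument list (hypothetical helper, not a full wrapper)."""
    args = [
        "httrack", blog_url,
        "-O", out_dir,        # output folder for the mirror
        f"-r{depth}",         # mirror depth: 3 reaches Year > Month > Entry
        f"%e{ext_depth}",     # external links depth: 0 = do not leave the blog
    ]
    args.extend(filters or [])  # scan rules such as "-*.zip" go last
    return args


if __name__ == "__main__":
    # Hypothetical blog URL and output folder.
    argv = httrack_args("http://example.blogspot.com/", "/tmp/example-blog")
    print(shlex.join(argv))
    # prints: httrack http://example.blogspot.com/ -O /tmp/example-blog -r3 %e0
    # To actually run it: subprocess.run(argv, check=True)
```

Building an argument list (rather than one shell string) keeps filter patterns like `-*?showComment=*` from being expanded by the shell.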
Yes, I noticed that the .tmp files are renamed to .html (I posted while it was
still downloading).
Since you mention that, I guess the heavily duplicated entries for Blogspot may
be caused by the "?showComment=" URLs processed by HTTrack.
I think the problem is explained here:
<http://blogger-hits.blogspot.com/2009/06/remove-duplicate-content-because-of.html>
I added the following to the filters, and the problem is solved:
-*?showComment=*
-*.rar -*.zip -*.mov -*.mpg -*.mpeg -*.avi -*.asf -*.mp3 -*.mp2 -*.rm -*.wav
-*.vob -*.qt -*.vid -*.ac3 -*.wma -*.wmv
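To check which URLs these rules reject before starting a long mirror, a rough Python approximation of the matching can help. It assumes a simplified model of HTTrack's scan rules: `*` matches any run of characters, every other character (including `?`) is literal, URLs matched by no rule are kept, and the last matching rule wins. Real HTTrack wildcards support more than this, so treat it as a sketch, not a reimplementation. The sample URLs are hypothetical.

```python
import re
from typing import List

# Rules from the post: the showComment filter plus the media-file filters.
MEDIA = ("-*.rar -*.zip -*.mov -*.mpg -*.mpeg -*.avi -*.asf -*.mp3 -*.mp2 "
         "-*.rm -*.wav -*.vob -*.qt -*.vid -*.ac3 -*.wma -*.wmv")
RULES = ["-*?showComment=*"] + MEDIA.split()


def rule_matches(pattern: str, url: str) -> bool:
    """Simplified wildcard match: '*' = any characters, the rest literal."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, url) is not None


def is_excluded(url: str, rules: List[str]) -> bool:
    """True if the last rule matching `url` is a '-' (exclude) rule."""
    excluded = False  # assumption: URLs matched by no rule are kept
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if rule_matches(pattern, url):
            excluded = sign == "-"  # later matches override earlier ones
    return excluded


if __name__ == "__main__":
    for url in [
        "http://example.blogspot.com/2009/06/some-post.html",
        "http://example.blogspot.com/2009/06/some-post.html?showComment=1245#c1",
        "http://example.blogspot.com/files/clip.avi",
    ]:
        print(is_excluded(url, RULES), url)
```

The plain post page is kept, while its `?showComment=` copy and the video file are excluded.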
These rules can also be added to the filters to exclude more duplicated pages:
-*?max-results=*
-*search?updated-min=*
-*search?updated-max=*
-*/feeds/*
-*?comments*
-*/comments
-*/search/label/* : the HTML page listing all entries with that label, which
always duplicates entries already downloaded.
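When tuning a long filter list, it helps to see which rule fires for a given URL. The sketch below reports the last matching rule for each sample URL under the same simplified wildcard model (`*` = any characters, everything else literal). The sample paths follow common Blogger URL shapes, but the blog itself and the `matching_rule` helper are hypothetical.

```python
import re
from typing import List, Optional

# The extra rules suggested in the post.
EXTRA_RULES = [
    "-*?max-results=*",
    "-*search?updated-min=*",
    "-*search?updated-max=*",
    "-*/feeds/*",
    "-*?comments*",
    "-*/comments",
    "-*/search/label/*",
]


def matching_rule(url: str, rules: List[str]) -> Optional[str]:
    """Return the last rule whose pattern matches `url`, or None."""
    hit = None
    for rule in rules:
        # Simplified model: '*' = any characters, everything else literal.
        regex = ".*".join(re.escape(part) for part in rule[1:].split("*"))
        if re.fullmatch(regex, url):
            hit = rule
    return hit


if __name__ == "__main__":
    base = "http://example.blogspot.com"  # hypothetical blog
    for path in [
        "/2009/06/some-post.html",
        "/search/label/httrack",
        "/feeds/posts/default",
        "/search?updated-min=2009-01-01T00:00:00-08:00"
        "&updated-max=2010-01-01T00:00:00-08:00&max-results=50",
    ]:
        print(matching_rule(base + path, EXTRA_RULES), path)
```

In this model `-*?max-results=*` only fires when `max-results` is the first query parameter; in the archive URL above it follows `&`, so `-*search?updated-min=*` is the rule that excludes that page.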
This is an example of how to download a Blogspot blog with HTTrack.