| Actually, I had set the max depth to 3 and the external depth to 0. A depth of
3 is enough to reach every entry through the BLOG ARCHIVE on the right side,
since the archive is organized as Year, Month, Entries.
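For anyone using the command line instead of the GUI, here is a rough sketch of the same depth settings. It only assembles the argument list and does not run anything. I am assuming `-rN` (mirror depth) and `-%eN` (external links depth) are the command-line equivalents of those two GUI options, as listed in HTTrack's option help. The blog URL and output directory are placeholders.

```python
import shlex


def build_httrack_command(start_url, output_dir, max_depth=3, external_depth=0):
    """Assemble an httrack argument list; this does not run httrack."""
    return [
        "httrack",
        start_url,
        "-O", output_dir,          # output path for the mirror
        f"-r{max_depth}",          # assumed equivalent of "max depth"
        f"-%e{external_depth}",    # assumed equivalent of "external depth"
    ]


# Placeholder URL and directory, not taken from the thread.
cmd = build_httrack_command("http://example.blogspot.com/", "./mirror")
print(shlex.join(cmd))
```

Check the flags against `httrack --help` on your version before relying on them.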
Yes, I noticed that the .tmp files are renamed to .html (I posted while the
download was still running).
Since you mention that, I guess the heavily duplicated entries for blogspot
may be due to the "?showComment=" URLs that HTTrack follows.
I think the problem is explained here:
<http://blogger-hits.blogspot.com/2009/06/remove-duplicate-content-because-of.html>
Added to the filters:
-*?showComment=*
-*.rar -*.zip -*.mov -*.mpg -*.mpeg -*.avi -*.asf -*.mp3 -*.mp2 -*.rm -*.wav
-*.vob -*.qt -*.vid -*.ac3 -*.wma -*.wmv
These can also be added to the filters to exclude more duplicates:
-*?max-results=*
-*search?updated-min=*
-*search?updated-max=*
-*/feeds/*
-*?comments*
-*/comments
-*/search/label/*
(The last one is the HTML page listing the entries with a given label. It
always contains duplicated entries.)
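To check which URLs the exclusion rules above would catch, here is a minimal sketch. It is only an approximation of HTTrack's matching: it treats `*` as "any run of characters" and everything else literally, compares case-sensitively, and ignores the richer `*[...]` syntax and the priority rules between `+` and `-` filters. The function names, the feed URL, and the label URL are made up for illustration.

```python
import re

# The exclusion rules from this post (media-file rules left out for brevity).
EXCLUDE_FILTERS = [
    "-*?showComment=*",
    "-*?max-results=*",
    "-*search?updated-min=*",
    "-*search?updated-max=*",
    "-*/feeds/*",
    "-*?comments*",
    "-*/comments",
    "-*/search/label/*",
]


def filter_to_regex(rule):
    """Turn one rule into a regex: '*' matches anything, the rest is literal."""
    pattern = rule[1:] if rule[0] in "+-" else rule
    parts = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile(".*".join(parts))


def is_excluded(url):
    """True if any exclusion rule matches the whole URL."""
    return any(filter_to_regex(rule).fullmatch(url) for rule in EXCLUDE_FILTERS)


post = "http://nogoomfm.blogspot.com/2009/05/blog-post_19.html"
candidates = [
    post,
    post + "?showComment=1242753180000",
    "http://nogoomfm.blogspot.com/feeds/posts/default",  # hypothetical feed URL
    "http://nogoomfm.blogspot.com/search/label/news",    # hypothetical label URL
]
for url in candidates:
    print("skip" if is_excluded(url) else "keep", url)
```

Only the plain post URL should survive; the other three are variants that repeat the same content.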
This is an example of how to download a blogspot blog with HTTrack.
/*****************
Remove Duplicate content because of "showcomments=" links in Blogger
Posted On Friday, June 26, 2009 at 5:44 AM by Abdelrahman Ellithy
Blogger makes a link for every comment on your blog that is for a post like :
<http://nogoomfm.blogspot.com/2009/05/blog-post_19.html>
may get urls as :
<http://nogoomfm.blogspot.com/2009/05/blog-post_19.html?showComment=1242753180000>
this is a major problem for famous massively commented posts and blogs,
Here is the way you can solve it using the rel="canonical" attribute
1- Log in to your blogger Dashboard.
2- Choose Layout ( edit HTML template) and Download the template as backup.
3- Find :
</head>
4- Put before it the following code :
<!-- 100fm6.com block duplicate content START -->
<b:if cond='data:blog.pageType == "item"'>
<link expr:href='data:blog.url' rel='canonical'/>
</b:if>
<!-- 100fm6.com block duplicate content END -->
5- Save your template.
Google uses the rel='canonical' hack in its blog too.
*****************/ | |
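To show why the "showComment" URLs in the quoted post count as duplicates, here is a small sketch that drops that query parameter and collapses the variants back to one post URL. It is a post-processing illustration, not something HTTrack does itself. The function name and the second timestamp are made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_show_comment(url):
    """Return the URL without its showComment query parameter."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "showComment"
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


post = "http://nogoomfm.blogspot.com/2009/05/blog-post_19.html"
variants = [
    post,
    post + "?showComment=1242753180000",
    post + "?showComment=1242760000000",  # hypothetical second comment link
]

# All variants normalize to the same URL, so the set has a single member.
unique = {strip_show_comment(url) for url in variants}
print(unique)
```

Every comment adds another variant of the same page, which is why the filter on "showComment" cuts the mirror size so much on heavily commented blogs.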