> ananda/lineage/jesus-christ4246.html
> ananda/lineage/jesus-christ.html.txt
> ananda/lineage/jesus-christd5aa.html
> ananda/lineage/jesus-christ.html/rwsmsh/index.html
Look at the page source and you'll see:
class="addthis_button_facebook external"
addthis:url=http://www.ananda.org/ananda/lineage/jesus-christ.html/rwsmsh/
That one is Facebook; likewise for share, email, etc.
So if you have Options -> Links -> "attempt to detect" checked, that's where
rwsmsh comes from.
Because of that, and since you are using site structure, HTTrack has to create
the directory ...lineage/jesus-christ.html/rwsmsh/, since that is what the URL
states. You believe jesus-christ.html is a file; it's not, it's a directory.
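
To illustrate why the extra path segment turns jesus-christ.html into a directory, here is a minimal Python sketch. It is not HTTrack's actual code; the `local_path` helper is hypothetical and only assumes that a structure-preserving mirror stores a URL ending in "/" as index.html inside a directory of the same name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def local_path(url: str) -> PurePosixPath:
    """Hypothetical helper: map a URL to a path in a structure-preserving mirror."""
    parts = urlsplit(url)
    path = PurePosixPath(parts.netloc + parts.path)
    # Assumption: a URL ending in "/" names a directory, saved as index.html.
    if parts.path.endswith("/"):
        path = path / "index.html"
    return path


url = "http://www.ananda.org/ananda/lineage/jesus-christ.html/rwsmsh/"
target = local_path(url)
print(target)
# Every component above the file must exist as a directory on disk,
# including the one named "jesus-christ.html".
print(target.parents[1].name)
```

Once a path like this has to be written, nothing else can be saved as a plain file named jesus-christ.html in that folder, which is why the page itself ends up under names like the ones you listed.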
Look in 4246.html and you'll see:
Mirrored from www.ananda.org/ananda/lineage/jesus-christ.html?utm_source=addthis%26utm_medium=socialmedia%26utm_campaign=Share
which the log says came from rwsmsh/, so parameters are also involved.
Add a filter -*/rwsmsh/* and you'll get what you want.
Also, most links are of the absolute href="/image/..." type, and since HTTrack
only goes down by default, most of the links will not be retrieved. Add a
filter to override:
+www.ananda.org/* -*/rwsmsh/*
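
As a rough way to reason about what those two filters accept, here is a small Python sketch. It only approximates the behaviour: it uses Python's `fnmatch` rather than HTTrack's own matcher, it assumes that a later matching rule overrides an earlier one, and the `allowed` function and the image URL are hypothetical names made up for the illustration.

```python
from fnmatch import fnmatchcase


def allowed(url, filters, default=False):
    """Hypothetical approximation: the last matching +/- rule decides."""
    verdict = default
    for rule in filters:
        sign, pattern = rule[0], rule[1:]
        # fnmatch's "*" matches any run of characters, "/" included.
        if fnmatchcase(url, pattern):
            verdict = (sign == "+")
    return verdict


filters = ["+www.ananda.org/*", "-*/rwsmsh/*"]
urls = [
    "www.ananda.org/ananda/lineage/jesus-christ.html",
    "www.ananda.org/ananda/lineage/jesus-christ.html/rwsmsh/",
    "www.ananda.org/image/example.gif",  # hypothetical absolute-link target
]
for url in urls:
    print(url, "->", "fetch" if allowed(url, filters) else "skip")
```

Under those assumptions the page and the image are fetched while anything under rwsmsh/ is skipped, which is the intent of the filter line above.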