> I would like to define the scope of download in the url,
> so that
> foo.com would download any url matching foo.com*
This is the case, by default (note that "would download"
should be replaced by "would be authorized to download all
urls that match..", as you can't be sure that you'll
encounter such links during the crawl)
> foo.com/dir/ would download foo.com/dir/*
Same as above
> foo.com/dir/bar.html would download just bar.html
Ah, this is not the default behaviour (foo.com/dir/* will
be authorized too)
> but the source code src/htsalias.c says all the stay
> options (stay-on-same-dir -S, can-go-down -D etc) are
> deprecated.
These are deprecated because httrack now uses filters
(scan rules), which are much more powerful
> How can I limit the fetching scope?
Options / Scan rules
-www.example.com/*
or even things like
-* +www.example.com/whatIwantoToget/* +www.example2.com/*
+*.gif
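For example, the same scan rules can be passed on the
command line after the start url (the urls and paths below
are just placeholders; quote the rules so the shell doesn't
expand the * itself):

httrack "http://www.example.com/" -O /tmp/mysite "-*" "+www.example.com/whatIwantoToget/*" "+*.gif"

The "just bar.html" case above works the same way: exclude
everything with "-*" and authorize only the single url you
want.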
> I would like to have a visible header (footer) on each
> page such as 'foo.com/bar 2003-07-07' where foo.com/bar is
> a link to that address. It should be just after the body
> tag to keep the html valid. (And maybe a perl one-liner
> script on the documentation to strip out these comments if
> needed).
Humm, you can do that with the footer option, but the
footer won't be placed right after the body tag..
But this can easily be done with a 1-line script :)
find myproject -type f -name "*.html" -exec sh -c \
  "cat {} | sed -e 's/\(<body[^<>]*>\)/\1hello world<br>/' > _tmp && mv -f _tmp {}" \;
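If you also want to be able to strip those footers out
again later (the one-liner the question asks for), one
approach, just a sketch, is to wrap the inserted text in
marker comments (the <!--FOOTER--> markers are only an
illustrative convention, nothing httrack itself produces)
and remove everything between them afterwards:

find myproject -type f -name "*.html" -exec sh -c \
  "sed -e 's|\(<body[^<>]*>\)|\1<!--FOOTER-->hello world<br><!--/FOOTER-->|' {} > _tmp && mv -f _tmp {}" \;

find myproject -type f -name "*.html" -exec sh -c \
  "sed -e 's|<!--FOOTER-->.*<!--/FOOTER-->||' {} > _tmp && mv -f _tmp {}" \;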
> I want httrack to build me a 'partial copy of internet'
> on my hard disk, so that
> - everything goes under ~/websites
> - no project folders, instead <http://foo.com/zaa.html>
> goes to ~/websites/http/foo.com/zaa.html
> (and not ~/websites/foo.com/foo.com/zaa.html)
You mean ~/websites/foo.com/zaa.html ?
In /etc/httrack.conf :
set path ~/websites/#
set structure 1003
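(If you prefer doing it per run rather than in the config
file, the same settings can presumably be passed on the
command line, -O for the path and -N for the structure
code; the 1003 value is simply the one from the conf
example above, adjust the path to taste:)

httrack "http://foo.com/" -O "$HOME/websites" -N1003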
> - if a page links to a site fetched earlier, link
> would automatically be converted to a link
> to local copy of that site (even when they
> belong to different projects.
Err.. this one would be much more complex to implement (the
structure of all mirrored websites would have to be parsed
and kept in memory for lookup purposes, which is quite a
pain)
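A rough workaround, purely a sketch and not an httrack
feature, is to do that lookup offline after the crawls:
walk the mirror tree and, whenever an absolute link has a
matching local copy, rewrite it. The ~/websites/<host>/<path>
layout and the GNU grep/sed tools below are assumptions;
adjust the mapping to the structure you actually use:

#!/bin/sh
# Sketch: rewrite absolute http:// links to file:// links
# whenever the target was already mirrored under ~/websites.
# Assumes GNU grep/sed and simple urls (no characters that
# are special to sed, no trailing / mapping to index pages).
ROOT="$HOME/websites"
find "$ROOT" -type f -name "*.html" | while read -r page; do
    # list the absolute links found in this page
    grep -o 'http://[^"<> ]*' "$page" | sort -u | while read -r url; do
        copy="$ROOT/${url#http://}"
        # if that url was mirrored earlier, point the link at the local copy
        if [ -f "$copy" ]; then
            sed -i "s|$url|file://$copy|g" "$page"
        fi
    done
done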