> I would like to define the scope of download in the url,
> so that
> foo.com would download any url matching foo.com*
This is the case, by default (note that "would download"
should be replaced by "would be authorized to download all
urls that match..", as you can't be sure that you'll
encounter such links during the crawl)
> foo.com/dir/ would download foo.com/dir/*
Same as above
> foo.com/dir/bar.html would download just bar.html
Ah, this is not the default behaviour (foo.com/dir/* will
be authorized too)
> but the source code src/htsalias.c says all the stay
> options (stay-on-same-dir -S, can-go-down -D etc) are
> deprecated.
These are deprecated because httrack now uses filters
(scan rules), which are much more powerful
> How can I limit the fetching scope?
Options / Scan rules
-www.example.com/*
or even things like
-* +www.example.com/whatIwantoToget/* +www.example2.com/*
+*.gif
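For example, the same scan rules can be passed on the
command line after the start url (the urls and paths below
are just placeholders; quote the rules so the shell doesn't
expand the * itself):

httrack "http://www.example.com/" -O /tmp/mysite "-*" "+www.example.com/whatIwantoToget/*" "+*.gif"

The "just bar.html" case above works the same way: exclude
everything with "-*" and authorize only the single url you
want.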
> I would like to have a visible header (footer) on each
> page such as 'foo.com/bar 2003-07-07' where foo.com/bar is
> a link to that address. It should be just after the body
> tag to keep the html valid. (And maybe a perl one-liner
> script on the documentation to strip out these comments if
> needed).
Humm, you can do that with the footer option, but the
footer won't be placed right after the body tag..
But this can easily be done with a 1-line script :)
find myproject -type f -name "*.html" -exec sh -c \
  "cat {} | sed -e 's/\(<body[^<>]*>\)/\1hello world<br>/' > _tmp && mv -f _tmp {}" \;
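If you also want to be able to strip those footers out
again later (the one-liner the question asks for), one
approach, just a sketch, is to wrap the inserted text in
marker comments (the <!--FOOTER--> markers are only an
illustrative convention, nothing httrack itself produces)
and remove everything between them afterwards:

find myproject -type f -name "*.html" -exec sh -c \
  "sed -e 's|\(<body[^<>]*>\)|\1<!--FOOTER-->hello world<br><!--/FOOTER-->|' {} > _tmp && mv -f _tmp {}" \;

find myproject -type f -name "*.html" -exec sh -c \
  "sed -e 's|<!--FOOTER-->.*<!--/FOOTER-->||' {} > _tmp && mv -f _tmp {}" \;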
> I want httrack to build me a 'partial copy of internet'
> on my hard disk, so that
> - everything goes under ~/websites
> - no project folders, instead <http://foo.com/zaa.html>
> goes to ~/websites/http/foo.com/zaa.html
> (and not ~/websites/foo.com/foo.com/zaa.html)
You mean ~/websites/foo.com/zaa.html ?
In /etc/httrack.conf :
set path ~/websites/#
set structure 1003
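(If you prefer doing it per run rather than in the config
file, the same settings can presumably be passed on the
command line, -O for the path and -N for the structure
code; the 1003 value is simply the one from the conf
example above, adjust the path to taste:)

httrack "http://foo.com/" -O "$HOME/websites" -N1003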
> - if a page links to a site fetched earlier, link
> would automatically be converted to a link
> to local copy of that site (even when they
> belong to different projects.
Err.. this one would be much more complex to implement (the
structure of all mirrored websites would have to be parsed
and kept in memory for lookup purposes, which is quite a
pain)
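A rough workaround, purely a sketch and not an httrack
feature, is to do that lookup offline after the crawls:
walk the mirror tree and, whenever an absolute link has a
matching local copy, rewrite it. The ~/websites/<host>/<path>
layout and the GNU grep/sed tools below are assumptions;
adjust the mapping to the structure you actually use:

#!/bin/sh
# Sketch: rewrite absolute http:// links to file:// links
# whenever the target was already mirrored under ~/websites.
# Assumes GNU grep/sed and simple urls (no characters that
# are special to sed, no trailing / mapping to index pages).
ROOT="$HOME/websites"
find "$ROOT" -type f -name "*.html" | while read -r page; do
    # list the absolute links found in this page
    grep -o 'http://[^"<> ]*' "$page" | sort -u | while read -r url; do
        copy="$ROOT/${url#http://}"
        # if that url was mirrored earlier, point the link at the local copy
        if [ -f "$copy" ]; then
            sed -i "s|$url|file://$copy|g" "$page"
        fi
    done
done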