> 1, when I want download jpg and jpeg will regexp
> '+*jpe*g'
> and for htm and html
> '*html*'
> work?
+*.jpg +*.jpeg +*.htm +*.html
should do the trick.
> 2, I want to download files ending with
> eg. 10, 20, 1003, 1289, 1345
> from URI like this:
> <http://foo.com/print.phtml?id=1234>
> is it possible to do it with one command or is
> necessary to use separate commands for each?
> (and is possible to specify range eg. 12-100
> or even mixed: 5, 8, 12-100, 128, 1006-1152 ?)
Ranges, no, but you can use:
+foo.com/print.phtml?id=*
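Since the filter cannot express ranges, one way around it is to generate the exact URLs yourself and hand them to HTTrack as start URLs. The following is only a minimal sketch of that idea in Python; the function names `expand_ids` and `build_urls` are made up for this example, and the base URL is simply the one from the question.

```python
def expand_ids(spec):
    """Expand a spec such as '5, 8, 12-100' into a list of integers."""
    ids = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            # 'start-end' is treated as an inclusive range
            start, end = (int(x) for x in part.split("-", 1))
            ids.extend(range(start, end + 1))
        else:
            ids.append(int(part))
    return ids


def build_urls(spec, base="http://foo.com/print.phtml?id="):
    """Build one URL per id (base URL taken from the question)."""
    return [f"{base}{n}" for n in expand_ids(spec)]


if __name__ == "__main__":
    # Prints one URL per line, ready to be pasted as start URLs.
    for url in build_urls("5, 8, 12-14, 128"):
        print(url)
```

The printed URLs can then be given to HTTrack as the addresses to fetch, instead of relying on a wildcard filter that would accept every id.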
> 3, on page eg.
>
> <http://root.cz/index.html>
> (are articles and discussions)
> (clanek = article)
> Art.1011 (http://root.cz/clanek.phtml?id=1011)
> Disc.1011
> <http://root.cz/forum/diskuse.php3?clanek=1011&vlakno=0&stav=0&vse=Zobrazit+v%B9e>
>
> Art.1010 (http://root.cz/clanek.phtml?id=1010)
> Disc.1010
> <http://root.cz/forum/diskuse.php3?clanek=1010&vlakno=0&stav=0&vse=Zobrazit+v%B9e>
> ('end' of index.html)
>
> their printer friendly version are
> articles
> <http://root.cz/print.phtml?id=1011>
> <http://root.cz/print.phtml?id=1010>
> ('print' instead of 'clanek')
>
> discussions
> <http://root.cz/forum/diskuse.php3?clanek=1011&vlakno=0&stav=0&vse=Zobrazit+v%B9e&print=1>
>
> <http://root.cz/forum/diskuse.php3?clanek=1010&vlakno=0&stav=0&vse=Zobrazit+v%B9e&print=1>
> (there is appended '&print=1' on end of the URI)
>
> but they aren't linked on index.html (but they are
> on Art.1011 (http://root.cz/clanek.phtml?id=1011))
> Is it possible to download it only the index.html
> file and only printer-friendly pages with images
> and other wanted data?
> I tried it - but no success - it was only possible
> to do it with downloading Art.10xy too.
> (Yes, there is probably a way - download index.html
> and with some bash scripting extract URI's , replace
> (probably with sed) clanek with print and feed it
> back to httrack - but it is hard way)
Wow, quite a complex situation. If I understand correctly, you
want to capture URLs that aren't linked from
index.html but only from deeper pages, without mirroring
those pages. This isn't possible: you have to include
the pages which contain links to the pages you want to
download.
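If mirroring the article pages is not acceptable, the scripted route described in the question can be done outside HTTrack. Here is a rough sketch in Python rather than bash and sed; it assumes index.html has already been saved locally and that article links always contain `clanek.phtml?id=<number>`. The function name `printer_friendly_urls` is invented for this example, and the discussion query string is copied from the URLs quoted above.

```python
import re

# Matches article links such as clanek.phtml?id=1011 and captures the id.
ARTICLE_RE = re.compile(r"clanek\.phtml\?id=(\d+)")


def printer_friendly_urls(index_html, site="http://root.cz"):
    """Return print-version URLs for every article id found in the HTML."""
    ids = []
    for match in ARTICLE_RE.finditer(index_html):
        if match.group(1) not in ids:  # keep first occurrence, drop repeats
            ids.append(match.group(1))

    urls = []
    for article_id in ids:
        # 'print' instead of 'clanek' for the article itself
        urls.append(f"{site}/print.phtml?id={article_id}")
        # '&print=1' appended for the discussion
        urls.append(
            f"{site}/forum/diskuse.php3?clanek={article_id}"
            "&vlakno=0&stav=0&vse=Zobrazit+v%B9e&print=1"
        )
    return urls


if __name__ == "__main__":
    sample = '<a href="clanek.phtml?id=1011">A</a> <a href="clanek.phtml?id=1010">B</a>'
    for url in printer_friendly_urls(sample):
        print(url)
```

The resulting list can then be fed back to HTTrack as start URLs, so the normal article pages never need to be mirrored.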
You can exclude links, however, like in:
+root.cz/forum/diskuse.php3*
-root.cz/forum/diskuse.php3?*print=1*
This filter will accept all diskuse.php3 links except
those with 'print=1' in the query string. By combining
several filters, you may manage to narrow the scope
of the mirror.
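To see how such a pair of rules interacts, here is a simplified model in Python. It is not HTTrack's actual matcher: it assumes that `*` is the only wildcard, that the scheme is ignored, that a pattern must match the whole address, and that the last matching rule decides. The names `filter_to_regex` and `is_accepted` are made up for this illustration.

```python
import re


def filter_to_regex(pattern):
    """Translate a pattern where '*' matches any run of characters."""
    # Escape every literal piece (so '?' and '.' stay literal) and
    # join the pieces with '.*' in place of each '*'.
    return re.compile(".*".join(re.escape(p) for p in pattern.split("*")))


def is_accepted(url, rules, default=False):
    """Apply +/- rules in order; the last rule that matches wins (assumed)."""
    target = re.sub(r"^[a-z]+://", "", url)  # drop 'http://' and the like
    verdict = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if filter_to_regex(pattern).fullmatch(target):
            verdict = (sign == "+")
    return verdict


RULES = [
    "+root.cz/forum/diskuse.php3*",
    "-root.cz/forum/diskuse.php3?*print=1*",
]

if __name__ == "__main__":
    base = "http://root.cz/forum/diskuse.php3?clanek=1011&vlakno=0&stav=0"
    print(is_accepted(base, RULES))
    print(is_accepted(base + "&print=1", RULES))
```

Under these assumptions the plain discussion link is accepted by the first rule, while the same link with `&print=1` is rejected because the second, more specific rule comes later.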