HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: New feature suggestion: External Link Scan Rules
Author: Filer
Date: 07/25/2002 12:08
 
[long article, but I do believe there is some beef too]

> While those are no help, imagine there was another one like *[external] ...
> then you might have filters like '+foo.com/*[external]*.zip' which would
> include all ZIP files on links external to foo.com

Even this would help a great deal, because as of now there are no separate
rules for external links at all. An extra bonus would be if the *[external]
tag accepted a rule list after it, like

+foo.com/*[external][*.zip,*.html,*.php]

this would save a lot of space in the scan rules window, and finally address
the lack of secondary scan rules. The same list format could also save space
and be easier to read; for example,

+foo.com/*[.zip,.html,.php]

would be equivalent to

+foo.com/*.zip
+foo.com/*.html
+foo.com/*.php

Much more powerful, yes?
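
To make the expansion concrete, here is a minimal sketch in Python (purely
illustrative; expand_rule is my own invention, not anything in HTTrack) of
how a bracketed suffix list could be rewritten into today's
one-pattern-per-line rules:

import re

def expand_rule(rule):
    # Expand '+prefix*[a,b,c]' into one '+prefix*a' rule per list item.
    # Rules without a trailing [...] list (including the proposed
    # *[external] tag form) are returned unchanged.
    m = re.match(r'^([+-].*\*)\[([^\]]+)\]$', rule)
    if not m:
        return [rule]
    prefix, items = m.group(1), m.group(2).split(',')
    return [prefix + item.strip() for item in items]

print(expand_rule('+foo.com/*[.zip,.html,.php]'))
# ['+foo.com/*.zip', '+foo.com/*.html', '+foo.com/*.php']
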
Normally, fetching external links causes a great deal of trouble when
defining scan rules, because a setup like

-*
+www.thesite.com/*

practically shuts out all "external" sites, while rampant +*.html use throws
the scan out of its intended bounds (with external depth 4 and +*.html, the
scan engine has far too much freedom to go just about anywhere and totally
screw things up; with many external sites it is not even feasible to start
prohibiting the biggest mishits one by one).
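
The effect is easy to demonstrate: a pattern like +*.html has no host part
at all, so it matches .html pages on any site. A tiny sketch (my own
glob-style matcher; last-match-wins evaluation is assumed here purely for
illustration and may not be HTTrack's exact order):

from fnmatch import fnmatch

def allowed(url, rules):
    # The last rule whose pattern matches the URL decides the verdict
    # (an assumption for illustration, not HTTrack's documented order).
    verdict = False
    for rule in rules:
        if fnmatch(url, rule[1:]):
            verdict = (rule[0] == '+')
    return verdict

rules = ['-*', '+www.thesite.com/*', '+*.html']
print(allowed('www.othersite.com/file.zip', rules))   # False: externals shut out
print(allowed('anywhere.org/deep/page.html', rules))  # True: +*.html matches any host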

> Really the only thing that can't be done at the moment is the *.php->*.zip
> idea

This could be very powerful if combined with the "scan rule list" idea
above; one could then write

+foo.com/*[external][*.php,*.cgi]->[*.jpg**[>20],*.png**[>100]]

meaning: on external links, check only *.php and *.cgi, and from either of
those fetch only the specified .jpg and .png results. Clearly this would
herald a new age in engine functionality? Or maybe the "->" is not needed at
all: the command could simply be defined so that, when two lists are present,
they are interpreted as a conditional statement relating one to the other,
like

+foo.com/*[external][*.php,*.cgi][*.jpg**[>20],*.png**[>100]]

In this way, users could finally nest scan rules (something I suggested a
while back), so that

+foo.com/*[external][*.php][*.html][*.asp,*.cgi][*.jpg]

would fetch the resulting .jpg files even after many levels of indirection
obscuring the actual link!
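
One way to read such a chain (again only a sketch of the proposed semantics,
not anything HTTrack implements today): the Nth bracketed list gives the
patterns allowed at the Nth hop away from the matched page, so each rule
carries its own per-level filter. In Python:

from fnmatch import fnmatch

def allowed_at_hop(url, hop, chain):
    # 'chain' holds the parsed bracket lists of a rule such as
    # +foo.com/*[external][*.php][*.html][*.asp,*.cgi][*.jpg];
    # a URL reached 'hop' links away must match a pattern in chain[hop].
    if hop >= len(chain):
        return False  # deeper than the rule allows
    return any(fnmatch(url, pat) for pat in chain[hop])

chain = [['*.php'], ['*.html'], ['*.asp', '*.cgi'], ['*.jpg']]
print(allowed_at_hop('out.example.org/gallery.php', 0, chain))      # True
print(allowed_at_hop('out.example.org/pics/img001.jpg', 3, chain))  # True
print(allowed_at_hop('out.example.org/pics/img001.jpg', 1, chain))  # False: wrong level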
 
Many Unix programs are powerful precisely because of their scripting
languages; slowly evolving more programmable scan rules for HTTrack could be
very useful.
 