I am writing a script in which a user enters the URL of
a forum thread and httrack then captures that thread.
At this particular forum, each thread is available in two
styles:
'Normal' view:
<http://www.somesite.com/thread-topic.html> (First page of thread)
<http://www.somesite.com/thread-topic-2.html> (Second page of thread)
<http://www.somesite.com/thread-topic-3.html> (Third page of thread)
'Printable' view:
For the same thread as above, the URLs become:
<http://www.somesite.com/thread-topic-print.html> (First page of thread)
<http://www.somesite.com/thread-topic-print-2.html> (Second page of thread)
<http://www.somesite.com/thread-topic-print-3.html> (Third page of thread)
Currently my script works as long as the 'normal' view
of a thread is required, using the following call to
httrack:
httrack.exe http://www.somesite.com/thread-topic.html
-N1 -O "c:\site" -*
+http://www.somesite.com/thread-topic*.html
-http://www.somesite.com/thread-topic-print*.html
But if a user instead wants to capture the 'printable' version of the thread,
the above won't work, because the filters are written to exclude the print pages.
Is there some sort of regular expression that I can specify so that regardless
of whether a user chooses the 'Normal' or 'Printable' view of a thread, just
the relevant pages are retrieved?
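To make the goal concrete, here is a rough sketch in Python of the behaviour I am after, done in the script itself rather than in a single httrack pattern. The helper names `build_filters` and `build_command` are hypothetical. The sketch assumes the URL scheme shown above (an optional `-print` suffix, then an optional `-N` page number, then `.html`) and that thread names never end in `-print` or `-<number>` themselves. The httrack options are the same ones I already use.

```python
import re

# Hypothetical helpers for illustration only; not part of HTTrack.
# Splits a thread URL into: base name, optional "-print", optional "-<page>".
_THREAD_URL = re.compile(r"^(?P<base>.+?)(?P<print>-print)?(?:-\d+)?\.html$")


def build_filters(url):
    """Return the httrack filter arguments for the view that `url` belongs to."""
    match = _THREAD_URL.match(url)
    if match is None:
        raise ValueError("not a thread URL: " + url)
    base = match.group("base")
    if match.group("print"):
        # Printable view: allow only the -print pages of this thread.
        return ["-*", "+" + base + "-print*.html"]
    # Normal view: allow the thread's pages, then exclude the -print ones.
    return ["-*", "+" + base + "*.html", "-" + base + "-print*.html"]


def build_command(url, out_dir):
    """Assemble the httrack argument list (same options as in my current call)."""
    return ["httrack.exe", url, "-N1", "-O", out_dir] + build_filters(url)
```

Called with the second page of the printable view, `build_filters` would yield `-*` followed by `+http://www.somesite.com/thread-topic-print*.html`, so the user could paste any page of either view. I would still prefer a single filter pattern if one exists.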
Thanks