HTTrack Website Copier
Free software offline browser - FORUM
Subject: Scan rules problem - mime types
Author: Mihaela
Date: 09/12/2006 16:02

I have a basically simple problem, but it's been giving me headaches for days.

I have a bunch of websites to crawl, and I only want to download the
text, no binary content. First I tried these scan rules:

-* +*htm* +*cgi* +*asp* +*php* +*jsp* +*xml* +*dhm* +*xhtm*

This works fine, except that content files can have any extension, or none
at all. So now I'm trying this:

-u1 -mime:*/* +mime:text/html

This filters correctly, but I get a lot of ".delayed" files for all the
image, PDF, zip, and similar files, and I really don't want them.
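
The gap in the first, extension-based rule set can be sketched in Python. This is only a rough approximation, not HTTrack's actual matcher: it assumes that the last matching rule decides, that `*` matches any run of characters, and that matching is case-sensitive. The `is_allowed` helper and the example URLs are hypothetical and exist only for illustration.

```python
from fnmatch import fnmatchcase

# The extension-based scan rules from the post, in their original order.
RULES = ["-*", "+*htm*", "+*cgi*", "+*asp*", "+*php*", "+*jsp*",
         "+*xml*", "+*dhm*", "+*xhtm*"]


def is_allowed(url, rules=RULES):
    """Approximate the filter: the last rule that matches the URL decides."""
    allowed = True  # assumption: a URL that matches no rule is accepted
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url, pattern):
            allowed = (sign == "+")
    return allowed


# Hypothetical URLs, for illustration only.
for url in ["example.com/index.html",
            "example.com/logo.png",
            "example.com/articles/42"]:
    print(url, is_allowed(url))
```

Under these assumptions the extensionless page `example.com/articles/42` is rejected along with the image, even though it could well be HTML. That is the limitation that motivates filtering on MIME type instead.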

Please advise.

PS: I'm using v3.4

Best regards!


