> Quickly extract only valid URLs from the "hts-log.txt" file?
> Hi,
>
> First thank you for your great program. ;-)
>
> I chose several options in WinHTTrack so that the
> program takes all the HTML files of a site and
> puts all the ".rm" files bigger than 500 KB into
> the "hts-log.txt" file. [Note: the ".rm" files on
> the site are around 10 MB or bigger, and I have a
> 56k connection.]
> I want to put into a new text file only the
> valid URLs from the "hts-log.txt" file (URLs
> starting with "http://" and ending with ".rm"),
> so I can use this new text file in a program like
> FlashGet, where I can resume my downloads and
> choose in advance which files I want to download.
>
> Many thanks in advance,
> Pbram
>
> Here are the options I used:
> Automatic copy / Forget robots.txt
>
> +*.css +*.js -ad.doubleclick.net/*
> +*.rm
> -*.exe -*.gif -*.jpg
>
>
The file is already tab-delimited, so all you need to do is chop off the
top/bottom of the file and open it in Excel.
From there you can use Excel's auto-filter to find exactly what you want.
If you want to break things into more Excel columns, first open the file in a
text editor and add tabs where you want the field breaks to be.
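If Excel is not handy, the same filtering can be done with a short script. Here is a minimal sketch in Python; note that the sample log lines below are invented for illustration, and the real hts-log.txt layout may differ, but the regex only depends on the URLs starting with "http://" and ending with ".rm":

```python
import re

# Invented sample lines standing in for hts-log.txt content (assumption:
# the real log format may differ, but the URLs look like this).
log_text = """\
12:00:01 Info:  engine: save-name: http://example.com/media/clip1.rm
12:00:02 Warning: file not saved: http://example.com/media/clip2.rm (too big)
12:00:03 Info:  some line without any URL in it
"""

# Grab every run of non-space characters that starts with http:// and
# ends with .rm; de-duplicate and sort for a clean list.
urls = sorted(set(re.findall(r'http://\S+?\.rm', log_text)))

# Write one URL per line, ready to paste into FlashGet.
with open("urls.txt", "w") as out:
    out.write("\n".join(urls) + "\n")

print("\n".join(urls))
```

On a Unix-like system, `grep -o 'http://[^[:space:]]*\.rm' hts-log.txt | sort -u > urls.txt` does the same job in one line.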