HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Quickly extract only valid urls from hts-log.txt ?
Author: Tom Baker
Date: 03/02/2011 22:19
 
> Quickly extract only valid urls from the "hts-log.txt" file ?
> 
> Hi,
> 
> First, thank you for your great program. ;-)
> 
> I chose several options in WinHTTrack so that the program takes all
> the HTML files of a site and puts all the ".rm" files bigger than
> 500 KB in the "hts-log.txt" file. [Note: the ".rm" files on the site
> are around 10 MB or bigger, and I have a 56k connection.]
> 
> I want to put only the valid URLs from the "hts-log.txt" file (URLs
> starting with "http://" and ending with ".rm") into a new text file,
> so that I can use this file in a program like FlashGet, where I can
> resume my downloads and choose in advance which files I want to
> download.
> 
> Many thanks in advance,
> Pbram
> 
> Here are the options I used:
> Automatic copy / Forget robot.txt
> 
> +*.css +*.js -ad.doubleclick.net/*
> +*.rm 
> -*.exe -*.gif -*.jpg
> -*.jpg
> 
> 

The file is already tab-delimited, so all you need to do is chop off the
top and bottom of the file and open what remains in Excel.

From there you can use Excel auto-filter to find exactly what you want.

If you want to break things into more Excel columns, first open in a text
editor and add tabs where you want the field breaks to be.
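
If you would rather skip Excel, a short script can pull the URLs out directly.
This is only a rough sketch in Python, not anything HTTrack ships: it assumes
each URL in "hts-log.txt" sits in its own whitespace- or tab-separated field,
and the function names and the output file name "rm-urls.txt" are made up for
illustration.

```python
import re
from pathlib import Path

# Assumption: a URL is a run of non-whitespace characters that starts with
# "http://" and ends with ".rm", followed by whitespace or end of text.
RM_URL = re.compile(r"http://\S+\.rm(?!\S)", re.IGNORECASE)


def extract_rm_urls(log_text):
    """Return the unique .rm URLs found in log_text, in order of appearance."""
    seen = set()
    urls = []
    for match in RM_URL.finditer(log_text):
        url = match.group(0)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def save_rm_urls(log_path, out_path):
    """Read the log file and write one .rm URL per line to out_path."""
    text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    urls = extract_rm_urls(text)
    Path(out_path).write_text("".join(u + "\n" for u in urls), encoding="utf-8")
    return len(urls)
```

Calling save_rm_urls("hts-log.txt", "rm-urls.txt") should leave you with a
plain list of links, one per line, to import into a download manager. If the
log wraps URLs in quotes or brackets, the pattern will need adjusting.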
 


