HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Downloading files from one site only
Author: Matt
Date: 09/04/2017 18:26
 
OK, I should just say RTFM, but here is the real answer:

Your current scan rules are:
+*[name].*[name]tpg.ch*[name].*[name]/*
-*[name].*[name].com*[name].*[name]/
-*[name].*[name].net*[name].*[name]/*

First problem: you don't want everything, so your first filter line should be:
-*

That blocks everything! All pages, all sites!

Then we add the stuff we do want:
+www.tpg.ch/documents/*


OK, as for the <http://www.tpg.ch/fr/horaires> page: I only get to see the
"International version", and on that version of the page all the colored
line numbers are direct links to the PDFs (no PHP), e.g.
"/documents/10162/16057571/tpg_ligne_18-11dec2016.pdf" for line 18.
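To check that such a root-relative link falls under the +www.tpg.ch/documents/* rule, here is a small Python sketch that resolves it against the page URL the way a browser would (HTTrack presumably does the same). It uses only the standard library; comparing against "www.tpg.ch/documents/" with the "http://" part stripped is my assumption about how the rule gets matched.

```python
from urllib.parse import urljoin, urlsplit

# Page on which the colored line numbers appear (from the post).
page = "http://www.tpg.ch/fr/horaires"
# Root-relative href of the PDF for line 18 (from the post).
href = "/documents/10162/16057571/tpg_ligne_18-11dec2016.pdf"

# A link starting with "/" keeps the page's scheme and host but replaces the whole path.
pdf_url = urljoin(page, href)
print(pdf_url)

# Assumption: the filter is compared against host + path, without "http://".
parts = urlsplit(pdf_url)
host_path = parts.netloc + parts.path
print(host_path.startswith("www.tpg.ch/documents/"))
```

The first line printed is http://www.tpg.ch/documents/10162/16057571/tpg_ligne_18-11dec2016.pdf, i.e. the PDF lives under /documents/ even though the page is under /fr/.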

But going by your description, we should also add:
+www.tpg.ch/html/pdf/*



You must also allow all the pages that contain the links to the PDFs, so that
HTTrack downloads those pages and can then follow the links.

+www.tpg.ch/fr/horaires/rechercher*
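To see how this filter list should behave, here is a rough Python model of it. It assumes two things about HTTrack that are worth checking against its filter documentation: that the last matching rule wins, and that rules are compared against the URL without the "http://" part. Only the plain * wildcard is modelled; HTTrack's other wildcard forms such as *[name] are not handled. The rechercher?ligne=18 and actualites URLs are made-up examples.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Turn a filter pattern into a regex where '*' matches any characters.

    Everything else is matched literally (HTTrack's *[name]-style
    wildcards are NOT modelled here).
    """
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def is_allowed(url: str, rules: list, default: bool = True) -> bool:
    """Return the verdict of the LAST rule that matches (assumed precedence)."""
    # Assumption: rules are compared against host + path, scheme stripped.
    target = re.sub(r"^[a-z]+://", "", url)
    verdict = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if rule_to_regex(pattern).fullmatch(target):
            verdict = sign == "+"  # a later match overrides an earlier one
    return verdict


RULES = [
    "-*",                                    # block everything first
    "+www.tpg.ch/documents/*",               # the PDFs
    "+www.tpg.ch/html/pdf/*",                # PDFs, per the original description
    "+www.tpg.ch/fr/horaires/rechercher*",   # pages that link to the PDFs
]

for url in [
    "http://www.tpg.ch/documents/10162/16057571/tpg_ligne_18-11dec2016.pdf",
    "http://www.tpg.ch/fr/horaires/rechercher?ligne=18",
    "http://www.tpg.ch/fr/actualites",
    "http://www.example.com/",
]:
    print(is_allowed(url, RULES), url)
```

In this model the first two URLs are allowed and the last two are blocked, because the only rule they match is the leading -*. Putting -* first is what makes the later + rules act as exceptions.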


Now your start pages. As we're restricting the site content so much, there
might not be any links from your start page to the pages with the links, so we
add all of your needed base pages as start pages:
www.tpg.ch/fr/plans-du-reseau
www.tpg.ch/fr/plans-de-connexion
www.tpg.ch/livre_horaire
www.tpg.ch/fr/horaires

We don't need to add '+' rules for these base pages, as it's implied.
We also don't need the "www.tpg.ch" entry you currently have, so remove it.
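If you use the command-line httrack instead of the GUI, the start pages and filters would go on a single command line. This Python sketch only builds and prints that command (it does not run it). ./tpg-mirror is a hypothetical output folder, and passing the filters as bare +/- arguments after -O follows the usual httrack usage examples, so double-check it against httrack --help.

```python
import shlex

# Start pages from the post; the bare "www.tpg.ch" entry is deliberately left out.
START_URLS = [
    "www.tpg.ch/fr/plans-du-reseau",
    "www.tpg.ch/fr/plans-de-connexion",
    "www.tpg.ch/livre_horaire",
    "www.tpg.ch/fr/horaires",
]

# Filters in the order given above: block everything, then allow exceptions.
FILTERS = [
    "-*",
    "+www.tpg.ch/documents/*",
    "+www.tpg.ch/html/pdf/*",
    "+www.tpg.ch/fr/horaires/rechercher*",
]

OUTPUT_DIR = "./tpg-mirror"  # hypothetical output directory

argv = ["httrack", *START_URLS, "-O", OUTPUT_DIR, *FILTERS]

# shlex.join quotes the arguments containing '*' so the shell won't expand them.
print(shlex.join(argv))
```

The quoting matters: without it, a shell could try to expand -* and the other patterns as file globs before httrack ever sees them.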

Other things:
Be nice and turn some of the settings down:
MaxRate 250000
MaxConn 5
Sockets 5
Travel 1

Clear these settings:
ExtDepth
Depth
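For command-line users, here is a sketch of how the throttling settings above might translate into httrack options. The mapping (MaxRate to -A, MaxConn to -%c, Sockets to -c) is my understanding of the WinHTTrack fields, not something stated in this thread. Travel is left out because I have not confirmed its command-line equivalent, and Depth/ExtDepth (-r and -%e) are simply not passed, since you are clearing them.

```python
import shlex

# Settings from the post -> assumed httrack command-line option and value.
POLITE = {
    "MaxRate": ("-A", 250000),  # max transfer rate in bytes/s (--max-rate)
    "MaxConn": ("-%c", 5),      # max connections per second (--connection-per-second)
    "Sockets": ("-c", 5),       # simultaneous connections (--sockets)
}

# httrack takes these as option letter immediately followed by the number.
flags = [f"{opt}{value}" for opt, value in POLITE.values()]

# Depth (-r) and ExtDepth (-%e) are cleared, so no flag is emitted for them.
print(shlex.join(flags))
```

These flags would simply be appended to the command line built from the start pages and filters.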


Give that a try.


 


All articles

Subject Author Date
Downloading files from one site only 08/20/2017 15:27
Re: Downloading files from one site only 09/04/2017 18:26
Re: Downloading files from one site only 09/04/2017 18:28





Created with FORUM 2.0.11