HTTrack Website Copier
Free software offline browser - FORUM
Subject: Downloading files from one site only
Author: Omar Farouk
Date: 08/20/2017 15:27
 
ok so basically here is what I want to do.

the Geneva public transport site www.tpg.ch has a collection of PDFs that are
spread across different pages.

the first few are simple because they are just static links found on these
pages:
<http://www.tpg.ch/fr/plans-du-reseau>
<http://www.tpg.ch/fr/plans-de-connexion>
<http://www.tpg.ch/livre_horaire>

the other thing these three pages have in common is that the PDFs on them are
all found in the folder <http://www.tpg.ch/documents/>

the last group is the more complicated one. you can find them by going to this
page <http://www.tpg.ch/fr/horaires>
this page executes some sort of non-static PHP or AJAX or something.
if you click on any of the colored numbers on that page it will list all
stations along that line. you can then click on any of these stations.

for example, if you click on line 18 "purple" and then choose the Bel-Air
station, you will reach this page:
<http://www.tpg.ch/fr/horaires/rechercher?p_p_id=PlansReseaux_WAR_PlansReseauxportlet&p_p_lifecycle=0&p_p_state=normal&p_p_mode=view&p_p_col_id=column-1&p_p_col_count=1&_PlansReseaux_WAR_PlansReseauxportlet_jspPage=%2Fhtml%2F_thermometre.jsp&_PlansReseaux_WAR_PlansReseauxportlet_req=thermometre&_PlansReseaux_WAR_PlansReseauxportlet_ligne=18&_PlansReseaux_WAR_PlansReseauxportlet_sens=ALLER&_PlansReseaux_WAR_PlansReseauxportlet_arret=Bel-Air&_PlansReseaux_WAR_PlansReseauxportlet_date=>
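to make that long URL easier to read, here is a small Python sketch (plain
standard library, nothing to do with HTTrack itself) that pulls out the
portlet parameters. the helper name portlet_params is just something made up
for this illustration; the URL is the Bel-Air example.

```python
from urllib.parse import parse_qs, urlsplit

# Common prefix carried by every portlet-specific parameter in the URL.
PREFIX = "_PlansReseaux_WAR_PlansReseauxportlet_"

# The Bel-Air example URL, split over several lines for readability.
URL = (
    "http://www.tpg.ch/fr/horaires/rechercher"
    "?p_p_id=PlansReseaux_WAR_PlansReseauxportlet"
    "&p_p_lifecycle=0&p_p_state=normal&p_p_mode=view"
    "&p_p_col_id=column-1&p_p_col_count=1"
    f"&{PREFIX}jspPage=%2Fhtml%2F_thermometre.jsp"
    f"&{PREFIX}req=thermometre"
    f"&{PREFIX}ligne=18"
    f"&{PREFIX}sens=ALLER"
    f"&{PREFIX}arret=Bel-Air"
    f"&{PREFIX}date="
)


def portlet_params(url):
    """Return the portlet parameters of a timetable URL, prefix stripped."""
    # keep_blank_values=True so the empty "date" parameter is not dropped.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {
        key[len(PREFIX):]: values[0]
        for key, values in query.items()
        if key.startswith(PREFIX)
    }


params = portlet_params(URL)
print(params["ligne"], params["sens"], params["arret"])
```

so each station page seems to be identified only by the line number (ligne),
the direction (sens) and the stop name (arret) in the query string.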

if you scroll to the end of the page you will find the word "Téléchargement"
(download) and beneath it a link to a PDF. these PDFs are static and are all
found in the folder <http://www.tpg.ch/html/pdf/> on the server.
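put differently, the only files I care about can be described by a simple
rule. here is a rough Python sketch of that rule, not an HTTrack setting. the
function name is_wanted_pdf and the file names in the usage lines are made up
for illustration.

```python
from urllib.parse import urlsplit

WANTED_HOST = "www.tpg.ch"
# The two server folders that hold the PDFs described above.
WANTED_FOLDERS = ("/documents/", "/html/pdf/")


def is_wanted_pdf(url):
    """True only for PDF files in one of the two folders on www.tpg.ch."""
    parts = urlsplit(url)
    return (
        parts.hostname == WANTED_HOST
        and parts.path.startswith(WANTED_FOLDERS)
        and parts.path.lower().endswith(".pdf")
    )


# Hypothetical file names, only to show the rule in action.
print(is_wanted_pdf("http://www.tpg.ch/documents/example.pdf"))
print(is_wanted_pdf("http://www.tpg.ch/fr/horaires"))
print(is_wanted_pdf("http://www.facebook.com/example.pdf"))
```

everything else (the HTML pages, images, and anything on other hosts) would
only need to be read to find the links, not kept.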

after a lot of tinkering with the options I finally arrived at the options
below, which worked. unfortunately they had the side effect of downloading the
entire site, and not only that: in parallel it also downloads parts of any
other site linked to from tpg.ch, such as facebook.com, youtube.com,
twitter.com and a bunch of .ch sites.

it had downloaded 10 GB and pulled some of the PDFs I want, but not all of
them, when I paused it.

question is: what are the right parameters to achieve what I want?

Near=1
Test=1
ParseAll=1
HTMLFirst=1
Cache=1
NoRecatch=0
Dos=0
Index=1
WordIndex=0
MailIndex=0
Log=1
RemoveTimeout=0
RemoveRateout=0
KeepAlive=1
FollowRobotsTxt=0
NoErrorPages=0
NoExternalPages=0
NoPwdInPages=0
NoQueryStrings=0
NoPurgeOldFiles=0
Cookies=1
CheckType=0
ParseJava=1
HTTP10=0
TolerantRequests=1
UpdateHack=1
URLHack=1
StoreAllInCache=0
LogType=0
UseHTTPProxyForFTP=0
Build=0
PrimaryScan=3
Travel=3
GlobalTravel=1
RewriteLinks=0
BuildString=%%h%%p/%%n%%q.%%t
Category=
MaxHtml=
MaxOther=
MaxAll=
MaxWait=
Sockets=10
Retry=3
MaxTime=
TimeOut=60
RateOut=
UserID=Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
Footer=
AcceptLanguage=en, *
OtherHeaders=
DefaultReferer=
MaxRate=25000000000
WildCardFilters=+*[name].*[name]tpg.ch*[name].*[name]/*%0d%0a-*[name].*[name].com*[name].*[name]/*%0d%0a-*[name].*[name].net*[name].*[name]/*
Proxy=
Port=
Depth=20
ExtDepth=1
MaxConn=10
MaxLinks=10000000
MIMEDefsExt1=
MIMEDefsExt2=
MIMEDefsExt3=
MIMEDefsExt4=
MIMEDefsExt5=
MIMEDefsExt6=
MIMEDefsExt7=
MIMEDefsExt8=
MIMEDefsMime1=
MIMEDefsMime2=
MIMEDefsMime3=
MIMEDefsMime4=
MIMEDefsMime5=
MIMEDefsMime6=
MIMEDefsMime7=
MIMEDefsMime8=
CurrentUrl=www.tpg.ch
CurrentAction=0
CurrentURLList=
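
the WildCardFilters value above is hard to read in that form. assuming the
%0d%0a sequences are just escaped line breaks (CR LF) between rules, this
small Python sketch splits it back into one rule per line. it only decodes the
string and does not try to interpret the HTTrack pattern syntax.

```python
from urllib.parse import unquote

# The WildCardFilters value from the options above, as one string.
RAW = (
    "+*[name].*[name]tpg.ch*[name].*[name]/*"
    "%0d%0a-*[name].*[name].com*[name].*[name]/*"
    "%0d%0a-*[name].*[name].net*[name].*[name]/*"
)

# unquote turns %0d%0a into "\r\n"; splitlines then yields one rule per entry.
rules = unquote(RAW).splitlines()

for rule in rules:
    kind = "include" if rule.startswith("+") else "exclude"
    print(kind, rule[1:])
```

that gives one include rule for tpg.ch and two exclude rules for .com and
.net. none of the rules mention .pdf or the two folders, and nothing excludes
the other .ch sites.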
 