HTTrack Website Copier
Free software offline browser - FORUM
Subject: information about how sites are downloaded
Author: Simone Spinozzi
Date: 05/12/2013 16:01
 
Hi, sorry if my question sounds too stupid; I see this is a pretty technical
forum.

I currently use Teleport Pro as my default web spider because it has a nifty
feature that is absent from most other web spiders I had considered before
stumbling onto this one.

Namely, it does not require the site to be in the download directory in order
to fetch updated files. Instead it keeps a database of the downloaded files,
checks that database for files that fail a name/date/CRC comparison, and
downloads just the new files, even after I've deleted the sites from the save
directory (roughly the idea sketched below).
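
To make concrete what I mean, here is a rough sketch in Python of that kind of
check, as I understand it. This is just my own illustration, not Teleport's
actual code; the manifest file name and layout are made up:

    import json, os, zlib
    import urllib.request

    MANIFEST = "manifest.json"   # made-up name; plays the role of the project file

    def fetch(url):
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read()

    def update_site(urls, out_dir="new_files"):
        # Load the saved url -> CRC table from the previous run, if any.
        manifest = json.load(open(MANIFEST)) if os.path.exists(MANIFEST) else {}
        os.makedirs(out_dir, exist_ok=True)
        for url in urls:
            data = fetch(url)
            crc = zlib.crc32(data)
            if manifest.get(url) == crc:
                continue                 # unchanged since last run: skip it,
                                         # even though no local copy exists
            manifest[url] = crc          # remember the new checksum
            name = url.rstrip("/").split("/")[-1] or "index.html"
            with open(os.path.join(out_dir, name), "wb") as f:
                f.write(data)            # keep only the new/changed page
        with open(MANIFEST, "w") as f:
            json.dump(manifest, f)

The point is that the decision about what is "new" comes from the stored
checksums, not from files on disk, so the download folder can be deleted freely.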

Why is that important to me? Well, here's where it might sound stupid, but if
somebody would be nice enough to answer, I'd appreciate it.

I'm addicted to webcomics. I used to open all of them, and it would take me
hours each day to scan through them all. When I went from college to my
current job I simply could not keep that up, so I went looking for a more
practical solution. I found one in Teleport Pro: with it I can download the
sites, check the "new only" button, and see only what is new on each site.
This cut down significantly on the time I spent on the web, time I no longer
had.

Usually I leave the computer running a batch file of all my Teleport projects,
and when I come back home I can read all the new material in a matter of
minutes, rather than spending hours just looking at the same sites that have
not updated in days.

Plus, I can delete the download folder after viewing the new material, saving
a lot of disk space, and be done with it, knowing that the next day Teleport
will still download only the new (or changed) files. The project file does not
check the files on disk; it keeps an internal database of the time/date/CRC of
the downloaded files inside the project file itself.
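
The date part of that check does not even need to re-download unchanged pages:
an HTTP conditional GET asks the server "has this changed since the timestamp I
stored?", and the server answers 304 Not Modified if not. Again, this is just
my own sketch of the general technique, not Teleport's (or HTTrack's) actual
internals:

    import urllib.request, urllib.error

    def changed_since(url, last_modified):
        # last_modified is the Last-Modified header string saved on the
        # previous run (or None on the first run).
        req = urllib.request.Request(url)
        if last_modified:
            req.add_header("If-Modified-Since", last_modified)
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                # Changed (or first fetch): return the body and the new stamp.
                return True, resp.read(), resp.headers.get("Last-Modified")
        except urllib.error.HTTPError as e:
            if e.code == 304:            # 304 Not Modified: nothing new
                return False, None, last_modified
            raise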

Usually it's a depth-0 operation, since most webcomics offer a single page
showing the most recent comic. Currently I have 357 sites: 324 at depth 0 (or
at most 1, in a few rare cases where the authors tend to muddy up their
updates often), and 33 set to "scan all the archives, because the authors
constantly rewrite them or post batches of 20 pages at a time"... I know it
sucks, but what can you do?

Now, everything I've read about this web spider on this site says that it is
yet another spider that checks the physical copy of the site on disk before
downloading or updating. But I also saw that it has lots of options, so I
might have missed the relevant one while looking, or it was written in
technicalese and I did not understand it.

Again, sorry if this sounds stupid; answers are appreciated.
 