HTTrack Website Copier
Free software offline browser - FORUM
Subject: Using HTTrack for site version control
Author: roger tubby
Date: 06/25/2017 14:55
 
First, I want to say that this is an incredible piece of software, and having it
open source is fantastic.

I am spidering some websites (with permission) that are built with Drupal. I
want to check the pages into git or Subversion and be able to see what has
changed since the last time I ran HTTrack.

Unfortunately, the pages are all generated dynamically, so the date in the
headers is useless as an indicator of change, and both Drupal and HTTrack alter
small parts of the page content on every run. Drupal appears to change various
HTML IDs and form fields that are unimportant to me, and HTTrack injects a
"<!-- Mirrored from ... -->" comment. As a result, every page is seen as
different and has to be updated.
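One way to ignore those strings without touching HTTrack itself is a post-processing pass that rewrites the mirrored pages before they are committed. Below is a minimal Python sketch, assuming the pages are `*.html` files under one directory and that the volatile Drupal markup is the hidden `form_build_id` and `form_token` inputs (with `name` appearing before `value`). Those patterns, the `NORMALIZED` placeholder, and the function names are hypothetical; adjust them to whatever actually differs between your runs. This is not the HTTrack plugin/callback API or the httrack-py library, just a plain regex pass, which is fragile against markup it was not written for.

```python
import re
from pathlib import Path

# Each rule is (compiled pattern, replacement). These patterns are ASSUMPTIONS
# about what changes between runs; adjust them to the markup your site emits.
RULES = [
    # HTTrack's "<!-- Mirrored from ... -->" comment, which embeds a timestamp.
    (re.compile(r"<!--\s*Mirrored from.*?-->", re.DOTALL), ""),
    # Drupal's per-request hidden form_build_id value (assumes name precedes value).
    (re.compile(r'(name="form_build_id"[^>]*?value=")[^"]*(")'), r"\1form-NORMALIZED\2"),
    # Drupal's hidden form_token value (assumed to appear for logged-in users).
    (re.compile(r'(name="form_token"[^>]*?value=")[^"]*(")'), r"\1NORMALIZED\2"),
]


def normalize_html(text: str) -> str:
    """Apply every rule in order and return the cleaned page text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


def normalize_mirror(root: str) -> int:
    """Rewrite every *.html file under root in place; return how many changed."""
    changed = 0
    for path in Path(root).rglob("*.html"):
        # newline="" keeps original line endings; surrogateescape round-trips
        # any bytes that are not valid UTF-8 without altering them.
        with path.open(encoding="utf-8", errors="surrogateescape", newline="") as f:
            text = f.read()
        cleaned = normalize_html(text)
        if cleaned != text:
            with path.open("w", encoding="utf-8", errors="surrogateescape", newline="") as f:
                f.write(cleaned)
            changed += 1
    return changed
```

Used as, for example, `normalize_mirror("mirror")` (a hypothetical directory name) after HTTrack finishes and before `git add` or `svn add`, so that only substantive page changes show up as modifications.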

If no one has a better solution, I would write a plugin preprocess callback that
essentially ignores these strings, probably using the httrack-py library. I'm
not sure what to do about the changed dates in the headers.
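Since the header dates cannot be trusted, one option is to ignore dates altogether and decide whether a page changed by hashing its content (optionally after a normalization step) and comparing against the previous run. A minimal Python sketch, assuming the pages are `*.html` files under one directory; the function names and the JSON digest file are hypothetical and not part of HTTrack or httrack-py.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict, List


def digest_tree(root: str, normalize: Callable[[str], str] = lambda s: s) -> Dict[str, str]:
    """Map each *.html file under root (relative POSIX path) to the SHA-256
    hex digest of its text after the optional normalize() step."""
    base = Path(root)
    digests = {}
    for path in sorted(base.rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        key = path.relative_to(base).as_posix()
        digests[key] = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
    return digests


def changed_pages(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare digest maps from two runs; dates play no part in the decision."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


def save_digests(digests: Dict[str, str], path: str) -> None:
    """Persist one run's digests as JSON so the next run can compare against them."""
    Path(path).write_text(json.dumps(digests, indent=2, sort_keys=True), encoding="utf-8")


def load_digests(path: str) -> Dict[str, str]:
    """Load the previous run's digests, or an empty map on the first run."""
    p = Path(path)
    return json.loads(p.read_text(encoding="utf-8")) if p.exists() else {}
```

A run might call `load_digests("digests.json")`, then `digest_tree("mirror", normalize=...)`, print the result of `changed_pages`, and finally `save_digests` for next time. git already compares content, so this is mostly useful as a quick change report or to decide which files are worth staging.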

Lastly, I find searching the forum almost useless. When the Google search
results are displayed, the results section has no scroll bars. I've turned off
all JavaScript blockers and tried both Chrome and Firefox.

Many thanks for this magnificent work!
 


All articles

Subject Author Date
Using HTTrack for site version control 06/25/2017 14:55
Re: Using HTTrack for site version control 07/21/2017 15:40



