I have downloaded a website using HTTrack. The site has a lot of ZIP, PDF, and
DOC files, which are hosted on separate domains. Because there were so many of
these files, I configured HTTrack to exclude them to speed things up.
My plan was to then parse all the HTML files that HTTrack downloaded for the
absolute links to those extra files, and fetch them with a download manager,
which downloads faster than HTTrack does.
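The link-collection step could look roughly like this. It is a minimal sketch using only Python's standard library, under a few assumptions: the mirror root is passed in as a path, the excluded files are the ones whose URL path ends in `.zip`, `.pdf`, or `.doc`, and the links appear as `<a href="...">`. The names `collect_links` and `collect_from_mirror` are hypothetical. The output is a plain list of URLs, one per line, which most download managers can import.

```python
import sys
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse

# Assumed set of extensions for the files HTTrack was told to skip.
EXTENSIONS = (".zip", ".pdf", ".doc")


class LinkCollector(HTMLParser):
    """Collects absolute http(s) hrefs whose path ends in one of EXTENSIONS."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                parsed = urlparse(value)
                if (parsed.scheme in ("http", "https")
                        and parsed.path.lower().endswith(EXTENSIONS)):
                    self.links.add(value)


def collect_links(html_text):
    """Return the set of matching absolute links found in one HTML document."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


def collect_from_mirror(mirror_root):
    """Walk every .htm/.html file under mirror_root and gather matching links."""
    found = set()
    for path in Path(mirror_root).rglob("*.htm*"):
        found |= collect_links(path.read_text(encoding="utf-8", errors="replace"))
    return sorted(found)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python collect_links.py /path/to/mirror > urls.txt
    for url in collect_from_mirror(sys.argv[1]):
        print(url)
```

Using `HTMLParser` rather than a regex here means quoting styles and entity-escaped `&amp;` in URLs are handled by the parser.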
I now have all the files and want to integrate them into my mirrored site so I
can browse it offline, but all the HTML files still point to the absolute links.
I have recreated the folder structure manually and copied the files into the
mirrored site's location.
My problem now is updating all the links in the HTML files from absolute links
to relative ones. The site is huge, so I don't want to do this manually.
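One way to script the rewrite is sketched below. It assumes the manually created folders follow the layout `<mirror root>/<host>/<path>`, so `http://files.example.com/docs/a.pdf` lives at `<mirror root>/files.example.com/docs/a.pdf`. If your folders differ, only `local_path_for` (a hypothetical helper, like the other function names here) needs changing. For each HTML file, it computes the target's path relative to that file's directory with `os.path.relpath`, and it only rewrites a link when the local copy actually exists. The `href` matching is a regex heuristic, not a full HTML parser, so back up the mirror before running it.

```python
import os
import re
from pathlib import Path
from urllib.parse import quote, unquote, urlparse

EXTENSIONS = (".zip", ".pdf", ".doc")

# Heuristic: matches href="..." or href='...' and captures prefix, quote, URL.
HREF_RE = re.compile(r"""(href\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)


def local_path_for(url, mirror_root):
    """Map http://host/dir/file.pdf to <mirror_root>/host/dir/file.pdf (assumed layout)."""
    parsed = urlparse(url)
    parts = unquote(parsed.path).lstrip("/").split("/")
    return os.path.join(os.path.abspath(mirror_root), parsed.netloc, *parts)


def relative_link(url, html_file, mirror_root):
    """Relative, URL-quoted href from html_file's folder to the local copy of url."""
    target = local_path_for(url, mirror_root)
    html_dir = os.path.dirname(os.path.abspath(html_file))
    rel = os.path.relpath(target, html_dir).replace(os.sep, "/")
    return quote(rel)


def rewrite_html(html_text, html_file, mirror_root):
    """Replace absolute links to downloaded files with relative ones."""
    def replace(match):
        prefix, quote_char, url = match.groups()
        parsed = urlparse(url)
        if (parsed.scheme not in ("http", "https")
                or not parsed.path.lower().endswith(EXTENSIONS)):
            return match.group(0)
        if not os.path.exists(local_path_for(url, mirror_root)):
            return match.group(0)  # no local copy: leave the absolute link alone
        new_href = relative_link(url, html_file, mirror_root)
        return f"{prefix}{quote_char}{new_href}{quote_char}"

    return HREF_RE.sub(replace, html_text)


def rewrite_mirror(mirror_root):
    """Rewrite every .htm/.html file under mirror_root in place."""
    for path in Path(mirror_root).rglob("*.htm*"):
        # latin-1 maps every byte to one character, so unchanged bytes
        # round-trip exactly whatever the page's real encoding is.
        text = path.read_text(encoding="latin-1")
        new_text = rewrite_html(text, str(path), mirror_root)
        if new_text != text:
            path.write_text(new_text, encoding="latin-1")
```

Calling `rewrite_mirror("/path/to/mirror")` once should then let you browse the mirror offline. Links to files you did not download are left pointing at the web.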
Have I approached this whole thing in the correct way? Please could anyone
offer a solution?
Many thanks