HTTrack Website Copier
Free software offline browser - FORUM
Subject: Can Someone Help me out with this?
Author: Solomon Boulos
Date: 09/22/2002 00:25
 
I know HTTrack can't currently merge two projects, so I'm 
going to write something to do it myself, but I need HTTrack 
to gather some files for me first.  I'm trying to make a 
copy of 
<http://www.osha.gov/SLTC/eyeandface_etool/index.html>

So I used that as the starting URL,
left the depth settings alone,
and set Links->Get non-HTML files.

I'm pretty happy with what it got for me.  However, 
this leaves out a lot of material that OSHA wants included.  So I 
wrote a program to find all the absolute URLs that were 
left over, which prints this:

<http://www.dol.gov/>
<http://www.osha.gov/>
<http://www.osha.gov/doc/outreachtraining/htmlfiles/subparte.html>
<http://www.osha.gov/doc/outreachtraining/outreachtraining.html>
<http://www.osha.gov/dts/osta/oshasoft/index.html>
<http://www.osha.gov/dts/osta/otm/otm_iii/otm_iii_6.html>
<http://www.osha.gov/html/disclaim_home.html>
<http://www.osha.gov/html/Feed_Back.html>
<http://www.osha.gov/html/subject-index.html>
<http://www.osha.gov/index.html>
<http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=FACT_SHEETS&p_id=142>
<http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=STANDARDS&p_id=10120>
<http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=STANDARDS&p_id=10269>
<http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=STANDARDS&p_id=10658>
<http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=STANDARDS&p_id=10665>
<http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=STANDARDS&p_id=9777>
<http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=STANDARDS&p_id=9778>
<http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=STANDARDS&p_id=9853>
<http://www.osha.gov/pls/oshaweb/owaredirect.html?p_url=http://www.asse.org/ShopOnline/books/standards/3322.htm>
<http://www.osha.gov/pls/oshaweb/owaredirect.html?p_url=http://www.cdc.gov/niosh/homepage.html>
<http://www.osha.gov/pls/oshaweb/owasrch.full_site_search>
<http://www.osha.gov/SLTC/index.html>
<http://www.osha.gov/SLTC/smallbusiness/sec7.html>
<http://www.osha.gov/SLTC/usersguide/view_print.html>
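
For anyone wanting to reproduce a list like the one above, here is a minimal Python sketch of a leftover-link finder. It is not the program I actually used; the function names and the mirror directory argument are made up for illustration, and it only looks at href/src attributes with a simple regular expression rather than a real HTML parser.

```python
import re
from pathlib import Path

# Captures href="..." or src="..." values that start with http:// or https://
ABSOLUTE_LINK = re.compile(
    r'''(?:href|src)\s*=\s*["'](https?://[^"'>\s]+)["']''',
    re.IGNORECASE,
)

def find_absolute_urls(html_text):
    """Return the sorted, de-duplicated absolute URLs found in one HTML string."""
    return sorted(set(ABSOLUTE_LINK.findall(html_text)))

def scan_mirror(mirror_dir):
    """Collect absolute URLs from every *.htm* file under mirror_dir.

    mirror_dir is a hypothetical path to the folder HTTrack wrote the copy into.
    """
    found = set()
    for path in Path(mirror_dir).rglob("*.htm*"):
        found.update(find_absolute_urls(path.read_text(errors="replace")))
    return sorted(found)
```

Relative links are skipped on purpose, since those are the ones HTTrack has already rewritten to point at local files.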

We'd probably remove a few, but the important ones are 
the references to the standards, like this one:

<http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=STANDARDS&p_id=9853>
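
Picking the standards links out of the full list can be done by looking at the query string. This is only a sketch: it assumes every standards page is identified by p_table=STANDARDS, which is what the URLs above suggest, and the function names are made up.

```python
from urllib.parse import urlsplit, parse_qs

def is_standard_link(url):
    """True when the URL's query string carries p_table=STANDARDS."""
    query = parse_qs(urlsplit(url).query)
    return query.get("p_table") == ["STANDARDS"]

def standards_only(urls):
    """Keep only the URLs that point at a standards document, in order."""
    return [u for u in urls if is_standard_link(u)]
```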

So I tried running HTTrack on just that file (or those 
sets of files).  What I want it to do is get that HTML 
file (output by the Oracle server), which it does just fine, 
but also include the images the page needs.  I thought the 
Get non-HTML files option would do this.  (I have set the 
maximum mirroring depth to 1 so it won't pick up all the 
standards; there are a lot of them.)  I'm wondering whether 
it's my settings, or whether it's because the URL I supplied 
doesn't end in .htm/.html or any other recognizable extension.

Any help would be appreciated.  Also, I don't really need it 
to preserve the file names: I found that the log files 
record the conversions, and I'll just use those to 
update the links in order to "merge" my projects.
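
Roughly, the link update I have in mind looks like the Python sketch below. It assumes the original-URL-to-local-file conversions have already been read out of the log files into a dictionary; how to parse the logs is not shown, and the names here (rewrite_links, url_to_local) are made up for illustration.

```python
import re

# Captures: (attribute name and '='), (quote character), (attribute value)
ATTR = re.compile(
    r'''((?:href|src)\s*=\s*)(["'])(.*?)\2''',
    re.IGNORECASE | re.DOTALL,
)

def rewrite_links(html_text, url_to_local):
    """Swap href/src values that appear in url_to_local for their local paths.

    url_to_local is a hypothetical dict {original URL: local file path}.
    Values that are not in the dict are left exactly as they were.
    """
    def swap(match):
        prefix, quote, target = match.groups()
        return prefix + quote + url_to_local.get(target, target) + quote
    return ATTR.sub(swap, html_text)
```

Matching whole attribute values, rather than doing a plain text replace, keeps a short URL such as <http://www.osha.gov/> from clobbering the front of longer URLs that were not mirrored.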

Thanks
 



