I know that you can't currently merge two projects, so I'm
going to write something to do it for me, but first I need
HTTrack to gather some material for me. I'm trying to make a
copy of
<http://www.osha.gov/SLTC/eyeandface_etool/index.html>
So I used that as the starting URL,
left the depths alone,
and set Links -> Get non-HTML files.
I'm pretty happy with what it got for me. However, it
leaves out a lot of material that OSHA wants included, so I
wrote a program to find all the absolute URLs that were
left over, which outputs this:
<http://www.dol.gov/>
<http://www.osha.gov/>
<http://www.osha.gov/doc/outreachtraining/htmlfiles/subparte.html>
<http://www.osha.gov/doc/outreachtraining/outreachtraining.html>
<http://www.osha.gov/dts/osta/oshasoft/index.html>
<http://www.osha.gov/dts/osta/otm/otm_iii/otm_iii_6.html>
<http://www.osha.gov/html/disclaim_home.html>
<http://www.osha.gov/html/Feed_Back.html>
<http://www.osha.gov/html/subject-index.html>
<http://www.osha.gov/index.html>
<http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=FACT_SHEETS&p_id=142>
<http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=STANDARDS&p_id=10120>
<http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=STANDARDS&p_id=10269>
<http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=STANDARDS&p_id=10658>
<http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=STANDARDS&p_id=10665>
<http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=STANDARDS&p_id=9777>
<http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=STANDARDS&p_id=9778>
<http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=STANDARDS&p_id=9853>
<http://www.osha.gov/pls/oshaweb/owaredirect.html?p_url=http://www.asse.org/ShopOnline/books/standards/3322.htm>
<http://www.osha.gov/pls/oshaweb/owaredirect.html?p_url=http://www.cdc.gov/niosh/homepage.html>
<http://www.osha.gov/pls/oshaweb/owasrch.full_site_search>
<http://www.osha.gov/SLTC/index.html>
<http://www.osha.gov/SLTC/smallbusiness/sec7.html>
<http://www.osha.gov/SLTC/usersguide/view_print.html>
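A program that lists the absolute URLs left in the mirror, like the one mentioned above, could look roughly like this. It is a minimal sketch, not the actual program used here, and it assumes that links HTTrack did not mirror stay as absolute `http://`/`https://` values in `href` or `src` attributes of the saved `.htm`/`.html` files. The names `find_absolute_urls` and `scan_mirror` are hypothetical.

```python
import re
from pathlib import Path

# Matches href="..." or src='...' whose value starts with http:// or https://.
# Assumption: leftover (unmirrored) links are still absolute in the saved pages.
ABS_URL = re.compile(r'''(?:href|src)\s*=\s*["'](https?://[^"']+)["']''', re.IGNORECASE)


def find_absolute_urls(html: str) -> set[str]:
    """Return the set of absolute URLs referenced by href/src attributes."""
    return set(ABS_URL.findall(html))


def scan_mirror(root: str) -> list[str]:
    """Walk a mirror directory and collect absolute URLs from every .htm/.html file."""
    urls: set[str] = set()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in {".htm", ".html"}:
            text = path.read_text(encoding="utf-8", errors="replace")
            urls |= find_absolute_urls(text)
    return sorted(urls)
```

Usage would be something like `for url in scan_mirror("path/to/mirror"): print(url)`, where the path is whatever directory the HTTrack project was saved to.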
We'd probably drop a few of the URLs in that list, but the
important ones are the references to the standards, like this one:
<http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=STANDARDS&p_id=9853>
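Picking the standards references out of that list can be done by parsing the query string rather than matching text. This is a sketch that assumes, based only on the URLs above, that every standard is served by `owadisp.show_document` with `p_table=STANDARDS`; the function name `standard_urls` is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def standard_urls(urls: list[str]) -> list[str]:
    """Keep only owadisp.show_document URLs whose p_table query value is STANDARDS."""
    kept = []
    for url in urls:
        parts = urlparse(url)
        # Only the Oracle document-display endpoint is of interest here.
        if not parts.path.endswith("owadisp.show_document"):
            continue
        query = parse_qs(parts.query)  # e.g. {"p_table": ["STANDARDS"], "p_id": ["9853"]}
        if query.get("p_table") == ["STANDARDS"]:
            kept.append(url)
    return kept
```

The resulting list could then be used as the set of start URLs for a separate HTTrack project.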
So I tried running HTTrack on just that file (or that set of
files). What I want it to do is get that HTML file
(generated by the Oracle server), which it does just fine,
but also include the images it needs. I thought the
"Get non-HTML files" option would do this (but I have set
the maximum mirroring depth to 1 so it won't pick up all
the standards; there are a lot of them). I'm wondering
whether it's my settings, or whether it's because the URL I
supplied doesn't end in .htm/.html or any other real extension.
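One way to narrow this down is to see where the page's images actually resolve to. Relative `src` values are resolved against the page URL's directory (`/pls/oshaweb/` here), not against the site root, so the images may live on a path or host that the project's settings don't cover. This is a diagnostic sketch only, under the assumption that the images are referenced by plain `<img src>` tags; `ImgCollector` and `image_urls` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgCollector(HTMLParser):
    """Collects the src attribute of every <img> tag (tag/attr names arrive lowercased)."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def image_urls(page_url: str, html: str) -> list[str]:
    """Return absolute URLs for every <img src> in html, resolved against page_url."""
    parser = ImgCollector()
    parser.feed(html)
    parser.close()
    return [urljoin(page_url, src) for src in parser.srcs]
```

Comparing these resolved URLs with what ended up in the mirror should show whether the images were skipped or were never referenced the way you expected.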
Any help would be appreciated. Oh, and I don't really need
it to preserve the file names: I found that the log files
contain the name conversions, and I'll just use those to
update the links in order to "merge" my projects.
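The link-updating step could be sketched as follows, assuming the log's URL-to-local-file entries have already been turned into a Python dict (parsing the log itself isn't shown, since its exact format isn't given here). Only `href`/`src` values that exactly match a known URL are rewritten, so a short URL like `http://www.osha.gov/` can't clobber part of a longer one. `rewrite_links` and the example mapping are hypothetical.

```python
import html
import re

# Captures the attribute prefix, the URL value, and the closing quote separately.
ATTR = re.compile(r'''((?:href|src)\s*=\s*["'])([^"']+)(["'])''', re.IGNORECASE)


def rewrite_links(page: str, url_to_local: dict[str, str]) -> str:
    """Rewrite href/src values that exactly match a key in url_to_local."""

    def swap(m: re.Match) -> str:
        prefix, value, quote = m.groups()
        # Attribute values may encode & as &amp;, so look up the unescaped form.
        target = url_to_local.get(html.unescape(value))
        return prefix + (target if target is not None else value) + quote

    return ATTR.sub(swap, page)
```

For example, with a mapping of `{"http://www.osha.gov/pls/oshaweb/owadisp.show_document?p_table=STANDARDS&p_id=9853": "standards/9853.html"}`, a link to that standard becomes `href="standards/9853.html"` and every other link is left alone.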
Thanks