> Hi everyone, I'm new to webhttrack and have a
> question as to whether the ".do" extension on the
> website I'm trying to mirror should be treated
> differently? Should I put something special in the
> mime types section of webhttrack to identify what
> ".do" is? If so what would that be? I googled it a
> little and most of what I read is over my head but
> it seems to have something to do with java and/or
> the website being mainly an online store. Here's the
> url just in case someone needs to look at the site
> to be able to help:
You shouldn't have to do anything special for the .do files. They are just
returning HTML so that's how HTTrack should save them.
You could run a quick test by creating a project, setting the start URL to:
<http://www.officedepot.com/browse.do?N=1000000291+10324+4294965720>
and setting the filters to:
-*
+www.officedepot.com/ddSKU.do?level=SK&id=434357&N=1000000291+10324+4294965720&An=browse
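To see why that pair of filters lets only the one product page through, here is a rough Python sketch of how scan rules of this kind are evaluated. It is not HTTrack's actual code. It assumes that only "*" acts as a wildcard, that rules are matched against the URL without its scheme, and that the last matching rule wins. The names rule_to_regex and is_allowed are made up for illustration.

```python
import re

def rule_to_regex(pattern):
    # Assumption: only '*' is a wildcard; '?', '+', '&' etc. are literal.
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + ".*".join(parts) + "$")

def is_allowed(url, rules, default=True):
    """Return True if the last matching +/- rule accepts the URL."""
    target = re.sub(r"^https?://", "", url)  # compare without the scheme
    verdict = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if rule_to_regex(pattern).match(target):
            verdict = (sign == "+")  # later rules override earlier ones
    return verdict

RULES = [
    "-*",
    "+www.officedepot.com/ddSKU.do?level=SK&id=434357"
    "&N=1000000291+10324+4294965720&An=browse",
]

product = ("http://www.officedepot.com/ddSKU.do?level=SK&id=434357"
           "&N=1000000291+10324+4294965720&An=browse")
other = "http://www.officedepot.com/promo.do?id=1"  # made-up URL for the test

print(is_allowed(product, RULES))
print(is_allowed(other, RULES))
```

The "-*" rule rejects everything, and the "+" rule after it re-admits the single .do page, so the test mirror stays tiny.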
> <http://www.officedepot.com/>
>
> I work for the company as a home based customer
> service rep and I have to use the site all day long
> and I don't particularly like the search engine
> being used, so I want to mirror it on my hd and
> index it somehow. Any ideas on the best way to index
> a large site like that? Thx for any help.
That's going to be a freaking huge mirror...
You should try to identify any page-types you could exclude, for example:
-www.officedepot.com/shop/items/add.do*
-www.officedepot.com/promo.do*
But there is simply so much there that it's going to take ages. Remember to set
a low transfer rate and a small number of connections.
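If you would rather run this from the command-line httrack than from WebHTTrack, the exclusions and limits above could be put together roughly as follows. This is only a sketch. The output directory name and the specific limits (2 connections, about 25 KB/s) are assumptions picked for illustration, and build_httrack_command is a made-up helper. -O sets the output path, -cN the number of connections, and -AN the maximum transfer rate in bytes per second.

```python
import shlex

def build_httrack_command(start_url, output_dir, filters,
                          connections=2, max_rate=25000):
    """Assemble an httrack argument list (hypothetical helper)."""
    cmd = ["httrack", start_url, "-O", output_dir]
    cmd += list(filters)              # scan rules, each starting with + or -
    cmd += [f"-c{connections}",       # number of simultaneous connections
            f"-A{max_rate}"]          # maximum transfer rate in bytes/second
    return cmd

cmd = build_httrack_command(
    "http://www.officedepot.com/",
    "officedepot-mirror",             # assumed output directory
    ["-www.officedepot.com/shop/items/add.do*",
     "-www.officedepot.com/promo.do*"],
)

# Print a shell-quoted command line for review before running it.
print(shlex.join(cmd))
```

Printing the command first rather than running it lets you check the filters and limits before starting a mirror that may run for days.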