Re: ".do" extension in website url? - HTTrack Website Copier Forum

Subject: Re: ".do" extension in website url?

Author: Leto

Date: 03/27/2006 04:54

> Hi everyone, I'm new to webhttrack and have a
> question as to whether the ".do" extension on the
> website I'm trying to mirror should be treated
> differently? Should I put something special in the
> mime types section of webhttrack to identify what
> ".do" is? If so what would that be? I googled it a
> little and most of what I read is over my head but
> it seems to have something to do with java and/or
> the website being mainly an online store. Here's the
> url just in case someone needs to look at the site
> to be able to help:


You shouldn't have to do anything special for the .do files.  They just return
ordinary HTML, so HTTrack should save them as HTML without any extra entry in
the MIME types section.

You could run a quick test by creating a project, setting the start URL to:
<http://www.officedepot.com/browse.do?N=1000000291+10324+4294965720>

and setting the filters to:

-*
+www.officedepot.com/ddSKU.do?level=SK&id=434357&N=1000000291+10324+4294965720&An=browse
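If you want to sanity-check what a filter list like that would let through before starting a crawl, here is a rough Python sketch. It is only an approximation and not HTTrack's own code: it assumes `*` means "any run of characters", that everything else is literal, and that the last matching rule wins. HTTrack's real matcher supports more syntax than this, and the helper names (`rule_to_regex`, `is_allowed`) are made up for illustration.

```python
import re


def rule_to_regex(pattern):
    # Treat every character literally except '*', which matches any run of characters.
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + ".*".join(parts) + "$")


def is_allowed(url, rules, default=True):
    """Approximate scan-rule evaluation: a later matching rule overrides earlier ones."""
    # Assumption: rules are written without the scheme, so strip it before matching.
    target = re.sub(r"^https?://", "", url)
    allowed = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if rule_to_regex(pattern).match(target):
            allowed = (sign == "+")
    return allowed


RULES = [
    "-*",
    "+www.officedepot.com/ddSKU.do?level=SK&id=434357"
    "&N=1000000291+10324+4294965720&An=browse",
]

if __name__ == "__main__":
    sku = ("http://www.officedepot.com/ddSKU.do?level=SK&id=434357"
           "&N=1000000291+10324+4294965720&An=browse")
    other = "http://www.officedepot.com/promo.do?id=1"  # hypothetical URL
    print(is_allowed(sku, RULES))
    print(is_allowed(other, RULES))
```

With those two rules, only the one ddSKU.do page passes the filters and everything else is rejected, which keeps the test mirror tiny.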



> <http://www.officedepot.com/>
> 
> I work for the company as a home based customer
> service rep and I have to use the site all day long
> and I don't particularly like the search engine
> being used, so I want to mirror it on my hd and
> index it somehow. Any ideas on the best way to index
> a large site like that? Thx for any help.

That's going to be a freaking huge mirror...
You should try to identify any page-types you could exclude, for example:

-www.officedepot.com/shop/items/add.do*
-www.officedepot.com/promo.do*

But there is simply so much there that it's going to take ages.  Remember to
set a low maximum transfer rate and a small number of simultaneous connections.
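If you ever run the command-line httrack instead of WebHTTrack, the same idea (exclusion filters plus throttling) could be put together roughly like this. Treat it as a sketch under assumptions: the output directory, the connection count and the rate limit are placeholder values I picked, and I'm assuming the filters can be passed as extra arguments after the options. `-O` (output path), `-cN` (number of connections) and `-AN` (maximum transfer rate in bytes per second) are the httrack options used. The script only builds and prints the command, it does not run it.

```python
import shlex


def build_httrack_command(start_url, output_dir, filters,
                          connections=2, max_rate_bytes=25000):
    """Build an httrack argument list with throttling and scan rules."""
    cmd = [
        "httrack", start_url,
        "-O", output_dir,          # where the mirror is written
        f"-c{connections}",        # simultaneous connections
        f"-A{max_rate_bytes}",     # maximum transfer rate, bytes per second
    ]
    cmd.extend(filters)            # scan rules, e.g. "-www.example.com/foo*"
    return cmd


EXCLUDES = [
    "-www.officedepot.com/shop/items/add.do*",
    "-www.officedepot.com/promo.do*",
]

if __name__ == "__main__":
    cmd = build_httrack_command(
        "http://www.officedepot.com/",
        "officedepot-mirror",      # hypothetical output directory
        EXCLUDES,
    )
    # Print a shell-quoted command line to review before running it by hand.
    print(shlex.join(cmd))
```

Printing the command first lets you check the quoting of the `*` filters before anything touches the site.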
