HTTrack Website Copier
Free software offline browser - FORUM
Subject: Default filters (aka Scan Rules) & external sites-
Author: Bandit
Date: 11/19/2009 06:31
 
>  - <http://www.httrack.com/page/2/en/index.html>
>  - WinHTTrack : Windows 9x/XP/Vista/Seven
>  - httrack-3.43-7.exe
> With the exe file name as
>  - WinHTTrack.exe

> 1 - Start it by clicking on the HTTrack Website Copier icon on the Desktop.
> 2 - Click Next
> 3a- Give project name as HTMLDOG
> 3b- Leave Project Category blank.
> 3c- Leave BasePath default as C:\My Web Sites\
> 3d- Click Next.
> 4a- Leave Action as default Download web site(s)
> 4b- In the Web Address:(URL) paste <http://www.htmldog.com/>
> 4c- Leave all the set options as default
> 4d- and click next.
> + Leave default settings on the next screen
> "please select connection parameters as necessary..."
> + Click Finish.
> 
> When I look at the download folder I see
>  C:\My Web Sites\HTMLDOG\hts-cache\
>  C:\My Web Sites\HTMLDOG\pagead2.googlesyndication.com\
>  C:\My Web Sites\HTMLDOG\www.csszengarden.com\
>  C:\My Web Sites\HTMLDOG\www.htmldog.com\
>  C:\My Web Sites\HTMLDOG\www.westciv.com\
> 
> It is downloading from these sites as well which I
> do not want. I just want the complete www.htmldog.com
> Thanks
> (Note: I am totally new at this and have not a clue about it)
>

Hello JJ,

I am kinda new at this too but have figured out a bit over the past few weeks.
One thing I'm not sure you understood is that seeing more "folders" than the
main site you wanted does not mean that those other sites have been
downloaded or mirrored for you.  It means that your main site uses files
(typically images, but in this case possibly also javascript or style sheet
files) that are *hosted* by an external domain, server, or site.  Your best
bet and least complicated option is to just leave them as they are.

If you just don't like the way it looks in the folder, you won't likely
"break" anything if you just manually remove (delete) the extraneous folders
after the mirror is complete.  Be sure to leave both your hts-cache folder as
well as the site's main folder.  If you plan to update the mirror regularly,
this might not be a good idea, but it would still work for the most part.

I think I am using the same version of WinHTTrack as you, downloaded from the
same location, except I think I got the "NoInstall" version.  I have the same
default filters you referred to in your other post:
+*.png +*.gif +*.jpg +*.css +*.js -ad.doubleclick.net/*
-mime:application/foobar

After futzing around with this thing for a while, I noticed a suggestion in
the built-in Help file under Advanced Filters.  What you *probably* want to do
to get exactly what you want is to go into "Set Options..." on the fourth step
(#4 - I renumbered your original) above and then go straight to the "Scan
Rules" tab.  This is where filters are set.  Delete everything that is in
there.  Seriously, you don't need to worry because you can just start over
with a new project and those default ones will reappear.  So, after you clear
it out, put in only these two "rules" or filters:
-* +www.htmldog.com/*

To see where I got that, press F1 inside WinHTTrack, then click WinHTTrack
under "How to Use", click Options, click Scan Rules (tab), then click Advanced
Filters at the bottom.  See the second (lol) section 2.
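
For anyone who wants to see why those two rules do the job, here is a rough
Python sketch of how I understand scan rules to be evaluated: the rules are
read left to right and the last one that matches a link decides.  This is only
an approximation and not HTTrack's actual code.  It assumes fnmatch-style
wildcards are close enough to HTTrack's "*", it skips "mime:" rules (those test
the content type, not the address), and the function name and example paths
are made up for illustration.

```python
from fnmatch import fnmatchcase


def decide(link, rules):
    """Return True (accept), False (reject), or None if no rule matched."""
    verdict = None
    for rule in rules.split():
        sign, pattern = rule[0], rule[1:]
        if pattern.startswith("mime:"):
            continue  # MIME-type rules are not modelled in this sketch
        if fnmatchcase(link, pattern):
            verdict = (sign == "+")  # a later match overrides an earlier one
    return verdict


# The default filters quoted earlier in this post.
DEFAULT_RULES = ("+*.png +*.gif +*.jpg +*.css +*.js "
                 "-ad.doubleclick.net/* -mime:application/foobar")
# The two-rule replacement suggested above.
STRICT_RULES = "-* +www.htmldog.com/*"

# Hypothetical links, written without the http:// prefix.
links = [
    "www.htmldog.com/guides/index.html",
    "www.westciv.com/banner.gif",
    "ad.doubleclick.net/banner.gif",
]

for link in links:
    print(link, decide(link, DEFAULT_RULES), decide(link, STRICT_RULES))
```

In this sketch the default rules accept the external .gif, which would explain
the extra folders, while the two-rule set rejects everything outside
www.htmldog.com.  A result of None just means no rule matched, in which case I
assume HTTrack falls back to its normal behaviour.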

Note there *are* other ways of doing this, but I hope this helps...
~     --b
 