> I would like to ask if there's a way to ignore all
> GET arguments (everything after the ?).
>
> The site I want to mirror always adds a
> ?timestamp=xxxxxxxxxxxxx, where the x are random digits.
>
> Is it possible to ignore everything after the ?
You can exclude links containing a query string:
-*?*
Or, more specifically:
-*?timestamp=*
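To see which URLs each rule would drop, here is a rough Python model. It is not HTTrack's own matcher. It assumes only `*` is a wildcard and `?` is matched literally, that a later matching rule overrides an earlier one, and that URLs matching no rule are kept. `filter_to_regex` and `is_allowed` are hypothetical helper names, and the URLs are made up. On a real command line, quote the rules (e.g. "-*?timestamp=*") so the shell does not expand `*` and `?` itself.

```python
import re


def filter_to_regex(pattern: str) -> re.Pattern:
    # Simplified model: '*' matches any run of characters;
    # every other character, including '?', is matched literally.
    parts = (re.escape(p) for p in pattern.split("*"))
    return re.compile(".*".join(parts), re.DOTALL)


def is_allowed(url: str, rules: list[str]) -> bool:
    # Assumption: rules look like "+pattern" or "-pattern", the last
    # matching rule decides, and URLs that match nothing are allowed.
    allowed = True
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if filter_to_regex(pattern).fullmatch(url):
            allowed = sign == "+"
    return allowed


urls = [
    "http://www.example.com/page.php",                          # kept by both rules
    "http://www.example.com/page.php?timestamp=1234567890123",  # dropped by both rules
    "http://www.example.com/search.php?q=httrack",              # dropped only by -*?*
]
for url in urls:
    print(url, is_allowed(url, ["-*?*"]), is_allowed(url, ["-*?timestamp=*"]))
```

The broad rule drops every URL with a query string, while the narrower one only drops those carrying a timestamp parameter right after the `?`.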
However, this means HTTrack will not follow those links at all. If those pages
are needed to reach the rest of the site (i.e. they link to other pages you
want), you still have to crawl them; excluding them loses those other pages too.
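A toy model of that caveat, assuming a made-up three-page site where the only route to `archive.php` goes through a timestamped link. The `SITE` graph and the `crawl` and `has_query` helpers are hypothetical and ignore everything else HTTrack does (depth limits, other filters, and so on).

```python
from collections import deque

# Hypothetical site: each page maps to the links found on it.
SITE = {
    "index.php": ["page.php?timestamp=111"],
    "page.php?timestamp=111": ["archive.php"],
    "archive.php": [],
}


def crawl(start, excluded):
    # Breadth-first crawl; excluded URLs are never fetched,
    # so the links on them are never discovered either.
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        for link in SITE.get(url, []):
            if link not in seen and not excluded(link):
                seen.add(link)
                queue.append(link)
    return seen


def has_query(url):
    # Rough stand-in for an exclusion rule like -*?*
    return "?" in url


print(sorted(crawl("index.php", lambda url: False)))
print(sorted(crawl("index.php", has_query)))
```

With no exclusion all three pages are reached; excluding query-string links also loses `archive.php`, even though its own URL has no query string.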
If the link is "page.php?timestamp=222", there is no way to tell HTTrack to
fetch just the "page.php" part.