> I am trying to figure out exactly what causes
> the 'file has moved from URLa to URLb' or 'file has
> temporarily moved' error message.
> How does the server know I am using an offline
> browser?
By the browser ID (User-Agent header) sent, by
detecting the robots.txt request, by detecting the
number of simultaneous requests, or by detecting that
all links in a page are being followed one by one.
You cannot avoid offline browser detectors, but for
webmasters there are simpler ways to keep offline
browsers away (see
<http://www.httrack.com/HelpHtml/abuse.html#WEBMASTERS>).
Generally, 'moved pages' are caused by mistyped links, such as:
<http://www.foo.com/bar> instead of
<http://www.foo.com/bar/>
(trailing / missing)
Or <http://www.foo.com/Bar/> instead of
<http://www.foo.com/bar/>
(case mismatch)
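
To check which of these two cases applies to a given 'moved'
message, the requested URL can be compared with the URL it was
redirected to. This is only a rough sketch: the function name
classify_redirect and its return strings are made up for
illustration and are not part of HTTrack.

```python
from urllib.parse import urlsplit


def classify_redirect(requested: str, target: str) -> str:
    """Guess why a server redirected `requested` to `target`.

    Hypothetical helper for illustration; not an HTTrack function.
    """
    a, b = urlsplit(requested), urlsplit(target)
    # Host names are case-insensitive, so compare them lowercased.
    if a.netloc.lower() != b.netloc.lower():
        return "different host"
    if a.path == b.path:
        return "same path"
    # /bar redirected to /bar/ : the trailing slash was missing.
    if a.path + "/" == b.path:
        return "trailing slash missing"
    # /Bar/ redirected to /bar/ : paths differ only by letter case.
    if a.path.lower() == b.path.lower():
        return "case mismatch"
    return "other"
```

Calling it with <http://www.foo.com/bar> and <http://www.foo.com/bar/>
reports the missing trailing slash; with <http://www.foo.com/Bar/> and
<http://www.foo.com/bar/> it reports the case mismatch.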
The default behaviour is to download only pages in the
current directory/location (plus some images and other
related files). A redirect from
<http://www.foo.com/Bar/> to <http://www.foo.com/bar/> may
therefore cause the link to be considered 'external'.
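
The effect can be pictured with a simple prefix test on the path.
This is a simplified sketch, assuming a case-sensitive comparison
against the directory of the starting URL; the name in_scope is
hypothetical and the real filtering rules are more involved.

```python
from urllib.parse import urlsplit


def in_scope(start_url: str, link: str) -> bool:
    """Return True if `link` lies in the directory of `start_url`.

    Assumption for illustration only: same host, and a
    case-sensitive match on the directory part of the path.
    """
    start, target = urlsplit(start_url), urlsplit(link)
    # Directory of the start URL: everything up to the last "/".
    base_dir = start.path[: start.path.rfind("/") + 1] or "/"
    same_host = start.netloc.lower() == target.netloc.lower()
    return same_host and target.path.startswith(base_dir)
```

Under that assumption, a mirror started at <http://www.foo.com/Bar/>
accepts <http://www.foo.com/Bar/page.html> but treats
<http://www.foo.com/bar/page.html> as outside the starting directory.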