HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Deleted Messages on Server
Author: Xavier Roche
Date: 12/21/2003 16:29
 
> I recently copied a website that is a password-protected
> message forum. I used the Capture URL feature to record my
> username and password. After completing the mirror, many,
> but not all, of the messages that I had posted on the site
> were deleted from the website, but were intact on my local
> copy that I had just made.
> Is there a setting or filter that I should have used to
> prevent this?
This is definitely a design bug on the server, because 
regular URLs (generating GET requests) should not have side 
effects on the database. In particular, "delete" or "move" 
actions should always be triggered by POSTed forms, so that 
regular crawlers do not mess up the forum when running.
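
To illustrate the point about GET requests, here is a minimal sketch of how a forum script could refuse destructive actions unless they arrive by POST. This is not the code of any real forum: the function name, the in-memory "messages" dictionary and the status codes are assumptions made up for the example, and only "delete" is handled ("move" would follow the same pattern).

```python
# Hypothetical dispatcher, for illustration only (not taken from a real forum).
DESTRUCTIVE_ACTIONS = {"delete"}


def handle_request(method, action, messages, message_id):
    """Return an HTTP-like status code; only a POST may modify `messages`."""
    if action in DESTRUCTIVE_ACTIONS:
        if method != "POST":
            # A crawler following a plain link sends GET: refuse, change nothing.
            return 405
        messages.pop(message_id, None)
        return 200
    # Read-only actions (displaying a page) are safe to serve on GET.
    return 200 if message_id in messages else 404


messages = {1234: "hello"}
print(handle_request("GET", "delete", messages, 1234))   # 405, message kept
print(handle_request("POST", "delete", messages, 1234))  # 200, message removed
```

With a server written this way, a crawler that follows every link would only ever receive refusals for the "delete" URLs.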

Anyway, the mandatory analysis to be done BEFORE crawling a 
forum is to list the kinds of links that make up the forum, 
for example:
- links to display a regular page, such as
<http://www.example.com/myforum/forum.cgi?id=1234&foobar=cherry>

- links to display the next page or previous page following 
a regular page, such as
<http://www.example.com/myforum/forum.cgi?id=1234&foobar=cherry&next>
which is, in this example, identical to:
<http://www.example.com/myforum/forum.cgi?id=1235&foobar=cherry>

- links that perform an action, such as delete or reply
<http://www.example.com/myforum/forum.cgi?delete&id=1234>
or
<http://www.example.com/myforum/forum.cgi?reply&id=1234>

Here, you'll have to use scan rules such as:

-* +www.example.com/myforum/forum.cgi* 
-www.example.com/myforum/forum.cgi*delete*
-www.example.com/myforum/forum.cgi*reply*
-www.example.com/myforum/forum.cgi*next*
-www.example.com/myforum/forum.cgi*previous*

These rules fetch all regular forum pages, but skip the 
"delete" and "reply" links, and also the "previous" and 
"next" pages, which would otherwise cause each page to be 
fetched in 3 identical versions, using 3X the bandwidth.
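
Before launching the real crawl, it can help to check by hand which URLs the rules above would keep. The following is a rough sketch of such a check, not HTTrack's actual matching code: it assumes that "*" matches any run of characters, that patterns are compared against the address without the "http://" prefix, and that the last matching rule wins. The function name is_allowed is made up for the example.

```python
from fnmatch import fnmatchcase

# The scan rules from this post, one per entry, in the same order.
RULES = [
    "-*",
    "+www.example.com/myforum/forum.cgi*",
    "-www.example.com/myforum/forum.cgi*delete*",
    "-www.example.com/myforum/forum.cgi*reply*",
    "-www.example.com/myforum/forum.cgi*next*",
    "-www.example.com/myforum/forum.cgi*previous*",
]


def is_allowed(url, rules=RULES):
    """Approximate check: later rules override earlier ones (an assumption)."""
    address = url.split("://", 1)[-1]  # drop the scheme, if any
    allowed = False
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(address, pattern):
            allowed = (sign == "+")
    return allowed


base = "http://www.example.com/myforum/forum.cgi"
print(is_allowed(base + "?id=1234&foobar=cherry"))       # True
print(is_allowed(base + "?id=1234&foobar=cherry&next"))  # False
print(is_allowed(base + "?delete&id=1234"))              # False
```

Note that a pattern such as *next* also rejects any regular page whose URL happens to contain the word "next", so the real link list of the forum should be checked first.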

You can also, optionally, include images or related files 
that could be located outside the forum:

+*.gif +*.jpg +*.png +*.css +*.js

 