Hello,
I've come across an occasional problem on websites
whose pages contain improper HTML: the URL of a link
on the site is written with two forward slashes ("/")
in a row.
For example, instead of <http://www.httrack.com/page0.php>
the link goes to <http://www.httrack.com//page0.php>
(the httrack.com pages themselves are fine; they only illustrate the pattern.)
On complex sites with many links back and forth between
pages, if HTTrack runs long enough you can end up with URLs like:
<http://www.httrack.com/////page0.php>
which are saved as c:\web\www.httrack.com_____\*.*
Web browsers follow these links the same way HTTrack does,
but for a browser this isn't a problem because it isn't trying
to save everything. The server software seems
irrelevant: it happens (or can happen) on sites running
either Apache or Microsoft IIS.
My current workaround is to set the internal depth
limit to 4 or 5 levels.
I think a better solution is to check URLs for
duplicated "/" characters and rewrite them with a single "/".
I can't think of a legitimate reason to have two "/"
in a row anywhere in a URL, except the initial http://
Nonetheless, this would be best implemented as a
configurable option, just in case it 'breaks'
compatibility with the way HTTrack mirrored sites
before.
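To make the idea concrete, here is a minimal sketch of the rewrite in Python. It is not HTTrack code; the function name `collapse_slashes` and the `enabled` switch (standing in for the configurable option) are hypothetical. It collapses repeated "/" only in the path part of the URL, so the "//" after "http:" is untouched, and it leaves the query string alone in case "//" appears there as data.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical helper, not part of HTTrack: collapses runs of "/" in the
# path component of an absolute URL. The "//" after the scheme is safe
# because urlsplit() separates scheme and host from the path.
def collapse_slashes(url: str, enabled: bool = True) -> str:
    if not enabled:  # hypothetical config switch: keep the old behaviour
        return url
    parts = urlsplit(url)
    # Only the path is rewritten; the query string and fragment are left
    # alone, since "//" there may be meaningful data (e.g. a redirect target).
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=path))

print(collapse_slashes("http://www.httrack.com/////page0.php"))
# http://www.httrack.com/page0.php
```

Restricting the rewrite to the path is a deliberate choice: it covers the cases above while avoiding changes to parameters that a server might read literally.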