Re: HTTrack does not capture pdf files correctly....

Subject: Re: HTTrack does not capture pdf files correctly....

Author: Xavier Roche

Date: 05/04/2004 23:13

> When we attempt to open a pdf from the copied site offline
> this message comes up File does not begin with '%PDF-' If
> we locate the file in the 'tree', or if we find it through
> a search on the computer, and open it from either of those
> two places this message comes up...

This is not an installation problem with Acrobat, or 
anything similar.
The problem is related to the way the site delivers its 
pdfs, probably with a redirect, and to how httrack reacts.

Basically, it means that links to pdf files "jump" to 
another place, which a regular browser handles perfectly 
(this is most probably a "META refresh", for those who 
know what I am talking about).

However, when httrack tries to fetch this link, this 
unexpected "jump" is a problem: the file was already named 
(for example, foo.pdf), and the page received is just a 
small html page with an embedded redirect.

The result: a "fake" pdf file with html data inside. 
The "real" pdf file is elsewhere. On a live site, this 
isn't a problem: the remote server gives the browser 
a "hint" (the content-type) telling it that the file is 
just an html redirect page, so the browser can "see" the 
redirect and finally reach the real PDF file. 
Unfortunately, on a "local" copy (filesystem), there is no 
such hint: when you try to open the file, Acrobat is 
launched, and Acrobat won't open it, because it is just 
html.

So I suspect that the real pdf file, which was probably 
downloaded, is "elsewhere" in the same project directory.
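To check whether a mirror is affected, the "fake" files can be spotted by their first bytes. The sketch below is only an illustration in Python, not part of httrack: the function name find_fake_pdfs and the regular expression are made up for this example, and it assumes the redirect really is a META refresh, which is only a guess here. It walks a project directory, reports every .pdf file that does not begin with '%PDF-', and prints the redirect target if one is found inside.

```python
import os
import re

PDF_MAGIC = b"%PDF-"

# Loose, illustrative pattern for <meta http-equiv="refresh" content="0; url=...">.
# Assumption: the site uses a META refresh; real pages may differ.
META_REFRESH = re.compile(
    rb"<meta[^>]+http-equiv\s*=\s*[\"']?refresh[\"']?[^>]*?"
    rb"content\s*=\s*[\"'][^\"'>]*?url\s*=\s*[\"']?([^\"'>\s]+)",
    re.IGNORECASE,
)


def find_fake_pdfs(mirror_root):
    """Return (path, redirect_target_or_None) for .pdf files that are not PDFs."""
    results = []
    for dirpath, _dirs, files in os.walk(mirror_root):
        for name in files:
            if not name.lower().endswith(".pdf"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                head = fh.read(4096)  # the redirect page is small
            if head.startswith(PDF_MAGIC):
                continue  # a genuine PDF, nothing to report
            match = META_REFRESH.search(head)
            target = match.group(1).decode("ascii", "replace") if match else None
            results.append((path, target))
    return results


if __name__ == "__main__":
    import sys

    for path, target in find_fake_pdfs(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path, "->", target or "(no META refresh found)")
```

Run it with the project directory as the only argument. The target it prints is the address given in the redirect page, so the real pdf still has to be located under the matching path in the mirror, if it was downloaded at all.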

I will try to track this problem in httrack, but it is 
definitely NOT a simple bug and it will require some work. 
There is no easy answer to this problem: it is a 
limitation in httrack, and I'll have to find a workaround.
