Hi, after all the tips and tricks I have found on this forum, I still have
trouble downloading the PDFs I'm after. I was hoping some of you could help
me on my way.
Maybe it's best to start with my error log:
HTTrack3.48-22+htsswf+htsjava launched on Thu, 19 May 2016 15:31:17 at
* +mime:text/html +*.pdf -*logout*
(winhttrack -qgC2%Ps0u1%s%u%I0p3DaK0H0%kf2A250000%f0#f -F "Mozilla/4.78 [en]
(Windows NT 5.0; U)" -%F -%l "en, *"
-O1 "C:\Users\USERNAME\Desktop\LOCATION" * +mime:text/html +*.pdf -*logout* )
Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may contain sensitive
such as username/password authentication for websites mirrored in this
do not share these files/folders if you want these information to remain
15:31:17 Warning: Retry after error -4 (No data (connection closed)) at link
(from primary/primary)
15:31:22 Warning: Retry after error -5 (Unable to get server's address: The
requested name is valid, but no data of the ) at link */ (from
15:31:24 Warning: Retry after error -4 (No data (connection closed)) at link
(from primary/primary)
15:31:24 Warning: Retry after error -5 (Unable to get server's address:
unknown error) at link */ (from primary/primary)
15:31:27 Error: "No data (connection closed)" (-4) after 2 retries at link
(from primary/primary)
15:31:27 Error: "Unable to get server's address: unknown error" (-5) after 2
retries at link */ (from primary/primary)
15:31:27 Warning: No data seems to have been transferred during this session!
: restoring previous one!
I'm on "WinHTTrack Website Copier 3.48-22" and I've been using the "Capture
URL" feature to get my PDFs.
Unlike in the tutorial, I don't end up with a "Link captured! You can now
restore your proxy settings" message.
Instead, after feeding HTTrack my login info, I end up with a "hts-post0" file
and an address in the URL field.
There are not many lines inside the "hts-post0" file; it looks like this:
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like
Proxy-Connection: Keep-Alive
Content-Length: 0
Host: joint5.prosjekthotell.com
Pragma: no-cache
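To double-check what was actually captured, I put together a small Python sketch that reads header lines like the ones above and reports the Content-Length. This is my own helper, not part of HTTrack: the function names are made up, and it assumes the file is just plain "Name: value" lines as shown. As far as I understand, a Content-Length of 0 means the captured request carried no body, so no form fields (such as my login) would be in it.

```python
def parse_headers(text):
    """Parse 'Name: value' lines into a dict with lower-cased names."""
    headers = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def captured_body_length(headers):
    """Return the declared body length, or 0 if missing or not a number."""
    value = headers.get("content-length", "")
    return int(value) if value.isdigit() else 0


if __name__ == "__main__":
    # Sample taken from the lines of my hts-post0 file quoted above.
    sample = (
        "Proxy-Connection: Keep-Alive\n"
        "Content-Length: 0\n"
        "Host: joint5.prosjekthotell.com\n"
        "Pragma: no-cache\n"
    )
    headers = parse_headers(sample)
    length = captured_body_length(headers)
    if length == 0:
        print("No request body was captured for host", headers.get("host"))
    else:
        print("Captured a body of", length, "bytes")
```

If that reading is right, the capture recorded the headers of a request but none of my login form data.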
Since I don't get the "OK message" from HTTrack, I wonder if I have to enter
my credentials in the "optional fields" in the URL dialog. The address of the
hts-post0 file is prefilled, so I have tried running the crawl both with and
without my credentials filled in.
I have the following Scan Rules:
"* +mime:text/html +*.pdf -*logout*"
HTTrack has never gotten as far as creating a cookies.txt file in my save
folder, so I used Chrome and an extension named cookies.txt to generate a
cookies.txt file and placed it in my project folder.
It includes a lot of lines with ASP session IDs, and two lines at the bottom
that stand out a little bit.
I figure they are a bit sensitive, so I won't post them.
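If it would help to see the structure of the file without the secrets, I could run something like the sketch below and post its output. It assumes the extension writes the usual Netscape cookies.txt layout (seven tab-separated fields: domain, subdomain flag, path, secure flag, expiry, name, value). The function name and the sample cookies are made up for illustration; they are not my real data.

```python
def summarize_cookies(text):
    """Return (domain, path, name, masked_value) for each cookie line."""
    rows = []
    for line in text.splitlines():
        # Some exporters mark HttpOnly cookies with this prefix.
        if line.startswith("#HttpOnly_"):
            line = line[len("#HttpOnly_"):]
        elif not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # not a cookie line in the assumed format
        domain, _flag, path, _secure, _expiry, name, value = fields
        # Mask the value so only its length is revealed.
        rows.append((domain, path, name, "*" * len(value)))
    return rows


if __name__ == "__main__":
    # Made-up sample; a real run would read the cookies.txt file instead.
    sample = (
        "# Netscape HTTP Cookie File\n"
        "example.com\tFALSE\t/\tFALSE\t0\tASPSESSIONIDEXAMPLE\tSECRET\n"
    )
    for domain, path, name, masked in summarize_cookies(sample):
        print(domain, path, name, masked)
```

That would show which domains and cookie names are in the file while keeping the session values hidden.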
I have turned off robots.txt.
The website I'm trying to copy from is a "Project hotel" that serves many
ongoing projects. I've selected one folder containing many PDFs, and there are
many previous versions of each file, nested recursively. The address I'm
trying to use with HTTrack is:
Any idea what I'm missing?