HTTrack Website Copier
Free software offline browser - FORUM
Subject: collection from repository using handle server
Author: kmk
Date: 06/06/2023 01:06
 
Hi,
I am trying to download a specific collection from a public repository that
uses a handle server. I have a few questions:
1.) The start page of the collection is 
<https://repository.university.edu/handle/10822/1052698>
The HTML pages and PDF files that I want to download actually live
at 
<HTTP://handle.net/10822/1052698>

2.) There are also links to full metadata records that I need to capture that
look like
<https://repository.library.georgetown.edu/handle/10822/1082137?show=full>

3.) While /10822/ is a folder, all of the university's collections and
documents are just sequential numbers after that. So
<https://repository.university.edu/handle/10822/1082189>
is a page within that collection, while
<https://repository.university.edu/handle/10822/559522>
is in a different collection entirely.

4.) The links to the PDF files that I want actually look like
<https://repository.university.edu/bitstream/handle/10822/1082137/FBI%20FISA%20Query%20Guidance%20Part%2001.pdf?sequence=1&isAllowed=y>

5.) A link to an alternate view that I do NOT want looks like
<https://repository.university.edu/static/flexpaper/template.html?path=/bitstream/handle/10822/1082137/FBI%20FISA%20Query%20Guidance%20Part%2001.pdf?sequence=1&isAllowed=y>
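To make the five URL shapes above concrete, here is a rough Python sketch that sorts a URL into one of them by host, path, and query string. The function name `classify` and the label strings are made up for illustration, and this is not how HTTrack itself decides anything. Note that the URL shape alone cannot tell which collection a /handle/10822/ page belongs to (point 3).

```python
from urllib.parse import urlsplit, parse_qs

def classify(url: str) -> str:
    """Sort a URL into one of the shapes listed in points 1-5 (hypothetical labels)."""
    parts = urlsplit(url)
    host = parts.hostname or ""   # hostname is lowercased by urlsplit
    path = parts.path

    if host == "handle.net":
        return "handle.net link"
    if path.startswith("/static/"):
        # The flexpaper viewer carries the PDF path in its query string,
        # but its own path starts with /static/.
        return "flexpaper view (unwanted)"
    if path.startswith("/bitstream/handle/"):
        return "pdf bitstream"
    if path.startswith("/handle/"):
        if parse_qs(parts.query).get("show") == ["full"]:
            return "full metadata record"
        return "short record"
    return "other"

if __name__ == "__main__":
    for u in [
        "https://repository.university.edu/handle/10822/1052698",
        "HTTP://handle.net/10822/1052698",
        "https://repository.university.edu/handle/10822/1082137?show=full",
        "https://repository.university.edu/bitstream/handle/10822/1082137/FBI%20FISA%20Query%20Guidance%20Part%2001.pdf?sequence=1&isAllowed=y",
        "https://repository.university.edu/static/flexpaper/template.html?path=/bitstream/handle/10822/1082137/FBI%20FISA%20Query%20Guidance%20Part%2001.pdf?sequence=1&isAllowed=y",
    ]:
        print(classify(u), "<-", u)
```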

So how do I use limits, filters, rules, and excludes to capture only that
collection, the PDF that is actually at handle.net, and both the short record
HTML page and the full metadata record HTML page, all while avoiding the
/static/flexpaper view?
My plan so far:
starting URL of 
<https://repository.university.edu/handle/10822/1052698>
include handle.net/10822/*
exclude /static/*
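
As a rough way to reason about the wildcard patterns themselves, the Python sketch below tests an include and an exclude pattern against host plus path. Everything here is an assumption for illustration: it uses Python's `fnmatchcase`, not HTTrack's own matcher, the exclude is written as `*/static/*` because `fnmatchcase` matches the whole string, "exclude beats include" is a simplification, and the second handle.net URL is a hypothetical one built from the other-collection number in point 3.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Patterns modelled on the plan above (assumed spelling, not HTTrack syntax).
INCLUDE = ["handle.net/10822/*"]
EXCLUDE = ["*/static/*"]

def decide(url: str) -> str:
    """Return 'exclude', 'include', or 'default' for a URL (simplified model)."""
    parts = urlsplit(url)
    target = (parts.hostname or "") + parts.path  # drop scheme and query string
    if any(fnmatchcase(target, p) for p in EXCLUDE):
        return "exclude"
    if any(fnmatchcase(target, p) for p in INCLUDE):
        return "include"
    return "default"  # left to whatever the crawler's normal scope rules say

if __name__ == "__main__":
    for u in [
        "https://repository.university.edu/handle/10822/1052698",
        "HTTP://handle.net/10822/1052698",
        "http://handle.net/10822/559522",  # hypothetical: other collection
        "https://repository.university.edu/static/flexpaper/template.html?path=/bitstream/handle/10822/1082137/FBI%20FISA%20Query%20Guidance%20Part%2001.pdf?sequence=1&isAllowed=y",
    ]:
        print(decide(u), "<-", u)
```

In this model the pattern by itself matches any handle under handle.net/10822/, including ones from other collections, so what actually gets fetched would depend on which links the crawler encounters, not on the pattern alone.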

Will "include handle.net/*" only include things that are linked from my
specific starting URL? Or is that going to end up bringing in all of
handle.net, or even all of handle.net/10822?
Does HTTrack know how to handle the bitstream and ?show=full stuff?
Clues, help, hints, and answers are all greatly appreciated!

kmk

 