HTTrack Website Copier
Free software offline browser - FORUM
Subject: Problems scraping drop-down menu
Author: Jamie Patrick-Burns
Date: 09/24/2015 16:29

I am scraping websites with the WiderNet Project in Chapel Hill, NC,
and have run into some problems with drop-down menus. The site is
<>. The problem pages are
<> and
<>, which lead to the journal and
blog archives and have drop-down menus to access the archives by month. These
pages themselves are scraped, but when I try to select another month I get a
404 error. For example, for the journal archive I get a message “The page
you requested <> could not be
found” and for the blog archive I get a message like “The page you
requested <> could not be
found.” I looked at the page source, and the drop-down options point to
relative links. I’m wondering whether some database or JavaScript setup is
causing the problem?
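
To illustrate the relative-link question, here is a minimal Python sketch, separate from HTTrack itself, that collects the value attributes of the option tags in a drop-down and resolves them against the address of the page they appear on. The page URL, the markup, and the names OptionLinkParser and resolve_option_links are all made-up placeholders, and the sketch assumes the menu stores its targets in option values, which may not match the real site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class OptionLinkParser(HTMLParser):
    """Collect the non-empty value attribute of every <option> tag."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            value = dict(attrs).get("value")
            if value:  # skip placeholder entries such as value=""
                self.values.append(value)


def resolve_option_links(page_url, html):
    """Return absolute URLs for the option values found in html."""
    parser = OptionLinkParser()
    parser.feed(html)
    return [urljoin(page_url, value) for value in parser.values]


# Hypothetical page address and markup, for illustration only.
PAGE_URL = "http://example.org/journal/archive"
SAMPLE_HTML = """
<select onchange="location = this.value;">
  <option value="">Select month</option>
  <option value="archive/2015/08">August 2015</option>
  <option value="/blog/2015/09">September 2015</option>
</select>
"""

links = resolve_option_links(PAGE_URL, SAMPLE_HTML)
for link in links:
    print(link)
```

If a menu like this only navigates through a script, a crawler that follows ordinary links may never request the month pages. In that case the resolved addresses could be listed as additional start URLs for the mirror, assuming those pages exist at those addresses on the live site.
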
Here are the parameters I used, copied from the doit file. The hts-log file
was too large to open. 
-%F "<!-- this file was mirrored for the egranary digital library from %s%s on
%s -->" -F "mozilla [en] egranary digital library system" -Q -C2 -t -%P -n -s0
-%s -%u -N0 -p3 -D -a -K5 -H0 -%k -f2 -A25000 -%A cgi,php,php3,asp=text/html
-%f0 -#f -q -X -#L -o0 -u2 -qwC2%Pns0u2k%s%uN0I0%I0p3DaH0%kf2o0A25000%f#f -F
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" -%l "en, en, *" -O1 "X:\\egCache\\" -* +*.css +*.js* -mime:application/foobar** +*.gif +*.jpg +*.png +*.tif +*.bmp +*.zip +*.tar
+*.tgz +*.gz +*.rar +*.z +*.exe +*.mov +*.mpg +*.mpeg +*.avi +*.asf +*.mp3
+*.mp2 +*.rm +*.wav +*.vob +*.qt +*.vid +*.ac3 +*.wma +*.wmv -#L10000000 -O
"X:\\egRawScraped\\,X:\\egCache\\" -%A
cgi=text/html -%A php,php3,asp=text/html

Thank you for any suggestions or help! 
