HTTrack Website Copier
Free software offline browser - FORUM
Subject: Drop down menus
Author: Jamie Patrick-Burns
Date: 02/25/2016 19:58
 
Hello,

I posted a question on September 24, 2015 and never received any response.
Does anyone have any insights or similar problems?
I am working on scraping websites with the WiderNet Project in Chapel Hill, NC
and have run into some problems with drop-down menus. The site is
<http://smallwarsjournal.com/>. The problem pages are
<http://smallwarsjournal.com/jrnl/iss/archive> and
<http://smallwarsjournal.com/blog/archive/201509>, which lead to the journal
and blog archives and have drop-down menus to access the archives by month.
These pages themselves are scraped, but when I try to select another month I
get a 404 error. For example, for the journal archive I get the message “The
page you requested <http://smallwarsjournal.com/jrnl/iss/archive> could not
be found,” and for the blog archive I get a message like “The page you
requested <http://smallwarsjournal.com/blog/archive/201509> could not be
found.” I looked at the page source, and the drop-down options point to
relative links. I’m wondering whether some database or JavaScript setup is
causing the problem. Here are the parameters I used, copied from the doit
file. The hts-log file was too large to open.
-%F "<!-- this file was mirrored for the egranary digital library from %s%s on
%s -->" -F "mozilla [en] egranary digital library system" -Q -C2 -t -%P -n -s0
-%s -%u -N0 -p3 -D -a -K5 -H0 -%k -f2 -A25000 -%A cgi,php,php3,asp=text/html
-%f0 -#f -q -X -#L -o0 -u2 -qwC2%Pns0u2k%s%uN0I0%I0p3DaH0%kf2o0A25000%f#f -F
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" -%l "en, en, *"
smallwarsjournal.com/ -O1 "X:\\egCache\\smallwarsjournal.com" -* +*.css +*.js
-ad.doubleclick.net/* -mime:application/foobar +smallwarsjournal.com*
-smallwarsjournal.com/user* +*.gif +*.jpg +*.png +*.tif +*.bmp +*.zip +*.tar
+*.tgz +*.gz +*.rar +*.z +*.exe +*.mov +*.mpg +*.mpeg +*.avi +*.asf +*.mp3
+*.mp2 +*.rm +*.wav +*.vob +*.qt +*.vid +*.ac3 +*.wma +*.wmv -#L10000000 -O
"X:\\egRawScraped\\smallwarsjournal.com,X:\\egCache\\smallwarsjournal.com"
-%A cgi=text/html -%A php,php3,asp=text/html
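
To make the relative-link observation concrete, here is a minimal Python sketch of how drop-down option values could be resolved into absolute URLs, for instance to check them by hand or to list them as extra start addresses. The sample markup and the option values are hypothetical (the real page source is not reproduced here), and the sketch assumes the menu stores its targets in the `value` attribute of each `<option>`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class OptionCollector(HTMLParser):
    """Collect the non-empty value attribute of every <option> tag."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            value = dict(attrs).get("value")
            if value:  # skip placeholder entries such as "Select month"
                self.values.append(value)


def resolve_options(html, base_url):
    """Return absolute URLs for all option values found in html."""
    collector = OptionCollector()
    collector.feed(html)
    return [urljoin(base_url, value) for value in collector.values]


# Hypothetical markup; the real menu may be built differently.
SAMPLE_HTML = """
<select name="archive">
  <option value="">Select month</option>
  <option value="/blog/archive/201508">August 2015</option>
  <option value="201507">July 2015</option>
</select>
"""

urls = resolve_options(
    SAMPLE_HTML, "http://smallwarsjournal.com/blog/archive/201509"
)
for url in urls:
    print(url)
```

If the menu builds its target with JavaScript at selection time rather than from plain option values, a list like this would be one way to hand the month pages to the crawler explicitly; whether that applies to these sites is an open question.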

I’m also having similar issues with this site: <http://www.mmrjournal.org/>.
The page <http://www.mmrjournal.org/archive> has two drop-down menus that link
to journal volumes. In my scrape, the volume 2 menu is already expanded and
the volume 1 menu doesn’t respond; when I click on volume 2, it collapses and
then stops responding as well.

Here are the parameters from the log file: 
HTTrack3.45-3+htsswf+htsjava launched on Fri, 18 Sep 2015 16:34:03 at -* 
+*.jpg  
+*.gif  
+*.css  
+*.js  
+*.png 
-*page2ad* 
-*google-analytics* 
-*paypal.com* 
-*akamaitech.com* 
-*google.com*  
-*doubleclick.com* 
-*googlesyndication* 
-*webtrendslive* 
-*sitemeter.com* 
-*ads.web.aol.com* 
-*nedstatbasic.net* 
-*webstats4u.com* 
-*webstats.motigo.com* 
-*statcounter.com* 
-*service.urchin.com* 
-*snap.com* 
-*stats.wordpress.com* 
-*visit.webhosting.yahoo.com* 
-*mailto* 
 +www.mmrjournal.org/* www.mmrjournal.org/ 
(httrack -%F "<!-- this file was mirrored for the egranary digital library
from %s%s on %s -->" -F "mozilla [en] egranary digital library system" -Q -C2
-t -%P -n -s0 -%s -%u -N0 -p3 -D -a -K5 -H0 -%k -f2 -A25000 -%A
cgi,php,php3,asp=text/html -%f0 -#f -q -X -#L -o0 -u2 -O
X:\egRawScraped\www.mmrjournal.org,X:\egCache\www.mmrjournal.org -%S
X:\UpdateSW\HTTrackScanRules\ScanRulesFull.txt +www.mmrjournal.org/*
www.mmrjournal.org/ )

Thank you for any suggestions or help! 
 