HTTrack Website Copier
Free software offline browser - FORUM
Subject: **Getting linked files hidden in HTML source code*
Author: Bandit
Date: 01/07/2010 19:50
 
I just noticed this post and tried a little experimenting myself.

I had some luck so I thought I would add what I discovered to the topic, in
case there is still some interest in this or another user finds this article
in a search.  Don't forget, though, if you have a follow-up question to this,
the best way to get an answer is to post it as a new topic.  Only new topics
go to the top of the list (as of this writing - 07-Jan-2010), not new *posts*
unfortunately.

If it matters, I am on Vista SP1 and the version of WinHTT I am using is:
 WinHTTrack Website Copier 3.33 (+swf)

While browsing MovingImageMusic.com (in Firefox 3.5.7), I noticed that you can
go into the Tunebank section to find the song files.  If you choose a Tag, it
brings up the song titles associated with that tag.  If you view the source of
one of these pages, you'll see commented-out anchor tags with "href"
attributes, such as <!--a href="/tunebank/show/1/"-->.  If you append that href
to the site's URL in the browser, e.g.
 <http://www.movingimagemusic.com/tunebank/show/1/>
you'll get a page that is pretty useless in the browser, but the HTML source
contains the name of the song and a reference to the mp3 location in its
javascript text.  Note that this could be changed (fixed/broken/removed) at
some point in the future by the author, but for now...
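
For anyone who would rather pull those commented-out links out of a saved page
with a script than by eye, here is a rough Python sketch.  The function name
and the sample markup are made up for illustration; only the shape of the
comment (<!--a href="..."-->) comes from the page source described above, so
the pattern may need adjusting if the site writes it differently.

```python
import re
from urllib.parse import urljoin

# Matches commented-out anchors shaped like <!--a href="/tunebank/show/1/"-->
COMMENTED_HREF = re.compile(r'<!--\s*a\s+href="([^"]+)"\s*-->')


def commented_links(html, base_url):
    """Return absolute URLs for every commented-out href found in html."""
    return [urljoin(base_url, href) for href in COMMENTED_HREF.findall(html)]


# Hypothetical sample markup; the real page will contain much more than this.
sample = (
    '<li>Song A <!--a href="/tunebank/show/1/"--></li>\n'
    '<li>Song B <!--a href="/tunebank/show/2/"--></li>'
)

for url in commented_links(sample, "http://www.movingimagemusic.com/"):
    print(url)
```

The printed URLs can then be used as starting URLs in HTTrack.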

Another side note: my browser, with Flash Player 9 (AFP 9), produced no sound
when I tried to play these songs with the "JW MP3 Player 3.12"
(/zinc_media/flash/mp3player.swf) embedded in the page(s) on that site.  I'm
not sure why, but I didn't really care to find out either :)

So...

Trying to mirror using only
 <http://www.movingimagemusic.com/>
as the starting URL seemed to miss those "show" links.  I noticed that a *few*
of the mp3's were downloaded when mirroring this way but upon further
examination, they were discovered by HTT because they are linked in pages
crawled under the Work section of the site.  To continue testing, I used the
default options (for 3.33) except I set the max HTML size to 8K and the max
non-html size to 80K and just looked at the log to see which "big files" it
*would have* downloaded.  This was just to save time and bandwidth.  The
filters/scan rules I set were:
 -* +www.movingimagemusic.com/*
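
If it helps to see how a rule string like that decides what gets fetched, here
is a small Python sketch of the idea.  It is only an approximation: it assumes
the rules are checked in order and the last matching rule wins (which is my
understanding of how HTTrack prioritises them), and it uses Python's fnmatch
wildcards as a stand-in for HTTrack's own pattern syntax.  The function name
and the example.com URL are invented for the example.

```python
from fnmatch import fnmatchcase


def allowed(url, rules, default=False):
    """Apply +/- wildcard rules in order; the last matching rule decides."""
    verdict = default
    for rule in rules.split():
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url, pattern):
            verdict = (sign == "+")
    return verdict


rules = "-* +www.movingimagemusic.com/*"

# URLs are written without the scheme, the same way the scan rules are.
print(allowed("www.movingimagemusic.com/tunebank/show/1/", rules))
print(allowed("www.example.com/song.mp3", rules))
```

Here "-*" first excludes everything, then the "+" rule lets the site's own
pages back in, so the first URL is accepted and the second is not.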

For the second trial, I generated a list of URL's following the pattern:
 <http://www.movingimagemusic.com/tunebank/show/{#}/>
using 1-99 as folders.  I changed nothing else in the settings, replaced
<http://www.movingimagemusic.com/> as my starting URL with the list of 99 URL's
and ran again.  This resulted in 90 - *not* 99 - additional folders (i.e.
"links") under www.movingimagemusic.com/tunebank/show/* and the log file
showed that 90 mp3's were not downloaded due to file size.  It is noteworthy
that the highest numbered folder under "/show/" was 93, and 3 folders in the
sequence 1-93 were missing (42, 68, & 75), which is why 90 links (93 - 3)
were discovered rather than 99.  So I adjusted the URL list to end at 93
and removed the 3 no-shows.  I will paste the list at the bottom of this post
to save time for anyone who wants to follow up on my experiment.
Additionally, to tidy it up some more, I assumed that ONLY the mp3's from the
site were wanted, so I next changed the filters/scan rules to
 -* +www.movingimagemusic.com/tunebank/show/*
  +www.movingimagemusic.com/media/audiostore/*
Again, this seemed to produce the wanted results, and the log file showed a
"perfect 90": 90 warnings for mp3's that were not downloaded due to size.  That
being the case, I think it's safe to say that all of the mp3's would be
downloaded without the file size restriction.  To back that up a little, I
removed the max file sizes and reran the mirror but cancelled after a few
minutes because I did not want to waste more bandwidth than necessary.
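
The URL list itself is easy to regenerate with a few lines of Python if the
numbering on the site changes.  This is just a sketch: the highest folder
number (93) and the three missing numbers (42, 68, 75) are the values I found
during this experiment, and the constant and function names are my own.

```python
BASE = "http://www.movingimagemusic.com/tunebank/show/{}/"
HIGHEST = 93            # highest numbered folder found under /show/
MISSING = {42, 68, 75}  # numbers in the sequence that returned nothing


def show_urls(highest=HIGHEST, missing=MISSING):
    """Build one /tunebank/show/N/ URL for every N that exists."""
    return [BASE.format(n) for n in range(1, highest + 1) if n not in missing]


urls = show_urls()
print(len(urls))  # 93 - 3 = 90
print("\n".join(urls))
```

The output can be pasted straight into HTTrack as the list of starting URL's.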

One caveat is that, obviously, if the author/webmaster puts the  

So, to summarize, use the generated "/show/" URL's as the starting URL's,
leave settings at default except the "scan rules" (and max xfer rate:
100KB/s), and RUN!  :)

Hope This Helps!!!
~BD

--
FOLLOWING IS ONLY USEFUL IF YOU WANT TO TRY THIS YOURSELF:

With only the URL list removed, the following is the command line from
DoIt.log:
-qwC2%Ps2u1%s%uN0%I0p3DaK0H0%kf2A100000%f#f -F "Mozilla/4.5 (compatible;
HTTrack 3.0x; Windows 98)" -%F "<!-- Mirrored from %s%s by HTTrack Website
Copier/3.x [XR&CO'2005], %s -->" -%l "en, en, *" {...LONG URL LIST...} -O
"D:\My Web Sites\test,D:\My Web Sites\test" -*
+www.movingimagemusic.com/tunebank/show/*
+www.movingimagemusic.com/media/audiostore/* -%A
php3,php,php2,asp,jsp,pl,cfm,nsf=text/html

After changing a few other settings that I usually change, plus some minor
tweaking, the command line can also be shortened a little to
-qwC2%Ps0u1%s%uN0%I0p3DaK0H0%kf2A100000%f#f -F "Mozilla/4.0 (compatible; MSIE
6.0; Windows NT 5.0)" -%F "" -%l "en, en, *" {...LONG URL LIST...} -O "D:\My
Web Sites\test,D:\My Web Sites\test" -* +*/media/audiostore/*

HERE IS THE GENERATED URL LIST (FOR COPY/PASTING):

<http://www.movingimagemusic.com/tunebank/show/1/>
<http://www.movingimagemusic.com/tunebank/show/2/>
<http://www.movingimagemusic.com/tunebank/show/3/>
<http://www.movingimagemusic.com/tunebank/show/4/>
<http://www.movingimagemusic.com/tunebank/show/5/>
<http://www.movingimagemusic.com/tunebank/show/6/>
<http://www.movingimagemusic.com/tunebank/show/7/>
<http://www.movingimagemusic.com/tunebank/show/8/>
<http://www.movingimagemusic.com/tunebank/show/9/>
<http://www.movingimagemusic.com/tunebank/show/10/>
<http://www.movingimagemusic.com/tunebank/show/11/>
<http://www.movingimagemusic.com/tunebank/show/12/>
<http://www.movingimagemusic.com/tunebank/show/13/>
<http://www.movingimagemusic.com/tunebank/show/14/>
<http://www.movingimagemusic.com/tunebank/show/15/>
<http://www.movingimagemusic.com/tunebank/show/16/>
<http://www.movingimagemusic.com/tunebank/show/17/>
<http://www.movingimagemusic.com/tunebank/show/18/>
<http://www.movingimagemusic.com/tunebank/show/19/>
<http://www.movingimagemusic.com/tunebank/show/20/>
<http://www.movingimagemusic.com/tunebank/show/21/>
<http://www.movingimagemusic.com/tunebank/show/22/>
<http://www.movingimagemusic.com/tunebank/show/23/>
<http://www.movingimagemusic.com/tunebank/show/24/>
<http://www.movingimagemusic.com/tunebank/show/25/>
<http://www.movingimagemusic.com/tunebank/show/26/>
<http://www.movingimagemusic.com/tunebank/show/27/>
<http://www.movingimagemusic.com/tunebank/show/28/>
<http://www.movingimagemusic.com/tunebank/show/29/>
<http://www.movingimagemusic.com/tunebank/show/30/>
<http://www.movingimagemusic.com/tunebank/show/31/>
<http://www.movingimagemusic.com/tunebank/show/32/>
<http://www.movingimagemusic.com/tunebank/show/33/>
<http://www.movingimagemusic.com/tunebank/show/34/>
<http://www.movingimagemusic.com/tunebank/show/35/>
<http://www.movingimagemusic.com/tunebank/show/36/>
<http://www.movingimagemusic.com/tunebank/show/37/>
<http://www.movingimagemusic.com/tunebank/show/38/>
<http://www.movingimagemusic.com/tunebank/show/39/>
<http://www.movingimagemusic.com/tunebank/show/40/>
<http://www.movingimagemusic.com/tunebank/show/41/>

<http://www.movingimagemusic.com/tunebank/show/43/>
<http://www.movingimagemusic.com/tunebank/show/44/>
<http://www.movingimagemusic.com/tunebank/show/45/>
<http://www.movingimagemusic.com/tunebank/show/46/>
<http://www.movingimagemusic.com/tunebank/show/47/>
<http://www.movingimagemusic.com/tunebank/show/48/>
<http://www.movingimagemusic.com/tunebank/show/49/>
<http://www.movingimagemusic.com/tunebank/show/50/>
<http://www.movingimagemusic.com/tunebank/show/51/>
<http://www.movingimagemusic.com/tunebank/show/52/>
<http://www.movingimagemusic.com/tunebank/show/53/>
<http://www.movingimagemusic.com/tunebank/show/54/>
<http://www.movingimagemusic.com/tunebank/show/55/>
<http://www.movingimagemusic.com/tunebank/show/56/>
<http://www.movingimagemusic.com/tunebank/show/57/>
<http://www.movingimagemusic.com/tunebank/show/58/>
<http://www.movingimagemusic.com/tunebank/show/59/>
<http://www.movingimagemusic.com/tunebank/show/60/>
<http://www.movingimagemusic.com/tunebank/show/61/>
<http://www.movingimagemusic.com/tunebank/show/62/>
<http://www.movingimagemusic.com/tunebank/show/63/>
<http://www.movingimagemusic.com/tunebank/show/64/>
<http://www.movingimagemusic.com/tunebank/show/65/>
<http://www.movingimagemusic.com/tunebank/show/66/>
<http://www.movingimagemusic.com/tunebank/show/67/>

<http://www.movingimagemusic.com/tunebank/show/69/>
<http://www.movingimagemusic.com/tunebank/show/70/>
<http://www.movingimagemusic.com/tunebank/show/71/>
<http://www.movingimagemusic.com/tunebank/show/72/>
<http://www.movingimagemusic.com/tunebank/show/73/>
<http://www.movingimagemusic.com/tunebank/show/74/>

<http://www.movingimagemusic.com/tunebank/show/76/>
<http://www.movingimagemusic.com/tunebank/show/77/>
<http://www.movingimagemusic.com/tunebank/show/78/>
<http://www.movingimagemusic.com/tunebank/show/79/>
<http://www.movingimagemusic.com/tunebank/show/80/>
<http://www.movingimagemusic.com/tunebank/show/81/>
<http://www.movingimagemusic.com/tunebank/show/82/>
<http://www.movingimagemusic.com/tunebank/show/83/>
<http://www.movingimagemusic.com/tunebank/show/84/>
<http://www.movingimagemusic.com/tunebank/show/85/>
<http://www.movingimagemusic.com/tunebank/show/86/>
<http://www.movingimagemusic.com/tunebank/show/87/>
<http://www.movingimagemusic.com/tunebank/show/88/>
<http://www.movingimagemusic.com/tunebank/show/89/>
<http://www.movingimagemusic.com/tunebank/show/90/>
<http://www.movingimagemusic.com/tunebank/show/91/>
<http://www.movingimagemusic.com/tunebank/show/92/>
<http://www.movingimagemusic.com/tunebank/show/93/>

{EOF}
 