Suppose I want to download all the zipped subtitles from a site.
If there's captcha protection, I don't think web crawlers can get past it (maybe
in the future they will be able to).
But even if there's no captcha protection, there's one simple thing that
prevents a crawler from ripping the files.
To give an example, let's look at the site tvsubtitles.net.
When we get a page with a "download" link, that link points to something like
"sitename/download-123.html".
But when the server receives this request, it returns raw data (the zip file)
instead of an HTML document.
Browsers handle this situation easily, i.e. they show a dialog asking where to
save the file.
But my crawler can't do that. How would I download such files?
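For what it's worth, a crawler doesn't actually need a "save dialog": it can just write the raw response body to disk. Here is a minimal sketch in Python using only the standard library; the URL is the hypothetical one from the question, and `fetch_file` / `filename_from_headers` are illustrative helper names, not part of any existing library. Some sites additionally check the `Referer` header or require a session cookie, which this sketch does not handle.

```python
import re
import urllib.request
from pathlib import Path

def filename_from_headers(headers, fallback="download.zip"):
    """Pull the suggested filename out of a Content-Disposition header, if any."""
    cd = headers.get("Content-Disposition", "") or ""
    match = re.search(r'filename="?([^";]+)"?', cd)
    return match.group(1) if match else fallback

def fetch_file(url, dest_dir="."):
    """Request the URL and write the raw response body to disk, whatever it is."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp:
        name = filename_from_headers(resp.headers)
        path = Path(dest_dir) / name
        path.write_bytes(resp.read())  # raw bytes: works for zip, html, anything
    return path

# Example (hypothetical URL from the question):
# fetch_file("http://www.tvsubtitles.net/download-123.html")
```

The key point is that the server usually sends a `Content-Disposition: attachment; filename="..."` header along with the zip data; the browser uses it to pre-fill the save dialog, and a crawler can use it the same way to pick the output filename.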