HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Exclude File Name
Author: lime kang
Date: 08/12/2025 03:57
 
Thanks a lot.
If you are using Scrapy, you can filter out the URLs you don't want in your
spider's parse callback, before each link is followed:
import scrapy

class MySpider(scrapy.Spider):
    name = "my_spider"

    def start_requests(self):
        urls = ['<URL>']
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for href in response.css('a::attr(href)').getall():
            if 'Wf6hTr0x' not in href:  # Exclude unwanted files
                yield response.follow(href, self.save_file)

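    def save_file(self, response):
        # One simple way to save the file: name it after the last URL segment
        filename = response.url.rstrip('/').split('/')[-1] or 'index.html'
        with open(filename, 'wb') as f:
            f.write(response.body)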
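
In case it helps, the exclusion check can be pulled out into a small helper so you can try it without running a crawl. This is only a sketch: should_follow and EXCLUDED_PARTS are names I made up, and it assumes you want to match against the path part of the URL only (the spider above matches against the whole href, query string included).

```python
from urllib.parse import urlsplit

# Hypothetical exclusion list; 'Wf6hTr0x' is the marker used in the spider above.
EXCLUDED_PARTS = ("Wf6hTr0x",)


def should_follow(href, excluded=EXCLUDED_PARTS):
    """Return True when the URL path contains none of the excluded substrings."""
    path = urlsplit(href).path
    return not any(part in path for part in excluded)


# Made-up sample links, just to exercise the helper.
links = [
    "/files/report.pdf",
    "/files/Wf6hTr0x.zip",
    "https://example.com/a/Wf6hTr0x/index.html",
]
kept = [href for href in links if should_follow(href)]
print(kept)
```

Inside the spider, the condition in parse would then become `if should_follow(href):`, and more names can be excluded by adding entries to the tuple.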
 

