HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: picture tag with source subtags
Author: Mitch B
Date: 06/22/2017 23:41
 
Here is a Python script I wrote that removes the srcset attribute from every img
tag. You can run it on your website after it has been downloaded.

# Given a PATH, go through all the .html files and delete the srcset
# attribute from every img tag.
# Files will be overwritten in place.

import os

from bs4 import BeautifulSoup

PATH = "<downloaded website path>"

# Walk PATH recursively and collect every .html file
result = [os.path.join(root, name)
          for root, dirs, files in os.walk(PATH)
          for name in files if name.endswith('.html')]

for filename in result:
    print(filename)
    with open(filename, 'r', encoding='utf-8') as f:
        data = f.read()
    soup = BeautifulSoup(data, 'html.parser')
    # Without srcset, the browser falls back to the plain src attribute
    for p in soup.find_all('img'):
        if 'srcset' in p.attrs:
            del p.attrs['srcset']
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(soup.prettify())
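The script above only touches `<img srcset>`. For the `<picture>` case this thread is about, a browser still chooses its image from the `<source srcset>` children, so those may need handling too. Below is a minimal sketch, assuming it is acceptable to drop every `<source>` inside a `<picture>` so the browser falls back to the inner `<img>`. It swaps BeautifulSoup for the standard library's `html.parser` so untouched tags are copied through exactly as written. The names `SrcsetStripper` and `strip_responsive_images` are hypothetical, not part of any library.

```python
from html import escape
from html.parser import HTMLParser


class SrcsetStripper(HTMLParser):
    """Hypothetical helper: re-emit HTML mostly unchanged, but drop srcset
    from <img> and drop <source> tags that sit inside a <picture>."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as-is in text
        super().__init__(convert_charrefs=False)
        self.out = []
        self.picture_depth = 0

    @staticmethod
    def _build_tag(tag, attrs, self_closing):
        # Rebuild a start tag; the parser unescaped attribute values, so re-escape
        parts = [tag]
        for name, value in attrs:
            if value is None:
                parts.append(name)
            else:
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        return '<' + ' '.join(parts) + (' />' if self_closing else '>')

    def _start(self, tag, attrs, self_closing):
        if tag == 'picture' and not self_closing:
            self.picture_depth += 1
        if tag == 'source' and self.picture_depth > 0:
            return  # assumption: falling back to the <img> is acceptable
        if tag == 'img' and any(name == 'srcset' for name, _ in attrs):
            kept = [(n, v) for n, v in attrs if n != 'srcset']
            self.out.append(self._build_tag(tag, kept, self_closing))
            return
        # Any other tag (including <source> in <video>/<audio>) is copied verbatim
        self.out.append(self.get_starttag_text())

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs, self_closing=False)

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        if tag == 'source' and self.picture_depth > 0:
            return
        if tag == 'picture' and self.picture_depth > 0:
            self.picture_depth -= 1
        self.out.append('</%s>' % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append('&%s;' % name)

    def handle_charref(self, name):
        self.out.append('&#%s;' % name)

    def handle_comment(self, data):
        self.out.append('<!--%s-->' % data)

    def handle_decl(self, decl):
        self.out.append('<!%s>' % decl)

    def handle_pi(self, data):
        self.out.append('<?%s>' % data)

    def unknown_decl(self, data):
        self.out.append('<![%s]>' % data)


def strip_responsive_images(html_text):
    parser = SrcsetStripper()
    parser.feed(html_text)
    parser.close()
    return ''.join(parser.out)
```

To use it in the loop above, you could write `f.write(strip_responsive_images(data))` instead of the BeautifulSoup step. Tracking `picture_depth` is what keeps `<source>` tags inside `<video>` and `<audio>` intact, since those use `src` rather than `srcset`.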
	

 


All articles

Subject | Date
picture tag with source subtags | 03/24/2015 21:21
Re: picture tag with source subtags | 06/01/2015 11:52
Re: picture tag with source subtags | 01/05/2016 09:09
Re: picture tag with source subtags | 06/22/2017 23:41
Re: picture tag with source subtags | 09/20/2017 13:05
Re: picture tag with source subtags | 06/27/2018 09:33
Re: picture tag with source subtags | 03/23/2019 21:41





Created with FORUM 2.0.11