| I guess the .ops file, which I'm not familiar with, comes from the WinHTTrack
client?
I'm only using the command-line version here.
Yes. For me the downloaded BBC page doesn't produce the protocol error, but
the result looks pretty much the same as what you described.
The actual content is there, but it's not nice to look at, and that
page is heavy on JavaScript. I bet the editing exercise was a pain
in the ass, wasn't it?
On the command line I tried these parameters:
---
httrack "https://www.bbc.co.uk/bitesize/guides/z26rcdm/revision/1" -v -s0 \
  -F "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36" \
  -%c2 -%B -%s -%v
---
Later I added some filter rules, because I saw a bunch of .js files referenced
in the page's source code. Apart from a few more downloaded files, that
didn't change the visual result at all.
---
httrack "https://www.bbc.co.uk/bitesize/guides/z26rcdm/revision/1" -v -s0 \
  -F "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36" \
  -%c2 -%B -%s -%v -%S "bbc-filters.txt" -i
---
where bbc-filters.txt contains this:
+https://www.bbc.co.uk/bitesize/guides/*
+https://fig.bbc.co.uk/*
+https://www.bbc.com/wwscripts/*
+https://static.bbc.co.uk/*
+https://nav.files.bbci.co.uk/*
+https://m.files.bbci.co.uk/*
+https://mybbc.files.bbci.co.uk/*
+https://sa.bbc.co.uk/*
+https://int.bbc.co.uk/*
+https://int.bbc.com/*
+https://test.bbc.co.uk/*
+https://test.bbc.com/*
+https://stage.bbc.co.uk/*
+https://stage.bbc.com/*
+https://live.bbc.co.uk/*
+https://live.bbc.com/*
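
A list like the one above could also be generated rather than typed by hand. The following is only a rough sketch, not what was used for the runs above: it assumes a POSIX shell, a locally saved copy of the page's HTML, and that the interesting hosts all appear in absolute `src="https://..."` attributes. The function name `make_filters` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of HTTrack): collect the hosts referenced by
# absolute src="https://..." attributes in a saved HTML file and print one
# "+https://host/*" line per distinct host, in the same form as the list above.
make_filters() {
    # $1: path to a saved copy of the page's HTML
    grep -o 'src="https://[^"/]*' "$1" |  # keep only scheme + host of each src
        sed 's|^src="||' |                # drop the attribute prefix
        sort -u |                         # one line per distinct host
        sed 's|^|+|; s|$|/*|'             # wrap as +https://host/*
}
```

Called as `make_filters saved-page.html > bbc-filters.txt`, it would write the rules to the file passed to `-%S`. It only sees hosts that appear literally in `src` attributes, so anything loaded dynamically by the scripts themselves would still be missed.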
| |