I am trying to harvest some unedited OCR output from HTML pages with URLs of the form:
<http://.../cgi-bin/witch/docviewer?did=060&seq=158&frames=0&view=text>
Each page is available both as a manuscript image (GIF, I think) and as text.
Is it possible to mirror all the text pages, from ...&seq=1 to the end of the chain?
In the text view, the page text sits in the following block of the HTML source:
<h3>Text of page:</h3>
<pre>
<P><B>Page: </B>158<P>
<I>
text here
</I> 91.<BR>
</pre>
Thanks.
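For illustration, here is a rough Python sketch of one way the harvesting could go. It is only a sketch under several assumptions: the helper names (`page_url`, `extract_page_text`, `harvest`) are made up, the base URL has to be supplied by the caller because the host is elided above, the OCR text is assumed not to contain stray `<` characters, and the chain is assumed to end at the first `seq` whose page lacks the "Text of page:" block (the real server may behave differently).

```python
import re
from html import unescape
from urllib.parse import urlencode

# Matches the block shown above: <h3>Text of page:</h3> followed by <pre>...</pre>.
PAGE_BLOCK = re.compile(
    r"<h3>Text of page:</h3>\s*<pre>(.*?)</pre>", re.DOTALL | re.IGNORECASE
)
# Naive tag stripper; assumes the OCR text itself contains no '<' characters.
TAG = re.compile(r"<[^>]+>")


def page_url(base, did, seq):
    """Build a text-view URL; `base` is the docviewer address up to the '?'."""
    query = urlencode({"did": did, "seq": seq, "frames": 0, "view": "text"})
    return f"{base}?{query}"


def extract_page_text(html):
    """Return the text inside the <pre> block, or None if the block is absent."""
    match = PAGE_BLOCK.search(html)
    if match is None:
        return None
    return unescape(TAG.sub("", match.group(1))).strip()


def harvest(base, did, fetch, max_pages=10000):
    """Collect page texts from seq=1 onward.

    `fetch` is any callable mapping a URL to the page's HTML as a string.
    Stops at the first page with no text block (an assumption about how
    the chain ends) or after `max_pages` requests.
    """
    pages = []
    for seq in range(1, max_pages + 1):
        text = extract_page_text(fetch(page_url(base, did, seq)))
        if not text:
            break
        pages.append(text)
    return pages
```

To run this against the live site, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read().decode("latin-1")`, where the encoding is a guess. Passing the fetcher in as an argument keeps the parsing testable offline and makes it easy to add a delay between requests.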