| You could try grabbing everything and filtering for the text. There's a Python
package called 'textract' that can dump text from many common file types, and
with a simple script you can create a text counterpart to every non-text file
(e.g., for every FOO.pdf you get a FOO.pdf.txt).
Why do you want to pull only text? | |
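For what it's worth, the "simple script" I mean could look roughly like the sketch below. It assumes textract is installed (`pip install textract`) and uses `textract.process`, which returns the extracted text as bytes. The function names, the `extract` parameter, and the choice to skip files that already end in `.txt` are my own for illustration, not anything textract prescribes.

```python
from pathlib import Path


def textract_extract(path):
    """Extract text from one file using textract (third-party package)."""
    import textract  # imported lazily so the script loads even without it
    return textract.process(str(path))  # returns bytes


def dump_text_counterparts(root, extract=textract_extract, skip_suffixes=(".txt",)):
    """Write FOO.ext.txt next to every FOO.ext under root.

    Returns the list of .txt paths that were written.
    """
    written = []
    # sorted() materializes the listing first, so files created below
    # are not picked up by the same walk.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() in skip_suffixes:
            continue
        target = path.with_name(path.name + ".txt")
        try:
            data = extract(path)
        except Exception as exc:  # unsupported or corrupt file: report and move on
            print(f"skipped {path}: {exc}")
            continue
        target.write_bytes(data)
        written.append(target)
    return written
```

Calling `dump_text_counterparts("downloads")` on a hypothetical `downloads` folder would leave a `FOO.pdf.txt` beside each `FOO.pdf`, and you can then search or filter the `.txt` files however you like.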