Certainly, what you ask (extracting specific text from PDF) can be done,
provided the PDF docs are not encrypted/secured by their creators. Text is
normally stored as text in PDF, not as bitmap images, so you can pull out
the text with the right tool. The exception is a PDF that was created from
a bitmap image (a scanned page, for example): there is no text in it to
extract, and you would need OCR instead. The PDF format is well documented
by Adobe.
Here are some PDF links:
http://www.adobe.com/prodindex/acrobat/adobepdf.html
http://www.ep.cs.nott.ac.uk/pdfcorner/
http://www.pdfzone.com/
See esp. here for extraction tools
http://www.pdfzone.com/products/software/toolinfo_extract.asp
I've written software to create PDF from various graphics/text, and
it wasn't too hard. If you need to write it yourself, software to
extract text should be a straightforward programming project
for a software engineer. Java is a good match for PDF, since
the standard ZIP libraries of Java (java.util.zip) work on PDF's
Flate-compressed data.
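
To illustrate the Java point, here is a minimal sketch, assuming the
stream bytes have already been located in the file and use the Flate
(zlib) filter. The class name FlateDemo, the helper names, and the sample
content string are made up for the example; real PDFs can also use other
filters, such as LZW, which this does not handle.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateDemo {

    // Decompress the raw bytes of a Flate-encoded stream (zlib format).
    public static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and nothing left to read: the data is cut short
                // or needs a preset dictionary, which this sketch does not supply.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported stream");
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Compress bytes the same way, only so the demo has input to work on.
    public static byte[] deflate(byte[] plain) {
        Deflater deflater = new Deflater();
        deflater.setInput(plain);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        // Hypothetical page content: one text-showing operator with a short sequence.
        String content = "BT /F1 12 Tf (ATGGCCAAGT) Tj ET";
        byte[] packed = deflate(content.getBytes(StandardCharsets.US_ASCII));
        String unpacked = new String(inflate(packed), StandardCharsets.US_ASCII);
        System.out.println(unpacked);
    }
}
```

The inflated bytes are the page's content operators; the text itself
sits in the parenthesized strings, so an extractor still has to parse
those out and deal with fonts and encodings.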
-- Don
In article <717801BBC100D211B89500805F6FAD93047D56 at snap01.synapticcorp.com>,
<Tvenkatesh at synapticcorp.com> wrote:
>I would like to know if there is software that can convert PDF file into
>text files.
>Specifically, we want to extract sequences from patent documents which are
>stored as images in PDF format. We tried Acrobat Reader; it did not help.
>I appreciate your help.
>Thanks
>Venky
>___________________________
>T. V. (Venky) Venkatesh, Ph D
>Senior Scientist (Bioinformatics and Molecular Biology)
>Synaptic Pharmaceutical Corporation
>215 College Road
>Paramus NJ 07652 - 1431
>201-261-1331x720 (Phone)
>201-261-0623(Fax)
>Tvenkatesh at synapticcorp.com
--
-- d.gilbert--biocomputing--indiana-u--bloomington-in-47405
-- gilbertd at bio.indiana.edu