James McEvoy

Member
  • Content count

    4

Community Reputation

0 Poker-Face

About James McEvoy

  • Rank
    Lurker
  1. OCR of previously part-OCRed document

    Hi Heidi, it is now over a month since you passed the files on to the development team. Has there been any feedback? Regards, James
  2. OCR of previously part-OCRed document

    Hi Heidi, attached are extracts of the same 55 pages (of 720) from the searchable original and from the version of the book OCRed by PDFElement. Four pages were missing from the original download, near the end of the extract, and these have been added. The original is searchable, but the added pages were image only. I have done this through the forum rather than by email because of my email server's file size restrictions. James
    Spencer & Gillen OCR pp 29-83.pdf
    Spencer & Gillen orig pp 29-83.pdf
  3. OCR of previously part-OCRed document

    Hi Heidi, thanks for the response. The original file is 36MB and the OCRed version is 94MB, which are too large to email. I could load them onto Google Drive and provide a link, I suppose. It's a while since I have done that and I will have to brush up on the process. Alternatively, I can send portions of the files which total less than 10MB. What do you think? James
  4. I have downloaded a very old book from Gooreader which had already been OCRed by Google. I inserted some pages into the document which were image only. The print is very faint, so I was curious as to what would happen if the whole document was OCRed through PDFElement. The process took far longer than for a book of similar length consisting only of image pages. The result came out in a darker print, which was an improvement. Google's previous OCR was ignored and the process was repeated: the added images were OCRed, and so were all the other pages, producing a new version of the text. Understandably it was not as good as Google's version, but that was to be expected. I am guessing there is no standard format for holding OCRed text, so PDFElement was not able to access Google's version of the text. I wondered what the program was doing which took so long, since it seemed to treat the document as images only. Is there an alternative procedure which would have improved on the outcome I obtained?