Master Scanned Book Processing: Acrobat Pro: Comprehensive Guide: Optimal Efficiency, Searchability

  • Published on Oct 20, 2024

Comments • 53

  • @stevenwoodfield1658
    @stevenwoodfield1658 11 months ago +1

    Thank you so much for your videos! I've found them a perfect starting point for my own digitizing journey. Before I go on to the next step in the process, I come back to reference your videos. Thank you for saving me time and a headache, blessings to you!

    • @DigitizeYourBooks
      @DigitizeYourBooks 8 months ago

      Thank you for the kind words! I'm glad I could help.

  • @etiennedegaulle3817
    @etiennedegaulle3817 3 years ago +2

    FYI, as of the December 2018 version of Acrobat DC, "the embedded index in the PDF is no longer used for searching." Hopefully they are adding an internal optimized index by default.
    Great video by the way! I just started digitizing most of my library. This video saved me lots of time and trial and error!

  • @larrythibodeaux7236
    @larrythibodeaux7236 10 months ago +1

    Thank you so much for this! I bought the CZUR Shine book scanner, and it scans flat paper OK, but converting the text into a searchable PDF is not good at all. This is way better than CZUR! Adobe doesn't produce unknown characters, and it doesn't combine 2 words into 1. It also doesn't change the order of the paragraphs when you copy and paste. Thank you for showing me this!

  • @scotto0010
    @scotto0010 6 years ago +3

    This is an amazing set of videos. Very concise when it can be but then plenty of detail here in the final video where it is really needed. You have a very good system and seem to have thought about everything. I would love to pick your brain about this type of project but replace the word "books" with "magazines". There are a whole host of additional issues to deal with on the magazine side. Thanks for your time and effort.

    • @PeterMosier
      @PeterMosier 6 years ago +1

      scotto0010 Thank you so much for the kind words. You’ve made my day! As for magazines: when I first started experimenting with and learning how to do this, I did a few magazines. The results weren’t quite as good as text-only books, which benefit from Clear Scan, but they were still pretty good. Some “moire” patterns in the photos, but I didn’t play enough to improve that totally. If you have a few questions, feel free to ask in this thread. Maybe I can help, or perhaps make a video about digitizing mags. Thanks again.

    • @scotto0010
      @scotto0010 6 years ago

      That would be great. The main things about magazines are:
      #1 How do you deal with yellowing of old magazine pages? Also, what is the best DPI to scan them at? Is it best to start big and down-sample from there? How would Clear Scan work with magazines, especially with colored pages and colored text? I suppose a lot of the questions, though, would entail the post-processing in Photoshop in order to make it look right. What size files would we be looking at for a "good", highly readable product? It seems that really only a few things would overlap between creating text-only books and picture/image-heavy magazines.

  • @wjhyde
    @wjhyde 5 years ago +1

    Thank you for this video. Very helpful.

    • @DigitizeYourBooks
      @DigitizeYourBooks 5 years ago

      Glad it was helpful. Have fun digitizing your books!

  • @MrsCalabresesTeachingChannel
    @MrsCalabresesTeachingChannel 6 years ago

    Great information! Thanks! Mac user here, I often find Adobe difficult to navigate, this helps!

  • @gabrielcastejon7914
    @gabrielcastejon7914 2 years ago +1

    You're a great samaritan

  • @UncleMatte
    @UncleMatte 5 years ago

    Thanks for this video, it showed me LOTS of things I was doing wrong, and how to improve things way beyond what I was doing. I was hoping to ask you a REALLY long question, probably too long to put here? Is there any way to send it to you, or can I put it on my Dropbox account for you to read if you have a spare moment or two? .... I can barely use Facebook, no clue about Twitter or any of the "other ones" .... I can post it here, but it might bore everyone! Thanks Again !!!

  • @theanthropic8114
    @theanthropic8114 6 months ago

    Thanks for the tips. For my part, I found it odd that after combining my .tiff files using Adobe Acrobat DC Pro, then converting them to searchable images at 300 dpi (originally 600 dpi), then to editable text and images, the file size somehow increased from 9 MB (after searchable images) to over 10 MB (after editable text and images).
    Any idea why? Thanks.

    • @DigitizeYourBooks
      @DigitizeYourBooks 6 months ago +1

      🤔 I have no idea why it swelled after editable text & images. Fortunately, going from 9MB to 10 MB is not a big difference. I wouldn't worry about it -- but I'm still wondering why it happened!

  • @lkj234
    @lkj234 6 years ago

    Great content! Keep up the good work!

  • @larrythibodeaux7236
    @larrythibodeaux7236 10 months ago

    Also, can you do a video on audiofying your books with the software Balabolka, meaning making them into audiobooks, and on buying a text-to-speech voice like the IVONA Amy voice?

    • @DigitizeYourBooks
      @DigitizeYourBooks 6 months ago

      Ooohhh... this touches a nerve for me. I have produced some audiobooks the old-fashioned way, by performatively reading the text and then painstakingly editing the production. (voice.mosier.ca/). Automated text-to-speech (TTS) is destroying the low end of the audiobook narration vocation. Having said that, I might do a video on this topic just to compare the results to a human reader. Thanks for the suggestion!

    • @larrythibodeaux7236
      @larrythibodeaux7236 6 months ago

      @@DigitizeYourBooks Haha that sounds great!

  • @gr3yg0at
    @gr3yg0at 5 years ago +1

    Great video. When I followed this for a PDF I have, 578 pages, the final file size is more than double the original file size. It started out as a 43 MB file. I did the first OCR ("searchable" text) and the file size was reduced to 31 MB. On the next OCR ("editable" text) the final file size jumped to 116 MB. That doesn't seem right. I have gone through the process twice with the same results. Any ideas?

    • @DigitizeYourBooks
      @DigitizeYourBooks 5 years ago

      Hmmmm, that is a real head-scratcher. I've not seen that before. Perhaps there is something unusual about this book? Perhaps lots of diagrams, that are more difficult to convert to "editable images"? Or maybe lots of weird fonts? Just a guess.
      This reminds me: I wish Adobe gave the option for "editable text" without also trying to create "editable images". I never want the images made to be editable, and have found for some engineering texts it really messes up parts of some images when making them editable (in dangerously subtle ways).
      The good news: because you followed my 2-step solution, you can now ditch the 2nd (larger) version, knowing that it is needlessly big. That is why I always do it as a 2-step: in case there is a problem with the "editable text" version. Cheers!
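
A rough sketch of the "keep the smaller version if the second pass swells" decision described above. The function names and the 1.5x threshold are hypothetical choices for illustration, not anything Acrobat or the video defines; the sizes are the ones reported in this thread.

```python
def size_growth(before_mb: float, after_mb: float) -> float:
    """Return the size ratio after/before for two versions of the same PDF."""
    if before_mb <= 0:
        raise ValueError("before_mb must be positive")
    return after_mb / before_mb


def pick_version(searchable_mb: float, editable_mb: float, max_growth: float = 1.5) -> str:
    """Keep the editable version unless it grew by more than max_growth.

    max_growth is an arbitrary, assumed threshold; tune it to taste.
    """
    ratio = size_growth(searchable_mb, editable_mb)
    return "editable" if ratio <= max_growth else "searchable"


# Sizes reported in the comment above: 31 MB searchable, 116 MB editable.
print(round(size_growth(31, 116), 2))  # 3.74
print(pick_version(31, 116))           # searchable
```

With the 9 MB to 10 MB case mentioned elsewhere in these comments, the same check would keep the editable version, since the growth is only about 11%.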

    • @gr3yg0at
      @gr3yg0at 5 years ago +1

      This book does have a lot of pictures and illustrations. Since you have mentioned them, I am starting to think that is what's causing this issue.
      My engineer brain is thinking there must be a way to exclude the images. Now I know how I'm spending my weekend.

  • @koritz123
    @koritz123 6 years ago +1

    Would the Adobe Acrobat Pro 2015 upgrade do an equivalent job, as opposed to leasing the 2017 version for a month? The 2015 upgrade is available for $65. I checked and, if I'm not mistaken, the 2017 version of Adobe DC leases for approximately $25 a month as of December 2017.

    • @DigitizeYourBooks
      @DigitizeYourBooks 6 years ago +1

      Hi koritz123, thanks for asking, I am flattered that you asked. However I do not know the answer. The key question is: does the 2015 version do the "Editable Text and Images" OCR feature? I think it was called "ClearScan" back then, and I don't know if Adobe made any changes to the algorithm when they changed the name. You should also know that the 2017 Subscription includes other services that may, or may not, be important to you, so you can consider that. Having said that, if the 2015 has ClearScan (aka Editable Text and Image OCR) and you don't need any newer features, then the 2015 version should be OK for you.

    • @koritz123
      @koritz123 6 years ago +1

      Digitize Your Books I found something online about the new version of Adobe Acrobat Pro DC 2017 being able to resize or rescale pages so they are more easily read in Kindle and other e-reader software. That being the case, I like the idea of being able to resize a PDF other than just cropping off the white part of the perimeter. This goes along with your comments about the 2017 version potentially having more useful features than an antiquated version that's a few years old.

  • @UncleMatte
    @UncleMatte 5 years ago

    I tried to contact you through Twitter, what a nightmare. It endlessly looped me to "how to" do this and that, but no way to sign up and fix things! A bit dizzy, I'll try again later. Either way your help is GREATLY appreciated!

  • @rodolfo6168
    @rodolfo6168 3 years ago +1

    Recommend Downsample: 300 dpi

    • @DigitizeYourBooks
      @DigitizeYourBooks 3 years ago

      I scanned at 300 dpi (see 10:15 in video). Are you recommending down-sampling to something lower than that? Thanks!

  • @UncleMatte
    @UncleMatte 5 years ago

    Before I go farther down the "rabbit hole": I have an older Acrobat 7 Pro version. Is it worth it for me to spend $100 to $150 on a much newer version of Acrobat? Thanks!

    • @DigitizeYourBooks
      @DigitizeYourBooks 5 years ago +1

      I would only suggest upgrading software if your current software is missing a feature you need. Specifically, I personally MUST have “Editable text and images” feature, as explained in this video. That feature has had different names in previous versions, and I don’t know whether or not v7 has that feature. If it does (by any name) then probably no need to upgrade. Cheers.

    • @UncleMatte
      @UncleMatte 5 years ago +1

      @@DigitizeYourBooks Thanks, I upgraded to version XI. A little different, but I'll get the hang of it eventually. Thanks for your advice.

  • @gr3yg0at
    @gr3yg0at 4 years ago +1

    I just finished digitizing another book. As I started to read through it I noticed Adobe Acrobat had changed some of the words. I compared it to the original scan and the paper book itself and confirmed words were being changed. What I have found is words are being changed during the step when changing text with the editable text and image option. I'm curious if anyone else is seeing this happen.

    • @DigitizeYourBooks
      @DigitizeYourBooks 4 years ago

      I haven't noticed this but it may be possible. I have noticed where engineering graphs and drawings get modified during the "Editable text and Images" process. It is for this reason that I first do a conventional OCR, and then repeat the OCR using "Editable Text" -- just in case the "Editable" process messes up.
      My guess for what is happening: OCR is not perfect. And when using "Editable Text" method, the image of the word(s) is replaced by the OCR result. So if there is an error in the OCR, that error is now "baked in" the final text.
      Thanks for commenting. Cheers!
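
One way to hunt for such "baked in" changes is to diff the text of the two versions word by word. The sketch below assumes you have already exported plain text from both the regular-OCR PDF and the "Editable Text" PDF (how you export it is up to you); `changed_words` is a hypothetical helper name, and the sample sentences are made up. Keep in mind that the regular OCR text is itself not guaranteed correct, so the diff only flags places where the two passes disagree and the paper book is still the final check.

```python
import difflib


def changed_words(reference_text: str, editable_text: str) -> list[tuple[str, str]]:
    """Return (reference, editable) pairs of word runs that differ between two texts."""
    ref_words = reference_text.split()
    new_words = editable_text.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "equal" runs are identical in both texts; everything else is a difference.
        if tag != "equal":
            changes.append((" ".join(ref_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes


# Made-up sample sentences, not taken from any real book.
reference = "the modern reader expects a searchable copy"
editable = "the modem reader expects a searchable copy"
print(changed_words(reference, editable))  # [('modern', 'modem')]
```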

    • @gr3yg0at
      @gr3yg0at 4 years ago

      @@DigitizeYourBooks I'm also curious why using the "editable text" more than doubles the file size. My book went from 22 MB to 88 MB. The book did have a lot of images and I wonder if this is what's causing the jump in file size.

    • @DigitizeYourBooks
      @DigitizeYourBooks 4 years ago +1

      Another YouTube viewer had the same issue. I suspect you are correct: images seem to be not handled well with Acrobat's "Editable Text and Images" option. I really, REALLY, wish Adobe would give us an option for "Editable Text" which doesn't try to make the images editable. In addition to file size growing, I have seen it mangle the images, but so subtly that it isn't obvious -- truly dangerous for an engineering textbook. That is the main reason that my process is two-step: (1) regular OCR and (2) Editable Text/Images OCR.
      Hope this info helps. Cheers!

  • @daithiocinnsealach1982
    @daithiocinnsealach1982 5 years ago +2

    That book Voodoo Science looks interesting, but the cover is awful. I'm even more shocked to see it's an Oxford Press book. It looks like an attempt by an amateur self-publisher, rather than a professional cover made by one of the largest and most prestigious publishers in the world...

    • @DigitizeYourBooks
      @DigitizeYourBooks 5 years ago +1

      Agreed: not a very impressive cover on this book. Contents are interesting, though.

  • @DanielRamos-zx1kh
    @DanielRamos-zx1kh 7 years ago

    Hi Peter, is there any way to private message you?

    • @DigitizeYourBooks
      @DigitizeYourBooks 7 years ago +2

      I am on Twitter @PeterMosier you can follow me, and then DM me there if you like. What did you want to talk about?

    • @DanielRamos-zx1kh
      @DanielRamos-zx1kh 7 years ago

      I just tried to DM you on Twitter, but I can only do that if you follow me. Anyway, I just OCRed a scanned book, but there are some texts that aren't recognized by the OCR. Look here: i.imgur.com/avlUkD4.jpg
      Do you know how to get these texts recognized?

    • @DigitizeYourBooks
      @DigitizeYourBooks 7 years ago +1

      I suspect the problem is poor contrast. That is, instead of black and white (the normal for most books) your example had light grey text on a non-white background. From my experience, poor contrast confuses OCR.
      You can try playing with the contrast settings in your scanning software to try to increase the contrast so that it works. However, you might not be able to ever get it to OCR correctly, especially for the very light grey text.
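
To illustrate what "increasing the contrast" means numerically, here is a minimal sketch of a linear contrast stretch on grey levels (0 = black, 255 = white). This is a generic image-processing idea, not the setting any particular scanning program exposes; `stretch_contrast` and the sample grey values are hypothetical, and real scans would be processed in an image editor or library rather than on a plain list.

```python
def stretch_contrast(pixels: list[int], low: int, high: int) -> list[int]:
    """Linearly map grey levels so that `low` becomes 0 and `high` becomes 255."""
    if high <= low:
        raise ValueError("high must be greater than low")
    scale = 255 / (high - low)
    stretched = []
    for value in pixels:
        # Clip to [low, high] first so the result always stays within 0..255.
        clipped = min(max(value, low), high)
        stretched.append(round((clipped - low) * scale))
    return stretched


# Hypothetical grey levels: light grey text (170) on an off-white page (230).
row = [230, 230, 170, 170, 230]
print(stretch_contrast(row, low=170, high=230))  # [255, 255, 0, 0, 255]
```

The stretch can only separate text from background if their grey levels differ to begin with, which is why very light grey text may still fail to OCR.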

    • @DanielRamos-zx1kh
      @DanielRamos-zx1kh 7 years ago +1

      Digitize Your Books Thanks for your response! And do you know how to type that specific part manually?

    • @DigitizeYourBooks
      @DigitizeYourBooks 7 years ago +1

      Manual corrections may or may not be possible, depending on which software you are using.