How to convert PDFs to audiobooks with machine learning

  • Published on Oct 23, 2024

Comments • 722

  • @edwincaballero690
    @edwincaballero690 3 years ago +9

    Title says "How to convert PDFs to audiobooks with machine learning" it does not say "How to convert PDFs to audiobooks for the first time ever groundbreaking discovery with machine learning". Appreciate the work and explanation.

  • @snlagr
    @snlagr 4 years ago +489

    Using that font size approach to remove garbage text was clever!

    • @tusharmaurya1668
      @tusharmaurya1668 4 years ago +15

      But it will also remove the headings and sub-headings (b/c it is not in common font size)....

    • @lvishal8556
      @lvishal8556 4 years ago +5

      U can give the garbage text a different type font not size. Then u can just remove that type font. In that way no other issue will be seen I guess

    • @ieornl
      @ieornl 4 years ago +1

      @tusharmaurya1668 It would be looking at the actual font size of the garbage text, not just the font size of the body; that way, the headings won't be excluded.

    • @archimidiz
      @archimidiz 4 years ago

      @lvishal8556 Isn't the problem how to identify the garbage text from the actual content? If we knew which text was garbage, which we would have to know in order to give it a different font as you said, we wouldn't have to go through all this trouble.

    • @lvishal8556
      @lvishal8556 4 years ago

      @archimidiz then what's ur approach?
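
The font-size heuristic discussed in this thread could be sketched roughly as follows. This is an illustration only, not the code from the video: the `(text, font_size)` span format, the function name `filter_by_font_size`, the `tolerance` value, and the `keep_larger` switch (which assumes headings are set larger than body text) are all assumptions made for the example.

```python
from collections import Counter


def filter_by_font_size(spans, tolerance=0.5, keep_larger=False):
    """Keep text spans set in (roughly) the most common font size.

    spans: list of (text, font_size) tuples, assumed to come from an
           OCR or PDF-parsing step that reports a size per span.
    keep_larger: if True, also keep spans larger than the body size,
           on the assumption that those are headings.
    """
    if not spans:
        return []

    # Weight each size by character count so body text dominates
    # even when there are many short page numbers or captions.
    weights = Counter()
    for text, size in spans:
        weights[round(size, 1)] += len(text)
    body_size = weights.most_common(1)[0][0]

    kept = []
    for text, size in spans:
        is_body = abs(size - body_size) <= tolerance
        is_heading = keep_larger and size > body_size
        if is_body or is_heading:
            kept.append(text)
    return kept


spans = [
    ("Body text one that is long.", 10.0),
    ("Page 3", 8.0),
    ("More body text here.", 10.0),
    ("Heading", 14.0),
]
print(filter_by_font_size(spans))
print(filter_by_font_size(spans, keep_larger=True))
```

With `keep_larger=False` the heading is dropped along with the page number, which is the concern raised above; `keep_larger=True` is one possible way around it, but only if the unwanted text is smaller than the body text.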

  • @kayambilampulamasaka739
    @kayambilampulamasaka739 4 years ago +26

    A lot of people don't understand the sense of fulfilment and satisfaction that comes with building things yourself. Using existing apps is a good idea but building your own opens you up to a whole world of new possibilities. Thanks for sharing!

  • @driziiD
    @driziiD 4 years ago +1425

    lol, i am literally working on this exact project, with the exact APIs, to solve the exact problem of listening to technical books

    • @dalemarkowitz8021
      @dalemarkowitz8021 4 years ago +66

      Lol! I'm curious to know how you handle the problem of deciding what to include in the audio!

    • @DenITDao
      @DenITDao 4 years ago +11

      @dalemarkowitz8021 that's an AutoML Tables task, I think

    • @skaterope
      @skaterope 4 years ago +33

      and I'm gonna build a startup from it hahahahaa

    • @moustafarahal3396
      @moustafarahal3396 4 years ago +11

      @dalemarkowitz8021 I guess someone needs to build a machine learning model/project to classify the parts of a page into different categories; then we decide which parts to read based on what "we know" each one is

    • @SakibAhmedSkB
      @SakibAhmedSkB 4 years ago +6

      @moustafarahal3396 I am thinking of something like training a TensorFlow object detection model with labeled pages to detect "important" portions and then feeding the OCR with the cropped portions... IDK if it would be a super noob approach... (since I am :3 )

  • @bryngerard4334
    @bryngerard4334 4 years ago +185

    I achieved this myself about 12 years ago, but not using ML. I merely formatted the PDF files so that they were easily processed by the voice engine in Acrobat Reader. It worked fine, although I didn't have the range of voices you have now. I got used to 'Marvin' (sounded a bit like Hawking) and listened to many books. I did go so far as to create .wav files from them, but the data was massive; converting to MP3 would have been possible, but that is still a great deal of data. I had all of the text in a Word file, which I would print to a PostScript file and convert to PDF using Ghostscript.
    This approach would have been useful but I think using a voice engine is a lot more efficient.

    • @9888622400
      @9888622400 4 years ago +3

      I did the same 5 years ago, using pyPDF and eSpeak!

    • @asandax6
      @asandax6 4 years ago

      Instead of MP3 convert to AAC

    • @misutasa
      @misutasa 4 years ago +1

      There's a lot of stuff I'm missing out just because I'm not resourceful enough huh

    • @KS-wt6yg
      @KS-wt6yg 4 years ago

      @asandax6 how

    • @UserUnknown07
      @UserUnknown07 4 years ago +3

      The art of Bodge !
      I use @ Voice Aloud Reader. It's simple and has many voices too.

  • @hereiamrakshith
    @hereiamrakshith 4 years ago +1395

    I thought there's an app ready to download here.

    • @MridulSharma21
      @MridulSharma21 4 years ago +57

      No, she just converted the pdfs into mp3 files.

    • @sreenivasulu7554
      @sreenivasulu7554 4 years ago +18

      Same here.😒

    • @artthatsnice8817
      @artthatsnice8817 4 years ago +5

      me too🤣

    • @ihtesham_emon
      @ihtesham_emon 4 years ago +104

      And my Moon+ reader can do this automatically without any machine learning 😄

    • @RKEDITZ
      @RKEDITZ 4 years ago +7

      @ihtesham_emon thanks

  • @adiparzival4682
    @adiparzival4682 4 years ago +459

    Why not do it the easier way?
    1. Convert pdf to epub file.
    2. Upload the epub file to Google Play Books, then download it in the same app.
    3. Use the read aloud feature and enjoy.

    • @alexjr977
      @alexjr977 4 years ago +29

      boi . Where is read aloud

    • @adiparzival4682
      @adiparzival4682 4 years ago +28

      @alexjr977 After steps 1 and 2, when u open the epub file and touch any page, u will find three vertical dots at the upper right corner. There u'll find the read aloud feature.

    • @alexjr977
      @alexjr977 4 years ago +12

      @adiparzival4682 bro, I uploaded this on the website, not in the app. But thank you for sharing the knowledge 😀🤗

    • @adiparzival4682
      @adiparzival4682 4 years ago +15

      @alexjr977 Also I was thinking, why work so hard when things are easy? 😅
      It's like u have a bike and u ride around ur entire city to get to ur school, which is just a 15-minute walk from ur home 😂😂

    • @alexjr977
      @alexjr977 4 years ago +5

      @adiparzival4682 Lol

  • @sanapalalakshmipathi
    @sanapalalakshmipathi 4 years ago +665

    Don't do with Mathematics books.

  • @bucketofbarnacles
    @bucketofbarnacles 4 years ago +19

    This is a fantastic idea for any PDF, not just research papers. I would take it two steps further. 1) Optionally converting it into a podcast format. As a podcast, I can listen to it in Airr, which allows you to bookmark audio clips and transcribe those clips. This is like taking notes while you listen, and is long-walk friendly. 2) Put this up on a web front-end for public use.

    • @Beautiful1234
      @Beautiful1234 1 year ago

      How do you do this?

    • @travv88
      @travv88 1 year ago

      that sounds like something I'd like. Often when I listen to things while walking I hear something I want to take note of and have thought of some solution like this.

  • @purpleghost3863
    @purpleghost3863 4 years ago +6

    This type of project puts a smile on my face. It’s fun, useful, and simple to understand and implement

  • @multiversodebolso
    @multiversodebolso 1 year ago +1

    I am completely in love with this Google tool! The videos on my channel will be made completely like this and even in two languages!

  • @NathanSubramani
    @NathanSubramani 4 years ago +3

    You have shared the approach, highlighting the ML functions & other GCP modules.
    It gives a clear pathway into how the problem was resolved.
    Good one.

  • @robmonkriedlinger
    @robmonkriedlinger 3 years ago +5

    This is so exciting - perfect for my Dyslexic daughter to use at school - thank you!!!

  • @adityashaw3198
    @adityashaw3198 4 years ago +165

    Add this tech in an e-book reader and launch something like kindle, Google

    • @itsmerg5273
      @itsmerg5273 4 years ago +2

      yo HALO fan

    • @sasimitra5871
      @sasimitra5871 4 years ago +11

      It's already been there for years
      Google Play Books and Kindle have a read aloud option. AND Microsoft Edge's pdf viewer has a read aloud option.

    • @4TH4RV
      @4TH4RV 4 years ago

      I am pretty sure Google books has that

    • @johnjordan3552
      @johnjordan3552 4 years ago +1

      There are apps like that but they don't have advanced settings over the TTS, a shame really

    • @flavioallemand5217
      @flavioallemand5217 4 years ago

      Some PDFs are actually images so it’s necessary first to convert the file into intelligible text and afterwards audio files.

  • @vidasale5633
    @vidasale5633 3 years ago +2

    At first I had no clue. Watching the video a 2nd time I got 20%, the 3rd time I got 40%, and the 4th time I got 70%. Wow, no more for today. I need to clear my head and watch it again tomorrow... Thank you very much!

  • @sadiqadeyanju
    @sadiqadeyanju 4 years ago +2

    I used balabolka to extract the text from the PDFs, used find and replace to remove the garbage and used Amazon Polly to save the audio to an S3. Downloaded 32 hours audio from academic papers for my MBA class.

    • @hectorprx
      @hectorprx 4 years ago

      Hello, I was also using Polly but was limited to 100k characters, which required batching the job. Did you figure out a workaround? Thanks

    • @sadiqadeyanju
      @sadiqadeyanju 4 years ago

      @hectorprx nope, let me know if you ever do

  • @carlitosviewsyou
    @carlitosviewsyou 4 years ago +10

    Wow before this video came out, I put together a rudimentary version of this with AWS Polly. Your font-size based approach to cutting out the junk is so good.

    • @VilokReddy
      @VilokReddy 4 years ago

      AWS Polly is an excellent tool

    • @hectorprx
      @hectorprx 4 years ago +1

      Hello, I was also using Polly but was limited to 100k characters, which required batching the job. Did you figure out a workaround? Thanks

    • @VilokReddy
      @VilokReddy 4 years ago

      @hectorprx I used Polly a long time ago; I don't have much of an idea about it now

    • @carlitosviewsyou
      @carlitosviewsyou 4 years ago +1

      @hectorprx My script iterates through the entire text and, at every word, adds the word length to a characterCount variable. I use the characterCount variable to make sure it grabs 100k characters or less at a time, and if a word pushes it past the limit, I drop the word and send the request before starting the next chunk. There's no way to increase the limit as far as I can tell.
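
The word-by-word batching described in this reply might look something like the sketch below. It is not the commenter's script: the function name `chunk_text` is hypothetical, the 100,000-character default simply mirrors the figure quoted in this thread, and the actual text-to-speech request is left out so the example stays self-contained.

```python
def chunk_text(text, limit=100_000):
    """Split text into chunks of at most `limit` characters, breaking only
    between words. A word that would push a chunk past the limit is held
    back and starts the next chunk instead."""
    chunks = []
    current = []   # words collected for the chunk being built
    count = 0      # characters in the current chunk, including joining spaces

    for word in text.split():
        if len(word) > limit:
            raise ValueError("a single word is longer than the chunk limit")
        # One extra character for the space that joins this word to the chunk.
        extra = len(word) + (1 if current else 0)
        if count + extra > limit:
            chunks.append(" ".join(current))
            current, count = [word], len(word)
        else:
            current.append(word)
            count += extra

    if current:
        chunks.append(" ".join(current))
    return chunks


for chunk in chunk_text("aa bb cc dd", limit=5):
    # In a real pipeline each chunk would be sent to the TTS service here.
    print(repr(chunk))
```

Splitting on sentence boundaries rather than single words would probably give more natural pauses between audio files, at the cost of slightly less tightly packed requests.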

  • @jofx4051
    @jofx4051 4 years ago +73

    Google: Upload video
    Also people: *There are many apps existing before you*

    • @Mzulfreaky
      @Mzulfreaky 4 years ago +6

      Well, GCP is for developers so the video uploaded is targeted for developers to build their own in Google's cloud

  • @javiermiranda-lozano9184
    @javiermiranda-lozano9184 2 years ago +2

    Great Video! But I was wondering if you could do a step-by-step video because I am having trouble with the minute details?

  • @aravindkrishnasaravu2223
    @aravindkrishnasaravu2223 4 years ago +45

    Wow..This is legit the practical use of ML in real life which millions of people want to do..converting them into audiobooks is very helpful for college students and as a project to do it would be very helpful..Thank you, GCP...Now I know why I had subscribed to you guys..Thanks again :-)

    • @dalemarkowitz8021
      @dalemarkowitz8021 4 years ago +2

      Aw, glad to hear it! That's why we make 'em! :)

    • @bhavyadhingra9463
      @bhavyadhingra9463 2 years ago

      @dalemarkowitz8021 is it a free service?

  • @harshavardhannaik3325
    @harshavardhannaik3325 4 years ago +4

    Most PDF readers already provide this feature of 'read out aloud'. I have been using this feature for more than 9 years.
    But, it reads the page numbers as well which isn't a big deal.

  • @alanwheeler1
    @alanwheeler1 2 years ago

    The @Voice Aloud app for Android has been reading PDF research papers, articles, and entire books to my ears since 2015!

  • @Blue-_-Jay
    @Blue-_-Jay 4 years ago

    I use an app called Read Aloud on PC and eReader Prestigio on mobile. The voice sounds pretty natural to me and is customizable. On top of that, the text can be edited to your liking. As a lawyer, I find it helps with reading those lengthy judgements and books on the go. I am addicted to them now and can't imagine my life without them. No more being scared of heavy texts and books; I just need the PDF.

  • @elvisjmartis
    @elvisjmartis 4 years ago

    Since I do not have much expertise in coding, what I did was use OCR and align the content in a Word file. Then I used the Microsoft Immersive Reader to read out the text for me.

  • @RaviTrek
    @RaviTrek 4 years ago +4

    This is what I was working on a few years ago, but the approach used here is amazing.

  • @tinselinkl
    @tinselinkl 4 years ago +1

    I've been using text aloud for all my pdf. So far so good. It reads blogs too or whatever on the screen.

  • @NoSTs123
    @NoSTs123 26 days ago

    What a cool concept, Thanks for bringing it alive!

  • @anispinner
    @anispinner 4 years ago +1

    Finally... A channel without ads

  • @rahmankhan566
    @rahmankhan566 4 years ago +18

    This has already been available in Adobe for a few months.

  • @parasdahiya1823
    @parasdahiya1823 4 years ago

    Pocket app does this really well. I'm not sure if it handles PDFs but their own voice outputs are really good.

  • @anshulshrivastava7685
    @anshulshrivastava7685 4 years ago +4

    I am trying with different approach, like removing the words (

  • @TomConder
    @TomConder 4 months ago

    We have powerful LLMs these days. If you were to write this today, would you leverage them to filter out the junk words?

  • @yogasounds1
    @yogasounds1 2 years ago +1

    can you do a step by step walk through of this procedure please? thanks

  • @rachman3339
    @rachman3339 4 years ago +1

    I did the exact same thing 40 years ago, but instead of a computer I used a tape recorder and myself as the text-to-speech. And as a player and headphones I used a Walkman.

  • @wisdomlounge4452
    @wisdomlounge4452 3 years ago

    In many cases you can't be sure which of the non-standard font size text is "junk text" without manually looking it over. I wouldn't simply and categorically think of all headings and subheadings as "junk text".
    These parts of the document weren't put there for nothing. Their purpose is to help organize the information being presented. Once you see or have the heading/subheading read, you'll know, especially if you got distracted, where in the document you're at.
    You can also know, through the headings what's about to be covered. I'm also not really bothered about page numbers read out loud. To me that's kind of like an "audible progress bar". I don't mind knowing how far along I am in the audiobook converted pdf, lol.
    If it's a matter of speed and efficiency, increase the read out loud speed to 1.5x - 2x or whatever you can handle.

  • @GeekMustHave
    @GeekMustHave 4 years ago

    You are a clever girl, said the Doctor. Love your delivery style and pace. The associated graphics are great for following the process. Keep broadcasting!!

  • @KanagawaMarcos
    @KanagawaMarcos 3 years ago

    The number of views on this video shows how good this problem is as an example for teaching Google's cloud products.

  • @douglasharley2440
    @douglasharley2440 4 years ago

    smart shortcut using the font-style hack...waaaay better than several days of marking-up for training.

  • @arpit.s
    @arpit.s 4 years ago +27

    How about building an AI that reads text and summarizes it? What extra steps would be needed apart from the ones mentioned above?

    • @kartikkalia01
      @kartikkalia01 4 years ago +8

      GPT3

    • @madhu619
      @madhu619 4 years ago

      Yeah exactly. GPT-3 is your answer.

  • @heyrmi
    @heyrmi 4 years ago +1

    I am really amazed. I have a lot of books in pdf format.
    And have gone through the same phase.

  • @tanveersingh8290
    @tanveersingh8290 4 years ago +2

    I made a similar thing where I used my camera to detect text, and it speaks it to me word by word while also highlighting the word. It can work in real time using your camera, and it can also work in normal mode.

  • @zarnilynnkyaw
    @zarnilynnkyaw 1 year ago +1

    Will there be an app for that, it would be really great to listen to paper while I'm on the treadmill.

  • @TrendRain
    @TrendRain 4 years ago

    Yup. That's how it's done. Talk to someone who has done it, copy the code, and tada, you have built a machine learning model that converts PDF to audio. I used to overthink programming, but this is the way to go.

  • @DelfinoGarza77
    @DelfinoGarza77 3 years ago

    I like the "garbage text"... It gives me a clue that something is changing, or that there is something I need to actually look at.

  • @drwriddhimanchattopadhyay2701
    @drwriddhimanchattopadhyay2701 4 years ago +1

    There is an app.
    eReader Prestige.
    It reads pdfs like audiobooks. It has local languages too.

  • @webdancer
    @webdancer 4 years ago +1

    I've been using balabolka and the old ivona voices to achieve these seamlessly. Glad to see this on GCP. I'll definitely check this out. Thanks

  • @Clire-h6r
    @Clire-h6r 4 years ago

    What about for those of us who can't buy the "bucket" on google?
    Thank you Dr Markowitz and chalom!

  • @ashfaqueahamedf2820
    @ashfaqueahamedf2820 4 years ago +5

    Bring this soon i need to study with this for my exams😜

  • @deeliciousplum
    @deeliciousplum 4 years ago +4

    Wonderful and practical use of machine learning.

  • @aravindv9291
    @aravindv9291 3 years ago

    Should you be going for walks in the first place, if you are in quarantine?

  • @VishwaSeneviratne
    @VishwaSeneviratne 4 years ago +2

    Hi Dale, I would like to see how to use GCP ML services to use time series data from things like metrics and predict future behavior patterns etc.

    • @dalemarkowitz8021
      @dalemarkowitz8021 4 years ago +1

      Hmm, good idea!

    • @VishwaSeneviratne
      @VishwaSeneviratne 4 years ago +2

      @dalemarkowitz8021 I was clueless about how to do it. I'm interested in using a data set from something like InfluxDB or Prometheus. I would be grateful if you could make a video on that. By the way, your videos are very good. Keep up the good work!!!

  • @johnjordan3552
    @johnjordan3552 4 years ago

    There are apps that let you use TTS on PDFs, which is a whole lot more practical than using an mp3 file, except that I haven't found software that gives you the option to make the TTS ignore certain text (like page numbers, repetitive sentences at the top and bottom of pages, etc.). So rather than converting the PDFs to mp3, it would be more beneficial and productive to add advanced settings such as making the TTS ignore "garbage text". If you guys develop a PDF reader like that, I'd definitely use it instead of the app I am currently using.

    • @Chloroplastism
      @Chloroplastism 1 year ago

      try moon readers built in tts - you can modify settings to ignore stuff.
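
One way the filtering asked for in this thread could work is sketched below: drop bare page numbers and any line that repeats across many pages, such as running headers and footers. This is a hypothetical illustration and not a feature of any app mentioned here; the function name `strip_repeated_lines`, the page-as-list-of-lines input format, and the `min_fraction` threshold are all assumptions.

```python
import re
from collections import Counter


def strip_repeated_lines(pages, min_fraction=0.5):
    """Remove page numbers and lines repeated across pages.

    pages: list of pages, each a list of text lines (assumed to come
           from an earlier text-extraction step).
    min_fraction: a line seen on at least this fraction of pages
           (and on at least two pages) is treated as a header/footer.
    """
    # Count on how many distinct pages each line appears.
    page_counts = Counter()
    for page in pages:
        for line in {l.strip() for l in page}:
            page_counts[line] += 1

    threshold = max(2, int(len(pages) * min_fraction))
    page_number = re.compile(r"(page\s+)?\d+", flags=re.IGNORECASE)

    cleaned = []
    for page in pages:
        kept = []
        for line in page:
            s = line.strip()
            if not s:
                continue                      # skip empty lines
            if page_number.fullmatch(s):
                continue                      # bare page number
            if page_counts[s] >= threshold:
                continue                      # repeated header or footer
            kept.append(s)
        cleaned.append(kept)
    return cleaned


pages = [
    ["Journal of X", "Intro text.", "1"],
    ["Journal of X", "More text.", "2"],
    ["Journal of X", "Final text.", "Page 3"],
]
print(strip_repeated_lines(pages))
```

A frequency rule like this can misfire on short documents or on body lines that legitimately repeat, so the threshold would need tuning per document.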

  • @Beautiful1234
    @Beautiful1234 1 year ago

    Thank you for the great video! how can we download the audio book after completing the process? appreciate if you can share

  • @SGoswami41
    @SGoswami41 4 years ago

    Have you ever heard about Narrator's Voice? It does the same thing, I'm using it since the last 3 years for making my college notes into audiobook, basically MP3 files.

  • @pratyushpanda1985
    @pratyushpanda1985 4 years ago +7

    What is the total API and GCP cost for converting 1 pdf(10 pages) to an audiobook?

    • @dalemarkowitz8021
      @dalemarkowitz8021 4 years ago +4

      For me, it was free, because I stayed in the free tier! The cost varies depending on your usage. See the blog post for more details.

    • @siliconslice
      @siliconslice 4 years ago +1

      @dalemarkowitz8021 I used "IVONA Voices 2" with Adobe Reader's 'Read Out Loud' option, but IVONA was PC-only and paid. We need something similar for Android handsets, but free. Can you please suggest something?

    • @siliconslice
      @siliconslice 4 years ago +1

      @dalemarkowitz8021 And one friendly suggestion: you have 24 subscribers, including me :D, you should upload.

  • @DumblyDorr
    @DumblyDorr 4 years ago +5

    This is very cool! Though I fear the usefulness of reading academic texts aloud is limited because of the many figures and formulas they often contain.
    I do like the approach of choosing the most-used font-size as a heuristic indicator - nice thinking :)

    • @delfinenteddyson9865
      @delfinenteddyson9865 3 years ago +1

      I suppose it highly depends in what academic field you are. For a lot of social sciences it could be quite nice.

    • @M22340
      @M22340 3 years ago +5

      *laughs in social sciences*

  • @paulbernardo6626
    @paulbernardo6626 4 years ago

    Pls launch such feature this year this is amazing

  • @shashankjajoo
    @shashankjajoo 4 years ago

    Useful for reading anything which primarily requires your critical thinking, a good aid indeed.

  • @AdityaRajput-pj4yg
    @AdityaRajput-pj4yg 4 years ago

    I use an app named "eReader Prestigio", which uses the Google text-to-speech engine and lets me listen as if it were an audiobook. It actually works better than I expected. They also have a paid version that uses their own text-to-speech engine, which works similarly to what is shown in the video.

  • @ihtesham_emon
    @ihtesham_emon 4 years ago +2

    It still sounds like machine voice. Instead try IVONA text to speech engine for TTS which has very robust natural voice in HD quality.

  • @manushitrivedi9182
    @manushitrivedi9182 4 years ago +1

    Hehheh I am doing this since last 6 months by just copy pasting the text to text to speech Google platform. Although this is the smart and easy way now ;) thanks for sharing...

  • @Alkhinjari
    @Alkhinjari 4 years ago +47

    Please let the next project be: how to do a "literature review" using AI? this would be a revolutionary step in the research world.

    • @daleonai7448
      @daleonai7448 4 years ago +1

      This is an interesting idea!

    • @zapy422
      @zapy422 4 years ago +1

      I’m exploring this

    • @shantanujain6376
      @shantanujain6376 3 years ago +1

      Copy other similar topic's literature review and change the words. No plagiarism, no time spent and ready made references lol

    • @williamseipp9691
      @williamseipp9691 3 years ago +1

      literature review? you mean like sentiment analysis? Or summary?
      the latter is far more interesting, like an ELI5 app.

    • @AdobadoFantastico
      @AdobadoFantastico 3 years ago

      @zapy422 That's dope, best of luck!

  • @DhruvSingh-ru8nx
    @DhruvSingh-ru8nx 4 years ago +2

    Really interesting and informative. Looking forward to more videos based on projects on ML/AI

  • @iau
    @iau 4 years ago +90

    _"So now that I'm in quarantine, I go out on walks without a mask"_

    • @johnjordan3552
      @johnjordan3552 4 years ago +2

      Masks, especially when worn appropriately, greatly reduce the spread of the water droplets that come out of our mouths and noses, which is where the germs that get transmitted from one person to another reside.

    • @JBB685
      @JBB685 3 years ago

      Why would you walk outside with a mask on

  • @GermannoTeles
    @GermannoTeles 3 years ago

    Can you show the cost of using a Cloud Storage bucket + Vision + AutoML Tables and the others?

  • @1anre
    @1anre 4 years ago

    Been thinking of how to replicate Adobe Reader voice assistant in other documents which are open on other applications & now this GCP ML project popped up.
    Mega Cool

  • @jaydev7248
    @jaydev7248 3 years ago

    Interesting & very useful concept for Google Android app making.

  • @sohamgurav7713
    @sohamgurav7713 4 years ago +24

    The next thing we need to focus on is how to convert text to speech in a natural voice, rather than that robotic one

    • @tomfreemanorourke1519
      @tomfreemanorourke1519 3 years ago

      I believe there is the voice 'sampler' method from the early 90's that has now come full circle and one's 'own' voice is convertible as are many celebrity voices.....Could you listen to yourself reading aloud to?....yourself?.......

    • @williamseipp9691
      @williamseipp9691 3 years ago +1

      I was thinking about a related question: what are the audio qualities of the things we hear everyday? Specifically to your idea: what are the audio qualities of a given voice that make it sound robotic vs natural?
      If an AI model can Van-Goghify a photo with "style transfer" then surely another model can apply a realistic mask or filter of sorts to make a generic robotic voice sound human.
      Ugh. Actually that's a hard problem.

  • @JL-sp8ss
    @JL-sp8ss 2 years ago

    thanks for this i need to try this. It will help my dyslexia condition. V hard to read text. i am wondering how easy to set it up.

  • @adamm2716
    @adamm2716 4 years ago

    neat project i might have to try to do it myself

  • @VaibhavShewale
    @VaibhavShewale 4 years ago +1

    can we change the audio of the output?

  • @sm4773
    @sm4773 4 years ago

    My pdf reader does that (foxit pdf). I use it for proof check. However, it can not omit page number, line number, equation or all unnecessary things.
    I guess its a great app idea and very useful for website like newspaper reading also. I saw one newspaper have inbuilt in them.
    Another important thing is the reverse. I record every lectures in mp3 but can't convert in text easily. I heard Google is working on it and made it available in Google phones.

  • @lemasch01
    @lemasch01 4 years ago

    Omg! I was looking for such a solution eagerly! I can’t stand reading long texts for my studies! Please share how you coded it! Is there also maybe a way to convert scanned/photographed texts from books to audio? And: can I make it run on my iPhone (I guess not)?

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 4 years ago +2

    This is cool project. Is it expensive though? cool, I saw the answer now. Thanks!

  • @Trazynn
    @Trazynn 4 years ago

    I love the project, exactly what I was looking for. However, I do (subjectively of course) prefer Azure's voice generator, so that's what I'm using for the final part.

  • @aspd00
    @aspd00 1 year ago +1

    Fantastic
    Now give an option for procrastinators

  • @rishabgupta1772
    @rishabgupta1772 4 years ago +1

    Do they have a website/app to do this on demand?

  • @craigstark22
    @craigstark22 10 months ago

    Great idea, but what about non-public content copyright and AI LLM governance issues?

    • @curious_one1156
      @curious_one1156 8 months ago

      Not for commercial use without Author's permission.

  • @rishabhkumardjain
    @rishabhkumardjain 4 years ago

    @Voice app does this exact same thing. Not only does it handle PDFs but also a few other audiobook formats

  • @TIENTI0000
    @TIENTI0000 4 years ago +2

    that is just awesome! we are living in the future! finally

    • @lovingputu5916
      @lovingputu5916 4 years ago +2

      You do know right that there is an app named Moonreader ?

  • @sabelokhanyile
    @sabelokhanyile 3 years ago +1

    Hi Dale, hope you are well and safe. I am not a developer and I was trying to develop the pdf to the audiobook project. Your blog post and video was informative but struggling with the setup of the code from github

  • @parimaltingane1798
    @parimaltingane1798 3 years ago

    It seems so complicated, though I will give it a try. Thank you.

  • @victorboyi6383
    @victorboyi6383 4 years ago +2

    Wow!!! This is literally the starter that I needed for my project.

  • @abhishtindulkar
    @abhishtindulkar 4 years ago

    do they use chrome OS in the video or they just used apple? the three dots in the top left of the browser...

  • @RealMattHaney
    @RealMattHaney 2 years ago

    Want to see you build a website or software where I can just upload my pdf and it does this. Even though I have learned a small amount of python, I have very limited time (want to listen while doing chores) and was looking for something to do this for me. Disappointed this video wasn’t ‘hey here’s a tool that does this.’ Neat trick you all did, but not a very useful tool for most people.

  • @nathaliecedeno5251
    @nathaliecedeno5251 3 years ago

    This is really insightful. Thanks for sharing!

  • @sunapurva
    @sunapurva 4 years ago

    Using the "Read Aloud" function in Microsoft Edge browser is simpler (at least for those who don't want to take the pains of building a custom program)!

  • @QuizmasterLaw
    @QuizmasterLaw 4 years ago

    What about books with 2 languages e.g. bilingual novels or language teaching books?

  • @srisubhashpathuri3909
    @srisubhashpathuri3909 4 years ago +1

    Is there a web page, where they explain this in detail. I want to develop this on my own...

  • @anupyadav1502
    @anupyadav1502 4 years ago

    How well does it do with the nuances of the text? Is it an ideal way to convert a story book pdf into MP3?

  • @chrisford7351
    @chrisford7351 10 months ago

    Good job and very interesting! Thanks

  • @Ramu294
    @Ramu294 4 years ago +1

    There are a lot of free apps which provide this function, what makes this different?

  • @kwabenamireku5589
    @kwabenamireku5589 3 years ago

    Can I have a step by step document of how every thing is done, please

  • @fallenangel8785
    @fallenangel8785 3 years ago +1

    I want subtitles on Google Podcasts, please.

  • @sheikhakbar2067
    @sheikhakbar2067 4 years ago +1

    Wow, wonderful tips and tricks... I went through that ordeal!

  • @abrahamjuniorsagoe9120
    @abrahamjuniorsagoe9120 3 years ago

    Please can you do a video of how to start the project from scratch...thanks

  • @wanderinghooligan
    @wanderinghooligan 4 years ago +1

    Necessity is the mother of inventions

  • @poojaparis
    @poojaparis 3 years ago

    I use ereader prestigio app on my android phone. It has read out loud function, with many accents and choice between male and female voice. Besides all this you can easily control the speed and pitch of the voice. As for the research articles I don't like to use read out loud function because it needs more focus and I can't multitask while reading them (💯% focus).

  • @MD-wr4ul
    @MD-wr4ul 3 years ago

    Google Vision API "interactive API template" is not opening, When i am clicking "Public access Permission:Public to internet". Google Cloud Platform=>Storage=>Bucket=>uploaded image (Public access Permission:Public to internet). Please help.

  • @Jd-zd6bh
    @Jd-zd6bh 4 years ago

    Whoa!! Some parts of the speech sound like a real human talking, especially the last part... Great work.

    • @MerciBro
      @MerciBro 3 years ago

      You should try J on -4 pitch.