How to Program Speech Synthesis in an Animatronic Mouth Using Python and Arduino

Will Cogley

29,415 views

Published May 24, 2020
Here's a closer look at the programming behind my animatronic mouth. Using Arduino, Python, and a few open-source libraries, I take a typed sentence and convert it into an animation sequence.
Support me on Patreon! / nilheimmechatronics
Contact: enquiries@willcogley.com
Discord Server: / discord
Open source animatronic mouth design: www.nilheim.co.uk/latest-proje...
Instructable: www.instructables.com/id/Simp...
Science & Technology
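The description doesn't include the code, so the following is only a minimal sketch of the first stage of such a pipeline: typed sentence to phonemes to visemes. It assumes a CMU-style ARPAbet pronouncing dictionary (commenters below mention NLTK, which ships one as its `cmudict` corpus). The tiny dictionary, the viseme names and the phoneme grouping are hypothetical placeholders, not the ones used in the video.

```python
import re

# Hypothetical mini pronouncing dictionary in ARPAbet notation.
# A real system would look words up in the full CMU dictionary instead.
PRONUNCIATIONS = {
    "hello": ["HH", "AH0", "L", "OW1"],
    "world": ["W", "ER1", "L", "D"],
    "mouth": ["M", "AW1", "TH"],
}

# Hypothetical grouping of ARPAbet phonemes (stress digits removed) into visemes.
VISEME_GROUPS = {
    "closed": {"M", "B", "P"},
    "lip_teeth": {"F", "V"},
    "tongue_teeth": {"TH", "DH"},
    "tongue_up": {"L", "T", "D", "N"},
    "rounded": {"W", "OW", "UW", "AW", "OY", "ER", "R"},
    "open": {"AA", "AE", "AH", "AY", "EH", "EY", "IH", "IY"},
}
PHONEME_TO_VISEME = {p: v for v, group in VISEME_GROUPS.items() for p in group}


def sentence_to_visemes(sentence):
    """Convert a typed sentence into a flat list of viseme labels."""
    visemes = []
    for word in re.findall(r"[a-z']+", sentence.lower()):
        # Words missing from the dictionary are skipped in this sketch.
        for phoneme in PRONUNCIATIONS.get(word, []):
            base = phoneme.rstrip("012")  # drop the ARPAbet stress marker
            visemes.append(PHONEME_TO_VISEME.get(base, "rest"))
    return visemes


print(sentence_to_visemes("Hello, world!"))
# ['rest', 'open', 'tongue_up', 'rounded', 'rounded', 'rounded', 'tongue_up', 'tongue_up']
```

Each viseme label would then be turned into a row of servo angles and sent to the Arduino, for example over serial. Silently skipping unknown words is a simplification that a real version would need to handle.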

Comments • 47

@scottduede8134 4 years ago ⁺²¹
As a linguist, I can say that this is awesome sauce.
@hypodyne1 4 years ago
I did the same thing with a talking head program. Used the same dictionary and mapped the visemes to the phonemes. You could have a conversation in real time with my app (called Ayako). Awesome that you took it further with the robotic mouth. Well done.
@PhG1961 4 years ago
Excellent work! Awesome!!
@stevecoxiscool 4 years ago ⁺¹
Nice work!!!
@jonathangriffin3486 4 years ago ⁺²
For emulating audio you would probably need to look at how the frequency domain representation of the signal is changing over time.
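As a sketch of that suggestion: a short-time Fourier transform (STFT) cuts the audio into short frames and takes a spectrum of each one, so you can watch the frequency content change over time. The naive DFT below is pure Python for clarity (a real system would use an FFT, e.g. via NumPy); the frame size, hop and test tones are arbitrary choices for illustration.

```python
import cmath
import math


def stft_magnitudes(samples, frame_size, hop):
    """Naive STFT: a list of magnitude spectra, one per frame of audio."""
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 * (1 - math.cos(2 * math.pi * n / frame_size))
              for n in range(frame_size)]
    spectra = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Direct DFT for bins 0..frame_size/2 (DC up to the Nyquist frequency).
        spectrum = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size)))
            for k in range(frame_size // 2 + 1)
        ]
        spectra.append(spectrum)
    return spectra


# Test signal: 500 Hz for the first half, 1500 Hz for the second half.
sample_rate = 8000
signal = [math.sin(2 * math.pi * (500 if n < 512 else 1500) * n / sample_rate)
          for n in range(1024)]
frames = stft_magnitudes(signal, frame_size=64, hop=64)
peak_hz = [max(range(len(s)), key=s.__getitem__) * sample_rate / 64 for s in frames]
print(peak_hz[0], peak_hz[-1])  # 500.0 1500.0
```

Features taken from these per-frame spectra (for example, energy in a few formant bands) could then drive mouth shapes directly from audio rather than from typed text.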
@TheRainHarvester 4 years ago ⁺¹⁰
Make your mouth the narrator in the bottom right of all your videos!
How loud are the servos in real life?
@drudtube 3 years ago
That looks great! I'm working on the same kind of controller, but I am using a stereo track that is analyzed by a Teensy Audio Board. The right track contains the speech and controls the jaw. The left track contains tones that correspond to the mouth positions of the other servos.
For example, 200 Hz for A, E, I and 250 Hz for B, M, P.
So the actor who's gonna play with this mouth only has to make an audio track with the right tones in the right positions on the left track.
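Detecting which control tone is present in each short block of audio is a classic job for the Goertzel algorithm, which measures the energy at a single frequency far more cheaply than a full FFT. The comment doesn't say how the Teensy side does it, so this is only a sketch; the 8 kHz sample rate, 160-sample (20 ms) block and threshold are assumptions chosen so that 200 Hz and 250 Hz land on exact DFT bins.

```python
import math


def goertzel_power(samples, sample_rate, target_hz):
    """Power of `samples` at the DFT bin nearest target_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)  # nearest DFT bin index
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


# Tone-to-mouth-shape mapping taken from the comment above.
TONE_TO_SHAPE = {200: "A_E_I", 250: "B_M_P"}


def decode_block(samples, sample_rate, threshold):
    """Return the mouth shape whose control tone is strongest, or None if all are quiet."""
    powers = {hz: goertzel_power(samples, sample_rate, hz) for hz in TONE_TO_SHAPE}
    best_hz = max(powers, key=powers.get)
    return TONE_TO_SHAPE[best_hz] if powers[best_hz] > threshold else None


fs = 8000
block = [math.sin(2 * math.pi * 250 * n / fs) for n in range(160)]  # one 20 ms block
print(decode_block(block, fs, threshold=100.0))        # B_M_P
print(decode_block([0.0] * 160, fs, threshold=100.0))  # None
```

Tones only 50 Hz apart need blocks of at least fs / 50 samples to be separated cleanly, which is why the block length here is tied to the tone spacing.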
@TheMrJuoji 4 years ago
Instead of mapping each servo position for each viseme in an array, you could try a reverse kinematic model and then just have each end position for the mouth mapped in a set of positions. You could even map that to motion capture.
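A minimal sketch of the inverse-kinematics idea, assuming (hypothetically) that a lip control point sits at the tip of a two-joint planar linkage driven by two servos with link lengths l1 and l2. You specify where the point should go and solve for the joint angles, instead of storing raw servo angles per viseme. Real mouth linkages are usually messier than this, so treat the geometry and dimensions as illustrative only.

```python
import math


def two_link_ik(x, y, l1, l2):
    """Joint angles (radians) that place the tip of a 2-link planar linkage at (x, y).

    Returns one of the two mirror-image solutions (the one with a positive
    second-joint angle), or None if the point is out of reach.
    """
    cos_q2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)  # law of cosines
    if not -1.0 <= cos_q2 <= 1.0:
        return None
    q2 = math.acos(cos_q2)
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return q1, q2


def forward(q1, q2, l1, l2):
    """Forward kinematics, used here only to check the IK result."""
    return (l1 * math.cos(q1) + l2 * math.cos(q1 + q2),
            l1 * math.sin(q1) + l2 * math.sin(q1 + q2))


# Hypothetical linkage: 30 mm and 20 mm links; target lip point at (35, 15) mm.
q1, q2 = two_link_ik(35.0, 15.0, 30.0, 20.0)
print([round(math.degrees(a), 1) for a in (q1, q2)])  # before servo calibration offsets
```

With this in place, a viseme becomes a set of target points for the lips and tongue, and the same targets could come from motion-capture data, as the comment suggests.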
@tecnicotec1 4 years ago ⁺¹
Very good video and even better work.
It's really amazing and interesting.
Thanks for your work.
@Nono-hk3is 4 years ago
Good work!
@cdoebler 4 years ago
Excellent work. My upcycled Teddy Ruxpin uses a much simpler set of visemes.
@satyakidas7144 4 years ago
Beautiful video, awesome robot
@tankart3645 4 years ago
Looks awesome I got to say
@MattHollands 4 years ago ⁺¹³
Are you planning to put a skin on the mouth? It seems like there are bits around the mouth to deform the lips etc., but it looks a bit odd without a skin.
@kiltmaster7041 2 years ago ⁺¹
It did strike me as odd that he was setting servo positions for certain expressions when he doesn't even know what those expressions will look like on a completed face. Surely it would make more sense to finish the head before setting something like that? But what do I know?
@Skyliner_369 3 years ago ⁺¹
I'm sure that if I wanted to, I could probably write a Blender extension that avoids all this phoneme stuff and instead sends direct pose data from animated frame data. That way, the mouth is animated like how someone might animate a character.
@Skillseboy1 3 years ago
Such a cool video
@robertwesterfield3454 6 months ago
Wow thanks!
@FirstLast-wr9mh 4 years ago
Fantastic
@AltMarc 4 years ago
For local ASR, try DeepSpeech; on an RPi 4, DeepSpeech Lite works in real time.
Local speech recognition is still tricky; it works better on full sentences than on single words.
@Robots-and-androids 1 year ago
You might be able to use speech-to-text software first and then convert the text it gives you into phonemes. Both Microsoft and Bing offer "free" speech recognition for Python. I use both. I am planning on doing something similar with a human figure.
@twobob 4 years ago
I seem to remember using a regular expression to make the UK version of ARPAbet for my speech projects in the past. I think that's right. This looks like a fun project. I used a C# library under Unity for this the last time I faffed around. Good job overall; it's a tricky subject :)
@twobob 4 years ago
Also, the iPhone X does a decent job these days. apps.apple.com/us/app/face-cap/id1373155478 might be worth an eyeball, for example.
I did cmusphinx.github.io/2013/03/speech-recognition-on-kindle-touch-with-cmusphinx/ donkey's years ago, showing that one can indeed get a decent extraction of words from those tools you mentioned. You need to faff a bit, though. Good luck; this one looks like fun.
@mariotoys6173 4 years ago
Respect
@Robots-and-androids 1 year ago
Where did you get that amazing servo tester????? I NEED one of those! --Thomas
@HeathLedgersChemist 4 years ago
Could you approximate the mouth positions for Leeds by just leaving the mouth open all of the time?
@DustinWatts 4 years ago ⁺²
Great work Will! I was thinking, what about using an ESP32 instead of Arduino? ESP32 can run MicroPython. Therefore eliminating the need for two microcontrollers and a serial connection.
@KineticWasEpicVideos 4 years ago
NLTK will not run on an ESP32. A Raspi + Arduino combo would be ideal for a contained system.
@SDRIFTERAbdlmounaim 3 years ago
Use a loop and a table instead of a bunch of 'if's; those will stack up real quickly lol
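The table suggestion, sketched in Python to match the other examples here: each viseme becomes one row of servo angles, and a single lookup plus a loop replaces the if/else chain. The servo names and angle values are made up for illustration; on the Arduino side the same idea would typically be a 2-D array indexed by a viseme number received over serial.

```python
# Hypothetical servo angles in degrees: one row per viseme, one column per servo.
# Real values come from calibrating the actual mechanism.
SERVO_NAMES = ("jaw", "upper_lip", "lower_lip", "tongue")
VISEME_TABLE = {
    "rest":      (90, 90, 90, 90),
    "open":      (120, 95, 85, 90),
    "closed":    (85, 80, 100, 90),
    "tongue_up": (105, 90, 90, 130),
}


def servo_targets(viseme):
    """Look up one table row instead of walking an if/else chain."""
    row = VISEME_TABLE.get(viseme, VISEME_TABLE["rest"])  # unknown viseme -> rest pose
    targets = {}
    for name, angle in zip(SERVO_NAMES, row):  # the loop that replaces the ifs
        targets[name] = angle
    return targets


for v in ("open", "closed", "mystery"):
    print(v, servo_targets(v))
```

Adding a new mouth shape then means adding one row of data rather than another branch of code.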
@SpaceDave-on8uv 3 years ago
2:44
Does this mean switch statements do not exist in Arduino?
@Jimmyfpv_ 3 years ago
Yes, but bear in mind that they are not very useful when you do logic operations within the ‘if’. You would need to map the possible results to values so that you could use switch statements.
@REALVIBESTV 10 months ago
Can I buy the code?
@CyberSyntek 4 years ago
Will, take a look at audioservocontroller dot com. I'm not sure how many servos it is capable of controlling at once, as I haven't grabbed one yet, but I can inquire, since someone from the FB group has one. There are a few different audio servo controllers out there, but they don't seem to be very common. Scary Terry is another one. I saw someone post the hardware layout at some point, so it might be easier to throw one together depending on the components. It might be the only way to get that many servos running in sync.
Any more thoughts on the potential forum? XD
*edit* Fernado from the group has a vid up with him testing it on his DARA robot ("DARA robot lip sync", if u r curious). I can ask him if he's played with it any more since. I think he had it hooked up to just the jaw and not his tongue model, but he would know better, mind you. :9
@ViennaMike 4 years ago
Scary Terry and similar boards just work off the volume of the sound source, not visemes. That's adequate for a jaw on a prop or toy (I use them, and I'm developing a similar thing using a Raspberry Pi), but nowhere near what Will is doing with visemes, jaw, face, and tongue movements.
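For comparison, volume-driven jaw control of the kind described above can be sketched as a simple envelope follower: compute the RMS level of each short frame of audio and map it linearly onto a jaw angle. This illustrates the principle only and is not how any particular commercial board is built; the angle range and full-scale level are hypothetical.

```python
import math


def jaw_angles_from_volume(samples, frame_size, closed_deg=90.0, open_deg=130.0, full_scale=1.0):
    """Map the RMS loudness of each frame of audio onto a jaw servo angle."""
    angles = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        level = min(rms / full_scale, 1.0)  # 0.0 = silent, 1.0 = full scale
        angles.append(closed_deg + level * (open_deg - closed_deg))
    return angles


# 100 silent samples followed by 100 full-scale samples.
audio = [0.0] * 100 + [1.0, -1.0] * 50
print(jaw_angles_from_volume(audio, frame_size=100))  # [90.0, 130.0]
```

Because only loudness is measured, every sound opens the jaw the same way, which is exactly why this approach can't reproduce distinct lip and tongue shapes the way a viseme-based system can. Practical versions usually also smooth the level (attack/release) so the jaw doesn't chatter; that is omitted here.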
@Allanusmonostat 1 year ago
So if you modded this just a hair it could be a phonetic filter.
@mr.e.484 4 years ago
#10
@saucelessbones5872 1 year ago
gonna make me act up
@gone6442 2 years ago
Ok, I'm making the Mad Hatter and March Hare
@abetusk 4 years ago
Unfortunately this isn't "open source". The source is available, as are the STLs, but there is no license on them, and so they cannot legally be used, redistributed or altered.
The commonly held definition of "open source" is (from en.wikipedia.org/wiki/Open-source_license):
"... a type of license for computer software and other products that allows the source code, blueprint or design to be used, modified and/or shared ... Licenses which only permit non-commercial redistribution or modification of the source code for personal use only are generally not considered as open-source licenses."
From the "terms of service" page at www.nilheim.co.uk/terms-of-service.html:
" .. not to (or permit anyone else to) do or attempt any of the following:
* distribute, rent, loan, lease, sell, sublicense, or otherwise transfer or offer the Service for any commercial purpose;
"
Which puts it in direct contradiction with the definition of "open source" most widely used.
Please consider removing the term "open source" for something more appropriate like "source available", or putting the source code and STL files under a free/libre license.
@ViennaMike 4 years ago
Of course I agree that prohibiting commercial use means it's not "open source" under the common definition. But I can certainly see reasons for doing so. I do think that the developer should consider some standard license rather than the current "terms of service", which has some clear wording errors. More importantly, use of a non-standard license restricts uses the creator intended to allow, as no one is familiar with it or exactly how its terms may be interpreted. Besides just changing to an open source license, wouldn't other options include: 1) while not intended for software, use the Creative Commons license limiting commercial use; 2) license under the VERY unrestrictive GPL, with options for commercial users to pay for closed licenses (this doesn't actually prohibit commercial use, provided the user abides by the terms of the GPL by opening up their own changes under the same terms, but it may make it more attractive for commercial users to pay for a restrictive license); or 3) while I haven't seen it used, use the Commons Clause (commonsclause.com/)?
@Bigbirddev 2 years ago
People who made this
*it took me 2 years to make*
@Mr_Motor 3 years ago
On L, the tongue should touch the top.
@MrMoka15 4 years ago
Are you a Furry? You could make a lot of money by selling this to them :3
@ChrisD__ 3 years ago
It might be hard to fit all this stuff into a mask, but I remember there being a few people building animatronic fursuit heads like this.
