This is the coolest side project I've worked on
- Published Sep 16, 2024
This man sure loves his lambdas
For the time being, I think adding some sort of progress bar/percentage for all the time-intensive tasks would be a plus. That way the user would know there is a process happening in the background and that the UI hasn't crashed or stopped responding.
yeah I should do that
He doesn't need your advice, I'm sure he knows about progress bars 😂
idk what I watched but I watched the whole thing. Cody is a genius
Quick tip: I feel like it would be a good thing to evaluate the segments first before generating the images, since it can cost you money to generate them. The example shown where you merged two segments and deleted the image could have been saved if you'd previewed the segments beforehand.
Yeah maybe preview could be useful
@@WebDevCody an option to improve the writing/story with suggestions and a comparison view between the two might be cool
Could feel the joy in that one, Cody. Nice work.
It didn't cost the user credits to regenerate a failed image, did it? If so, that's something I would change immediately; effectively paying for a faulty result would be off-putting. Also, allow the user to adjust segments before generating images so they have a bit more control over how many images are being generated and how many credits it will cost them. In fact, maybe let the user specify how many images they want before you even process the story text with GPT. That way you can tell GPT to adjust its segment split based on a number the user is comfortable with.
Ffmpeg and lambda... reminds me of the time I somehow managed to spend 8 hours trying to compile a static ffmpeg binary that included the "drawtext" filter inside a docker container.
Needless to say I learned a lot about Docker, Linux, and various build tools that day. I ended up just finding a repo that has prebuilt static binaries with all features anyway, so I just download that in my Dockerfile.
This does seem like a cool project. Handling a lot of async jobs that are dependent on one another was a big challenge for me. Makes me want to get back into web.
this idea is so good!
I mean, if I have tons of money, I will put my favorite novels in here and have small comics without having to wait for the actual adaptation of them!
Good job babe ! You're doing great!
You could add a vector store to this architecture that saves each image on draw, and if a new prompt was similar enough (let's say > 90%) it could just reuse that image instead of doing an on-demand draw every time. Just a thought.
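A minimal sketch of that reuse-by-similarity idea, assuming prompts have already been embedded into vectors; the helper names, the flat-list "store," and the 90% threshold are all illustrative (a real setup would use an actual vector database):

```python
import math

def cosine(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def lookup_or_generate(prompt_vec, cache, generate, threshold=0.9):
    """cache is a list of (embedding, image_url) pairs; generate() is the
    expensive on-demand draw we want to skip when a near match exists."""
    best_url, best_score = None, -1.0
    for vec, url in cache:
        score = cosine(prompt_vec, vec)
        if score > best_score:
            best_url, best_score = url, score
    if best_url is not None and best_score >= threshold:
        return best_url  # reuse the cached image
    url = generate()
    cache.append((prompt_vec, url))  # remember the new image for next time
    return url
```

The tradeoff is that two prompts can be 90% similar yet describe visually different scenes, so the threshold would need tuning against real prompts.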
Pretty great architecture man. Instead of the lambda + queue + finaliser queue and all that, you can probably save a lot of this orchestration by using Step Functions.
Cody this is great thanks for sharing love the idea and the system design stuff!
Talk about a dude who has good fucking ideas for projects AND the skill to bust them out. That's inspiring to me. I'm not that far yet; my current goal is simple websites that are robust + auth + users, etc., but further down the line... I like your style.
Great video, I liked the system design talk; it was really interesting to hear your thought process. Also a really cool idea to have AI generate a complete scary story video from just text in minutes.
Sirrrrr!!!! Good job babe! I Love ya ❤
I love him too
Can we share him😅
I was searching for this comment 🙂
@@cslearn3044 lol we can all love him… except I get extra 💅🏿😉
@@joeadeleke lol I can loan but I don’t share, money must change hands lol
Now this is senior development folks. Mad man chart lol
Great video man. So cool seeing an app using all the stuff being built just for fun/entertainment rather than the million other vids that are "100x your workflow..."
Why not do the video generation on the user's side in the browser?
Send over the images, transcript, and audio to the user. Display and render the video in the browser using canvas. It'll save all the orchestration, save money, and give users a better experience by letting them view the video before the final render.
The main downside is that the actual render might take longer depending on the user's machine.
Probably that last one, inconsistent render times based on user machine or phone.
It was really interesting to watch, thank you
amazing project cody !
ffmpeg has a -threads flag that sets how many threads to use for operations, and it does miracles on CPU-only machines (not sure how it will perform on Lambda though), but you will have to check the video quality at different settings.
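For reference, the flag would slot into the stitching command roughly like this; the paths, frame rate, and other flags here are placeholders, and the right thread count needs benchmarking on the target machine:

```python
import subprocess

def build_ffmpeg_cmd(frames_pattern, audio_path, out_path, threads=2):
    # -threads caps how many threads ffmpeg uses for encoding; on small
    # CPU-only hosts a low explicit value can beat the default
    return [
        "ffmpeg", "-y",
        "-threads", str(threads),
        "-framerate", "30",
        "-i", frames_pattern,   # e.g. "frame_%04d.png"
        "-i", audio_path,
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        "-shortest",            # stop at the shorter of video/audio
        out_path,
    ]

# subprocess.run(build_ffmpeg_cmd("frame_%04d.png", "voice.mp3", "out.mp4"),
#                check=True)
```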
I like this! Could be cool if you made it even easier to create those scary story TikToks. Also, I found that using Claude's prompt generator really does help in making the AI do what you want.
Great Video!
Are there plans to create a paid tutorial course for this app, similar to the one for the icon generator? Since it involves several elements, like AWS services, it seems like it would be a more advanced course. I think many people would be interested.
To shorten the perceived wait time, process each story segment before the user presses the generate button. Maybe reprocess the last modified story segment when the user begins editing a new one. By the time the user has reviewed the segments and presses the generate button, you'll already have all the video created and only need to stitch everything together. This will appear to be much faster. Especially if the user makes a video, tweaks the story, and re-generates a new video. Obviously the tradeoff is cost. Perhaps add a "lightning mode" switch that people can turn on for a few extra credits. I'm sure you'll want to optimize too, but then your optimization will be at your leisure and more about cost savings than stressing over a poor user experience. Best of luck. This does look very fun.
It'd be really cool to learn how you rewrite the canvas in Go. And maybe a live coding of the progress bar implementation :D
Regarding the subtitles, it'd be extra nice to have them as the normal movie subtitles that have 1 or 2 sentences on the screen at a time, instead of having each word showing up in real time
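One way to turn per-word timestamps into movie-style cues of a line or two is to group consecutive words until a length or pause limit is hit; the field names and limits below are assumptions, not the app's actual format:

```python
def group_words(words, max_chars=42, max_gap=0.8):
    """words: [{"word": str, "start": float, "end": float}, ...]
    Returns subtitle cues, each spanning several words, broken on
    line length or on a silence longer than max_gap seconds."""
    cues, current = [], []
    for w in words:
        if current and (
            len(" ".join(x["word"] for x in current)) + len(w["word"]) + 1 > max_chars
            or w["start"] - current[-1]["end"] > max_gap
        ):
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)
    return [
        {
            "text": " ".join(x["word"] for x in c),
            "start": c[0]["start"],
            "end": c[-1]["end"],
        }
        for c in cues
    ]
```

Each cue then stays on screen from its first word's start to its last word's end, instead of flashing one word at a time.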
Maybe buy some Threadrippers and a couple of 4090s for a home server, but again, scaling and distribution would be a problem. I guess no one will ever leave the cloud addiction 😅
Ain’t nothing better than spending $5k on a side project generating zero revenue 💪
Good job🎉
Is it possible you could talk about your local/dev setup when working on a cloud stack like AWS? For instance, when you had to create a lambda, did you check it by deploying to some dev AWS account, or did you use a local stack? At my work we struggle with local development.
Damn this project is really impressive, nice video
Amazing project Cody! I would suggest giving the output option as vertical video so you can upload it to Instagram or TikTok. Great project!
I have that planned for next, it shouldn’t be hard to add in
So the user is going to be in the storyboard area for quite some time. 1) I would only have the segment text appear and allow the user to rewrite and arrange their segments in their storyboard. Once done, they can generate all images or one by one. 2) If they approve of the image + text segment, give them a "send to production" button on each segment, giving the workers a head start to build out the video while the user works on other parts of their storyboard.
Yeah that could help speed up the final video generation for sure
It would be in your best interest to use EC2, potentially allowing you to make more complex videos in the future. Also, EC2 has an auto-scaling feature. But for now, just use Lambda if money is a factor; it would be smart to have some kind of plan to integrate EC2 in the future.
This would look awesome in a single go server with some clever go routines orchestration.
Impressive engineering 👏
Now we can make our own entertainment.
make a movie with moving characters next ;)
Awesome project! would be a great follow up to cover how you built the credit system
I was surprised to see that you're not using remotion for this. Could probably reduce your lambda costs and complexity since you could do everything in a single go.
doesn't remotion use lambdas?
This is awesomeeee
You are a big inspiration to me! The project is a really interesting one; you should definitely add some thriller music for the background. I hope you can market it and earn some $$. Best wishes Cody
Cody this was super interesting
This is a very cool project.
I was thinking, you could have a feature going forward which helps to convert the segments into a comic book (with the picture and text on a comic-type background) where the user has an option to export as pdf. Cool project btw
This is cool asf🔥🔥
I've come around pretty hard to convex at this point. The schema validators being composable and exportable is tough to beat. The only thing I'm left wanting for is bulk inserts. You have to manually insert in a loop or make heavy use of objects with arrays.
This is really cool
The technology is really cool in this one. But I'm not sure how much value the video itself has, considering it's just stitched-together images that zoom in and out. Imagine sitting and watching the same zooming image for 30-60 seconds.
If you want to try something challenging, try and add streaming. If you want to make it even more challenging, while improving the user experience try and add concurrent streaming but raw dog it, and don’t use any libraries like vercel AI, just work with the SSE that are returned from openAI.
May God have mercy on your soul. Lol.
Can you make a tutorial on this but using Google Gemini, if possible? Since Gemini is free.
Maybe ECS to generate the audio and video, then putting them together with ffmpeg in a single container?
Cool project.
The EC2 instance would make sense in terms of performance, but also for cutting the pipeline down from image frames -> putting them together -> ffmpeg to ONLY ffmpeg. I'm sure you can do this. Seems like a lot of trouble for making videos from images with subs.
I've never created a video or generated many images through AI prompts, but I'm curious about the process. I know in your app there are many features to customize the final result in a more granular way, and it's also much cheaper to break the process into pieces and edit those pieces separately. However, could I achieve the same result, or something close, just by using AI prompts directly and refining the output repeatedly? I'm asking because I'm working on a project to determine the chances that a text is AI-generated. I'm using some Python libraries for text analysis and planning to populate my database with a bunch of AI-generated texts to have a basis for comparison with the input text. The thing is, sometimes I think that simply asking the AI about the chances of a text being AI-generated might give me a better result than my own app, and during those moments I just think about giving up. But, you know, the truth is that I'm not doing this because it will be a superior product; it will probably just sit there in my GitHub vault... Anyway, I think the point is: it will be fun and I'll learn something. And great job, loved this app!
I was wondering if you looked into how to keep character looks the same across all images since they are generated from different parts of your story?
You have to make sure you use the same subject detail in the prompt, so like: a guy in a black suit. In regards to the face, that's not really possible unless you train your own model using images of the face, or run a face-swap model against your generated images (which often just looks bad). It gets even trickier if you have multiple people in your image.
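The "same subject detail" approach amounts to a fixed character sheet prepended to every segment's image prompt. A rough sketch; the character description, dict shape, and style suffix here are made up for illustration:

```python
# A fixed "character sheet" repeated verbatim in every segment's prompt,
# nudging the image model toward a consistent look across segments.
CHARACTERS = {
    "narrator": "a tall man in a worn black suit, short gray hair, gaunt face",
}

def image_prompt(segment_text, characters=CHARACTERS):
    # Prepend the same character details to every scene description
    details = "; ".join(characters.values())
    return f"{details}. Scene: {segment_text}. Dark, cinematic horror style."
```

This only biases the model toward consistency; as the comment above notes, exact face continuity needs a fine-tuned model or face-swap step.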
since it renders faster than realtime, why not just stream it? :)
Do you have a video about self-hosting Convex and what features you miss out on if you do that?
I wonder how much the AWS bill would be to run this infrastructure?
That’s awesome, did the video artifact have voice over also?
Yes
My friend over at @LetsReadPodcast needs to see this one!
Can you please create a tutorial of making a similar project?
Interesting... I have one question: are the requests to OpenAI client-side or server-side, using tools like cURL?
How do you sync transcription with voice? 🤔
What did you use in the backend? I mean, which framework?
How long did it take you to figure all this out and build it? Blink and Cody puts out a video on a whole SaaS-able thing.
What about Step Functions instead of SQS?
Perhaps profiling to see where exactly your bottlenecks are would help you find interesting ways of optimizing
If you need to run a long-running task over 15 min, which AWS service do you need to use?
EC2, Fargate, ECS. Something with a stateful machine.
NO GEN AI.
Generate short vertical videos bro, you will make way more doing that.
Yeah TikTok is on the todo list
This would save me a lot of time. Is the code for this project open source?
I have a few ideas for it; if I can fork this, that would be awesome.
Quick question... why do you need SQS before your lambdas? Can't you trigger the lambdas directly?
To Retry (deez nutz) failed tasks
SQS has automatic retries in case something fails, and when dealing with network requests or processes that use a lot of memory or CPU, sometimes stuff fails randomly.
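Concretely, that retry behavior lives in queue configuration: a visibility timeout longer than the worker's run time, plus a redrive policy that dead-letters a message after N failed receives. A sketch of the attribute payload you'd pass when creating the queue (the ARN and numbers are made up; the attribute keys are standard SQS ones):

```python
import json

def queue_attributes(dlq_arn, max_receives=3, visibility_timeout=900):
    """Build SQS queue attributes: after max_receives failed receives,
    SQS moves the message to the dead-letter queue. The visibility
    timeout should exceed the worker's max run time (e.g. Lambda's
    15 minutes) so a message isn't redelivered mid-processing."""
    return {
        "VisibilityTimeout": str(visibility_timeout),
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": max_receives,
        }),
    }
```

The dict would be passed as the `Attributes` argument when creating the queue (e.g. via boto3 or the CDK equivalent).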
So are you planning to animate the images?
They already slowly zoom in if that counts
@@WebDevCody Honestly what you have done so far is more impressive.
Bro, 2 minutes to generate such a video is definitely not much. I don't think you need to optimize this. This thing already can get a lot of users.
Next request will probably be "make different voices for different characters in dialogs"
(I'm working on an AI audiobooks project, can say from experience)
Babe, how did you manage to sync the audio to the subtitles and video? I'd believe creating video from images using ffmpeg could possibly make them go out of sync?
OpenAI gives timestamps with each word, so I just show the words based on those timestamps.
@@WebDevCody Cool, cool! I'd also wanna explore them but my pocket doesn't allow it.
@@anothermouth7077 Do you have a good GPU? If so, Whisper is open source and you could use other models to do this on your own system for learning.
@@WebDevCody I feel like there should be more words on screen for longer. I don't like these super fast zoomer subtitles.
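The "show the words based on those timestamps" approach above boils down to a per-frame lookup like this; the field names follow the common word-timestamp shape (word/start/end in seconds) and are an assumption here:

```python
def word_at(words, t):
    """words: [{"word": str, "start": float, "end": float}, ...]
    Returns the word on screen at time t seconds, or None during silence."""
    for w in words:
        if w["start"] <= t < w["end"]:
            return w["word"]
    return None
```

Since the word list is sorted by time, a real renderer would walk it with an index rather than re-scan per frame, but the idea is the same.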
how are you generating the videos?
Great work! I did something similar, but with a completely automated process (no human in the loop, or almost, so far). These are my first attempts so the results aren't so great yet, but check out the channel if you want to take a look at them :D
Cool project, but you can just run it locally for free.
Dude, you're spending too much time fixating on the performance. Just put up a progress bar or send the user an email when it's done. If this ends up getting to scale, you can start looking into optimizations, like having a dedicated server, etc.
Yeah that’s true
Can I get the code for this?
code .
Ur so cool
doesn't cost money if you run it locally 🤦🏽
I thought these were real videos.
Now I really wanna know how much faster Go would be in this
So far I've seen Python is able to generate the frames at about 20 ms a frame. My node-canvas solution is doing maybe 60 ms a frame.
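Those per-frame figures are easy to reproduce with a crude harness like this; the renderer passed in is a stand-in, not the app's actual frame code:

```python
import time

def ms_per_frame(render_frame, frames=100):
    # Average wall-clock milliseconds per call; crude, but enough to
    # compare two renderers head-to-head on the same machine.
    start = time.perf_counter()
    for i in range(frames):
        render_frame(i)
    return (time.perf_counter() - start) * 1000.0 / frames
```

Averaging over many frames smooths out warm-up and scheduler noise that would dominate a single-frame measurement.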
I hope ai generated content drives up the value of original content 🙏.
That's an amazing project you've done there. I think lambdas are better for this kind of thing, where you spin up one for a CPU-intensive task, but you're only going to need it when users are making stuff. Going the dedicated server way makes sense maybe if you have constant use. As for GPUs, don't they rent on demand with GPUs? Another option would be to have those AI requests on a dedicated server, but really simplify them, as in no video creation and stitching on your server; go the WASM ffmpeg way. Not sure if it can be done at the moment, not really following WASM progress. Or probe capabilities (there must be some web API? if not, WASM? lol) and have both: lambdas for people without the capabilities, say on a slow CPU or no GPU, and WASM for the rest? This would save you money. That's a good project, take it further.
Generating a 7-minute video costs maybe $0.12, so whatever effort I put into all of this must be worth it. It's much easier for me to just charge the user one extra dollar to cover operation costs, honestly, than it would be to look into everything you just said.
@@WebDevCody Sure, WASM is early stages atm, but it could really save you a lot of money since the heavy lifting is going to take place on people's computers, and most people have more than enough power sitting idle. Ffmpeg video creation and stitching would use their CPU power. I think it's worth looking at at some point, especially if your project succeeds and has a lot of users. In any case, I really liked your project and hope it does!!!