- 39
- 27 432
Tech Giant
Nigeria
Joined Aug 5, 2023
- Tech reviews
- Unboxing of cool gadgets and drones
- AI & coding tutorials
🤖💻🚁
Don't forget to subscribe and ring the bell to never miss an update!
Stay Tech-Savvy🤟🏻
Improving AI Agents with Background Tasks | Different Approach to Handling Tools
A quick demo showing how AI agents can run tools in the background using a Task Manager. This is just a different way to handle tool execution compared to typical approaches. If you're working with AI agents, you might find this implementation interesting.
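The background-tool idea described above can be sketched in a few lines of Python. This is a minimal illustration of the concept, not the video's actual code; the class and function names here are made up for the example. The agent submits a slow tool call to a task manager and keeps responding while the tool runs, checking back for the result later.

```python
import asyncio
import uuid

class TaskManager:
    """Tracks tool calls running in the background."""
    def __init__(self):
        self.tasks = {}

    def submit(self, coro):
        # The agent gets a ticket back immediately instead of blocking on the tool.
        task_id = str(uuid.uuid4())
        self.tasks[task_id] = asyncio.create_task(coro)
        return task_id

    def status(self, task_id):
        return "done" if self.tasks[task_id].done() else "running"

    async def result(self, task_id):
        return await self.tasks[task_id]

async def slow_search_tool(query):
    await asyncio.sleep(0.1)  # stand-in for a slow API call
    return f"results for {query!r}"

async def main():
    manager = TaskManager()
    task_id = manager.submit(slow_search_tool("open-source TTS"))
    # ... the agent can keep chatting with the user here while the tool runs ...
    print(await manager.result(task_id))

asyncio.run(main())
```

Compared to standard function calling, the difference is only in *when* the agent waits: the call still happens, but the agent's turn isn't blocked on it.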
#ai #Programming #Demo #TechTutorial #python #artificialintelligence #langchain #aitools #aiagents #agentic
Views: 317
Videos
Running LLMs locally w/ Ollama - Llama 3.2 11B Vision
707 views · 9 hours ago
In this video, we'll explore how to use Ollama’s latest Llama 3.2 Vision model with 11 billion parameters and run it locally. The Llama 3.2 Vision model, available in 11B and 90B versions, brings advanced multimodal capabilities that allow it to interpret images, recognize scenes, generate captions, and answer questions based on visual content. Optimized for both image reasoning and text-based ...
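The local run described above can be sketched with the `ollama` Python client. This is a hedged sketch, not the video's code: the model tag (`llama3.2-vision`), the image file name, and a running Ollama server are all assumptions here; check `ollama list` for the tags you actually have installed.

```python
def build_vision_message(prompt, image_path):
    """Ollama accepts local image paths (or bytes) in a message's 'images' field."""
    return {"role": "user", "content": prompt, "images": [image_path]}

if __name__ == "__main__":
    import ollama  # requires `pip install ollama` and a running Ollama server
    response = ollama.chat(
        model="llama3.2-vision",  # the 11B tag; the 90B variant needs far more memory
        messages=[build_vision_message("Describe this image.", "photo.jpg")],
    )
    print(response["message"]["content"])
```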
Setting up Janus 1.3B Multimodal LLM Locally | Image Generation & Understanding
443 views · 14 days ago
In this video, I'll show you how to set up Janus 1.3B by DeepSeek AI locally; it's a new multimodal large language model capable of both image understanding and generation. 🖥️ MY SETUP: PC: MacBook M1 Pro Language: Python Version: Python 3.12 #ai #aivoice #aivoices #texttospeech #tts #cosyvoice #funaudiollm #voicecloning #voicesynthesis #voice #llm #prompt #instruct #opensource #openso...
Open-source Voice Cloning with the new F5 TTS Model | Local Setup, CLI Inference & Gradio Web UI
3.1K views · 28 days ago
In this video, I'll show you how to set up F5-TTS, an excellent voice-cloning text-to-speech model. 🖥️ MY SETUP: PC: MacBook M1 Pro Language: Python Version: Python 3.12 🔗 LINKS F5-TTS HF: huggingface.co/SWivid/F5-TTS E2-TTS HF: huggingface.co/SWivid/E2-TTS F5-TTS Github Repo: github.com/SWivid/F5-TTS OTHER VOICE CLONING MODELS: Cosyvoice SFT Model setup: th-cam.com/video/NDckWBZztTI/...
CosyVoice Text to Speech WebUI (Open-source) - English Version
277 views · 28 days ago
NOTE: This video is just an overview of the CosyVoice WebUI running on my MacBook M1 Pro. In this video, we go over the CosyVoice WebUI (which I translated to English) running on my MacBook. We tried both the SFT model and the Base model, generated speech from text with both, and also cloned Cogman's voice (the robot butler from the movie Transformers: The Last Knight). 🔗 LINKS Cosyvoic...
CosyVoice TTS #3 | Open-source Instruct Model Text-to-Speech
164 views · 1 month ago
NOTE: This video is the third of a three-part series where I set up CosyVoice on my MacBook M1 Pro. In this tutorial, I'll guide you through setting up CosyVoice on your MacBook for multilingual text-to-speech synthesis using Python 3.12 and a Conda env. CosyVoice is a cutting-edge multilingual text-to-speech (TTS) system designed to produce natural, lifelike speech across over 100 languages. Its sta...
CosyVoice TTS #2 | Open-source Base Model Voice Cloning & Cross-Lingual
232 views · 1 month ago
NOTE: This video is the second of a three-part series where I set up CosyVoice on my MacBook M1 Pro. In this tutorial, I'll guide you through setting up CosyVoice on your MacBook for multilingual text-to-speech synthesis using Python 3.12 and a Conda env. CosyVoice is a cutting-edge multilingual text-to-speech (TTS) system designed to produce natural, lifelike speech across over 100 languages. Its st...
Setting up CosyVoice TTS #1 | Open-source SFT Model Text to Speech
386 views · 1 month ago
NOTE: This video is the first of a three-part series where I set up CosyVoice on my MacBook M1 Pro. In this tutorial, I'll guide you through setting up CosyVoice on your MacBook for multilingual text-to-speech synthesis using Python 3.12 and a Conda env. CosyVoice is a cutting-edge multilingual text-to-speech (TTS) system designed to produce natural, lifelike speech across over 100 languages. Its sta...
Meta's Llama 3.2 3B, 11B & 90B Vision models
261 views · 1 month ago
In this video, we test three sizes of Llama 3.2 (3B, 11B & 90B), the newly released Llama series of models by Meta AI. #ai #llm #aiagents #llama #llama3 #llama3.2 #togetherai #langchain #aiagent #streamlit #metaai #meta
Setting up Fish Speech TTS v1.4 by @FishAudio locally - High Quality Open-source Voice Cloning Model
1.1K views · 1 month ago
NOTE: This is just an update to my previous video on setting up Fish Speech v1.2 on a MacBook M1 Pro. I'll be setting up Fish Speech version 1.4 by @FishAudio. This is a great TTS model trained on 700k hours of audio data in multiple languages (English, Japanese, German, French, Spanish, Korean, Arabic, and Chinese audio data); it also performs wonderfully at voice cloning and TTS generation. The o...
High Quality Voice Cloning TTS Model - Fish Speech 1.2 by Fish Audio
1.2K views · 3 months ago
Newer Version Alert 🚨: Fish Speech v1.4 upgrade video th-cam.com/video/kNuRXS01UyA/w-d-xo.html NOTE: This video is part of the Text-to-Speech Comparison Series. I'll be setting up Fish Speech v1.2 by Fish Audio. This is a great TTS model trained on 300k hours of English, Japanese, and Chinese audio data; it also performs wonderfully at voice cloning and TTS generation. The only downside is, it is...
Setting up a Realistic Text-to-speech; Bark (by Suno AI) locally
1.8K views · 4 months ago
NOTE: This video is part of the Text-to-Speech Comparison Series. I'll be setting up Bark, a transformer-based text-to-audio model created by Suno AI. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects. The model can also generate nonverbal cues, such as laughter, sighing, and crying. 🔗 LINKS Code Repo: github...
Simple AI Agent/Chatbot | MegaMind | I/O with Whisper.cpp & Piper TTS
414 views · 5 months ago
MegaMind AI is a barebones AI assistant that's split into three sections: STT | LLM/MLLM | TTS. It was built as a structured chat agent with the Langchain framework, together with a few Python libraries, and uses whisper.cpp to enable fast transcription of the user's speech depending on the user's PC hardware. 🔗 LINKS Project's Github: github.com/brainiakk/megamind.ai Whisper CPP (CoreML Support) Example Git...
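The STT | LLM | TTS split described above can be sketched as a simple three-stage pipeline. The stage functions below are placeholders standing in for the real components (whisper.cpp, the Langchain agent, a TTS engine), not MegaMind's actual code; only the pipeline shape is the point.

```python
def transcribe(audio):
    """STT stage: speech in, text out (whisper.cpp in the real project)."""
    return f"<transcript of {audio}>"

def respond(text):
    """LLM stage: the structured chat agent produces a reply."""
    return f"<reply to {text}>"

def speak(text):
    """TTS stage: the reply is synthesized back into audio."""
    return f"<audio for {text}>"

def assistant_turn(audio_in):
    """One conversational turn: speech in, speech out."""
    return speak(respond(transcribe(audio_in)))

print(assistant_turn("mic.wav"))
```

Each stage can be swapped independently, which is why projects like this compare several TTS backends without touching the STT or LLM code.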
Speech to text with Whisper CPP in a Python Project (with CoreML/Apple Silicon Support)
1.1K views · 5 months ago
Gemini 1.5 Pro (latest) with Langchain's ChatVertexAI Package
323 views · 5 months ago
Vision Tool & Screenshot Tool for Langchain Structured Chat Agent (Powered by Gemini 1.5 Pro)
340 views · 5 months ago
Installing Piper Text To Speech Engine (on a Macbook w/ Apple Silicon)
1.1K views · 6 months ago
Setting up Openvoice version 2 and MeloTTS for AI voice cloning
7K views · 6 months ago
WizardLM2 function call - Using Llama 3 Tokenizer & Langchain's Pydantic OpenAI function converter
127 views · 6 months ago
LLAMA 3: function calling review using llama index framework and Ollama locally.
572 views · 6 months ago
Port Harcourt City (Nigeria) | A Brief Cinematic OVER-view 🤣
102 views · 8 months ago
CFly Faith 2 Pro Drone Review: Advanced Features, 540º Obstacle Avoidance & Impressive Performance!
594 views · 10 months ago
Speedybee Bee35 Pro FPV Frame Review: Durable, Protective, and Feature-Packed!
1.1K views · 10 months ago
Honest Walksnail Avatar HD Pro Kit Review: Disappointing Range, and a Watery Demise 💔
133 views · 11 months ago
Exploring the Landscapes of Uniport with the CFLY Faith 2 Pro Drone
352 views · 11 months ago
Interested. Make a different demo. I still don't understand how it's different from standard function calling
Hello! I'm looking to create a French language model using OpenVoice. I have a dataset with audio clips and aim to generate a checkpoint, but I'm encountering issues with the training process
It would be fine to be able to record our own speech flow, to give the final generated voice the rhythm, pauses, and pitch desired, as Applio RVC does
Requires a GPU 😢
I ran it on my CPU; it hasn't been optimized to run on the Apple Metal GPU
The clone of your voice is 90% there. Really good!
True, but v1.4 does a better job though
Can this do any language?
I think it's just English and Chinese; you can check their research paper
How much data do you need for this?
How does that differ from tortoise-tts? And which is better?
Can I shoot in flat on it?
No
This is nice, but for someone like me it would be better with a good webui
They have a WebUI; you can visit their Hugging Face page. The problem is it's written in Chinese, so you can try translating the text to English if you have the time.
@@techgiantt If anyone uses the Microsoft Edge browser, just right-click and translate the page. By the way, I experienced it in English by default.
I have translated it to english in a recent video
Great work! Once the voice is cloned and the code_N file is generated, how do you run inference only, for fast TTS use?
Hello pro, is there any way to run this on an RTX 4090 GPU?
Like I said earlier, I use a MacBook; I don't use Nvidia hardware
How do I run it on the GPU, pro?
Your intro noise is unfathomably loud.
Pro, what do I need to change to run this project on the GPU? I changed cpu to cuda and also tried cuda:0, but there's an error. What is the right way to use the GPU? I have an RTX 4090 in my laptop. Please help
Can I stream audio from Suno Bark? Like get audio chunks with a custom sample rate and audio format?
Bro, are you Nigerian?
Curious, what is this explorer window that shows the content being populated as you run the commands?
What are the required machine configs for this? I'm running out of memory on a Tesla T4 with 16 GB on CUDA, and on my 28 GB of RAM on CPU
I ran it on my MacBook M1 Pro, but it also supports CUDA if your GPU does; in the video I switched to CPU to run it on my Mac
Does it support Bahasa Indonesia?
Does it have a docker setup?
not sure
Hello pro, big big fan! It finally runs perfectly for me 😍 Thank you for this awesome tutorial. Just one more question: I have an RTX 4090 in my laptop; what do I need to run this on the GPU? I tried changing device="cpu" to device="cuda". I have 2 GPUs: Intel built-in and an Nvidia GeForce RTX 4090. Thank you in advance, pro 😀
I use a macbook that's why I switched from cuda to cpu. You can reach out to Nvidia support if you're having trouble with their hardware, but make sure you've installed the necessary drivers for that graphics card so you can use cuda, before reaching out to them.
@@techgiantt I have CUDA set up and it works fine with many other projects, but what change needs to be made to run this on the GPU?
@techgiantt Which model of MacBook is required, sir?
Please pro, make a full, clear tutorial
Hello pro, I followed your tutorial on Fish Speech. Please make a full, clear tutorial about this model, 1.4
We're still waiting for a clear tutorial about this model, please pro
Hello sir, thank you so much for this tutorial. I'm wondering if you can do the cloning using Bark (the one you used in the last video), so the reading also sounds more human?
Hello, I tried all the solutions that you gave me, and I still get the same errors. Please, I need your help. Best regards
Hello pro, good tutorial, but I have a problem; can you review it with me? I did almost everything like you did, and it still shows me: No module named 'fish_speech.conversation'. I checked the path and everything is good
Did you use version 1.2 (that's what I used)? but there is a new version (v1.4).
@@techgiantt Yes, I used 1.4, still the same problem. Can I have your contact? I need your help, pro
@@techgiantt I need your contact. I need your help
@@mahaltech Just get version 1.2 from their github releases page, and follow the other steps I showed in the video.
@@techgiantt there is some difference
The project can also read PDF documents, and I am willing to connect it all into a super all-in-one project with a UI this month.
I would also love a UI to select languages between your outputs and select characters between your characters' .TXT prompts. And lastly, a selector for your vault .TXT files, to tell the LLM what you expect the AI to know about your needs.
Thanks. I think you need a better 🎤 mic; your voice is so low it will probably put people to sleep. Voice quality is more important than video quality. Good luck
Sorry about the inconvenience; I had some problems with the mic I/O. I'll switch it out in the next video.
Great job! By the way, did you try vLLM to load the Mistral LLM? vLLM can start an OpenAI API-compatible server with: python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.3. Then, from langchain.chat_models.base import BaseChatModel, create a custom class CustomVLLM(BaseChatModel), and use llm = CustomVLLM(base_url="localhost:8080/v1")
@@baoxinping3081 I'll check it out; I've been postponing it for a while. I'm also working on updating the project, so maybe I'll cram it all into one video.
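The OpenAI-compatible server the comment above mentions can be called with nothing but the standard library, since it speaks the standard /v1/chat/completions protocol. This is a hedged sketch: the base URL, port, and model name below are assumptions and should match however you started the vLLM server.

```python
import json
from urllib import request

def build_chat_request(base_url, model, prompt):
    """Builds a /v1/chat/completions request for an OpenAI-style server."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    req = build_chat_request("http://localhost:8000",  # vLLM's default port; adjust to yours
                             "mistralai/Mistral-7B-Instruct-v0.3",
                             "Hello!")
    with request.urlopen(req) as resp:  # requires the vLLM server to be running
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the wire format is the standard OpenAI one, any OpenAI-compatible client (including Langchain's) can point at the same endpoint instead of a hand-rolled request.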
awesome
Having issues with Piper; it's not able to find the module regardless of my attempts.
I didn't set up Piper in this video
@@techgiantt No worries, I was having some bizarre dependency issues with my venv; I did a clean pip install of the requirements and everything ran smoothly, besides some ALSA issues, but I cleaned those up too. Thanks.
Guys, make sure you're using Python 3.10 or it gets weird. But also, this was an amazing tutorial! Thanks!!
Is it utilizing the Apple silicon GPU?
♥
As far as I understand, this project is for Linux and macOS only, since coremltools only works on them. But the video is still good.
For those who don't want to mess around with a complex installation, use Pinokio
Nice work man
Thanks man
While executing python -m main, I'm getting the following error: cannot import name 'TTS' from 'melo'
@@SurrenderToAction Did you install MeloTTS? It's in the modules directory, and you need to install it in the virtual environment you set up
@@techgiantt As you do at 3:48 in the video, right? Yes, I did that, but I had a bunch of issues. Maybe because I was using the latest version of Python, and went to 3.9 later..
@@SurrenderToAction Check th-cam.com/video/UsuuSgnOJxg/w-d-xo.html
@@techgiantt Yeah, I did rename it to "melo". Maybe it's better to do everything from scratch. One more day..
@@techgiantt I'm getting this error when installing Melo: ERROR: Failed building wheel for tokenizers
I got this error when generating the audio file: ../lib/libespeak-ng.dylib' (no such file)
Why did you mount the GPS like that? It's too bad; the frame comes with a mount, bro
The GPS mount is too small for the standard size of GPS units
Wow, This is amazing 😮😮
Thanks! Can I run this on CPU? Because I don't have a GPU
I think by default it runs on CPU, just ignore the CoreML support bit.
A bit confusing. What's the relation between MeloTTS and OpenVoice V2?
MeloTTS can act as a standalone text-to-speech engine or as the base speaker for OpenVoice v2. OpenVoice is both a TTS and a voice-cloning engine. OpenVoice v1 can do without MeloTTS as the base speaker
@@techgiantt Thanks for your reply. I'm able to play the English voice without any issue, but when I play Chinese, I get the following error message: RuntimeError: Placeholder storage has not been allocated on MPS device! Any suggestions? Thanks.
Does it work well on audio in other languages? Because I have tried Bark and Tacotron 2 but did not get good results for the Hindi language. Thanks for the video, keep giving good content 😊
I think it's mostly the English, Japanese, Chinese, French, Spanish, and Korean languages that are supported, but it also has an Indian accent
Hello brother, can you share the code?
For the text-to-speech or the AI agent using tools? Because I already have a video on the TTS, and a GitHub repo in the description of that video
Is there a way to make these TTS models more expressive?
Yes, but you need a beefy GPU to use it with an AI model, since you won't want extra latency; I'll create a video for that.
@@techgiantt I think it would be amazing if they could act, expressing emotions anger, sadness, sorrow, compassion, confidence, hesitation, shyness, embarrassment, bravado, whisper, fear, shout, laugh, etc. moods and personality expressed via voice.
@@komakaze1 Exactly my thought. Is there any other alternative out there, one that doesn't cost 20 bucks per month?
Try making it run with Ollama Phi-3, and LLaVA for vision, if possible. Also, where can we get the code? I also have a set of tools that can be used with Ollama and Phi-3; hit me up to talk about it if you want!!!
Use Wizard, it's better as an assistant. I'm looking for a project that shares the computer screen and has it describe things and help with code.
good bro!
Thanks 🔥