- 43
- 35 601
Tech Giant
Nigeria
เข้าร่วมเมื่อ 5 ส.ค. 2023
- Tech reviews
- Unboxing of cool gadgets and drones
- AI & coding tutorials
🤖💻🚁
Don't forget to subscribe and ring the bell to never miss an update!
Stay Tech-Savvy🤟🏻
- Unboxing of cool gadgets and drones
- AI & coding tutorials
🤖💻🚁
Don't forget to subscribe and ring the bell to never miss an update!
Stay Tech-Savvy🤟🏻
Using XAI's Grok 2 Text and Vision models with Langchain
This video demonstrates testing the new Grok 2 text and vision models, exploring their capabilities in image understanding and function calling through practical examples and real-world applications.
#ai #xai #aimodel #aiagents #tts #coding #python #llm #stt #grok #grok2 #xaimodels
#ai #xai #aimodel #aiagents #tts #coding #python #llm #stt #grok #grok2 #xaimodels
มุมมอง: 39
วีดีโอ
Real-Time Speech-to-Text & Speaker Identification using Whisper, Vosk & Pyannote (Open-Source)
มุมมอง 427วันที่ผ่านมา
In this video, I’ll walk you through two simple solutions for real-time speech-to-text and speaker verification/identification. These implementations combines transcription and speaker identification capabilities using popular tools like PyAnnote, Whisper, and Vosk. Whether you're building an AI system, exploring speech processing, or tackling speaker verification challenges, this video provide...
Open-source Voice Cloning & Text to Speech with the new OuteTTS v0.2 500M Model | Local Setup
มุมมอง 68521 วันที่ผ่านมา
In this video I'll show you how to setup OuteTTS v0.2 locally. OuteTTS-0.2-500M is the improved successor to the v0.1 release. It performs fast voice cloning and speaker profile generation, which shortens the inference time. 🖥️ MY SETUP: PC: Macbook M1 Pro Language: Python Version: Python 3.12 #ai #coding #python #pythonprogramming #tts #texttospeechtechnology #texttospeech #voicecloning #voice...
Installing Marco-o1 locally - Open Source "Reasoning" Model
มุมมอง 508หลายเดือนก่อน
In this video I'm going to be putting the new open-source "reasoning" language model: Marco-o1 to the test. Marco-o1 Large Language Model (LLM) is powered by Chain-of-Thought (CoT) fine-tuning, Monte Carlo Tree Search (MCTS), reflection mechanisms, and innovative reasoning strategies optimized for complex real-world problem-solving tasks. #ai #coding #llm #reasoning #reasoningmodel #artificiali...
Improving AI Agents with Background Tasks | A Different Approach to Handling Tools
มุมมอง 384หลายเดือนก่อน
A quick demo showing how AI agents can run tools in the background using a Task Manager. This is just a different way to handle tool execution compared to typical approaches. If you're working with AI agents, you might find this implementation interesting. #ai #Programming #Demo #TechTutorial #python #artificialintelligence #langchain #aitools #aiagents #agentic
Running LLMs locally w/ Ollama - Llama 3.2 11B Vision
มุมมอง 1.1Kหลายเดือนก่อน
In this video, we'll explore how to use Ollama’s latest Llama 3.2 Vision model with 11 billion parameters and run it locally. The Llama 3.2 Vision model, available in 11B and 90B versions, brings advanced multimodal capabilities that allow it to interpret images, recognize scenes, generate captions, and answer questions based on visual content. Optimized for both image reasoning and text-based ...
Setting up Janus 1.3B Multimodal LLM Locally | Image Generation & Understanding
มุมมอง 540หลายเดือนก่อน
In this video, I'll show you how to locally setup the new Multimodal Large Language Model that's capable of both image understanding and generation; called Janus 1.3B by Deepseek AI 🖥️ MY SETUP: PC: Macbook M1 Pro Language: Python Version: Python 3.12 #ai #aivoice #aivoices #texttospeech #tts #cosyvoice #funaudiollm #voicecloning #voicesynthesis #voice #llm #prompt #instruct #opensource #openso...
Open-source Voice Cloning with the new F5 TTS Model | Local Setup, CLI Inference & Gradio Web UI
มุมมอง 4.1K2 หลายเดือนก่อน
In this video, I'll show you how to setup the F5-TTS an excellent voice cloning text to speech model. 🖥️ MY SETUP: PC: Macbook M1 Pro Language: Python Version: Python 3.12 🔗 LINKS F5-TTS HF: huggingface.co/SWivid/F5-TTS E2-TTS HF: huggingface.co/SWivid/E2-TTS F5-TTS Github Repo: github.com/FunAudioLLM/CosyVoice OTHER VOICE CLONING MODELS: Cosyvoice SFT Model setup: th-cam.com/video/NDckWBZztTI/...
CosyVoice Text to Speech WebUI (Open-source) - English Version
มุมมอง 3812 หลายเดือนก่อน
NOTE: This video is just an overview of Cosyvoice WebUI running on my Macbook M1 Pro In this video, we'll go over the CosyVoice WebUI (which I translated to English) running on my MacBook. We tried both the SFT Model & the Base model. We then generated speech from text with both models and we also cloned Cogman's voice (the robot butler from the movie: Transformers Last Knight) 🔗 LINKS Cosyvoic...
CosyVoice TTS #3 | Open-source Instruct Model Text-to-Speech
มุมมอง 2262 หลายเดือนก่อน
NOTE: This video is the third of a three part series, where I setup Cosyvoice on my Macbook M1 Pro In this tutorial, I'll guide you through setting up CosyVoice on your MacBook for multilingual text-to-speech synthesis using Python3.12 & Conda env. CosyVoice is a cutting-edge multilingual text-to-speech (TTS) system designed to produce natural, lifelike speech across over 100 languages. Its sta...
CosyVoice TTS #2 | Open-source Base Model Voice Cloning & Cross-Lingual
มุมมอง 4072 หลายเดือนก่อน
NOTE: This video is the second of a three part series, where I setup Cosyvoice on my Macbook M1 Pro In this tutorial, I'll guide you through setting up CosyVoice on your MacBook for multilingual text-to-speech synthesis using Python3.12 & Conda env. CosyVoice is a cutting-edge multilingual text-to-speech (TTS) system designed to produce natural, lifelike speech across over 100 languages. Its st...
Setting up CosyVoice TTS #1 | Open-source SFT Model Text to Speech
มุมมอง 5202 หลายเดือนก่อน
NOTE: This video is the first of a three part series, where I setup Cosyvoice on my Macbook M1 Pro In this tutorial, I'll guide you through setting up CosyVoice on your MacBook for multilingual text-to-speech synthesis using Python3.12 & Conda env. CosyVoice is a cutting-edge multilingual text-to-speech (TTS) system designed to produce natural, lifelike speech across over 100 languages. Its sta...
Meta's Llama 3.2 3B, 11B & 90B Vision models
มุมมอง 3902 หลายเดือนก่อน
In this video we test three sizes of Llama 3.2 (3B, 11B & 90B), the newly released Llama series of models by Meta AI #ai #llm #aiagents #llama #llama3 #llama3.2 #togetherai #langchain #aiagent #streamlit #metaai #meta
Setting up Fish Speech TTS v1.4 by @FishAudio locally- High Quality Open-source Voice Cloning Model
มุมมอง 2.1K3 หลายเดือนก่อน
NOTE: This is just an update to my previous video on setting up Fish Speech v1.2 on Macbook M1 Pro I'll be setting up Fish Speech version 1.4 by @FishAudio This is a great TTS model trained on 700k hours of audio data in multiple languages (English, Japanese, German, French, Spanish, Korean, Arabic, and Chinese audio data), it also performs wonderfully at voice cloning and TTS generation. The o...
High Quality Voice Cloning TTS Model - Fish Speech 1.2 by Fish Audio
มุมมอง 1.7K4 หลายเดือนก่อน
High Quality Voice Cloning TTS Model - Fish Speech 1.2 by Fish Audio
Setting up a Realistic Text-to-speech; Bark (by Suno AI) locally
มุมมอง 2.3K5 หลายเดือนก่อน
Setting up a Realistic Text-to-speech; Bark (by Suno AI) locally
Simple AI Agent/Chatbot | MegaMind | I/O with Whisper.cpp & Piper TTS
มุมมอง 4716 หลายเดือนก่อน
Simple AI Agent/Chatbot | MegaMind | I/O with Whisper.cpp & Piper TTS
Speech to text with Whisper CPP in a Python Project (with CoreML/Apple Silicon Support)
มุมมอง 1.3K6 หลายเดือนก่อน
Speech to text with Whisper CPP in a Python Project (with CoreML/Apple Silicon Support)
Gemini 1.5 Pro (latest) with Langchain's ChatVertexAI Package
มุมมอง 3657 หลายเดือนก่อน
Gemini 1.5 Pro (latest) with Langchain's ChatVertexAI Package
Vision Tool & Screenshot Tool for Langchain Structured Chat Agent (Powered by Gemini 1.5 Pro)
มุมมอง 3647 หลายเดือนก่อน
Vision Tool & Screenshot Tool for Langchain Structured Chat Agent (Powered by Gemini 1.5 Pro)
Installing Piper Text To Speech Engine (on a Macbook w/ Apple Silicon)
มุมมอง 1.4K7 หลายเดือนก่อน
Installing Piper Text To Speech Engine (on a Macbook w/ Apple Silicon)
Setting up Openvoice version 2 and MeloTTS for AI voice cloning
มุมมอง 8K7 หลายเดือนก่อน
Setting up Openvoice version 2 and MeloTTS for AI voice cloning
WizardLM2 function call - Using Llama 3 Tokenizer & Langchain's Pydantic OpenAI function converter
มุมมอง 1417 หลายเดือนก่อน
WizardLM2 function call - Using Llama 3 Tokenizer & Langchain's Pydantic OpenAI function converter
LLAMA 3: function calling review using llama index framework and Ollama locally.
มุมมอง 5948 หลายเดือนก่อน
LLAMA 3: function calling review using llama index framework and Ollama locally.
Port Harcourt City (Nigeria) | A Brief Cinematic OVER-view 🤣
มุมมอง 1029 หลายเดือนก่อน
Port Harcourt City (Nigeria) | A Brief Cinematic OVER-view 🤣
CFly Faith 2 Pro Drone Review: Advanced Features, 540º Obstacle Avoidance & Impressive Performance!
มุมมอง 64211 หลายเดือนก่อน
CFly Faith 2 Pro Drone Review: Advanced Features, 540º Obstacle Avoidance & Impressive Performance!
Speedybee Bee35 Pro FPV Frame Review: Durable, Protective, and Feature-Packed!
มุมมอง 1.1Kปีที่แล้ว
Speedybee Bee35 Pro FPV Frame Review: Durable, Protective, and Feature-Packed!
Honest Walksnail Avatar HD Pro Kit Review: Disappointing Range, and a Watery Demise 💔
มุมมอง 152ปีที่แล้ว
Honest Walksnail Avatar HD Pro Kit Review: Disappointing Range, and a Watery Demise 💔
Exploring the Landscapes of Uniport with the CFLY Faith 2 Pro Drone
มุมมอง 387ปีที่แล้ว
Exploring the Landscapes of Uniport with the CFLY Faith 2 Pro Drone
Bravo
Vorrei tanto che semplifichi di cose
Molto bravo quello che fai e molto complicato non è facile
Where can I find the github repo ?
github.com/brainiakk/Cosyvoice-WebUI-English
I've been looking for a solution for this for a while. Amazingly useful video!
Glad it helped! 🤟🏻
Sir, is there permission to fine-tune this model and use it in commercial projects? So there might be any licensing problems?
I don't think so, the license is Creative Commons (CC-by-NC) just ask someone into legal stuff and licenses about it, but not sure it's allowed for commercial projects
The bus/commute question - the self-reflection is all messed up, if you read it.
does it work well on macbook pro m1?
Good one...
Interested. Make a different demo. I still dont understand how its different fom standard function calling
Hello! I'm looking to create a French language model using OpenVoice. I have a dataset with audio clips and aim to generate a checkpoint, but I'm encountering issues with the training process
It would be fine to be able to record our own speech flow, to give the final generated voice the rythm, pause and pitch desired, as Applio RVC does
Require gpu 😢
I ran it on my cpu, it hasn't been optimized to run on apple metal gpu
the voice clone of your voice is 90%. really good!
True, but the v1.4 does a better job though
Can this do any language?
I think it's just english and chinese, you can check their research paper
How much data you need for making this?
How does that differ from tortoise-tts? And which is better?
Can I shot on flat on it ?
No
This is nice, but for someone like me it would be better with a good webui
They have a webui, you can visit their huggingface page, the problem is it is written in chinese so you can try translating the text to english if you have the time.
@@techgiantt if anyone uses microsoft edge browser, just right click and translate the page, by the way I experienced it in english by default.
I have translated it to english in a recent video
Great work! Once the voice is cloned and the code_N file is generated, how to use the inference only for fast tts use?
hello pro is any way to run on GPU Pro 4090 RTX ?
Like I said earlier, I use a Macbook, I don't use Nvidia hardware
How to run on GPU Pro
Your intro noise is unfathomably too loud.
pro what need to change to run this project in GPU i chage cpu to cuda also try cuda:0 but there some error what is wright way to use GPU i have local 4090 GPU in my laptop please help
Can I stream audio from suno bark ? Like get audio chunks with custom sample rate and audio format ?
bro, is you Nigerian?
Curious, what is this explorer window that shows the content being populated as you run the commands?
What are the required machine configs for this? I'm running out of memory on T4 Tesla 16 gigs on cuda and my ram 28 gigs on cpu
I ran it on my Macbook M1 pro but it also supports cuda if your gpu supports that, in the video I switched to cpu to run it on my mac
Is that support Bahasa Indonesia?
Does it have a docker setup?
not sure
hello pro big big fan pro finally its run perfectly with me 😍 thank you for this Awesome tutorial just one more question i have GPU 4090 on my laptop what i need to run this on GPU i try to change this code device="cpu" to device="cuda" i have 2GPU's intel Build in and Nvedia GFORCE RTX 4090 thank you in advance pro 😀
I use a macbook that's why I switched from cuda to cpu. You can reach out to Nvidia support if you're having trouble with their hardware, but make sure you've installed the necessary drivers for that graphics card so you can use cuda, before reaching out to them.
@@techgiantt i have Cuda setup and it work fine with many other projects but what change that need to do ro run in GPU
@techgiantt Which model of MacBook is required sir ?
Please pro make full clear tutorial
Hello pro i fellow your tutorial on fish speech Please make full clear tutorial about This model 1.4
we still wait for clear tutorial about this model please pro
Hello Sir thank you so much for this tutorial. I'm wondering if you can do the cloning but using Bark (the one you used in last video ) so the reading also sounds more human ?
hello i try all solution that you give me still same errors please i need your help best regards
hello pro good tutorial but i have mistake can you review with me what is problem did almost every thing like what you did its still show me No module named 'fish_speech.conversation' i check the path and every thing is good
Did you use version 1.2 (that's what I used)? but there is a new version (v1.4).
@@techgiantt yes i use 1.4 still same problem Can i have your contact No i need your help pro
@@techgiantt i need your contact No I need your help
@@mahaltech Just get version 1.2 from their github releases page, and follow the other steps I showed in the video.
@@techgiantt there is some defrance
The project can also read PDF documents and I am willing to connect it all into a super all in one project wit UI this months.
I would also Love an UI to select languajes between your outputs and select characters between your characters .TXT prompts. And for last a selector for your vaults .TXT to tell the LLM what do you expect the AI to know about your needs.
Thanks, I think you need a better 🎤 mic, your voice is so low, probably will put people to sleep, voice quality is more important than video, good luck
Sorry about the inconvenience, had some problems with the mic IO. Would switch it out in the next video.
great job!by the way, did you try vllm to load mistral llm? vllm can start an OpenAI API-compatible server with: python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.3 from langchain.chat_models.base import BaseChatModel creat a custom class CustomVLLM:class CustomVLLM(BaseChatModel), then llm = CustomVLLM(base_url="localhost:8080/v1"
@@baoxinping3081 I’ll check it out, I’ve been postponing checking it out for a while. I’m also working on updating the project, maybe I’ll cram it into one video.
awesome
Having issues with piper, not able to find the module regardless of my attempts.
I didn't set up piper in this video
@@techgiantt no worries, was having some bizarre dependency issues with my venv, did a clean pip of the requirements and all ran smoothly besides some ALSA issues but cleaned it all up. Thanks.
guys make sure u r using python 3.10 or it gets weird. But also this was an amazing tutorial! thanks!!
is it utilizing the GPU of the apple silicon?
♥
As far as I understand, this project is for Linux and MacOS only. Since coremltools only works for them. But the video is still good.
For those who don't want to mess around with a complex installation, use Pinokio
Nice work man
Thanks man
While executing python -m main, I'm getting the following error: cannot import name 'TTS' from 'melo'
@@SurrenderToAction did you install Melo TTS ? Because it’s in the modules directory and you need to install it in that virtual environment you setup
@@techgiantt As you do on the 3:48 of the video, right? Yes, I did that, but I had a bunch of issues. Maybe because I was using last version of python, and went to 3.9 later..
@@SurrenderToAction Check th-cam.com/video/UsuuSgnOJxg/w-d-xo.html
@@techgiantt Yeah, I did rename it to "melo". Maybe better do everything from scratch. One more day..
@@techgiantt I'm getting this error when installing melo: ERROR: Failed building wheel for tokenizers
got this error when genereting the audio file ../lib/libespeak-ng.dylib' (no such file)