- 39
- 27 432
Tech Giant
Nigeria
Joined Aug 5, 2023
- Tech reviews
- Unboxing of cool gadgets and drones
- AI & coding tutorials
🤖💻🚁
Don't forget to subscribe and ring the bell to never miss an update!
Stay Tech-Savvy🤟🏻
Improving AI Agents with Background Tasks | Different Approach to Handling Tools
A quick demo showing how AI agents can run tools in the background using a Task Manager. This is just a different way to handle tool execution compared to typical approaches. If you're working with AI agents, you might find this implementation interesting.
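The background-tool idea described above can be sketched in a few lines of Python. This is a minimal illustration of the concept, not the video's actual code; the class and function names here are made up for the example. The agent submits a slow tool call to a task manager and keeps responding while the tool runs, checking back for the result later.

```python
import asyncio
import uuid

class TaskManager:
    """Tracks tool calls running in the background."""
    def __init__(self):
        self.tasks = {}

    def submit(self, coro):
        # The agent gets a ticket back immediately instead of blocking on the tool.
        task_id = str(uuid.uuid4())
        self.tasks[task_id] = asyncio.create_task(coro)
        return task_id

    def status(self, task_id):
        return "done" if self.tasks[task_id].done() else "running"

    async def result(self, task_id):
        return await self.tasks[task_id]

async def slow_search_tool(query):
    await asyncio.sleep(0.1)  # stand-in for a slow API call
    return f"results for {query!r}"

async def main():
    manager = TaskManager()
    task_id = manager.submit(slow_search_tool("open-source TTS"))
    # ... the agent can keep chatting with the user here while the tool runs ...
    print(await manager.result(task_id))

asyncio.run(main())
```

Compared to standard function calling, the difference is only in *when* the agent waits: the call still happens, but the agent's turn isn't blocked on it.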
#ai #Programming #Demo #TechTutorial #python #artificialintelligence #langchain #aitools #aiagents #agentic
Views: 317
Videos
Running LLMs locally w/ Ollama - Llama 3.2 11B Vision
707 views · 9 hours ago
In this video, we'll explore how to use Ollama’s latest Llama 3.2 Vision model with 11 billion parameters and run it locally. The Llama 3.2 Vision model, available in 11B and 90B versions, brings advanced multimodal capabilities that allow it to interpret images, recognize scenes, generate captions, and answer questions based on visual content. Optimized for both image reasoning and text-based ...
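The local run described above can be sketched with the `ollama` Python client. This is a hedged sketch, not the video's code: the model tag (`llama3.2-vision`), the image file name, and a running Ollama server are all assumptions here; check `ollama list` for the tags you actually have installed.

```python
def build_vision_message(prompt, image_path):
    """Ollama accepts local image paths (or bytes) in a message's 'images' field."""
    return {"role": "user", "content": prompt, "images": [image_path]}

if __name__ == "__main__":
    import ollama  # requires `pip install ollama` and a running Ollama server
    response = ollama.chat(
        model="llama3.2-vision",  # the 11B tag; the 90B variant needs far more memory
        messages=[build_vision_message("Describe this image.", "photo.jpg")],
    )
    print(response["message"]["content"])
```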
Setting up Janus 1.3B Multimodal LLM Locally | Image Generation & Understanding
443 views · 14 days ago
In this video, I'll show you how to set up Janus 1.3B by DeepSeek AI locally; it's a new multimodal large language model capable of both image understanding and generation. 🖥️ MY SETUP: PC: MacBook M1 Pro Language: Python Version: Python 3.12 #ai #aivoice #aivoices #texttospeech #tts #cosyvoice #funaudiollm #voicecloning #voicesynthesis #voice #llm #prompt #instruct #opensource #openso...
Open-source Voice Cloning with the new F5 TTS Model | Local Setup, CLI Inference & Gradio Web UI
3.1K views · 28 days ago
In this video, I'll show you how to set up F5-TTS, an excellent voice-cloning text-to-speech model. 🖥️ MY SETUP: PC: MacBook M1 Pro Language: Python Version: Python 3.12 🔗 LINKS F5-TTS HF: huggingface.co/SWivid/F5-TTS E2-TTS HF: huggingface.co/SWivid/E2-TTS F5-TTS Github Repo: github.com/SWivid/F5-TTS OTHER VOICE CLONING MODELS: Cosyvoice SFT Model setup: th-cam.com/video/NDckWBZztTI/...
CosyVoice Text to Speech WebUI (Open-source) - English Version
277 views · 28 days ago
NOTE: This video is just an overview of the CosyVoice WebUI running on my MacBook M1 Pro. In this video, we go over the CosyVoice WebUI (which I translated to English) running on my MacBook. We tried both the SFT model and the Base model, generated speech from text with both, and also cloned Cogman's voice (the robot butler from the movie Transformers: The Last Knight). 🔗 LINKS Cosyvoic...
CosyVoice TTS #3 | Open-source Instruct Model Text-to-Speech
164 views · 1 month ago
NOTE: This video is the third of a three-part series where I set up CosyVoice on my MacBook M1 Pro. In this tutorial, I'll guide you through setting up CosyVoice on your MacBook for multilingual text-to-speech synthesis using Python 3.12 and a Conda env. CosyVoice is a cutting-edge multilingual text-to-speech (TTS) system designed to produce natural, lifelike speech across over 100 languages. Its sta...
CosyVoice TTS #2 | Open-source Base Model Voice Cloning & Cross-Lingual
232 views · 1 month ago
NOTE: This video is the second of a three-part series where I set up CosyVoice on my MacBook M1 Pro. In this tutorial, I'll guide you through setting up CosyVoice on your MacBook for multilingual text-to-speech synthesis using Python 3.12 and a Conda env. CosyVoice is a cutting-edge multilingual text-to-speech (TTS) system designed to produce natural, lifelike speech across over 100 languages. Its st...
Setting up CosyVoice TTS #1 | Open-source SFT Model Text to Speech
386 views · 1 month ago
NOTE: This video is the first of a three-part series where I set up CosyVoice on my MacBook M1 Pro. In this tutorial, I'll guide you through setting up CosyVoice on your MacBook for multilingual text-to-speech synthesis using Python 3.12 and a Conda env. CosyVoice is a cutting-edge multilingual text-to-speech (TTS) system designed to produce natural, lifelike speech across over 100 languages. Its sta...
Meta's Llama 3.2 3B, 11B & 90B Vision models
261 views · 1 month ago
In this video, we test three sizes of Llama 3.2 (3B, 11B & 90B), the newly released Llama series of models by Meta AI. #ai #llm #aiagents #llama #llama3 #llama3.2 #togetherai #langchain #aiagent #streamlit #metaai #meta
Setting up Fish Speech TTS v1.4 by @FishAudio locally - High Quality Open-source Voice Cloning Model
1.1K views · 1 month ago
NOTE: This is just an update to my previous video on setting up Fish Speech v1.2 on a MacBook M1 Pro. I'll be setting up Fish Speech version 1.4 by @FishAudio. This is a great TTS model trained on 700k hours of audio data in multiple languages (English, Japanese, German, French, Spanish, Korean, Arabic, and Chinese audio data); it also performs wonderfully at voice cloning and TTS generation. The o...
High Quality Voice Cloning TTS Model - Fish Speech 1.2 by Fish Audio
1.2K views · 3 months ago
Newer Version Alert 🚨: Fish Speech v1.4 upgrade video th-cam.com/video/kNuRXS01UyA/w-d-xo.html NOTE: This video is part of the Text-to-Speech Comparison Series. I'll be setting up Fish Speech v1.2 by Fish Audio. This is a great TTS model trained on 300k hours of English, Japanese, and Chinese audio data; it also performs wonderfully at voice cloning and TTS generation. The only downside is, it is...
Setting up a Realistic Text-to-speech; Bark (by Suno AI) locally
1.8K views · 4 months ago
NOTE: This video is part of the Text-to-Speech Comparison Series. I'll be setting up Bark, a transformer-based text-to-audio model created by Suno AI. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise, and simple sound effects. The model can also generate nonverbal cues, such as laughter, sighing, and crying. 🔗 LINKS Code Repo: github...
Simple AI Agent/Chatbot | MegaMind | I/O with Whisper.cpp & Piper TTS
414 views · 5 months ago
MegaMind AI is a barebones AI assistant that's split into three sections: STT | LLM/MLLM | TTS. It was built as a structured chat agent with the Langchain framework, together with a few Python libraries, and uses whisper.cpp to enable fast transcription of the user's speech depending on the user's PC hardware. 🔗 LINKS Project's Github: github.com/brainiakk/megamind.ai Whisper CPP (CoreML Support) Example Git...
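The STT | LLM | TTS split described above can be sketched as a simple three-stage pipeline. The stage functions below are placeholders standing in for the real components (whisper.cpp, the Langchain agent, a TTS engine), not MegaMind's actual code; only the pipeline shape is the point.

```python
def transcribe(audio):
    """STT stage: speech in, text out (whisper.cpp in the real project)."""
    return f"<transcript of {audio}>"

def respond(text):
    """LLM stage: the structured chat agent produces a reply."""
    return f"<reply to {text}>"

def speak(text):
    """TTS stage: the reply is synthesized back into audio."""
    return f"<audio for {text}>"

def assistant_turn(audio_in):
    """One conversational turn: speech in, speech out."""
    return speak(respond(transcribe(audio_in)))

print(assistant_turn("mic.wav"))
```

Each stage can be swapped independently, which is why projects like this compare several TTS backends without touching the STT or LLM code.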
Speech to text with Whisper CPP in a Python Project (with CoreML/Apple Silicon Support)
1.1K views · 5 months ago
Gemini 1.5 Pro (latest) with Langchain's ChatVertexAI Package
323 views · 5 months ago
Vision Tool & Screenshot Tool for Langchain Structured Chat Agent (Powered by Gemini 1.5 Pro)
340 views · 5 months ago
Installing Piper Text To Speech Engine (on a Macbook w/ Apple Silicon)
1.1K views · 6 months ago
Setting up Openvoice version 2 and MeloTTS for AI voice cloning
7K views · 6 months ago
WizardLM2 function call - Using Llama 3 Tokenizer & Langchain's Pydantic OpenAI function converter
127 views · 6 months ago
LLAMA 3: function calling review using llama index framework and Ollama locally.
572 views · 6 months ago
Port Harcourt City (Nigeria) | A Brief Cinematic OVER-view 🤣
102 views · 8 months ago
CFly Faith 2 Pro Drone Review: Advanced Features, 540º Obstacle Avoidance & Impressive Performance!
594 views · 10 months ago
Speedybee Bee35 Pro FPV Frame Review: Durable, Protective, and Feature-Packed!
1.1K views · 10 months ago
Honest Walksnail Avatar HD Pro Kit Review: Disappointing Range, and a Watery Demise 💔
133 views · 11 months ago
Exploring the Landscapes of Uniport with the CFLY Faith 2 Pro Drone
352 views · 11 months ago
Interested. Make a different demo. I still don't understand how it's different from standard function calling
Hello! I'm looking to create a French language model using OpenVoice. I have a dataset with audio clips and aim to generate a checkpoint, but I'm encountering issues with the training process
It would be fine to be able to record our own speech flow, to give the final generated voice the rhythm, pauses, and pitch desired, as Applio RVC does
Requires a GPU 😢
I ran it on my CPU; it hasn't been optimized to run on the Apple Metal GPU
The clone of your voice is 90% there. Really good!
True, but v1.4 does a better job though
Can this do any language?
I think it's just English and Chinese; you can check their research paper
How much data do you need for this?
How does that differ from tortoise-tts? And which is better?
Can I shoot in flat on it?
No
This is nice, but for someone like me it would be better with a good webui
They have a WebUI; you can visit their Hugging Face page. The problem is it's written in Chinese, so you can try translating the text to English if you have the time.
@@techgiantt If anyone uses the Microsoft Edge browser, just right-click and translate the page. By the way, I experienced it in English by default.
I have translated it to english in a recent video
Great work! Once the voice is cloned and the code_N file is generated, how do you run inference only, for fast TTS use?
Hello pro, is there any way to run this on an RTX 4090 GPU?
Like I said earlier, I use a MacBook; I don't use Nvidia hardware
How do I run it on the GPU, pro?
Your intro noise is unfathomably loud.
Pro, what do I need to change to run this project on the GPU? I changed cpu to cuda and also tried cuda:0, but there's an error. What is the right way to use the GPU? I have an RTX 4090 in my laptop. Please help
Can I stream audio from Suno Bark? Like get audio chunks with a custom sample rate and audio format?
Bro, are you Nigerian?
Curious, what is this explorer window that shows the content being populated as you run the commands?
What are the required machine configs for this? I'm running out of memory on a Tesla T4 with 16 GB on CUDA, and on my 28 GB of RAM on CPU
I ran it on my MacBook M1 Pro, but it also supports CUDA if your GPU does; in the video I switched to CPU to run it on my Mac
Does it support Bahasa Indonesia?
Does it have a docker setup?
not sure
Hello pro, big big fan! It finally runs perfectly for me 😍 Thank you for this awesome tutorial. Just one more question: I have an RTX 4090 in my laptop; what do I need to run this on the GPU? I tried changing device="cpu" to device="cuda". I have 2 GPUs: Intel built-in and an Nvidia GeForce RTX 4090. Thank you in advance, pro 😀
I use a macbook that's why I switched from cuda to cpu. You can reach out to Nvidia support if you're having trouble with their hardware, but make sure you've installed the necessary drivers for that graphics card so you can use cuda, before reaching out to them.
@@techgiantt I have CUDA set up and it works fine with many other projects, but what change needs to be made to run this on the GPU?
@techgiantt Which model of MacBook is required, sir?
Please pro, make a full, clear tutorial
Hello pro, I followed your tutorial on Fish Speech. Please make a full, clear tutorial about this model, 1.4
We're still waiting for a clear tutorial about this model, please pro
Hello sir, thank you so much for this tutorial. I'm wondering if you can do the cloning using Bark (the one you used in the last video), so the reading also sounds more human?
Hello, I tried all the solutions that you gave me, and I still get the same errors. Please, I need your help. Best regards
Hello pro, good tutorial, but I have a problem; can you review it with me? I did almost everything like you did, and it still shows me: No module named 'fish_speech.conversation'. I checked the path and everything is good
Did you use version 1.2 (that's what I used)? but there is a new version (v1.4).
@@techgiantt Yes, I used 1.4, still the same problem. Can I have your contact? I need your help, pro
@@techgiantt I need your contact. I need your help
@@mahaltech Just get version 1.2 from their github releases page, and follow the other steps I showed in the video.
@@techgiantt there is some difference
The project can also read PDF documents, and I am willing to connect it all into a super all-in-one project with a UI this month.
I would also love a UI to select languages between your outputs and select characters between your characters' .TXT prompts. And lastly, a selector for your vault .TXT files, to tell the LLM what you expect the AI to know about your needs.
Thanks. I think you need a better 🎤 mic; your voice is so low it will probably put people to sleep. Voice quality is more important than video quality. Good luck
Sorry about the inconvenience; I had some problems with the mic I/O. I'll switch it out in the next video.
Great job! By the way, did you try vLLM to load the Mistral LLM? vLLM can start an OpenAI API-compatible server with: python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.3. Then, from langchain.chat_models.base import BaseChatModel, create a custom class CustomVLLM(BaseChatModel), and use llm = CustomVLLM(base_url="localhost:8080/v1")
@@baoxinping3081 I'll check it out; I've been postponing it for a while. I'm also working on updating the project, so maybe I'll cram it all into one video.
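The OpenAI-compatible server the comment above mentions can be called with nothing but the standard library, since it speaks the standard /v1/chat/completions protocol. This is a hedged sketch: the base URL, port, and model name below are assumptions and should match however you started the vLLM server.

```python
import json
from urllib import request

def build_chat_request(base_url, model, prompt):
    """Builds a /v1/chat/completions request for an OpenAI-style server."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    req = build_chat_request("http://localhost:8000",  # vLLM's default port; adjust to yours
                             "mistralai/Mistral-7B-Instruct-v0.3",
                             "Hello!")
    with request.urlopen(req) as resp:  # requires the vLLM server to be running
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the wire format is the standard OpenAI one, any OpenAI-compatible client (including Langchain's) can point at the same endpoint instead of a hand-rolled request.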
awesome
Having issues with Piper; it's not able to find the module regardless of my attempts.
I didn't set up Piper in this video
@@techgiantt No worries, I was having some bizarre dependency issues with my venv; I did a clean pip install of the requirements and everything ran smoothly, besides some ALSA issues, but I cleaned those up too. Thanks.
Guys, make sure you're using Python 3.10 or it gets weird. But also, this was an amazing tutorial! Thanks!!
Is it utilizing the Apple silicon GPU?
♥
As far as I understand, this project is for Linux and macOS only, since coremltools only works on them. But the video is still good.
For those who don't want to mess around with a complex installation, use Pinokio
Nice work man
Thanks man
While executing python -m main, I'm getting the following error: cannot import name 'TTS' from 'melo'
@@SurrenderToAction Did you install MeloTTS? It's in the modules directory, and you need to install it in the virtual environment you set up
@@techgiantt As you do at 3:48 in the video, right? Yes, I did that, but I had a bunch of issues. Maybe because I was using the latest version of Python, and went to 3.9 later..
@@SurrenderToAction Check th-cam.com/video/UsuuSgnOJxg/w-d-xo.html
@@techgiantt Yeah, I did rename it to "melo". Maybe it's better to do everything from scratch. One more day..
@@techgiantt I'm getting this error when installing Melo: ERROR: Failed building wheel for tokenizers
I got this error when generating the audio file: ../lib/libespeak-ng.dylib' (no such file)
Why did you mount the GPS like that? It's too bad; the frame comes with a mount, bro
The GPS mount is too small for the standard size of GPS units
Wow, This is amazing 😮😮
Thanks! Can I run this on CPU? Because I don't have a GPU
I think by default it runs on CPU, just ignore the CoreML support bit.
A bit confusing. What's the relation between MeloTTS and OpenVoice V2?
MeloTTS can act as a standalone text-to-speech engine or as the base speaker for OpenVoice v2. OpenVoice is both a TTS and a voice-cloning engine. OpenVoice v1 can do without MeloTTS as the base speaker
@@techgiantt Thanks for your reply. I'm able to play the English voice without any issue, but when I play Chinese, I get the following error message: RuntimeError: Placeholder storage has not been allocated on MPS device! Any suggestions? Thanks.
Does it work well on audio in other languages? Because I have tried Bark and Tacotron 2 but did not get good results for the Hindi language. Thanks for the video, keep giving good content 😊
I think it's mostly the English, Japanese, Chinese, French, Spanish, and Korean languages that are supported, but it also has an Indian accent
Hello brother, can you share the code?
For the text-to-speech or the AI agent using tools? Because I already have a video on the TTS, and a GitHub repo in the description of that video
Is there a way to make these TTS models more expressive?
Yes, but you need a beefy GPU to use it with an AI model, since you won't want extra latency; I'll create a video for that.
@@techgiantt I think it would be amazing if they could act, expressing emotions anger, sadness, sorrow, compassion, confidence, hesitation, shyness, embarrassment, bravado, whisper, fear, shout, laugh, etc. moods and personality expressed via voice.
@@komakaze1 Exactly my thought. Is there any other alternative out there, one that doesn't cost 20 bucks per month?
Try making it run with Ollama Phi-3, and LLaVA for vision, if possible. Also, where can we get the code? I also have a set of tools that can be used with Ollama and Phi-3; hit me up to talk about it if you want!!!
Use Wizard, it's better as an assistant. I'm looking for a project that shares the computer screen and has it describe things and help with code.
good bro!
Thanks 🔥