Two Voice Devs
United States
Joined Jul 29, 2020
Episode 214 - NotebookLM: The Future of Personalized AI Learning for Developers?
Dive into the world of AI-powered learning with Allen and Mark as they explore Google's innovative NotebookLM. This cutting-edge tool offers a fascinating glimpse into the potential of Google's Gemini AI model. NotebookLM allows you to centralize your notes, documents, and even audio/video transcripts, transforming them into an interactive knowledge base. Discover how its conversational interface lets you ask questions, generate summaries with citations, and even create podcasts from your source material!
Allen and Mark discuss how NotebookLM serves as a compelling example for developers looking to build their own Gemini-based applications. They break down how its features, like intelligent summarization, citation generation, and conversational Q&A, can be replicated and customized using the Gemini API. This episode provides valuable insights and inspiration for developers eager to harness the power of Gemini for their own projects.
They also cover practical use cases for students, developers, and anyone looking to personalize their learning experience, while addressing NotebookLM's current limitations. Join the conversation and share your thoughts on how you might use this exciting new technology!
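As a taste of what that might look like, here is a rough sketch (ours, not NotebookLM's actual implementation) of the grounded question-answering-with-citations pattern using the Gemini API's Python SDK - the API key and source file are placeholders:

```python
# Rough sketch of NotebookLM-style grounded Q&A with citations, using the
# Gemini API Python SDK (pip install google-generativeai). Not NotebookLM's
# actual implementation; the key and source file are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

# Your own notes, docs, or transcript act as the only allowed source.
source = open("notes.md").read()

prompt = (
    "Answer using ONLY the source below. After each claim, quote the "
    "supporting passage from the source as a citation.\n\n"
    f"SOURCE:\n{source}\n\nQUESTION: What are the main takeaways?"
)
print(model.generate_content(prompt).text)
```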
Timestamps:
[00:00:00] Introduction
[00:00:50] What is NotebookLM? Core Functionality and Features
[00:02:47] Asking Questions and Getting AI-Generated Summaries with Citations
[00:04:23] Following Citations and References within the Notebook
[00:06:24] Potential Use Cases for Students (PDF Import and Analysis)
[00:07:17] Creating Outlines and Other AI-Assisted Note Enhancements
[00:07:51] Supported File Types and Data Handling
[00:09:10] Generating Podcasts from Source Material (with audio example!)
[00:11:47] Using NotebookLM with Technical Documentation (Gemini API Example)
[00:14:17] Managing and Selecting Sources within a Notebook
[00:14:53] Downsides and Limitations (No API, Manual Processes)
[00:17:43] Comparison to ChatGPT and GPTs
[00:18:41] Sharing and Collaboration Features
[00:19:26] Potential Applications for Developers (Project Management, Code Analysis)
[00:25:11] Integrating with Automated Testing and CI/CD
[00:25:34] Potential Integration with Git Repositories and Version Control
[00:27:12] The Future of AI-Powered Knowledge Systems
[00:28:04] More Potential Use Cases (Research Paper Analysis)
[00:28:36] Customized Learning and Problem Solving with AI
[00:30:06] Conclusion and Call to Action
Image generated using Imagen 3 with prompt:
Cartoon ink and paint, with a touch of tech.
Scene: Two podcast hosts, sitting in front of microphones,
smiling and engaging in conversation.
Both hosts are male, caucasian, software developers in their early 50s,
wearing glasses, and are clean shaven.
The host on the left is wearing an orange t-shirt and a black flat cap.
The host on the right is wearing a light blue polo shirt.
Warm, inviting lighting.
Background:
In the center is a stylized notebook, connecting documents, code, audio,
and video. Also connected to this notebook are two human-looking android
podcasters, one male and one female, who have microphones as well.
Negative prompt:
beards
#NotebookLM #AI #ArtificialIntelligence #MachineLearning #Developers #Programming #SoftwareDevelopment #Productivity #Learning #Education #Podcast #GoogleAI #Gemini #Chatbots #KnowledgeManagement #NoteTaking #AItools #TechPodcast #TwoVoiceDevs
Views: 27
Videos
Episode 213 - Scary Developer Stories: A Halloween Special
20 views · 7 days ago
Boo! Join Two Voice Devs for a special Halloween episode filled with chilling tales from the software development crypt. Mark and Allen recount true stories of coding nightmares, from dropped databases to runaway pings, and offer words of wisdom for surviving your own development horrors. Listen with the lights on (if you dare) as they explore the spooky side of coding, complete with a chilling...
Episode 212 - Data Labeling for Developers
75 views · 14 days ago
Join Mark and Allen, your Two Voice Devs, as they delve into the crucial world of data labeling for machine learning model training. Whether you're a seasoned data scientist or a developer just starting to explore AI, understanding data labeling is essential for building effective models. In this episode, they explore various data labeling techniques, from manual labeling for simple voice apps ...
Episode 211 - Apple Intelligence and Siri's Future (and Beyond)
60 views · 21 days ago
Join us for a fascinating conversation with John G, a seasoned voice developer, as he shares his insights into Apple's approach to AI and the future of Siri. John discusses his journey from helping content creators to the Alexa ecosystem and then into the Apple world, driven by the potential of App Intents and the evolving landscape of Apple Intelligence. We delve into the technical details, ex...
Episode 210 - Simplifying Generative AI Development with Firebase GenKit & GitHub Models
76 views · 1 month ago
Join Mark and Xavier on Two Voice Devs as they dive into the world of generative AI development with Firebase GenKit and GitHub Models. Xavier, a Google Developer Expert in AI, Microsoft MVP in AI, and GitHub Star, shares his insights on these emerging technologies and his open-source project that bridges the gap between them. Discover how Firebase GenKit offers a simpler, more modular approach...
Episode 209 - AI-Powered Pronunciation: Conquering Tricky TTS
43 views · 1 month ago
This episode of Two Voice Devs, recorded before the exciting announcement of OpenAI's GPT-4o Realtime and Audio previews, tackles a classic developer challenge: taming unruly text-to-speech (TTS) engines. Triggered by a listener question, Allen and Mark dive into the frustrating inconsistencies of TTS pronunciation, particularly when dealing with dynamically generated text from LLMs. They explo...
Episode 208 - O1: Reasoning Engine or Agent's Brain?
262 views · 1 month ago
Join us as we dive deep into OpenAI's latest model, O1, with special guest host Michal Stanislavik, founder of utter.one and one of the voice community builders behind VoiceLunch. We explore the model's "reasoning" capabilities, its potential impact on conversational AI, and how developers can leverage its strengths. Michal shares his insights from hands-on experience, highlighting both the exci...
Episode 207 - Mentorship in Software Development
79 views · 1 month ago
Join Mark and Allen on this episode of Two Voice Devs as they dive into the often overlooked but crucial topic of mentorship in software development. They explore what mentorship is (and isn't), the benefits for both mentor and mentee, and share personal anecdotes and practical advice. Whether you're a seasoned developer or just starting out, this episode offers valuable insights into fostering...
Episode 206 - Building Powerful AI Agents with LangGraph
319 views · 1 month ago
Dive into the world of agentic AI development with Allen and Mark as they explore LangGraph, a powerful state management system for building dynamic and complex AI agents with LangChain. Discover how LangGraph simplifies agent design, handles state transitions, integrates tools, and enables robust error handling - all while keeping the LLM at the heart of your application. Further Info: * githu...
Episode 205 - Gemini + LangGraph Agents + Google Sheets = Vodo Drive
512 views · 2 months ago
Join us as we explore Vodo Drive, an innovative project that leverages Google's Gemini AI to revolutionize how we interact with spreadsheets. Creator Allen Firstenberg takes us behind the scenes, revealing the architecture, challenges, and breakthroughs of building an agentic system that understands and manipulates data like never before. Discover how Vodo Drive: * Empowers natural language int...
Episode 204 - Alexa Skill Sunset Strategies
39 views · 2 months ago
In this episode of Two Voice Devs, Allen and Mark discuss the considerations and strategies for shutting down an Alexa skill. They explore various reasons why developers might choose to sunset their skills, including declining usage, deprecated features, and the evolving Alexa landscape. They also delve into the technical aspects of skill removal, highlighting the options of hiding or removing ...
Episode 203 - Imagen 3: Stunning Realism & Ethical Questions
60 views · 2 months ago
Join Allen and Linda as they dive into Google's Imagen 3 and Imagen 3 Fast, a powerful new set of image generation models. We explore its capabilities, pricing, features, and limitations, including a deep dive into the API and how to use it with Python code. This episode features an in-depth look at Imagen 3's photorealism and comparison with its predecessor, Imagen 2. We examine the ethical im...
Episode 202 - Hosting and Large Language Models
13 views · 2 months ago
Join Allen Firstenberg and Mark Tucker on Two Voice Devs as they discuss the challenges and solutions of hosting large language models (LLMs). They explore various hosting environments, including Firebase, AWS Amplify, Vertex AI, and Docker/Kubernetes, comparing their strengths and weaknesses. Allen shares his experience with Firebase Cloud Functions and the seamless integration with Google Clo...
Episode 201 - Introduction to KitOps for MLOps
32 views · 3 months ago
Join Allen and Mark in this episode of Two Voice Devs as they dive into the world of MLOps and explore KitOps, an open-source tool for packaging and versioning machine learning models and related artifacts. Learn how KitOps leverages the Open Container Initiative (OCI) standard to simplify model sharing and deployment. Learn more: * kitops.ml/ Key Topics and Timestamps: What is DevOps? (0:00:41...
Episode 200 - Four Years and Looking Forward
42 views · 3 months ago
Mark Tucker and Allen Firstenberg celebrate 200 episodes and four years of Two Voice Devs! In this special episode, they reflect on the journey so far, the evolution of the AI landscape, and what excites them most about the future of development. Join them as they discuss: 00:00 Four years ago... 00:10 The evolution of large language models (LLMs) and how the landscape has shifted over the past...
Episode 199 - Is the Future of AI Local?
49 views · 3 months ago
Episode 198 - Wisdom from Unparsed: LLMs are Hammers, Not Silver Bullets
43 views · 3 months ago
Episode 197 - Alexa Skill Development in the Age of LLMs
39 views · 4 months ago
Episode 196 - Is GPT 4o a Game Changer?
34 views · 5 months ago
Episode 195 - Android, Agents, and the Rabbit R1
145 views · 5 months ago
Episode 193 - Revolutionizing Intent Classification
357 views · 6 months ago
Episode 192 - Google Cloud Next 2024 Recap
60 views · 6 months ago
Episode 191 - Beyond the Hype: Exploring BERT
41 views · 6 months ago
Episode 190 - Google Gemma's Tortoise and Hare Adventure
18 views · 7 months ago
Episode 189 - Farewell, ADR: The Impact on Alexa Developers
84 views · 7 months ago
Episode 188 - Building Responsible AI with Gemini
54 views · 7 months ago
Episode 186 - Conversational AI with Voiceflow Functions
50 views · 7 months ago
Episode 185 - Cloud vs Local LLMs: A Developer's Dilemma
191 views · 8 months ago
Is there any indication that there will be multimodal devices?
OK... this is probably the most interesting TVD you've published. I'm trying to rebuild my Mac Mini environment and I'd like to work on some of this. My Mac Mini isn't working (apparently the disk is corrupt from not using the machine for years). I'm going to get an M4 when I can get one discounted a bit. At that point I'd like to chat with you, John, to try to figure it out.
Hey Dana, thanks for listening! Definitely reach out when you get your new machine.
Well done!!!
Could you please guide to make a multi agent framework with voice integration, like a very low latency voice agent with langgraph integration
Great description and ambitious project. Super useful combination of everyday Google technology, and cutting edge integration with Gemini. Looking forward to seeing it in action!
Thanks, Lloyd! That was certainly one of my goals in building and presenting it - showing how well all these technologies work together to let people build some awesome stuff.
Mark and Allen - what is your opinion of HuggingFace? I believe HuggingFace has the model registry, keeping the code (on a Docker image) separate from the model.
Congrats Allen and Mark 👏
Congratulations on 200th episode! And great talk - casual, fun and inclusive!
Grrrrrr
Great to see you both at I/O!
Hi would you be able to create a video guide on how to install your plugin please?
Being a developer these days is like trying to build a sandcastle on a tsunami prone beach!
getting a rabbit this month, but how can I practice with a LAM in the meantime? like where can I practice with LAMs?
I like y'all
Important topic/conversation. Thanks for the discussion
Have you guys ever used pyannote audio before? I'm trying to figure out what's best to get the "essence" of someone's voice from their voice embedding. Do I extract embeddings from longer audio clips of the person's voice? Or do I extract embeddings from 30ms chunks of voice audio and then obtain an average embedding from the embeddings I extracted?
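For illustration, a minimal sketch of the two approaches described above, assuming pyannote.audio's Inference API - the file name, window sizes, and token are placeholders, and "pyannote/embedding" is a gated model that needs a Hugging Face token:

```python
# Sketch of both approaches, assuming pyannote.audio's Inference API
# (pip install pyannote.audio). File name, window sizes, and token are
# placeholders; "pyannote/embedding" is gated and needs a HF token.
import numpy as np
from pyannote.audio import Model, Inference

model = Model.from_pretrained("pyannote/embedding", use_auth_token="hf_...")

# Approach 1: one embedding for a whole, longer clip.
whole = Inference(model, window="whole")
emb_whole = whole("speaker.wav")  # np.ndarray, shape (512,)

# Approach 2: sliding-window embeddings, then average. Note the windows
# are a few seconds, not 30 ms - speaker embedding models generally need
# on the order of seconds of speech to produce a stable embedding.
sliding = Inference(model, window="sliding", duration=3.0, step=1.0)
chunks = sliding("speaker.wav")  # SlidingWindowFeature
emb_avg = np.mean(chunks.data, axis=0)

# Either embedding can then be compared with cosine similarity.
cos = float(np.dot(emb_whole, emb_avg) /
            (np.linalg.norm(emb_whole) * np.linalg.norm(emb_avg)))
print(cos)
```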
waste of my time.
super interesting ways to test this new model. thanks! can't wait to get my hands on it
hey, congrats on preview :) Just curious, did you get access via Google AI Studio or Vertex?
We got access via Google AI Studio, and didn't get API access. As far as I know, at least at this time, there isn't any access through Vertex AI.
Also, it shouldn't be limited only to English, i.e., the phonemes! This can apply to how we learn multiple languages!
This is so WILD!!! 😯😳🤯
Hi... can we talk about how to use an LLM to apply fuzzy logic for capacitor management in a PV grid??
21:29 they say 21 minutes and 29 seconds into the 44 minute video; 22:12 try content + '...' + contents; 23:07 pairing with top_p and top_k can influence negative randomness; use coroutine without mallocate; 25:26 would be helpful, which seems to be an anomaly in these ecosystems.
How about (maybe temporarily) increasing the time for response?
Hey, I know these guys
Happy Thanksgiving🎉
You said this seemed to be a rushed release. Maybe the sudden firing of Altman is related to this.
Hi, I recently found your channel and like the style. On what other channel might I contact you?
Great discussion Noble and Allen. I especially liked the part starting at 22:54 😀 Spot on about the relational DB analogy and yes we need "SQL for LLMs" so to speak. And yes, next year more talks on QA and how to take LLM powered applications to production.
Credit where credit is due, Roger. {: I think we're getting there. As you illustrated, we're at the point where a developer who is willing to put (a lot of) time into getting an open source LLM going can do so - though not yet as easily as getting an open source database server going. But we're getting closer!
Good information. Do you have any direct information on configuring Google Matching Engine? It's hard to find good tutorials on local setup on my machine.
I agree, it is VERY hard to get started with what is now called Vertex AI Vector Search. I'm working on some material that covers this, and possibly some ways to make this simpler. Stay tuned!
I have an observation that might have made this discussion clearer at the start: the model is just data for the transformer, much like tables are/contain the data for a SQL server. The question Allen was asking was: what uses the model? - which you finally answered. I have two related questions: 1) How does one tell the transformers library to use my local GPU? Is that even possible from a Python interpreter framework? 2) What would it take to hook up LangChain to your locally running transformer instance, so you could still code using the LangChain paradigm/framework, which people are more familiar with?
I think I can answer (2). I think. LangChain has several components which, as I understand it, can access your local model. For example, there is an integration for Llama.cpp. But that kinda goes to the heart of the question I posed - how do you just access a local model? This seems like it should be easy to answer - why does it end up being so difficult?
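For (2), a minimal sketch assuming LangChain's community Llama.cpp integration - the GGUF model path is a placeholder, and n_gpu_layers also gestures at (1) by offloading layers to a local GPU. (With the Hugging Face transformers library itself, device_map="auto" or .to("cuda") is the usual way to target a local GPU.)

```python
# Minimal sketch: a local GGUF model behind LangChain's usual interface via
# llama.cpp (pip install langchain-community llama-cpp-python). The model
# path is a placeholder for any GGUF file you have downloaded.
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to a local GPU, if one is available
    n_ctx=2048,       # context window size
    temperature=0.7,
)

# From here it behaves like any other LangChain LLM.
print(llm.invoke("In one sentence, why run an LLM locally?"))
```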
Global Intents !!
Is this like Air? The video that we have been seeing on LinkedIn with customer service? 🤔
Fascinating..
⚡⚡⚡⚡⚡⚡⚡⚡🔥🔥! Crack
Awesome episode! I just discovered LangChain after many failed attempts at writing the raw code myself in my Next.js web app to force the GPT API to search the internet. Now I don't have to reinvent the wheel - I can just implement LangChain by deploying it on Hugging Face. Thanks for the insights.
This is gold. Thank you.
😁🎅
😍
I think App Actions will be a bigger failure than Conversational Actions apparently was, although Conversational Actions actually extended Assistant's functionality opening up enormous possibilities whereas App Actions just launches apps. Shortcuts into an app?!?! Seriously?!?! That is their big idea? I don't speak to my phone in public. I think the primary location where voice apps are used is at home. If you're at home you likely have cheap smart speakers everywhere, and your phone is on a charger or the counter. I think leaving smart speakers/displays behind is a huge mistake, and creating separate Assistant functionality based on user surface is a huge mistake. If I need to remember what I can say to Google Assistant based on what I am speaking to, I'm not going to use it.
Absolutely agree! This is smoke and mirrors and a completely nonsensical move by Google.
Very interesting...
I'm watching and...
Ah! I worked on a similar project during my first year in Conv. AI. It was a CRUD application, basically, with a Google Spreadsheet as the db. You could create a knowledge base by voice. I haven't seen many use cases of this kind so far. It would be suitable for insurance companies or banks. Great demo, btw! I'm not sure about the prompt 'prompt me': I would have the assistant directly say 'what's your weight today?' or something similar. The conversation would be shorter and anyone could understand what the app is about. But I do understand that this is a personal tool.
So in the callback discussion I'm seeing a video starting where a troll comes out and nails the badge to the wall.
Started with a TI-99/4A playing Munchman and Parsec. Learned to code BASIC there. Then got a Commodore 128 and learned to code Extended BASIC. Eventually graduated to a PC with Windows 3.0. (The C-128 got me through college.)
Voiceflow's API tester is pretty straightforward... Postman must be easier now... or I know more now.
You almost certainly know more. 😁
Design a sample so that if directions are followed...it works