GPT-4o, AI overviews and our multimodal future

IBM Technology

Views: 19,582

Published on 25 Nov 2024

Comments • 19

@ThoughtfulAl 6 months ago ⁺³
I took a photo of this podcast in my YouTube browser, and asked GPT-4o to analyze it. It took only 5 seconds and here is the analysis:
The image shows a virtual panel discussion hosted by IBM, as indicated by the "think | IBM" branding at the top left corner. There are four participants visible, each in their own video feed, suggesting a remote conferencing setup.
The title of the discussion, visible at the bottom, is "AI models and our multimodal future." This indicates the conversation likely centers around advancements in artificial intelligence, particularly focusing on multimodal AI, which involves systems that can process and understand multiple types of data simultaneously (e.g., text, images, audio).
Other details such as the video being liked 22 times, options to share, download, and access more content, and the YouTube browser tabs at the top, suggest this is a live-streamed or recorded event available for viewing online.
@Charles-Darwin 6 months ago
I submitted a link to an 800-page paper and it only took a second or two to summarize and discuss it.
@Superteastain 6 months ago ⁺²
Let's be super clear about Google advertising. The way to think about paid online advertising is a comparison to what organized crime used to call protection. So in my best Tony Soprano voice I'll now sell you some online advertising. "Hey, I hear you want to sell some service X",
"You know the guy down the street sells service X, and he's doing alright",
"You know he donates to the.... well... let's say he pays to play",
"Are you going to pay to play?"
"You know you've got a nice place here, I'd hate to see it burn down..."
That's essentially it. You'll hear talk about the success of Google because of amazing search results. This is incorrect. When Google launched, most of us were on 56 kbps dial-up connections. The search competition had pages full of crud that took a lifetime to load. Google didn't have any crud, so it loaded quickly. That was it. Now Google just serves up a load of trash while it makes hundreds of millions of dollars serving up that crud.
@gaylenwoof 6 months ago ⁺¹
Given the latency differences between quick/shallow answer vs. longer pause/logically deeper answer, I hope systems will have a built-in default option to pick the depth of answer you want. I'm imagining a few keywords to trigger a deeper-thinking mode with the implicit assumption that you are willing to tolerate a longer pause (and/or a possible charge to your account?) - just as you would expect with a human. I assume that if I say something like "take some time to think about this and give me the best response you can" then the AI will, indeed, take some time to give the most deeply intelligent response. Am I thinking about this plausibly? Or am I being naive?
@RomaineGangaram 6 months ago
Beautiful show, first time here. Interesting and informative guests. You have a very comprehensive discussion panel going.
@MartinBroadhurst 6 months ago
8:46 My take on this is that the extended context window is how it takes root in the enterprise. These huge context windows, 1-2M tokens, are enough to provide the context required. It bumps up the inference cost, sure, but it's how you give the context without relying on RAG.
@Charles-Darwin 6 months ago
I think 'o' actually stands for orbitofrontal cortex.
It could be a smaller, narrow receptive input network that doesn't really retain or memorize beyond simple and critical pathways, and a much larger network that assesses the weighted inputs, for a bottom-up, top-down approach. Because of this, I think 4o is a double-ended model whose two parts work together to distill and assess input.
This region of the brain is multimodal, but just as in our organic builds, vision is the primary input, and the other modalities largely map onto visual representations (hear a garbage truck outside, visualize what that truck looks like in your head). This region is also extremely low latency by necessity, as visual input needs near-automatic responses (driving a car, walking).
All things considered, I think this is the analogue of our orbitofrontal cortex, and perhaps the applicability extends much farther and wider than theorized prior to implementing the solution. Shy of having the equivalent biological need to survive, I think this is AGI and we've only seen the baby brother. I don't think we'll get the whole enchilada this year or the next, but rather what they've been saying: an agentic version of Jr. to do the bidding of paid subscribers. This kind of breakthrough aligns with the primary scientists (and the alignment-conscientious) taking their leave as management has turned on the primary objective, allocating infrastructure resources to press forward with the model's expansion over creating safety for it.
@GNARGNARHEAD 6 months ago ⁺¹
some great perspective, thanks
@DrJanpha 6 months ago
Modes of media, fragmented... here and there
What is more useful is smartphones' functions like doing banking and ordering products... this is what is called "multifaceted integration"
@markmuller7962 6 months ago
Unfortunately the Google demo was pre-generated and they provided a tiny pathetic little disclaimer in the bottom right corner of the screen.
Besides, if it goes like the last presentation, everything AI that Google announces either doesn't work at all, never arrives, or arrives 6 months later in a scrapped form... So this podcast putting the OpenAI and Google presentations on the same plane is borderline bizarre and probably unprofessional too.
@flashmo7 6 months ago
Environmental metrics should be first... unless you'd like to sunbathe at 80 degrees Celsius on an average summer day some day...
@lighteningrod36 6 months ago
Parts geolocation has been around for ten years, with maintenance scheduling, etc.
@dianedean4170 6 months ago
🎉❤😊When law enforcement and security footage are translated from image to text...the jury and judge will be more confident in their determination of reasonable doubt.
The accuracy of information will be more reliable and hopefully justice will be served.🎉❤😅
@bamh1re318 6 months ago ⁺¹
Google Search, for me, has been garbage. Northern Light used to be way better for search, but not for ad sales.
@TOMESHTI 6 months ago
M U L T I
M O D A L I T Y
4 O
@AlgoNudger 6 months ago ⁺¹
from Generative to Objective-Driven. 🙃
@Hordebarraged 6 months ago
Imagine that incredibly annoying, ingratiating female 4o voice dealing with a police officer if you were pulled over, or in any authority or face-to-face response scenario... would the authority figure punch you first or simply tear your iPhone in two?
@Bronco541 6 months ago
I didn't find it the least bit annoying.
