What runs GPT-4o? | Inside the Biggest AI Supercomputer in the cloud with Mark Russinovich

  • Published Jun 2, 2024
  • Microsoft has built the world’s largest cloud-based AI supercomputer that is already exponentially bigger than it was just 6 months ago, paving the way for a future with agentic systems.
    For example, its AI infrastructure is capable of training and inferencing the most sophisticated large language models at massive scale on Azure. In parallel, Microsoft is also developing some of the most compact small language models with Phi-3, capable of running offline on your mobile phone.
    Watch Azure CTO and Microsoft Technical Fellow Mark Russinovich demonstrate this hands-on and go into the mechanics of how Microsoft is able to optimize and deliver performance with its AI infrastructure to run AI workloads of any size efficiently on a global scale.
    This includes a look at: how it designs its AI systems to take a modular and scalable approach to running a diverse set of hardware including the latest GPUs from industry leaders as well as Microsoft’s own silicon innovations; the work to develop a common interoperability layer for GPUs and AI accelerators; and its work to develop its own state-of-the-art AI-optimized hardware and software architecture to run its own commercial services like Microsoft Copilot and more.
    ► QUICK LINKS:
    00:00 - AI Supercomputer
    01:51 - Azure optimized for inference
    02:41 - Small Language Models (SLMs)
    03:31 - Phi-3 family of SLMs
    05:03 - How to choose between SLM & LLM
    06:04 - Large Language Models (LLMs)
    07:47 - Our work with Maia
    08:52 - Liquid cooled system for AI workloads
    09:48 - Sustainability commitments
    10:15 - Move between GPUs without rewriting code or building custom kernels
    11:22 - Run the same underlying models and code on Maia silicon
    12:30 - Swap LLMs or specialized models with others
    13:38 - Fine-tune an LLM
    14:15 - Wrap up
    ► Unfamiliar with Microsoft Mechanics?
    As Microsoft's official video series for IT, you can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.
    • Subscribe to our YouTube: / microsoftmechanicsseries
    • Talk with other IT Pros, join us on the Microsoft Tech Community: techcommunity.microsoft.com/t...
    • Watch or listen from anywhere, subscribe to our podcast: microsoftmechanics.libsyn.com...
    ► Keep getting this insider knowledge, join us on social:
    • Follow us on Twitter: / msftmechanics
    • Share knowledge on LinkedIn: / microsoft-mechanics
    • Enjoy us on Instagram: / msftmechanics
    • Loosen up with us on TikTok: / msftmechanics
    #AI #AISupercomputer #LLM #GPT
  • Science & Technology

Comments • 52

  • @alexpearson415
    @alexpearson415 12 days ago +20

    This is my favorite video that Microsoft makes. So cool

    • @MSFTMechanics
      @MSFTMechanics  12 days ago +1

      Thank you so much! Appreciate your taking the time to comment and glad you liked it.

  • @ThaLiquidEdit
    @ThaLiquidEdit 11 days ago +15

    Mark Russinovich is a legend!

  • @blitzio
    @blitzio 12 days ago +12

    Awesome to see this, especially the hardware, networking and data center breakdown and info.

    • @MSFTMechanics
      @MSFTMechanics  12 days ago

      Glad you enjoyed it!

  • @ds920
    @ds920 11 days ago +8

    That’s why I chose to buy their stock; they know what it means to actually work. It was a long way for me from the early 90s, when I, a hardcore Unix user, would only mention Windows alongside the words “must die”, to spending my free money on their stock and actually admitting what this company has really been doing all this time. Thank you guys for keeping that spirit!

  • @BigEightiesNewWave
    @BigEightiesNewWave 12 days ago +8

    Man, Mark is God-status at Microsoft

  • @ABLwAmazing
    @ABLwAmazing 1 day ago +1

    Ah, the sysinternals guy. I owe half my career to this guy. Thx.

  • @IshaqIbrahim3
    @IshaqIbrahim3 13 days ago +2

    Timeline: 9:00 What happens to the heat energy extracted during cooling? Does it get used to generate electricity to power other devices, or to supply energy to some of the cooling fans, or is it not used for anything?

  • @drivenbycuriosity
    @drivenbycuriosity 10 days ago +3

    Most fascinating part for me is the Multi-LoRA.

    • @MSFTMechanics
      @MSFTMechanics  10 days ago +1

      It is. It's a little like differencing disks with the additional state/data.
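
      The differencing-disk analogy maps onto how Multi-LoRA works mechanically: one frozen base weight matrix shared by everyone, plus a tiny low-rank "diff" per skill. A rough sketch in NumPy (the names, shapes, rank, and scaling factor here are illustrative assumptions, not Azure's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 2  # hidden size and LoRA rank (illustrative values)

# Frozen base weight -- the shared "base disk".
W0 = rng.standard_normal((d, d))

def make_adapter():
    # Each skill is just a small (B, A) pair -- the "differencing disk".
    return rng.standard_normal((d, r)), rng.standard_normal((r, d))

adapters = {name: make_adapter() for name in ("kung_fu", "piloting")}

def forward(x, skill=None, scale=0.5):
    # Base output, plus an optional low-rank delta for the chosen skill.
    y = x @ W0.T
    if skill is not None:
        B, A = adapters[skill]
        y = y + scale * (x @ (B @ A).T)
    return y

x = rng.standard_normal(d)
base = forward(x)
with_skill = forward(x, skill="kung_fu")

# The base costs d*d floats; each extra skill adds only 2*d*r more,
# so hundreds of adapters can share one copy of the base weights.
print(W0.size, sum(B.size + A.size for B, A in adapters.values()))
```

      The storage asymmetry is the point: here the base is 4,096 floats while each adapter is only 256, which is why many skills can be served from one resident base model.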

  • @SuperRider-RS
    @SuperRider-RS 12 days ago +3

    Great session, Thank you

    • @MSFTMechanics
      @MSFTMechanics  12 days ago

      Appreciate the compliment, thank you!

  • @ShpanMan
    @ShpanMan 10 days ago +1

    Underrated video, a lot of cool useful details!

    • @MSFTMechanics
      @MSFTMechanics  10 days ago

      Thank you! Happy that it's useful - and it keeps evolving quickly.

  • @user-gg8we2ot4b
    @user-gg8we2ot4b 12 days ago +2

    Interesting architecture.

  • @LouSpironello
    @LouSpironello 12 days ago +3

    Great info about the architecture! Thank you.

    • @MSFTMechanics
      @MSFTMechanics  12 days ago

      Thank you! Glad it helped on the architecture front.

  • @jeetmajumdar7588
    @jeetmajumdar7588 6 days ago

    Great session. Mark is, as always, the best ❤

    • @MSFTMechanics
      @MSFTMechanics  5 days ago

      Thanks so much! Appreciate your taking the time to comment.

  • @Jj-du8ls
    @Jj-du8ls 12 days ago +3

    5 times the Azure supercomputer deployed each month? Is that a typo?

    • @MSFTMechanics
      @MSFTMechanics  12 days ago +6

      It's not. We just announced that 30x have been added since November 2023.

    • @Hashtag-Hashtagcucu
      @Hashtag-Hashtagcucu 11 days ago

      What he isn’t saying is how long this rate will keep up.

    • @guruware8612
      @guruware8612 11 days ago +1

      @@Hashtag-Hashtagcucu Forever, as long as there are people thinking that it's a great idea to chat with a machine or have a robot-dog.
      Insanity is the new norm.

    • @coreystrait513
      @coreystrait513 9 days ago +1

      @@MSFTMechanics Stargate and quantum computing, hurry up

  • @lifeslooker
    @lifeslooker 13 days ago

    What would it take to shrink a 175B model to run on a mobile phone? What are the limitations? The language used in the model? Could compression be used, or a language developed that doesn't take up much space?

    • @MSFTMechanics
      @MSFTMechanics  12 days ago +2

      The closest correlation to size is the parameter count, so Phi-3-mini has 3.8bn parameters and is roughly 2.2GB file size to run locally on the phone as demonstrated by Mark in the video. There are things that the larger models will do in terms of reasoning and built-in knowledge, as Mark said. One example that we actually hit while planning this show is that the slightly larger Phi-3 models could phrase the cookie recipe in the writing style of Yoda from Star Wars. Because mini didn't have the pop culture references in its training set, we made the tone sarcasm instead.
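
      Those numbers can be sanity-checked with back-of-the-envelope arithmetic: 3.8 billion weights at 16 bits each would be ~7.6 GB, so the ~2.2 GB on-phone figure implies roughly 4-bit quantization plus metadata. (The bits-per-weight values below are my assumption; the reply doesn't state which quantization Phi-3-mini uses on device.)

```python
def model_size_gb(params_billion: float, bits_per_weight: int) -> float:
    """Rough on-disk size of a model's weights, ignoring the tokenizer
    and any per-block quantization metadata."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Phi-3-mini has 3.8B parameters.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{model_size_gb(3.8, bits):.1f} GB")
```

      This prints ~7.6, ~3.8, and ~1.9 GB respectively, so a 4-bit quantized build with some overhead lands right around the quoted 2.2 GB.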

    • @lifeslooker
      @lifeslooker 12 days ago +1

      @@MSFTMechanics funny I’m watching Star Wars episode 1 right now on Apple TV+😂😂😂😂
      Sarcasm is something that is very rich in style and varies across languages; it would be interesting to see how this is done in, say, Italian or French.

  • @phobosmoon4643
    @phobosmoon4643 11 days ago +1

    Great video. I have a maybe annoying question; how can we know that cloud ai services are selling us what they say they are? For example, context length could easily be fudged.

    • @phobosmoon4643
      @phobosmoon4643 10 days ago

      @@test-zg4hv yeah, I'm asking how you test it. Is it kind of like an error-checking algorithm?

    • @MSFTMechanics
      @MSFTMechanics  10 days ago +1

      You can stipulate that in code or using the Azure AI Studio, and you can test it. We cover that to some extent in this episode: youtube.com/watch?v=3hZorLy6JiA

  • @kylev.8248
    @kylev.8248 12 days ago +2

    This is awesome

    • @MSFTMechanics
      @MSFTMechanics  12 days ago +1

      Glad you liked it and thank you!

  • @sachoslks
    @sachoslks 9 days ago +1

    5 times the Azure supercomputer deployed each month, that's insane!!! What does that mean for training next-gen frontier models? 30x since November 2023: does it mean you can train 30x longer, 30x bigger, or 30x faster? Will this continue through the end of the year, reaching almost 65x compute in one year?

    • @MSFTMechanics
      @MSFTMechanics  9 days ago

      Good questions. We have deployed 30x total, or on average 5 additional instances per month, of the November 2023 Top500 submission with 14k networked GPUs, 1.1m cores and 561 petaflops. These will continue getting bigger, with more instances provisioned in the future. And now there are more options for GPUs and AI accelerators, too, plus the Nvidia H200 and Blackwell architectures are coming soon with more speed, power and efficiency.
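
      The arithmetic in that reply is easy to check if you treat the November 2023 Top500 submission as the unit of deployment (an assumption the reply implies, using roughly six months between November 2023 and this video):

```python
# Per-unit specs as quoted in the reply above.
unit_gpus = 14_000
unit_cores = 1_100_000
unit_petaflops = 561

units = 30   # instances deployed since November 2023
months = 6   # roughly Nov 2023 through May 2024

print(f"~{units // months} units per month")
print(f"aggregate: {units * unit_gpus:,} GPUs, "
      f"{units * unit_cores:,} cores, {units * unit_petaflops:,} petaflops")
```

      30 units over ~6 months is the quoted 5 per month, and the aggregate works out to 420,000 GPUs and roughly 16.8 exaflops if every instance matched the original submission (which the reply does not guarantee).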

  • @MDFnyny
    @MDFnyny 9 days ago +1

    Thanks, quite impressive!

    • @MSFTMechanics
      @MSFTMechanics  9 days ago

      Thanks for watching and commenting!

  • @RohanKumar-vx5sb
    @RohanKumar-vx5sb 8 days ago

    cool stuff!

  • @nestorreveron
    @nestorreveron 12 days ago +1

    Thanks.

  • @bfg5244
    @bfg5244 12 days ago

    that's inspiring

    • @MSFTMechanics
      @MSFTMechanics  10 days ago

      Glad you liked it. Thanks for taking the time to comment.

  • @kyber.octopus
    @kyber.octopus 8 days ago

    Nice

  • @jeffreyrh
    @jeffreyrh 11 days ago +1

    Wouldn't it be possible to create a distributed computer system like SETI or that protein-folding project, and use that computing power to train AI systems? Those projects used people's personal computers when they had idle time.

    • @Zreknarf
      @Zreknarf 6 days ago

      it's called a botnet, and yeah, you can do that. These are purpose-built AI chips, though; nobody has those at home because they aren't for sale yet.

    • @Zreknarf
      @Zreknarf 6 days ago

      also, from the video, inferencing requires high-bandwidth memory, not so much compute power, and would suffer greatly from latency

  • @synthwave7
    @synthwave7 9 days ago +1

    Glad Microsoft is making sure there is co-existence between all hardware manufacturers, otherwise AI hardware will become chaos.

  • @Rkcuddles
    @Rkcuddles 11 days ago +1

    This dude AI?

    • @DeployJeremy
      @DeployJeremy 9 days ago

      Mark has been trained on at least 175 billion parameters, but he isn't AI 🙂

  • @Arcticwhir
    @Arcticwhir 10 days ago

    13:38 You used the exact same joke a year ago with Mark

    • @MSFTMechanics
      @MSFTMechanics  10 days ago +1

      Yes, that was intentional, because Multi-LoRA would allow Neo to have hundreds or thousands of skills added simultaneously, not just the one like last year.

  • @ArronLorenz
    @ArronLorenz 10 days ago

    Solid organic joke.