NEW Multi-Modal AI by APPLE

  • Published on 27 Sep 2024
  • Apple published new machine learning (ML) models on its GitHub repo: 4M-21 (Massively Multimodal Masked Modeling).
    All rights with the authors:
    4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
    arxiv.org/pdf/...
    Video from Apple and Lausanne:
    storage.google...
    #appleai
    #apple
    #multimodalai

Comments • 11

  • @mshonle 3 months ago

    It’s about time that we went back to encoder/decoder architectures again!

  • @MeinDeutschkurs 2 months ago

    Great! I watched both your video and the one by EPFL. I hope the community will create a dataset that is not based on synthetic data, to improve quality. I was impressed by the video-frame demo. I hope that some day audio and video/animation will be included. That’s so exciting!

  • @李純心-y9u 3 months ago

    It is a very good introduction to Any-to-Any.

  • @thesimplicitylifestyle 3 months ago

    Very useful! Thank you! 😎🤖

  • @fontenbleau 3 months ago

    I don't understand why they keep releasing such minuscule, useless models all year long. In my experience, decent models start at 30 billion parameters (yes, I have 128 GB of RAM). Only models of that size offer more than bare functionality (a glimpse of intelligence, especially uncensored) in squeezed, quantised versions.

    • @code4AI 3 months ago +2

      Now, I could explain to you that current phones have on-board compute limitations, or I could explain that research projects start with lower complexity to document a proof of concept, but would you understand it?

    • @fontenbleau 3 months ago

      @code4AI It's hard to tell their real motives; Apple is the most closed tech group. Yes, phones today are incapable, as are robots; there is no good chip anywhere. I understand perfectly. That's just my opinion, and Apple will never release big models publicly, since they are a valuable asset. Llama 7B is good, but only as a dictionary/translator; anything smaller is even more primitive. For spyware like Recall, this small model is perfect.

    • @falklumo 2 months ago

      You seem to be confused. This work is not about an LLM, so your parameter-count intuition does not apply. It is better compared with Stable Diffusion, which DOES an OK job on 8 GB GPUs.

    • @fontenbleau 2 months ago

      @falklumo That's a weird reply, and why are you referring to Stable Diffusion at all, an image generator? It's kind of long to explain, but first, Apple's stylus handwriting recognition (a grandfather of current AI) was horrible; they bought a patent license to use a better one, made by others, in the Newton device.

  • @tomw4688 3 months ago +1

    Great catch! Thanks for reviewing this.