CPU? GPU? This new ARM chip is BOTH

  • Published 21 Jan 2025

Comments • 744

  • @MarcoGPUtuber 4 years ago +471

    A64FX.....Why have I heard that name before?
    Oh yeah! Athlon 64 FX!

    • @arusenpai5957 4 years ago +10

      Yeah, the name reminds me of that too XDD

    • @pflernak 4 years ago +15

      So that's where the déjà vu feeling came from

    • @zM-mc2tf 4 years ago +1

      What goes round...

    • @jean-pierreraduocallaghan8422 4 years ago +2

      I knew I'd seen that somewhere before but I couldn't put my finger on it! Thanks for the reminder! :)

    • @xxdizannyxx 4 years ago +3

      FX you...

  • @billykotsos4642 4 years ago +526

    RIP SATORU IWATA.
    A BRILLIANT AND UNIQUE MIND.
    His father never wanted him to pursue a games career.

    • @mix3k818 4 years ago +6

      There's only one video that comes to my mind at this point.
      th-cam.com/video/j2dxX5DIEMQ/w-d-xo.html
      R.I.P. to both.

    • @masternobody1896 4 years ago

      Intel is better

    • @perhapsyes2493 4 years ago +4

      And I'm glad he didn't listen.

    • @MissMan666 4 years ago

      @@masternobody1896 Intel is nr. 2.

    • @legacyoftheancientsC64c 4 years ago

      Nice Haiku

  • @TechKerala 4 years ago +285

    Dedicated 20 MINUTES of my life... Worth it as always.

    • @adnan4688 4 years ago +2

      Absolutely!

    • @Soul-Burn 4 years ago +6

      Only dedicated 10 minutes. x2 speed is great.

    • @johnnyxp64 4 years ago +1

      20:36 actually for me... cause I wanted to see my name in the Credits... 🤣😝

    • @AwesomeBlackDude 4 years ago +1

      Always guarantee when you watch a (Jim) #AdoredTV video ❎

    • @miguelpereira9859 4 years ago

      @@johnnyxp64 Being a Coreteks patreon means having big pp

  • @wajihbleik436 4 years ago +84

    Thank you for doing what you do. I learn a lot from your videos.

  • @lawrencedoliveiro9104 4 years ago +13

    7:55 Not quite. Supercomputing applications actually have limits to their parallelism. There is also a need for heavy communication traffic between cores, hence the fast interconnect, which is a major component of the build cost of a supercomputer.
    For an example of a massively parallel application which doesn't need such heavy interprocessor communication, consider rendering a 3D animation. The render farms deployed for such an application are somewhat cheaper than supercomputers.
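
The limit described in the comment above is usually quantified with Amdahl's law: with p the parallelizable fraction of a workload and N cores, the achievable speedup is bounded no matter how large N grows.

```latex
S(N) = \frac{1}{(1 - p) + p/N},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

Even at p = 0.95 the speedup can never pass 20x, which is why the serial fraction and the interconnect, not raw core count, dominate supercomputer design.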

  • @MrBearyMcBearface 4 years ago +113

    This video sounds more like a true-crime TV show than something about processors.

    • @rcrotorfreak 3 years ago

      can u share us ur pic?

    • @kcvriess 2 years ago

      You make me laugh but at the same time I'm annoyed. This dude has a wealth of knowledge and insights, but he's HORRIBLE to listen to.

  • @lawrencedoliveiro9104 4 years ago +30

    8:53 That doesn’t make sense. “Teraflops” is a unit of computation (“flops” = “floating-point operations per second”), not of data transfer. Data transfer rates would be measured in units of bits or bytes per second.

    • @blackdoveyt 4 years ago +6

      Yeah, A64FX has 1TB/s theoretical bandwidth and 840GB/s of actual bandwidth.
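
The two units reconcile through arithmetic intensity (FLOPs performed per byte moved). Plugging in the figures quoted in this thread (roughly 3 TFLOP/s peak compute and 1 TB/s peak bandwidth, treated here as illustrative), the roofline model puts the machine balance near 3 FLOPs per byte; any kernel below that is bandwidth-bound rather than compute-bound.

```latex
P_{\text{attainable}} = \min\bigl(P_{\text{peak}},\; I \cdot B\bigr),
\qquad
I_{\text{balance}} = \frac{3\ \text{TFLOP/s}}{1\ \text{TB/s}} = 3\ \text{FLOP/byte}
```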

  • @suibora 4 years ago +114

    17:28 Sure, streaming today's data would be instant with tomorrow's technology, but what about tomorrow's data? The extinction of load times is far away. More powerful computers? That will just be an excuse to use more detailed textures :'D

    • @Mil-Keeway 4 years ago +20

      Loading nowadays is no longer limited by file size, it is limited by bad code. NVMe SSDs do many GiB/s; no game asset needs more than the blink of an eye to load (rough numbers after this thread). Sadly, developers have some of the fastest possible hardware available (especially in big-budget games and programs), so they have no need to optimize. Running the same code on an average PC then makes it unusable.

    • @redrock425 4 years ago +2

      The biggest issue is poor telecoms infrastructure. Even in the UK speeds vary massively; they're already trying to save costs by not putting in full fibre.

    • @pflernak 4 years ago +2

      @jayViant Talking of holograms:
      th-cam.com/video/V7V05T4DhrU/w-d-xo.html

    • @635574 4 years ago +9

      @@Mil-Keeway Compression and bad structuring of data make for terrible load times even on high-end NVMes. Games before the next gen weren't optimized for this, maybe except Star Citizen and Arkham Knight.

    • @paramelofficial9100 4 years ago +2

      Just compare it to 10 years ago, when some websites would take ages to load half the time, or 10 years before that, when printing a JPEG was faster than viewing it on a webpage. We're really stretching conventional processor capabilities thin, but there will definitely be some fundamental shift in the industry that keeps the performance train chugging along. Could be a beefed-up ARM chip, desktop chips made from different materials (silicon ain't the most performant, it's the most flexible) or something completely different if synthetic neurons or quantum computers have an early breakthrough. Internet bandwidth is also constantly improving.
      Honestly the only thing slowing us down is companies milking their current technologies like crazy. Let's all thank AMD's Threadripper for shoving 32 inefficient cores into prosumer PCs and speeding up global warming lol. And let's not forget Intel's tiny generational improvements. There are certain solutions which could be implemented pretty soon, but who has time to research other options when they have to pump out 3 useful and 7 useless chips a year?
      TL;DR: Tech seems to be improving faster than consumer needs because it never improves fast enough for professional needs, driving researchers to find new and better solutions. But capitalism's a bit of a bitch sometimes and is getting in the way.
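
Rough arithmetic behind @Mil-Keeway's point above, with illustrative numbers (a 2 GiB asset and a 5 GiB/s NVMe drive, neither taken from the video): raw I/O is a fraction of a second, so anything longer is decompression, parsing and setup work, i.e. code.

```latex
t_{\text{load}} = \frac{\text{asset size}}{\text{throughput}}
= \frac{2\ \text{GiB}}{5\ \text{GiB/s}} \approx 0.4\ \text{s}
```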

  • @DarthAwar 1 year ago +1

    If they utilise the newest HBM version instead of traditional DRAM for cache, it would vastly increase processing speed and reliability, but also dramatically increase production costs.

  • @chuuni6924 4 years ago +70

    If you haven't already, you may want to look into RISC-V's upcoming Vector extension. It does all that SVE does, but better.

    • @Toothily 4 years ago +4

      Better how?

    • @chuuni6924 4 years ago +28

      @@Toothily There are a couple of independent things. For one thing, there's no architectural upper limit to the number of vector lanes. Another is that the dynamic configuration of the vector registers allows better utilization of the register file (for example, if only a couple of vector registers are used, they can subsume the register storage of the other registers to get much, much wider vectors). Also, while that part of the specification is still a bit up in the air, there is an aim to provide polymorphic instructions based on said dynamic configurations, which means it's far easier to adopt new data types with very small architectural changes. They also aim to provide not only 1D vector operations but even 2D or 3D matrix operations, which could provide functionality similar to e.g. nVidia's tensor cores, except in a more modular fashion. (A sketch of the shared vector-length-agnostic style follows this thread.)
      There are more examples too, but I think this post is running long enough as it is. I recommend reading the specification.

    • @Toothily 4 years ago +4

      @@chuuni6924 That sounds really cool spec-wise, but do they have working silicon yet?

    • @chuuni6924 4 years ago +15

      @@Toothily The spec isn't even finalized yet, so no, there's definitely no silicon yet. However, the Hwacha research project is being carried out in parallel and I know there's a very strong connection between it and RV-V, and I believe they have working silicon in some sense of the word. It's a research project rather than a product, however, so not in the ordinary sense of the word.

    • @mrjean9376 4 years ago +1

      Really want to know what you guys think about this computer compared to Nvidia's DGX A100. Does it have equal performance or something? I'm really excited to know. Thx :)
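
For the curious, this is what the vector-length-agnostic style discussed above looks like on the SVE side. A minimal saxpy sketch using Arm's C Language Extensions (`arm_sve.h`), assuming an SVE-capable compiler and target; the same source runs unchanged on 128-bit mobile cores and on the A64FX's 512-bit units.

```c
#include <arm_sve.h>
#include <stdint.h>

/* y[i] += a * x[i] without hard-coding the vector width:
   svcntw() reports how many 32-bit lanes the hardware has, and the
   predicate from svwhilelt masks off the final partial iteration. */
void saxpy(float a, const float *x, float *y, int64_t n) {
    for (int64_t i = 0; i < n; i += svcntw()) {
        svbool_t    pg = svwhilelt_b32_s64(i, n);  /* active lanes only */
        svfloat32_t vx = svld1_f32(pg, &x[i]);     /* predicated loads  */
        svfloat32_t vy = svld1_f32(pg, &y[i]);
        vy = svmla_n_f32_m(pg, vy, vx, a);         /* vy += vx * a      */
        svst1_f32(pg, &y[i], vy);                  /* predicated store  */
    }
}
```

RISC-V's RVV achieves the same effect differently: a vsetvl instruction hands the loop the number of elements the hardware will process that iteration, which is where the dynamic register configuration described above comes in.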

  • @micronyaol 4 years ago +203

    Can't imagine a Japanese chip without a TOFU interface

    • @glasser2819 4 years ago +6

      iExplorer has a CHAKRA engine (as shown in TaskMgr),
      and if it was coded in Germany it would have a SAUSAGE cache pipe... LOL

    • @IngwiePhoenix_nb 4 years ago +4

      @@glasser2819 No, it would have a Bierfass (beer keg) pipeline ;) I am German, I should know. ^.^

    • @minitntman1236 4 years ago +9

      The driver of the AE86 was a tofu delivery man

    • @prashanthb6521 4 years ago +3

      SUSHI coming up next.

    • @matthewcalifana488 4 years ago +4

      They make the best capacitors.

  • @kipronosoi 4 years ago +66

    Woot!! Coreteks is back, feels like it's been forever...

    • @StopMediaFakery 4 years ago +1

      Don't you just love their Masonic logo? The honeycomb hexagon also known as the Cube, a reference to Saturn and the system we live in. Just so happens to also be in the beehive colours.

  • @m_schauk 1 year ago +1

    Damn, this video has aged well... so good. Wish more videos like this were made and popular on YouTube.

  • @MrTrilbe 4 years ago +39

    So, ARM, AMD and Fujitsu teamed up for a super APU that's in some ways more epic than EPYC... I will call this collab FARMeD!

    • @MrTrilbe 3 years ago

      @June 1st Absolute 2006 Rainbow It was a tongue-in-cheek summary of this video, with a pun at the end

  • @rickbhattacharya2334 4 years ago +3

    Man, your videos always inspire me to read more computer architecture.
    I have computer architecture as a subject in my bachelor's and I don't like it, but your videos always inspire me to read it more.

  • @Battlebaconxxl 4 years ago +81

    What you describe sounds like a modern version of the PS3's cell chip.

    • @FrankHarwald 4 years ago +23

      Kind of, yes! The PS3 used several DSP-like processors connected onto a ring bus - except that rings, as well as other pure bus-like topologies, while being the simplest way to interconnect multiple regions on a chip, have several inherent limits which restrain this kind of topology to a limited number of locally adjacent cells, which is why the kind of processor presented here not only has one ring, but a hierarchy-of-rings topology.
      See this paper as an example examining & describing different hierarchical ring topology variants as on-chip interconnection networks, also called NoCs = "networks on chip":
      "Design and Evaluation of Hierarchical Rings
      with Deflection Routing": pages.cs.wisc.edu/~yxy/pubs/hring.pdf
      This has been a hot research topic in HPC & scientific computer engineering for several years now.
      Another really old, formerly rejected but increasingly interesting & related research topic is "computing-in-memory", also "processing-in-memory" or "near-memory processing", because the cost of transferring data between processing units & memory is, as mentioned in this video, increasingly becoming a limiting factor. See
      "Computing In-Memory, Revisited": ieeexplore.ieee.org/document/8416393 but also semiengineering.com/in-memory-vs-near-memory-computing/
      & while the recent emergence of array processors like Google's tensor cores & other forms of neuromorphic processing units is clearly at least partly due to that, this problem isn't limited to applications using AI but applies to a much broader category of problems - the "bandwidth wall" is a thing.

    • @SerBallister 4 years ago +6

      @@FrankHarwald One of the biggest headaches of working with the Cell BE was the relatively tiny amount of accessible memory each SPU had (256 KB IIRC). This meant you couldn't use a lot of general-purpose algorithms and instead had to modify them to be streamable with high locality of reference (a double-buffering sketch follows this thread) - for some algorithms it just isn't possible to optimise in such a way.

    • @FrankHarwald 4 years ago +2

      @@SerBallister Indeed, but modifying algorithms so that they run with a high amount of locality is something you'll have to do for all data-intensive algorithms anyway - no matter how much of it is done automatically, profiler-assisted or by hand - regardless of the underlying architecture, because while all shared-memory architectures will start hitting the bandwidth wall at some point, distributed-memory architectures will be the only way to circumvent these limitations. & yes, this also means that algorithms which access a lot of memory from the same chunk in a purely serial way will either have to be modified to access data in parallel from multiple chunks (if possible) or remain bandwidth-limited (if this is acceptable or if the algorithm is inherently serial).

    • @SerBallister 4 years ago +3

      @@FrankHarwald You should aim for that, yeah. The SPU local memory presented an addressing barrier instead of a cache miss like on a multicore; all data has to be present in that block. Take a PS3 game for example. For some systems, like physics and pathfinding, it can be hard to compress your game world into 256 KB, so the PPU had to work on that stuff, and you then had the headache of pipelining the output of that into the SPU (e.g. animation) if you wanted to avoid stalls. Interesting chip, but it can be hard work; task scheduling and synchronisation are also not straightforward. I would prefer working with modern desktop multicores with shared memory.

    • @thurfiann 4 years ago

      of course it is
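
A minimal sketch of the streaming/double-buffering restructuring described in this thread, in plain C. `dma_get`/`dma_wait` are hypothetical stand-ins for the Cell SDK's MFC DMA primitives (the real intrinsics have different names and signatures); the point is overlapping the fetch of tile k+1 with compute on tile k, so the 256 KB local store never stalls the SPU on main memory.

```c
#include <stddef.h>

#define TILE 16384               /* floats per tile: two 64 KiB buffers */

/* Hypothetical async-copy stand-ins for the real MFC DMA intrinsics. */
void dma_get(float *dst, const float *src, size_t n, int tag);
void dma_wait(int tag);
void process(float *buf, size_t n);     /* the actual compute kernel   */

/* Double-buffered streaming over a large array in main memory.
   Assumes n is a multiple of TILE for brevity. */
void stream(const float *in, size_t n) {
    static float buf[2][TILE];          /* 128 KiB, fits the local store */
    dma_get(buf[0], in, TILE, 0);       /* prefetch the first tile       */
    for (size_t k = 0; k * TILE < n; k++) {
        int cur = (int)(k & 1), nxt = cur ^ 1;
        size_t next = (k + 1) * TILE;
        if (next < n)
            dma_get(buf[nxt], in + next, TILE, nxt); /* fetch ahead      */
        dma_wait(cur);                  /* tile k is now resident        */
        process(buf[cur], TILE);        /* compute while tile k+1 lands  */
    }
}
```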

  • @raymondobouvie 4 years ago +123

    I am no engineer in any shape, but with Coreteks videos I am getting such a digestible form of explanation that it teaches me, even though I am 37yo) Thank you so much!

    • @mrlithium69 4 years ago +15

      37 is not too late. God willing you will be learning well past 37 and even at 73.

    • @Seskoi 4 years ago +3

      I'm 101 years old and still learning!

    • @IARRCSim 4 years ago +6

      @@Seskoi in base ten?

    • @raymondobouvie 4 years ago

      @@IARRCSim they opened schools on Mars - finally)

    • @The_Man_In_Red 4 years ago

      @@Seskoi
      I'm 1,009,843,000 seconds old and I push myself every nanosecond to learn more and more

  • @chafacorpTV 4 years ago +16

    I once heard that HAL got its name by taking IBM's initials and ticking the characters, because they saw themselves as "one step ahead of IBM". Seeing this, I truly believe it.

    • @miketaratuta 4 years ago +3

      ticking them back, not forwards

  • @TechdubberStudios 4 years ago +14

    Loved this video so much, watched it twice in a row.

  • @seylaw 4 years ago +1

    And ARM has already announced the SVE2 extension, which is a replacement for their NEON instruction set (for home/multimedia usage, whereas SVE1 is tuned for HPC workloads). Interesting times are ahead, and I can't wait for ARM storming the PC desktop...

  • @arthurcuesta6041 4 years ago +3

    You're finally back. Thanks again for the amazing work.

  • @DanafoxyVixen 4 years ago +50

    The comparison with the dual Intel Xeons is a little silly now that they have already been blown out of the water by EPYC... still an interesting CPU tho.

    • @stefangeorgeclaudiu 4 years ago +12

      I think people are going to be surprised when AMD announces Milan this year.
      Also, the Frontier 1.5 exaFLOPS supercomputer will use a CPU chiplet + 4 GPU chiplets + memory in the same AMD chip.

    • @thomasjensen1590 4 years ago +6

      The question is, what is more EPYC?

    • @BrianCroweAcolyte 4 years ago +3

      I agree. With how many problems Intel has been having for the last 4-5 years, stagnating on 14nm, comparing anything besides other x86 CPUs to Intel feels disingenuous.
      If they compared this ARM chip to the actual current x86 performance leader (a 2U Epyc Rome server with 128 cores) it would be beaten by at least 2-3X. Maybe performance per watt would be better on the ARM chip, but the performance density would almost definitely be unbeaten.

    • @aminorityofone 4 years ago +5

      @@BrianCroweAcolyte This isn't the first time ARM was expected to be dominant. It happened in the '90s as well. In fact Microsoft made Windows NT compatible with ARM back then. There was big promise that RISC CPUs would take over the world. Well, that didn't happen, and I still don't think it will happen today or in the future.

    • @defeqel6537 4 years ago +1

      @@aminorityofone ARM will probably continue to dominate the market where chips are designed for a purpose (unless RISC-V takes that market), mostly because x86 isn't licensed to anyone new.

  • @peacenaga7725 4 years ago

    I stumbled upon your channel when viewing your interview of Jon Masters and have binge-watched 3 episodes, losing sleep. Kudos. I am learning a lot! Thank you. I haven't binge-watched in a long time.

  • @fanitriastowo 4 years ago +12

    I like those progress bar ads

  • @斯溫克 4 years ago

    Congratulations on 100,000 subscribers!! I love your videos and I came a long way in computer knowledge because of you. I hope you have a great year! Love you from the EU, Si ♥️😊

  • @desjardinspeter1982 4 years ago +1

    Your video presentations are so well done. I always look forward to watching them! Such an interesting product. Thank you for covering this!

  • @datsquazz 4 years ago +68

    Those chips are cool and all, but did you see THIS? 18:04 That truck has FOUR WHEEL STEERING, now THAT is innovation

    • @onebreh 4 years ago +22

      They have been on the roads for years...

    • @carholic-sz3qv 4 years ago +5

      there is also steering on the rear wheels too, look at this Tatra video th-cam.com/video/U-ujpvOeydk/w-d-xo.html

    • @Mil-Keeway 4 years ago +2

      Lots of 3-axle garbage trucks in Europe have frontmost and rearmost steering, basically pivoting around the middle axle.

    • @keubis2132 4 years ago

      @@onebreh pog, really?

    • @koolyman 4 years ago +8

      You call that innovation? Get back to me after you google 'Spork'

  • @aziziphone9350 4 years ago +10

    Finally! My favourite YouTuber Coreteks uploaded a video. Love your content man, been watching you since the age of 15 in 2019 till now

    • @aziziphone9350 4 years ago

      JustVictor 17 hehhe nice man

    • @aziziphone9350 4 years ago

      JustVictor 17 This technology and semiconductor field is the only place where we can all be together without any of the bullshit politics and drama of this cruel world.

  • @InfinitePCGaming 4 years ago

    Was worried you'd disappeared. Glad we got a new video.

  • @lazadafanboyz7970 4 years ago +1

    We need to rethink how we use computers. ARM is the future; x86 processors are flawed, running at 125 watts while ARM uses only 5 watts for comparable computing power. The only problem is we need to reprogram everything

  • @MrSchweppes 4 years ago

    Great Video! Thanks!

  • @lawrencedoliveiro9104 4 years ago +5

    5:36 Actually, the plural of “die” is “dice”.
    Yes, those dice. As in the phrase “the die is cast”, which means instead of throwing several dice, you have thrown just one, and must stand by whatever it shows.

    • @ehp3189 4 years ago +1

      "The die is cast" comes from Middle High German/English Gutenberg printing. The printed page came from a single die cast, which is why it was slow and expensive (though cheaper than the monks drawing each page by hand). This allowed Bibles to be printed, helped people learn how to read, and brought education to the people.

    • @lawrencedoliveiro9104 4 years ago +1

      @@ehp3189 That can't be right. Gutenberg's innovation was the invention of movable-type printing, as in having separate pieces for each letter that were assembled to make up a page. Printing an entire page from a single block was a technique that had been invented by the Chinese centuries earlier.

    • @ehp3189 4 years ago

      @@lawrencedoliveiro9104 Granted, but the expression speaks more to the assembled typeset being cast together in a block, where any changes during a printing run were not allowed. It was difficult enough that breaking apart the group and then reassembling it for a one-letter change was more expensive than it was worth. At least that is my understanding. I liked philology in college but they only offered one class...

  • @tipoomaster 4 years ago +6

    "The future is Fusion", the slogan was just 12 years ahead of the technology

  • @GarretSlarrity 4 years ago

    Very excited for the video on neuroscience and computing!

  • @UHDking 4 years ago

    I am a fan. Good content, man. Thanks for your research and for sharing the knowledge.

  • @CitizenTechTalk 4 years ago +1

    Simply mind blown! Wow!
    Thank you, amazingly educational video!!!

  • @N0N0111 4 years ago +9

    7:15 Finally the memory bottleneck is being somewhat addressed.

  • @Hazemann 4 years ago +3

    An advanced variation of this Fujitsu A64FX chip will be in a Nintendo Switch revision in the future as a complement to the ARM CPU and Nvidia GPU

    • @deoxal7947 4 years ago

      Where can I read about that?

  • @VicenteSchmitt 4 years ago

    Glad to see your videos again!

  • @JamesLee-mp2qz 4 years ago

    I didn't think it was humanly possible for your voice to get any lower... You proved me wrong :)

  • @virtualinfinity6280 4 years ago +10

    Just a minor correction: the ring bus was used by Intel for Haswell and Broadwell. From Skylake onward, they are using a mesh interconnect.

    • @davidgunther8428 4 years ago +1

      Desktop Skylake (and family) uses a ring bus too. The high core count chips use the mesh interconnect.
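
The practical reason for that split is average hop count under uniform traffic. For N stops on a bidirectional ring versus an n-by-n mesh (N = n^2), the back-of-envelope averages are:

```latex
\bar{h}_{\text{ring}} \approx \frac{N}{4},
\qquad
\bar{h}_{\text{mesh}} \approx \frac{2n}{3} = \frac{2\sqrt{N}}{3}
```

At 8-10 stops the ring's 2-3 average hops is fine; at 28 cores a ring would average about 7 hops while a mesh stays near 3.5, which is why only the high-core-count dies pay the mesh's area cost.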

  • @TheJabberWockyy 4 years ago +1

    I wonder why everyone isn't talking about this. This is fascinating and exciting.

  • @denvera1g1 4 years ago

    Consumer processors will probably use HBM as a sort of L4 cache, or as base memory with a tiering system, and then still have traditional memory channels, though maybe fewer of them

  • @studiosnch 4 years ago +1

    And a few months later we saw many of the design choices here (especially the on-chip memory) in the Apple Silicon M line.

  • @andrew1977au 4 years ago

    Awesome video bud, some very interesting info there. Thank you

  • @josephfrye7342 2 years ago +1

    This is another example that you don't need Nvidia or a separate graphics card.

  • @greenempower1053 4 years ago +4

    I've been seeing this coming for years now.

  • @danuuu101 4 years ago

    Your channel is a gold mine for computer engineers. I really like your analysis and that you get into the details more than other channels do.
    On another note, I really want to see a video about RISC-V and its future in personal computing and IoT. I'm currently learning RISC-V assembly and planning on building a small RISC-V CPU on an FPGA, but I'm very curious about its future and whether it's worth the effort.

  • @TheJabberWockyy 4 years ago

    Awesome video man! Ty for the great content

  • @zM-mc2tf 4 years ago

    Thank you again for your insight, and all the info.

  • @AlexSeesing 4 years ago

    The end sounds like Cygnus X - Positron, but a bit different. Does that have anything to do with the presumed change in computing you laid out in this video? If so, that is a masterful match!

  • @Speak_Out_and_Remove_All_Doubt 4 years ago +11

    When are we going to see desktop CPUs with 3D-stacked memory, and realistically how much memory will they have? I can't see 32GB of system memory fitting on a normal desktop die, but maybe I'm wrong (the A64FX numbers are after this thread). Or will it just be a small amount of on-die memory used like an extra cache layer, while you still have your normal DDR system memory?
    Also, I can't get my head around heat dissipation in this new 3D-stacking future. Given that heat is the biggest issue in 2D chips, and at the moment we can cool directly onto the only layer there is, 3D stacking just doesn't sound like it's going to work on anything other than ultra-low-powered chips to me.
    I guess you could try to have some kind of micro coolant channels flowing between layers, or maybe thin sheets of graphene, but this will be expensive and complex to integrate. Plus, if you have to go from CPUs running at 5GHz to them running at 1.8GHz, I can't see the benefits of closer system memory being enough to overcome this inherent drawback.

    • @mrlithium69 4 years ago +4

      True story. Yes, we will see 3D-stacked CPUs with some kind of RAM on them (my guess is 2023). We've seen GPUs already that have large HBM stacks on-package, and we will see them integrated even more closely on-die, since Intel has invented its "Foveros" tech to avoid the interposer layer and TSVs. It does seem like the RAM packages would be almost as large in die size as the CPU (judging from the HBM GPUs). Also, remember the 128MB eDRAM L4 cache on the integrated GPU of Broadwell 5775C (2015)? Granted, it was for the iGPU, but that was an early proof of concept of CPU+RAM, and it took up a very large share of that chip's real estate. It wasn't market-friendly - the proportions and cost were all wrong for mass marketability - but it was interesting to say the least. Right now it's probably easy enough to add a stack of 1 or 2 modern 16-gigabit DRAM dies (= 2 or 4 gigabytes); that's just my speculation.
      They've also been researching that micro-coolant-channel science as a real possibility lately. And you're right, heat in the 3D stacks is the main issue. Everyone's trying to stack the best way possible; it's just a question of how.
      Then again, besides the real science of it, there are commercial viability concerns. The long-standing separation between the CPU and RAM markets means we get good prices on both. RAM as a commodity as we know it would be at risk if the 2 main CPU makers were integrating it onto the CPU. The move away from the "memory modules" we all know and love would be too hard to do. Plus, think Apple laptops and planned obsolescence: they already solder the RAM on the motherboard, and I'm sure they'd love it soldered right on the CPU. So we can buy $1000 CPU+RAM combo chips and throw them away in 3 years when Chrome starts using 128GB of RAM. I'd be more convinced if it was a small L4 cache idea that leaves system RAM alone.

    • @johannajohnson310 4 years ago +2

      He does NOT want to mention the downsides. We can barely keep heat off our current CPUs and GPUs, so 3D-stacked? Nah. The amount of power needed, and the cost to switch billions of servers - not happening

    • @IARRCSim 4 years ago +1

      Having things as simple as the video suggests would be a dream. Unfortunately, we'll keep having multiple levels of memory like we have for over 50 years, with different speeds, capacities, and prices per capacity for each. That means software developers eager for better performance will need to continue optimizing for that complicated layout and be mindful of how it all works. Drastically more L3 cache would be great, but there will always be demand for different capacities that strike a different balance between speed, price per unit of capacity, and capacity. RAM drives, introduced at th-cam.com/video/6pp_krChw_A/w-d-xo.html, do a great job of explaining some of the trade-offs between SSD-like memory and RAM.

    • @alexsiniov 4 years ago

      When it hits consumers it will probably have HBM3 memory with up to 64GB on die. Since you won't need RAM or a GPU, PCs will become micro form factor. Imagine how many sockets for these monsters you could fit on a standard ATX motherboard without a chipset, PCIe slots, etc? :)) And you just need to watercool that bitch with an AIO or a custom loop. You could add, for example, another 12-TFLOPS chip when you run out of resources, or add a smaller-performance one and they would stack together :) and then add the most powerful one when you get the money :) and continue your life with 3 CPUs and add one more in 5-6 years :)

    • @The_Man_In_Red 4 years ago

      @@johannajohnson310
      I'm inclined to agree, there's only so much efficiency you can squeeze out of silicon no matter how it's designed.
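
On the capacity question above, the A64FX itself is a data point: its published configuration is four on-package HBM2 stacks, and the totals come straight from that arithmetic.

```latex
4 \times 8\ \text{GiB} = 32\ \text{GiB of memory},
\qquad
4 \times 256\ \text{GB/s} = 1024\ \text{GB/s of bandwidth}
```

Whether a consumer part would keep a tier of commodity DDR behind that (HBM as a huge L4, as suggested in this thread) is the open question.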

  • @jorgegomes83 4 years ago +1

    Impressive as always, sir. Thank you.

  • @tbernesto0912 4 years ago +3

    Well, this is Absolutely Amazing.. Thank You Very Much for Sharing..
    Greetings from México... !!!

  • @notjulesatall 4 years ago

    Incredible specs. All the computing power of a GPU, with SIMD intrinsics and all the software support already available - I really look forward to programming on these chips.

  • @metallurgico 4 years ago

    Finally the first video since I subscribed! I watched all your previous videos lol

  • @aarneuuk9601 4 years ago

    Thank you for yet more fantastic content!
    I read you make (at least some?) of your own background music
    (WOW!)

  • @lawrencedoliveiro9104 4 years ago +1

    11:39 No, not all the other memory types commoditized eventually. I can think of two that Intel bet on that flopped: bubble memory and RAMBUS.

  • @xXxserenityxXx 4 years ago +1

    Hats off to the programmers who designed the text reader.

  • @accesser 4 years ago +1

    Fascinating documentary, you clearly put a lot of work into this

  • @oraz. 4 years ago

    There are so many computing questions about whether to use the GPU that people answer with how transferring memory is the bottleneck. Things like FFTs are better on the CPU only because of the extra delay in transferring between system memory and the GPU. A dual-purpose system sounds so much more elegant and futuristic. I hope things go in that direction.
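
Illustrative numbers for that trade-off, assuming a PCIe 3.0 x16 link at about 16 GB/s (not figures from the video): shipping a 256 MB buffer to the GPU and back costs around

```latex
t_{\text{PCIe}} = \frac{2 \times 256\ \text{MB}}{16\ \text{GB/s}} \approx 32\ \text{ms}
```

so a GPU that finishes the FFT itself in a few milliseconds can still lose end to end, which is exactly the overhead a unified CPU+GPU memory space removes.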

  • @aikanikuluksi4766 4 years ago +2

    So that is where Arnold Schwarzenegger's Terminator got (or will be getting) its CPU from.

  • @erics3596 4 years ago +1

    RIP Sun SPARC - you kicked ass... but Fujitsu moving away from that arch is the final nail in the coffin

  • @winstonsmith430 4 years ago +2

    I've been waiting to see HBM used on a processor! Awesome job, it was exactly what I was predicting. As always, great video Coreteks.

  • @nagyandras8857 4 years ago

    Most probably on a desktop CPU one will find a "BIG" core for the OS, sometimes GPU cores (though that may not always be the case), and I think later on FPGAs. Then the most-used function of a given piece of code can configure the FPGA to do just that: hardware and software tailored for speed. I believe L1, L2, and L3 cache will be removed as we know them today, and a single cache memory will be used, accessible directly by all cores and the FPGA, with unified memory addressing. Sort of like a big shared L3 cache, except as fast as current L1 cache.

  • @SelecaoOfMidas 4 years ago +3

    The future with ARM processors looks great with this one.
    Interesting that Nintendo has an indirect connection to this too. One could imagine their NSO servers running on top of an A64FX processor, maybe a future console? 🤔

  • @Connor3G 4 years ago

    Thanks for the fascinating video!

  • @WarriorsPhoto 4 years ago +1

    Hey Celso, I am not going to lie and say I understood all that you said in this video. But 80% made sense to me. It’s an interesting time to be alive for sure. Thank you for sharing this information about modern chip designs and I hope Intel and AMD can catch up soon.

  • @TECHN01200 4 years ago +5

    When I heard "CPU and GPU", I was thinking AVX turned up to 11

  • @kentaaoki8064 4 years ago +20

    17:01 Reseat that RAM!

  • @henrik5804 4 years ago

    Very interesting video, and I believe you are spot on in your predictions.

  • @valentinoesposito3614 4 years ago +2

    The Japanese make the most exotic CPUs

  • @kingphiltheill 4 years ago

    Amazing and interesting. Thanks for the video

  • @miguelangelriveiro 4 years ago

    Thanks for another fantastic video!

  • @Audiman0aha 4 years ago

    So what you're saying is the Cell architecture was way ahead of its time. Small SPEs working on small tasks to parallelize a workload's bandwidth.

  • @benegesserit9838 4 years ago

    gratzz for 100k...

  • @IRWBRW964 4 years ago

    ARM-based servers with RISC-V-based (or whatever) accelerators do make sense for HPC in the future.

  • @kennyj4366 4 years ago +2

    How exciting, talk about disrupting technology. All it would take is the right application like Gaming consoles or very powerful SFF PC's. Thank you so much for sharing this information and knowledge. Great video.
    👍🙂👍

    • @MrTrilbe 4 years ago

      I wouldn't say it's that disruptive now, anyway; it would have been 4 years ago when HBM2 was first in production. This is just a more mature use of that technology vs sticking it on GPUs.

  • @Kalisparo 4 years ago +1

    Both AMD and Intel CPUs internally run something RISC-like and use a CISC-to-RISC translation front end.

  • @stefanosstamatiadis740 4 years ago +2

    Blew my mind, as every time.

  • @williamhart4896 4 years ago

    A certain amount of lust for that dual-socket water-cooled board with the dual A64FX chips. Good video, and thanks

  • @CookyMonzta 4 years ago +1

    Apple is planning to employ ARM chips in their machines. Last I heard, they were going to start with their MacBooks, presumably with their custom A14 and maybe the A15 in the future. One must expect that, by the time they move their custom ARM architecture to their desktops and iMacs, they'll have the A16 ready. Would it be too far-fetched to assume that we'll see this A64FX (or its successor) in a Mac Pro?
    And what of RISC-V? Will they get into the desktop/laptop game? Or is there something already out there that I haven't seen yet?

  • @wandersgion4989 4 years ago

    I used to live next to the 京コンピューター Kei Supercomputer in Kobe. Very cool to see these new advancements. 👍🏻

  • @siddeshpatil8810 4 years ago

    I didn't understand a thing in this video but I am curious to learn... where to start?

  • @celsostarec6735 4 years ago +1

    @8:53 - 3 TFlops of peak bandwidth?
    Floating-point operations x bandwidth?
    Wouldn't it be 3 TB/s of bandwidth or 3 TFlops of throughput?

  • @bananya6020 4 years ago

    I just hope that if specialized chips become a thing, there won't be restrictions on OS (there *shouldn't* be)
    I really like the idea of having all memory chips etc. on one chip. As it is right now, there is a lot of wasted space spent on circuitry connecting components like RAM and a GPU, or perhaps in the future storage, that can definitely be eliminated. In the past we had cache dies separate from the CPU; now it's integrated, making processing faster. The same can be done with other chips. Imagine all we could do if we only had one processing chip where all the work was being done; we could have much smaller devices with much more power. It seems like a good step towards a future without such bulky machines. Imagine an ultrabook that can do your desktop's work better with much less space. Now imagine the higher power efficiency due to less spread, as well as other improvements (such as using ARM) - you could have a small ultrabook the weight of paper, the processing power of a desktop, and a battery life that lasts months (my laptop already lasts about two weeks without a charge).
    Though I will say, the only actual figures shown here are for tasks highly dependent on memory, as well as being very parallelizable. Perhaps with scheduling and less parallel tasks, this chip would perform worse.
    One of my concerns about this, though, would be the fact that we totally lose upgradeability - either deal with outdated tech 2-3 years after release or buy a totally new machine. This may then give rise to subscription hardware or cloud computing, with renting server space from a host and simply using a Raspberry Pi-like tablet or laptop to connect and do all your work. Tons of nasty stuff could arise from that though, such as spying, data breaches, and plain corporate or lawyer scumminess

  • @nowonmetube 4 years ago +3

    I just hope it's not the Cell processor all over again. It seems that Toshiba always tries to invent a new architecture (going as far as inventing new technology that's faster than silicon-based chips), but it never takes off.

    • @karapuzo1 4 years ago

      Hear hear. Remember Cell and Itanium: great on paper, not so much in real life. Always keep in mind the baggage and momentum of established software and industry.

  • @OptimumSlinky 4 years ago +1

    A large core with smaller cores assisting? Reminds me of the Cell and its SPUs in the PS3.

  • @artisan002 4 years ago

    I'm intensely curious how it handles multimedia workloads. (My bias is towards things like digital music production, where the overall description of what this chip is and does would be ideal.)

  • @alurnima 4 years ago

    I will admit to not understanding everything, nor listening closely enough... but Coreteks videos, including the stock footage and his voicing, are always great.

  • @quantumdot7393 4 years ago

    I really hope this future is not far away. It is great

  • @ClaudioSL619 4 years ago

    congrats for your 100k subs!

  • @daydream605 3 years ago

    11:30
    I think RAM will continue to exist, but more like the page-file system that storage devices used to use, like ReadyBoost on Vista.

  • @alecday3775 4 years ago

    Hello there Coreteks:
    I could really see a chip like this being used in laptops because of how powerful it is and how little space it needs, since there is no need for system RAM or a dedicated graphics chip; it would make high-end gaming-grade laptops much more affordable and give them much longer battery life. With the much-shrunken motherboard (really you only need a small board for all the I/O like your USB ports, video outputs and charging port, plus the power button and a secondary board for the trackpad), you could fill the remainder of the space with a massive battery, which means a much longer battery life, MUCH LONGER than the one shown off at CES this year that lasts 16.5 hours (which is pretty insane in itself). We could see batteries lasting possibly 24 hours or more on a single charge doing mundane tasks. Just think about that for a second: a laptop capable of running all of the latest AAA games with the graphical settings cranked up WHILE STILL allowing for more than 18 hours of gaming or running GPU-intensive workloads on battery. THAT'S INSANE. If you are going out to a LAN (they do still exist) or to a friend's house and want to bring your laptop to play some games for a few hours, you don't need to worry about the charger AT ALL. That is why I see a chip VERY SIMILAR to this one being used in laptops: you could get a very powerful but very power-efficient laptop for quite possibly LESS THAN $1 grand US, one capable of DOING EVERYTHING that would currently set someone back more than $3 grand. Really insane to think about. Dunno what you think, but I think laptops are a really good use of a chip like this.

  • @NikolaosSkordilis 4 years ago

    While the A64FX is indeed quite impressive, perhaps saying it's almost like a GPU is an exaggeration. Apart from having HBM2 memory (and thus high memory bandwidth) and that rather tight 6D Tofu interconnect, what part of it is like a GPU? What about its cores? How do they function, and how do they compare to GPU shader cores?
    Can they all work together in a SIMD or SIMT manner? How about their integer and floating-point performance? Are the peak 3 TFLOPs mentioned at 8:55 just FP32 floating-point performance from the 512-bit SIMD units, or balanced integer & floating-point performance? And how long can this peak performance be sustained?
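
On the 3 TFLOPs question: that is the usual paper peak of the SVE units in FP64, counting a fused multiply-add as two operations (FP32 doubles it, since each 512-bit register then holds 16 lanes instead of 8). With 48 compute cores, two FMA pipes per core and a clock near 2 GHz:

```latex
48\ \text{cores} \times 2\ \text{pipes} \times \frac{512}{64}\ \text{lanes}
\times 2\ \tfrac{\text{ops}}{\text{FMA}} \times 2\ \text{GHz}
\approx 3.07\ \text{TFLOP/s (FP64)}
```

How long it can be sustained is largely a bandwidth question: at roughly 1 TB/s of HBM2, only kernels with about 3 or more FLOPs of work per byte moved can hold that peak.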

  • @redrock425 4 years ago +1

    Great video, always enjoy your content and usually learn something new. As an aside, "H" is pronounced "aitch" (from the Norman); most native English speakers get this wrong! Note I'm referring to English (the original), not "American English", whatever that is 😉👍

    • @adventureinlife7700 4 years ago

      Thank you! Glad I'm not the only one who noticed this. I tried to ignore it, but every time he said "haitch" it was so very distracting.

  • @doperider85 4 years ago +1

    Coreteks has the smoothest, smokiest delivery in the tech world. You have it dialed in, man

  • @mapesdhs597 4 years ago

    25+ years ago SGI was at the forefront of pushing memory and I/O bandwidth as the critical foundation for effective scalable HPC, which made sense given their typical client base, ie. those dealing with very large datasets (GIS, medical, auto, aerospace, etc.) Their NUMAlink interconnect I think topped out at 64GB/sec by the time the company was finally gone. They also did a lot of work reducing latency for large shared-memory systems, and it was their OS scalability ported to Linux which allowed Linux to really take off in the HPC market, along of course with XFS and OpenGL more generally.
    That drive for bandwidth kinda faded after the company hit the rocks in the early 2000s; once their workstation line stopped and the tech world became x86, nobody really cared about bandwidth for the longest time. It was all about the separate components in a system instead (CPU/gfx/RAM/storage), a decade+ of a legacy chipset arch to link things together (partly offset by having gfx and some storage connected directly to the CPU), hardly any progress in how they communicate, and nothing at all in COTS terms about scalability (at one point SGI had ideas about a workstation that could be connected together to form much larger shared-memory systems, a multicore MIPS-based system using PCI Express and NVIDIA gfx, but that never came about). Now suddenly, with a sharp increase in data demands, including from consumers, bandwidth is back in fashion.
    Is there any dense COTS system today that can do 1TB/sec I/O? SGI managed this almost 20 years ago with the Origin 3900 series, though that required rather a lot of racks. :D I've still yet to hear of any single-motherboard workstation that can match even an older Onyx2 Group Station for Defense Imaging with just 64 CPUs (load and display a 67GB 2D image in less than 2 seconds), though surely Rome could do this (an Onyx 3000 system could doubtless do it with far fewer CPUs, but SGI stopped publishing such technical details after 9/11). I sometimes wonder what gfx is being used by the DoD for such tasks; probably a custom/scalable Quadro with a buttload of RAM.
    Thanks for the excellent video! It sounds like at last genuinely new directions in computing are on the horizon.
    Group Station ref: www.sgidepot.co.uk/onyx2/groupstation.pdf

  • @materialburst983 4 years ago +2

    Hey, I was about to take a nap. ;D Thanks for the content!

  • @p4nx844 4 years ago

    Damn, great video dude