Parallella: The Most Energy Efficient Supercomputer - Ray Hightower of ROIClear

  • Published on 24 Sep 2024
  • Slides: rayhightower.co...
    Parallella is a single-board computer roughly the size of a credit card or Raspberry Pi. Parallella runs Linux. It has 18 cores (2 ARM, 16 RISC) and you can buy it online for about $150. This presentation tells why we care about parallelism and briefly shows how parallel execution differs from serial.
    Presented at Madison+ Ruby on August 22, 2015.
    Presented by Ray Hightower of ROIClear (ROIClear.com)

Comments • 450

  • @gene4390
    7 years ago +35

    The most efficient computer I ever saw (I own 2 of them) was made in the 1980s: the Casio FX-790P. It had a built-in BASIC programming language, scientific functions, 16 KB of RAM, and ran at 1 MHz (very good for the early 80s) and could run for 2 years off two tiny little watch batteries! I used mine mainly in college and wrote my own programs. I even programmed several games for it. lol Almost 35+ years later I still use the FX-790P (or renamed Tandy PC-6) durable micro computer to this day.

  • @kevincozens6837
    6 years ago +2

    The parallel "hello, world" program failed. At 12:26 there are 20 responses from 16 cores. Three cores (0,1 0,3 and 2,1) never responded. Four cores responded multiple times.

  • @CryptoJones
    6 years ago +1

    Mr. Hightower, this motivates me to study parallelism more in-depth. Thank you for this.

  • @rainbowbunchie8237
    8 years ago +41

    When your electronics become obsolete, put them in a drawer and keep them forever.
    Electronic things are WAY too cool to throw away, no matter how old they are. =P

    • @pyrographic380
      6 years ago +1

      yeah

    • @satibel
      6 years ago

      nasty ass-dildo or nasty-ass dildo? :p

  • @oldchannel6511
    8 years ago +110

    18 cores and 1GB RAM.. Absolute savage.

    • @hydrochloricacid2146
      8 years ago +8

      Bottleneck FTW

    • @KianGurney
      8 years ago +16

      +CasualMods 7 gamers, one CPU.

    • @0xf7c8
      8 years ago +2

      +Nerd You have no idea what you are talking about.

    • @oldchannel6511
      8 years ago +2

      Yeah I do, lmao.

    • @0xf7c8
      8 years ago +8

      I'll put it simply for you. You have in your head the idea that these cores are even close to a modern x86 core, which is not the case. These cores are not even as powerful as a single CUDA core in a GPU. A mid-range GPU has, let's say, 650 CUDA cores and 2 GB of RAM. And with that amount of RAM they have all the memory they can handle without overshooting. And GPUs can easily be used as clusters, and in fact they are.
      I'm not saying that this design is perfectly well thought out and that they have nothing to improve, but 1 GB of RAM in this kind of device is not as crazy as you think.

  • @reezlaw
    8 years ago

    This video being 360p in 2015 showed that we must be actually going backwards

  • @mehmetedex
    7 years ago

    I could listen to this guy forever. Great speech.

  • @pieterrossouw8596
    8 years ago +34

    1GB RAM with 18 cores, for a lot of HPC applications, that is going to be a catastrophic bottleneck. In x86 architecture compute clusters, a "golden rule" is 2GB per processing core, depending on application obviously. Sure these cores are comparatively weak, but since RAM chips are pretty inexpensive, it's a shame that for that price they didn't include at least 4GB of RAM.

    • @MichaelPohoreski
      8 years ago +3

      +Pieter Rossouw Yup, still waiting for an extremely low-cost 16 GB + 8 core, or hell, even 4 GB + 4 core device. While the Raspberry Pi 2B, Banana Pi and Parallella are all "nice" SoC embedded devices, the lack of 4 GB+ RAM gimps these devices for more "serious" work where our data sets are larger. :-/

    • @walter0bz
      8 years ago +3

      +Pieter Rossouw these are 'little cores', more comparable to GPU warps or SIMD lanes.
      One x86 core is equivalent to several Parallella cores (it might be as many as 16, depending on pipeline depth, SIMD, execution units; I don't know offhand), so it's still about right.
      The Parallella concept is still worthwhile. GPUs prove that more, simpler cores have higher throughput. A big core spends huge resources figuring out parallelism from a single thread on the fly.
      Nonetheless the board has other problems, but they have to start somewhere with this new architecture needing new software. It would be perfect for AI work IMO (dataflow).

    • @0xf7c8
      8 years ago

      +Pieter Rossouw If you look closely, this is a Xilinx chip, probably an FPGA, so I would call it a mounted prototype. It's hard to put 4 gigs of RAM on an FPGA.

    • @llothar68
      8 years ago +2

      +Pieter Rossouw
      The problem is not only the RAM (yes, 1 GB per core is important, and at least 64 KB of cache per core) but the RAM throughput. With just one memory channel you will get almost zero parallelism in many real-world tasks.
      I don't even think it's a good teaching device, because these restrictions do not let you draw conclusions about bottlenecks when everything is a bottleneck.

    • @walter0bz
      8 years ago

      It's really a forward-looking experimental device. A PGAS architecture would scale far better than anything else, but they didn't get the budget to build a large chip on a newer process yet (the concept only makes sense when scaled up to thousands of cores). Chicken-and-egg situation with software.

  • @antonnym214
    8 years ago

    Dr. Hightower, this is a very nice presentation. I like your style and how well you explain it for the layperson. It's pretty exciting to run a single module with that little solar generator. Makes me think it would be quite feasible to power a huge array of those with just a few solar panels on the roof. It could be virtually free to power the system. Lots of possibilities there, because for most installations, the challenge is covering the operating costs, as opposed to the initial expense.

  • @ragsdale9
    8 years ago +3

    I'm curious whether the Parallella's wattage would increase under high utilization.

  • @AndrewHelgeCox
    8 years ago

    This is quite interesting in that it is a talk given by a person who is clearly not an expert in his subjects of parallel programming, or really of anything he touches on, but it still manages to be a little bit entertaining.

  • @ForbiddenUser403
    8 years ago +5

    What we really need is a parallel platform with the individual nodes configured like hot-swappable modules that all plug into a centralized, expandable location, plus a virtualization layer that recognizes the resources of all those "blades", uses them, and presents them as traditional PC hardware. That would allow traditional software and OSes to run without rewriting every application individually to make use of parallel processing.

    • @jgbreezer
      7 years ago

      Computers (software) can't yet parallelise problems for us automatically well enough; we still need to write things in a way that is ready for this. It's increasingly the default way of writing things for scaling horizontally rather than vertically in the commercial world, but it's still not ready for low-level parallelism in a large way. Cultural change required.

    • @stevebez2767
      6 years ago

      So buy the board and write the program to do just that. Parallel programming, next stop quanta?

    • @neilruedlinger4851
      6 years ago +1

      Sounds like a worthwhile project for a savvy start-up company?

  • @antonnym214
    8 years ago +1

    45-seconds to fully boot is pretty impressive, compared to my win7 box.

  • @dogeeconomist4825
    6 years ago

    I'm gonna have to start buying one of these every now and then and setting them up as an ever-growing cluster for BOINC. Much interest in future offers and capabilities as well as competing products as they emerge.

  • @antonnym214
    8 years ago +1

    Nice talk. Outstanding machine, and you present it very well.

  • @TrueRebel
    6 years ago

    XcellenT info Ray... Parallella is the future of Super Computing and that audience couldn't make the math, ha ha ha ha ha ha ha ha. Congratulations Ray

  • @SudoPi
    8 years ago +28

    It will be way cooler if this would be maybe about 40$ or so. 150$ is a big price to ask from consumers to purchase a SBC

    • @assaulth3ro911
      8 years ago +1

      +The Random Stuff Yeah. It is however different from a Pi, I think $75-$100 would be more fair.

    • @mysticvirgo9318
      8 years ago +2

      +The Random Stuff will most likely get less expensive per unit as they sell more and more :)

    • @supercompy
      8 years ago +2

      +The Random Stuff They are $75 for the micro-server version and $99 for the desktop version right now on amazon.
      I think that is a fair price considering the number of cores.

    • @voyager1bg
      8 years ago

      +The Random Stuff not that expensive, we're talking supercomputing here... I believe such advancements are the future

    • @SudoPi
      8 years ago +1

      Yeah, but if the price were $35 like the Raspberry Pi, then it would probably be more interesting to customers, since not everyone would be willing to pay $150 just to tinker around. As you said, it's not that expensive, but it really depends on who is looking at the price point, and for me the $35 price tag on the Pi 2 is cooler.

  • @JohnVegas
    8 years ago +1

    I always enjoy your presentations. God bless!

  • @Rarius
    8 years ago +78

    1) Note that he compares his 18 core system with just a single core of the Mac, not with running on all four cores!
    2) I coded up this algorithm in C# on my two year old PC (Intel i5-3570K!)... and even running single threaded it managed it in 6.65s... three times faster than this Parallella, and twice as fast as the Mac!
    3) This is a pretty poor algorithm for finding primes... There are FAR better ones. For instance, on my PC, the sieve of Eratosthenes algorithm gets the same result in 0.38s! Better algorithms often (usually?) yield better results than throwing more hardware at a problem.
    While I applaud the effort going into the Parallella, it needs to be significantly faster before it is worth investing in.
    It might be interesting to see how a stack of Raspberry Pi 3s (you can get 4 Pis with change from $150) would do with their 16 cores.

    • @fatkidd7782
      8 years ago +7

      everybody needs to read this

    • @dialupdavid
      8 years ago +2

      This was my first thought too. No idea why in the hell anyone thought it would be a good idea to compare a single thread of a quad-core, eight-thread system to a dual-core ARM chip with 16 co-processing cores. It makes no logical sense to me. Were they that offended by how low the performance was? To me this was nothing technical; this guy was no engineer or enthusiast, solely a salesman with a sales pitch.

    • @owatson67
      8 years ago +9

      Yeah, but does your PC use 5 watts, and did it cost $150? I haven't run this algorithm on my PC yet, but I know it would post a good time. It has an i7-6700HQ, which is a quad-core CPU with 8 threads, and I know it would beat the Parallella, but that's not the point.

    • @Rarius
      8 years ago +2

      No my PC doesn't consume 5 watts or cost $150... but neither does the Apple he compares the Parallella with.
      Actually, you could build a PC for less than $150 using second hand parts that would outperform the Parallella AND be much easier to program.
      I suspect that a $150 cluster of Raspberry Pis would give it a good run for its money too.

    • @dialupdavid
      8 years ago +1

      Thunder o Well, the Tegra X1 has about the same power requirements, and a 256-core Maxwell GPU. Anyone who's going to do parallel processing is going to be 10x better off using CUDA or OpenCL. Not to mention the actual A57s in that SoC are probably faster than the entire Parallella board by a factor of 3-4.
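
The algorithm point raised at the top of this thread (trial division versus the sieve of Eratosthenes) can be sketched in Python roughly as follows. This is an illustrative sketch only, not the code shown in the talk and not @Rarius's C# program; the function names `is_prime_trial` and `primes_sieve` are made up for this example.

```python
import math

def is_prime_trial(n):
    """Trial division: test 2, then odd divisors up to floor(sqrt(n))."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

def primes_sieve(limit):
    """Sieve of Eratosthenes: return every prime <= limit."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)   # flags[i] == 1 means "i may be prime"
    flags[0] = flags[1] = 0
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            # Cross out the multiples of p in one step, starting at p*p.
            crossed = len(range(p * p, limit + 1, p))
            flags[p * p :: p] = bytearray(crossed)
    return [i for i, is_p in enumerate(flags) if is_p]

# Both approaches agree; the sieve does far less work per number.
print(primes_sieve(50))
print([n for n in range(51) if is_prime_trial(n)])
```

Trial division repeats a divisor loop for every candidate, while the sieve crosses out each composite a handful of times in total, which is why a better algorithm can matter more than extra cores.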

  • @larrycastro7937
    8 years ago

    I stumbled onto this website, and thought it was fascinating. All I know about is Moore's law, the doubling of transistors on a microchip every eighteen months.

  • @mike_98058
    7 years ago +13

    Mr Hightower failed to demonstrate that the Parallella was, circa 2015, the most energy efficient supercomputer on the planet. He failed to compute the efficiency in terms of FLOPS/watt, which was his initial basis of comparison.

    • @GeekBoy03
      7 years ago +4

      The item is actually from 2012, but released in 2013. Four years, and nothing new from them

    • @neilruedlinger4851
      6 years ago +1

      I did a computation based on Watts per Core.
      The Parallella is 18.26 times more energy efficient than the Tianhe-2.

    • @pwnmeisterage
      6 years ago +1

      Now there's Epiphany-V, a 1024-core RISC SoC, and Epiphany-VI is already underway.
      Tianhe-2 undergoes constant rotating upgrades as the Xeon/Phi cores are rescaled up and out.
      The presenter did explain that raw core count is not entirely meaningful in "real" (complex) problems; he was only able to demonstrate their overwhelming advantage in "optimum" (simplex) problems.
      The Tianhe-2 was designed to be a unified distributed supercomputing platform, not an ad-hoc modular system with "limitless" expandability. I suspect that in the real world it uses far less electrical power than the huge number of Parallella SBCs that would be needed to solve the same problems in the same time. Even energy efficiencies aren't linear: you can't just keep stacking LEGO computer modules together indefinitely, there are diminishing returns.

    • @monad_tcp
      6 years ago

      Without a new programming model you can't extract all that performance. Of course the culprit is the C language, but not everyone can program in Haskell yet.

    • @andrewyork3869
      6 years ago

      @@monad_tcp what about ASM?

  • @atranimecs
    8 years ago

    It's not about the raw power of the Parallella, it's about the performance/watt ratio and also the heat output.

  • @daveb5041
    7 years ago +3

    Why not put it in series with a 5 W light bulb? The brightness of the bulb will show power consumption. The best way to save electricity is to make a computer that runs on vacuum tubes instead of transistors. A tube can take the place of three to five transistors, so you can shrink a billion core processor down to 300 million tubes. To power it, don't run it on coal; have monkeys pedaling bicycles hooked to generators. Feed them GMO bananas made by Monsanto to cut down on food costs. You can also have them make copies of Shakespeare by putting a typewriter in front of each one. Statistics proves that with enough time and typewriters one will publish a complete work.

  • @05Rudey
    8 years ago +4

    I want one just to tell my mates that I've got a super computer.

  • @tigerbody69
    6 years ago +4

    "Will it float?"

  • @eggraf
    8 years ago +18

    Run it in parallel on the Mac. He only ran it serially...

    • @SarahC2
      7 years ago +2

      3.6 seconds...

    • @minecraftermad
      6 years ago +1

      kek now do it on a 5W vega or ryzen based thingy (most energy efficient from what i've heard but might be wrong about ryzen)

    • @jvebarnes
      6 years ago

      2015 vs 2018 we cannot know the future

  • @DaHaiZhu
    8 years ago

    He never did say how much more energy efficient the Parallella was per core to the Chinese Supercomputer. In other words, how many petaflops per watt does the Parallella use compared to the Chinese Supercomputer???

  • @MrGencyExit64
    8 years ago

    GPU cores are general-purpose too, they just work at peak efficiency when coupled with specialized hardware that handles scheduling, memory fetches, etc. for the sorts of patterns (i.e. 4 pixels at a time) used in rendering. You'd need A LOT more of them to achieve their kind of performance without that extra support hardware :)

  • @idhan
    7 years ago +5

    The prime calculation program can easily be run in parallel on the Mac. Assuming it has 4 logical processors, it could run in about 3.5 seconds. That should be the real comparison. That said, the Parallella is still an amazing piece of hardware :-)

    • @altEFG
      6 years ago

      4 times vs 13 times faster in parallel

    • @tigerbody69
      6 years ago

      Please make a video and show us.

  • @12kenbutsuri
    3 years ago +1

    I ordered one once, it was completely broken by the time it arrived.

  • @ultraviolet.catastrophe
    3 years ago

    Why would I buy this when I can buy 6 Raspberry Pi 3 boards that would give me a total of 24 cores?

  • @djprodigalsun
    8 years ago

    He is using the battery in that solar pack, why don't you tell us what the solar efficiency of that panel is..

  • @duderobi
    7 years ago +1

    3:25 Did I hear right, 2 ARM (Acorn RISC Machine) and 14 RISC cores?

    • @56335130
      4 years ago

      Xilinx Zynq is an FPGA SoC.

  • @justy1337
    6 years ago

    Just wished that the video was in full hd.

  • @antonnym214
    8 years ago

    18 cores for $150 is pretty spectacular, especially compared to a standard AMD or Intel CPU.

  • @GeekBoy03
    8 years ago +21

    Seems Parallella is fizzling out. Three years, and no new models.

    • @teknostatik1055
      8 years ago

      Could just be gaining momentum since it's a fairly new product.

    • @GeekBoy03
      8 years ago +10

      Tekno Statik Three years in technology is a very long time. It's had more than enough time to get grounded, and new models to appear.

    • @teknostatik1055
      8 years ago

      Yeah, no... What the Parallella is doing is adding one or more dimensions for instructions to be run in parallel with each other (hence the name). Where it gets complicated is HOW the work is split, because not all tasks require the same level of splitting and not all tasks can be split the same way. This example took trial and error to be split across the correct number of cores, and the program had to be rewritten from serial to work in parallel.

    • @GeekBoy03
      8 years ago +2

      Tekno Statik I take it you have zero understanding of product life cycles. I was referring to nothing new coming out in three years, not learning how to program the thing.

    • @teknostatik1055
      8 years ago

      And I take it you know nothing of programming.
      How are we supposed to translate every program from serial into parallel? Have you any concept of the implications that go BEYOND your so-called "technology" and "product"? Without parallel programming there will be no product.
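
The point made in this thread about HOW the work is split can be illustrated with a small Python sketch. It assumes a hypothetical workload (counting primes below 100,000 in 16 chunks, one per Epiphany core) and uses made-up helper names `split_range` and `count_primes`; it is not the program from the talk, and it runs the chunks one after another rather than on separate cores.

```python
import math

def is_prime(n):
    """Trial division by 2 and by odd numbers up to floor(sqrt(n))."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

def split_range(lo, hi, parts):
    """Split the half-open range [lo, hi) into `parts` contiguous chunks."""
    size, extra = divmod(hi - lo, parts)
    chunks, start = [], lo
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append((start, end))
        start = end
    return chunks

def count_primes(lo, hi):
    """Count primes in [lo, hi); each call is independent of the others."""
    return sum(1 for n in range(lo, hi) if is_prime(n))

# Hypothetical workload: 16 chunks, one per core.
chunks = split_range(0, 100_000, 16)
partial_counts = [count_primes(a, b) for a, b in chunks]  # could run concurrently
total = sum(partial_counts)
print(total)
```

Because no chunk needs a result from another, each `count_primes` call could be handed to a different core. Equal-sized chunks are not equal work, though: trial division takes longer on larger numbers, so the last chunk is the slowest, which is one reason choosing the split can take trial and error.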

  • @davecc0000
    7 years ago

    Excellent presentation, understandable, great examples.

  • @GrennKren
    8 years ago +1

    I saw the future!

  • @jessstuart7495
    7 years ago

    Any programming language or compiler developments for developing parallel software? I would think the compiler would have to know a lot about the underlying architecture to be able to produce software to efficiently allocate and manage the cores and memory.

  • @chrisking7603
    6 years ago

    This video and its presenter are quite entertaining, but I wanted: #1 a clear comparison of megaflops per megawatt-hour against currently optimised supercomputers; #2 an explanation of how linearly adding parallel cores can really compensate for a limit in polynomial growth of chip density.
    Apart from being properly RISC, this seemed very Transputer-ish.

  • @Tommo_
    8 years ago

    macs are only expensive because of how compact they are. if you look at the inside of a 12 inch macbook, the whole 8gb of ram and 500gb of storage fits into about 10 by 5 cm of space. The rest is the battery. And it runs without a fan. Amazing.

  • @FlumenSanctiViti
    6 years ago

    I'm not a programmer, but... his code at 12:56 should return TRUE for input number 4?

  • @IDoThisInMySpareTime
    9 years ago

    Interesting video. On geek.com they listed the Parallella as achieving around 90 GFLOPS. So if I did the math correctly, a supercomputer with the processing power of Tianhe-2 (listed on wiki at 17.6 MW) would require a cluster of around 400k Parallellas and run at around 2 MW?
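
The back-of-the-envelope estimate in the comment above can be reproduced with a few lines of Python. The 90 GFLOPS and 17.6 MW figures are the ones quoted in the comment; the 5 W board power is the figure cited elsewhere in this thread; the 33.86 PFLOPS value is Tianhe-2's published Linpack result and is an assumption brought in for this sketch. Note that it mixes a peak number for the board with a measured number for Tianhe-2, so it flatters the Parallella.

```python
import math

# Assumed inputs (see the lead-in for where each figure comes from).
TIANHE2_FLOPS = 33.86e15   # Linpack result, FLOPS
TIANHE2_WATTS = 17.6e6     # quoted in the comment
BOARD_FLOPS = 90e9         # peak figure quoted in the comment
BOARD_WATTS = 5.0          # board power cited elsewhere in the thread

# Boards needed to match Tianhe-2 on paper, ignoring all interconnect overhead.
boards = math.ceil(TIANHE2_FLOPS / BOARD_FLOPS)
cluster_megawatts = boards * BOARD_WATTS / 1e6

# Efficiency in GFLOPS per watt, the metric several commenters ask for.
board_gflops_per_watt = BOARD_FLOPS / BOARD_WATTS / 1e9
tianhe2_gflops_per_watt = TIANHE2_FLOPS / TIANHE2_WATTS / 1e9

print(boards, round(cluster_megawatts, 2))
print(round(board_gflops_per_watt, 2), round(tianhe2_gflops_per_watt, 2))
```

Under these assumptions the result lands close to the commenter's "around 400k boards, around 2 MW", but only on paper: the estimate ignores networking, memory limits, and the gap between peak and sustained performance.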

  • @sigmareaver680
    6 years ago

    The only thing attractive here is the energy efficiency. Would it be worth crypto mining with?

  • @tenshi7angel
    6 years ago

    The problem with Parallella, there are programs that cannot be done on multi-core or multi-system setups.

  • @stevebez2767
    8 years ago

    Well done with that,actual methodically harden course to project kickstart as well!

  • @rospotrebpozor3873
    8 years ago +2

    The problem is that a program has to compute one result before it can make a decision about the next.
    Parallel processing does not solve that problem.

    • @Thyhorrorchannel
      8 years ago +1

      +rospotreb pozor RISC .

    • @walter0bz
      8 years ago +1

      +rospotreb pozor Many algorithms parallelise fine. It changes the way you program and the types of work you can do. See deep learning (which became viable due to GPUs, and is a poor use of a CPU): it can use huge parallelism across layers and deeper nets, but suffers at the moment from communication bottlenecks in clusters. The point here is that parallelism with local memories and an on-chip network overcomes that.
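
The disagreement in this thread (serial dependencies versus work that parallelises well) is usually quantified with Amdahl's law, which nobody in the thread names but which fits the argument. A minimal Python sketch, with a made-up function name and example fractions chosen purely for illustration:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Illustrative fractions only: how much 16 cores help as the
# parallelisable share of the program grows.
for fraction in (0.5, 0.9, 0.99):
    print(fraction, round(amdahl_speedup(fraction, 16), 2))
```

If half of a program is a chain of dependent steps, 16 cores give less than a 2x speedup; if almost all of it is independent work, the speedup approaches the core count. Both commenters are describing different ends of the same curve.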

  • @jerryschull2122
    8 years ago +6

    Seems way too pricey. The Pi2 and Pine64 are really cheap and have significant processing power, fits most project requirements.

    • @0xf7c8
      8 years ago +1

      +Jerry Schull Google cluster, that is what this is designed for.

    • @stevebez2767
      6 years ago +1

      yeah like all back too The Simpsons as some crazy kid finds 'dirty riffs in basic 'coo yells for skitso anarchy run hells exterimination repeat of give it too the keyIDzz sig moon frieds?

  • @JohnSmith-ut5th
    6 years ago

    A GPU is *far* more energy efficient in comparison to its processing power. A *low-end* GPU would absolutely smoke this device. The main advantage of this device is physical *portability,* not energy efficiency.

  • @KittyKittaw
    6 years ago

    Motorola - 16cores, 1985 -parallel processors, round the same time. Course it burned more power, or did it?

  • @jarisipilainen3875
    7 years ago +1

    Are you scared to show how fast the Mac is on multiple threads? It was faster anyway on one, lol. But yet it costs more and has an almost 3 times faster core. The Mac could do it in 7 seconds or well under. Your program was the only thing that benefited from your board, lol.

  • @erickleefeld4883
    7 years ago

    Could I use something like this to run Handbrake video compressions, and use an app on my Mac to administer it?

  • @w.rustylane5650
    7 years ago

    Nice video on parallelism, for what it's worth.

  • @rudde7251
    8 years ago +3

    When you find primes, do you check up till sqrt rounded up or rounded down?

    • @MrBrew4321
      8 years ago +2

      +Rudde down

    • @rudde7251
      8 years ago

      +Brew Dain Thanks man :)

    • @fliptmartley
      8 years ago

      +Rudde, I square the prime I'm testing against and check to see if it's larger than the candidate.

    • @mullermanden
      8 years ago +2

      +Rudde
      Instead of using:
      for(int i=3; i

    • @MrBrew4321
      8 years ago +3

      You can calculate sqrt(p) above the loop and store the result in a variable to use as the upper bound, but i*i changes each iteration so that isn't possible..
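
The two loop bounds discussed in this thread (a square root computed once and rounded down, versus comparing `i*i` with the candidate on every pass) can be sketched side by side in Python. The function names are made up for this example, and the thread's original C-style snippet is truncated, so this is a reconstruction of the idea rather than of that code.

```python
import math

def is_prime_sqrt_bound(p):
    """Compute floor(sqrt(p)) once, before the loop."""
    if p < 2:
        return False
    limit = math.isqrt(p)  # integer square root, rounded down
    i = 2
    while i <= limit:
        if p % i == 0:
            return False
        i += 1
    return True

def is_prime_square_bound(p):
    """Avoid sqrt entirely: compare i*i with p on every iteration."""
    if p < 2:
        return False
    i = 2
    while i * i <= p:
        if p % i == 0:
            return False
        i += 1
    return True

# The two bounds are equivalent: i <= floor(sqrt(p)) exactly when i*i <= p.
print(all(is_prime_sqrt_bound(n) == is_prime_square_bound(n) for n in range(2000)))
```

Rounding down is sufficient because any composite `p` has a divisor no larger than its square root. The first form pays for one square root up front; the second pays for one multiplication per iteration.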

  • @BoggyBogdan
    9 years ago +1

    That's awesome
    Thanks for sharing

  • @draken68
    7 years ago

    Very interesting video. What I got out of that is that in metro Australia we pay $2-$2.50 and in rural Australia $3-$3.50 (Australian dollars per watt).

  • @MrManerd
    6 years ago

    Does the Parallella use ECC memory?
    That's all I want.

  • @hinasamal8406
    6 years ago

    Parallella supercomputing is a fantastic idea

  • @Donatellangelo
    8 years ago

    By my calculations, building an actual supercomputer with these would almost be $100,000.00!!!!! D: Holy shit!

  • @adavistheravyn573
    7 years ago

    I happen to work in the field of high-performance computing and had something similar in mind for my own numerical simulations or BOINC stuff.
    Before spending hundreds of Euros for RPi3 boards, I did some tests with a special version of my nbody code which is written in C, highly optimized and utilizes the OpenMP library for parallel computing. My benchmark focussed on floating-point performance with negligible RAM consumption. What are my results? Well, it's disillusioning. My benchmark task took 65 minutes on a single core of my RPi3, while a single Core i5-6500 solved the problem in 65 seconds! Using four threads, the RPi3 still took more than 18 minutes, while my Intel Core i5-6500 got that job done in 17 seconds.
    Conclusion: Neglecting communication overhead, I would have to come up with more than 60 RPi3 boards to get on par with a decent Core i5-6500 ... ARM might give you more FLOPS per Watt, but when it comes to pure floating-point performance, the architecture is still far behind.

    • @GeekBoy03
      7 years ago

      ARM processors come in a very large variety, with up to 8 cores. The Raspberry Pi 3 uses a lower-end ARM Cortex-A53. The upper end is the Cortex-A75. But remember, ARM has low power usage as a priority. ARM is certainly getting more powerful, and some companies have started making laptops with ARM processors.
      Remember, the Raspberry Pi is just for projects and prototyping.
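
The "more than 60 RPi3 boards" conclusion in the benchmark comment above follows directly from the timings it reports. A short Python check, using only those reported numbers (the variable names are made up, and "more than 18 minutes" is taken as exactly 18 minutes, so the second ratio is a lower bound):

```python
import math

# Timings reported in the comment, converted to seconds.
rpi3_single_s = 65 * 60   # 65 minutes on one RPi3 core
i5_single_s = 65          # 65 seconds on one Core i5-6500 core
rpi3_quad_s = 18 * 60     # "more than 18 minutes" with four threads
i5_quad_s = 17            # 17 seconds with four threads

single_core_ratio = rpi3_single_s / i5_single_s
four_thread_ratio = rpi3_quad_s / i5_quad_s

# Boards needed to match one i5, ignoring communication overhead.
boards_needed = math.ceil(four_thread_ratio)

print(single_core_ratio, round(four_thread_ratio, 1), boards_needed)
```

Both ratios come out at roughly 60 or a little above, which matches the commenter's estimate for this particular floating-point-heavy workload.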

  • @RobbieFPV
    8 years ago +35

    haha I saw "hightower" and immediately expected a huge black cop.

    • @StefanBlurr
      8 years ago

      he died a long time ago :'(

    • @RobbieFPV
      7 years ago

      O haha yea ofcourse! I hardly play that map though. I'm more of a goldrush or dustbowl player :v

  • @RinksRides
    7 years ago

    I think Moore's law is still relevant, it's just taking a different direction. We're getting more and more powerful computing ability while the cost and power consumption can be lowered at the same time. So, if you view it from a performance per watt level then Moore's law is still relevant in that context.

    • @smorrow
      7 years ago

      Moore's Law proper is about number of transistors.

  • @roschereric
    7 years ago

    Just think that at the same time, Nvidia already had the 970 with less power per GFLOP. Pair that with an ARM dual core and you have a better performer.

  • @fy7589
    6 years ago +1

    This is not a new idea. We already build cluster computers using super high-end hardware, and one chip in them is capable of the same speed as thousands of Parallellas or Raspberry Pis. Just one chip in them. And it is much more power efficient and space friendly than building super big Pi clusters. Instead, FPGA chips will become more popular in the future.

  • @Raven-fu1zz
    3 years ago

    Why can't you just use a GPU to do the calculations, they have thousands of cores, and per watt you would get more performance

  • @madgamer3974
    8 years ago

    cloud phone connected by internet to supercomputer = best phone ever :D

  • @ChrisD__
    8 years ago +17

    If I could run Blender Cycles on this, I'll take fifty.

    • @mutantgenepool
      6 years ago +2

      Was thinking the same thing. xDD

    • @Art7220
      6 years ago +2

      Can it run XP or Crysis, or Bitcoin Mining? Someone always asks about Crysis.

    • @afronprime51
      6 years ago +1

      Reading my mind

    • @Phoen1x883
      6 years ago +1

      With only 1 GB of RAM, you'd be fairly limited in your scene size.
      In addition, rendering requires lots of high speed access to _all_ the memory, as rays need to bounce around the scene (and therefore, around memory). Just looking at the block diagram, you can see that none of the cores have direct access to a large block of memory. Unless there is some extremely fast communication bus between cores, that means long pauses in execution while data is fetched from memory.
      Would be nice if we could get someone familiar with Cycles internals to take a look and evaluate whether the Parallella architecture is usable for rendering. I did some quick searches, and didn't find anything solid.

  • @terrance_huang
    6 years ago

    ditch the soft cores and do it on bare metal verilog, you can get another 10x performance

  • @Nomoreidsleft
    6 years ago

    I don't know why he's even calling it a supercomputer. Only 16 cores, and probably doesn't even do floating point.

  • @jarisipilainen3875
    7 years ago

    Is it 18 cores and 18 extra cores, in case some core breaks? Some Intel processors have 9 cores but with 9 extras to replace a broken core on the fly. Or you can activate them all, lol, 18 cores. Probably they work in parallel, not serially at least, lol.

  • @afronprime51
    6 years ago

    Can you use them as a render farm?

  • @Donatellangelo
    8 years ago

    I hope this doesn't have any of the NSA's poison on it.

  • @frostgreen5527
    7 years ago +1

    nice presentation, open source and small power consumption, not bad...

    • @stevebez2767
      6 years ago

      yunno the actual grounding back ground of living in a gent with some batterries you have too recharge,some windmills,some solar lights,etc to proove you can be coz of this wailing 'universal'failing that ego yelling utter liar of any exsist invites,pays,an builds club war den non men for meter maids count sell no wellys sheep shag act of 'had you all'you know you think I was???Yellow Lines,no on,no approach too know a Meter,answer door,in gets 'bill'big blue 'company guy'your all comparing exsists too pay build three arse non element teee red robe 'kill the giy'gsus vet war law own yer run into ground zero sport o non lord manger e state yells of sit yer on a stamp,get tirkey work or starve,full slave driven yer man,carzee lie sence have it learns???watt o?

  • @IraQNid
    8 years ago +3

    A fractal Parallella cluster is the real answer. But how well does it run beneficial programs such as SETI@HOME and BOINC? These are programs that seek to solve our most pressing issues of the day using idle distributed CPU and GPU cycles. That idle CPU and GPU processing prowess is then used with tiny segments of data sent to users all around the globe to analyze data. Results are then sent back to the researchers' computer centers. I used to participate in SETI@HOME, Einstein@HOME, and BOINC to help solve the mysteries of our Universe, to find a cure for cancers, and to produce better rice yields to feed more people with an improvement in how the rice is grown.
    You might want to research the computational power of a Titan series GPU and something called "CUDA" :)

    • @marcusdudley7235
      @marcusdudley7235 8 ปีที่แล้ว

      I used to run BOINC too, but, although it claimed to only use idle cycles, my CPU and GPU showed much more active cooling with BOINC running and my power usage almost tripled.

  • @DAVIDGREGORYKERR
    @DAVIDGREGORYKERR 8 ปีที่แล้ว

    I wonder whether anyone has built a supercomputer around 16 boards, each containing 64 INMOS T800 Transputers, which equals 1024 Transputer cores, that will run Linux.

  • @Gamepak
    @Gamepak 6 ปีที่แล้ว

    cool but does it do Crysis?

  • @einsteinwallah2
    @einsteinwallah2 4 ปีที่แล้ว

    make this in 480p or higher

  • @38KSW
    @38KSW 7 ปีที่แล้ว

    Too bad can't find this thing any place

  • @MasterGhostKnight
    @MasterGhostKnight 6 ปีที่แล้ว

    It doesn't matter if it is a Mac or whatever. You are using one 2.7GHz processor to do the job. Each core of the Parallella was 1GHz but you have 16 of them. Let's be generous and say that each Parallella core is 1/3 the speed, so a single core would sensibly take 3x as long. But you have 16 cores, so you would expect 16/3 the processing power, or it should take you 3/16 the amount of time. Let's be generous and round it up to 1/4.
    The serial run still took less time to finish the job. I would say that this was a massive fail!
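
A rough sketch of the estimate in the comment above, in Python. It assumes the work splits perfectly across all cores with no communication overhead, which real programs rarely achieve. The 1/3 per-core speed is the commenter's guess, and the 14 s and 18 s timings are the ones quoted elsewhere in this thread; the function name is made up for illustration.

```python
def ideal_parallel_time(serial_seconds, slowdown_per_core, cores):
    """Best-case runtime when the work divides perfectly across all cores.

    serial_seconds: time on the fast single core
    slowdown_per_core: how many times slower each small core is (assumed)
    cores: number of small cores sharing the work
    """
    return serial_seconds * slowdown_per_core / cores


mac_seconds = 14.0        # single-core Mac time quoted in the thread
observed_seconds = 18.0   # Parallella time quoted in the thread

# 14 s * 3 / 16 cores: the commenter's 3/16 factor applied to the Mac time
estimate = ideal_parallel_time(mac_seconds, slowdown_per_core=3, cores=16)

print(f"ideal estimate: {estimate:.3f} s")
print(f"observed is {observed_seconds / estimate:.1f}x slower than the ideal")
```

The gap between the ideal figure and the observed one is what the comment is pointing at; it does not say where the time goes (startup, data transfer, or a guess of 1/3 that is too optimistic).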

  • @TheTurnipKing
    @TheTurnipKing 6 ปีที่แล้ว

    16:21 That says far more about the overpricing of the Mac to me than anything else

  • @vinny142
    @vinny142 8 ปีที่แล้ว +3

    @16:19 "a 150 dollar device was comparable to a $2000 mac"
    Well, to one core of the Mac. He's only using one core on the Mac, not all four, which is what the $2000 pays for. So really he is comparing a $500 Mac to a $150 device. Lose the screen and the rest of the hardware, and the price is about the same.
    And even then it's only true for this particular application. Do you do much prime-number checking? I've never done it either.
    Parallel computing is of course nothing new. Back in the early 2000s, companies like Industrial Light and Magic and Pixar learned very quickly that you get much more bang for your buck if you add many, many small cheap nodes than fewer, faster, more expensive nodes. Adding one 2 GHz core to a system adds 2 billion instructions a second to the system, which is the same as upgrading four cores from 2 GHz to 2.5 GHz, which is a lot more expensive than one 2 GHz core.
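
The clock arithmetic in the last paragraph can be checked with a few lines of Python. This is only a sketch: it assumes one instruction per cycle and a workload that scales perfectly across cores, which is the same simplification the comment makes. The helper name is hypothetical.

```python
def aggregate_ghz(core_clocks_ghz):
    """Sum of per-core clock rates, a crude stand-in for total throughput."""
    return sum(core_clocks_ghz)


baseline = [2.0, 2.0, 2.0, 2.0]              # four 2 GHz cores

# Option A: add one more 2 GHz core
gain_add_core = aggregate_ghz(baseline + [2.0]) - aggregate_ghz(baseline)

# Option B: upgrade all four cores from 2 GHz to 2.5 GHz
gain_upgrade = aggregate_ghz([2.5] * 4) - aggregate_ghz(baseline)

print(gain_add_core, gain_upgrade)           # both options add the same total
```

Under those assumptions the two options add the same aggregate clock rate, so the cheaper one wins on price alone.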

  • @diskgrind3410
    @diskgrind3410 8 ปีที่แล้ว +1

    Other than the Jobbathehut in the audience I thought it was a good speech.

  • @jarisipilainen3875
    @jarisipilainen3875 7 ปีที่แล้ว

    If anyone is interested in how fast a cluster of 5 Raspberry Pi 3 boards is, and it does not cost 180 :) But I didn't say this board is not good. It needs more cores, lol.

  • @Masoudy91
    @Masoudy91 8 ปีที่แล้ว

    A Mac at 2.4 GHz took 14 sec.
    18 (or 20?) cores at 1 GHz should add up to 18 GHz or 20 GHz?
    Yet it took 18 sec?
    Not really familiar with computation stuff .. :(

    • @ToriRocksAmos
      @ToriRocksAmos 8 ปีที่แล้ว +1

      +Yousif Tareq you can't just add the numbers up. Those are entirely different machines running different architectures.

    • @Masoudy91
      @Masoudy91 8 ปีที่แล้ว

      +Marcel Krebs yep, so I heard.

  • @theq4602
    @theq4602 7 ปีที่แล้ว

    I think the guys in this comment section are computer nerds talking about program efficiency. Notice he's talking about this stuff called WATTS. He's talking about an ENERGY EFFICIENT computer. NONE of the nitwits in this comment section gives a damn about power usage, do you? Computers use CRAZY amounts of power. The point of the Parallella isn't to get more processing power. It's to move in a direction where most computers still lag behind: energy efficiency. All that heat is simply electricity being wasted. That's electricity that could be used for a billion other things than getting your rocks off.

    • @wopmf4345FxFDxdGaa20
      @wopmf4345FxFDxdGaa20 6 ปีที่แล้ว

      +David Vermillion The idea is to get more computing power, but more power per watt. ;) Computers don't really lag behind in energy efficiency. Especially in portable devices like laptops and phones, it's one of the most important design factors of a modern computer, because they are battery powered. The problem is not only energy consumption; it is that nearly all the energy used by the system is transformed into heat, and you have to remove that heat from the device somehow. If you need fans and active cooling, that also requires energy. So if you can reduce the energy consumption of the calculations, you can reduce heat dissipation and therefore need less cooling, and that way you also reduce the energy used for cooling.
      In supercomputers these problems exist on a bigger scale, and the cost of computation is directly related to the energy efficiency of the system, so energy efficiency is a very important design factor in a supercomputer.
      General-purpose CPUs are basically the least energy efficient, GPUs are a lot better, FPGAs are a bit better again, and special purpose-built circuits are usually the best.
      Indeed, the unit of energy is Wh, and the watt is a unit of power. :)
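
The power-versus-energy distinction above can be made concrete: energy is power multiplied by time. A minimal Python sketch follows. The 5 W figure and the 14 s and 18 s timings are quoted by other commenters in this thread; the laptop wattage is a made-up placeholder, not a measurement, so the comparison only shows the method.

```python
def energy_joules(power_watts, seconds):
    """Energy used by a device drawing constant power for a given time."""
    return power_watts * seconds


def joules_to_wh(joules):
    """Convert joules to watt-hours (1 Wh = 3600 J)."""
    return joules / 3600.0


PARALLELLA_WATTS = 5.0        # figure quoted elsewhere in the thread
ASSUMED_LAPTOP_WATTS = 45.0   # hypothetical placeholder, not measured

parallella_j = energy_joules(PARALLELLA_WATTS, 18.0)   # slower, low power
laptop_j = energy_joules(ASSUMED_LAPTOP_WATTS, 14.0)   # faster, high power

print(f"Parallella: {parallella_j:.0f} J ({joules_to_wh(parallella_j):.4f} Wh)")
print(f"Laptop (assumed): {laptop_j:.0f} J ({joules_to_wh(laptop_j):.4f} Wh)")
```

A machine that finishes later can still use less energy for the job if its power draw is low enough, which is the point of comparing per watt rather than per second.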

  • @jarisipilainen3875
    @jarisipilainen3875 7 ปีที่แล้ว +1

    you only used 1 thread on mac lol

  • @TradingFuturo
    @TradingFuturo 6 ปีที่แล้ว

    I don't really buy the claim that Parallella is general and the GPU is specialized. GPUs are already quite general with OpenCL; they are not only good at graphics.

  • @absolute___zero
    @absolute___zero 4 ปีที่แล้ว

    A mobile phone motherboard (no display, no flash, no case) would cost 2-3 bucks from Alibaba with a dual ARM processor and 0.5GB of RAM each, so for $150 we could have 50 mobile phone boards, or 100 cores running in parallel, totalling 25 GB of RAM. And those would be modern ARM cores, possibly with SIMD instructions, not simplified cores built using an FPGA inside Parallella. I doubt 18 Parallella cores would beat 100 ARM cores manufactured as ASICs. Good project, but helpless against China's cheap prices.

  • @Petr75661
    @Petr75661 8 ปีที่แล้ว

    Mobileye EyeQ4 pulls 2.5 teraflops @ 3 W. Parallella gives only 0.09 teraflops @ 5 W.
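
Turning the figures in the comment above into performance per watt is a one-line division. The Python sketch below takes the numbers exactly as quoted by the commenter; they have not been checked against vendor data, and the function name is only illustrative.

```python
def gflops_per_watt(teraflops, watts):
    """Efficiency in GFLOPS per watt from a TFLOPS rating and a power draw."""
    return teraflops * 1000.0 / watts


# Figures as quoted in the comment, not independently verified
eyeq4 = gflops_per_watt(2.5, 3.0)
parallella = gflops_per_watt(0.09, 5.0)

print(f"EyeQ4:      {eyeq4:.1f} GFLOPS/W")
print(f"Parallella: {parallella:.1f} GFLOPS/W")
print(f"ratio:      {eyeq4 / parallella:.1f}x")
```

Peak ratings like these say nothing about how much of that throughput a given program can actually use, so the ratio is an upper bound on the difference rather than a benchmark result.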

    • @llothar68
      @llothar68 8 ปีที่แล้ว +1

      +jednoucelovy
      Yes, it's all fake. In the real world nothing in the ARM world beats Intel on performance per watt (except a GPU, if you use matrix algorithms in single precision).

  • @jimbig3997
    @jimbig3997 6 ปีที่แล้ว

    Moore's Law is on hiatus because there has been no competition to Intel until recently so they took a break.

  • @SamuelBSR
    @SamuelBSR 6 ปีที่แล้ว +1

    2015 Hahaha :))))
    It's 2018 and where is parallella now?

  • @ammonlu8566
    @ammonlu8566 6 ปีที่แล้ว

    superb talk thank you very much

  • @maxlol0
    @maxlol0 7 ปีที่แล้ว +1

    Could be good as a Linux media server or NAS. A bit weak for main computing tasks.

    • @iluan_
      @iluan_ 6 ปีที่แล้ว

      It has a ZYNQ FPGA chip from Xilinx. For many applications, that's more than enough for high performance computing.

  • @williamhart4896
    @williamhart4896 8 ปีที่แล้ว

    Hmm, this board plus another company's board, both running in one device: the Parallella as a coprocessor and a Pine A64 as the main board. Hmm, a supercomputer in a tablet case?

  • @СергейЗакордонец-и6р
    @СергейЗакордонец-и6р 8 ปีที่แล้ว

    The conclusion is: it's a tool for work!
    If so, then several questions have to be asked!
    1 - What is the purpose of this small form factor? IMO it must be a relatively big board, something like a standard server motherboard, with an efficient cooling solution "out of the box" (for server cases).
    2 - What's the point of the energy efficiency of the board itself if the power that it takes from the wall will be significantly higher? And that brings us back to question #1 (the more chips on a single board, the more efficient a PSU you can use). Fit 100-200 of those on a single board and it will shine! Until then... As a technology, yeah, seems OK. As a device, not worth it at all!
    3 - Comparing the single-core Mac result (14s) with your multicore device (18s) is a very accurate way to do the test!?
    4 - Comparing the price of a consumer-grade notebook (with a display, design, and all those other pieces that form its price, including a brand premium) with a PCB with 4 chips and 2 connectors on it is really a great way to compare the prices!
    REALLY!?
    HUGE dislike!

  • @looneyburgmusic
    @looneyburgmusic 8 ปีที่แล้ว

    There is a fatal flaw with "Moore's Law"... Where is it written that CPUs *must* always stay the same size (or get smaller) while the transistor count rises? Simply increase the CPU die size, instead of only worrying about shrinking the transistors.
    Sure, this would cause a bottleneck for portable devices like smartphones, but for desktop/laptop computers it would not be an issue. How many customers would reject a desktop or laptop computer (or even a tablet) that is 10x more powerful, with the slight downside that it is slightly larger and consumes more energy?

    • @backflp
      @backflp 8 ปีที่แล้ว

      I've been wondering about this too. Why not just make the physical size of a CPU bigger, making room for more transistors? What stops them from doing this with the CPUs used in parallel computing and supercomputers? Portable devices like phones and tablets are small enough anyway, no need for a smaller processor in those.

    • @sdphotography4733
      @sdphotography4733 8 ปีที่แล้ว

      www.newegg.com/Product/Product.aspx?Item=N82E16819117643&cm_re=i7-_-19-117-643-_-Product
      10 cores, and growing.

    • @0xy_
      @0xy_ 8 ปีที่แล้ว

      One of the main reasons that CPU and GPU dies stay the same size is that when you increase the size, the thermal output increases and it becomes harder to dissipate that heat. The bigger the die, the more power is required to run it and the more heat it creates. Think of it like a bigger car using more fuel. Now I know that's not the only reason, but I'm on a phone and can't really research and verify. Also, bigger CPU = more material = more money.

    • @looneyburgmusic
      @looneyburgmusic 8 ปีที่แล้ว

      Phaint We all know these things... But would you buy a CPU that was twice as large (and needs more cooling/power) if it offered 10x the processing power, or 50x, or 100x?
      There are practical applications for larger-die CPUs where power and heat dissipation would not be an issue... Everything is a trade-off: want a thin cellphone, you need a thin CPU. Need a single computer that can do the work of 10 computers in one box, a larger CPU die could give you that...
      The bottom line is that CPU die size gets smaller because that is what the market has demanded since almost day one, but there is no actual reason die size *must* shrink.

    • @ubbgn
      @ubbgn 8 ปีที่แล้ว

      Clearly u dont know shit about the business! :)

  • @artlab_one
    @artlab_one 7 ปีที่แล้ว

    Would love to see a Blender 3D test on this device :)

  • @mann2.088
    @mann2.088 7 ปีที่แล้ว

    u know there is also the proposal of a quantum computer

  • @tyronenelson9124
    @tyronenelson9124 6 ปีที่แล้ว

    So now in 2018 this parallella is way outdated

  • @hanniffydinn6019
    @hanniffydinn6019 6 ปีที่แล้ว

    Anyone remember transputers?