Assembly Language Misconceptions

  • Published 25 Sep 2024

Comments • 583

  • @mheermance 3 years ago +400

    I learned to program in the 80s when compilers stunk, and it was a piece of cake to beat them with hand coded assembly. As a result many projects were written in assembler to run on older and newer hardware. The advent of efficient compilers was a godsend, and for work I was glad to see it sidelined. But for fun I still code in assembly because building high level features like lambda functions or garbage collectors from the ground up teaches you a great deal.

    • @sallylauper8222 3 years ago +12

      Yeah, I thought it was really interesting that he said that today, to write faster assembly, you have to know all the tricks of the compilers.

    • @SunMasterXIV 3 years ago +12

      I used Lattice C (and 68k assembly) on the Amiga in the 80s, and I thought it was pretty good. But the way modern compilers are able to optimize code is sometimes amazing. It doesn't help hand-tailored assembly that so many x64 CPU variants are available, with differing instruction execution times.

    • @AURORAFIELDS 3 years ago +3

      The 68000 is a good example of why C compilers are not good for everything. A lot of efficient code relies on passing arguments in registers, while C relies on stack frames. Memory access on the 68000 is really slow, so C will automatically be slow too.

    • @mheermance 3 years ago +7

      @@AURORAFIELDS true, but many C compilers implement fast call linkage. They pass by registers and the called function saves on the stack if it calls another function.

    • @Ehal256 3 years ago +1

      @@mheermance finding a compiler that does that for the 68k nowadays, however, is quite difficult. GCC doesn't, and while LLVM recently added support, I doubt it does either. Maybe something from the 80s, but I'd rather code things by hand when performance is really important.

  • @ChiliTomatoNoodle 3 years ago +248

    Really good information quality and density here. This guy knows his stuff.

    • @WhatsACreel 3 years ago +24

      Means a lot brus! You are a legend, Chili :)

    • @classicnosh 3 years ago +10

      @@WhatsACreel - He's not wrong. I learned Pascal; C wasn't really taught at my school, since Pascal was considered "academic". Assembly was also easier in those days, since microcomputers were much smaller and it was possible to really understand the memory map. Nowadays, the philosophy is very different. The rule of thumb is: don't try to outsmart the compiler. ;)

    • @tootaashraf1 2 years ago

      The C++ guy

    • @Andoxico 2 years ago

      ayy it's papa Chili

  • @wingunder 3 years ago +63

    "If you can help yourself, try not to write a virus." 😂😂😂
    You should put this quote on a t-shirt. Your sense of humor is simply wicked 👍

    • @OpenGL4ever 1 year ago

      I love that line.
      And the background to that is, if you can do that, you don't need to write a virus. You will also find a well-paid job without having to drift into the criminal corner to make a lot of money.

  • @randyscorner9434 3 years ago +83

    With current compiler technology there is one area where moving to assembly provides massive advantages: when you can vectorize the code to fully use the SSE and MMX extensions. For one routine, unrolling the loop once fit the register set, allowed 8-wide vector calculations and increased the overall performance of a high-end electronic piano by 12x. This was enough to move the program off a new Mac to an RPi3. The load went from 40% of the CPU on the Mac to 9% of the CPU on the RPi3 with just one thread. Getting to this point with a high-level programming language requires a different compiler, and coupling that to C or C++ is much harder than writing the 60 assembly instructions by hand.
    It's all about how badly one or two routines dominate the runtime. It's often the case that these "hotspots" can get extra love and show major performance improvement. Of course, the best optimization would be to stop using Python as production code.....
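A minimal C sketch of the kind of hot loop being described, assuming an x86 target with SSE: `mix_scalar` is a plain loop (which GCC or Clang may auto-vectorize on their own at higher optimization levels), and `mix_sse` does the same work four floats at a time with SSE intrinsics from `<xmmintrin.h>`. Both function names and the multiply-add workload are hypothetical. When `__SSE__` is not defined (for example on the RPi3's ARM core, whose SIMD extension is NEON rather than SSE), `mix_sse` simply runs the scalar loop.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* out[i] = a[i] * gain + b[i], one element at a time. */
void mix_scalar(float *out, const float *a, const float *b,
                float gain, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * gain + b[i];
}

/* Same computation, four floats per SSE instruction where available. */
void mix_sse(float *out, const float *a, const float *b,
             float gain, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    __m128 g = _mm_set1_ps(gain);              /* broadcast gain to 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);       /* unaligned loads: no 16-byte alignment assumed */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_mul_ps(va, g), vb));
    }
#endif
    for (; i < n; i++)                         /* leftover elements (or everything without SSE) */
        out[i] = a[i] * gain + b[i];
}
```

The tail loop handles lengths that are not a multiple of four, so both functions accept any `n`.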

    • @thomasmaughan4798 2 years ago +27

      "Of course, the best optimization would be to stop using Python as production code"
      LOL 🙂

    • @FM-tq2gs 1 year ago +2

      Newbie question: why can't compilers do that kind of optimization? Will they be able to one day?

    • @Mr8lacklp 1 year ago +16

      @@FM-tq2gs they may be able to do it sometimes in the future, but there are really two problems here:
      One is that the compiler can only apply an optimization if it can prove that it won't change the behavior of the program for any value it might possibly see, and it simply doesn't have all the information, since all it sees is the source code. You might, for example, have a number that represents the day of the week, so *you* know it's never going to be greater than seven, but the compiler can't know that, so it can't apply any optimizations that assume the number won't be greater than seven. So there are some optimizations you can do that are literally impossible for a compiler, no matter how advanced.
      The other problem is that both finding an optimization and proving that it doesn't change the behavior of the code are very difficult, and in general not things computers can do at all. This is where compilers are steadily getting better, but it's quite possible that some optimizations will never be worth the longer compile times or the effort of implementing them.
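GCC and Clang provide one way for the programmer to pass the kind of range knowledge described above to the compiler: `__builtin_unreachable()` marks a path as impossible, so the optimizer may assume the condition that guards it. A minimal sketch with a hypothetical `is_weekend` function, assuming days are encoded 0..6 with 5 and 6 as the weekend. If the assumption is ever violated the behavior is undefined, so the hint trades safety for speed; compilers without the builtin simply get no hint.

```c
/* ASSUME(cond): tell the optimizer that cond always holds (GCC/Clang only). */
#if defined(__GNUC__) || defined(__clang__)
#define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define ASSUME(cond) ((void)0)   /* no hint available: do nothing */
#endif

/* Hypothetical encoding: 0..6 = Monday..Sunday, so 5 and 6 are the weekend. */
int is_weekend(unsigned day)
{
    ASSUME(day <= 6);            /* knowledge the compiler cannot derive itself */
    /* Given the hint, "day == 5 || day == 6" is equivalent to "day >= 5",
       so the compiler is free to emit that single comparison instead. */
    return day == 5 || day == 6;
}
```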

    • @FM-tq2gs 1 year ago +3

      @@Mr8lacklp thank you for the explanation!

    • @robegatt 1 year ago

      @@Mr8lacklp yeah, that is why some programming languages are better than others... a Pascal compiler could easily do what you said in the first example, because a subrange type such as 1..7 tells it the range.

  • @spacewolfjr 3 years ago +196

    I work in CyberSecurity and end up using assembly a lot when reverse engineering / disassembling malware, it's an essential skill for that kind of work

    • @shanehebert396 3 years ago +28

      Well... you have to since I doubt the malware writers are going to give you the source and all you have is the executable ;)

    • @tappineapple3381 3 years ago +5

      Did you go to college? If so, what did you major in? I am currently a junior in high school and I would like to learn more about reverse engineering and get better with tools like IDA and ReClass. Any advice?

    • @y2ksw1 3 years ago +1

      Agreed.

    • @y2ksw1 3 years ago +11

      @@tappineapple3381 I suggest disassembling viruses. Most of them are brilliant examples of engineering, made by true masters of the art.
      The next step I suggest is to write your own operating system. If you master this step, you will have no problem solving any other problem you may come across.

    • @tappineapple3381 3 years ago +4

      @@y2ksw1 Thank you! I have been following the tutorials on Guided Hacking and have really enjoyed reversing video games, and I feel like malware would be the next best step. Making an operating system, though, scares me.

  • @lgrantcdg 3 years ago +35

    Excellent talk!
    In the 1970s at General Motors Research Labs, they ran an experiment with a PL/I-based computer graphics system. They recoded a few high-usage routines in assembly language. The system got faster. Then they recoded them in PL/I and the system got even faster. Then they recoded them in assembly language again, and it got faster still.
    It turned out that each time they recoded the routines, they improved the algorithm, and that made much more of a difference than which language they used.

  • @craigmhall 1 year ago +6

    I rarely write in assembly any more, but it's good to know for:
    -debugging release / optimized code
    -studying the generated assembly and finding ways to tweak the source code to generate better assembly
    -generally understanding how the machine works, what is expensive and what is not

    • 1 year ago +2

      This! I personally write asm only as a hobby for microcontrollers, where cycle-level timing is sometimes required (the rest of the time C suffices), but I read it a lot more as disassembled code for the reasons you mentioned.

  • @guillermoleon0216 3 years ago +19

    First Assembly I ever learned was for the Z80 and I absolutely loved it! I don't use it at work but getting to know it taught me a lot about how computers work.

  • @kevinjensen3056 3 years ago +12

    Been programming in assembly and C since '79. Assembly is still widely used in my field of embedded programming, but I haven't needed to resort to it for years. The code density that an expert on the CPU can achieve in assembly is incredible. Still, most of what you've said is correct for most complex CPUs, but some comments are a little inaccurate for today's embedded processors. Most MCU core instructions are still atomic, but multithreaded read/write race conditions still apply when the data size is less than the bus width. This sort of issue appears in most interview tests for embedded programmers.
    You really should do a lecture on race conditions at the sub instruction level (as you just did), the instruction level, at the thread level, the o/s level and even beyond.
    Liked your lecture on radix sort. Never tried that one before. Keep up the good work.
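One textbook case of a sub-instruction race on a small MCU is a counter wider than the CPU's data path: on an 8-bit core a 16-bit value is read one byte at a time, so an interrupt between the two loads can produce a value that never existed. A minimal sketch, modelling the counter as two hypothetical byte variables `cnt_lo`/`cnt_hi` so it compiles on any host, and assuming a single core where the interrupt handler runs to completion; on a multi-core host `volatile` alone is not a synchronization primitive. The usual alternative is to disable that interrupt around the read.

```c
#include <stdint.h>

/* Hypothetical 16-bit tick counter stored as two bytes, as on an 8-bit MCU. */
volatile uint8_t cnt_lo, cnt_hi;

/* What the interrupt handler does: two separate byte writes. */
void tick_isr(void)
{
    if (++cnt_lo == 0)
        ++cnt_hi;
}

/* Naive read: an interrupt between the two loads can turn 0x00FF -> 0x0100
   into a bogus 0x01FF (old low byte, new high byte). */
uint16_t read_naive(void)
{
    uint8_t lo = cnt_lo;
    uint8_t hi = cnt_hi;
    return (uint16_t)((hi << 8) | lo);
}

/* Safer read: re-read the high byte and retry if it changed in between. */
uint16_t read_stable(void)
{
    uint8_t hi, lo;
    do {
        hi = cnt_hi;
        lo = cnt_lo;
    } while (hi != cnt_hi);
    return (uint16_t)((hi << 8) | lo);
}
```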

  • @SimGunther 3 years ago +154

    Gotos are NOT considered harmful
    Wormholes, on the other hand, are considered VERY harmful

    • @k7iq 3 years ago +31

      If one does not like "goto" then just rename it to jmp and then it's OK because it's what the compiler might output in assembly anyway ! 😁

    • @imperatoreTomas 3 years ago +4

      Goto is my favorite function

    • @programaths 3 years ago +8

      In BASIC, well, it was very present. I learned it on my own and got used to putting GOTO everywhere, as it was the way to skip code based on a value: "ON x GOTO label1,label2,label3" (or line numbers!)
      Then I also used GOTO to reuse code (as with GOSUB).
      Very good for state machines too, even if I didn't know it had a name.
      Then I had to take Visual Basic courses at school and the teacher was pulling her hair out reading my code... no FOR and IF, GOTO worked just fine. On top of that, I kept my habit of reusing code.
      I'm not even sure I would be able to understand my own code now, as I have totally lost that habit. Still, I have good memories of it, because the teacher ended up saying she would not correct it anymore and would just give points for it working as intended. ^^ At the same time, others had trouble understanding what a variable was, and I had already implemented Snake and Sokoban just for fun :-D
      (As devs, we find it very simple, but I taught a bit too, and this is a huge hurdle!)
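The BASIC habit of driving a state machine with GOTO carries over directly to C, where each label is a state and each `goto` is a transition. A minimal sketch with a hypothetical `count_words` function that counts space-separated words; it illustrates the style rather than recommending it over a loop with an explicit state variable.

```c
#include <stddef.h>

/* Counts space-separated words; each label is a state of the machine. */
size_t count_words(const char *s)
{
    size_t words = 0;

in_space:                        /* state: between words */
    if (*s == '\0') return words;
    if (*s++ == ' ') goto in_space;
    words++;                     /* first character of a new word: fall into in_word */

in_word:                         /* state: inside a word */
    if (*s == '\0') return words;
    if (*s++ == ' ') goto in_space;
    goto in_word;
}
```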

    • @LionKimbro 3 years ago +16

      Wormhole = en.wikipedia.org/wiki/COMEFROM

    • @roygalaasen 3 years ago +3

      @@programaths when I started out with computer classes back in 1991, we had to draw flowcharts before we were allowed to write a single line of code. Only one entry point, one exit point and no lines were allowed to cross, essentially banning goto entirely.
      Now my favourite programming language, Swift, is sometimes forcing you to use a label to tell which loop you want to BREAK out of, which is essentially a goto in disguise.
      My brain cringes but I have to get used to it lol
      Edit: to clarify. Break in all programming languages breaks out of the nearest LOOP. If you are in a switch .. case you will still break out of the nearest loop. In Swift you will break out of the switch case, still stuck in the loop, unless you label the loop you want to break out of.
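For comparison, C has no labeled `break`: a plain `break` leaves only the innermost loop (or `switch`), so leaving two nested loops at once is usually written with a `goto` or a flag variable. Swift's labeled `break` expresses the same jump with a loop label, which is why it can feel like a goto in disguise. A minimal sketch with a hypothetical `first_row_with` search; in a function this small a `return` would do the same job, and the `goto` form matters when work must follow the loops.

```c
/* Returns the first row of a rows x cols grid that contains key, or -1. */
int first_row_with(const int *grid, int rows, int cols, int key)
{
    int row = -1;
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            if (grid[r * cols + c] == key) {
                row = r;
                goto done;   /* exits BOTH loops; a plain break would exit only the inner one */
            }
        }
    }
done:
    return row;
}
```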

  • @mattias3668 3 years ago +49

    There are some cases where you want to use assembly for performance because the compiler will not choose the best instructions for you. For example, if you are doing addition on bigints, you will probably want to use the add-with-carry instruction, which the compiler probably will not figure out that it can use. And there are probably a large number of very specialised instructions like this; I imagine, for example, that the compiler won't use the SHA or AES instructions.
    Not only are there different assembly languages for different architectures, there are also different dialects for different assemblers.
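To make the add-with-carry point concrete, here is a minimal portable C sketch of multi-limb addition that recovers each carry with an unsigned comparison. This is the kind of fallback a hand-written `adc` loop would replace; whether a given compiler turns it into `adc` varies. The function name `bigint_add` and the least-significant-limb-first layout are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b for n-limb numbers stored least-significant limb first.
   Returns the final carry (0 or 1). r may alias a or b. */
uint64_t bigint_add(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + carry;   /* wraps only if a[i] == UINT64_MAX and carry == 1 */
        uint64_t c1 = s < carry;     /* 1 if that first addition wrapped */
        uint64_t t = s + b[i];
        uint64_t c2 = t < s;         /* 1 if the second addition wrapped */
        r[i] = t;
        carry = c1 | c2;             /* at most one of the two can be set */
    }
    return carry;
}
```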

    • @WhatsACreel 3 years ago +18

      I absolutely agree!

    • @shanehebert396 3 years ago +10

      You would hope that if you are using a library that's implemented bigint or SHA/AES that the people who wrote the library used intrinsics to implement the library calls.

    • @mattias3668 3 years ago +9

      @@shanehebert396 Actually, I wouldn't necessarily hope that. When I implemented addition for bigints, GCC didn't have a good intrinsic for add with carry (I don't know if it has now); the closest it had was addition with overflow detection, which it couldn't optimise, so inline assembly was necessary for good performance. So you want your bignum to use inline assembly in this case, and then just add a portable fallback for unknown architectures. In other situations, intrinsics may work just as well, but you still need a portable fallback, so the only reason to use intrinsics instead of inline assembly in these situations is that the intrinsics may be supported on multiple architectures, and hopefully most compilers will recognise them; but that's not necessarily the case, and it is more likely that they will recognise the inline assembly.
      Similarly, intrinsics for SHA/AES, if there even are any, are not portable.

    • @shanehebert396 3 years ago +3

      @@mattias3668 yeah, that's the beauty of conditional compilation ;) if the arch is detected, use the version of the library that uses intrinsics, if not, fall back to the library made from portable code. Then it's up to the library providers (or an interested 3rd party in the case of open source) to add to the project.
      But yes, you're also at the mercy of the compiler and how it generates code (gcc, in your case, with add with carry).

    • @andrewdunbar828 2 years ago

      Rotate instructions are also not accessible from your high level language. Endian-switching instructions used to be inaccessible too but various compiler + CPU combos I looked at a while ago could recognize most ways to do endian switching in C and produce the right ASM code... but not always!
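For reference, these are the usual portable C idioms for rotation and byte swapping. GCC and Clang commonly recognize them and emit a single rotate or byte-swap instruction, though whether that happens depends on compiler, version, target and optimization level. The function names are hypothetical; masking the shift count with `& 31` keeps a rotate by 0 from performing an undefined 32-bit shift. C++20 also provides `std::rotl` and `std::rotr` in `<bit>`.

```c
#include <stdint.h>

/* Rotate left by n bits; (-n & 31) avoids the undefined shift by 32 when n == 0. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> (-n & 31));
}

/* Reverse the byte order (endianness flip) with shifts and masks. */
uint32_t byte_swap32(uint32_t x)
{
    return  (x >> 24)
          | ((x >> 8) & 0x0000FF00u)
          | ((x << 8) & 0x00FF0000u)
          |  (x << 24);
}
```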

  • @clickrick 3 years ago +6

    I'm glad you got to the point that there are assembly languages for just about every processor and didn't allow people to assume that x86 is all there is.
    As someone who has written assembler on ICL 1900, IBM 360 & 370, DEC PDP 11, as well as microprocessors like the 6502 and Z80, I've become aware of just how different the fundamental architectures are, in particular addressing modes.

  • @ParagonX13 2 years ago +8

    i'm a young person and i taught myself reverse engineering/assembly over the past several years (messing around with disassemblers and searching my questions on the internet) and actually enjoyed it way more than i thought i would... at first it was just a means to an end but i very quickly grew fascinated with it all. i have no idea what to do with this passion though other than hobby projects... :p

    • @OpenGL4ever 1 year ago

      If you need a playground: many open source audio and video codecs are already optimized for the x86 and ARM architectures, but this is not yet the case for RISC-V. So you could buy a single-board computer (SBC) with a RISC-V CPU and see what could be optimized there. You would need to learn RISC-V assembly, though.

  • @Guztav1337 3 years ago +26

    You should get more cushions/backdrop in the room, there is a bit of echo in the background.

    • @mrdouble 3 years ago

      Was thinking the same, looks like an expensive mic though :/

    • @swharden 2 years ago +1

      The condenser microphone is "too nice". It's picking up every little echo in the room. A dynamic microphone or a basic gaming headset (microphone closer to the mouth) could be better options for this space.
      Edit: audio is good in later videos

  • @ricos1497 3 years ago +13

    If I'm to take just one thing from this video, it's that I shouldn't write viruses. One virus, absolutely fine - or recommended perhaps - viruses, not. Great advice, thanks.

  • @brannonharris4642 3 years ago +4

    Reductive learning. Discovering what something is not is seemingly more potent than only pondering on what that thing is.
    Love this video!

  • @DownhillAllTheWay 3 years ago +11

    12:15 "Assembly language is the language of the hardware."
    Permit me to nit-pick. *_Machine language_* is the language of the hardware. Asm is a near-English representation of it.
    Many years ago, I had access to a Data General Nova computer (it was the back-up machine on a customer site). I knew how to swap modules, and I was OK at hardware maintenance (scopes, and that sort of stuff) but I didn't know anything about computers at the time. By reading the manual, I entered a 3 (in binary) into a memory address, and a 6 into another address using the front-panel switches, then I wrote an instruction in machine code to add them together - and it produced a 9 in the destination address - a thrill that I remember to this day.
    I learned the machine code pretty well on that machine, and wrote an assembler in binary code. I had been intending to write diagnostics on the machine, but I moved on before I did that, and never used my (rather strange) assembler. Well, I had never seen an assembler up to that point, so I didn't have much to go on.

    • @ancapftw9113 2 years ago

      The best example I saw was a guy making a 6202 (I think) program by writing to a ram chip and feeding it into the processor. He showed what the assembly would look like, but had to program it in hex code.

  • @hell0kitje 3 years ago +15

    Glad to see you back, mate :) I started with your C++ vids and now I'm discovering asm, keep posting more!

  • @draconite 2 years ago +11

    #1: This does depend on the architecture you're building for. Compiling for the 68000 with GCC, it's easy to beat the compiler if you know what you're doing

    • @OpenGL4ever 1 year ago

      You've already made an assumption here, using a specific compiler. On the other hand, if you use a compiler that is optimized for the use of fast calls and 68k, then it can look different.

  • @starpawsy 2 years ago +2

    Most successful assembly program I wrote was in 1992. I did a square root function using Newton's method that was faster than what the compiler of the day provided in the maths library! In those days, the width of the floating point divide register was 80 bits. Dunno what it is today. This might not work today.
    As an aside, some people might say "only 80 bits"? Well, consider that 80 bits ≈ 24 significant decimal digits. Consider that if you measure the diameter of the known universe to 24 significant figures, the last figure is less than the classical diameter of a hydrogen atom.
    Newton's method for calculating the square root of x.
    Start with a guess, call it a.
    Calculate b = x/a.
    Take the average c = (a + b)/2.
    That will be closer than either a or b. Use c as your next guess for a and iterate. Keep going until a & b vary only by 1 in the LSB.
    The challenge was making a really really good guess for a that works for all numbers. I hit on the idea of dividing the exponent by 2 (shift right by 1), and zeroing all but the most significant bit of the mantissa. For negative exponents you do the opposite - double the value of the exponent. This actually worked really well!
    Here's a worked example. Square root of 10 (well actually 10.000000000000000000000000000000)
    start with 3
    10 / 3 = 3.33...
    3 + 3.33...= 6.33...
    divide by 2 = 3.166...
    In one iteration, you've got 2 decimal places.
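A minimal double-precision C sketch of the iteration described above, assuming IEEE 754 binary64 doubles rather than the 80-bit x87 format. The first guess halves the unbiased binary exponent and clears the stored mantissa bits, leaving only the implicit leading bit (a power of two), roughly following the exponent trick in the comment; this sketch halves negative exponents too, since the square root of 2^-4 is 2^-2. The stopping rule (stop once the iterate no longer decreases) is an assumption: after the first step, Newton's iterates for a square root approach the root from above, so a strictly decreasing sequence of doubles must end. The name `newton_sqrt` is hypothetical.

```c
#include <math.h>      /* NAN and INFINITY macros only; no libm calls */
#include <stdint.h>
#include <string.h>

/* Square root by Newton's method: a <- (a + x/a) / 2. */
double newton_sqrt(double x)
{
    if (x < 0.0) return NAN;
    if (x == 0.0 || x != x || x == INFINITY) return x;   /* 0, NaN, +inf */

    /* Initial guess: halve the exponent, drop the stored mantissa bits. */
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    int e = (int)((bits >> 52) & 0x7FF) - 1023;           /* unbiased exponent */
    uint64_t guess_bits = (uint64_t)(e / 2 + 1023) << 52; /* e / 2 truncates toward zero */
    double a;
    memcpy(&a, &guess_bits, sizeof a);

    a = 0.5 * (a + x / a);          /* after one step, a >= sqrt(x) up to rounding */
    for (;;) {
        double b = 0.5 * (a + x / a);
        if (b >= a) break;          /* no longer decreasing: converged */
        a = b;
    }
    return a;
}
```

For x = 10 the guess is 2 (10 = 1.25 × 2^3, exponent 3 halves to 1), and the first step gives 3.5, close to the hand calculation above.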

  • @Alex-op2kc 3 years ago +6

    Here's an alternative definition: An assembly language is a set of mnemonics and other language elements defined by an assembler that let you write symbolic statements that map to hardware instructions.
    Under that definition, there can be multiple assembly languages per architecture. For example, there are multiple assemblers for x86: MASM, NASM, YASM, and fasm, and each defines a different, although very similar, assembly language.

    • @robertobokarev439 1 year ago

      NASM has the finest "classical" syntax, while all you wanna do looking at MASM is go back to C. Can't say anything about fasm and YASM; I don't have enough experience.

  • @TerjeMathisen 3 years ago +4

    Congratulations Creel, you've managed to create a very informative set of videos on x86 asm, all stuff that I would have loved to have back in the days, starting in 1982 when I had to write interrupt drivers in hex. :-)
    PS. I went on to use asm on everything from video (DVD & BluRay) & audio codecs (ogg vorbis), crypto (AES competition), games (Quake) and I still write some really low-level code, usually using compiler intrinsics since Visual Studio doesn't allow inline asm anymore. :-(

  • @johnyoungquist6540 3 years ago +62

    Talking about assembly in general across different processors is fraught with trouble. I do embedded apps in 8051 assembly only. In fact, I wrote the assembler. I can promise that C in the 8051 environment is at least 500% slower and also 500% bigger than assembly, even for simple things that C should be good at. It is widely accepted that compilers use a tiny fraction of the instruction set and leave a lot behind. It is easy to point out that ordinary languages contain no information to help compilers use special instructions or constructs. The assembly programmer will recognize an AES algorithm and use the AES instructions; a C compiler won't. In modern processors the compiler's code generator could hold a significant advantage over the programmer, with detailed knowledge of architecture magic like pipelines, cores, caches and threads. I don't know how they handle the moving target of the new processor of the week, or tell which processor they will run on. One processor's optimization is another's downfall. In contrast, the assembly programmer wizard may beat the C code's speed by 100 times or more with devilishly clever thinking and detailed knowledge of the whole instruction set.
    One thing that is universally overlooked is how similar assembly and high-level applications are. Apps are typically constructed of functions tailored to do common things for that app. If you need 98 digits of precision, you'll be writing routines to handle that in any language. These modules are easy to define and test and to spread among several programmers. We build bricks first, then walls later. A function call is about the same complexity and work to implement in any language. Now, all of a sudden, apps in all languages are basically function calls and logically look about the same. Neither is more difficult than the other. The planning stage and logic can be nearly identical for any language.

    • @donjindra 3 years ago +5

      Exactly. People who don't regularly program in assembler have no idea how much faster assembler is than any high level language. Compiler optimization cannot compete with a programmer who knows the instruction set intimately and can tailor the use of those instructions for a particular task. A 10x improvement in speed is pretty normal. OTOH, a poor programmer is not going to benefit much from assembler code. You have to know what you're doing. The 8051 is a good example. That cpu is so weird a compiler can't deal with it efficiently. A compiler does better with something like ARM.

    • @SimonBuchanNz 3 years ago +14

      @@donjindra compiler optimisation can definitely beat any reasonable amount of effort for the majority of code, assuming you're not using the trivial C implementations that come with microcontrollers; inlining and avoiding pipeline stalls is drudge work that's better left to the computer, especially when your problem is getting something working or cleaning up a mess, not making something faster. Not always: there are always going to be some cases that confuse a compiler enough that it's easier for you to use assembly than to figure out how to mangle your code so the compiler does the right thing, but advanced instructions are available through intrinsics, compilers will auto-vectorize loops, and so on. The low-hanging fruit is getting picked all the time.

    • @donjindra 3 years ago +1

      @@SimonBuchanNz I don't know why you think that. In fact, I don't even know what sort of code you have in mind. I don't advocate using assembler to add two register-width numbers.

    • @SimonBuchanNz 3 years ago +3

      @@donjindra sorry, could you clarify what I said that you have an issue with? I was talking about your statement that "a compiler can never compete with [an assembly] programmer": trivially true in that said assembly programmer could at worst use the same instructions, but not practically true. Not sure where you're getting adding numbers from, but if that's literally all you're doing, then actually yeah, you probably will beat a compiler. It's the 50 kLOC of "adding two numbers" that's not worth the absurd effort to keep optimized in assembly, and mixing and matching can (depending on your baseline) actually pessimize the code, since the compiler can't inline it any more.

    • @donjindra 3 years ago +1

      @@SimonBuchanNz Concerning adding numbers I said the opposite of what you think I said. If the task is simple, such as adding two numbers, the compiler does just fine. There's no point in resorting to assembler. It's the complicated, time consuming tasks that benefit from assembler. Compiler optimization was done by assembly language programmers. But they optimize general cases. They aren't magicians. They can't predict all particular cases. Therefore they cannot optimize for all of them. I have no idea what you mean by the end of your comment.

  • @3Balala3 3 years ago +7

    Great video, helps a lot in understanding assembly's place and purpose nowadays. Also great timing: tomorrow I have an exam in assembly. We are programming in emulated DOS. Really, really interesting... :D

  • @alberto3028 3 years ago +45

    ASM is perfect for bootloaders and some parts of OS

    • @WhatsACreel 3 years ago +14

      It is indeed! UEFI changed the necessity a little, but certainly low level OS code is one of the most important use cases for ASM! Cheers for watching mate :)

    • @lewiscole5193 3 years ago +5

      Assembly language gives complete control of the hardware to the programmer in a way that no HLL can, in no small part because assembly language is processor architecture specific, while an HLL is supposed to be processor architecture independent.
      So, it's not that "ASM is perfect for bootloaders and some parts of OS", it's that there is no other way to get there from here using an HLL.

    • @WhatsACreel 3 years ago +3

      @ozan o. I would love to :) Judging by the recent reviews of Apple’s new M1, I think maybe ARM will give x86 a very good shake very soon! We might be witnessing the beginnings of the fall of x86 in the laptop and desktop markets...? Unbelievable!
      Not sure when I can cover these things, but they’re certainly on my to-do list. Thanks for the suggestions, and cheers for watching :)

    • @lewiscole5193 3 years ago

      @ozan o.
      OSs have to change over time to meet new hardware and/or user demands, or else they die off.
      Unix is no different and has evolved over time to be different than what it originally started out as.
      So in a very real sense, I suspect that Tony Hoare's famous saying, “I don't know what the language of the year 2000 will look like, but I know it will be called Fortran,” has applicability to OSs with "Linux"/"Unix" being substituted for "Fortran".
      And keep in mind that there are already environments where "Linux"/"Unix" is not king ... real-time environments such as can be found in cars, where QNX, a proprietary message-passing microkernel-based OS (which can run on ARM-based systems, by the way), is already more common.
      Yet, thanks to the Posix standard and the QNX people's interest in it, QNX offers a similar interface ("abstraction") to application programs so that their developers feel warm and fuzzy about it.
      I suspect the same thing will likewise happen with any OS that depends on C, including Fuchsia.

    • @lewiscole5193 3 years ago +2

      @ozan o.
      > As you know, processes never really
      > pause in posix, I don't know if it
      > was due to hardware restriction or
      > design error during constructing
      > of unix back then.
      I don't know what you mean by "processes never really pause in posix".
      Posix is an interface standard for OSs that just happens to look like the interface that Unix/Linux typically used to present.
      It's not an OS itself.
      An OS can be something other than Unix/Linux entirely under the hood and yet present a Posix compliant interface as is the case with QNX which is a proprietary message passing microkernel based OS that is Posix compliant as I indicated before.
      To the extent that Posix was supposed to look like Unix/Linux to the outside world (programmer), various interface calls such as a file read or write do block (pause) because that's what they in Unix/Linux historically did in The Good Old Days.
      That doesn't mean that an OS can't natively use non-blocking interfaces internally that look like they are blocking to the user.
      > there is also root privilege problem.
      Again, I don't know what you mean since Posix isn't an OS.
      > Plus Android turn into giant layers of burger.
      > I guess that's why google wanna leave Android.
      Android *IS* Linux by another name. Really.
      > if any other new os becomes complicated
      > and consist of many layers in the future,
      > it will be loop then they will be wandering
      > new solutions in the future:).
      Again, OSs change over time or they die.
      To the extent that everyone thinks that what they want done is the way things should be, OS developers are likely to toss in lots of crap to satisfy different users.
      If you want a lean, mean OS for your specific machine(s)/application(s), feel free to write one yourself ... and spend forever doing it.

  • @herrbonk3635
    @herrbonk3635 3 years ago +10

    2:34 _"That one clockcycle is called the latency"_ Not really, that one cycle is called _throughput_ in these contexts. The latency *for simple instructions* (like ALU reg,reg/im) usually equals the number of pipeline stages. In a simple pipelined CPU, that would be: fetch+decode+calculate+write result, i.e. 4 stages and so 4 clock cycles. For the 486, that was five stages and five cycles, for the P4 it was around 20 stages and cycles, and so on (again for simple instructions like ALU reg,reg/im).

    • @laurelsporter
      @laurelsporter 2 years ago

      But calculate can be repeated ad nauseam, and as long as that can go on, write can be hidden. The full pipeline isn't executed for each instruction before the next one executes.

    • @herrbonk3635
      @herrbonk3635 2 years ago

      @@laurelsporter Yes, that's the basic idea with a "pipeline", i.e. having all the stages of the instruction execution fully overlapping, so that (different stages of) several instructions in a sequence can be processed at the same time.
      (Typically instruction fetch -> decode -> effective address calculation -> operand fetch -> ALU -> write-back.)

    • @TellowKrinkle
      @TellowKrinkle 1 year ago

      Don't know how people talked about the 486, but on modern processors, when people talk about latency, they mean the number of cycles from when the register value is first needed to when it's available to the subsequent instruction. If your CPU has forwarding circuitry (like every modern processor), that's only the number of calculation stages.
      For the example of an `inc rax`, if you had four of those in a row, the CPU would fetch all four in parallel, decode them all in parallel, and calculate them serially, with each one forwarding its result to the next without waiting for writeback. In the end, four (dependent) `inc rax`s would run in four consecutive clock cycles, which is why `inc` is considered to have a latency of just 1 cycle, not 20 or however many stages a modern processor's pipeline has. The (reciprocal) throughput of `inc` is not 1 but 1/4 on a Skylake processor, meaning that the processor can execute four independent `inc`s in one clock cycle.
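
To make the latency/throughput distinction in this thread concrete, here is a minimal sketch of an idealized cost model. It assumes the figures from the comment above for `inc` on Skylake (latency 1, four independent instructions per cycle) and deliberately ignores fetch, decode, port conflicts and pipeline fill, so it illustrates the arithmetic rather than predicting real timings. The function names are made up for illustration.

```c
#include <assert.h>

/* Idealized model (an assumption, not a CPU simulator):
   latency   = cycles before a dependent instruction can use the result
   per_cycle = how many independent instructions of this kind start per cycle */

/* Each instruction must wait for the previous one's result. */
static unsigned dependent_chain_cycles(unsigned n, unsigned latency) {
    return n * latency;
}

/* Independent instructions only compete for execution slots. */
static unsigned independent_cycles(unsigned n, unsigned per_cycle) {
    assert(per_cycle > 0);
    /* round up: a partial group still occupies a whole cycle */
    return (n + per_cycle - 1) / per_cycle;
}
```

With latency 1 and four per cycle, four dependent `inc`s cost 4 cycles while four independent ones cost 1, which is the distinction TellowKrinkle describes.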

  • @jeffm2787
    @jeffm2787 3 years ago +5

    I was writing x86 before it was called x86. Did 6502, 6809, etc. as well. Stopped when the 486 came out.

  • @VTdarkangel
    @VTdarkangel 2 years ago +1

    I had to do some SPARC assembly programming when I was in school. The real advantage of it was when we had to do hardware interfaces. Those functions could have been done in C, but when I broke the object files down, I found out that the compiler was inserting a bunch of extra instructions that were completely unnecessary, such as configuring the master register for settings that weren't being used. By doing the interfaces in assembly, I could bypass all of that.

  • @_mrgrak
    @_mrgrak 3 years ago +1

    The best programming-related content on YouTube right now. Creel explains complex topics simply, truly a great teacher. Looking forward to the next video!

  • @brorelien8447
    @brorelien8447 3 years ago +23

    14:43 I partially disagree with you on this point. Some processors, like the 6502, have a small instruction set that can be learned easily (only around 56 instructions). I know an 8-bit CPU can't really be compared with a modern x64, but some embedded CPUs still use these simpler 8-bit instruction sets.
    Otherwise I like the video.

    • @y2ksw1
      @y2ksw1 3 years ago +2

      Well, some 8-bit processors have a lot of instructions. Of course, if you group them, then almost any processor has only a few:
      Add, subtract, multiply, divide, invert, move. That's about it.
      When I teach, I actually point out that most processors can only add and negate. They do it in a very efficient way though.

    • @NoNameAtAll2
      @NoNameAtAll2 2 years ago

      RISC-V >_>
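
A small sketch of y2ksw1's point that a processor which can add and negate can already subtract: in two's complement, a - b equals a + (~b + 1). Unsigned 32-bit arithmetic is used so wraparound is well defined in C; the function names are illustrative, and multiplication and division (built from shifts and adds) are not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Two's-complement negation: invert every bit, then add one. */
static uint32_t negate(uint32_t x) {
    return ~x + 1u;
}

/* Subtraction expressed as "add the negation", the way an adder-based ALU can do it. */
static uint32_t subtract(uint32_t a, uint32_t b) {
    return a + negate(b);
}
```

For example, `subtract(3u, 10u)` wraps to 0xFFFFFFF9, which is -7 when read back as a signed 32-bit value.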

  • @programaths
    @programaths 3 years ago +1

    First year in school: Compute the volume of a cone...in assembly!
    Most students got stuck on the division!!! That's when they learned about overflow AND underflow.
    I do not remember the ins and outs, but the division gives you a good ride if you didn't pay attention to the curriculum.
    Then, while doing the work, you realize that registers can be split in different ways, and that there is a flag register too.
    At that time (15 years ago), there was "HelpPC" with nice explanations of all of this...
    Another difficulty of assembly is that it's "verbose". In a higher-level language, "if" is identified as such. In assembly, it's CMP plus JNE, JE, JZ, JNZ, JNP.
    And even conditions with conjunctions or disjunctions become challenging.
    Another nicety was using the stack for local variables instead of trying to guess which register is safe to use ^^
    It's a bit cloudy, because it was a long time ago. But that wasn't that easy! It's gymnastics of its own!
    But overall, whatever the language, programming is really complicated.
    It's all about solving problems and expressing the solution as code...And most of the time, the problem to be solved is also to be found!

    • @WhatsACreel
      @WhatsACreel 3 years ago +1

      So true! Cheers for watching :)
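
A small sketch of the verbosity programaths describes: the same `&&` condition written once as a high-level `if` and once in compare-and-jump style, using C `goto` to stand in for conditional jumps. The x86 mnemonics in the comments are an approximation of what each line corresponds to, not actual compiler output, and the function names are illustrative.

```c
#include <assert.h>

/* High-level form: the compiler handles the short-circuit && for you. */
static int both_positive_hll(int a, int b) {
    if (a > 0 && b > 0)
        return 1;
    return 0;
}

/* Compare-and-jump form: each test is inverted and jumps past the "then" block. */
static int both_positive_branchy(int a, int b) {
    if (a <= 0) goto not_taken;   /* cmp a, 0 ; jle not_taken */
    if (b <= 0) goto not_taken;   /* cmp b, 0 ; jle not_taken */
    return 1;                     /* "then" block */
not_taken:
    return 0;                     /* "else" path */
}
```

The key trick is inverting each condition: the jump is taken when the condition fails, so a conjunction becomes a chain of early exits to the same label.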

  • @stevem3432
    @stevem3432 3 years ago

    I began learning assembly at uni this semester and I actually enjoy it. Thanks for these videos.

  • @BlackStarEOP
    @BlackStarEOP 2 years ago +1

    8:10 "Race conditions are brilliant" :D (y) Thumbs up for that... Tracking down race conditions has been the most difficult part of my career as a software engineer.
    If you implement something using more than 1 thread, if you carefully think things through, there's not much you can do wrong. However... when suddenly one guy in your team says "yes I know how to improve the performance, just put this and this into its own thread" then you know you need to buckle up. You're in for one hell of a ride...

  • @gFamWeb
    @gFamWeb 3 years ago +11

    I've always pictured the talk about clock speed as analogous to how fast a car's tires can spin. Sure, they can spin very fast, but if you don't have good traction, it's not going to help much. Same with throughput and clock speed.

    • @okaro6595
      @okaro6595 3 years ago +1

      IMO the engine RPM is better.

  • @RufianEmbozado
    @RufianEmbozado 1 year ago

    Assembly will always retain two strong points. First, when you learn to code in assembly you go through a rush of "illuminations" (I'm always thinking of 8-bit platforms, because they are simple enough to grasp the whole landscape, and because I'm that old. Nothing is done for you yet; you push and pull all those pesky bits all over the place "by hand", a blazingly fast hand) that put a lot of pieces of the information science puzzle right into place. Second, there is an inherent beauty in assembly code. The Motorola 68000 had a beautiful, beautiful assembler (I stumbled onto it with an Amiga 500 and, man, what a joy it was! All those fancy chips at your command... The most missed piece of hardware ever). I never got that feeling when I tried to code assembly on the i386. I still think learning to write assembly for any CPU is worth the price. No need to do great things, just some humble tasks. You'll have the ride of your life (as a nerd, at least) and won't fall for these kinds of misconceptions. Great video, of course. Assembly has the virtue of dispelling all sorts of misconceptions. But assembly itself is surrounded by some key misconceptions that keep it from teaching all it can.

  • @theDemong0d
    @theDemong0d 3 years ago +3

    In my experience writing assembly (mostly to capitalize on AVX), yes the function call overhead is a huge performance hit, but you need to write your program in assembly anyways because when you switch to AVX intrinsics, you need to know what assembly you want the intrinsics to produce. Writing the function first in assembly makes it easy to translate into AVX intrinsics, and the intrinsics should allow you to write C++ that compiles almost exactly instruction-for-instruction identical to your handwritten assembly. Yeah, it's not quite as cool as your program running your handwritten x86, but it's the next best thing and with the call overhead eliminated, you can reap large performance boosts.
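
A minimal sketch of the intrinsics approach described above, using baseline SSE (`xmmintrin.h`) instead of AVX so it runs on any x86-64 machine. The function name, the `__SSE__` guard and the scalar fallback are assumptions for illustration, not part of the original comment. The comments name the instructions each intrinsic roughly corresponds to; actual compiler output may differ.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* dst[i] = a[i] + b[i] for i in [0, n). */
static void add_floats(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
#if defined(__SSE__)
    /* Four floats per iteration in one 128-bit register. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);             /* movups */
        __m128 vb = _mm_loadu_ps(b + i);             /* movups */
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));  /* addps, then movups */
    }
#endif
    /* Scalar tail, or the whole array when SSE is not available. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Writing the loop body first as `movups`/`addps` and then mapping each line to an intrinsic is the translation step the comment describes; the compiler still handles register allocation and the call overhead disappears if it inlines the function.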

  • @DigitalPhage
    @DigitalPhage 3 years ago +30

    "x86 Assembly Language Misconceptions" would be a more apt title, however a good video.

    • @TheBypasser
      @TheBypasser 3 years ago +1

      Oh yeah, say, Arduino compared to pure AVRASM is like a snail vs a ballistic missile (just like HLL vs ASM on most RISC cores, that is).

    • @niclash
      @niclash 3 years ago

      Misconception: that the x64 instruction set is typical. Microcontrollers are typically orders of magnitude easier to learn fully. And then there are the funky/academic outliers, like one-opcode instruction sets. But the majority of assembly languages out there have dozens of instructions, maybe a hundred and a bit, not the thousands of the Intel/AMD world.

  • @maxmuster7003
    @maxmuster7003 3 years ago +2

    But in assembly we do not need to use the calling convention, so we don't have to put arguments on the stack. We can put arguments into registers to call subroutines. No need to use a high-level programming language. We can execute 4 simple integer instructions at the same time. www.agner.org/optimize/instruction_tables.pdf
    instlatx64.atw.hu/

    • @WhatsACreel
      @WhatsACreel 3 years ago +1

      I agree completely. If we're in ASM, and we're not calling C or C++ functions from libraries, we can define whatever calling convention we like :)

    • @KohuGaly
      @KohuGaly 3 years ago

      Wait, are you saying calling conventions don't get optimized away? If the compiler knows all the places where the function is called, it shouldn't be too hard to just cook up a custom calling convention that juggles arguments in registers, instead of stack. I'd say that would be pretty high on the list of optimizations to implement in a compiler.
      Pretty much the only place where you can't do it is when calling 3rd party functions, like dynamic libraries.

    • @ville_syrjala
      @ville_syrjala 3 years ago +2

      @@KohuGaly In C-like languages that kind of optimization can only be done trivially if the definition of the called function is in the same compilation unit (i.e. the same .c file). Doing it across compilation units requires link-time optimization (LTO). LTO is gaining in popularity, but the downsides are much slower build times and higher memory requirements for the compiler, oh and compiler bugs of course :)
      PS. There can be compiler-specific ways to change the calling convention for C functions as well (e.g. the GCC regparm attribute)

  • @lgrantcdg
    @lgrantcdg 3 years ago +3

    IBM’s DB2 database for the IBM mainframe (MVS) is written in a proprietary PL/I-like language. A few years ago, they increased its speed by 20 percent, just by improving the code that the compiler emitted. Computer architectures are constantly evolving, as newer and fancier instructions are added. Even if you are the world’s best assembly programmer, and know every instruction inside and out, there is no way you can update a large assembly-language code base to take advantage of each improvement in the architecture.

    • @OpenGL4ever
      @OpenGL4ever 1 year ago

      Fortunately, C has a preprocessor for such cases. It allows you to write all the code in C, optimize where necessary for one or more CPU architectures in their specific assembly language, and then use the C code as a fallback. And if you later get a much better C compiler, all that is needed is a recompilation with the improved compiler using only the C code. Then you can see where the C compiler optimizes better. And where it's still worse, you compile the assembly routines back in.
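
A minimal sketch of the workflow OpenGL4ever describes: a portable C routine that always exists, plus an architecture-specific version selected by the preprocessor. Byte swapping is a hypothetical example chosen because x86 has a single instruction for it (`BSWAP`); the guard assumes GCC or Clang extended-asm syntax. Compiling with the guard disabled gives the "C only" build for comparison; GCC and Clang often recognize the shift-and-mask pattern below and emit `bswap` on their own, which is exactly the kind of improvement that check would reveal.

```c
#include <assert.h>
#include <stdint.h>

/* Portable C version: always available, and the reference the asm must match. */
static uint32_t bswap32_c(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Architecture-specific version, chosen at compile time. */
static uint32_t bswap32(uint32_t x) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("bswap %0" : "+r"(x));   /* x86 BSWAP reverses the 4 bytes in place */
    return x;
#else
    return bswap32_c(x);             /* fallback for every other target/compiler */
#endif
}
```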

  • @spacewolfjr
    @spacewolfjr 3 years ago +2

    The legend returns! Thanks Mr. Creel.. man..

  • @roax206
    @roax206 2 years ago +1

    Though from my understanding, assembly is mostly just machine code with the binary instruction IDs replaced by short nicknames for the instructions.
    Technically any compiled "higher level" language will be converted into assembly at some point (unless the person who wrote the compiler is a masochist and memorized all the instruction ID numbers). Whether assembly ends up quicker then simply depends on whether the problem is easier to express in assembly language than in the HLL used, and on how far you are willing to manually optimize the assembly code.
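
To show how directly mnemonics map to machine code, here is a minimal sketch of a tiny x86 encoder for three instructions, covering only 32-bit register operands. The encodings (0x89 plus a ModR/M byte for `mov`, 0xFF /0 for `inc`, 0x90 for `nop`) are standard; the function names are hypothetical. Real assemblers also handle prefixes, memory operands, immediates and alternative encodings (`mov ebx, eax` can also be written as 8B D8), and in 32-bit mode `inc eax` has a one-byte form, 0x40, which in 64-bit mode is a REX prefix instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register numbers as used in the ModR/M byte. */
enum reg { EAX = 0, ECX = 1, EDX = 2, EBX = 3 };

/* ModR/M byte: mod (2 bits) | reg (3 bits) | r/m (3 bits); mod = 3 means register-direct. */
static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm) {
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* mov dst, src  ->  89 /r  (MOV r/m32, r32: r/m = dst, reg = src). Returns byte count. */
static size_t enc_mov(uint8_t *out, enum reg dst, enum reg src) {
    out[0] = 0x89;
    out[1] = modrm(3, src, dst);
    return 2;
}

/* inc r  ->  FF /0  (the reg field holds the opcode extension 0). */
static size_t enc_inc(uint8_t *out, enum reg r) {
    out[0] = 0xFF;
    out[1] = modrm(3, 0, r);
    return 2;
}

/* nop  ->  90 */
static size_t enc_nop(uint8_t *out) {
    out[0] = 0x90;
    return 1;
}
```

For example, `enc_mov(buf, EBX, EAX)` produces the bytes 89 C3, which is what you would look up in a table when hand-assembling `mov ebx, eax`.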

  • @Cubinator73
    @Cubinator73 3 years ago +8

    15:49 I think you got something wrong there. Obviously, assembly is needed in all sorts of things like programming compilers and optimizing low-level routines. The "misconception" that "assembly language is no longer needed due to optimizing compilers" expresses the fact that your average programmer doesn't need to write assembly himself because far more competent people already did it and made their optimized routines available in the optimizing compiler. I myself only ever used assembly to explore how CPUs work and how compilers optimize stuff, but I never NEEDED to write my own assembly code for my own projects.

    • @lewiscole5193
      @lewiscole5193 3 years ago +2

      That's nice ... OTOH being a former OS maintainer/developer, I used assembly a lot, not just because most of the OS was also written in assembly (which it was), but because it gave me control over data/code placement that no available compiler did/could, which was especially important in the bootstrap code I was responsible for the care and feeding thereof.
      And I suspect that's still true ... the hardware defines and uses data structures that I don't want/need a compiler guessing what sort of code should be generated for.

    • @WhatsACreel
      @WhatsACreel 3 years ago +3

      Yes, I do wish that the proper position of ASM was expressed more clearly in computer science education. I was taught to fear the language during my degree, encouraged to neglect it entirely. Maybe it’s different in other institutions?
      I do not disagree entirely with the sentiment. But I do think it is skewed a little too far away from ASM. I think learning ASM for OS development or to understand the CPU are excellent applications!
      Cheers for watching and commenting folks :)

    • @lewiscole5193
      @lewiscole5193 3 years ago +2

      @@WhatsACreel
      I have no idea how ASM is being taught in schools these days, but back when I was a student -- just after the dinosaurs had been killed off by an asteroid -- there was no question that any non-impaired human could outdo a compiler in terms of generating fast/small code.
      The reason why you were supposed to use an HLL was because it increased programmer productivity.
      Studies had supposedly been done that showed that the average number of DEBUGGED lines of code that could be produced per programmer per day was about TEN (10) independent of programming language.
      And because each HLL statement typically turned into several ASM lines, that meant that if you could use an HLL, you should, because you could potentially get more done using an HLL than you could with ASM, especially in terms of code that was supposedly "portable" across platforms.
      There were also supposedly studies that showed a wide variation in programmer output as well and so YMMV, but familiarity with a particular language also had a lot to do with programmer productivity (I don't recall how much).
      The gist of this is that I usually write in ASM because that's what I'm most familiar with, and because I'm no longer getting paid for what I write, it's my choice.
      I can speak C if I have to, but I don't consider myself fluent and I simply don't see the need to spend time becoming more fluent in C when I can do what I want probably (?) faster in ASM.
      What bothers me is that people who shy away from using ASM seem to think that there's something fundamentally different in how you generate ASM code versus HLL code thrown at a compiler.
      To me, though, that's not the case.
      When I occasionally do write HLL code, I do the exact same thing that I do when I write ASM code, the only difference being how far "down" I "refine" the code before I come to a valid HLL or ASM statement.
      I just don't understand what it is that makes people think there's something special when it comes to how to write ASM code versus HLL code.
      It makes me think that maybe too much time is spent teaching the structure of various HLLs and not enough on how to think and solve problems.
      Just my opinion ....

    • @WhatsACreel
      @WhatsACreel 3 years ago +2

      @@lewiscole5193 Ha! I know the feeling! I learned in the 90’s. Things have changed a lot since then. Especially Assembly language. It’s gone from maybe 100 instructions and 16 registers to massive SIMD register files and 3000 instructions!
      I certainly agree that programmer productivity and portability are very important. And the choice of language is a big part of that. Sometimes ASM is a good fit, and sometimes it is not. I do love how fast it can be, and how flexible. There’s some brain-melting, deep trickery that is natural to ASM, which is too low level to be practical in HLL’s. But for the most part, anything is pretty achievable in any language, and so it becomes a matter of choosing the best tool for the job.
      I couldn’t agree more! The problem with ASM is the perception of it. Folks shy away from it in a way that might not be warranted. It’s just a language, after all. IMHO, it’s a really fun and powerful language.
      I do love a good bit of HLL code too, but ASM will always hold a special place for me. If for nothing else, I made a video about ASM 10 years ago and put it up on YouTube, and have since built this little channel :)

    • @lewiscole5193
      @lewiscole5193 3 years ago

      @@WhatsACreel
      Ten years? My how time goes by when you're having "fun".

  • @DukeDudeston
    @DukeDudeston 3 years ago +2

    "You can do a lot of stupid things in any language"
    I was able to delete ntfs.sys in a language called "DarkBASIC" when I first started out. So yes. You can do a lot of stupid things in languages.

  • @y2ksw1
    @y2ksw1 3 years ago +8

    I have been programming in Assembly for a large part of my life, and the most challenging task was writing code in a way that would run in parallel in the separate pipelines (superscalar). The example you have given would have been rewritten, possibly longer, in order to get the parallel mechanism working. One way would be:
    mov ebx, eax
    inc eax
    nop
    inc ebx
    So the first two run together, and the remaining two together again. And we would gain at least 2 clock cycles.
    However: assembly made a lot of sense in the old days. Now, with multi-core superscalar processors and the brilliant optimisation of compilers, Assembly code has pretty much died out.
    I still use it on special hardware though. I am eyeballing the Raspberry Pi Pico, for example 😊

    • @OpenGL4ever
      @OpenGL4ever 1 year ago

      inc eax
      mov ebx, eax
      Does the same job as your code and requires less RAM.

    • @y2ksw1
      @y2ksw1 1 year ago

      @@OpenGL4ever It's not a question of memory, but to get part of this code running in a different pipeline and thus double up the speed.

    • @y2ksw1
      @y2ksw1 1 year ago

      Your code would run 4 times slower

    • @OpenGL4ever
      @OpenGL4ever 1 year ago

      @@y2ksw1 Why should it? In my opinion it runs at the same speed.
      Your code might do
      mov ebx, eax
      inc eax
      in its own pipeline, but
      nop ; does nothing
      and
      inc ebx
      depends on the mov ebx, eax before.

    • @y2ksw1
      @y2ksw1 1 year ago

      @@OpenGL4ever If you first do an operation on eax and then use it to assign its value to another register, it stalls and waits to settle just that tiny bit, which doesn't allow the code to move to the other pipeline. I have timed these instructions very accurately, and your assumptions, while technically correct, perform far less efficiently. In time-critical applications, such as the real-time graphics manipulation I was working on, code alignment and sometimes illogical reordering of instructions made the difference between fluent and staggering graphics.
      I mainly got the filter and render code prepared by graphics specialists, and my task was to speed it up. But also big-number mathematics and operating system libraries. Most of them grew noticeably in size, but were of unmatched speed.
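
One way to frame this disagreement is to count the longest chain of read-after-write dependencies in each sequence. The sketch below assumes 1-cycle latency for every instruction, register renaming (so write-after-read hazards don't matter) and unlimited issue width, which is roughly how a modern out-of-order core behaves; it does not model in-order dual-pipe pairing rules such as those of the original Pentium. The `struct insn` encoding is made up for illustration. Both sequences leave eax and ebx equal to the old eax plus one.

```c
#include <assert.h>
#include <stddef.h>

enum { EAX = 0, EBX = 3, NREGS = 8, NONE = -1 };

/* One instruction: destination and up to two source registers (NONE if unused). */
struct insn { int dst, src1, src2; };

/* mov ebx,eax / inc eax / nop / inc ebx */
static const struct insn seq_a[] = {
    {EBX, EAX, NONE}, {EAX, EAX, NONE}, {NONE, NONE, NONE}, {EBX, EBX, NONE},
};
/* inc eax / mov ebx,eax */
static const struct insn seq_b[] = {
    {EAX, EAX, NONE}, {EBX, EAX, NONE},
};

/* Length in cycles of the longest read-after-write chain under the assumptions above. */
static int critical_path(const struct insn *code, size_t n) {
    int ready[NREGS] = {0};   /* cycle when each register's newest value is available */
    int longest = 0;
    for (size_t i = 0; i < n; i++) {
        int start = 0;        /* earliest cycle at which all sources are ready */
        if (code[i].src1 != NONE && ready[code[i].src1] > start) start = ready[code[i].src1];
        if (code[i].src2 != NONE && ready[code[i].src2] > start) start = ready[code[i].src2];
        if (code[i].dst == NONE) continue;   /* nop: produces nothing, waits for nothing */
        ready[code[i].dst] = start + 1;      /* renaming: this is a new version of dst */
        if (start + 1 > longest) longest = start + 1;
    }
    return longest;
}
```

Under these assumptions both sequences come out at two cycles, so any measured difference has to come from effects the model leaves out, such as pairing rules, decode limits or alignment.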

  • @danepane527
    @danepane527 2 years ago

    The algo sent me here.. was watching a bunch of Coach McGuirk videos.. subbed!

  • @EvilSandwich
    @EvilSandwich 3 years ago +4

    I like to program for old systems like the Apple II and the NES, so I code a lot in 6502 ASM. Believe me, you start to miss high level after a while.
    You guys ever try Hello World when you have to explain to the computer how to read and print strings before it can even do that? Heck, the NES doesn't even have ANY internal ROM, so you have to draw the letters manually before you can even start on strings. lol

  • @k7iq
    @k7iq 3 years ago +5

    I program ARM in C lately... I find that being able to view the ASM output helps me tighten my C code. For instance, I recently looked at a particular IF statement that I suspected might not perform as well as it could, and found that by defining one of the variables as a local register int32_t, I cut the time of that bit of code in half, and the size was a bit smaller too.
    Also, I needed to create an ASM function, a float-to-int function, because the compiler did not output the FPU instruction for the rounded version of that instruction.
    ASM has its uses, but mainly, for me I think, in debugging C code.
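
The comment doesn't say which ARM instruction was involved, so here is an analogous x86 sketch instead (an assumption, not the commenter's code): a plain `(int)` cast in C truncates, while the SSE `CVTSS2SI` instruction rounds using the current MXCSR mode, which is round-to-nearest-even by default. The inline asm uses GCC/Clang extended-asm syntax behind a preprocessor guard, with a portable fallback; the function names are hypothetical. Before reaching for asm, C99's `lrintf` from `<math.h>` has the same round-in-current-mode semantics and may compile to a single instruction.

```c
#include <assert.h>

/* Portable round-half-to-even, valid for values well inside the int range. */
static int round_even_c(float f) {
    int t = (int)f;                  /* C conversion truncates toward zero */
    if ((float)t > f) t--;           /* turn truncation into floor for negatives */
    float frac = f - (float)t;       /* now 0 <= frac < 1 */
    if (frac > 0.5f) return t + 1;
    if (frac < 0.5f) return t;
    return (t % 2 != 0) ? t + 1 : t; /* exact tie: pick the even neighbor */
}

static int float_to_int_rounded(float f) {
#if defined(__GNUC__) && defined(__SSE__)
    int r;
    /* CVTSS2SI rounds per MXCSR (nearest-even by default);
       a plain (int) cast uses CVTTSS2SI, which truncates. */
    __asm__("cvtss2si %1, %0" : "=r"(r) : "x"(f));
    return r;
#else
    return round_even_c(f);
#endif
}
```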

  • @AngDavies
    @AngDavies 3 years ago +1

    Minor nit/clarification: you definitely need to know assembly on a deep level to be able to code an optimising compiler; after all, it's a program that turns code in a given language into as efficient/fast a machine code representation as possible.
    That doesn't mean you necessarily should write one in assembly itself: it wouldn't produce faster code, it would only produce code faster.
    The better option is often to write the compiler in the language that you intend to compile with.
    You spend loads of time writing a compiler that can create really optimised code for a given platform, build it using some existing compiler, which doesn't make very optimised code, and so the compiled compiler takes ages to compile code.
    But now you've just created a program that turns your code in your language into optimised machine code, so just feed the original code through the new compiler, and you now have an optimised optimizing compiler :D
    Having just "GCC" that compiles to your machine is so much better than having to find a version of GCC tailored to your exact platform

  • @vikassm
    @vikassm 3 years ago +1

    Fantastic video and channel! Subbed.
    My 2¢ about the poor audio: Use your mobile phone with a ~$5 lapel mic to capture your "B-Roll" audio 🙂
    That way if your nice desktop mic doesn't record for some reason, the backup audio from your cellphone is still wayyyyy better than the absolute garbage camera mic.
    Just clap once (Aaand ACTION) at the beginning and the end of each take to simplify A/V sync during editing.

  • @wrtlpfmpf
    @wrtlpfmpf 2 years ago

    One thing a small assembly project can really help with is coding style. I used to write functions several screens long, with control structures nested several levels deep. Writing in assembly can really teach you how to write code that is as simple as possible, yet correct. I once did that for a little project on an ATmega. Those are cute little 8-bit microcontrollers. Since they have different addresses for RAM and flash, programming them in assembly is a lot less painful than, for example, C. Anyhow, that project really helped me write readable code when I later did C projects. I later played around with those microcontrollers in C, and looking at the assembly created by the compiler, I have to say that it's highly dense.
    (The rationale behind assembler was that I had more experience with AVR assembler and that that code would use the remaining flash program storage as data storage, something that is even harder to do in C)

  • @mikefochtman7164
    @mikefochtman7164 3 years ago

    Good information. When we had some ASM instruction dependencies, we sometimes would look down a few lines and see if we could move some other instruction in between the dependent instructions. That meant we could space out the two dependent instructions to let the first one finish and give another ALU something to do while the first one crunched.
    Also worked on a different processor that had a special increment. Used in the OS interrupt handling, it had a couple of instructions that were non-interruptible, so we could guarantee that the increment and sto would be atomic.

  • @gideonz74b
    @gideonz74b 2 years ago +1

    @Creel: Executing an instruction in one cycle does *not* mean that the *latency* is one cycle. It means that the *throughput* is one instruction per cycle. The latency is always a lot more than that, because it has to pass through the pipeline.

  • @rfvtgbzhn
    @rfvtgbzhn 1 year ago

    From what I heard, you can get a significant performance boost in some cases by disassembling the compiled code and rewriting parts in assembly language.

  • @trashtrashisfree
    @trashtrashisfree 1 year ago

    I always wrote a good macro library for the assembly I was working in. System 360/370 didn't even have stacks so my first priority was writing things to push and pull values and create subroutines. Everyone else was hand-cutting every single line. Far more error free. Same for other issues in 6502.

  • @sergiomarroquinjr3587
    @sergiomarroquinjr3587 2 years ago

    I always seem to learn something new from you. Keep it up!

  • @PaulaBean
    @PaulaBean 1 year ago

    When the rubber hits the road, you can always benchmark the speeds of your C++ code against assembly code. Measurement trumps speculation. Thanks for the nice video!

  • @PvblivsAelivs
    @PvblivsAelivs 3 years ago

    I have seen many people say that compilers do these wonderful tricks and that hand-coded assembly language is not (generally) faster than a compiler's output. While there may be some compilers that do this, no compiler I have actually used does so.
    "You might get the right result."
    Especially if you use the lovely little LOCK. Any processor that can feasibly be part of a multi-processor system needs a way of executing at least certain instructions without interference from other processors.
    "The CPU will perform the instruction a lot slower."
    It will if two processor units are trying to access the same memory at the same time. After all, one must stall. But the processor that "gets there first" has a negligible performance penalty. It was a two-cycle penalty on the 8086. (I only have timing information up to the 486.)
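
A minimal sketch of a race-free shared counter using C11 `<stdatomic.h>` and POSIX threads, related to the LOCK discussion above. On x86, GCC and Clang typically compile `atomic_fetch_add` to a LOCK-prefixed instruction (`lock add` or `lock xadd`). A plain `counter++` from several threads would be a data race (undefined behavior in C) and can lose updates, which is why it isn't run here. The thread and iteration counts are arbitrary, and older glibc versions need `-pthread` at link time.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define THREADS 4
#define ITERS 100000

static atomic_long counter;   /* shared by all worker threads */

/* Each worker adds 1 to the shared counter ITERS times. */
static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++)
        atomic_fetch_add(&counter, 1);   /* one indivisible read-modify-write */
    return NULL;
}

/* Runs THREADS workers; returns the final count, or -1 if not all threads started. */
static long run_counter(void) {
    pthread_t t[THREADS];
    int started = 0;
    atomic_store(&counter, 0);
    for (; started < THREADS; started++)
        if (pthread_create(&t[started], NULL, worker, NULL) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);       /* wait for every thread we did start */
    return started == THREADS ? atomic_load(&counter) : -1;
}
```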

  • @CallousCoder
    @CallousCoder 1 year ago +1

    ARM64 CPUs actually have a couple of assembly dialects. You have your AArch64, but also your Thumb instructions, which are smaller instructions that save space.

  • @BrightBlueJim
    @BrightBlueJim 3 years ago +1

    So to summarize a couple of things you said:
    1) Functions written in assembly don't really run faster than compiled functions.
    6) Assembly is still necessary for low-level optimization, where speed is really important.
    Also, your point on atomic operations applies just as directly to C and C++, or indeed for ANY program written to take advantage of multi-threading.

  • @_Stin_
    @_Stin_ 2 years ago +1

    It all comes down to machine code at the end of the day and high level code gets compiled to machine code just like assembler has to, HOWEVER the instruction set (especially in x86 or CISC) is now so complex that most human coders (maybe not Sophie Wilson lol) would be unable to write the required code for the desired performance... It's complicated 😉
    Everything else seems correct enough with some specifics missing, but it gets the gist across. Nice video.

    • @_Stin_
      @_Stin_ 2 years ago

      This is VERY wrong, though... Assembly is NOT machine code. Assembly is NOT hard coded into the CPU, the machine code is. Assembly is a single-layer, human abstraction of machine code.

  • @kindpotato
    @kindpotato 3 years ago +5

    "race conditions are brilliant" This guy is awesome.

  • @microdocker
    @microdocker 1 year ago

    Very good and explanatory shot.
    One small weird thing (not related to the topic): the guy is literally sitting in front of a mic and still recording his voice on the on-camera microphone ^_^

  • @codenamelambda
    @codenamelambda 3 years ago +2

    Well, I'm pretty sure even naive assembly is going to be faster than Python
    Though that isn't exactly a fair comparison.
    Also, inline assembly is a thing in many compiled languages, so "stay in asm as long as possible" is not *entirely* true. Sure, the compiler is still hands off there, but it *can* inline the function around the inline assembly.

  • @sambrown9494
    @sambrown9494 3 years ago

    Very interesting stuff, enjoying these videos. Hope you don't mind my asking - is that microphone actually turned on? It's a bit echoey like it's the camera microphone doing the recording across the room ..? Looking forward to more vids! Thx :)

    • @sambrown9494
      @sambrown9494 3 years ago

      Ha umm sorry! I commented and only then read the description. Already covered. Just so you know I was paying attention! ;) Rock on ...

  • @connclark2154
    @connclark2154 3 years ago +1

    I think one thing that wasn't mentioned was that assembly allows you flexibility that higher-level languages do not. With this flexibility you can implement more efficient algorithms. For example, between assembly routines you can return more than one value from a function by using a custom calling convention. It's the ability to leverage these freedoms that gives assembly its power and performance.

    • @bigshrekhorner
      @bigshrekhorner 1 year ago

      That's not something exclusive to Assembly.
      C is able to do this by using pointers as function arguments. Even higher level languages are also able to do this by using tuples that mix types (or simply the same type), or with methods similarly to C, if they allow memory management concepts like pointers.
      Compilers and compiler engineers are extremely smart and definitely way smarter than me or you. That means that if you have thought of an efficient implementation of an algorithm in Assembly, it's also pretty likely the compiler engineers have also thought of it and implemented it. At least if we are talking about mainstream compilers, like GCC or Clang (for the case of C/C++)
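
A minimal sketch of the two C approaches bigshrekhorner mentions for returning more than one value, using quotient and remainder as a hypothetical example. On the x86-64 System V ABI (Linux, macOS), a 16-byte struct of two integers like `struct divmod` is returned in RAX and RDX, so the by-value version needs no memory round trip; Windows x64 returns such a struct through a hidden pointer instead. Compilers also typically emit a single `idiv` for `a / b` and `a % b` together, since the instruction produces both.

```c
#include <assert.h>

/* Two results returned by value. */
struct divmod { long quot; long rem; };

static struct divmod divmod_struct(long a, long b) {
    assert(b != 0);
    struct divmod r = { a / b, a % b };
    return r;
}

/* Pointer-argument style: the second result is written through rem_out. */
static long divmod_ptr(long a, long b, long *rem_out) {
    assert(b != 0);
    *rem_out = a % b;
    return a / b;
}
```

Note that C99 division truncates toward zero, so `divmod_ptr(-17, 5, &r)` returns -3 with `r == -2`.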

  • @thadtheman3751
    @thadtheman3751 2 years ago

    Actually, part of the complexity of assembler comes from the fact that the "decorations" of instructions are not uniform. To clarify, I will make up an example (it's been a while, so don't expect this to be a real-world example).
    You might have INC A,N.
    increase A by N.
    A might be a memory location and N a number (direct addressing)
    INC $A, N
    A might be a memory location pointed to by a memory location (indirect addressing)
    INC [$A],N
    N might be a memory location
    INC A,$N
    ...
    The thing is that some commands accept some of these addressing modes and others do not. A JMP, for example, might accept all addressing modes, but a JSR would not. So it gets complicated keeping track of which instruction does what.
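
A toy sketch of the made-up `INC A, N` example above, with a 16-word memory array and a few registers. The mode names, the reading of each notation (in particular treating `A` in `INC A,$N` as a register) and the legality table are all hypothetical, following the comment's own caveat that this is not a real instruction set. The table at the end is the point: which addressing modes are legal is a per-instruction property you have to look up.

```c
#include <assert.h>
#include <stdint.h>

#define MEMSIZE 16
static int32_t mem[MEMSIZE];   /* toy data memory */
static int32_t regs[4];        /* toy register file */

/* Hypothetical addressing modes for "INC A, N" (add N to A). */
enum mode {
    DIRECT,      /* INC $A, N   : A is an address, N a literal   -> mem[A] += N       */
    INDIRECT,    /* INC [$A], N : mem[A] holds the target address -> mem[mem[A]] += N */
    MEM_SOURCE   /* INC A, $N   : A is a register, N an address   -> regs[A] += mem[N] */
};

static void inc(enum mode m, int32_t a, int32_t n) {
    switch (m) {
    case DIRECT:     mem[a] += n;       break;
    case INDIRECT:   mem[mem[a]] += n;  break;
    case MEM_SOURCE: regs[a] += mem[n]; break;
    }
}

/* Hypothetical legality table: each instruction accepts its own subset of modes. */
enum op { OP_INC, OP_JMP, OP_JSR };

static int mode_allowed(enum op o, enum mode m) {
    static const unsigned legal[] = {
        [OP_INC] = (1u << DIRECT) | (1u << INDIRECT) | (1u << MEM_SOURCE),
        [OP_JMP] = (1u << DIRECT) | (1u << INDIRECT),
        [OP_JSR] = (1u << DIRECT),
    };
    return (int)((legal[o] >> m) & 1u);
}
```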

  • @thomasmaughan4798
    @thomasmaughan4798 2 years ago

    There was a time when assembly was much faster than compiled code, but eventually compiler optimizations produced code that executed efficiently. Depending on what one is doing, assembly can be considerably smaller. A function in COBOL to parse a text file was 30 kilo-words and took 30 seconds to execute; I rewrote it in assembly and it produced an executable that was only 3 kilo-words and parsed the same file in 3 seconds. 1/10th the size and ten times faster! But that extreme example is partly a result of COBOL not really being a good choice for that sort of thing, and my rewrite also used static linking; everything it needed was already linked into the executable, so at run time no "fixups" were needed.

  • @erwinmulder1338
    @erwinmulder1338 2 years ago

    I grew up programming home computers in the 1980s. You had to write assembly (and sometimes even translate it to numbers by hand) to make anything that would run faster than a snail's pace. I mean, 8-bit computers at 3.5 MHz are not incredibly fast at anything. So if you had BASIC, which was interpreted (not even compiled), that was SUPER slow. You couldn't even draw an entire screen in one second most of the time. These days, I mostly work with assembly when writing (toy) compilers for my own programming languages. In the end, what any compiler really does is basically translate the source code into assembler instructions.

  • @FORTRAN4ever
    @FORTRAN4ever 3 years ago

    I programmed in assembly on a Sperry Univac 1143 mainframe computer in the early 1980s. Each instruction consisted of a 36-bit word. Commenting was a must. I would prefer to program in FORTRAN or COBOL any day over assembly.

  • @Lantalia
    @Lantalia 2 years ago +1

    So, with regard to #1: inline assembly skips the function call overhead, but the main reason to use it is to access instructions your compiler does not yet support.
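Here is a minimal sketch of GCC/Clang extended inline assembly, assuming an x86 or x86-64 target; the function name `byte_swap32` is hypothetical. `bswap` is chosen only because it is a baseline x86 instruction that is easy to check. Compilers already provide `__builtin_bswap32` and recognize the portable shift-and-mask pattern, so in practice inline assembly is worth it only when no builtin or intrinsic covers the instruction you need.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t byte_swap32(uint32_t x) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* "+r": x is both read and written, and must live in a register. */
    __asm__("bswap %0" : "+r"(x));
#else
    /* Portable fallback for other compilers and architectures. */
    x = (x >> 24) | ((x >> 8) & 0x0000FF00u) | ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
    return x;
}
```

The constraint string tells the compiler what the assembly touches; getting it wrong is a common source of silent bugs, which is one more reason to prefer intrinsics when they exist.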

  • @kevinz1991
    @kevinz1991 3 years ago

    great information and great delivery. thanks a lot for the time you put into this. subscribed

  • @xeridea
    @xeridea 2 years ago +1

    Older compilers were known for generating slow code, and assembly was often used, especially on early consoles. Modern compilers are highly optimized. Besides all the basic stuff, they have all sorts of tricks for optimizing multiply and divide and for choosing which instructions to use, even specific to particular CPUs if you want. Sometimes CPUs have weird quirks that compiler developers can take advantage of, or at least avoid penalties from. Optimizing multiply and divide goes beyond obvious stuff like bit shifts for powers of 2; they have all sorts of tables of methods for various constants. Often they can even convert loops into SIMD instructions automatically. If not, doing SIMD completely by hand is very tedious, though some lower-level languages offer facilities (such as intrinsics) to make it a bit easier.
    Some things can still be hand-optimized, but that requires very in-depth knowledge of CPUs, and even then it may not be faster. For most purposes it's not worth it, though some low-resource embedded systems, some drivers, and some other niche cases benefit.
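A minimal sketch of one such trick: unsigned 32-bit division by the constant 10, rewritten as a multiply by a "magic" reciprocal followed by a shift. 0xCCCCCCCD is ceil(2^35 / 10), and the identity below holds for every `uint32_t` input. GCC and Clang typically emit an equivalent multiply-and-shift sequence for `x / 10` at `-O2` instead of a slow DIV; the function names here are hypothetical.

```c
#include <stdint.h>

/* x / 10 without a divide: multiply by ceil(2^35 / 10) = 0xCCCCCCCD,
   then keep the top bits by shifting right 35. The 64-bit product
   cannot overflow because both factors are below 2^32. */
static uint32_t div10(uint32_t x) {
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Unsigned division by a power of two is just a right shift. */
static uint32_t div8(uint32_t x) {
    return x >> 3;
}
```

The "tables" the comment mentions are, in essence, precomputed magic constants and shift amounts like this one for each divisor.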

  • @den2k885
    @den2k885 2 years ago +1

    Compilers optimize very well... for general-purpose code, without knowing its data layout. It's unlikely that a compiler will use SIMD instructions, and in the rare cases it does, it won't make use of the inner characteristics of your problem, as it has no knowledge of them.
    Using assembler, I managed to double the performance of a linear Sobel algorithm and triple the performance of a segmented integral table algorithm. Not even the Intel compiler managed to equal those times.
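The kind of hand-vectorization described here can be sketched with SSE2 intrinsics in C. This is not the commenter's code: it is only the horizontal [-1 0 +1] half of a Sobel kernel applied to one row of 16-bit values, eight lanes per instruction, with a scalar loop for leftover elements and for targets without SSE2. The function name and row layout are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Horizontal gradient: out[i] = in[i + 1] - in[i - 1] for 1 <= i < n - 1.
   out[0] and out[n - 1] are left untouched. */
static void gradient_row(const int16_t *in, int16_t *out, size_t n) {
    if (n < 3) return;
    size_t i = 1;
#if defined(__SSE2__)
    /* 8 int16 lanes per iteration; the loads read in[i - 1 .. i + 8],
       so i + 8 must stay below n. */
    for (; i + 8 < n; i += 8) {
        __m128i right = _mm_loadu_si128((const __m128i *)(in + i + 1));
        __m128i left  = _mm_loadu_si128((const __m128i *)(in + i - 1));
        _mm_storeu_si128((__m128i *)(out + i), _mm_sub_epi16(right, left));
    }
#endif
    /* Scalar tail, or the whole row when SSE2 is unavailable. */
    for (; i + 1 < n; ++i)
        out[i] = (int16_t)(in[i + 1] - in[i - 1]);
}
```

A modern compiler may well auto-vectorize the scalar loop by itself; the larger wins the comment reports come from restructuring around knowledge of the data, which the compiler does not have.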

  • @derzweistein8973
    @derzweistein8973 3 years ago +1

    Where do I learn "everything that [I need to learn] about a computer" to gain significant speed in assembly? (Especially the fun hardware stuff like out-of-order execution, loop streaming, and different execution engines.)

  • @GogiRegion
    @GogiRegion 2 years ago +1

    I've actually looked into virus programming out of curiosity, and it looks like good hackers will use C, compile it to assembly for optimization, and then assemble it. That's assuming that you need high-level functions to do what you need, that you want it to take up as little space as possible so it's harder to detect, and possibly that you want to remove null bytes (which is supposed to allow your code to work with a wider array of hacks, since some rely on a lack of null bytes). It's actually an interesting topic, and from what I was reading, it sounds like C is preferred over assembly for the same reason Linux is written primarily in C.

  • @wingman2tuc
    @wingman2tuc 3 years ago

    Modern CPUs also have "deep" pipelines. Fetch -> decode -> execute -> memory access -> write-back, as a very simple example.
    Today's CPUs can have 20 to 40 stages for completing a single instruction.
    Things can be pipelined, but you need a very intelligent and complicated forwarding unit and branch predictor to take advantage of pipelines.
    Understanding modern CPU architecture is a must in order to use ASM efficiently. Also, ASM can be CPU-specific, so it may not work on other CPUs.
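As a rough illustration of the pipelining point, assume an idealized k-stage pipeline where each stage takes one cycle and there are no stalls, hazards, or mispredicted branches (real CPUs rarely achieve this). Each instruction still takes k cycles from fetch to write-back, but a new one can enter every cycle, so n instructions complete in:

```latex
T_{\text{pipelined}} = k + (n - 1), \qquad
T_{\text{unpipelined}} = k\,n, \qquad
\text{speedup} = \frac{k\,n}{k + n - 1} \xrightarrow{\;n \to \infty\;} k
```

For example, with k = 20 and n = 1000, that is 1019 cycles instead of 20000, a speedup of about 19.6. This is also why a single INC can appear to take one cycle even though its latency through the pipeline is much longer.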

  • @SimonClarkstone
    @SimonClarkstone 3 years ago +1

    4:58 I am amazed how slow division is. I thought that was 90s or 00s slowness, and that modern CPUs took 5 cycles or something.

    • @alienrenders
      @alienrenders 3 years ago +1

      To make things worse, there's usually only one ALU that can perform the instruction. If you don't use vector operations, it's downright dreadful performance for multiplies and division.

  • @rjones6219
    @rjones6219 1 year ago

    Assemblers and machine code are where I did all my programming. Obviously, writing in assembler takes more time than writing in a higher-level language, but the code space can be more efficient.

  • @emjizone
    @emjizone 1 year ago +1

    3:53 This "one instruction per cycle" might be true for the oldest machines, with no clever vectors and lookups and with a very limited set of instructions. This might explain why people believe it to be still true today.
    In that case you'd have to program most of the usual math functions yourself (modulo, square root, etc.), and they would take several cycles anyway.
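On a CPU with no square-root instruction, you build one out of shifts, compares, and subtractions. Below is a minimal C sketch of the classic bit-by-bit (digit-by-digit) integer square root, returning floor(sqrt(n)); the name `isqrt32` is hypothetical. Each loop iteration settles one bit of the result, so a 32-bit input needs at most 16 iterations, far more than one cycle.

```c
#include <stdint.h>

/* floor(sqrt(n)) using only shifts, adds, subtracts, and compares. */
static uint32_t isqrt32(uint32_t n) {
    uint32_t result = 0;
    uint32_t bit = 1u << 30;   /* largest power of four that fits in 32 bits */

    /* Start from the largest power of four not exceeding n. */
    while (bit > n)
        bit >>= 2;

    while (bit != 0) {
        if (n >= result + bit) {
            n -= result + bit;
            result = (result >> 1) + bit;   /* this result bit is 1 */
        } else {
            result >>= 1;                   /* this result bit is 0 */
        }
        bit >>= 2;
    }
    return result;
}
```

The same shift-and-subtract structure, applied to division instead, is how modulo would be written on such a machine.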

  • @WolfCoder
    @WolfCoder 3 years ago +3

    The only times I've written assembly were for the 6502 (because it's fun), the Z80 clone in the Game Boy (because it's fun, and the only compiler I found was terrible and couldn't handle ROM paging well, etc.), and the ARM7TDMI in the GBA, where, although there's a port of gcc for it, you still have to write assembly for heavy-duty subroutines like interrupts, audio engines, etc., as the compiler optimizations don't seem to work as well in the gcc port. For x86-64, though? Uh... I think I'll let the compiler have the 'fun' when it comes to that.

  • @cthutu
    @cthutu 3 years ago +1

    INC RAX won't execute in one clock cycle on an x86 because of fetching and decoding. However, pipelining can make it seem like it does.

  • @seneca983
    @seneca983 3 years ago +3

    14:36 "The difficulty of assembly is the number of instructions."
    Is this part specific to x86?

    • @kelvinyonger8885
      @kelvinyonger8885 3 years ago +2

      afaik this whole video is for in-vogue modern uarchs (x86/x64, ARM)

    • @seneca983
      @seneca983 3 years ago +1

      @@kelvinyonger8885 Doesn't ARM have far fewer instructions than x86?

  • @JerryThings
    @JerryThings 3 years ago +3

    I didn't know that x87 instructions were slower than x86!

    • @WhatsACreel
      @WhatsACreel  3 years ago +3

      Oh, yes, some of them are slow. Some are quick too. The 80 bit floats are amazing :)

  • @MaximYudayev
    @MaximYudayev 3 years ago +1

    That's mostly applicable to general-purpose CISC, no? For example RISC, namely ARM, RISC-V (okay, not always), PIC and other embedded processors and DSPs, execute instructions in one clock cycle and seem to be the main targets for optimization in ASM where compilers are not smart enough to take advantage of all the ins and outs of the dedicated CPU.

    • @WhatsACreel
      @WhatsACreel  3 years ago

      Yes, I do recall the PIC is designed to run instructions at the same speed. Except for branching, maybe?? My memory is a little shaky there. ARM takes different times for instructions much like x86. I was definitely thinking mostly about x86 in this video, but most of it is applicable to other hardware.
      Cheers for watching mate :)

  • @michaelbuerge
    @michaelbuerge 3 years ago

    Great stuff. Interesting and relevant info. Thanks.
    Allow me a remark about audio: You invested in a nice mic. Now you might want to think about the room you're recording in. Maybe put something absorbing in place to reduce room reverberation.

  • @homomorphic
    @homomorphic 3 years ago

    A little misleading. The LOCK prefix locks the memory location referenced by the instruction; it doesn't "make the instruction atomic".
    Because that memory region is locked, *any* instruction (not just another INC) will stall on that memory address. As you state, there is a whole lot more complexity layered on top of that with cache.
    Also, this is just one way of handling parallelism, the PPC architecture uses opportunistic reservation. In this mechanism the memory region is marked as "reserved" which doesn't mean that parallel executing instructions can't change it, just that if another instruction does change it the next store from the reserving core will fail (so whichever core writes first wins and the other core needs to restart with the load to register again). This mechanism achieves greater parallelism than memory locks do in most real world code (because contention is relatively rare).
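A minimal C11 sketch of the two approaches this comment contrasts, using `<stdatomic.h>`; the function names are hypothetical. `atomic_fetch_add` is a single locked read-modify-write (on x86-64, GCC and Clang usually emit `lock xadd` for it). The compare-and-swap loop mirrors the reservation style: load, compute, then store only if nobody wrote in between, otherwise retry with the fresh value. On PowerPC, `atomic_compare_exchange_weak` typically compiles to a `lwarx`/`stwcx.` pair, which is why the weak form is allowed to fail spuriously.

```c
#include <stdatomic.h>

/* One atomic read-modify-write; returns the new value. */
static int increment_fetch_add(atomic_int *counter) {
    return atomic_fetch_add(counter, 1) + 1;
}

/* Optimistic retry loop: publish expected + 1 only if the counter still
   holds expected. On failure, expected is refreshed with the current value
   and we try again, like a failed store-conditional restarting at the load. */
static int increment_cas(atomic_int *counter) {
    int expected = atomic_load(counter);
    while (!atomic_compare_exchange_weak(counter, &expected, expected + 1)) {
        /* another writer won; expected now holds the fresh value */
    }
    return expected + 1;
}
```

Under low contention, the retry loop almost never repeats, which matches the comment's observation that contention is relatively rare in real-world code.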

  • @tchiwam
    @tchiwam 3 years ago +1

    It would be fun to see a video on transforming locked multithreaded code into lockless code with a thread manager, and on a completely lockless multithreading manager.

  • @dcocz3908
    @dcocz3908 3 years ago

    I agree, but there are lots of situations where the compiler simply fails. For example, gnuarm won't use multiple load and store properly, which for me generated much larger code that wouldn't fit in SRAM, so on my project it had to run from flash with wait states. Rewriting it in hand assembly allowed me to get a much smaller function, which could be moved into SRAM along with the data required by the application, and that is where I got a really large speed improvement. Using just the compiler, I couldn't have done it without swapping to a micro with a larger memory footprint.

  • @DanEllis
    @DanEllis 3 years ago

    I was a bit puzzled by the first one. You seemed to be suggesting that _calling_ code written in assembly language was slower (disregarding the actual execution time of the function). But of course that's not so.
    Regarding malware, it's sometimes necessary to write code in assembly language to strictly control what machine code is generated. For example, to ensure there are no zeros.
    Finally, "assembly language is etched into the chip". Not really, though. The ISA doesn't dictate the syntax of the assembly language. For example, x86 has two very different syntaxes (Intel and... the other one. AT&T?)
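To illustrate the "no zeros" point, here is a small C sketch that checks an instruction encoding for zero bytes. It compares two standard x86 ways of clearing EAX: `mov eax, 0` encodes as `B8 00 00 00 00`, while `xor eax, eax` encodes as `31 C0`. The array and function names are hypothetical. Writing the assembly by hand lets you pick the zero-free form deliberately, whereas a compiler makes no promise about which encoding it emits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Two encodings that both leave EAX == 0. */
static const uint8_t mov_eax_0[]   = { 0xB8, 0x00, 0x00, 0x00, 0x00 }; /* mov eax, 0   */
static const uint8_t xor_eax_eax[] = { 0x31, 0xC0 };                   /* xor eax, eax */

/* True if any byte of the machine code is 0x00. */
static bool contains_zero_byte(const uint8_t *code, size_t len) {
    for (size_t i = 0; i < len; ++i)
        if (code[i] == 0x00)
            return true;
    return false;
}
```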

  • @ug333
    @ug333 3 years ago

    Great information, great knowledge
    Side note: what's up with the audio?

  • @user-sl3gc5of6c
    @user-sl3gc5of6c 3 years ago

    Thanks for the video
    1. It's obvious that for CISC CPUs, asm is not the lowest level. It's the lowest programmable level. As for performance advantages compared with RISC, time will show us Intel's answer to Apple's M chips and the Raspberry Pi.
    2. Asm is not intended for framework programming. It shows its best in critical sections or SIMD. For example, asm provides amazing speed when computing 8 points of a fractal at once with AVX and FMA instructions. But Intel's next failing is that moving data between YMM registers is impossible. The process slows down while exchanging data through the ALU and memory.
    3. Current 64-bit calling conventions and 8-bit stack address arrangement are the real killers!
    3.1. We have 16 general-purpose registers, but we most commonly use only 5 of them (rcx, rdx, r8, r9, and rax).
    3.2. 8-bit address arrangement is unmanageable at large scale. How are we supposed to keep track of addresses in a program with 100k lines of code and 100 procedures?
    4. Most of what is said about asm really relates to CPU architecture and OS design decisions.

  • @jp5000able
    @jp5000able 2 years ago

    Back in the early '80s I did some 6502 assembly programming. What made it so difficult was that the CPU was only 8-bit. There were no instructions for 16-bit numbers or floating-point numbers.
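To show what "no instructions for 16-bit numbers" means in practice, here is a C sketch of how 6502 code adds two 16-bit values: clear the carry, add the low bytes with ADC, then add the high bytes with ADC so the carry propagates. C stands in for the assembly here; the function name is hypothetical, and the comments map each step to the 6502 instructions it models.

```c
#include <stdint.h>

/* 16-bit addition built from 8-bit adds plus a carry flag. */
static uint16_t add16_via_8bit(uint16_t a, uint16_t b) {
    uint8_t a_lo = (uint8_t)(a & 0xFF), a_hi = (uint8_t)(a >> 8);
    uint8_t b_lo = (uint8_t)(b & 0xFF), b_hi = (uint8_t)(b >> 8);

    unsigned lo_sum = (unsigned)a_lo + b_lo;   /* CLC; LDA a_lo; ADC b_lo      */
    uint8_t carry   = (uint8_t)(lo_sum >> 8);  /* carry flag out of bit 7      */
    uint8_t r_lo    = (uint8_t)lo_sum;         /* STA result_lo                */

    uint8_t r_hi = (uint8_t)(a_hi + b_hi + carry); /* LDA a_hi; ADC b_hi; STA result_hi */
    return (uint16_t)((r_hi << 8) | r_lo);
}
```

Every wider operation, and all floating-point arithmetic, had to be built up the same way, usually as a software library of such routines.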

  • @Alex-op2kc
    @Alex-op2kc 3 years ago

    Creel's back on his cubemaps!

  • @coder2k
    @coder2k 3 years ago +1

    Looking forward to seeing that next video you already teased :)

  • @controlflow89
    @controlflow89 3 years ago

    Absolutely amazing channel, keep up the great work!

  • @greywolf271
    @greywolf271 3 years ago

    Feel the compiler, Luke. Be the compiler. Only then will you be a compiler Jedi.

  • @pierce8308
    @pierce8308 3 years ago

    Two dumb questions:
    1.) 2:15, by "single clock cycle", do we mean the cycle for a pipeline stage? I recall that for non-pipelined processors, 1 clock cycle describes the execution time of one instruction, and the cycle must be long enough to accommodate even the slowest instruction. So is "1 clock cycle" usually a term for one pipeline stage delay (of the slowest stage), since most processors are pipelined these days?
    2.) 6:00, aren't ASM instructions atomic in the sense that they *will* complete whole? As in, ASM instructions are not atomic with respect to each other, since the execution steps of two ASM instructions can be interleaved (like in the INC example you described), but they are atomic with respect to themselves: a *single* ASM instruction will complete whole and not be left incomplete; for example, it can't be interrupted by an incoming interrupt/trap.
    Just started learning about Comp Org and Arch, so pardon if the queries are too silly to ask.
    Thanks

  • @thingsiplay
    @thingsiplay 3 years ago +2

    The biggest problem with assembly is that it is not portable to other architectures. That is why I would never consider learning it, even if I ever got to that level of knowledge.

    • @WhatsACreel
      @WhatsACreel  3 years ago +1

      That's true. Always a trade off between performance and portability. High level languages are excellent too :)

    • @thingsiplay
      @thingsiplay 3 years ago

      @@WhatsACreel Yes, but the best thing is that we can combine, mix and match, and use the best of both worlds (to an extent).

  • @LukeAvedon
    @LukeAvedon 3 years ago

    Wonderful video! Glad you are back.

  • @sikkavilla3996
    @sikkavilla3996 3 years ago

    Happy Holidays @Creel!

    • @WhatsACreel
      @WhatsACreel  3 years ago

      Happy holidays to you :)