we humbly request the O(n) solution
SIMD if you please.
We urgently request the O(n) solution
Fuck that, you need the O(1) solution.
@@Erich-Lab wouldn’t that just be a large enough FPGA or ASIC that takes the whole array as input, simultaneously computes the sum of every pair of numbers, and returns a solution? Since everything is done in parallel, assuming you have enough FPGA resources and the array is already loaded, it should take only as much time as a single addition, no matter the size of the array.
@@conorstewart2214 it still takes O(n), because unless you have infinitely many threads for the parallelism you can't keep up: if n is 1 billion, there's no way you have 1 billion threads.
With a finite number of threads (say 100) it becomes n/100, which still grows as n goes to infinity, so it is still O(n).
such an interesting video... never in a million years would I expect someone to do twosum in assembly. great job!
a million years is a long time
@@snapphanen indeed. I'd expect them to have expected it already
@@danisob3633 well I'm smart and I expected for a million years that you would have expected them to have expected that already.
I still have nightmares about assembly from almost 20 years ago on MASM32; I'm staying with C. Respect, you make it look so easy.
I'm currently learning MASM32 in university lol. Not the worst. Microsoft's x64 is pretty bad though with its shadow space. Uhh awful
@@Silent. me too, learning assembly right now
I'm learning RISC-V rn, funny times
interviewer: can you solve this problem?
no one: sure, my language of choice is assembly
Very, very impressive, and excellent to see some real(!!) assembly skills on youtube. You would have liked the 80's and 90's 'golden era' for assembly coders, certainly from a game making perspective. Things have certainly moved far with CPU instruction sets, but reassuring to see so much familiarity with those opcodes and concepts. Subbed.
M68K assembly is my favorite, as I love modding classic Sonic.
Might try two-sum on that processor😊
this is awesome, really makes me want to learn assembly. You make it look so trivial, for me it would be a nightmare!!
Been doing it for a while lol. Best time to start is now!
@@LowLevelTV I can tell! How did you learn? I have always struggled to find any good material for assembly.
Try out capture the flags. PicoCTF is a good one
I love assembly! And I loved your cheeky online assembly work around.
Did you get assembly in college? I am amazed that most colleagues thirty-something and younger never did anything in assembly. I think I was the last generation that did a hell of a lot of assembly, mainly Z80, 8086 and 68000. I studied EE (digital electronics), and back in 1990 you didn’t have Arduino and cool ESP32s that you could program in C. So when we designed an SBC, we also needed to make it do a certain task, and that was basically done in assembly. It was the quickest and easiest way to iterate.
I did a bit of AVR assembly in my EE pipeline in college and I played a lot of capture the flags as well :) I feel like assembly did the most for teaching me how and why code works/breaks.
@@LowLevelTV I totally agree! To me assembly made me understand how computers work, and through that made my EE study simple.
I didn’t know you also did EE?! Us EE majors for the win 😁 I actually did EE because back in 1990 there wasn’t a dedicated CS program yet. It was bunched in with EE.
I’m currently a Comp Sci major in college and we are writing some programs in MIPS assembly. So at least a few colleges are still teaching assembly.
@@modipy5703 cool! And MIPS is a nice CPU, lots of registers and great branching. Definitely a good foundation for C and higher languages. Which college do you attend?
@@CallousCoder I attend university of Wisconsin - Oshkosh. We’ve learned Java, then C, then MIPS so far. Learning MIPS has opened my eyes to the basics of computers and how doing certain things in higher level languages can speed up or slow down programs. Assembly is hard but satisfying once you get it to work, and even more satisfying to optimize.
Love this - pleased to see younger developers with such enthusiasm for lower level development skills
Amazing video! Doing ASM this semester and loving it. We are doing dbl precision fp and bit wise ops right now. Just got done creating custom functions and c function calls. Love this class it’s fun.
When accessing pointers multiple times, it helps to store the pointer in a temporary register (if you need it from the stack) and then do your "lea" operations as needed. The "lea rbx, [rbx..]" method works fine if you only need to access it once, but in this case it would've been better not to clobber rbx and instead store the first result in another temporary register, removing the need to set up rbx again by accessing the stack for the second result. Also, there were still plenty of registers available (r10-r15), so using the stack wasn't necessary in this case.
You should post the code as "easy to understand and fast solution"
never seen someone use "ii" for the inner loop haha, very cool though! I'll pretend I followed most of it 🙃
That's some chaotic evil type shit there
Don't we just use j?
He is an amateur.
the __attribute__ trick was really interesting - I wonder if it would work to integrate a C function into some asm code with non-standard calling conventions... and reminds me of another question:
is there a way to dynamically parse C structures into nasm files in a mixed project so that the structures can change as needed by the expanding C codebase without having to rewrite asm code that also uses them?
also, had a quick go at the O(n) solution in C, looks like it would be easy enough in asm too...
also also, why put input values on the stack? you had enough registers to not need the stack at all except for storing registers that can't be clobbered, and then you wouldn't have needed to keep pulling those values back from memory...
also also also, thinking now about the extension to this problem where there is a list of targets to find pairs for in the same dataset, maybe has to be sorted lists for reasonable optimisation...
when i think too much about how computers work it hurts my head, great work!
The main thing to learn here is that C inline assembly syntax and boilerplate is terrible.
One of the only good reasons to learn pascal is that freepascal does inline assembly (mostly) right.
Big ups for torturing yourself into getting used to this hell.
That being said, here are my ASM nitpicks:
Mov'ing into ebx just to compare eax and ebx, when you scratch ebx right after the compare, instead of doing 'cmp eax, dword ptr [rsp+0x20]'.
Using 'inc' instead of adding 1 tends to be slightly slower on some CPUs, specifically some intel CPUs.
Using dword mov operations while you allocate the full 64 bytes anyway - you are saving a bit of instruction cache when you should be saving instruction cache AND memory cache. If rsp isn't aligned to cache boundaries (and it often won't be) you might lose some cache just by accessing your arguments instead of reserving it for an already cache-heavy array search operation.
While XORing a register to itself does often cancel during instruction parsing, in my experience it sometimes won't (for whatever reason) so it's usually best practice to xor dwords even on higher registers (I think it has something to do with μ-instruction hardware often wanting to combine successive instructions but I'm not certain exactly where the slowdown comes from).
I don't just have critique; I really like how you did 'lea rax, [rax+4]' even though 'add rax, 4' is less complex and fewer letters, because it makes it completely clear without a comment that rax is supposed to be a memory pointer, not a normal value. You also did 'cmp' to memory addresses in multiple places, so my above nitpick is clearly a freak occurrence.
Can I ask why you know all this? What do you do for work?
@@guinea_horn I'm just an enthusiast assembly programmer. You just get these sorts of esoteric opinions from doing enough pure assembly, userland programming and being actively interested in optimizing things like memory footprint and performance.
Isn't inc only slower on old CPUs?
This is pretty cool, though if you wanted to make it easier to follow I'd do it offline instead and get syntax highlighting. I prefer NASM syntax over GAS Intel, but at least you didn't do it in AT&T syntax. That would've been the worst.
This video was more informative than my professor who taught for a semester
:D glad you learned something
i mean not really, he doesn't explain things at all, I feel
A quote I liked, “Education isn’t something you get. It’s something you take.”
You’ll learn a lot more spending time reading, solving as many exercise problems as you can and writing as many programs as you can than you will sitting down in a lecture and expecting that you'll magically learn everything.
@@Dsksea exactly, just watching this video probably won't do much more than maybe act as a reference
@@Dsksea one thing I've always said about college is that I never learned anything from college, but the things I learned in college are invaluable and I'd do it all over again just for those things.
Assembly is alien to me. However, I appreciate what you did here.
Employer: so what is your name?
LLL: I solve leet code problems in asm
Employer: Hired
Now I remember how I used to look at code with 0 programming experience
love your videos, it's always a fun learning experience, keep it up
Glad you enjoy it!
Can you do a harder problem in assembly? You did an easy one and my mind is already blown
That’s really impressive ❤
Next, write doom in asm.
I regret that I never paid attention to these in the classes. Now I don't understand this at all, and I will have to study it all over again.
Well done, very informative!
As a noob who does web dev, assembly is full of mind-bending stuff you have to know.
I've been trying to do leetcode in assembly; I use it mainly for the challenges that need you to make some math operators. But I've also used it for quick text windowing and such, or if I need to brute force something.
wow, watching this just a week after the infamous bomb lab. Good job mate
i just can say you are doing a wonderful job , you are on another level
From the title I knew you were doing something I did some time ago; just to prove it, I did 64-bit signed floating-point multiplication.
Great job! The power of assembly at its best!
My brain wants to explode but I continued watching.
This was impressive. I don't understand where the pointer arithmetic is done. Shouldn't it be a distinct opcode?
There are a few ways to do pointer arithmetic on x86, LEA is a very common way but you also see it in a few other places such as:
sub rsp, 0x40
mov qword ptr [rsp+0x10], rdi
Pointer arithmetic is treated the same as integer arithmetic by the CPU, since the only real difference is that your additions/subtractions are scaled by the size of the data type being pointed to. Assembly doesn't know what's being pointed to, however, which is why he had to specify "qword ptr" when dereferencing.
Insane Mode would be manually inputting electrical charges (0s and 1s) into your processor to enter the asm opcodes
I love the video. It would have been even better if you hadn't prepared the algorithm but implemented it live, with all the iterations and bug fixing. I already understand every instruction but have the most trouble "interacting" with assembly while developing.
I used to love programming in assembly on the 68000... Then I came across weird instruction choices by gcc, based on knowing execution times, and then out-of-order execution on RISC, and realised the compilers would win.
I still would have tested for Outer loop value at i being > target to skip entire inner loop iterations.
Hello LLL team, can you make a full tutorial on writing assembly for ARM and arm64 CPUs? I want more in-depth knowledge of how the loader loads the program into memory and how all the sections are prepared before main() runs.
what a cool idea! great job!!
I can't decide if I should or shouldn't call out that this isn't pure asm, as the variables at the top are C variables; in assembly, "int" is an interrupt instruction.
I tried this last year for Advent of Code. Unfortunately I was using the Amstrad CPC, which is only an 8-bit CPU. Most of the questions required way more than 64K of RAM to solve.
This is solvable in 64K; actually, most easy-to-mid leetcode questions can be done on a Z80 with a whopping 64K. The only problem is loading test data, which you either need to compile into your binary as a table or generate as pseudo-random values.
Something like this isn't too bad. When you're asked to use unixdict or something similar things get messy. It's possible with bank switching but it's not easy.
Not all heroes wear capes
This was easier to follow than I imagined 😁👍
Thank you so much! :) Any tips/recommendations on low level language courses or tutorials?
May I ask what hardware and software combination do you use for your screen annotation? Looked very fluid to me.
"Leetcode for some reason doesn't have assembly as a choosable language"
I wonder why lmao
he definitely has a glow up
danger to humanity, you are.
- Master Oogway
#LowLevelMafia #AssemblyGrindset 😤
✊✊✊
Very nicely done! 👌
Thank you very much!
Want to learn this solution just for giggles during an interview.
Why use a compiler when you are the compiler
Impressive. 👏 Loved watching you do this. Are you a mainframe system engineer?
You didn't need a hashmap, an array with 2^32 entries would've been enough and definitely easier to do in asm.
That's ~2 GB of memory, would you even be able to run that on LeetCode?
i already love this and i haven't even watched it😄
Please do a coding interview and pull out assembly
i thought i would try my best to understand, but it went completely over my head
I'm going to be honest, I could barely follow that. I still subscribed.
I guess only Microsoft x64 uses rcx, rdx, r8, r9, then stack-allocated params, with floating point in the xmm registers
Yeah for some reason Microsoft made their own 64 bit calling convention
I've never heard of r8 and r9 but then again I've only used 16-bit x86 which didn't have them
Wonder if a 'dumb' SIMD solution could beat everyone? What about unrolling the loops?
Inline assembly is hell
Well done! I couldn't even do them in Python.
Leetcode machine code, when?
I just learned HTML and CSS, is assembly as easy as those to learn? Please help, I am trying to get a job soon
"So now we allocate RAX to stack, pretty simple".
Me: 👁👄👁️
"We''re gonna set up the stack frame by pushing rbx"
Yeah sure bro 😂
Anyways, would have been cool if you had solved this using rep scasd or even simd, oh well.
any recommendations to learn assembly like this?
He's using GCC inline assembly, which in my opinion has horrendous syntax. It's much easier to use a dedicated assembler and use extern keywords to make the function visible to C
@@williamdrum9899 thank you
I would watch Chibiakumas on youtube, his videos are quite good
@@williamdrum9899 thanks man, i will check it right away
if this guy shows up as my interviewee == UNO Reverse
Great video!
my mind is blown ...but wtf did the other guys do with even lower runtime...
thats what I want to know lmfao
It's c++
why are we storing the parameters on the stack, i.e. putting them into memory? i would just work with rdi, rsi, rdx, rcx the way they come (I never saw anywhere that you had to store them on the stack)
This is the convention a C compiler uses to write assembly code. It does this so that you can call any function within another function without everything falling apart. Otherwise managing all these functions becomes a huge mess
@@williamdrum9899 can you specify by what you mean by everything falling apart?
@@Cagri2001 They’re “caller”-saved registers, so whenever you call a function you have to store them because they can be used/overwritten. I guess the C compiler does this automatically to avoid keeping track of whether there’s a function call
@@jackytaly thank you
If I have callee saved registers, then it would have been inside the procedure where we push/pop those registers?
@@Cagri2001 yup, callee saved registers always have to be saved. Caller saved can be cheaper to use if you don’t have any function calls bc then they don’t have to be saved.
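A minimal sketch of the pattern this thread describes (NASM-style, System V x86-64 assumed; the function name is made up): the callee preserves any callee-saved register it touches, while scratch registers can be clobbered freely.

```asm
; rbx is callee-saved under the System V AMD64 convention, so a function
; that uses it must save and restore it; rax/rcx/rdx/rsi/rdi/r8-r11 are
; caller-saved scratch and need no such care inside the function.
my_func:
    push rbx            ; save the caller's rbx
    mov  rbx, rdi       ; now free to use rbx internally
    ; ... body, may clobber rax/rcx/rdx freely ...
    pop  rbx            ; restore before returning
    ret
```

Skipping the push/pop and using only scratch registers is exactly the "cheaper" option the reply mentions, as long as the function calls nothing that might clobber them.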
Something that has annoyed me for a very long time is the brackets when using LEA. I find it very misleading since no dereferencing is actually taking place. The Motorola 68000 does the same thing with its LEA instruction
This isn't a dig at the way you write by the way. I know that you're required to use brackets. It just seems odd that the language was designed that way. As it stands we have to say that [] is the dereference operator EXCEPT when using lea and that's just confusing for new users
Yeah the fact that the lea addressing syntax is backwards is a little annoying.
It doesn't dereference anything, but it could be understood as loading the effective address of the contents at the address in the brackets, I guess?
@uu uuu, That's true, but I still forget sometimes. Especially when you have pointers to pointers... Don't get me wrong, I'll gladly take it over C's endless sea of * and &, but it's still a bit odd
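For anyone tripped up by the brackets, the whole distinction in two NASM-style lines (registers chosen arbitrarily for illustration):

```asm
lea rax, [rbx + rcx*4]   ; rax = rbx + rcx*4       (address math only, NO memory read)
mov rax, [rbx + rcx*4]   ; rax = *(rbx + rcx*4)    (actual dereference, loads from memory)
```

Same bracketed expression, but only mov touches memory; lea just reuses the address-calculation hardware as a fast add-and-shift.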
Any book or any material really for recommendation?
Thank you for this
Damn... I know several programming languages, and this felt like foreign lingo to me.
Using the n^2 solution and it's in the top 10%. Damn.
AWESOME!!!
Thanks!!
Lets see Leetcode problem 420 in assembly!
he is a literal chad🔥🔥
att syntax gang
What kind of job does this knowledge get you and does it pay well?
So cool !
This asm can be optimised more
does leetcode allow you to use att syntax?
I'm stuck, I've been looking for a good resource to learn C and assembly, can you help me please?
I'm not sure there are solutions with hashmap in C on LeetCode
most likely just those 10%
madlad
my head hurts
Now, because your code only runs in O(n^2), please submit the O(1) solution before we can accept your code as a valid solution.
My brain is horrible, it immediately returned a bad pun after hearing twoSum 😅 the last twosum I processed was with my girlfriend.
Now, that's low level
sum(array)
Eyy fellow Intel syntax enjoyer! :D
You know it!
whats ur operating system? i wanna know where the bottom right section is from
More than likely it's 64-bit x86 Linux
linux using i3wm
Isn't calling malloc kinda cheating?
Your test case failures should have been immediately obvious before you ever got that far. A nested loop like that will almost _always_ go from `i = 0 .. n - 1` in the outer loop, and `j = i + 1 .. n` in the inner loop (and yes, I started writing this comment before the point in the video when you ran the tests)
fuck yeah
Assembly?😵
Do it in O(n log n) with recursion please 🙏🙏🙏🙏
could've been better if written in Rust
imagine you're writing this code in an interview. Easy offer xD