Dear Christopher, I deeply appreciate your excellent and well-described tutorial. Not only did I finally manage to understand arrays, but also other topics such as offsets.
One of the most informative videos I've seen. Took me from totally lost, to feeling confident in arrays now. Thank you!
Thank you!
First explanation of the .balign that actually makes sense. Thank you!
Thanks for the video. There are a few things I noticed at first glance:
1. line 18 is redundant if you move the compare to the end of the loop (saving one cycle)
2. line 19 LDR R1 could/should be moved outside of the loop as R1 is never changed (more cycles saved :)
3. You could put the LSL shift into the STR instruction directly
4. You could even combine the ADD into the STR using [R1, #offset]! and do your comparison directly on R1
...and then there are all those pops and pushes, which should really be grouped into one command, e.g. POP {r1-r3}.
Sorry to be pedantic. It's just that there's a lot of fun to be had in refactoring ARM code, as there's a LOT of scope for combining logic into one instruction, making sure memory access is minimised (looping using registers only if at all possible).
I started writing ARM2 assembler in '92 (8MHz, single core), and in those days every cycle counted...after all these years...I see dead cycles!
Hopefully this is useful info to someone watching :)
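For anyone following along, here is a rough sketch of points 2 to 4 in GNU assembler syntax for 32-bit ARM. The video's listing is not reproduced in this thread, so the label `a`, the 100-word array size, the stored values, and the routine names `fill_indexed` and `fill_postinc` are all assumptions for illustration, not the original code. Note that the second routine uses post-indexed addressing (`[r1], #4`) rather than the pre-indexed `[R1, #offset]!` form mentioned in point 4, so that the first store lands on the first element.

```asm
        .data
        .balign 4
a:      .skip   400                     @ assumed: room for 100 words

        .text
        .global fill_indexed
        .global fill_postinc

@ Points 2 and 3: base address loaded once, shift folded into the store,
@ and a single compare at the bottom of the loop.
fill_indexed:
        ldr     r1, =a                  @ r1 = base address, set up outside the loop
        mov     r2, #0                  @ r2 = element index i
fill_indexed_loop:
        str     r2, [r1, r2, lsl #2]    @ a[i] = i (address = r1 + i * 4)
        add     r2, r2, #1              @ i = i + 1
        cmp     r2, #100                @ assumed element count
        blt     fill_indexed_loop
        bx      lr

@ Point 4 (post-indexed variant): the store also advances the pointer,
@ and the loop compares the pointer itself against the end address.
fill_postinc:
        ldr     r1, =a                  @ r1 = pointer to the current element
        add     r3, r1, #400            @ r3 = address one past the last element
        mov     r0, #0                  @ assumed value to store (zero fill)
fill_postinc_loop:
        str     r0, [r1], #4            @ store at r1, then r1 = r1 + 4
        cmp     r1, r3
        blo     fill_postinc_loop       @ unsigned compare, since these are addresses
        bx      lr
```

Neither routine touches the stack, which is where most of the saving comes from; the loop body is down to three or four instructions.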
Very clear and informative. Subscribed.
Awesome video. How hard do you think it would be to expand on this and implement a bubble sort?
You saved my weekend
This code can be optimized. You keep loading R1 with memory location a, every iteration. That’s a bit useless. Also this approach is more efficient if you store from back to front. Then you just have to decrement and then you can omit an extra compare. But that’s a bit case dependent.
You have a great ability to teach!!!
Good catch. You can move that instruction out of the loop to optimize things a bit in this case. This won't work in cases where you can't dedicate an entire register to hold the base address during the life of the loop (for example, calling a function during each iteration). In those situations, it is probably better to just load the base address with one instruction instead of pushing / popping from the stack with multiple instructions.
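A quick sketch of the back-to-front idea, again assuming a 100-word array at a label `a` and storing the index into each element (the routine name `fill_backwards` is made up for this example). Because `SUBS` sets the flags as it decrements, the separate compare can be dropped:

```asm
        .data
        .balign 4
a:      .skip   400                     @ assumed: room for 100 words

        .text
        .global fill_backwards
fill_backwards:
        ldr     r1, =a                  @ r1 = base address, loaded once
        mov     r2, #100                @ r2 = number of elements left (assumed count)
fill_backwards_loop:
        subs    r2, r2, #1              @ i = i - 1, and update the flags
        str     r2, [r1, r2, lsl #2]    @ a[i] = i (STR leaves the flags alone)
        bne     fill_backwards_loop     @ keep going until i has reached 0
        bx      lr
```

The last pass stores element 0 and then falls through, so all 100 elements are written. As the comment above says, this only helps when the order of the stores does not matter.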
This was SOOOOOO helpful. Thank you so much!
Thanks Christopher, thought you did a nice job with this.
Thank you so much for this video. So helpful!!
Is R1 your implementation of the frame pointer from the frame record, and is R2 essentially storing where to look on the stack?
R1 is essentially the base address, R2 is the offset against this base address which gets incremented each time around the loop until the limit is met.
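To make the base-plus-offset idea concrete, here is a small sketch that reads one element. It assumes 4-byte elements and that R2 holds an element index that is scaled to a byte offset; the array contents and the name `get_third` are hypothetical and not taken from the video.

```asm
        .data
        .balign 4
a:      .word   10, 20, 30, 40          @ hypothetical array contents

        .text
        .global get_third
get_third:
        ldr     r1, =a                  @ r1 = base address of the array
        mov     r2, #2                  @ r2 = index of the element we want
        lsl     r3, r2, #2              @ r3 = byte offset (index * 4)
        add     r3, r1, r3              @ r3 = base address + offset
        ldr     r0, [r3]                @ r0 = a[2], which is 30 here
        bx      lr
```

Nothing here involves the stack or a frame pointer: `a` lives in the data section, and R1 simply holds its address.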
Appreciate your effort, great video
Good explanation. Thanks
How can I run and debug a program in CCS without external hardware? Is it possible? Someone help me, please.
Thanks man! Helped me greatly!
thx m8
Hi, can you help me with my assembly test?
My teacher is using your video.
do u play osu
Thanks for sharing this well produced video! :)