All about MEMORY // Code Review

  • Published on 22 May 2024
  • Keep exploring at brilliant.org/TheCherno/ Get started for free, and hurry: the first 200 people get 20% off an annual premium subscription.
    Patreon ► / thecherno
    Instagram ► / thecherno
    Twitter ► / thecherno
    Discord ► / discord
    Code ► github.com/St0wy/GPR4400-Phys...
    Designing a Physics Engine in 5 minutes ► • Designing a Physics En...
    Send an email to chernoreview@gmail.com with your source code, a brief explanation, and what you need help with/want me to review and you could be in the next episode of my Code Review series! Also let me know if you would like to remain anonymous.
    CHAPTERS
    0:00 - 2D Physics Engine
    13:17 - Heap allocation, memory fragmentation and the CPU cache
    18:30 - Logging performance considerations
    23:12 - Smarter memory allocators
    29:12 - Allocate memory once ahead of time when possible
    This video is sponsored by Brilliant.
    #CodeReview

Comments • 213

  • @TheCherno
    @TheCherno 1 year ago +183

    Who's excited for part 2?
    Keep exploring at brilliant.org/TheCherno/ Get started for free, and hurry: the first 200 people get 20% off an annual premium subscription.

    • @pushqrdx
      @pushqrdx 1 year ago +2

      Can you please point me to that sweet Visual Studio color scheme you're using?

    • @heman922
      @heman922 1 year ago +3

      Plz do a series of cpu

    • @PhoenixDigitalGamer
      @PhoenixDigitalGamer 1 year ago +1

      Can u make tutorial on creation of game engine Cinematics system. Please :)

    • @mr.anderson5077
      @mr.anderson5077 1 year ago

      yes please

    • @harshsulakhe2720
      @harshsulakhe2720 1 year ago +3

      can u plz make a complete VIDEO ON ASSEMBLER like the one similar to LINKER AND COMPILER

  • @Stowy
    @Stowy 1 year ago +357

    Thanks a lot for looking at my code ! For the logging, I was using spdlog, but then I removed it because I wasn't able to import it using FetchContent haha. This is very useful feedback and I can't wait for the part 2 !

    • @ThePowerRanger
      @ThePowerRanger 1 year ago +4

      Cheers, good luck for your classes!

    • @blazefirer
      @blazefirer 1 year ago +18

      2rd

    • @Stowy
      @Stowy 1 year ago +5

      @@blazefirer english is not my first language lol, my b

    • @blazefirer
      @blazefirer 1 year ago +13

      @@Stowy it's ok. I saw that there was only one reply and I would be the 2nd, so I couldn't resist making the joke

    • @ohmree
      @ohmree 1 year ago +1

      I suggest taking a look at xmake as a replacement for cmake, it probably has spdlog in its repos and is just a pleasure to use in general.

  • @anon_y_mousse
    @anon_y_mousse 1 year ago +197

    As a predominately C developer, I agree with and applaud his choice of adding "pp" to the end of file names to differentiate C and C++ header/source files. They are separate and it should be noted. Arena allocators are a good idea and I've implemented several that I use in my own libraries. Heap allocation need not always be super expensive, even with "vectors", and the mitigation technique I learned years ago that still works beautifully to this day is to scale by a factor of two and always reserve memory starting at some power of two. As to the comment about logging, yes, it is a Windows "feature" to slow down by such a large factor when logging to a console. If you use a Linux distro of nearly any variety you'll be surprised by how quick the terminal updates as compared to Windows.
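
The growth policy this comment describes (start at a power of two, double when full) could look roughly like the sketch below. `IntBuffer` and its members are hypothetical names chosen for illustration, and the starting capacity of 16 is an assumption. `std::vector` already grows geometrically, although its exact factor is implementation-defined.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical growable array of ints that starts at a power-of-two
// capacity and doubles whenever it runs out of room.
struct IntBuffer {
    int* data = nullptr;
    std::size_t size = 0;
    std::size_t capacity = 0;
    std::size_t reallocations = 0;  // counts how often the heap is asked for memory

    IntBuffer() = default;
    IntBuffer(const IntBuffer&) = delete;
    IntBuffer& operator=(const IntBuffer&) = delete;
    ~IntBuffer() { std::free(data); }

    void push(int value) {
        if (size == capacity) {
            // Start at 16 elements (an assumed value), then scale by two.
            const std::size_t newCapacity = capacity == 0 ? 16 : capacity * 2;
            void* grown = std::realloc(data, newCapacity * sizeof(int));
            if (grown == nullptr) throw std::bad_alloc{};
            data = static_cast<int*>(grown);
            capacity = newCapacity;
            ++reallocations;
        }
        data[size++] = value;
    }
};
```

With doubling, the number of heap requests grows only logarithmically with the element count, which is why the cost per push stays small on average.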

  • @simonesasso8379
    @simonesasso8379 1 year ago +121

    Yes, implementation and profiling of the optimizations would be super interesting to see!

    • @crumbled9774
      @crumbled9774 1 year ago +4

      yes yes yes. Can't wish for anything better!

    • @ChrisM541
      @ChrisM541 1 year ago +2

      Totally agree, that would be super interesting.

    • @ibrahimmahdi1299
      @ibrahimmahdi1299 1 year ago

      can't wait for a video like that from the best "TheCherno"

  • @miguelguthridge
    @miguelguthridge 1 year ago +8

    At 14:40 where you're talking about cache misses, there's a relevant article which is really good called "Your computer is not a fast PDP-11"

  • @mr.anderson5077
    @mr.anderson5077 1 year ago +8

    Cherno has a huge backlog of "the topic for another video", so please keep them coming. Yes, we do want a CPU cache, memory fragmentation, and whatnot in the multiverse video

  • @crystalferrai
    @crystalferrai 1 year ago +6

    31:45 Good advice about preallocating vectors. If this is a function that runs every frame, I would take it a step further and make the vectors persistent. Clear them at the start of the function and reuse them. This way the memory remains allocated and keeps getting reused. Another option would be to use an auto-resetting frame allocator like you mentioned earlier. However you go about it, the main idea is to not make new heap allocations every frame.
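
A minimal sketch of the "persistent vector" idea from this comment, assuming a per-frame function that collects collision pairs. `World`, `Collision`, and `Step` are hypothetical names, not the reviewed project's API. The point is that `clear()` resets the size but leaves the capacity alone, so later frames reuse the same heap buffer.

```cpp
#include <cstddef>
#include <vector>

struct Collision { int a; int b; };  // hypothetical stand-in for the engine's type

class World {
public:
    // Called every frame. The scratch vector is a member, so its heap
    // buffer survives between calls.
    std::size_t Step(std::size_t pairCount) {
        m_Collisions.clear();  // size becomes 0, capacity is kept
        for (std::size_t i = 0; i < pairCount; ++i)
            m_Collisions.push_back({static_cast<int>(i), static_cast<int>(i + 1)});
        return m_Collisions.size();
    }

    std::size_t ScratchCapacity() const { return m_Collisions.capacity(); }

private:
    std::vector<Collision> m_Collisions;  // reused across frames
};
```

After the first few frames the vector has grown to its working size, and further calls to `Step` with similar counts should not allocate at all.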

  • @ThePhyskid
    @ThePhyskid 1 year ago +19

    I'd really be interested in seeing how you add the optimizations. In particular, I'd be interested in seeing how you clean up the memory used by the arena allocator once you're left with holes.

  • @jonathangrahl
    @jonathangrahl 1 year ago +8

    Great topic! This has been on my mind for the last few weeks while implementing my path tracer and SAH BVH, and the optimisations really add up. Especially referring to objects by index and saving them in a 1D array.

  • @Mnmn-xi6cj
    @Mnmn-xi6cj 1 year ago +17

    Would love to see you profiling this after your first look at it. I'm sure the stack allocation and growing of the vector each frame hits like a truck. That would also allow you to show some before/after benchmarks!

  • @thwKobas
    @thwKobas 1 year ago +38

    I left C++ about 7 years ago, and this brings back so many memories and a smile to my face. I've been watching your videos for a few weeks now and must say good job and keep uploading. :)

    • @tathagatmani
      @tathagatmani 1 year ago +1

      What did you switch to ?

    • @matthewe3813
      @matthewe3813 1 year ago +1

      @@tathagatmani probably rust or c

    • @thwKobas
      @thwKobas 1 year ago +3

      @@tathagatmani Actually I switched first to objective-C and then swift :D Doing iOS mobile development now

    • @Alperic27
      @Alperic27 1 year ago

      c++ has evolved a looot … but he seems to be stuck in c++9x style.

  • @jeffcummings3842
    @jeffcummings3842 1 year ago +6

    You really caught my attention when talking about the CPU cache, as I've done some work with Assembly Language programming WAY in the past, but yeah, understanding how that works is an amazing detail for optimization. OMG, great idea with the logging to a file vs console, I'm just getting to the point in my project where it's starting to become medium sized, and logging is an issue already, so great to know that logging to files is more efficient...plus the macros... it probably helps that I am watching your video at a time when I'm considering re-working my entire codebase for my main project too. LOL OMG, that's amazing that you can pre-allocate memory and pass an allocator to the vector class, I'm totally going to look into this and try it! Great video, thanks for sharing.

  • @Klusio19
    @Klusio19 1 year ago +5

    I just started learning C++, currently I (I think) finished learning OOP concepts, and this video is so interesting for me actually! The stuff about the memory and access times is pretty interesting.

  • @StevenMartinGuitar
    @StevenMartinGuitar 1 year ago +2

    Would def love to see you profile this and then implement the optimisations and profile again! (threading, arena, allocators, less heap etc) great video!

  • @Basel-ll8fj
    @Basel-ll8fj 1 year ago +4

    this series is really fun to watch and very helpful

  • @fellypsantos_
    @fellypsantos_ 1 year ago +4

    extremely valuable knowledge passed here, thanks Cherno ♥

  • @SkyCityInc
    @SkyCityInc 1 year ago +4

    This is awesome, makes me want to write my own physics engine as an exercise. Can't wait for the next video!

  • @HLCaptain
    @HLCaptain 1 year ago +12

    What I would like to see is you optimizing a project based on the recommendations you gave in this video, then comparing the results with the unoptimized version via a profiler. Would be super interesting! Great video though! :)

  • @douglasullman
    @douglasullman 1 year ago +1

    I've been loving your stuff and gotta say the plug for brilliant is brilliant ! I'm going to check that out. Thank you so much Sir.

  • @atraps7882
    @atraps7882 1 year ago +2

    im not even a game developer, i just work on the web and the cloud doing backend stuff but this is really interesting to watch. Subbed!!

  • @uploadschedule
    @uploadschedule 1 year ago +2

    At the moment I don't have time to watch it, but I will watch this vid later and I'm sure it's interesting, because videos about how hardware components work are always something I like learning about :D

  • @viraatchandra8498
    @viraatchandra8498 1 year ago +3

    for c++ simple logging, you can look at `sync_with_stdio(false)` and `std::cin.tie(NULL)` calls to accelerate your `cout` code a bit. `printf` will in general be faster though because it doesn't deal a lot with multi threaded scenarios. there are even faster ways to output logs, but of course, its non trivial overhead.

  • @sixtenhugosson
    @sixtenhugosson 1 year ago +4

    If anyone wants to learn more about memory arenas, there's a good write-up called "Untangling Lifetimes: The Arena Allocator" by Ryan Fleury.

  • @mementomori7160
    @mementomori7160 1 year ago +2

    I really liked this video, all in for part 2

  • @ShaunYCheng
    @ShaunYCheng 1 year ago +16

    I'm not a game dev but this is still very educational.

  • @Sebanisu
    @Sebanisu 1 year ago +1

    Just realized you are still doing code reviews and this one had 3 videos. So Now I got my afternoon planned out heh.

  • @Beatsbasteln
    @Beatsbasteln 1 year ago +1

    this was fascinating. can you make a video about how to make an arena allocator and then show how you use it when creating vectors?

  • @aaron6807
    @aaron6807 1 year ago

    FINALLY! I'VE BEEN WAITING FOR THIS EPISODE FOR AGES

  • @cyphre117
    @cyphre117 1 year ago +1

    Would be great to hear you talking about Static vs Dynamic libraries!

  • @IkeVoodoo
    @IkeVoodoo 1 year ago +4

    Great video, though each time I wish that we could see the final optimized version of the project :D

  • @thehambone1454
    @thehambone1454 1 year ago +1

    Would love a video about the CPU cache and the related!

  • @on-hv9co
    @on-hv9co 1 year ago +1

    I do something very similar with that log macro. It's essentially just an X macro that wraps cerr and uses the ANSI color codes. From there, DLOG and RLOG are called and will log their respective debug/(sparse) release messages

  • @F1nalspace
    @F1nalspace 1 year ago +2

    Nice project and good talk about memory improvements! Memory arenas and transient memory are great and my most used techniques when I do programming these days.
    If you are interested, I have a similar physics project (2D fluid simulation) that is a little bit more complex, due to its multi-threading, integrated benchmark support and 4 versions of C++ styles, where I tried to show the difference between naive/from-the-book C++ programming and data-oriented programming, but didn't get it exactly right, especially the data-oriented part. Just give me a hint and I will send you the details.

  • @Thomas_Lo
    @Thomas_Lo 1 year ago

    Cool reference video for quite a lot of topics. Works well as a refresher :-)

  • @Spartan322
    @Spartan322 1 year ago

    Terminal logging is slow in C++ because most streams, especially cout, tend to flush constantly, whereas most file logging implementations in C++ don't flush immediately on every write.

  • @jef777
    @jef777 1 year ago +4

    This main function looks so nice. I wish mine could look so inviting.

  • @tolkienfan1972
    @tolkienfan1972 1 year ago +3

    Often the dependencies between chained pointers are more important than the fragmentation. I.e. you could explicitly construct a linked list in contiguous memory, but iterating will still involve the CPU waiting for each load to complete before it can calculate the next pointer. Iterating over the exact same nodes, but using an index instead of the next pointers, will be much faster. The CPU can prefetch the cache lines.
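
To make the comparison in this comment concrete, here is a small sketch with the same nodes traversed both ways. `Node`, `LinkNodes`, `SumByPointer`, and `SumByIndex` are hypothetical names. The code only shows the two access patterns and does not measure anything, so any speed difference would need to be confirmed with a profiler.

```cpp
#include <cstddef>
#include <vector>

struct Node {
    int value;
    Node* next;  // link to the following node, or nullptr at the end
};

// Turns a contiguous vector of nodes into a linked list in place.
void LinkNodes(std::vector<Node>& nodes) {
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        nodes[i].value = static_cast<int>(i);
        nodes[i].next = (i + 1 < nodes.size()) ? &nodes[i + 1] : nullptr;
    }
}

// Pointer chasing: each load of `next` must finish before the address
// of the following node is known.
long long SumByPointer(const Node* head) {
    long long sum = 0;
    for (const Node* n = head; n != nullptr; n = n->next) sum += n->value;
    return sum;
}

// Index iteration: every address is computable up front, so loads are
// independent of each other and the hardware can prefetch ahead.
long long SumByIndex(const std::vector<Node>& nodes) {
    long long sum = 0;
    for (std::size_t i = 0; i < nodes.size(); ++i) sum += nodes[i].value;
    return sum;
}
```

Both functions visit identical memory in the same order; the only difference is whether the next address depends on the previous load.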

  • @MrDenniable
    @MrDenniable 1 year ago

    @19:45 About the huge time consumption of logging... You should check out Trice! It speeds up your logging performance on embedded systems :)

  • @enigma7791
    @enigma7791 1 year ago +3

    Yes if you could look at your optimisations and the effect on performance that would be really cool! Often I spend too much time optimising code for very little return. EDIT...but I do note the FPS is massive here anyway so it is difficult to quantify if it's worth it. Maybe throw in something that really puts a strain on the FPS and see the optimisations make it smooth again? Either way great code Stowy and great review Cherno.

  • @-infality
    @-infality 1 year ago +5

    Regarding the slow Windows terminal you may be interested in Casey Muratori's videos about it and his refterm prototype project

    • @Macuyiko
      @Macuyiko 1 year ago

      Was going to mention that as well. He goes into some interesting details about conhost if I remember correctly which is doing a lot of crazy things that make consoles slow on Windows.

  • @squelchedotter
    @squelchedotter 1 year ago +3

    I wouldn't expect that the virtual memory thing matters all that much considering current CPUs don't prefetch across page boundaries anyway. But things like huge pages do have advantages in terms of TLB lookups and hit rates.

  • @paligamy93
    @paligamy93 1 year ago +1

    @8:13 I would not recommend ever starting a name with _, because it's too easy to make a mistake: "Use of two sequential underscore characters ( __ ) at the beginning of an identifier, or a single leading underscore followed by a capital letter, is reserved for C++ implementations in all scopes."
    @13:31 Not only do you want to be using pointers, but ask yourself "Do I need a hierarchy or do I just need several implementations of void ClassName::Update(float deltaTime)"? Because if you don't need a hierarchy, don't use one! Use type erasure. Yes, it's still a function pointer and a potential cache miss, but it will simplify your code structure. Now you have a folder called Entities instead of a type named Entity that everything derives from, and your type-erased entity type defines the contract a type must fulfill to be an entity, instead of saying you HAVE to derive from Entity to be useful here.
    @14:51 Also known as a "cache miss", because the writer was not as concerned about "cache locality".
    @19:59 std::ios_base::sync_with_stdio(false) improves that time considerably, but C++ iostreams are notoriously slow, and the reason is all the safeguarding overhead they carry. The console is slow because it has to render, which as you know is bleh. Logging libraries are the way to go in this case; have them output to files rather than to the console. This is a graphical program, so there shouldn't be a "console out" anyway. Create a new global logger named log or something at the very least. There are multithreaded logging libraries that will attempt to put your logs in chronological order if you don't want to split them.
    @25:41 On the virtual part: the operating system will allocate you a "page" of memory when your current page is full, so it's basically the same thing as a small arena allocator, but it's much smaller than what an arena allocator will give you, and the many system calls to the OS to ask for more "pages" are what make allocation take so long. You're handing your CPU cycles over to the OS, and that's going to mess up your instruction cache because it is code outside your program that's being called. malloc or whatever is going to be a function in a dynamic library, aka a function pointer and more cache misses. Profile your system calls! You may find more than you expect. Also align your types (this adds padding) so that when you do ask for a value it's not going to have to fetch two cache lines because half of your object is on one line and the other half on another.
    @31:47 It looks like the number you're looking for is already computed with the collision pairs as well. You seemed to know you needed to make a vector but made it too early! Make instances as close as possible to where you use them.
    @32:09 I think the multiple solver problem is something that should be handled with a template. From what I saw, you don't need to change your solver dynamically at runtime with the same types. Make your solver something the compiler figures out.
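
The last suggestion in this comment, choosing the solver at compile time, might be sketched as follows. `Body`, `EulerSolver`, `DampedSolver`, and `StepAll` are hypothetical names and the physics is deliberately trivial; the reviewed project's solver interface is not shown in this thread, so none of this should be read as its actual design.

```cpp
#include <vector>

struct Body { float position; float velocity; };

// Two hypothetical solvers that share an interface by convention only.
struct EulerSolver {
    void Solve(Body& b, float dt) const { b.position += b.velocity * dt; }
};

struct DampedSolver {
    void Solve(Body& b, float dt) const {
        b.velocity *= 0.5f;  // halve the velocity, then integrate
        b.position += b.velocity * dt;
    }
};

// The solver type is a template parameter, so the call is resolved at
// compile time and can be inlined; no virtual dispatch is involved.
template <typename Solver>
void StepAll(std::vector<Body>& bodies, const Solver& solver, float dt) {
    for (Body& b : bodies) solver.Solve(b, dt);
}
```

The trade-off is that the solver can no longer be swapped while the program runs, which only matters if the engine actually needs that.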

  • @nathantonning
    @nathantonning 1 year ago

    Great code review.

  • @shalip
    @shalip 1 year ago +1

    please release a video where you implement your suggestions. It would be so GREAT !!

  • @ricardopieper11
    @ricardopieper11 1 year ago

    This is the 1th The Cherno video I watch

  • @cloud9sl98
    @cloud9sl98 1 year ago

    WORKING thx bro

  • @MosiuoaF
    @MosiuoaF 1 year ago

    Thank You!

  • @rajpootmhm
    @rajpootmhm 1 year ago

    Please make a video on handling big data
    Along with memory management and time complexity

  • @sethmoore5903
    @sethmoore5903 1 year ago +1

    I'm curious how the actual defragmentation process works in a game engine and how it affects performance in a simulation where we have lots of circles dying

  • @darioabbece3948
    @darioabbece3948 1 year ago +1

    The project: c++ gameplay
    The cherno explanations: c++ lore

  • @Overminddl1
    @Overminddl1 1 year ago

    Logging to the console on Windows is indeed substantially, like Substantially, slower than on Linux. However, there are ways to speed it up as well, both by using Microsoft's new terminal and by buffering in the program instead of flushing every single log immediately. Still not as fast as on Linux, but it helps a ton.

  • @bu3778
    @bu3778 1 year ago

    damn this was a nice review

  • @dealloc
    @dealloc 1 year ago

    The reason it's slow to write to stdout is that things like std::flush, std::endl and (on a line-buffered stream) new lines ("\n") flush the contents of the cout buffer to the terminal. The text appears instantly because terminals usually have little or no buffering. The same cost applies to files on disk, although that is perceived as faster because the contents are flushed less frequently: the OS buffers them before writing to the file on disk. So it's not that terminals are slow, it's that any I/O is slow in general.
    You can avoid this by flushing the cout buffer less frequently (i.e. outside of loops), but that can be an architectural nightmare and is often not needed, since you're probably more interested in up-to-date info when debugging. Do what Cherno (and many other projects) does and use different levels of logging for more granularity.

  • @GautamSharma-un3cr
    @GautamSharma-un3cr 1 year ago

    Please make a video on how to exploit cache lines and CPU cache in order to build blazing fast applications

  • @MrSandshadow
    @MrSandshadow 1 year ago +1

    23:50 it's called 'placement new'
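
For readers who have not met placement new, here is a minimal sketch of constructing an object inside memory that was obtained elsewhere. `Circle`, `ConstructCircleAt`, and `DestroyCircle` are hypothetical names, and the caller is assumed to supply storage that is big enough and suitably aligned.

```cpp
#include <cstddef>
#include <new>

struct Circle {
    float radius;
    explicit Circle(float r) : radius(r) {}
};

// Constructs a Circle inside caller-provided storage instead of asking
// the heap for memory. The storage must be large and aligned enough.
Circle* ConstructCircleAt(void* storage, float radius) {
    return new (storage) Circle(radius);  // placement new: no allocation
}

// Objects built with placement new are destroyed by calling the
// destructor directly; the storage itself is released by its owner.
void DestroyCircle(Circle* circle) {
    circle->~Circle();
}
```

A typical caller would declare something like `alignas(Circle) unsigned char buffer[sizeof(Circle)];` and pass `buffer` as the storage, which is how an arena or pool hands out slots.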

  • @TuxikCE
    @TuxikCE 1 year ago +4

    Pls bring more of these code reviews!

  • @simonkufeld7903
    @simonkufeld7903 1 year ago

    this channel should have more subs

  • @featherless656
    @featherless656 1 year ago +2

    I wish I could find the motivation and smarts to be able to do stuff like this

  • @kursatyakupkukul7670
    @kursatyakupkukul7670 1 year ago +1

    Wow, really enjoyed this one as a non game/game engine developer!

  • @m3taldragon1
    @m3taldragon1 1 year ago

    Certain IDEs require you to use hpp vs just h if you are using any C++.

  • @wright777
    @wright777 1 year ago

    For a better std::cout -> console performance:
    1. Call std::ios_base::sync_with_stdio(false);
    2. Call std::cin.tie(nullptr);
    3. Use '\n' instead of std::endl

  • @nikeedev
    @nikeedev 1 year ago +2

    I read C++ standards 2 months ago, and it said that C++23(C++2b) will support .h file as standard header file. It doesn’t mean that .hpp shouldn’t be used, but .h will be supported because it was before planned to phase it out, but as it was used a lot within C but also C++ they will keep it

    • @ultimatesoup
      @ultimatesoup 8 months ago

      You can actually get rid of headers entirely if you use modules

  • @billynugget7102
    @billynugget7102 1 year ago

    C++ ALREADY HAS ARENA ALLOCATOR. It works for all std structures/containers even vector. Its called PMR
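
As a sketch of what this comment points to: C++17's `<memory_resource>` provides `std::pmr::monotonic_buffer_resource`, which behaves much like a simple arena, and `std::pmr::vector` can draw its memory from it. `MakeSquares` is a hypothetical helper written for this example.

```cpp
#include <cstddef>
#include <memory_resource>
#include <vector>

// Fills a pmr vector whose storage comes from `arena` rather than from
// the global heap. The vector must not outlive the arena.
std::pmr::vector<int> MakeSquares(std::pmr::memory_resource* arena, int count) {
    std::pmr::vector<int> squares(arena);
    squares.reserve(static_cast<std::size_t>(count));  // one allocation up front
    for (int i = 0; i < count; ++i) squares.push_back(i * i);
    return squares;
}
```

A caller could back the arena with a local array, for example `std::byte buffer[4096];` followed by `std::pmr::monotonic_buffer_resource arena(buffer, sizeof(buffer), std::pmr::null_memory_resource());`. Passing `null_memory_resource()` as the upstream makes the arena throw `std::bad_alloc` instead of silently falling back to the heap when the buffer runs out.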

  • @draco5991rep
    @draco5991rep 6 months ago

    I just started programming in C and I wonder a lot about when to use the heap and when to use the stack. Because I am more comfortable using the stack, I predominantly put all data onto the stack. Is there an easy rule of thumb to when use one or the other?

  • @MrFlyingChip
    @MrFlyingChip 1 year ago

    Haven't seen this in the comments, so will leave it. There's an article called "What Every Programmer Should Know About Memory". It explains in detail how the CPU works with memory, how RAM works, why it's so slow, and why CPU cache memory is so fast. I really recommend reading it (you just need to read only 3-4 first chapters).

  • @freandtuber
    @freandtuber 1 year ago

    Maybe there is time to have a look in to openMP for loading and shaping allocated memory 🤔

  • @kuroakevizago
    @kuroakevizago 1 year ago +2

    Thanks you're giving me a heads up on what to do next. I probably going to start making 2D Physics Engine.
    Thanks btw got your brilliant discount :)

  • @roz1
    @roz1 1 year ago

    @Cherno We can do calloc rather than malloc which will be a contiguous allocation .... that can help but still it can't beat the stack memory.

    • @TheCherno
      @TheCherno 1 year ago

      Both calloc and malloc return a contiguous allocation of memory - there’s actually very little difference between how those two work

  • @SC2Villares
    @SC2Villares 1 year ago

    Why is that channel so good? Humanity deserves it? Oh my, what a gift!

  • @DanteWolfwood
    @DanteWolfwood 1 year ago

    you suggested allocating things like rigid body to the stack because of cpu optimizations but shouldn't the programmer worry about space? Are you banking on the fact that vectors allocate on the heap contiguously? Or should there be a specific buffer created or contiguous heap memory?

  • @codemastercpp
    @codemastercpp 1 year ago

    For speeding up console output
    You can unsync with stdio
    ```
    std::ios_base::sync_with_stdio(false);
    std::cin.tie(nullptr);
    ```

  • @12affes
    @12affes 1 year ago +1

    Excellent video, memory is always an interesting topic!
    My one suggestion would be to change the storage of bodies in DynamicsWorld. On line 23 in the source file (seen at 27:45) the whole 'if (!body->IsDynamic()) continue;' means that static bodies are loaded into the L1 cache and then immediately discarded. Splitting the storage into static and dynamic bodies will ease the pressure on both the cache and the branch predictor.
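
A rough sketch of the split storage suggested here, assuming bodies are stored by value. The class and member names below are hypothetical and only loosely echo the `DynamicsWorld` seen in the video; the real class holds more state and may store pointers instead.

```cpp
#include <vector>

struct Rigidbody {
    float position = 0.0f;
    float velocity = 0.0f;
};

// Hypothetical world that stores dynamic and static bodies separately,
// so the integration loop never touches a body it would skip.
class DynamicsWorld {
public:
    void AddBody(const Rigidbody& body, bool isDynamic) {
        (isDynamic ? m_DynamicBodies : m_StaticBodies).push_back(body);
    }

    void Integrate(float deltaTime) {
        // No IsDynamic() check needed: every element here moves.
        for (Rigidbody& body : m_DynamicBodies)
            body.position += body.velocity * deltaTime;
    }

    const std::vector<Rigidbody>& DynamicBodies() const { return m_DynamicBodies; }
    const std::vector<Rigidbody>& StaticBodies() const { return m_StaticBodies; }

private:
    std::vector<Rigidbody> m_DynamicBodies;
    std::vector<Rigidbody> m_StaticBodies;
};
```

Collision detection would still need to look at both containers, so the gain is limited to loops that only care about one kind of body.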

  • @christopherprobst-ranly6357
    @christopherprobst-ranly6357 7 months ago

    Strong argument to use hpp: A potential user does not need to think about extern "C". If it's .hpp, it can be included only and directly in Cpp. .h leaves a lot of room for speculation. Can you import it from C? Can you import it from Cpp? Do you NEED to call extern "C"? It's there for a reason.
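
The ambiguity this comment mentions comes from headers that are meant to work in both languages. A common pattern for such a `.h` file is sketched below; the file name `physics_api.h` and the function `physics_add` are hypothetical.

```cpp
// physics_api.h (hypothetical): a header meant to be usable from both
// C and C++ translation units.
#ifdef __cplusplus
extern "C" {  // ask the C++ compiler for C linkage (no name mangling)
#endif

int physics_add(int a, int b);

#ifdef __cplusplus
}  // extern "C"
#endif

// Definition. In a real project this would live in a separate .c or
// .cpp file; it keeps C linkage because of the declaration above.
int physics_add(int a, int b) { return a + b; }
```

A `.hpp` header, by contrast, signals that it is C++ only, so a reader does not have to open it to find out whether guards like these are present.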

  • @andreidumitras4237
    @andreidumitras4237 1 year ago

    What color scheme do you use?
    Awesome video btw.

  • @stdc_tri
    @stdc_tri 1 year ago +1

    As a convention, I think using p_Member for protected members is better than m_Member; it just makes it clearer in my opinion.

  • @ValinorFP
    @ValinorFP 7 months ago

    Great video, thank you! In modern C++, is heap memory fragmentation a concern for developers, given that the OS uses virtual memory to map to physical memory? My hypothesis is that even if physical RAM is fragmented, but virtual memory is contiguous, the C++ program's performance will not be affected.

    • @majormalfunction0071
      @majormalfunction0071 7 months ago

      Maybe or maybe not. CPUs don't prefetch across page boundaries, probably because of kernel-side page permissions / residency state. The more pages you access, the more TLB slots you use. TLB misses hurt, but maybe not to the level of framerate problems. It's an extra memory access, paid serially. Huge TLB requires defragmented memory on the kernel-side, and has a system-wide limit. Running kernel code to change page residency really hurts. It's many instructions, and a possible disk access.

  • @mobslicer1529
    @mobslicer1529 1 year ago

    with logging what i do is for stuff that gets called all the time i only log failures so you know what happens with those but don't flood the log.

  • @SETHthegodofchaos
    @SETHthegodofchaos 1 year ago

    15:20 Is there a difference between a "Entity Component" system and a "Entity Component System" system/architecture? Both can be implemented with a data-oriented memory layout, correct?

  • @ByChris
    @ByChris 1 year ago

    How comfortable would you feel about making a C++ Graphics course for udemy?

  • @xxdeadmonkxx
    @xxdeadmonkxx 1 year ago

    I really want to know how would you deallocate item from custom memory pool (arena?)

  • @odarkeq
    @odarkeq 1 year ago

    11:33 The webcam picture quality begins to tank because of the video encoding all the little gaps between so many moving circles. It's interesting to see a non-FPS-related side-effect appear while testing FPS-related benchmarks.

  • @BradenBest
    @BradenBest 1 year ago +1

    I wouldn't worry about fragmentation. It's the heap allocator's job to worry about managing that. And in the general sense, as long as you free memory in the opposite order that you allocated it, fragmentation will not be a problem. I say this as someone who has implemented malloc+free in C. To get a memory leak from allocator fragmentation, you would have to do some insanely stupid things. Of course don't just allocate willy nilly from the heap if you don't have to. Heap allocation carries a performance overhead because when malloc has to get more memory, it has to do so via a system call, which means a context switch, which is slow. That's the `sys` metric given by the `time` command.
    Regarding specifically what is said in the video, where you go into low level machine details like the CPU cache, I especially wouldn't worry about that, because that's premature optimization. Worry about choosing efficient algorithms, not about how the machine accomplishes a task. That's the compiler's job. Turn on that -O3 flag. Or -Ofast if you're not worried about slightly less precise math. Sometimes you can justify low level optimizations, like when the Quake devs implemented the fast inverse square root using low level floating point math. But then look what happened--the chipset manufacturers and compiler vendors caught up. Nowadays, the quake inverse square root is no faster (and sometimes slower) than code that a compiler will generate for a more straightforward algorithm. I do not recommend wasting your time optimizing for hardware. The compiler has already done it and you can save a lot more time by choosing a better algorithm. C (and by extension C++) is not a low level language, and your computer is not a fast PDP.

    • @BradenBest
      @BradenBest 1 year ago +1

      A big problem with that argument is the assumption that the pieces of data necessarily will be fragmented. It's "whataboutism" taken to the extreme. But let's look at an average case where you allocate 100 small objects using a heap allocator: the heap allocator has a free pool of memory, so it slices a chunk off for both the object and the bookkeeping node to manage that memory, and updates the other node to account for the borrow. It does this over and over again until 89 objects in, the pool doesn't have enough memory. So the allocator will do a context switch asking for more memory. The memory comes from the heap, so it will be adjacent to the previous memory, but it will continue to allocate memory until all objects are allocated. The allocator is smart, it doesn't want to waste CPU time by making a bunch of syscalls to allocate tiny blocks of memory, so it does them in bulk. Pages and pools of memory that it marks up and manages. If the addresses were wildly spread out, that would mean the allocator is allocating random pages for every single allocation request, and all those context switches would be a far worse bottleneck than a cache miss. But as it turns out, the heap grows upward. The addresses are all fairly close together.
      Now, you can optimize your code to assume that the allocator allocates a huge chunk of memory that's all close together, or you can optimize it to assume that the addresses will be far apart, but in the end, that's all you're doing: assuming. The standard says nothing about how the allocator is implemented. Don't assume. Write better algorithms. If the compiler thinks your array of structs will be more efficient if it turns it into individual arrays of the one element you access, it will do exactly that. That's the ultimate lesson: the compiler is better at optimizing than you are.

  • @alyshmahell
    @alyshmahell 1 year ago

    21:05 how do we define "DISTRIBUTION" for the whole project?

  • @0xCAFEF00D
    @0xCAFEF00D 1 year ago

    25:45
    Does it really work like this? That you have fragmentation in any perceivable way? I thought that with virtual memory you're not taking any penalty for reading across pages, beyond using more TLB space because you have multiple pages. Is there any gain in having the actual pages be contiguous?

  • @Amitkumar-dv1kk
    @Amitkumar-dv1kk 1 year ago

    Do you also review Java codes or is it only c++?

  • @roykapon181
    @roykapon181 1 year ago +1

    Is a std::vector with preallocated size a decent way to implement this kind of memory management? Or do you need to do it manually? Im a cpp newbie so pls dont roast me :)
    Btw, a great video! Looking forward for pt 2

    • @Larock-wu1uu
      @Larock-wu1uu 1 year ago

      I am curious about this as well

    • @roykapon181
      @roykapon181 1 year ago

      I forgot to note that it will probably not work well with deleting items (I guess that for this we need a more sophisticated method)...

  • @davidcmoffatt
    @davidcmoffatt 1 year ago

    There are more benefits to contiguous data storage. Cutting down on TLB misses and VM page misses jumps to mind.

  • @frankhaugen
    @frankhaugen 1 year ago

    The reason why writing to the console is slow is that Windows assumes a window, so the output goes through the UI interop layer, while file writing is just bits on disk

  • @IgnoreSolutions
    @IgnoreSolutions 1 year ago

    I’m surprised you didn’t mention the fact that variables starting with just an underscore are considered reserved by the language.

  • @christopherprobst-ranly6357
    @christopherprobst-ranly6357 7 months ago

    Logging on Linux/macOS: Yes, their terminals are orders of magnitude faster than on Windows. The reason is that they are implemented totally differently, and the console on Windows is just slow. I read somewhere why it's hard to change. But files are always faster, that's true.

  • @fenril6685
    @fenril6685 1 year ago +4

    With regards to the .hpp header specification over .h, I find it to be very necessary in a lot of larger projects where you have a mixture of both C and C++ code (this happens way more often than you might think at some companies where you have legacy code).
    It does make a huge difference in those cases because you need to compile those .h files as C code only in some situations and not as C++, especially if they are separate projects in a larger solution base. It just makes it easier to distinguish directly what you are looking at.
    I used to be one of the .h default people, and never did understand why someone would use .hpp until I started working on legacy code bases created by other developers in large teams, now it makes sense because organizationally it serves an actual purpose.
    I now just use .hpp as default as a result, because I'd rather not go back after the fact and have to specify hey this is actually a cpp header file and you should compile it in your makefile or whatever build system you are using as C++ code specifically and not C code. Just something to consider.

    • @tolkienfan1972
      @tolkienfan1972 1 year ago +1

      Not just legacy. Many of us use modern C. There are numerous cases I prefer C for.

    • @user-dh8oi2mk4f
      @user-dh8oi2mk4f 1 year ago

      But you don't compile header files?

    • @fenril6685
      @fenril6685 1 year ago +2

      @@user-dh8oi2mk4f What I mean is that usually in external build systems you have some method of determining which files are included in which compilation processes, typically by some kind of pattern matching.
      You do NOT want .h files which are strictly c linked in unnecessarily with C++ compilation units. This can result in all sorts of unexpected behaviors, especially if you have C headers putting things in global scope with simplified names, which is pretty frequent in legacy code.
      If I have multiple binaries in a solution that I need to compile some as C and some as C++ then you don't want to pattern match against all .h files when you are building C++ code in your build steps specifically.

    • @user-dh8oi2mk4f
      @user-dh8oi2mk4f 1 year ago

      @@fenril6685 But why would you need to figure out which headers are c and c++? The compiler simply pastes the contents of the includes directly into the source file. I don't understand why you need to know which headers are which. Maybe this is helpful if you mix c and c++ in the same directory, but I don't get how it would help with a build system

  • @unkgames-abdullahali4048
    @unkgames-abdullahali4048 1 year ago +1

    Physics engine: is an engine about physics!! 👍👍👍

  • @deconline1320
    @deconline1320 1 year ago +1

    We see it often in code, but in C++ it's not a good idea to start a variable identifier with an underscore. Some combinations of single/double underscore identifiers are reserved for the compiler implementation by the C++ standard. I would avoid it completely.
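
For reference, a small sketch of the reserved patterns this comment alludes to, as laid out in the C++ standard's rules for identifiers. `FrameCounter` is a hypothetical class used only to show the trailing-underscore alternative.

```cpp
// Reserved for the implementation in every scope:
//   int _Count;        // underscore followed by an uppercase letter
//   int frame__count;  // contains a double underscore anywhere
// Reserved in the global namespace:
//   int _count;        // any name beginning with an underscore

// Hypothetical class for illustration.
class FrameCounter {
public:
    void tick() { ++count_; }
    int count() const { return count_; }

private:
    // A single trailing underscore matches none of the reserved patterns.
    // A member named `_count` would also be legal here (class scope, lowercase
    // letter after the underscore), but the rule is easy to get wrong.
    int count_ = 0;
};
```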

  • @sherazali8691
    @sherazali8691 1 year ago +1

    About logging, can we just create a static class and call its function to log something there (through parameters)?
    like:
    Logger.Log(_currentFps);
    and in our release build, we just comment out all the statements in that function.
    We would still have an overhead of calling that function and passing parameters, but is it okay to do it like this?

    • @nickgennady
      @nickgennady 1 year ago

      It's simpler and more straightforward to set up, sure, but you have to keep commenting and uncommenting every time you want to change the build type, and you have to remember to do that.
      His macro way is much better.

    • @user-dh8oi2mk4f
      @user-dh8oi2mk4f 1 year ago

      I would be quite surprised if your compiler left the function call to an empty function with max optimization

    • @nickgennady
      @nickgennady 1 year ago

      @@user-dh8oi2mk4f fair. Did not think of that
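
A rough sketch of the macro approach discussed in this thread, assuming a hypothetical `ENABLE_LOGGING` switch that only debug builds define. `LOG_INFO`, `describeFrame`, `formatCallCount` and `logFrame` are illustrative names, not the macro used in the video. The point of the macro over an empty function is that the arguments disappear as well: with an empty function the call itself is likely optimized away, but argument expressions with side effects still have to be evaluated.

```cpp
#include <cstdio>

// ENABLE_LOGGING is a hypothetical switch; define it (for example with
// -DENABLE_LOGGING) in debug builds only.
#ifdef ENABLE_LOGGING
    #define LOG_INFO(...) std::fprintf(stderr, __VA_ARGS__)
#else
    #define LOG_INFO(...) ((void)0) // arguments are never evaluated
#endif

// Counts how often the (hypothetical) expensive formatter below runs.
int& formatCallCount() {
    static int count = 0;
    return count;
}

// Stands in for any costly work done only to build a log message.
const char* describeFrame() {
    ++formatCallCount();
    return "frame done";
}

void logFrame() {
    // Without ENABLE_LOGGING this expands to ((void)0), so describeFrame()
    // is not called at all.
    LOG_INFO("%s\n", describeFrame());
}
```

No source edits are needed when switching build types; only the compiler flag changes.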

  • @Silencer1337
    @Silencer1337 1 year ago

    19:55 you can use C-style printf/printf_s which will be orders of magnitude faster. iostream stuff is slooow and the whole cout syntax is way too verbose to me as well. As always there are edge cases to watch out for e.g. when you log from multiple threads.

  • @lionkor98
    @lionkor98 1 year ago

    You can log into a queue, and then flush the queue on a separate thread
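
One way the queue idea could look, as a minimal sketch rather than a production logger. `AsyncLogger` and its sink callback are hypothetical; the sink is whatever actually writes the message (console, file, and so on). Callers only pay for a lock and a push, and a single worker thread does the slow output.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical asynchronous logger: log() enqueues, a worker thread flushes.
class AsyncLogger {
public:
    using Sink = std::function<void(const std::string&)>;

    explicit AsyncLogger(Sink sink)
        : sink_(std::move(sink)), worker_([this] { run(); }) {}

    ~AsyncLogger() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        worker_.join(); // the worker drains the queue before exiting
    }

    AsyncLogger(const AsyncLogger&) = delete;
    AsyncLogger& operator=(const AsyncLogger&) = delete;

    // Cheap for the caller: one lock, one push, no I/O.
    void log(std::string message) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(std::move(message));
        }
        wake_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
            if (pending_.empty()) {
                break; // only reachable once stopping_ is set
            }
            // Take the whole batch so the lock is not held during output.
            std::queue<std::string> batch;
            batch.swap(pending_);
            lock.unlock();
            while (!batch.empty()) {
                sink_(batch.front());
                batch.pop();
            }
            lock.lock();
        }
    }

    Sink sink_;
    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::string> pending_;
    bool stopping_ = false;
    std::thread worker_; // declared last so the other members exist before it starts
};
```

The queue here is unbounded, so a thread that logs faster than the sink can write will grow memory; a real logger would likely cap it or drop messages.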

  • @TGAPOO
    @TGAPOO 1 year ago

    Leading underscores are reserved in Microsoft code. You should never use leading-underscore variables if you expect to work on Windows. Prefer trailing underscores if you must.

  • @HappyHorge
    @HappyHorge 1 year ago +18

    javidx9 has some quite excellent videos on how you can make games in C++ and programming in embedded systems, which is really nice if you're into that kind of low level programming 😄 Low Level Learning is also a great channel for that kind of knowledge 😄

    • @paligamy93
      @paligamy93 1 year ago

      CppCon and CppNow are also great channels for the more advanced. Amazing talks by Michael Caisse and Luke Valenty this year about what can be done with compile-time programming and the type system.

  • @spider853
    @spider853 1 year ago

    I suspect you participated in the Ludum Dare jam? We participated too :) Can you please link to your game?

  • @nuDeltaTech
    @nuDeltaTech 1 year ago

    I use __ for static and single _ for members.