Does this not introduce more work than just using 64-bit tho? RAM is cheap these days and performance should be fine unless it's a really old or low-end CPU, or you need to maintain compatibility with an older 32-bit OS. I think the funniest example of a 32-bit program is a certain well-known hardware sensor monitoring program that literally has 64 in the name - yet is 32-bit.
@@Raivo_K How does this introduce more work? You literally change a compiler flag and that's it. We don't need to change the instructions by hand unless we're writing assembly by hand, so what difficulty is there in just telling my compiler to compile a 32-bit executable instead of a 64-bit one?
It's not just about RAM usage. It's about the fact that on certain CPUs, the hardware implementation may make certain instructions faster than others, and there are many examples, such as x86_64, where the native 64-bit instruction set has some instructions that are slower than the 32-bit ones. Also, the size of the instructions is smaller: even if a 32-bit instruction on its own takes the same number of cycles as the 64-bit version, you can fit more instructions in the cache when running 32-bit. Another thing that changes is "fast" types, which again depend on the width of the type and the associated instructions. For example, in C the int_fast32_t type in a standards-compliant compiler for Windows 10 on an x86_64 machine should be a typedef for int, which on that platform is a 32-bit integer - again because on x86_64 the 32-bit and 64-bit integer operations are equally fast, but also because performing multiple operations over a segment of memory with many adjacent integers is faster when you can fit more of those integers in the cache, which is why the 32-bit type is considered the faster one. In short, there are a lot of factors to consider, but usually a 32-bit integer is going to be faster to operate on because you can fit more of them (and more 32-bit instructions) in the cache.
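A minimal sketch of that cache-footprint point, assuming a typical x86/x86_64 target; the typedef behind int_fast32_t is implementation-defined, so the printed sizes are illustrative rather than guaranteed:

```c
// Minimal sketch: how integer and pointer width affect how much fits per cache line.
// Assumes a typical x86/x86_64 target; int_fast32_t may be 4 or 8 bytes depending on ABI.
#include <stdint.h>
#include <stdio.h>

int main(void) {
    printf("sizeof(int32_t)      = %zu\n", sizeof(int32_t));       // always 4
    printf("sizeof(int_fast32_t) = %zu\n", sizeof(int_fast32_t));  // 4 or 8, per ABI
    printf("sizeof(void *)       = %zu\n", sizeof(void *));        // 4 on x86, 8 on x86_64

    // A 64-byte cache line holds 16 four-byte ints but only 8 eight-byte ones,
    // which is the cache-footprint argument being made above.
    printf("int32_t per 64-byte cache line: %zu\n", 64 / sizeof(int32_t));
    return 0;
}
```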
Unless the program you're dealing with is conceptually processing large amounts of data, 32-bit would be faster. Part of the reason is that the instructions themselves are smaller, allowing more of the program to be cached, and sometimes the equivalent instructions are faster. That said, don't install mismatched versions - you can easily end up with weird broken behavior.
You can always use Linux - 64-bit Linux/Wine will happily run 16-bit Windows executables too new for DOS. I think there is a Wine-based NTVDM you can install for 64-bit Windows though.
In some cases 64-bit may be faster because it has additional instruction sets like AVX that aren't available for 32-bit programs. Especially for anything video related
I know someone who wrongfully assumed that x86 must be better than x64 because it is a greater number but I am sure that a lot of people have been confused by that at one point.
I wonder if this means that if you have a 64bit app that is constantly taking up more memory than it actually needs (to do things that are essentially useless or relatively unimportant) you'd be better off with the 32bit version, since it can't exceed the 4GB limit.
Historic reasons. Windows NT usually referred to the 32-bit port for 80386 CPUs as x86, so when 64-bit versions of NT were released the x86 name stuck and continued to mean what it had always meant in the NT world: the 32-bit version for x86-family CPUs.
Surely the "(32 bit)" that is appended to the process name in task manager is a simpler way of finding out? Or has Microsoft removed that in the latest versions of Windows?
Is there a way to force-install a 32-bit program on a 64-bit version of Windows 10? I have a couple of older programs that pop up an error message saying "can't install 32 bit program". If you've done one before, could you direct me to the video? Thank you.
I know the memory allocation is up to the OS, but what if I have over 4GB of 32-bit programs running? Will there be memory addressing issues? How does it make calls to memory addresses that are larger than the architecture allows? How does that work?
"Is it faster to run a 32-bit program?" It can be even faster if the OS makes it so 32-bit programs can effectively run as 64-bit programs. Virtual memory means that a 32-bit program's 4GB of addressable memory exists somewhere within the overlying 64-bit address space. The big thing is that when 32-bit programs or 16-bit ones are made parallel, you get huge speedups.
Thanks for making this video, Thio. Though I have a Win 11 PC, still have two 32-bit Asus T100TA small laptops/tablets that still work perfectly. Only 2GB RAM, so after Oct. 2025 will have to decide: ESU Windows 10 for 3 more years (School/Teacher discount)...or perhaps 32-bit MX Linux. I mean, why throw out even MORE e-waste, if things still work well?
I would ask questions about how this might work with a JIT runtime like .NET Core. Will the initial compilation to CIL pick the more effective size for each variable? Will Roslyn analyze the hot code paths and then recompile variables with a different size based on how they run?
Also, values are a lot more limited too! 32-bit can't reach the same value limits as 64-bit. Very few programs would need the 64-bit range, but like he said, going for 64-bit by default should be fine - it's just something to keep in mind.
The 2038 problem is an issue for Unix and Linux systems. Windows stores dates differently and isn't affected as much, though it has its own variant of this issue in the form of the year 10000 problem.
Your results are surprising to me. Maybe this tendency for 32bit software to run faster is mainly a windows thing? Or maybe also it's because of the language and the compiler you're using? I dabble in C programming on linux and I've done a fair bit of testing of my programs on 32bit and 64bit and in all of my tests that I can remember, the 64bit version ran faster, often a lot faster, on the same computer system
An exception is anything built with code from Google Chrome, since the browser is bloated when it comes to memory, and its 32-bit support degrades when it comes to performance on lower-end PCs. Also, Windows 11 dumped support for 32-bit x86 processors altogether, making it 64-bit exclusive.
The RAM limit would be per-process, right? Modern web browsers have separate processes for each tab, so I would think that while they could add up to more than 4GB (assuming you're using a 64-bit OS), each process could use 2-4GB in 32-bit.
Have you tried comparing the performance of operations on 64 bit integers (or floating point numbers, aka doubles)? I'm assuming that that would be significantly faster on x64
Steam being 32-bit is OK. But it being able to use only 2 cores is unforgivable when I need to check the game files of a gigantic game and my SSD is barely being used because 2 cores are maxed out.
I tested running this on Linux with Wine, and it was actually a toss-up, with 64-bit winning both LinkedList and Dictionary by 8%, and 32-bit only having a 2% lead on the others. Maybe this is just WoW64 being better than native 64-bit on Windows...
amd64 or x86_64 actually only uses 48 bits of the 64-bit addresses. So your theoretical limit is far beyond what's actually possible. Also, one should consider that the address space of programs is compartmentalized into areas by the OS, for a few reasons, e.g. so that a program can hold an address to memory that the system owns. So you can't even use all 48 bits for the program itself. Though this compartmentalization depends on the OS, so I guess you might be able to create a bootable program that has access to everything. Theoretically. The 48-bit limitation is unavoidable with current hardware.
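A minimal sketch of what that 48-bit limit looks like from user space, assuming an x86_64 target with 4-level paging: pointers occupy 64-bit registers, but bits 48-63 must be a sign-extension of bit 47 ("canonical form"), otherwise dereferencing them faults.

```c
// Minimal sketch, assuming an x86_64 target with 48-bit virtual addresses:
// only the low 48 bits select an address; the top 16 bits must mirror bit 47.
#include <stdint.h>
#include <stdio.h>

static int is_canonical(uint64_t addr) {
    uint64_t top = addr >> 47;          // bit 47 plus the 16 bits above it
    return top == 0 || top == 0x1FFFF;  // all zeros (user half) or all ones (kernel half)
}

int main(void) {
    int x = 42;
    uint64_t p = (uint64_t)(uintptr_t)&x;
    printf("%p canonical: %d\n", (void *)&x, is_canonical(p));
    return 0;
}
```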
Re. the speed difference: unless I'm missing something, the 32-bit program should, at best, cut the runtime in half. And that's in the limiting case where all the data your program handles is pointers, and your program's speed is bound by the memory bandwidth of loading those addresses. So something like data structures mostly made out of pointers, which seems to have been one of your examples. But most programs written for speed don't tend to be that way; for starters, they don't use C#. Additionally, AFAIK, most modern instruction extensions such as SIMD extensions are only available in 64-bit, so you lose out on those potential speedups.
8:31 This is not necessarily true. In C, if you use any integer which is smaller than 32 bits (e.g. int16_t), the generated code has to do additional work to make sure it doesn't overflow, which makes it slower, but not necessarily smaller in memory. It looks like in most cases, using a 32-bit integer is the most compatible with the RAM and CPU, because they can read it readily; using anything different forces the compiler to check the range of the integers at compile time, slowing things down slightly.
@@ThioJoe I'm not 100% sure, but I think the original commenter is talking about aligning types to the native word size of the architecture. So if your native word size is 64-bit, for instance, and you use a 16-bit integer, it's possible it will be extended to 64-bit to maintain alignment at runtime. This could have a small performance impact, but with modern CPUs it would be basically insignificant outside of insanely performance-critical work, or unless you find yourself on an unusual architecture. Otherwise the C compiler isn't doing any kind of range checking. If you overflow a signed 16-bit integer, it's undefined behavior. If you overflow an unsigned integer, it just wraps around. There are also situations like alignment padding of structs, where the compiler may pad smaller types to align with larger types - for example, in a struct with an int16_t and an int32_t, the compiler will pad after the int16_t so the int32_t stays aligned.
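To illustrate that padding point, here's a minimal C sketch; the exact padding is ABI-dependent, and the sizes in the comments are what you'd typically see on x86/x86_64:

```c
// Minimal sketch of struct alignment padding. Exact layout is implementation-defined;
// the sizes noted here are typical for common x86 and x86_64 ABIs.
#include <stdint.h>
#include <stdio.h>

struct Packed   { int16_t a; int16_t b; };   // 4 bytes, no padding needed
struct Padded   { int16_t a; int32_t b; };   // usually 8: 2 padding bytes after 'a'
struct PtrHeavy { void *p;   int32_t n; };   // 8 on a 32-bit build, usually 16 on 64-bit

int main(void) {
    printf("Packed:   %zu\n", sizeof(struct Packed));
    printf("Padded:   %zu\n", sizeof(struct Padded));
    printf("PtrHeavy: %zu\n", sizeof(struct PtrHeavy)); // shows why pointer-heavy data grows on 64-bit
    return 0;
}
```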
@@ThioJoe Good question. The reason a 32-bit value can be read the fastest is that the OS can read it without any extra checks. Technically, when you are running an exe file, it should be at the lowest level possible (binary). So it really surprises me when 32-bit outperforms 64-bit. One reason might be that 32-bit gets better optimization because all PCs have to run it?
I use old computers so usually my thoughts on the matter are: _"Internally - All this program does is ______________. And it'll never, ever, consume a gig of ram to do what it does. _So like.. was ruining backwards compatibility reaaaaalyyyyy worth it"_
I wish 32-bit apps would just go away. I don't like the idea of having to install two versions of every library just to be able to run steam. I'm glad macOS dropped it 5 years ago. Unless it's a really old program, there's no reason to have 32-bit anymore.
Sponsored: Don't leave you and your family vulnerable to data breaches! Go to aura.com/thiojoe to get a 14-day free trial to Aura.
sigma
am i first?
if i was not first, i shouldn't have rewritten the comment
but who cares
The voice note feature is ❤
As a SNES programmer, this makes perfect sense to me. Often times working in 8bit mode is faster than 16bit, because an extra clock cycle is spent decoding the extra byte. I just never really considered it for modern apps. Thanks.
@@InsaneFirebat If I were to guess, this is most likely due to the CPU cache potentially being able to fit more pointer memory addresses since they're shorter.
It depends on the specific CPU and on the mixture of instructions; this requires serious research if you want to squeeze every drop of power out of a CPU...
There is also an interesting ABI for x86-64 which compiles 64-bit code with 32-bit pointers, where you can use the full 64-bit range of instructions, but don't have access to addresses above 4GiB. I know there's a Debian port out there that does that. I'm not sure how feasible that would be on Windows.
Larger 64-bit instructions are also going to be slower on 64-bit machines when the code isn't in the cache. They can make up the performance after loading, thanks to math operations that weren't natively available on 32-bit, but so many operations can just be done with 32-bit anyway.
The x86 platform however uses variable length instructions so you aren't doubling the size of the code going to x64. In addition, certain workloads can be considerably faster because the x64 platform has more general purpose registers.
Note too that there is an extra emulation layer in Windows 64-bit for 32-bit applications which means if there are a lot of kernel calls there may be a bit more overhead going from the program to 32-bit NTDLL to 64-bit NTDLL to kernel.
I always wondered why a lot of the games on Steam are 32-bit
Usually it's because they are old
recompile them into x86_64
Usually they are just outdated and weren't updated, or updating them causes issues that would require re-writing a lot of parts from the game's engine, which might be either impossible (no source access) or very expensive (dev hours) and just not worth it
It's also the most widely compatible format for Windows programs. 32-bit x86 will run on all modern Windows versions from Windows 10 onwards, regardless of architecture. If the game doesn't require more than 2GB of RAM, or specific APIs that are 64-bit exclusive, why not compile for that?
That is due to three factors:
- the age of the game; maybe they found x64 didn't provide much of a performance bump due to hardware limits at the time.
- the game never needed more than 2GB to begin with; voxel, indie, 2D, and pixel games are often compiled as x86 just for performance and compatibility's sake.
- the engine is the limitation; some things would be too hard to recompile for x64, and in most of those cases it'll mean a full rewrite of an old engine. Maybe the game is intentionally using an old, outdated engine for a specific reason; edge cases are real.
I think you should mention that 64-bit is in fact sometimes faster because of the 2GB cap - some 32-bit apps pack a few variables into one 32-bit value, for example four 8-bit ones, but 64-bit versions don't bother and just go with four separate variables, and for that reason the 64-bit version can read and write them without any problem, where the 32-bit one needs to do bit shifting and masking on every read/write. A good example of this is Team Fortress 2, which received a massive performance boost after getting a 64-bit version. Also, 64-bit programs can have better precision (depends on the implementation) and give more accurate results.
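For readers who haven't seen this kind of packing, here is a minimal sketch of it: four 8-bit values stored in one 32-bit word, with the shift/mask cost on every access. The field layout (value 0 in the low byte) is just an illustrative choice.

```c
// Minimal sketch of packing four 8-bit values into one 32-bit word.
#include <stdint.h>
#include <stdio.h>

static uint32_t pack4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return (uint32_t)a | ((uint32_t)b << 8) | ((uint32_t)c << 16) | ((uint32_t)d << 24);
}

static uint8_t unpack(uint32_t word, int index) {   // index 0..3
    return (uint8_t)(word >> (index * 8));          // shift + implicit mask to 8 bits
}

int main(void) {
    uint32_t w = pack4(1, 2, 3, 4);
    printf("packed = 0x%08X, third field = %u\n", w, unpack(w, 2));
    return 0;
}
```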
Hm true, though I guess that falls outside the realm of “all other things being equal” if the logic is different
I always thought one of the "strengths" of x86 was its ability to perform unaligned memory access without penalty? If so, why would packing 8 single byte variables into consecutive addresses be an issue? What am I missing here? Is it bad decisions from programmers manually unpacking things because the source code is also compilable on RISC systems?
In a business environment, 32-bit Office is often used because of an add-on that doesn't have a 64-bit version, since it's for some old software they haven't replaced...
Depending on 32-bit libs that can't be upgraded to 64-bit is a VERY common reason to stay with 32-bit versions of the parent application. It's not limited to Office. Same problem appears, for example, around audio/video editors using 32-bit plug-ins. In these cases it may be that the maker of the plug-in is no longer around or getting a newer version of the plug-in with 64-bit support involves prohibitive upgrade costs for the parent application and/or the plug-in.
1:01 *There is a RAM limit for even 64-bit applications as well. However, it really doesn't matter (at least for now) as that limit is far greater than what most computers have these days. For those wondering, the limit in question is 16 EiB, around 18.446 quintillion bytes, about 18 MILLION terabytes.
If you watched, like, a few minutes more, Thio mentioned exactly this...
bro didnt finish the video
"640 kB ought to be enough for anybody" fake quote that's still relevant today.
@@RedOneM ... Is why I said "at least for now"
Can't wait for 128bit computers
64-bit x86 instructions require more opcode overrides, are often longer, and therefore require more clocks to decode (depending on the length of a multi-byte instruction, from two to around 15 on the latest processors in 64-bit mode). However, to quote a well-used phrase: it's complicated. Modern (x86/CISC) processors are highly pipelined, non-linear, and superscalar with branch prediction and speculative execution so getting the mix of instructions necessary to measure the performance you think you're targeting is tricky, much more so than in the "old days" where each instruction had a predefined and static number of clocks per instruction, there was only one instruction decoded at a time, and there were only 16- or 32-bit modes which were very simplistic by modern standards (that's for e.g. 80386/80486).
0:43 unless its x86_64
Then it's the same as x64 or AMD64 ;-)
@ yep, just different systems preferring to call it by different names
On top of that, x86 is technically an umbrella term for 16-bit, 32‐bit, and 64-bit. What matters here is that all of these are based on the i8086 processor.
@@andreasjoannai6441 Yep, true!
I'd like to interject here.
x86_64 refers to a processor that supports the Intel 8086 and AMD64 extension to x86.
x86 refers to a processor just supporting the Intel 8086 (i86) architecture.
AMD64 refers to just the AMD64 extension to the Intel 8086 (i86) architecture.
Note AMD and Intel cross-license these technologies which is why they both can manufacture processors without violating patent laws.
At 2:26, that's 15.5 and 11.5 MEGABYTES respectively
that was a K like the old windows which represents Kilobytes not Megabytes
so yes
@@ThioJoewow voice note reply. since when was that a thing
@@N1r4 is it supported for you? for me it has a transcription and says voice reply is not supported
Fyi modern processors don't actually use 64 bit addresses, that is just the datapath. Addresses are ~50 bits
Hmm interesting i’ll have to read more into that 🤔
Linux reports like 48 bits
Even more fun, I believe in various cases, those other bits can contain flags and other data used by the OS for various things (though I don't know if these are usable on the application-side, they might only be meaningful in kernelspace).
@@Aeduo I am a CPU engineer; the bits physically do not exist in the silicon
@@VLS-Why I guess this is why they're packing data in there then, if there's literally no effect as far as physical memory addresses go. Messing with virtual memory pointers can probably mess up other runtime things though, which might depend on values being consistent; that's mostly what I was referring to with applications.
your caption says 64 could be faster, but you said 32 could be
32 is the correct one
I haven't tested this, but from what I know a lot of the speed difference is likely due to scheduling and CPU internal threading.
Many 64-bit operations were intentionally designed to be two 32-bit operations glued together, and they can be used both ways.
So if the code isn't linear, it can do two 32-bit things in a single native 64-bit instruction.
It gets super complicated to identify exactly where the performance boost comes from when it ends up depending on the platform, the microcode version, and everything else. Windows could make that decision at the kernel scheduling level, since it's aware of the supported instructions, but it generally doesn't do that for user-space programs.
In Data-Oriented Programming circles this is quite common knowledge actually.
Data-Oriented Programming fundamentally has one base thought: copying data hammers performance.
Pointers, which are addresses in memory, are bigger on 64-bit OSes. When you are passing around "values"
to functions, something programmers should consider is how big the "value" is, because that "value" will be copied into the function's memory. When calling functions, passing a small number (smaller than the address width) by value is cheaper than passing its address (32 or 64 bits in size).
The problem when you are programming for computers where the address size is not "static" is that programs perform worse on 64-bit OSes because the addresses passed to functions are bigger, so the threshold for when it is worth passing the value by value instead of as an address shifts.
You can't fix this* issue: if you need to offer both 32-bit and 64-bit programs, the 64-bit one will perform worse in operations that use a lot of addresses.
* unless you use a popular trick from Data-Oriented Programming. If your program doesn't need more than 4GB of memory (you could have multiple blocks of 4GB),
you could get a block of memory big enough to hold all the stuff you need, and then, instead of having pointers whose size can be 32 or 64 bits,
you can store a location into that chunk of memory as a 32-bit number or smaller. This number will always be the same size even on 64-bit systems, fixing this speed issue on 64-bit OSes.
But this is veryyyyyy niche programming stuff that most programmers don't think about unless they are desperate for every operation per CPU cycle they can get.
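A minimal sketch of that index-into-a-block trick, assuming a single pre-allocated arena well under 4 GB; the Node layout and the INVALID sentinel are illustrative choices, not anything from the video:

```c
// Minimal sketch: 32-bit indices into one pre-allocated arena instead of pointers,
// so link fields stay the same size on 32-bit and 64-bit builds.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define INVALID 0xFFFFFFFFu

typedef struct {
    int32_t  value;
    uint32_t next;    // index of the next node in the arena, not a pointer
} Node;

int main(void) {
    Node *arena = malloc(1000 * sizeof(Node));   // one big block, indexed 0..999
    if (!arena) return 1;

    // Build a two-element list using indices instead of pointers.
    arena[0] = (Node){ .value = 10, .next = 1 };
    arena[1] = (Node){ .value = 20, .next = INVALID };

    for (uint32_t i = 0; i != INVALID; i = arena[i].next)
        printf("%d\n", arena[i].value);

    printf("sizeof(Node) = %zu bytes on any build\n", sizeof(Node));  // stays 8
    free(arena);
    return 0;
}
```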
lucky i dont write code
"well, isn't it a bit faster?" (a joke)
@@God-ld6ll this will be at the top for sure
Yes, more than one. it's -32 bits faster
That's a "bit" of humor 😅
Even more fun is applications which stored addresses in a 32-bit int... then got ported to 64 bits. Wine ran into a bug along these lines a while ago, because apparently Windows was generally putting everything down below 4GB and things would generally work just fine. Wine wasn't respecting this and was exposing badly written code in a lot of applications. Of course, because Windows did it a certain way, Wine has to do so too to ensure these applications work as expected.
The RAM limits are a Windows _licence_ thing, lots of processors supported PAE (Physical Address Extension) which allows you to address more than the 4GB limit.
Linux has been using this for years. I remember addressing more than 4GB RAM on Ubuntu years ago whereas on 32-bit Windows on the same PC it'd only address 4GB maximum.
It seems that drivers sometimes were unstable on >4GB and so MS often disabled it on Home versions but server versions had it enabled to address RAM such as 32GB on Windows 2000 Datacenter.
32 bit is sometimes faster due to the decreased address size, but in most cases 64 bit is faster due to CPUs having more processing power dedicated to running 64 bit code.
I think maybe in your case it's faster because you're using C#, and maybe C# is unoptimized for 64-bit or something. I think if you try writing the code in C++, the 64-bit version should be faster.
Also you should use benchmarks that take like a minute to run, not 6ms (unless you're running it like 10,000 times and averaging it)
1:20 should say 32bit on screen
Yes
Ah I didn’t catch that 😩
@@ThioJoe no worries, just reupload the video with the corrections :P
There is a 16-bit wine compatibility layer for windows for old games.
8:23 As a 3rd-semester Computer Engineering student in college, I guess we can assume it's like math using a limit to infinity: we only need to see 4 GB instead of infinity, so of course it calculates faster.
Like how we already cache games to RAM instead of the HDD to speed up the lookup table
I specifically use the 32-bit version of the media player I'm using (MPC-HC), because I need it to work with a DirectShow filter that only exists in 32-bit, and programs can only load libraries of the same architecture type they're running.
This was a problem for some time with Microsoft Office, and also web browsers when they were still compatible with Netscape or ActiveX (native code) plugins.
In Task Manager's Details tab you can see if a process is x86 or x64, if you have Architecture column enabled.
Another reason to choose 32 bit is dependencies on either legacy or very long-living versions that are required for specific things. Some of these don't compile, or misbehave on 64 bit, and they might bottleneck the rest of your app to require 32 bit compatibility. Though that's probably a bit less likely.
I think it's worth distinguishing between RAM and virtual addresses (VAs). Almost everything you discussed, you were talking about VAs.
For example, the RAM was not partitioned on 32b OSs; the VA space was.
I know this may be pedantic - but people may think, "Hey I only have 8GB of RAM, why would my 32b programs need 4GB of that?!"
Also, VA usage doesn't map to RAM 1:1. Lots of VA usage may not even need RAM - like shared memory, memory mapped files, reserve VAs, etc.
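A small Windows-specific sketch of that reserve-vs-commit distinction, using the VirtualAlloc API: reserving claims virtual address space but essentially no RAM until pages are actually committed. The sizes here are arbitrary examples.

```c
// Minimal sketch (Windows-specific) of virtual address space vs. RAM:
// MEM_RESERVE claims a range of addresses without backing them with memory;
// only MEM_COMMIT makes pages usable (and chargeable against RAM/pagefile).
#include <windows.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    SIZE_T size = 256 * 1024 * 1024;  // 256 MB of address space

    // Reserve only: uses up VA in this process, but essentially no RAM.
    void *base = VirtualAlloc(NULL, size, MEM_RESERVE, PAGE_NOACCESS);
    if (!base) return 1;
    printf("reserved 256 MB of address space at %p\n", base);

    // Commit just the first 64 KB when we actually need memory to write to.
    void *usable = VirtualAlloc(base, 64 * 1024, MEM_COMMIT, PAGE_READWRITE);
    if (usable) memset(usable, 0, 64 * 1024);

    VirtualFree(base, 0, MEM_RELEASE);
    return 0;
}
```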
I used to supply both 32 and 64 bit versions of a giant graphics app. Part of the difference was in the size of the dataset handled and how it handled the dataset. Obviously, the 64bit one tried to put a lot more in memory for performance reasons whereas the 32 bit one tended to use disk swapping more and for some refresh events was noticeably slower than the 64 bit version.
Fun fact: x86_64 supports a mode where you still get access to all the 64-bit features *but* pointers are 32 bits, so you can get all the nice native 64-bit integer stuff and the extra registers added while *also* having pointers take less memory. With memory speed being the big bottleneck these days, it can be a fair bit faster.
It’s not a mode per se. It’s an artificial limitation self imposed by programmers. And IIRC only linux had it and was pretty much DOA.
And no, there is no actual underlying hardware support for using 64bit registers while in an “inferior mode”. The 386 and its successors did allow real mode code to use the full 32bit registers while addressing 1MB of ram like the 8086
I wondered about this, so, thanks for going over this info, Joe!
Some caveats on the results. AI cannot give a good synthetic test. If AI is what made the test (as in deciding what should be tested), then it's likely not a very good measuring stick. The other caveat is that the results are not necessarily scalable. In the test, the 40% difference might be several milliseconds, whereas in an actual application you might still retain a difference of several milliseconds while the operations take 10 times as long overall, making the percentage different. Always be careful with percentage differences; I'd say for future tests like this, always give an absolute value, because for all we know the difference will remain the same regardless of how long the operation takes in general.
Yea i mean it was more of a rough test, and obviously all the tests were repeated loops whereas in a real program not everything is gonna be those types of actions
@@ThioJoe How long did the loops run? Some of them are just a couple of seconds, where the time to load the program from the drive might be significant. Hopefully these ran for a minute each test.
@ThioJoe will you ever do a video or show some links to where you get your desktop wallpapers from and your rainmeter widgets? Please and thanks 🙂
Also, not to forget, Windows 10 ARM does not support x64; it only supports x86 translation
I mean, I think in that case the "x86 translation" is in reference to the x86 instruction set family (which is what most software is still compiled for and includes both 32-bit and 64-bit apps). I think if it was only able to translate 32-bit apps a lot of stuff would break.
Who uses Windows 10 ARM anymore? I think most WoA users are on Windows 11 now.
@@PASRC You could also ask who uses a 32bit CPU anymore? We've had 64bit CPUs for two decades now.
The peak is 32-bit portable programs written into a single exe. Wicked fast, especially with fast NVMe drives. Probably the whole program gets loaded as a single "container" into RAM and dissected there, which is faster as opposed to the program already being dissected on the NVMe drive, since some files copy really slowly on their own, but when you put them into a container and copy them that way, that slowness during copying disappears.
Portable programs are probably fast because they don't ask the operating system where their configuration files are, which could take multiple reads from disk. Assuming the config files don't exist, a non-portable application would still have to ask the operating system where they are stored, and only after that check whether they exist in the expected directory. A portable application, on the other hand, would only check the directory it resides in, whose contents would already be loaded into memory.
There are several reasons why copying container files is faster. You're probably referring to .zip, .rar or .7z files, which are faster just by being smaller, as they are compressed. But there are other container files like .tar that don't do compression and just put a bunch of files into one big file. These are faster because there is overhead for each file copy, so if you copy a lot of files you get a lot of overhead. But if you put them all into a single file, you only get the overhead of one copy.
In regards to "dissected in ram", I assume you're referring to storing a compressed program on disk, then uncompressing it at runtime. You can do this, but there isn't much reason to compress programs themselves. Code is small enough that you're not going to notice a delay loading it into memory. But program assets like images, video, and sounds are almost never stored uncompressed on disk and are almost always "dissected" (uncompressed) in memory. Games don't even store images in memory uncompressed, instead working with compressed textures directly.
This isn't accurate, for a few reasons. First, if your app has a bunch of DLL dependencies but only needs one of them to start up and lazy loads the rest as-needed, bundling everything as a single exe forces everything to be eager-loaded from disk and slows down the startup time. There are uses cases for both, but neither is "faster" in the general sense. Second, when you say "especially with fast nvme drives", not only would that mean multiple disk reads are _less_ harmful for performance, but real world benchmarks show little difference in most OS-level operations between high-end NVME SSDs and low-end ones, and sometimes even older SATA SSDs. The performance advantages of those drives are more apparent in very large I/O operations.
I think when users see a single bundled exe with no other files, it gives the impression that it's "clean" and therefore faster, but such claims need to be backed up with meaningful benchmarks before being perpetuated.
You might want to check into the x32 ABI. It has the speed increase from the 64-bit registers, but it has a 32-bit pointer size, so you double your cache real estate.
I think the 2GB limit actually comes from Windows using signed pointers.
If you made your pointer checks GREATER THAN 0 rather than NOT 0, accessing beyond 2GB would blow up in your face.
Unless something like that happened, there is literally no reason to not enable "LARGE_ADDRESS_AWARE" globally, like the Linux folks have done.
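For reference, a hedged sketch (assuming the MSVC toolchain on Windows) of how you can check the large-address-aware bit from inside a running 32-bit program; the flag itself is normally set with the /LARGEADDRESSAWARE linker option, or after the fact with editbin.

```c
// Minimal sketch (Windows-specific): read the LARGEADDRESSAWARE bit from the
// running module's PE header. The flag is set at link time (/LARGEADDRESSAWARE)
// or afterwards with "editbin /LARGEADDRESSAWARE app.exe".
#include <windows.h>
#include <stdio.h>

int main(void) {
    // The module handle of an EXE is its load address, which starts with the DOS header.
    BYTE *base = (BYTE *)GetModuleHandle(NULL);
    IMAGE_DOS_HEADER *dos = (IMAGE_DOS_HEADER *)base;
    IMAGE_NT_HEADERS *nt  = (IMAGE_NT_HEADERS *)(base + dos->e_lfanew);

    int laa = (nt->FileHeader.Characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE) != 0;
    printf("LARGEADDRESSAWARE: %s\n", laa ? "yes (up to 4 GB on a 64-bit OS)" : "no (2 GB)");
    return 0;
}
```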
That's incorrect. It's carryover, like Theo said, from 32b OSs where kernel pointers had bit 31 of the VA set and user pointers didn't. That meant you could only use half of the VA space for user data.
There's no reason for a pointer to be "signed". It does not represent a number for mathematical operations - it's an offset into a virtual address space.
I happen to use a 32-bit program all the time, in the form of my screen reader software. The NVDA screen reader is 32-bit on its own, and this has benefits such as supporting more voices, since older voices that some people like me use for screen reading are 32-bit only and don't have 64-bit equivalents.
You actually feel the pain of the lack of 32-bit versions of things when using an old Atom netbook :(. Most games, even if 2D and lightweight, are made with only 64-bit in mind. So no chance of ever running them on those old Atom CPUs. Even some browsers nowadays don't offer updates to their 32-bit version, like Brave.
Finally! Thanks for clarifying about 32-bit and 64-bit applications in this video. Most people wondered which one to choose. This will be the answer.
Though aarch64 would have been better in this scenario. But a 32-bit OS on a 64-bit PC is hella slow and quite inefficient with overhead.
Well, it's more beneficial to run the OS and kernel in 64-bit, or at least use 64-bit addressing, due to the security implications of being able to guess a 32-bit memory space.
On the windows XP machine I was working on, HWInfo 64 bit would not run. HWInfo 32 bit would (no surprise). So if you like working with older Windows OS machines, it is handy to have functional test tools. Good video BTW.
One important reason why 32-bit programs perform well is that a smaller memory footprint also means a smaller CPU cache footprint. That means they can fit more data, especially into the fastest cache (L1), which is still very small (64 KB in the AMD Zen 5 architecture). However, you should have included some raw mathematical equations in the speed comparisons, because this is where I'd expect 64-bit versions to perform better - at least when it comes to bigger integers and more precise floating-point numbers, because those can be handled in one 64-bit word but need two 32-bit words.
You might also find it interesting to look into the concept of 32 Bit ABIs (application binary interfaces). This was basically a proposition to make use of the advantages that came with AMD64 (especially the addition of new, wider CPU registers) without actually using 64 Bit instructions, but sticking to 32 Bit for performance. It's sometimes referred to as "x32". 32 Bit ABIs never found widespread use, but the Linux kernel has been ported to "x32" and some distributions such as Debian still actually release a version for it.
Yes! I'm still using MS Office 2003 SP3 32-bit on a decade-old HP Z420 workstation. Those old-school 32-bit Office apps are lightning fast on pretty much every task; Excel, for example, launches almost instantaneously when you double-click the app or a document icon. A big architectural difference between 64-bit and 32-bit is maximum RAM addressing capability, which affects how many rows a worksheet can handle, but that doesn't really matter for most average users.
Also, do not forget AWE support. You can pin physical pages that you can map into your mem space as needed going beyond the 4gb limit.
im here cus of voice messages
Thanks for the info... I never thought about this aspect.
Correction: at 1:27, it should say 32 instead of 64 on the screen.
Not really.
He is talking about 32-bit being faster, and the text is an addendum that says 64-bit might be faster in some cases.
@@AltonV I suppose in the literal sense, but it'd make more sense to put 32 instead of 64 there
You didn't mention one very important detail, which is whether you compiled with or without optimizations. Though I'd be more curious to see this tested with either gcc or clang, which have much better optimizations than MS. Also, 64-bit supports more instructions that can improve performance and that usually aren't used on 32-bit. SSE/SSE2/SSE3, for example, are generally used for 64-bit compiles, since all x86-64 CPUs have this capability. Although you can use SSE/SSE2/SSE3 in 32-bit compiles, not all x86 CPUs have these instructions, so they generally aren't used by default. These are just some sets as examples, but there are other relevant instructions as well.
he can't use gcc clang or even ms, he wrote a c# program lol
@@gavinrolls1054 The C# program he said he wrote, was just to check for LAA flags. I'm making the assumption that he's smart enough not to try to use an interpreted language to test for 32 vs 64 bit binary performance lol.
If you are using compilers such as GCC or Clang, 64-bit versions should usually be faster, as all x86_64 CPUs support the SSE and SSE2 SIMD instruction sets, so GCC/Clang enable automatic vectorization with those instructions by default for x86_64 binaries.
You can add `-mfpmath=sse` to your compiler flags to adjust the FP math appropriately.
I know I did that around 2012; I even threw -msse2 in for good measure on the 32bit builds.
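For anyone curious what kind of code that affects, here's a small sketch of a loop that GCC/Clang will typically auto-vectorize with SSE at -O2/-O3 on x86_64 by default, while a 32-bit x86 build usually needs -msse2 -mfpmath=sse added explicitly:

```c
#include <stddef.h>

/* Multiply every element of src by k into dst. */
void scale(float *dst, const float *src, float k, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i] * k;   /* candidate for packed SSE multiplies */
    }
}
```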
Thank you for asking the questions we were too afraid to ask
Not about the speed, but 64-bit is often more stable 🤔
This blew my mind
Are u still alive?
Thanks for the video!
I've always seen 32-bit programs as just a relic of the old days; basically, support for them was there just for compatibility purposes (much like 16-bit programs could be run on 32-bit systems).
I had seen articles about this specific subject.
Turns out that modern CPUs can detect they are running a 32-bit program and "merge" multiple 32-bit operations into the same 64-bit operation. For example, they can do an "add" of two 32-bit numbers and use the remaining 32 bits to perform another "add" with different 32-bit numbers, because they can handle 64 bits in total at the same time.
If I remember correctly, what's even more interesting for speed is that 64-bit-wide RAM can fetch two 32-bit values in the same clock cycle, effectively doubling the throughput for those specific values. I think this was discontinued at some point though...
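I couldn't find that exact behavior documented, but the effect described sounds closest to SIMD, where one instruction really does operate on several 32-bit values at once. A sketch with SSE2 intrinsics (available on any x86-64 CPU):

```c
#include <emmintrin.h>
#include <stdio.h>

int main(void) {
    __m128i a = _mm_set_epi32(4, 3, 2, 1);
    __m128i b = _mm_set_epi32(40, 30, 20, 10);
    __m128i r = _mm_add_epi32(a, b);           /* four 32-bit adds in one op */

    int out[4];
    _mm_storeu_si128((__m128i *)out, r);
    printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]);  /* 11 22 33 44 */
    return 0;
}
```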
Actually, as a developer I can answer why we sont always use 64-bit. It's because going 64-bit adds more overhead in RAM usage (and sometimes performance), and for 99% of our work we never need 64-bit, so we default to 32-bit and only enable 64-bit if we think there's a specific reason to.
Does this not introduce more work than just using 64bit tho?
RAM is cheap these days and performance should be fine unless it's a really old or low end CPU or you need to maintain compatibility with older 32bit OS.
I think the funniest example of 32bit program is a certain well known hardware sensor monitoring program that literally has 64 in the name - yet is 32bit.
"...Why we sont use..." bro he's almost talking french.
Makes sense. If the "upper half" of every pointer address is always all zeroes anyway, why bother?
@@Raivo_K How does this introduce more work? you literally change a compiler flag and that's it. We don't need to change the instructions by hand unless we're doing assembly by hand, so what difficulty is there to just tell my compiler to change from compiling a 64 bit executable to compiling a 32 bit executable?
It's not just about RAM usage. It's about the fact that on certain CPUs, the hardware implementation may make certain instructions faster to use than others, and there are many examples such as x86_64 where the native 64 bit instruction set has some instructions that are slower than the 32 bit instructions. Also, because the size of the instructions is smaller. Even if a 32 bit instruction on its own takes the same amount of cycles as the 64 bit version, the amount of instructions you can fit in the cache is larger when running 32 bit.
Another thing that changes is the "fast" types, which again depend on the width of the type and the associated instructions. For example, in C the int_fast32_t type in a standard-compliant compiler for Windows 10 on an x86_64 machine should be a typedef for int, which on that platform corresponds to a 32-bit integer: again because on x86_64 the 32-bit and 64-bit integer operations are equally fast, but also because performing operations over a segment of memory with many adjacent integers is faster when more of them fit in the cache, so the 32-bit type ends up being considered the faster one.
In short, there are a lot of factors to consider, but usually a 32 bit integer is going to be faster to operate on because you can fit more 32 bit instructions in the cache.
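The "fast" types mentioned above are easy to inspect; this sketch just prints their sizes, which are implementation-defined and may differ between platforms (for example, they tend to be wider on typical x86_64 Linux than on Windows):

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    printf("int_fast8_t  : %zu bytes\n", sizeof(int_fast8_t));
    printf("int_fast16_t : %zu bytes\n", sizeof(int_fast16_t));
    printf("int_fast32_t : %zu bytes\n", sizeof(int_fast32_t));
    printf("int_fast64_t : %zu bytes\n", sizeof(int_fast64_t));
    return 0;
}
```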
How do you make these test programs? can You make a tutorial pls?
@@ThioJoe Wait, how did I not know about this voice feature?
@@Zynorius00 It’s a feature they’re having me test 👀 I don’t think I can say much more than that
@@ThioJoe really interesting
@@ThioJoe Thanks for your Answer. This means a lot that my idol is reacting to my comment.
unrelated but thanks to this video I found out why some old games need a "4GB ram patch"
Loved every moment of this video! Thanks a million! 💙
Hi Thio. 👋
Unless the program you're dealing with is conceptually processing large amounts of data, 32-bit would be faster. Part of the reason is that the instructions themselves are smaller, allowing more of the program to be cached, and sometimes the equivalent instructions are faster. That said, don't install mismatched versions; you can easily end up with weird broken behavior.
Oh, that's good to know, especially for some always running small utility programs and that kind of stuff to save a bit of CPU/RAM usage.
1:16 You should have typed 32 Bit Programs Can be Faster
Was thinking the same
8088 < 8086 < 286 < x86 < P5 < Core Solo < x64 < i9
The only reason I found for using 32-bit Windows is if your computer does not support 64-bit, or you want to run 16-bit apps.
True, 32 bit versions of Windows do support 16 bit apps. I wonder if there are emulators you can download though and run on 64 bit
@@ThioJoe you can run windows 3.1 under dosbox or use some sort of VM
Can always use Linux - 64bit Linux/Wine will run happily run 16bit windows executables too new for DOS.
I think there is a wine based NTVDM you can install for 64bit windows though.
@@billy65bob yeah i guess but i have had issues with windows apps under wine before
Who else came from deep humour and the voice message 😂
process explorer has a hidden column "image type" you can enable.
In some cases 64-bit may be faster because it typically gets to use additional instruction sets like AVX that usually aren't enabled for 32-bit programs. Especially for anything video-related.
We don't use full 64-bit addresses. We usually only have partial support; I think it's 48-bit addressing, split between an upper and lower half of the address space.
I know someone who wrongfully assumed that x86 must be better than x64 because it is a greater number but I am sure that a lot of people have been confused by that at one point.
I wonder if this means that if you have a 64bit app that is constantly taking up more memory than it actually needs (to do things that are essentially useless or relatively unimportant) you'd be better off with the 32bit version, since it can't exceed the 4GB limit.
Had it this week with an insanely big excel file from our AD user export. 32bit excel was basically unusable and extremely laggy
Likely Steam is still 32-bit because they want to ensure backwards compatibility with older games.
10:38 So, why is the 32-bit version labeled x86 and not x32?
Thanks.
Historic reasons. Windows NT called the 32-bit port for 80386 CPUs "x86", so when 64-bit versions of NT were released the x86 name stuck and continued to mean what it always meant in the NT world: the 32-bit version for x86-family CPUs.
Surely the "(32 bit)" that is appended to the process name in task manager is a simpler way of finding out? Or has Microsoft removed that in the latest versions of Windows?
The AI apparently has never trained on hammer videos
Is there a way to force-install a 32-bit program on a 64-bit version of Windows 10? I have a couple of older programs that pop up an error message saying "can't install 32 bit program". If you've done a video on this before, could you direct me to it? Thank you.
I know the memory allocation is up to the OS, but what if I have over 4GB of 32-bit programs running? Will there be memory addressing issues? How does it make calls to memory addresses larger than the architecture allows? How does that work?
'Is it faster to run a 32-bit program?'
It can be even faster if the OS makes it so 32-bit programs can run under a 64-bit OS: virtual memory means each one gets a 4GB address space that exists within the overlaying 64-bit address space. The big thing is that when 32-bit or 16-bit programs are run in parallel like this, you get huge speedups.
Thanks for making this video, Thio. Though I have a Win 11 PC, still have two 32-bit Asus T100TA small laptops/tablets that still work perfectly. Only 2GB RAM, so after Oct. 2025 will have to decide: ESU Windows 10 for 3 more years (School/Teacher discount)...or perhaps 32-bit MX Linux.
I mean, why throw out even MORE e-waste, if things still work well?
You could also consider Debian:)
I would ask questions about how this might work with a JIT runtime like .NET Core. Will the initial compilation to CIL pick the more effective size for each variable? Will Roslyn analyze the hot code paths and then recompile variables with a different size based on how they run?
Damn, and here I thought 64-bit was the new norm
I gotta test this on my games
Also, values are a lot more limited: 32-bit can't reach the same value limits as 64-bit. Very few programs need the 64-bit range, but like he said, going 64-bit by default should be fine; it's just something to keep in mind.
32-bit Windows has an expiration date of 2038, I don't think anyone's working on this closed-source problem.
The 2038 problem is an issue for Unix and Linux systems. Windows stores dates differently and isn't affected as much, though it has its own variant of this issue in the form of a year-10000 problem.
So, in what applications would you use the 32-bit version instead of the 64-bit version?
Thanks.
Your results are surprising to me. Maybe this tendency for 32-bit software to run faster is mainly a Windows thing? Or maybe it's because of the language and compiler you're using? I dabble in C programming on Linux and have done a fair bit of testing of my programs as 32-bit and 64-bit builds, and in all the tests I can remember, the 64-bit version ran faster, often a lot faster, on the same computer system.
Unless it's something built on Google Chrome's code; that browser is bloated when it comes to memory, and its 32-bit support lags in performance on lower-end PCs.
Also, Windows 11 dumped support for 32-bit x86 processors altogether, making it 64-bit exclusive.
The Revo uninstaller includes this information in its program list!
The RAM limit would be per-process, right? Modern web browsers have separate processes for each tab, so I would think that while they could add up to more than 4GB (assuming you're using a 64-bit OS), each process could use 2-4GB in 32-bit.
Who came from deephumor?
Have you tried comparing the performance of operations on 64 bit integers (or floating point numbers, aka doubles)? I'm assuming that that would be significantly faster on x64
Steam being 32-bit is OK. But it being able to use only 2 cores is unforgivable when I need to verify the game files of a gigantic game and my SSD is barely being used because those 2 cores are maxed out.
Thanks for posting this video!
I tested running this on linux with wine, and it was actually a toss up with 64bit winning both LinkedList and Dictionary by 8%, and 32bit only having a 2% lead on the others.
Maybe this is just wow64 being better than native 64bit on windows...
amd64 (x86_64) actually only uses 48 bits of the 64-bit addresses, so your theoretical limit is far beyond what's actually possible.
Also, one should consider that the address space of a program is compartmentalized into areas by the OS, for a few reasons, e.g. so a program can hold an address to memory that the system owns. So you can't even use all 48 bits for the program itself. This compartmentalization depends on the OS, so I guess you might be able to create a bootable program that has access to everything, theoretically. The 48-bit limitation is unavoidable with current hardware.
Re. Speed difference. Unless I'm missing something the 32 bit program should be limited to half the runtime at worst. And that's in the limiting case that all the data your program handles are pointers, and that your program speed is bound by the memory bandwidth of loading these addresses. So something like datastructures mostly made out of pointers, which seems to have been one of your examples.
But most programs written for speed don't tend to be that way; for starters, they don't use C#.
Additionally, AFAIK, most modern instruction extensions such as the SIMD extensions are typically only enabled for 64-bit builds, so you lose out on those potential speedups.
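For scale, the 48-bit virtual-address limit mentioned above works out like this (a sketch, just doing the arithmetic):

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t va48 = 1ULL << 48;                    /* 48-bit address space */
    printf("48-bit VA space: %llu bytes (%llu TiB)\n",
           (unsigned long long)va48,
           (unsigned long long)(va48 >> 40));      /* 1 TiB = 2^40 bytes */
    return 0;
}
```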
8:31 This is not necessarily true. In C, if you use any integer type smaller than 32 bits (e.g. int16_t), the generated code may have to do additional work to make sure the value wraps at the narrower width, which makes it slower but not necessarily smaller in memory. In most cases a 32-bit integer is the best fit for the RAM and CPU, because they can read it directly; using anything different can force the compiler to emit extra masking or extension of the integers, slowing things down slightly.
Hm but would that only be during compiling?
@@ThioJoe overflowing during execution
@@ThioJoe I'm not 100% sure, but I think the original commenter is talking about aligning types to the native word size of the architecture. So if your native word size is 64-bit, for instance, and you use a 16 bit integer, it's possible it will be extended to 64-bit to maintain alignment at runtime. This could have a small performance impact, but with modern CPUs it would be basically insignificant outside of insanely performance critical work or find yourself using an unusual architecture. Otherwise the C compiler isn't doing any kind of range checking. If you overflow a signed 16-bit integer, it's undefined behavior. If you overflow an unsigned integer, it just wraps around. There are also situations like alignment padding of structs where it may pad smaller types to align to larger types, like if you have a struct with an int16_t and an int32_t the compiler will pad the int16_t to align with the int32_t.
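A quick sketch of the struct-padding point above; the exact layout is ABI-dependent, but on common x86/x86_64 ABIs this prints 8 rather than 6:

```c
#include <stdint.h>
#include <stdio.h>

struct mixed {
    int16_t a;   /* 2 bytes */
                 /* 2 bytes of padding so 'b' starts on a 4-byte boundary */
    int32_t b;   /* 4 bytes */
};

int main(void) {
    printf("sizeof(struct mixed) = %zu\n", sizeof(struct mixed));
    return 0;
}
```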
@@ThioJoe Good question. The reason a 32-bit value can be read the fastest is that the CPU can read it without any extra steps. Technically, when you are running an exe file, it should be at the lowest level possible (binary). So it really surprises me when 32-bit outperforms 64-bit. One reason might be that 32-bit gets better optimization because all PCs have to run it?
Downloading the Chrome 32-bit version rn, my PC is about to love me after that
I use old computers so usually my thoughts on the matter are:
_"Internally - All this program does is ______________. And it'll never, ever, consume a gig of ram to do what it does. _So like.. was ruining backwards compatibility reaaaaalyyyyy worth it"_
I wish 32-bit apps would just go away. I don't like the idea of having to install two versions of every library just to be able to run steam. I'm glad macOS dropped it 5 years ago. Unless it's a really old program, there's no reason to have 32-bit anymore.
So, wouldn't using 32bit version of programs automatically limit resource usage and make it easier to multitask/running multiple programs?
fun fact, the default download for discord is 32 bit