mold is a godsend. I used to iterate a lot on a large code base where only a single cpp file got recompiled, and mold literally saved me so much time, as the linking went down from minutes to a few seconds.
We cared about mold not because of the link times but rather because of its ability to eliminate identical symbols. The parameter is called --icf=all, and it is off by default. Eliminating symbols based on their content is something neither GNU ld nor gold can do, and to my knowledge neither can lld. So when you have a budget of, say, 512 KB of ROM and you suddenly get 60 KB more space just because mold eradicated lots of identical lambdas, trampolines, trivial forwarders and so on, it is a game changer.
Interestingly, without mold you would try specific template techniques in the code to make it smaller: taking extra steps to make more of the code independent of the set of types it is used with. That usually comes with a ROM cost in itself. Now with mold you might find that the "untouched", as-intended code gives you even better results.
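For anyone wanting to try it, here is a minimal sketch of how this might be wired up in CMake (assuming a GCC/Clang toolchain recent enough to accept -fuse-ld=mold; whether you also want the section flags depends on your setup):

```cmake
# Sketch: use mold and let it fold identical sections by content.
add_link_options(-fuse-ld=mold -Wl,--icf=all)

# ICF works on sections, so give each function/data item its own section.
add_compile_options(-ffunction-sections -fdata-sections)
```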
Did ld/gold fail to do ICF even with -ffunction-sections?
I never really gave much thought to link times before and never considered that they could be sped up. So thank you for yet again educating me on something I was ignoring.
A few years ago, I was doing some work on Chromium. When I had changed a source file, the recompilation step was fast, as you'd expect -- but then I'd have to wait 5-10 minutes for the whole thing to re-link.
I have used it for some time, and it is really helpful for TDD, where you compile a couple of files quite often and link time dominates.
I find that on our production code it strongly depends on the quality of the code. The code has to be somewhat tweaked to begin with for this to make any sort of difference. Unfortunately, (our?) developers just seem to add dependencies in CMake without thinking about the build-time consequences of doing so.
Why do dependencies between libraries get bottlenecked by linking? Even if libB depends on libA, you should be able to compile the cpp files of libB at the same time you are linking/building libA, right?
Unless you have some other thing (like generated files) that causes an artificial dependency. That's the kind of thing I had to wrangle in overly complicated CMake scripts
Take a look at the CMAKE_OPTIMIZE_DEPENDENCIES variable and its associated OPTIMIZE_DEPENDENCIES target property, available with CMake 3.19 or later (but use 3.28 or later due to an important bug fix affecting Ninja). Setting this variable to true gives you the behavior you describe, as long as certain criteria are satisfied. See the docs for OPTIMIZE_DEPENDENCIES for details.
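A rough sketch of what that looks like in practice (the libA/libB target names are just borrowed from the comment above):

```cmake
cmake_minimum_required(VERSION 3.28)  # 3.19+ works, but 3.28 fixes a Ninja-related bug
project(example CXX)

# Prune build-ordering dependencies that static/object libraries don't actually need,
# so libB's sources can compile while libA is still being built.
set(CMAKE_OPTIMIZE_DEPENDENCIES ON)

add_library(libA STATIC a.cpp)
add_library(libB STATIC b.cpp)
target_link_libraries(libB PRIVATE libA)

# Or per target, via the property the variable initializes:
set_target_properties(libB PROPERTIES OPTIMIZE_DEPENDENCIES ON)
```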
It would be interesting to measure link+strip time, as apparently mold does badly on binary size. Also, it would be interesting to measure the performance of the generated binary.
This is a fair question - however I've never really seen "strip" take enough time to even notice.
Now, I care very little about compile and link times if the output is not of high enough quality, meaning I really care about LTO and the quality of the LTO output. Does anyone know how the output of the four linkers compares these days?
You can link with the fastest for debug builds and switch to the most performant for release builds.
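Something along these lines, for example (a sketch assuming GCC/Clang and that the driver accepts -fuse-ld for both linkers):

```cmake
# Fast linker for iterative Debug builds, a different one for Release.
add_link_options(
  "$<$<CONFIG:Debug>:-fuse-ld=mold>"
  "$<$<CONFIG:Release>:-fuse-ld=lld>"
)
```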
This raises an interesting point - if you *really* care about these things (say, in the HFT world), then you should probably compile with and without LTO, at -O1/2/3/s/z, link with mold, gold, lld and ld, and then possibly pass all of that through BOLT and see what you get! (research.facebook.com/publications/bolt-a-practical-binary-optimizer-for-data-centers-and-beyond/ (now officially in LLVM))
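If you want to script part of that matrix, the LTO dimension at least is easy to toggle from CMake (a sketch using the standard CheckIPOSupported module; the BOLT step would still be a separate post-link pass):

```cmake
include(CheckIPOSupported)
check_ipo_supported(RESULT lto_ok OUTPUT lto_msg)
if(lto_ok)
  # Turn LTO on for Release builds only; flip this per experiment run.
  set(CMAKE_INTERPROCEDURAL_OPTIMIZATION_RELEASE ON)
else()
  message(STATUS "LTO not available: ${lto_msg}")
endif()
```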
I’ve found mold gives the greatest impact on debug builds.
That tracks with what I saw too I think.
dude you got me with that thumbnail haahhaha
So many neat tools, but most often not for msvc...
msvc has one of the best debugging experiences though in my opinion.
I'm not sold on mold.
It's fast, sure, but it's also missing a slew of compatibility features. If you don't run into the unsupported features you might not care. But you do run into them.
For me, the most important tip for Linux development of large projects is to use split DWARF to create separate .dwo files for each object's debug info (like PDB files on Windows). This reduces debugger start times from over ten minutes on my usual work projects to nearly the same as without the debugger. The flags are: -fuse-ld=gold -Wl,--gdb-index -gsplit-dwarf. I wonder if this is also supported in mold. It is definitely not supported in the default (bfd) linker. It's surprising how few people are aware of this feature of GCC for ELF targets. It could really use a reminder in some future episode as a favour to those who do development on Linux. I've been following mold for a bit, but haven't tried it, because I believe we have some linker scripts that will need to be eradicated before a switch is possible.
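For reference, the same flags expressed in CMake terms (a sketch; assumes GCC on an ELF target with gold available, as in the comment):

```cmake
# Split DWARF: debug info goes into per-object .dwo files instead of the binary.
add_compile_options(-gsplit-dwarf)

# gold can emit a .gdb_index section so gdb doesn't build its index at startup.
add_link_options(-fuse-ld=gold -Wl,--gdb-index)
```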
...and it also significantly cuts link times.
Another common tweak for cmake projects with lots of interdependent shared libs is to use LINK_DEPENDS_NO_SHARED to avoid relinking dependents when a shared lib is changed without modifying interface headers. Of course, you need to consider things like generated header dependencies, but it can have a massive impact on overall link time in non-trivial build systems.
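In case it helps anyone else, roughly like this (a sketch; libA and app are made-up target names):

```cmake
# Don't re-link dependents just because a shared library they link against was
# re-linked (implementation changed, interface headers did not).
set(CMAKE_LINK_DEPENDS_NO_SHARED ON)

add_library(libA SHARED a.cpp)
add_executable(app main.cpp)
target_link_libraries(app PRIVATE libA)

# Or per target, via the property the variable initializes:
set_target_properties(app PROPERTIES LINK_DEPENDS_NO_SHARED ON)
```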
Thanks, I'll look into this.