I think an important part that feels like it was skimmed over was the last part of his paper, where he argues that you can't just make every bad thing that happens to computer code a crime and expect the problem to go away. You have to understand that you're implicitly trusting everything you use, and everything that thing uses: when we use someone's library, we're trusting them that it does what it says it does.
Still a very important idea: our security model(s) are mostly built on trust. With all the layers of software and chips today, few can claim to understand all of the details. It comes down to trust.
@@stachowi maybe some are trying (NSA probably). But they still need people smart enough to understand this. So I do not believe any government is able to do this.
We are all standing on the shoulders of the humblest of giants; the number of things that trace their origins back to the work of the pioneers is astounding.
This considers the possibility of a software ‘supply chain attack’ that inserts malevolent code under certain circumstances. But don’t forget the hardware or firmware supply-chain attack route; that has certainly been deployed... Gotta build your own hardware too!
@@silaspoulson9935 I agree that teaching is important, but this award is specifically for original technical contributions to computer science, and is based on the level of importance those contributions have to the field.
That's a really interesting topic for a Sci-Fi movie. Like, imagine a trojan that has been dormant for the last 50 years in all the compilers and OS kernels of every computer of the world, and it is a timebomb that's going to wreak havoc at an unexpected moment.
@@calummcconnell7313 As I understand it, Reproducible Builds assumes that the original source of the compiler is fully trustworthy. It solves the problem of someone changing the source of the kernel and other packages such that the compiled binaries have malicious code injected into them. But what if all of the independent builds are performed by malicious compiler code, i.e., what if the author(s) of the toolchain/compiler have nefarious intentions?
@@LexSheehan The attack is only worth doing if the user can review the source code of the compiler, and see if they can detect a hostile actor that way. If the user can't do such a review, then there is no reason to do a trusting trust attack anyways: you just have that backdoor. What makes the trusting trust attack notable is that it cannot be detected (well, unless you have reproducible builds, and some other mitigations). A compiler author messing with source code can be.
Oddly, out of all TV programmes and films, Blake's 7 hints at this, with Orac using a backdoor that is in every computer, even those from outside the Federation. And that programme really doesn't know its computer science. Or any science, really.
I think the quote from the article is saying you have to "totally" create the code yourself, i.e. start from the binary (or even hardware as the case may be) and build up from there, because if you write your own compiler in binary then it doesn't have to be compiled by any other potentially malicious program, it just runs as you wrote it.
@@dm2060 Point taken. Although once you get to the hardware, there are tests you can run to verify that the code that is running is correct, i.e., various instructions result in the expected voltages and clock cycles, and the instructions do what you expect them to. I suppose an extraordinarily clever engineer could hide something like a simple beacon.
Imagine a man so f'ing ahead of his time that, despite an era of global publication, it takes thirty-six years before the next smartest guy notices his paper is a bombshell. You have imagined Ken Thompson.
And imagine a man so clever that he creates something for himself so good that it gets widely adopted. And although Plan 9 didn't get adopted, many of its ideas did.
@@silaspoulson9935 Yep, there's a video by Computerphile explaining how it was born (basically a video of the memorandum by Rob Pike), although they forgot to mention Ken and the fact that it was first implemented in Plan 9 within a few days.
Nah. People understood and recognized the implications almost immediately. Cory's article may have been recent, but the dialogue around Ken's paper has been going on basically since day one. I've personally known about it for almost two decades (I've never cared enough to try recompiling my own compilers, so it's never much mattered to me, but I've known of its existence and understood the basic problem at a conceptual level for almost as long as I've been a programmer).
At the end he talks about what if a printer company starts checking if you have the wrong ink. It's already happening. Just look at HP as an example of a selfish company essentially demanding that their customers either "comply" or get "denied".
So this basically describes the worst kind of supply-chain attack possible, one that would be practically unfixable without starting over from scratch. I remember several college discussions about trust at the hardware level. Setting the `conspiracy-path` aside, we talked about security features like the `Execute Disable Bit`, or things along the lines of the `Intel Management Engine/AMD Platform Security Processor`, and the implication that data from memory, storage, network traffic, and input and output devices could be evaluated, read, and manipulated by `some extra rules` at the hardware level, guarded by a barrier of proprietary code that we are simply expected to trust.
You cannot *feasibly* create an evil compiler that will detect newly written diagnostic and development tools that would allow you to re-establish a trust root. It's possible to hide the evil code, sure (patch reading the file unless it's the OS process loader, etc.), but you quickly find that propagating the evil code essentially requires it to be an AI programmer inventing new code on the fly. And it turns out people create new development tools reasonably often. This might count as "starting from scratch", but really it's the source that's the work, and we can generally verify that (especially with git securely hashing the history), so it's not nearly as bad as it sounds.
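The "git securely hashing the history" point can be made concrete. Git names every file by a hash of its contents, and each commit hashes its tree and its parent IDs, so editing any past file changes every later commit ID. A minimal sketch of the blob-naming rule, assuming Git's default SHA-1 object format (Git also supports a SHA-256 object format, and mitigates SHA-1's known collision attacks with collision detection):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID git assigns to a file with these contents
    (default SHA-1 object format): sha1(b"blob <size>\\0" + content)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

The two IDs in the comments are the ones `git hash-object` reports for the same contents, so you can check the rule against a real repository.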
@@tyrenottire And you should read my comment. I was pretty clearly referring to fooling tools created after the evil code was added, and also the difficulty of having it survive for any kind of long term (think years). The paper didn't address either of these.
I still have, somewhere on my TODO list, a project where I rewrite the basic layers from scratch, trusting only the hardware (which is admittedly already too much). It'd start with PXE booting written in machine code (possibly sent from a microcontroller I'd have designed myself), then I'd work my way up the abstraction ladder in steps: first an extremely basic assembler written in machine code (no comments or labels supported), then a more decent assembler written in this poor assembly, then a not-quite-C compiler written in assembly that produces assembly, and finally a true C compiler that supports all of the C89 syntax. And here we are: we can finally compile a gcc we can trust.
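The step from "no comments or labels" to "a more decent assembler" is mostly about labels, which need two passes: one to learn where each label lands, and one to emit bytes. A minimal sketch in Python for a hypothetical toy instruction set (every instruction is two bytes; the mnemonics and opcodes are invented for illustration and match no real CPU):

```python
# Hypothetical toy ISA: every instruction is two bytes (opcode, operand).
OPCODES = {"HALT": 0x00, "LOAD": 0x01, "ADD": 0x02, "JMP": 0x03}


def assemble(text: str) -> bytes:
    # Strip ';' comments and blank lines.
    lines = [raw.split(";", 1)[0].strip() for raw in text.splitlines()]
    lines = [line for line in lines if line]

    # Pass 1: record the address each label refers to.
    labels, addr = {}, 0
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = addr
        else:
            addr += 2

    # Pass 2: emit bytes, replacing label operands with their addresses.
    out = bytearray()
    for line in lines:
        if line.endswith(":"):
            continue
        mnemonic, *args = line.split()
        operand = 0
        if args:
            operand = labels[args[0]] if args[0] in labels else int(args[0], 0)
        out += bytes([OPCODES[mnemonic], operand & 0xFF])
    return bytes(out)


program = """
start:
    LOAD 5      ; acc = 5
    ADD 0x10
    JMP start   ; backward reference
    JMP done    ; forward reference, only resolvable thanks to pass 1
done:
    HALT
"""
print(assemble(program).hex())  # 01050210030003080000
```

Each stage of such a ladder only has to be simple enough to check by hand in the stage below it.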
But can you trust your assembler program? What if it detects that you're assembling an OS and sneaks extra code in? To be truly safe, do you have to write your own assembler from pure binary? What if your hex editor sneaks extra code in when it detects that you're writing an assembler?
@@CoderThomasB When I said "written in machine code", I didn't mean writing some pesky abstract ASCII-text assembly language, but machine code directly, likely entered by flipping switches, like on an Altair 8800. But I'd still have to trust the hardware running the code I wrote this way.
@@Ceelvain An Altair 8800 is probably simple enough that you could trust it, but the point of the paper is that someone can almost indefinitely say “what if”, because there is almost always something between your code and its processing, including the processor itself. If you're building an OS from scratch on an Altair 8800 you're probably safe, but most people just use the C compiler built into their distro, which was compiled by another C compiler, going all the way back to the first C compiler. The question the paper asks is: if just one person in this chain injected into the C compiler malicious code that inserted itself into any other C compiler compiled with it, then a lot of the C compilers in use today could contain that malicious code, and how would you know, given that the source for your compiler was clean?
I love this channel (Computerphile). It would be so interesting, I think, to sit in a pub with these folks (e.g. Prof. Brailsford) over a beer and have discussions such as this. Thanks!
This talk reminded me of the speech in The Matrix that starts with the phrase "I created the Matrix", in the last scene of the movie. Thanks, Professor Brailsford, Ken Thompson, Dennis Ritchie. We learned a lot from you. Greetings.
Stop creating a feedback loop of stress by worrying about stress. What is stress stopping you from doing? Nothing. Also, you can suppress it using metacognition.
The only time I ever made a geek pilgrimage to Stanford I stood in the bookstore and read that entire Thompson lecture out of a book I couldn't afford to buy.
This actually came up in the Rust compiler, in a non-malicious way. Basically, they noticed that the Rust compiler's source code never encodes the ASCII values for escape codes at all. So, for example, if the compiler encounters a '\n' in the source code it's meant to compile, the compiler's own source code contains no information about the byte value of that escape; it is instead also just defined as '\n'. The value automatically propagates from the knowledge of the previous compiler.
The example he gave with printer ink is something that large printer manufacturers have included in their devices; my dad founded and ran a local printing shop for 30 years, and with the more recent printers this was exactly what happened: you couldn't use ink that you didn't buy from the company. As for the CD-ROM example, that was an issue with various games published by EA: the game you purchased wouldn't run if you had a CD-burning program installed. There were also issues where the "anti-piracy" software EA required you to install literally broke people's computers (some people got broken CD drives, others got completely bricked machines). I don't know if these were intentional references or not, but yes, these types of things are happening.
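The self-propagating escape value can be modelled in a few lines. Thompson's paper uses essentially this example, with the C compiler learning the value of `\v`. In this toy sketch (the dict-based "compiler" and `compile_compiler` are hypothetical stand-ins, not how rustc is actually structured), the compiler's own source spells each escape only as another escape, so the numeric value comes from whichever compiler built it:

```python
def compile_compiler(host_escapes: dict, compiler_source: dict) -> dict:
    """Toy model: a compiler 'binary' is just its escape table (letter -> byte).
    The compiler's source spells each value as an escape sequence, so the
    numeric meaning is supplied by the host compiler doing the build."""
    new_table = {}
    for letter, literal in compiler_source.items():
        if not (len(literal) == 2 and literal[0] == "\\"):
            raise ValueError(f"expected an escape sequence, got {literal!r}")
        new_table[letter] = host_escapes[literal[1]]
    return new_table


# The compiler's own source: no numeric values anywhere.
SOURCE = {"n": "\\n", "t": "\\t", "v": "\\v"}

# Values had to be supplied explicitly exactly once, to bootstrap.
bootstrap = {"n": 0x0A, "t": 0x09, "v": 0x0B}

gen1 = compile_compiler(bootstrap, SOURCE)
gen2 = compile_compiler(gen1, SOURCE)
print(gen2)  # {'n': 10, 't': 9, 'v': 11}

# A different bootstrap value for \v survives every rebuild of the same SOURCE.
odd = dict(bootstrap, v=0x00)
print(compile_compiler(compile_compiler(odd, SOURCE), SOURCE)["v"])  # 0
```

Nothing in `SOURCE` changes between the two runs; only the compiler used to build it does, which is exactly why reading the source alone cannot tell you the value.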
If I understand this correctly, the point is that even if I completely read and understand and document the source code for every program running on my computer, and I also completely understand and document the source code for every compiler, I can _still_ have malware hidden in the binaries. Because of the way bootstrapping works, my entire machine could be vulnerable due to some tiny hidden program that _only_ shows up in binaries, never in source code, and so is nearly impossible to find. But the only way that could be the case is if the compilers and everything else had been set up that way to begin with, like maybe Ken Thompson and UNIX. This is a call to stay grounded, like with encryption and man-in-the-middle attacks (or $2 wrench attacks--probably $5 wrenches now with inflation). You really do have to trust people at some point; there is no way around it, no matter how impressive the promises of various new encryption methods. You have to "trust trust," in the same way we all need some minimal amount of trust in other people to do pretty much anything.
Yeah, that's one reason to be happy to have at least two decent open-source C compilers, including one that a poisoned compiler from 30 years ago couldn't have known how to infect like this. Building clang with gcc, and then vice versa, should give at least some level of trust.
@@Intermernet If you ignore the kernel, then you can build GCC from a 357-byte hex assembler and a 757-byte shell in a fully automated and reproducible way, which shows that there is no trusting trust attack in userspace. The kernel problem is solvable in principle too, although it's not done yet.
I just bought a serial-to-USB adapter and found out the driver was disabled by Microsoft, at the request of the manufacturer (edit: it was blacklisted by MS). The chip used in the adapter was cloned and pirated, so they decided to automatically install a dud driver and disable the whole family in Windows 10, including the original chips from the manufacturer. I have no way of knowing if mine is original or pirated, but the solution I was given was a big "FU, buy a new one". Should we put our trust in automatic updates that downgrade products like that?
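To see why such a small seed can be enough, here is what a hex0-style first stage does, sketched in Python for readability. The real seed is itself hand-written machine code, and treating both `#` and `;` as comment markers is an assumption modelled on the stage0 convention: pairs of hex digits become bytes and everything else is ignored.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def hex0(text: str) -> bytes:
    """Pairs of hex digits become bytes; '#' or ';' starts a comment that runs
    to the end of the line; all other characters are ignored."""
    out = bytearray()
    high = None          # first nibble of a byte still waiting for its partner
    in_comment = False
    for ch in text:
        if in_comment:
            in_comment = ch != "\n"
            continue
        if ch in "#;":
            in_comment = True
        elif ch in HEX_DIGITS:
            nibble = int(ch, 16)
            if high is None:
                high = nibble
            else:
                out.append((high << 4) | nibble)
                high = None
    if high is not None:
        raise ValueError("odd number of hex digits")
    return bytes(out)


source = "B8 01 00 00 00  # mov eax, 1\nC3               ; ret\n"
print(hex0(source).hex())  # b801000000c3
```

The comments matter: the hex digits inside `mov eax, 1` are skipped, so a human can annotate every byte while the tool stays small enough to audit completely.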
This is why, if you read the Windows EULA closely, it warns that the product is not suitable for use in medical equipment. Microsoft knows not to trust their own software for life-or-death situations.
AFAIK the FTDI driver reads out the serial number and only refuses to work if it detects a counterfeit one. Which would mean that you bought an adapter with a counterfeit chip.
@@vylbird8014 not really relevant? And also, of course it's not appropriate for life or death situations? The OSs that are are hilariously tiny and can't run anything you would care about. It's like complaining that your barbeque tongs are not appropriate for use as medical forceps.
@@max_kl It didn't read out the serial number. It sent an invalid request to brick the device to the chip. The official chip ignored the invalid request, but the counterfeit chips actually process the request as intended and destroy their own configuration data. Because of this, Microsoft blacklisted the driver.
But even if the compiler is producing machine code without any malicious code in it, we should not forget the CPU and its instructions. Some instructions might have secret behaviour when specific criteria are met; backdoors, effectively. So this is not just a problem of trusting the compiler, but of trusting the CPU vendor as well.
I've brought this paper up many, many times when talking about the supposed "trustless" nature of cryptocurrency. The technical solutions involved in cryptocurrency don't eliminate the need for trust; they simply push it somewhere else in the chain. In crypto-world, everyone is running around talking about how they have solved the problem of having to trust the central banks and the commercial banks, but they've simply shifted the problem to the shady online exchanges. Even the tinfoil hat crowd that does bilateral person-to-person swaps and keeps their cryptocurrency in cold storage is eventually at the mercy of some shady guy in the back-alley deal they have to make to swap their crypto for anything useful. The point is that there's no escape. Eventually you have to trust *someone*, and your time is better spent thinking about who you can trust rather than playing technological whack-a-mole trying to eliminate the need for trust.
The problem is when there is a huge centralised group of highly funded technocrats using this purposely compromised technology for ill ends. Even if you place your trust elsewhere, the centralised power group will always have the last word, unless of course we all turn our backs on them and start our own parallel tech world. The chances of this happening are slim to none, though.
For those thinking of rewriting gcc from scratch in assembly: how do you trust that the processor actually does what it is supposed to do when given an instruction? And if you don't trust Intel and ARM, and say "I'll make my own processor", how do you know that nothing malicious has been hidden by the manufacturer at the silicon level?
The only solution is to create an environment where people are not motivated to do such things, because they understand the negative impact it has for everyone. You can't solve moral problems with technological solutions.
However, I agree with the points made; it's difficult to find the line in trust. Consider the following: your computer is most likely made from parts from various different manufacturers, so how can you be sure you can trust any of them not to put anything rogue on them? How can you be sure you can trust any single person or company? There has to be some amount of blind trust, otherwise you'd have to try to replicate more than 70 years of progress, not counting the long history of progress in physics, mathematics, and chemistry. And after all that, can you trust yourself to do the job flawlessly?
@@mabalito You miss the point. If you compiled the compiler on itself, then you just kick the trust question down the chain a step. If the original binary at the beginning of the chain is suspect, then the current version can be suspect, as explained in the video. You need to personally build your own compiler from machine code to get around this problem. Or take an anti-paranoia pill. If some supervillain has done this, then what are they waiting for? When are they going to make use of the exploit?
It's funny because I had a similar thought the other day about the gcc compiler: what if the first compiler you use to bootstrap gcc injects code in it that will add proprietary blobs to everything you compile with gcc? There's just no way of asserting whether that's the case.
The SolarWinds hack isn't exactly what Thompson described, but it is close. The SolarWinds hack monitored the filenames of code being compiled, and when the one they wanted came along they inserted their malware into the source as it went through the compiler. After compilation they removed the extra code so that the source looked unmodified. A decompilation of the binary would show that the binary and source didn't match. But who's going to do that? SolarWinds was a remarkably clever hack, made all the more scary by the fact that it is very difficult to think of a countermeasure that would prevent it. Another scary idea is that SolarWinds is just first-generation malware. Future infections may go deeper, even to the level of doing Thompson's suggested hack to the compiler itself.
It should be possible to reverse engineer a compiler to look for suspicious behavior. Aside from that, I don't see how we could be sure we're getting what we expect from looking at the source code.
It doesn't really solve the issue, but at the very least we have clang, gcc, and MSVC, all independently made, as well as NASM, MASM, YASM and others for assemblers.
Yes. I'm not worried about this in software (though it is worth keeping an eye out for). Hardware is where this problem exists; Christopher Domas demonstrated that on x86 at Black Hat 2017.
@@terranceandrewdavis7070 they can't all realistically have evil code embedded in their binaries that recognize the other compilers and know how to patch them, even as they are being heavily modified over decades. That would require the evil patch to be a general AI, AFAICT.
So did Ken Thompson create a monster? No, I'm pretty sure he didn't. For a start, you wouldn't go tipping people off about the possibility if you'd done that, now would you? I'd also trust the work of Richard Stallman, who personally wrote much of the first version of GCC.
"I'd also trust the work of Richard Stallman, who personally wrote much of the first version of GCC." Sure, he wrote the first version of GCC, but can you trust what he compiled/assembled that first version on?
@@RedwoodRhiadra ~ Well, strictly speaking, no, you are correct, but I would tend to think in 30 years of work and revision and overhauls and upgrades, by a large group of people, all working (at least potentially) in the open, somebody would've noticed something not right about the code or its behaviour.
Ken Thompson invited people to have a look at his implementation later. Only one guy took him up on the offer long after that. Wrote a blog post about it. Fascinating read as well.
I think the point was trust: do you trust what you can’t see? It would be like having a food replicator from Star Trek that included a little extra something in your food, LSD or maybe something that makes you more compliant. This indeed brings about great debate about trust, but I think it also highlights how we should be more critical of the tools we use.
Just write a C compiler in assembly language for the target architecture, assemble it manually, and then use that machine code to bootstrap your way up to gcc. Easy, right?
The original claim included that the kernel may also have been compiled by the evil compiler at this point, so it can do things like seeing your assembler reading a GCC source file and add in the evil code patch. I'm a bit dubious it could reliably add it to compilations and not to, say, editors, or compiler tests.
I've heard reports of (though not seen) C compilers that, when you build the Unix login program with them, make it grant superuser access to anyone who logs in with the username "kt" and no password.
If anything, really, really bad timing. He's been toxic to the community for years, decades even. Yes, he helped a lot early on in bootstrapping the movement, and yes, he was a genius programmer, but he has tainted his legacy and caused a lot of harm.
I've seen somewhere that this is somewhat circumventable if you have a compiler A that you can trust (through a bootstrapping mechanism or whatever). If you have another compiler B that you want to test for trustworthiness, you compile B using B, and compare the resulting executable with the one you get by compiling B using A (call the result C) and then compiling B with C again. If the generated executables match exactly, you know that you can also trust compiler B.
Usually different compilers don't produce exactly the same binaries. Maybe C compilers with optimizations turned off get close. The modern thing to do is to start with reproducible builds.
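That rumour matches the two-part trojan Thompson describes in the paper: recognise `login` and insert a backdoor, and recognise the compiler itself and re-insert the trojan. A toy Python model of that propagation logic follows. The `Binary` type and the marker strings are hypothetical, and the model deliberately skips the hard part (the self-reproducing machine code) to show only why clean source does not guarantee a clean binary:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    """Toy stand-in for an executable: the source it claims to come from,
    plus whether a trojan payload was slipped in during compilation."""
    source: str
    backdoored: bool = False


LOGIN_MARKER = "def check_password"     # hypothetical: how the trojan spots login
COMPILER_MARKER = "def compile_source"  # hypothetical: how it spots the compiler


def compile_with(compiler: Binary, source: str) -> Binary:
    """Compile honestly, unless the compiler is trojaned and recognises
    one of its two targets, in which case the payload is injected."""
    if compiler.backdoored and (LOGIN_MARKER in source or COMPILER_MARKER in source):
        return Binary(source, backdoored=True)
    return Binary(source)


def run_login(login: Binary, user: str, password: str, passwords: dict) -> bool:
    if login.backdoored and user == "kt" and password == "":
        return True  # the master login: user "kt", no password
    return passwords.get(user) == password


CLEAN_COMPILER_SRC = "def compile_source(src): ...  # reviewed, no trojan"
CLEAN_LOGIN_SRC = "def check_password(user, pw): ...  # reviewed, no trojan"
users = {"alice": "s3cret"}

evil_cc = Binary("(trojan source long since deleted)", backdoored=True)
next_cc = compile_with(evil_cc, CLEAN_COMPILER_SRC)  # clean source in...
login = compile_with(next_cc, CLEAN_LOGIN_SRC)       # ...trojaned binaries out
print(next_cc.backdoored, login.backdoored)          # True True
print(run_login(login, "kt", "", users))             # True
```

Rebuilding from the reviewed sources with an honest compiler gives a clean `login`; rebuilding with `next_cc` never does, no matter how many generations you go.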
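One common reason two builds of identical source differ is an embedded build timestamp. The reproducible-builds convention is to honour a `SOURCE_DATE_EPOCH` environment variable instead of the wall clock. A minimal sketch, assuming a hypothetical `build` step that stamps its output (the environment is passed as a dict to keep the example testable; in practice you would pass `os.environ`):

```python
import hashlib
import time


def build(source: str, env: dict) -> bytes:
    """Hypothetical build step that stamps its output with a build time.
    Following the reproducible-builds convention, SOURCE_DATE_EPOCH (when set)
    replaces the wall-clock time, so the stamp depends only on the inputs."""
    stamp = int(env.get("SOURCE_DATE_EPOCH", time.time()))
    return f"built-at={stamp}\n{source}".encode()


def digest(artifact: bytes) -> str:
    return hashlib.sha256(artifact).hexdigest()


SRC = "int main(void) { return 0; }"
pinned = {"SOURCE_DATE_EPOCH": "1700000000"}
later = {"SOURCE_DATE_EPOCH": "1700000001"}

# Same source + same pinned timestamp -> bit-identical artifacts.
print(digest(build(SRC, pinned)) == digest(build(SRC, pinned)))  # True
# Change only the timestamp and the hash changes, even though the code did not.
print(digest(build(SRC, pinned)) == digest(build(SRC, later)))   # False
```

Once every such source of nondeterminism is pinned, comparing cryptographic hashes of independently built artifacts becomes meaningful.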
@@autohmae The idea is that if you compile B using A the internal logic should be the same as the original B, if B wasn't tampered with. If you now compile B using C it should be equivalent to compiling B using B.
@@oj0024 You had this in your original comment: "If the generated executable matches exactly". I read that as "if the executables have the same cryptographic hash". All I'm saying is that this is unlikely to happen with different compilers.
@@autohmae That's what I'm saying, except that you have different compiler binaries built from the same compiler source, hence the output should be the same. If you compile gcc using clang, and then compile gcc again using the resulting compiler, you should get the same compiler as when you compile gcc with gcc:
gcc compiles gcc -> gcc2
clang compiles gcc -> clanggcc
clanggcc compiles gcc -> clanggcc2
Now clanggcc2 should be the same as gcc2 (at the bit level).
Perhaps you could do a video about Xenix. My first IT job, in 1989, was on a network of Olivetti MT24s running DOS 3.2, but my boss was trialling Xenix, a Unix variant that was owned by Microsoft at the time. I believe that Xenix morphed into SCO Unix, and that Microsoft bought the Carnegie-Mellon micro-Unix kernel and adapted it to become Microsoft NT. I've often wondered why, if Microsoft had already bought the Xenix Unix kernel, they didn't run with that instead of spending a decade developing an alternative based on another Unix kernel.
The only case I've heard of is when the NSA wouldn't say why DES should use its particular S-box values, IIRC. It later turned out that with other values there was an attack (differential cryptanalysis) that would seriously weaken the cipher, and explaining this would have immediately made a bunch of existing systems vulnerable. But yeah, that's why "nothing up my sleeve" values like digits of pi or ASCII are valuable.
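SHA-256 is a standard example of "nothing up my sleeve" constants: FIPS 180-4 derives its initial hash words from the fractional parts of the square roots of the first eight primes, and its round constants from the fractional parts of the cube roots of the first 64 primes, so anyone can regenerate them. A sketch that recomputes the first eight of each, using exact integer arithmetic to avoid floating-point rounding:

```python
from math import isqrt


def first_primes(n: int) -> list:
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(n: int) -> int:
    """Largest x with x**3 <= n (exact integer cube root, by binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = first_primes(8)
# floor(sqrt(p) * 2**32), keeping only the 32 fractional bits.
H0 = [isqrt(p << 64) & 0xFFFFFFFF for p in PRIMES]
# floor(cbrt(p) * 2**32), keeping only the 32 fractional bits.
K = [icbrt(p << 96) & 0xFFFFFFFF for p in PRIMES]

print([f"{h:08x}" for h in H0])  # starts with '6a09e667', 'bb67ae85', ...
print([f"{k:08x}" for k in K])   # starts with '428a2f98', '71374491', ...
```

Because the recipe is public and simple, a designer has essentially no freedom to slip in specially chosen values.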
I blindly trust the developers who package my distro and its build tools. This isn't a joke. I just give up. If they can't break the chain of compiler poisoning, I sure can't.
What's scarier is that you could engineer even the debuggers and hex viewers to pretend the extra code doesn't exist. I don't think this is a real problem with compilers, but it's a crazy security nightmare with embedded code and the low-level stuff in your devices, etc.
@@profdaveb6384 that's great thank you. Sorry I did look there before when I watched your video but must have missed it. Small screen, poor eyesight...
This sort of issue is why the teaching of philosophy is so important. Who polices the policeman? It seems that what little policing there is happens after the fact. As an interested outsider vaguely learning of high-profile security breaches, it seems to me that any database directly connected to a communication system is functionally pre-hacked. Am I wrong about this?
I agree with your observation about the philosophical part of all this. Our bodies do the same sort of "truth checking" at every system level, from DNA to organ function. What's needed are standards of operating behaviors, for everything from chip level to hardware to gui. Only then can everything check everything else.
I think that's a solved issue in software. The undocumented RISC core that VIA embedded in its C3 x86 processors (demonstrated by Christopher Domas) is the manifestation of this issue.
It's not solved. The trust can be pushed down the stack, but the only solution is to build everything from scratch yourself. And yes, Intel and AMD have put spyware in the hardware. What are you going to do, use an Acorn?
11:14 Ah, you mean what the companies behind the DMCA actually tried to do? (And the ink comment seconds before is also actually a thing; printer ink is more expensive than gold, seriously.)
For compilers I suspect this will fail as the software evolves over time. Eventually the trojan will no longer be compatible with the new version. The points it was hooking into will eventually be refactored.
Consider that you could change what code the subsequent bad compiler is 'hooking' onto in each iteration of the recompiled compiler down the chain. Since the code you're fed each time is fully read into the compiler, you always have a copy of the source code for the previous version and can automatically adjust to each new version of the compiler as needed. The only way to be truly sure is to go back and write your own compiler in machine code.
@@GeorgeBratley "The only way to be truly sure is to go back write your own compiler in machine code." ... running on a computer you designed and built yourself from discrete transistors...
I've heard it said that when you are asked in a job interview what your biggest weakness is as a software engineer, the best answer is "I'm not Ken Thompson".
The moral is: you can never trust anything (not restricted to code or programs) that you didn't create or build from scratch yourself. If you need any tools to build the thing in question, you have to make the tools from scratch in the same way. This applies recursively until you are down to the simplest tools possible, buildable from raw materials and human power. But even if you do all that, you cannot be sure that your mind is not being manipulated. This chain of trust doesn't have a beginning.
Nowadays people are (formally) proving compilers correct. Of course, you have to "check the proof checker" somehow, but it's a LOT shorter than "the code" for a C compiler, and you only have to "check" one proof checker, which you can use to prove correct any compiler you want (and much more). Of course, proving stuff correct in this way is very hard. But people manage to do it these days.
It might be checking the source for correctness, but that in itself does not address the problem. Does it check the compiled binary along with the source to verify that the binary, when run on the target architecture, will function according to the source specification? (If so, then that would indeed be a great step forward, even as it would leave unanswered the question of whether the microcode of the target architecture does in fact cause the architecture to execute the machine code faithfully.)
@@zapazap feasible, but I've always been dubious of the claims that software proofs are actually beneficial, and in this case if they can sneak something evil into the C code, surely they can also sneak something evil into the proof if it's that much bigger - but perhaps I'm misunderstanding something.
@@SimonBuchanNz The point is that it doesn't matter how big the proof is. It only matters how big the proof checker program is. The proof checker *is* much smaller than a C compiler. Then, as long as the proof checker is correct, there is no way you can put anything evil into the proof, which is literally the point of proof verification. (You might be able to put evil stuff in the claims that you prove, i.e. prove something completely unrelated and claim that this means your compiler is correct. However, the claims for such a proof can fit on a page of paper, and people will read them.) Overall, what such proofs achieve is to make completely clear what you mean by "correctness" and to eliminate a LOT of places where this correctness may have been violated. It's much easier to make sure "beyond a reasonable doubt" that a microchip is doing what it should than to make sure "beyond a reasonable doubt" that a microchip AND the compiler's source code are.
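To make the "small checker, big proof" point concrete, here is a deliberately tiny checker for propositional proofs that use only hypotheses and modus ponens. It is a toy sketch (real kernels such as Coq's or Lean's handle far richer logics), but the trust structure is the same: however long a proof is, you only need to trust these few lines.

```python
def check(hypotheses: list, proof: list, goal) -> bool:
    """Formulas are atoms ("p") or implications ("->", A, B).
    Each proof step is ("hyp", formula) or ("mp", i, j): from step i (A)
    and step j (A -> B), conclude B. Returns True only for a valid proof of goal."""
    derived = []
    for step in proof:
        if step[0] == "hyp":
            formula = step[1]
            if formula not in hypotheses:
                return False
        elif step[0] == "mp":
            i, j = step[1], step[2]
            if not (0 <= i < len(derived) and 0 <= j < len(derived)):
                return False  # may only cite earlier steps
            a, imp = derived[i], derived[j]
            if not (isinstance(imp, tuple) and imp[0] == "->" and imp[1] == a):
                return False
            formula = imp[2]
        else:
            return False  # unknown rule
        derived.append(formula)
    return bool(derived) and derived[-1] == goal


hyps = ["p", ("->", "p", "q"), ("->", "q", "r")]
good = [("hyp", "p"), ("hyp", ("->", "p", "q")), ("mp", 0, 1),
        ("hyp", ("->", "q", "r")), ("mp", 2, 3)]
bad = [("hyp", "p"), ("mp", 0, 0)]  # tries to conclude something from "p" alone

print(check(hyps, good, "r"))  # True
print(check(hyps, bad, "r"))   # False
```

A bogus proof cannot sneak past as long as `check` itself is correct, which is the sense in which the proof's size stops mattering.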
I know it would be a lengthy project, but wouldn't the solution be to rebuild the compiler from assembler? Start with something very basic, just enough to allow a very simple C program to be written and compiled to a compiler, then rebuild feature by feature with all code visible and replicable for security researchers to verify step by step. You'd really only have to do this once, and you'd have in the end a completely traceable line of source code for any compiler that branches from that core project.
@@davidwuhrer6704 No, code libraries would be "pre-existing code". Of course those would be untrusted. As for CPUs, perhaps those would also need an audit, I don't know enough about their capacity to alter code during compilation.
That's why we should use some wacky compiler like movcc for one of the steps in the bootstrapping chain. Because there's no way whoever wrote the original evil gcc would anticipate movcc, so movcc won't get detected as a compiler, so the malware can't propagate through it.
It is an interesting thought experiment, but how would the exploit work for cross compiled code for architectures that did not even exist back when Kernighan wrote his first compiler? Seems to me it would fail at some point, but I guess later bad actors could do the same trick.
@@usafa1987 You can test a resistor. If it is secretly stateful, intended to behave in undocumented ways under certain conditions, then as part of a logic circuit it can only give corrupt output. It cannot, independently, inject malicious code. (I am ignoring the possibility that there is embedded intelligence in the resistor package that is communicating with other similar resistors. This could be a nightmare.) An EPROM gate array is pretty testable, and after testing can be programmed in-house to perform the logic you want. It's not ICs themselves that are such a worry. It is ICs that do internal computation, and whose usefulness relies on the veracity of these computations, that are worrisome.
I wonder if it is feasible to build a physical compiler without any ICs which matches the spec/source of an existing one, and have it be fast enough to compile a clean compiler binary in some reasonable amount of time.
There IS a way to detect the trusting trust attack. Just use completely different compilers (such as clang and gcc), and make them compile the same compiler source. Then make the resulting binaries compile the same source again. If the final binaries are different (and the builds are otherwise deterministic), one of the compilers you used is compromised!
Different compilers make plenty of different decisions, different optimizations. I’d be shocked if two different compilers gave identical binaries for non-trivial code.
@@usafa1987 They are both the same compiler (from the same source), just created by different compilers. Their binaries will differ, but since both implement the same source, they should still produce the same binary at the end.
@@usafa1987 To further clarify, the procedure is as follows:
1) Have the source for the compiler that you want to test for "evilness". Let's say gcc.
2) Have an entirely separate compiler that can compile that source. Let's say clang.
3) Compile gcc with gcc. Call the resulting binary gcc-gcc.
4) Compile gcc with clang. Call the resulting binary clang-gcc.
5) Now you have two compiler binaries. They will not be identical because of differences between gcc and clang, but they should behave exactly the same.
6) Compile gcc with gcc-gcc. Call the resulting binary gcc-gcc2.
7) Compile gcc with clang-gcc. Call the resulting binary clang-gcc2.
8) Now you have two compiler binaries. Assuming the build is deterministic (no embedded timestamps and the like), they ought to be identical.
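The procedure above (the idea usually called diverse double-compiling) can be sketched as a toy simulation in Python. Everything in it is hypothetical: a "binary" is just a SHA-256 digest of a string, a compiler's behaviour is reduced to a code-generator label, and the trojan is a flag that re-inserts itself whenever it compiles a compiler. The sketch assumes deterministic builds and that clang does not carry the same trojan; if both compilers were poisoned identically, the stage-2 binaries could still match.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Toy source file. For a compiler's source, `codegen` names the code
    generator that the resulting compiler binary will use."""
    text: str
    is_compiler: bool = False
    codegen: str = ""


@dataclass(frozen=True)
class Binary:
    """Toy executable. `image` stands in for its bytes; `codegen` and
    `trojan` only matter when the binary is itself a compiler."""
    image: str
    codegen: str = ""
    trojan: bool = False

    def compile(self, src: Source) -> "Binary":
        # Output bytes depend on THIS compiler's code generator and the source.
        payload = f"{self.codegen}|{src.text}"
        # Hypothetical Thompson-style trojan: when it recognizes that it is
        # compiling a compiler, it re-inserts itself into the output.
        inject = self.trojan and src.is_compiler
        if inject:
            payload += "|TROJAN"
        image = hashlib.sha256(payload.encode()).hexdigest()
        # The new binary behaves as its *source* says, plus any inherited trojan.
        return Binary(image, codegen=src.codegen, trojan=inject)


def diverse_double_compile(suspect: Binary, trusted: Binary, src: Source) -> bool:
    """Steps 3-8: True if gcc-gcc2 and clang-gcc2 are bit-identical."""
    gcc_gcc = suspect.compile(src)        # step 3
    clang_gcc = trusted.compile(src)      # step 4 (bytes differ from gcc_gcc)
    gcc_gcc2 = gcc_gcc.compile(src)       # step 6
    clang_gcc2 = clang_gcc.compile(src)   # step 7
    return gcc_gcc2.image == clang_gcc2.image  # step 8


gcc_src = Source("gcc source", is_compiler=True, codegen="gcc-codegen")
clang = Binary("clang binary", codegen="clang-codegen")
clean_gcc = Binary("gcc binary", codegen="gcc-codegen")
evil_gcc = Binary("gcc binary", codegen="gcc-codegen", trojan=True)

print(diverse_double_compile(clean_gcc, clang, gcc_src))  # True
print(diverse_double_compile(evil_gcc, clang, gcc_src))   # False
```

The stage-1 binaries differ (step 5), but because both were built from the same source they generate code the same way, so only an inherited trojan can make the stage-2 outputs diverge.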
@@Jitter88 If a clang compiler was disguised as a gcc compiler, you could find it out pretty easily. Their options are different, their errors are different, it's not that hard to differentiate them.
As long as the language specification is open, can't you always write your own compiler in assembly? And as long as the source code of the popular compiler is open, couldn't you then read through it thoroughly and compile it using your different compiler, to get a fully featured compiler which definitely does not carry the poison any more?
You can, in theory. In practice, this is impractical, because you actually have to go back all the way to making your own computer from discrete transistors, since this sort of attack can begin even at the hardware level. Unless you trust your IC manufacturer. (If you're this paranoid, you can't.)
@@RedwoodRhiadra How would such an attack practically be performed with mere hardware? Covertly injecting the malicious seed into everything that gets executed and/or compiled on the hardware would surely require an immense amount of extra computation (or other detectable activity), given how little the hardware is supposed to know about the code it runs?
@@fdagpigj You don't need to alter the compiler if you attack at the hardware level. You just need to alter the hardware, for example by adding another hidden processing core running Minix. You could make sure your backdoor is added to every processor by having the processor add the backdoor to the output of the hardware compiler. It wouldn't be in the executable of the hardware compiler itself.
Unless the Trojan Horse is in the CPU, in addition to code you have written yourself, you can trust machine code that you have stepped through in an in-circuit emulator. I used to debug disassemblies of C code on a 286, without source-level debugging. After a while, you become adept at seeing how the machine code corresponds to the source. In principle, you could hand-compare source and binary line by line, function by function; once you've eliminated all the machine code that can be explained by the source code, anything left was added by the compiler.
No. For one thing, the debugger might be affected. But more importantly, an optimising compiler creates data paths that are not in the source, so there is unlikely to be an idempotent bijective relation between object code and source code. And thirdly, modern CPUs use software interrupts, which are not something you can just step through. They can easily be hidden from the debugger.
I guess I am missing something; I understand the implications of Thompson’s paper, however, what’s to stop someone from doing dynamic analysis on the compiler as it processes source code, and match the binary to expected output, or a hash? While this would be time consuming, my gut is saying that you could leverage automation of manually verified proofs to determine if a compiler is secretly malicious, no?
Yes, with enough computing power; but the early days were lacking it. And I guess for giant programs like the ones we have today, this is still an issue...
Well, these days we have CPUs reordering instructions on the fly, instructions emulated by others at a different abstraction level, and instructions programmed in microcode (software) that is patchable from userland (recently discovered undocumented instructions in Intel…). So sure, with effort we can solve the issues from 40 years ago, with some limitations, but I fear we are in real trouble with today's problems…
So, a static review of the source wouldn't find this so-called compiler bug.. and dynamic testing would only test for functionality. Isn't it possible to do binary analysis? Perhaps using the source to derive an expected result?
Sure, you can derive an expected result from the source - using the compiler. You can also decompile the object code and compare it to the actual source. Do you trust the decompiler? What if the "bug" is hidden in the hardware?
It's especially interesting considering the recent issues with the security of the dotNET build system. I guess that's one of the issues that just have no good solutions.
I don't know what issues you're talking about, but I'd be very surprised if they have anything to do with the subject of Ken's paper. Malicious code (intentional or otherwise) and sloppy code (again, intentional or otherwise) are of course major generalized security concerns, but those are obvious ones. Ken's paper was about _hiding_ malicious code so that it exists in the binary and can propagate itself to new versions of the binary, but no longer exists in the source (so that a source code audit would find no trace of it ever existing). The vast, vast majority of security issues (maybe all of them) are plainly visible in their source code if you know what to look for and where. Obviously they're still a problem, and "what to look for" is itself a difficult challenge, but it's not quite in the same class as what Ken was describing.
Isn't this where reverse engineering tools like Ghidra help? Then again Ghidra could be programmed to ignore specific rogue code... Run Ghidra on Ghidra... Now I'm going cross-eyed
@@gamekiller0123 Just adding specificity: "scratch" here means transistor level. If you use ICs developed by someone else, no matter how simple, they CANNOT be trusted.
@@lokeshchandak3660 Build your own transistors. I'm sure I saw something not long ago about how someone had managed to embed complete ICs in what looked like simple surface-mount components, so that what seemed to be a resistor or capacitor could actually take over a whole circuit board o.O
Running GNU/Linux, this is something I worry about much of the time. Is my machine doing something that I don't know about and wouldn't want it to be doing? I implicitly trust the code I'm running that I didn't write myself (the code I wrote I don't trust at all, but that's another story), but should I...? I implicitly trust Stallman and Torvalds and many others, but who out there is shipping malicious code unrecognized? Easter eggs were a big thing at proprietary corps. Is there something similar but more sinister in binaries that are ostensibly open source, where I never look at the source? We have the tireless workers of FOSS to thank for reducing our levels of anxiety, along with Professor Brailsford and the people at Computerphile, not forgetting the brilliant Ken Thompson of course. But what about the big corps and backdoors? The Intel IME... (surely doing work on behalf of government orgs everywhere)
I've tried to explain this to people who claim to care a lot about security and they either didn't understand or refused to acknowledge that it was a serious problem.
I feel this is a little overhyped. Just because something is a binary doesn't mean that you can't look at it, disassemble it, and analyze its semantics. Interesting topic though, for sure.
Couldn't you detect this by recursion? Feed the same code through over and over; if the size of the result increases over time, something is inserting stuff.
Practically, this is exactly what you'd end up doing, alongside other heuristics like network/code sniffing. However, the code could fight back by detecting your attempt to sniff it out, and take defensive measures (e.g. cloaking itself, causing the detection code to crash with its own recursion, etc.). Modern viruses do this when they go into the registry / killing processes to disable antivirus programs. It ultimately becomes an arms race between the bad guys and threat detectors. This is why the Trust problem isn't solved (from a theoretical perspective) with detection techniques.
I suppose one _could_ write one's own C compiler in assembler, and then bootstrap up from that using something like the gcc source tree, which is extremely thoroughly reviewed.
There are projects out there that try to make it easier to bootstrap compilers like TinyC from your own, literally hand-made bytecode compiler. TinyC could then be used (with a lot of patches) to compile GCC from source, completely eliminating the trusting trust problem.
@@longlostwraith5106 Would it be feasible to bootstrap a compiler writing the first iteration in assembly for a machine, then re-writing the compiler in its language, then creating tools to address other machines and make decent code? I recall also the story of TMG for PDP-7: Doug McIlroy ported TMG to the PDP-7, and wrote it in TMG itself (giving "his piece of paper another piece of paper") bootstrapping it, and then Ken Thompson used it to supply the B language (he wanted to supply FORTRAN but it was too heavy for that machine)
This and even more is done. I have a script that bootstraps GCC from 357 byte self-hosting hex assembler and 757 byte shell. Except for any POSIX kernel, no other pre-built stuff is used. Not even pre-generated configure scripts or pre-built bison parsers.
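To give a sense of how small the bottom of such a chain can be, here is a rough Python model of what a hex assembler like the one mentioned above does: it turns pairs of hex digits into raw bytes and skips comments. The function name `hex0`, the comment characters ('#' and ';'), and the sample input are assumptions for illustration only. A real bootstrap seed is written directly as machine code rather than in Python, precisely so that no pre-built interpreter has to be trusted.

```python
def hex0(text: str) -> bytes:
    """Convert hex-digit source into raw bytes (hypothetical hex0-style format).

    Whitespace is ignored, and everything from '#' or ';' to the end of a
    line is treated as a comment. A digit pair may be split across whitespace.
    """
    out = bytearray()
    high = None  # first nibble of a byte still waiting for its partner
    for line in text.splitlines():
        for marker in ("#", ";"):
            line = line.split(marker, 1)[0]  # drop the comment, if any
        for ch in line:
            if ch.isspace():
                continue
            nibble = int(ch, 16)  # raises ValueError on a non-hex character
            if high is None:
                high = nibble
            else:
                out.append((high << 4) | nibble)
                high = None
    if high is not None:
        raise ValueError("odd number of hex digits")
    return bytes(out)


seed = """
DE AD  # two bytes
; a whole-line comment
BEEF   # two more bytes
"""
print(hex0(seed).hex())  # deadbeef
```

Each later stage (a slightly better assembler, then a small compiler) would be written in the language of the stage below it and assembled by it, which is what keeps the trusted seed tiny.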
I think YouTube has screwed up. I watched a video by Computerphile considering whether we can 'trust trust'. Then I switched to a channel for the Unity 22 live stream. Waiting for it to start, I read the preamble, all okay, and then moved down to read the comments. And found... all these. Talking about compilers and bootstraps and such, from about 3 months ago. Thought 'WTF has that got to do with...' and then it clicked. Still got the comments section from the previous video. Hmm, how often does one get to post on a video about a different video, and yet still concern the video one is posting on?
Can we trust Richard Stallman and his gcc compiler? What if it has secret code to mess with the binaries in case you are compiling proprietary software?!
And what if somebody writes the C compiler from scratch in assembly, just following the C language grammar? That should be safe, as the grammar rules are pretty clean and strict (now talking about C; C++ has turned into a pure nightmare over the years).
I think an important part that feels like it was skimmed over was the last part of his paper, where he argues that you can't just make every bad thing that happens to computer code a crime and expect the problem to go away. You have to understand that you're implicitly trusting everything you use, and everything that thing uses: when we use someone's library, we're trusting them that it does what it says it does.
Still a very important idea, that our security model(s) are mostly built on trust. With all the layers of software and chips today, few can admit they even understand all of the details. It comes down to trust.
And you know governments inject backdoors into the hardware and lower level software
@@stachowi maybe some are trying (NSA probably). But they still need people smart enough to understand this. So I do not believe any government is able to do this.
@@superliro100 You do realize the NSA helped invent SHA-256, the hashing algorithm used in the Proof of Work? They have smart people too.
@@stachowi I said I don't believe any government is capable of doing so. But I acknowledged that the NSA probably is. We are on the same page.
We are all standing on the shoulders of the humblest of giants, the amount of things that can trace the origin back to the work of the pioneers is astounding.
This considers the possibility of a software ‘supply chain attack’, to insert malevolent code in some circumstances. But don’t forget the hardware or firmware and supply chain attack route - that has certainly been deployed... Gotta build your own hardware too!
Time to get your breadboard out and head on over to Ben Eater's YouTube channel :-)
@lunes feriado Or check out the LMARV-1 on YouTube as well.
You mean like the recent Sunburst hack at SolarWinds? 🙂
@@autohmae But you got to trust your breadboard as well then.
@@ZT1ST I've never tried, but as I understand it you can pretty easily open it up.
Professor Brailsford should get the Turing Award for his contribution in documenting computer science history
Has a Turing Award ever been given for promoting and preserving computer science?
Are they fussy about performance-enhancing drugs?
The Turing Award is specifically for contributions of technical importance. Another award would make more sense for educators and historians.
@@cookergronkberg But surely the ability to learn technical information is a contribution of technical importance to others :)
@@silaspoulson9935 I agree that teaching is important, but this award is specifically for original technical contributions to computer science, and is based on the level of importance those contributions have to the field.
That's a really interesting topic for a Sci-Fi movie. Like, imagine a trojan that has been dormant for the last 50 years in all the compilers and OS kernels of every computer of the world, and it is a timebomb that's going to wreak havoc at an unexpected moment.
Star Wars kinda did that with Order 66, just not with computers =)
There is a major ongoing project to protect against this attack, called Reproducible Builds (and a spinoff, Bootstrappable Builds).
@@calummcconnell7313 As I understand it, Reproducible Builds assumes that the original source of the compiler is fully trustworthy. It solves the problem of someone changing the source of the kernel and other packages such that the compiled binaries have malicious code injected into them. But what if all of the independent builds are performed by malicious compiler code, i.e., what if the author(s) of the toolchain/compiler have nefarious intentions?
@@LexSheehan The attack is only worth doing if the user can review the source code of the compiler, and see if they can detect a hostile actor that way. If the user can't do such a review, then there is no reason to do a trusting trust attack anyways: you just have that backdoor. What makes the trusting trust attack notable is that it cannot be detected (well, unless you have reproducible builds, and some other mitigations). A compiler author messing with source code can be.
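The core check behind Reproducible Builds can be sketched in a few lines of Python: independent builders compile the same source, and the resulting artifacts are compared by cryptographic digest. The function names and the example paths are hypothetical. As the exchange above points out, agreement only shows that the builders produced the same bytes; if every builder runs the same trojaned compiler binary, their outputs will still agree.

```python
import hashlib
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a build artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def all_identical(artifacts: list[bytes]) -> bool:
    """True if at least one artifact is given and all share a single digest."""
    return len({sha256_hex(a) for a in artifacts}) == 1


def builds_agree(paths: list[Path]) -> bool:
    """Compare the same artifact as produced by several independent builders."""
    return all_identical([p.read_bytes() for p in paths])


# Hypothetical usage, with one output file per independent builder:
#   builds_agree([Path("builder-a/hello"), Path("builder-b/hello")])
print(all_identical([b"same bytes", b"same bytes"]))   # True
print(all_identical([b"same bytes", b"other bytes"]))  # False
```

Comparing digests rather than whole files lets builders publish a short hash that anyone can check against their own build.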
Oddly, out of all TV programmes and films, Blake's 7 hints at this, with Orac using a backdoor that is in every computer, even those from outside the Federation.
And that programme really doesn't know its computer science. Or any science, really.
I would make a slight addition at 4:15 that you also can't trust code that you did create yourself.
I think the quote from the article is saying you have to "totally" create the code yourself, i.e. start from the binary (or even hardware as the case may be) and build up from there, because if you write your own compiler in binary then it doesn't have to be compiled by any other potentially malicious program, it just runs as you wrote it.
Doing all that doesn't make it trustworthy.
@@keyboard_toucher Only if you can't trust yourself.
@@robertkelleher1850 you'd also have to make your own CPU, because how do you know your CPU manufacturer isn't putting in malicious firmware?
@@dm2060 Point taken. Although once you get to the hardware, there are tests you can run to verify that the code that is running is correct, i.e., that various instructions result in the expected voltages and clock cycles, and that the instructions do what you expect them to. I suppose an extraordinarily clever engineer could hide something like a simple beacon.
Imagine a man so f'ing ahead of his time that, despite an era of global publication, it takes thirty six years before the next smartest guy notices his paper is a bombshell. You have imagined Ken Thompson.
And imagine a man so clever that he creates something for himself that is so clever that it gets widely adopted.
And although Plan 9 didn't get adopted, many of its ideas did.
@@DVRC Including UTF-8, by my understanding the dominant web text encoding.
@@silaspoulson9935 Yep, there's a video by Computerphile explaining how it was born (basically a video of the memorandum by Rob Pike), although they forgot to mention Ken and the fact it was implemented the first time in Plan 9 in a few days
Nah. People understood and recognized the implications almost immediately. Cory's article may have been recent, but the dialogue around Ken's paper has been going on basically since day one. I've personally known about it for almost two decades (I've never cared enough to try recompiling my own compilers, so it's never much mattered to me, but I've known of its existence and understood the basic problem at a conceptual level for almost as long as I've been a programmer).
@@altrag that's fair
At the end he talks about what if a printer company starts checking if you have the wrong ink.
It's already happening. Just look at HP as an example of a selfish company essentially demanding that their customers either "comply" or get "denied".
Yes, but Thompson thought of it in 1984.
So this basically describes the possibility of the worst kind of supply-chain attack possible, which would be practically unfixable without starting over from scratch. I remember several college discussions regarding trust at the hardware level, while shelving the `conspiracy-path`, security features like the `Execute Disable Bit` or the things in line with `Intel Management Engine/Platform Security Processor`, with the implications that data from memory, storage, network traffic, input and output devices could be evaluated, read and manipulated by `some extra rules` on a hardware level which was guarded by the barrier of proprietary code which should just be trusted.
You cannot *feasibly* create an evil compiler that will detect newly written diagnostic and development tools that would allow you to re-establish a trust root. It's possible to hide the evil code, sure: patch reading the file unless it's the OS process loader, etc. But you quickly find that propagating the evil code essentially requires it to be an AI programmer inventing new code on the fly.
And it turns out people create new development tools reasonably often.
This might count as "starting from scratch", but really it's the source that's the work, and we can generally verify that (especially with git securely hashing the history), so it's not nearly as bad as it sounds.
@@tyrenottire And you should read my comment. I was pretty clearly referring to fooling tools created after the evil code was added, and also the difficulty of having it survive for any kind of long term (think years). The paper didn't address either of these.
now, looking at the direction MS is going with Windows... Visual Studio worries me....
Read that paper some years ago and I've never forgotten about it (it's short, so not that hard to remember). Neat to see a video made about it.
I still have a project somewhere on my TODO list where I rewrite the basic layers from scratch, trusting only the hardware (which is admittedly already too much). It'd start with PXE booting written in machine code (possibly sent from a microcontroller I'd have designed myself), then I'd work my way up the abstraction ladder in steps: first an extremely basic assembler written in machine code (no comments or labels supported), then a more decent assembler written in this poor assembly, then a not-quite-C compiler written in assembly that produces assembly, and finally a true C compiler that supports all of the C89 syntax. And there we are: we can finally compile a gcc we can trust.
Well, that YOU can trust. I won't trust you :)
@@cbaltatescu True
But can you trust your assembler program? What if it detects that you're assembling an OS and sneaks extra code in? To be truly safe, do you have to write your own assembler from pure binary? What if your hex editor sneaks extra code in when it detects that you're writing an assembler?
@@CoderThomasB When I said "written in machine code", I didn't mean to write some of this pesky abstract ascii text assembly language. But machine code directly. Likely set up by flipping switches, like on an Altair 8800.
But I'd still have to trust the hardware running the code I wrote this way.
@@Ceelvain An Altair 8800 is probably simple enough that you could trust it, but the point of the paper is that someone can almost indefinitely say "what if", because there is almost always something between your code and its processing, including the processor itself. While you're probably safe if you build an OS from scratch on an Altair 8800, most people just use the C compiler built into their distro, which was compiled by another C compiler, going all the way back to the first C compiler. The question the paper asks is: if only one person in this chain injected malicious code into the C compiler that inserts itself into any other C compiler compiled with it, then a lot of the C compilers in use today could carry that malicious code, and how would you know, given that the source for your compiler is clean?
I love this channel (Computerphile). It would be so interesting I think to sit in a pub with these folks (e.g. Prof. Brailsford) with a beer and have discussions such as this.
Thanks!
For me, this speech brought to mind the one in The Matrix that starts with the phrase "I created the Matrix", in the last scene of the movie. Thanks, Professor Brailsford, Ken Thompson, Dennis Ritchie. We learned a lot from you. Greetings.
I did not need this stress this morning.
And now you're stuck with it for life... DAMN YOU THOMPSON!!!!!!
and that's why issue still stands.
And this question of trust can be applied on every part of your life too. Sorry... :-)
stop making a feedback loop of stress by worrying about stress
what is stress stopping you from doing? nothing
also you can suppress it using metacognition
@@AndersJackson NOOOOO
The only time I ever made a geek pilgrimage to Stanford I stood in the bookstore and read that entire Thompson lecture out of a book I couldn't afford to buy.
Libraries?
Something like this actually came up in the Rust compiler (in a non-malicious way). Basically, they noticed that the Rust compiler's source code never encodes the ASCII values for escape codes at all. So, for example, if the compiler encounters a '\n' in the source code that it's meant to compile, the source code of the compiler does not contain any information about the byte value of that; it is instead also just defined as '\n'. The value just automatically propagates from the knowledge of the previous compiler.
Wow, can you post a link to source or something?
I'd also like to know more about this. I tried to look it up but couldn't find anything sadly
Basically, in the source code there's the moral equivalent of "\\n" => "\n" instead of "\\n" => "\x0a"; there's not much to see.
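Here is a Python illustration of that "moral equivalent" (the table and function names are made up for this sketch). In the first table the value for "\n" is never spelled out: it comes from whatever the host language's implementation already believes "\n" means, just as the Rust compiler inherits it from the previous compiler. Thompson's paper makes the same point with the C compiler's handling of character escapes. Only the second table states the byte values explicitly, yet both behave identically.

```python
# Self-referential: the newline's byte value never appears here; it is
# inherited from the host language's own understanding of "\n".
ESCAPES_INHERITED = {"n": "\n", "t": "\t", "\\": "\\"}

# Explicit: the byte values are written out (0x0a, 0x09, 0x5c).
ESCAPES_EXPLICIT = {"n": "\x0a", "t": "\x09", "\\": "\x5c"}


def unescape(literal: str, table: dict[str, str]) -> str:
    """Translate backslash escapes in `literal` using `table`.

    Raises KeyError for an escape the table does not know.
    """
    out = []
    i = 0
    while i < len(literal):
        if literal[i] == "\\" and i + 1 < len(literal):
            out.append(table[literal[i + 1]])
            i += 2
        else:
            out.append(literal[i])
            i += 1
    return "".join(out)


raw = r"line1\nline2\tend"  # raw string: the backslashes are kept as-is
print(unescape(raw, ESCAPES_INHERITED) == unescape(raw, ESCAPES_EXPLICIT))  # True
print(ESCAPES_INHERITED == ESCAPES_EXPLICIT)  # True
```

Reading only the first table, you cannot tell which byte a newline becomes; that knowledge lives in the binary that compiled it, which is exactly the gap Thompson's attack exploits.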
Fascinating, I could endlessly listen to professor Brailsford.
I find the paper much easier to understand than his description of it.
Primeagen chat mentioned this paper and how it relates to the XZ Utils backdoor.
"New-fangled CD-ROM burners" I love this guy so much
The example he gave with printer ink is something that large printer manufacturers have included in their devices; my dad founded and ran a local printing shop for 30 years, and in the more recent printers this was exactly what happened, so you couldn't use ink that you didn't specifically buy from the company. As for the CD-ROM example, that was an issue with various games published by EA: they used to refuse to run the game you purchased if you had a CD-ROM burner program installed. There were also issues where the "anti-piracy" software EA required you to install literally broke people's computers (some people got broken CD drives, others got completely bricked machines). I don't know if these were intentional references or not, but yes, these types of things are happening.
Let's not forget about Sony's rootkit either.
These things have happened in the last two decades, but Thompson suggested things like this in 1984.
If I understand this correctly, the point is that even if I completely read and understand and document the source code for every program running on my computer, and I also completely understand and document the source code for every compiler, I can _still_ have malware hidden in the binaries. Because of the way bootstrapping works, my entire machine could be vulnerable due to some tiny hidden program that _only_ shows up in binaries, never in source code, and so is nearly impossible to find. But the only way that could be the case is if the compilers and everything else had been set up that way to begin with, like maybe Ken Thompson and UNIX.
This is a call to stay grounded, like with encryption and man-in-the-middle attacks (or $2 wrench attacks--probably $5 wrenches now with inflation). You really do have to trust people at some point; there is no way around it, no matter how impressive the promises of various new encryption methods. You have to "trust trust," in the same way we all need some minimal amount of trust in other people to do pretty much anything.
I actually know what the title is! I was just talking about this paper with a friend a week or two ago.
This is a terribly sobering video. Excellent production as always. Thank you for sharing these videos!
Please do a follow-up video on countering "trusting trust" attacks with diverse double compiling, it's fascinating.
I agree!
Yeah, that's one reason to be happy to have at least two decent open-source C compilers, including one that the other couldn't have known how to poison like this 30 years ago, so building clang with gcc, and then vice versa, should give at least some level of trust.
@@GabrielPettier Maybe you're just adding 2 attack vectors ;-)
@@Intermernet If you ignore the kernel, then you can build GCC from 357 byte hex assembler and 757 byte shell in a fully automated and reproducible way. Which shows that there is no trusting trust attack in userspace. Kernel problem is in principle solvable too although, it's not done yet.
@@stikonas cross compile from Windows? 😄
Thank you professor Brailsford, very entertaining and thought provoking as always
I just bought a serial-to-USB adapter, and found out the driver was disabled by Microsoft (at the request of the manufacturer...). (edit: it was blacklisted by MS)
The chip used in the adapter was cloned and pirated, so they decided to automatically install a dud driver and disable the whole family in Windows 10, including the original chips from the manufacturer.
I have no way of knowing if mine is original or pirated, but the solution I was given was a big "FU, buy a new one". Should we put our trust on automatic updates that downgrade products like that?
You're lucky. They used to ship a driver that would intentionally brick counterfeit chips.
This is why, if you read the Windows EULA closely, it warns that the product is not suitable for use in medical equipment. Microsoft knows not to trust their own software for life-or-death situations.
AFAIK the FTDI driver reads out the serial number and only refuses to work if it detects a counterfeit one. Which would mean that you bought an adapter with a counterfeit chip.
@@vylbird8014 not really relevant? And also, of course it's not appropriate for life or death situations? The OSs that are are hilariously tiny and can't run anything you would care about. It's like complaining that your barbeque tongs are not appropriate for use as medical forceps.
@@max_kl It didn't read out the serial number. It sent an invalid request to brick the device to the chip. The official chip ignored the invalid request, but the counterfeit chips actually process the request as intended and destroy their own configuration data. Because of this, Microsoft blacklisted the driver.
But even if the compiler is producing machine code without any malicious code in it... we should not forget the CPU and its instructions. Some instructions might have secret behaviour when specific criteria are met - backdoors effectively. So this is not just a problem about trusting a compiler but trusting CPU vendor as well.
I've brought this paper up many, many times when talking about the supposed "trustless" nature of cryptocurrency. The technical solutions involved in cryptocurrency don't eliminate the need for trust; they simply push it somewhere else in the chain. In crypto-world, everyone is running around talking about how they have solved the problem of having to trust the central banks and the commercial banks, but they've simply shifted the problem to the shady online exchanges. Even the tinfoil-hat crowd that does bilateral person-to-person swaps and keeps their cryptocurrency in cold storage is eventually at the mercy of some shady guy in the back-alley deal they have to make to swap their crypto for anything useful. The point is that there's no escape. Eventually you have to trust *someone*, and your time is better spent thinking about who you can trust than playing technological whack-a-mole trying to eliminate the need for trust.
The problem is when there is a huge centralised group of highly funded technocrats using this purposely compromised technology for ill ends. Even if you place your trust elsewhere, the centralised power group will always have the last word, unless of course we all turn our backs on them and start our own parallel tech world. The chances of this happening are slim to none, though.
This comment has aged well.
Those nested horses are the best graphic I've seen on this channel so far. That'll stick with me.
for those thinking of rewriting gcc from scratch in assembly ... how do you trust that the processor actually does what it is supposed to do when given an instruction? and if you don't trust intel and arm ... and say i'll make my own processor how do you know that nothing malicious has been hidden by the manufacturer at the silicon level ...
The only solution is to create an environment where people are not motivated to do such things, because they understand the negative impact it has for everyone. You can't solve moral problems with technological solutions.
You can't know, just like you can't know if a particular restaurant chain is slowly poisoning you.
Yeah pretty much
Not really
If you start vomiting or getting headaches after you go there, then you're probably getting poisoned lol
That thumbnail is really clever!!!
The thumbnail mirror effect is inconsistent for the mirrored "
@@thomassynths you can't trust it, you had to inspect it yourself 😏
@@thomassynths the effect is intentional - it makes it be the mirror of “no trusting trust”, which is what this video is showing.
@@rainbowevil Yeah , very clever 😂
7:33 "Are you still with me?"
However I agree with the points made, it's difficult finding the line in trust.
Consider the following: your computer is most likely made from parts from various different manufacturers, how can you be sure you can trust any of them not to put anything rogue on them?
How can you be sure you can trust any single person or company?
There has to be some amount of blind trust otherwise you'd have to try to replicate more than 70 years of progress, not counting the long history of progress in physics, mathematics, and chemistry.
And after all that, can you trust yourself with doing the job flawlessly?
That's the point: You can't trust anything you haven't made yourself down to the lowest level from scratch.
Brilliant. I was hoping someone made a video about this. They talked about it at Harvard too at the end of CS50.
Isn't this the reason behind the GCC compiler and why it is considered Stallman's most important contribution to free software?
What was this compiler compiled on?
@@thettguy On itself. Heard about bootstrapping?
But doesn't that just shift the question from "Can we trust Thompson?" to "Can we trust Stallman?"
@@mabalito You miss the point. If you compiled the compiler on itself then you just kick the trust question down the chain a step. If the original binary at the beginning of the chain is suspect then the current version can be suspect as explained in the video. You need to personally build your own compiler from machine code to get around this problem. Or take an anti-paranoia pill.
If some super villain has done this then what are they waiting for? - when are they going to make use of the exploit?
@@thettguy Why on earth would they announce its use when it can quietly be taken advantage of?
It's funny because I had a similar thought the other day about the gcc compiler: what if the first compiler you use to bootstrap gcc injects code in it that will add proprietary blobs to everything you compile with gcc? There's just no way of asserting whether that's the case.
That's why I use clang -Bstatic.
Go is said to have a verifiable boot chain. I think Rust as well.
The Solar Winds hack isn't exactly what Thompson described but it is close. The Solar Winds hack monitored the filenames of code being compiled and when the one they wanted came along they inserted their malware into the source as it went through the compiler. After compilation they removed the extra code so that the source looked unmodified. A decompilation of the binary would show that the binary and source didn't match. But, who's going to do that?
Solar Winds was a remarkably clever hack made all the more scary by the fact that it is very difficult to think of a countermeasure that would prevent it.
Another scary idea is that Solar Winds is just a first generation malware. In future infections they may go deeper. Even to the level of doing Thompson's suggested hack to the compiler itself.
It should be possible to reverse engineer a compiler to look for suspicious behavior. Aside from that, I don't see how we could be sure we're getting what we expect from looking at the source code.
It doesn't really solve the issue, but at the very least we have clang, gcc and MSVC, all independently made, as well as nasm, masm, yasm and so on for assemblers.
Yes. I'm not worried about this in software (though it is worth keeping an eye out for it). Hardware is where this problem exists (Christopher Domas demonstrated that on x86 at Black Hat 2017).
The problem is: how do you compile that code without using a precompiled compiler?
@@terranceandrewdavis7070 they can't all realistically have evil code embedded in their binaries that recognize the other compilers and know how to patch them, even as they are being heavily modified over decades. That would require the evil patch to be a general AI, AFAICT.
So did Ken Thompson create a monster? No, I'm pretty sure he didn't. For a start, you wouldn't go tipping people off about the possibility if you'd done that, now would you?
I'd also trust the work of Richard Stallman, who personally wrote much of the first version of GCC.
The monster was there all along; he did us a service by showing it.
"I'd also trust the work of Richard Stallman, who personally wrote much of the first version of GCC." Sure, he wrote the first version of GCC, but can you trust what he compiled/assembled that first version on?
@@RedwoodRhiadra ~ Well, strictly speaking, no, you are correct, but I would tend to think in 30 years of work and revision and overhauls and upgrades, by a large group of people, all working (at least potentially) in the open, somebody would've noticed something not right about the code or its behaviour.
Ken Thompson invited people to have a look at his implementation later. Only one guy took him up on the offer long after that. Wrote a blog post about it. Fascinating read as well.
I think the point was trust. Do you trust what you can’t see? It would be like having a food replicator from Star Trek that included a little extra something in your food: LSD, or maybe something that makes you more compliant. This indeed brings about great debate about trust, but I think it also highlights how we should be more critical of the tools we use.
Just write a C compiler in assembly language for the target architecture, then assemble it manually and then use that machine code to bootstrap your way up to gcc. Easy, right?
The original claim included that the kernel may also have been compiled by the evil compiler at this point, so it can do things like seeing your assembler reading a GCC source file and add in the evil code patch. I'm a bit dubious it could reliably add it to compilations and not to, say, editors, or compiler tests.
I've heard reports of (though not seen) C compilers that, when you build the Unix login program, will grant superuser access to anyone who logs in with the username "kt" and no password.
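To make the mechanism concrete, here is a toy model, not real compiler code: Python's `exec()` stands in for "compiling", and the names `LOGIN_SOURCE`, `BACKDOOR`, `check_login`, `evil_compile` and `honest_compile` are all hypothetical. The point it illustrates is that the login source stays clean and only the compiler recognises it and patches in the "kt" backdoor.

```python
# Toy model only: "compiling" = exec() into a namespace. All names are hypothetical.

LOGIN_SOURCE = '''
def check_login(user, password, db):
    return db.get(user) == password
'''

# Code the evil compiler silently appends when it spots the login program.
BACKDOOR = '''
_real_check = check_login
def check_login(user, password, db):
    # Injected: accept user "kt" with an empty password.
    if user == "kt" and password == "":
        return True
    return _real_check(user, password, db)
'''

def honest_compile(source: str) -> dict:
    """Compile source exactly as written."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace

def evil_compile(source: str) -> dict:
    """Compile source, but add the backdoor if it looks like the login program."""
    if "def check_login" in source:
        source = source + BACKDOOR
    namespace: dict = {}
    exec(source, namespace)
    return namespace
```

Auditing `LOGIN_SOURCE` finds nothing; only running the output of `evil_compile` reveals that `check_login("kt", "", db)` succeeds.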
I know what this is going to be about. Looking forward to Professor Brailsford's take on it.
Can you do a video on RMS and the free software philosophy?
RMS, flawed a man as he is, does indeed deserve more spotlight.
If anything, really really bad timing.
He's been toxic to the community for years, decades even. Yes, he helped a lot early on in bootstrapping the movement, and yes, he was a genius programmer, but he has tainted his legacy, and caused a lot of harm.
@@GabrielPettier spare us your shitty politics, youre the only toxin in the FOSS community
I've seen somewhere that this is somewhat circumventable if you have a compiler A that you can trust (through a bootstrapping mechanism or whatever).
If you have another compiler B that you want to test for trustworthiness, you can now compile B using B and compare the resulting executable with the one you get by compiling B using A (call that result C) and then compiling B with C again. If the generated executables match exactly, you know that you can also trust the compiler B.
Usually different compilers don't produce exactly the same binaries. Maybe C compilers with optimizations turned off get close. The modern thing to do is start with:
reproducible builds.
@@autohmae The idea is that if you compile B using A the internal logic should be the same as the original B, if B wasn't tampered with. If you now compile B using C it should be equivalent to compiling B using B.
@@oj0024 You had this in your original comment: "If the generated executable matches exactly". I read that as: "if the executables have the same cryptographic hash". All I'm saying is that this is unlikely to happen with different compilers.
@@autohmae That's what I'm saying, only that you have different compiler binaries from the same compiler, hence the output should be the same.
If you were to compile gcc using clang, and then compile gcc again using the resulting compiler, the compiler you get should be the same as when you compile gcc with gcc.
gcc compiles gcc -> gcc2
clang compiles gcc -> clanggcc
clanggcc compiles gcc -> clanggcc2
now clanggcc2 should be the same as gcc2 (on the bit level).
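"The same on the bit level" is usually checked by comparing cryptographic hashes, as suggested above. A minimal sketch using only Python's standard `hashlib` and `pathlib`; the file names `gcc2` and `clanggcc2` in the usage note are just the labels from this thread, not real build outputs.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def bit_identical(a: Path, b: Path) -> bool:
    """True if the two files have the same SHA-256 digest, i.e. the same bytes."""
    return sha256_of(a) == sha256_of(b)
```

Usage would be `bit_identical(Path("gcc2"), Path("clanggcc2"))`. In practice this only works if the build is reproducible: embedded timestamps, build paths and similar nondeterminism will make honest binaries differ too.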
@@oj0024 Interesting circumvention method
Perhaps you could do a video about Xenix. My first IT job in 1989 was in a network of Olivetti MT24s running DOS 3.2, but my boss was trialling Xenix, a Unix variant that was owned by Microsoft at the time. I believe that Xenix morphed into SCO Unix, and Microsoft bought the Carnegie-Mellon micro-Unix kernel and adapted it to become Microsoft NT. I've often wondered why, if Microsoft had already bought the Xenix Unix kernel, they didn't run with that instead of spending a decade developing an alternative based on another Unix kernel.
Xenix was owned by Microsoft at the time; however, NT was built from scratch by former VMS folks. It had no roots in UNIX.
Trust these elliptical curves I propose and build your security around them!
Use exactly this CPU instruction to get randomness, it'll be more efficient, do not mix with other entropy sources!
"Hang on, that elliptical curve is a straight line!"
The only case I've heard of this is when the NSA wouldn't say why you should use a particular IV, IIRC? It later turned out if you didn't, there was an attack that would seriously weaken your keys and them explaining this would have immediately made a bunch of existing systems vulnerable.
But yeah, that's why "nothing-up-my-sleeve" values like digits of pi or ASCII strings are valuable.
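A well-known example of nothing-up-my-sleeve constants is SHA-256 (FIPS 180-4): its eight initial hash words are the first 32 bits of the fractional parts of the square roots of the first eight primes, so anyone can regenerate them instead of trusting the designers. A short sketch using only the standard library; integer square roots avoid floating-point rounding.

```python
from math import isqrt

def first_primes(n: int) -> list[int]:
    """Return the first n primes by trial division (fine for tiny n)."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

def sha256_initial_values() -> list[int]:
    """First 32 bits of frac(sqrt(p)) for the first 8 primes."""
    # isqrt(p << 64) == floor(sqrt(p) * 2**32); masking to 32 bits drops the integer part.
    return [isqrt(p << 64) & 0xFFFFFFFF for p in first_primes(8)]
```

The result matches the published SHA-256 initial values, starting with `0x6a09e667` for the square root of 2.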
I blindly trust the developers who package my distro and its build tools. This isn't a joke. I just give up. If they can't break the chain of compiler poisoning, I sure can't.
What's scarier is that you could engineer even the debuggers and hex viewers to pretend the extra code doesn't exist.
I don't think this is a real problem with compilers, but it's a crazy security nightmare with embedded code and the low-level stuff in your devices etc.
Thompson actually did put this backdoor in the AT&T cc. It is a real problem with compilers.
This gives new meaning to the saying "the compiler is smarter than you".
Do you have a link to the Doctorow article you mentioned please?
It's in the Info block at the top of these comments. Just press on "SHOW MORE"
@@profdaveb6384 that's great thank you. Sorry I did look there before when I watched your video but must have missed it. Small screen, poor eyesight...
5:50 So there is really no such thing as a password when all that needs to happen is flip a bit in the right place.
This sort of issue is why the teaching of philosophy is so important. Who polices the policeman? It seems that the little policing there is, is after the fact. As an interested outsider vaguely learning of high-profile security breaches, it seems that any database directly connected to a communication system is functionally pre-hacked. Am I wrong in this?
I agree with your observation about the philosophical part of all this.
Our bodies do the same sort of "truth checking" at every system level, from DNA to organ function.
What's needed are standards of operating behaviors, for everything from chip level to hardware to gui.
Only then can everything check everything else.
I think that's a solved issue in software. The undocumented RISC core that VIA embedded in its C3 x86 processors (demonstrated by Christopher Domas) is the manifestation of this issue.
It's not solved. The trust can be pushed down the stack, but the only solution is to build everything from scratch yourself.
And yes, Intel and AMD have put spyware in the hardware. What are you going to do, use an Acorn?
11:14 Ah, you mean what the companies behind the DMCA actually tried to do?
(and the ink comment seconds before is also actually a thing, printer ink is more expensive than gold, seriously)
For compilers I suspect this will fail as the software evolves over time. Eventually the trojan will no longer be compatible with the new version. The points it was hooking into will eventually be refactored.
Consider that you could change what code the subsequent bad compiler is 'hooking' onto in each iteration of the recompiled compiler down the chain. Since the code you're fed each time is fully read into the compiler, you always have a copy of the source code for the previous version and can automatically adjust each new version of the compiler as needed. The only way to be truly sure is to go back and write your own compiler in machine code.
@@GeorgeBratley "The only way to be truly sure is to go back write your own compiler in machine code." ... running on a computer you designed and built yourself from discrete transistors...
@@RedwoodRhiadra Haha, yes! There lies the trust issue: Where do you draw the line?
Did the fact that the thumbnail for this video has a problem with the mirror mess with them too? The "On Trusting Trust" isn't mirrored correctly.
I like Cory Doctorow's books.
I've heard it said that when you are asked in a job interview what your biggest weakness is as a software engineer, the best answer is "I'm not Ken Thompson".
The moral is: you can never trust anything (not restricted to code or programs) that you didn't create or build from scratch by yourself. If you need any tools to build the thing in question, you have to make the tools from scratch in the same way. This applies recursively until you are down to the simplest tools possible that are buildable from raw materials and human power. But even if you do all that, you cannot be sure that your mind is not being manipulated. This chain of trust doesn't have a beginning.
Nowadays people are (formally) proving compilers correct. Of course, you have to "check the proof checker" somehow, but it's a LOT shorter than "the code" for a C compiler, and you only have to "check" one proof checker, which you can use to prove correct any compiler you want (and much more). Of course, proving stuff correct in this way is very hard. But people manage to do it these days.
It might be checking the source for correctness, but that in itself does not address the problem. Does it check the compiled binary along with the source to verify that the binary, when run on the targeted architecture, will function according to the source specification?
(If so then that would indeed be a great step forward, even as it would leave unanswered the question of whether the microcode of the target architecture does in fact cause the architecture to execute the machine code faithfully.)
I'm willing to bet a complete formal proof of a C compiler would be about two orders of magnitude bigger than said C compiler.
@@SimonBuchanNz : Still within the range of feasible.
@@zapazap feasible, but I've always been dubious of the claims that software proofs are actually beneficial, and in this case if they can sneak something evil into the C code, surely they can also sneak something evil into the proof if it's that much bigger - but perhaps I'm misunderstanding something.
@@SimonBuchanNz The point is that it doesn't matter how big the proof is. It only matters how big the proof checker program is. The proof checker *is* much smaller than a C compiler. Then, as long as the proof checker is correct, there is no way you can put anything evil into the proof, which is literally the point of proof verification. (You might be able to put evil stuff in the claims that you prove, i.e. prove something completely unrelated and claim that this means your compiler is correct. However, the claims for such a proof can fit on a page of paper and people will read them.)
Overall, what such proofs achieve is to make completely clear what you mean by "correctness" and eliminate a LOT of places where this correctness may have been violated. It's much easier to make sure "beyond a reasonable doubt" if a microchip is doing what it should than to make sure "beyond a reasonable doubt" if a microchip AND the compiler's source code are.
I know it would be a lengthy project, but wouldn't the solution be to rebuild the compiler from assembler? Start with something very basic, just enough to allow a very simple C program to be written and compiled to a compiler, then rebuild feature by feature with all code visible and replicable for security researchers to verify step by step. You'd really only have to do this once, and you'd have in the end a completely traceable line of source code for any compiler that branches from that core project.
@@tyrenottire My entire point was that you wouldn't use anything pre-existing.
That has been done. A lot.
But do you trust the CPU not to inject its own code?
Do you trust the code libraries you use?
@@davidwuhrer6704 No, code libraries would be "pre-existing code". Of course those would be untrusted. As for CPUs, perhaps those would also need an audit, I don't know enough about their capacity to alter code during compilation.
That's why we should use some wacky compiler like movcc for one of the steps in the bootstrapping chain. Because there's no way whoever wrote the original evil gcc would anticipate movcc, so movcc won't get detected as a compiler, so the malware can't propagate through it.
It is an interesting thought experiment, but how would the exploit work for cross-compiled code for architectures that did not even exist back when Ritchie wrote his first compiler? Seems to me it would fail at some point, but I guess later bad actors could do the same trick.
It's a topical subject. Very worthy content.
Once we moved away from discrete components and into ICs we lost all hope of ever being secure or knowing what is really happening.
Are you sure you know what that resistor is doing?
@@usafa1987 You can test a resistor. If it is secretly stateful, intending to behave in undocumented ways in certain conditions, as part of a logic circuit it can only give corrupt output. It cannot, independently, inject malicious code.
(I am ignoring the possibility that there is embedded intelligence in the resistor package that is communicating with other similar resistors. This could be a nightmare.)
An EPROM gate array is pretty testable, and after testing can be programmed in house to perform the logic you want.
It's not ICs themselves that is such a worry. It is ICs that do internal computation, and whose usefulness relies on the veracity of these computations, that are worrisome.
Or is it when humans developed consciousness, or learned to control fire or similar :D
I wonder if it is feasible to build a physical compiler without any ICs which matches the spec/source of an existing one, and have it be fast enough to compile a clean compiler binary in some reasonable amount of time.
Would you design that hardware implementation in VHDL or SystemC?
Why not just use your own compiler?
Look into GNU Guix and Bootstrappable Builds if you want solutions to this problem.
There IS a way to detect the trusting trust attack. Just use completely different compilers (such as clang and gcc), and make them compile the same compiler source. Then make the resulting binaries compile the same source again. If the final binaries are different, one of the compilers you used is compromised!
Different compilers make plenty of different decisions, different optimizations. I’d be shocked if two different compilers gave identical binaries for non-trivial code.
@@usafa1987 They are both the same compiler (from the same source), just created by different compilers. Their binaries differ, but because they implement the same source they behave identically, so the binaries they produce at the end should be the same.
@@usafa1987
To further clarify, the procedure to use is as following:
1) have the source for the compiler that you want to test for "evilness". Let's say gcc.
2) have an entirely separate compiler that can compile that source. Let's say clang.
3) compile gcc with gcc. Call the resulting binary gcc-gcc.
4) compile gcc with clang. Call the resulting binary clang-gcc.
5) now you have two compiler binaries. They will not be identical because of differences in gcc and clang, but they should behave exactly the same.
6) compile gcc with gcc-gcc. Call the resulting binary gcc-gcc2.
7) compile gcc with clang-gcc. Call the resulting binary clang-gcc2.
8) now you have two compiler binaries. They ought to be identical.
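Steps 3 to 8 are essentially what David A. Wheeler calls diverse double-compiling (DDC). Here is a toy simulation of why a self-reproducing trojan shows up as a mismatch in step 8. It is a sketch under heavy assumptions: "binaries" are plain records rather than machine code, codegen style is decided by which compiler *source* is running, and `GCC_SRC`, `CLANG_SRC` and `TROJAN` are made-up placeholders.

```python
from dataclasses import dataclass

# Toy model only: these strings are hypothetical placeholders, not real sources.
GCC_SRC = "GCC compiler source"
CLANG_SRC = "CLANG compiler source"
TROJAN = " +trojan"

@dataclass(frozen=True)
class Binary:
    implements: str  # the source this binary really implements (plus anything injected)
    layout: str      # codegen style of the compiler that produced it

def run_compiler(compiler: Binary, source: str) -> Binary:
    """Use a compiler binary to compile source text into a new binary."""
    # Output layout depends on which compiler *source* is running, so
    # gcc-gcc and clang-gcc (both implementing GCC_SRC) emit the same layout.
    style = compiler.implements.split()[0]
    # Thompson-style trojan: a compromised compiler re-inserts itself
    # whenever it recognises that it is compiling a compiler.
    if TROJAN in compiler.implements and "compiler" in source:
        source += TROJAN
    return Binary(implements=source, layout=style)

def diverse_double_compile(suspect: Binary, other: Binary, source: str) -> bool:
    """Steps 3-8: True if the two second-stage binaries are identical."""
    suspect_1 = run_compiler(suspect, source)    # gcc-gcc
    other_1 = run_compiler(other, source)        # clang-gcc (differs from gcc-gcc)
    suspect_2 = run_compiler(suspect_1, source)  # gcc-gcc2
    other_2 = run_compiler(other_1, source)      # clang-gcc2
    return suspect_2 == other_2
```

With a clean gcc binary the check passes; with a trojaned one, gcc-gcc2 carries the trojan while clang-gcc2 does not, so they differ. The check only helps if the second compiler is not itself trojaned to recognise gcc, which is exactly the objection raised in the next comment.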
How do you know you have "completely different compilers"? It's just shifting the problem.
@@Jitter88 If a clang compiler was disguised as a gcc compiler, you could find it out pretty easily. Their options are different, their errors are different, it's not that hard to differentiate them.
What reason is there for him to wear a Hawaiian shirt!
Could you include a link to Cory Doctorow’s article in the description?
Yes it's in the Info block at the top of these comments. Just press on Show More
@@profdaveb6384 Thanks for adding it. That was a great read along with the paper.
Thanks for the video.
As long as the language specification is open, can't you always write your own compiler in assembly? And as long as the source code of the popular compiler is open, couldn't you then read through it thoroughly and compile it using your different compiler, to get a fully featured compiler which definitely does not carry the poison any more?
You can, in theory. In practice, this is impractical, because you actually have to go back all the way to making your own computer from discrete transistors, since this sort of attack can begin even at the hardware level. Unless you trust your IC manufacturer. (If you're this paranoid, you can't.)
@@RedwoodRhiadra How would such an attack practically be performable with mere hardware? To covertly inject the malicious seed into everything that gets executed and/or compiled on the hardware, surely would require an immense amount of extra computation (or other detectable stuff), given how little knowledge about the performed code the hardware is supposed to have?
@@fdagpigj You don't need to alter the compiler if you attack at the hardware level. You just need to alter the hardware. By adding another hidden processing core running Minix for example.
You could make sure your backdoor is added to every processor by having the processor add the backdoor to the output of the hardware compiler. It wouldn't be in the executable of the hardware compiler itself.
I read a paper a few years ago that had some solution to this, although I think it required another trusted compiler. Can't remember the name, though.
Reproducible builds.
Unless the Trojan Horse is in the CPU, in addition to code you have written yourself, you can trust machine code that you have stepped through in an in-circuit emulator. I used to debug disassemblies of C code on a 286, without source-level debugging. After a while, you become adept at seeing how the machine code corresponds to the source. In principle, you could hand-compare source and binary line by line, function by function--and once you've eliminated all the machine code that can be explained by the source code, anything left was added by the compiler.
No. For one thing, the debugger might be affected. But more importantly, an optimising compiler creates data paths that are not in the source, so there is unlikely to be an idempotent bijective relation between object code and source code. And thirdly, modern CPUs use software interrupts, which are not something you can just step through. They can easily be hidden from the debugger.
I guess I am missing something; I understand the implications of Thompson’s paper, however, what’s to stop someone from doing dynamic analysis on the compiler as it processes source code, and match the binary to expected output, or a hash? While this would be time consuming, my gut is saying that you could leverage automation of manually verified proofs to determine if a compiler is secretly malicious, no?
Yes! With today's computing power; but those early days were lacking it. And I guess for giant programs like those we have today, this remains an issue....
Well, these days we have CPUs reordering instructions on the fly, instructions emulated by others at a different abstraction level, instructions programmed by microcode (software) and patchable from userland (recently discovered undocumented instructions in Intel…), so sure, we can with effort solve the issues from 40 years ago, with some limitations, but I fear we are quite in trouble for today's problems…
" match the binary to expected output." Compilers are complex enough that computing the expected output by hand simply isn't feasible.
So, a static review of the source wouldn't find this so-called compiler bug.. and dynamic testing would only test for functionality. Isn't it possible to do binary analysis? Perhaps using the source to derive an expected result?
Sure, you can derive an expected result from the source - using the compiler.
You can also decompile the object code and compare it to the actual source. Do you trust the decompiler?
What if the "bug" is hidden in the hardware?
It's especially interesting considering the recent issues with the security of the dotNET build system. I guess that's one of the issues that just have no good solutions.
What recent dotNET build system security issues happened?
@@Liriq same
I don't know what issues you're talking about, but I'd be very surprised if it has anything to do with the subject of Ken's paper. Malicious code (intentional or otherwise) and sloppy code (again, intentional or otherwise) are of course major generalized security concerns but those are obvious ones.
Ken's paper was about _hiding_ malicious code so that it exists in the binary and can propagate itself to new versions of the binary, but no longer exists in the source (so that a source code audit would find no trace of it ever existing).
The vast, vast majority of security issues (maybe all of them) are plainly visible in their source code if you know what to look for and where. Obviously they're still a problem and "what to look for" is itself a difficult challenge, but it's not quite in the same class as what Ken was describing.
Isn't this where reverse engineering tools like Ghidra help? Then again Ghidra could be programmed to ignore specific rogue code... Run Ghidra on Ghidra...
Now I'm going cross-eyed
Make your own computer from scratch and program your own reverse engineering tool for that computer.
@@gamekiller0123 Just adding specificity: scratch here means transistor level. If you use ICs developed by someone else, no matter how simple, they CANNOT be trusted.
@@lokeshchandak3660 build your own transistors.
I'm sure that I saw something not long ago about how someone had managed to embed complete ICs in what looked like simple surface-mount components, so that what seemed to be a resistor or capacitor could actually take over a whole circuit board o.O
@@gamekiller0123 TempleOS
@@gamekiller0123 damn, i would not trust someone like me to do that.
Running GNU/Linux, this is something I worry about much of the time. Is my machine doing something that I don't know about and wouldn't want it to be doing? I implicitly trust the code I'm running (that I didn't write myself; the code I wrote I don't trust at all, but that's another story), but should I....? I implicitly trust Stallman and Torvalds and many others, but who out there is shipping malicious code unrecognized? Easter eggs were a big thing at proprietary corps. Is there something similar but more sinister in binaries that are ostensibly open source where I never look at the source? It's the tireless workers at FOSS that we have to thank for reducing our levels of anxiety, along with Professor Brailsford and the people at Computerphile, not forgetting the brilliant Ken Thompson of course. But what about the big corps and backdoors? The Intel IME... (surely doing work on behalf of government orgs everywhere)
Dear professor, it is late here and I wanted to relax with a bit of knowledge; all I got is anxiety ;-)
I've tried to explain this to people who claim to care a lot about security and they either didn't understand or refused to acknowledge that it was a serious problem.
I feel this is a little overhyped. Just because something is a binary doesn't mean that you can't look at it, disassemble it and analyze its semantics. Interesting topic though, for sure.
Truster Trusting Trust
Couldn't you detect this by recursion? Put in the same code over and over; if the size of the result increases over time, something is inserting stuff.
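A toy model suggests the size test only catches a trojan that piles up copies of itself. A Thompson-style trojan reproduces exactly one copy per generation, so the binary size reaches a fixed point and stays flat. The model below is a hypothetical sketch: a compiler "binary" is its own source text followed by whatever extra code it carries, and each generation's binary recompiles the *same* source.

```python
# Toy model with hypothetical strings: a "binary" is its source plus any carried payload.
SOURCE = "compiler source"
PAYLOAD = "<trojan>"

def build_naive(compiler_bin: str, source: str) -> str:
    """A careless trojan: copies the payload it carries, then appends one more."""
    carried = compiler_bin[len(source):]
    return source + carried + PAYLOAD

def build_thompson(compiler_bin: str, source: str) -> str:
    """Thompson-style trojan: if present, emits exactly one copy of itself."""
    return source + PAYLOAD if PAYLOAD in compiler_bin else source

def sizes_over_generations(build, source: str, first_bin: str, generations: int) -> list[int]:
    """Let each generation's binary compile the same source; record output sizes."""
    sizes = []
    current = first_bin
    for _ in range(generations):
        current = build(current, source)
        sizes.append(len(current))
    return sizes
```

Starting from an infected binary, the naive version grows by the payload length every generation, while the Thompson version stays the same size forever, so flat sizes alone don't show a compiler is clean.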
Practically, this is exactly what you'd end up doing, alongside other heuristics like network/code sniffing.
However, the code could fight back by detecting your attempt to sniff it out, and take defensive measures (e.g. cloaking itself, causing the detection code to crash with its own recursion, etc.). Modern viruses do this when they go into the registry / killing processes to disable antivirus programs.
It ultimately becomes an arms race between the bad guys and threat detectors. This is why the Trust problem isn't solved (from a theoretical perspective) with detection techniques.
WOW! STAGGERING.
I suppose one _could_ write one's own C compiler in assembler, and then bootstrap up from that using something like the gcc source tree, which is extremely thoroughly reviewed.
There are projects out there that try to make it easier to bootstrap compilers like TinyC from your own, literally hand-made bytecode compiler. TinyC could then be used (with a lot of patches) to compile GCC from source, completely eliminating the trusting trust problem.
@@longlostwraith5106 That would actually be quite a fun project. You have any links?
@@longlostwraith5106 Would it be feasible to bootstrap a compiler writing the first iteration in assembly for a machine, then re-writing the compiler in its language, then creating tools to address other machines and make decent code?
I recall also the story of TMG for PDP-7: Doug McIlroy ported TMG to the PDP-7, and wrote it in TMG itself (giving "his piece of paper another piece of paper") bootstrapping it, and then Ken Thompson used it to supply the B language (he wanted to supply FORTRAN but it was too heavy for that machine)
This and even more is done. I have a script that bootstraps GCC from 357 byte self-hosting hex assembler and 757 byte shell. Except for any POSIX kernel, no other pre-built stuff is used. Not even pre-generated configure scripts or pre-built bison parsers.
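The bottom of such a chain is tiny because the input is just hand-written bytes. As a rough illustration of what a hex assembler does, here is a sketch assuming a hex0-like format (modelled loosely on stage0's hex0: pairs of hex digits become bytes, `#` or `;` starts a comment running to end of line, everything else is ignored). The function name `hex0_assemble` is hypothetical, and of course this Python version runs on an interpreter you would have to trust, so it illustrates the format rather than serving as a trusted seed.

```python
def hex0_assemble(text: str) -> bytes:
    """Turn hex0-style text into raw bytes (assumed format, see above)."""
    out = bytearray()
    pending = None      # high nibble waiting for its low nibble
    in_comment = False
    for ch in text:
        if in_comment:
            if ch == "\n":
                in_comment = False
            continue
        if ch in "#;":
            in_comment = True
            continue
        if ch in "0123456789abcdefABCDEF":
            nibble = int(ch, 16)
            if pending is None:
                pending = nibble
            else:
                out.append((pending << 4) | nibble)
                pending = None
        # Any other character (spaces, newlines) is ignored.
    if pending is not None:
        raise ValueError("odd number of hex digits")
    return bytes(out)
```

For example, `hex0_assemble("7f 45 4c 46 ; ELF magic\n02 01")` yields the bytes `\x7fELF\x02\x01`. Hex letters inside comments are skipped, which is why the comment handling has to come before the digit check.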
@@BytebroUK search for fosslinux/live-bootstrap on github
I think YouTube has screwed up.
I watched a video by Computerphile considering if we can 'trust trust'.
Then I switched to a channel for the Unity 22 live stream. Waiting for it to start, I read the preamble, all okay, and then moved on down to read the comments.
And found... all these.
Talking about compilers, and bootstraps and such, about 3 months ago.
Thought 'WTF has that got to do with..." and then it clicked.
Still; got the comments section from the previous video.
Hmm how often does one get to post on a video, about a different video, and yet still concerns the video one is posting on?
It's worse than that now. Now we have processors inside our CPUs that are more privileged than the CPU and completely outside our control.
Can we trust Richard Stallman and his gcc compiler? What if it has secret code to mess with the binaries in case you are compiling proprietary software?!
Sounds like what happened to XZ Utils
And what if somebody writes the C compiler from scratch in assembly, just following the C language grammar? That should be safe, as the grammar rules are pretty clean and strict (now talking about C; C++ has turned into a pure nightmare over the years).
Sure. If you do that, you only have to worry about the Trojan horse in the assembler instead.
Wasn't so long ago you could log on to a Mac by changing the username to admin and using a blank password.
CB UNIX is in my DNA. Brian's new book is most excellent.
seems like this is becoming a logical uncertainty problem
With the cloud it gets even more hidden. Hope it rains soon. ;-)