The Barrier Between Source Code and Compiled Code Has Dissolved
Vulnerability discovery, malware detection, IP protection, and the security assumptions that just broke
The thing that made humans special, our ability to reason, is now just an API call. You buy it by the token. The implications are profound: you could freeze all AI research today and society would still spend the next decade radically transforming as we figure out how to apply what we already have.
We can’t figure out everything all at once, so let’s zoom in on just one transformation: software reverse engineering. Most people in the security world are focused on phishing emails and deepfakes. Those matter, and they get more attention because people are familiar with them. But familiarity isn’t impact: hardly anyone was familiar with nuclear theory in the 1940s, and that didn’t stop them from being profoundly affected by it. Likewise, hardly anyone is familiar with reverse engineering, but our newfound ability to automate what used to be a rare and esoteric skill has far-reaching implications, from IP law to malware detection to war fighting.
The next war is firmware
It’s a common mistake to think the next war will look like the last. Lots of soldiers died in World War I because the generals were still used to cavalry charges; machine guns were a thing, and the doctrines hadn’t caught up. I used to assume war would look like Call of Duty or something -- fancy armor, smart bullets, lasers, super advanced tanks. There’s some of that, but there’s also a lot most people didn’t expect. Look at Ukraine. Look at what’s happening in Palestine, in the conflicts involving Iran. The weapon that changed the calculus isn’t some trillion-dollar fighter jet. It’s a cheap drone with an explosive strapped to it. A flying computer.

Drones are firmware. Rockets are firmware. Every weapon system that runs code has a software attack surface. The nation’s infrastructure -- utilities, power, water -- all runs on software, and software can be exploited. The Department of War seems hip to this already: I called a bunch of reverser friends and asked what they were working on. The answer was usually: “Drones.” They couldn’t say more, but the pattern was clear.
And it’s not just drones.
When the US extracted Maduro from Venezuela in January, Cyber Command hit Venezuela’s power grid as part of the operation -- opened circuit breakers remotely, desynced control systems, staged a blackout across Caracas. Trump bragged about it on camera: “It was dark, the lights of Caracas were largely turned off due to a certain expertise that we have.” Stuxnet destroyed Iranian centrifuges over a decade ago -- the first cyberweapon confirmed to cause physical destruction of industrial equipment. Russia has shut off Ukraine’s power grid via malware three separate times (2015, 2016, 2022), and the second attack was fully automated -- what took twenty operators in 2015 was codified into software by 2016. China’s Volt Typhoon has been sitting inside US water and energy utilities for five or more years, waiting. Not stealing data. Just... sitting there. Pre-positioned for a conflict that hasn’t started yet.
And after the US and Israel struck Iran in February, sixty-plus hacktivist groups activated within hours. CISA issued emergency advisories. Iran’s MuddyWater was already pre-positioned inside US banks and airports with two new backdoors nobody had seen before. The IRGC’s CyberAv3ngers went after water treatment plants. A group called Handala Hack hit a US medical device company.
This isn’t superpower versus superpower. It’s IRGC-funded hackers versus the Aliquippa Municipal Water Authority -- population 9,300, one part-time IT person, equipment dating to the 1930s. They got in through a PLC with the default password “1111” on an internet-facing port. That water authority had never had outside cybersecurity help. Ever.
And Aliquippa is not the outlier. It’s the norm. Seventy percent of US water systems inspected since 2023 fail basic cybersecurity requirements. There are 170,000 water and wastewater systems in this country, serving 300 million Americans, and most of them have no dedicated cybersecurity staff at all. The Littleton Electric Light and Water Department in Massachusetts didn’t know Volt Typhoon had been living in their network for 300 days until the FBI called them on a Friday afternoon. A small-town public utility versus Chinese military intelligence. That’s the game now.
Vibe-coding credential stealers
It’s getting easier to write malware. Malware written by LLMs has been found in the wild since at least 2024 -- HP’s threat researchers caught it because the code had helpful comments and descriptive variable names, which, and I cannot stress this enough, is not how humans write malware. Google found a malware family called HONESTCUE that calls Gemini’s API at runtime to generate its payloads on the fly. Russian-speaking actors are vibe-coding credential stealers with ChatGPT. SentinelLabs found what may be the earliest known LLM-enabled malware -- a Python tool that uses GPT-4 to generate ransomware dynamically, dating to before November 2023.
Finding vulnerabilities used to require deep expertise and enormous time. Google’s Big Sleep project found an exploitable zero-day in SQLite that 150 CPU-hours of fuzzing missed. Anthropic claims their latest model found over 500 high-severity zero-days in well-tested open-source codebases, then 22 Firefox vulnerabilities in two weeks and wrote a working exploit for one of them. Cyber benchmarks show capability doubling roughly every six months. That’s the kind of curve that turns a rare skill into a commodity.
Then there’s orchestration. A Chinese state-sponsored group used Claude Code to autonomously run 80-90% of a cyber espionage campaign against thirty organizations, a human only intervening at four to six decision points. Another actor used it on Kali Linux as a scaled data extortion engine, automating everything from reconnaissance to ransom demand generation. Anthropic’s own red team found their model can replicate the Equifax breach -- simulated, obviously -- using only standard open-source tools. Previous models needed specialized cyber toolkits. This one just needed a bash shell.
These reports are hilarious by the way. “Oh no, our model is just so capable and smart and powerful that these bad guys did all these bad things. It was awful, totally. Anyway, want to buy some tokens?” 😂
Writing attacks, finding targets, running campaigns -- all of it cheaper, faster, and accessible to people who previously couldn’t play. But all of it still depends on the same bottleneck: understanding the software itself. The firmware in the drones, the PLCs in the water plants, the code running on every device that matters. At some point, someone has to actually read the damn software. And that’s where things get really interesting, because the people who can do that are absurdly rare.
A language almost nobody speaks
Reverse engineering is taking compiled software apart to figure out what it does. You start with a compiled program (aka “binary”) and disassemble it into assembly, or decompile it into something that looks like ugly C. There are no variable names, no comments, and often no function names. This was not meant for human consumption. It’s tedious and takes a lot of focus, like building a ship in a bottle.
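To make that concrete, here’s a hypothetical snippet in the style of Ghidra’s decompiler output -- the function and variable names are auto-generated placeholders, which is exactly the problem:

```
undefined8 FUN_00101199(long param_1,int param_2)
{
  int local_c;

  local_c = 0;
  while (local_c < param_2) {
    *(byte *)(param_1 + local_c) = *(byte *)(param_1 + local_c) ^ 0x5a;
    local_c = local_c + 1;
  }
  return 0;
}
```

That’s a simple XOR decode loop over a buffer. Nothing tells you that; you puzzle it out, function by function, thousands of times.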
Few people have even heard of this, let alone can do it. It’s barely taught in universities, there’s no career pipeline to speak of, and a lot of things have to line up in your head for it to click. You stumble into it through curiosity and stubbornness, or someone in the military trains you.
Tangent: In the US, the average strangeness of a reverse engineer is quite high, because the selection filter is that many get into it via game cheats and software cracks. In Israel, on the other hand, everyone goes through military service, and anyone willing and capable is trained to hack. You get very fit, socially well-adjusted people who also know extremely esoteric low-level malware tricks. It’s a culture shock.
Consider the scale of the gap. There are roughly 27 million software developers in the world -- a high-demand, lucrative profession with tons of on-ramps and signposts in pop culture. By contrast, many of the reversers I talk with think the number of people who can competently reverse engineer is in the low thousands. It’s a small community. I am hardly a social butterfly, yet I went to a wonderful reversing conference earlier this year (RE//verse) and somehow already knew several people there, or they knew people I knew.
That scarcity was the moat. If you had intellectual property in distributed software, it was usually enough to compile the code. Not because compilation is some cryptographic fortress. Because compiled code is a language almost nobody speaks. And the few who could read it would need enormous time and effort to understand even a fraction of what the original developers built. The protection was never encryption. It was a language barrier.
So what happens when that shortage goes away?
Reasoning over code is cheap
Here’s what’s different now. You can take a binary, open it in Ghidra (the NSA’s open-source reverse engineering tool), connect it to a language model through an MCP server, and point the model at the decompiled output. Assuming it’s not packed (like a lot of malware) or obfuscated (like some commercial software), and it’s a native C/C++ binary (not Go or Rust), you can pay a few dollars, wait half an hour, and get a decent understanding of the file. If you pay a lot more, know what you’re doing, and babysit the hell out of it, you can get something approaching compilable source code. Not pretty. Not perfect. But it’s easier than assembly, and you can iterate on it to improve it. How long until someone does this to a big game or an expensive piece of software and just dumps the “source” on GitHub?
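To make the workflow concrete, here’s a minimal sketch written as a Ghidra Python script. The decompiler calls are Ghidra’s real scripting API; ask_model is a hypothetical stand-in for whatever LLM client or MCP bridge you wire in, and the prompt is only illustrative:

```python
# Decompile every function and ask a model what it does.
# Run inside Ghidra's Script Manager, where currentProgram and
# monitor are provided as builtins. ask_model() is hypothetical.
from ghidra.app.decompiler import DecompInterface

decomp = DecompInterface()
decomp.openProgram(currentProgram)

for func in currentProgram.getFunctionManager().getFunctions(True):
    result = decomp.decompileFunction(func, 60, monitor)  # 60 s timeout
    if not result.decompileCompleted():
        continue
    c_code = result.getDecompiledFunction().getC()
    summary = ask_model(
        "Summarize what this decompiled function does and suggest "
        "a descriptive name:\n\n" + c_code
    )
    print(func.getName() + " -> " + summary)
```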
A year ago this wasn’t possible. Today there are multiple open-source projects doing it. People are using Claude and GPT to reverse engineer Atari games, rename functions, trace control flow, identify algorithms. A team presented BinWhisper at Black Hat -- an LLM-driven framework that found real vulnerabilities in Samsung firmware by reasoning over decompiled code. Not a research toy. Real bugs, responsible disclosure, CVEs assigned.
We’ve found vulnerabilities this way too. We’re still doing the responsible disclosure thing, so I can’t go into much detail. This is just the beginning.
What breaks when compiled code becomes readable?
Your IP is more exposed than you think. If you distribute software, your source code has been “protected” by the difficulty of reversing it. That assumption is now wrong. A motivated competitor with a few hundred dollars in API credits can reconstruct your algorithms, your proprietary logic, your trade secrets. Not all of it, not perfectly -- but enough to matter.
Every compiled binary is a vulnerability audit waiting to happen. Remember the drones. Now extend that to industrial control systems, medical devices, automotive ECUs, IoT everything. All firmware. All reversible at a fraction of the old cost. The offensive applications are obvious, but flip it around: you could also audit your own systems. Compiled third-party code you deploy in your infrastructure -- how do you know there’s no backdoor? Until now, you mostly trusted the vendor, because the alternative was absurdly expensive. That cost equation is changing.
And then there’s the copyright question. You find an open-source library that does exactly what you need, but it’s GPL-licensed and you can’t pull it into your proprietary codebase. So you tell an agent to go rewrite it. Make it better, rewrite it in Rust, change the architecture. Is that still covered by the original license? Is it provable? I’d absolutely love to go down this rabbit hole -- the intersection of copyright law, clean-room reverse engineering doctrine, and AI-generated code is genuinely fascinating and completely unresolved -- but that’s its own article. The point is: someone’s going to test it, and the legal system is going to have to catch up. Food for thought.
All of this comes back to one thing: the language barrier was load-bearing. Software security, IP protection, supply chain trust, even international law -- all of it assumed that compiled code was effectively unreadable. That assumption is dissolving. And right now, in this specific moment, there’s an unusual asymmetry.
The immune system hasn’t responded yet
Malware has an immune system, and it’s one of the most reliable patterns in all of security. When antivirus vendors started matching byte signatures, malware authors wrote polymorphic engines that mutated their code on every infection -- same payload, different bytes, signature useless. When vendors switched to heuristic rules, authors built metamorphic engines that rewrote the logic itself, not just the encoding. When sandboxes became standard, malware learned to check: am I running in a VM? Is a debugger attached? Is the mouse moving? If yes, play dead. When behavioral analysis caught up, the best operators stopped using malware at all -- living-off-the-land, running PowerShell and WMI, hiding inside the tools Windows ships with. Every advance in detection has been answered. Every single one.
But this pattern isn’t unique to malware. It’s how every ecosystem responds when a protection breaks. You think Microsoft is going to sit there while people convert explorer.exe into working C++ with a few hundred dollars of API credits? You think game studios are going to watch their engines get decompiled and shrug? Every company that ships compiled software -- every vendor, every firmware manufacturer, every DRM provider -- just lost a layer of protection they didn’t realize they’d relied on for decades. They’re all going to respond.
Right now, nobody has. That’s why LLM-based analysis works so well today. You point Claude at decompiled code and it just works. It examines functions, follows call chains, identifies patterns, annotates code -- because every obfuscator, every packer, every anti-analysis trick in the wild was designed to fool human analysts and traditional static tools, not something that reasons about what code does in natural language.
They’ll adapt. They always do. On the malware side: LLM-resistant obfuscation, custom packers that generate adversarial code patterns, structures designed to exploit the specific weaknesses of language model reasoning -- attention limits, context windows, the tendency to overindex on variable names, file names, and strings instead of semantics. On the commercial side: new anti-decompilation techniques, new obfuscation layers, protection schemes designed specifically to break LLM reasoning. The arms race is coming from every direction.
And that’s exactly why the weekend-project version of this (Ghidra + MCP server, a good model, some clever prompts) is great, magical even, but it has an expiration date. It works today because nobody’s adapted yet -- and because you’re not already staring at weird, obfuscated code. The person at the keyboard is still a reverse engineer; they’re just a stupidly more productive one. But when the landscape shifts, when obfuscation adapts on both sides, you need something that adapts back. Something that updates its understanding of new evasion techniques, that integrates tiered analysis, that doesn’t need one of the world’s few thousand reverse engineers babysitting every session. You need a product, not a technology stack.
Which raises the obvious next question: if everyone’s going to adapt, what does the defense actually look like?
What if detection could actually understand code?
Every detection technique I just described (signatures, heuristics, sandboxing, behavioral analysis) shares one fundamental concession: none of them actually understand what the code does. They approximate. They look for indicators of malicious behavior without ever truly understanding intent. That concession exists because understanding code is slow and expensive. Or at least it was.
When you’re building detection systems, you’re often thinking: “I can see this looking at it manually, any analyst would, but how do I replicate that with heuristics and logic rules?” The fact that manual analysis can find the bad stuff is always in the back of your mind. Now you can just do the manual analysis, and it’s orders of magnitude faster (and cheaper) than an expert analyst. Instead of byzantine rules you just have: “Look, this function allocates memory, writes shellcode-like patterns to it, then changes the page protection to executable. Flag it.”
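As a sketch of what that buys you, here’s the “rule” collapsed into a prompt. The helper names and verdict format are hypothetical, not any real product’s API; ask_model is any callable that sends a prompt to a model and returns its text:

```python
# Minimal sketch of model-as-analyst (hypothetical names throughout).
VERDICT_PROMPT = (
    "You are a malware analyst. Read this decompiled function and "
    "answer with exactly one word, BENIGN or SUSPICIOUS. Memory that "
    "gets allocated, filled with data, and then marked executable is "
    "suspicious."
)

def classify_function(decompiled: str, ask_model) -> str:
    # ask_model: any callable mapping a prompt string to a text reply
    answer = ask_model(VERDICT_PROMPT + "\n\n" + decompiled)
    return "SUSPICIOUS" if "SUSPICIOUS" in answer.upper() else "BENIGN"
```

The entire byzantine rule engine becomes a sentence you can edit.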
“But Caleb, detection needs to happen in tens of milliseconds. You can’t run a big reasoning model on every function in every binary that crosses your network.”
True. And I think it’s solvable. You tier it. A very fast, small model handles 99% of the traffic -- the stuff that’s obviously clean or obviously bad. One in a thousand times, something looks suspicious, and you escalate to a more capable model that takes a couple seconds for a deeper pass. One in a thousand of those, you escalate again to a big reasoning model that spends 30 seconds on a thorough analysis. Cascading classifiers. Security has used architectures like this for years. The difference is that now each tier can actually reason about code instead of matching patterns.
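A minimal sketch of that cascade, with stub tiers and a made-up confidence threshold -- the shape of the idea, not a shipping architecture:

```python
from typing import Callable, List, Tuple

# Each tier maps decompiled code to (verdict, confidence).
# Order tiers fastest/cheapest first; escalate only on low confidence.
Tier = Callable[[str], Tuple[str, float]]

def cascade(decompiled: str, tiers: List[Tier], threshold: float = 0.95) -> str:
    verdict = "unknown"
    for analyze in tiers:
        verdict, confidence = analyze(decompiled)
        if confidence >= threshold:
            return verdict  # confident enough -- stop spending compute
    return verdict          # best effort from the biggest model

# tiers = [tiny_model, mid_model, big_reasoning_model]
# With 1-in-1000 escalation at each step, the big model sees roughly
# one sample in a million.
```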
Fast enough to matter
The amount of money and engineering talent being poured into inference speed is almost hard to comprehend. Jensen Huang spent his GTC 2026 keynote arguing that data centers are becoming “token factories” and that tokens-per-watt determines corporate revenue. The approaches are coming from every direction at once: speculative decoding that predicts tokens in batches (2-3x speedup, production-ready today), quantization that runs models at lower precision with negligible quality loss (another 30%+), model distillation that trains smaller models to match bigger ones (the 8B models of today are better than the 70B models of two years ago), entirely new architectures like Mercury 2 that generate tokens in parallel instead of one at a time (13x faster than Claude Haiku while matching it on reasoning). The progress compounds.
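Speculative decoding is the easiest of those to picture. Here’s a deliberately simplified greedy sketch with stub models; real implementations verify the whole draft in one batched forward pass and accept tokens with a probabilistic rule, which is where the 2-3x comes from:

```python
def speculative_step(prefix, draft_model, target_model, k=4):
    # draft_model / target_model: hypothetical callables that map a
    # token list to the next token (greedy, for simplicity).
    ctx = list(prefix)
    drafted = []
    for _ in range(k):          # small model guesses k tokens ahead
        tok = draft_model(ctx)
        drafted.append(tok)
        ctx.append(tok)

    accepted = []
    ctx = list(prefix)
    for tok in drafted:         # big model audits each guess
        expected = target_model(ctx)
        if expected != tok:     # first disagreement: keep the big
            accepted.append(expected)  # model's token and stop
            break
        accepted.append(tok)
        ctx.append(tok)
    return accepted             # one or more tokens per big-model pass
```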
Then there’s specialized hardware, which is where things get really interesting for code analysis.
A startup called Taalas did something extreme: they baked the entire Llama 3.1 8B model directly into silicon. Not stored in memory. Hardwired into the chip’s transistors. The result runs at 17,000 tokens per second. That’s roughly 10x faster than Cerebras, 50x faster than GPU inference. The chip uses about 200 watts. Their team is 24 people. They spent $30 million.
It’s a small model. It’s not going to win any benchmarks. But think about what that speed means for the detection problem I described above. You don’t need <insert best model this week> to scan code for suspicious patterns. You need something fast that can read functions and recognize when something looks wrong. An ASIC running a code-specialized model at 17,000 tokens per second could reason about every function in a binary in real time. Think about what happens when you 10,000x the amount of thinking a model does about a piece of code.
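Some rough arithmetic makes the point. The 17,000 tokens per second comes from above; the binary size and tokens-per-function numbers are assumptions for illustration:

```python
TOKENS_PER_SECOND = 17_000     # Taalas-style hardwired 8B model
FUNCTIONS_IN_BINARY = 5_000    # a mid-sized stripped binary (assumption)
TOKENS_PER_FUNCTION = 400      # decompiled C per function (assumption)

total_tokens = FUNCTIONS_IN_BINARY * TOKENS_PER_FUNCTION   # 2,000,000
print(total_tokens / TOKENS_PER_SECOND)        # ~118 s for a full pass
print(total_tokens / (TOKENS_PER_SECOND / 50)) # ~98 min at GPU speeds
```

Two minutes to read every function in the binary once, on a 200-watt chip.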
Taalas is the most extreme example, but they’re not alone. Groq’s LPU architecture. Cerebras’s wafer-scale chips. Etched’s transformer-specific ASICs. FPGAs being used for cache acceleration. NVIDIA just announced they’re integrating Groq into their Vera Rubin platform specifically for the decode stage of inference. The whole industry is converging on the same insight: inference is a workload where specialization pays off enormously.
The trajectory is directionally obvious, even if the timing isn’t. Model architectures are still evolving fast, which limits the case for fully baked silicon today. But as architectures stabilize, dedicated hardware becomes inevitable. We’ve seen this movie before. Video codecs. Cryptocurrency mining. Network packet processing. The workload matures, the silicon specializes, the cost collapses. It happens every time.
The barrier is dissolving
Compilation used to be a one-way door. You compiled your source code into a binary, and human readability was gone. That one-way property was load-bearing. IP protection depended on it. Security assumptions were built on it. Entire industries operated on the premise that compiled code was effectively a dead language.
That premise is breaking. The models are already good enough to be useful and improving fast enough to be scary. The hardware specialization wave hasn’t peaked. The immune system is just starting to wake up.
If your business depends on the assumption that your compiled code can’t be read, that assumption has an expiration date. If you’re responsible for the security of systems running firmware, understand what automated analysis can already do. The language just got a universal translator. And nobody’s rewritten anything yet.


