TL;DR
Hack The Box practice rooms used to be a museum of historical bad decisions. They’ve started being a preview of the stack you’re running this quarter. The bugs are mostly where they’ve always been — what changed is that we stopped inspecting what we trust, while building systems whose surface area outgrew anyone’s capacity to inspect them. The protocols, ecosystems, and operational patterns we depend on were correct in their original frames; they composed into something nobody can reasonably review, and the gold rush rewards velocity over reading what you depend on. The same labs whose products created the new attack surfaces are now selling the cleanup. If you ship software for a living, the lab work is increasingly the same shape as the day job.
I’ve been dragging my heels on HTB Lab for a couple of years, but I still watch the boxes drop each month. After two decades of building the kinds of systems they model — the platforms, the pipelines, the auth gateways, the half-considered IAM policies — the practice space is one of the few places you can read an industry’s blind spots without waiting for an incident.
The boxes used to follow a recognizable rhythm. You scanned a host. You found a service. The service was broken on purpose. You broke it the rest of the way and moved up. That rhythm has been drifting for a while, and the last several months drifted far enough to be worth saying out loud.
The boxes don’t look like labs anymore. They look like the stack a working company runs — the same identity providers, the same orchestration layers, the same protocol decisions, the same library pinned to a version that was reasonable six months ago. The bugs aren’t in software set up to be vulnerable. They’re in software set up the way it actually ships.
We stopped looking
The bugs are mostly where they’ve always been: in code somebody wrote. SQL injection still works. IDORs still work. Deserialization still pays rent. None of these classes retired. What changed isn’t the bug. What changed is who wrote the code, how much of it there is, and how closely anyone looked at it before betting on it.
The actual dependency-selection ritual at most companies I’ve worked at is a license check, a star count, maybe a glance at last-commit date. The actual inspection ritual for a new SaaS integration is a SOC 2 report someone skimmed and an OAuth scope page nobody read end-to-end. The actual review of a new IAM policy is whether it passes terraform plan. None of these are critiques of specific teams; they’re how every team I’ve been part of, including ones I led, actually decided what to trust. Multiply by a thousand decisions per quarter and you have the modern enterprise.
Attackers know this. The economics of finding a vulnerability in something popular and unread are unbeatable: one bug, thousands of victims, none of them looking. Introducing one isn’t much harder, as xz-utils demonstrated. The target environment got rich because the inspection regime collapsed — and it collapsed because the surface we were supposed to inspect grew faster than anyone’s capacity to inspect it.
The lab caught up to the production floor in the same motion. The boxes don’t reward novel exploitation skill anymore. They reward the same thing real incidents reward: noticing what nobody bothered to read.
Identity is the new perimeter
Identity-as-perimeter has been a keynote line for years. The labs made it concrete. A growing share of footholds turn on something an authentication framework was supposed to prevent — a token type confusion, a flow that skipped a parameter, a delegation primitive used against a target the architect never pictured.
These classes aren’t new in the literature. They used to be specialty knowledge, the kind of thing you’d see in a research talk or a single appliance advisory. Now they’re the foothold. Not the privesc. Not the boss-level twist. The way in.
That tracks the news. Storm-0558 forged Exchange Online tokens with a stolen Microsoft MSA signing key. The Snowflake campaign of 2024 was, end to end, valid logins replayed against accounts with no enforced MFA. UNC6395 stole OAuth tokens from a marketing integration last August and walked into the Salesforce instances of seven hundred organizations — Cloudflare, Zscaler, Palo Alto Networks, Workday, Tenable, Proofpoint among them — without exploiting a single bug, then sifted the exfiltrated data for AWS keys and Snowflake credentials to pivot further. Vercel was breached this month through an AI tool one employee had connected to their corporate Workspace with “Allow All” OAuth scope; the whole incident was an attacker walking the trust edge that one consent screen created. None of these were memory corruption. All of them were identity primitives behaving exactly as configured, in flows nobody had read end-to-end.
Supply chain shows up at the door
One flavor of bug used to be rare in these rooms: a vulnerability in a real dependency, in a recent version, that you might genuinely have running. The vulnerable thing was usually a custom service the box’s author wrote, or a piece of enterprise software pinned a decade ago and left to rot.
That’s still there. But the entry point is now, more often than not, a library you’ve heard of, in a version that shipped this year, with an advisory from this year. The gap between the version that’s vulnerable and the version running in your actual environment has compressed to nearly nothing.
If you’ve spent any time as a senior engineer staring at a package.json or a go.mod, this should land in a specific way. What’s getting exploited in these boxes isn’t somebody’s bespoke handler. It’s a transitive dependency three levels deep in a tree you didn’t write, picked because it was the obvious choice, pinned because it passed CI, forgotten because it kept passing CI. Most of what’s load-bearing in a modern application is, by line count, code you didn’t author and haven’t read. I’ve shipped services where I could have named maybe forty of the eight hundred packages in the lockfile, and that was on a good day.
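The direct-versus-transitive split is easy to make concrete. Here is a minimal sketch that reads an npm package-lock.json (v2/v3 format) and counts the packages you actually named against the ones that came along for the ride; the filename and lockfile shape are standard npm, but the script itself is illustrative, not tied to any real project:

```python
# Sketch: how much of a Node lockfile is code you chose, versus code
# that arrived transitively. Parses npm's package-lock.json (v2/v3),
# where the root entry ("") lists your declared dependencies and every
# other key is a node_modules/ path.
import json

def direct_vs_transitive(lockfile_path: str) -> tuple[int, int]:
    with open(lockfile_path) as f:
        lock = json.load(f)
    packages = lock.get("packages", {})
    # The root entry names what a human wrote down.
    root = packages.get("", {})
    direct = set(root.get("dependencies", {})) | set(root.get("devDependencies", {}))
    # Everything else installed under node_modules/ is transitive.
    installed = {key.rsplit("node_modules/", 1)[-1] for key in packages if key}
    return len(direct), len(installed - direct)
```

On most real services the second number dwarfs the first, which is the whole point of the paragraph above.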
The attacker economics follow from that. Hunting a vulnerability inside one company’s bespoke app is artisanal work — one line, one fish. Finding one in a widely used dependency is the inverse shape: cast a net, catch everyone running that version. The xz-utils backdoor came within days of shipping in stable distributions because somebody spent two years social-engineering maintainer trust into a compression library nearly every Linux box transitively depends on. Polyfill.io turned a hundred thousand sites malicious overnight after a single domain changed hands. The tj-actions/changed-files Action compromise leaked secrets out of tens of thousands of CI pipelines from one upstream tag. The Shai-Hulud npm worm made the model explicit: not just one poisoned package, but a self-propagating one walking maintainer credentials through the registry. It came back in November at a hundred times its first scale, executing on preinstall so it ran before anyone could read what they’d just pulled. Last week the model jumped ecosystems: the CanisterWorm/Namastex compromise hit npm packages from an AI-agent-tooling company, propagated into PyPI from a single stolen token, and the malware was specifically built to harvest LLM API keys alongside the usual cloud credentials. The targets weren’t accidental. The worm authors know where the new credentials live.
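The preinstall mechanic is worth seeing plainly, because it is small. npm runs the `preinstall`, `install`, and `postinstall` lifecycle scripts from a package’s package.json at install time, which is exactly the window before any human has read the code. A minimal sketch of the defensive inverse — walking a node_modules tree and flagging packages that declare install-time hooks — might look like this (the walk is an illustration, not a security tool; real scanners do far more):

```python
# Sketch: flag packages in a node_modules tree that declare install-time
# lifecycle scripts -- the hook a Shai-Hulud-style worm executes before
# anyone reads what they just pulled. Field names follow npm's
# package.json format.
import json
import os

LIFECYCLE = ("preinstall", "install", "postinstall")

def packages_with_install_scripts(node_modules: str) -> list[str]:
    flagged = []
    for root, _dirs, files in os.walk(node_modules):
        if "package.json" not in files:
            continue
        try:
            with open(os.path.join(root, "package.json")) as f:
                manifest = json.load(f)
        except (json.JSONDecodeError, OSError):
            continue  # skip unreadable or malformed manifests
        scripts = manifest.get("scripts", {})
        if any(hook in scripts for hook in LIFECYCLE):
            flagged.append(manifest.get("name", root))
    return flagged
```

The blunt mitigation, for what it’s worth, is `npm install --ignore-scripts`, which disables the hooks entirely at the cost of breaking the packages that legitimately need them.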
When one advisory in a popular library is worth more than a thousand custom audits, that’s where the work goes. The boxes have started rewarding the same instinct.
The boxes used to feel like a museum of bad decisions, mostly old ones. They feel like a preview now — bad decisions still in flight, in code none of us wrote and most of us depend on.
Privilege escalation got quieter
The front of the chain got louder — modern stacks, identity confusion, fresh disclosures. The back of it got quieter. Pure kernel exploits are rare. Memory corruption is rarer. Escalation is, increasingly, a story about a legitimate tool doing exactly what it was built to do, used in a way no one designed for.
That’s a more honest picture of how privilege actually goes wrong in real environments. It’s almost never a flashy zero-day. It’s a script with more authority than the person running it realized, a trust relationship configured years ago and forgotten, a feature added to make administration easier that made compromise easier in the same motion. Volt Typhoon did not bring kernel exploits to US critical infrastructure; it brought wmic, netsh, ntdsutil, and patience. Salt Typhoon walked into the major US telecoms through router CVEs that had been public for years and lawful-intercept interfaces designed to be trusted. Australia’s intelligence service named both groups publicly in late 2025 — the campaigns aren’t a postmortem yet, they’re still in flight. Last month the same model played out from the other side: the FBI’s own DCSNet, the network that handles court-authorized wiretaps and pen-register data, was breached through a commercial ISP vendor’s infrastructure. CALEA mandated the wiretap capability in 1994. Nobody mandated it be secured against adversaries. Thirty years later, the bill arrived at the surveiller’s door instead of the surveilled’s.
The spectacle of memory corruption moved out. The actual shape of corporate compromise moved in.
The seams were features
Pull back from the three patterns. They share a root, and the root isn’t laziness.
The identity primitives that are footholds now were features. OAuth2, OIDC, SAML, scoped delegation, refresh tokens, device codes, the consent screen — each was specced for a world where “machine to machine” meant two backends federating one trust relationship at a time. They became attack surface when we used them to glue together not two companies but two hundred, and when the typical enterprise’s set of trusted relationships quietly grew to include every SaaS its sales team signed up for and every AI tool its engineers tried out one Tuesday afternoon. There was never a moment to inspect the new edge. There was never going to be.
The dependency culture that ships supply-chain attacks was the right call once. Small composable packages were the answer to bloated standard libraries and runtimes you couldn’t fit on a CD. You imported what you needed. The economics worked for a long time, and stopped working when a clean npm install pulled eleven hundred packages and the typical engineer read approximately none of them. The ground moved without making a sound. I didn’t notice; I doubt anyone did.
The legitimate-tool privesc story is the same shape. We declarative-ified operations because logging into boxes by hand didn’t scale. “Senior admin runs ntdsutil” became “scheduled job runs ntdsutil with credentials mounted from a secret that nobody owns.” “SSH in and execute a script” became “GitHub Action with a service-account token in the runner environment.” The IAM policies, Kubernetes RBAC bundles, and OPA rego files those tools execute against are now the actual threat model — JSON and YAML that nobody reads end to end after the initial review, that accrete entries the way requirements.txt files do, that get a * slipped in at three in the morning under a deadline and stay there for the next four years. Each move was correct on its own terms. Each move also moved a piece of trust from a person who could be held accountable to a configuration that can’t.
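The three-in-the-morning `*` is a mechanically checkable thing, which makes its persistence the more telling. A minimal sketch, assuming the standard AWS IAM policy JSON shape (a `Statement` list with `Effect`, `Action`, `Resource`), that reports Allow statements carrying a bare wildcard — illustrative only; real policy analyzers catch far subtler over-grants:

```python
# Sketch: find the "*" that got slipped into an IAM policy under a
# deadline. Walks the Statement list of a policy document (standard
# AWS policy JSON shape) and reports Allow statements whose Action or
# Resource is a bare "*". Prefix wildcards like "arn:...:logs/*" are
# deliberately not flagged here.
def wildcard_allows(policy: dict) -> list[dict]:
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be unwrapped
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        for field in ("Action", "Resource"):
            values = stmt.get(field, [])
            if isinstance(values, str):
                values = [values]
            if "*" in values:
                findings.append({"field": field, "statement": stmt})
    return findings
```

The check is trivial; the organizational fact that it so often hasn’t been run is the point of the paragraph above.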
There’s no villain in any of this. The protocols, the ecosystems, the operational patterns were designed by people solving real problems in their actual frames. I was one of those people; if you’ve shipped anything at scale in the last twenty years, you were too. The bill comes due now because those frames composed into something nobody designed — a surface so large that “we’ll review it” stopped meaning anything in particular. The seams between them are where the bugs live, and the seams are wide because we built them wide on purpose, to let things move through.
Difficulty as a flat line
The difficulty distribution has flattened. The hard boxes aren’t hard because any single step is gnarly. They’re hard because there are more steps, and each step crosses a different boundary — web to container, application to directory, one trust domain to another. Medium boxes are doing what hard boxes used to do. Hard boxes are doing what insane boxes used to do. Insane boxes are modeling small breach narratives end to end.
The component skills haven’t changed much. What’s being asked is that you hold a half-dozen of them in your head at once and notice where one ends and the next begins. Which sounds an awful lot like real incident work. Real environments don’t get owned because someone knew one obscure trick. They get owned because nine reasonable trust decisions, each made by a different team in a different quarter, compose into a path nobody owns end to end.
The golden age came back
When I started this work I thought I’d missed it. The 80s and 90s were the hacker stories — Mitnick, the phreakers, the era when a clipboard and a confident voice got you into a building — and the reason those stories were so good was that the systems were so bad. No encryption anywhere. Implicit trust everywhere. War-dialing tones over hotel lines. The boring engineering work I was about to spend my career doing was, I assumed, the cleanup.
The 2000s and 2010s were the cleanup, more or less. TLS got everywhere. OWASP made a name for itself. Security teams stopped being the people who said no and started being the people who shipped libraries. Memory-safe languages took ground. SSO became normal. None of it was finished work, but the trend line was up. I remember thinking, sometime in the late 2010s, that the interesting attacks had migrated to nation-state surface and that defenders had bought themselves a decade.
We did not buy ourselves a decade.
The current gold rush — cloud, SaaS, and now LLMs and agents — is moving faster than any platform shift I’ve watched, and the security posture of the new layer is roughly where web apps were in 2008. The npm ecosystem ships supply-chain compromises on a rhythm now and shows no sign of fixing the conditions that produce them; the worms have already learned the shape of the AI ecosystem, and the Namastex compromise last week was just the visible one. AI tools are being granted OAuth scopes against corporate identity systems by employees who were trying to summarize a meeting. MCP servers are wiring agents to internal tools without anyone settling what the access-control model even is — and earlier this month, OX Security disclosed that a single architectural flaw in the protocol gives every major AI coding assistant a remote-code-execution surface, with one chain in Windsurf requiring zero user interaction. State-sponsored actors got there first: the GTG-1002 campaign Anthropic disclosed last fall used Claude Code through MCP servers to autonomously orchestrate intrusions against thirty organizations, with the AI agent driving eighty to ninety percent of the tactical work — reconnaissance, exploitation, lateral movement, exfiltration — at request rates no human could match. The data-flow assumptions ACLs were designed to enforce — that a process either has authority or doesn’t, deterministically — don’t survive contact with a system whose next action is a probability distribution over text. We’re wiring those systems into our credential stores anyway, because the alternative is being slow.
The same labs are also positioned to monetize the consequences. Anthropic released Mythos earlier this month with Project Glasswing — a frontier model marketed for vulnerability discovery, claiming thousands of zero-days found in major operating systems and browsers, including a 27-year-old bug in OpenBSD and a 16-year-old one in FFmpeg. OpenAI released Codex Security in March, autonomous vulnerability discovery and patching at scale, with eleven thousand high-severity issues surfaced from public-repository scans alone. Google and the rest will be along shortly. I’ve been wanting capable defensive AI for years, and these are genuinely useful tools. But the shape of the offering is hard to miss: the vendor stack ships the velocity and the cleanup, in that order. The cleaning service was created by the people who created the need for it.
The lesson I had inferred from the 2000s — that we slowly got better at this — was a local trend, not a global one. The shape of the industry rewards velocity, and security is the line item that gets cut when the stock chart is doing what it’s doing. The labs caught up to production for the same reason the news caught up to the labs: there’s a gold rush on, and gold rushes have always been won by the people who built fastest, not safest.
I don’t think the golden age of hackers is coming back. I think it came back four or five years ago and I’ve just been slow to admit it.
What I take from it
The labs aren’t a market signal. They’re the work of a small group of people choosing what’s interesting and what teaches well. Their choices reflect their tastes as much as the threat landscape. Take the rest of this lightly.
That said: the practice rooms stopped being practice rooms. They aren’t a curated zoo of historical mistakes. They’re a slightly-ahead snapshot of the stack you actually run, the trust decisions you inherited, and the libraries you pinned this quarter. The exercises predict the news instead of recapping it.
The lab isn’t where you go to sharpen against artificial hazards before facing real ones. It’s where the hazards being modeled are, with a lag of weeks to months, the hazards your environment is sitting on. The boundary between practice and production thinned for the same reason the boundary between two systems’ trust models thinned. The seams are where the action is.
Most of what I notice in these labs ends up being a thing I notice at work three months later — a configuration that’s load-bearing in a way nobody wrote down, a token that flows further than its issuer intended, a library that’s reasonable to depend on except for the part where it isn’t anymore. Some of those configurations are ones I helped ship in an earlier company, in an earlier frame, when they were correct. The labs have always been a place to feel that earlier than the rest of the industry. They’ve started feeling it a lot earlier than they used to.
That isn’t a complaint. The work being done in these rooms is closer than it’s ever been to the work that needs doing outside them. The cost is that the rooms feel less like rooms.