
Bug bounties are an archaeology problem

2026-05-01 · 15 min read · #security · #bug-bounty · #methodology

Change is what introduces brittleness into code and infrastructure. The most interesting weaknesses live at the seams between then and now.

Tell me if you’ve heard this story before. You spend a couple of days analysing a client’s attack surface, you find a misconfigured header, you find an authorization edge case you’ve seen the shape of a hundred times before. Two findings, both reportable, neither particularly interesting. You’re a day from filing them and moving on.

And then a third finding lands, and it isn’t from the surface at all. It’s from a blog post the company quietly unpublished six months ago, still indexed by the Wayback Machine, in which an engineer described the migration of one of their internal services from one auth scheme to another. The post mentioned, in passing, that the legacy endpoints would remain available “during the transition period.” The transition period was no longer mentioned anywhere current. The endpoints were still up.

The third finding is worth more than the other two combined. Not because it’s technically sophisticated — it isn’t — but because it points at something larger than itself. If the legacy endpoints survived, what else from that migration survived? If one engineer assumed the transition would close cleanly, who else on the team made similar assumptions? The single deleted blog post is a thread you can pull on for a week.

I’ve been doing more of this lately. Reading less of what targets currently expose and more of what they used to expose. Spending the first day of an engagement in changelogs, archived blog posts, and the early commits of public repositories rather than running scanners against the surface. The hit rate isn’t comparable. Findings that come from the history are deeper, more durable, and more interesting than findings that come from the present, because the present is the version of the system the company is actively defending, and the history is the version they’ve stopped paying attention to.

This isn’t a new insight. Senior bounty hunters have been doing fragments of it for years. What I want to argue here is that it isn’t fragments — it’s a discipline, and once you frame it as one, the techniques fall into place around a single observation: change is what introduces weakness, and reading change is how you find weakness. Software systems are not uniformly fragile. They’re fragile at the seams.

The seams are where systems fail

Civil engineers have known this for a century about bridges. Bridges almost never fail in the middle of a span. They fail at joints, at expansion gaps, at the places where two materials meet, at the points where one section of structure was retrofitted onto another that was originally designed without it. The middle of the span has been load-tested for years; the joint is where assumptions about how the two pieces of the bridge fit together turn out to be slightly wrong.

Software is the same. The middle of a service has usually been touched by enough engineers, exercised by enough users, and survived enough incidents that the gross failures have been wrung out. What hasn’t been wrung out is the place where v1 met v2, where the migration didn’t quite finish, where the team rewrote half of something and ran out of time before they could rewrite the other half. The interfaces between what we used to do and what we do now are where the load-bearing weakness lives.

Most bug bounty methodology, written down, focuses on the present. Lists of recon tools, taxonomies of vulnerabilities, walkthroughs of specific bug classes. That material is useful. But the practitioner who only looks at the current state of the target is hunting in the territory where the gross failures have already been found. The findings that survive in that territory are the ones that require unusual skill to surface — the deep configuration bugs, the chained-low-impact issues, the things that take real work.

The history is different. The history is full of moments where the team made a decision under pressure, didn’t fully clean up after it, and moved on. Each of those moments is a candidate finding. They aren’t always exploitable. Most of them aren’t. But the ratio of interesting leads to time spent looking is much higher in the history than in the present.

Reading the project’s history

The techniques split into two rough categories. Accidents are the places where the team moved fast and left something behind. Trades, however, are places where the team moved deliberately and chose to leave something behind, because the alternative was worse for them at the time. Accidents are easier to surface — we’ll start there.

For the accidents, the first place to look is almost always somewhere the company didn’t realize they were publishing.

I once landed a pen test engagement on the strength of a single thing I said in the kickoff meeting. After introductions I told the client, “let me see if I understand your tech stack correctly,” and then I walked through it: the application’s framework and language, the database, the CI/CD platform, the automation framework on top of it, the operating system the production fleet ran on, the cloud provider. The client was visibly surprised, and probably wondering if I was already in their fridge eating their food, given that none of the engagement-scoping documents had been shared with me yet. Most of the meeting after that was them catching up to a level of context I shouldn’t have had. What they didn’t know was that I’d just been reading it back to them, sentence by sentence, from the careers page of their own website.

This is the methodology in compressed form. The artifact wasn’t created for security purposes. It was created for hiring purposes. But the constraint that produced the artifact — we need to find someone who can work on this specific system tomorrow — guaranteed it would be honest about what that system actually is. Once you start looking for artifacts produced under constraints that force honesty, you find them everywhere. Job postings are one. Conference talks are another. Public commits to open-source projects the company depends on are a third. The company controls its marketing carefully. It controls these other artifacts much less carefully, because they exist for purposes where careful control would defeat the purpose.

With that frame in place, the rest of the techniques are variations on the same theme: find the artifacts the target produced for non-security reasons, and read them with security eyes.

Public repositories are the next-richest seam. Even if the company’s main product code is private, they almost always maintain something open — a client SDK, a CLI, a documentation site, an API specification repository. The first three weeks of commits in any such repository are gold. Early commits are when the team is still shaping the project, conventions are being invented in real time, and things get committed that probably shouldn’t have been. A test fixture with a real credential. An internal hostname in a comment. A reference to a private package that was meant to be vendored. Files that got added in commit three and removed in commit five — gone from the working tree, still in history forever. Read commit messages between the diffs; words like temporary, workaround, hack, FIXME, and will remove are direct invitations to read carefully. The committer is telling you they themselves treated this code as suspicious. They were right, and most of the time the code is still there.
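
To make that concrete, here is a rough sketch of that early-history pass in Python, driving plain git commands against a local clone. The repository path and the suspicion-word list are placeholders; tune both to the target.

```python
# A rough sketch of the early-history pass, assuming a local clone of the
# target's public repository. The repo path and the word list are placeholders.
import subprocess

REPO = "path/to/public-repo"    # hypothetical local clone
SUSPECT_WORDS = ["temporary", "workaround", "hack", "fixme", "will remove"]

def git(*args: str) -> str:
    return subprocess.run(["git", "-C", REPO, *args],
                          capture_output=True, text=True, check=True).stdout

# Files that were committed and later deleted: gone from the working tree,
# still recoverable with `git show <commit>^:<path>`.
print(git("log", "--all", "--diff-filter=D", "--name-only", "--format=%H"))

# Commits whose own messages admit the code was suspicious.
for word in SUSPECT_WORDS:
    hits = git("log", "--all", "-i", f"--grep={word}", "--oneline")
    if hits.strip():
        print(f"--- messages mentioning {word!r} ---\n{hits}")
```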

Look also for commits that delete heavily. A commit that removes two thousand lines is removing something that was important enough to be worth two thousand lines. The reason is often in the message — removing legacy auth, cleanup of old API. The substance is in the parent commit, which is still there. Whatever was deleted was deemed too important or too embarrassing to keep, which is exactly the criterion for worth reading carefully.
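
A small follow-on sketch for the heavy-deletion signal, again assuming a local clone. The 1,000-line threshold is arbitrary and worth tuning per repository.

```python
# Sketch: surface commits that delete heavily, so their parent commits can be
# read by hand. Threshold and repo path are placeholders.
import re
import subprocess

REPO = "path/to/public-repo"
log = subprocess.run(["git", "-C", REPO, "log", "--all", "--oneline", "--shortstat"],
                     capture_output=True, text=True, check=True).stdout

current = None
for line in log.splitlines():
    if not line.strip():
        continue
    if not line.startswith(" "):
        current = line                                  # "<hash> <subject>"
        continue
    m = re.search(r"(\d+) deletions?\(-\)", line)       # the indented stat line
    if m and int(m.group(1)) >= 1000:
        print(f"{int(m.group(1)):>6} lines deleted  {current}")
```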

A different kind of signal lives in the tone of technical writing — commit messages, engineering blog posts, conference talks. Look for technologies the team writes about with affection. They’re proud of the choice, going into more detail than the situation strictly requires. That’s a system that’s load-bearing and that they’ve spent time making work — which means they probably also spent time discovering its limits. Now look for technologies the team writes about with contempt. The legacy thing they’re trying to migrate off of. The vendor they’ve been complaining about for years. Those are systems the team has emotionally checked out of, which means the maintenance has probably checked out too. Both registers point at interesting territory; you just need to know which one you’re reading.

Disappearing API endpoints are one of the best signals in the discipline. An endpoint that was prominently documented two releases ago and is now nowhere in the documentation hasn’t necessarily been removed. It’s been un-mentioned. The route handler may still be in the code; the load balancer may still be routing it; some downstream client may still be calling it. The team has decided you don’t need to know about it, and that decision is the one to pay attention to. Pull old versions of the OpenAPI spec from the repository’s history, diff against current, list every endpoint that disappeared, and check whether any are still online. A nontrivial fraction will be. Same trick with public packages — diff version-to-version and look for references to private repos, internal hostnames, and abandoned features that leaked in older versions and got cleaned up later. The .gitignore is its own treasure map: it tells you what the team didn’t want committed, which tells you what they were generating locally that they didn’t want the world to see.
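
Here is one way the endpoint diff might look in practice, sketched in Python against a repository that versions its OpenAPI spec. The repo path, the old tag, the spec filename, and the base URL are all assumptions, and templated routes like /users/{id} need real parameter values, so this only probes the static ones.

```python
# Sketch: diff an old OpenAPI spec against the current one and probe whatever
# disappeared. All names below are placeholders for what the target publishes.
import json
import subprocess
import urllib.error
import urllib.request

REPO, OLD_REV, SPEC = "path/to/api-spec-repo", "v1.4.0", "openapi.json"
BASE = "https://api.example.com"    # hypothetical

def spec_paths(text: str) -> set[str]:
    return set(json.loads(text).get("paths", {}))

old = spec_paths(subprocess.run(["git", "-C", REPO, "show", f"{OLD_REV}:{SPEC}"],
                                capture_output=True, text=True, check=True).stdout)
new = spec_paths(open(f"{REPO}/{SPEC}").read())

for path in sorted(old - new):      # documented then, un-mentioned now
    if "{" in path:                 # templated routes need real parameters
        continue
    try:
        status = urllib.request.urlopen(BASE + path, timeout=5).status
    except urllib.error.HTTPError as e:
        status = e.code
    except OSError:
        status = "unreachable"
    print(status, path)             # anything but a 404 deserves a closer look
```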

The Wayback Machine is the obvious external archive. Less obvious: the search indexers themselves. Google has often cached technical content the company has since unpublished. So has Bing. If a company quietly took down a status page incident report, an engineering blog post, or a job listing that mentioned specific internal infrastructure, those documents are usually still findable for a window after the takedown. The takedown itself is the signal. People remove things because they regret having published them. The reason for the regret is the part you want to read.
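
If the Wayback Machine is the archive, its CDX API is the index. A minimal sketch of asking it for every snapshot it holds under a site section, assuming a hypothetical engineering-blog subdomain:

```python
# Sketch: list Wayback Machine snapshots for a site section via the public CDX
# API, then compare against what the live site still serves.
import json
import urllib.parse
import urllib.request

target = "engineering.example.com/*"        # hypothetical blog subdomain
query = urllib.parse.urlencode({
    "url": target,
    "output": "json",
    "fl": "timestamp,original,statuscode",
    "collapse": "urlkey",                   # one row per distinct URL
})
with urllib.request.urlopen(f"https://web.archive.org/cdx/search/cdx?{query}") as r:
    rows = json.load(r)

for ts, original, status in rows[1:]:       # rows[0] is the header row
    print(ts, status, original)
# Anything listed here that 404s on the live site is a takedown worth reading
# at https://web.archive.org/web/<timestamp>/<original>
```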

For each of these techniques, the underlying question is the same. Where did this team move faster than their hygiene allowed? The history is a map of those moments. You can read it.

Vulnerabilities as load-bearing trade-offs

Here is where the methodology turns, and where I think this post argues something most write-ups don’t.

Most security thinking treats vulnerabilities as bugs. Things someone failed to do correctly. Mistakes, oversights, ignorance, deadline pressure. The mental model is the developer didn’t know better, or knew better and was rushed, or knew better and was lazy. That model is correct for a large chunk of what shows up in vulnerability databases. The accident-archaeology techniques above are designed to surface bugs of this type — accidents that left a trace.

But there’s a second class of weakness that this model doesn’t describe well. A class where the developer did know better, wasn’t rushed in the way deadline pressure usually means, and made a deliberate choice to weaken something — because the alternative was that some other piece of the system wouldn’t work. The lock isn’t broken. The lock has been left unlocked, on purpose, by someone who had a reason. From the outside, looking at the unlocked lock, you can’t tell whether you’re looking at a bug or a deliberate trade. Both look the same.

A lot of vulnerabilities are not bugs. They’re load-bearing trade-offs.

Imagine the developer at the moment they made the choice. Tuesday afternoon, three weeks into a sprint they were already behind on, they had to integrate a new system into an existing one. The properly-secured version of the existing system wouldn’t accept the new system’s tokens — maybe because the new system’s auth was older, or simpler, or proprietary, or done by a partner whose engineering team was harder to work with than the team’s own. The developer had two options. Option one: fix the auth integration properly. Option two: turn off some of the validation in the existing system, just for the path the new system uses, with a comment saying they’d come back to it. Option one was a multi-week project that needed sign-off from another team. Option two was a fifteen-line patch they could merge that afternoon. They picked option two. They probably wrote the comment promising to come back. They never came back.
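
For concreteness, here is a hypothetical sketch of what option two tends to look like once it is merged. Every name, key, and route in it is invented; the shape of the patch, not its specifics, is the point.

```python
# Hypothetical illustration only: names, secret, and route are all invented.
import hashlib
import hmac

SECRET = b"rotate-me"                              # placeholder signing key
SKIP_VALIDATION_PATHS = {"/v1/partner/orders"}     # TODO: remove once the
                                                   # partner can sign requests

def verify_signature(body: bytes, signature: str) -> bool:
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

def verify_request(path: str, body: bytes, signature: str) -> bool:
    # The partner's legacy gateway can't sign requests yet, so validation is
    # skipped on their path "for now". Fifteen lines, merged on a Tuesday.
    if path in SKIP_VALIDATION_PATHS:
        return True                                # validation quietly disabled
    return verify_signature(body, signature)       # the path everyone audits
```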

There’s a phrase I’ve started using for this category of code: permanently temporary. The developer fully intended for the trade to be temporary. The comment said it was temporary. Their team’s internal documentation, if it existed, said it was temporary. None of that mattered. The integration shipped, the sprint ended, the developer moved on to the next ticket, and the temporary patch quietly stopped being temporary the moment nobody was actively planning to remove it.

The pattern isn’t really about code. A while back I needed to connect a modem to a router on the other side of the room and didn’t have the patience for a proper cable run. So I laid the cable across the ceiling, hanging it from whatever it could catch on, with every intention of doing it properly the following weekend. It is not the most aesthetically accomplished work of my life. After a few weeks of promising myself I’d finally get to it, I stopped seeing the cable. It became part of the room. This is fine. The cable is downstream of the same human pattern that produces permanently-temporary code: a fast trade made under mild pressure, with an honest intention to revisit, and then the slow erosion of that intention by the simple fact that the workaround works well enough that nothing is actively forcing the revisit. Most code that gets called temporary in a comment is permanently temporary by the time anyone reads the comment again. So is most household wiring.

Now imagine the system three years later. The original developer is gone. Two reorgs have happened. The team that owns the system has turned over twice. The fifteen-line patch is still there, in production, and the system has now been built around it. Other code has come to depend on the validation being absent on that path. The integration the patch was originally for has been replaced by something else, but the patch outlived its reason for existing. Removing the patch now requires understanding what currently depends on the validation being weak, which nobody knows, because nobody has worked on this code in years. The permanently-temporary patch has ossified into a structural weakness that you cannot remove without rebuilding everything that has come to depend on it being there.

That’s a far more interesting finding than a configuration mistake. A configuration mistake gets fixed in a one-line commit. A load-bearing trade gets a Jira ticket that sits open for two years because nobody has the runway to unwind the architecture it’s now part of.

Finding these requires asking a different question than the accident-hunting techniques do. The question is: why would someone implement it this way? Not what is wrong with this implementation. The wrongness is obvious. The interesting question is what else the developer was solving for at the time that made this specific weakening of the system look like the cheaper option. Once you have a hypothesis for what they were trying to integrate against, you can usually find evidence in the public history. Partnership announcements that landed in the same quarter. Changelog entries about new system support. Hiring posts, from the same period, for engineers with experience in some specific other technology. Conference talks from the company describing how they integrated with X. The integration is almost always public. The trade made to enable the integration almost never is.

This is where the methodology becomes forensic accounting. You’re not looking for the entry that’s wrong; you’re looking for the entry that doesn’t balance, then asking why the books were fudged at that line. Once you find the constraint that justified the fudge — the integration that needed to ship — you understand the system the way the original developer did, and you can predict where else they made similar trades. Because they did. Engineers under pressure don’t make one trade and stop. They make a style of trade, and the style is consistent across the systems the same engineer touched. Find one and you have a hypothesis for ten more.

No villain in their own mind

I’ve been writing a fantasy novel in spare hours, mostly evenings and weekends, about a clerk who can read the bureaucratic substrate underneath an empire’s accounting magic and find the entries that were never properly settled. The book is doing things at the level of plot and politics that are not relevant here, but one of its preoccupations turns out to be relevant: no villain in their own mind is evil. The corrupt magistrate, the noble who sold a frontier keep into default, the engineer in a Tuesday meeting three years ago choosing to disable a validation check — none of them experienced themselves as the bad guy in the moment. They each had a problem they were solving, a constraint that was crushing them, a local trade that made sense at the level of their information and their incentives.

Writing villain motivation, you learn quickly that what the character did is the easy part to specify. Why they did it — the chain of constraints, the local-rational decisions, the moments where they had two bad options and picked the one that hurt the people they couldn’t see — is what makes the character feel real. Readers who only learn the what see a monster. Readers who learn the why feel uncomfortable, because they recognize the trade as one they themselves might have made under the same conditions, and the moral certainty of the verdict gets harder to hold.

This translates with strange exactness to vulnerability research. The bug as a finding is the what. The constraint that produced it is the why. A finding that includes only the what describes a monster — here is a system that lets unauthenticated users do this terrible thing. A finding that includes the why describes a person — here is a system where, three years ago, someone was integrating a partner’s auth and chose to weaken this specific check because the alternative was a six-month re-architecture they didn’t have the political capital to ask for, and the trade has since hardened into structural infrastructure. The second finding is more useful to the defender, because it tells them what to change in their organization rather than just what to patch. It’s also more useful to the attacker, because it suggests where else in the system the same engineer made the same trade.

The methodology is, ultimately, an exercise in empathy with the past developer. You’re trying to reconstruct the world they were operating in, the constraints they were under, the choices they had — and asking which of those choices left a residue you can find. That’s a different posture than the typical recon mindset. It’s slower, more patient, less adversarial in tone, and produces better findings. The developer wasn’t being a villain. They were solving Tuesday’s problem with Tuesday’s information. The fact that you’re now finding the residue is not their fault. It’s just the audit, finally arriving.

Attackers are reading your changelog. You should too.

Everything I’ve described works as well from the inside as it does from the outside. Defenders sitting on the same artifacts the attacker is reading — the company’s own commit history, its own changelogs, its own deleted blog posts, its own .gitignore files — have a much stronger position to work from than the attacker does. They have the complete history, not the public fraction. They have the closed tickets, the post-mortems, the Slack threads. They have the names of the engineers who made each commit and can ask them directly what they were integrating against on Tuesday three years ago. The defender’s recon is better than the attacker’s, in principle, by an order of magnitude. The only question is whether they’re doing it.

The most concrete way to put this discipline to work on the defensive side is to use the changelog as the trigger for re-opening your threat models. Changelog entries are the pulse line on the EKG of your threat model. Most beats are normal — uneventful releases, small fixes, refactors that don’t touch any boundary that mattered. The skill is recognizing the irregularities. A new endpoint in this release. An auth scheme migration started two releases ago and still in flight. A new third-party integration that crossed a trust boundary the existing model didn’t anticipate. A “small refactor” of the permissions middleware that’s actually a redesign in disguise. Each of those is a changelog entry on the engineering side and an unsettled entry on the security side — something whose threat picture has changed and whose threat model probably hasn’t caught up yet. If your team’s process is “we threat-modeled this service when it shipped and we’ve been busy since,” the changelog is the document telling you which beats since shipping have changed shape. Read it that way.
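
One cheap way to operationalize that reading is sketched below: a pass over the release notes that flags anything boundary-shaped for a human to look at. The keyword list and the CHANGELOG.md filename are assumptions; a real version would use the team’s own vocabulary.

```python
# A minimal sketch of "the changelog as a threat-model trigger". Keyword list
# and filename are assumptions, not a standard.
BOUNDARY_WORDS = [
    "auth", "token", "permission", "session", "endpoint", "integration",
    "migration", "deprecat", "third-party", "middleware", "webhook",
]

def entries_worth_a_second_look(changelog_text: str):
    for line in changelog_text.splitlines():
        if any(word in line.lower() for word in BOUNDARY_WORDS):
            yield line.strip()

if __name__ == "__main__":
    with open("CHANGELOG.md") as f:
        for entry in entries_worth_a_second_look(f.read()):
            print("re-open the threat model?", entry)
```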

This generalizes the discipline beyond specific findings. A defender who’s reading their own changelog with an attacker’s eye is asking, on every release, “what changed about the trust assumptions encoded in our model?” That’s a much sharper question than “did anyone introduce a vulnerability this sprint?” The first question catches structural drift before it becomes exploitable. The second one is downstream of the answer to the first.

In practice, most teams aren’t doing this. The same archaeology that produces findings from the outside surfaces risk from the inside, and most security teams lack the budget, the tooling, and the cultural mandate to read their own organization’s history with the patience an external researcher brings to it. The attacker spends a week on a target. The defender, if they did the equivalent work on their own systems, would find significantly more, and would find it before the attacker did. But the defender is busy with the alert queue.

When an API endpoint suddenly disappears from the docs, without so much as a beat in the changelog, it has officially ceased to exist on the public surface. Except it still exists: the route handler is still in the code, the load balancer is still routing to it, and the system is telling you the cake is a lie because it would prefer you stop asking about the cake.

The actionable defensive recommendation that comes out of all this is narrower than “read your own history,” which is too vague to be operational. The narrower one: when somebody on your team disables a security control to make an integration work, that decision needs to be a documented event, not a quiet commit. The trade itself isn’t always wrong. Sometimes the integration genuinely matters more than the specific control, and the trade is the right local choice. What’s wrong is making the trade silently, because silent trades become invisible load-bearing weaknesses that nobody will know to revisit when the integration’s importance fades. Document the trades. Make a register. Review it once a year. The trades that no longer have a reason can be unwound; the ones that do can be acknowledged. The worst case is the one where the trade is still in place for a reason that no longer exists, and nobody knows it because nobody has looked at it in three years.
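
What a register entry might look like, sketched as a small Python structure; the field names are suggestions rather than a standard, and the entry shown is invented for illustration. The useful property is that the trade, its reason, and its expiry condition all live somewhere reviewable.

```python
# One possible shape for the register; fields and the example entry are invented.
from dataclasses import dataclass
from datetime import date

@dataclass
class SecurityTrade:
    control_weakened: str   # what was turned off, and where
    reason: str             # the constraint that justified it at the time
    owner: str              # who can answer "is this still needed?"
    introduced: date
    revisit_by: date        # the date that keeps "temporary" honest
    commit: str             # where the patch actually lives

register = [
    SecurityTrade(
        control_weakened="signature validation skipped on /v1/partner/orders",
        reason="partner gateway cannot sign requests until their next release",
        owner="payments-platform",
        introduced=date(2026, 2, 10),
        revisit_by=date(2026, 9, 30),
        commit="abc1234",
    ),
]
```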

This is also, incidentally, how a defender prevents their own internal-archaeology discipline from being needed by an external archaeologist. The trades you’ve documented don’t show up as suspicious gaps; they show up as decisions, with reasons, attached to constraints that can be re-evaluated when those constraints change. The trades you haven’t documented are the ones an external researcher will find before you do. The choice isn’t between having trades and not having trades. Every nontrivial system has trades. The choice is between the trades being part of your visible architecture and the trades being part of your attack surface.

What I take from this

The discipline isn’t really about bugs. It’s about reading the history of decisions and learning to see where the constraints of the past became the weak points of the present. That’s a skill that’s useful in vulnerability research, and it’s also a skill that’s useful in code review, in incident response, in due diligence on a company you’re considering acquiring, in onboarding onto a codebase you’ve never seen before. “What was this team trying to do at the time they wrote this, and what trade did they have to make to do it?” is a question that travels a long way past bug bounties.

The bug is the line item. The why is the audit. Most security writing focuses on the line items, because line items are concrete, countable, and rewardable. But the line items are downstream of decisions that were made under constraints, by people who experienced themselves as solving a reasonable problem with the resources they had. Find the decision and you find the next ten line items before they’re reported. Find the constraint and you understand the system the way the people who built it understood it, which is the only standpoint from which you can predict where it’s going to break next.

The history is the map. It’s been there the whole time. The interesting work is learning to read it.
