Who Funds the Watchdogs
$10 million in 2023. $22,060 in 2024. One funding pipeline.
In 2023, Redwood Research reported $10 million in annual revenue. Redwood is one of the independent research labs that AI companies rely on to test whether their safety methods actually work.
In 2024, that revenue fell to $22,060.
Redwood's tax filing shows $2.9 million in expenses that year — the organization was still operating, still paying researchers, still burning through assets that dropped from $12 million to $6.5 million in two years. The work continued. The contributions didn't.
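For scale, the figures above can be worked through in a few lines. This is a rough sketch, not an analysis from the filing itself: it assumes the $6.5 million is the most recent asset balance, that expenses stay flat at the 2024 level, and that no new grants arrive. The variable and function names are illustrative only.

```python
# Figures as reported in the text above (US dollars).
revenue_2023 = 10_000_000
revenue_2024 = 22_060
expenses_2024 = 2_900_000
assets_latest = 6_500_000  # assumption: the most recent reported asset balance


def percent_decline(before, after):
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


def runway_years(assets, annual_expenses, annual_revenue=0.0):
    """Years until assets run out at a constant net burn (a simplifying assumption)."""
    net_burn = annual_expenses - annual_revenue
    if net_burn <= 0:
        return float("inf")  # revenue covers expenses, so assets are not drawn down
    return assets / net_burn


decline = percent_decline(revenue_2023, revenue_2024)
runway = runway_years(assets_latest, expenses_2024, revenue_2024)

print(f"Revenue decline, 2023 to 2024: {decline:.1f}%")
print(f"Implied runway at 2024 burn rate: {runway:.1f} years")
```

Under those assumptions, revenue fell by about 99.8 percent and the remaining assets would cover a little over two years of spending. The real figure depends on grants and costs the filing does not project.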
Redwood is not a regulator. It is not a watchdog in any legal sense. But it sits in the ecosystem that AI safety accountability depends on — independent technical capacity outside the labs. That capacity runs through a narrow funding pipe.
Redwood's filing doesn't name the donors who stopped giving. Open Philanthropy's grants database shows approximately $25 million in historical funding to Redwood Research. Whether the decline reflects a deliberate funding decision, a grant cycle ending, or a shift in priorities, the filing illustrates the fragility: when an organization depends on a few large grants, its revenue can all but disappear from one year to the next.
The Pipeline
Sixteen major AI companies were assessed for safety compliance in 2025. Their average score was 52%. No law required any of them to participate, and no regulator enforced the results. Independent organizations are the remaining check: groups that might hold companies accountable from outside the system.
If you trace their funding to its sources, the streams converge.
One pipeline dominates. A single philanthropic organization, Coefficient Giving (formerly Open Philanthropy, rebranded in November 2025), recommends where AI safety grants go. Its primary funder is Good Ventures, the philanthropic vehicle of Facebook co-founder Dustin Moskovitz. Since 2017, this pipeline has directed approximately $336 million to AI safety research — roughly 30 to 40 percent of all tracked funding from any source.
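Those two numbers also imply a size for the field as a whole. The sketch below simply inverts the stated share. It assumes the $336 million and the 30 to 40 percent range cover the same period and the same set of tracked grants, which the sources may not guarantee. The variable names are illustrative.

```python
# Figures as stated in the text above.
pipeline_total = 336_000_000          # directed to AI safety research since 2017
share_low, share_high = 0.30, 0.40    # pipeline's share of all tracked funding

# If the pipeline is a larger share, the implied field total is smaller, and vice versa.
implied_field_min = pipeline_total / share_high
implied_field_max = pipeline_total / share_low

print(f"Implied total tracked funding: "
      f"${implied_field_min / 1e6:,.0f} million to ${implied_field_max / 1e6:,.0f} million")
```

On those assumptions, all tracked AI safety funding since 2017 comes to somewhere between roughly $840 million and $1.12 billion.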
No funder in any field wants to be 30 to 40 percent of an entire sector's budget. Coefficient Giving itself acknowledged the problem in a post titled "AI safety and security need more funders." The admission was not false modesty. After the collapse of FTX in November 2022, which wiped out roughly $32 million in annual AI safety grants, the concentration grew worse, not better.
But acknowledging the concentration and fixing it are different things. In pharmaceutical research, when a drug company funds the institute that evaluates its drugs, the conflict is recognized as a problem requiring institutional safeguards: required disclosures, independent review boards, competing funding sources. In AI safety, those safeguards do not exist. And the co-founder of the dominant funding pipeline is now employed by one of the companies that the pipeline's grantees most frequently evaluate.
The Entanglement
Holden Karnofsky co-founded GiveWell, then Open Philanthropy, now Coefficient Giving — the organization that recommends where roughly a third of AI safety funding goes. He is married to Daniela Amodei, co-founder and President of Anthropic. In January 2025, he joined Anthropic's technical staff.
None of this is secret. Karnofsky himself has acknowledged that "a lot of our employees are socially integrated into the AI safety community." The relationships are open and documented — which makes them harder to address, not easier.
We found no evidence that Karnofsky's role at Anthropic has directly influenced Coefficient Giving's grant decisions. But no mechanism exists to ensure it couldn't.
Funding concentration in small fields is not unusual. AI safety's particular problem is the link between the dominant funder and the dominant subject of evaluation.
Consider SaferAI, the most prominent independent organization that scores AI companies on safety practices. It evaluates twelve companies across sixty-five criteria adapted from aviation and nuclear risk management. Anthropic consistently ranks first. SaferAI has also downgraded Anthropic three times, from 2.2 to 1.9 to the equivalent of 1.7 out of 5. The evaluations appear rigorous. The methodology is published. The critiques are specific and substantive.
SaferAI's primary funder is Jaan Tallinn, the Skype co-founder who also led Anthropic's $124 million Series A in May 2021 and sits on Anthropic's board as an observer. An Anthropic researcher serves on SaferAI's advisory board, providing input on technical projects including the ratings methodology. SaferAI publishes no conflict-of-interest disclosure policy.
The three downgrades suggest the evaluations are not simply friendly. But a drug company's early investor funding the institute that evaluates the drug — with a company employee advising on methodology — would trigger mandatory conflict-of-interest disclosures regardless of whether the evaluations were critical. The arrangement is the problem, not the output. In AI safety, no regulator requires the disclosure.
The pattern extends beyond SaferAI. The Center for AI Safety, the nonprofit behind the widely signed existential risk statement, received nearly half its operating revenue from Open Philanthropy between 2022 and 2023. Other major funders — the Survival and Flourishing Fund ($150 million since 2019, also backed by Tallinn) and Longview Philanthropy ($198 million toward AI risk reduction) — route through different intermediaries but converge on the same organizations. Even the NSF's $10.9 million AI safety program is a formal partnership with Open Philanthropy — the dominant private funder helping to fund the public one.
The Access Problem
One organization chose a different path.
METR is one of the few independent groups that tests the most powerful AI models before they reach the public. It deliberately diversified away from the dominant funding pipeline. Its predecessor received approximately $1.5 million from Open Philanthropy in 2022, but METR's current donor base, which includes the Pew Charitable Trusts and several foundations outside the usual network, is structured to avoid dependence on any single source.
Financial independence shifted the problem rather than solving it. METR currently tests unreleased models from OpenAI and Anthropic before they ship. But that access is voluntary. Companies grant it because they choose to, not because any law or agreement requires it. METR itself has acknowledged that companies are under no legal obligation to provide any outside organization with the access required to safety-test their models. An analysis of the evaluation landscape identified the implicit constraint: critical findings could jeopardize future access.
Funding dependency creates a ceiling on criticism. Access dependency creates a ceiling on capability. METR addressed the first. The second remains: any organization conducting pre-deployment safety evaluations does so at the pleasure of the companies it evaluates.
The Pledge
In January 2026, all seven co-founders of Anthropic, led by CEO Dario Amodei and president Daniela Amodei, announced they would donate 80 percent of their wealth. At Anthropic's most recent disclosed valuation, the combined pledge could reach an estimated $37.8 billion — a projection based on reported ownership stakes, not a disclosed figure.
The pledge names no specific recipients. But the co-founders built their careers within the effective altruism network that funds AI safety research through Coefficient Giving, and the philanthropic infrastructure to deploy that kind of capital at that scale barely exists outside it.
For context: the Coefficient Giving pipeline has directed approximately $336 million to AI safety research since 2017. At the estimated $37.8 billion, the co-founders' pledge would be more than 100 times that. In May 2026, Coefficient Giving posted a hiring call expecting to deploy "in the neighborhood of $1 billion" in grants in 2026 alone.
Dario Amodei framed the pledge as a response to "a level of wealth concentration that will break society." A commenter on an industry discussion board framed the implication: "It would not be hard to cast orgs that take AI-involved source funds as something like a lobbying arm of Anthropic equity holders."
The pledge was not hidden. It was celebrated. If even a fraction of that capital flows through the existing infrastructure, the concentration that already defines the field gets worse.
The Architecture
In March 2026, an essay appeared on an industry discussion board titled "EA as a Moloch Nobody Chose" — arguing the field had drifted into a structural trap nobody designed but nobody could escape. Its author traced a cycle:
Effective altruism — the philanthropic movement that treats charitable giving as an optimization problem — channels talent toward AI companies. Those companies become extraordinarily valuable. Employees gain wealth. That wealth flows back through the same philanthropic infrastructure to organizations evaluating the same companies that generated the wealth. Organizations producing work compatible with those companies' continued operation receive funding. Those whose conclusions might imply constraints on the industry face disadvantages that nobody needs to impose deliberately.
The essay's central argument: nobody designed the cycle, but the structure produces it regardless. By May 2026, Coefficient Giving's own hiring post suggested the dynamic was becoming more explicit: the organization described a shift to "strategy-driven creation of new grantees," identifying gaps in the ecosystem, recruiting founders, and spinning up projects to fill them. A single new hire would direct more than $30 million in their first year. A systematic review of the field's talent constraints found just a few dozen grantmakers working on AI safety worldwide, many of them junior. The dominant funder was not just funding the watchdogs. It was choosing which ones exist — through a handful of inexperienced people making high-stakes allocation decisions.
What this looks like from below: in May 2026, the founder of Coordinal Research, an independent AI safety platform, published a postmortem describing "effectively one big funder, and your job is to convince them." Coefficient Giving declined his $1 million request. His safety tooling was absorbed when frontier labs shipped similar features. Six-month grant cycles with no in-cycle contact left him burning personal runway. His conclusion: selection pressure across the ecosystem "systematically under-produces the kind of counterfactual work I think the field most needs."
The funding has produced important work, and the concentration is partly a talent bottleneck — AI safety research requires technical competence that exists almost exclusively inside or adjacent to frontier labs. But dependency shapes what questions get asked, what research gets prioritized, and what findings face the least friction in reaching publication. The pharmaceutical industry eventually built safeguards against this arrangement. AI safety has not.
Dario Amodei estimated a 25% chance of catastrophe and kept building. Anthropic's own safety tests found alignment faking; the company published the finding and shipped the model. Three phone calls killed the executive order that might have required independent review. $175 million in political spending helped kill the state laws that might have created it. And the organizations that remained — the independent groups that might have forced the question — are funded by the ecosystem that produces the risk.
Anthropic filed its S-1 confidentially on June 1. When it goes public, investors will evaluate the company's safety claims. The organizations that could independently verify those claims are the ones documented above.
Each check works. Each check depends on the industry's checks.
This concludes the debut series of Inside the Black Box. Next up: $3 trillion in AI IPOs are hitting public markets before October. Three trillion-dollar bets, three governance experiments, and for the first time: mandatory disclosure.




