US to Pre-Test AI Models From Google, Microsoft, and xAI Before Release

Today’s Focus

The federal government has secured agreements with three of the largest US artificial-intelligence developers (Google DeepMind, Microsoft, and Elon Musk’s xAI) to evaluate their models for safety and security risks before public release, CNBC reported on May 5.

The agreements were announced by the newly established Center for AI Standards and Innovation, the federal arm tasked with frontier-model evaluation. The arrangement marks the most substantive federal entry into pre-deployment AI testing to date.

Under the agreements, the Center will gain access to model weights and capabilities ahead of release and will conduct evaluations focused on national-security, biosecurity, and cybersecurity risks. CNBC noted that the administration is “in discussions” with additional developers but declined to name them.

Conspicuously absent from the initial list are OpenAI, whose deteriorating commercial partnership with Apple has dominated industry coverage, and Anthropic, the maker of the Claude model family. Neither company has publicly addressed whether parallel agreements are pending.

The move comes amid growing antitrust scrutiny of AI partnerships. The Federal Trade Commission has opened inquiries into major AI deals and identified risks associated with equity stakes, revenue-sharing agreements, exclusive cloud arrangements, and governance rights, according to the agency’s public statements as summarized by the law firm Wilson Sonsini.

The Debate

Supporters argue

Supporters of the pre-deployment evaluations, including officials at the Department of Commerce and bipartisan defense hawks in Congress, contend the agreements are a long-overdue extension of federal oversight into a sector that has operated with minimal regulation despite outsized national-security implications.

The Heritage Foundation’s tech-policy program has backed the basic premise of federal evaluation, calling AI safety testing “a legitimate national-security function,” though it has cautioned against using evaluations as a back-door regulatory regime that could entrench incumbents.

Civil-society groups focused on AI safety, including the Center for AI Safety and the Future of Life Institute, have welcomed the agreements as a meaningful first step. They argue voluntary pre-deployment testing was always going to be insufficient given the speed of frontier-model development, and that the federal government finally has formal access to evaluate models before they reach hundreds of millions of users.

Critics argue

Critics from across the political spectrum have raised concerns. Civil-liberties groups, including the Electronic Frontier Foundation, have questioned whether evaluations will be used to suppress models that take political positions the administration dislikes. They have pointed to last year’s House Judiciary Committee inquiry into Apple and OpenAI over alleged “left-leaning bias” as evidence the federal government may use evaluations as a content-control mechanism.

Industry trade associations, including the Information Technology Industry Council, have warned that pre-release evaluations could slow US innovation and hand a competitive advantage to Chinese developers. They argue the administration’s parallel push to remove diversity-related federal AI guidance contradicts its claim that evaluations are neutral.

Some Senate Democrats, including Senator Mark Warner of Virginia, have argued the agreements do not go far enough. They have called for legislation that would mandate frontier-model testing across all major developers rather than rely on voluntary partnerships.

What the experts say

Independent AI-policy researchers are broadly supportive of the principle of pre-deployment evaluation but split on the design. Researchers at the Stanford Institute for Human-Centered Artificial Intelligence have argued that voluntary agreements lack the legal teeth needed to ensure compliance over time. They have called for codification through legislation similar to the EU AI Act’s risk-tiering framework.

Brookings Institution scholars have noted that the Center for AI Standards and Innovation’s authority derives largely from the National Institute of Standards and Technology, which has limited enforcement capacity. They argue that without statutory authority, future administrations could roll back the agreements unilaterally.

The Georgetown Center for Security and Emerging Technology examined the agreements and concluded the publicly disclosed evaluation methodology is narrower than that used by the UK’s AI Security Institute, which has built capability in biosecurity and cyber-risk evaluation since 2023. CSET researchers recommended the US center adopt similar evaluation protocols and publish evaluation results so the public and Congress can assess effectiveness.

By the Numbers

3: Number of US AI developers (Google DeepMind, Microsoft, xAI) under the new federal pre-deployment evaluation agreements, per CNBC.
0: Number of statutory pre-release testing requirements for frontier AI models in current US law, per the Congressional Research Service.
3+: Number of years the UK’s AI Security Institute has been conducting frontier-model evaluations, ahead of the new US program.
70+: Number of state-level AI bills introduced or enacted during the 2025 legislative season, per the National Conference of State Legislatures.
$2.4 billion: Approximate value of US federal AI-related contracts in fiscal 2025, per a Brookings tracker.