Executive Summary

The Department of War has declared its intent to become an AI-first warfighting institution. Executive Order 14179, the AI Action Plan, and the DoW AI Acceleration Strategy collectively establish a clear national mandate: The United States must aggressively adopt AI within its armed forces, at speed, without the bureaucratic barriers that have historically delayed capability delivery. The question is how to deploy AI in a way that delivers the needed outcomes.

The Big Idea: Accountable implementation is not a brake on AI first. It is the technical substrate that makes AI first operationally reliable. Speed without a verifiable evidence base produces costly failures, not dominance.

Core Premises

  • The mandate is explicit and current. EO 14179 (Jan 2025), the AI Action Plan (Jul 2025), and the DoW AI Acceleration Strategy (Jan 2026) collectively require rapid, aggressive AI adoption including agentic systems for kill chain execution and battle management. These are binding directives, not aspirational statements.
  • Accountability requirements persist. DoD AI Ethics Principles (2020), DoD Directive 3000.09 (2023), and the NSM on AI Governance Framework (Oct 2024) establish accountability, traceability, and governability requirements that exist alongside the speed mandate. The question is how to satisfy both simultaneously.
  • Speed and accountability are not in tension at the architecture level. An accountability architecture built into the system from the ground up does not slow deployment; it produces systems that can be trusted and scaled. An accountability architecture bolted on after deployment produces systems that fail under operational stress.
  • A policy-grounded taxonomy of autonomy levels is required. The DoW strategy's Agent Network Pace-Setting Project (PSP) targets kill chain execution—a domain where authorization, constraint adherence, and accountability must be verifiable before and during deployment. A shared taxonomy enables acquisition decisions, program authorization, and legal review to proceed on consistent terms.
  • The DoW has demonstrated that continuous accountability enables faster deployment. The Continuous Authorization to Operate (cATO) framework, which was established by the DoW CIO memo in February 2022 and is now operational across 50+ DoW software factories, demonstrates that shifting from point-in-time assessment to continuous compliance monitoring produces faster deployment timelines, not slower ones. The same structural principle applies directly to AI authorization. Section III.3 develops this precedent.

I. The Policy Mandate: Accelerating AI Adoption in National Security

The current policy environment is the most permissive for autonomous AI systems in the history of the United States government. This is not accidental. It reflects a deliberate strategic judgment that the speed of AI adoption is the primary variable in maintaining military advantage. Understanding the specific language and requirements of the governing documents is essential to aligning any implementation framework with the mandate they establish.

1.1 Executive Order 14179: Setting the National Priority

On January 23, 2025, Executive Order 14179, Removing Barriers to American Leadership in Artificial Intelligence, was signed.[1] The order's policy statement is direct: "It is the policy of the United States to sustain and enhance America's global AI dominance in order to promote human flourishing, economic competitiveness, and national security." The word "dominance" is deliberate: The order explicitly revokes prior AI restrictions it characterizes as "barriers to American AI innovation" and directs the development of an AI Action Plan to implement its dominance objective.

1.2 America's AI Action Plan: Military Preeminence

On July 23, 2025, the White House published the AI Action Plan,[2] the implementation document for EO 14179. The defense section's language is unambiguous: "The United States must aggressively adopt AI within its Armed Forces if it is to maintain its global military preeminence." The word "aggressively" is a policy directive, not rhetorical emphasis. The AI Action Plan directs DoW to establish an AI and Autonomous Systems Virtual Proving Ground—a specific acknowledgment that agentic systems evaluation is a priority capability gap—and to develop a "streamlined process for optimizing DoW workflows and a list of priority workflows for automation with AI."

1.3 DoW AI Acceleration Strategy: AI-First Warfighting

On January 12, 2026, Secretary of War Pete Hegseth issued the Department of War AI Acceleration Strategy,[3] establishing DoW as an "AI-first warfighting force across all domains." Secretary Hegseth's statement of intent is direct: "We will unleash experimentation, eliminate bureaucratic barriers, focus our investments and demonstrate the execution approach needed to ensure we lead in military AI." Under Secretary for Research and Engineering Emil Michael provided the operational framing: "Speed defines victory in the AI era, and the War Department will match the velocity of America's AI industry."

The strategy's seven PSPs are its operational core. Two are directly relevant to agentic AI deployment in national security contexts:

  • Agent Network: "Unleashing AI agent development and experimentation for AI-enabled battle management and decision support, from campaign planning to kill chain execution." This PSP explicitly mandates agentic AI for the most consequential decisions in the targeting cycle.
  • Enterprise Agents: "Building the playbook for rapid and secure AI agent development and deployment to transform enterprise workflows." This PSP creates the institutional template for agentic AI deployment across the enterprise.

Strategic Significance: The DoW strategy does not merely permit agentic AI for kill chain execution—it mandates it. "Agent Network" is not a research project or a pilot. It is a Pace-Setting Project with a single accountable leader and monthly reporting to senior leadership.

1.4 The Governance Baseline: Accountability Requirements That Persist

The speed mandate of EO 14179 does not operate in a policy vacuum, and it does not erase prior governance requirements. The mandate is to improve speed while maintaining governance, removing only those regulations that were unduly burdensome. Several accountability requirements remain in force and are directly applicable to agentic AI deployment:

DoW AI Ethics Principles (February 24, 2020).[4] Five principles governing all DoW AI: Responsible, Equitable, Traceable, Reliable, and Governable. The Traceable principle requires that DoW AI be "developed and deployed such that relevant personnel can understand, oversee, and correct the behavior of the AI." The Governable principle requires that DoW AI be "designed and engineered to fulfill their intended function while possessing the ability to detect and avoid unintended harm or subversion." These principles have not been revoked by EO 14179 or the DoW strategy.

DoD Directive 3000.09 (January 2023).[5] Governs autonomy in weapon systems. Requires "appropriate levels of human judgment over the use of force." This standard is explicitly contextual, not a fixed requirement for real-time human-in-the-loop control. It requires senior-level review for systems designed to select and engage targets without human input; it does not categorically prohibit such systems.

NSM on AI Governance Framework (October 24, 2024). The first national security-specific AI governance memorandum directs a Framework for AI Governance and Risk Management in national security contexts. The Framework requires Chief AI Officers, AI Governance Boards, and mandatory test, evaluation, verification, and validation (TEVV) programs for fielded AI systems. These institutional requirements are not superseded by EO 14179, which revoked Biden's consumer-AI executive order (EO 14110) but did not address the national security governance framework.

Table 1. Policy mandate summary: documents, directives, and implications for agentic AI deployment.

II. Why Now: Speed Is a Requirement

The policy mandate for speed is not merely administrative preference. It reflects a genuine and evidence-grounded assessment of the operational environment. Understanding this assessment is essential to implementing the mandate correctly as a true mission requirement.

2.1 The Battlefield of Stuff

The modern battlefield is characterized by an exponential increase in sensors, effectors like drones and UMS, data sources, and decision nodes—the "battlefield of stuff."[7] The volume of operationally relevant information exceeds the cognitive processing capacity of human decision-making structures built for a slower environment. Saturation produces the failure modes most dangerous in national security contexts: missed threats, delayed decisions, and confident action on incomplete information.

2.2 The Darwinian Pressure Argument

Competition with peer adversaries creates a selection pressure that is independent of American policy preferences. An adversary that deploys effective agentic systems faster than the United States does not need to win individual engagements to prevail; it merely needs to operate at a decision tempo that outpaces the American command structure's ability to respond.[7] This is the Darwinian pressure argument for agentic AI autonomy: The competitive environment selects for faster decision cycles, and systems that cannot match the adversary's tempo are not in the competition.

The DoW AI Acceleration Strategy explicitly internalizes this argument: "Military AI is going to be a race for the foreseeable future, and therefore speed wins."[3] This framing is not metaphorical. It is an operational assessment that shapes acquisition priorities, deployment timelines, and the acceptable risk calculus for agentic AI system deployment.

2.3 The Risk Calculus

The DoW strategy sets out an explicit risk calculus: The risk of moving too slowly outweighs the risk of imperfect initial deployment. This is not recklessness; it is a reasoned assessment that systems that are not deployed cannot provide mission value and that the cost of suboptimal AI systems in the field is lower than the cost of having no AI systems when adversaries have them.

III. What Responsible Speed Requires

US policy establishes two simultaneous requirements that might appear to be in tension: deploy aggressively and maintain accountability. They are not in tension at the architecture level. The appearance of tension arises from a common but incorrect assumption: the assumption that accountability is something added to a system after it is built, through process, documentation, and approval cycles. When accountability is an architectural property built into the system from the ground up, speed and accountability are mutually reinforcing rather than competing.

3.1 Requirements

Reading the governing documents together, the mandate for agentic AI in national security contexts has the following structure:

  • Speed is required. EO 14179, the AI Action Plan, and the DoW strategy all direct aggressive adoption. The 30-day frontier model deployment mandate eliminates slow timelines as a viable posture. Programs that cannot demonstrate deployment-ready systems at operational pace are not aligned with the mandate.
  • Traceability is required. DoW AI Ethics Principles require that relevant personnel can understand, oversee, and correct AI behavior. This is not aspirational—it is a mandatory property of all deployed DoW AI. A system whose decision logic is opaque does not satisfy this requirement regardless of how fast it is deployed.
  • Governability is required. The Governable principle requires that AI be designed to detect and avoid unintended harm and to be correctable by authorized humans. This is also a mandatory property. A system that cannot be corrected because it moves too fast to be understood or because its constraint adherence cannot be verified does not satisfy this requirement.
  • TEVV is required. The NSM Governance Framework mandates test, evaluation, verification, and validation programs for fielded national security AI. This is a concurrent requirement to the speed mandate. A program that deploys without a TEVV program is not aligned with the NSM framework regardless of how well it aligns with the speed mandate.

3.2 The Enabling Architecture

The accountability architecture proposed in this framework does not add process steps that delay deployment. It provides the technical substrate on which rapid deployment becomes trustworthy:

  • Domain-specific evaluation. Evaluation datasets derived from operational exercises and scored against domain-specific rubrics produce a verifiable evidence base before and during deployment. Under AI-first principles, an agent is operationally credible only when there is domain-specific evidence of nominal performance, not merely generic benchmark scores.
  • Architecture-level constraint enforcement. Constraints encoded at the system architecture level, not in prompt text, are not vulnerable to prompt manipulation, model updates, or adversarial inputs. Architecture-level enforcement is faster to operate (no human approval for each action within the authorized envelope) and more reliable than prompt-level constraint reliance.
  • Full decision provenance. Complete logging of agent reasoning, tool calls, orchestration decisions, and human interaction points enables post-mission accountability review, L3 (human-system interaction) evaluation, and legal defensibility. This is the technical foundation for the "single accountable leader" governance model that the DoW strategy mandates for each PSP.
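The difference between architecture-level and prompt-level enforcement can be illustrated with a minimal sketch. Everything named here (`AuthorizedEnvelope`, `ConstraintViolation`, the two example tools) is hypothetical and not drawn from any fielded system; the point is only that the check lives in code that the model cannot rewrite, so it holds regardless of prompt text or model version.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet


class ConstraintViolation(Exception):
    """Raised when a requested action falls outside the authorized envelope."""


@dataclass(frozen=True)
class AuthorizedEnvelope:
    # Hypothetical envelope: tool names the agent may call without per-action approval.
    allowed_tools: FrozenSet[str]

    def invoke(self, tool_name: str, tool: Callable[..., Any], **kwargs: Any) -> Any:
        """Run the tool only if it is inside the envelope."""
        if tool_name not in self.allowed_tools:
            # Enforcement happens here, in code, whatever the prompt or model output says.
            raise ConstraintViolation(f"{tool_name} is outside the authorized envelope")
        return tool(**kwargs)


# Illustrative tools (assumptions, not real interfaces).
def summarize_track(track_id: str) -> str:
    return f"summary for {track_id}"


def task_sensor(sensor_id: str) -> str:
    return f"tasked {sensor_id}"


envelope = AuthorizedEnvelope(allowed_tools=frozenset({"summarize_track"}))

# Inside the envelope: executes with no human approval step.
result = envelope.invoke("summarize_track", summarize_track, track_id="T-1")

# Outside the envelope: blocked structurally.
try:
    envelope.invoke("task_sensor", task_sensor, sensor_id="S-9")
    blocked = False
except ConstraintViolation:
    blocked = True
```

In this sketch the envelope is frozen after construction, which mirrors the idea that constraints are set by human authority ahead of time rather than negotiated at run time.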

3.3 A Proven Precedent: The Continuous Authorization to Operate (cATO) Model

The challenge of enabling rapid deployment while maintaining rigorous accountability is not unique to AI systems. The DoW faced the same structural problem in software security: Traditional Authorization to Operate (ATO) processes required point-in-time risk assessments that took 18 to 24 months and cost more than $3 million on average, creating a deployment bottleneck that delayed operationally critical software for years while it awaited authorization.[8]

The DoW's solution was not to eliminate accountability; it was to restructure it. In February 2022, the DoW Chief Information Officer issued a memorandum establishing the Continuous Authorization to Operate (cATO) framework.[9] The DoW definition is precise: cATO is "The state achieved when an organization that develops, secures, and operates a system has demonstrated enough maturity in maintaining a resilient cybersecurity posture that traditional risk assessments and authorizations become redundant."[10] The continuous evidence base is stronger, more current, and more operationally reliable than the point-in-time assessment it replaces. The result is authorization timelines measured in days rather than months.

The cATO framework rests on three technical pillars: Continuous Monitoring (CONMON), Active Cyber Defense (ACD), and Secure Software Supply Chain (SSSC).[9] More than 50 DoW software factories—including Platform One and Iron Bank—now operate under cATO frameworks, demonstrating that the model is practical and in production at scale across the DoW enterprise.[8]

The structural parallel to the AI accountability architecture proposed in this paper is intentional. Both frameworks address the same problem—replacing a slow, point-in-time authorization bottleneck with a continuous evidence structure that enables rapid deployment decisions—and both rest on the same three architectural pillars:

Table 2. Structural parallel between the DoW cATO cybersecurity model and the AI authorization framework proposed in this paper. Both replace point-in-time assessment with continuous evidence.

The cATO precedent establishes two things for AI authorization. First, the DoW has already accepted the governing principle: continuous compliance infrastructure enables faster deployment at lower cost than manual processes. Second, it provides an institutional template: Programs building AI accountability on the model of CONMON, ACD, and SSSC are extending a proven DoW pattern.

The Integration Point: The DoW AI Acceleration Strategy mandates single accountable leaders for each PSP. Single accountability requires attributable decisions. Attributable decisions require traceable reasoning chains. Traceable reasoning chains require the observability architecture described in Section IV. The DoW governance model and the technical accountability architecture are the same requirement expressed in different vocabularies.

IV. The Accountability Architecture for Mission Agentic AI

The accountability architecture proposed here implements the requirements established in the previous section and provides the technical structure for meeting those requirements at scale. The core of the architecture is a taxonomy of authorization levels that maps the DoW's deployment mandate to its accountability requirements.

4.1 Human Judgment at Five Levels

A persistent misconception in discussions of autonomous systems is that human judgment is binary: Either a human approves each action (human-in-the-loop), or accountability is lost. This misconception fails to account for the different levels at which human judgment operates in the life cycle of an autonomous system. Meaningful human accountability over autonomous AI action is exercised at five distinct levels:

  • System design: Engineers, lawyers, and commanders define the system's operational scope, constraint architecture, and authorized behaviors. Human judgment at the design level determines what the system can and cannot do, which makes these design choices consequential decisions about the system's behavior.
  • Policy and constraint definition: Commanders and lawyers define the rules of engagement, Law of Armed Conflict compliance requirements, and mission-invariant constraints that govern the system's behavior. These are encoded at the architecture level, not the prompt level, ensuring that human policy judgments are structurally enforced.
  • Operational Employment Domain (OED) definition: Commanders define the conditions under which the system is authorized to operate autonomously—the set of scenarios, environments, and mission parameters within which autonomous execution is authorized. Out-of-OED conditions trigger a Request for Authorization (RFA) to human authority before the system proceeds.
  • Mission authorization: Operators and commanders authorize specific missions within the defined OED. This is the decision point that determines whether autonomous execution proceeds for a given mission.
  • Post-mission review: Commanders, legal authorities, and engineers review mission execution, evaluate system performance, identify deviations from authorized behavior, and update the system's OED and constraint architecture accordingly.

This five-level architecture satisfies the Traceable and Governable requirements of the DoW AI Ethics Principles: Humans can understand, oversee, and correct AI behavior without requiring real-time human approval of each autonomous action within the authorized envelope. It is the technical implementation of DoD Directive 3000.09's "appropriate levels of human judgment" standard, which is explicitly contextual: Appropriate judgment at the right level is sufficient; real-time human-in-the-loop at every decision is not required.[5]
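The OED boundary and the Request for Authorization (RFA) trigger described above can be sketched as a simple run-time check. The OED parameters used here (`allowed_environments`, `max_range_km`) are illustrative assumptions only; a real OED would be defined by commanders and would carry far more conditions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Decision(Enum):
    PROCEED = "proceed"
    REQUEST_FOR_AUTHORIZATION = "rfa"


@dataclass(frozen=True)
class OperationalEmploymentDomain:
    # Hypothetical OED parameters, set by commanders before the mission.
    allowed_environments: FrozenSet[str]
    max_range_km: float


@dataclass(frozen=True)
class Situation:
    environment: str
    range_km: float


def check_oed(oed: OperationalEmploymentDomain, situation: Situation) -> Decision:
    """Proceed autonomously only inside the OED; otherwise raise an RFA."""
    in_domain = (
        situation.environment in oed.allowed_environments
        and situation.range_km <= oed.max_range_km
    )
    return Decision.PROCEED if in_domain else Decision.REQUEST_FOR_AUTHORIZATION


oed = OperationalEmploymentDomain(
    allowed_environments=frozenset({"maritime"}), max_range_km=50.0
)
inside = check_oed(oed, Situation(environment="maritime", range_km=12.0))
outside = check_oed(oed, Situation(environment="urban", range_km=12.0))
```

The design choice worth noting is that an out-of-OED condition does not fail silently or halt the mission; it returns a distinct decision that routes control back to human authority.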

4.2 Levels of Authorized Agency (LAA)

The Levels of Authorized Agency (LAA) taxonomy provides a shared vocabulary for acquisition, legal review, operational authorization, and system evaluation. Adapted from the SAE J3016 driving automation framework[11] and tailored to the national security mission context, it maps authorization levels to prerequisite conditions, human roles, and accountability assignments:

Table 3. Levels of Authorized Agency (LAA): definition, prerequisite conditions, human role, and accountability at each level.

The LAA taxonomy serves three functions for DoW implementation. It (1) provides acquisition authorities with a consistent standard for evaluating claimed agentic levels against demonstrated evaluation evidence, enabling program offices to distinguish systems with validated TEVV evidence from systems whose only validation artifact is a slide deck; (2) provides legal reviewers with a framework for assessing DoD Directive 3000.09 compliance at each level, since the human role assigned at each LAA level can be mapped directly to the directive's requirements; and (3) provides commanders with a vocabulary for authorizing and communicating the degree of autonomous execution permitted for a given mission.

V. Best Practice

The following best practices represent the standards that responsible implementation of mission-directed agentic AI should meet. These practices represent the operationalization of current policy requirements based on current applied experience with agentic AI systems in defense contexts.

Table 4. Best practices for mission-directed agentic AI deployment: requirement, what it demands, and why it matters.

VI. Implementation Path

Implementing a responsible agentic AI deployment consistent with this framework proceeds in five steps:

Step 1: Establish the evaluation baseline

Before deployment, develop domain-specific evaluation datasets derived from operational exercises, with scoring rubrics and concrete anchoring examples. Define the system's OED (the conditions under which it is authorized to operate autonomously). Conduct L1 (agent-level) and L2 (system-level) evaluations against the established criteria. Produce the evidence base required for deployment authorization decisions. This step is a prerequisite under the NSM Governance Framework.

Step 2: Authorize the LAA level

Determine the LAA level consistent with the evaluation evidence and the operational context. A system with a robust evaluation baseline and a well-defined OED can be authorized at higher LAA levels. A system with limited evaluation evidence is constrained to lower LAA levels regardless of claimed capability. The LAA level is an authorization decision that requires human judgment at the appropriate command authority level before autonomous execution begins.
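One way to picture the relationship between evidence and authorization is a gate that caps the requested level at what the evidence supports and still requires a named human authority. The level numbers and thresholds in `LAA_PREREQUISITES` below are placeholders invented for illustration; they are not the values in Table 3.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class EvaluationEvidence:
    domain_pass_rate: float  # fraction of domain-specific evaluation cases passed
    oed_defined: bool        # whether a well-defined OED exists


# Placeholder prerequisites per level, lowest first: (minimum pass rate, OED required).
LAA_PREREQUISITES: List[Tuple[float, bool]] = [
    (0.00, False),
    (0.80, False),
    (0.90, True),
    (0.97, True),
]


def max_supported_laa(evidence: EvaluationEvidence) -> int:
    """Highest level whose prerequisites, and those of all lower levels, are met."""
    supported = 0
    for level, (min_rate, needs_oed) in enumerate(LAA_PREREQUISITES):
        meets_rate = evidence.domain_pass_rate >= min_rate
        meets_oed = evidence.oed_defined or not needs_oed
        if not (meets_rate and meets_oed):
            break
        supported = level
    return supported


def authorize_laa(
    requested: int, evidence: EvaluationEvidence, approving_authority: Optional[str]
) -> bool:
    """Evidence caps the level; a human authority must still approve it."""
    if not approving_authority:
        return False
    return requested <= max_supported_laa(evidence)


strong = EvaluationEvidence(domain_pass_rate=0.99, oed_defined=True)
limited = EvaluationEvidence(domain_pass_rate=0.85, oed_defined=True)
```

Under these assumed thresholds, claimed capability never enters the calculation: only evaluation evidence and a human approval do.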

Step 3: Deploy with full instrumentation

Deploy with evaluation-grade instrumentation producing structured logs for every session: agent reasoning chains, orchestrator routing decisions, tool calls, and human interaction events. Framework-level instrumentation, not prompt-level instrumentation, ensures that logging persists across model and prompt updates. Every operational session is a potential evaluation asset.
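Framework-level instrumentation can be sketched as a wrapper applied to every agent, tool, and human-interaction entry point, so that logging does not depend on what any prompt asks the model to record. The `traced` decorator, the in-memory `TRACE` list, and the record fields are assumptions for illustration; a deployed system would write to a durable, append-only store.

```python
import functools
import json
from typing import Any, Callable, List

TRACE: List[str] = []  # stand-in for a durable, append-only log sink


def traced(event_type: str) -> Callable:
    """Wrap a function so every call emits one structured JSON record."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(session_id: str, *args: Any, **kwargs: Any) -> Any:
            result = fn(session_id, *args, **kwargs)
            TRACE.append(json.dumps({
                "session_id": session_id,
                "event_type": event_type,
                "name": fn.__name__,
                "inputs": {"args": list(args), "kwargs": kwargs},
                "output": result,
            }, default=str))  # default=str keeps non-JSON values loggable
            return result
        return wrapper
    return decorator


# Illustrative entry points (hypothetical names).
@traced("tool_call")
def lookup_track(session_id: str, track_id: str) -> dict:
    return {"track_id": track_id, "status": "unknown"}


@traced("human_interaction")
def operator_override(session_id: str, decision: str) -> str:
    return decision


lookup_track("session-1", "T-7")
operator_override("session-1", decision="hold")
```

Because the wrapper sits outside the model, swapping the model or rewriting the prompt leaves the log format unchanged. This sketch records only successful calls; capturing exceptions would be a natural extension.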

Step 4: Operate continuous evaluation

Integrate evaluation into the operational and development cycle. Run regression and challenge sets against every development iteration. Conduct adversarial and red team evaluations on a defined schedule. Conduct L3 human-system interaction evaluations with representative operators. Feed findings into development priorities and OED updates.
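Running regression and challenge sets against every iteration amounts to a release gate. The sketch below assumes exact-match scoring and illustrative thresholds purely for brevity; domain-specific rubrics would replace the equality check in practice, and the `release_gate` name and its parameters are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Case = Tuple[str, str]  # (input, expected output)


def run_suite(agent: Callable[[str], str], cases: List[Case]) -> float:
    """Return the fraction of cases the agent answers as expected."""
    if not cases:
        raise ValueError("evaluation suite must contain at least one case")
    passed = sum(1 for prompt, expected in cases if agent(prompt) == expected)
    return passed / len(cases)


def release_gate(
    agent: Callable[[str], str],
    regression: List[Case],
    challenge: List[Case],
    baseline_regression_rate: float,
    min_challenge_rate: float,
) -> Dict[str, object]:
    """Approve an iteration only if it does not regress and clears the challenge bar."""
    regression_rate = run_suite(agent, regression)
    challenge_rate = run_suite(agent, challenge)
    return {
        "regression_rate": regression_rate,
        "challenge_rate": challenge_rate,
        "approved": regression_rate >= baseline_regression_rate
        and challenge_rate >= min_challenge_rate,
    }


# Toy agent and suites for illustration only.
answers = {"a": "1", "b": "2", "c": "3"}


def toy_agent(prompt: str) -> str:
    return answers.get(prompt, "")


regression_set = [("a", "1"), ("b", "2")]
challenge_set = [("c", "3"), ("d", "4")]
report = release_gate(toy_agent, regression_set, challenge_set, 1.0, 0.5)
```

The same report structure can feed development priorities and OED updates, since a failing challenge case identifies a condition the system should not yet handle autonomously.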

Step 5: Conduct post-mission review

After each significant operational use, conduct structured post-mission review using the decision trace logs. Identify any OED deviations, constraint boundary events, or performance anomalies. Update the evaluation evidence base and, where appropriate, the OED and constraint architecture. Document findings for the accountability record. This is the continuous accountability mechanism for the agentic system.
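A structured post-mission review can begin with an automated pass over the decision trace that surfaces the events humans need to examine. The event type names and record shape below are assumptions for illustration, not a defined log schema.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

# Hypothetical event types that always warrant human review.
REVIEW_EVENT_TYPES = {"oed_deviation", "constraint_boundary", "performance_anomaly"}


def post_mission_review(records: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Collect reviewable events and flag whether the OED should be revisited."""
    flagged: List[Dict[str, Any]] = [
        r for r in records if r.get("event_type") in REVIEW_EVENT_TYPES
    ]
    counts = Counter(r["event_type"] for r in flagged)
    return {
        "flagged": flagged,
        "counts": dict(counts),
        # Any OED deviation prompts a human decision on updating the OED.
        "oed_update_recommended": counts["oed_deviation"] > 0,
    }


mission_trace = [
    {"event_type": "tool_call", "name": "lookup_track"},
    {"event_type": "oed_deviation", "detail": "environment outside OED"},
    {"event_type": "constraint_boundary", "detail": "blocked tool request"},
    {"event_type": "tool_call", "name": "summarize_track"},
]
review = post_mission_review(mission_trace)
```

The output is an input to human review, not a substitute for it: commanders, legal authorities, and engineers still decide what the flagged events mean and what changes follow.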

As the evaluation program matures, organizations should explore training domain-specific judge models on the preference signal and error corrections generated in Steps 4 and 5. An automated judge trained on operationally derived corrections, such as human triage adjudications, override decisions, and post-mission anomaly resolutions, will over time outperform out-of-the-box models on the mission-specific evaluation benchmarks established in Step 1. This is not a near-term requirement; it is a natural extension of the evaluation infrastructure already prescribed here, and it compounds in value with each operational cycle.[13]

VII. Conclusion

The Department of War has issued the clearest mandate in US military history for aggressive AI adoption, including agentic AI for kill chain execution and battle management. The mandate is real and has operational teeth.

Accountable implementation and rapid deployment are not competing objectives. They are the same objective at different architectural levels. The DoW's single-accountable-leader governance model requires attributable decisions; attributable decisions require traceable reasoning; traceable reasoning requires the observability and evaluation architecture that this framework describes. The DoW's own cATO precedent confirms the principle: Continuous accountability evidence enables faster deployment, not slower. The LAA taxonomy provides acquisition, legal, and operational authorities the shared vocabulary to move at the speed the mandate requires—on a foundation that holds under operational stress.

Bottom Line: AI first is not AI that cannot act autonomously. It is AI whose autonomous actions are demonstrably aligned with mission intent, constrained by policy, traceable to human judgment, and continuously evaluated against operational reality. This is what AI first means when it works.