What This Post Covers
In The Anatomy of Agentic Code Assist, we looked at how agents like OpenHands work: event streams, sandboxed execution, tool use, the CodeAct framework. That post covered the agent itself, what it does and how it’s built. This post covers a different layer: the infrastructure that keeps agents running reliably in production.
When an agent runs for hours, makes hundreds of tool calls, and interacts with flaky LLM APIs, a whole class of infrastructure problems emerges that application-level code cannot solve:
- State loss on process crashes: a worker dies mid-workflow and hours of accumulated context disappear. The agent restarts from scratch, re-executing every LLM call and tool invocation.
- LLM API rate limits and timeouts: 429s, 500s, socket timeouts, multi-minute latencies. A reflexion loop running 10 cycles can consume 50x the tokens of a linear pass if any step fails and forces a restart.
- Debugging non-deterministic behavior: the same prompt produces different outputs, different tool call sequences, different results. Without a complete execution trace, reproducing production bugs is close to impossible.
- Tasks exceeding server timeouts: agent sessions lasting minutes to hours die on deployments, fail during scaling events, and exceed web server timeout limits.
- Ambiguous recovery after parallel fan-out crashes: the agent launches ten parallel tool calls. The process crashes after seven complete. Which results were already obtained? Which need re-execution?
- Losing context during human-in-the-loop waits: the agent pauses for human approval, potentially for hours or days. The server holding that state needs to remain available, or all accumulated context is lost.
- Error cascades across multi-agent systems: a single failure in one agent propagates downstream without corrective mechanisms. Simple retry logic at the tail end is inadequate because the agent may have already deviated significantly from the intended path.
Temporal is an orchestration platform built around durable execution. We’ll walk through its architecture, understand why each design decision exists, and look at how OpenAI’s Codex team uses it in production.
The core idea can be expressed as a state transition: $S_{t+1} = f(S_t, M(S_t, T_t))$. Agent state evolves through deterministic orchestration ($f$) of non-deterministic operations ($M$ = LLM response, $T$ = tool results). Temporal separates these two concerns at the infrastructure level. The deterministic part goes in workflows. The non-deterministic part goes in activities.
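Read as code, the transition looks like this (a toy sketch with stubbed stand-ins for $M$ and $T$; none of these names are Temporal APIs):

```python
def M(S, T):
    # Model call: non-deterministic in reality, so it belongs in an activity
    return {"action": "tool_call", "tool": "search", "given_results": len(T)}

def f(S, model_output):
    # Orchestration: a pure function of state and recorded results -- the workflow
    return S + [model_output]

S, T = [], []          # agent state and tool results
for _ in range(3):
    S = f(S, M(S, T))  # S_{t+1} = f(S_t, M(S_t, T_t))

assert len(S) == 3 and S[0]["tool"] == "search"
```

Because `f` is pure, re-running it with the same recorded outputs of `M` always reproduces the same state, which is exactly the property Temporal exploits.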
Workflows and Activities
The fundamental design decision in Temporal: split all code into two categories based on determinism.
Workflows
A Workflow is the agent’s control flow, the logic that decides which tools to call, in what order, what to do with results, and when to wait for human input. Workflows run as ordinary code in Python, TypeScript, Go, or Java, with one hard constraint: they must be deterministic. Given the same inputs and the same activity results, a workflow must produce the same sequence of commands every time.
A Workflow Execution can run for seconds, hours, or years. It persists through infrastructure failures. The workflow doesn’t know or care about crashes; from its perspective, execution is continuous.
Activities
Activities are where all side effects live: LLM API calls, tool executions, database writes, HTTP requests. Anything that can fail, timeout, or produce different results on re-execution. Temporal records every activity result in a persistent Event History, an append-only log that serves as the authoritative record for the entire workflow’s state.
Why This Split Matters
The determinism requirement is what enables replay-based recovery (which we’ll cover in the next section). Here’s the reasoning: if we know the workflow logic is deterministic, and we have a recorded log of all activity results, we can reconstruct the exact workflow state after a crash. We don’t need developer-written checkpoint code. We don’t need serialization logic. We just replay the deterministic code with the previously recorded results, and we arrive at the same state.
This raises an obvious question: LLMs are non-deterministic, so how does this work? The answer maps directly to how agents already operate. The LLM call goes in an activity – it’s non-deterministic, its result gets recorded. The logic deciding what to call and when goes in the workflow – it’s deterministic. The agent loop says “if the LLM returned a tool call, execute that tool; if it returned a final answer, return it.” That orchestration logic doesn’t change between runs.
A Complete Agent Loop
Here’s what a complete agent workflow looks like in Python:
```python
import asyncio
from dataclasses import dataclass
from datetime import timedelta

from temporalio import activity, workflow
from temporalio.common import RetryPolicy


@dataclass
class LLMRequest:
    goal: str
    history: list
    available_tools: list


@activity.defn
async def call_llm(request: LLMRequest) -> dict:
    # Non-deterministic: the LLM API call lives here
    response = await llm_client.chat(  # llm_client: your LLM SDK client
        messages=request.history,
        tools=request.available_tools,
    )
    return {"action": response.action, "params": response.params}


@activity.defn
async def execute_tool(tool_name: str, params: dict) -> str:
    # Non-deterministic: tool execution lives here
    return await tool_registry.execute(tool_name, params)


@workflow.defn
class AIAgentWorkflow:
    @workflow.run
    async def run(self, user_goal: str) -> str:
        conversation_history = []
        llm_retry = RetryPolicy(
            initial_interval=timedelta(seconds=1),
            backoff_coefficient=2.0,
            maximum_interval=timedelta(seconds=60),
            maximum_attempts=10,
        )
        # is_goal_achieved, get_available_tools, format_final_result elided
        while not self.is_goal_achieved(conversation_history):
            # Deterministic: this decision logic is the workflow
            next_action = await workflow.execute_activity(
                call_llm,
                LLMRequest(
                    goal=user_goal,
                    history=conversation_history,
                    available_tools=self.get_available_tools(),
                ),
                start_to_close_timeout=timedelta(seconds=120),
                retry_policy=llm_retry,
            )
            if next_action["action"] == "tool_call":
                # Parallel tool execution when multiple tools are requested
                results = await asyncio.gather(*[
                    workflow.execute_activity(
                        execute_tool,
                        args=[tool["name"], tool["params"]],  # multiple args go via args=
                        start_to_close_timeout=timedelta(seconds=30),
                    )
                    for tool in next_action.get("tool_calls", [])
                ])
                conversation_history.extend(results)
            else:
                conversation_history.append(next_action)
        return self.format_final_result(conversation_history)
```
*Figure: Workflow / Activity Split – deterministic orchestration on the left, non-deterministic side effects on the right, Event History in the center.*
Deterministic Replay
Replay is the mechanism that makes Temporal’s fault tolerance work. Let’s walk through it in detail, because understanding replay is the key to understanding why the rest of the architecture looks the way it does.
The Event History
Every workflow execution has an Event History: an append-only log stored in Temporal’s persistence layer. When an activity completes, Temporal records both the request and the result.
What Happens on a Crash
Here’s a concrete scenario. An agent workflow is at step 4 of 7. It has completed three LLM calls and tool executions, and is partway through the fourth:
- The worker process crashes (OOM, deployment, hardware failure)
- The Temporal server detects the failure (heartbeat timeout or task timeout)
- Another worker picks up the workflow from the task queue
- Temporal re-executes the workflow code from the beginning
- When the code reaches activity calls that already completed (steps 1–3), Temporal returns the previously recorded results from the event history instead of re-executing them
- The workflow code deterministically reaches the exact same state it was in before the crash: same local variables, same loop counter, same conversation history
- Forward execution resumes from step 4. Only now does an actual activity get dispatched
Because the workflow code is deterministic, replaying it with the same activity results always produces the same sequence of commands. The entire call stack and state are reconstructed with no developer-written checkpoint code. This is different from simple checkpointing because the developer never has to decide what to checkpoint or when – the replay mechanism reconstructs everything automatically from the event history.
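The mechanism can be sketched in a few lines of plain Python (a toy replayer, not the SDK):

```python
def agent_workflow(execute):
    # Deterministic orchestration: always asks for the same steps in order
    history = []
    for step in ["plan", "search", "summarize"]:
        history.append(execute(step))
    return history

class Replayer:
    def __init__(self, recorded):
        self.recorded = list(recorded)   # event history from before the crash
        self.dispatched = []             # activities actually (re)executed

    def execute(self, activity):
        if self.recorded:
            return self.recorded.pop(0)  # replay: serve the recorded result
        self.dispatched.append(activity) # resume: real execution from here on
        return f"result:{activity}"

# The crash happened after "plan" and "search" completed:
replayer = Replayer(["result:plan", "result:search"])
state = agent_workflow(replayer.execute)

assert state == ["result:plan", "result:search", "result:summarize"]
assert replayer.dispatched == ["summarize"]  # only step 3 re-executes
```

The workflow function runs from the top on every recovery, but only the activities past the recorded history actually execute.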
The Determinism Contract
The determinism requirement imposes hard constraints on workflow code. You cannot use:
- `random()` – use `workflow.random()` instead
- `datetime.now()` – use `workflow.now()` instead
- `time.sleep()` – use `workflow.sleep()` or timers instead
- Direct I/O (network calls, file reads) – these must go in activities
- Threading or subprocess creation – use activities or child workflows
For AI engineers, this constraint is less restrictive than it sounds. LLM calls and tool executions are inherently side effects, so they already belong in activities. The orchestration logic that decides what to call and when – “call the LLM, check if it returned a tool call, execute the tool, loop” – doesn’t use random numbers or system clocks.
Here’s what non-determinism violations look like in practice:
```python
import asyncio
import random
from datetime import datetime

from temporalio import workflow


# WRONG: non-deterministic workflow code
@workflow.defn
class BadAgentWorkflow:
    @workflow.run
    async def run(self, goal: str) -> str:
        if random.random() > 0.5:      # different result on replay
            strategy = "aggressive"
        else:
            strategy = "conservative"
        timestamp = datetime.now()     # different on replay
        await asyncio.sleep(5)         # blocks the event loop
        ...


# CORRECT: deterministic workflow code
@workflow.defn
class GoodAgentWorkflow:
    @workflow.run
    async def run(self, goal: str) -> str:
        if workflow.random().random() > 0.5:  # deterministic across replays
            strategy = "aggressive"
        else:
            strategy = "conservative"
        timestamp = workflow.now()            # deterministic across replays
        await workflow.sleep(5)               # durable timer, survives crashes
        ...
```
Contrast with OpenHands
Both Temporal and OpenHands use event sourcing, but for different purposes. OpenHands records events (CmdRunAction, FileWriteAction, observations) for debuggability and observability. You can replay the event sequence to understand what the agent did. Temporal records events so the workflow can be reconstructed after a crash as if nothing happened. Same architectural pattern, different goals.
Formalization
If History = $[(a_1, r_1), (a_2, r_2), \ldots, (a_k, r_k)]$ records completed activities, then replay returns $r_1 \ldots r_k$ from history and only executes $a_{k+1}$ forward. The workflow’s determinism guarantees that replaying with recorded results produces the same sequence of activity commands, so the state at step $k$ is identical to the state before the crash.
*Figure: Deterministic Replay – how Temporal recovers from a crash by replaying the event history.*
Server Architecture
Temporal runs as four server-side services plus a persistence layer, with user-managed workers running externally.
The Four Services
Frontend Service: a stateless gRPC gateway. All client and worker communication flows through it. Handles rate limiting, routing, and authorization. Horizontally scalable because it holds no state.
History Service: owns workflow state and persists event histories. This is the most important component. Manages state transitions across configurable History Shards, which are the unit of concurrent throughput scaling. Each shard handles a subset of workflows. More shards = more concurrent workflows.
Matching Service: hosts Task Queues and dispatches work to workers. When a workflow needs an activity executed, the Matching Service places it on the appropriate task queue. When a worker polls for work, the Matching Service assigns a task.
Workers: stateless processes that you deploy and manage outside the Temporal server. (The server's own fourth service is an internal worker that runs Temporal's system workflows; the workers discussed here are yours.) Workers long-poll task queues via gRPC, execute workflow or activity code, and report results back. Because workers hold no state, they can be killed, restarted, or scaled horizontally without any coordination. The Temporal server is always the authoritative record.
Task Queues
Task Queues provide a routing layer that becomes important for agent workloads. Workflow tasks and activity tasks flow through separate queues. You can route activities to specialized worker pools (GPU workers for inference, lightweight workers for API calls) by assigning them to different task queues. This lets teams scale heterogeneous agent workloads independently.
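A toy sketch of that routing in plain Python (not Temporal's API): one queue per worker pool, with activities enqueued by pool name.

```python
from queue import Queue

# One task queue per worker pool (illustrative pool names)
task_queues = {"gpu-inference": Queue(), "api-calls": Queue()}

def dispatch(activity_name: str, task_queue: str) -> None:
    # The matching layer places the task on the named queue;
    # only workers polling that queue will pick it up.
    task_queues[task_queue].put(activity_name)

dispatch("run_local_model", "gpu-inference")  # routed to the GPU pool
dispatch("call_llm_api", "api-calls")         # routed to the lightweight pool

assert task_queues["gpu-inference"].get() == "run_local_model"
assert task_queues["api-calls"].get() == "call_llm_api"
```

In the real SDKs this routing is just a parameter on the activity invocation (for example, `task_queue=` on `workflow.execute_activity` in the Python SDK).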
| Component | Responsibility | Failure Impact |
|---|---|---|
| Frontend Service | gRPC gateway, rate limiting, routing | Clients can’t connect (stateless, restart recovers) |
| History Service | Workflow state, event persistence, shard management | Workflow progress pauses until recovery |
| Matching Service | Task queue hosting, work dispatch | Tasks queue but don’t dispatch (no work lost) |
| Workers | Execute workflow/activity code, report results | Pending tasks reassigned to other workers |
| Persistence (DB) | Durable storage for event histories | All services degraded until DB recovers |
*Figure: Temporal Server Architecture – four services, a persistence layer, and stateless workers; failure impacts are summarized in the table above.*
Primitives for Agent Patterns
Beyond workflows and activities, Temporal provides several primitives that map to common agent coordination problems.
Signals
Signals are asynchronous messages sent to a running workflow. The workflow can react at any point in its execution. This is the mechanism for human-in-the-loop: the agent reaches a decision point, calls workflow.wait_condition(), and a signal carrying the human’s approval resumes it.
The workflow can wait hours or days. It consumes no compute while waiting because its state lives in the event history, not in a running process. No worker is tied up, no server is keeping a connection open. The state is persisted in the database and can be reconstructed on demand when the signal arrives.
Queries
Queries let external systems read workflow state without modifying it. This powers dashboards and monitoring: “What step is the agent on? What was the last LLM response? How many tokens has it consumed?” The query handler runs against the in-memory workflow state and returns immediately.
Updates
Updates combine a signal and a query: send a command to the workflow and get a response. This is useful for interactive agent control (“redo step 2 with different parameters”) where you need to both modify the workflow’s behavior and confirm the modification was accepted.
Replit, for example, uses Workflow Updates for human-in-the-loop consent. When their agent wants to perform a destructive action, it pauses and waits for the user to accept or reject via an Update.
ContinueAsNew
Each workflow execution is limited to 51,200 events or 50MB of event history. For agents making hundreds of tool calls, history grows fast; each activity generates roughly 3 events. If activities return large LLM payloads (500KB+), the 50MB limit becomes binding well before the event count limit.
ContinueAsNew addresses this by atomically starting a fresh execution with the same Workflow ID, carrying forward essential state while resetting the history. The old history is archived. For long-running agents, this is how you keep the workflow alive indefinitely.
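Back-of-envelope, using the rough figures above, you can see which limit binds first:

```python
EVENT_LIMIT = 51_200
SIZE_LIMIT = 50 * 1024 * 1024       # 50MB history cap
EVENTS_PER_ACTIVITY = 3             # scheduled / started / completed, roughly

activities_before_event_cap = EVENT_LIMIT // EVENTS_PER_ACTIVITY  # ~17,000 activities
large_payload = 500 * 1024          # a 500KB LLM response per activity
activities_before_size_cap = SIZE_LIMIT // large_payload          # ~100 activities

# With large payloads, the size limit binds two orders of magnitude earlier
assert activities_before_size_cap < activities_before_event_cap
```

An agent returning full LLM responses through activities hits the 50MB cap after roughly a hundred calls, long before the event-count ceiling.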
Human-in-the-Loop Pattern
```python
from datetime import timedelta

from temporalio import workflow


@workflow.defn
class AgentWithHumanApproval:
    def __init__(self):
        self.approved = False
        self.current_step = "initializing"
        self.pending_action = None

    @workflow.signal
    async def approve(self, decision: str):
        self.approved = decision == "yes"

    @workflow.query
    def get_status(self) -> dict:
        return {
            "step": self.current_step,
            "pending_action": self.pending_action,
            "approved": self.approved,
        }

    @workflow.run
    async def run(self, goal: str) -> str:
        while not self.is_complete():
            action = await workflow.execute_activity(
                call_llm, goal,
                start_to_close_timeout=timedelta(seconds=120),
            )
            if action.requires_approval:
                self.pending_action = action.description
                self.current_step = "awaiting_approval"
                # Workflow state persists in the DB -- no compute cost while waiting
                await workflow.wait_condition(lambda: self.approved)
                self.approved = False  # reset for the next approval
            self.current_step = "executing"
            result = await workflow.execute_activity(
                execute_tool, args=[action.tool, action.params],
                start_to_close_timeout=timedelta(seconds=60),
            )
        return self.format_result()
```
The workflow.wait_condition(lambda: self.approved) line is where the agent pauses. It can sit there for minutes, hours, or days. If the server restarts, if workers are redeployed, the workflow’s state survives. When the signal arrives, any available worker picks it up and resumes execution.
*Figure: Agent Primitives Timeline – Signals, Queries, Updates, and wait points across an agent's lifecycle.*
Retry Policies and Error Handling
LLM APIs fail routinely. Rate limits (429), server errors (500), socket timeouts, multi-minute latencies. These are the norm for agents making hundreds of calls, and different activities need different retry strategies.
Declarative Retry Policies
Retry policies are configured per activity with several parameters: initial interval, backoff coefficient, maximum interval, maximum attempts, and non-retryable error types. The important part is that retries happen at the infrastructure level. If a worker crashes during a retry cycle, another worker picks up with the retry state intact. The developer writes no retry logic.
Why Different Activities Need Different Strategies
LLM calls need aggressive retry with exponential backoff. Rate limits are transient, and the cost of not retrying (losing all accumulated context and starting the agent run from scratch) far outweighs the cost of waiting 30 seconds for capacity. Configure high maximum attempts (10+) with a long maximum interval.
Tool executions need limited retries. Tools may not be idempotent – running git commit twice produces different results. Blindly retrying could cause duplicate side effects. Configure low maximum attempts (2–3) and mark certain error types as non-retryable.
Human notifications often need no retry at all. Fire-and-forget: if the Slack message fails, don’t block the workflow.
```python
llm_retry = RetryPolicy(
    initial_interval=timedelta(seconds=1),
    backoff_coefficient=2.0,
    maximum_interval=timedelta(seconds=60),
    maximum_attempts=10,
    non_retryable_error_types=["InvalidPromptError"],
)

tool_retry = RetryPolicy(
    initial_interval=timedelta(seconds=2),
    maximum_attempts=3,
    non_retryable_error_types=["PermissionDenied", "NotIdempotent"],
)


# Heartbeating for long-running activities
@activity.defn
async def execute_long_tool(task: dict) -> str:
    # On a retry, pick up from the last reported checkpoint
    details = activity.info().heartbeat_details
    start = details[0]["progress"] + 1 if details else 0
    result = ""
    for i, chunk in enumerate(process_chunks(task)):  # process_chunks: your chunking helper
        if i < start:
            continue  # already processed before the previous worker died
        activity.heartbeat({"progress": i, "last_chunk": chunk.id})
        result = await process(chunk)
    return result
```
Heartbeats
For long-running activities, the worker periodically reports progress via heartbeats. If the heartbeat stops (worker crashed), Temporal reschedules the activity on another worker. The new worker can read the last heartbeat details to resume from the last checkpoint rather than starting over. This matters for activities processing large datasets or running multi-step tool executions.
Saga Patterns for Multi-Agent Systems
When multiple agents coordinate, failure handling gets complex. Temporal supports saga patterns where compensation logic runs when a step fails. If a planning agent fails, downstream execution agents’ pending activities can be cancelled rather than left hanging. If the response agent produces an unsatisfactory draft, compensation logic can route back to the research agent for additional context.
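A minimal compensation sketch in plain Python (illustrative; in Temporal the same shape is typically written as try/except around activity calls, with compensations executed as activities):

```python
def run_saga(steps):
    # steps: list of (name, action, compensate) triples
    compensations, log = [], []
    try:
        for name, action, compensate in steps:
            action()                              # forward step
            compensations.append((name, compensate))
    except Exception:
        # Unwind completed steps in reverse order
        for name, compensate in reversed(compensations):
            compensate()
            log.append(f"compensated:{name}")
    return log

def fail():
    raise RuntimeError("execution agent failed")

log = run_saga([
    ("plan", lambda: None, lambda: None),
    ("execute", fail, lambda: None),
])
# log == ["compensated:plan"]: the planning step is rolled back
```

The key property is that compensation only covers steps that actually completed, which is exactly the ambiguity a crash otherwise leaves behind.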
| Activity Type | Retry Strategy | Rationale |
|---|---|---|
| LLM API call | Aggressive backoff, 10+ attempts | Rate limits are transient; restart cost is enormous |
| Idempotent tools (search, read) | Moderate backoff, 3–5 attempts | Safe to re-execute; failures are usually transient |
| Non-idempotent tools (write, deploy) | Limited, 1–2 attempts | Re-execution may cause side effects |
| Human notification | No retry | Fire-and-forget; don’t block the workflow |
| Long-running computation | Heartbeat + resume from checkpoint | Avoid restarting expensive work from scratch |
Production Case Study: OpenAI Codex
OpenAI’s Codex, their cloud-based coding agent that writes, tests, and iterates on code, uses Temporal as its core orchestration backbone. Will Wang, a software engineer on the Codex team, confirmed publicly that “Temporal is a critical part of the infrastructure powering Codex, responsible for executing our core control flows.” He described it as enabling the team to “easily reason about concurrency, correctness, and fault tolerance” while scaling a complicated distributed system.
Codex sessions run for 6+ hours on complex tasks. The entire agent loop (prompt construction, model inference, tool calls, result observation, loop back) runs as a Temporal Workflow. Each LLM call and tool execution is an Activity with its own retry policy and timeout. A single “turn” can involve hundreds of tool calls.
The Codex harness manages three conversation primitives: Items (atomic I/O units like messages or diffs), Turns (one unit of agent work from user input), and Threads (the durable container for an ongoing session, with persisted event history supporting resume, fork, and archive operations). Thread persistence – OpenAI describes threads as “durable containers” with “persisted event history” supporting reconnection – aligns directly with Temporal’s Event History.
Codex has a self-review pattern internally called the “Ralph Wiggum Loop”: the agent reviews its own changes, requests additional agent reviews, and iterates until all reviewers are satisfied. In Temporal terms, the review results arrive as signals, and the workflow decides whether to iterate or complete.
The relationship extends beyond Codex. In July 2025, OpenAI and Temporal launched a formal integration adding durable execution to the OpenAI Agents SDK: every agent invocation runs as a Temporal Activity, and orchestration runs as a Temporal Workflow. Temporal also processes millions of ChatGPT image-generation workflows. Venkat Venkataramani (OpenAI’s VP of App Infrastructure) reinforced this at Temporal’s Series D announcement: “Durable execution is a core requirement for modern AI systems.”
Framework Integrations
Temporal integrates with existing agent frameworks so teams don’t have to rewrite their agent logic from scratch. The pattern is the same across integrations: Temporal provides the durability layer, the framework provides the agent logic.
PydanticAI + Temporal
PydanticAI has first-class Temporal support via a TemporalAgent wrapper that preserves PydanticAI’s type-safety while offloading non-deterministic model requests and tool calls to Temporal activities. The orchestration logic lives in a deterministic workflow, and all I/O-bound tasks are automatically wrapped as activities.
One significant design decision: thread-based workflows. Each conversation thread gets its own Temporal workflow that persists for the lifetime of the conversation. This is more efficient than stateless approaches because the system only processes new messages, maintaining context within workflow state rather than re-sending the entire history for every inference.
```python
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from temporalio.client import Client

# Define the agent with PydanticAI's type-safe interface
support_agent = Agent(
    model=OpenAIModel("gpt-4o"),
    system_prompt="You are a customer support agent.",
    result_type=SupportResponse,  # Pydantic model for type-safe output
)

@support_agent.tool
async def lookup_order(ctx, order_id: str) -> OrderDetails:
    return await db.get_order(order_id)

# Wrap with Temporal for durability
from pydantic_ai_temporal import TemporalAgent

temporal_agent = TemporalAgent(
    agent=support_agent,
    client=await Client.connect("localhost:7233"),
    task_queue="support-agents",
)

# Each conversation gets a durable workflow
result = await temporal_agent.run(
    "What's the status of order #12345?",
    thread_id="customer-session-abc",
)
```
OpenAI Agents SDK + Temporal
The OpenAI Agents SDK integration centers on the activity_as_tool helper. This function automatically generates OpenAI-compatible tool schemas directly from Temporal activity signatures. The agent reasons about and invokes activities as tools, with every tool call backed by durable execution.
```typescript
import { activityAsTool, OpenAIAgentsPlugin } from "@temporalio/openai-agents";

// Temporal activities become tools the agent can call
const searchTool = activityAsTool(searchDocuments, {
  startToCloseTimeout: "30s",
  retryPolicy: { maximumAttempts: 3 },
});

const writeTool = activityAsTool(writeDocument, {
  startToCloseTimeout: "60s",
  retryPolicy: { maximumAttempts: 1 },
});

// Agent orchestration runs as a Temporal Workflow;
// each tool call is a durable Activity
const plugin = new OpenAIAgentsPlugin({
  client: temporalClient,
  taskQueue: "agent-workers",
  tools: [searchTool, writeTool],
});
```
Developers use the OpenAIAgentsPlugin to configure the Temporal client and worker, enabling integrated tracing that provides visibility through both the Temporal UI and OpenAI dashboards.
When Temporal Adds Unnecessary Complexity
Temporal is not always the right choice. Here’s where it adds more complexity than value:
- Simple agents: a single LLM call followed by one tool call doesn’t benefit from durable execution infrastructure. One comparison found that adding Temporal to a simple document indexing pipeline required “rearchitecting the app, splitting it into two services, adding a runtime dependency on a third service, and adding over 100 lines of code” where a lighter-weight approach achieved the same with 7 lines.
- Prototyping and experimentation: when you’re iterating on agent architecture, the determinism constraints and operational overhead slow you down.
- Sub-30-second agents: if the agent completes before infrastructure failures become likely, the cost of durable execution exceeds the benefit.
- Teams without infrastructure engineering capacity: self-hosted Temporal requires operating four services plus a database. If you don’t have the team to manage this, the operational burden may outweigh the reliability gains.
Trade-offs
Temporal’s guarantees come with trade-offs that shape day-to-day development experience.
Operational Complexity
Self-hosted Temporal requires deploying four independent services plus a persistence database (PostgreSQL, MySQL, or Cassandra) and optionally Elasticsearch for advanced visibility. This is not a single process with a single run command.
Learning Curve
Engineers must internalize: workflows vs activities, determinism rules, event history mechanics, signals, queries, updates, ContinueAsNew, versioning strategies, worker configuration.
The determinism constraint confuses newcomers, especially because LLMs are inherently non-deterministic. The resolution (LLM calls go in activities, not workflows) is simple once understood, but the documentation framing perpetuates the misconception.
Event History Limits
Each workflow execution is limited to 51,200 events or 50MB. An activity generates roughly 3 events. If activities return large LLM payloads (500KB+), the 50MB limit becomes binding well before the event count limit. The mitigation – ContinueAsNew, which atomically starts a fresh execution with carried-over state – works but adds architectural complexity. Teams building agents with many LLM calls must implement payload offloading (store large payloads in S3, pass references) and proactively manage history growth.
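The offloading pattern is small enough to sketch in full (hypothetical, with a dict standing in for S3):

```python
# Stand-in for an S3 bucket
blob_store: dict[str, bytes] = {}

def offload(payload: bytes, key: str) -> str:
    blob_store[key] = payload   # the activity writes the payload out-of-band
    return key                  # only this small reference enters event history

def resolve(ref: str) -> bytes:
    return blob_store[ref]      # downstream activities fetch by reference

# A 500KB LLM response costs the history only a short key (path is illustrative)
ref = offload(b"x" * 500_000, "runs/42/llm-response/turn-7")
assert len(ref) < 100
assert resolve(ref) == b"x" * 500_000
```

The activity result recorded by Temporal is the key, so history growth becomes a function of call count rather than payload size.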
Latency
Temporal Cloud’s minimum end-to-end latency is roughly 100ms per workflow step, with a single activity round-trip taking approximately 220ms. Local Activities save ~50ms per call but sacrifice heartbeating and independent retry capabilities. For agents where sub-second interactivity matters (chatbot-like interactions), this overhead accumulates across many steps. Agents with 50+ steps per interaction may see 5–10 seconds of pure infrastructure overhead.
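Back-of-envelope with the figures above, a 50-step interaction pays roughly:

```python
STEP_FLOOR_MS = 100      # approximate per-step floor on Temporal Cloud
ACTIVITY_RTT_MS = 220    # approximate single-activity round-trip
steps = 50

low = steps * STEP_FLOOR_MS / 1000     # 5.0 seconds
high = steps * ACTIVITY_RTT_MS / 1000  # 11.0 seconds of pure overhead
```

That range is infrastructure cost alone, before any model inference or tool runtime.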
Versioning
Code changes to workflow logic can cause non-determinism errors during replay of running workflows. If a running workflow was started with version 1 of the code and a worker running version 2 picks it up, the replay may produce different activity commands, causing a non-determinism exception. Temporal provides patching APIs and worker versioning, but patches accumulate in code and “need to be removed with extreme care.” Airbyte documented struggles with non-determinism exceptions, ultimately deciding to fail affected workflows rather than attempting recovery. Safe deployment requires replay testing against production event histories in CI.
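A toy illustration of why this fails (plain Python mirroring the replayer idea, not the SDK): version 2 of the workflow emits a different first command, so replay against a version-1 history diverges.

```python
def workflow_v1(call):
    return [call("call_llm"), call("execute_tool")]

def workflow_v2(call):
    # New code inserts a step that v1 histories know nothing about
    return [call("fetch_context"), call("call_llm"), call("execute_tool")]

def replay(workflow, history):
    cursor = iter(history)
    def call(activity):
        recorded_activity, result = next(cursor)
        if recorded_activity != activity:  # command mismatch during replay
            raise RuntimeError(
                f"non-determinism: history has {recorded_activity!r}, "
                f"code emitted {activity!r}")
        return result
    return workflow(call)

v1_history = [("call_llm", "answer"), ("execute_tool", "ok")]
assert replay(workflow_v1, v1_history) == ["answer", "ok"]  # clean replay

try:
    replay(workflow_v2, v1_history)  # new code against an old history
except RuntimeError as e:
    error = str(e)  # "non-determinism: history has 'call_llm', ..."
```

Temporal's patching APIs exist precisely to let the new code branch on which "version" a given history was recorded under.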
| Trade-off | Impact | Mitigation |
|---|---|---|
| Operational complexity | 4+ services to manage, or cloud costs | Temporal Cloud; start with dev server locally |
| Learning curve | 2–3 weeks for team onboarding | Start with simple workflows, add primitives incrementally |
| Event history limits | 51,200 events / 50MB cap per execution | ContinueAsNew + payload offloading to S3 |
| Latency overhead | ~100ms/step, ~220ms/activity round-trip | Local Activities for latency-sensitive paths |
| Versioning complexity | Non-determinism errors on code changes | Replay testing in CI, worker versioning |
Closing Thoughts
We covered a lot of ground here: the workflow/activity split, deterministic replay, server architecture, coordination primitives, retry strategies, and how OpenAI’s Codex team puts it all together.
The core design insight is the separation of deterministic orchestration from non-deterministic execution. Once you accept that split, replay-based recovery falls out as a consequence – and with it, most of the infrastructure problems we listed at the top of this post.
OpenAI, Replit, Block, NVIDIA, and others have independently converged on durable execution for their agent workloads. Temporal’s recent $300M Series D at a $5B valuation, with 380%+ year-over-year revenue growth driven substantially by AI workloads, suggests this is a real pattern. The company joined the Agentic AI Foundation (under the Linux Foundation) alongside Anthropic, OpenAI, and Block.
For most teams, the practical path is: prototype with something lighter (LangGraph, CrewAI), validate the agent architecture, and migrate when the agents run long enough and matter enough that you can’t afford to lose state on a crash. The operational investment is real, but so is the cost of rebuilding reliability from scratch.
References
Temporal Documentation. Core Concepts – Workflows, Activities, Workers. Temporal Technologies.
Temporal. Temporal for AI. Overview of Temporal’s AI-specific capabilities and customer stories.
Wang, W. (2025). Codex and Temporal Integration. Will Wang’s public statements on Codex’s use of Temporal for core control flows.
OpenAI. Harness Engineering: Leveraging Codex in an Agent-First World. OpenAI engineering blog on the Codex harness architecture.
Temporal. Build Durable AI Agents with Pydantic AI and Temporal. PydanticAI integration guide.
Temporal. Of Course You Can Build Dynamic AI Agents with Temporal. Temporal’s architecture for dynamic AI agent loops.
Quo (formerly OpenPhone). How We Built a Real-Time AI Voice Agent with Temporal. Production case study on Temporal primitives for voice agents.
Temporal. Production-Ready Agents with the OpenAI Agents SDK + Temporal. OpenAI Agents SDK integration announcement.
Temporal. AI Cookbook – OpenAI Agents SDK. Code examples and patterns for the OpenAI integration.
PydanticAI Documentation. Temporal Durable Execution. Official PydanticAI guide for Temporal integration.
Vanlightly, J. Explanations of deterministic replay mechanics and the determinism contract in Temporal workflows. Referenced via Temporal community resources.
Wang, X., et al. (2025). The OpenHands Software Agent SDK. arXiv preprint arXiv:2511.03690. The predecessor post’s primary reference for event sourcing comparison.