Chris McKenzie

Jun 05, 2025 • 10 min read

Building AI Agents with the OpenAI Agents SDK

A practical guide to creating goal-oriented, tool-using agents with OpenAI’s official framework


Agents, agents, agents. You’ve heard it before, and you’ll hear it again: 2025 is the year of the agent.

You’ve probably built a bot, wired it to a tool, and hit the wall when it came time to orchestrate multi-step logic or share memory across runs. That’s where agents come in.

So, what exactly is an agent? There’s no single definition everyone agrees on, but in practice, an agent is an AI system that uses an LLM to steer the flow of an app. Most agents pair a model with tools, some planning logic, and memory so they can accomplish a task on the user’s behalf.

“rather than arguing over which work to include or exclude as being a true AI agent, we can acknowledge that there are different degrees to which systems can be agentic.” ~ Andrew Ng

An agent has a few key characteristics:

  • Goal-Oriented: Agents are built to accomplish a specific task or set of tasks, like handling concert ticket refunds or generating reports.

  • LLM-Powered: At the heart of an agent is a language model that helps the agent make decisions and figure out how to reach its goal.

  • Tool-Enabled: To actually do anything useful (since LLMs alone can’t take action), agents need tools. If the goal is processing refunds, the agent needs access to the ticketing system to make it happen.

  • Memory-Aware: Many agents keep track of past interactions or task progress, helping them stay on track and improve over time.

This is just a quick overview. Disagree? Want more detail? Drop a comment.

OpenAI Agents SDK

OpenAI releasing their own SDK for building agents isn’t surprising — they’re a major player in the space, and let’s be honest, it is the ‘year of the agent.’

With so many agent frameworks already out there — LangGraph, CrewAI, MCP-Agent, or just a good old while loop — is it really worth learning another one? Short answer: yes. Longer answer: it depends. Here’s when it makes sense:

  • ✅ You’re already using OpenAI and want to build fast.

  • ❌ You need persistent memory, rich state management, or non-OpenAI models (and don’t want to fight the SDK).

Now, if I were smart, I’d save that take for the end to boost engagement. But I’d rather respect your time and lay out where I stand before jumping into the examples.

Pros

  • Tight OpenAI integration: If you’re already using OpenAI, it’s quick to get up and running.

  • Lightweight: Minimal and easy to learn — you’re not dealing with a ton of boilerplate.

  • Agent Handoff: Built-in support for delegating between agents. Great for keeping each agent focused on a specific task.

  • Guardrails: Native safety checks on both user input and agent output.

  • Tracing tools: Comes with built-in tracing that works out of the box with OpenAI’s Evaluations platform — and you can wire it up to other systems if needed.

Cons

  • Tight OpenAI integration: While it does support other providers, I found the third-party integrations a bit clunky, and the official docs don’t make it clear how to set them up.

  • Limited flexibility for complex workflows: No graph-based state control or persistent memory — consider LangGraph if you need that.

  • Early days: Incomplete documentation, and the API is still shifting.

Overall, I encourage you to check it out. The barrier to entry is low, and I would love for you to continue reading ;)

Hands On

Runner

The Runner handles the agent loop for you. You provide a starting agent and input — either a plain string (treated as a user message) or a list of structured inputs (same as you'd pass to the OpenAI Responses API). From there, it loops: the agent calls the model, the model responds, and based on that response, it either returns a final result, hands off to another agent, or calls tools and continues. The agent keeps looping until it returns the final output or hits the maximum number of turns. You can also stream the whole thing in real time by setting the stream option to true (it’s false by default), which is great if you want to show progress in the UI.
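
To make that concrete, here’s a minimal sketch of a single run. The agent name, instructions, and prompt are made up, and I’m assuming the maxTurns option maps to the turn limit described above:

import { Agent, run } from '@openai/agents';

const agent = new Agent({
    name: 'Haiku bot',
    instructions: 'Reply to every message with a haiku.',
});

async function main() {
    // A plain string input is treated as a user message.
    // maxTurns caps the agent loop; passing { stream: true } would stream events instead.
    const result = await run(agent, 'Write a haiku about recursion.', {
        maxTurns: 5,
    });
    console.log(result.finalOutput);
}

main().catch(console.error);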

Tools

Hosted Tools

OpenAI provides a few useful tools out of the box:

  • webSearchTool – for real-time web searches

  • fileSearchTool – for querying OpenAI vector stores

  • codeInterpreterTool – for sandboxed code execution

  • imageGenerationTool – for image generation

  • computerTool – for automating tasks on your machine

import { Agent, webSearchTool, fileSearchTool } from '@openai/agents';

const agent = new Agent({
    name: 'Personal Assistant',
    tools: [webSearchTool(), fileSearchTool('VS_ID')], // 'VS_ID' is a placeholder for your vector store ID
});

Function Tools

If you want to create your own custom tools (so you can, you know, get the weather), that’s very straightforward too.

import { tool } from '@openai/agents';
import { z } from 'zod';

const getWeatherTool = tool({
    name: 'get_weather',
    description: 'Get the weather for a given city',
    parameters: z.object({ city: z.string() }),
    async execute({ city }) {
        return `The weather in ${city} is sunny.`;
    },
});
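
To actually use it, register the tool on an agent just like the hosted tools. Here’s a quick follow-up sketch (the agent name and prompt are illustrative, and the top-level await assumes an ES module):

import { Agent, run } from '@openai/agents';

const weatherAgent = new Agent({
    name: 'Weather assistant',
    instructions: 'Answer weather questions using the get_weather tool.',
    tools: [getWeatherTool],
});

const result = await run(weatherAgent, 'What is the weather in Oslo?');
console.log(result.finalOutput);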

Agents as Tools

You can take it even further by using other agents as tools. Just call asTool().

import { Agent } from '@openai/agents';

const summarizer = new Agent({
    name: 'Summarizer',
    instructions: 'Generate a concise summary of the supplied text.',
});
const summarizerTool = summarizer.asTool({
    toolName: 'summarize_text',
    toolDescription: 'Generate a concise summary of the supplied text.',
});
const mainAgent = new Agent({
    name: 'Research assistant',
    tools: [summarizerTool],
});

MCP Servers

With OpenAI’s Agents SDK, you can integrate your agent with MCP servers, which is exciting for two reasons. First, MCP is a widely supported protocol for AI interactions. Second, it’s a strong signal that OpenAI is serious about interoperability — supporting a protocol originally created by a competitor. If you’re not familiar with MCP, you can learn more here.

import { Agent, run, MCPServerStdio } from "@openai/agents";

async function main() {
  // https://github.com/modelcontextprotocol/servers/tree/main/src/memory
  const mcpServer = new MCPServerStdio({
    name: "Memory Knowledge Graph Server, via npx",
    command: "npx",
    args: ["-y", "@modelcontextprotocol/server-memory"],
  });

  await mcpServer.connect();

  try {
    const agent = new Agent({
      name: "Assistant",
      instructions:
        "Use the tools to read and write to the knowledge graph, providing information and saving data. Whatever is in the knowledge graph is true.",
      mcpServers: [mcpServer],
    });
    let result = await run(
      agent,
      "remember that the capital of United States is New York City."
    );
    console.log(result.finalOutput ?? result.output ?? result);
    result = await run(
      agent,
      "Search knowledge graph for the capital of United States?"
    );
    console.log(result.finalOutput ?? result.output ?? result);
  } finally {
    await mcpServer.close();
  }
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});

Agent Handoffs

Like any application, as your agent grows in complexity, you’ll probably want to break it into smaller, more focused agents designed to handle specific tasks. For example, say you’re building a customer service agent — instead of trying to make one agent handle everything (troubleshooting, purchases, refunds, order status, etc.), you’ll likely benefit from creating dedicated agents for each task and coordinating them through a main agent.

import { Agent, handoff } from '@openai/agents';

const troubleshootingAgent = new Agent({ name: 'Troubleshooting agent' });
const refundAgent = new Agent({ name: 'Refund agent' });
const triageAgent = Agent.create({
    name: 'Triage agent',
    handoffs: [troubleshootingAgent, handoff(refundAgent)],
});
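
At runtime you still just call run() on the triage agent, and the model decides when to delegate. A minimal, illustrative follow-up (the prompt is made up):

import { run } from '@openai/agents';

// The triage agent reads the message and hands off to the refund agent when appropriate.
const result = await run(
    triageAgent,
    'I bought tickets to the wrong show and need a refund.'
);
console.log(result.finalOutput);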

Tracing

I won’t go too deep here — it really deserves its own article. However, tracing your agent’s behavior is critical to understanding what is and isn’t working. There are plenty of options — from LangSmith and LangFuse to OpenAI’s built-in tools. I’m personally a fan of LangFuse and LangSmith, but if you’re already deeply integrated into the OpenAI ecosystem, it’s worth giving their tracing product a try. Ultimately, it’s up to you which platform to use — but it’s important that you pick one. If you want to hook in your own tracing platform, follow the docs for custom trace processors.

Guardrails

Guardrails run alongside your agents and help catch bad input before it hits your expensive model. Say you’ve got a smart (but slow and costly) model handling customer travel requests — you don’t want it generating code. A cheap, fast model can act as a guardrail, flagging misuse early and skipping the expensive call. Before going live, you should add some kind of protection to prevent abuse — both to keep costs down and to avoid brand damage from misuse.

import { Agent, run, InputGuardrailTripwireTriggered } from "@openai/agents";
import { z } from "zod";
import "dotenv/config";

const inputGuardrailAgent = new Agent({
  name: "Input guardrail check",
  instructions: "Check if the input includes a request to generate code.",
  outputType: z.object({
    codeGenerationRequest: z.boolean(),
    reasoning: z.string(),
  }),
});

const codeInputGuardrail = {
  name: "Code Request Guardrail",
  execute: async ({ input, context }) => {
    const result = await run(inputGuardrailAgent, input, { context });
    return {
      outputInfo: result.finalOutput,
      tripwireTriggered: result.finalOutput?.codeGenerationRequest ?? false,
    };
  },
};

const outputGuardrailAgent = new Agent({
  name: "Output guardrail check",
  instructions: "Check if the output includes code.",
  outputType: z.object({ reasoning: z.string(), isCode: z.boolean() }),
});

const codeOutputGuardrail = {
  name: "Code Output Guardrail",
  async execute({ agentOutput, context }) {
    // Note: in the docs it says that agentOutput is an object with a `response` property.
    // In version 0.0.2, this is not the case.
    const result = await run(outputGuardrailAgent, agentOutput, { context });
    return {
      outputInfo: result.finalOutput,
      tripwireTriggered: result.finalOutput?.isCode ?? false,
    };
  },
};

const agent = new Agent({
  name: "Travel agent",
  instructions:
    "You are a travel agent. You help the user with their travel plans.",
  inputGuardrails: [codeInputGuardrail],
  outputGuardrails: [codeOutputGuardrail],
});

async function main() {
  try {
    // This won't trip the guardrail
    const result = await run(agent, "Tell me about cheapest flights to Paris");
    console.log(result.finalOutput);
  } catch (e) {
    if (e instanceof InputGuardrailTripwireTriggered) {
      console.log("Code guardrail tripped");
    } else {
      console.error("Error running agent:", e);
    }
  }

  try {
    // This will trip the input guardrail
    await run(agent, "Write code that fetches flights to Paris");
  } catch (e) {
    if (e instanceof InputGuardrailTripwireTriggered) {
      console.log("Code guardrail tripped");
    }
  }
}

main().catch(console.error);

Models

The Agents SDK is a strong choice if you need a simple framework to get something up and running quickly — especially if you’re using OpenAI’s models. Within the framework, only OpenAI technology is treated as a first-class citizen.

You can still use other model providers, but the implementation is clunky, requiring you to map the provider to OpenAI’s API. Fortunately, OpenAI released a separate package called agents-extensions, which smooths over the pain points of working with custom providers. It lets you easily create a compatible model for any provider supported by Vercel’s AI SDK.

// You'll need to install: npm install @openai/agents-extensions @ai-sdk/anthropic
import { Agent, run } from "@openai/agents";
import { aisdk } from "@openai/agents-extensions";
import { anthropic } from "@ai-sdk/anthropic";
const model = aisdk(anthropic("claude-3-haiku-20240307"));

async function main() {
  const agent = new Agent({
    name: "Creative writer",
    model: model,
  });
  let result = await run(
    agent,
    "Write a short story about a robot learning to love."
  );
  console.log(result.finalOutput ?? result.output ?? result);
}

main().catch((error) => {
  console.error("Error running the agent:", error);
});

Final Thoughts

I kept this focused on what it takes to get something working fast. But once that’s in place, it’s worth digging deeper — build your own agents, hook in voice, add streaming, manage context, whatever your app needs. The surface is simple, but you don’t have to stay there.

If you’re already in the OpenAI ecosystem and want to ship something fast, the Agents SDK is a great option. It handles the boring parts — loops, handoffs, guardrails, tracing — so you can focus on what actually makes your agent useful. Whether you’re prototyping or going to production, it’s a solid place to start.


To stay connected and share your journey, feel free to reach out through the following channels:

  • 👨‍💼 LinkedIn: Join me for more insights into AI development and tech innovations.

  • 🤖 JavaScript + AI: Join the JavaScript and AI group and share what you’re working on.

  • 💻 GitHub: Explore my projects and contribute to ongoing work.

  • 📚 Medium: Follow my articles for more in-depth discussions on LangSmith, LangChain, and other AI technologies.
