OpenAI · Agents · San Francisco

Researcher, Agentic Post-Training

Comp: $295K – $445K

Classified Tasks (30)

Automate 0% · Augment 73% · Human-Only 27%

Augment (22)

AI assists, human decides

Create frontier agent systems for deployment in products.

technical

Train models that power agentic behavior across products like Codex, ChatGPT, and the API.

technical

Build training signals that teach desired agent abilities.

technical

Run experiments to develop and validate new agent capabilities.

analytical

Build datasets, environments, graders, training methods, and feedback loops that shape agent behavior.

technical

Carry capabilities through major training runs and integrate them into production products.

operational

Improve the capabilities, reliability, and product fit of agentic models.

technical

Build infrastructure that makes large training runs faster and more trustworthy.

technical

Create evaluations that reveal model failures and gaps.

analytical

Write and debug code for models and agent harnesses.

technical

Integrate and test tool use and function-calling capabilities in agents.

technical

Enable agents to operate computers and perform actions in external environments.

technical

Enable and coordinate multi-agent collaboration and interaction.

technical

Implement mechanisms for agents to complete valuable work on behalf of users.

technical

Measure and analyze whether model changes succeeded using metrics and diagnostics.

analytical

Ship model improvements into customer-facing products.

operational

Design and run experiments to improve agent behavior in coding, tool use, function calling, computer use, multi-agent collaboration, long-horizon tasks, factuality, instruction following, and calibrated reasoning.

analytical

Own and improve post-training stack components, including reinforcement learning, data pipelines, graders, reward signals, evaluations, diagnostics, and model-behavior analysis.

technical

Build evals and environments that expose failure modes and convert failures into training data, product fixes, or new research directions.

analytical

Develop early-training and alignment interventions such as data mixtures, objectives, synthetic data, and evaluation loops that shape downstream agent behavior.

technical

Improve large-scale training and launch processes, targeting experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness.

operational

Debug hard failures in shipped or near-shipped models and convert qualitative behavior into concrete hypotheses, experiments, and fixes.

technical

Human-Only (8)

Requires human judgment

Define next-generation agent capabilities and specifications.

creative

Own a research direction and drive it from conception through execution.

leadership

Drive capabilities end-to-end from idea through experimentation, integration, and launch.

leadership

Solve ambiguous capability problems across research, engineering, data, evaluations, and product.

creative

Collaborate with researchers, engineers, product, infrastructure, and safety partners to select changes for major model runs.

communication

Partner with product teams (e.g., Codex, API/platform, ChatGPT) to translate user signals into model improvements.

communication

Assess and decide which integrations, capabilities, and fixes are ready for inclusion in major model runs.

leadership

Lead cross-functional projects that span model training, product infrastructure, and the production agent harness, including multi-agent systems and production-like environment training.

leadership

Job description

# **About the Team**

The Agent Post-Training team creates the frontier agents OpenAI ships to the world. We are training the models behind our agents in Codex, ChatGPT, the API, and other frontier products: persistent, proactive intelligence that can operate computers, collaborate with people and other agents, and expand what people and organizations can imagine, attempt, and achieve.

We define what the next generation of agents should be able to do, build the training signal that teaches those abilities, and run the experiments that make them real. Our work spans coding, tool use, computer use, multi-agent coordination, long-horizon execution, factuality, instruction following, calibrated reasoning, and taste.

Our team is where new model capabilities get made. We build the data, environments, graders, training methods, and feedback loops that shape what OpenAI's next agents can do, then carry those capabilities through major training runs and into the products people use.

# **About the Role**

As a member of Agent Post-Training, you will improve the capabilities, reliability, and product fit of OpenAI's agentic models. You might own a research direction, build the infrastructure that makes large training runs faster and more trustworthy, create evals that reveal where models fail, or drive a capability from an idea through experimentation, integration, and launch.

This role is intentionally broad. The strongest candidates are not defined by one method or subfield; they are people who can take an ambiguous capability problem and make progress across research, engineering, data, evals, and product. You should be excited to work on models that act in the world: writing and debugging code, using tools, calling functions, operating computers, collaborating with other agents, and completing valuable work on behalf of users.

You will work with researchers, engineers, product teams, infrastructure teams, and safety/alignment partners to decide what should go into major model runs, measure whether it worked, and ship improvements into products used by real people. This is a high-agency role for people who want their work to land directly in frontier models.

# **In this role, you might**

* Design and run experiments that improve agentic model behavior across coding, tool use, function calling, computer use, multi-agent collaboration, long-horizon tasks, factuality, instruction following, and calibrated reasoning.
* Own end-to-end improvements to the post-training stack, including RL, data pipelines, graders, reward signals, evals, diagnostics, and model-behavior analysis.
* Build evals and environments that expose the next set of model failures, then turn those failures into training data, product fixes, or new research directions.
* Partner with Codex, API/platform, and ChatGPT product teams to understand what users need and translate product signal into model improvements.
* Work on early-training and alignment interventions, including data mixtures, objectives, synthetic data, and eval loops that shape downstream agent behavior.
* Help decide which integrations, capabilities, and fixes are ready for inclusion in major model runs.
* Improve the machinery for large-scale training and launch: experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness.
* Take on cross-functional projects that touch model training, product infrastructure, and the production agent harness, such as multi-agent systems or training directly against production-like environments.
* Debug hard failures in shipped or near-shipped models and turn messy qualitative behavior into concrete hypotheses, experiments, and fixes.

# **You might thrive in this role if you**

* Have strong technical fundamentals in machine learning,

Source: OpenAI careers · scraped 2026-05-22
