OpenAI · Agents · San Francisco
Researcher, Agentic Post-Training
Comp: $295K – $445K
Classified Tasks (30)
Automate 0% · Augment 73% · Human-Only 27%
Augment (22)
AI assists, human decides
Create frontier agent systems for deployment in products.
technical
Train models that power agentic behavior across products like Codex, ChatGPT, and the API.
technical
Build training signals that teach desired agent abilities.
technical
Run experiments to develop and validate new agent capabilities.
analytical
Build datasets, environments, graders, training methods, and feedback loops that shape agent behavior.
technical
Carry capabilities through major training runs and integrate them into production products.
operational
Improve the capabilities, reliability, and product fit of agentic models.
technical
Build infrastructure to accelerate and increase trustworthiness of large training runs.
technical
Create evaluations that reveal model failures and gaps.
analytical
Write and debug code for models and agent harnesses.
technical
Integrate and test tool use and function-calling capabilities in agents.
technical
Enable agents to operate computers and perform actions in external environments.
technical
Enable and coordinate multi-agent collaboration and interaction.
technical
Implement mechanisms for agents to complete valuable work on behalf of users.
technical
Measure and analyze whether model changes succeeded using metrics and diagnostics.
analytical
Ship model improvements into customer-facing products.
operational
Design and run experiments to improve agent behavior in coding, tool use, function calling, computer use, multi-agent collaboration, long-horizon tasks, factuality, instruction following, and calibrated reasoning.
analytical
Own and improve post-training stack components, including reinforcement learning, data pipelines, graders, reward signals, evaluations, diagnostics, and model-behavior analysis.
technical
Build evals and environments that expose failure modes and convert failures into training data, product fixes, or new research directions.
analytical
Develop early-training and alignment interventions such as data mixtures, objectives, synthetic data, and evaluation loops that shape downstream agent behavior.
technical
Improve large-scale training and launch processes targeting experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness.
operational
Debug hard failures in shipped or near-shipped models and convert qualitative behavior into concrete hypotheses, experiments, and fixes.
technical
Human-Only (8)
Requires human judgment
Define next-generation agent capabilities and specifications.
creative
Own a research direction and drive it from conception through execution.
leadership
Drive capabilities end-to-end from idea through experimentation, integration, and launch.
leadership
Solve ambiguous capability problems across research, engineering, data, evaluations, and product.
creative
Collaborate with researchers, engineers, product, infrastructure, and safety partners to select changes for major model runs.
communication
Partner with product teams (e.g., Codex, API/platform, ChatGPT) to translate user signals into model improvements.
communication
Assess and decide which integrations, capabilities, and fixes are ready for inclusion in major model runs.
leadership
Lead cross-functional projects that span model training, product infrastructure, and the production agent harness, including multi-agent systems and production-like environment training.
leadership
Job description
## Researcher, Agentic Post-Training

Agents - San Francisco

# **About the Team**

The Agent Post-Training team creates the frontier agents OpenAI ships to the world. We are training the models behind our agents in Codex, ChatGPT, the API, and other frontier products: persistent, proactive intelligence that can operate computers, collaborate with people and other agents, and expand what people and organizations can imagine, attempt, and achieve.

We define what the next generation of agents should be able to do, build the training signal that teaches those abilities, and run the experiments that make them real. Our work spans coding, tool use, computer use, multi-agent coordination, long-horizon execution, factuality, instruction following, calibrated reasoning, and taste.

Our team is where new model capabilities get made. We build the data, environments, graders, training methods, and feedback loops that shape what OpenAI's next agents can do, then carry those capabilities through major training runs and into the products people use.

# **About the Role**

As a member of Agent Post-Training, you will improve the capabilities, reliability, and product fit of OpenAI's agentic models. You might own a research direction, build the infrastructure that makes large training runs faster and more trustworthy, create evals that reveal where models fail, or drive a capability from an idea through experimentation, integration, and launch.

This role is intentionally broad. The strongest candidates are not defined by one method or subfield; they are people who can take an ambiguous capability problem and make progress across research, engineering, data, evals, and product. You should be excited to work on models that act in the world: writing and debugging code, using tools, calling functions, operating computers, collaborating with other agents, and completing valuable work on behalf of users.

You will work with researchers, engineers, product teams, infrastructure teams, and safety/alignment partners to decide what should go into major model runs, measure whether it worked, and ship improvements into products used by real people. This is a high-agency role for people who want their work to land directly in frontier models.

# **In this role, you might**

* Design and run experiments that improve agentic model behavior across coding, tool use, function calling, computer use, multi-agent collaboration, long-horizon tasks, factuality, instruction following, and calibrated reasoning.
* Own end-to-end improvements to the post-training stack, including RL, data pipelines, graders, reward signals, evals, diagnostics, and model-behavior analysis.
* Build evals and environments that expose the next set of model failures, then turn those failures into training data, product fixes, or new research directions.
* Partner with Codex, API/platform, and ChatGPT product teams to understand what users need and translate product signal into model improvements.
* Work on early-training and alignment interventions, including data mixtures, objectives, synthetic data, and eval loops that shape downstream agent behavior.
* Help decide which integrations, capabilities, and fixes are ready for inclusion in major model runs.
* Improve the machinery for large-scale training and launch: experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness.
* Take on cross-functional projects that touch model training, product infrastructure, and the production agent harness, such as multi-agent systems or training directly against production-like environments.
* Debug hard failures in shipped or near-shipped models and turn messy qualitative behavior into concrete hypotheses, experiments, and fixes.

# **You might thrive in this role if you**

* Have strong technical fundamentals in machine learning,