Nuvepro - Task Intelligence for the Enterprise
OpenAI

Ai Systems Engineer Codex Agents San Francisco

Comp$230K – $385K

Classified Tasks (19)

Automate 0%Augment 89%Human-Only 11%

Augment (17)

AI assists, human decides

Build the agent harness that turns model capability into real-world action.

technical

Design and implement prompting and interpretation pipelines for model outputs.

technical

Feed production experience and telemetry back into models and agent behavior for improvement.

analytical

Operate and develop systems across the stack including harness, model interaction, inference, sandboxed execution, orchestration, evals, and production reliability.

technical

Build AI systems and infrastructure to make Codex agents dependable in production.

technical

Debug Codex behavior end-to-end across the harness, model behavior, inference/runtime stack, GPU fleet, and product surface.

technical

Run experiments and ablations across model, system prompts, and harness stack.

analytical

Build frameworks and tooling for assessing production agent performance.

technical

Convert messy production failures into durable fixes and improvements.

operational

Design and build the core agent execution loop that enables agents to interpret outputs, use tools, and execute code.

technical

Implement capabilities that let agents complete long-horizon tasks safely.

technical

Build sandboxing, isolation, orchestration, state, and workflow infrastructure for agents in real development environments.

technical

Develop evaluation, experimentation, and debugging systems that distinguish harness issues, model behavior issues, inference/runtime problems, and product failures.

analytical

Run ablations across prompts, model-facing interfaces, context construction, tool-use strategies, and harness behavior to improve solve rate, reliability, latency, and cost.

analytical

Improve observability, profiling, and diagnostics across the agent stack, including backend systems, inference, GPUs, and fleet capacity.

technical

Make the harness trainable, measurable, and usable to improve frontier agentic models in collaboration with research.

technical

Build shared primitives and libraries to make Codex faster, safer, more reliable, and easier for internal teams and open-source users to build on.

technical

Human-Only (2)

Requires human judgment

Execute agent actions safely in real environments.

operational

Collaborate with research, infrastructure, and product teams to design agent harness capabilities.

communication

Job description

--- BEGIN UNTRUSTED EXTERNAL CONTENT (source: https://openai.com/careers/ai-systems-engineer-codex-agents-san-francisco/) --- Skip to main contentResearchProductsBusinessDevelopersCompanyFoundation(opens in a new window)Log inTry ChatGPT(opens in a new window)ResearchProductsBusinessDevelopersCompanyFoundation(opens in a new window)AI Systems Engineer, Codex Agents | OpenAICareersAI Systems Engineer, Codex AgentsCodex - Engineering - San FranciscoApply now(opens in a new window)AI Systems Engineer - Codex Core AgentsAbout The TeamThe Codex Core Agents team builds the agent harness that turns model capability into real-world action. We own the systems around the model: prompting and interpreting model outputs, executing actions safely in real environments, and feeding production experience back into better models and better agent behavior.This team sits close to research and works across the stack: harness, model interaction, inference, sandboxed execution, orchestration, evals, production reliability, and the performance envelope around tokens, latency, cost, capacity, and quality. The harness is open source and increasingly part of how models are trained and evaluated, making this one of the highest-leverage layers in Codex.About The RoleWe’re looking for engineers to build the AI systems that make Codex agents dependable in production. The ideal candidate is an agent-systems builder: hands-on across low-level systems and ML workflows, able to debug Codex behavior end to end across the harness, model behavior, inference/runtime stack, GPU fleet, and product surface.You’ll work with research, infrastructure, and product to design agent harness capabilities, run experiments and ablations across the model + system prompt + harness stack, build frameworks for assessing production agent performance, and turn messy failures into durable improvements.What You’ll DoDesign and build the core agent harness and execution loop that lets Codex agents interpret model outputs, use tools, execute code, and complete long-horizon tasks safely.Build sandboxing, isolation, orchestration, state, and workflow infrastructure for agents operating in real development environments.Develop evaluation, experimentation, and debugging systems that distinguish harness issues, model behavior, inference/runtime issues, and product failures.Run ablations across prompts, model-facing interfaces, context construction, tool-use strategies, and harness behavior to improve solve rate, reliability, latency, and cost.Improve observability, profiling, and diagnostics across the agent stack, from backend systems to inference, GPUs, and fleet capacity.Work closely with research to make the harness trainable, measurable, and useful for improving frontier agentic models.Build shared primitives that make Codex faster, safer, more reliable, and easier for other teams and open-source users to build on.You Might Be A Good Fit If YouHave built or operated production systems in distributed systems, infrastructure, developer tooling, sandboxing, virtualization, cloud platforms, or ML systems.Enjoy working across layers: Rust systems code, Python configuration layers, APIs, agent orchestration, evals, logs/traces, inference behavior, runtime constraints, and user outcomes.Have hands-on experience with LLM applications, coding agents, evals, model deployment, inference, compiler/runtime performance, or developer platforms.Care deeply about reliability, safety, performance, debuggability, and clean abstractions.Can debug from evidence and move quickly from ambiguous production failures to practical, durable fixes.Want to work close to research while still shipping changes to productionStill write meaningful code, show strong ownership, and can lead scoped or multi-team AI systems work.Bonus PointsDeep Rust, systems, sandboxing, isolation, or low-level platform experience.Experience with coding agents, agent harnesses, tool-using LLM systems, model evals, or post-training feedback
Source: OpenAI careers · scraped 2026-05-22
Apply at OpenAI