AI Systems Engineer, Codex Agents, San Francisco
Classified Tasks (19)
Augment (17)
AI assists, human decides
Build the agent harness that turns model capability into real-world action.
technical
Design and implement prompting and interpretation pipelines for model outputs.
technical
Feed production experience and telemetry back into models and agent behavior for improvement.
analytical
Operate and develop systems across the stack, including the harness, model interaction, inference, sandboxed execution, orchestration, evals, and production reliability.
technical
Build AI systems and infrastructure to make Codex agents dependable in production.
technical
Debug Codex behavior end-to-end across the harness, model behavior, inference/runtime stack, GPU fleet, and product surface.
technical
Run experiments and ablations across the model, system prompts, and harness stack.
analytical
Build frameworks and tooling for assessing production agent performance.
technical
Convert messy production failures into durable fixes and improvements.
operational
Design and build the core agent execution loop that enables agents to interpret outputs, use tools, and execute code.
technical
Implement capabilities that let agents complete long-horizon tasks safely.
technical
Build sandboxing, isolation, orchestration, state, and workflow infrastructure for agents in real development environments.
technical
Develop evaluation, experimentation, and debugging systems that distinguish harness issues, model behavior issues, inference/runtime problems, and product failures.
analytical
Run ablations across prompts, model-facing interfaces, context construction, tool-use strategies, and harness behavior to improve solve rate, reliability, latency, and cost.
analytical
Improve observability, profiling, and diagnostics across the agent stack, including backend systems, inference, GPUs, and fleet capacity.
technical
Make the harness trainable, measurable, and usable for improving frontier agentic models, in collaboration with research.
technical
Build shared primitives and libraries to make Codex faster, safer, more reliable, and easier for internal teams and open-source users to build on.
technical
Human-Only (2)
Requires human judgment
Execute agent actions safely in real environments.
operational
Collaborate with research, infrastructure, and product teams to design agent harness capabilities.
communication