Tech Lead, Deployment & Operations — Custom Infrastructure at OpenAI — task breakdown

## Tech Lead, Deployment & Operations — Custom Infrastructure

Hardware - San Francisco

## **About the Team**

OpenAI’s Hardware organization develops silicon and system-level solutions designed for the unique demands of advanced AI workloads. The team is responsible for building the next generation of AI-native silicon while working closely with software and research partners to co-design hardware tightly integrated with AI models. In addition to delivering production-grade silicon for OpenAI’s supercomputing infrastructure, the team also creates custom design tools and methodologies that accelerate innovation and enable hardware optimized specifically for AI.

## **About the Role**

We are seeking a Technical Lead to head deployment and operations for OpenAI’s Silicon & Systems team. This person will be the Directly Responsible Individual for bringing OpenAI’s custom silicon and associated systems into data center environments, ensuring successful deployment, bring-up, validation, operational readiness, and ongoing reliability at scale.

This role sits at the intersection of silicon, systems, infrastructure, data center operations, and software. You will lead a team focused on taking new hardware platforms from lab validation into production data center deployment. You will be responsible for building the operational processes, technical workflows, tooling, and cross-functional alignment required to deploy and operate custom AI hardware reliably in OpenAI’s supercomputing infrastructure.

The ideal candidate is both a strong leader and a deeply technical operator. You should be comfortable staying close to the technical details of hardware bring-up, fleet deployment, debugging, system validation, data center integration, and production operations. This role requires strong execution, excellent cross-functional judgment, and the ability to drive clarity in ambiguous, fast-moving environments.

## **In this role, you will:**

* Lead a team responsible for deployment and operations of OpenAI’s custom silicon and systems in data center environments
* Own the path from hardware bring-up and validation through production deployment, operational readiness, and sustained fleet support
* Partner closely with silicon, systems, software, infrastructure, networking, data center, supply chain, and external partner teams to ensure successful deployment at scale
* Define deployment processes, operational playbooks, technical readiness criteria, escalation paths, and reliability practices for new hardware platforms
* Drive cross-functional execution across lab bring-up, rack/system integration, data center deployment, fleet monitoring, debugging, and issue resolution
* Stay hands-on technically through architecture reviews, deployment planning, failure analysis, operational debugging, and critical system-level decision-making
* Identify gaps in tooling, observability, automation, validation coverage, and operational processes, and build plans to close them
* Establish clear metrics for deployment readiness, reliability, performance, maintainability, and operational health
* Build a strong engineering culture grounded in ownership, technical rigor, operational excellence, and high-velocity execution
* Ensure OpenAI’s custom hardware platforms can be deployed and operated reliably, repeatably, and safely at scale
* Be a contributor and technical driver for the architecture and design of future ML systems

## **You might thrive in this role if you:**

* Enjoy mentoring and developing engineers while staying deeply engaged in technical execution
* Are excited by the challenge of bringing new custom hardware platforms into real-world production data center environments
* Can operate across silicon, systems, software, infrastructure, and data center operations
* Are comfortable l

Classified Tasks (14)

Augment (6)

Human-Only (8)

Job description