xAI · Model · Palo Alto, CA
Member of Technical Staff - RL Infrastructure
Comp: $180,000 – $440,000
Classified Tasks (19)
Automate 5% · Augment 89% · Human-Only 5%
Automate (1)
Fully handled by AI agents
Track and report model performance metrics on newly onboarded evaluation datasets
analytical
Augment (17)
AI assists, human decides
Create and maintain robust data pipelines for large-scale training and evaluation
technical
Design and implement comprehensive evaluation suites to benchmark large language models
technical
Build automation frameworks to increase researcher and engineer productivity
technical
Design and implement efficient, robust environments for agentic models to perform actions
technical
Add features to the evaluation framework to streamline researcher workflows and increase observability
technical
Onboard open-source evaluation datasets into the internal evaluation framework, including ingestion and validation
technical
Standardize preprocessing pipelines to prepare datasets for large-scale reinforcement learning training
technical
Create data augmentation pipelines to generate additional training data and integrate them into training workflows
technical
Build high-performance sandboxes, virtual machines, and simulations for agent testing
technical
Develop full-stack applications for automating workflows and visualizing data and metrics
technical
Improve alerts, metrics, and error handling for large-scale reinforcement learning jobs
operational
Refactor agent, data, evaluation, and training frameworks to improve modularity and maintainability
technical
Write unit tests to validate code correctness and support rapid development cycles
technical
Develop and maintain CI/CD pipelines to support rapid iteration from research to production
technical
Instrument observability and monitoring systems to track model performance and evaluation results
operational
Automate common workflows to reduce manual intervention and accelerate experimentation
operational
Prepare and validate datasets requiring complex preprocessing for large-scale RL training
technical
Human-Only (1)
Requires human judgment
Design operational procedures and coding standards to streamline transition from small-scale experiments to large-scale RL training
operational
Job description
ABOUT xAI: xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills and should be able to share knowledge with their teammates concisely and accurately.
ABOUT THE ROLE: xAI is seeking experienced software engineers to create robust data pipelines, comprehensive evaluations for benchmarking LLMs, and automation frameworks that increase the productivity of researchers and engineers. Typical problems you will deal with include the following:
We have a new agentic model capability that we’d like to improve. How do we design an efficient and robust environment for the agent to perform actions in?
Evaluations and observability are a core part of knowing what we need to improve in our models. What new features can we add to our evaluation framework to ease the workflow of researchers and engineers and increase observability?
A new open-source evaluation dataset has been released, and researchers would like to track our models’ performance on it. How should we onboard it into our internal evaluation framework?
Datasets have been collected that require complex preprocessing to prepare them for large-scale RL training. How do we standardize our preprocessing pipelines to minimize dataset onboarding time?
A researcher on the team has an idea for augmenting a dataset to produce additional training data. How should we go about creating the data augmentation pipeline?
RESPONSIBILITIES:
Creating and maintaining frameworks for agent, data, and model evaluation tasks.
Building environments for AI agents.
Building tools for automating common workflows.
Improving alerts, metrics, and error handling on large-scale RL jobs.
Refactoring existing agent, data, eval, and training frameworks for better modularity.
Designing operational procedures and coding standards to streamline the transition from small-scale experimentation to large-scale RL training.
Writing unit tests and CI/CD frameworks to support rapid development cycles.
BASIC QUALIFICATIONS:
Experience building and maintaining frameworks that are used by many engineers.
Experience building high-performance sandboxes, virtual machines, and simulations.
Experience building full-stack apps for automating workflows and data visualization.
Experience with rapid research-to-production iteration cycles.
Experience in test automation and CI/CD.
COMPENSATION AND BENEFITS: $180,000 – $440,000 USD. Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short- and long-term disability insurance, life insurance, and various other discounts and perks. xAI is an equal opportunity employer. For details on data processing, view