
Member of Technical Staff, ML Infra, AGI

Amazon

Software Engineering, IT, Data Science
San Francisco, CA, USA
Posted on Nov 25, 2025

Description

Are you interested in a unique opportunity to advance the accuracy and efficiency of Artificial General Intelligence (AGI) systems? If so, you're in the right place! We are the AGI Autonomy organization, and we are looking for a driven and talented Member of Technical Staff to join us in building state-of-the-art agents.

Our lab is a small, talent-dense team with the resources and scale of Amazon. Each team in the lab has the autonomy to move fast and the long-term commitment to pursue high-risk, high-payoff research. We’re entering an exciting new era where agents can redefine what AI makes possible. We’d love for you to join our lab and build it from the ground up!


Key job responsibilities
* Design, build, and maintain the compute platform that powers all AI research at the SF AI Lab, managing large-scale GPU pools and ensuring optimal resource utilization
* Partner directly with research scientists to understand experimental requirements and develop infrastructure solutions that accelerate research velocity
* Implement and maintain robust security controls and hardening measures while enabling researcher productivity and flexibility
* Modernize and scale existing infrastructure by converting manual deployments into reproducible Infrastructure as Code using AWS CDK
* Optimize system performance across multiple GPU architectures, becoming an expert in extracting maximum computational efficiency
* Design and implement monitoring, orchestration, and automation solutions for GPU workloads at scale
* Ensure infrastructure is compliant with Amazon security standards while creatively solving for research-specific requirements
* Collaborate with AWS teams to leverage and influence cloud services that support AI workloads
* Build distributed systems infrastructure, including Kubernetes-based orchestration, to support multi-tenant research environments
* Serve as the bridge between traditional systems engineering and ML infrastructure, bringing enterprise-grade reliability to research computing

About the team
This role is part of the foundational infrastructure team at the SF AI Lab, responsible for the platform that enables all research across the organization. Our team serves as the critical link between Amazon's enterprise infrastructure and the Lab's research needs. We are experts in performance optimization, systems architecture, and creative problem-solving—finding ways to push the boundaries of what's possible while maintaining security and reliability standards.

We work closely with research scientists, understanding their experimental needs and translating them into robust, scalable infrastructure solutions. Our team has deep expertise in ML framework internals and GPU optimization, but we're also pragmatic systems engineers who build traditional infrastructure with enterprise-grade quality. We value engineers who can balance research velocity with operational excellence, and who bring curiosity about ML while maintaining strong fundamentals in systems engineering.

This is a small, high-impact team where your work directly enables breakthrough AI research. You'll have the opportunity to work with some of the most advanced AI infrastructure in the world while building the skills that define the future of ML systems engineering.