Software Development Engineer, EC2 UltraServer Delivery Team

Amazon

Software Engineering
Seattle, WA, USA
Posted on Apr 2, 2026

Description

The Software Development Engineer II will design, build, and maintain cloud-based provisioning workflows for NVIDIA GB200/GB300 UltraServers, orchestrating complex multi-asset systems from infrastructure handoff to production delivery. This role requires expertise in AWS services, system architecture, and cross-functional collaboration with Manufacturing, Operations, and Program Management teams to deliver AI/ML infrastructure.

Key job responsibilities
The Software Development Engineer (SDE II) on the EC2 UltraServer Delivery team is responsible for delivering production-ready GB200 and GB300 UltraServers to customers by orchestrating complex multi-asset provisioning workflows. The core responsibilities are as follows:

System Design & Architecture

* Design and architect solutions that span Manufacturing, Operations, and Program Management
* Work in environments where the technology strategy is defined but the solution design is not
* Build solutions that are stable, logical, testable, and efficient with the ability to independently make trade-off decisions
* Investigate and develop design concepts to frame solution sets at an application and product level

Software Development

* Build cloud-based solutions using AWS native services for scaling infrastructure frameworks
* Write high-quality, maintainable code with proper testing and code reviews
* Develop and maintain the Multi-Asset Provisioning Service workflows for GB200 and GB300 UltraServer hosts
* Implement automation for hardware testing, cable validation, and related test processes
* Create observable systems with appropriate metrics and alarming

Operational Excellence

* Execute and monitor UltraServer provisioning workflows
* Troubleshoot workflow failures and coordinate with downstream teams
* Focus on operational excellence by identifying problems and proposing solutions that improve manufacturing software

Hardware & Software Integration

* Work with hardware and software integrations specific to GPU clusters and AI/ML training systems
* Manage network partition configurations for multi-node GPU clusters
* Handle firmware validation and consistency checks across asset groups

Team Collaboration

* Collaborate with customers and stakeholders to convert business needs into technical designs
* Participate in code reviews and technical assessments


A day in the life
This is a hands-on position in which you will own the work end to end: requirements gathering, designs, design reviews, implementations, code reviews, incremental feature launches, operations, mentoring, and driving continuous improvement.

About the team
The EC2 UltraServer Provisioning team is a high-performing engineering organization responsible for delivering NVIDIA-based ML infrastructure at scale. We manage end-to-end provisioning workflows for GB200 and GB300 UltraServers, from host ingestion through testing, repair, and recovery. Our team drives operational excellence through continuous improvement of build quality metrics, reduction of dwell times, and collaboration on fleet-wide unsellable reduction initiatives. We work closely with hardware engineering, data center operations, and EC2 service teams to ensure reliable, efficient delivery of critical ML compute capacity. This is a high-impact role leading a two-pizza team of talented engineers solving complex technical challenges in one of Amazon's fastest-growing infrastructure domains.