Operations
Seattle, WA, USA
CIAT is the unified source for Infrastructure Operations data and BI solutions across Amazon's global data center fleet. We build and run the analytics platform that Central Ops leadership uses to manage rack install, decom, repair, logistics, capacity optimization, and network operations. The platform spans a large-scale datalake, multiple Redshift clusters, hundreds of Airflow pipelines, hundreds of AWS accounts, dozens of production QuickSight dashboards, and thousands of active users.
We need a System Development Engineer II to own platform infrastructure — the AWS accounts, application services, deployment automation, security posture, and emerging GenAI capabilities that the rest of the team builds on top of. You'll work with a senior SysDE who sets the technical direction, alongside other SysDEs and a cross-functional team of Data Engineers and BIEs who depend on your platform to ship their work.
The role is split between keeping production running (account governance, security remediation, deployment pipelines, on-call) and building new capabilities (Bedrock integration, QuickSight Q topic infrastructure, agent frameworks, self-service tooling). The GenAI platform work is early-stage — you'll help define the patterns, not just implement someone else's design.
Key job responsibilities
- Own AWS infrastructure across hundreds of accounts — cross-account access patterns, IAM governance, service control policies, LakeFormation permissions
- Build and maintain infrastructure-as-code (CDK/CloudFormation) for production services including LakeSQL, Validation Engine, TEMPO, Langley, CIAuth, and QuickSight
- Build deployment automation and CI/CD that lets Data Engineers and BIEs ship without waiting on a SysDE — the goal is self-service, not gatekeeping
- Stand up GenAI platform infrastructure — Bedrock integration, QuickSight Q topic configuration, agent systems (Spaces, Topics, Knowledge Bases, Actions), cross-account data access for AI workloads
- Drive security and compliance — Mirador/AppSec findings, patching, least-privilege IAM, security posture across production accounts
- Mentor junior SysDEs — break down complex problems into implementable pieces, review CRs, coach on architecture and operational thinking
- Reduce keep-the-lights-on (KTLO) work through automation, legacy system migration (Hammerstone → Airflow/NAWS), and better tooling
A day in the life
You might start the day investigating why a cross-account LakeFormation permission is blocking a QuickSight data source, then write a CDK construct so the same misconfiguration can't happen again. Next, you might review a CR from a teammate building a Lambda for automated QuickSight group provisioning, then pair with a DE to figure out why their Airflow DAG can't reach a Glue catalog in another account. After lunch, you might design the infrastructure for a new Bedrock-powered feature in Langley, or write a runbook for something you've seen break twice.
The through-line: you build systems that scale through automation, not through you personally doing things. When something breaks, you fix it and then fix the system. You're always asking "how do I make this self-service so I'm not the bottleneck?"