Amazon
• Administer a small Redshift cluster
• Create and manage basic Glue jobs that make structured data in S3 accessible via Athena and the Redshift cluster
• Leverage Glue (or other appropriate tooling) to develop better training data pipelines
• Handle security and admin work for the account, particularly interfacing with internal corporate tools in a compliant manner
• Improve our AWS Batch set-up. We use Batch to run model jobs, but our current set-up is likely not ideal
• Work with scientists to improve training infrastructure. This overlaps with the previous bullet: we don’t leverage SageMaker to the full extent we could, and we’d be interested in improving on that front
• Work with scientists to deploy models (potentially). We don’t yet know whether we’ll run our own deployments, but if we do, collaborating with scientists to set up API endpoints for external model access would be a value-add
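In practice, the Glue/Athena work described above often comes down to registering partitioned Parquet data in S3 as an external table. As a rough illustration only — the table, bucket, and column names below are hypothetical, not from this posting — a helper that emits the Athena DDL might look like:

```python
def athena_ddl(table, bucket, prefix, columns, partitions):
    """Build a CREATE EXTERNAL TABLE statement so structured Parquet
    data in S3 becomes queryable from Athena (and, via Redshift
    Spectrum, from the Redshift cluster)."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns.items())
    parts = ", ".join(f"{name} {dtype}" for name, dtype in partitions.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {cols}\n"
        f")\n"
        f"PARTITIONED BY ({parts})\n"
        f"STORED AS PARQUET\n"
        f"LOCATION 's3://{bucket}/{prefix}/';"
    )

# Hypothetical table and bucket, for illustration only.
ddl = athena_ddl(
    table="training_events",
    bucket="example-team-bucket",
    prefix="curated/events",
    columns={"event_id": "string", "score": "double"},
    partitions={"dt": "string"},
)
print(ddl)
```

After running such DDL (and `MSCK REPAIR TABLE` or explicit `ADD PARTITION` calls to pick up partitions), the same S3 data is queryable from both Athena and Redshift without duplication.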
Skills and tooling
• Comfort, or at least familiarity, with S3, Glue, Athena, Redshift, IAM/Secrets Manager, EC2 + security configs, etc.
• Some familiarity with Quicksight
• Basics of DB (Redshift) management, best practices
• Comfort, or at least familiarity, with PySpark
o Optimization of highly distributed Spark SQL jobs may well come up
o Some experience running Spark jobs on distributed clusters might be helpful. We have internal tools that do this, but understanding how to leverage them better would be a value-add.
• SQL
• Python (basics)
• Data pipeline management
• Ideally comfortable with Amazon internal tooling (internal candidates only, obviously)
o Cradle
o DataCentral ecosystem (managing paths between team data on S3, semi-private Redshift to Andes for data we want to make public)
o Quicksight
o Has administered internal AWS accounts before
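On the IAM side of the list above, scoped access is usually expressed as a least-privilege policy document. The sketch below is purely illustrative — the bucket name and Athena workgroup are hypothetical placeholders, and a real policy would come out of the team’s own security review:

```python
import json

def read_only_data_policy(bucket, athena_workgroup):
    """Sketch of a least-privilege IAM policy covering the S3 / Glue /
    Athena access pattern described above. ARNs are placeholders."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read the team's curated data in S3
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            },
            {   # Look up table metadata in the Glue Data Catalog
                "Effect": "Allow",
                "Action": ["glue:GetDatabase", "glue:GetTable", "glue:GetPartitions"],
                "Resource": "*",
            },
            {   # Run queries in a single Athena workgroup only
                "Effect": "Allow",
                "Action": ["athena:StartQueryExecution", "athena:GetQueryResults"],
                "Resource": f"arn:aws:athena:*:*:workgroup/{athena_workgroup}",
            },
        ],
    }

# Hypothetical bucket and workgroup names.
policy = read_only_data_policy("example-team-bucket", "analysts")
print(json.dumps(policy, indent=2))
```

Keeping the Glue `Resource` broad while pinning S3 and Athena to specific ARNs is a common compromise; a stricter policy would scope the catalog actions to specific database and table ARNs as well.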
Qualifications
- 3+ years of data engineering experience
- Experience in at least one modern scripting or programming language, such as Python, Java, Scala, or NodeJS
- Knowledge of batch and streaming data architectures like Kafka, Kinesis, Flink, Storm, Beam
- Experience with AWS technologies like Redshift, S3, AWS Glue, EMR, Kinesis, FireHose, Lambda, and IAM roles and permissions
- Experience with non-relational databases / data stores (object storage, document or key-value stores, graph databases, column-family databases)
- Experience with data modeling, warehousing and building ETL pipelines
Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.
Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $118,900/year in our lowest geographic market up to $205,600/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, please visit https://www.aboutamazon.com/workplace/employee-benefits. This position will remain posted until filled. Applicants should apply via our internal or external career site.