IBM
At IBM Infrastructure & Technology, we design and operate the systems that keep the world running. From high-resiliency mainframes and hybrid cloud platforms to networking, automation, and site reliability, our teams ensure the performance, security, and scalability that clients and industries depend on every day. Working in Infrastructure & Technology means tackling complex challenges with curiosity and collaboration. You’ll work with diverse technologies and colleagues worldwide to deliver resilient, future-ready solutions that power innovation. With continuous learning, career growth, and a supportive culture, IBM provides the opportunities to build expertise and shape the infrastructure that drives progress.
Site Reliability Engineers apply software engineering principles to perform infrastructure management tasks more efficiently. They focus on reliability and resiliency, and build systems that proactively detect issues before they cause customer impact. They are responsible for maintaining a high-performance, secure, and stable infrastructure for our clients.
Additionally, SREs resolve customer issues and problems detected through monitoring. They participate in datacenter build and configuration activities, perform tests, and deploy new features and capacity.
As a Site Reliability Engineer, you will work in an agile, collaborative environment to build, deploy, configure, and maintain systems for the IBM client business. In this role, you will lead the problem resolution process for our clients, from analysis and troubleshooting, to deploying the latest software updates & fixes.
Site Reliability Engineering (SRE) professionals are engineers who specialize in reliability and resiliency, with the right mix of knowledge and skills in software and systems. They are responsible for analyzing business needs, determining problems, advising on and designing solutions, and building, testing, deploying, changing, and maintaining well-engineered information systems and ecosystems.
Responsibilities:
As a Compute Operations Site Reliability Engineer working US shift hours, you will perform the following tasks:
• Monitor provisioning tests and investigate/resolve any failures
• Perform code stack updates on infrastructure systems (VIOS, firmware, PowerVC, HMC, NovaLink, NIM servers) as well as cloud supporting systems (jump servers, sobox, network nodes, gateways, TSM servers)
• Upload/maintain stock images
• Maintain user IDs (add/delete) and passwords
• Monitor daily/weekly backups to ensure they are working
• Manage and maintain Nagios monitoring environment, troubleshoot scripts/plug-ins if there is an issue
• Perform periodic LPMs, inactive migrations, or remote restarts of customer VMs to perform system maintenance, balance workloads, or free up resources
• Monitor and report the capacity utilized in each data center
• Attend scheduled meetings planned by customer for cutover/maintenance windows
• Verify capacity requirements when customers report provisioning failures
• Work with customers to resolve any RSCT issues so that LPM activities can be performed without impacting customer workloads.
The candidate should be willing to work in US shift timings.
Relevant industry work experience of 5-7 years
• In-depth knowledge of Power server hardware (models, I/O adapters, etc.)
• HMC knowledge and operating experience
• In-depth knowledge of PowerVM including installation/configuration and operating
• Experience with PowerVC including installation/configuration and operating
• Experience with Linux administration, commands and networking
• Knowledge of NovaLink including minimal installation/configuration
• High-level knowledge of Power Systems supported operating systems (AIX and IBM i)
• In-depth knowledge of how storage is connected and allocated to Power systems via NPIV connections
• Good understanding of Power Systems network configuration at the system level
• Experience with configuring and tuning PowerVS
• Experience training new personnel on tooling and processes
• Storage & Power RTS, MVS Network for Cisco, Juniper; general support skills
In a world where technology never stands still, we understand that dedication to our clients' success, innovation that matters, and trust and personal responsibility in all our relationships live in what we do as IBMers as we strive to be the catalyst that makes the world work better.
Being an IBMer means you’ll be able to learn and develop yourself and your career. You’ll be encouraged to be courageous and experiment every day, all whilst having continuous trust and support in an environment where everyone can thrive, whatever their personal or professional background.
Our IBMers are growth-minded, always staying curious, open to feedback, and learning new information and skills to constantly transform themselves and our company. They are trusted to provide ongoing feedback to help other IBMers grow, and to collaborate with colleagues in a team-focused way that includes different perspectives and drives exceptional outcomes for our customers. The courage our IBMers have to make critical decisions every day is essential to IBM becoming the catalyst for progress. They embrace challenges with the resources they have to hand and a can-do attitude, and strive for an outcome-focused approach in everything they do.
Are you ready to be an IBMer?
IBM’s greatest invention is the IBMer. We believe that through the application of intelligence, reason and science, we can improve business, society and the human condition, bringing the power of an open hybrid cloud and AI strategy to life for our clients and partners around the world.
Restlessly reinventing since 1911, we are not only one of the largest corporate organizations in the world, we’re also one of the biggest technology and consulting employers, with many of the Fortune 500 companies relying on the IBM Cloud to run their business.
At IBM, we pride ourselves on being an early adopter of artificial intelligence, quantum computing and blockchain. Now it’s time for you to join us on our journey to being a responsible technology innovator and a force for good in the world.
IBM is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, genetics, pregnancy, disability, neurodivergence, age, or other characteristics protected by the applicable law. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
When applying to jobs of your interest, we recommend that you do so for those that match your experience and expertise. Our recruiters advise that you apply to not more than 3 roles in a year for the best candidate experience. For additional information about location requirements, please discuss with the recruiter following submission of your application.