Staff Site Reliability Engineer, FedRAMP

Illumio

This job is no longer accepting applications

Software Engineering

Sunnyvale, CA, USA

Posted 6+ months ago

Staff Site Reliability Engineer

Hybrid: 3 days in office/week in Sunnyvale, CA

As a Staff Cloud Platform Site Reliability Engineer, you will support both our legacy infrastructure running on AWS and our modernization efforts, such as deploying microservices on Kubernetes environments in AWS, Azure, and GCP in the near future.

In this role, you will resolve issues in and stabilize our current legacy infrastructure, which runs on AWS and hosts several Fortune 500 customers. You will take part in rapid development toward containerization and microservices delivery, developing Helm charts and performing the other activities needed to host services in Kubernetes. You will also update our existing tooling and automation, and define our digital transformation within this team.

To thrive in this role, you must have a ‘can-do’ attitude, a solution-oriented approach, and excitement for solving challenging problems in an environment where requirements change rapidly.

About the Team:

Our Engineering team has established a culture based on thought leadership, independence, and responsibility. This powerful dynamic drives us forward as we work to make the digital world a safer place.

Those who join us represent the leader in Zero Trust Segmentation and work on a technology stack that ranges from operating systems to distributed applications to UI and visualization. Together, we will continue to build world-class products—driven by people with different perspectives, backgrounds, and a commitment to innovation in a time when the world faces its greatest cybersecurity threats in history.

The Cloud Operations team at Illumio is working to deploy and manage our SaaS services by reducing human error, aggressively focusing on automation, and providing deep insight into application behavior and health! We do that by incorporating aspects of software engineering and applying them to infrastructure and operations problems to create and manage scalable and reliable distributed software systems.

About the role:

We are looking for a platform engineer or SRE with a demonstrated track record of building secure, large-scale, highly available services using infrastructure automation and Infrastructure as Code, and who is well versed in cloud architecture (with a strong focus on Kubernetes).

This engineer will be an essential member of our Operations team, collaborating with the Platform and Data engineers to deliver the latest Illumio products while also participating in on-calls to support our existing legacy infrastructure.

The Cloud Platform SRE will be responsible for designing and deploying scalable, reliable, and secure cloud infrastructure. This individual must have a thorough understanding of, and experience with, AWS and/or Azure clouds. The platform will be based on Kubernetes and built using cloud-native technologies. The Cloud Platform SRE is responsible for building, operating, and maintaining this platform, and for defining and meeting platform SLOs, capacity utilization, cost visibility, security compliance, and related targets. This role is critical to the success of the multi-cloud platform.

Key Responsibilities:

Drive reliability improvements back into applications
Build code to resolve reliability/resiliency issues
Mentor and educate team members to strengthen technical expertise
Collaborate closely with cloud architects to drive cloud solutions
Curate appropriate SLIs/SLOs to accurately measure and assess error budgets
Embed with development teams to assist with cloud methodologies during product development, so that each deliverable is as reliable as possible
Work with development teams to build and strengthen application security and compliance
Manage high-impact situations that involve technically challenging issues across diverse audiences, driving to find the root cause, mitigate, and identify a solution
Focus on observability

Experience:

Bachelor's degree or relevant work experience
6+ years of relevant SRE, DevOps, Platform or Infrastructure Engineering experience.
5+ years in a production support role in a fast-paced industry/organization
Experience deploying, tuning, and maintaining Linux-based, highly available, fault-tolerant web platforms in public cloud providers such as AWS, Azure, and GCP
Experience with common monitoring, log aggregation, and metrics gathering platforms (Icinga, Sensu, Splunk, Telegraf/InfluxDB, et al.)
Experience with configuration management and orchestration tools such as Chef, Ansible, and AWS services and APIs, or equivalent
Experience scripting/coding with Python, Java, Ruby and/or Go.
Experience with MySQL, PostgreSQL, Redis, or similar
Solid knowledge of Linux operating systems (Ubuntu, RHEL, OEL7) is required
Experience with EKS and/or AKS
Knowledge of or experience with incident management and on-call tooling such as PagerDuty
Knowledge of Database Technologies, Release Management, REST, SRE, etc.
Knowledge of load balancers and traffic managers
Experience working with Kubernetes, Docker, or other virtualization & containerization technologies
Networking basics and troubleshooting skills
Good understanding of production deployments and distributed environments is required
Strong problem solving and operational process skills, attention to detail
Application support and debugging experience in a dynamic fast-paced production environment
Experience with SDLC principles, architecture and operations.
Experience working with senior leadership both inside and outside of engineering.
Ability to manage multiple tasks and competing priorities to deliver projects on schedule
Excellent written and verbal communication skills.

About Illumio:

Illumio, the pioneer and market leader of Zero Trust segmentation, prevents breaches from becoming cyber disasters. Illumio protects critical applications and valuable digital assets with proven segmentation technology purpose-built for the Zero Trust security model. Illumio ransomware mitigation and segmentation solutions see risk, isolate attacks, and secure data across cloud-native apps, hybrid and multi-clouds, data centers, and endpoints, enabling the world’s leading organizations to strengthen their cyber resiliency and reduce risk.

Illumio believes that an environment of unique backgrounds, experiences, viewpoints, and individual contributions drives our success and makes us stronger together. We are dedicated to creating and maintaining a diverse culture and emphasizing inclusion and belonging.

Pay Range:

$175,000 - $210,000 USD

The pay range for this job level is a general guideline only and not a guarantee of compensation or salary. Additional factors considered in extending an offer include (but are not limited to) responsibilities of the job, education, location, experience, knowledge, skills, abilities, as well as internal equity, alignment with market data, or applicable laws.

Benefits:

At Illumio we offer a wide range of benefits to our eligible team members. Our benefit programs vary by location and can include Medical, Dental, Vision Coverage – Health and Dependent Savings Accounts – Life and Disability Programs – Paid Parental Leave – Voluntary Benefit Programs – Company Sponsored Wellness Program – Wellness Reimbursement Program – Retirement Savings – Equity Opportunities – Paid Time Off and Paid Holidays – Employee Incentive Program.
