At Guidewire, we make software that offers Property and Casualty (P&C) Insurance companies the tools to take care of their customers when they need it the most, whether that’s a time of crisis, a natural disaster, an accident, or exposure to cyber risks. We build the core applications that insurance companies use to sell and underwrite policies, settle claims, and bill their customers. We also have a portfolio of innovative products serving the needs of P&C insurance companies in areas such as data management, digital online portals, and predictive analytics. We run these products on the Guidewire Cloud Platform, and we help hundreds of insurance providers all over the world to handle billions of dollars of business.
We are proud to be voted a Top Cloud Employer on Glassdoor by our own employees and positioned as a market leader by industry experts like Gartner. We have a fun work environment and a culture that lives by our core values of integrity, rationality, and collegiality.
We’re searching for people who are as passionate about working together to deliver quality products and support as we are. Join us and enjoy a career where you can make an impact. You’ll be inspired by those around you, and you’ll be trusted and empowered to go further.
As a Site Reliability Engineer, you will be part of a team that is passionately automating everything possible to make Guidewire systems run more efficiently. The Platform team is dedicated full-time to creating and running software that improves the reliability of systems in production, serving hundreds of customers and supporting millions of transactions each day. You will be ensuring the reliability of Guidewire’s flagship cloud platform and InsuranceSuite products and building tooling to help ensure efficient operations and optimal availability of all SaaS multi-tenant and customer-focused systems. Platform SREs collaborate closely with Guidewire’s core product developers to ensure that the Guidewire core cloud products address functional and non-functional requirements such as availability, performance, observability, and maintainability.
This role requires a high degree of collaboration, teamwork, ownership, and responsibility. If you like to be challenged and have a passion for solving problems at scale with systems like AWS, Kubernetes, and Aurora, then we would love to hear from you. The ideal candidate is someone who exemplifies the ethos of “If you have to do something more than once, automate it,” and who can rapidly self-educate on new concepts and tools. Bonus points if you have prior experience doing production support of a SaaS platform and are comfortable working with bleeding-edge, highly containerized, cloud-native environments in AWS.
ESSENTIAL DUTIES AND RESPONSIBILITIES
- Collaborate with development and other SRE teams to enhance the reliability and efficiency of microservices applications.
- Engage with product development (PD) teams by participating in design reviews and production readiness checks.
- Collaborate with engineering teams, providing product feedback and, where necessary, contributing code to the product
- Work closely with cross-functional teams to ensure seamless integration of new features and services.
- Analyze data from observability and monitoring tools to improve operational metrics of microservices as well as the entire platform.
- Leverage end-to-end technical expertise, gained by engaging with multiple PD teams and analyzing observability data, to propose code and design improvements that improve SLOs and prevent incidents.
- Create system documentation and training materials to empower and educate our fellow team members
- Take a purist SRE approach to shared multi-tenant infrastructure for resilient, microservice-based, containerized SaaS systems, in addition to customer-centric application environments
- Oversee and automate the team’s growing presence in AWS
- Creatively build and develop tooling to aid in driving 24x7x365 follow-the-sun operations of critical production systems
- Build and maintain observability tooling, metrics, and dashboarding for a global platform product infrastructure
- Improve our incident management lifecycle to identify, mitigate, and learn from reliability risks and issues
Education and Work Experience
- Bachelor’s Degree in Computer Science or related field
- Software engineering and task automation skills with Bash, Python, and/or Go are a must
- Experience supporting web applications running on Java / Apache / Tomcat in a live production environment
- Familiarity with the Agile software development lifecycle
- Deep background with Linux systems and engineering
- Highly experienced with engineering and automating on Amazon Web Services (AWS)
- Prior experience with IaC tools like Terraform/Terragrunt/Terraspace
- Prior experience with DevOps/GitOps tools (Git, Bitbucket, Flux CD, TeamCity) for gate promotions
- Production-at-scale support background in a heavily microservice-based world
- Hands-on engineering and ops expertise in containerization (Docker, Helm, Kubernetes/EKS, CNI and Ingress networking)
- Strong understanding of Single Sign-On, SAML, and OAuth (bonus if you have hands-on experience with Okta)
- Seasoned expertise around X.509 certificate technology and basic concepts of encryption
- Experience working with relational databases such as Aurora Postgres and/or Oracle RDS
- Advanced exposure to application development, web UI (design and development), JSON, application architecture
- Strong experience with observability tools (logging/APM) like Datadog, CloudWatch, and PagerDuty
- Familiarity with event store/stream-processing technologies like Kafka or AWS SQS
- Understanding of Open Application Model systems such as KubeVela or Crossplane
Personal Qualities and Soft Skills
- You greatly prefer writing code to clicking through a GUI.
- You enjoy teaching, being a mentor to others, and working across boundaries
- Outstanding troubleshooting skills; ability to think critically and display an aptitude for problem solving
- Strong analytical mind with a penchant for process development and enhancement
- A highly positive, can-do attitude and a desire to be a team player
- Great communication skills and ability to explain complex technical concepts to a varied audience
- Strong follow-through and work ethic; consistently keeps and meets commitments
Other Requirements
- Ability to read, write, and speak English
- Ability to speak in public settings and to interface confidently with customers, partners, and vendors
- Travel – Up to 25% of the job will require travel, approximately a week a month