Senior Site Reliability Engineer (SRE), Software Factory

  • Full Time
  • Pittsburgh, PA and Palo Alto, CA and Remote

Who we are:

Argo AI is in the business of building self-driving technology you can trust. With experienced leaders in the field and collaborative partnerships with some of the world’s largest automakers, we’re building self-driving technology that is engineered to scale globally and transform mobility for millions. 

Talented individuals join our team because they share our purpose to make it safe, easy, and enjoyable for everyone to get around cities. We aspire to impact key industries that move people and goods, from ride hailing to deliveries.

Meet the team:

As the foundation of Argo AI engineering, the Software Factory team is poised to build the most productive engineering team in the world with an engineering system that scales. We build and develop the build system backend and frontend, infrastructure, CI, and tooling that touch all phases and all aspects of software development: from source management to deploying to individual AVs roaming around different cities, from distributed building and testing to orchestrating container-based integration tests in the cloud, and from customizing and installing OSs and firmware to building CLI and web-based applications.

As a Site Reliability Engineer on the team, you will be responsible for helping to build and run these mission-critical systems. Through the implementation of monitoring and automation, you will continually ensure the health, reliability, scalability, and performance of these services.

What you’ll do: 

  • Design and implement scalable distributed systems to facilitate the development of self-driving vehicles
  • Monitor and maintain mission-critical production services to ensure maximum uptime
  • Contribute to our build/test/CI/CD system, the largest distributed system at Argo AI
  • Write tools to provide fast, robust builds and tests across our entire tech stack
  • Document actions to build a comprehensive library of runbooks, which will act as a knowledge base and foundation for automation
  • Scale the reliability and velocity of our systems and processes through increased automation
  • Participate in an on-call rotation and culture of continuous improvement through blameless postmortems

What you'll need to succeed:

  • Degree in Computer Engineering, Computer Science, Electrical Engineering, Robotics or a related field
  • Fundamental understanding of Linux operating system internals, TCP/IP networking, and storage subsystems
  • Track record of scaling and securing services in the cloud (AWS, GCP) or cloud native environments
  • Driven to leverage infrastructure-as-code principles to automate the creation of infrastructure resources (e.g. Terraform, CloudFormation)
  • Professional experience in at least one of Python, C/C++, Golang, Java, Rust
  • Enthusiasm about build tools (Bazel, BuildBarn, Cargo) and software quality (unit testing, SCA, test automation)
  • Experience working with modern build and CI systems (Jenkins, Buildkite, TeamCity, GitLab)
  • Experience operating GPU and spot Kubernetes clusters at scale
  • Understanding of engineering design limitations and ability to provide guidance to teams to scale their services to achieve desired performance within budget
  • A focus on increasing service reliability through defining and adhering to SLOs
  • Strong communication skills and the ability to work effectively in a diverse and distributed team

Nice to have (optional):

  • Experience with OS (Linux) kernel or driver development
  • Experience with embedded systems (Yocto, PetaLinux) or firmware development

What we offer you:

  • High-quality individual and family medical, dental, and vision insurance
  • Competitive compensation packages
  • Employer-matched 401(k) retirement plan with immediate vesting
  • Employer-paid group term life insurance and the option to elect voluntary life insurance 
  • Paid parental leave 
  • Paid medical leave
  • Unlimited vacation
  • Complimentary daily lunches, beverages, and snacks
  • Pre-tax commuter benefits
  • Monthly wellness stipend 
  • Professional development reimbursement
  • Employee assistance program
  • Discounted programs that include legal services, identity theft protection, pet insurance, and more
  • Company and team bonding outlets: employee resource groups, quarterly team activity stipend, and wellness initiatives

Our Background:

Argo AI was founded in late 2016 by industry experts with extensive experience building robotic systems for commercial applications. Our once-small team has since grown into a company of more than 1,000 people with strategic partnerships with two of the world’s leading automakers: Ford and Volkswagen. Our self-driving system is the first with commercial deployment plans for Europe and the U.S., and thanks to an ability to tap into both automakers’ global reach, our technology platform has the largest geographic deployment potential of any self-driving technology to date.

At Argo AI, we believe that embracing differences delivers superior results. We are an equal opportunity employer that is committed to an inclusive environment for all employees.

To apply for this job, please visit boards.greenhouse.io.