At Scribd (pronounced “scribbed”), we believe reading is more important than ever. Join our cast of characters as we work to change the way the world reads by building the world’s largest and most fascinating digital library: giving subscribers access to a growing collection of ebooks, audiobooks, magazines, documents, Scribd Originals, and more. In addition to works from major publishers and top authors, our community includes over 1.9 million subscribers in nearly every country worldwide.
Have you heard about our future of work program, Scribd Flex? As a key principle, we embrace flexibility and allow employees, in partnership with their manager, to choose the work style that best suits their individual needs and preferences. We also create intentional in-person moments with each other that build culture and connection.
About the Team
Core Infrastructure owns the design, operation, and maintenance of our AWS Cloud infrastructure. As an infrastructure engineer, you will implement solutions that support the continued growth of Scribd. You will be part of the team that manages our existing AWS Cloud environment while helping service owners adjust to a new AWS-centric model. You will help us shift from a traditional operations organization to a services organization that provides key components of our backend technology stack, such as container orchestration infrastructure, logging services, monitoring and alerting patterns, caching layers, and relational and non-relational clustered data storage. You and your team will educate developers and help delegate traditional operational responsibilities to teams that are already taking increasing ownership of their production environments. Sharing your experience and good judgment will be crucial to helping these teams scale their services operationally for years to come.
About the Role
Scribd is searching for a Site Reliability Engineer to join the Core Infrastructure team and work on the foundations of Scribd as we take our platform into the future. Each day provides both opportunities to perform and opportunities to learn.
We recognize that everyone has a unique set of work and life experiences, and believe that a broader set of perspectives will produce better results for all. We continually strive for inclusivity and strongly value diversity. We support others’ growth and celebrate our collective achievements.
- You will help build and operate a modern service-oriented AWS infrastructure and maintain uptime for a product used by 250+ million people every month.
- Your work will shape the future direction of our infrastructure and improve observability into the health of our services.
- 5+ years of production infrastructure engineering experience developing and/or operating web applications in a major cloud provider (AWS, GCP, Azure)
- A good understanding of large-scale distributed systems in practice, including multi-tier architectures, application security, monitoring, and storage systems
- Mentoring skills: experience with training and educating teammates or colleagues on contemporary best practices
- Ability to lead deep technical design discussions within your team, and across partner teams
- Experience with Docker, AWS ECS, and/or Kubernetes
- Strong written and verbal communication skills (we’re a remote team!)
- Experience with infrastructure-as-code tools (Terraform, etc.) and thoughts about their respective strengths and weaknesses
- Working knowledge of the TCP/IP stack, routing, and load balancing
- Ability to read and write code in one or more languages including Go, Ruby, or Python (and Bash of course!)
- Software development background
- Familiarity with git and common software development practices
- Ability to write tests
- Ability to read and understand code well enough to participate in the code review process
- A passion for infrastructure engineering and for building predictable, fault-tolerant distributed systems
- A strong understanding of AWS platform services and their strengths/weaknesses
- Eagerness to learn new technologies, and the ability to pick them up quickly and put them to use
- An interest in service-oriented architectures and topics such as availability, eventual consistency, fault tolerance, and scalability
- Knowledge of database design best practices for high-throughput applications, and experience writing SQL queries
- Experience working to improve 24/7 on-call rotations, reduce alert fatigue, and improve automation