Ancestry | Remote (US Only) | Full-time

About Ancestry:
When you join Ancestry, you join a human-centered company where every person’s story is important. We believe that by discovering the struggles and triumphs of our past, we can foster deeper bonds and more meaningful connections among families and communities. Our talented team of scientists, engineers, genealogists, historians, and storytellers is dedicated to empowering customers around the world from all backgrounds on their journeys of personal discovery. 
With 30+ billion digitized global historical records, 100+ million family trees, and 20+ million people in our growing AncestryDNA database, Ancestry helps customers discover their family story and gain a new level of understanding about their lives. Passionate about dedicating your work to enriching people’s lives? You belong at Ancestry.
We are looking for a Senior Data Engineer to help us build our new customer engagement data and reporting platform. The platform is being rebuilt from scratch in collaboration with many teams upstream and downstream. Our team is responsible for helping define the new data sources, then pulling them in and building the associated datasets through both streaming and batch methods, as appropriate. This large and exciting initiative at Ancestry is improving reporting and data analytics efficiency, better enabling key business stakeholders not only to analyze the data but also to proactively improve the customer experience.

What you will do:

  • Develop data warehousing projects with the Engagement Data Delivery Team
  • Understand business, operations, and analytics requirements for data from many sources
  • Develop, deploy, and support real-time automated data streams from numerous sources into the data lake and data warehouse
  • Develop and implement data auditing strategies and processes to ensure data accuracy and integrity
  • Design data requirements and work with partner teams to provide access to data
  • Implement technical solutions to maintain ETL processes and troubleshoot extraction failures
  • Mentor fellow team members in your areas of expertise
  • Work with product, marketing, and development teams to define data specifications
  • Write clean, well-designed, testable, efficient code (SQL and Python)
  • Support ongoing migration of preexisting workflows to Airflow and EMR
  • Explore more efficient tools and approaches to data processing

What you have:

  • Computer science degree (or related discipline)
  • 5 years of hands-on data engineering, ETL, or data warehousing experience
  • Experience designing and developing enterprise-grade data pipelines in fast-paced distributed environments (highly scalable, reliable, available)
  • Expertise in advanced SQL, data modeling, and database technologies for handling massive data sets (Redshift, MySQL/Aurora, MSSQL, or equivalent)
  • Experience with ETL tools (Airflow, EMR, Nifi, SSIS, Pentaho, Databricks, Informatica)
  • Proficient in Python
  • Good communication skills (verbal/written)
  • A passion for data and for helping the business turn data into information and action

Nice to have:

  • Experience working in a cloud-based ecosystem (AWS)
  • Experience with distributed databases (Redshift, Snowflake, etc.)
  • Prior experience with high-volume data flows
  • Deep understanding of the challenges of data processing at scale
  • Strong experience with Python or Spark
  • Experience working with a data lake