H1 US Remote Full-time

At H1, we believe access to the best healthcare information is a basic human right. Our mission is to provide a platform that can optimally inform every doctor interaction globally. This promotes health equity and builds needed trust in healthcare systems. To accomplish this, our teams harness the power of data and AI technology to unlock groundbreaking medical insights and convert those insights into actions that result in optimal patient outcomes and accelerate an equitable and inclusive drug development lifecycle. Visit h1.co to learn more about us.
Data Engineering comprises the teams responsible for collecting, curating, normalizing, and matching data from hundreds of disparate sources around the globe. Data sources include scientific publications, clinical trials, conference presentations, and claims, among others. In addition to developing the data pipelines needed to keep every piece of information updated in real time and provide users with relevant insights, the teams are building automated, scalable, low-latency systems for recognizing and linking various types of entities, such as linking researchers and physicians to their scholarly research and clinical trials. As we rapidly expand the markets we serve and the breadth and depth of data we collect for our customers, the team must grow and scale to meet that demand.
As a Senior Big Data Engineer at H1, you will play a crucial role in analyzing vast amounts of data, writing production-grade pipelines, and optimizing infrastructure using PySpark and Apache Hudi. You’ll manage projects across all stages, including application deployment on a Kubernetes (k8s) based data platform, to ensure efficient and scalable management of data-intensive workloads and storage.
You will:
– Be responsible for product features related to data transformations, enrichment, and analytics.
– Work closely with internal stakeholders, gathering requirements and delivering solutions while effectively communicating progress and tracking tasks to meet project timelines.
– Act as a subject matter expert on our data, navigating challenges with a business analytics mindset and seizing every opportunity to refactor code in the interest of maintainability and reusability.
– Provide L2 support for escalated issues as needed.
– Work with various tools, including generative AI and open-source libraries, to standardize data and implement scalable distributed pipelines.
– Develop efficient distributed algorithms for processing and joining large datasets, and write clean, modular code that is easy to maintain.
– Lead the creation of proofs of concept (POCs) and drive projects to production-ready solutions, contributing to internal process improvements and infrastructure scalability.
You possess robust hands-on technical expertise in both conventional and non-conventional ETL methodologies, along with proficiency in T-SQL and Spark SQL. Your skill set includes mastery of multiple programming languages such as Python (PySpark), Java, or Scala, as well as streaming and other advanced data processing techniques. As a self-starter, you excel at managing projects across all stages, from requirement gathering and design to coding, testing, implementation, and ongoing support. Your proactive approach and diverse skill set make you an invaluable asset in driving innovation and delivering impactful solutions within our dynamic data engineering team.
– 5+ years of experience working with strong engineering teams and deploying products.
– Strong coding skills in Python (PySpark), Java, Scala, or another language of choice, plus stacks supporting large-scale data processing and/or machine learning.
– Experience with Docker/Kubernetes.
– Strong grasp of computer science fundamentals: data structures, algorithmic trade-offs, etc.
– Strong understanding of machine learning concepts is desirable.
– Experience working with large language models (LLMs) is a plus.
– Willingness to manage projects through all stages (requirements, design, coding, testing, implementation, and support).
– Ability to write clean, modular data processing code that is easy to maintain.
– Experience building and maintaining cloud infrastructure with Terraform is a plus.
Not meeting all the requirements but still feel like you’d be a great fit? Tell us how you can contribute to our team in a cover letter!
This role pays $140,000 to $170,000 per year, based on experience, in addition to stock options.
Anticipated role close date: 04/29/2024