Project Description
A provider of HR software for the public sector is currently seeking a remote contract
Data Scientist located in the US or Canada time zones. The Data Scientist will work closely with
a lead Data Scientist on the Business Intelligence team and play a pivotal role in augmenting
product functionality by processing and categorizing user-inputted text.
This is a text processing problem. We need to analyze job postings and candidate resumes, categorize them, find common terms, and determine the level of compatibility between an applicant and a job.
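To give candidates a feel for the problem, here is a minimal sketch of finding common terms and scoring compatibility between a job and a resume. It is only an illustration, not our production approach: the function names, the tiny stop-word list, and the sample texts are all hypothetical, and it uses raw term counts with cosine similarity where a real pipeline would more likely use learned embeddings.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "with", "on"}


def tokenize(text):
    """Lowercase the text and return alphabetic tokens, minus stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def common_terms(job_text, resume_text):
    """Return the sorted terms that appear in both documents."""
    return sorted(set(tokenize(job_text)) & set(tokenize(resume_text)))


def compatibility(job_text, resume_text):
    """Cosine similarity of term-count vectors; 0.0 if either text is empty."""
    job = Counter(tokenize(job_text))
    resume = Counter(tokenize(resume_text))
    dot = sum(job[t] * resume[t] for t in job.keys() & resume.keys())
    norm = math.sqrt(sum(c * c for c in job.values())) * math.sqrt(
        sum(c * c for c in resume.values())
    )
    return dot / norm if norm else 0.0


# Made-up sample texts, not real data.
job = "Payroll analyst with experience in public sector budgeting"
resume = "Analyst experienced in payroll and budgeting for city government"

print(common_terms(job, resume))  # ['analyst', 'budgeting', 'payroll']
print(round(compatibility(job, resume), 2))
```

Note that exact term matching treats "experience" and "experienced" as different words, which is one reason the role calls for text embedding rather than simple counts.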
Responsibilities
- Preprocess and clean noisy user-inputted, unstructured text
- Apply pre-trained language models or train new models for text embedding
- Assess quality of labeled training set to establish ground truth
- Train classification models for predicting predefined categories
- Create new subcategories where needed using clustering or similarity matching methods
- Assess quality of model classification and make iterative improvements throughout the
entire modeling pipeline
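As an example of the first responsibility above, cleaning noisy user-inputted text might start with something like the sketch below. The `clean_text` helper, its regular expressions, and the sample input are hypothetical; the actual rules would depend on the data, and this version assumes that characters such as `+`, `#`, `.`, and `/` are worth keeping because they appear in skill names.

```python
import html
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")            # leftover HTML tags
URL_RE = re.compile(r"https?://\S+")       # bare URLs
NOISE_RE = re.compile(r"[^\w\s&+#./-]")    # bullets, commas, and other symbols
SPACE_RE = re.compile(r"\s+")              # runs of whitespace, including newlines


def clean_text(raw):
    """Normalize one free-text field into lowercase, single-spaced text."""
    text = html.unescape(raw)                    # "&amp;" becomes "&"
    text = unicodedata.normalize("NFKC", text)   # fold compatibility characters
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = NOISE_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip().lower()


# Made-up sample input, not real data.
sample = "<p>Senior  Accountant &amp; Auditor</p>\n\n\u2022 5+ yrs, C#/.NET"
print(clean_text(sample))  # senior accountant & auditor 5+ yrs c#/.net
```

Entities are unescaped before tags are stripped so that escaped markup such as `&lt;b&gt;` is removed as well.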
Requirements
- 3+ years of Python experience, including pandas, NumPy, and scikit-learn
- 2+ years of Natural Language Processing (NLP), specifically various text preprocessing,
embedding, and feature engineering techniques
- Knowledge of pre-trained language models
- Experience with classification models (SVM, Random Forest) and/or clustering models
(hierarchical clustering, k-means, KNN)
- Experience data mining public data to enrich training set through APIs / web scrapers
- Experience collaborating on analytics, data science, machine learning, or NLP projects
- Experience working with large datasets
- Experience contributing to predictive models deployed into production