truelogic Remote, anywhere in LATAM Full-time 2022-07-27

Project Description

A provider of HR software for the public sector is currently seeking a remote contract
Data Scientist located in US or Canadian time zones. The Data Scientist will work closely with
a lead Data Scientist on the Business Intelligence team and play a pivotal role in augmenting
product functionality by processing and categorizing user-inputted text.

This is a text processing problem. We need to analyze job postings and candidate resumes, categorize them, find common terms, and determine the level of compatibility between an applicant and a job.
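As a rough illustration of the matching task described above, the sketch below finds the terms shared by a job posting and a resume and scores their overlap with cosine similarity over simple term counts. It is a minimal baseline using only the Python standard library, not the team's actual method; the function names, the stopword list, and the sample texts are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real project would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "with", "on"}


def tokenize(text):
    """Lowercase the text and return word tokens, dropping stopwords."""
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def common_terms(job_text, resume_text):
    """Return the sorted terms that appear in both documents."""
    return sorted(set(tokenize(job_text)) & set(tokenize(resume_text)))


def compatibility(job_text, resume_text):
    """Cosine similarity of term-count vectors, in the range [0, 1]."""
    job = Counter(tokenize(job_text))
    resume = Counter(tokenize(resume_text))
    dot = sum(job[t] * resume[t] for t in job.keys() & resume.keys())
    norm = (math.sqrt(sum(c * c for c in job.values()))
            * math.sqrt(sum(c * c for c in resume.values())))
    return dot / norm if norm else 0.0


# Illustrative sample texts (assumed, not taken from real data).
job = "Payroll analyst with Python and SQL experience"
resume = "Analyst experienced in SQL, Python and Excel"
print(common_terms(job, resume))             # ['analyst', 'python', 'sql']
print(round(compatibility(job, resume), 2))  # 0.6
```

Raw term counts treat "experience" and "experienced" as different words, which is one reason the role calls for embeddings and more careful preprocessing rather than exact token matching.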

Responsibilities

  • Preprocess and clean noisy user-inputted, unstructured text
  • Apply pre-trained language models or train new models for text embedding
  • Assess quality of labeled training set to establish ground truth
  • Train classification models for predicting predefined categories
  • Create new subcategories where needed using clustering or similarity matching methods
  • Assess quality of model classification and make iterative improvements throughout the
    entire modeling pipeline
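
The first responsibility, cleaning noisy user-inputted text, might start with something like the sketch below. It is only an assumed example of typical cleanup steps (decoding HTML entities, stripping tags, normalizing Unicode, collapsing whitespace); the function name `clean_text` and the set of symbols it keeps are hypothetical choices, not project requirements.

```python
import html
import re
import unicodedata


def clean_text(raw):
    """Normalize a noisy free-text field into lowercase, single-spaced text."""
    text = html.unescape(raw)                    # decode entities such as "&amp;"
    text = re.sub(r"<[^>]+>", " ", text)         # drop HTML tags
    text = unicodedata.normalize("NFKC", text)   # fold forms such as non-breaking spaces
    text = re.sub(r"[^\w\s&+#./-]", " ", text)   # keep word chars and a few symbols seen in skills
    return re.sub(r"\s+", " ", text).strip().lower()


# Illustrative input (assumed, not taken from real data).
print(clean_text("<p>Sr.&nbsp;Accountant   &amp; Payroll\tLead!!</p>"))
# sr. accountant & payroll lead
```

Symbols such as "+" and "#" are kept so that skill names like "C++" and "C#" survive cleaning.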

 

Requirements

  • 3+ years of Python experience, including pandas, NumPy, and scikit-learn

  • 2+ years of Natural Language Processing (NLP) experience, specifically with text
    preprocessing, embedding, and feature engineering techniques

  • Knowledge of pre-trained language models

  • Experience with classification models (SVM, Random Forest, KNN) and/or clustering models
    (hierarchical clustering, k-means)

  • Experience mining public data through APIs or web scrapers to enrich training sets

  • Experience collaborating on analytics, data science, machine learning, or NLP projects

  • Experience working with large datasets

  • Experience contributing to predictive models deployed into production