
Principal Data Scientist

Full-Time

Nashville, Tenn., or Remote

Overview:

Do you want to be a part of something purposeful, something bigger than yourself? Are you looking to apply all your data science experience and insights to solve meaningful problems? If so, you may be the person we’re looking for to join Azra AI on our mission to advance healthcare through clinical intelligence and automation.

At Azra AI, we empower health systems to improve clinical workflows by analyzing pathology and radiology reports in real time and identifying cancer or other diagnoses. These reports are delivered to clinicians in an easy-to-use workflow tool that enables patients to get care quickly while allowing clinicians to focus on what they do best: caring for patients.

As the Principal Data Scientist, you will lead the development of advanced AI models that address key healthcare challenges. This role is ideal for someone experienced in end-to-end data science, from data acquisition to model validation and deployment. You will work with cross-functional teams to develop models that directly impact patient care, focusing on oncology, cardiology, neurology, radiology, and pathology applications.

Your Adventure at Azra AI:

In this role, you will have the opportunity to utilize your expertise in data science to address critical healthcare challenges. From processing unstructured clinical data to building predictive AI models, you’ll play a key role in advancing healthcare through innovation. You will also be involved in analyzing radiology images and integrating AI-driven image analysis techniques to enhance clinical outcomes. Working directly with the Chief Data Scientist, you’ll collaborate with various teams to drive impactful solutions. If you thrive in an environment where you can be hands-on, solve complex problems, and collaborate to achieve meaningful results, this role is for you.

Key Responsibilities:

  • Lead End-to-End Data Science Process: Oversee all stages of the data science lifecycle, from data acquisition, cleansing, and preparation to exploratory data analysis (EDA), feature engineering, model selection, and validation for healthcare use cases.
  • Model Development: Design and build predictive models using NLP and Transformer models for analyzing unstructured healthcare data, such as pathology reports, radiology images, and clinical notes, to support fast, data-driven decisions in oncology, cardiology, neurology, and radiology.
  • AI Image Analysis: Leverage AI techniques for analyzing radiology images (e.g., X-rays, MRIs, CT scans) to detect abnormalities and assist clinicians in diagnostic decisions, integrating these insights into the clinical workflow.
  • Data Transformation: Prepare and transform large, complex datasets, including millions of clinical reports and medical images, for efficient and scalable model development and deployment.
  • Feature Engineering and Optimization: Develop innovative feature extraction techniques to optimize model inputs and improve predictive accuracy for healthcare applications.
  • Model Validation & Verification: Ensure the robustness, accuracy, and clinical relevance of models through rigorous validation and verification protocols.
  • Collaboration Across Teams: Work closely with product, engineering, and clinical teams to integrate AI models and image analysis tools into the overall healthcare workflow, ensuring that solutions are actionable and scalable in real-world environments.
  • Proactive QA and Data Integrity Checks: Implement QA processes for HL7 ORU (observation result) and ADT (admit, discharge, transfer) data during customer implementations, ensuring data accuracy and minimizing errors such as incorrect medical record numbers (MRNs).
  • Cloud Computing and Deployment: Leverage cloud platforms (GCP) to scale and deploy models securely in healthcare settings, maintaining high performance and compliance with regulatory standards.

Qualifications:

  • Education: Master’s or Ph.D. in Data Science, Computer Science, Statistics, or a related field, or equivalent experience.
  • Experience: 8+ years in data science, machine learning, and AI, including applied experience in healthcare environments.
  • Technical Skills: Expertise in Python (pandas, numpy, scikit-learn, PyTorch) and hands-on experience with NLP and Transformer models, including Hugging Face and spaCy.
  • Cloud Proficiency: Experience deploying models in cloud environments (GCP or AWS), with an understanding of best practices in scaling AI solutions.
  • Collaborative Mindset: Ability to work cross-functionally with diverse teams, ensuring that AI models and image analysis tools integrate seamlessly into clinical workflows.
  • Machine Learning Expertise: Strong knowledge of machine learning techniques and statistical analysis, and experience training robust, scalable models.
  • Healthcare Compliance: Familiarity with healthcare regulations such as HIPAA and best practices for handling sensitive patient data.
  • Deployment: Familiarity with Docker for model deployment.

Preferred Qualifications:

  • Hands-on experience with AI-driven image analysis techniques, working with radiology images (e.g., X-rays, MRIs, CT scans) to develop and deploy image classification or segmentation models.
  • Strong knowledge of healthcare data systems, such as EHR/EMR, pathology, radiology, and lab reports, and experience with real-world healthcare applications.
  • Experience with feature engineering and data validation for real-world clinical applications.

Technologies Utilized:

  • Languages/Frameworks: Python (pandas, numpy, scikit-learn, spaCy, PyTorch, Hugging Face)
  • Cloud Services: GCP
  • Version Control: Git/GitLab
  • Databases: PostgreSQL, MySQL
  • Deployment Tools: Docker
  • ML Models: Transformers (NLP models), AI-driven image analysis models
  • Dashboards/Analytics: Streamlit