ML and Prompt QA Engineer

Full-Time

New Delhi

Job Overview

As an ML and Prompt QA Engineer, you will be responsible for testing, validating, and optimizing machine learning models and prompt-based AI systems to ensure accuracy, performance, and consistency. You will work closely with data scientists, prompt engineers, and product teams to establish robust testing frameworks and develop strategies to maintain high-quality outputs across various use cases.

Key Responsibilities

Testing ML Models: Conduct rigorous testing of ML models to validate their performance, accuracy, and stability across diverse datasets.

Prompt QA: Review, test, and refine prompts used in prompt-based AI systems (e.g., LLMs) to ensure high-quality responses and reduce inconsistencies.

Automation and Framework Development: Build and maintain automated testing frameworks and pipelines for continuous integration and continuous deployment (CI/CD) of ML models.

Error Analysis and Debugging: Perform root cause analysis on model failures, detected biases, and prompt inaccuracies, and recommend solutions to improve outputs.

Data Validation: Ensure data integrity by validating datasets used in training and testing phases. Identify and report data anomalies or inconsistencies.

Collaboration with Engineers: Work alongside machine learning engineers, prompt engineers, and software developers to communicate test findings and improve model quality.

Performance Monitoring: Monitor deployed ML models and prompt systems for real-time performance issues and degradation, implementing measures to rectify them quickly.

Documentation: Maintain thorough and up-to-date documentation on test plans, methodologies, results, and improvements.

Requirements

Educational Background: Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field.

Experience:

2+ years of experience in QA/testing for machine learning systems or AI-driven applications.

Hands-on experience with prompt engineering and testing AI language models (e.g., GPT, BERT).

Technical Skills:

Proficiency in Python and testing frameworks such as pytest.

Familiarity with ML libraries (e.g., TensorFlow, PyTorch, Scikit-learn).

Experience with prompt testing and refinement in LLM-based systems (e.g., models from OpenAI or Anthropic).

Strong understanding of QA processes, CI/CD tools (e.g., Jenkins, GitLab CI/CD), and test automation.

Problem-Solving Skills: Strong analytical and debugging skills, with a keen eye for detail and accuracy.

Communication Skills: Excellent verbal and written communication skills, with the ability to translate technical findings into actionable insights for diverse stakeholders.

Preferred Qualifications

Knowledge of data labeling techniques and experience managing large datasets.

Experience in bias detection and mitigation in AI systems.

Familiarity with NLP (Natural Language Processing) concepts and techniques.

Experience with cloud platforms such as AWS, GCP, or Azure for model deployment and testing.

Knowledge of regulatory requirements (e.g., GDPR) and ethical guidelines for AI and machine learning systems.