Developed a prompt-testing framework for LLMs that checks prompt quality by defining multiple system prompts (e.g., chain-of-thought, few-shot) and comparing them on the quality of the model-generated answers. Answer quality can be measured with NLP metrics such as ROUGE, BLEU, or BERTScore, and with Responsible AI metrics such as Faithfulness, Answer Relevancy, and Harmfulness. A minimal sketch of the comparison loop follows.
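
The sketch below assumes Hugging Face's evaluate library for ROUGE and an OpenAI-compatible chat client; the model name, system prompts, and evaluation examples are placeholders, and BERTScore or Responsible AI metrics could be swapped in the same way.

```python
# Minimal sketch: compare system prompts by scoring model answers with ROUGE.
# Assumptions: the `evaluate` and `openai` packages are installed, an
# OPENAI_API_KEY is set, and MODEL_NAME / prompts / eval_set are placeholders.

import evaluate
from openai import OpenAI

MODEL_NAME = "gpt-4o-mini"  # assumption: any chat-completion model works here
client = OpenAI()           # reads OPENAI_API_KEY from the environment

# Candidate system prompts to compare (few-shot vs. chain-of-thought).
system_prompts = {
    "few_shot": "You are a helpful assistant. Example: Q: 2+2? A: 4.",
    "chain_of_thought": "You are a helpful assistant. Think step by step before answering.",
}

# Evaluation set: user questions paired with reference answers.
eval_set = [
    {"question": "What is the capital of France?",
     "reference": "The capital of France is Paris."},
]

rouge = evaluate.load("rouge")  # NLP metric; load "bertscore" the same way


def generate(system_prompt: str, question: str) -> str:
    """Ask the model one question under a given system prompt."""
    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content


# Score every system prompt over the whole evaluation set and report ROUGE-L.
for name, system_prompt in system_prompts.items():
    predictions = [generate(system_prompt, ex["question"]) for ex in eval_set]
    references = [ex["reference"] for ex in eval_set]
    scores = rouge.compute(predictions=predictions, references=references)
    print(f"{name}: ROUGE-L = {scores['rougeL']:.3f}")
```

The same loop extends to Responsible AI metrics by replacing the ROUGE computation with a faithfulness or answer-relevancy scorer and keeping the per-prompt comparison structure unchanged.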