azure-ai-evaluation-py

DevOps & Cloud
v0.1.0
Benign

Azure AI Evaluation SDK for Python.

11.7K downloads · 1.7K installs · by @thegovind

Setup & Installation

Install command

clawhub install thegovind/azure-ai-evaluation-py

If the CLI is not installed:

npx clawhub@latest install thegovind/azure-ai-evaluation-py

Or install with OpenClaw CLI:

openclaw skills install thegovind/azure-ai-evaluation-py

Or paste the repo link into your assistant's chat:

https://github.com/openclaw/skills/tree/main/skills/thegovind/azure-ai-evaluation-py

What This Skill Does

Python SDK for evaluating generative AI applications using Azure OpenAI. Supports quality metrics (groundedness, relevance, coherence), NLP-based scoring (F1, BLEU, ROUGE), and safety evaluations (violence, hate, self-harm). Results can be logged to Azure AI Foundry for tracking across runs.

Combines quality, NLP, and safety evaluators in one SDK with direct Azure AI Foundry integration, eliminating the need to wire together separate scoring libraries.
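
To make the NLP-based scoring concrete, here is a minimal, standard-library sketch of a token-overlap F1 score, the kind of quantity an F1 metric reports when a response is compared with a reference answer. It is an illustration only: the function name `token_f1` and the whitespace tokenization are assumptions made for this example, not the SDK's implementation or API.

```python
from collections import Counter


def token_f1(response: str, ground_truth: str) -> float:
    """Token-overlap F1 between a response and a reference answer.

    Illustrative only: lowercases and splits on whitespace, which is
    an assumption of this sketch rather than the SDK's tokenization.
    """
    pred_tokens = response.lower().split()
    ref_tokens = ground_truth.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0

    # Multiset intersection: a shared token counts only as many times
    # as it appears in both the response and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A response that contains exactly the reference tokens in any order scores 1.0, and a response sharing no tokens scores 0.0. Overlap metrics like this need a reference answer, whereas the quality and safety evaluators listed above are described as model-based checks.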

When to Use It

  • Scoring RAG pipeline responses for groundedness against source documents
  • Running safety checks on chatbot outputs before production deployment
  • Batch evaluating a dataset of query/response pairs with multiple metrics
  • Logging evaluation runs to Azure AI Foundry for regression tracking
  • Building custom domain-specific evaluators for specialized content
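
The custom-evaluator and batch-evaluation use cases above can be sketched in plain Python. The example assumes a custom evaluator is simply a callable that takes row fields as keyword arguments and returns a dict of scores; whether the SDK accepts exactly this shape should be checked against its documentation. `KeywordCoverageEvaluator`, `evaluate_rows`, and the sample rows are hypothetical names and data invented for illustration, and no SDK calls are made.

```python
from typing import Callable, Dict, List


class KeywordCoverageEvaluator:
    """Hypothetical domain-specific evaluator (not part of the SDK).

    Scores the fraction of required terms that appear in a response.
    """

    def __init__(self, required_terms: List[str]):
        self.required_terms = [t.lower() for t in required_terms]

    def __call__(self, *, response: str, **kwargs) -> Dict[str, float]:
        # Extra row fields (e.g. "query") are accepted and ignored.
        text = response.lower()
        hits = sum(1 for term in self.required_terms if term in text)
        coverage = hits / len(self.required_terms) if self.required_terms else 0.0
        return {"keyword_coverage": coverage}


def evaluate_rows(
    rows: List[Dict[str, str]],
    evaluators: Dict[str, Callable[..., Dict[str, float]]],
) -> List[Dict[str, float]]:
    """Hypothetical batch loop: apply every evaluator to every row."""
    results = []
    for row in rows:
        scores: Dict[str, float] = {}
        for name, evaluator in evaluators.items():
            for metric, value in evaluator(**row).items():
                scores[f"{name}.{metric}"] = value
        results.append(scores)
    return results


# Made-up query/response pairs standing in for a real dataset.
rows = [
    {"query": "What is the refund window?",
     "response": "Refunds are accepted within 30 days with a receipt."},
    {"query": "How do I return an item?",
     "response": "Contact support."},
]
results = evaluate_rows(rows, {"coverage": KeywordCoverageEvaluator(["refund", "receipt"])})
```

Keeping each evaluator a small callable that returns a dict makes it easy to run several metrics over the same rows and compare the per-row scores between runs.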

Example Workflow

Here's how your AI assistant might use this skill in practice.

INPUT

User asks: "Score my RAG pipeline responses for groundedness against source documents."

AGENT
  1. Scoring RAG pipeline responses for groundedness against source documents
  2. Running safety checks on chatbot outputs before production deployment
  3. Batch evaluating a dataset of query/response pairs with multiple metrics
  4. Logging evaluation runs to Azure AI Foundry for regression tracking
  5. Building custom domain-specific evaluators for specialized content
OUTPUT
Azure AI Evaluation SDK for Python.

Security Audits

VirusTotal: Benign
OpenClaw: Benign

These signals reflect official OpenClaw status values. A Suspicious status means the skill should be used with extra caution.

Details

Language: Markdown
Last updated: Feb 26, 2026