AI Governance Leaderboard: We Scanned 21 Top Repos Before RSA 2026
RSA Conference 2026 starts March 23. Every AI security vendor will be on stage talking about governance, compliance, and responsible AI. We wanted to see what governance actually looks like in the repos people are shipping.
So we scanned 21 of the most popular AI/ML repositories using the same governance scanner anyone can run for free. No manual review. No subjective scoring. Just structural analysis of what each repo enforces automatically.
The results are not great.
The Numbers
- 21 repos scanned across AI agent frameworks, ML libraries, web frameworks, and AI SDKs
- Average score: 53/100 (grade C)
- Only 2 repos (10%) score 70+ and are on track for EU AI Act readiness
- Only 6 repos (29%) have any AI governance configuration (CLAUDE.md or .cursorrules)
- 1 repo scored an F
View the full interactive leaderboard
Top 5
| Rank | Repository | Score | Grade | EU AI Act |
|---|---|---|---|---|
| 1 | vllm-project/vllm | 78 | B | On track |
| 2 | BerriAI/litellm | 72 | B | On track |
| 3 | Significant-Gravitas/AutoGPT | 68 | B | Gaps identified |
| 4 | fastapi/fastapi | 62 | B | Gaps identified |
| 5 | langchain-ai/langchain | 61 | B | Gaps identified |
vLLM leads the pack at 78/100 with pre-commit hooks, 7 CI/CD workflows, a security policy, and Dependabot. Its one critical finding: 2 .env files committed to source control.
Bottom 3
| Rank | Repository | Score | Grade | EU AI Act |
|---|---|---|---|---|
| 19 | ollama/ollama | 36 | D | Not ready |
| 20 | microsoft/autogen | 30 | D | Not ready |
| 21 | yoheinakajima/babyagi | 17 | F | Not ready |
BabyAGI's 17/100 is the lowest score in the set. No CI/CD pipeline, no enforcement hooks, no security policy, no governance config. It scores points only for having a test directory and basic project hygiene.
The Pattern: CI/CD Without Enforcement
The most striking finding across all 21 repos: nearly every project has CI/CD, but almost none enforce rules structurally.
Most repos scored 15/15 on CI/CD. They have GitHub Actions. They run tests in the pipeline. That part of modern software development is well-adopted.
But enforcement -- pre-commit hooks, commit-lint, CODEOWNERS, branch protection -- averages only 11/30 across all repos. This is the gap. Rules exist in documentation but are not structurally enforced before code enters the pipeline.
This is exactly what we call the "detection gap" in the enforcement ladder framework. You can detect violations in CI, but by then the code is already committed. Structural enforcement catches problems before they enter the system.
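Enforcement at this level usually starts with a pre-commit configuration. The sketch below uses standard hooks from the pre-commit project's own hooks repo; it is a generic illustration, not taken from any of the scanned repositories:

```yaml
# .pre-commit-config.yaml -- runs on the developer's machine before each
# commit, so violations are blocked before they ever reach CI.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: detect-private-key        # catches committed secrets up front
      - id: check-added-large-files   # blocks accidental binary/data commits
      - id: end-of-file-fixer
```

Notably, a hook like detect-private-key would have flagged the committed .env files we found in otherwise high-scoring repos before they ever entered history.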
AI Governance Is Nearly Absent
Only 6 of 21 repos (29%) have any AI governance configuration -- a CLAUDE.md file or .cursorrules. This means that in 71% of the most popular AI/ML repos, AI coding tools operate with zero structural guidance.
When a developer uses Cursor, Claude Code, or GitHub Copilot on these repos, the AI has no project-specific rules to follow. No constraints on what it can modify. No enforced patterns. The governance score for these repos on this dimension: 0/15.
The repos that do have governance configs: vLLM, LiteLLM, AutoGPT, LangChain, Transformers, and LocalAI.
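For projects in the other 71%, the starting point is a short rules file. The contents below are a generic illustration of the kinds of constraints such a file can encode; the directory names and rules are placeholders, not taken from any scanned repo:

```markdown
# CLAUDE.md -- project rules for AI coding tools (illustrative)

- Never modify files under `generated/` or database migration directories.
- Every new function requires a corresponding test under `tests/`.
- Run the project's pre-commit hooks before proposing any commit.
- Do not add dependencies without updating the lockfile.
```

Even a file this small moves a repo from 0/15 to a nonzero governance score, because the check is structural: the file either exists in the tree or it does not.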
What the Scores Mean
Our scanner evaluates 6 dimensions (100 points total):
- Enforcement (30 pts): Pre-commit hooks, commit-lint, CODEOWNERS, branch protection
- CI/CD (15 pts): GitHub Actions, Travis CI, CircleCI workflows
- Security (20 pts): Security policy, .gitignore, no committed .env files, Dependabot/Renovate
- Testing (10 pts): Test configuration files, test directories
- Governance (15 pts): CLAUDE.md, .cursorrules, governance directories
- Hygiene (10 pts): README, CONTRIBUTING, LICENSE, CHANGELOG, lockfiles
Grades: A (80+), B (60-79), C (40-59), D (20-39), F (below 20).
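The weighting and grade bands above can be sketched in a few lines. This is an illustrative reimplementation using the caps and thresholds listed in this post, not the scanner's actual source; the dimension key names are our own shorthand:

```python
# Illustrative sketch of the 100-point weighting and grade bands
# described above -- not the scanner's real implementation.
DIMENSION_CAPS = {
    "enforcement": 30,  # pre-commit hooks, commit-lint, CODEOWNERS, branch protection
    "cicd": 15,         # GitHub Actions, Travis CI, CircleCI workflows
    "security": 20,     # security policy, .gitignore, no .env files, Dependabot/Renovate
    "testing": 10,      # test configuration files, test directories
    "governance": 15,   # CLAUDE.md, .cursorrules, governance directories
    "hygiene": 10,      # README, CONTRIBUTING, LICENSE, CHANGELOG, lockfiles
}

def total_score(points: dict) -> int:
    """Sum per-dimension points, clamping each dimension to its cap."""
    return sum(min(points.get(d, 0), cap) for d, cap in DIMENSION_CAPS.items())

def grade(score: int) -> str:
    """Map a 0-100 score to the letter grades used in this leaderboard."""
    for threshold, letter in ((80, "A"), (60, "B"), (40, "C"), (20, "D")):
        if score >= threshold:
            return letter
    return "F"

print(grade(78), grade(17))  # -> B F
```

Because the caps sum to exactly 100, a repo cannot compensate for missing enforcement by over-performing on another dimension.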
Category Breakdown
AI Agent Frameworks (8 repos, avg 47/100)
The agent frameworks -- the repos building autonomous AI systems -- scored the lowest as a category. AutoGPT leads at 68, but BabyAGI (17), Autogen (30), and SuperAGI (41) drag the average down. These are the repos building systems that make autonomous decisions, and they have the least governance infrastructure.
ML Libraries (3 repos, avg 62/100)
vLLM (78) lifts this category. scikit-learn and Transformers both score 54 -- solid CI/CD and testing, but weak on enforcement and governance.
Web Frameworks (3 repos, avg 58/100)
FastAPI (62), Pydantic (59), Django (54). These established projects have mature CI/CD but mostly lack AI governance configs and full enforcement tooling.
AI SDKs (4 repos, avg 56/100)
The Anthropic SDK (55), OpenAI SDK (53), LlamaIndex (58), and DSPy (56) cluster tightly in the C range. The Anthropic SDK notably has no pre-commit hooks despite being from the company that makes Claude.
Local AI / Inference (3 repos, avg 53/100)
LiteLLM (72) stands out. Ollama (36) is the weakest -- no enforcement hooks, no test infrastructure detected, and no governance config.
Methodology
All scans were run on March 16, 2026 using the Walseth AI Governance Scanner -- the same tool available for free at walseth.ai/scan. Scores are point-in-time snapshots based on the default branch at scan time.
The scanner analyzes the file tree of each repository via the GitHub API. It checks for the presence of specific files and directories that indicate structural governance. It does not read file contents beyond filenames and paths.
Repos that fail to scan (private, rate-limited, or not found) are excluded. All 21 repos in this leaderboard scanned successfully.
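The presence checks can be modeled as a pure function over a repository's flattened file listing, the kind returned by the GitHub git/trees API with recursive listing enabled. The marker paths below are plausible examples consistent with the dimensions in this post; the scanner's exact marker list is not published here:

```python
# Sketch of a filename-presence check over a repo file tree, per the
# methodology above: only filenames and paths are inspected, never contents.
GOVERNANCE_MARKERS = {
    "precommit": {".pre-commit-config.yaml"},
    "codeowners": {"CODEOWNERS", ".github/CODEOWNERS", "docs/CODEOWNERS"},
    "security_policy": {"SECURITY.md", ".github/SECURITY.md"},
    "ai_config": {"CLAUDE.md", ".cursorrules"},
}

def detect_markers(file_paths: list) -> dict:
    """Return which marker groups are present in the given file tree."""
    paths = set(file_paths)
    return {name: bool(paths & markers) for name, markers in GOVERNANCE_MARKERS.items()}

tree = ["README.md", ".pre-commit-config.yaml", "CLAUDE.md", "src/main.py"]
print(detect_markers(tree))
# {'precommit': True, 'codeowners': False, 'security_policy': False, 'ai_config': True}
```

One consequence of this design: the check is cheap and deterministic, but it cannot tell a thorough CLAUDE.md from an empty one, which is why scores are structural indicators rather than quality audits.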
What Would It Take to Score an A?
No repo in this scan scored an A (80+). To get there, a project would need:
- Pre-commit hooks AND commit-lint AND CODEOWNERS (25/30 enforcement)
- 3+ CI/CD workflows (15/15)
- Security policy + Dependabot + no committed .env files (17-20/20)
- Test config + test directories (10/10)
- CLAUDE.md or .cursorrules + governance directory (15/15)
- README + CONTRIBUTING + LICENSE + lockfile (8-10/10)
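Of the enforcement items, CODEOWNERS is often the cheapest to add. A minimal sketch follows; the team handles and paths are placeholders, and review is only enforced once branch protection enables "Require review from Code Owners":

```
# .github/CODEOWNERS -- GitHub requests review from matching owners
# on every pull request that touches these paths.
*                    @org/maintainers
/.github/workflows/  @org/release-eng
```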
The tooling exists. The patterns are well-understood. Most projects just have not prioritized structural enforcement alongside their CI/CD pipelines.
Scan Your Own Repo
Every score in this leaderboard was generated by the same free scanner you can run right now:
Scan your repo free at walseth.ai/scan
Want a deeper analysis? Our $497 Full Governance Report covers 30+ dimensions with specific remediation steps and a compliance roadmap.
View the full interactive leaderboard with sortable columns
Last scanned: March 16, 2026. Scores are point-in-time snapshots. Run the scanner to get the latest score for any repo.
Related Articles
- Mapping the Enforcement Ladder to NIST AI RMF: A Compliance Crosswalk (11 min read). NIST AI Risk Management Framework defines four functions: Govern, Map, Measure, Manage. Here is how structural enforcement maps to each function -- with a concrete crosswalk table for compliance teams.
- AI Coding Agents Need Enforcement Ladders, Not More Prompts (4 min read). 75% of AI coding models introduce regressions on sustained maintenance. The fix is not better prompts -- it is structural enforcement at five levels, from conversation to pre-commit hooks.
- Your AI Agent Forgets Its Rules Every 45 Minutes — Here's the Fix (5 min read). Every long-running AI agent hits context compression. Your system prompts, project rules, and behavioral constraints get silently dropped. Here's a production-proven hook that flushes critical knowledge to persistent storage before compression hits.
- Framework Governance Scores. See how major AI/ML frameworks score on enforcement posture, context hygiene, and EU AI Act readiness.
Want to know where your AI governance stands?
Get a Free Governance Audit