AI Governance Leaderboard: We Scanned 21 Top Repos Before RSA 2026
RSA Conference 2026 starts March 23. Every AI security vendor will be on stage talking about governance, compliance, and responsible AI. We wanted to see what governance actually looks like in the repos people are shipping.
So we scanned 21 of the most popular AI/ML repositories using the same governance scanner anyone can run for free. No manual review. No subjective scoring. Just structural analysis of what each repo enforces automatically.
The results are not great.
The Numbers
- 21 repos scanned across AI agent frameworks, ML libraries, web frameworks, and AI SDKs
- Average score: 53/100 (grade C)
- Only 2 repos (10%) score 70+ and are on track for EU AI Act readiness
- Only 6 repos (29%) have any AI governance configuration (a CLAUDE.md or .cursorrules file)
- 1 repo scored an F
View the full interactive leaderboard
Top 5
| Rank | Repository | Score | Grade | EU AI Act |
|---|---|---|---|---|
| 1 | vllm-project/vllm | 78 | B | On track |
| 2 | BerriAI/litellm | 72 | B | On track |
| 3 | Significant-Gravitas/AutoGPT | 68 | B | Gaps identified |
| 4 | fastapi/fastapi | 62 | B | Gaps identified |
| 5 | langchain-ai/langchain | 61 | B | Gaps identified |
vLLM leads the pack at 78/100 with pre-commit hooks, 7 CI/CD workflows, a security policy, and Dependabot. Its one critical finding: two .env files committed to source control.
Bottom 3
| Rank | Repository | Score | Grade | EU AI Act |
|---|---|---|---|---|
| 19 | ollama/ollama | 36 | D | Not ready |
| 20 | microsoft/autogen | 30 | D | Not ready |
| 21 | yoheinakajima/babyagi | 17 | F | Not ready |
BabyAGI's 17/100 is the lowest score in the set. No CI/CD pipeline, no enforcement hooks, no security policy, no governance config. It scores points only for having a test directory and basic project hygiene.
The Pattern: CI/CD Without Enforcement
The most striking finding across all 21 repos: nearly every project has CI/CD, but almost none enforce rules structurally.
Most repos scored 15/15 on CI/CD. They have GitHub Actions. They run tests in the pipeline. That part of modern software development is well-adopted.
But enforcement -- pre-commit hooks, commit-lint, CODEOWNERS, branch protection -- averages only 11/30 across all repos. This is the gap. Rules exist in documentation but are not structurally enforced before code enters the pipeline.
This is exactly what we call the "detection gap" in the enforcement ladder framework. You can detect violations in CI, but by then the code is already committed. Structural enforcement catches problems before they enter the system.
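To make the distinction concrete, here is a minimal sketch of structural enforcement in Python: a pre-commit check that blocks committed .env files, the same class of finding flagged in vLLM. This is an illustrative script of ours, not the scanner's code; wire it into a hook via the pre-commit framework or .git/hooks/pre-commit.

```python
#!/usr/bin/env python3
"""Minimal pre-commit check: block .env files before they enter history."""
import subprocess
import sys

def staged_files() -> list[str]:
    # List files staged for the current commit; empty if git is unavailable.
    try:
        out = subprocess.run(
            ["git", "diff", "--cached", "--name-only"],
            capture_output=True, text=True,
        )
    except FileNotFoundError:
        return []
    if out.returncode != 0:
        return []
    return [line for line in out.stdout.splitlines() if line]

def violations(paths: list[str]) -> list[str]:
    # Flag any staged .env file (e.g. ".env", "config/.env.local").
    return [p for p in paths if p.split("/")[-1].startswith(".env")]

if __name__ == "__main__":
    bad = violations(staged_files())
    if bad:
        print("Blocked: .env files staged for commit:", ", ".join(bad))
        sys.exit(1)  # non-zero exit aborts the commit
```

Because the hook runs before the commit is created, the secret never enters history; a CI check at best detects it after the fact.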
AI Governance Is Nearly Absent
Only 6 of 21 repos (29%) have any AI governance configuration -- a CLAUDE.md file or .cursorrules. This means that in 71% of the most popular AI/ML repos, AI coding tools operate with zero structural guidance.
When a developer uses Cursor, Claude Code, or GitHub Copilot on these repos, the AI has no project-specific rules to follow. No constraints on what it can modify. No enforced patterns. The governance score for these repos on this dimension: 0/15.
The repos that do have governance configs: vLLM, LiteLLM, AutoGPT, LangChain, Transformers, and LocalAI.
What the Scores Mean
Our scanner evaluates 6 dimensions (100 points total):
- Enforcement (30 pts): Pre-commit hooks, commit-lint, CODEOWNERS, branch protection
- CI/CD (15 pts): GitHub Actions, Travis CI, CircleCI workflows
- Security (20 pts): Security policy, .gitignore, no committed .env files, Dependabot/Renovate
- Testing (10 pts): Test configuration files, test directories
- Governance (15 pts): CLAUDE.md, .cursorrules, governance directories
- Hygiene (10 pts): README, CONTRIBUTING, LICENSE, CHANGELOG, lockfiles
Grades: A (80+), B (60-79), C (40-59), D (20-39), F (below 20).
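The rubric arithmetic can be sketched in a few lines. The dimension weights and grade bands below come straight from the article; the function names and input shape are our own illustration, not the scanner's actual implementation.

```python
# Dimension weights from the rubric above (100 points total).
WEIGHTS = {
    "enforcement": 30,
    "cicd": 15,
    "security": 20,
    "testing": 10,
    "governance": 15,
    "hygiene": 10,
}

def total_score(dimension_points: dict[str, int]) -> int:
    # Sum per-dimension points, capping each dimension at its maximum weight.
    return sum(
        min(dimension_points.get(dim, 0), cap) for dim, cap in WEIGHTS.items()
    )

def grade(score: int) -> str:
    # Grade bands: A (80+), B (60-79), C (40-59), D (20-39), F (below 20).
    if score >= 80:
        return "A"
    if score >= 60:
        return "B"
    if score >= 40:
        return "C"
    if score >= 20:
        return "D"
    return "F"
```

Under these bands, vLLM's 78 is a B, the 53-point average is a C, and BabyAGI's 17 is an F, matching the tables above.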
Category Breakdown
AI Agent Frameworks (8 repos, avg 47/100)
The agent frameworks -- the repos building autonomous AI systems -- scored the lowest as a category. AutoGPT leads at 68, but BabyAGI (17), Autogen (30), and SuperAGI (41) drag the average down. The projects shipping systems that make autonomous decisions have the least governance infrastructure.
ML Libraries (3 repos, avg 62/100)
vLLM (78) lifts this category. scikit-learn and Transformers both score 54 -- solid CI/CD and testing, but weak on enforcement and governance.
Web Frameworks (3 repos, avg 58/100)
FastAPI (62), Pydantic (59), Django (54). These established projects have mature CI/CD but mostly lack AI governance configs and full enforcement tooling.
AI SDKs (4 repos, avg 56/100)
The Anthropic SDK (55), OpenAI SDK (53), LlamaIndex (58), and DSPy (56) cluster tightly in the C range. The Anthropic SDK notably has no pre-commit hooks despite being from the company that makes Claude.
Local AI / Inference (3 repos, avg 53/100)
LiteLLM (72) stands out. Ollama (36) is the weakest -- no enforcement hooks, no test infrastructure detected, and no governance config.
Methodology
All scans were run on March 16, 2026 using the Walseth AI Free Repo Scan, the same free scanner anyone can run at walseth.ai/scan. Scores are point-in-time snapshots based on the default branch at scan time.
The scanner analyzes the file tree of each repository via the GitHub API. It checks for the presence of specific files and directories that indicate structural governance. It does not read file contents beyond filenames and paths.
Repos that fail to scan (private, rate-limited, or not found) are excluded. All 21 repos in this leaderboard scanned successfully.
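The presence check at the core of this methodology can be sketched as follows. GitHub's recursive tree endpoint (`GET /repos/{owner}/{repo}/git/trees/{branch}?recursive=1`) returns every path in a repo; the check then reduces to set membership over those paths. The indicator filenames below are an illustrative subset chosen by us, not the scanner's full list.

```python
# Illustrative subset of indicator files per dimension; the real
# scanner's list is larger and may differ.
INDICATORS = {
    "enforcement": {".pre-commit-config.yaml", ".github/CODEOWNERS", "CODEOWNERS"},
    "security": {"SECURITY.md", ".github/dependabot.yml"},
    "governance": {"CLAUDE.md", ".cursorrules"},
}

def detect(paths: list[str]) -> dict[str, bool]:
    # Match on full paths and bare filenames only -- never file contents,
    # mirroring the scanner's filename-and-path-only approach.
    found = set(paths) | {p.split("/")[-1] for p in paths}
    return {dim: bool(files & found) for dim, files in INDICATORS.items()}
```

For example, a repo whose tree contains `.pre-commit-config.yaml` and `docs/CLAUDE.md` but no SECURITY.md would register enforcement and governance signals but miss the security indicator.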
What Would It Take to Score an A?
No repo in this scan scored an A (80+). To get there, a project would need:
- Pre-commit hooks AND commit-lint AND CODEOWNERS (25/30 enforcement)
- 3+ CI/CD workflows (15/15)
- Security policy + Dependabot + no committed .env files (17-20/20)
- Test config + test directories (10/10)
- CLAUDE.md or .cursorrules + governance directory (15/15)
- README + CONTRIBUTING + LICENSE + lockfile (8-10/10)
The tooling exists. The patterns are well-understood. Most projects just have not prioritized structural enforcement alongside their CI/CD pipelines.
Scan Your Own Repo
Every score in this leaderboard was generated by the same free scanner you can run right now:
Scan your repo free at walseth.ai/scan
Want a deeper analysis? Start with the scanner, then request the $5,000 Baseline Sprint when the repo signal shows a real gap and you need a remediation roadmap. Ask about monitoring only after baseline work exists.
View the full interactive leaderboard with sortable columns
Last scanned: March 16, 2026. Scores are point-in-time snapshots. Run the scanner to get the latest score for any repo.