AI Governance Leaderboard: We Scanned 21 Top Repos Before RSA 2026
RSA Conference 2026 starts March 23. Every AI security vendor will be on stage talking about governance, compliance, and responsible AI. We wanted to see what governance actually looks like in the repos people are shipping.
So we scanned 21 of the most popular AI/ML repositories using the same governance scanner anyone can run for free. No manual review. No subjective scoring. Just structural analysis of what each repo enforces automatically.
The results are not great.
The Numbers
- 21 repos scanned across AI agent frameworks, ML libraries, web frameworks, and AI SDKs
- Average score: 53/100 (grade C)
- Only 2 repos (10%) score 70+ and are on track for EU AI Act readiness
- Only 6 repos (29%) have any AI governance configuration (CLAUDE.md or .cursorrules)
- 1 repo scored an F
View the full interactive leaderboard
Top 5
| Rank | Repository | Score | Grade | EU AI Act |
|---|---|---|---|---|
| 1 | vllm-project/vllm | 78 | B | On track |
| 2 | BerriAI/litellm | 72 | B | On track |
| 3 | Significant-Gravitas/AutoGPT | 68 | B | Gaps identified |
| 4 | fastapi/fastapi | 62 | B | Gaps identified |
| 5 | langchain-ai/langchain | 61 | B | Gaps identified |
vLLM leads the pack at 78/100 with pre-commit hooks, 7 CI/CD workflows, a security policy, and Dependabot. Its one critical finding: 2 .env files committed to source control.
Bottom 3
| Rank | Repository | Score | Grade | EU AI Act |
|---|---|---|---|---|
| 19 | ollama/ollama | 36 | D | Not ready |
| 20 | microsoft/autogen | 30 | D | Not ready |
| 21 | yoheinakajima/babyagi | 17 | F | Not ready |
BabyAGI's 17/100 is the lowest score in the set. No CI/CD pipeline, no enforcement hooks, no security policy, no governance config. It scores points only for having a test directory and basic project hygiene.
The Pattern: CI/CD Without Enforcement
The most striking finding across all 21 repos: nearly every project has CI/CD, but almost none enforce rules structurally.
Most repos scored 15/15 on CI/CD. They have GitHub Actions. They run tests in the pipeline. That part of modern software development is well-adopted.
But enforcement -- pre-commit hooks, commit-lint, CODEOWNERS, branch protection -- averages only 11/30 across all repos. This is the gap. Rules exist in documentation but are not structurally enforced before code enters the pipeline.
This is exactly what we call the "detection gap" in the enforcement ladder framework. You can detect violations in CI, but by then the code is already committed. Structural enforcement catches problems before they enter the system.
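Enforcement at this level usually starts with a pre-commit configuration. The sketch below uses standard hooks from the pre-commit project's own hooks repo; it is a generic illustration, not taken from any of the scanned repositories:

```yaml
# .pre-commit-config.yaml -- runs on the developer's machine before each
# commit, so violations are blocked before they ever reach CI.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: detect-private-key        # catches committed secrets up front
      - id: check-added-large-files   # blocks accidental binary/data commits
      - id: end-of-file-fixer
```

Notably, a hook like detect-private-key would have flagged the committed .env files we found in otherwise high-scoring repos before they ever entered history.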
AI Governance Is Nearly Absent
Only 6 of 21 repos (29%) have any AI governance configuration -- a CLAUDE.md file or .cursorrules. This means that in 71% of the most popular AI/ML repos, AI coding tools operate with zero structural guidance.
When a developer uses Cursor, Claude Code, or GitHub Copilot on these repos, the AI has no project-specific rules to follow. No constraints on what it can modify. No enforced patterns. The governance score for these repos on this dimension: 0/15.
The repos that do have governance configs: vLLM, LiteLLM, AutoGPT, LangChain, Transformers, and LocalAI.
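For projects in the other 71%, the starting point is a short rules file. The contents below are a generic illustration of the kinds of constraints such a file can encode; the directory names and rules are placeholders, not taken from any scanned repo:

```markdown
# CLAUDE.md -- project rules for AI coding tools (illustrative)

- Never modify files under `generated/` or database migration directories.
- Every new function requires a corresponding test under `tests/`.
- Run the project's pre-commit hooks before proposing any commit.
- Do not add dependencies without updating the lockfile.
```

Even a file this small moves a repo from 0/15 to a nonzero governance score, because the check is structural: the file either exists in the tree or it does not.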
What the Scores Mean
Our scanner evaluates 6 dimensions (100 points total):
- Enforcement (30 pts): Pre-commit hooks, commit-lint, CODEOWNERS, branch protection
- CI/CD (15 pts): GitHub Actions, Travis CI, CircleCI workflows
- Security (20 pts): Security policy, .gitignore, no committed .env files, Dependabot/Renovate
- Testing (10 pts): Test configuration files, test directories
- Governance (15 pts): CLAUDE.md, .cursorrules, governance directories
- Hygiene (10 pts): README, CONTRIBUTING, LICENSE, CHANGELOG, lockfiles
Grades: A (80+), B (60-79), C (40-59), D (20-39), F (below 20).
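The weighting and grade bands above can be sketched in a few lines. This is an illustrative reimplementation using the caps and thresholds listed in this post, not the scanner's actual source; the dimension key names are our own shorthand:

```python
# Illustrative sketch of the 100-point weighting and grade bands
# described above -- not the scanner's real implementation.
DIMENSION_CAPS = {
    "enforcement": 30,  # pre-commit hooks, commit-lint, CODEOWNERS, branch protection
    "cicd": 15,         # GitHub Actions, Travis CI, CircleCI workflows
    "security": 20,     # security policy, .gitignore, no .env files, Dependabot/Renovate
    "testing": 10,      # test configuration files, test directories
    "governance": 15,   # CLAUDE.md, .cursorrules, governance directories
    "hygiene": 10,      # README, CONTRIBUTING, LICENSE, CHANGELOG, lockfiles
}

def total_score(points: dict) -> int:
    """Sum per-dimension points, clamping each dimension to its cap."""
    return sum(min(points.get(d, 0), cap) for d, cap in DIMENSION_CAPS.items())

def grade(score: int) -> str:
    """Map a 0-100 score to the letter grades used in this leaderboard."""
    for threshold, letter in ((80, "A"), (60, "B"), (40, "C"), (20, "D")):
        if score >= threshold:
            return letter
    return "F"

print(grade(78), grade(17))  # -> B F
```

Because the caps sum to exactly 100, a repo cannot compensate for missing enforcement by over-performing on another dimension.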
Category Breakdown
AI Agent Frameworks (8 repos, avg 47/100)
The agent frameworks -- the repos building autonomous AI systems -- scored the lowest as a category. AutoGPT leads at 68, but BabyAGI (17), Autogen (30), and SuperAGI (41) drag the average down. These are the repos building systems that make autonomous decisions, and they have the least governance infrastructure.
ML Libraries (3 repos, avg 62/100)
vLLM (78) lifts this category. scikit-learn and Transformers both score 54 -- solid CI/CD and testing, but weak on enforcement and governance.
Web Frameworks (3 repos, avg 58/100)
FastAPI (62), Pydantic (59), Django (54). These established projects have mature CI/CD but mostly lack AI governance configs and full enforcement tooling.
AI SDKs (4 repos, avg 56/100)
The Anthropic SDK (55), OpenAI SDK (53), LlamaIndex (58), and DSPy (56) cluster tightly in the C range. The Anthropic SDK notably has no pre-commit hooks despite being from the company that makes Claude.
Local AI / Inference (3 repos, avg 53/100)
LiteLLM (72) stands out. Ollama (36) is the weakest -- no enforcement hooks, no test infrastructure detected, and no governance config.
Methodology
All scans were run on March 16, 2026 using the Walseth AI Governance Scanner -- the same tool available for free at walseth.ai/scan. Scores are point-in-time snapshots based on the default branch at scan time.
The scanner analyzes the file tree of each repository via the GitHub API. It checks for the presence of specific files and directories that indicate structural governance. It does not read file contents beyond filenames and paths.
Repos that fail to scan (private, rate-limited, or not found) are excluded. All 21 repos in this leaderboard scanned successfully.
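The presence checks can be modeled as a pure function over a repository's flattened file listing, the kind returned by the GitHub git/trees API with recursive listing enabled. The marker paths below are plausible examples consistent with the dimensions in this post; the scanner's exact marker list is not published here:

```python
# Sketch of a filename-presence check over a repo file tree, per the
# methodology above: only filenames and paths are inspected, never contents.
GOVERNANCE_MARKERS = {
    "precommit": {".pre-commit-config.yaml"},
    "codeowners": {"CODEOWNERS", ".github/CODEOWNERS", "docs/CODEOWNERS"},
    "security_policy": {"SECURITY.md", ".github/SECURITY.md"},
    "ai_config": {"CLAUDE.md", ".cursorrules"},
}

def detect_markers(file_paths: list) -> dict:
    """Return which marker groups are present in the given file tree."""
    paths = set(file_paths)
    return {name: bool(paths & markers) for name, markers in GOVERNANCE_MARKERS.items()}

tree = ["README.md", ".pre-commit-config.yaml", "CLAUDE.md", "src/main.py"]
print(detect_markers(tree))
# {'precommit': True, 'codeowners': False, 'security_policy': False, 'ai_config': True}
```

One consequence of this design: the check is cheap and deterministic, but it cannot tell a thorough CLAUDE.md from an empty one, which is why scores are structural indicators rather than quality audits.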
What Would It Take to Score an A?
No repo in this scan scored an A (80+). To get there, a project would need:
- Pre-commit hooks AND commit-lint AND CODEOWNERS (25/30 enforcement)
- 3+ CI/CD workflows (15/15)
- Security policy + Dependabot + no committed .env files (17-20/20)
- Test config + test directories (10/10)
- CLAUDE.md or .cursorrules + governance directory (15/15)
- README + CONTRIBUTING + LICENSE + lockfile (8-10/10)
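Of the enforcement items, CODEOWNERS is often the cheapest to add. A minimal sketch follows; the team handles and paths are placeholders, and review is only enforced once branch protection enables "Require review from Code Owners":

```
# .github/CODEOWNERS -- GitHub requests review from matching owners
# on every pull request that touches these paths.
*                    @org/maintainers
/.github/workflows/  @org/release-eng
```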
The tooling exists. The patterns are well-understood. Most projects just have not prioritized structural enforcement alongside their CI/CD pipelines.
Scan Your Own Repo
Every score in this leaderboard was generated by the same free scanner you can run right now:
Scan your repo free at walseth.ai/scan
Want a deeper analysis? Our $497 Full Governance Report covers 30+ dimensions with specific remediation steps and a compliance roadmap.
View the full interactive leaderboard with sortable columns
Last scanned: March 16, 2026. Scores are point-in-time snapshots. Run the scanner to get the latest score for any repo.
Related Articles
- Mapping the Enforcement Ladder to NIST AI RMF: A Compliance Crosswalk (11 min read). NIST AI Risk Management Framework defines four functions: Govern, Map, Measure, Manage. Here is how structural enforcement maps to each function -- with a concrete crosswalk table for compliance teams.
- AI Coding Agents Need Enforcement Ladders, Not More Prompts (4 min read). 75% of AI coding models introduce regressions on sustained maintenance. The fix is not better prompts -- it is structural enforcement at five levels, from conversation to pre-commit hooks.
- Your AI Agent Forgets Its Rules Every 45 Minutes — Here's the Fix (5 min read). Every long-running AI agent hits context compression. Your system prompts, project rules, and behavioral constraints get silently dropped. Here's a production-proven hook that flushes critical knowledge to persistent storage before compression hits.
- Framework Governance Scores. See how major AI/ML frameworks score on enforcement posture, context hygiene, and EU AI Act readiness.
Want to know where your AI governance stands?
Get a Free Governance Audit