scikit-learn Governance Audit
scikit-learn scores 18/100 on enforcement posture -- the ML library powering insurance underwriting, medical diagnostics, and fraud detection has zero hardcoded secrets (best in our portfolio) but zero enforcement hooks and no AI agent instructions.
Overall Score: 18/100 (Grade: F)
Executive Summary
scikit-learn is the foundational machine learning library for the Python ecosystem, with 60,000+ GitHub stars and ubiquitous adoption across safety-critical domains including insurance underwriting (Gradient AI), medical diagnostics, fraud detection, and credit scoring. It is the default ML toolkit for enterprises building regulated AI systems.
An automated governance audit reveals that despite scikit-learn's maturity and excellent security posture (zero hardcoded secrets -- the best in our portfolio), the project has critical structural gaps in AI governance. Tests exist but are embedded within package directories, and the absence of enforcement hooks and agent instructions leaves a structural governance gap in safety-critical ML infrastructure.
Enforcement Ladder Distribution
- L5 (Hooks): No automated enforcement before commits or tool use
- L4 (Tests): Tests exist within sklearn/ packages (sklearn/tests/, sklearn/cluster/tests/) but are not discovered at root
- L3 (CI): Strong CI pipeline with multi-platform testing
- L2 (Prose rules): AGENTS.md present but no CLAUDE.md or agent-specific enforcement rules
- L1 (Default): Default mode for all AI interactions
Diagnosis: scikit-learn has strong L3 investment (21 GitHub Actions + CircleCI + Makefile) and embedded L4 tests, but zero L2 (prose rules) and L5 (hooks). The test discovery gap creates a misleading governance picture -- the project has more test infrastructure than the score suggests, but the lack of enforcement hooks means AI agents operate with zero structural guardrails on a library used in safety-critical ML pipelines.
Critical Gaps Found
1. No L5 (Hook) Enforcement [CRITICAL]
No pre-commit hooks or Claude Code hooks were found. AI agents can modify any module -- including safety-critical estimators used in medical diagnostics and credit scoring -- without structural gatekeeping. A subtle change to a default parameter in sklearn.linear_model could silently affect thousands of downstream models.
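As a sketch of what minimal L5 gating could look like, the snippet below blocks commits that touch designated estimator paths. The protected prefixes and the hook wiring are illustrative assumptions for this audit, not scikit-learn conventions:

```python
import subprocess

# Hypothetical list of safety-critical paths to gate; a real deployment
# would tune these prefixes to the project's own risk map.
PROTECTED_PREFIXES = ("sklearn/linear_model/", "sklearn/tree/")

def protected_changes(changed_files, prefixes=PROTECTED_PREFIXES):
    """Return the subset of changed files that fall under a protected prefix."""
    return [f for f in changed_files if f.startswith(prefixes)]

def staged_files():
    """Ask git for the files staged in the current commit."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

# Wired up as .git/hooks/pre-commit, the script would exit non-zero
# (aborting the commit) when protected paths are touched:
#   hits = protected_changes(staged_files())
#   if hits:
#       print("Blocked: protected estimator modules modified:", *hits)
#       raise SystemExit(1)
```

A hook like this does not replace review; it simply forces a human checkpoint before AI-generated changes reach gated modules.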
2. Test Discovery Gap [CRITICAL]
The scanner detects 0 test files at the root because tests are embedded within sklearn/ package directories -- a standard Python packaging pattern. While idiomatic, this creates governance visibility issues. Note: this is a scanner limitation, not a project deficiency.
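A scanner that follows Python's embedded-test convention only needs a recursive glob rather than a root-level lookup. The sketch below (file names are illustrative) demonstrates discovery against a throwaway tree shaped like scikit-learn's layout:

```python
import tempfile
from pathlib import Path

def find_test_files(root):
    """Recursively collect test modules, including those embedded in
    package subdirectories rather than a top-level tests/ folder."""
    return sorted(
        p.relative_to(root).as_posix() for p in Path(root).rglob("test_*.py")
    )

# Demo on a throwaway tree mimicking the embedded layout.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("sklearn/tests/test_base.py", "sklearn/cluster/tests/test_kmeans.py"):
        f = Path(tmp) / rel
        f.parent.mkdir(parents=True, exist_ok=True)
        f.touch()
    print(find_test_files(tmp))
    # -> ['sklearn/cluster/tests/test_kmeans.py', 'sklearn/tests/test_base.py']
```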
3. No CLAUDE.md / Agent Instructions [HIGH]
No CLAUDE.md or equivalent AI agent instruction file was found. AGENTS.md is present (early governance awareness) but does not provide enforcement-level instructions. Every AI session starts from zero context on scikit-learn's complex estimator interface and API design patterns.
4. High TODO/FIXME Debt [MEDIUM]
The scan found 624 TODO/FIXME/HACK markers across the codebase, and there is no systematic process for converting them into actionable work items. AI agents may encounter and incorrectly "fix" TODO items in safety-critical code paths.
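One way to make a marker count like this reproducible and trackable over time is a small tally script. The following is a generic sketch, not the auditor's actual tooling:

```python
import re
from pathlib import Path

MARKERS = re.compile(r"\b(TODO|FIXME|HACK)\b")

def count_debt_markers(root):
    """Tally TODO/FIXME/HACK markers across Python sources under root."""
    counts = {"TODO": 0, "FIXME": 0, "HACK": 0}
    for path in Path(root).rglob("*.py"):
        # errors="ignore" skips undecodable bytes rather than crashing.
        for tag in MARKERS.findall(path.read_text(errors="ignore")):
            counts[tag] += 1
    return counts
```

Run weekly in CI, a tally like this turns an opaque debt figure into a trend line a governance process can act on.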
Positive: Zero Hardcoded Secrets
scikit-learn has 0 potential hardcoded secrets detected across 660 source files -- the best security posture in our entire audit portfolio. This demonstrates clean credential hygiene and a security-conscious development culture that other projects should emulate.
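For teams that want to keep a record like this clean in CI, a minimal line-level detector can be sketched as below. The two patterns are illustrative examples only; dedicated scanners ship far larger rule sets:

```python
import re

# Illustrative credential shapes: quoted assignments to secret-like names,
# and the AWS access-key-id prefix pattern.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(api[_-]?key|secret|password|token)\b\s*=\s*['\"][^'\"]{8,}['\"]"),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
]

def looks_like_secret(line):
    """Heuristic: does this source line resemble a hardcoded credential?"""
    return any(p.search(line) for p in SECRET_PATTERNS)
```

A check like this in a pre-commit hook helps preserve the zero-secrets posture as the contributor base (human and AI) grows.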
EU AI Act Compliance Mapping
scikit-learn is the foundational ML library underlying many high-risk AI systems (EU AI Act enforcement deadline: August 2, 2026). Organizations using scikit-learn in insurance underwriting, medical diagnostics, fraud detection, or credit scoring must ensure governance extends through the library layer.
Article 9: Risk Management System
| Requirement | Readiness |
|---|---|
| 9(2)(a) Risk identification | 10% |
| 9(2)(b) Risk evaluation | 5% |
| 9(2)(d) Risk management measures | 15% |
| 9(6) Testing for risk management | 30% |
| 9(7) Lifecycle risk management | 5% |
Article 15: Accuracy, Robustness and Cybersecurity
| Requirement | Readiness |
|---|---|
| 15(1) Accuracy levels | 25% |
| 15(2) Error resilience | 20% |
| 15(3) Manipulation robustness | 5% |
| 15(4) Cybersecurity | 40% |
Article 17: Quality Management System
| Requirement | Readiness |
|---|---|
| 17(1)(a) Compliance strategy | 5% |
| 17(1)(b) Design/development procedures | 20% |
| 17(1)(c) Test/validation procedures | 25% |
| 17(1)(g) Post-market monitoring | 0% |
This is especially concerning for the ML library most commonly used in regulated domains. Organizations building high-risk AI systems with scikit-learn inherit these governance gaps unless they implement their own enforcement layer.
Recommendations
Immediate (Week 1)
- Create CLAUDE.md with estimator interface requirements, API compatibility rules, deprecation workflow, and testing requirements -- 1 hour effort, high impact
- Add 3 pre-commit hooks for estimator interface validation, parameter deprecation checks, and test co-location -- 2 hours effort
- Add root-level test orchestration to improve governance tool compatibility -- 30 minutes effort
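The root-level orchestration item above could be as small as a hypothetical runtests.py at the repository root, giving generic governance scanners (which often look only for top-level test entry points) something to find. The --pyargs flag and file name are assumptions for illustration:

```python
def build_pytest_command(packages=("sklearn",), extra_args=()):
    """Assemble a pytest invocation that reaches tests embedded inside
    package directories (e.g. sklearn/tests/) from the repository root."""
    # --pyargs tells pytest to resolve 'sklearn' as an importable package
    # rather than a filesystem path.
    return ["pytest", "--pyargs", *packages, *extra_args]

# Wiring sketch (not executed here):
#   import subprocess, sys
#   sys.exit(subprocess.call(build_pytest_command(extra_args=tuple(sys.argv[1:]))))
```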
Short-term (Month 1)
- Deploy L5 enforcement hooks for safety-critical estimator paths
- Set up violation tracking to build a risk register from enforcement data
- Create AI agent governance documentation mapping to EU AI Act articles
Strategic (Quarter)
- Build enforcement ladder documentation mapping to EU AI Act requirements
- Establish violation tracking across contributor AI tool usage
- Automated rule optimization -- auto-tune enforcement rules based on violation patterns
Appendix: Raw Scan Data
Test files show 0 at the root level. Tests are embedded within sklearn/ package directories (sklearn/tests/, sklearn/cluster/tests/, etc.) following standard Python packaging conventions.
Want this analysis for your codebase?
Get the same structural governance audit -- risk classification, violation scan, and enforcement recommendations.
Request a Free Audit