scikit-learn Governance Audit
scikit-learn scores 18/100 on enforcement posture -- the ML library powering insurance underwriting, medical diagnostics, and fraud detection has zero hardcoded secrets (best in our portfolio) but zero enforcement hooks and no AI agent instructions.
Overall Score: 18/100 (Grade: F)
Executive Summary
scikit-learn is the foundational machine learning library for the Python ecosystem, with 60,000+ GitHub stars and ubiquitous adoption across safety-critical domains including insurance underwriting (Gradient AI), medical diagnostics, fraud detection, and credit scoring. It is the default ML toolkit for enterprises building regulated AI systems.
An automated governance audit reveals that despite scikit-learn's maturity and excellent security posture (zero hardcoded secrets -- the best in our portfolio), the project has critical structural gaps in AI governance. Tests exist but are embedded within package directories, and the absence of enforcement hooks and agent instructions leaves a structural governance gap in safety-critical ML infrastructure.
Enforcement Ladder Distribution
- L5 (Hooks): No automated enforcement before commits or tool use
- L4 (Tests): Tests exist within sklearn/ packages (sklearn/tests/, sklearn/cluster/tests/) but are not discovered at root
- L3 (CI): Strong CI pipeline with multi-platform testing
- L2 (Prose rules): AGENTS.md present but no CLAUDE.md or agent-specific enforcement rules
- L1 (Default): Default mode for all AI interactions
Diagnosis: scikit-learn has strong L3 investment (21 GitHub Actions + CircleCI + Makefile) and embedded L4 tests, but zero L2 (prose rules) and L5 (hooks). The test discovery gap creates a misleading governance picture -- the project has more test infrastructure than the score suggests, but the lack of enforcement hooks means AI agents operate with zero structural guardrails on a library used in safety-critical ML pipelines.
Critical Gaps Found
1. No L5 (Hook) Enforcement [CRITICAL]
No pre-commit hooks or Claude Code hooks were found. AI agents can modify any module -- including safety-critical estimators used in medical diagnostics and credit scoring -- without structural gatekeeping. A subtle change to a default parameter in sklearn.linear_model could silently affect thousands of downstream models.
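As a sketch of what minimal L5 gating could look like, the snippet below blocks commits that touch designated estimator paths. The protected prefixes and the hook wiring are illustrative assumptions for this audit, not scikit-learn conventions:

```python
import subprocess

# Hypothetical list of safety-critical paths to gate; a real deployment
# would tune these prefixes to the project's own risk map.
PROTECTED_PREFIXES = ("sklearn/linear_model/", "sklearn/tree/")

def protected_changes(changed_files, prefixes=PROTECTED_PREFIXES):
    """Return the subset of changed files that fall under a protected prefix."""
    return [f for f in changed_files if f.startswith(prefixes)]

def staged_files():
    """Ask git for the files staged in the current commit."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

# Wired up as .git/hooks/pre-commit, the script would exit non-zero
# (aborting the commit) when protected paths are touched:
#   hits = protected_changes(staged_files())
#   if hits:
#       print("Blocked: protected estimator modules modified:", *hits)
#       raise SystemExit(1)
```

A hook like this does not replace review; it simply forces a human checkpoint before AI-generated changes reach gated modules.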
2. Test Discovery Gap [CRITICAL]
The scanner detects 0 test files at the root because tests are embedded within sklearn/ package directories -- a standard Python packaging pattern. While idiomatic, this creates governance visibility issues. Note: this is a scanner limitation, not a project deficiency.
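A scanner that follows Python's embedded-test convention only needs a recursive glob rather than a root-level lookup. The sketch below (file names are illustrative) demonstrates discovery against a throwaway tree shaped like scikit-learn's layout:

```python
import tempfile
from pathlib import Path

def find_test_files(root):
    """Recursively collect test modules, including those embedded in
    package subdirectories rather than a top-level tests/ folder."""
    return sorted(
        p.relative_to(root).as_posix() for p in Path(root).rglob("test_*.py")
    )

# Demo on a throwaway tree mimicking the embedded layout.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("sklearn/tests/test_base.py", "sklearn/cluster/tests/test_kmeans.py"):
        f = Path(tmp) / rel
        f.parent.mkdir(parents=True, exist_ok=True)
        f.touch()
    print(find_test_files(tmp))
    # -> ['sklearn/cluster/tests/test_kmeans.py', 'sklearn/tests/test_base.py']
```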
3. No CLAUDE.md / Agent Instructions [HIGH]
No CLAUDE.md or equivalent AI agent instruction file was found. AGENTS.md is present (early governance awareness) but does not provide enforcement-level instructions. Every AI session starts from zero context on scikit-learn's complex estimator interface and API design patterns.
4. High TODO/FIXME Debt [MEDIUM]
The scan found 624 TODO/FIXME/HACK markers across the codebase, and there is no systematic process for converting them into actionable work items. AI agents may encounter and incorrectly "fix" TODO items in safety-critical code paths.
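One way to make a marker count like this reproducible and trackable over time is a small tally script. The following is a generic sketch, not the auditor's actual tooling:

```python
import re
from pathlib import Path

MARKERS = re.compile(r"\b(TODO|FIXME|HACK)\b")

def count_debt_markers(root):
    """Tally TODO/FIXME/HACK markers across Python sources under root."""
    counts = {"TODO": 0, "FIXME": 0, "HACK": 0}
    for path in Path(root).rglob("*.py"):
        # errors="ignore" skips undecodable bytes rather than crashing.
        for tag in MARKERS.findall(path.read_text(errors="ignore")):
            counts[tag] += 1
    return counts
```

Run weekly in CI, a tally like this turns an opaque debt figure into a trend line a governance process can act on.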
Positive: Zero Hardcoded Secrets
scikit-learn has 0 potential hardcoded secrets detected across 660 source files -- the best security posture in our entire audit portfolio. This demonstrates clean credential hygiene and a security-conscious development culture that other projects should emulate.
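For teams that want to keep a record like this clean in CI, a minimal line-level detector can be sketched as below. The two patterns are illustrative examples only; dedicated scanners ship far larger rule sets:

```python
import re

# Illustrative credential shapes: quoted assignments to secret-like names,
# and the AWS access-key-id prefix pattern.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(api[_-]?key|secret|password|token)\b\s*=\s*['\"][^'\"]{8,}['\"]"),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
]

def looks_like_secret(line):
    """Heuristic: does this source line resemble a hardcoded credential?"""
    return any(p.search(line) for p in SECRET_PATTERNS)
```

A check like this in a pre-commit hook helps preserve the zero-secrets posture as the contributor base (human and AI) grows.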
EU AI Act Compliance Mapping
scikit-learn is the foundational ML library underlying many high-risk AI systems (EU AI Act enforcement deadline: August 2, 2026). Organizations using scikit-learn in insurance underwriting, medical diagnostics, fraud detection, or credit scoring must ensure governance extends through the library layer.
Article 9: Risk Management System
| Requirement | Readiness |
|---|---|
| 9(2)(a) Risk identification | 10% |
| 9(2)(b) Risk evaluation | 5% |
| 9(2)(d) Risk management measures | 15% |
| 9(6) Testing for risk management | 30% |
| 9(7) Lifecycle risk management | 5% |
Article 15: Accuracy, Robustness and Cybersecurity
| Requirement | Readiness |
|---|---|
| 15(1) Accuracy levels | 25% |
| 15(2) Error resilience | 20% |
| 15(3) Manipulation robustness | 5% |
| 15(4) Cybersecurity | 40% |
Article 17: Quality Management System
| Requirement | Readiness |
|---|---|
| 17(1)(a) Compliance strategy | 5% |
| 17(1)(b) Design/development procedures | 20% |
| 17(1)(c) Test/validation procedures | 25% |
| 17(1)(g) Post-market monitoring | 0% |
This is especially concerning for the ML library most commonly used in regulated domains. Organizations building high-risk AI systems with scikit-learn inherit these governance gaps unless they implement their own enforcement layer.
Recommendations
Immediate (Week 1)
- Create CLAUDE.md with estimator interface requirements, API compatibility rules, deprecation workflow, and testing requirements -- 1 hour effort, high impact
- Add 3 pre-commit hooks for estimator interface validation, parameter deprecation checks, and test co-location -- 2 hours effort
- Add root-level test orchestration to improve governance tool compatibility -- 30 minutes effort
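The root-level orchestration item above could be as small as a hypothetical runtests.py at the repository root, giving generic governance scanners (which often look only for top-level test entry points) something to find. The --pyargs flag and file name are assumptions for illustration:

```python
def build_pytest_command(packages=("sklearn",), extra_args=()):
    """Assemble a pytest invocation that reaches tests embedded inside
    package directories (e.g. sklearn/tests/) from the repository root."""
    # --pyargs tells pytest to resolve 'sklearn' as an importable package
    # rather than a filesystem path.
    return ["pytest", "--pyargs", *packages, *extra_args]

# Wiring sketch (not executed here):
#   import subprocess, sys
#   sys.exit(subprocess.call(build_pytest_command(extra_args=tuple(sys.argv[1:]))))
```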
Short-term (Month 1)
- Deploy L5 enforcement hooks for safety-critical estimator paths
- Set up violation tracking to build a risk register from enforcement data
- Create AI agent governance documentation mapping to EU AI Act articles
Strategic (Quarter)
- Build enforcement ladder documentation mapping to EU AI Act requirements
- Establish violation tracking across contributor AI tool usage
- Automated rule optimization -- auto-tune enforcement rules based on violation patterns
Appendix: Raw Scan Data
Test files show 0 at the root level. Tests are embedded within sklearn/ package directories (sklearn/tests/, sklearn/cluster/tests/, etc.) following standard Python packaging conventions.
Want this analysis for your codebase?
Get the same structural governance audit -- risk classification, violation scan, and enforcement recommendations.
Request a Free Audit