The CPU Model Development Framework

The CPU Model Development Framework is designed to organize the research and development work of creating machine learning models. People get used to Agile, where the arrival of new features and capabilities indicates progress. So when modeling work happens, people don’t know how to discuss progress beyond “It’s under development".

At Komodo, we expect better. Stakeholder understanding and participation is a key ingredient in creating a powerful custom model. Our proprietary CPU framework helps stakeholders follow and participate in the highly explorative nature of modeling.

The framework’s name, CPU, reflects its three foundational pillars:

C - Complexity: Managing sample size, data quality, and data selection.
P - Power: Improving predictive power through better algorithms and deriving features through data engineering.
U - Understanding: Using your head to apply business and domain knowledge to create the ah-ha moments necessary for modeling breakthroughs.

Modeling is analytical work that requires continuous experimentation and iteration. Instead of following a linear path, it involves creating and refining different model versions and managing trade-offs between complexity, predictive power, and business relevance. The CPU framework enhances transparency throughout the entire process, from initial data selection to final model refinement.

The A.B.Cs of Model Development

Models are named using a hierarchical naming system with three significant digits (A.B.C) to track progress and changes:

A: Model Stage – Initial (0), Formal (1), or Final (2).
B: Data Input Changes – New features or samples added, describing progress in data engineering.
C: Algorithm Changes – Switching models, modifying hyperparameters, or changing architectures.

This approach helps to standardize model naming and provides clear indicators of where progress is being made.

More On Model Stages

Initial Stage (A=0)

Goal: Data Selection
Focus: Identifying Grain, Sample Size, and Original Data Features.
Approach: Use simple, interpretable models (e.g., Decision Trees) to calculate directional impact of new features and assess feature importance.
ETL Work: Heavy focus on cleaning, standardizing, and preparing original data for machine learning.
Outcome: Establish which features are suitable for further model development.

Formal Stage (A=1)

Goal: Predictive Power
Focus: Improving model performance by enhancing input quality and optimizing algorithms.
Approach: Increase complexity by expanding Derived Features, performing data and business analysis to refine inputs.
ETL Work: Continuous refinement and transformation of original data into high-quality derived features.
Benchmarking: Exploring algorithm selection and tuning.
Outcome: Achieve high-performance models through improved features and algorithms.

Final Stage (A=2)

Goal: Managing Bias/Variance Tradeoff
Focus: Ensuring real-life performance by simplifying models and reducing complexity where necessary.
Approach: Trim and prune features, simplify algorithms, and track performance throughout the simplification process.
Outcome: Create robust, generalizable models suitable for deployment.

Comparison with Other Frameworks

The CPU framework differs from broader conceptual frameworks like GODST, which is more focused on summarizing model concepts rather than organizing the R&D work. While GODST provides a conceptual understanding, the CPU framework acts as the Kanban board for day-to-day model development tasks.

Grain, Sample Size, Original Data, Derived Features, and Target are all introduced in GODST but expanded upon within the CPU framework as development progresses.

Additionally, CPU provides better separation of concerns between model R&D and application R&D, enhancing communication with clients and optimizing resource allocation.

Conclusion

The CPU Model Development Framework offers a structured approach to organizing machine learning model development. By breaking down the process into Initial, Formal, and Final stages, and providing a clear naming convention, the framework ensures that model development is systematic, transparent, and easily navigable.

The CPU Framework For Managing Model Development