Model Training with Katib
The Hyperparameter Problem
```python
# The manual approach (we've all done this)
from sklearn.ensemble import GradientBoostingClassifier

for lr in [0.001, 0.01, 0.1]:
    for n_est in [50, 100, 200]:
        # A gradient-boosted model is used here because, unlike
        # RandomForestClassifier, it actually takes a learning_rate.
        model = GradientBoostingClassifier(learning_rate=lr, n_estimators=n_est)
        model.fit(X_train, y_train)
        score = model.score(X_test, y_test)
        print(f"lr={lr}, n_est={n_est}, score={score}")
```
What is Katib?
Your First Katib Experiment
Step 1: Create a Training Script
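A minimal sketch of what such a script can look like. The hyperparameter names (`--lr`, `--n-estimators`), the metric name `accuracy`, and the simulated objective are illustrative stand-ins for real training code. The one Katib-specific requirement shown here is the output format: the default stdout metrics collector parses lines of the form `<metric-name>=<value>` printed by the trial container.

```python
# train.py -- a minimal sketch of a Katib-compatible training script.
import argparse


def objective(lr: float, n_estimators: int) -> float:
    # Stand-in for real model training: a toy score that peaks
    # near lr=0.01 and n_estimators=200.
    return 1.0 - abs(lr - 0.01) * 10 - abs(n_estimators - 200) / 1000


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--n-estimators", type=int, default=100)
    args = parser.parse_args(argv)

    accuracy = objective(args.lr, args.n_estimators)

    # Katib's stdout metrics collector picks this line up as the
    # objective metric for the trial.
    print(f"accuracy={accuracy:.4f}")
    return accuracy


if __name__ == "__main__":
    main()
```

In a real script, `objective` would be replaced by model training and evaluation; the argument parsing and the `metric=value` print line are the parts Katib cares about.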
Step 2: Containerize Your Training Code
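Katib runs each trial as a container, so the training script needs to live in an image. A minimal sketch of a Dockerfile, assuming the script from the previous step is saved as `train.py` (base image and file names are illustrative):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY train.py .
ENTRYPOINT ["python", "train.py"]
```

Build and push it to a registry your cluster can pull from; the image reference goes into the Experiment's trial template in the next step.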
Step 3: Define Katib Experiment
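A Katib Experiment is a Kubernetes custom resource (`kubeflow.org/v1beta1`). The sketch below wires together the objective, a search algorithm, the parameter search space, and a trial template; the experiment name, metric name, image reference, and parameter ranges are illustrative assumptions:

```yaml
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: random-search-demo
  namespace: kubeflow
spec:
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: accuracy
  algorithm:
    algorithmName: random
  parallelTrialCount: 3
  maxTrialCount: 12
  maxFailedTrialCount: 3
  parameters:
    - name: lr
      parameterType: double
      feasibleSpace:
        min: "0.001"
        max: "0.1"
    - name: n-estimators
      parameterType: int
      feasibleSpace:
        min: "50"
        max: "200"
  trialTemplate:
    primaryContainerName: training-container
    trialParameters:
      - name: learningRate
        reference: lr
      - name: nEstimators
        reference: n-estimators
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          spec:
            containers:
              - name: training-container
                image: registry.example.com/katib-train:latest  # illustrative
                command:
                  - python
                  - train.py
                  - "--lr=${trialParameters.learningRate}"
                  - "--n-estimators=${trialParameters.nEstimators}"
            restartPolicy: Never
```

Katib substitutes each trial's sampled values into the `${trialParameters.*}` placeholders before creating the Job.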
Step 4: Run the Experiment
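Running the experiment is a standard `kubectl apply`. Assuming the manifest is saved as `experiment.yaml` and the experiment is named `random-search-demo` in the `kubeflow` namespace:

```shell
kubectl apply -f experiment.yaml

# Watch overall experiment status
kubectl -n kubeflow get experiment random-search-demo -w

# List the trials Katib has launched for this experiment
kubectl -n kubeflow get trials -l katib.kubeflow.org/experiment=random-search-demo
```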
Step 5: View Results
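Results can be browsed in the Katib UI, or pulled from the Experiment's status, which tracks the best trial seen so far. Assuming the experiment name `random-search-demo` in the `kubeflow` namespace:

```shell
# Best parameter assignments and metrics found so far
kubectl -n kubeflow get experiment random-search-demo \
  -o jsonpath='{.status.currentOptimalTrial}'

# Full status, including trial counts and conditions
kubectl -n kubeflow describe experiment random-search-demo
```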
Optimization Algorithms
1. Random Search
2. Grid Search
3. Bayesian Optimization
4. Hyperband
5. TPE (Tree-structured Parzen Estimator)
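Switching between these algorithms is a one-line change in the Experiment spec: the `spec.algorithm.algorithmName` field. A few representative stanzas (the `algorithmSettings` shown for Bayesian optimization are an illustrative example; available settings vary per algorithm):

```yaml
# Random search -- a strong, cheap default
algorithm:
  algorithmName: random

# Grid search -- exhaustive, only practical for small discrete spaces
algorithm:
  algorithmName: grid

# Bayesian optimization -- models the objective to pick promising points
algorithm:
  algorithmName: bayesianoptimization

# Hyperband -- aggressively stops underperforming trials early
algorithm:
  algorithmName: hyperband

# Tree-structured Parzen Estimator
algorithm:
  algorithmName: tpe
```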
Parameter Types
Continuous Parameters
Discrete Integer Parameters
Categorical Parameters
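All three parameter types are expressed through `parameterType` and `feasibleSpace` in the Experiment spec. The names and ranges below are illustrative (note that Katib expects the numeric bounds as strings):

```yaml
parameters:
  # Continuous: sampled as a float between min and max
  - name: lr
    parameterType: double
    feasibleSpace:
      min: "0.0001"
      max: "0.1"

  # Discrete integer, optionally constrained to a step
  - name: n-estimators
    parameterType: int
    feasibleSpace:
      min: "50"
      max: "200"
      step: "25"

  # Categorical: one value from an explicit list
  - name: optimizer
    parameterType: categorical
    feasibleSpace:
      list:
        - sgd
        - adam
        - rmsprop
```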
Practical Example: Deep Learning with PyTorch
Training Script
Katib Experiment for PyTorch
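For distributed PyTorch training, the trial template can create a `PyTorchJob` (from the Kubeflow training operator) instead of a plain Job. A sketch of just the `trialTemplate` portion, with an illustrative image reference and a single GPU-backed master replica:

```yaml
trialTemplate:
  primaryContainerName: pytorch
  trialParameters:
    - name: learningRate
      reference: lr
  trialSpec:
    apiVersion: kubeflow.org/v1
    kind: PyTorchJob
    spec:
      pytorchReplicaSpecs:
        Master:
          replicas: 1
          restartPolicy: OnFailure
          template:
            spec:
              containers:
                - name: pytorch
                  image: registry.example.com/pytorch-train:latest  # illustrative
                  command:
                    - python
                    - train.py
                    - "--lr=${trialParameters.learningRate}"
                  resources:
                    limits:
                      nvidia.com/gpu: 1
```

The container must be named `pytorch` to match what the training operator expects, and `primaryContainerName` tells Katib which container's metrics to collect.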
Early Stopping
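Katib can stop unpromising trials before they run to completion, which matters most for long deep-learning trials. This is configured at the experiment level via `spec.earlyStopping`; the sketch below uses the median-stopping rule (the setting value shown is illustrative):

```yaml
spec:
  earlyStopping:
    algorithmName: medianstop
    algorithmSettings:
      # Don't stop anything until a few trials have reported metrics
      - name: min_trials_required
        value: "3"
```

Median stopping halts a trial whose best reported metric falls below the median of previously completed trials at the same step, so the training script should report intermediate metrics rather than only a final value.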
Integrating with Kubeflow Pipelines
Best Practices
1. Start with Random Search
2. Use Logarithmic Scale for Learning Rates
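The reason: a uniform draw from, say, [1e-5, 1e-1] puts roughly 90% of samples above 1e-2, starving the small-learning-rate region that often matters most. Sampling the exponent instead spreads trials evenly across orders of magnitude. A minimal sketch (function name and bounds are illustrative):

```python
import random


def sample_learning_rate(low_exp: float = -5, high_exp: float = -1) -> float:
    """Draw a learning rate uniformly in log10 space.

    Sampling the exponent makes 1e-5..1e-4 exactly as likely as
    1e-2..1e-1, unlike a uniform draw over the raw interval.
    """
    return 10 ** random.uniform(low_exp, high_exp)


# Example: candidates spread across orders of magnitude
candidates = [sample_learning_rate() for _ in range(5)]
```

If your Katib version does not support log-scaled sampling directly, the same effect can be had by searching over the exponent as a `double` parameter and computing `10 ** exponent` inside the training script.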
3. Limit Search Space Initially
4. Monitor Resource Usage
5. Use Early Stopping for Deep Learning
Common Issues
Trials Keep Failing
No Improvement After Many Trials
Slow Experiment Progress
Key Takeaways
Next Steps