MLOps Journey: A Data Engineer's Perspective with Databricks and GitLab

Published: June 30, 2025

As a data engineering practitioner, I've witnessed the evolution of machine learning operations from experimental notebooks to production-ready systems. My journey with MLOps has been filled with challenges, learnings, and transformative experiences. In this post, I'll share my personal experiences implementing MLOps practices using Databricks Community Edition, Python, and GitLab as my technology stack of choice.

The Data Engineer's Dilemma in ML Projects

When I first transitioned from traditional data engineering to machine learning projects, I quickly realized that my existing toolkit and processes were insufficient. Traditional data pipelines were deterministic and relatively straightforward to test and deploy. Machine learning pipelines, however, introduced new complexities:

  • Models that behaved differently with varying data distributions

  • Experiments that needed careful tracking and reproducibility

  • Model drift that required continuous monitoring

  • Increased collaboration needs between data scientists and engineers

I found myself asking: How do I bring the same level of rigor and automation to ML workflows that I've established for data processing pipelines?

My MLOps Architecture Journey

After numerous iterations, I developed an MLOps architecture that balanced flexibility with governance. Here's a sequence diagram showing the end-to-end workflow I established:

[Sequence diagram: end-to-end workflow from experimentation in Databricks through GitLab CI/CD to production deployment]

This workflow helped establish clear handoffs between roles while maintaining the flexibility data scientists needed for experimentation.

Setting Up the Infrastructure

Databricks Community Edition: The Experimentation Platform

Databricks Community Edition became the foundation of my MLOps practice for several reasons:

  1. It provided a collaborative notebook environment that data scientists loved

  2. It included built-in MLflow for experiment tracking

  3. It offered a clear upgrade path to the full Databricks platform when workloads grew

  4. It was accessible without enterprise-level budgets

Setting up Databricks for MLOps wasn't trivial. I needed to:

  1. Configure workspace permissions

  2. Create cluster configurations that balanced cost with performance

  3. Set up MLflow experiment tracking

  4. Establish connections with GitLab

The most critical part was establishing the MLflow tracking server:
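
In simplified form, it came down to pointing runs at a workspace experiment and logging everything against it. The sketch below is illustrative; the experiment path, run name, and logged values are placeholders rather than my exact setup.

```python
import mlflow

# On Databricks, runs are recorded by the workspace-managed MLflow tracking
# server, so selecting an experiment path is usually all the setup needed.
mlflow.set_experiment("/Users/me@example.com/customer-churn")  # placeholder path

with mlflow.start_run(run_name="baseline-logistic-regression"):
    # Everything that affects the result gets logged with the run.
    mlflow.log_param("model_type", "logistic_regression")
    mlflow.log_param("train_split", 0.8)

    # ... training code runs here ...
    accuracy = 0.87  # placeholder metric for illustration
    mlflow.log_metric("accuracy", accuracy)
```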

GitLab: Version Control and CI/CD Pipeline

While Databricks handled experimentation, I needed a robust system for version control, collaboration, and automated testing. GitLab became my platform of choice because:

  1. It provided comprehensive CI/CD capabilities

  2. It had excellent support for merge requests and code reviews

  3. It integrated well with Python ecosystems

  4. It facilitated collaboration between data scientists and engineers

I structured my GitLab repository to accommodate both the code and ML artifacts:
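
The layout evolved over time, but it was roughly along these lines (directory names here are illustrative):

```
ml-project/
├── notebooks/           # Databricks notebooks synced via Repos
├── src/
│   ├── data/            # ingestion and validation
│   ├── features/        # feature engineering
│   ├── models/          # training, evaluation, registration
│   └── deployment/      # deployment and monitoring scripts
├── tests/               # unit and data-quality tests
├── conf/                # pipeline and environment configuration
├── .gitlab-ci.yml       # CI/CD pipeline definition
└── requirements.txt
```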

The CI/CD pipeline was configured to:

  1. Run tests on code changes

  2. Validate data quality

  3. Train and validate models

  4. Register models if they met performance criteria

  5. Deploy models to production

Here's the GitLab CI/CD pipeline configuration that tied it all together:
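
A condensed sketch of that .gitlab-ci.yml; stage names, scripts, and the branch rule are illustrative rather than my exact configuration:

```yaml
stages:
  - test
  - validate_data
  - train
  - register
  - deploy

default:
  image: python:3.10
  before_script:
    - pip install -r requirements.txt

run_tests:
  stage: test
  script:
    - pytest tests/

validate_data:
  stage: validate_data
  script:
    - python src/data/validate.py --config conf/data_validation.yml

train_model:
  stage: train
  script:
    # Submits the long-running training job to Databricks (see Challenge 2 below)
    - python src/models/trigger_training.py --config conf/training.yml
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'

register_model:
  stage: register
  script:
    - python src/models/register.py --min-accuracy 0.85
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'

deploy_model:
  stage: deploy
  script:
    - python src/deployment/deploy.py --stage production
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual
```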

The Data Processing and Model Training Workflow

One of my biggest challenges was establishing a repeatable process for data processing and model training. Here's a sequence diagram of the workflow I implemented:

[Sequence diagram: data processing and model training workflow]

The code to implement this workflow was designed to be both robust and maintainable:
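
A condensed sketch of the data processing step; paths, column names, and thresholds come from a config file, and the specific values here are illustrative:

```python
import pandas as pd
import yaml


def load_config(path: str) -> dict:
    with open(path) as f:
        return yaml.safe_load(f)


def validate(df: pd.DataFrame, config: dict) -> None:
    # Fail fast if required columns are missing or too sparse.
    for column in config["required_columns"]:
        if column not in df.columns:
            raise ValueError(f"Missing required column: {column}")
        null_ratio = df[column].isna().mean()
        if null_ratio > config["max_null_ratio"]:
            raise ValueError(f"{column} is {null_ratio:.1%} null, above the allowed ratio")


def preprocess(df: pd.DataFrame, config: dict) -> pd.DataFrame:
    df = df.drop_duplicates()
    df = df.dropna(subset=config["required_columns"])
    return df


def run(config_path: str) -> pd.DataFrame:
    config = load_config(config_path)
    df = pd.read_parquet(config["input_path"])
    validate(df, config)
    return preprocess(df, config)
```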

The model training code followed a similar pattern of configuration-driven, trackable processes:
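
A condensed sketch of the training step; the feature columns, hyperparameters, and quality threshold are illustrative and would come from the training config:

```python
import mlflow
import mlflow.sklearn
import yaml
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train(config_path: str, df):
    with open(config_path) as f:
        config = yaml.safe_load(f)

    X = df[config["feature_columns"]]
    y = df[config["target_column"]]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=config["test_size"], random_state=42
    )

    mlflow.set_experiment(config["experiment_path"])
    with mlflow.start_run():
        model = LogisticRegression(**config["model_params"])
        model.fit(X_train, y_train)

        accuracy = accuracy_score(y_test, model.predict(X_test))
        mlflow.log_params(config["model_params"])
        mlflow.log_metric("accuracy", accuracy)
        mlflow.sklearn.log_model(model, artifact_path="model")

        # Only register the model if it clears the quality bar.
        if accuracy >= config["min_accuracy"]:
            run_id = mlflow.active_run().info.run_id
            mlflow.register_model(f"runs:/{run_id}/model", config["registered_model_name"])

    return model
```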

The Model Deployment and Monitoring Workflow

The final piece of my MLOps puzzle was model deployment and monitoring. This was perhaps the most challenging part, requiring careful orchestration between GitLab CI/CD, Databricks, and production systems. Here's a sequence diagram showing the deployment process:

[Sequence diagram: model deployment and monitoring process]

The code for model deployment looked like this:
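
A condensed, illustrative sketch of the promotion step that ran from the pipeline; the model name and stage names are placeholders, and error handling is trimmed:

```python
from mlflow.tracking import MlflowClient


def promote_latest_model(model_name: str) -> None:
    client = MlflowClient()

    # Pick the most recent version currently sitting in Staging.
    staging_versions = client.get_latest_versions(model_name, stages=["Staging"])
    if not staging_versions:
        raise RuntimeError(f"No Staging version found for {model_name}")
    version = staging_versions[0].version

    # Promote it and archive whatever was serving before.
    client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage="Production",
        archive_existing_versions=True,
    )
    print(f"Promoted {model_name} version {version} to Production")


if __name__ == "__main__":
    promote_latest_model("customer-churn")  # placeholder model name
```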

Lessons Learned as a Data Engineering Practitioner

Throughout my MLOps journey, I've learned several critical lessons that have shaped my practice as a data engineer:

1. Start with Strong Data Foundations

As a data engineer, I found that ML projects amplify data quality issues. My biggest success factor was investing heavily in data validation and quality controls. Prior to implementing MLOps, nearly 40% of model failures could be traced to data issues. After implementing robust data validation in the pipeline, this dropped to less than 10%.

2. Embrace Modularity

Making ML pipelines modular helps isolate issues and enables incremental improvements. I separate my pipelines into discrete steps:

  • Data ingestion

  • Validation

  • Preprocessing

  • Feature engineering

  • Model training

  • Evaluation

  • Deployment

This approach has reduced debugging time by 60% and made it easier to identify bottlenecks.

3. Automate Thoughtfully

Not everything should be automated immediately. I've found a phased approach works best:

  1. Start with automating the most error-prone manual tasks

  2. Add monitoring and alerting next

  3. Finally, implement automated retraining and deployment

4. Version Everything

In ML systems, versioning goes beyond code. I track:

  • Data versions (using DVC)

  • Model versions (using MLflow)

  • Environment configurations

  • Experiment parameters

This comprehensive versioning has been crucial for reproducing results and debugging production issues.

5. Monitor Not Just Performance, But Data Too

My most valuable lesson was learning to monitor input data distributions in production. Several times, we caught data drift issues before they impacted model performance by setting up monitoring for the following (a minimal distribution check is sketched after the list):

  • Feature distributions

  • Input data schema changes

  • Data quality metrics
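
One simple way to implement the feature-distribution check is a two-sample Kolmogorov-Smirnov test against a reference sample captured at training time. The sketch below is illustrative rather than my exact monitoring code; column names and the alert threshold are placeholders.

```python
import pandas as pd
from scipy.stats import ks_2samp


def detect_drift(reference: pd.DataFrame, current: pd.DataFrame,
                 columns: list, p_threshold: float = 0.01) -> dict:
    """Flag features whose production distribution differs from the reference."""
    drifted = {}
    for column in columns:
        statistic, p_value = ks_2samp(reference[column].dropna(),
                                      current[column].dropna())
        if p_value < p_threshold:
            drifted[column] = {"ks_statistic": statistic, "p_value": p_value}
    return drifted
```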

6. Collaboration Is Key

The most successful ML projects I've worked on involved close collaboration between data scientists and engineers, supported by practices such as:

  • Shared GitLab repositories

  • Clear documentation

  • Standardized notebooks in Databricks

  • Regular sync meetings

These practices have dramatically improved the transition from experiment to production.

Challenges I Faced and How I Overcame Them

Challenge 1: Environment Inconsistency

Problem: Models would work in development but fail in production due to environment differences.

Solution: I implemented Docker containers for consistent environments and created detailed environment.yml files for Databricks clusters. This reduced environment-related failures by 90%.
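
For reference, the environment.yml files were nothing exotic; roughly something like this, with every version pinned (the packages and versions below are illustrative, not my exact pins):

```yaml
name: ml-project
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pandas=2.0
  - scikit-learn=1.3
  - mlflow=2.9
  - pip
  - pip:
      - databricks-cli
```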

Challenge 2: Long-Running Training Jobs Breaking CI/CD

Problem: Model training could take hours, making CI/CD pipelines impractical.

Solution: I separated the CI/CD pipeline into stages and used Databricks Jobs API to handle long-running training processes asynchronously. This kept most CI jobs under 10 minutes while still ensuring quality.
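
The CI job only submitted the run and then exited; a separate stage (or a scheduled job) polled for the result. A rough sketch, where the workspace host, token variable, and job ID are placeholders:

```python
import os
import time

import requests

HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}


def trigger_training_job(job_id: int) -> int:
    """Kick off a pre-configured Databricks job and return its run ID."""
    response = requests.post(f"{HOST}/api/2.1/jobs/run-now",
                             headers=HEADERS, json={"job_id": job_id})
    response.raise_for_status()
    return response.json()["run_id"]


def wait_for_run(run_id: int, poll_seconds: int = 60) -> str:
    """Poll the run until it finishes and return its result state."""
    while True:
        response = requests.get(f"{HOST}/api/2.1/jobs/runs/get",
                                headers=HEADERS, params={"run_id": run_id})
        response.raise_for_status()
        state = response.json()["state"]
        if state["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            return state.get("result_state", "FAILED")
        time.sleep(poll_seconds)
```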

Challenge 3: Model Drift in Production

Problem: Models would silently degrade over time as data patterns shifted.

Solution: I implemented:

  • Statistical monitoring of input feature distributions

  • Performance monitoring with sliding windows

  • Automated retraining triggers when metrics dropped below thresholds

This approach has caught drift issues weeks before they would have impacted business metrics.
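
Wiring the retraining trigger itself was the easy part: GitLab's pipeline trigger API lets a monitoring script start a new pipeline run. A rough sketch, where the project ID, trigger token variable, branch, and threshold values are placeholders:

```python
import os

import requests

# Placeholder project ID; the pipeline trigger token is stored as an environment variable.
TRIGGER_URL = "https://gitlab.com/api/v4/projects/12345/trigger/pipeline"


def trigger_retraining(reason: str) -> None:
    response = requests.post(
        TRIGGER_URL,
        data={
            "token": os.environ["GITLAB_TRIGGER_TOKEN"],
            "ref": "main",
            "variables[RETRAIN_REASON]": reason,
        },
    )
    response.raise_for_status()
    print(f"Retraining pipeline started: {response.json()['web_url']}")


ACCURACY_THRESHOLD = 0.80        # placeholder quality bar
accuracy_last_7_days = 0.74      # in practice this comes from the monitoring job

if accuracy_last_7_days < ACCURACY_THRESHOLD:
    trigger_retraining("7-day accuracy below threshold")
```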

Conclusion: The Continuous MLOps Journey

My MLOps journey as a data engineer has transformed how I approach machine learning projects. The integration of Databricks for experimentation, GitLab for CI/CD, and MLflow for experiment tracking has created a robust, reproducible ML pipeline that balances flexibility with governance.

The key takeaway from my experience is that MLOps is not a destination but a continuous journey of improvement. Start small, focus on the highest-value problems, and gradually expand your MLOps capabilities.

For data engineers taking their first steps into MLOps, I recommend:

  1. Start with experiment tracking and model versioning

  2. Focus on data quality and validation

  3. Build modular pipelines that can evolve

  4. Implement monitoring early

  5. Collaborate closely with data scientists

I hope sharing my personal MLOps journey helps you on yours. What challenges are you facing in implementing MLOps in your organization? What tools and practices have you found most valuable? I'd love to hear about your experiences in the comments.


About the Author: A passionate data engineering practitioner with years of experience implementing MLOps solutions across various industries.
