Data management

Achieving DataOps Breakthrough

The promise of DataOps, the application of agile DevOps methods and tools to data engineering, is compelling even though the term has progressed through the hype cycle for years and, for many, little impact has been achieved. The potential remains strong, though. If impact is elusive, there are likely two main causes.

[Image: PPDM_DataOps.jpg. Source: PPDM]

The promise of DataOps is compelling even though the term has been moving through the hype cycle for years now and, for many, little impact has yet been achieved. DataOps is defined as the application of agile DevOps methods and tools to the data engineering disciplines. This concept has succeeded in becoming the preferred approach for the creation of quality data products. The early hype promised tenfold productivity gains for data engineering teams adopting this method. That created urgency in the early stages and may now be generating disappointment for those not experiencing a step change. The potential remains strong, though. If impact is elusive, there are likely two main causes.

First, organizational change management requires time and commitment. The larger the data engineering organization, the greater the challenge in designing and executing this change. Scale defeats many well-intentioned efforts to change organizations. If “DataOps” seems to have become another fancy name for what we have always done, look carefully at how the change was designed for scale and how the transformation was rolled out.

Another leading cause of frustration is the lack of effective automation and tooling. Consistent, successful data engineering practices not only enable tooling but also require it. The complexity of data operations can be overwhelming. Just listing the considerations of a data shop—data extractions, data warehouses, data lakes, data pipelines, streaming data, data governance, data lineage, data quality, data security, etc.—can be exhausting.

Deploying new data products to production can break the code, the data, or both. The variety of data structures across a distributed data landscape compounds this risk.

The best organizations avoid wallowing in this complexity. Instead, they simplify.
