Breaking Down DataOps
By Alex Gutow
One of the latest buzzwords du jour is DataOps. It’s even recently broken through to the Gartner Hype Cycle. This one in particular has been really interesting for us to see develop and evolve within all sorts of different companies. We’re a community of data engineers, architects, and analysts who have all felt the pain around working with data, but have also experienced how amazing having the right data at the right time can be. It’s why we wanted to create a place to share ideas and strategies around DataOps, so we can all learn from each other as we figure out what works best. But before we can do all that, it’s important to talk about what DataOps actually means.
What is DataOps Solving?
To understand why DataOps is such a big deal right now, it’s important to understand what problems it’s trying to solve. At your company, you likely have a ton of data that different teams are trying to make sense of. They may be the marketing team trying to understand the effect of promotions on customer churn rate, a product team looking to track users after a new release, or a data science group working to better personalize recommendations.
However, this data probably goes through a number of “pass-offs” between teams and technologies before it can feed a new dashboard or model. Data comes in, data engineers prepare it and pass it to BI or Data Science teams, who continue to refine it until it’s made available to analysts or other end-consumers. Building a new data workflow can take weeks (if you’re lucky), and it can be challenging and time-consuming to make any changes to an existing workflow.
Think about a classic assembly line for cars. Each one is finely tuned to produce the pieces necessary to build a Mini Cooper. Think of how long it would take for Mini to create a new assembly line to produce limos. Or even all of the changes needed to add a rear-view camera and screen to the dashboard. Those aren’t trivial to do in the current process.
While you may not actually be building a car, it’s ridiculous that building with data requires a similarly heavy process. Given how fast data changes and how quickly people want access to it to answer new questions, this rigid way of working just won’t cut it anymore.
That’s where DataOps comes in. DataOps aims to bring agility to the world of data and analytics. Rather than have your business slowed down by all these rigid, waterfall-like processes and pass-offs, DataOps enables you to not only build quickly with data, but also test and iterate as you go for better quality and productivity downstream.
It consists of a set of cultural tenets, philosophies and practices that unifies builders and consumers of data products, with an explicit focus on delivering speed, quality, and flexibility for your resulting data products. DataOps takes an integrated approach to better drive collaboration between these separate data teams, and automate and orchestrate as much as possible to minimize manual bottlenecks and errors.
The end goal of DataOps is shorter development cycles, increased iteration frequency, zero maintenance, and inclusive data-driven innovation, in complete consonance with business objectives.
Great! Where can I buy it?
Like all great data trends, it isn’t just about throwing new technology at the problem. DataOps has its roots in the DevOps movement, which helped bridge the gap in software engineering between development, QA, and operations. While there are now a growing number of DevOps tools available to better automate, monitor, and troubleshoot across the development process, none of these tools alone could have driven this shift within companies. Their success relied heavily on the simultaneous changes happening around culture and process.
A key tenet of DataOps is driving communication and collaboration between everyone involved in designing data workflows. No one person or team has the expertise, skills, or time to do everything, and they shouldn’t need to. Data users and builders can come together to build and iterate, and ultimately all share responsibility for the end result, instead of pointing fingers when something is not as expected.
One of the biggest gains here can be having data engineers, who are skilled at figuring out “how” to build these pipelines, work closely with analytics teams, who have a deep understanding of the business and expertise around “what” needs to happen with the data for a project. We’ve seen this manifest itself in many different ways. One group that we’ve worked with put together a company-wide hackathon where their builders and consumers came together for 48 hours to build with data. They were able to build some pretty amazing data products in record time, many of which are still being used across the company today.
It’s not just about bringing people together and building on data faster though. To productionize data, quality and reliability matter. However, in the DataOps world, you need to balance delivering on quality while also supporting open discovery and experimentation. Data stewardship and governance processes play a key role, but it’s critical that these processes continuously get feedback on what’s actually happening with the data so they don’t become stale.
Similarly, testing and validation processes need to be added as micro-stages of these workflows, rather than as the final stage (if that). Dividing workflows into smaller, more incremental stages allows multiple steps to happen in parallel and avoids heavy dependencies and orchestration complexity. Changes and continuous improvements can then be added to individual micro-stages without re-architecting the entire data workflow.
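To make the micro-stage idea a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only for illustration: the stage names, the `customer_id` and `days_inactive` fields, and the 30-day churn threshold are our own assumptions, not part of any particular DataOps tool. The point is simply that each validation check sits right after the step it guards, instead of waiting until the very end.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Stage = Callable[[List[Row]], List[Row]]


def drop_incomplete(rows: List[Row]) -> List[Row]:
    """Transform stage: keep only rows that have a customer_id (hypothetical field)."""
    return [r for r in rows if r.get("customer_id") is not None]


def check_not_empty(rows: List[Row]) -> List[Row]:
    """Validation micro-stage: fail fast if the previous stage left nothing."""
    if not rows:
        raise ValueError("no rows left after cleaning")
    return rows


def add_churn_flag(rows: List[Row]) -> List[Row]:
    """Transform stage: flag customers inactive for more than 30 days (assumed rule)."""
    return [{**r, "churned": r["days_inactive"] > 30} for r in rows]


def check_churn_flag(rows: List[Row]) -> List[Row]:
    """Validation micro-stage: every row must carry a boolean churn flag."""
    if any(not isinstance(r.get("churned"), bool) for r in rows):
        raise ValueError("churn flag missing or not boolean")
    return rows


def run_pipeline(rows: List[Row], stages: List[Stage]) -> List[Row]:
    """Run each stage in order, feeding the output of one into the next."""
    for stage in stages:
        rows = stage(rows)
    return rows


# Each transform is immediately followed by the check that guards it.
PIPELINE: List[Stage] = [
    drop_incomplete,
    check_not_empty,
    add_churn_flag,
    check_churn_flag,
]

raw = [
    {"customer_id": 1, "days_inactive": 45},
    {"customer_id": None, "days_inactive": 3},
    {"customer_id": 2, "days_inactive": 10},
]
result = run_pipeline(raw, PIPELINE)
```

Because every stage is just a function that takes rows and returns rows, adding a new check or swapping out a transform means editing one entry in `PIPELINE`, not reworking the whole workflow.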
Do I Hire a DataOps Manager?
The great part about DataOps is you likely have everyone you need already. Each team or builder that is part of your data workflow makes up your DataOps team. Since this will require cultural and procedural changes, their day-to-day tasks may end up looking different than they do today. Luckily, that usually means your team will do fewer manual, tedious tasks and focus more on the data workflows themselves.
With DataOps being such a transformational shift for companies, you should also make sure to have an executive who supports this transition, to help get your whole data team on board. The Chief Data Officer (CDO) is the one we most commonly see backing this, though we’ve also seen CTOs or CIOs driving the cause.
Are You on the DataOps Bandwagon?
We’d love to hear where you are in your DataOps journey or if it’s even a consideration for you right now. Feel free to reach out and share your story with us or let us know what other questions you have around it. We can’t wait to hear from you!