Stijn “Stan” Christiaens is the Co-Founder and Chief Technology Officer at Collibra.

We all know that data governance has perception issues (to put it mildly). True data governance is the control and enablement of any and all data management activities. All too often, though, data leaders focus on the control angle or approach the issue from a technical data management angle, which results in flawed perceptions. For some, data governance conjures up images of police and bureaucracy. They have chilling visions of data locked away in dark catacombs, only accessible after months of fighting layers of red tape. Others painfully remember the energy they wasted attending meetings, updating spreadsheets and maintaining wikis, only to learn that nobody was making use of their hard work.

No wonder data governance has a bad rap. Despite the real value it provides, organizations shy away from implementing governance because of past, misguided experiences. Imagine the following scenario and see if anything sounds familiar.

Your company is kicking off a project to build a new data lake. Everyone is really excited about the project because people will finally find all their data in one place for reporting, business intelligence (BI), analytics and more.

During the kickoff meeting, all is going well until someone sucks the air out of the room by suggesting you incorporate data governance into the project. People look around at each other uncomfortably until a naysayer speaks up and says “No way — governance doesn’t work. It’s just a bunch of slides full of theory!” Then somebody else chimes in: “It can work, but it sure sounds like a lot of work. And that will slow our project down.” Strike one for data governance.

The project moves forward, without data governance, and a few months later the lake starts to show signs of decay. The once-pristine lake becomes clouded with ill-defined, poor-quality data, and there are too many copies. Once again, someone boldly brings up the idea of data governance. This time, the executive champion is on board. He tells the team to turn the theoretical slides into spreadsheets, to schedule monthly stewardship meetings and to create a collaborative environment using a wiki. It all sounds great, until three months later when nobody shows up for the meetings. The spreadsheets grow frustratingly complex and sink to the bottom of overloaded inboxes. The final straw comes when people stop trusting the wild-west wiki. It is strike two for data governance.

Fast-forward another six months. The data lake project continues to progress. With each day that passes, the lake becomes more like a swamp. The clusters are always starving for more nodes. The team is stressed. They see money swirling down the drain with no real value to show for it. The remaining data scientists become increasingly vocal as well. They complain that they are spending all of their time searching for data, which is a total waste of their talents. To make matters worse, few of the insights they do uncover make it to the business, and those that do are not trusted. They begin murmuring that if things don’t change soon, they’ll take their talents elsewhere.

Clearly, the team is at a crossroads. Our data governance believer still knows governance could be an answer. Instead of tooting the data governance horn once again, she takes a different approach: “Let’s wrap a data catalog around the lake.” The catalog will help identify what data is in the lake, what it means, who owns it, who is using it and maybe even where it came from. The catalog can also help identify whether there is data in the lake that shouldn’t be there, and who was responsible for dumping protected, duplicate or poor-quality data into their crystal-clear lake.
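
To make the idea concrete, here is a minimal sketch of what a single catalog entry might record and how a simple check could surface problem data. It is only an illustration under stated assumptions: the `CatalogEntry` fields, the `find_issues` helper and the sample datasets are all hypothetical, and a real catalog product would capture far more than this.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    """Hypothetical record for one dataset in the lake."""
    name: str                                           # what data is in the lake
    description: str = ""                               # what it means
    owner: str = ""                                     # who owns it
    consumers: List[str] = field(default_factory=list)  # who is using it
    source: str = ""                                    # where it came from, if known
    protected: bool = False                             # assumed flag for protected data
    checksum: str = ""                                  # assumed content fingerprint


def find_issues(entries: List[CatalogEntry]) -> Dict[str, List[str]]:
    """Return {dataset name: [issues]} for entries that need attention."""
    issues: Dict[str, List[str]] = {}
    first_seen: Dict[str, str] = {}  # checksum -> first dataset with that content
    for entry in entries:
        found = []
        if not entry.owner:
            found.append("no owner")
        if not entry.description:
            found.append("undefined")
        if entry.protected:
            found.append("protected data")
        if entry.checksum:
            if entry.checksum in first_seen:
                found.append(f"duplicate of {first_seen[entry.checksum]}")
            else:
                first_seen[entry.checksum] = entry.name
        if found:
            issues[entry.name] = found
    return issues


# Made-up sample entries, for illustration only.
entries = [
    CatalogEntry("sales_2023", "Monthly sales totals", "finance",
                 ["bi_team"], "erp", checksum="abc"),
    CatalogEntry("sales_copy", checksum="abc"),
    CatalogEntry("customers", "Customer master records", "crm_team",
                 protected=True, checksum="def"),
]
report = find_issues(entries)
```

In this sketch, a dataset with an owner, a definition and unique content stays out of the report, while an undocumented copy or a protected dataset is flagged for someone to review.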

The data lake team enthusiastically leans in. This “data catalog” could be the answer they’ve been looking for all along. It will help them clean up their dirty lake and ensure that only the right data is found in the right place at the right time. Not only that, but the catalog could make all the data easy to find, easy to understand and easy to trust — for everyone in the business.