Does it have to be called Data Governance?

This is a question that I get asked fairly regularly. After all, it is not an exciting title and in no way conveys the benefits that an organisation can achieve by implementing Data Governance. Sadly, however, there is no easy yes or no answer. There are a number of reasons for this:

  1. Data governance is a misunderstood and misused data management term

Naturally I am biased, but in my view, data governance is the foundation of all other data management disciplines (and of course therefore the most important). But despite an increasing focus on the topic, it remains a largely misunderstood discipline.

On top of this, it is a term which is frequently misused. A few years ago, a number of Data Security software vendors were using the term to describe their products. More recently, the focus on meeting the EU GDPR requirements has led to a lot of confusion as to whether Data Protection and Data Governance are the same thing, and I find that the terms are being used interchangeably. (For the record, having Data Governance in place does help you meet a chunk of the GDPR requirements, but they are not the same thing.)

Having more people talking about Data Governance is definitely a good thing, but unless they all mean the same thing, it leads to much confusion over what data governance really is.

I explored this topic in a bit more detail in this blog: Why are there so many Data Governance Definitions?

To decide whether “Data Governance” is the right name for your organisation to use, I would start by looking at how you define data governance. And this step leads nicely to the next item for consideration.

  2. Sometimes it is right to include things which are not pure data governance in the scope of your data governance initiative.

This is a topic that I covered in my last blog which you can read here.

To summarize that article, it is just not possible to have one or more people focus purely on Data Governance in smaller organisations. It’s a luxury of large organizations to be able to have separate teams responsible for each different data management discipline (e.g. Data Architecture, Data Modelling or Data Security).  Going back to my point above, if data governance is the foundation for all other data management disciplines, it is only natural that the line between them can sometimes get a little blurred. As a result of this, the responsibilities of the Data Governance Team can get expanded.

So consider what is included within the scope of your data governance initiative and decide whether it would be more appropriate to give the initiative and your team (either or both) a name that is more aligned to the wider scope of the initiative and the activities of the team.

Is the name going to make cultural change harder to achieve?

Achieving a sustainable cultural change is one of the biggest challenges in implementing data governance, and insisting on calling it “data governance” could make that change more difficult if the term doesn’t resonate within your organization. This is related to a topic that I explored in another old blog: Do we have to call them Data Owners?

Whether we’re talking about the roles, the team, or even the initiative, the same principles apply. It is better to choose a name that works for the culture in your organization than to waste considerable effort trying to convince people that the “correct” terminology is the only one to use.

It would be my preference to explain that the initiative is to design and implement a Data Governance Framework, but if the primary reason for implementing data governance is to improve the quality of your data, perhaps calling it the “Data Quality Team” and “Data Quality Initiative” would fit better? After all, that very much focuses on the outcome of what you’re doing.  It also addresses the question that everybody asks (or should ask) when approached to get involved in data governance of “why are we doing this,” which is usually followed by “what’s in it for me?”

When having these conversations, I explain the initiative in terms of its outcomes (e.g. better quality data which will lead to more efficient ways of working, reduced costs and better customer service). That is a far easier concept to sell than implementing a governance structure, which can sound dull and boring.

Is the name causing confusion?

In the early days of a data governance initiative, the talk is all about designing and implementing a data governance framework. Once this work has been achieved you start designing and implementing processes which have “Data Quality” in their titles:

  • Data Quality Issue Resolution

  • Data Quality Reporting

I have been fortunate enough to work with organizations in the past that have had both a Data Governance Team (supporting the Data Owners and Data Stewards) and a Data Quality Team (responsible for the processes mentioned above), but that is fairly unusual in my experience. It is more common for the Data Governance Team to support the above processes. So it is worth considering whether it would confuse people if they had to report data quality issues to the Data Governance Team.

In summary, I would not want to miss the opportunity to educate more people on what Data Governance really is. But the banner under which it is delivered can be altered to make your data governance implementation both more successful and more sustainable. So if, having considered all the points above in respect of your organization, you want to call it something else, then that is fine with me.

Deciding what to call your initiative is only the start of many things you need to do to make your Data Governance initiative successful.   You can download a free checklist of the things you need to do here. (Don't forget this is a high level summary view, but everyone who attends either my face to face or online training gets  a copy of the complete detailed checklist which I use when working with my clients.)

What do you include in a Data Quality Issue Log?


Whenever I am helping clients implement a Data Governance Framework, a Data Quality Issue Resolution process is top of my list of the processes to implement. After all, if you are implementing Data Governance because you want to improve the quality of your data, it makes sense to have a central process to enable people to flag known issues, and to have a consistent approach for investigating and resolving them.

At the heart of such a process is the log you keep of the issues.  The log is what the Data Governance Team will be using while they help investigate and resolve data quality issues, as well as for monitoring and reporting on progress.  So, it is no surprise that I am often asked what should be included in this log.

For each client, I design a Data Quality Issue Resolution process that is as simple as possible (why create an overly complex process which only adds bureaucracy?) that meets their needs. Then, I create a Data Quality Issue Log to support that process.  Each log I design is, therefore, unique to that client.  That said, there are some column headings that I typically include on all logs.

Let’s have a look at each of these and consider why you might want to include them in your Data Quality Issue Log:

ID

Typically, I just use sequential numbers for an identifier (001, 002, 003 etc).  This has the advantage of being both simple and giving you an instant answer to how many issues have been identified since we introduced the process (a question that your senior stakeholders will ask you sooner or later).

If you are creating your log in an Excel spreadsheet, then it is up to you to decide how you record ID numbers or letters.  If, however, you are recording your issues on an existing system (e.g. an Operational Risk System or Helpdesk System), you will need to follow their existing protocols.
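If your log ever moves out of a spreadsheet and into a small script, the sequential numbering above is easy to reproduce. The following is only a sketch under my own assumptions: the function name `next_issue_id` and the three-digit width are illustrative, not part of any particular tool.

```python
def next_issue_id(existing_ids, width=3):
    """Return the next sequential ID as a zero-padded string, e.g. '004'.

    `existing_ids` is any iterable of numeric ID strings already in the log.
    The width of 3 simply mirrors the 001, 002, 003 style; it is an assumption.
    """
    highest = max((int(issue_id) for issue_id in existing_ids), default=0)
    return str(highest + 1).zfill(width)


log_ids = ["001", "002", "003"]
new_id = next_issue_id(log_ids)
print(new_id)  # the next ID in the sequence
```

Because the numbers are sequential, the highest ID also tells you how many issues have been raised since the process was introduced.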

Date Raised

Now this is important for tracking how long an issue has been open and monitoring average resolution times.  Just one small reminder: be sure to decide on and stick to a standard date format – it doesn’t look good for dates to have inconsistent formats in your Data Quality Issue log!
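To show why a single date format matters, here is a minimal sketch of the two calculations mentioned above: how long an issue has been open and the average resolution time. It assumes dates are stored as ISO 8601 text (YYYY-MM-DD) and that the log also records a closing date; that "date closed" value is my own addition for illustration and is not one of the column headings in this article.

```python
from datetime import date


def days_open(date_raised, date_closed=None, today=None):
    """Days between raising and closing an issue (or today if still open).

    Dates must be ISO 8601 strings such as '2024-01-31'; anything else
    raises ValueError, which is one way of enforcing a standard format.
    """
    raised = date.fromisoformat(date_raised)
    if date_closed is not None:
        end = date.fromisoformat(date_closed)
    else:
        end = today or date.today()
    return (end - raised).days


def average_resolution_days(issues):
    """Average days to resolve, ignoring issues that are still open."""
    durations = [days_open(raised, closed) for raised, closed in issues if closed]
    return sum(durations) / len(durations) if durations else None


# Hypothetical (date raised, date closed) pairs; None means still open.
issues = [
    ("2024-01-10", "2024-01-20"),
    ("2024-02-01", "2024-02-15"),
    ("2024-03-05", None),
]
print(average_resolution_days(issues))
```

A date typed as 31/01/2024 would be rejected by `date.fromisoformat`, so inconsistencies surface as soon as they are entered rather than when you come to report.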

Raised By (Name and Department)

This is a good way to start to identify your key data consumers (it is usually the people using the data who notify you when there are issues with it) for each data set.  This is something you should also log in your Data Glossary for future reference (if you have one). More importantly, you need to know who to report progress to and agree on remedial action plans with.

Short Name of Issue

This is not essential and some of my clients prefer not to have it, but I do like to include this one. It makes referring to the Data Quality Issue easy and understandable.

If you are presenting a report to your Data Governance Committee or chasing Data Owners for a progress update, everyone will know what you mean if you refer to the “Duplicate Customer Issue”. They may not remember what “Data Quality Issue 067” is about, and “System x has an issue whereby duplicate customers are created if a field on a record is changed after the initial creation date of a record” is a bit wordy (this is the detail that can be supplied when it is needed).

Detailed Description

As I mentioned above, I don’t want to use the detailed description as the label for an issue, but the detailed description is needed. This is the full detail of the issue as supplied by the person who raised it and drives the investigation and remedial activities.

Impact

Again, this is supplied by the person who identified the issue. This field is useful in prioritizing your efforts when investigating and resolving issues. It is unlikely that your team will have unlimited resources and be able to action every single issue as soon as you are aware of it. Therefore, you need a way to prioritize which issues you investigate first. Understanding the impact of an issue means that you focus on resolving those issues that have the biggest impact on your organization.

I like to have defined classifications for this field. Something simple like High, Medium and Low is fine; just make sure that you define what these mean in business terms. I was once told about a ‘High’ impact issue and spent a fair amount of time on it before I discovered that in fact just a handful of records had the wrong geocode. The small percentage of incorrect records made it seem more likely that human error was to blame, rather than there being some major systemic issue that needed to be fixed! This small percentage of incorrect codes was indeed causing a problem for the team who reported them. They had to stop time-critical month-end processes to fix them, but the impact category they chose had more to do with their level of frustration at the time they reported it than the true impact of the issue.
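As an illustration of defined classifications, the sketch below pairs each impact level with a business definition and then orders the log so that the highest-impact, longest-standing issues come first. The definitions and sample issues are hypothetical placeholders; you would agree your own wording with your stakeholders.

```python
from enum import IntEnum


class Impact(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical business definitions, for illustration only.
IMPACT_DEFINITIONS = {
    Impact.HIGH: "Stops a critical business process or affects regulatory reporting",
    Impact.MEDIUM: "Requires regular manual workarounds",
    Impact.LOW: "Minor inconvenience with no effect on business processes",
}

# Hypothetical log entries; dates are ISO 8601 text so they sort correctly.
issues = [
    {"id": "001", "impact": Impact.LOW, "date_raised": "2024-01-05"},
    {"id": "002", "impact": Impact.HIGH, "date_raised": "2024-02-10"},
    {"id": "003", "impact": Impact.HIGH, "date_raised": "2024-01-20"},
]

# Highest impact first; within the same impact, the oldest issue first.
prioritised = sorted(issues, key=lambda issue: (-issue["impact"], issue["date_raised"]))

for issue in prioritised:
    print(issue["id"], issue["impact"].name, "-", IMPACT_DEFINITIONS[issue["impact"]])
```

Keeping the definitions next to the classifications makes it harder for a level of frustration to be recorded as a level of impact.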

Data Owner

With all things (not just data), I find that activities don’t tend to happen unless it is very clear who is responsible for doing them. One of the first things I do after being notified of a data quality issue is to find out who the Data Owner for the affected data is and agree with them that they are responsible for investigating and fixing the issue (with support from the Data Governance Team of course).

Status

Status is another good field to use when monitoring and reporting on data quality issues. You may want to consider using more than just the obvious “open” and “closed” statuses.

From time to time, you will come across issues that you either cannot fix, or that would be too costly to fix. In these situations, a business decision has to be made to accept the situation. You do not want to lose sight of these, but neither do you want to skew your numbers of ‘open’ issues by leaving them open indefinitely. I like to use ‘accepted’ as a status for these and have a regular review to see if solutions are possible at a later date. For example, the replacement of an old system can provide the answer to some outstanding issues.
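Here is a small sketch of how a third status keeps the ‘open’ count honest while still letting you pull out the accepted issues for a regular review. The three status values follow the ones discussed above; the sample log entries are made up for illustration.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    OPEN = "open"
    ACCEPTED = "accepted"  # cannot be fixed, or too costly to fix, for now
    CLOSED = "closed"


# Hypothetical (issue ID, status) pairs.
issues = [
    ("001", Status.CLOSED),
    ("002", Status.OPEN),
    ("003", Status.ACCEPTED),
    ("004", Status.OPEN),
]

counts = Counter(status for _, status in issues)

# Accepted issues are excluded from the open count...
open_count = counts[Status.OPEN]

# ...but remain easy to list for the periodic review.
for_review = [issue_id for issue_id, status in issues if status is Status.ACCEPTED]

print(open_count, for_review)
```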

Update

This is where you keep notes on progress to date and details of the next steps to be taken (and by whom).

Target Resolution Date

Finally, I like to keep a note of when we expect (and/or wish) the issue to be fixed by. This is a useful field for reporting and monitoring purposes. It also means that you don’t waste effort chasing for updates when issues won’t be fixed until a project delivers next year.
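Pulling the column headings together, one row of the log could be sketched as a simple record, with the target date used to decide which issues are worth chasing. This is only an illustration: the field names are my rendering of the headings above, and the 30-day look-ahead in `needs_chasing` is an arbitrary assumption rather than a recommendation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DataQualityIssue:
    issue_id: str
    date_raised: date
    raised_by: str  # name and department
    short_name: str
    detailed_description: str
    impact: str  # e.g. "High", "Medium", "Low"
    data_owner: str
    status: str = "open"  # e.g. "open", "accepted", "closed"
    update: str = ""
    target_resolution_date: Optional[date] = None


def needs_chasing(issue, today, lead_days=30):
    """True if an open issue is due (or overdue) within `lead_days` of today.

    Open issues with no target date are also flagged, so that one gets agreed.
    """
    if issue.status != "open":
        return False
    if issue.target_resolution_date is None:
        return True
    return issue.target_resolution_date <= today + timedelta(days=lead_days)


# Hypothetical issue that will only be fixed when a project delivers next year.
issue = DataQualityIssue(
    issue_id="067",
    date_raised=date(2024, 5, 1),
    raised_by="A. Analyst, Finance",
    short_name="Duplicate Customer Issue",
    detailed_description="Duplicate customers are created if a field is changed after creation.",
    impact="Medium",
    data_owner="Head of Customer Services",
    target_resolution_date=date(2025, 6, 30),
)

print(needs_chasing(issue, today=date(2024, 9, 1)))
```

With a target date well into next year, the issue stays on the log but drops out of the chasing list until the date approaches.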

I hope this has given you a useful insight into the items you might want to include in your Data Quality Issue Log. You can download a template with these fields for free by clicking here.

Running and managing a Data Quality Log using Excel and email is an easy place to start, but it can get time-consuming once volumes increase – especially when it comes to chasing those responsible!  That’s why I was delighted to be involved recently with helping Atticus Associates create their latest product in this space, DQLog.  The Atticus team are launching their beta version in Spring this year and they are keen to hear feedback from anyone interested in trying it.  If you are interested in testing the beta, please email me and I can put you in touch.