EP 161 - How to maintain data quality across systems - Malcolm Hawker, Chief Data Officer, Profisee

Wednesday, February 01, 2023

This week, our guest is Malcolm Hawker, Chief Data Officer of Profisee. Profisee is a cloud-native master data management solution that helps enterprises solve data quality and governance issues.

In this talk, we discussed the challenges related to data management, from integration between systems to standardizing workflows. We also explored how blockchain can impact data governance by making it easier to manage data sharing between supply chain partners.

Key Questions:

How do you segregate jobs to be done in data management?
What are the distinct challenges for each platform and data set?
How centralized does network data mover tend to be?

Transcript.

Erik: Malcolm, thank you so much for joining us on the podcast today.

Malcolm: Thank you, Erik, for having me. I'm excited to be here.

Erik: Yeah, great. I'm really looking forward to this conversation. We're a company that focuses very much on helping our clients discover how to make use of data, so it's a topic that I wrestle with all the time. I think, obviously, it's the topic that you focus on at Profisee today. But it looks from your CV that you've also covered this from a number of different perspectives: as an analyst at Gartner, architect at Dun & Bradstreet, senior director of IT at Neustar. So, there are seven or eight senior leadership positions that you've had in your career, and they've all touched on data from a different perspective. What is the through line that brought you through that career to the position you're at today as head of strategy at Profisee?

Malcolm: Yeah, the through line there is a few different things. This may sound pithy. But one, I really like working with companies to help them solve really, really difficult problems. The harder the problem, the more attracted I am to it. That's been the through line of my career. But when it comes to data, there's maybe a bit of a sadist in me: the harder the problem, the more I'm going to run towards it. There are a lot of problems in the data world. We can certainly talk about that. There are a lot of challenges in the data world. But what makes it a little perplexing for a lot of companies is that, on the surface, those problems often seem rather simple.

You're looking at data, and you see two things. One says Jeff Smith, and the other one says Jeffrey Smith. You think, okay, well, they're probably the same person. Well, are you sure? Why do you have two different records that look like maybe they're the same person, but maybe they're not? You're not 100% sure. That's one tiny little example of a data-related challenge that companies actually really, really struggle with.

Somebody outside the data space could look at that and say, "Okay. Well, that doesn't really look that hard. How hard could that be, or how hard could it be to make system A talk to system B or these types of things?" But solving these challenges is really what I'm all about. It's in my DNA, and it's been the through line through my entire career, whether I was running an IT organization at a $2 billion publicly-traded company, or working as an independent consultant, or working for a software company that was selling solutions to these problems.

The through line has been working with large companies to help them figure out how to turn data from a liability — which it often is for most companies — to an asset. The whole idea there is to make data the new oil, become data-driven. Use data as a lever to competitively differentiate yourself from your competitors. That's really been the consistent theme of my career for at least the last 20, 25 years.

Erik: Yeah, that's a good perspective. Because when data is first collected from consumers, from sensors in a factory, you're right, it really first is a liability. It's sitting in a data center. It has IP implications. It has privacy implications, cybersecurity implications. So, it's basically a liability, but it also has value. That's a great way of viewing the challenge here: how do you convert data from a liability to an asset?

When we were chatting earlier, you mentioned that at Gartner, you've reviewed something like 1,500 companies related to data management over the past few years. It's a very complicated landscape of companies that are addressing this challenge. In your previous role as an analyst, how do you segment that? How do you look at the different jobs to be done in data management?

Malcolm: Yeah, there's a lot of work there, believe it or not. The story that I tell, at least from a data perspective, is, I work in the engine room. We work in the world of data management, which for a lot of companies sits in an IT organization, generally under a CIO or, increasingly so, a Chief Data Officer (CDO).

When we look at the world of data management and data-related challenges, we can break it down into a few different pieces. One of them is data integration, getting data from system A to system B. The example I used before was making systems talk to each other. The easiest way to have two systems talk the same language is to have data that is consistent across them and that can be easily consumed by both. So, that's the world of data integration. Another world is data quality. How do you make sure that the data that is being viewed in a system, or viewed in a report, or used by your end customers is trustworthy, consistent, accurate, unique? Those are all attributes of data quality.

Another world is this thing called Master Data Management, which is what Profisee does for a living. Master Data Management solves for the proverbial single source of the truth. If I'm looking at two records, Jeff Smith and Jeffrey Smith, how do I know which one is accurate? Can I create what's called a gold master record that is used around the organization to make better decisions or to make reports more accurate? So, there are other worlds. There's BI. There's analytics, business intelligence and analytics, the creation of reports. There's data science, most certainly, for a lot of large organizations now.

Then there's metadata management, which is going to be the data about data. We're so specific in the data world. We even have a unique word that describes the data about data. And there's most certainly data governance, which is the policies and procedures that relate to your data. How long can you retain it? Where do you archive it? Who can create it? Who has the rights to see the data? What are the rules related to how you define data, data objects? What are the rules for how you define relationships that may exist within your data, hierarchical type relationships within your data?

All of those things together define the world of data management. It's people like me and other data professionals that try to figure out the tough answers to those questions of how to make all of that stuff work, and how to make it all seamless within an organization so that the organization is making good decisions, and that it's fully and completely optimized from an operational perspective.

Because everything every company does (every application, every business process, every piece of the manufacturing process, every piece of the supply chain) is all running on data. If any of it is wrong or bad, you're making wrong decisions. You're operating sub-optimally, or maybe the manufacturing plant even just shuts down entirely. Data is the lifeblood of all organizations and all companies, and making sure that it's accurate and trustworthy and consistent and well-governed, and is deeply integrated across the organization, is what I do for a living.

Erik: Okay. Great. So, that gives us a good background. Maybe we can drill here then into the area where you're focusing on.

Malcolm: Yeah.

Erik: The source of truth. Maybe one starting point would be what kind of data you are working with. Are you working with more traditional database data, where it's tagged and so forth? Do you also work with data that might be in a data lake, or unstructured data? Is there a scope around the datasets that you might be working with?

Malcolm: Yeah, so, not all data is created equally. Master data is the high-level acknowledgment that data that is shared widely across the organization needs special care and feeding. The quintessential example is data related to your customers. Pretty much everybody is going to be working with customer data in some way, shape, or form. Sales is going to be using customer data to build sales proposals. Marketing, very obviously, is interested in customer-related data. What do they want? What do they not want, and on and on? Manufacturing even cares about customer data because they're trying to meet the customer needs.

A customer is a thread, a data element, a data object, as we call it, that is used widely and is shared widely across the organization. So, when data is shared widely across the organization, it needs to be consistently governed. Meaning, it has to have consistent business rules at the core. Even things like definitions: how do I define a customer? Marketing may want to call a customer one thing, and finance may want to call a customer something else completely. It's about resolving those differences, or maybe even embracing those differences, but at least acknowledging them and managing them.

Master data is data that is shared widely across the organization. But it's not all data. It's actually finite. It's things like customers, suppliers, locations, assets, materials, other data points and other data objects that are used widely and need to be managed in a more consistent way, and need to be deeply integrated across the workflows and processes and systems that are all consuming that data. What I just described to you is called master data. Again, it needs special rules, because it is shared widely.

There's other data in the organization that may just exist within one application. We would not consider that master data. It may be very, very important. It could be critically important data for a specific business process. But if it's only used by one department, or one area, or one database, or one table, it doesn't have to have rules that are defined and managed collectively across the organization. So, this is where we draw a line between what's considered master data and what's not master data. What Profisee does is sell the software solution platform to manage that master data, where you can have consistent rules for the management of that data, where you can allow that data to be deeply integrated across the organization, and have consistent governance and quality standards for that master data.

Not all data is created equally. There's a unique subset of data that's called master data. It needs specific rules and special rules so it can be used widely across the organization. At a really, really high level, if your CEO asks how many customers do we have, there can really only be one answer. That's kind of this idea of a single version of the truth. At a higher level within the organization — CEO, CFO — there really only can be one answer to that question. That's what master data really helps with.

Erik: Okay. I got it. Maybe we can use my company as a microcosm here.

Malcolm: Sure. It's good.

Erik: It will be probably the simplest possible company as a boutique consultancy. You can think we have our CRM. Data is relatively well-structured there. But still, we have duplicates. Somebody goes to a meeting and meets somebody, puts them in. That person has already been there. So, now we have a duplicate. Then we might pull that out into an email tool for sending our newsletter. We pull that into an event management tool for managing our community. All of a sudden, we have people in different tools. Even though we're quite a small company, we already have an issue with managing that data. If we simplify, what would using Profisee look like for that set of challenges, just dealing with the customers and people related to customers?

Malcolm: One thing to keep in mind, MDM, Master Data Management is both a noun and a verb. It's both a thing you can buy from us and many others that are also on Gartner's Magic Quadrant for MDM. So, it is a software platform. It is a software solution, but it's also a way of managing data. It is a discipline. First and foremost, MDM is really a discipline.

For relatively small companies, from a software perspective, it's probably overkill. MDM solutions will be tens of thousands of dollars as a starting point, and they can go up quickly from there. Typically, what we would see is that companies turn to MDM by the time they hit a certain revenue point. There's no magic number here but, generally, it's at least several hundreds of millions of dollars of revenue, where companies start to get relatively big and the cost of manual data management starts to get excessive. Because for a company like yours, where you've got discrepancies across systems or where there may be a handful of records within your CRM that really don't align to what's going on in the ERP system or some sort of revenue recognition system, you can manually address those. Your entire organization isn't going to be crippled by that, and it's not going to be entirely slowed down by that.

You will still be doing things that align to MDM as a discipline. You'd put in some policies, for example, that could say, hey, here are our standards for customer naming. Or, here's what you need to do to make sure that you don't create duplicate contacts in our CRM. There's processes and policies that you could put in place that tend to be fairly manual, but at least would help address that situation. Or, here's what you do when you do encounter a duplicate. You don't delete the old one. Maybe you just create a new one, or whatever the rules are that are going to work best for your organization.

For relatively small companies, that's how they'll solve that problem. Maybe they'll create some separate spreadsheet where it's some sort of new master that is sitting up outside of the CRM, or whatever the policies are that are going to work best. Again, for relatively small companies, they're probably going to just manually solve those problems.

But once companies get to a certain size, the problem that you described can become incredibly onerous and can slow down lots of different operational processes. A good example would be, maybe a customer service rep is using that exact same CRM system. A customer calls in with an issue and wants to open a trouble ticket. The customer service rep says, "Okay. Thank you. Who do you work for?" Well, I work for ACME Incorporated.

They do a search in their CRM for ACME Incorporated. They find 15 records for ACME Incorporated. They'll be like, okay, which one is the right ACME Incorporated? I see 15 of the same records here. I don't know which of these records to associate the trouble ticket with. I don't know which of them to look at if I want to see the customer service history of this entity that is calling me and needs my support. When a company gets to a certain size, these issues related to data can be really, really impactful and can slow down or hinder a customer service experience. They can hinder a manufacturing process.

Once companies typically get to this size, and they start to realize that they can't throw more manual processes at the problem, they can't just keep throwing bodies or processes and procedures at this, they'll inevitably turn to leveraging software like a Profisee to try to automate for some of these solutions.

Because Profisee can look in and see that: "Oh, listen. You've got 15 ACMEs here. Let's configure some basic business rules so that your CRM agent will only see one." You may still have 15 in the database, but what the agent would see is the one. Or, maybe you actually do want to merge those 15 records together into a single version of the truth. There are a lot of different ways to solve for the problem. These are what are called different styles, or implementation styles, of MDM. Profisee can support all of them. It really depends on your business need. But where we live is generally companies that are around $200 to $300 million in revenue, upwards to about 3 to 4 to 5 billion. That's Profisee's sweet spot.

Beyond 5 billion, into the extremely large companies, we don't tend to play there very much, at least from a marketing perspective, a customer's perspective. We do have a few very large companies, but our sweet spot is mid-sized companies that are struggling with data that may be relatively new to MDM, that are looking for easy to use, fast time-to-value solution. Those are our sweet spot customers. Generally, often operating in healthcare and manufacturing, or financial services. Those are three verticals where we really tend to shine from a solution perspective.

Erik: If we look at those three verticals, do they each have quite distinct challenges? Do the challenges tend to cohere around particular platforms or datasets?

Malcolm: Well, at the core, all the problems are consistent. As companies grow, they tend to become more functionally aligned, where marketing exists and finance exists. Over time, companies naturally evolve these silos, where the way that marketing does things is different than the way finance does things. That's natural. The core problem that exists universally across all companies, whether it's a manufacturing company or a healthcare company, is that data silos tend to naturally evolve, where you will have buckets of data that exist in one system that don't exist in another, or maybe they're even replicated and you may not even know it, where you've got data that is poorly managed, or badly duplicated, or inconsistent, or inaccurate. These things naturally happen as companies grow.

Now, within each of these verticals, there are different words to describe some of the same things. In the healthcare space, it's about patient-related data. It's not necessarily customer-related data. There's a different label, but the data looks remarkably the same. The patient is a person. If you're a B2C company, a customer is a person. They're both people.

Things tend to be a little different from a naming perspective and a process perspective, obviously. But sitting at the core of all of this are these nouns, these core nouns. In the manufacturing world, it's suppliers or vendors for indirect spend, or it is materials, or it's assets. In the healthcare world, it's payers, meaning insurance companies. It's providers, meaning doctors. For everybody else in large healthcare companies, it's patients, and on and on. So, a lot of the data looks very, very similar. A lot of the core business challenges look very, very similar even though the business processes may differ a little bit, even though the definitions of things across those worlds may differ a little bit.

Erik: Okay. Let me ask a question. Maybe we can get into a few of the differences here if we look at the programs that you're working with. So, CRM. Everybody's going to have a CRM of some sort. They tend to be relatively standardized, and it's a fairly consolidated industry. Then if we get into SaaS for sales enablement or something, all of a sudden, we have a proliferation of a lot of programs. They might only be used regionally, et cetera. So, a lot less standardization. That's one area of challenge.

Second area I can imagine here, if we specifically look at healthcare as an example, we have partners. So, there's a hospital. There's an insurance company. They're both talking about the same person. So, then, you have this issue of integration across companies, not just within a company. So, that could be another set of challenges between integration.

Can you give us just a quick walkthrough of what would be the different platforms — CRM, SaaS, et cetera — that you would integrate with? What are the things that you wouldn't? I don't know. Maybe an Excel document, you would say, okay, we're not messing with that. But obviously, people store important data in Excel documents. What's out of scope in terms of systems, and then also if we look at that issue of data that's internal to your company versus data that's with customers or suppliers or partners that you still need to have aligned?

Malcolm: By definition, MDM software platforms like Profisee really need to be able to integrate to largely anything. If I was sitting in front of a whiteboard and drawing this out, what I'd be drawing is three databases on the bottom of the screen. The databases could be Excel spreadsheets (yes, an amazing number of companies totally have a lot of critical data management in Excel) or Access. Yes, that still happens. But you'd have multiple sources of data in databases or in tables. It could be an application. It could be an individual table. It doesn't matter. CRM, ERP, marketing automation systems, procurement systems, HR related systems, you name it. MDM solutions can consume data from, or can integrate to, just about any source. Typically, the architecture here is, you would have a number of sources of data. These would be the three databases on the bottom of the diagram that I would draw.

Sitting right over the top of those three databases (or four, or five, or six, or whatever it is) would be an MDM data hub. MDM solutions are data hubs. What they do is take a limited, finite amount of data out of source systems, replicate that data, and then persist that data into some form of hub. So, we'll take a snapshot of data out of the CRM. We'll take it out of the ERP. We'll take it out of the marketing automation system, out of the HR system. Wherever that company needs to virtually, at a data level, integrate those systems together to create a common version of the truth, the MDM system will take that data out, replicate it, manage it, and persist it in a data hub.

What MDM will then do is look at that, and apply some consistent business rules to that data. So, this is where data governance comes into play. These are the policies and procedures that you want to apply to this data to allow you to create a single version of the truth. This would even be things like, okay, what are the business rules for resolving differences and duplicates of data, a.k.a. how do I define uniqueness within, let's say, a B2B record, a record for ACME Incorporated? How do I define it? How do I know when ACME is a unique thing? MDM has these fantastic, really powerful, algorithmically-driven pieces of software that can go and evaluate large sets of data. We can do things like look and see it's Jeff, Jeffrey, or JJ, or it's ACME Inc., ACME Co., ACME & Son, ACME LLC. We can figure out if that is one thing or if that's four things. If it is one thing, what we can do then is, we can say, "Aha, this is a master record. We'll create a new master ID that links all four of those source IDs together." We can virtually link them together and continue to persist the source records, or we can actually physically merge those records together, if that meets the operating model for the organization.
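As a rough illustration of the matching and linking step described above, here is a minimal Python sketch. The record layout, the list of legal suffixes, the similarity threshold, and the `MDM-n` ID format are all assumptions made for this example; it is not Profisee's matching algorithm, and whether "ACME & Son" is really the same entity as "ACME LLC" is a business rule a governance group would have to decide.

```python
import re
from difflib import SequenceMatcher

# Hypothetical source records: (source system, source ID, company name).
SOURCE_RECORDS = [
    ("crm", "C-1", "ACME Inc."),
    ("erp", "E-7", "ACME Co."),
    ("marketing", "M-3", "ACME & Son"),
    ("hr", "H-9", "ACME LLC"),
    ("crm", "C-2", "Globex Corporation"),
]

# Assumed business rule: these tokens are ignored when comparing names.
SUFFIXES = {"inc", "co", "llc", "son", "sons", "corp", "corporation", "incorporated"}


def normalize(name):
    """Lowercase the name, strip punctuation, and drop legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in SUFFIXES)


def link_records(records, threshold=0.9):
    """Group similar records; return {master_id: [(system, source_id), ...]}.

    Source records are only linked ("virtually" merged), never modified.
    """
    masters = []  # list of (normalized name, linked source keys)
    for system, source_id, name in records:
        key = normalize(name)
        for master_key, members in masters:
            if SequenceMatcher(None, key, master_key).ratio() >= threshold:
                members.append((system, source_id))
                break
        else:
            # No existing master is similar enough, so start a new one.
            masters.append((key, [(system, source_id)]))
    return {f"MDM-{i}": members for i, (_, members) in enumerate(masters, start=1)}


links = link_records(SOURCE_RECORDS)
for master_id, members in links.items():
    print(master_id, members)
```

With this sample data, the four ACME variants end up under one master ID and Globex gets its own. A physical merge would go one step further and replace the four source records with the single master.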

Creating this single version of the truth and persisting the single version of truth is what MDM is all about. Once you've got that version of the truth, you can use it two different ways. It will be sitting there persisted in an MDM hub. So, you've got four records for ACME Incorporated in the past. Now you've got some new master record, a fifth record, potentially. You could use it two ways. One, you can use that to fuel analytics, where you can say as an organization, well, you know what? There's four different versions of ACME across all these systems, and we know it. But all we really want to do is to create a 360-degree view of ACME. We don't necessarily need to change the source data. We want marketing to continue to manage ACME how it manages ACME. This is more what we see from large manufacturing companies, where each individual operating unit or division wants autonomy, and they want to continue to manage their customer relationships how they want to do it.

What you can say is that, I do have a single master record for ACME Incorporated. I want to use it to create a 360-degree view of our relationship with ACME. I can use that new master ID that is linked to those source IDs to pull back every bit of information across every source, and to create that 360-degree view of ACME. So, that's one use of MDM. It's to create highly trusted, consistent, accurate, high quality, 360-degree views of anything, whether that is an employee, whether that is an asset, a location — in this case, a customer — that's one use of MDM.
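The analytical use described here can be sketched in a few lines of Python. The cross-reference table, the source systems, and the field names below are hypothetical; the point is only that the master ID is used to pull information back from each source without changing it.

```python
# Hypothetical cross-reference kept by the hub: master ID -> linked source keys.
XREF = {"MDM-1": [("crm", "C-1"), ("erp", "E-7")]}

# Hypothetical source systems; each keeps managing its own version of ACME.
SOURCES = {
    "crm": {"C-1": {"name": "ACME Inc.", "open_opportunities": 3}},
    "erp": {"E-7": {"name": "ACME LLC", "invoiced_total": 125000}},
}


def view_360(master_id):
    """Collect a read-only copy of every source record linked to a master ID."""
    view = {"master_id": master_id, "sources": {}}
    for system, source_id in XREF[master_id]:
        # Copy the record so the view cannot alter the source data.
        view["sources"][f"{system}:{source_id}"] = dict(SOURCES[system][source_id])
    return view


acme = view_360("MDM-1")
print(acme)
```

Marketing and finance keep their own records as they are; only the reporting layer sees them joined under `MDM-1`.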

The other use is what we call more operational styles of MDM, which is once you create that gold master record, you can turn around and push it back into the source system from where it came. So, you can actually go from, okay, before, I was calling this ACME Inc. but its correct name is ACME LLC. As a system that uses data about ACME (that CRM system, for example), what you could say is that, "Aha, the data that I had before was incorrect. Now I'm going to use this correct data so that my customer will see their name reflected correctly on the sales proposals. We'll be able to use the correct legal name when it comes to actually recognizing revenue in ERP systems and on and on."
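A minimal Python sketch of this operational style follows, assuming a hypothetical golden record, a list of linked source keys, and in-memory dictionaries standing in for the CRM and ERP. A real deployment would write back through each system's own interface.

```python
# Hypothetical golden record and its linked source keys.
GOLDEN = {"master_id": "MDM-1", "legal_name": "ACME LLC"}
XREF = [("crm", "C-1"), ("erp", "E-7")]

# In-memory stand-ins for the source systems.
SOURCES = {
    "crm": {"C-1": {"name": "ACME Inc."}},
    "erp": {"E-7": {"name": "ACME LLC"}},
}


def push_back(golden, xref, sources):
    """Overwrite the name in every linked source record that disagrees
    with the golden record; return the keys that were changed."""
    changed = []
    for system, source_id in xref:
        record = sources[system][source_id]
        if record["name"] != golden["legal_name"]:
            record["name"] = golden["legal_name"]
            changed.append((system, source_id))
    return changed


changed = push_back(GOLDEN, XREF, SOURCES)
print(changed)
```

Here only the CRM record is rewritten, because the ERP already carried the correct legal name.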

You can use data that is created and managed by an MDM system for better analytics. Or you can actually push it into operational systems, where you are using MDM as a source of truth for information related to ACME. Hopefully, that makes sense.

Erik: Yeah, I think that makes sense. Just one clarification. You mentioned you're pulling a snapshot. I guess, with the first use case you mentioned, you're maybe pulling a snapshot once a quarter, once a month in a reporting sample probably—

Malcolm: Or real time.

Erik: Yeah, that would be the question then. Is this done continuously in real time, or is this typically on a schedule? What does that data pull look like?

Malcolm: It depends. I can't tell you how many times I've been — particularly, as a consultant, you sit down. You start talking about those analytical use cases that I was mentioning where everybody wants real time. Then you ask and probe a little bit more. It's like, well, can the applications that would be using this and the reporting systems that are using this even consume data in real time? Well, no, they can't. We actually tend to run our reports on a daily basis. Okay.

But if you wanted to run real time, you could absolutely run real time. Once you create, manage, and persist this gold master record in an MDM, MDM systems are very, very good at consuming data from anywhere. They're very good at publishing data to anywhere. So, you can be pushing this data out typically through APIs, typically through RESTful APIs. We're getting into the technology. Or, you could stream it via Kafka or others. You can be pushing that data into a reporting stream. You could be pushing it into a CRM system. You could do that as often as you want, depending on the business need. It can be real time. It can be batch. It just depends on the use case.
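The difference between real-time and batch publishing can be sketched without any particular technology. In the Python example below, the `Hub` class, its method names, and the callback mechanism are hypothetical in-memory stand-ins for what would in practice be a REST call or a stream such as Kafka.

```python
class Hub:
    """Toy hub that publishes master-record changes two ways."""

    def __init__(self):
        self.realtime_subscribers = []  # callbacks invoked on every change
        self.pending_batch = []         # changes held for the next scheduled run

    def publish(self, record):
        # Real-time consumers (e.g. a marketing platform) are notified at once.
        for callback in self.realtime_subscribers:
            callback(record)
        # Batch consumers (e.g. a daily report) pick the change up later.
        self.pending_batch.append(record)

    def flush_batch(self):
        """Hand over everything accumulated since the last batch run."""
        batch, self.pending_batch = self.pending_batch, []
        return batch


hub = Hub()
seen_in_real_time = []
hub.realtime_subscribers.append(seen_in_real_time.append)

hub.publish({"master_id": "MDM-1", "legal_name": "ACME LLC"})
nightly_batch = hub.flush_batch()
print(seen_in_real_time, nightly_batch)
```

Both consumers receive the same record; what differs is when they receive it, which is the choice the business need should drive.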

Take some of the customer-driven use cases where you are using that information. The example I gave was to support customer service inquiries. Or maybe you're using that information to make marketing decisions that have to happen in real time. Meaning, what ad do I want to serve this person that is coming to our website? Or, what content do I want this customer to see? That's more of a real-time decision, where some sort of a digital marketing platform or a marketing automation platform would be pulling data out of an MDM and saying, okay, well, who's the customer? What's the customer ID? Certainly, there's more of a need for a real-time use case there. But there are other operational uses of MDM where maybe it's not urgent to have real-time access to that data. Maybe it's more of a batch-driven process. So, it just depends on the need. But MDM systems like Profisee can support either use case.

Erik: Let's talk about the stakeholders here. You've mentioned a couple of use cases. I suppose there's also — it feels like there would have to be a human in the loop to quality check or sense check. Are we actually identifying the right identifier? So, the question would be, who is buying this? Who's the buyer? Who's the owner of the system? Then who are the users, from an analytic perspective or from an operational perspective?

Malcolm: Yeah, it's a great question. Typically, these solutions fall into some sort of broader data management stack. I'd mentioned a few other solutions: data governance, data quality, data integration, BI analytics. Typically, whoever manages your reporting platform (your Tableaus, your Power BIs, your Qliks, or whatever reporting platform), that organization is the same one that is managing and maintaining some form of an MDM solution. Again, typically, under a CIO or a CDO.

What we see so often is that companies will figure out they've got an MDM problem. Often a starting point is problems with reports. Well, what we'll see is, some executive will get a report. They're looking at a report, and they'll see ACME two times. Or, some senior executive is looking at some other data point. What they see is they know that that data is incorrect. "ACME changed its name last week. Why am I looking at this old data? Why am I looking at inconsistent data? Why do I get two different answers for the same question depending on which system the data is pulled from?" These things are all indicative of MDM-related challenges.

So, these executives will go to IT and say, "Hey, IT, my report's broken. Fix it." Then folks in IT will look at it and be like, "Well, you know what? The reporting system is actually working fine. It's working to spec. It's pulling the data correctly. It's aggregating on the correct keys, our group-bys and sort-bys. Everything in our analytics platform, it looks great." But then you look at the source data, and it's like, "Oh, we got a problem here in the source data. We've got some problems. We've got ACME appearing two times. It probably shouldn't appear two times because there's only really one customer here." That screams MDM.
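The situation above, where the report is working to spec but the source data is duplicated, can be shown with a small Python sketch. The order rows, customer IDs, and the cross-reference to master IDs are invented for illustration.

```python
from collections import defaultdict

# Hypothetical order rows: (source customer ID, customer name, amount).
ORDERS = [
    ("C-1", "ACME Inc.", 100),
    ("C-2", "ACME LLC", 250),  # same real-world customer as C-1
    ("C-3", "Globex", 75),
]

# Hypothetical MDM cross-reference: source customer ID -> master ID.
XREF = {"C-1": "MDM-1", "C-2": "MDM-1", "C-3": "MDM-2"}


def revenue_by(rows, key_of):
    """Sum amounts grouped by whatever key `key_of` derives from the customer ID."""
    totals = defaultdict(int)
    for customer_id, _name, amount in rows:
        totals[key_of(customer_id)] += amount
    return dict(totals)


# Grouping on the source key is technically correct, yet ACME shows up twice.
by_source_id = revenue_by(ORDERS, lambda customer_id: customer_id)

# Grouping on the master ID gives one answer per real customer.
by_master_id = revenue_by(ORDERS, XREF.get)

print(by_source_id)
print(by_master_id)
```

The aggregation logic is identical in both calls; only the key changes, which is why the fix belongs in the data rather than in the reporting tool.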

Typically, the CDO or the CIO will go out. They'll approach Profisee or other data management vendors, and say, "Hey, I got this problem. Can you help?" Typically, the key stakeholders here are on the, what we call the business side of the house. It is people within sales, or marketing, or finance. It's people within supply chain, within procurement who are trying to optimize their supplier spends, people in the manufacturing world as well. Anybody on the business side who is in the business of marketing, selling, making anything in an organization, those are the stakeholders of MDM. So, these tend to be fairly complex buys from a software perspective. They don't have to be.

There can be a single stakeholder group. It's common, for example, for finance leaders to use MDM to have a single source of truth when it comes to managing things like general ledger codes. There can be single stakeholder groups. But typically, by its nature, MDM tends to be cross functional. Because what MDM tends to do is, it finds and highlights breakpoints between different functions.

A classic example is what we see all the time. It's the way customer data looks in a CRM is different than the way customer data looks in ERP. Because one is a 'sell to' and the other is typically a 'bill to.' They don't necessarily have to be exactly the same. The 'sell to' and the 'bill to' can be the same corporate entity, but maybe different divisions. Maybe different departments. They're both right. But again, you don't want to have two different records for ACME Incorporated if there's really only one, or at least you don't want that at higher levels, when rolling up and creating these aggregated reports that are going to your CFO or your CEO, where you've got two different answers to the same question. Stakeholder groups are broad. They're diverse. They can be across the entire organization. They can even include HR where they've got duplicated employee records. But the internal organization that is deploying and managing solutions tends to be under the CIO or CDO.

Now, in the middle, between all of these, between stakeholders and IT, these software solutions are pretty smart. They're pretty capable. You had mentioned manual intervention here. Our algorithms are only so good. You are bang on, Erik, that sometimes human beings do need to intervene here. Sometimes, the algorithms will say we think that these two ACME records are the same. We have a 60% or 70% confidence level that they're the same. But maybe the use case is a financial one, or a legal one, or a compliance one where 70% is not good enough, where you do need to have a human being review that record. We call that data stewardship.
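To make the stewardship idea concrete, here is a minimal Python sketch of routing a match by its confidence score. The threshold values, the `MatchCandidate` type, and the `route_match` function are all hypothetical names chosen for illustration; they are not Profisee's API, and real thresholds would depend on the use case.

```python
from dataclasses import dataclass

# Hypothetical thresholds, assumed for illustration only. A financial, legal,
# or compliance use case would likely demand a stricter auto-merge bar.
AUTO_MERGE_THRESHOLD = 0.95
REVIEW_THRESHOLD = 0.60


@dataclass
class MatchCandidate:
    record_a: str
    record_b: str
    confidence: float  # 0.0 to 1.0, as produced by some matching algorithm


def route_match(candidate,
                auto_merge_threshold=AUTO_MERGE_THRESHOLD,
                review_threshold=REVIEW_THRESHOLD):
    """Decide what happens to a proposed match based on its confidence."""
    if candidate.confidence >= auto_merge_threshold:
        return "auto_merge"        # confident enough to merge without a human
    if candidate.confidence >= review_threshold:
        return "steward_review"    # a data steward reviews the pair
    return "keep_separate"         # too weak to treat as the same entity


candidate = MatchCandidate("ACME Inc.", "ACME Incorporated", 0.70)
print(route_match(candidate))
```

With these assumed thresholds, the 70% ACME match lands in the steward's review queue rather than being merged automatically.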

Typically, those data stewards, optimally, will live on the business side of the house. You will have somebody in sales, or somebody in marketing, or somebody in finance who knows those business processes and who knows the data relatively well to help with the manual oversight of some of this stuff. Sometimes, the data stewards can live on the IT side. But typically, they will live on the business side of the house. There is a collaboration here. MDM is a collaboration between IT and between the business. You need both. You also need both involved in managing and configuring and setting some of these business rules. IT will sit down with business stakeholders and say, "Hey, we want to solve this problem with having two different versions of ACME. To do that, we need to have some consistent definitions of a customer."

I know that sounds a little crazy, but believe it or not, companies really struggle with having a consistent definition of customer. How marketing defines a customer is different than how finance will often define a customer. But to implement an MDM solution, you need to start managing and configuring some of these business rules. So, this is where governance, some form of a Data Governance Committee, will come into play, where people will sit down, and they'll start talking about things like how do we define our customers? How do we define our relationships? What data is important to us? What data isn't important to us? What are all the various rules for managing this data? If we do create a single version of the truth, who has the rights to update it? In what system, and under what conditions? These are all policies that would fall generally under some umbrella of a Data Governance Committee, some form of data governance policies that need consistent care and feeding.

It's a collaboration. It's IT and it's business stakeholders. Typically, IT is spending the money to solve a problem for the business stakeholders. Once the solution is integrated, it is an ongoing act of collaboration that includes care and feeding of data through what are known as data stewards.

Erik: This raises another challenging area, which is the topic of managing data across regions. I'm sitting here in China, which is its own kind of beast. Companies have countless headaches trying to figure out what data they can and cannot move out of China into other markets. I think other geographies also have their own issues. Europe has its own set of challenges around personally identifiable information. How centralized does MDM tend to be? Does it tend to be deployed as a set of rules that are applied regionally, or do you aim for a global rule set with exceptions on an as-needed basis?

Malcolm: Yeah, it's a great question. 10 years ago, 15 years ago, MDM tended to be like those data hubs that I was talking about. It tended to be deployed on-prem. You would have a fairly regionalized approach to it based on local data persistence and rules of the road. For example, in Europe or in Germany, if you're managing data related to people, that data needs to persist somewhere. Typically, 10, 15 years ago, you would have seen fairly regionalized approaches to solving for this, where data would be persisted where it needed to be persisted. You'd have an instance running in Canada, you'd have an instance running in Germany, and on and on.

Now our solution can still run on-prem if you want. There are a handful of our clients that are still running the on-prem solution. But we are fully SaaS-enabled now. We can run in pretty much any cloud and support any operational requirement about where data needs to persist.

From an operations perspective, it's just, like, where does the data actually live? We do have some clients that will have data stored in a US version of a cloud provider, or a European version of a cloud provider, or other local instantiations to support country-specific compliance rules. That's common. But from an operations perspective, we can support any deployment model.

Things get more challenging. The centralization versus decentralization issue really becomes more challenging from a business process perspective. The operations, and where the databases actually live, that's the easy part. That's really the easy part here. The hard part is going to be bringing disparate groups together to agree on consistent data management rules. Because what we see generally, historically, is... I'll give you a very, very good, recent and poignant example, which was, in the manufacturing space, a lot of companies, very, very big companies, had local manufacturing presence. They'd be making goods in country A, country B, country C, to meet those local demands and local needs. Well, along comes the pandemic. Along comes everything that went with it and all of these massive supply chain disruptions. For these companies, for a long time, highly decentralized models of managing data and managing manufacturing were working. They were fine. Then they weren't.

All of a sudden, a lot of companies — this is a lot of the conversations that I had while I was a Gartner analyst. Companies would call me up and say, "Hey, we need to manage our data here a little more centrally because we need broader visibility on the business risks associated with our supply chain. We've really been negatively impacted by some of these supply chain disruptions. We want to continue to operate locally. But at least from a data perspective, we want to do a little bit more centralization of the data so that we at least have visibility on what's going on. Before, we were largely driving blind. Or, it was, after the fact we pull the reports, everything was fine. But now that model doesn't work anymore."

When you make the jump from highly decentralized ways of working to at least slightly more centralized ways of working, what you find is that it creates a lot of stress fractures in organizations that weren't there before. Because they didn't have to be there before. You start having to ask questions about, okay, well, how do you create an enterprise-wide definition of supplier or enterprise-wide definition of vendor? How do you create all the business rules related to data that you need in a more centralized way? That's the hard part.

The easy part is figuring it out from an ops perspective and a database management perspective. That's the easy part. The hard part is figuring out all these business rules. I've talked to a number of senior managers at extremely large global manufacturing companies who were hired at a corporate level, who were either CDO or CIO, or working for a CDO or a CIO, and who were even having difficulty getting access to local data. Let alone even getting to the point of having conversations about having a common supplier definition. Even just getting access to data was a challenge. So, that's the hard part.

Erik: Yeah, great. Thanks. It makes sense. It mirrors a lot of the headaches that we go through on a weekly basis here.

Malcolm: I bet.

Erik: There's a question a bit out of left field that I just have to throw out there. Because I know you published a white paper recently or maybe an article on blockchain impact on data governance. It's a fascinating topic. I feel like blockchain was a technology designed for data governance. Then people figured out that you can gamble with it, and the focus immediately shifted. But there seems to be a real use case for data governance. So, what's your perspective on this?

Malcolm: Well, it's funny. I published an article recently in Forbes about that. The title of the article was, How Blockchain Will Save Data Governance. Everything I just talked about — common supplier definitions or common business rules for managing data, that's data governance. The use case that I described earlier is a perfect example, a large multinational manufacturing company that is struggling to be a little more centralized. Not entirely centralized, but let's just say a little more centralized, at least from a data perspective.

Well, I happen to think that blockchain is a perfect fit for that. The conference I am at right now... I'm sitting in a hotel room in Washington, DC. I came to this conference to present this idea that I have about how blockchain can be really, really useful in our space. But really, the problem here is not technology. The problem is business process. Really, when you peel the onion, the problem here is data sharing. At a very, very high level, blockchain is fantastic and can be, I think, truly transformational in the data governance and the data management space. But to do that, you have to embrace the idea of sharing data across a peer-to-peer ecosystem.

In the world of Bitcoin, you want to do that because it's currency. There are network effects that exist when everybody adheres to the same rules of the road, and when everybody agrees that this is how transactions are validated, and this is what a transaction means, and this is what a block means. If we all act together in the same way and adhere to the same business rules, then everything will be more widely adopted. This thing will grow in value. We'll all become trillionaires. That's fantastic in that world. But in the world of business, to create peer-to-peer networks of sharing of data, it's going to be a real challenge. I still think it will be transformational. I think blockchain will help facilitate these broad data ecosystems, but we got to get companies over the hump of sharing data.

I was reading an article recently about the IBM and Maersk joint venture that they had announced a while ago called TradeLens, where they were talking about using blockchain to facilitate data sharing across complex supply chain ecosystems. At the time, I was actually at the IBM conference (it was called Think, and it was in Las Vegas six or seven years ago) where the CEOs of IBM and Maersk actually announced this. It was like, man, this is awesome. This is transformational. This could be a game changer because of how clunky things are across logistics, related to global supply chains and shipping containers. Having consistent definitions for shipping containers, knowing where they are and where they start and where they end and the full lineage, as we say in the data world, the full provenance of any item and where it is. With the full visibility and full transparency that the blockchain can afford us, man, this is going to be a game changer. Then last week, I just learned that they're sunsetting that partnership and giving up on it.

I do think blockchain will be transformational. I do think that the corporate world has a few roadblocks to get over when it comes to widespread sharing of data. Because that's ultimately what we're talking about. A blockchain of one is not valuable. One cell phone, a person with one phone, with one node on a network, is not valuable. The value will come when there are millions of nodes in a network, whether that is a network related to supply chain or whether that's a network related to customer-related data or any other network. That's when the real value will come. The only way that we'll get there is when companies realize that for some use cases — not all, but for some use cases — there will be fantastic economies of scale when data is managed in a more shared way and as a shared resource across a complex blockchain-enabled ecosystem.

Erik: Yeah, that's the crux. It does really make sense on paper. It doesn't necessarily make sense to the lawyers and then management. I guess, it's just one of those issues, where getting there requires a leap of faith. Because you have all these hypothetical problems, and you have to assume that you can solve them. Large corporations, they don't like taking leaps of faith. But it would be interesting to see if maybe there are some opportunities where smaller organizations could lead the way there and maybe with less sensitive datasets.

Malcolm: Yeah, and there's plenty of examples of this, of widespread data sharing to facilitate commerce or to facilitate trade, that are being done in highly-centralized models that I would argue are ripe for disruption through blockchain. A really, really high-level example is UPC codes on products. It doesn't matter who you are. It doesn't matter what manufacturer you are. You are adhering to this data standard. A UPC code is a data standard. It is master data. It's everything we've been talking about. It's consistent ways of defining things. It's consistent labeling. It's consistent quality standards. It's consistent structures of data. So that if a consumer goes to Kroger, or CVS, or Walgreens, or Safeway, it doesn't matter. When you go beep over the reader, you get the same data. That is a classic example of a data sharing network, an ecosystem where companies have come together and agreed on some data standards to facilitate commerce and to facilitate trade. But it's being done in a highly, highly centralized way.

Well, could you do something like that in a decentralized way? I'm not saying it's necessarily UPC codes. It could be anything. The example I gave before is information related to ACME Incorporated. There are certain providers of data that make a lot of money that sell information about ACME Incorporated. They're very, very centralized. They have single databases, repositories of information about companies, or repositories of information about people or about people's needs or behaviors, and on and on. They're all largely centralized, and they're all largely existing as data sharing ecosystems. Companies do see the value here. They get it to a certain degree. But when it comes to, okay, well, you've got to give something to get something, often that's going to be a challenge. The lawyers most certainly can be involved.

Matter of fact, I'm at this conference here. I asked. I was talking yesterday about smart contracts. There is a notion in my world and in the data world. It's something called the data contract, where you could actually, even over APIs, where you could actually tag contracts on your data to say, this is how it can be used. This is what it means. These are the restrictions of how this data can be used. I asked a lawyer who was giving a presentation. I said, can you automate that? Because I was thinking about smart contracts on blockchains. Why couldn't you automate distribution of shared data, or the management of data, or the governance of data, the rules of who can use it and how they can use it? I was like, "Can you automate that?" The lawyer automatically said no. I was like, wait a minute. Of course, you can. There's this thing called smart contracts.

I figured out why he said no because I was basically asking, could I automate your job? Anybody's first reaction to that is going to be, "No, you can't automate my job." But yeah, we got a long way to go in the blockchain world. I think the horse is out of the barn, and the cat is out of the bag. Pick your metaphor. The technology is too revolutionary. There's too many advantages here. It always comes back to, I think, economies of scale that can be leveraged through blockchain, that don't exist when companies are working as an island, on their own.
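As a rough illustration of the data contract idea, the sketch below attaches usage rules to a dataset and checks a proposed use against them in code. It is plain Python rather than an on-chain smart contract, and the `DataContract` fields, the `is_use_permitted` function, and the sample values are all assumptions made for this example, not an established standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataContract:
    """Hypothetical machine-readable usage terms attached to a shared dataset."""
    dataset: str
    allowed_purposes: frozenset   # what the data may be used for
    allowed_regions: frozenset    # where the data may be processed
    expires: date                 # last day the terms are valid


def is_use_permitted(contract, purpose, region, on_date):
    """Return True only if every restriction in the contract is satisfied."""
    return (
        purpose in contract.allowed_purposes
        and region in contract.allowed_regions
        and on_date <= contract.expires
    )


contract = DataContract(
    dataset="supplier_master",
    allowed_purposes=frozenset({"risk_reporting"}),
    allowed_regions=frozenset({"EU"}),
    expires=date(2030, 1, 1),
)
print(is_use_permitted(contract, "risk_reporting", "EU", date(2025, 6, 1)))
print(is_use_permitted(contract, "marketing", "EU", date(2025, 6, 1)))
```

The point of the sketch is only that restrictions of this kind can be evaluated automatically once they are expressed as data. Whether a given legal agreement can be reduced to such rules is a separate question.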

Erik: Yeah, well, it'll be fascinating to watch this evolve. I think you're right. The technology is there. There's been a lot of experimentation. We're all waiting for somebody to figure something out, and then just build a nice 100-million-dollar business. Just kind of figure that out, and get some cash flow going. I wanted to then ask maybe a last question here from my side. Maybe there's some other things you wanted to share. But if we look at the future development of Profisee, is blockchain on the development roadmap? Is this a challenge that you guys will be taking a crack at? If not, what are the other areas on the roadmap? What are the other priorities?

Malcolm: We're talking about blockchain. But honestly, in terms of our short-term roadmap, it's not there. For all software companies, that road to revolution happens through evolution, at least for existing software companies that have existing customer bases that can't just rip and replace their entire tech stack. So, we're talking about it. I suspect we will absolutely get there. I think there are some baby steps that we can take to enable this. A great example would be to enable some more widespread sharing across our individual customers. I think there are steps that we can take.

We're talking about blockchain, but what we're really focused on right now is a few different areas. One is really, really, really turbocharging the Azure environment. So, we're deeply integrated with Microsoft. We've got a long-standing partnership with Microsoft. We are working actively to become the default MDM solution provider in the Azure space. Of course, there are others, but we want to be the preferred provider. If you are migrating to Azure, if you are doubling down on Microsoft's cloud-based solutions, we want to be your MDM provider of choice. So, we are building out and enhancing. We already have an integration. There's something called Purview, which is Microsoft's data governance tool. It's their data discovery and data cataloging tool. We already have an integration there. We're continuing to build that out. We've got it built. We've got an integration into Synapse, which is Microsoft's BI layer in the cloud. We're continuing to work on those integrations. We continue to try to position ourselves as, again, the default MDM provider in the Microsoft space.

There are a few other things that we're working on that are pretty cool. We are focusing, like so many software companies, on more AI and ML-enabled use cases, more AI and ML-enabled capabilities within our solution. One thing that we're working on is integrating graph capabilities into what's called our entity resolution, a.k.a. matching. The example that I gave before is ACME, ACME Inc., ACME Co., ACME LLC. We are working to build out some graph-based capabilities to augment and improve the algorithms that we've got, that are running to do what we call identity resolution, which is to know when something is unique or when it's not unique.
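One simple way to picture graph-based matching is to treat each record name as a node, draw an edge between two names that look like the same company, and call each connected group one entity. The Python sketch below does this with a union-find structure. The suffix list, the `normalize` and `resolve_entities` functions, and the `extra_links` parameter are hypothetical and are not a description of Profisee's actual algorithms.

```python
import re
from collections import defaultdict

# Assumed list of legal suffixes to ignore when comparing company names.
LEGAL_SUFFIXES = {"inc", "incorporated", "co", "company", "llc", "corp", "corporation"}


def normalize(name):
    """Lowercase a name and drop punctuation and legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def resolve_entities(names, extra_links=()):
    """Group names into entities: nodes are names, edges are matches."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Edge type 1: names that normalize to the same key.
    by_key = defaultdict(list)
    for n in names:
        by_key[normalize(n)].append(n)
    for group in by_key.values():
        for other in group[1:]:
            union(group[0], other)

    # Edge type 2: links from other evidence (e.g. a shared address or tax ID).
    for a, b in extra_links:
        union(a, b)

    clusters = defaultdict(set)
    for n in names:
        clusters[find(n)].add(n)
    return list(clusters.values())


names = ["ACME", "ACME Inc.", "ACME Co.", "ACME LLC", "Globex Corp"]
for cluster in resolve_entities(names):
    print(sorted(cluster))
```

The graph view is useful because matches are transitive: if A links to B and B links to C through different evidence, all three end up in one entity even though A and C were never compared directly.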

I suspect that we will continue to invest in AI, in ML, and other areas to try to automate more of those stewardship roles that we were talking about, to do things potentially — if a data steward, if a human makes a decision about a piece of data, that decision can be turned around and used to train some ML algorithms, so that decision doesn't have to be a human-intervened one in the future. Our software can learn from some of the decisions that users are making day in and day out. Those are some examples of some of our 365-day type roadmap, our shorter-term roadmap.

We're intrigued by blockchain, most certainly. For our core customers, though, given the size of company that we work with and the types of companies that we work with, I think we're really three to five years out before any sort of commercial mainstream applications of blockchain in the data management space. There are more and more and more blockchain-enabled solutions that are coming even in healthcare, for things like patient records management. Certainly, banking, in terms of international monetary transfers and foreign exchange and those types of things. So, blockchain is gaining speed. Data management, we tend to lag a little bit. That's okay, but we're certainly looking at it.

Erik: Yeah, great. Thanks. That sounds like a pragmatic roadmap. You have to focus on where you can generate value in the foreseeable future here. Great. Malcolm, I think we've covered a good bit of territory here. Thanks for your time. Any last thoughts that you'd like to share with the audience?

Malcolm: No, this has been a great conversation. I'd love to do it again. We do work in the IoT space a little bit. One thing that I find really fascinating is this schism, this break between IT and OT. Yes, IoT sensors can exist anywhere. They can exist in vending machines. They can exist just about anywhere. But they're heavily being used on manufacturing floors to optimize what's happening in the manufacturing process.

There's an interesting break between operational technology and information technology, where a lot of the data that exists on the manufacturing floor doesn't really tend to get beyond the manufacturing floor. I've had some interesting conversations recently with thought leaders in the OT world about IoT and about the data that that's generating, and what they're doing with it. I asked one of those thought leaders. I said, "Okay. You're watching the manufacturing floor. You know everything that's going on. These sensors are tracking literally everything." I use an example. I said, "Okay. Well, let's imagine that some of the sensors are sitting in a robot arm. You know everything about this robot arm because you have to, including who made, who manufactured the robot arm, the supplier of that robot arm. Obviously, a key supplier to your organization."

I asked somebody. I said, "How would you know, or would you even know if that supplier, the person who sold you that robot arm, was also your customer?" I didn't get a response back. The person thought about it and thought about it. Then he said, "We probably wouldn't know." That's the break between OT and IT. The world of IT, my world, is like CRM records. It's data in ERP. It's data in digital marketing systems. It's procurement systems. It's direct spends, it's indirect spends. It's all of that stuff. Knowing things like what we call balance of trade (meaning, you're selling to somebody, but then that somebody has also been selling to you), that's important. That is the total relationship that you have with ACME Incorporated. You're selling to them, but they're also selling to you.

In the OT world, you have this robot arm that is mission critical to your entire manufacturing process. But you don't know if the person, the company that sold it to you is also a customer. Fixing for that, that, to me, I think is interesting and relevant.
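The "is my supplier also my customer?" question becomes a simple join once both sides carry the same master identifier. The Python sketch below assumes sales and purchase records have already been resolved to a shared `master_id`. The function name, record shapes, and sample figures are hypothetical.

```python
from collections import defaultdict


def two_way_relationships(sales, purchases):
    """Find parties that appear on both sides of the ledger.

    sales:     iterable of (master_id, amount) for what we sold to a party
    purchases: iterable of (master_id, amount) for what we bought from a party
    """
    sold_to = defaultdict(float)
    for party, amount in sales:
        sold_to[party] += amount

    bought_from = defaultdict(float)
    for party, amount in purchases:
        bought_from[party] += amount

    # Only parties present in both sets have a two-way relationship.
    both = sold_to.keys() & bought_from.keys()
    return {
        party: {"sold_to": sold_to[party], "bought_from": bought_from[party]}
        for party in both
    }


# Illustrative figures only: "acme" supplies the robot arm and is also a customer.
sales = [("acme", 100.0), ("acme", 50.0), ("globex", 20.0)]
purchases = [("acme", 400.0), ("initech", 10.0)]
print(two_way_relationships(sales, purchases))
```

The join itself is trivial. The hard part described in the conversation is getting OT and IT systems to agree on one identifier for the same company in the first place.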

So, I guess, a parting thought here would be, if there's anybody listening to this who is more in the IoT world and maybe wrestling or challenged or been given a mandate to figure out how to bring these worlds together, so that maybe we bring the world of direct spend and indirect spend together, maybe we figure out how to bring together the world of vendors, the people that are on the indirect side, and suppliers on the direct side, and figure this all out and solve for OT and IT: if you're interested in this, I am. So, reach out to me on LinkedIn. It would be great to get into deeper conversations on this.

Erik: Yeah, awesome. Maybe that's a follow-up conversation we'll have to have at some point.

Malcolm: That'd be great.

Erik: I think IoT data, just looking at that use case, you could also say, well, the data coming off of our production line about your robotic arm is very valuable for your R&D team, if we decided to give it to you.

Malcolm: Yep, absolutely.

Erik: We probably won't decide to give it to you, but we could. We can anonymize the production data. We can give you the data that's less sensitive. There's a lot of value that's untapped there. Lots of areas to explore here on the IoT side. I would love to have a follow-up conversation with you, Malcolm. But really, thanks for the time today.

Malcolm: Thanks so much, Erik. I appreciate it.
