In today’s economy, data is often seen as the lifeblood of any business. For Dun & Bradstreet, customer data is the entire business, and the ability to connect and correlate that data into a unified view of the customer is its competitive edge.
“By not having a connected view of your customers, suppliers, and partners, you risk wasting the company’s time, money, and possibly damaging market position and brand reputation,” said Dr. Anthony Scriffignano, senior vice president and chief data scientist, Dun & Bradstreet.
“Additionally, we are living in an era of increasing digital disruption from the confluence of big data, artificial intelligence, and changing regulations relating to the sourcing and use of data,” he added.
The Economic Toll of Disconnected Data
Dr. Scriffignano linked disconnected data to real business consequences. A recent survey noted that one quarter of responding businesses with over 500 employees are losing customers because of bad data. The number rises to 32% for companies with fewer than 500 employees.
One-fifth of businesses have also lost money by offering too much credit to a customer, with 17% acknowledging a severe impact on their reputations. Meanwhile, 16% are facing regulatory issues because of missing or incorrect information, while 12% have been fined for data-related matters.
At Dun & Bradstreet, Dr. Scriffignano decided to tackle the issue of disconnected data head-on.
“A vast portion of data is unstructured. It does not come with an ontology or data dictionary. So, the question relates to how and to what extent such information might be useful to augment decisions made with previously curated information,” he said.
The first step was to understand the current situation, the relevant regulations, and what corporate data was actually available.
“The next steps involved identifying empirically rigorous methodologies to conduct the test, involving business owners in a process very similar to what was outlined above,” said Dr. Scriffignano.
He noted that it was essential to capture learnings and identify bias in the results during this examination. "Finally, as conclusions were reached, we needed to understand the advisability of rolling the emerging capabilities into existing and new processes."
Spotting the Outliers
Another initiative was advanced anomaly detection.
Dun & Bradstreet curates multiple data sets to see relationships among disparate entities. Those relationships can be very helpful in understanding risks and opportunities, and in uncovering suspicious business activity.
“The question was whether we could combine these two observations, enabling a new automated capability for detecting potentially malfeasant behavior,” said Dr. Scriffignano.
So the company deployed advanced anomaly detection to look for outliers based on existing data relationships. “Outliers are assumed to have random variation with respect to some otherwise relevant relationship,” explained Dr. Scriffignano.
“We worked from the presumption that outlier behavior was not all random, rather that there were observable clusters of coalescent behavior (at a smaller scale), which could be tied to potential malfeasance. We have developed an entire suite of new capabilities relating to this investigation.”
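The idea in the quotes above, that outliers from a known relationship may cluster rather than scatter randomly, can be illustrated with a small sketch. This is not Dun & Bradstreet's method, which the article does not describe. It assumes a hypothetical relationship (revenue per employee), made-up records, a median/MAD robust z-score to flag outliers, and a simple gap-based grouping to check whether the flagged entities coalesce. All names, thresholds, and values are assumptions chosen for illustration.

```python
from statistics import median

# Hypothetical records: (entity, employees, reported revenue in $k).
RECORDS = [
    ("A", 10, 100), ("B", 20, 210), ("C", 30, 285), ("D", 40, 400),
    ("E", 50, 520), ("F", 25, 245), ("G", 15, 153),
    ("H", 10, 300), ("I", 20, 620), ("J", 30, 900), ("K", 40, 40),
]


def flag_outliers(records, threshold=3.5):
    """Flag entities whose revenue-per-employee ratio sits far from the median.

    Uses a robust z-score built from the median absolute deviation (MAD).
    The 3.5 threshold is an assumed, commonly used cutoff.
    """
    ratios = [(name, revenue / employees) for name, employees, revenue in records]
    center = median([r for _, r in ratios])
    mad = median([abs(r - center) for _, r in ratios])
    if mad == 0:
        return []  # no spread to measure deviations against
    return [
        (name, r)
        for name, r in ratios
        if 0.6745 * abs(r - center) / mad > threshold
    ]


def group_outliers(flagged, max_gap=2.0):
    """Chain flagged entities whose ratios lie within max_gap of a neighbour."""
    groups = []
    previous = None
    for name, r in sorted(flagged, key=lambda item: item[1]):
        if previous is None or r - previous > max_gap:
            groups.append([])  # gap too large: start a new group
        groups[-1].append(name)
        previous = r
    return groups


flagged = flag_outliers(RECORDS)
groups = group_outliers(flagged)
# Groups with more than one member are outliers that behave alike.
coalescent = [g for g in groups if len(g) > 1]
print(coalescent)
```

With this sample data, entities H, I, and J are flagged and fall into one group, while K is flagged but stands alone. A group of similar outliers is the kind of pattern that might merit a closer look, whereas an isolated outlier could simply be noise or a data-entry error.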
Addressing the Data Ownership Struggle
One critical challenge that many data scientists face is data ownership. Dr. Scriffignano called it data stewardship and noted that several frameworks exist for it.
“The most important thing is to choose an approach that best suits the organization,” he added.
Dun & Bradstreet took the first step toward data stewardship by making a real-world business case. "Taking a data-centric view to data ownership is often part of the tension. Show each department how strong data management affects its stakeholder outcomes and bottom lines," said Dr. Scriffignano.
Next, the company involved the departments in shaping the strategy and setting the metrics. “Prioritize what needs to be done first. Define success together and use the same vocabulary. This must be a collaborative process to be effective.”
There are multiple frameworks for data stewardship. “One best practice is to appoint a lead from each group to be involved in planning, testing, and reviewing implementation to the established expected business results of the specific targeted project or program. This must include your technology department. RACI (responsible, accountable, consulted, and informed) is an excellent model for this sort of collaboration to make sure that the appropriate parties are consulted and kept informed.”
Dr. Scriffignano noted that leads and data stewards can then act as the project champions for the group after the project goes live.
“Monitoring and maintenance must be considered upfront. Finally, track results and show ROI,” he added.
The Science of Convincing the User
Many companies see low adoption of data science tools, often amid complaints that the tools are either too complex or ill-suited to the work.
"Too many businesses start with the tools instead of the data. That's an issue. If the data is siloed, mismatched, and structured in various ways, you can't have a unified data framework that functions the way it should. This phenomenon is often at the heart of why tools and techniques may seem difficult to use," said Dr. Scriffignano.
Another reason data management tools often go unused is simple inertia. "Users don't look at tools as an objective; they need to get a job done. If that job is getting done on a spreadsheet and your objective is to migrate to a tool in order to get additional benefits such as data governance, process rigor, and increased insight cross-functionally, then you need to focus on those benefits."
"It's the outcome that matters. The tool is merely an expedient," Dr. Scriffignano added.