Supporting Data Science Through Data Management

As organizations wake up to the value of data-driven decisions, many are rethinking how they need to manage their data, says Angel Viña, the CEO and founder of Denodo, who notes that data management can benefit data science initiatives, too.

While data management is hardly a new topic, Viña believes that data virtualization is a game-changer in today’s digital, data-centric world.

A better way to do data

Speaking to CDOTrends, Viña explained: “Companies traditionally manage their data by deploying additional data silos or repositories to store their data; Denodo leverages data virtualization to support the retrieval and manipulation of data. The latter works regardless of the technical details of the data – such as how it is formatted and its physical location.”

“Data management used to be about integrating and centralizing existing data, moving it from existing systems into a new location, such as a new data store, data warehouse, or cloud data store. This is not the case with data virtualization. Data is ingested and placed within data stores. The data is then made available through their metadata.”

Traditional controls required for data security and governance can be implemented with data virtualization, and the data can be exposed in whatever format the application or user requires. By simplifying data access and unifying the data structure, enterprises gain greater agility as well as faster, easier access to data, he says.
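The general pattern described above, where data stays in its original stores and is reached through metadata, can be illustrated with a minimal Python sketch. It assumes two toy sources (an in-memory SQLite table and a CSV export) and a hypothetical `VirtualLayer` class whose catalog holds only metadata: view names, exposed columns, and a reader function for each source. This is not Denodo’s product or API; `VirtualLayer`, `register`, and `query` are illustrative names chosen for this example.

```python
import csv
import io
import json
import sqlite3
from typing import Callable, Dict, Iterable, List, Mapping, Tuple

Row = Mapping[str, object]
Reader = Callable[[], Iterable[Row]]


class VirtualLayer:
    """Hypothetical virtualization layer: it stores metadata about views, never the data itself."""

    def __init__(self) -> None:
        # Catalog: view name -> (exposed columns, function that reads the live source)
        self._catalog: Dict[str, Tuple[List[str], Reader]] = {}

    def register(self, name: str, columns: List[str], reader: Reader) -> None:
        self._catalog[name] = (columns, reader)

    def query(self, name: str, fmt: str = "rows"):
        columns, reader = self._catalog[name]
        # Rows are pulled from the source at query time, not copied in advance.
        rows = [{c: r[c] for c in columns} for r in reader()]
        if fmt == "json":
            return json.dumps(rows)
        if fmt == "csv":
            buf = io.StringIO()
            writer = csv.DictWriter(buf, fieldnames=columns)
            writer.writeheader()
            writer.writerows(rows)
            return buf.getvalue()
        return rows  # default: a list of plain dicts


# Source 1: a relational database (an in-memory SQLite stand-in).
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # rows support lookup by column name
db.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?, ?)",
               [(1, "Acme", "APAC"), (2, "Globex", "EMEA")])

# Source 2: a CSV export from another system.
ORDERS_CSV = "order_id,customer_id,amount\n100,1,250.0\n101,2,75.5\n"


def read_customers() -> Iterable[Row]:
    return db.execute("SELECT id, name, region FROM customers")


def read_customer_orders() -> Iterable[Row]:
    # Combine both sources on the fly; neither is moved into a new store.
    names = {r["id"]: r["name"] for r in read_customers()}
    for o in csv.DictReader(io.StringIO(ORDERS_CSV)):
        yield {"order_id": int(o["order_id"]),
               "customer": names[int(o["customer_id"])],
               "amount": float(o["amount"])}


layer = VirtualLayer()
layer.register("customers", ["id", "name", "region"], read_customers)
layer.register("customer_orders", ["order_id", "customer", "amount"], read_customer_orders)

print(layer.query("customer_orders", fmt="json"))

# A row written to the source afterwards is visible at once, because nothing was copied.
db.execute("INSERT INTO customers VALUES (3, 'Initech', 'AMER')")
print(layer.query("customers", fmt="csv"))
```

Because the catalog stores readers rather than copies, the customer added to SQLite after registration appears in the next query without any reload step, and each consumer can ask for rows, JSON, or CSV from the same view. A production platform would also have to handle concerns such as query optimization and access control, which this sketch leaves out.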

“Data management today isn’t about integrating more data silos. It is not about moving to a new database anymore. Data management today is about the metadata, it is about supporting the functions that you lack from the data management layer. Governance, data security, ease of delivery, strong integration capabilities – all those can be provided through metadata with a new data management system.”

Data for maximum returns

Viña recommends that organizations adopt an incremental approach to implementing data virtualization. As employees grow familiar with reaching data through a new, single point of access, they will start to use it consistently. The result is higher usage across the entire organization, he says.

“Start with a small number of data sources, and then incrementally add more sources with the help of your internal advisers, or by using external consultants. As you set up the first system and it works stably, you continue adding more and more [data sources] until you have incorporated all the data in your organization,” he said.

Of course, the onus is on organizations to implement the relevant controls to ensure that employees only access data relevant to their roles.
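Role-based controls of this kind can be enforced inside the virtual layer itself, so every request passes through the same checks. The minimal Python sketch below assumes a hypothetical policy table in which each role is granted specific columns and, optionally, specific rows of a view, and any role without an explicit grant is denied. `Grant`, `POLICY`, and `query_as` are illustrative names, not a Denodo feature, and the in-memory customer list stands in for a live source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

Row = Dict[str, object]


@dataclass
class Grant:
    columns: Set[str]                                   # columns the role may see
    row_filter: Optional[Callable[[Row], bool]] = None  # rows the role may see (None = all rows)


# Hypothetical role-to-view policy; in practice it would live in the layer's metadata.
POLICY: Dict[str, Dict[str, Grant]] = {
    "finance": {"customers": Grant({"id", "name", "region", "credit_limit"})},
    "apac_analyst": {"customers": Grant({"id", "region"}, lambda r: r["region"] == "APAC")},
}


def read_customers() -> List[Row]:
    # Stand-in for a live source behind the virtual view "customers".
    return [
        {"id": 1, "name": "Acme", "region": "APAC", "credit_limit": 50000},
        {"id": 2, "name": "Globex", "region": "EMEA", "credit_limit": 20000},
    ]


VIEWS: Dict[str, Callable[[], List[Row]]] = {"customers": read_customers}


def query_as(role: str, view: str) -> List[Row]:
    grant = POLICY.get(role, {}).get(view)
    if grant is None:
        # Deny by default: no explicit grant means no access.
        raise PermissionError(f"role {role!r} has no access to view {view!r}")
    rows = VIEWS[view]()
    if grant.row_filter is not None:
        rows = [r for r in rows if grant.row_filter(r)]
    # Project each remaining row onto the permitted columns only.
    return [{k: v for k, v in r.items() if k in grant.columns} for r in rows]


print(query_as("apac_analyst", "customers"))
```

Here the `apac_analyst` role sees only the APAC customer, and only its `id` and `region`, while a role with no entry, such as `intern`, gets a `PermissionError`. Denying by default means a forgotten policy entry blocks access rather than silently exposing data.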

The way Viña sees it, the purpose of modern data management is to simplify data access for employees, removing the need to involve data experts or the IT department. With layers of unnecessary bureaucracy out of the way, data access can happen significantly faster, while the lower barriers help drive a self-service culture for data discovery.

Enabling data science

So how can organizations enable their data storage to better support data science applications? Viña suggests that the priority for businesses is to consider how they can create a self-service culture when it comes to data. This includes setting up a suitable data management system that offers data virtualization.

“[Data scientists need] access to the data sources with the [management] interface to do profiling, as well as the interface to do modeling. When you have that, you can start with your data science job,” said Viña.

“Self-service is no longer limited to storage. Storage is one aspect of the equation, but what data scientists want is the ability to do data discovery: to be able to easily access data within the organization for analysis and to make new findings.”

“What we provide is a platform that offers various data preparation processes to help your data scientists find insights in your data. We are unique in our ability to easily move data to your data science platform with access to the original data sets,” Viña summed up.

Photo credit: iStockphoto/metamorworks