Polyglot Persistence: Watch Out for These Two Words in 2023
- By Winston Thomas
- December 20, 2022
Polyglot persistence sounds like an ethereal term (or one straight out of a Harry Potter movie). It’s also a concept that is altering how we approach databases and data management.
The explosion of data has seen a similar jump in database varieties. Today, we still use traditional row-based relational databases. But we also employ column databases, document databases (JSON or XML) and graph databases for graph-based analysis. Meanwhile, key-value databases offer the speed that today’s businesses desire.
The traditional approach was to choose the single best database product to fit the application's data requirements. This monoglot (or single-database) approach was a compromise: you select a database that excels at the critical tasks but may be a poor fit for the rest.
Modern microservices-based applications are architected differently. They also spew and ingest a large variety of data. This means you end up doing a lot of data conversions under a monoglot model, since your one database may not be designed to handle certain data types well.
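To make that conversion tax concrete, here is a minimal sketch that flattens a nested JSON document into relational rows, using Python’s built-in sqlite3 module. The document shape, table names and fields are all invented for illustration:

```python
import sqlite3

# A nested document a microservice might emit (shape invented for illustration).
order_doc = {
    "order_id": 1001,
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_name TEXT, customer_email TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")

# The conversion tax: manually flattening the document into relational rows.
conn.execute(
    "INSERT INTO orders VALUES (?, ?, ?)",
    (order_doc["order_id"], order_doc["customer"]["name"], order_doc["customer"]["email"]),
)
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order_doc["order_id"], item["sku"], item["qty"]) for item in order_doc["items"]],
)
conn.commit()
```

Every new document shape means more hand-written flattening code like this, which is exactly the overhead the monoglot model imposes.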
Using different databases can also be a nightmare, as you manually assign which data goes to which database. Your team’s knowledge then needs to span the entire universe of known databases, a hard skill set to find.
Today’s applications, like e-commerce systems, are built this way. They may use ElasticSearch for search results, let MongoDB store user information, use Redis to handle the cache, and let Google keep the financial transactions. A database engineer must carefully design the architecture so the system works optimally.
So what if you could assign each microservice of an application to the database best suited to it? That way, you can optimize data workload processing and management for speed and performance. This is the core promise of polyglot persistence.
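As a rough sketch of that per-service wiring, assuming locally running instances of each store, the official Python clients (a recent 8.x elasticsearch client, pymongo and redis), and purely illustrative index, collection and key names:

```python
import json

import redis                              # pip install redis
from elasticsearch import Elasticsearch   # pip install elasticsearch
from pymongo import MongoClient           # pip install pymongo

# Each microservice talks to the store best suited to its workload
# (assumes local instances of ElasticSearch, MongoDB and Redis are running).
search = Elasticsearch("http://localhost:9200")            # product search
users = MongoClient("mongodb://localhost:27017")["shop"]   # user profiles
cache = redis.Redis(host="localhost", port=6379)           # session cache

# Search service: full-text query against a hypothetical product index.
hits = search.search(index="products", query={"match": {"name": "running shoes"}})

# User service: flexible, schema-light profile documents.
users.profiles.insert_one({"user_id": 42, "name": "Ada", "tier": "gold"})

# Cache service: key-value entry with a one-hour TTL.
cache.setex("session:42", 3600, json.dumps({"cart": ["A-1", "B-7"]}))
```

Each client speaks its store’s native idiom, which is both the flexibility polyglot persistence promises and the operational sprawl discussed later in this piece.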
The way of the polyglot
The concept behind polyglot persistence is not new. It borrows from Neal Ford’s polyglot programming concept, in which different coding languages are used to create an application or system (polyglot simply means using multiple languages).
The most significant advantage of polyglot persistence is that each workload gets the data model that suits it. In the e-commerce example above, ElasticSearch can create recommendations based on the data, while MongoDB can’t. So, a polyglot persistence model “assigns” the right database for each data workload and outcome.
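For a taste of the kind of workload ElasticSearch handles natively, its built-in more_like_this query can power a crude “you may also like” feature, something a plain document store does not offer out of the box. A hedged sketch, with hypothetical index, field and document names:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# "Shoppers who viewed this also liked": text-similarity search over products.
similar = es.search(
    index="products",
    query={
        "more_like_this": {
            "fields": ["name", "description"],
            "like": [{"_index": "products", "_id": "A-1"}],  # hypothetical product ID
            "min_term_freq": 1,
            "min_doc_freq": 1,
        }
    },
)
for hit in similar["hits"]["hits"]:
    print(hit["_source"]["name"])
```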
For data teams, this automatic “assignment” means less data fragmentation across the different databases. Polyglot persistence also removes the need to work out which database fits which data workload before coding begins, saving the time and money lost when developers make the wrong assumption about a database. It also makes the architecture more scalable.
Lastly, polyglot persistence, in theory, uses the full feature set of each database, not just a subset. So, each data workload runs in an environment optimized for it.
Specifically for chief data officers, polyglot persistence offers other benefits: better data governance (because they’re increasingly tasked to do this), improved data mapping (that ensures proper data workflows) and data observability.
It’s not for everyone
Like all new concepts, polyglot persistence has its challenges. The model can be costly to roll out and does add complexity.
But the biggest and most obvious concern is talent. Most data teams have access to experts who are good with some databases, but not all. So, this makes it harder to know whether you are using your database portfolio optimally.
Testing can be another hurdle. When you shard data across databases, debugging is never straightforward. Getting the data architecture right is crucial and cannot be an afterthought.
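One way to see the pain: even a basic integration test has to assert consistency across stores that know nothing about each other. A pytest-style sketch, reusing the hypothetical wiring above (the search, users and cache fixtures are assumed to be defined elsewhere):

```python
# test_checkout.py: a cross-store consistency check, pytest style.
# The search, users and cache fixtures (ElasticSearch, MongoDB and Redis
# clients) are assumed to be defined in a conftest.py elsewhere.
import json

def test_order_is_consistent_across_stores(search, users, cache):
    user_id = 42

    # The profile must exist in MongoDB...
    assert users.profiles.find_one({"user_id": user_id}) is not None

    # ...the cached cart in Redis must parse...
    cart = json.loads(cache.get(f"session:{user_id}"))["cart"]

    # ...and every SKU in it must actually be indexed in ElasticSearch.
    for sku in cart:
        assert search.exists(index="products", id=sku)
```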
Security is another angle. According to Dave Russell, Veeam Software’s vice president for enterprise strategy, polyglot persistence will only become more pronounced, and it will significantly impact how companies protect their data assets. “Which is a long-winded way of probably saying things are going to get more complex from an availability perspective,” he explains, adding that his company is looking into rolling out support.
That said, the traditional monoglot model is not dead. It remains a great fit for straightforward data management scenarios where polyglot persistence feels like overkill.
Wait, there is another model
While there is a lot of noise about polyglot persistence, another model is waiting on the sidelines. This is the multimodal database.
Polyglot persistence assumes that every database is specialized for a type of data store or data workload. So you use this concept to “stitch” the different databases together.
What if the underlying database supports different data stores, types, and workloads? This is what multimodal databases propose, and it is the direction in which many formerly monoglot database providers are moving.
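One familiar illustration is PostgreSQL’s JSONB support, which lets a single engine straddle models by keeping relational columns and document-style queries side by side. A minimal sketch with the psycopg2 client; the connection details and table names are assumptions:

```python
import psycopg2  # pip install psycopg2-binary

# Connection details and table names are assumptions for illustration.
conn = psycopg2.connect("dbname=shop user=postgres")
cur = conn.cursor()

# One engine, two models: ordinary relational columns plus a JSONB document column.
cur.execute("""
    CREATE TABLE IF NOT EXISTS products (
        sku   TEXT PRIMARY KEY,
        price NUMERIC,
        attrs JSONB
    )
""")
cur.execute(
    "INSERT INTO products VALUES (%s, %s, %s) ON CONFLICT (sku) DO NOTHING",
    ("A-1", 89.99, '{"color": "red", "tags": ["running", "sale"]}'),
)

# A document-style containment query sits next to a plain SQL filter.
cur.execute(
    "SELECT sku FROM products WHERE attrs @> %s::jsonb AND price < 100",
    ('{"color": "red"}',),
)
print(cur.fetchall())
conn.commit()
```

Here there is one connection string, one licence and one engine to test, which is the operational simplicity argument in miniature.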
The most apparent advantage is operational simplicity. You don’t have to manage and test different databases. A single data store also makes it easier for in-house talent to upskill. They just need to take the advanced courses (and hopefully get a pay increment).
Then you have the licensing fees. With polyglot persistence, you’re at the mercy of different licensing schemes. With multimodal databases, you have one or a few.
So why is it not replacing polyglot persistence? Well, first, it’s relatively new. There are few experts in multimodal databases out there, but many who are experts in individual databases.
There is also a performance question. Since optimization for multimodal databases occurs at a higher database layer, it can impact processing speed.
For the moment, polyglot persistence is here to stay. Data teams should also familiarize themselves with the development as multimodal databases mature in the background.
Whatever the future holds, polyglot persistence has re-ignited the database wars. And it will be a concept to watch in the coming years as companies look to fine-tune their applications component by component.
Winston Thomas is the editor-in-chief of CDOTrends and DigitalWorkforceTrends. He’s a singularity believer, a blockchain enthusiast, and believes we already live in a metaverse. You can reach him at [email protected].