Data engineers get blamed a lot. Yet, their role is crucial if data scientists and analysts (the more visible parts of data science) want to perform.
Their role is simple: collect, manage, and convert raw data into usable information for data scientists and business analysts to analyze. Increasingly, they are also playing a crucial role in helping AI algorithms ingest vast swathes of data from diverse sources as they mature into models.
The job is also time-consuming. So when data scientists and business analysts don’t get the data they want, they point the finger at the data engineers.
The pressures of plumbing
TJ Chandler, managing director for APAC at Fivetran, sees data engineers as “plumbers.”
“They are the ones that deliver the water to the spout. Without them, many CDOs and other executives will find disappointment knowing that there’s this ocean of data out there but not a drop to drink,” he explains.
Therein lies the rub. Data engineers who manage data pipelines well go unseen, because data scientists and business analysts can do their work efficiently. But when there is a pipeline bottleneck or trouble connecting to different data sources, they are on the receiving end of scorn.
And this thankless job is only getting worse. With data science teams asking to connect to more data sources, data engineering teams are under pressure to deliver efficient pipeline management.
“We did a survey among analysts and found that 60% of them said that additional business insights could be extracted from existing data if they only had more time to get to it,” says Chandler.
But time is exactly what data engineers are unable to give.
“We know that setting up a dashboard in Looker or Tableau actually doesn't take too much time but only if you have the data available. And getting the data is not the problem either. The difficult part is getting the data from its source into that data warehouse so that it can be extracted to produce those dashboards for insights,” says Chandler, adding that this is where data engineers are always working against time.
Companies are also changing the way they approach data questions. “The old world was, ‘Hey, I’ve a problem, and so I'm going to get some software.’ Now, companies are looking at it very differently. They are saying, ‘I need this outcome, which solution or combination of solutions, including building it myself, is going to give me the highest probability of that outcome’,” says Chandler.
Data engineers also face pressure downstream as business leaders want more reports, faster, from their data science and business analyst teams. Many of these reports are no longer for strategic decision-making but for operational pivots and adjustments. That means pipeline delays can make such reports operationally useless.
“We also now have citizen consumers of data. And they all want reports and dashboards. Every knowledge worker and every staff person want to have access to timely, relevant, and accurate reports,” observes Chandler.
Data engineers need to find a better way to manage the data pipelines. Fivetran, which offers a solution for automatic data integration built on a fully-managed ELT architecture, thinks it has one.
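To make the ELT idea concrete, here is a minimal sketch in Python. It is not Fivetran's implementation: it uses the standard-library sqlite3 module as a stand-in for a data warehouse, and the sample records, table names, and the functions load_raw and transform_in_warehouse are all hypothetical. What it illustrates is the ordering that distinguishes ELT from ETL: raw records are loaded first, unchanged, and the transformation happens afterward inside the warehouse using SQL.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a source system.
RAW_ORDERS = [
    ("1001", "acme", "19.99"),
    ("1002", "acme", "5.01"),
    ("1003", "globex", "42.00"),
]


def load_raw(conn, rows):
    """Extract + Load: copy source rows into the warehouse unchanged (all text)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def transform_in_warehouse(conn):
    """Transform: build an analysis-ready table with SQL, inside the warehouse."""
    conn.execute("DROP TABLE IF EXISTS revenue_by_customer")
    conn.execute(
        """
        CREATE TABLE revenue_by_customer AS
        SELECT customer, ROUND(SUM(CAST(amount AS REAL)), 2) AS revenue
        FROM raw_orders
        GROUP BY customer
        """
    )
    conn.commit()
    return conn.execute(
        "SELECT customer, revenue FROM revenue_by_customer ORDER BY customer"
    ).fetchall()


# An in-memory SQLite database stands in for the warehouse (an assumption).
conn = sqlite3.connect(":memory:")
load_raw(conn, RAW_ORDERS)
result = transform_in_warehouse(conn)
print(result)
```

Because the raw table is kept as loaded, a new report in this sketch needs only another SQL transformation, not another extraction job.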
Using automation to scale pipelines
Fivetran uses automation to tackle what data engineers have had to do manually: create, manage, and maintain data pipelines.
It aims to compress development time and keep maintenance to “near zero” via automation. Meanwhile, its Powered by Fivetran managed service product offers a REST API, automated data connectors for more than 150 different data sources, and an easy “embeddable” setup.
The net effect is time reduction. “What used to take six months for consultants to build with hand-coding and API writing was shortened to maybe six weeks by modern tools that allow drag and drop and ETL configuration. But those still broke and took a lot of time to repair. So we've compressed that down to six minutes,” claims Chandler.
For a subscription fee, data engineers can authenticate sources and get the data flowing. And the entire process is managed by Fivetran’s team of engineers.
This will become important as companies emerge from the pandemic doldrums and look at merger and acquisition opportunities, which increase the number of data sources data engineers need to integrate.
It will also help with the increasing demand for pipeline integration of third-party data sources as Asia pivots toward an AI-driven economy. And when IoT data streams become mainstream, data engineers will be able to handle a number of data sources that traditional data engineering would baulk at.
“The old world would have data engineers saying, ‘Okay, look at all the sources we would like to have, prioritize them and then start integrating them one by one.’ Now, it is a rapid iteration cycle, where you don't have to choose and prioritize; just hook them up, start running them through, see what reports you get. So this rapid iteration cycle is something that is transforming their work,” says Chandler.
The dawn of the strategic data engineer
This level of automation is bound to reshape jobs, and Chandler admits that data engineering is no different. But he rejects the idea that it makes data engineers redundant, arguing instead that it brings back the strategic aspect of data engineering that is often overlooked.
“No. I think we’re transforming the nature of work. So, instead of hand-coding pipelines and writing APIs from scratch, they’re able to redirect their time to figure out what is the next data source they’re going to integrate,” explains Chandler.
Chandler also sees a shift in the balance of power for data engineers within DataOps teams through automation.
“They can have their relevance back. And they don't have to fight to be relevant just because stuff will break without them. Rather, they can be a strategic partner adding value,” he explains.
By then, we may not call them data engineers anymore.
Winston Thomas is the editor-in-chief of CDOTrends, DigitalWorkforceTrends and DataOpsTrends. He is always curious about all things digital, including new digital business models, the widening impact of AI/ML, unproven singularity theories, proven data science success stories, lurking cybersecurity dangers, and reimagining the digital experience.
Image credit: iStockphoto/Bulat Silvia