The How and Why of Remote-Ready Data Science
- By DSAITrends editors
- December 02, 2020
With the ongoing pandemic, the days of casual discussions around the office will not be coming back any time soon. Data scientists forced to work from home face issues around data access, reduced collaboration, and inadequate infrastructure, says Florian Douetteau, the CEO of data science platform firm Dataiku.
In a contributed piece, Douetteau outlined several suggestions that he says will help make remote data science work. Of course, he would have you turn to his firm’s platform for end-to-end data science, though the suggestions themselves are worth thinking about.
Data science teams tend to work on their projects in an ad-hoc fashion, which typically means work is stored locally. With things figured out along the way, this often culminates in processes and workflows that are not easily reproducible.
On the other hand, a central location encourages good data governance and collaboration practices, argues Douetteau. The result: Employees from across the organization can easily work together on data science and machine learning projects.
Single point of access
Successful remote data science calls for a single point of access to data, without the need to move the data for processing. This is easier said than done, however, considering the disparate silos where data is typically kept, which can range from Hadoop clusters and managed cloud databases to object storage and traditional DBMS.
With a central point of access, data scientists can stop at a single location to find what they need and get instant access to crucial information such as the format and schema of the data. The result is consistent and secure access to data regardless of where employees are working from.
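To make the idea concrete, here is a minimal sketch of what a single point of access could look like as a metadata catalog. It is an illustration only, not how Dataiku or any particular platform works: the `DatasetEntry` and `DataCatalog` names, the backend labels, and the example locations are all hypothetical. The catalog records where each dataset lives, along with its format and schema, so the data itself never has to be moved.

```python
from dataclasses import dataclass


@dataclass
class DatasetEntry:
    """Metadata for one dataset; the data itself stays in its silo."""
    name: str
    backend: str       # hypothetical label, e.g. "dbms" or "object_storage"
    location: str      # hypothetical URI understood by that backend
    data_format: str   # e.g. "table", "parquet", "csv"
    schema: dict       # column name -> type name


class DataCatalog:
    """A single place to look up where data lives and what it looks like."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Refuse duplicates so every dataset name has exactly one meaning.
        if entry.name in self._entries:
            raise ValueError(f"dataset already registered: {entry.name}")
        self._entries[entry.name] = entry

    def describe(self, name):
        # Return format, schema and location without touching the data.
        return self._entries[name]

    def find_by_column(self, column):
        # List every dataset, in any silo, that exposes the given column.
        return sorted(
            name for name, entry in self._entries.items()
            if column in entry.schema
        )


# Example usage with made-up datasets held in two different silos.
catalog = DataCatalog()
catalog.register(DatasetEntry(
    name="sales_orders",
    backend="dbms",
    location="postgres://example-host/sales/orders",
    data_format="table",
    schema={"order_id": "int", "customer_id": "int", "amount": "float"},
))
catalog.register(DatasetEntry(
    name="web_clicks",
    backend="object_storage",
    location="s3://example-bucket/clicks/",
    data_format="parquet",
    schema={"customer_id": "int", "url": "str"},
))

print(catalog.find_by_column("customer_id"))
print(catalog.describe("web_clicks").data_format)
```

A real deployment would add access control and connectors for each backend on top of a lookup like this; the sketch covers only the discovery step the paragraph above describes.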
While data science used to be the domain of a select few, that is increasingly no longer the case. Explicit knowledge of the business plays a vital role in framing the right questions, which means giving access to domain experts such as accountants or sales directors is ever more important.
Compared to highly trained data scientists, citizen data scientists will probably benefit more from easy access and communication with their peers, though there is no reason the former will not benefit from collaboration, too.
There are two aspects to collaboration, according to Douetteau: horizontal collaboration among people with similar skills and training, and vertical collaboration across teams with different roles and responsibilities. A data science deployment geared towards remote work should be capable of facilitating collaboration in both scenarios.
A remote-ready data science deployment will stay relevant even after the pandemic. For one, it can give organizations access to talent they might not otherwise retain in an increasingly competitive field. Moreover, businesses can avoid untimely interruptions to data science and AI projects should the office become unavailable.
Finally, and probably most importantly, it will go a long way towards ensuring that all aspects of key data science projects are found in one place – and can be properly backed up.
Image credit: iStockphoto/Chaay_Tee