
Duolingo’s Lead Data Scientist on Their Top Best Practice

By Paul Mah
May 12, 2021

Language-learning platform Duolingo has more than 300 million users completing over seven billion language-learning exercises each month. How does its data science team leverage this enormous trove of data to enhance Duolingo's user experience and improve learning outcomes for students of indigenous languages?

In a recent interview with Built In, lead data scientist Erin Gustafson shed some light on Duolingo’s top best practice in applying data science to help users benefit from its 95 available courses in 38 languages.

Don’t neglect the project kickoff

Gustafson considers the project kickoff the number one best practice. The process involves drafting a kickoff document and then scheduling a meeting with key stakeholders to discuss the plan. During the documentation phase, data scientists work with their managers and team leads to define the project's goals, requirements, stakeholders, technical approach, and timeline, she says.

The aim is to force the team to think through the project's objectives early, confirm that the requisite data is available, and verify that the project will deliver adequate ROI. Contrary to what one might expect, the required data is not always available, in which case the project may need to be shelved or at least deferred.

The documentation phase is followed by the kickoff meeting itself, which brings together the data scientists and the cross-functional stakeholders identified in the earlier phase. Involving these stakeholders is vital because the success of a data science project ultimately depends on its impact on the product or business.

“The cross-functional nature of this meeting is really important because the success of a data science project is not solely determined by how well the technical approach is executed … [Putting] product managers, engineers, learning scientists and others in the meeting ensures that we’re asking the right questions and plan to answer them appropriately,” explained Gustafson.

An after-action review

One recent addition to the data science process at Duolingo is the after-action review. Although such reviews are typically used to reflect on lessons learned from past projects, Duolingo has incorporated them into the standard lifecycle of a data science project.

As with project kickoff meetings, an after-action review typically involves a similar cross-functional group, which determines what worked well, identifies outcomes that fell short, and spotlights areas for improvement. According to Gustafson, the project kickoff and the after-action review together have helped Duolingo's data science team work smarter.

“[We] recently wrapped work on an MVP [minimum viable product] model, reflected on the project as a team in an after-action review, and immediately applied those learnings in a kickoff doc for the next iteration on the model. These two processes in tandem have helped us work smarter.”

The importance of a well-defined goal

Perhaps the most important consideration is establishing a well-defined goal for the project. This entails understanding how a particular project or model will be used, which in turn is inextricably linked to the choice of technical approach.

“[Not] being aligned on goals could mean that the data scientist doesn’t understand the use case for the model they’re building. Success for a model looks different depending on whether you hope to draw strong inferences from your model versus generate accurate predictions,” concluded Gustafson.

Another data scientist recently highlighted the need to agree on outcomes and metrics. Speaking at Adobe Summit, Adobe's annual conference, last month, Akash Maharaj, a senior data scientist at Adobe, talked about the importance of optimizing machine learning models using the right metric. He noted that data science and business teams need to agree on objectives, and that identifying the right ones requires joint effort and data analysis.

Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose.

Image credit: iStockphoto/Sushiman
