Singapore Budgets USD52M to Train LLM
- By Paul Mah
- December 20, 2023
Singapore has committed SGD70 million (USD52 million) to build the region’s first large language model (LLM) to incorporate the diverse cultures and languages of Southeast Asia.
Dubbed the National Multimodal LLM Programme (NMLP), the initiative will see IMDA partner with AI Singapore and A*STAR to develop the model over two years. The resulting model will have between 30 billion and 50 billion parameters and incorporate both speech and text.
Will build on existing SEA-LION model
It will be built on AI Singapore's Southeast Asian Languages in One Network (SEA-LION) model, an open-source LLM trained on 11 languages in the region. The work will extend SEA-LION into a multimodal speech-text model.
The NMLP has several goals: building skilled AI talent in Singapore, deepening the understanding of how LLMs work, establishing a trusted environment for the use of AI at a time when LLMs are trained almost exclusively by big tech firms in the West and China, and fostering an AI industry that develops LLM-enabled solutions.
“This national effort underscores Singapore's commitment to become a global AI hub. Language is an essential enabler for collaboration,” said Ong Chen Hui, assistant chief executive of the Biztech Group at IMDA.
“By investing in talent and investing in large language AI models for regional languages, we want to foster industry collaboration across borders and drive the next wave of AI innovation in Southeast Asia,” she said.
In a LinkedIn comment, Laurence Liew, the director of AI Singapore, noted that the majority (75%) of the engineers building SEA-LION came through the AI Apprenticeship Programme.
These engineers are all Singaporeans and were mostly self-taught before joining the AIAP to deepen their knowledge and gain production-ready coding and deployment skills, he noted. Alluding to the importance of investing in local AI specialists, he said: "Singapore has plenty of talent."
That said, training LLMs is a costly endeavor, and USD52 million is likely at the lower end of what is required to train even one model.
Google's just-launched Gemini AI model, for instance, is really three different models of different sizes. Still, a localized LLM is a good thing and will pave the way for more inclusive and relevant AI solutions that benefit a wider range of users across the region.
Image credit: iStockphoto/lena_serditova
Paul Mah
Paul Mah is the editor of DSAITrends, where he reports on the latest developments in data science and AI. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose.