Meta AI Giving Away Its New Large Language Model

AI researchers at Meta have created a massive new language model to rival OpenAI’s GPT-3 and advance our understanding of large language models. And Meta is giving it away as part of its effort to democratize AI.

Open Pretrained Transformer (OPT-175B) is a language model with 175 billion parameters trained on publicly available data sets. According to Meta, 992 Nvidia A100 GPUs, each with 80GB of onboard memory, were used over a training period of two months.

(For the record, Meta is currently building a new AI Research SuperCluster with 6,080 A100 GPUs.)

Democratizing AI

To facilitate “community engagement”, the release includes the pretrained model, extensive notes about its development, a logbook detailing the training process, and the code needed to train and use the model.

“While in some cases the public can interact with these models through paid APIs, full research access is still limited to only a few highly resourced labs,” explained Meta AI’s researchers in a blog post.

“This restricted access has limited researchers' ability to understand how and why these large language models work, hindering progress on efforts to improve their robustness and mitigate known issues such as bias and toxicity.”

According to Meta, the limitations and risks posed by large language models are still not well understood. But without direct access to these models, researchers are limited in their ability to design detection and mitigation strategies to curb abuse.

On the other hand, organizations can build powerful AI applications and run them on a far smaller budget if they are freed from the need to train language models from scratch.

“The first thing I expect [researchers] to do is to use it to build other types of language-based systems, whether it’s machine translation, a chatbot, something that completes text – all of these require this kind of state-of-the-art language model,” said Joelle Pineau, director of Meta AI Research Labs to IEEE Spectrum.

A case of two steps forward, one step back?

However, detractors point out that the largesse is not as generous as it appears to be. They argue that Meta is restricting access to OPT-175B to a select audience through a vetting process that limits it to academic researchers, people affiliated with the government, academic organizations, and industry research laboratories.

Crucially, the fact that OPT-175B is released under a noncommercial license also puts its use in a commercial setting in doubt.

On the other hand, there is no question that OPT-175B can go a long way toward helping researchers understand how large language models work, and it offers an opportunity to develop techniques to mitigate known issues such as bias and toxicity.

Large language models are known to veer towards toxic language. OpenAI reportedly relies on human contractors to manually clean up responses from GPT-3, though it isn’t clear whether this happened only at the beginning or is an ongoing effort.

GPT-3 is certainly being fine-tuned, though, with users documenting improved responses where earlier attempts had produced nonsensical output.

For its part, the white paper published by Meta AI does outline evaluations of OPT-175B in areas such as bias and toxicity, hate speech detection, and “safety” in dialogue.

What’s next?

Meta claims that training OPT-175B produced just one-seventh the carbon footprint of training GPT-3.

Meta acknowledges that this is still significant, however: “While this is a significant achievement, the energy cost of creating such a model is still nontrivial… there exists significant compute and carbon [costs] to reproduce models of this size. OPT-175B was developed with an estimated carbon emissions footprint (CO2eq) of 75 tons.”
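Taken together, those two figures imply a back-of-the-envelope estimate for GPT-3’s training footprint (this calculation is mine, not from Meta’s white paper):

```python
# Meta reports ~75 tons CO2eq for OPT-175B, described as
# one-seventh the footprint of training GPT-3.
opt_footprint_tons = 75

# Implied GPT-3 training footprint under that claim.
implied_gpt3_footprint_tons = opt_footprint_tons * 7
print(implied_gpt3_footprint_tons)  # 525
```

In other words, the claim implies GPT-3’s training emitted on the order of 500+ tons CO2eq.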

In general, AI researchers have praised the release of OPT-175B. In particular, the release of the project logbook was touted as “unprecedented”.

Of course, the hefty compute resources required do mean that many research labs might not have the infrastructure to run OPT-175B properly, though the release of smaller versions will alleviate this issue.

“We hope that OPT-175B will bring more voices to the frontier of large language model creation, help the community collectively design responsible release strategies, and add an unprecedented level of transparency and openness to the development of large language models in the field," concluded the Meta authors in the white paper. 

Researchers can request access to OPT-175B here; the related white paper can be found here (pdf).

Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose. You can reach him at [email protected].

Image credit: iStockphoto/g-stockstudio