The advent of large open-source language models was a major milestone in the development of artificial intelligence. Many promising projects have been built on this technology, making AI much more accessible to everyday users. One of the most striking examples is Code Llama, a code-generation model built on Llama 2. In this article, we explain what a large language model is and introduce the five most in-demand open LLMs of 2024.
What is an Open Source LLM
A large language model (LLM) is a generative AI model designed to work with text. It can interpret and analyze text, paraphrase it, and translate between multiple languages. It can also generate text from scratch on different topics, at different lengths, and in different styles based on user requests (prompts). LLMs learn from training datasets that contain vast amounts of text from the internet, including articles, books, posts, and websites.
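To make the idea of generating text from a prompt more concrete, here is a toy sketch in Python. It is not how any of the models in this article are implemented: real LLMs are neural networks with billions of parameters, while this example only counts which word follows which in a tiny made-up corpus and then extends a prompt with the most frequent next word. The function names `train_bigram` and `generate` and the corpus are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the corpus."""
    words = corpus.lower().split()
    followers: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def generate(followers: dict[str, Counter], prompt: str, max_new_words: int = 5) -> str:
    """Extend the prompt by repeatedly appending the most frequent next word."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_new_words):
        candidates = followers.get(words[-1])
        if not candidates:
            break  # the last word was never followed by anything in the corpus
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical training text, chosen only to keep the example small.
corpus = "open models are free to use and open models are free to modify"
model = train_bigram(corpus)
print(generate(model, "open", max_new_words=4))
```

With this corpus, the prompt "open" is extended to "open models are free to". An actual LLM follows the same loop of predicting the next token and appending it, but it predicts from a learned neural network rather than from a lookup table.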
Currently, there are two types of large language models: proprietary (closed and controlled by the owner) and open source. Among proprietary LLMs, the most famous is the GPT family from OpenAI, which powers the well-known chatbot ChatGPT. The creators of open language models let everyone use their products for free and modify and customize them, subject to the terms of each model's license.
Benefits of Open-Source LLMs
Open-source LLMs bring tangible benefits to individuals, businesses, and non-profit organizations. Their main advantages include:
- Customization. Open source code makes it easy to customize and adapt these models to the requirements and specifics of a particular industry, company, or project.
- Confidentiality. By running LLMs on internal infrastructure, users gain full control over their data.
- Cost savings. Open-source language models do not require licensing fees, allowing for significant cost savings. This makes AI and ML technologies accessible to small businesses, startups, and individuals.
- Transparency. Anyone can study an open LLM's code and weights to evaluate its parameters and capabilities. This makes such models easier to audit for reliability and security than proprietary technologies.
- Innovation. The freedom to modify large language models drives rapid innovation in the industry. Users can continuously improve publicly available LLMs and create new projects based on them.
- Independence. Open language models eliminate users' dependence on software vendors, giving them greater freedom of action.
Large open-source language models are actively used for generating, editing, and translating text. They are also often used to build smart chatbots and to conduct research.
Llama 2
Now that you know what a large language model is, let's look at the best-known and most popular systems of this type, starting with Llama 2. Meta released it in July 2023, and it quickly earned a reputation as one of the best open-source large language models. Several other Meta products are based on this free, public LLM, including the Llama 2-Chat dialogue model and the Code Llama code model.
Key features:
- Llama 2 generates, checks, and edits text, and its Code Llama variant does the same for code: it generates and debugs code and writes explanations for it. It can process prompts written as code or in natural language.
- There are three versions of this LLM: with 7 billion (7B), 13 billion (13B), and 70 billion (70B) parameters.
- Llama 2 was trained using billions of web pages, millions of user searches, Wikipedia articles, and Project Gutenberg books.
- Code Llama works with a number of popular programming languages: Python, C++, Java, PHP, TypeScript (JavaScript), C#, and Bash.
- The model was trained on Meta's Research Super Cluster and a number of internal clusters with Nvidia A100 GPUs. Training took from 184K GPU-hours for the 7B version to 1.7M GPU-hours for the 70B version.
- On most benchmarks, Llama 2 performs comparably to the proprietary GPT-3.5 and PaLM models but lags behind GPT-4 and PaLM 2.
Meta has made its open-source LLM freely available for personal, commercial, and research use. Anyone can download Llama 2 from the official website (the “lightest” version, 7B, takes up approximately 13 GB of space), study the documentation, and run it on their own computer.
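The download size quoted above can be sanity-checked with simple arithmetic. The sketch below assumes the weights are stored at 16-bit precision (2 bytes per parameter) and uses the nominal parameter counts from the version names rather than the exact counts; both are assumptions, and `weights_size_gib` is a helper name made up for this example.

```python
def weights_size_gib(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate size of the raw model weights in GiB (2**30 bytes).

    Assumes every parameter takes `bytes_per_param` bytes (2 for 16-bit floats)
    and ignores tokenizer files and other metadata.
    """
    return num_params * bytes_per_param / 2**30


# Nominal parameter counts taken from the version names (an approximation).
for name, params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    print(f"Llama 2 {name}: ~{weights_size_gib(params):.0f} GiB at 16-bit precision")
```

For the 7B version this gives about 13 GiB, in line with the figure above. The same estimate shows why the 70B version, at roughly ten times that size, does not fit on a typical consumer machine without lower-precision storage.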
Mistral
Mistral is one of the newest LLM families, developed by the French company Mistral AI, which released its largest model, Mistral Large, in February 2024. The company was founded by former employees of Google DeepMind and Meta. Microsoft partnered with the company on the launch: the most powerful version, Mistral Large, is available to users of the Microsoft Azure cloud computing service.
Key features:
- Mistral is only partially open source. The developers have published the model weights, the numerical parameters that determine its behavior. However, the training data and details of the training process remain closed.
- The company has introduced a multilingual chatbot, Le Chat, to let users try out Mistral AI's technology. A beta version of the bot is currently available for free and offers three models: Mistral Small, Mistral Large, and Mistral Next.
- The developers claim that their Mistral 7B model outperformed Llama 2 13B in all the tests they ran. They also claim that Mistral Large ranks second among the best LLMs worldwide, behind only GPT-4.
- The top version, Mistral Large, can handle highly complex tasks. It can analyze, edit, and generate text in English, German, French, Italian, and Spanish. It also has solid programming skills.
Mistral integrates easily into websites and search engines. It is well suited to building real-time applications and AI assistants. Using it as a foundation, anyone can create and release their own models.
BLOOM
BLOOM (full name: BigScience Large Open-science Open-access Multilingual Language Model) is deservedly called one of the best open-source LLMs. It was released in the summer of 2022 by the BigScience project in collaboration with Hugging Face and the French National Center for Scientific Research. The model was trained on a supercomputer powered mostly by nuclear energy.
Key features:
- BLOOM has a transformer architecture and 176 billion parameters. It was trained on about 1.5 terabytes of text, or roughly 350 billion tokens.
- The training material for the language model was the ROOTS dataset, which included data from 100+ sources in 59 languages: 46 natural languages and 13 programming languages.
- The model has proven itself in writing, translating, and editing texts of varying length and content. It copes equally well with other NLP tasks.
- The system effectively automates a number of programming activities, particularly code generation and debugging.
- Broad possibilities for linguistic analysis and AI research have earned BLOOM recognition in the scientific community. The developers say that for many of its languages, including Spanish and Arabic, BLOOM was the first language model with more than 100 billion parameters.
- BigScience's large language model slightly exceeds OpenAI's GPT-3 in the number of parameters: 176B versus 175B.
BLOOM is freely available on the Hugging Face website. Users can pick any of the languages the model supports and send it a prompt describing the task they want performed.
MPT-7B
MPT-7B is one of the best open-source LLMs. It was developed by MosaicML as part of its Foundation Series and released in May 2023. MPT stands for MosaicML Pretrained Transformer. The model has a transformer architecture and was trained on 1 trillion tokens of text and program code.
In July 2023, MosaicML was acquired by Databricks. Upon completion of the transaction, it, along with all of its language models, became part of the Databricks Lakehouse platform. In the same month, a new LLM was released – MPT-7B-8K. It differs from its predecessor in its increased context length, which allows it to better summarize documents and answer questions.
Key features:
- The open-source model is available for commercial use. Its quality is comparable to that of LLaMA-7B.
- The model was trained in about 10 days on the MosaicML platform without human intervention, at a total cost of roughly $200,000.
- Users can train, customize, and run their own language models based on this project, either from scratch or using special checkpoints.
- In addition to the base MPT-7B model, the developers have released three fine-tuned versions: MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+.
- The base MPT-7B model has 6.7B parameters. It uses FlashAttention for fast training and inference, and ALiBi, which lets it be fine-tuned for and extrapolate to longer context lengths.
- MPT-7B-Instruct is a model for following short instructions. It was fine-tuned on a MosaicML dataset derived from the Databricks Dolly-15k and Anthropic Helpful and Harmless datasets.
- MPT-7B-Chat is a chatbot-style model for generating dialogue. It was created by fine-tuning MPT-7B on the ShareGPT-Vicuna, HC3, Alpaca, Helpful and Harmless, and Evol-Instruct datasets.
- MPT-7B-StoryWriter-65k+ is designed to read and generate literary texts with very long contexts. It was created by fine-tuning the base MPT-7B on a dataset of fiction texts with a context length of 65 thousand tokens.
You can find the base and fine-tuned versions of MPT-7B on the Hugging Face website, where users can try them out or download them to run locally.
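The ALiBi technique mentioned in the list above replaces positional embeddings with a simple penalty on attention scores: the farther a key token is from the query token, the more its score is reduced, with a different slope for each attention head. The following is a minimal pure-Python sketch of that idea as described in the ALiBi paper, not MosaicML's actual implementation. The function names are made up, and the slope formula assumes the number of heads is a power of two.

```python
def alibi_slopes(num_heads: int) -> list[float]:
    """Head-specific slopes: a geometric sequence starting at 2**(-8/num_heads).

    Assumes num_heads is a power of two, which is the simple case.
    """
    ratio = 2 ** (-8 / num_heads)
    return [ratio ** (h + 1) for h in range(num_heads)]


def alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    """Bias added to one head's attention scores before the softmax.

    Entry [i][j] is slope * (j - i) for keys j at or before query i, so the
    penalty grows linearly with distance. Future positions (j > i) are left
    at 0.0 here; in a real model the causal mask would exclude them.
    """
    return [
        [slope * (j - i) if j <= i else 0.0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


slopes = alibi_slopes(8)           # 8 heads chosen only for illustration
bias = alibi_bias(4, slopes[0])    # bias matrix for the first head
for row in bias:
    print(row)
```

Because the bias depends only on the distance between tokens and not on their absolute positions, the same rule applies to sequences longer than those seen in training, which is what makes the long context of a model like MPT-7B-StoryWriter-65k+ practical.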
Falcon
The Falcon model was released by the Technology Innovation Institute (TII), an applied research center that is part of the Abu Dhabi Government's Advanced Technology Research Council. The open-source large language model was presented in June 2023. It is currently available in four versions: 180B, 40B, 7.5B, and 1.3B. They vary in scale and capability, with 1.3 to 180 billion parameters.
Key features:
- The model was trained on Amazon Web Services resources over the course of about two months. Up to 4096 GPUs were used simultaneously, and total training time reached 7,000,000 GPU-hours.
- The top version, Falcon 180B, was released in September 2023. At the time, it became the world's largest open-source LLM.
- The model was trained on 3.5 trillion tokens, drawn mostly from TII's RefinedWeb dataset.
- Falcon 180B has about 2.5 times more parameters than the largest Llama 2 model from Meta. It outperforms GPT-3.5 from OpenAI and is roughly on par with Google's PaLM 2, despite reportedly being about half its size.
- Running the 180B version for inference takes about 320 GB of memory. Full fine-tuning requires at least 5120 GB of VRAM.
The model is available on the Hugging Face Hub (in base and chat versions). Those interested can also test its capabilities in the Falcon Chat Demo Space.
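The training figures in the list above can be cross-checked with a quick calculation. The sketch below converts total GPU-hours into wall-clock time, assuming all 4096 GPUs were busy for the whole run. That is an idealization, since the source says "up to" 4096 GPUs, so the result is a lower bound. The helper name `wall_clock_days` is made up for this example.

```python
def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Lower bound on training duration in days.

    Assumes every GPU is in use for the entire run; with fewer GPUs active
    at times, the real duration would be longer.
    """
    return gpu_hours / num_gpus / 24


days = wall_clock_days(gpu_hours=7_000_000, num_gpus=4096)
print(f"At least {days:.0f} days of training")
```

Under these assumptions the run takes about 71 days, a little over two months, which is broadly consistent with the duration quoted above.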
Final thoughts
Free LLMs give businesses, non-profit organizations, and individual users abundant opportunities to use the latest AI and ML technologies. Open source code allows for easy modification and customization, as well as the creation of new systems based on these models.
All the open-source large language models discussed in this article have demonstrated their effectiveness and relevance, proving themselves valuable in various projects. Each model has its own features and applications. Llama 2 and Mistral excel in natural language processing tasks due to their optimized architectures. BLOOM and MPT-7B are highly scalable and adapt well to different languages and tasks. Falcon, in its smaller versions, caters to applications that require high speed and minimal resource consumption. These models provide powerful tools for AI research and development.