Generative neural networks and the chatbots built on them, such as ChatGPT, have been one of the main tech trends of recent years. They are now being followed by a new generation of neural networks: universal action transformers that can perform a much wider range of tasks. This article covers the first AI model of this type, ACT-1. It describes the model's functions and possible applications, how it differs from other models, and Adept AI, the startup that developed it.

History of Creation and Development of Adept AI

Adept AI is a young AI startup based in San Francisco, founded in 2022 by a group of former OpenAI, Google, and DeepMind employees. Its CEO is David Luan, who previously led the large language model program at Google and headed the technical department at OpenAI. The other founders include researchers Ashish Vaswani and Niki Parmar, who took the positions of chief scientific officer and chief technology officer, respectively. Both previously worked at Google Brain, where they co-authored the Transformer architecture, a real breakthrough in the field of AI. This is what the letter “T” in GPT stands for. The startup's whole team numbers no more than 50 people.


Adept AI's specialists draw on their experience with well-known artificial intelligence projects to build a new kind of neural network. The demonstration of the first version of the ACT-1 AI assistant helped the startup raise $350 million in a Series B funding round in March 2023. Participants included Addition, Greylock, Atlassian Ventures, Microsoft, Nvidia, Workday Ventures, Caterina Fake, Frontiers Capital, PSP Growth, SV Angel and A.Capital. The round raised the company's valuation to $1 billion and its total funding to $415 million.

According to Adept CEO David Luan, the company is using the venture capital to hire new employees and train AI models. The laboratory works closely with Oracle and Nvidia, relying on the cloud infrastructure of the former and the hardware of the latter. Adept AI Labs runs thousands of NVIDIA GPUs on Oracle Cloud Infrastructure (OCI) clusters and takes advantage of the high throughput of the OCI network. This allows it to train large-scale AI and machine learning models faster and more efficiently.

The startup's team focused on developing a universal AI model that automates the interaction between people and software. The researchers did not set out to create an omnipotent artificial intelligence that substitutes for a human. Instead, they built a smart intermediate interface between users and the digital space. It noticeably speeds up repetitive tasks and increases productivity without claiming to be a superintelligence.

In developing the model, Adept's specialists relied on computer vision, which lets the neural network perceive and interpret visual information much as a person does. The second important component is machine learning. It lets the AI learn different actions from the data it is given and keep improving its skills.

ACT-1 – the Firstborn in the Family of Action-transformers

The laboratory's first product is called Adept ACT-1, where ACT stands for "Action Transformer". According to its creators, this new-generation AI model does not just answer user requests but actually operates software. The neural network can be trained to use any program, website, web application or API at a human level. It reads information from the pixels on the display and then performs the actions the user asked for in a browser or other software. ACT-1 can therefore be described as a digital intermediary agent between a person and a computer, a kind of human-computer interface (HCI).

The ACT-1 AI agent architecture is based on the Fuyu-8B multimodal language model. In general, it is similar to other generative neural networks, GPT in particular. The difference is its higher interactivity: the model analyzes the image on the screen, matches it against the user's request, and converts the result into the actions needed to carry the request out. To train it to perform various tasks, Adept uses reinforcement learning. For now, its interface is adapted only for desktop computers.
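
The cycle described above (look at the screen, compare it with the request, emit the next action) can be sketched as a simple loop. This is an illustration only, not Adept's implementation: `Screen`, `Action`, `toy_policy` and `run_agent` are hypothetical names, the "screen" is a dictionary of labelled fields rather than real pixels, and a hand-written rule stands in for the trained model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    kind: str        # "type" or "click" (hypothetical action vocabulary)
    target: str      # label of the UI element the action applies to
    text: str = ""   # text to enter, used only by "type" actions


@dataclass
class Screen:
    """Toy stand-in for the screen state: field label -> current contents."""
    fields: Dict[str, str] = field(default_factory=dict)
    submitted: bool = False

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def toy_policy(request: Dict[str, str], screen: Screen) -> Optional[Action]:
    """Stand-in for the model: choose the next action from request and screen."""
    for name, value in request.items():
        if screen.fields.get(name) != value:
            return Action("type", name, value)
    if not screen.submitted:
        return Action("click", "submit")
    return None  # nothing left to do


def run_agent(
    request: Dict[str, str],
    screen: Screen,
    policy: Callable[[Dict[str, str], Screen], Optional[Action]],
    max_steps: int = 10,
) -> List[Action]:
    """Observe the screen, ask the policy for an action, apply it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = policy(request, screen)
        if action is None:
            break
        screen.apply(action)
        history.append(action)
    return history


if __name__ == "__main__":
    form = Screen(fields={"name": "", "city": ""})
    steps = run_agent({"name": "Ada", "city": "Paris"}, form, toy_policy)
    print([(a.kind, a.target) for a in steps])
```

The `max_steps` cap is a design choice for the sketch: it keeps the loop from running forever if the policy never decides it is finished.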

Fuyu-8B has many important advantages:

  • Its simple, minimalistic architecture makes it easier and faster to train, deploy and scale ACT-1 and subsequent versions of the program.
  • It was designed from scratch specifically for building AI agents, so it has a wide range of capabilities. It processes images at arbitrary resolutions, answers questions about charts, graphs and user interfaces, localizes elements on the screen, and so on.
  • It is fast. It analyzes large images and responds to the user's request in less than 100 milliseconds.
  • It performs well on standard image-understanding benchmarks, including visual question answering and natural image captioning.

The Fuyu-8B language model is distributed under the open CC-BY-NC license, which lets the community use it freely in third-party AI projects as long as they are non-commercial. In addition, the Adept team has developed a larger neural network, Fuyu-Medium, for which only results have been discussed so far. Because Fuyu-8B is a first, raw release, it comes without instruction tuning, post-processing, or data filtering. Users should expect to adapt it to their own application themselves.

Possibility of Using the ACT-1 AI Agent

The AI agent can carry out complex user requests using various software and coordinate its actions across multiple programs. ACT-1 is versatile and quickly learns to operate different digital tools. One of the first was the web browser: for the demo, the developers connected the neural network to a Google Chrome extension. Integration with Chrome allowed the AI model to view web pages and perform actions on them, including typing, clicking and scrolling. During the product presentation, the developers used ACT-1 for the following tasks:

  • collecting data from email and documents when filling out insurance claims;
  • entering data from invoices sent by email into a payment program;
  • drawing up a walking route around the city in Google Maps.

The action transformer's interface appears as a window on top of the browser or another program. Because the model keeps learning, it can be used for many tasks that involve working with computer software. It can transfer LinkedIn page URLs into recruiting software, search the Internet, fill out spreadsheets, and so on. Users simply type a text request into the dialog box and ACT-1 gets to work. The agent's capabilities are especially useful for manual data processing. For example, processes that used to take more than 10 clicks in Salesforce can now be completed by typing a single sentence.

The agent's other capabilities include:

  • Understanding graphs and charts. The neural network shows an in-depth understanding of context by analyzing the relationships between different types of data and drawing logical conclusions. It also correctly answers non-trivial questions about standard diagrams.
  • Understanding documents. The AI agent accurately recognizes and analyzes different types of documents, from Word, Excel and PDF files to complex infographics.
  • Localization. ACT-1 processes images to pinpoint the location of text and UI elements in them.
  • Searching for information on the Internet. If the language model does not know the answer to a question or does not understand how to perform a particular action, it looks the information up on the Internet.
  • Learning from mistakes. The AI corrects its errors and improves its skills based on user feedback.
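
To make the localization item more concrete, here is a small sketch of how located UI elements could be turned into a click target. It assumes a hypothetical upstream detector that returns labelled bounding boxes in screen pixels. The `Box` type and the `click_point` helper are illustrative names and are not part of any Adept API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """A located UI element: its label plus a bounding box in screen pixels."""
    label: str
    left: int
    top: int
    width: int
    height: int

    def center(self) -> Tuple[int, int]:
        # Integer center of the box, a natural place to click.
        return (self.left + self.width // 2, self.top + self.height // 2)


def click_point(boxes: List[Box], label: str) -> Optional[Tuple[int, int]]:
    """Return the center of the first box whose label matches, ignoring case."""
    wanted = label.strip().lower()
    for box in boxes:
        if box.label.strip().lower() == wanted:
            return box.center()
    return None  # element not found on this screen


if __name__ == "__main__":
    # Hypothetical detector output for one screenshot.
    detected = [Box("Search", 40, 20, 200, 30), Box("Save", 500, 400, 80, 40)]
    print(click_point(detected, "save"))  # prints (540, 420)
```

Returning `None` for a missing element lets the caller decide what to do next, for example scroll and look again.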

Prospects and Challenges of Adept AI Developments

Adept AI researchers are confident that action transformers will make human-computer interaction faster and more productive in the coming years. People will control the computer in natural language rather than through a graphical interface, telling it directly what needs to be done. Neural networks will help novice users master programs faster and more efficiently and realize their ideas even without prior skills or experience.

Instructions, guides, and FAQs will be written for AI, not for humans. People will not have to learn the intricacies of each program's interface or search for information on the Internet, because language models will do all of this for them. The prospects for action transformers like ACT-1 are not limited to routine office tasks. As its capabilities expand, artificial intelligence can become an effective participant in research in important and complex fields such as engineering and drug development.

Despite this justifiable optimism, improving action-oriented neural networks brings challenges as well as benefits. According to the developers, the AI models they create can cause harm if used incorrectly. For this reason, they pay close attention to user feedback and combine machine learning with incremental deployment. Safety, however, is one of the key problems of artificial intelligence technologies in general, not just of Adept's product.

Differences and Similarities Between ACT-1 and Other AI Models

The main difference between ACT-1 and other generative neural networks is that it operates software instead of only producing content, which changes the rules of the game in this field. Previously released systems were specialized, even if each handled a wide range of tasks within its domain. For example, GPT writes, rewrites and translates text well, and Stable Diffusion produces high-quality graphics. However, neither is capable of controlling other programs the way a person does. Researchers have been trying to create such intelligent agents for more than 10 years, but it was Adept AI that achieved a breakthrough in this area.

In other respects, the action transformer ACT-1 is similar to other AI models. It is built with reinforcement learning and a large dataset. Its key feature and advantage is versatility, which allows the neural network to interact flexibly and effectively with the digital environment around it.

In fairness, these new types of models are currently less mature and less ready for practical use than traditional generative networks. For this reason, Adept AI has not yet released its developments to a wide audience.


The young startup Adept AI has set itself an ambitious goal: to develop a universal AI model, ACT-1, capable of fully interacting with a computer at a human level. In practice, the company faces an even harder task, which is to make its neural network not only efficient and productive but also reliable and safe. In any case, its work has opened up a very promising direction in AI. Its real significance will become clearer in the near future.

