OpenAI has announced CriticGPT, a new artificial intelligence model based on GPT-4 that is designed to detect and flag errors in program code generated by ChatGPT. Its main purpose is to assist AI trainers who evaluate model outputs during reinforcement learning from human feedback (RLHF). In OpenAI's tests, trainers who reviewed ChatGPT code with CriticGPT's help outperformed those working without it 60% of the time.
OpenAI noted that as its models improve in reasoning and behavior, ChatGPT becomes more accurate. This progress has a side effect, however: the chatbot's remaining mistakes become subtler and harder to spot, which makes the work of AI trainers during RLHF more difficult. To help them, OpenAI trained CriticGPT to write critiques that point out inaccuracies in ChatGPT's answers.
According to the company, CriticGPT reviews code written by ChatGPT, identifies errors, and writes comments explaining them. To train it, human trainers deliberately inserted bugs into ChatGPT-written code and then wrote feedback as if they had discovered those bugs themselves. In evaluations, trainers preferred CriticGPT's critiques over those written by ChatGPT in 63% of cases involving naturally occurring bugs.
CriticGPT's capabilities are not limited to code review. In one experiment, OpenAI applied the model to a sample of ChatGPT training data that human reviewers had previously rated as flawless. CriticGPT found errors in 24% of those cases, and human reviewers subsequently confirmed them. This suggests that the model can generalize to non-programming tasks and catch subtle errors that even careful human review may miss.
OpenAI acknowledges a limitation: CriticGPT was trained on relatively short ChatGPT answers, so for now it is suited mainly to comparatively simple tasks. As AI systems grow more complex, however, the need for more capable review tools is becoming clear. OpenAI therefore plans to scale up this work and integrate CriticGPT-like models into its RLHF pipeline.