What is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is a technique for improving the capabilities of a large language model (LLM) by giving it a text corpus and the ability to search that corpus on the fly. A system using RAG receives a query from a user, uses that query to search its corpus, and adds the results to the LLM's prompt so that the model can generate a relevant response.
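
The query, search, and prompt steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular system works: the toy corpus, the `retrieve` and `build_prompt` helpers, and the word-overlap scoring are all assumptions made for the example. Real systems typically search with vector embeddings, and the resulting prompt would then be sent to an LLM, a step omitted here.

```python
import re

# Hypothetical toy corpus; a real system would index many documents.
CORPUS = [
    "RAG systems search a text corpus and add the results to the prompt.",
    "Reinforcement learning trains agents with reward signals.",
    "Interpretability research studies the internals of neural networks.",
]

def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, corpus, k=1):
    """Return the k passages sharing the most words with the query.

    Word overlap is an assumed stand-in for the embedding-based
    search that real systems usually use.
    """
    query_words = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda passage: len(query_words & tokenize(passage)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, passages):
    """Place the retrieved passages ahead of the user's question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Use the following context to answer.\n{context}\n\nQuestion: {query}"

query = "How does a RAG system use its corpus?"
passages = retrieve(query, CORPUS)
prompt = build_prompt(query, passages)  # this string would be sent to the LLM
print(prompt)
```

Only the best-matching passage ends up in the prompt; the unrelated passages are left out.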

Some advantages of using RAG are that it:

  • Can enhance an LLM’s knowledge of specific subjects by drawing on specialized corpora containing material the model encountered rarely or not at all during training.

  • Allows the system to quote or cite sources from the corpus that have been returned by the query.

  • Appears to reduce hallucinations in some contexts.

  • Reduces inference costs by including only the relevant parts of the corpus in prompts, rather than the whole corpus.

  • Makes it easier to ensure an LLM’s outputs stay current in domains where information changes quickly (since it’s easier to keep a text corpus up to date than to retrain a model).
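
As a rough illustration of the citation point above, one common approach is to number the retrieved passages in the prompt so the model can refer to them. The sketch below assumes this approach; the source titles, passages, and the `format_sources` and `build_cited_prompt` helpers are hypothetical and not taken from any specific system.

```python
# Hypothetical retrieved results: (source title, passage text) pairs.
results = [
    ("Intro to RAG", "RAG adds search results to the LLM's prompt."),
    ("Corpus maintenance", "Documents can be added or replaced without retraining."),
]

def format_sources(results):
    """Number each passage so the model can cite it as [1], [2], ..."""
    lines = []
    for i, (title, text) in enumerate(results, start=1):
        lines.append(f"[{i}] {title}: {text}")
    return "\n".join(lines)

def build_cited_prompt(question, results):
    """Ask the model to answer using only the numbered sources."""
    return (
        "Answer using only the sources below, citing them by number.\n"
        f"{format_sources(results)}\n\n"
        f"Question: {question}"
    )

print(build_cited_prompt("Why is a RAG corpus easy to keep current?", results))
```

Because the sources live in ordinary data rather than in model weights, updating the corpus is a matter of changing the entries in `results`, which is the intuition behind the last advantage in the list.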

An example of a system based on RAG is aisafety.info’s chatbot.

Further reading: