Retrieval-Augmented Generation, or RAG, is a language generation approach that combines a pre-trained parametric memory with a non-parametric memory.
For example, the parametric memory can be a pre-trained seq2seq model and the non-parametric memory a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. RAG models can use the retrieved Wikipedia passages to generate language that is more specific, diverse, and factual than that of a state-of-the-art parametric-only seq2seq baseline.
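To make the two-memory structure concrete, here is a minimal retrieve-then-generate sketch in Python. Everything in it is illustrative rather than taken from the RAG paper: the hashed bag-of-words `embed` function stands in for a trained dense retriever such as DPR, the small `PASSAGES` list stands in for the Wikipedia index, and `generate` only assembles the grounded prompt instead of running a real seq2seq model.

```python
import numpy as np

# Toy corpus standing in for the non-parametric memory (e.g. a Wikipedia index).
PASSAGES = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "RAG combines a neural retriever with a seq2seq generator.",
    "The Great Wall of China is over 13,000 miles long.",
]

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Hashed bag-of-words embedding; a real system would use a trained dense encoder."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Pre-compute the dense index over the corpus (the non-parametric memory).
index = np.stack([embed(p) for p in PASSAGES])

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the top-k passages by inner-product similarity to the query."""
    scores = index @ embed(query)
    top = np.argsort(scores)[::-1][:k]
    return [PASSAGES[i] for i in top]

def generate(query: str) -> str:
    """Stand-in for the parametric seq2seq generator: builds the grounded prompt."""
    context = " ".join(retrieve(query))
    return f"Answer to '{query}' conditioned on: {context}"

print(generate("Where is the Eiffel Tower?"))
```

A real RAG model conditions its decoder on the retrieved passages jointly with the query, marginalizing over documents during generation; the sketch above only shows the retrieve-then-condition flow.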
RAG models have been shown to achieve state-of-the-art results on several knowledge-intensive NLP tasks, such as open-domain question answering, natural language generation, and summarization. RAG is one technique for grounding language models, that is, providing them with relevant information from external sources to improve their performance and explainability.
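In practice, a pre-trained RAG model can be loaded through the Hugging Face transformers library, which provides RagTokenizer, RagRetriever, and RagTokenForGeneration classes for checkpoints such as facebook/rag-token-nq. The snippet below is a sketch based on that API; `use_dummy_dataset=True` swaps the full Wikipedia index for a small test index, and exact argument names may differ across library versions.

```python
from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration

# Load the tokenizer, the retriever (with a small dummy index), and the generator.
tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True
)
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)

# Encode a question, retrieve supporting passages, and generate a grounded answer.
inputs = tokenizer("who wrote the origin of species", return_tensors="pt")
generated = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```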