Building an AI-Powered PDF Chatbot Using OpenAI and Retrieval-Augmented Generation (RAG)

Posted by Jordi Corbilla February 01, 2025

Introduction

Have you ever struggled to find a specific piece of information buried deep inside a long PDF document? Whether it’s a financial report, a legal document, or a research paper, manually searching for answers can be frustrating and time-consuming.

What if you could simply ask a question and get an AI-powered answer instantly—without reading the entire document?

In this article, we’ll explore how to build a Retrieval-Augmented Generation (RAG) pipeline using OpenAI’s GPT models and vector databases to create a chatbot that can answer questions from any PDF document.

Why Not Just Use GPT-4 Alone?

Large Language Models (LLMs) like GPT-4 are powerful but lack context about specific documents unless that information is included in the prompt. If you were to feed an entire PDF into a GPT-4 prompt, you’d quickly hit token limits and face high API costs.
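
As a rough illustration of the token-limit problem, the sketch below estimates whether a document fits in a model's context window. The 4-characters-per-token ratio is only a common rule of thumb for English text, and the 8,192-token window, the 500-token answer reserve, and the helper names are assumptions made for this example. A real tokenizer such as tiktoken would give exact counts.

```python
# Rough estimate only: assumes ~4 characters per token (a rule of thumb for
# English text) and an 8,192-token context window (assumed for illustration).
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW = 8192

def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string from its length."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division

def fits_in_context(text: str, reserved_for_answer: int = 500) -> bool:
    """Check whether the text still leaves room for the model's answer."""
    return estimate_tokens(text) + reserved_for_answer <= CONTEXT_WINDOW

if __name__ == "__main__":
    page = "word " * 600      # hypothetical page of about 3,000 characters
    document = page * 50      # hypothetical 50-page PDF
    print("One page fits:", fits_in_context(page))
    print("Whole document fits:", fits_in_context(document))
```

Under these assumptions a single page fits comfortably, while the 50-page document does not, which is the gap that retrieval is meant to close.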

The Solution: Retrieval-Augmented Generation (RAG)

Instead of passing entire PDFs to GPT-4, we split the document into chunks, store them, and retrieve only the most relevant ones before asking the model to generate a response.

A RAG pipeline consists of three key steps:

Extracting and Splitting Text: Convert a PDF into manageable chunks.
Vectorising & Storing the Chunks: Convert text into embeddings and store them in a vector database for quick retrieval.
Querying & Response Generation: Retrieve relevant chunks and use GPT-4 to answer the user’s query.

This approach drastically reduces token usage, making queries faster and more cost-effective.

Building the AI-Powered PDF Chatbot

Tools and Technologies Used

LangChain: For orchestrating LLM calls, vector search, and retrieval.
OpenAI GPT-4: For generating responses based on retrieved content.
FAISS / ChromaDB: To store and retrieve relevant document chunks efficiently (the code in this article uses ChromaDB).
PyPDFLoader: To extract text from PDFs.

Step 1: Install Dependencies

First, install the necessary Python libraries:


pip install langchain openai chromadb faiss-cpu tiktoken pypdf

Step 2: Load and Process the PDF

We use PyPDFLoader to extract text and RecursiveCharacterTextSplitter to split it into chunks that are easier to search.


from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def load_and_preprocess_pdf(pdf_path):
    loader = PyPDFLoader(pdf_path)
    documents = loader.load()

    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=200
    )
    chunks = text_splitter.split_documents(documents)
    return chunks
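
To make the effect of chunk_size and chunk_overlap concrete, here is a simplified sliding-window splitter in plain Python. It is not how RecursiveCharacterTextSplitter works internally (that class prefers to break on separators such as paragraph breaks and newlines). The function name split_with_overlap is hypothetical and only illustrates how consecutive chunks share text.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(2500))  # 2,500 placeholder characters
    pieces = split_with_overlap(sample)
    print(len(pieces), [len(p) for p in pieces])
    # The tail of one chunk is repeated at the head of the next.
    print(pieces[0][-200:] == pieces[1][:200])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.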

Step 3: Store and Retrieve Document Chunks

We convert text chunks into embeddings and store them in ChromaDB for retrieval.


import os

from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

def create_or_load_vector_store(chunks):
    embeddings = OpenAIEmbeddings()
    persist_directory = "chroma_vector_store"

    # Reuse the persisted store if it exists so chunks are not re-embedded.
    if os.path.exists(persist_directory):
        print("Loading existing vector store...")
        vectorstore = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
    else:
        # First run: embed the chunks and write the store to disk.
        print("Creating new vector store...")
        vectorstore = Chroma.from_documents(chunks, embeddings, persist_directory=persist_directory)
    return vectorstore
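
One thing to watch with the directory check in this function: once chroma_vector_store exists, it is reused even if a different PDF is loaded later. A possible way around this, sketched below, is to derive the directory name from a hash of the PDF's contents. The helper name persist_dir_for and the naming scheme are hypothetical and not part of the original implementation.

```python
import hashlib
import os
import tempfile

def persist_dir_for(pdf_path: str, base_dir: str = "chroma_vector_store") -> str:
    """Return a store directory that is unique to the PDF's contents."""
    digest = hashlib.sha256()
    with open(pdf_path, "rb") as handle:
        # Read in blocks so large PDFs are not loaded into memory at once.
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return os.path.join(base_dir, digest.hexdigest()[:16])

if __name__ == "__main__":
    # Demonstrate with a throwaway file standing in for a real PDF.
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(b"%PDF-1.4 example bytes")
    print(persist_dir_for(tmp.name))
    os.remove(tmp.name)
```

The returned path could then be used as persist_directory, so each distinct document gets its own store and an unchanged document still hits the cached one.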

Step 4: Query GPT-4 with Retrieved Chunks

We retrieve only the most relevant document chunks and pass them to GPT-4 for an intelligent response.


from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

def create_rag_pipeline(vectorstore):
    retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})
    qa_chain = RetrievalQA.from_chain_type(
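        # Streaming is left off here: in some LangChain versions, streamed
        # responses do not report token usage to the callback used in Step 5.
        llm=ChatOpenAI(model="gpt-4"),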
        retriever=retriever,
        return_source_documents=True
    )
    return qa_chain
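
Conceptually, the chain does two things: it ranks stored chunks by similarity to the query embedding and keeps the top k, then places those chunks into a single prompt (the default "stuff" strategy of RetrievalQA). The sketch below imitates that flow with tiny hand-made vectors in place of real OpenAI embeddings. The vectors, the helper names, and the prompt wording are all illustrative assumptions, not LangChain's internals.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question: str, chunks: list[str]) -> str:
    """Place the retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

if __name__ == "__main__":
    # Hand-made 3-dimensional vectors standing in for real embeddings.
    store = [
        ("Revenue grew 12% year over year.", [0.9, 0.1, 0.0]),
        ("The office moved to a new building.", [0.0, 0.2, 0.9]),
        ("Operating costs fell by 3%.", [0.8, 0.3, 0.1]),
    ]
    query_vec = [1.0, 0.2, 0.0]
    print(build_prompt("How did revenue change?", top_k(query_vec, store, k=2)))
```

Only the selected chunks reach the model, which is why the prompt stays small regardless of how long the PDF is.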

Step 5: Track Token Usage and Costs

To measure the cost efficiency of RAG, we log token usage:


from langchain.callbacks import get_openai_callback

def query_rag_pipeline(rag_pipeline, query):
    with get_openai_callback() as callback:
        result = rag_pipeline({"query": query})
        
        print("\nAnswer:")
        print(result["result"])

        print("\nToken Usage:")
        print(f"- Prompt tokens: {callback.prompt_tokens}")
        print(f"- Completion tokens: {callback.completion_tokens}")
        print(f"- Total tokens: {callback.total_tokens}")
        print(f"- Estimated cost: ${callback.total_cost:.5f}")
    return result
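
The callback's cost figure is derived from token counts and per-model prices. A back-of-the-envelope version of that calculation might look like the sketch below. The function name and the per-1,000-token rates are placeholder assumptions for illustration only. Check OpenAI's current pricing before relying on any figure.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  prompt_rate_per_1k: float, completion_rate_per_1k: float) -> float:
    """Estimate a request's cost in USD from token counts and per-1K rates."""
    prompt_cost = (prompt_tokens / 1000) * prompt_rate_per_1k
    completion_cost = (completion_tokens / 1000) * completion_rate_per_1k
    return prompt_cost + completion_cost

if __name__ == "__main__":
    # Hypothetical rates: $0.03 per 1K prompt tokens, $0.06 per 1K completion tokens.
    cost = estimate_cost(300, 50, prompt_rate_per_1k=0.03, completion_rate_per_1k=0.06)
    print(f"Estimated cost: ${cost:.5f}")
```

Because prompt tokens usually dominate in a RAG query, shrinking the retrieved context is the main lever on cost.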

Step 6: Run the Chatbot


def main():
    pdf_path = "example.pdf"
    print("Loading and processing PDF...")
    chunks = load_and_preprocess_pdf(pdf_path)

    print("Creating/loading vector store...")
    vectorstore = create_or_load_vector_store(chunks)

    print("Setting up RAG pipeline...")
    rag_pipeline = create_rag_pipeline(vectorstore)

    while True:
        query = input("\nAsk a question (or type 'exit' to quit): ")
        if query.lower() == "exit":
            break
        query_rag_pipeline(rag_pipeline, query)

if __name__ == "__main__":
    main()

How This Improves Performance and Reduces Cost

Instead of passing entire PDFs to GPT-4, this chatbot:
✅ Retrieves only relevant parts, reducing token usage.
✅ Persists embeddings, avoiding redundant recomputation.
✅ Uses vector search, making queries faster and more scalable.

Example Cost Comparison

Approach	Tokens Used	Cost per Query
Full PDF in Prompt	~20,000	$0.10
RAG-based Retrieval	~350	$0.0021

🚀 95%+ cost savings in this example, provided the retrieved chunks contain the answer!
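
The savings can be checked directly from the figures in the table. The short calculation below uses only those four numbers; the function name percent_saved is hypothetical.

```python
# Figures taken from the comparison table above.
full_pdf_tokens, full_pdf_cost = 20_000, 0.10
rag_tokens, rag_cost = 350, 0.0021

def percent_saved(before: float, after: float) -> float:
    """Percentage reduction when going from `before` to `after`."""
    return (before - after) / before * 100

if __name__ == "__main__":
    print(f"Token reduction: {percent_saved(full_pdf_tokens, rag_tokens):.1f}%")
    print(f"Cost reduction: {percent_saved(full_pdf_cost, rag_cost):.1f}%")
```

Both reductions work out to roughly 98% for the figures in the table.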

Potential Applications

This RAG-based PDF chatbot can be used for:

📜 Legal Document Analysis – Quickly retrieve case laws, contracts, and compliance details.
📚 Educational Use – Answer questions from textbooks or research papers.
🏦 Finance & Regulations – Automate MiFID II, GDPR, and financial document inquiries.
🎓 Corporate Knowledge Management – Search internal company policies instantly.

Final Thoughts

The RAG-PDF chatbot is a practical solution for efficiently querying large documents using GPT-4 while significantly reducing token costs. By integrating vector search with OpenAI’s LLMs, we can build scalable and cost-effective AI applications for real-world business use cases.

👉 Try the full implementation on GitHub: RAG-PDF-Chatbot

Let me know in the comments—what would you use this for? 🚀
