Building an AI-Powered PDF Chatbot Using OpenAI and Retrieval-Augmented Generation (RAG)

Introduction

Have you ever struggled to find a specific piece of information buried deep inside a long PDF document? Whether it’s a financial report, a legal document, or a research paper, manually searching for answers can be frustrating and time-consuming.

What if you could simply ask a question and get an AI-powered answer instantly—without reading the entire document?

In this article, we’ll explore how to build a Retrieval-Augmented Generation (RAG) pipeline using OpenAI’s GPT models and vector databases to create a chatbot that can answer questions from any PDF document.


Why Not Just Use GPT-4 Alone?

Large Language Models (LLMs) like GPT-4 are powerful but lack context about specific documents unless that information is included in the prompt. If you were to feed an entire PDF into a GPT-4 prompt, you’d quickly hit token limits and face high API costs.

The Solution: Retrieval-Augmented Generation (RAG)

Instead of passing entire PDFs to GPT-4, we split, store, and retrieve only the most relevant sections of a document before asking the model to generate a response.

A RAG pipeline consists of three key steps:

  1. Extracting and Splitting Text: Convert a PDF into manageable chunks.
  2. Vectorising & Storing the Chunks: Convert text into embeddings and store them in a vector database for quick retrieval.
  3. Querying & Response Generation: Retrieve relevant chunks and use GPT-4 to answer the user’s query.

Because only a handful of chunks reach the model, each query uses far fewer tokens than sending the whole document, which lowers both cost and response time.
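
The three steps can be sketched in plain Python without calling any external service. This toy version is an illustration only: it splits on sentence boundaries, uses word-count vectors as a stand-in for real embeddings, and stops at building the prompt instead of calling GPT-4. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def split_text(text):
    """Step 1: split text into sentence-sized chunks (toy splitter)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def vectorise(text):
    """Step 2: stand-in for an embedding, a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=1):
    """Step 3a: return the k chunks most similar to the query."""
    q = vectorise(query)
    return sorted(chunks, key=lambda c: cosine(q, vectorise(c)), reverse=True)[:k]


document = (
    "The annual report covers fiscal year 2023. "
    "Revenue grew by twelve percent compared to the previous year. "
    "The company opened three new offices in Europe."
)
question = "How much did revenue grow?"

chunks = split_text(document)
top_chunks = retrieve(question, chunks, k=1)

# Step 3b: only the retrieved chunk goes into the prompt, not the whole document.
prompt = "Context:\n" + "\n".join(top_chunks) + "\n\nQuestion: " + question
print(prompt)
```

A real pipeline replaces `vectorise` with an embedding model and the sorted list with a vector database, but the flow of data is the same.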


Building the AI-Powered PDF Chatbot

Tools and Technologies Used

  • LangChain: For orchestrating LLM calls, vector search, and retrieval.
  • OpenAI GPT-4: For generating responses based on retrieved content.
  • ChromaDB (FAISS is an alternative): To store and retrieve relevant document chunks efficiently. The code below uses ChromaDB.
  • PyPDFLoader: To extract text from PDFs.

Step 1: Install Dependencies

First, install the necessary Python libraries (the OpenAI components used below read your API key from the OPENAI_API_KEY environment variable):


    pip install langchain openai chromadb faiss-cpu tiktoken pypdf

Step 2: Load and Process the PDF

We use PyPDFLoader to extract text and RecursiveCharacterTextSplitter to split it into chunks that are easier to search.


    from langchain.document_loaders import PyPDFLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter


    def load_and_preprocess_pdf(pdf_path):
        # Load the PDF; each page becomes one Document.
        loader = PyPDFLoader(pdf_path)
        documents = loader.load()

        # Split pages into overlapping chunks (sizes are in characters).
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200
        )
        chunks = text_splitter.split_documents(documents)
        return chunks
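
To see what chunk_size and chunk_overlap mean, here is a simplified fixed-window splitter. It is not how RecursiveCharacterTextSplitter works internally (that class prefers to break on separators such as paragraphs and sentences); it only illustrates how consecutive chunks share an overlapping region. `sliding_chunks` is a hypothetical helper.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows where each window starts with the
    last `chunk_overlap` characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
for piece in pieces:
    print(piece)
```

With chunk_size=1000 and chunk_overlap=200, a sentence that straddles a chunk boundary is more likely to appear whole in at least one chunk, at the price of embedding some text twice.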

Step 3: Store and Retrieve Document Chunks

We convert text chunks into embeddings and store them in ChromaDB for retrieval.


    import os

    from langchain.vectorstores import Chroma
    from langchain.embeddings.openai import OpenAIEmbeddings


    def create_or_load_vector_store(chunks):
        embeddings = OpenAIEmbeddings()
        persist_directory = "chroma_vector_store"

        if os.path.exists(persist_directory):
            # Reuse the store created on a previous run.
            print("Loading existing vector store...")
            vectorstore = Chroma(
                persist_directory=persist_directory,
                embedding_function=embeddings
            )
        else:
            # Embed every chunk once and create the store in persist_directory.
            print("Creating new vector store...")
            vectorstore = Chroma.from_documents(
                chunks, embeddings, persist_directory=persist_directory
            )
        return vectorstore
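
One caveat with this function: the store is looked up by a fixed directory name, so if you later point the script at a different PDF, the store built from the previous document is loaded instead. A possible workaround, sketched here as an assumption and not as part of the original implementation, is to derive the directory name from a hash of the file contents. `persist_dir_for` is a hypothetical helper.

```python
import hashlib
import os
import tempfile


def persist_dir_for(pdf_path, base_dir="chroma_vector_store"):
    """Return a store directory name that is unique to the file's contents."""
    digest = hashlib.sha256()
    with open(pdf_path, "rb") as f:
        # Read in blocks so large PDFs are not loaded into memory at once.
        for block in iter(lambda: f.read(65536), b""):
            digest.update(block)
    return os.path.join(base_dir, digest.hexdigest()[:16])


# Demonstration with two throwaway files standing in for PDFs.
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.pdf")
    path_b = os.path.join(tmp, "b.pdf")
    with open(path_a, "wb") as f:
        f.write(b"first document")
    with open(path_b, "wb") as f:
        f.write(b"second document")
    dir_a = persist_dir_for(path_a)
    dir_b = persist_dir_for(path_b)
    dir_a_again = persist_dir_for(path_a)

print(dir_a != dir_b)        # different content maps to a different directory
print(dir_a == dir_a_again)  # same content maps to the same directory
```

The returned path could then be passed as persist_directory, so each document gets its own store and unchanged documents still reuse their embeddings.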

Step 4: Query GPT-4 with Retrieved Chunks

We retrieve only the most relevant document chunks and pass them to GPT-4 for an intelligent response.


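    from langchain.chains import RetrievalQA
    from langchain.chat_models import ChatOpenAI


    def create_rag_pipeline(vectorstore):
        # Fetch the 3 chunks most similar to the question.
        retriever = vectorstore.as_retriever(
            search_type="similarity",
            search_kwargs={"k": 3}
        )
        # Note: in some LangChain versions, streaming=True prevents token
        # counts from being reported in Step 5; set it to False if they read 0.
        qa_chain = RetrievalQA.from_chain_type(
            llm=ChatOpenAI(model="gpt-4", streaming=True),
            retriever=retriever,
            return_source_documents=True  # also return the chunks that were used
        )
        return qa_chain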
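
When no chain_type is given, RetrievalQA uses the "stuff" strategy: the retrieved chunks are concatenated into a single prompt together with the question. The sketch below shows the idea in plain Python. The template wording is illustrative and is not LangChain's actual prompt, and `build_stuff_prompt` is a hypothetical name.

```python
def build_stuff_prompt(question, retrieved_chunks):
    """Concatenate the retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


example_chunks = [
    "Revenue grew by twelve percent in 2023.",
    "Operating costs were flat year over year.",
    "Three new offices opened in Europe.",
]
stuff_prompt = build_stuff_prompt("How much did revenue grow?", example_chunks)
print(stuff_prompt)
```

Since every retrieved chunk ends up in the prompt, the values of k and chunk_size together determine how many prompt tokens each query costs.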

Step 5: Track Token Usage and Costs

To measure the cost efficiency of RAG, we log token usage:


    from langchain.callbacks import get_openai_callback


    def query_rag_pipeline(rag_pipeline, query):
        # The callback accumulates token counts for OpenAI calls made inside the block.
        with get_openai_callback() as callback:
            result = rag_pipeline({"query": query})

            print("\nAnswer:")
            print(result["result"])

            print("\nToken Usage:")
            print(f"- Prompt tokens: {callback.prompt_tokens}")
            print(f"- Completion tokens: {callback.completion_tokens}")
            print(f"- Total tokens: {callback.total_tokens}")
            print(f"- Estimated cost: ${callback.total_cost:.5f}")
        return result
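
The estimated cost is, in essence, the token counts multiplied by per-token prices. A sketch of that arithmetic follows. The prices are placeholder assumptions for illustration, not current OpenAI rates, so check the pricing page for the model you use; `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  price_per_1k_prompt, price_per_1k_completion):
    """Return the estimated cost in USD for one request."""
    prompt_cost = (prompt_tokens / 1000) * price_per_1k_prompt
    completion_cost = (completion_tokens / 1000) * price_per_1k_completion
    return prompt_cost + completion_cost


# Placeholder prices (assumptions for illustration only).
PROMPT_PRICE = 0.03       # USD per 1,000 prompt tokens
COMPLETION_PRICE = 0.06   # USD per 1,000 completion tokens

cost = estimate_cost(800, 100, PROMPT_PRICE, COMPLETION_PRICE)
print(f"Estimated cost: ${cost:.5f}")
```

Prompt and completion tokens are usually priced differently, which is why the callback reports them separately.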

Step 6: Run the Chatbot


    def main():
        pdf_path = "example.pdf"

        print("Loading and processing PDF...")
        chunks = load_and_preprocess_pdf(pdf_path)

        print("Creating/loading vector store...")
        vectorstore = create_or_load_vector_store(chunks)

        print("Setting up RAG pipeline...")
        rag_pipeline = create_rag_pipeline(vectorstore)

        while True:
            query = input("\nAsk a question (or type 'exit' to quit): ")
            if query.lower() == "exit":
                break
            query_rag_pipeline(rag_pipeline, query)


    if __name__ == "__main__":
        main()

How This Improves Performance and Reduces Cost

Instead of passing entire PDFs to GPT-4, this chatbot:

  • Retrieves only relevant parts, reducing token usage.
  • Persists embeddings, avoiding redundant recomputation.
  • Uses vector search, making queries faster and more scalable.

Example Cost Comparison

  Approach               Tokens Used   Cost per Query
  Full PDF in Prompt     ~20,000       $0.10
  RAG-based Retrieval    ~350          $0.0021

🚀 With these example figures, RAG cuts the per-query cost by more than 95%, as long as the retrieved chunks actually contain the answer.
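
The savings figure follows directly from the two per-query costs in the comparison above. A quick check of the arithmetic, using only those two numbers:

```python
full_pdf_cost = 0.10    # USD per query with the full PDF in the prompt (from the comparison)
rag_cost = 0.0021       # USD per query with RAG-based retrieval (from the comparison)

savings = (full_pdf_cost - rag_cost) / full_pdf_cost
print(f"Cost reduction: {savings:.1%}")
```

Actual savings depend on the document length, the chunk settings, and the model's pricing, so treat the comparison as an example and measure your own queries with the callback from Step 5.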


Potential Applications

This RAG-based PDF chatbot can be used for:

  1. 📜 Legal Document Analysis – Quickly retrieve case laws, contracts, and compliance details.
  2. 📚 Educational Use – Answer questions from textbooks or research papers.
  3. 🏦 Finance & Regulations – Automate MiFID II, GDPR, and financial document inquiries.
  4. 🎓 Corporate Knowledge Management – Search internal company policies instantly.

Final Thoughts

The RAG-PDF chatbot is a practical solution for efficiently querying large documents using GPT-4 while significantly reducing token costs. By integrating vector search with OpenAI’s LLMs, we can build scalable and cost-effective AI applications for real-world business use cases.

👉 Try the full implementation on GitHub: RAG-PDF-Chatbot

Let me know in the comments—what would you use this for? 🚀


