LLMs don’t know your PDF.
They don’t know your company wiki either. Or your research papers.
With RAG (Retrieval-Augmented Generation), they can look through your documents at query time and answer using what they find.
But how does that actually work? Here's the basic idea behind RAG (each step is sketched in code after the list):

1. Chunking: The document is split into small, overlapping parts so the LLM can handle them. The overlap preserves structure and context across chunk boundaries.
2. Embeddings & Search: Each part is turned into a vector (a numerical representation of meaning). Your question is also turned into a vector, and the system compares them to find the closest matches.
3. Retriever + LLM: The top matches are sent to the LLM, which uses them to generate an answer based on that context.
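
Step 1 in code, as a minimal sketch: this assumes a simple character-based splitter; real pipelines often split on sentences or tokens instead, but the sliding-window-with-overlap idea is the same.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size characters."""
    assert 0 <= overlap < chunk_size
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# tiny demo with made-up text
sample = "RAG lets an LLM answer from your own documents. " * 50
chunks = chunk_text(sample, chunk_size=200, overlap=40)
print(len(chunks), "chunks")
```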
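Step 2 is vector comparison, most commonly with cosine similarity. In the sketch below, `embed` is a stand-in for whatever embedding model you use (an API or a local model), not a real library call:

```python
import math
from typing import Callable

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Angle-based similarity: close to 1.0 means very similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(question: str, chunks: list[str],
             embed: Callable[[str], list[float]], k: int = 3) -> list[str]:
    """Return the k chunks whose vectors are closest to the question's vector."""
    query_vec = embed(question)
    # note: real systems embed each chunk once up front and store the vectors
    # in a vector index, rather than re-embedding on every query
    ranked = sorted(chunks,
                    key=lambda c: cosine_similarity(query_vec, embed(c)),
                    reverse=True)
    return ranked[:k]
```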
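Step 3 is mostly prompt assembly: the retrieved chunks are pasted into the prompt as context. Here `call_llm` is a placeholder for any chat-completion API, an assumption rather than a specific library:

```python
from typing import Callable

def answer(question: str, retrieved: list[str],
           call_llm: Callable[[str], str]) -> str:
    """Put the top matches into the prompt so the model answers from them."""
    context = "\n\n".join(retrieved)
    prompt = (
        "Answer the question using only the context below. "
        "If the context is not enough, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return call_llm(prompt)
```

The "answer only from the context" instruction is what keeps the model grounded in your documents instead of its training data.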