FastRAG Documentation


Everything you need to set up, run, and deploy FastRAG — from first clone to production.

Next.js 16 · LangChain v3 · Pinecone · GPT-4o · Vercel AI SDK
Getting Started

Overview

FastRAG is a production-ready RAG (Retrieval-Augmented Generation) starter kit built with Next.js, LangChain, Pinecone, and OpenAI. It eliminates 40+ hours of boilerplate — vector ingestion pipelines, streaming responses, context window management, and a mobile-ready chat UI — so you can focus on building your actual product.

Next.js 16+ · LangChain Latest · Pinecone v3 · OpenAI GPT-4o
What you get: Multi-file PDF ingestion, URL scraping with Puppeteer, streaming chat with source citations, a mobile-responsive chat UI, and full source code you actually understand, with no black boxes.

Prerequisites

Make sure you have accounts and API keys ready. All services have free tiers.

Node.js v18+ (Required)

Required to run Next.js locally.

OPENAI_API_KEY (Required)

Used for embeddings and chat completions. Requires at least $5 in pre-paid credits; a new API key alone isn't enough.

PINECONE_API_KEY (Required)

Vector database for storing and querying embeddings. The free Starter plan is sufficient.

Browserless.io Token (Optional)

Headless Chrome for URL scraping. Only needed if you use the web scraping feature.


Installation

1. Clone or unzip the project

If you have GitHub repo access (included with all purchases):

bash
git clone fastrag.git
cd fastrag
Or unzip the downloaded ZIP file and open the folder in your terminal — both options are included with your purchase.
2. Install dependencies

bash
npm install
Seeing peer dependency warnings? Run npm install --legacy-peer-deps — common due to LangChain's rapid release cadence.

Environment Setup

Rename .env.example to .env.local and fill in your keys:

.env.local
# OpenAI — platform.openai.com/api-keys
OPENAI_API_KEY=sk-proj-...

# Pinecone — app.pinecone.io
PINECONE_API_KEY=pc-sk-...

# Must match the index name you create in Pinecone
PINECONE_INDEX=fast-rag

# Optional: only needed for URL scraping
BROWSERLESS_TOKEN=your-token-here

OPENAI_API_KEY (Required): Powers text-embedding-3-small (ingestion) and GPT-4o (chat).

PINECONE_API_KEY (Required): Used to upsert and query your vector index.

PINECONE_INDEX (Required): Must exactly match the index name in Pinecone, including case. "fast-rag" ≠ "Fast-RAG".

BROWSERLESS_TOKEN (Optional): Powers headless Chrome for scraping JS-rendered sites. Skip if not using URL ingestion.
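
A missing or blank key usually only surfaces as a runtime error on the first request, so it can help to check the variables up front. The following is a minimal sketch, not part of FastRAG itself: `findMissingEnvVars` and `REQUIRED_ENV_VARS` are hypothetical names, and the list of required variables is taken from the table above.

```javascript
// Hypothetical helper (not shipped with FastRAG): reports which required
// environment variables are missing or blank.
const REQUIRED_ENV_VARS = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX"];

function findMissingEnvVars(env) {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    // Treat undefined, non-string, and whitespace-only values as missing.
    return typeof value !== "string" || value.trim() === "";
  });
}

// Example: run once when the server starts.
const missing = findMissingEnvVars(process.env);
if (missing.length > 0) {
  console.warn(`Missing environment variables: ${missing.join(", ")}`);
}
```

BROWSERLESS_TOKEN is left out of the check on purpose, since it is optional unless URL ingestion is used.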


Pinecone Setup

Critical: This is the most common setup mistake. Using the wrong dimension setting will crash the app immediately on first ingestion.
1. Go to app.pinecone.io and sign in

2. Click "Create Index"

3. Use these exact settings:

Setting    | Value         | Note
Name       | fast-rag      | Must match PINECONE_INDEX in .env.local
Dimensions | 1024          | ⚠ Do NOT use the default 1536
Metric     | Cosine        | Required for semantic similarity
Cloud      | AWS us-east-1 | Recommended for lowest latency

4. Click Create, then wait ~30 seconds for the index to initialise

Why 1024 dimensions? FastRAG uses text-embedding-3-small forced to 1024 dims instead of the default 1536. This cuts Pinecone storage costs by ~33% with negligible quality loss.
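
The ~33% figure follows directly from the ratio of the two vector sizes. As a rough sketch of the arithmetic, assuming each dimension is stored as a 4-byte float and ignoring metadata and index overhead (both are assumptions, not Pinecone billing details):

```javascript
// Rough per-vector storage estimate. Assumes 4 bytes per dimension (float32);
// metadata and index overhead are ignored.
const BYTES_PER_DIMENSION = 4;

function vectorBytes(dimensions) {
  return dimensions * BYTES_PER_DIMENSION;
}

function storageSavings(reducedDims, defaultDims) {
  return 1 - vectorBytes(reducedDims) / vectorBytes(defaultDims);
}

const savings = storageSavings(1024, 1536);
console.log(`${vectorBytes(1536)} B -> ${vectorBytes(1024)} B per vector`); // 6144 B -> 4096 B per vector
console.log(`Savings: ${(savings * 100).toFixed(1)}%`); // Savings: 33.3%
```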

Running Locally

bash
npm run dev

Open http://localhost:3000 in your browser.

Quick test: Upload a small PDF (<1 MB), wait for the ingestion confirmation toast, then ask a question about its contents. If you get a cited answer, everything is wired up correctly.
How It Works

Architecture

FastRAG is a standard two-phase RAG pipeline. Ingestion happens once per document; retrieval happens on every chat message.

Ingestion (once per document)

Source (PDF / URL) → Parse (extract text) → Chunk (1 000 chars) → Embed (1 024 dims) → Store (Pinecone)

Retrieval (every message)

Question (user input) → Embed (same model) → Query (top-4 chunks) → Prompt (inject context) → Stream (GPT-4o)

Three Next.js API routes handle everything:

pages/api/ingest.js: PDF uploads, recursive chunking, and vector upsert
pages/api/ingest-url.js: Puppeteer scraping, text extraction, vectorization
pages/api/chat.js: Similarity search, prompt construction, GPT-4o stream
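
To make the retrieval phase concrete, here is a small in-memory sketch of what a cosine-similarity top-K query computes. It is illustrative only: in FastRAG the ranking is done by Pinecone, and `cosineSimilarity`, `topK`, and the record shape `{ id, values }` are hypothetical names chosen for this example. It assumes non-zero vectors of equal length.

```javascript
// Illustrative only: Pinecone performs this ranking server-side.
// Assumes both vectors are non-zero and have the same length.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored vector against the query and keep the k best matches.
function topK(queryVector, records, k = 4) {
  return records
    .map((record) => ({ id: record.id, score: cosineSimilarity(queryVector, record.values) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 3-dimensional vectors stand in for the real 1 024-dimensional embeddings.
const records = [
  { id: "chunk-a", values: [1, 0, 0] },
  { id: "chunk-b", values: [0.9, 0.1, 0] },
  { id: "chunk-c", values: [0, 1, 0] },
];
console.log(topK([1, 0, 0], records, 2).map((match) => match.id)); // chunk-a, then chunk-b
```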

PDF Ingestion

Handled by pages/api/ingest.js. Supports multiple files uploaded simultaneously via drag-and-drop.

1. Form Parsing: formidable handles the multipart upload and exposes file paths on the server filesystem.

2. Loading: LangChain's PDFLoader extracts raw text from each file, page by page, preserving order.

3. Splitting: RecursiveCharacterTextSplitter cuts text into 1 000-char chunks with 200-char overlap. The overlap preserves sentence context across chunk boundaries.

4. Embedding: text-embedding-3-small converts each chunk into a 1 024-dimensional vector via the OpenAI Embeddings API.

5. Storage: Vectors are upserted to Pinecone under a 'global' namespace so all documents are searched together in a single query.
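
The effect of the 200-char overlap in the splitting step can be seen with a simplified fixed-window chunker. This is a sketch of the overlap idea only, not LangChain's RecursiveCharacterTextSplitter, which additionally tries to break on paragraph and sentence separators; `chunkWithOverlap` is a hypothetical name.

```javascript
// Simplified fixed-window chunker (NOT RecursiveCharacterTextSplitter).
// Each chunk starts (chunkSize - overlap) characters after the previous one,
// so consecutive chunks share `overlap` characters.
function chunkWithOverlap(text, chunkSize = 1000, overlap = 200) {
  if (overlap >= chunkSize) {
    throw new RangeError("overlap must be smaller than chunkSize");
  }
  const step = chunkSize - overlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

const sample = "x".repeat(2500);
const chunks = chunkWithOverlap(sample);
console.log(chunks.map((chunk) => chunk.length)); // [ 1000, 1000, 900 ]
```

With these defaults a 2 500-char document yields three chunks starting at offsets 0, 800, and 1 600, so the last 200 characters of one chunk reappear at the start of the next.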


URL Ingestion

Handled by pages/api/ingest-url.js. Paste any URL to scrape, clean, and vectorize it in seconds.

1. Headless Browser: puppeteer-core connects to Browserless.io, a remote Chromium instance that fully renders JavaScript before scraping. This handles JavaScript-rendered pages such as React, Next.js, and other SPA sites.

2. Extraction: Pulls the full body text after JS execution completes (waitUntil: 'networkidle2'), then strips navigation, footers, and boilerplate HTML.

3. Metadata: Tags each vector with the source URL so the AI can cite the exact page in responses.

Puppeteer scrapes a single page, not an entire site. It will not follow links or crawl multiple pages automatically. For multi-page ingestion, call the endpoint once per URL.
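
Calling the endpoint once per URL can be scripted with a short loop. The sketch below rests on assumptions that are not confirmed by the FastRAG source: that /api/ingest-url accepts a POST with a JSON body of the form `{ url }`, and that failures are reported with a non-2xx status. `ingestUrls` and its `fetchImpl` option are hypothetical names.

```javascript
// Assumptions (not confirmed by the FastRAG source): the endpoint accepts
// POST with a JSON body { url } and signals failure with a non-2xx status.
async function ingestUrls(urls, { endpoint = "/api/ingest-url", fetchImpl = fetch } = {}) {
  const results = [];
  for (const url of urls) {
    // Sequential on purpose: one scraping job at a time.
    try {
      const response = await fetchImpl(endpoint, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ url }),
      });
      results.push({ url, ok: response.ok, status: response.status });
    } catch (error) {
      // Network errors are recorded so the remaining URLs are still processed.
      results.push({ url, ok: false, error: String(error) });
    }
  }
  return results;
}
```

In the browser this could be called as `ingestUrls(["https://example.com/a", "https://example.com/b"])`; the returned array shows which pages were accepted and which need a retry.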

Chat & Retrieval

Handled by pages/api/chat.js. Every user message triggers a full retrieval cycle before GPT-4o is called.

1. Embed Question: The user's message is converted to a 1 024-dim vector using the same model as ingestion, so that cosine similarity scores are meaningful.

2. Pinecone Query: Top-4 matching chunks are retrieved via similarity search. The topK value is configurable in chat.js.

3. Prompt Construction: Retrieved chunks are injected into a system prompt that instructs the model to answer only from the provided context and to cite the source URL or filename.

4. Streaming: GPT-4o streams the response token-by-token via LangChainAdapter and Vercel AI SDK. Users see answers appear in real time, with no loading spinner needed.
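
The prompt construction step can be sketched as a plain string-building function. The prompt wording, the fallback instruction, and the chunk shape `{ text, source }` below are assumptions made for illustration; the actual prompt in chat.js may differ.

```javascript
// Hypothetical chunk shape: { text, source }, where source is a URL or filename.
// The instruction wording is illustrative, not FastRAG's actual prompt.
function buildSystemPrompt(chunks) {
  const context = chunks
    .map((chunk, i) => `[${i + 1}] (source: ${chunk.source})\n${chunk.text}`)
    .join("\n\n");
  return [
    "Answer using only the context below.",
    "If the context does not contain the answer, say you do not know.",
    "Cite the source URL or filename for each claim.",
    "",
    "Context:",
    context,
  ].join("\n");
}

const prompt = buildSystemPrompt([
  { text: "Refunds are available within 30 days.", source: "policy.pdf" },
  { text: "Support is open on weekdays.", source: "https://example.com/help" },
]);
console.log(prompt);
```

Numbering each chunk and placing its source on the same line gives the model an unambiguous label to cite.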


Frontend

Lives entirely in pages/index.js. A single-page chat interface with two ingestion modes.

File Upload Mode: Drag-and-drop or click to upload PDFs. Multiple files supported simultaneously. Calls /api/ingest.
URL Mode: Paste any URL to scrape and ingest it. Calls /api/ingest-url. Requires a Browserless token.
useChat Hook: ai/react's useChat handles SSE streaming, message state, loading state, and auto-scroll.
Mobile Ready: Fully responsive layout. Tested natively on iOS Safari and Android Chrome.
Deployment

Deploy to Vercel

FastRAG is optimised for Vercel. Deployment takes about 5 minutes from a fresh clone.

1. Push your code to GitHub

bash
git init && git add .
git commit -m "initial"
# Make sure the local branch is named main before pushing
git branch -M main
git remote add origin https://github.com/you/fastrag.git
git push -u origin main
2. Import to Vercel

Go to vercel.com/new, import your GitHub repo, and select Next.js as the framework preset. No further configuration needed.

3. Add environment variables

In Vercel project → Settings → Environment Variables, add all keys from your .env.local:

OPENAI_API_KEY
PINECONE_API_KEY
PINECONE_INDEX
BROWSERLESS_TOKEN

4. Click Deploy. The site is live in ~2 minutes.


Vercel's free Hobby plan has a 10-second function timeout. Large PDFs or slow Puppeteer jobs may time out. Upgrade to Pro ($20/mo) for a 60-second limit.
Reference

Troubleshooting



FAQ