Deep dive

Everything included.
Nothing hidden.

No black boxes. No magic abstractions. Every feature is yours to read, customize, and extend. Here's exactly how each one works.

Web Scraping

Web Scraping Engine

Turn any URL into searchable knowledge in seconds.

FastRAG's scraping pipeline uses Puppeteer connected to Browserless.io, a cloud-hosted headless Chrome service, so no browser binary has to run inside serverless environments like Vercel. Unlike a plain HTTP fetch, the headless browser executes the page's JavaScript before the HTML is captured. Content rendered client-side by React apps, Next.js docs sites, Nuxt, and other SPAs is therefore included rather than missed. After scraping, the raw HTML is cleaned, boilerplate is stripped, and the resulting text is chunked with LangChain's RecursiveCharacterTextSplitter before being embedded and stored.
  • Renders SPAs and JavaScript-heavy sites before extracting content
  • Handles bot detection better than raw fetch
  • Strips nav, footers, and boilerplate automatically
  • Configurable chunk size and overlap
  • Respects robots.txt and rate limits
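The chunking step relies on LangChain's RecursiveCharacterTextSplitter. To show the idea behind it, here is a simplified sketch. It is not LangChain's code, `recursiveSplit` is a hypothetical name, and it ignores chunk overlap. It splits on the coarsest separator first (blank lines, then newlines, then spaces). It only falls back to finer separators, and finally a hard character cut, for pieces that are still longer than `chunkSize`.

```javascript
// Simplified recursive character splitting (not LangChain's implementation).
// Ignores chunk overlap; separators go from coarsest to finest.
function recursiveSplit(text, chunkSize, separators = ['\n\n', '\n', ' ', '']) {
  if (text.length <= chunkSize) return [text];
  const [sep, ...finer] = separators;
  if (sep === undefined || sep === '') {
    // Last resort: cut every chunkSize characters.
    const pieces = [];
    for (let i = 0; i < text.length; i += chunkSize) {
      pieces.push(text.slice(i, i + chunkSize));
    }
    return pieces;
  }
  const chunks = [];
  let current = '';
  for (const piece of text.split(sep)) {
    const candidate = current ? current + sep + piece : piece;
    if (candidate.length <= chunkSize) {
      current = candidate; // still fits: keep merging small pieces
      continue;
    }
    if (current) chunks.push(current);
    if (piece.length <= chunkSize) {
      current = piece;
    } else {
      // This piece alone is too long: split it with the finer separators.
      chunks.push(...recursiveSplit(piece, chunkSize, finer));
      current = '';
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

For example, `recursiveSplit('aaa bbb\n\nccc ddd eee', 8)` keeps the first paragraph whole and splits the second at a space, giving `['aaa bbb', 'ccc ddd', 'eee']`. In production, LangChain's splitter also applies `chunkOverlap` and lets you configure the separator list.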
scraping.js
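// pages/api/ingest-url.js (excerpt)
import puppeteer from 'puppeteer-core';

// Inside the handler; url, namespace, cleanHTML, splitter and vectorStore are defined elsewhere.
const browser = await puppeteer.connect({
  browserWSEndpoint: process.env.BROWSERLESS_URL,
});
try {
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' }); // wait for JS-driven requests to settle
  const html = await page.content();
  const text = await cleanHTML(html); // strips boilerplate
  // createDocuments wraps each chunk in a Document, which addDocuments expects
  const chunks = await splitter.createDocuments([text], [{ source: url }]);
  await vectorStore.addDocuments(chunks, { namespace });
} finally {
  await browser.close(); // end the remote Browserless session
}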
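scraping.js calls a `cleanHTML` helper whose body isn't shown. Below is a minimal, hypothetical version that assumes a regex-based approach is acceptable. It removes elements that typically hold navigation, page chrome, or non-text content, then strips the remaining tags, decodes a handful of common entities, and collapses whitespace. Regexes don't handle nested elements of the same name or malformed markup, so a real implementation would more likely walk a parsed DOM.

```javascript
// Hypothetical cleanHTML sketch (regex-based, not FastRAG's actual helper).
// Elements whose entire contents are treated as boilerplate or non-text.
const BOILERPLATE_TAGS = ['script', 'style', 'noscript', 'nav', 'header', 'footer', 'aside'];

function cleanHTML(html) {
  // Drop HTML comments first so commented-out markup can't leak through.
  let text = html.replace(/<!--[\s\S]*?-->/g, ' ');
  for (const tag of BOILERPLATE_TAGS) {
    // <tag ...>...</tag>, case-insensitive, non-greedy, spanning newlines.
    const element = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, 'gi');
    text = text.replace(element, ' ');
  }
  return text
    .replace(/<[^>]+>/g, ' ') // remaining tags become word breaks
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&') // last, so "&amp;lt;" decodes to "&lt;" rather than "<"
    .replace(/\s+/g, ' ')
    .trim();
}
```

For `<nav>Home | Docs</nav><main><h1>Title</h1><p>Hello &amp; welcome</p></main><footer>(c) 2024</footer>`, this returns `Title Hello & welcome`.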
PDF Ingestion

Multi-File PDF Ingestion

Drag, drop, and start chatting with any document.

The PDF pipeline accepts up to 10 files per upload (up to 10MB each). Text is extracted with LangChain's PDFLoader, which uses pdf-parse under the hood and by default returns one document per page, and is then chunked with RecursiveCharacterTextSplitter. Each chunk is tagged with its source file and page number, so the chat interface can cite exactly which document, and which page, an answer came from. All of a user's PDFs go into a single namespace, so users can ask cross-document questions naturally.
  • Process up to 10 PDFs in a single upload
  • Per-page metadata for accurate source citations
  • Cross-document querying in one conversation
  • Handles scanned PDFs via OCR (optional)
  • Automatic deduplication of repeated content
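The upload limits above (10 files, 10MB each) are cheap to enforce before any parsing starts. This is a minimal sketch. It assumes file objects with `name`, `size` (in bytes), and `type` fields like the browser's `File`, and it reads "10MB" as 10 MiB. `validatePdfUploads` and its error messages are hypothetical.

```javascript
// Hypothetical pre-parse check for a PDF upload batch.
const MAX_FILES = 10;
const MAX_FILE_BYTES = 10 * 1024 * 1024; // assumes "10MB" means 10 MiB

// Returns a list of human-readable problems; an empty list means the batch is OK.
function validatePdfUploads(files) {
  const errors = [];
  if (files.length === 0) errors.push('No files uploaded.');
  if (files.length > MAX_FILES) {
    errors.push(`Too many files: ${files.length} (max ${MAX_FILES}).`);
  }
  for (const file of files) {
    const isPdf = file.type === 'application/pdf' || file.name.toLowerCase().endsWith('.pdf');
    if (!isPdf) errors.push(`${file.name}: not a PDF.`);
    if (file.size > MAX_FILE_BYTES) errors.push(`${file.name}: larger than 10 MB.`);
  }
  return errors;
}
```

Rejecting oversized or non-PDF files here keeps the slower extraction and embedding steps from starting on a batch that would fail anyway.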
pdf.js
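// lib/ingest-pdf.js (excerpt)
import { PDFLoader } from '@langchain/community/document_loaders/fs/pdf';
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';

// filePath, filename, userId and vectorStore are supplied by the caller.
const loader = new PDFLoader(filePath); // one Document per page; sets metadata.loc.pageNumber
const docs = await loader.load();
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,   // max characters per chunk
  chunkOverlap: 200, // characters shared by adjacent chunks
});
const chunks = await splitter.splitDocuments(docs); // keeps each page's metadata
// addDocuments has no metadata option, so tag each chunk directly.
for (const chunk of chunks) {
  chunk.metadata = { ...chunk.metadata, source: filename, type: 'pdf' };
}
await vectorStore.addDocuments(chunks, { namespace: userId });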
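Every chunk keeps its file name and page, so an answer's sources can be listed directly from the retrieved chunks. This is a sketch using a hypothetical `buildCitations` helper. It assumes `metadata.source` holds the file name set above and that the page number sits at `metadata.loc.pageNumber`, which is where LangChain's PDFLoader puts it. If the vector store flattened nested metadata on write, the page number is read from the key `'loc.pageNumber'` instead.

```javascript
// Builds a de-duplicated, retrieval-ordered list of "file, p. N" labels.
function buildCitations(chunks) {
  const seen = new Set();
  const citations = [];
  for (const { metadata = {} } of chunks) {
    const source = metadata.source ?? 'unknown source';
    // Nested form from PDFLoader, or the flattened key if the store flattened it.
    const page = metadata.loc?.pageNumber ?? metadata['loc.pageNumber'];
    const label = page === undefined ? source : `${source}, p. ${page}`;
    if (!seen.has(label)) {
      seen.add(label);
      citations.push(label);
    }
  }
  return citations;
}
```

Labels keep retrieval order and duplicates collapse, so three chunks from page 2 of `a.pdf` yield a single `a.pdf, p. 2`.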
Vector Optimization

Smart Vector Optimization

33% cheaper Pinecone storage with minimal quality loss.

OpenAI's text-embedding-3-large returns 3072-dimensional vectors by default, while text-embedding-3-small and the older text-embedding-ada-002 return 1536. FastRAG requests 1024-dim output through the dimensions parameter, a documented OpenAI feature of the text-embedding-3 models. That makes each vector 33% smaller than a 1536-dim one (and about 67% smaller than the model's 3072-dim default) with minimal semantic quality loss. The savings compound at scale: if you're storing millions of chunks, the reduction in Pinecone storage costs is substantial. FastRAG also configures the cosine similarity metric out of the box, which is what OpenAI recommends for comparing its embeddings.
  • text-embedding-3-large with forced 1024-dim output
  • 33% lower Pinecone storage costs than 1536-dim vectors
  • Cosine similarity metric configured correctly
  • Batched embedding API calls to reduce latency
  • Automatic retry on rate limit errors
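The savings follow from vector size. Assuming each component is stored as a 4-byte float32, raw vector storage grows linearly with dimension. The sketch below does that arithmetic while ignoring metadata, IDs, and index overhead, so it is not a billing estimate. Because text-embedding-3-large natively returns 3072 dimensions, it prints that baseline too. The function names are hypothetical.

```javascript
// Back-of-the-envelope raw vector storage, assuming 4-byte float32 components.
const BYTES_PER_FLOAT32 = 4;

function rawVectorBytes(dimensions, vectorCount) {
  return dimensions * BYTES_PER_FLOAT32 * vectorCount;
}

// Fractional reduction when going from fromDims to toDims.
function savings(fromDims, toDims) {
  return 1 - toDims / fromDims;
}

const million = 1_000_000;
for (const dims of [3072, 1536, 1024]) {
  const gb = rawVectorBytes(dims, million) / 1e9;
  console.log(`${dims} dims: ${gb.toFixed(3)} GB per million vectors`);
}
console.log(`1536 -> 1024: ${(savings(1536, 1024) * 100).toFixed(1)}% smaller`);
console.log(`3072 -> 1024: ${(savings(3072, 1024) * 100).toFixed(1)}% smaller`);
// Prints:
// 3072 dims: 12.288 GB per million vectors
// 1536 dims: 6.144 GB per million vectors
// 1024 dims: 4.096 GB per million vectors
// 1536 -> 1024: 33.3% smaller
// 3072 -> 1024: 66.7% smaller
```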
vectors.js
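// lib/embeddings.js (excerpt)
import { OpenAIEmbeddings } from '@langchain/openai';
import { PineconeStore } from '@langchain/pinecone';
import { Pinecone } from '@pinecone-database/pinecone';

const pinecone = new Pinecone(); // reads PINECONE_API_KEY from the environment
// The index must be created with dimension 1024 and the cosine metric.
const pineconeIndex = pinecone.Index(process.env.PINECONE_INDEX); // env var name is illustrative
const embeddings = new OpenAIEmbeddings({
  model: 'text-embedding-3-large',
  dimensions: 1024, // ← the cost-saving trick
});
// userId is supplied by the caller (one namespace per user)
const vectorStore = await PineconeStore.fromExistingIndex(
  embeddings,
  { pineconeIndex, namespace: userId }
);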
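Cosine similarity compares the direction of two vectors rather than their length. OpenAI embeddings are normalized to unit length, so for them cosine similarity and the dot product give the same value. The sketch below shows the metric alongside the manual alternative to the `dimensions` parameter described in OpenAI's embeddings guide: keep the first N components, then re-normalize. Both function names are hypothetical. With `dimensions: 1024`, the API already returns shortened vectors, so the second function is for illustration only.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new RangeError('vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // undefined for zero vectors; 0 by convention here
  return dot / Math.sqrt(normA * normB);
}

// Manual shortening: keep the first `dims` components, then rescale to unit length.
function shortenEmbedding(vector, dims) {
  const head = vector.slice(0, dims);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? head : head.map((x) => x / norm);
}
```

For example, `shortenEmbedding([3, 4, 12], 2)` returns `[0.6, 0.8]`, a unit vector, and `cosineSimilarity([1, 2], [2, 4])` is 1 because the vectors point the same way.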
Streaming

Real-Time Streaming Responses

Answers appear word by word. No waiting.

FastRAG uses the Vercel AI SDK's OpenAIStream and StreamingTextResponse helpers to pipe GPT-4o output to the client as a streamed HTTP response. Users see the answer appear token by token, which dramatically improves perceived speed. On the frontend, the useChat hook from ai/react reads that stream, appends each chunk to the message state, and manages the input field, so the whole client side comes down to one hook. The chat UI auto-scrolls as new text arrives. No custom WebSocket setup required. (Both server helpers belong to AI SDK 3.x; later releases remove them in favor of streamText.)
  • Vercel AI SDK StreamingTextResponse on the server
  • useChat hook on the client — one line of code
  • Auto-scroll and smooth text rendering
  • Works on Vercel Edge Runtime
  • Graceful fallback if streaming is unsupported
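useChat hides the streaming plumbing. For readers who want to see what's underneath, this sketch consumes a streamed response body with the web-standard `ReadableStream` reader and `TextDecoder`. It assumes the endpoint sends plain UTF-8 text. `readTextStream` is a hypothetical name, and this is not how useChat is implemented. Depending on the AI SDK version, the server may emit a prefixed data-stream format rather than plain text, in which case useChat's own parsing is the right tool.

```javascript
// Incrementally decodes a streamed UTF-8 text body and reports each piece.
// Assumes the endpoint returns plain text; resolves to the full text at the end.
async function readTextStream(stream, onText) {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let full = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters split across chunks intact
    const text = decoder.decode(value, { stream: true });
    if (text) {
      full += text;
      onText(text);
    }
  }
  const tail = decoder.decode(); // flush any buffered bytes
  if (tail) {
    full += tail;
    onText(tail);
  }
  return full;
}
```

In the browser, `const res = await fetch('/api/chat', { method: 'POST', body: JSON.stringify({ messages }) });` followed by `await readTextStream(res.body, (t) => { output.textContent += t; });` would append text to an element as it arrives. The request body shape and `output` element are assumptions.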
streaming.js
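// pages/api/chat.js
import OpenAI from 'openai';
import { OpenAIStream, StreamingTextResponse } from 'ai'; // AI SDK 3.x helpers
export const config = { runtime: 'edge' }; // lets a pages/api route return a web Response
const openai = new OpenAI(); // reads OPENAI_API_KEY

export default async function handler(req) {
  const { messages } = await req.json(); // useChat POSTs { messages }
  const query = messages[messages.length - 1].content;
  const context = await retrieveContext(query); // hypothetical vector-search helper
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    stream: true, // ← enable streaming
    messages: buildMessages(context, query),
  });
  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}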
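OpenAIStream and StreamingTextResponse turn the OpenAI SDK's stream into an HTTP response for you. Here is a dependency-free sketch of the same idea. It is not the AI SDK's implementation, and `tokensToStream` and `openAITokens` are hypothetical names. The generator pulls text deltas out of the completion chunks, and the stream wrapper encodes them as UTF-8 for a web-standard `Response`. It emits plain text, so the client must read it as plain text.

```javascript
// Turns an async iterable of text tokens into a ReadableStream of UTF-8 bytes,
// skipping empty tokens, so it can be used as a streamed Response body.
function tokensToStream(tokens) {
  const encoder = new TextEncoder();
  const iterator = tokens[Symbol.asyncIterator]();
  return new ReadableStream({
    async pull(controller) {
      for (;;) {
        const { done, value } = await iterator.next();
        if (done) {
          controller.close();
          return;
        }
        if (value) {
          controller.enqueue(encoder.encode(value));
          return;
        }
      }
    },
    async cancel() {
      await iterator.return?.(); // stop the upstream source if the client disconnects
    },
  });
}

// Adapts an OpenAI SDK chat-completion stream (stream: true) into text tokens.
// The first chunk usually carries only the assistant role, which yields ''.
async function* openAITokens(completionStream) {
  for await (const chunk of completionStream) {
    yield chunk.choices[0]?.delta?.content ?? '';
  }
}
```

In the handler above, the last two lines would become `return new Response(tokensToStream(openAITokens(response)), { headers: { 'Content-Type': 'text/plain; charset=utf-8' } });`.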

Ready to ship?

Get the full source code and start building your AI SaaS today.