Deep dive
Everything included.
Nothing hidden.
No black boxes. No magic abstractions. Every feature is yours to read, customize, and extend. Here's exactly how each one works.
Web Scraping
Web Scraping Engine
Turn any URL into searchable knowledge in seconds.
FastRAG's scraping pipeline uses Puppeteer connected to Browserless.io — a cloud-hosted headless Chrome service that's safe to run in serverless environments like Vercel.
Unlike a plain HTTP fetch, the headless browser executes JavaScript and renders the page before scraping. React apps, Next.js docs sites, Nuxt sites, and other SPAs are captured with their dynamically loaded content included.
After scraping, raw HTML is cleaned, boilerplate is stripped, and the resulting text is chunked using LangChain's RecursiveCharacterTextSplitter before being vectorized.
- Renders SPAs and JavaScript-heavy sites perfectly
- Handles bot detection better than raw fetch
- Strips nav, footers, and boilerplate automatically
- Configurable chunk size and overlap
- Respects robots.txt and rate limits
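    // pages/api/ingest-url.js
    // Assumed package; 'puppeteer' exposes the same connect() call
    import puppeteer from 'puppeteer-core';

    // Connect to the remote headless Chrome instance hosted by Browserless
    const browser = await puppeteer.connect({
      browserWSEndpoint: process.env.BROWSERLESS_URL,
    });
    try {
      const page = await browser.newPage();
      // Wait for network activity to settle so client-rendered content is present
      await page.goto(url, { waitUntil: 'networkidle2' });
      const html = await page.content();
      const text = await cleanHTML(html); // strips boilerplate
      // createDocuments returns Document objects, which addDocuments expects
      const chunks = await splitter.createDocuments([text]);
      await vectorStore.addDocuments(chunks, { namespace });
    } finally {
      await browser.close(); // release the remote browser session
    }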
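The `cleanHTML` helper is referenced above but not shown. As a rough illustration of what a boilerplate-stripping step could look like, here is a minimal regex-based sketch. The tag list and the function body are assumptions for illustration, not FastRAG's actual implementation, and a production version would more likely use a DOM parser.

```javascript
// Hypothetical sketch of a boilerplate-stripping step; not FastRAG's actual cleanHTML.
// Assumption: dropping these container tags removes most navigation chrome.
const BOILERPLATE_TAGS = ['script', 'style', 'noscript', 'nav', 'header', 'footer', 'aside'];

function cleanHTML(html) {
  let out = html;
  for (const tag of BOILERPLATE_TAGS) {
    // Remove the element together with everything inside it (non-greedy, case-insensitive).
    const pattern = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, 'gi');
    out = out.replace(pattern, ' ');
  }
  return out
    .replace(/<!--[\s\S]*?-->/g, ' ') // HTML comments
    .replace(/<[^>]+>/g, ' ')         // remaining tags, keeping their text
    .replace(/&nbsp;/g, ' ')          // two common entities only; others pass through
    .replace(/&amp;/g, '&')
    .replace(/\s+/g, ' ')             // collapse runs of whitespace
    .trim();
}
```

Because the matching is non-greedy, nested elements of the same tag (a `nav` inside a `nav`) are not handled correctly, which is one reason to prefer a real parser for anything beyond simple pages.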
PDF Ingestion
Multi-File PDF Ingestion
Drag, drop, and start chatting with any document.
The PDF pipeline accepts up to 10 files per upload, each up to 10 MB. Text is extracted with pdf-parse, then passed through LangChain's document loaders for consistent chunking.
Each file is tagged with its source metadata so the chat interface can cite exactly which document — and which page — an answer came from. Multiple PDFs are merged into a single namespace, so users can ask cross-document questions naturally.
- Process up to 10 PDFs in a single upload
- Per-page metadata for accurate source citations
- Cross-document querying in one conversation
- Handles scanned PDFs via OCR (optional)
- Automatic deduplication of repeated content
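The per-page source citations described above could be assembled along these lines. This is a sketch under assumptions: `formatCitations` is a hypothetical helper, and it assumes each retrieved chunk carries `metadata.source` (the file name) and `metadata.loc.pageNumber`, which is the shape LangChain's PDF loader is expected to produce.

```javascript
// Hypothetical helper: turn retrieved chunks into de-duplicated citation labels.
// Assumption: each chunk has metadata.source and, when known, metadata.loc.pageNumber.
function formatCitations(chunks) {
  const seen = new Set();
  const citations = [];
  for (const { metadata } of chunks) {
    const page = metadata.loc?.pageNumber;
    const label = page ? `${metadata.source}, p. ${page}` : metadata.source;
    if (!seen.has(label)) {
      seen.add(label); // several chunks from the same page collapse into one citation
      citations.push(label);
    }
  }
  return citations;
}
```

Citations come back in retrieval order, so the most relevant source is listed first when the chunks are sorted by similarity score.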
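    // lib/ingest-pdf.js
    // Import paths vary by LangChain version; these match the split @langchain/* packages
    import { PDFLoader } from '@langchain/community/document_loaders/fs/pdf';
    import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';

    const loader = new PDFLoader(filePath); // one Document per page by default
    const docs = await loader.load();

    const splitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });
    const chunks = await splitter.splitDocuments(docs);

    // Metadata lives on each Document; addDocuments only takes ids and namespace
    for (const chunk of chunks) {
      Object.assign(chunk.metadata, { source: filename, type: 'pdf' });
    }
    await vectorStore.addDocuments(chunks, { namespace: userId });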
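To make the `chunkSize` and `chunkOverlap` settings concrete, here is a simplified fixed-window chunker. It is deliberately not LangChain's recursive algorithm, which prefers to split on separators such as paragraph breaks. It only illustrates how overlap makes neighboring chunks share text. The function name is an assumption for illustration.

```javascript
// Simplified fixed-window chunker, shown only to illustrate chunkSize/chunkOverlap.
// Not LangChain's RecursiveCharacterTextSplitter: it ignores separators entirely.
function chunkWithOverlap(text, chunkSize = 1000, chunkOverlap = 200) {
  if (chunkOverlap >= chunkSize) {
    throw new Error('chunkOverlap must be smaller than chunkSize');
  }
  const step = chunkSize - chunkOverlap; // how far the window advances each time
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}
```

With the defaults, a 2,500-character document yields three chunks starting at offsets 0, 800, and 1,600, each sharing 200 characters with its neighbor. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.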
Vector Optimization
Smart Vector Optimization
Smaller vectors, lower Pinecone storage costs, minimal quality loss.
OpenAI's text-embedding-3 models accept a dimensions parameter that shortens the output vector. FastRAG requests 1024-dimensional output. That is 33% smaller than a 1536-dimensional vector (the default for text-embedding-3-small) and about 67% smaller than the 3072-dimensional default of text-embedding-3-large, the model FastRAG uses, with minimal loss of semantic quality.
This optimization compounds at scale. If you're storing millions of chunks, the savings on Pinecone storage are substantial. FastRAG also configures the index with the cosine similarity metric out of the box, a standard choice for semantic search with OpenAI embeddings.
- text-embedding-3-large with 1024-dim output
- Vectors 33% smaller than a 1536-dim baseline (67% smaller than the model's 3072-dim default)
- Cosine similarity metric configured correctly
- Batched embedding API calls to reduce latency
- Automatic retry on rate limit errors
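The batching and retry behavior listed above can be sketched roughly as follows. Everything here is an illustrative assumption rather than FastRAG's code: `embedBatch` is a caller-supplied async function that maps an array of texts to an array of vectors, rate-limit errors are assumed to carry `status === 429`, and the batch size, retry count, and delays are placeholder defaults.

```javascript
// Hypothetical sketch of batched embedding calls with retry on HTTP 429.
// Assumptions: embedBatch(texts) resolves to one vector per text, and
// rate-limit errors expose a numeric `status` property equal to 429.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const rateLimited = Boolean(err) && err.status === 429;
      if (!rateLimited || attempt >= retries) throw err; // give up or not retryable
      await sleep(baseDelayMs * 2 ** attempt); // exponential backoff between attempts
    }
  }
}

async function embedInBatches(texts, embedBatch, { batchSize = 100, ...retryOptions } = {}) {
  const vectors = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    // One API call per batch instead of one per chunk
    vectors.push(...(await withRetry(() => embedBatch(batch), retryOptions)));
  }
  return vectors;
}
```

Batches are sent sequentially here to stay gentle on rate limits. Sending several batches concurrently would lower latency at the cost of hitting the limit sooner.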
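    // lib/embeddings.js
    // Import paths vary by LangChain version; these match the split @langchain/* packages
    import { OpenAIEmbeddings } from '@langchain/openai';
    import { PineconeStore } from '@langchain/pinecone';

    const embeddings = new OpenAIEmbeddings({
      model: 'text-embedding-3-large',
      dimensions: 1024, // ← the cost-saving trick; must match the Pinecone index dimension
    });

    const vectorStore = await PineconeStore.fromExistingIndex(embeddings, {
      pineconeIndex,
      namespace: userId,
    });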
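To put numbers on the storage claim, here is a back-of-the-envelope estimate. It assumes 4 bytes per dimension (float32) and ignores metadata and index overhead, so treat the results as a rough guide to relative size rather than a prediction of a Pinecone bill. The one-million-chunk corpus is an illustrative figure.

```javascript
// Back-of-the-envelope estimate of raw vector storage.
// Assumption: 4 bytes per dimension (float32); metadata and index overhead are ignored.
const BYTES_PER_DIMENSION = 4;

function vectorStorageBytes(vectorCount, dimensions) {
  return vectorCount * dimensions * BYTES_PER_DIMENSION;
}

function savingsPercent(fromDimensions, toDimensions) {
  return (1 - toDimensions / fromDimensions) * 100;
}

const chunkCount = 1_000_000; // illustrative corpus size
console.log(vectorStorageBytes(chunkCount, 1536) / 1e9); // GB at 1536 dims
console.log(vectorStorageBytes(chunkCount, 1024) / 1e9); // GB at 1024 dims
console.log(savingsPercent(1536, 1024).toFixed(1));      // % reduction vs. 1536 dims
console.log(savingsPercent(3072, 1024).toFixed(1));      // % reduction vs. 3072 dims
```

Under these assumptions, one million chunks take about 6.1 GB at 1536 dimensions and about 4.1 GB at 1024, a 33.3% reduction. Against a 3072-dimensional baseline the reduction is 66.7%.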
Streaming
Real-Time Streaming Responses
Answers appear word by word. No waiting.
FastRAG uses the Vercel AI SDK's OpenAIStream and StreamingTextResponse helpers to pipe GPT-4o output to the client as a streamed HTTP response. Users see the answer appear token by token, which dramatically improves perceived speed.
The frontend uses the useChat hook from ai/react, which manages the streaming connection and appends incoming chunks to the message state, while the chat UI auto-scrolls as they arrive. No custom WebSocket setup required.
- Vercel AI SDK StreamingTextResponse on the server
- useChat hook on the client — one line of code
- Auto-scroll and smooth text rendering
- Works on Vercel Edge Runtime
- Graceful fallback if streaming is unsupported
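    // pages/api/chat.js
    import OpenAI from 'openai';
    import { OpenAIStream, StreamingTextResponse } from 'ai';

    const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

    // `context` (retrieved chunks) and `query` (the user's question) are prepared earlier in the handler
    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      stream: true, // ← enable streaming
      messages: buildMessages(context, query),
    });

    const stream = OpenAIStream(response); // convert the OpenAI stream into a ReadableStream
    return new StreamingTextResponse(stream);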
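For readers curious what consuming such a stream involves without the hook, here is a simplified sketch of reading a streamed text response by hand. It assumes the endpoint streams plain UTF-8 text and that the body is a web `ReadableStream`, as with `fetch()` in browsers and Node 18+. `readTextStream` is a hypothetical helper, not part of the Vercel AI SDK, and `useChat` does more than this (message state, input handling, error handling).

```javascript
// Simplified sketch of consuming a streamed text response on the client.
// Assumption: the body is a ReadableStream of UTF-8 bytes carrying plain text.
async function readTextStream(stream, onChunk) {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let fullText = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true buffers multi-byte characters that are split across chunks
    const piece = decoder.decode(value, { stream: true });
    fullText += piece;
    onChunk(piece, fullText); // e.g. re-render the message with the text so far
  }
  fullText += decoder.decode(); // flush any bytes still buffered
  return fullText;
}

// Usage (hypothetical endpoint, payload, and render function):
// const res = await fetch('/api/chat', { method: 'POST', body: JSON.stringify({ messages }) });
// await readTextStream(res.body, (_piece, text) => render(text));
```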
Ready to ship?
Get the full source code and start building your AI SaaS today.