Embeddable RAG FAQ-Based AI Chatbot Widget - JavaScript, Cloudflare Workers, Vectorize, BGE Model, Llama 3 Full-Stack Project
A production-ready, embeddable AI chatbot widget powered by Cloudflare Workers, featuring RAG (Retrieval Augmented Generation), real-time streaming responses, and zero-dependency client-side implementation.
Production-Live: https://arnob-mahmud.vercel.app/
- Overview
- Features
- Technology Stack
- Project Structure
- How It Works
- Installation & Setup
- Environment Variables
- Deployment
- Usage
- API Endpoints
- Components & Architecture
- Reusing Components
- Code Examples
- Keywords
- Conclusion
This project is a fully functional, embeddable AI chatbot widget that can be integrated into any website. It leverages Cloudflare's edge computing infrastructure to provide fast, scalable AI-powered conversations with RAG capabilities for accurate, context-aware responses.
- Zero Dependencies: Client-side widget is pure vanilla JavaScript
- Edge Computing: Runs on Cloudflare Workers for global low-latency
- RAG Implementation: Semantic search using Vectorize for accurate FAQ retrieval
- Real-time Streaming: Server-Sent Events (SSE) for progressive response rendering
- Session Management: Persistent conversations using Cloudflare KV
- Mobile Responsive: Optimized for all screen sizes with keyboard handling
- Dark/Light Mode: Automatic theme detection with manual toggle
- RAG (Retrieval Augmented Generation): Semantic search through FAQ database
- Streaming AI Responses: Real-time token streaming using Workers AI (Llama 3)
- Session Persistence: 30-day conversation history stored in Cloudflare KV
- CORS Support: Cross-origin requests enabled for embedding
- Aggressive Caching: Static assets cached for 1 year for optimal performance
- Health Check Endpoint: Monitoring and status verification
- Embeddable Widget: Single script tag integration
- Inline Styles: No external CSS dependencies required
- Dark/Light Mode: System preference detection with manual toggle
- Mobile Optimized: Responsive design with keyboard-aware positioning
- Progressive Rendering: Messages appear as they stream
- Typing Indicator: Visual feedback during AI processing
- Menu System: Theme toggle and chat clearing
- Accessibility: Proper ARIA labels and keyboard navigation
- Cloudflare Workers: Serverless edge computing platform
- Workers AI: AI model inference (Llama 3, BGE embeddings)
- Vectorize: Vector database for semantic search
- Cloudflare KV: Key-value storage for sessions
- Server-Sent Events (SSE): Real-time streaming protocol
- Vanilla JavaScript: Zero dependencies, pure ES6+
- Inline CSS: Portable styling without external frameworks
- Tailwind CSS: Used only for demo page (optional)
- DOM API: Native browser APIs for maximum compatibility
- Wrangler CLI: Cloudflare Workers development and deployment
- Node.js: Runtime environment
- npm: Package management
cloudflare-chatbot-widget/
├── src/
│   ├── index.js           # Main Worker entry point (API routes, RAG, streaming)
│   └── input.css          # Tailwind CSS source (for demo page)
├── public/
│   ├── index.html         # Demo page showcasing the widget
│   ├── widget.js          # Embeddable widget script (vanilla JS)
│   └── styles.css         # Compiled CSS (Tailwind + widget styles)
├── wrangler.jsonc         # Cloudflare Workers configuration
├── tailwind.config.js     # Tailwind CSS configuration
├── package.json           # Dependencies and scripts
├── .gitignore             # Git ignore rules
└── README.md              # This file

- src/index.js: Main Worker code handling all API routes, RAG logic, streaming, and asset serving
- public/widget.js: Self-contained embeddable widget script (577 lines, fully commented)
- public/index.html: Demo page demonstrating widget integration
- public/styles.css: Compiled CSS including Tailwind utilities and widget-specific styles
- wrangler.jsonc: Cloudflare bindings configuration (KV, Vectorize, AI, ASSETS)
User Input → Widget (widget.js)
    ↓
POST /api/chat → Cloudflare Worker (src/index.js)
    ↓
1. Extract/Generate Session ID
2. Retrieve FAQ Context (RAG)
   - Generate embedding vector
   - Query Vectorize index
   - Get top 3 relevant FAQs
3. Build AI Message Array
   - System prompt + FAQ context
   - Last 10 conversation messages
4. Stream AI Response
   - Workers AI (Llama 3)
   - Transform stream
   - Parse SSE format
5. Save to KV Storage
    ↓
SSE Stream → Widget
    ↓
Progressive Rendering → User sees the response in real time

- Question Embedding: The user's question is converted to a 768-dimensional vector using the BGE model
- Semantic Search: Vectorize index is queried for similar vectors (cosine similarity)
- Context Retrieval: Top 3 most relevant FAQ entries are retrieved
- Context Injection: FAQs are formatted and injected into the AI system prompt
- AI Generation: Llama 3 generates a response using the FAQ context and conversation history (see the sketch below)
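The context-injection step can be pictured with a short sketch. The message shapes follow the flow above; the helper name buildMessages and the exact prompt wording are illustrative, not the project's actual code:

// Sketch of the context-injection step: put the retrieved FAQ context into
// the system prompt, then append recent history and the new question.
function buildMessages(faqContext, history, userMessage) {
  return [
    {
      role: "system",
      content: `You are a helpful support assistant. Use these FAQs when relevant:\n\n${faqContext}`,
    },
    ...history.slice(-10), // last 10 conversation messages
    { role: "user", content: userMessage },
  ];
}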
- Session Creation: New sessions get a UUID-based session ID
- Cookie Storage: Session ID stored in HttpOnly cookie (30-day expiration)
- KV Storage: Full conversation history stored in Cloudflare KV
- Session Retrieval: Existing sessions load conversation history on widget initialization (see the sketch below)
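A minimal sketch of this flow, assuming the CHAT_SESSIONS binding from wrangler.jsonc (the helper names are illustrative):

// Sketch of the session flow: reuse the session cookie if present,
// otherwise mint a new UUID, and persist the conversation in KV.
const THIRTY_DAYS = 60 * 60 * 24 * 30; // KV TTL in seconds

function getSessionId(request) {
  const cookie = request.headers.get("Cookie") || "";
  const match = cookie.match(/chatbot_session=([\w-]+)/);
  return match ? match[1] : crypto.randomUUID();
}

async function saveSession(env, sessionId, messages) {
  await env.CHAT_SESSIONS.put(sessionId, JSON.stringify({ messages }), {
    expirationTtl: THIRTY_DAYS, // 30-day expiration
  });
}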
- Node.js 18+ installed
- Cloudflare account with Workers plan
- Wrangler CLI installed globally (or use npx)
git clone <repository-url>
cd cloudflare-chatbot-widget

npm install

This installs:
- wrangler: Cloudflare Workers CLI
- tailwindcss: CSS framework (for the demo page)
- autoprefixer & postcss: CSS processing
npx wrangler kv namespace create CHAT_SESSIONS

Copy the namespace ID and update wrangler.jsonc:
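The kv_namespaces block mirrors the full configuration shown later in this README:

"kv_namespaces": [
  {
    "binding": "CHAT_SESSIONS",
    "id": "YOUR_KV_NAMESPACE_ID"
  }
]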
npx wrangler vectorize create faq-vectors \
--dimensions=768 \
--metric=cosine

Update wrangler.jsonc:
"vectorize": [
{
"binding": "VECTORIZE",
"index_name": "faq-vectors"
}
]

Workers AI is automatically available in your Worker. No additional setup is needed.
npm run build:css

This compiles Tailwind CSS and appends widget-specific styles.
npm run deploy

Or manually:

npx wrangler deploy

After deployment, populate the Vectorize index:

curl -X POST https://your-worker.workers.dev/api/seed

This generates embeddings for all FAQs and stores them in Vectorize.
This project uses Cloudflare bindings configured in wrangler.jsonc. No .env file is needed for production.
For local development, create .dev.vars (git-ignored):
# .dev.vars (optional, for local testing)
# Most bindings are configured in wrangler.jsonc
KV Namespace (CHAT_SESSIONS)
- Purpose: Store conversation sessions
- TTL: 30 days
- Created via: wrangler kv namespace create
Vectorize Index (VECTORIZE)
- Purpose: Store FAQ embeddings for RAG
- Dimensions: 768 (BGE model output)
- Metric: cosine similarity
- Created via: wrangler vectorize create
Workers AI (AI)
- Purpose: Run AI models (Llama 3, BGE embeddings)
- Automatically available: no setup needed
Assets (ASSETS)
- Purpose: Serve static files (widget.js, styles.css, index.html)
- Directory: ./public
- Automatically configured
{
"name": "ai-chatbot-widget",
"main": "src/index.js",
"compatibility_date": "2025-12-23",
"assets": {
"directory": "./public",
"binding": "ASSETS",
},
"ai": {
"binding": "AI",
},
"vectorize": [
{
"binding": "VECTORIZE",
"index_name": "faq-vectors",
},
],
"kv_namespaces": [
{
"binding": "CHAT_SESSIONS",
"id": "YOUR_KV_NAMESPACE_ID",
},
],
}

npm run dev

Starts a local development server with hot reload.
npm run deploy

This:
- Builds CSS (npm run build:css)
- Deploys the Worker to Cloudflare (wrangler deploy)
- Seed FAQ Data: curl -X POST https://your-worker.workers.dev/api/seed
- Verify Health: curl https://your-worker.workers.dev/api/health
- Test Widget: visit https://your-worker.workers.dev to see the demo page.
Add these lines to your HTML:
<!-- Optional: Configure widget -->
<script>
window.CHATBOT_BASE_URL = "https://your-worker.workers.dev";
window.CHATBOT_TITLE = "Support Assistant";
window.CHATBOT_GREETING = "Hi! 👋 How can I help you today?";
window.CHATBOT_PLACEHOLDER = "Type your message...";
</script>
<!-- Load widget script -->
<script src="https://your-worker.workers.dev/widget.js"></script>

| Variable | Default | Description |
|---|---|---|
| CHATBOT_BASE_URL | window.location.origin | API endpoint URL |
| CHATBOT_TITLE | 'Chat Assistant' | Widget header title |
| CHATBOT_GREETING | '👋 How can I help you today?' | Initial greeting message |
| CHATBOT_PLACEHOLDER | 'Message...' | Input field placeholder |
import { useEffect } from "react";
function App() {
useEffect(() => {
// Configure widget
window.CHATBOT_BASE_URL = "https://your-worker.workers.dev";
window.CHATBOT_TITLE = "AI Assistant";
// Load widget script
const script = document.createElement("script");
script.src = "https://your-worker.workers.dev/widget.js";
script.async = true;
document.body.appendChild(script);
return () => {
// Cleanup (optional)
document.body.removeChild(script);
};
}, []);
return <div>Your app content</div>;
}

// app/layout.tsx
export default function RootLayout({ children }) {
return (
<html>
<head>
<script
dangerouslySetInnerHTML={{
__html: `
window.CHATBOT_BASE_URL = '${process.env.NEXT_PUBLIC_CHATBOT_URL}';
window.CHATBOT_TITLE = 'Support';
`,
}}
/>
</head>
<body>
{children}
<script src={`${process.env.NEXT_PUBLIC_CHATBOT_URL}/widget.js`} />
</body>
</html>
);
}

POST /api/chat

Sends a message and receives a streaming AI response.
Request:
{
"message": "Tell me about your services"
}

Response: Server-Sent Events (SSE) stream
data: {"response": "Hello"}
data: {"response": "! "}
data: {"response": "I can"}
...
data: [DONE]

Headers:
- Content-Type: application/json (request)
- Content-Type: text/event-stream (response)
- Set-Cookie: chatbot_session=... (new sessions)
GET /api/history

Retrieves conversation history for the current session.
Request: Cookie-based (no body needed)
Response:
{
"messages": [
{
"role": "user",
"content": "Hello",
"timestamp": 1234567890
},
{
"role": "assistant",
"content": "Hi! How can I help?",
"timestamp": 1234567891
}
]
}

Headers:
- Cookie: chatbot_session=<session-id>
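For example, the widget can load prior history on startup with a plain fetch; credentials: "include" ensures the session cookie travels on cross-origin embeds (a sketch, not the exact widget code):

// Sketch: load the prior conversation when the widget initializes.
async function loadHistory(baseUrl) {
  const res = await fetch(`${baseUrl}/api/history`, {
    credentials: "include", // send the chatbot_session cookie cross-origin
  });
  if (!res.ok) return [];
  const { messages } = await res.json();
  return messages; // [{ role, content, timestamp }, ...]
}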
POST /api/seed

Populates the Vectorize index with FAQ embeddings.
Request: No body required
Response:
{
"success": true,
"count": 20
}

Note: Run this once after deployment to populate the knowledge base.
GET /api/health

Health check endpoint for monitoring.
Response:
{
"status": "ok"
}

- GET /widget.js: Embeddable widget script
- GET /styles.css: Widget stylesheet
- GET /index.html: Demo page
- GET /: Serves index.html
All static assets are served with aggressive caching headers (1 year).
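A sketch of how the Worker can attach that header when proxying the ASSETS binding (the helper name is illustrative, not the project's actual function):

// Sketch: serve a static asset with a one-year cache header.
async function serveAsset(request, env) {
  const response = await env.ASSETS.fetch(request);
  const headers = new Headers(response.headers);
  headers.set("Cache-Control", "public, max-age=31536000, immutable"); // 1 year
  return new Response(response.body, { status: response.status, headers });
}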
Location: src/index.js
Purpose: Implements Retrieval Augmented Generation
Process:
async function faq(env, q) {
// 1. Generate embedding
const embedding = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
text: [q],
});
// 2. Query Vectorize
const results = await env.VECTORIZE.query(embedding.data[0], {
topK: 3,
returnMetadata: "all",
});
// 3. Format context
return results.matches
.map((m) => `Q: ${m.metadata.question}\nA: ${m.metadata.answer}`)
.join("\n\n");
}

Reusability: Can be extracted to a separate module for use in other Workers.
Location: src/index.js
Purpose: Handles chat requests with streaming
Key Features:
- Session management
- RAG context injection
- SSE streaming
- KV persistence
Reusability: The streaming pattern can be adapted for other AI models.
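A simplified sketch of that pattern: with stream: true, Workers AI returns a ReadableStream of SSE bytes that can be passed straight to the client. The model ID is one of the Llama 3 variants on Workers AI, faq() is the RAG function described below, and the rest (including the omitted session handling) is illustrative rather than the full handler:

// Simplified sketch of the streaming chat flow.
async function chatSketch(request, env) {
  const { message } = await request.json();
  const context = await faq(env, message); // RAG context (see below)

  const aiStream = await env.AI.run("@cf/meta/llama-3-8b-instruct", {
    messages: [
      { role: "system", content: `Answer using this FAQ context:\n${context}` },
      { role: "user", content: message },
    ],
    stream: true, // returns a ReadableStream of SSE-formatted bytes
  });

  // Pass the SSE stream through to the client unchanged.
  return new Response(aiStream, {
    headers: { "Content-Type": "text/event-stream" },
  });
}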
Location: src/index.js
Purpose: Populates Vectorize index
Process:
- Iterate through FAQ array
- Generate embeddings in parallel
- Upsert to Vectorize index
Reusability: FAQ array can be externalized to a JSON file or database.
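A sketch of that process; the FAQ array shape matches the customization example later in this README, and the function name is illustrative:

// Sketch: embed each FAQ question and upsert the vectors into Vectorize.
async function seedSketch(env, faqs) {
  const vectors = await Promise.all(
    faqs.map(async ([question, answer], i) => {
      const embedding = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
        text: [question],
      });
      return {
        id: `faq-${i}`,
        values: embedding.data[0], // 768-dimensional BGE vector
        metadata: { question, answer }, // retrieved at query time
      };
    })
  );
  await env.VECTORIZE.upsert(vectors);
  return vectors.length;
}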
Location: public/widget.js
Purpose: Creates DOM elements and sets up widget
Key Features:
- Creates floating action button
- Creates chat window
- Applies responsive styles
- Binds event handlers
Reusability: The initialization pattern can be adapted for other embeddable widgets.
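The pattern, reduced to its essentials (an illustrative sketch, not the actual widget.js code):

// Sketch: a floating action button and chat window built entirely with
// inline styles, so no external CSS is required.
function initSketch() {
  const win = document.createElement("div"); // chat window placeholder
  win.style.cssText = "position:fixed;bottom:6rem;right:1.5rem;display:none;";

  const btn = document.createElement("button");
  btn.setAttribute("aria-label", "Open chat"); // accessibility
  btn.style.cssText =
    "position:fixed;bottom:1.5rem;right:1.5rem;width:3.5rem;height:3.5rem;" +
    "border-radius:9999px;border:none;cursor:pointer;z-index:99999;";
  btn.addEventListener("click", () => {
    win.style.display = win.style.display === "none" ? "block" : "none";
  });

  document.body.append(win, btn);
}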
Location: public/widget.js
Purpose: Manages dark/light mode
Implementation:
- Toggles a dark class on the container
- Updates inline styles for all elements
- Preserves state across sessions
Reusability: Theme logic can be extracted to a standalone utility.
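In outline, the logic might look like this (illustrative; the real widget also rewrites inline styles on each toggle):

// Sketch: pick the initial theme from the system preference, then let a
// manual toggle override it.
function initTheme(container) {
  const prefersDark = window.matchMedia("(prefers-color-scheme: dark)").matches;
  container.classList.toggle("dark", prefersDark);
}

function toggleTheme(container) {
  container.classList.toggle("dark");
}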
Location: public/widget.js
Purpose: Handles SSE streaming and progressive rendering
Process:
- Send POST request
- Read SSE stream
- Parse data: {...} lines
- Update DOM incrementally
Reusability: SSE handling can be extracted to a reusable utility function.
// Copy faq() function from src/index.js
// Adapt to your embedding model and vector database
async function customRAG(env, question) {
const embedding = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
text: [question],
});
const results = await env.VECTORIZE.query(embedding.data[0], {
topK: 5, // Adjust as needed
returnMetadata: "all",
});
return results.matches.map((m) => m.metadata);
}

To reuse the widget on its own:

- Copy widget.js to your project
- Update the API URL: window.CHATBOT_BASE_URL = "https://your-api.com";
- Customize styling by modifying the inline styles in init()
- Add it to your HTML: <script src="/path/to/widget.js"></script>
// Reusable SSE handler
async function streamResponse(url, data, onChunk) {
const response = await fetch(url, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(data),
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const lines = decoder.decode(value, { stream: true }).split("\n");
for (const line of lines) {
if (line.startsWith("data: ")) {
const payload = line.slice(6);
if (payload === "[DONE]") continue; // skip the end-of-stream marker
const json = JSON.parse(payload);
onChunk(json);
}
}
}
}
// Usage
streamResponse("/api/chat", { message: "Hello" }, (chunk) => {
console.log(chunk.response);
});

Modify the seed() function to use your own FAQs:
const faqs = [
["What is your return policy?", "We offer 30-day returns..."],
["How do I track my order?", "You can track your order..."],
// Add more FAQs
];

Switch to a different Workers AI model:
// In chat() function
const stream = await env.AI.run("@cf/meta/llama-3-70b-instruct", {
messages: msgs,
stream: true,
});

Use a different storage backend:
// Replace KV with your own storage
await yourDatabase.save(sessionId, sessionData);

Modify widget appearance in widget.js:
// In init() function, update inline styles
btn.style.cssText = `
position: fixed !important;
bottom: 2rem !important;
right: 2rem !important;
width: 4rem !important;
height: 4rem !important;
background-color: #your-color !important;
// ... more styles
`;

- RAG (Retrieval Augmented Generation)
- Cloudflare Workers
- Server-Sent Events (SSE)
- Vector Database
- Semantic Search
- Embeddable Widget
- Vanilla JavaScript
- Edge Computing
- AI Chatbot
- Streaming Responses
- Session Management
- Dark Mode
- Mobile Responsive
- Zero Dependencies
- Workers AI
- Vectorize
- Cloudflare KV
- Embeddings
- BGE Model
- Llama 3
This project demonstrates a production-ready implementation of an AI chatbot widget with RAG capabilities, built entirely on Cloudflare's edge computing platform. It showcases:
- Modern Architecture: Serverless, edge-first design
- Best Practices: Clean code, comprehensive comments, proper error handling
- Performance: Aggressive caching, streaming responses, optimized assets
- Portability: Zero-dependency client-side code, inline styles
- Scalability: Cloudflare's global network ensures low latency worldwide
The codebase is well-documented and structured for easy understanding and extension. Each component can be reused independently in other projects.
Feel free to use this repository and extend the project further!
If you have any questions or want to share your work, reach out via GitHub or my portfolio at https://arnob-mahmud.vercel.app/.
Enjoy building and learning! 🚀
Thank you! 😊