RAG Strategy & Vector Operations
Version: 1.0 Last Updated: 2025-10-15 Owner: AI/ML Architecture Status: Future Enhancement (Post-MVP)
Table of Contents
- Overview
- pgvector Setup
- Embedding Generation
- Vector Storage Pattern
- Similarity Search Queries
- Index Strategy
- RAG Implementation Architecture
- Performance Considerations
Overview
Purpose
This document outlines the Retrieval-Augmented Generation (RAG) strategy for Zixly, enabling AI-powered insights from unstructured business data (invoices, contracts, meeting notes, etc.) via vector similarity search.
Use Cases
Phase 1 (Post-MVP):
- Semantic search on custom metrics (find similar operational patterns)
- Natural language querying of business data (“Show me projects over budget in Q3”)
Phase 2 (Year 2):
- Document intelligence (extract insights from uploaded PDFs, emails)
- Predictive analytics (forecast cash flow using historical embeddings)
- Automated report generation (summarize financial trends in plain English)
Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Vector Database | PostgreSQL + pgvector | Store and query embeddings |
| Embedding Model | OpenAI text-embedding-3-small (1536 dimensions) | Convert text → vectors |
| Distance Metric | Cosine similarity | Measure vector closeness (range: -1 to 1) |
| Query Interface | Raw SQL (Prisma doesn’t support pgvector natively) | Execute vector operations |
pgvector Setup
Extension Installation
Already Enabled in Phase 1: See Phase 1 Implementation Plan
-- Verify pgvector extension is enabled
SELECT * FROM pg_extension WHERE extname = 'vector';
-- Expected result:
-- extname | extversion | ...
-- vector | 0.5.1 | ...
If not enabled:
CREATE EXTENSION IF NOT EXISTS vector;
Supabase Note: Supabase includes pgvector by default in all projects (Sydney region included).
Vector Column in Schema
Already defined in Prisma schema:
model CustomMetric {
  // ... other fields
  embedding Unsupported("vector(1536)")? // OpenAI embedding dimension
}
Why Unsupported: Prisma doesn’t have native pgvector type support. We use Unsupported() to preserve the column in migrations, but must use raw SQL for vector operations.
Migration SQL:
-- Generated by Prisma (part of initial migration)
ALTER TABLE "custom_metrics" ADD COLUMN "embedding" vector(1536);
Embedding Generation
OpenAI API Integration
Model: text-embedding-3-small (1536 dimensions, $0.02/1M tokens)
Why this model:
- ✅ Latest OpenAI embedding model (released Jan 2024)
- ✅ 1536 dimensions (balance between quality and storage)
- ✅ 8191 token context window (handles long custom metric descriptions)
Implementation (lib/ai/embeddings.ts):
import OpenAI from 'openai'
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
})

export async function generateEmbedding(text: string): Promise<number[]> {
  // Validate input
  if (!text || text.trim().length === 0) {
    throw new Error('Embedding text cannot be empty')
  }

  // Truncate by characters to stay under the 8,191-token model limit (~4 chars per token)
  const truncated = text.slice(0, 32000)

  // Generate embedding
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: truncated,
    encoding_format: 'float',
  })

  if (response.data.length === 0) {
    throw new Error('OpenAI returned empty embedding')
  }

  return response.data[0].embedding
}
Usage Example:
import { generateEmbedding } from '@/lib/ai/embeddings'
// Generate embedding for custom metric
const metricDescription = `
Project Delivery Time: Average days from project kickoff to completion.
Target: 30 days. Current: 42 days.
Industry: Construction. Department: Operations.
`
const embedding = await generateEmbedding(metricDescription)
// Returns: number[] with 1536 values
Vector Storage Pattern
Storing Embeddings
Problem: Prisma doesn’t support the vector type natively.
Solution: Use Prisma $executeRawUnsafe for INSERT/UPDATE with embeddings.
Pattern 1: Insert with Embedding
import prisma from '@/lib/prisma'
import { generateEmbedding } from '@/lib/ai/embeddings'
async function createCustomMetricWithEmbedding(data: {
  clientKPIId: string
  metricName: string
  metricValue: number
  unit: string
  recordDate: Date
  description: string // For embedding generation
}) {
  // Generate embedding from description
  const embedding = await generateEmbedding(
    `${data.metricName}: ${data.description}. Value: ${data.metricValue} ${data.unit}`
  )

  // Convert embedding to PostgreSQL vector format
  const vectorString = `[${embedding.join(',')}]`

  // Insert with raw SQL (Prisma limitation)
  await prisma.$executeRawUnsafe(
    `
    INSERT INTO custom_metrics (
      id, client_kpi_id, metric_name, metric_value, unit, record_date, embedding, created_at
    ) VALUES (
      $1, $2, $3, $4, $5, $6, $7::vector, $8
    )
    `,
    crypto.randomUUID(), // id (generated here as a UUID, since raw SQL bypasses the schema's cuid() default)
    data.clientKPIId,
    data.metricName,
    data.metricValue,
    data.unit,
    data.recordDate,
    vectorString, // Cast to vector type
    new Date()
  )
}
Pattern 2: Update Existing Record with Embedding
async function addEmbeddingToExistingMetric(metricId: string, description: string) {
  const embedding = await generateEmbedding(description)
  const vectorString = `[${embedding.join(',')}]`

  await prisma.$executeRawUnsafe(
    `UPDATE custom_metrics SET embedding = $1::vector WHERE id = $2`,
    vectorString,
    metricId
  )
}
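The same raw-SQL update can also drive a backfill for metrics created before embeddings were introduced. Below is a minimal sketch under stated assumptions: backfillMissingEmbeddings and its batchSize parameter are hypothetical names, the embedding text is composed the same way Pattern 1 does, and candidate rows are selected with raw SQL because Prisma Client cannot filter on Unsupported() columns.

import prisma from '@/lib/prisma'
import { generateEmbedding } from '@/lib/ai/embeddings'

// Hypothetical backfill job: embed metrics that don't have a vector yet.
async function backfillMissingEmbeddings(batchSize = 50) {
  // Prisma Client can't filter on Unsupported() columns, so select candidates with raw SQL
  const rows = await prisma.$queryRawUnsafe<
    Array<{ id: string; metric_name: string; metric_value: number; unit: string }>
  >(
    `
    SELECT id, metric_name, metric_value, unit
    FROM custom_metrics
    WHERE embedding IS NULL
    LIMIT $1
    `,
    batchSize
  )

  for (const row of rows) {
    // Compose the embedding text the same way Pattern 1 does
    const text = `${row.metric_name}. Value: ${row.metric_value} ${row.unit}`
    const embedding = await generateEmbedding(text)
    const vectorString = `[${embedding.join(',')}]`

    // Pattern 2: update the existing record in place
    await prisma.$executeRawUnsafe(
      `UPDATE custom_metrics SET embedding = $1::vector WHERE id = $2`,
      vectorString,
      row.id
    )
  }

  return rows.length // number of metrics processed in this batch
}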
Similarity Search Queries
Basic Cosine Similarity Search
Query: Find custom metrics semantically similar to a search query.
async function findSimilarMetrics(query: string, topK: number = 5) {
  // Generate embedding for query
  const queryEmbedding = await generateEmbedding(query)
  const vectorString = `[${queryEmbedding.join(',')}]`

  // Execute similarity search
  const results = await prisma.$queryRawUnsafe<
    Array<{
      id: string
      metric_name: string
      metric_value: number
      similarity: number
    }>
  >(
    `
    SELECT
      id,
      metric_name,
      metric_value,
      (1 - (embedding <=> $1::vector)) AS similarity
    FROM custom_metrics
    WHERE embedding IS NOT NULL
    ORDER BY embedding <=> $1::vector -- Cosine distance
    LIMIT $2
    `,
    vectorString,
    topK
  )

  return results
}
Usage:
// User query: "Show me metrics related to customer satisfaction"
const similar = await findSimilarMetrics('customer satisfaction scores and NPS', 5)
// Results (ordered by similarity):
// [
// { metric_name: 'Net Promoter Score', metric_value: 72, similarity: 0.89 },
// { metric_name: 'Customer Satisfaction (CSAT)', metric_value: 4.2, similarity: 0.85 },
// { metric_name: 'Support Ticket Resolution Time', metric_value: 18, similarity: 0.67 },
// ...
// ]
Distance Operators
pgvector supports 3 distance metrics:
| Operator | Metric | Range | Use Case |
|---|---|---|---|
| <-> | Euclidean distance (L2) | 0 to ∞ | Geometric distance |
| <=> | Cosine distance | 0 to 2 | Semantic similarity (recommended) |
| <#> | Negative inner product | -∞ to ∞ | Dot-product ranking (for normalized vectors) |
Cosine Similarity Conversion:
-- Cosine distance: 0 (identical) to 2 (opposite)
-- Cosine similarity: 1 (identical) to -1 (opposite)
-- Formula: similarity = 1 - distance
SELECT (1 - (embedding <=> query_embedding)) AS similarity
FROM custom_metrics;
Filtered Similarity Search
Query: Find similar metrics within a specific tenant (respecting multi-tenancy).
async function findSimilarMetricsForTenant(query: string, tenantId: string, topK: number = 5) {
  const queryEmbedding = await generateEmbedding(query)
  const vectorString = `[${queryEmbedding.join(',')}]`

  // Run the tenant-context setting and the search in one transaction so both hit
  // the same connection. SET LOCAL cannot take bind parameters, so set_config()
  // is used instead; the explicit tenant filter in the JOIN below remains the
  // primary guard.
  return prisma.$transaction(async (tx) => {
    await tx.$executeRawUnsafe(
      `SELECT set_config('app.tenant_id', $1, true)`,
      tenantId
    )

    return tx.$queryRawUnsafe<
      Array<{
        id: string
        metric_name: string
        metric_value: number
        similarity: number
      }>
    >(
      `
      SELECT
        cm.id,
        cm.metric_name,
        cm.metric_value,
        (1 - (cm.embedding <=> $1::vector)) AS similarity
      FROM custom_metrics cm
      INNER JOIN client_kpis ck ON cm.client_kpi_id = ck.id
      WHERE cm.embedding IS NOT NULL
        AND ck.tenant_id = $2
      ORDER BY cm.embedding <=> $1::vector
      LIMIT $3
      `,
      vectorString,
      tenantId,
      topK
    )
  })
}
Index Strategy
Vector Index Types
pgvector supports 2 index types:
| Index Type | Algorithm | Build Time | Query Speed | Accuracy |
|---|---|---|---|---|
| IVFFlat | Inverted File + Flat | Fast | Medium | ~95-99% |
| HNSW | Hierarchical Navigable Small World | Slow | Very Fast | ~99%+ |
Recommendation: Start with IVFFlat for MVP (simpler), migrate to HNSW at scale (500+ customers); a migration sketch appears at the end of this section.
Creating IVFFlat Index
Index Configuration:
-- Create IVFFlat index with 100 lists (clusters)
-- Lists = sqrt(rows) is a common heuristic
-- For 10,000 metrics: sqrt(10000) = 100 lists
CREATE INDEX custom_metrics_embedding_idx
ON custom_metrics
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
Index Build Process:
- Sample vectors: pgvector samples vectors to create centroids (cluster centers)
- Assign to lists: Each vector assigned to nearest centroid
- Query optimization: Similarity search only checks nearest lists (not all vectors)
Performance: with 100 lists and the default probes = 1, IVFFlat scans roughly N/100 of the vectors instead of all N; raising probes improves recall at the cost of speed.
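The recall/speed trade-off is controlled at query time by the ivfflat.probes setting (the number of lists scanned per query). Because the setting is scoped to the current connection, it must run on the same connection as the search; the sketch below uses Prisma's batch transaction API for that. searchWithProbes is a hypothetical helper and probes = 10 is an illustrative value, not a tuned recommendation.

import prisma from '@/lib/prisma'

// Run the probes setting and the search in one transaction so both hit the same connection.
async function searchWithProbes(vectorString: string, topK: number) {
  const [, results] = await prisma.$transaction([
    // SET LOCAL is scoped to this transaction only
    prisma.$executeRawUnsafe(`SET LOCAL ivfflat.probes = 10`),
    prisma.$queryRawUnsafe<Array<{ id: string; metric_name: string; similarity: number }>>(
      `
      SELECT id, metric_name, (1 - (embedding <=> $1::vector)) AS similarity
      FROM custom_metrics
      WHERE embedding IS NOT NULL
      ORDER BY embedding <=> $1::vector
      LIMIT $2
      `,
      vectorString,
      topK
    ),
  ])

  return results
}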
When to Create Index
Threshold: Create index after 1,000+ vectors inserted.
Why: For small datasets (<1,000 vectors), a sequential scan is faster than the overhead of building and probing an index.
Migration:
-- Migration: Add index when custom_metrics reaches threshold
-- File: prisma/migrations/YYYYMMDDHHMMSS_add_vector_index_custom_metrics/migration.sql
-- Only create if table has enough rows
-- Note: CREATE INDEX CONCURRENTLY cannot run inside a DO block (or inside
-- Prisma's transactional migrations), so a plain CREATE INDEX is used here.
-- Run the CONCURRENTLY variant manually, outside a transaction, if the brief
-- write lock during the build is a concern.
DO $$
BEGIN
  IF (SELECT COUNT(*) FROM custom_metrics WHERE embedding IS NOT NULL) > 1000 THEN
    CREATE INDEX custom_metrics_embedding_idx
      ON custom_metrics
      USING ivfflat (embedding vector_cosine_ops)
      WITH (lists = 100);
    RAISE NOTICE 'Vector index created';
  ELSE
    RAISE NOTICE 'Skipping index (not enough vectors yet)';
  END IF;
END $$;
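When the dataset later grows enough to justify HNSW, the swap is a one-off DDL change. The sketch below is one way to do it, wrapped in the same raw-SQL helper pattern used throughout this document; migrateToHnswIndex is a hypothetical script name, the index name matches the IVFFlat example above, and the m / ef_construction values are pgvector's defaults rather than tuned recommendations.

import prisma from '@/lib/prisma'

// One-off maintenance script (not a Prisma migration): replace IVFFlat with HNSW.
async function migrateToHnswIndex() {
  // Drop the existing IVFFlat index, if any
  await prisma.$executeRawUnsafe(
    `DROP INDEX IF EXISTS custom_metrics_embedding_idx`
  )

  // HNSW index (requires pgvector >= 0.5.0).
  // m = max connections per graph node, ef_construction = build-time candidate list size.
  await prisma.$executeRawUnsafe(
    `
    CREATE INDEX custom_metrics_embedding_idx
    ON custom_metrics
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64)
    `
  )
}

Unlike IVFFlat, HNSW does not need the table to be populated before the index is built, but index build time and memory use are noticeably higher.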
RAG Implementation Architecture
End-to-End RAG Flow
┌─────────────────────────────────────────────────────────────────┐
│ USER QUERY │
│ "Show me all operational efficiency metrics from Q3" │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ Step 1: QUERY EMBEDDING │
│ OpenAI API: text-embedding-3-small │
│ Input: Query string │
│ Output: Float[1536] │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ Step 2: VECTOR SIMILARITY SEARCH │
│ PostgreSQL + pgvector │
│ Query: SELECT ... ORDER BY embedding <=> query_embedding │
│ Output: Top 5 most similar custom_metrics │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ Step 3: CONTEXT AUGMENTATION │
│ Construct prompt with retrieved metrics: │
│ "Based on these metrics: [retrieved data], answer: [query]" │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ Step 4: GENERATE RESPONSE │
│ OpenAI GPT-4: Generate natural language response │
│ Input: Augmented prompt │
│ Output: Structured answer with citations │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ Step 5: RETURN TO USER │
│ API response with answer + source metric IDs │
└─────────────────────────────────────────────────────────────────┘
API Endpoint Example
Endpoint: POST /api/ai/query
// app/api/ai/query/route.ts
import { NextRequest, NextResponse } from 'next/server'
import { findSimilarMetricsForTenant } from '@/lib/ai/vector-search'
import OpenAI from 'openai'

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

export async function POST(request: NextRequest) {
  const { query } = await request.json()
  const tenantId = request.headers.get('x-tenant-id')!

  // Step 1: Find relevant metrics via vector search
  const relevantMetrics = await findSimilarMetricsForTenant(query, tenantId, 5)

  // Step 2: Construct augmented prompt
  const context = relevantMetrics
    .map((m) => `- ${m.metric_name}: ${m.metric_value} (similarity: ${m.similarity.toFixed(2)})`)
    .join('\n')

  const prompt = `
You are a business intelligence assistant for Zixly.
Based on the following relevant metrics from the customer's data:
${context}
Answer this question: ${query}
Provide a concise, data-driven response with specific numbers.
`

  // Step 3: Generate response with GPT-4
  const completion = await openai.chat.completions.create({
    model: 'gpt-4-turbo-preview',
    messages: [{ role: 'user', content: prompt }],
    temperature: 0.3, // Low temperature for factual responses
  })

  return NextResponse.json({
    answer: completion.choices[0].message.content,
    sources: relevantMetrics.map((m) => ({ id: m.id, name: m.metric_name })),
  })
}
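For completeness, a hypothetical client-side call to the endpoint. The path and response shape match the route above; the x-tenant-id header is set explicitly here only for illustration, since in the real app it would be injected by the auth middleware.

// Hypothetical caller of POST /api/ai/query
const tenantId = 'tenant_123' // placeholder; normally supplied by middleware

const res = await fetch('/api/ai/query', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-tenant-id': tenantId,
  },
  body: JSON.stringify({ query: 'Which operational metrics regressed this quarter?' }),
})

const { answer, sources } = await res.json()
// answer: natural-language summary from GPT-4
// sources: [{ id, name }] of the metrics used as context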
Performance Considerations
Embedding Generation Cost
| Operation | Cost (OpenAI) | Latency |
|---|---|---|
| Generate 1 embedding (1536 dim) | $0.00002 | ~200ms |
| Batch 100 embeddings | $0.002 | ~2s |
| Monthly (10,000 new metrics) | $0.20 | N/A |
Optimization: Batch-embed multiple metrics in a single API call (the OpenAI embeddings endpoint accepts up to 2,048 inputs per request); see the sketch below.
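A minimal batching sketch follows. generateEmbeddingBatch is a hypothetical helper (not part of lib/ai/embeddings.ts above), and the chunk size of 2,048 simply matches the per-request input limit mentioned here.

import OpenAI from 'openai'

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

// Hypothetical helper: embed many texts with as few API calls as possible.
export async function generateEmbeddingBatch(texts: string[]): Promise<number[][]> {
  const embeddings: number[][] = []

  // The embeddings endpoint accepts up to 2,048 inputs per request
  for (let i = 0; i < texts.length; i += 2048) {
    const chunk = texts.slice(i, i + 2048)

    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: chunk,
      encoding_format: 'float',
    })

    // Each result carries an index field; sort to preserve input order
    const ordered = [...response.data].sort((a, b) => a.index - b.index)
    embeddings.push(...ordered.map((d) => d.embedding))
  }

  return embeddings
}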
Vector Search Performance
| Dataset Size | Sequential Scan | IVFFlat Index | HNSW Index |
|---|---|---|---|
| 1,000 vectors | 10ms | 15ms (overhead) | 20ms (overhead) |
| 10,000 vectors | 100ms | 15ms | 10ms |
| 100,000 vectors | 1,000ms | 50ms | 15ms |
| 1,000,000 vectors | 10,000ms | 200ms | 30ms |
Threshold: Create the index only once the table exceeds ~1,000 vectors; below that, a sequential scan is faster than paying the index overhead.
Storage Requirements
| Component | Size per Row | 1M Rows |
|---|---|---|
| Vector (1536 float32) | 6 KB | 6 GB |
| IVFFlat index | ~10 KB | ~10 GB |
| HNSW index | ~15 KB | ~15 GB |
Total: ~20-30 KB per custom metric with embedding + index.
Related Documentation
- System Architecture - Data Architecture
- Database Schema Diagram
- Phase 1 Implementation Plan - RAG Readiness
Document End
Review Cycle: Before RAG feature development (Post-MVP, Year 2) Next Review: 2026-06-01 (RAG feature scoping) Change History:
- 2025-10-15: Initial version (v1.0) - RAG strategy outlined for future implementation
Approval:
- AI/ML Lead: [Founder Name] - Approved 2025-10-15