RAG Strategy & Vector Operations

Version: 1.0 | Last Updated: 2025-10-15 | Owner: AI/ML Architecture | Status: Future Enhancement (Post-MVP)


Table of Contents

  1. Overview
  2. pgvector Setup
  3. Embedding Generation
  4. Vector Storage Pattern
  5. Similarity Search Queries
  6. Index Strategy
  7. RAG Implementation Architecture
  8. Performance Considerations

Overview

Purpose

This document outlines the Retrieval-Augmented Generation (RAG) strategy for Zixly, enabling AI-powered insights from unstructured business data (invoices, contracts, meeting notes, etc.) via vector similarity search.

Use Cases

Phase 1 (Post-MVP):

Phase 2 (Year 2):

Technology Stack

| Component | Technology | Purpose |
| --- | --- | --- |
| Vector Database | PostgreSQL + pgvector | Store and query embeddings |
| Embedding Model | OpenAI text-embedding-3-small (1536 dimensions) | Convert text → vectors |
| Distance Metric | Cosine similarity | Measure vector closeness (range: -1 to 1) |
| Query Interface | Raw SQL (Prisma doesn't support pgvector natively) | Execute vector operations |

pgvector Setup

Extension Installation

Already Enabled in Phase 1: See Phase 1 Implementation Plan

-- Verify pgvector extension is enabled
SELECT * FROM pg_extension WHERE extname = 'vector';

-- Expected result:
-- extname | extversion | ...
-- vector  | 0.5.1      | ...

If not enabled:

CREATE EXTENSION IF NOT EXISTS vector;

Supabase Note: Supabase ships pgvector with every project (Sydney region included); run CREATE EXTENSION if it hasn't already been enabled via the dashboard.


Vector Column in Schema

Already defined in Prisma schema:

model CustomMetric {
  // ... other fields
  embedding  Unsupported("vector(1536)")?  // OpenAI embedding dimension
}

Why Unsupported: Prisma doesn’t have native pgvector type support. We use Unsupported() to preserve the column in migrations, but must use raw SQL for vector operations.

Migration SQL:

-- Generated by Prisma (part of initial migration)
ALTER TABLE "custom_metrics" ADD COLUMN "embedding" vector(1536);
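
Reads need raw SQL as well: Prisma cannot deserialize vector columns, so cast them to text when the values are needed in application code. A minimal sketch, assuming the schema above (getMetricEmbedding is illustrative, not an existing helper):

import prisma from '@/lib/prisma'

// Illustrative helper: fetch an embedding back out of the database.
// pgvector's text form is '[0.1,0.2,...]', which happens to be valid JSON.
async function getMetricEmbedding(metricId: string): Promise<number[] | null> {
  const rows = await prisma.$queryRawUnsafe<Array<{ embedding: string | null }>>(
    `SELECT embedding::text AS embedding FROM custom_metrics WHERE id = $1`,
    metricId
  )

  if (rows.length === 0 || rows[0].embedding === null) return null

  return JSON.parse(rows[0].embedding) as number[]
}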

Embedding Generation

OpenAI API Integration

Model: text-embedding-3-small (1536 dimensions, $0.02/1M tokens)

Why this model:

Implementation (lib/ai/embeddings.ts):

import OpenAI from 'openai'

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
})

export async function generateEmbedding(text: string): Promise<number[]> {
  // Validate input
  if (!text || text.trim().length === 0) {
    throw new Error('Embedding text cannot be empty')
  }

  // Model limit is 8,191 tokens; at ~4 chars per token, ~32,000 chars is a safe cap
  const truncated = text.slice(0, 32000)

  // Generate embedding
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: truncated,
    encoding_format: 'float',
  })

  if (response.data.length === 0) {
    throw new Error('OpenAI returned empty embedding')
  }

  return response.data[0].embedding
}

Usage Example:

import { generateEmbedding } from '@/lib/ai/embeddings'

// Generate embedding for custom metric
const metricDescription = `
  Project Delivery Time: Average days from project kickoff to completion.
  Target: 30 days. Current: 42 days.
  Industry: Construction. Department: Operations.
`

const embedding = await generateEmbedding(metricDescription)
// Returns: number[] with 1536 values

Vector Storage Pattern

Storing Embeddings

Problem: Prisma doesn’t support vector type natively.

Solution: Use Prisma $executeRawUnsafe for INSERT/UPDATE with embeddings.

Pattern 1: Insert with Embedding

import prisma from '@/lib/prisma'
import { generateEmbedding } from '@/lib/ai/embeddings'

async function createCustomMetricWithEmbedding(data: {
  clientKPIId: string
  metricName: string
  metricValue: number
  unit: string
  recordDate: Date
  description: string // For embedding generation
}) {
  // Generate embedding from description
  const embedding = await generateEmbedding(
    `${data.metricName}: ${data.description}. Value: ${data.metricValue} ${data.unit}`
  )

  // Convert embedding to PostgreSQL vector format
  const vectorString = `[${embedding.join(',')}]`

  // Insert with raw SQL (Prisma limitation)
  await prisma.$executeRawUnsafe(
    `
    INSERT INTO custom_metrics (
      id, client_kpi_id, metric_name, metric_value, unit, record_date, embedding, created_at
    ) VALUES (
      $1, $2, $3, $4, $5, $6, $7::vector, $8
    )
    `,
    crypto.randomUUID(), // id (generated manually; raw SQL bypasses Prisma's default id)
    data.clientKPIId,
    data.metricName,
    data.metricValue,
    data.unit,
    data.recordDate,
    vectorString, // Cast to vector type
    new Date()
  )
}

Pattern 2: Update Existing Record with Embedding

async function addEmbeddingToExistingMetric(metricId: string, description: string) {
  const embedding = await generateEmbedding(description)
  const vectorString = `[${embedding.join(',')}]`

  await prisma.$executeRawUnsafe(
    `UPDATE custom_metrics SET embedding = $1::vector WHERE id = $2`,
    vectorString,
    metricId
  )
}
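
Pattern 3: Backfill Missing Embeddings

A natural extension of Pattern 2 is backfilling rows created before embedding generation shipped. A minimal sketch, assuming a small batch size to stay under OpenAI rate limits (backfillMissingEmbeddings is illustrative; the description format mirrors Pattern 1):

import prisma from '@/lib/prisma'
import { generateEmbedding } from '@/lib/ai/embeddings'

async function backfillMissingEmbeddings(batchSize: number = 50): Promise<number> {
  // Fetch a batch of metrics that still lack embeddings
  const rows = await prisma.$queryRawUnsafe<
    Array<{ id: string; metric_name: string; metric_value: number; unit: string }>
  >(
    `SELECT id, metric_name, metric_value, unit
     FROM custom_metrics
     WHERE embedding IS NULL
     LIMIT $1`,
    batchSize
  )

  for (const row of rows) {
    const embedding = await generateEmbedding(
      `${row.metric_name}. Value: ${row.metric_value} ${row.unit}`
    )
    await prisma.$executeRawUnsafe(
      `UPDATE custom_metrics SET embedding = $1::vector WHERE id = $2`,
      `[${embedding.join(',')}]`,
      row.id
    )
  }

  return rows.length // call repeatedly until this returns 0
}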

Similarity Search Queries

Query: Find custom metrics semantically similar to a search query.

async function findSimilarMetrics(query: string, topK: number = 5) {
  // Generate embedding for query
  const queryEmbedding = await generateEmbedding(query)
  const vectorString = `[${queryEmbedding.join(',')}]`

  // Execute similarity search
  const results = await prisma.$queryRawUnsafe<
    Array<{
      id: string
      metric_name: string
      metric_value: number
      similarity: number
    }>
  >(
    `
    SELECT
      id,
      metric_name,
      metric_value,
      (1 - (embedding <=> $1::vector)) AS similarity
    FROM custom_metrics
    WHERE embedding IS NOT NULL
    ORDER BY embedding <=> $1::vector  -- Cosine distance
    LIMIT $2
    `,
    vectorString,
    topK
  )

  return results
}

Usage:

// User query: "Show me metrics related to customer satisfaction"
const similar = await findSimilarMetrics('customer satisfaction scores and NPS', 5)

// Results (ordered by similarity):
// [
//   { metric_name: 'Net Promoter Score', metric_value: 72, similarity: 0.89 },
//   { metric_name: 'Customer Satisfaction (CSAT)', metric_value: 4.2, similarity: 0.85 },
//   { metric_name: 'Support Ticket Resolution Time', metric_value: 18, similarity: 0.67 },
//   ...
// ]

Distance Operators

pgvector (as of 0.5.x) provides three distance operators:

| Operator | Metric | Range | Use Case |
| --- | --- | --- | --- |
| <-> | Euclidean distance (L2) | 0 to ∞ | Geometric distance |
| <=> | Cosine distance | 0 to 2 | Semantic similarity (recommended) |
| <#> | Negative inner product | -∞ to ∞ | Dot-product search (normalized vectors) |

Note: <#> returns the negative of the inner product, so an ascending ORDER BY surfaces the best matches first.

Cosine Similarity Conversion:

-- Cosine distance: 0 (identical) to 2 (opposite)
-- Cosine similarity: 1 (identical) to -1 (opposite)
-- Formula: similarity = 1 - distance
SELECT (1 - (embedding <=> query_embedding)) AS similarity
FROM custom_metrics;
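
To make the conversion concrete, here is the same formula in TypeScript (illustration only; in practice the database computes this via the <=> operator):

// Cosine similarity = dot(a, b) / (|a| * |b|); cosine distance = 1 - similarity
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vector dimensions must match')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}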

Query: Find similar metrics within a specific tenant (respecting multi-tenancy).

async function findSimilarMetricsForTenant(query: string, tenantId: string, topK: number = 5) {
  const queryEmbedding = await generateEmbedding(query)
  const vectorString = `[${queryEmbedding.join(',')}]`

  // SET LOCAL cannot take bind parameters and only lasts for the current
  // transaction, so set the RLS context via set_config() inside a transaction
  return prisma.$transaction(async (tx) => {
    await tx.$queryRawUnsafe(`SELECT set_config('app.tenant_id', $1, true)`, tenantId)

    return tx.$queryRawUnsafe<
      Array<{
        id: string
        metric_name: string
        similarity: number
      }>
    >(
      `
      SELECT
        cm.id,
        cm.metric_name,
        (1 - (cm.embedding <=> $1::vector)) AS similarity
      FROM custom_metrics cm
      INNER JOIN client_kpis ck ON cm.client_kpi_id = ck.id
      WHERE cm.embedding IS NOT NULL
        AND ck.tenant_id = $2
      ORDER BY cm.embedding <=> $1::vector
      LIMIT $3
      `,
      vectorString,
      tenantId,
      topK
    )
  })
}

Index Strategy

Vector Index Types

pgvector supports 2 index types:

| Index Type | Algorithm | Build Time | Query Speed | Recall (approx.) |
| --- | --- | --- | --- | --- |
| IVFFlat | Inverted file with flat storage | Fast | Medium | ~95-99% |
| HNSW | Hierarchical Navigable Small World | Slow | Very fast | ~99%+ |

Recommendation: Start with IVFFlat for MVP (simpler), migrate to HNSW at scale (500+ customers).
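
For reference, the eventual HNSW migration would look roughly like this (pgvector 0.5.0+ syntax; m and ef_construction are pgvector's HNSW build parameters, shown at their defaults):

import prisma from '@/lib/prisma'

// Sketch of the future HNSW migration. Build is slower than IVFFlat, but HNSW
// needs no training data, so it can even be created on an empty table.
await prisma.$executeRawUnsafe(`
  CREATE INDEX custom_metrics_embedding_hnsw_idx
  ON custom_metrics
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64)
`)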


Creating IVFFlat Index

Index Configuration:

-- Create IVFFlat index with 100 lists (clusters)
-- Heuristics: pgvector's docs suggest lists = rows/1000 up to ~1M rows and
-- sqrt(rows) beyond that; 100 lists leaves headroom as the table grows past 10,000 rows
CREATE INDEX custom_metrics_embedding_idx
ON custom_metrics
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

Index Build Process:

  1. Sample vectors: pgvector samples vectors to create centroids (cluster centers)
  2. Assign to lists: Each vector assigned to nearest centroid
  3. Query optimization: Similarity search only checks nearest lists (not all vectors)

Performance: IVFFlat cuts the vectors scanned per query from O(N) to roughly O(N × probes / lists). With 100 lists and the default ivfflat.probes = 1, each query scans about 1% of rows; raising probes trades speed back for recall, as in the sketch below.
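
A hedged sketch of per-query tuning (the query mirrors findSimilarMetrics above; SET LOCAL accepts literal values, just not bind parameters, and only persists for the current transaction):

import prisma from '@/lib/prisma'

async function searchWithProbes(vectorString: string, topK: number) {
  // Run both statements in one transaction so SET LOCAL applies to the search.
  // probes = 10 scans 10 of the 100 lists (~10% of vectors) for higher recall.
  return prisma.$transaction(async (tx) => {
    await tx.$executeRawUnsafe(`SET LOCAL ivfflat.probes = 10`)
    return tx.$queryRawUnsafe(
      `SELECT id, metric_name,
              (1 - (embedding <=> $1::vector)) AS similarity
       FROM custom_metrics
       WHERE embedding IS NOT NULL
       ORDER BY embedding <=> $1::vector
       LIMIT $2`,
      vectorString,
      topK
    )
  })
}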


When to Create Index

Threshold: Create index after 1,000+ vectors inserted.

Why: Below ~1,000 vectors a sequential scan beats the index overhead, and IVFFlat needs enough existing rows to compute meaningful centroids.

Migration:

-- Migration: Add index when custom_metrics reaches threshold
-- File: prisma/migrations/YYYYMMDDHHMMSS_add_vector_index_custom_metrics/migration.sql

-- Only create if table has enough rows.
-- Note: CREATE INDEX CONCURRENTLY is not allowed inside a DO block (or any
-- transaction context), so a plain CREATE INDEX is used here; for a hot table,
-- run CREATE INDEX CONCURRENTLY manually outside a transaction instead.
DO $$
BEGIN
  IF (SELECT COUNT(*) FROM custom_metrics WHERE embedding IS NOT NULL) > 1000 THEN
    CREATE INDEX custom_metrics_embedding_idx
    ON custom_metrics
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

    RAISE NOTICE 'Vector index created';
  ELSE
    RAISE NOTICE 'Skipping index (not enough vectors yet)';
  END IF;
END $$;

RAG Implementation Architecture

End-to-End RAG Flow

┌─────────────────────────────────────────────────────────────────┐
│                         USER QUERY                               │
│  "Show me all operational efficiency metrics from Q3"           │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│  Step 1: QUERY EMBEDDING                                         │
│  OpenAI API: text-embedding-3-small                             │
│  Input: Query string                                             │
│  Output: Float[1536]                                             │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│  Step 2: VECTOR SIMILARITY SEARCH                                │
│  PostgreSQL + pgvector                                           │
│  Query: SELECT ... ORDER BY embedding <=> query_embedding        │
│  Output: Top 5 most similar custom_metrics                      │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│  Step 3: CONTEXT AUGMENTATION                                    │
│  Construct prompt with retrieved metrics:                        │
│  "Based on these metrics: [retrieved data], answer: [query]"    │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│  Step 4: GENERATE RESPONSE                                       │
│  OpenAI GPT-4: Generate natural language response               │
│  Input: Augmented prompt                                         │
│  Output: Structured answer with citations                        │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│  Step 5: RETURN TO USER                                          │
│  API response with answer + source metric IDs                    │
└─────────────────────────────────────────────────────────────────┘

API Endpoint Example

Endpoint: POST /api/ai/query

// app/api/ai/query/route.ts
import { NextRequest, NextResponse } from 'next/server'
import { findSimilarMetricsForTenant } from '@/lib/ai/vector-search'
import OpenAI from 'openai'

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

export async function POST(request: NextRequest) {
  const { query } = await request.json()
  const tenantId = request.headers.get('x-tenant-id')

  if (!query || !tenantId) {
    return NextResponse.json({ error: 'query and x-tenant-id are required' }, { status: 400 })
  }

  // Step 1: Find relevant metrics via vector search
  const relevantMetrics = await findSimilarMetricsForTenant(query, tenantId, 5)

  // Step 2: Construct augmented prompt
  const context = relevantMetrics
    .map((m) => `- ${m.metric_name}: ${m.metric_value} (similarity: ${m.similarity.toFixed(2)})`)
    .join('\n')

  const prompt = `
    You are a business intelligence assistant for Zixly.

    Based on the following relevant metrics from the customer's data:
    ${context}

    Answer this question: ${query}

    Provide a concise, data-driven response with specific numbers.
  `

  // Step 3: Generate response with GPT-4
  const completion = await openai.chat.completions.create({
    model: 'gpt-4-turbo-preview',
    messages: [{ role: 'user', content: prompt }],
    temperature: 0.3, // Low temperature for factual responses
  })

  return NextResponse.json({
    answer: completion.choices[0].message.content,
    sources: relevantMetrics.map((m) => ({ id: m.id, name: m.metric_name })),
  })
}
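
Example call from the frontend (a sketch; in production the tenant id should come from the authenticated session rather than a client-supplied header):

const res = await fetch('/api/ai/query', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-tenant-id': tenantId, // assumed to be validated server-side
  },
  body: JSON.stringify({ query: 'Which metrics track operational efficiency?' }),
})

const { answer, sources } = await res.json()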

Performance Considerations

Embedding Generation Cost

| Operation | Cost (OpenAI) | Latency |
| --- | --- | --- |
| Generate 1 embedding (~1K tokens) | $0.00002 | ~200ms |
| Batch 100 embeddings | $0.002 | ~2s |
| Monthly (10,000 new metrics) | $0.20 | N/A |

Optimization: Batch embed multiple metrics in single API call (OpenAI supports up to 2,048 inputs per request).
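
A minimal batch variant of generateEmbedding (a sketch; generateEmbeddingsBatch is illustrative and would live alongside the single-input helper in lib/ai/embeddings.ts):

import OpenAI from 'openai'

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

export async function generateEmbeddingsBatch(texts: string[]): Promise<number[][]> {
  if (texts.length === 0) return []
  if (texts.length > 2048) {
    throw new Error('OpenAI allows at most 2,048 inputs per embeddings request')
  }

  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts.map((t) => t.slice(0, 32000)), // same truncation heuristic as above
    encoding_format: 'float',
  })

  // Each result carries an index field; sort defensively before extracting
  return response.data.sort((a, b) => a.index - b.index).map((d) => d.embedding)
}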


Vector Search Performance

| Dataset Size | Sequential Scan | IVFFlat Index | HNSW Index |
| --- | --- | --- | --- |
| 1,000 vectors | 10ms | 15ms (overhead) | 20ms (overhead) |
| 10,000 vectors | 100ms | 15ms | 10ms |
| 100,000 vectors | 1,000ms | 50ms | 15ms |
| 1,000,000 vectors | 10,000ms | 200ms | 30ms |

Threshold: Create index after 1,000 vectors to avoid sequential scan penalty.


Storage Requirements

| Component | Size per Row | 1M Rows |
| --- | --- | --- |
| Vector (1536 × 4-byte float32 = 6,144 bytes) | 6 KB | 6 GB |
| IVFFlat index | ~10 KB | ~10 GB |
| HNSW index | ~15 KB | ~15 GB |

Total: ~16-21 KB per custom metric (6 KB vector plus one index, depending on index type).



Document End

Review Cycle: Before RAG feature development (Post-MVP, Year 2) | Next Review: 2026-06-01 (RAG feature scoping)

Change History:

Approval: