RAG Strategy & Vector Operations

Version: 1.0 | Last Updated: 2025-10-15 | Owner: AI/ML Architecture | Status: Future Enhancement (Post-MVP)


Table of Contents

  1. Overview
  2. pgvector Setup
  3. Embedding Generation
  4. Vector Storage Pattern
  5. Similarity Search Queries
  6. Index Strategy
  7. RAG Implementation Architecture
  8. Performance Considerations

Overview

Purpose

This document outlines the Retrieval-Augmented Generation (RAG) strategy for Zixly, enabling AI-powered insights from unstructured business data (invoices, contracts, meeting notes, etc.) via vector similarity search.

Use Cases

Phase 1 (Post-MVP):

Phase 2 (Year 2):

Technology Stack

| Component | Technology | Purpose |
| --- | --- | --- |
| Vector Database | PostgreSQL + pgvector | Store and query embeddings |
| Embedding Model | OpenAI text-embedding-3-small (1536 dimensions) | Convert text → vectors |
| Distance Metric | Cosine similarity | Measure vector closeness (range: -1 to 1) |
| Query Interface | Raw SQL (Prisma doesn't support pgvector natively) | Execute vector operations |

pgvector Setup

Extension Installation

Already Enabled in Phase 1: See Phase 1 Implementation Plan

-- Verify pgvector extension is enabled
SELECT * FROM pg_extension WHERE extname = 'vector';

-- Expected result:
-- extname | extversion | ...
-- vector  | 0.5.1      | ...

If not enabled:

CREATE EXTENSION IF NOT EXISTS vector;

Supabase Note: Supabase ships pgvector with every project (Sydney region included); run CREATE EXTENSION if it hasn't already been enabled via the dashboard.


Vector Column in Schema

Already defined in Prisma schema:

model CustomMetric {
  // ... other fields
  embedding  Unsupported("vector(1536)")?  // OpenAI embedding dimension
}

Why Unsupported: Prisma doesn’t have native pgvector type support. We use Unsupported() to preserve the column in migrations, but must use raw SQL for vector operations.

Migration SQL:

-- Generated by Prisma (part of initial migration)
ALTER TABLE "custom_metrics" ADD COLUMN "embedding" vector(1536);
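
Reads need raw SQL as well: Prisma cannot deserialize vector columns, so cast them to text when the values are needed in application code. A minimal sketch, assuming the schema above (getMetricEmbedding is illustrative, not an existing helper):

import prisma from '@/lib/prisma'

// Illustrative helper: fetch an embedding back out of the database.
// pgvector's text form is '[0.1,0.2,...]', which happens to be valid JSON.
async function getMetricEmbedding(metricId: string): Promise<number[] | null> {
  const rows = await prisma.$queryRawUnsafe<Array<{ embedding: string | null }>>(
    `SELECT embedding::text AS embedding FROM custom_metrics WHERE id = $1`,
    metricId
  )

  if (rows.length === 0 || rows[0].embedding === null) return null

  return JSON.parse(rows[0].embedding) as number[]
}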

Embedding Generation

OpenAI API Integration

Model: text-embedding-3-small (1536 dimensions, $0.02/1M tokens)

Why this model:

Implementation (lib/ai/embeddings.ts):

import OpenAI from 'openai'

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
})

export async function generateEmbedding(text: string): Promise<number[]> {
  // Validate input
  if (!text || text.trim().length === 0) {
    throw new Error('Embedding text cannot be empty')
  }

  // Model limit is 8,191 tokens; at ~4 chars per token, ~32,000 chars is a safe cap
  const truncated = text.slice(0, 32000)

  // Generate embedding
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: truncated,
    encoding_format: 'float',
  })

  if (response.data.length === 0) {
    throw new Error('OpenAI returned empty embedding')
  }

  return response.data[0].embedding
}

Usage Example:

import { generateEmbedding } from '@/lib/ai/embeddings'

// Generate embedding for custom metric
const metricDescription = `
  Project Delivery Time: Average days from project kickoff to completion.
  Target: 30 days. Current: 42 days.
  Industry: Construction. Department: Operations.
`

const embedding = await generateEmbedding(metricDescription)
// Returns: number[] with 1536 values

Vector Storage Pattern

Storing Embeddings

Problem: Prisma doesn’t support vector type natively.

Solution: Use Prisma $executeRawUnsafe for INSERT/UPDATE with embeddings.

Pattern 1: Insert with Embedding

import prisma from '@/lib/prisma'
import { generateEmbedding } from '@/lib/ai/embeddings'

async function createCustomMetricWithEmbedding(data: {
  clientKPIId: string
  metricName: string
  metricValue: number
  unit: string
  recordDate: Date
  description: string // For embedding generation
}) {
  // Generate embedding from description
  const embedding = await generateEmbedding(
    `${data.metricName}: ${data.description}. Value: ${data.metricValue} ${data.unit}`
  )

  // Convert embedding to PostgreSQL vector format
  const vectorString = `[${embedding.join(',')}]`

  // Insert with raw SQL (Prisma limitation)
  await prisma.$executeRawUnsafe(
    `
    INSERT INTO custom_metrics (
      id, client_kpi_id, metric_name, metric_value, unit, record_date, embedding, created_at
    ) VALUES (
      $1, $2, $3, $4, $5, $6, $7::vector, $8
    )
    `,
    crypto.randomUUID(), // id (generated manually; raw SQL bypasses Prisma's default id)
    data.clientKPIId,
    data.metricName,
    data.metricValue,
    data.unit,
    data.recordDate,
    vectorString, // Cast to vector type
    new Date()
  )
}

Pattern 2: Update Existing Record with Embedding

async function addEmbeddingToExistingMetric(metricId: string, description: string) {
  const embedding = await generateEmbedding(description)
  const vectorString = `[${embedding.join(',')}]`

  await prisma.$executeRawUnsafe(
    `UPDATE custom_metrics SET embedding = $1::vector WHERE id = $2`,
    vectorString,
    metricId
  )
}
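
Pattern 3: Backfill Missing Embeddings

A natural extension of Pattern 2 is backfilling rows created before embedding generation shipped. A minimal sketch, assuming a small batch size to stay under OpenAI rate limits (backfillMissingEmbeddings is illustrative; the description format mirrors Pattern 1):

import prisma from '@/lib/prisma'
import { generateEmbedding } from '@/lib/ai/embeddings'

async function backfillMissingEmbeddings(batchSize: number = 50): Promise<number> {
  // Fetch a batch of metrics that still lack embeddings
  const rows = await prisma.$queryRawUnsafe<
    Array<{ id: string; metric_name: string; metric_value: number; unit: string }>
  >(
    `SELECT id, metric_name, metric_value, unit
     FROM custom_metrics
     WHERE embedding IS NULL
     LIMIT $1`,
    batchSize
  )

  for (const row of rows) {
    const embedding = await generateEmbedding(
      `${row.metric_name}. Value: ${row.metric_value} ${row.unit}`
    )
    await prisma.$executeRawUnsafe(
      `UPDATE custom_metrics SET embedding = $1::vector WHERE id = $2`,
      `[${embedding.join(',')}]`,
      row.id
    )
  }

  return rows.length // call repeatedly until this returns 0
}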

Similarity Search Queries

Query: Find custom metrics semantically similar to a search query.

async function findSimilarMetrics(query: string, topK: number = 5) {
  // Generate embedding for query
  const queryEmbedding = await generateEmbedding(query)
  const vectorString = `[${queryEmbedding.join(',')}]`

  // Execute similarity search
  const results = await prisma.$queryRawUnsafe<
    Array<{
      id: string
      metric_name: string
      metric_value: number
      similarity: number
    }>
  >(
    `
    SELECT
      id,
      metric_name,
      metric_value,
      (1 - (embedding <=> $1::vector)) AS similarity
    FROM custom_metrics
    WHERE embedding IS NOT NULL
    ORDER BY embedding <=> $1::vector  -- Cosine distance
    LIMIT $2
    `,
    vectorString,
    topK
  )

  return results
}

Usage:

// User query: "Show me metrics related to customer satisfaction"
const similar = await findSimilarMetrics('customer satisfaction scores and NPS', 5)

// Results (ordered by similarity):
// [
//   { metric_name: 'Net Promoter Score', metric_value: 72, similarity: 0.89 },
//   { metric_name: 'Customer Satisfaction (CSAT)', metric_value: 4.2, similarity: 0.85 },
//   { metric_name: 'Support Ticket Resolution Time', metric_value: 18, similarity: 0.67 },
//   ...
// ]

Distance Operators

pgvector (as of 0.5.x) provides three distance operators:

| Operator | Metric | Range | Use Case |
| --- | --- | --- | --- |
| <-> | Euclidean distance (L2) | 0 to ∞ | Geometric distance |
| <=> | Cosine distance | 0 to 2 | Semantic similarity (recommended) |
| <#> | Negative inner product | -∞ to ∞ | Dot-product search (normalized vectors) |

Note: <#> returns the negative of the inner product, so an ascending ORDER BY surfaces the best matches first.

Cosine Similarity Conversion:

-- Cosine distance: 0 (identical) to 2 (opposite)
-- Cosine similarity: 1 (identical) to -1 (opposite)
-- Formula: similarity = 1 - distance
SELECT (1 - (embedding <=> query_embedding)) AS similarity
FROM custom_metrics;
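
To make the conversion concrete, here is the same formula in TypeScript (illustration only; in practice the database computes this via the <=> operator):

// Cosine similarity = dot(a, b) / (|a| * |b|); cosine distance = 1 - similarity
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vector dimensions must match')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}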

Query: Find similar metrics within a specific tenant (respecting multi-tenancy).

async function findSimilarMetricsForTenant(query: string, tenantId: string, topK: number = 5) {
  const queryEmbedding = await generateEmbedding(query)
  const vectorString = `[${queryEmbedding.join(',')}]`

  // SET LOCAL cannot take bind parameters and only lasts for the current
  // transaction, so set the RLS context via set_config() inside a transaction
  return prisma.$transaction(async (tx) => {
    await tx.$queryRawUnsafe(`SELECT set_config('app.tenant_id', $1, true)`, tenantId)

    return tx.$queryRawUnsafe<
      Array<{
        id: string
        metric_name: string
        similarity: number
      }>
    >(
      `
      SELECT
        cm.id,
        cm.metric_name,
        (1 - (cm.embedding <=> $1::vector)) AS similarity
      FROM custom_metrics cm
      INNER JOIN client_kpis ck ON cm.client_kpi_id = ck.id
      WHERE cm.embedding IS NOT NULL
        AND ck.tenant_id = $2
      ORDER BY cm.embedding <=> $1::vector
      LIMIT $3
      `,
      vectorString,
      tenantId,
      topK
    )
  })
}

Index Strategy

Vector Index Types

pgvector supports 2 index types:

| Index Type | Algorithm | Build Time | Query Speed | Recall (approx.) |
| --- | --- | --- | --- | --- |
| IVFFlat | Inverted file with flat storage | Fast | Medium | ~95-99% |
| HNSW | Hierarchical Navigable Small World | Slow | Very fast | ~99%+ |

Recommendation: Start with IVFFlat for MVP (simpler), migrate to HNSW at scale (500+ customers).
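
For reference, the eventual HNSW migration would look roughly like this (pgvector 0.5.0+ syntax; m and ef_construction are pgvector's HNSW build parameters, shown at their defaults):

import prisma from '@/lib/prisma'

// Sketch of the future HNSW migration. Build is slower than IVFFlat, but HNSW
// needs no training data, so it can even be created on an empty table.
await prisma.$executeRawUnsafe(`
  CREATE INDEX custom_metrics_embedding_hnsw_idx
  ON custom_metrics
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64)
`)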


Creating IVFFlat Index

Index Configuration:

-- Create IVFFlat index with 100 lists (clusters)
-- Heuristics: pgvector's docs suggest lists = rows/1000 up to ~1M rows and
-- sqrt(rows) beyond that; 100 lists leaves headroom as the table grows past 10,000 rows
CREATE INDEX custom_metrics_embedding_idx
ON custom_metrics
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

Index Build Process:

  1. Sample vectors: pgvector samples vectors to create centroids (cluster centers)
  2. Assign to lists: Each vector assigned to nearest centroid
  3. Query optimization: Similarity search only checks nearest lists (not all vectors)

Performance: IVFFlat cuts the vectors scanned per query from O(N) to roughly O(N × probes / lists). With 100 lists and the default ivfflat.probes = 1, each query scans about 1% of rows; raising probes trades speed back for recall, as in the sketch below.
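
A hedged sketch of per-query tuning (the query mirrors findSimilarMetrics above; SET LOCAL accepts literal values, just not bind parameters, and only persists for the current transaction):

import prisma from '@/lib/prisma'

async function searchWithProbes(vectorString: string, topK: number) {
  // Run both statements in one transaction so SET LOCAL applies to the search.
  // probes = 10 scans 10 of the 100 lists (~10% of vectors) for higher recall.
  return prisma.$transaction(async (tx) => {
    await tx.$executeRawUnsafe(`SET LOCAL ivfflat.probes = 10`)
    return tx.$queryRawUnsafe(
      `SELECT id, metric_name,
              (1 - (embedding <=> $1::vector)) AS similarity
       FROM custom_metrics
       WHERE embedding IS NOT NULL
       ORDER BY embedding <=> $1::vector
       LIMIT $2`,
      vectorString,
      topK
    )
  })
}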


When to Create Index

Threshold: Create index after 1,000+ vectors inserted.

Why: Below ~1,000 vectors a sequential scan beats the index overhead, and IVFFlat needs enough existing rows to compute meaningful centroids.

Migration:

-- Migration: Add index when custom_metrics reaches threshold
-- File: prisma/migrations/YYYYMMDDHHMMSS_add_vector_index_custom_metrics/migration.sql

-- Only create if table has enough rows.
-- Note: CREATE INDEX CONCURRENTLY is not allowed inside a DO block (or any
-- transaction context), so a plain CREATE INDEX is used here; for a hot table,
-- run CREATE INDEX CONCURRENTLY manually outside a transaction instead.
DO $$
BEGIN
  IF (SELECT COUNT(*) FROM custom_metrics WHERE embedding IS NOT NULL) > 1000 THEN
    CREATE INDEX custom_metrics_embedding_idx
    ON custom_metrics
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

    RAISE NOTICE 'Vector index created';
  ELSE
    RAISE NOTICE 'Skipping index (not enough vectors yet)';
  END IF;
END $$;

RAG Implementation Architecture

End-to-End RAG Flow

┌─────────────────────────────────────────────────────────────────┐
│                         USER QUERY                               │
│  "Show me all operational efficiency metrics from Q3"           │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│  Step 1: QUERY EMBEDDING                                         │
│  OpenAI API: text-embedding-3-small                             │
│  Input: Query string                                             │
│  Output: Float[1536]                                             │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│  Step 2: VECTOR SIMILARITY SEARCH                                │
│  PostgreSQL + pgvector                                           │
│  Query: SELECT ... ORDER BY embedding <=> query_embedding        │
│  Output: Top 5 most similar custom_metrics                      │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│  Step 3: CONTEXT AUGMENTATION                                    │
│  Construct prompt with retrieved metrics:                        │
│  "Based on these metrics: [retrieved data], answer: [query]"    │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│  Step 4: GENERATE RESPONSE                                       │
│  OpenAI GPT-4: Generate natural language response               │
│  Input: Augmented prompt                                         │
│  Output: Structured answer with citations                        │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│  Step 5: RETURN TO USER                                          │
│  API response with answer + source metric IDs                    │
└─────────────────────────────────────────────────────────────────┘

API Endpoint Example

Endpoint: POST /api/ai/query

// app/api/ai/query/route.ts
import { NextRequest, NextResponse } from 'next/server'
import { findSimilarMetricsForTenant } from '@/lib/ai/vector-search'
import OpenAI from 'openai'

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

export async function POST(request: NextRequest) {
  const { query } = await request.json()
  const tenantId = request.headers.get('x-tenant-id')

  if (!query || !tenantId) {
    return NextResponse.json({ error: 'query and x-tenant-id are required' }, { status: 400 })
  }

  // Step 1: Find relevant metrics via vector search
  const relevantMetrics = await findSimilarMetricsForTenant(query, tenantId, 5)

  // Step 2: Construct augmented prompt
  const context = relevantMetrics
    .map((m) => `- ${m.metric_name}: ${m.metric_value} (similarity: ${m.similarity.toFixed(2)})`)
    .join('\n')

  const prompt = `
    You are a business intelligence assistant for Zixly.

    Based on the following relevant metrics from the customer's data:
    ${context}

    Answer this question: ${query}

    Provide a concise, data-driven response with specific numbers.
  `

  // Step 3: Generate response with GPT-4
  const completion = await openai.chat.completions.create({
    model: 'gpt-4-turbo-preview',
    messages: [{ role: 'user', content: prompt }],
    temperature: 0.3, // Low temperature for factual responses
  })

  return NextResponse.json({
    answer: completion.choices[0].message.content,
    sources: relevantMetrics.map((m) => ({ id: m.id, name: m.metric_name })),
  })
}
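
Example call from the frontend (a sketch; in production the tenant id should come from the authenticated session rather than a client-supplied header):

const res = await fetch('/api/ai/query', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-tenant-id': tenantId, // assumed to be validated server-side
  },
  body: JSON.stringify({ query: 'Which metrics track operational efficiency?' }),
})

const { answer, sources } = await res.json()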

Performance Considerations

Embedding Generation Cost

| Operation | Cost (OpenAI) | Latency |
| --- | --- | --- |
| Generate 1 embedding (~1K tokens) | $0.00002 | ~200ms |
| Batch 100 embeddings | $0.002 | ~2s |
| Monthly (10,000 new metrics) | $0.20 | N/A |

Optimization: Batch embed multiple metrics in single API call (OpenAI supports up to 2,048 inputs per request).
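
A minimal batch variant of generateEmbedding (a sketch; generateEmbeddingsBatch is illustrative and would live alongside the single-input helper in lib/ai/embeddings.ts):

import OpenAI from 'openai'

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

export async function generateEmbeddingsBatch(texts: string[]): Promise<number[][]> {
  if (texts.length === 0) return []
  if (texts.length > 2048) {
    throw new Error('OpenAI allows at most 2,048 inputs per embeddings request')
  }

  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts.map((t) => t.slice(0, 32000)), // same truncation heuristic as above
    encoding_format: 'float',
  })

  // Each result carries an index field; sort defensively before extracting
  return response.data.sort((a, b) => a.index - b.index).map((d) => d.embedding)
}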


Vector Search Performance

| Dataset Size | Sequential Scan | IVFFlat Index | HNSW Index |
| --- | --- | --- | --- |
| 1,000 vectors | 10ms | 15ms (overhead) | 20ms (overhead) |
| 10,000 vectors | 100ms | 15ms | 10ms |
| 100,000 vectors | 1,000ms | 50ms | 15ms |
| 1,000,000 vectors | 10,000ms | 200ms | 30ms |

Threshold: Create index after 1,000 vectors to avoid sequential scan penalty.


Storage Requirements

| Component | Size per Row | 1M Rows |
| --- | --- | --- |
| Vector (1536 × 4-byte float32 = 6,144 bytes) | 6 KB | 6 GB |
| IVFFlat index | ~10 KB | ~10 GB |
| HNSW index | ~15 KB | ~15 GB |

Total: ~16-21 KB per custom metric (6 KB vector plus one index, depending on index type).



Document End

Review Cycle: Before RAG feature development (Post-MVP, Year 2) | Next Review: 2026-06-01 (RAG feature scoping)

Change History:

Approval: