February 12, 2026 · 14 min read · architecture

Vector Databases: When to Build vs Buy

Every SaaS adding AI features faces the vector database decision. pgvector on your existing PostgreSQL, a managed service like Pinecone, or a dedicated engine like Qdrant? Here's the decision framework based on your data scale, query patterns, and team capacity.

Tags: vector-database, ai, postgresql, pgvector, saas

TL;DR

Start with pgvector. For 80% of SaaS AI features, pgvector on your existing PostgreSQL handles the load, keeps your infrastructure simple, and performs well up to 5-10 million vectors. The teams that need a dedicated vector database are processing 10M+ vectors, require sub-10ms query latency at scale, or need advanced filtering that pgvector doesn't support efficiently.

  • Pinecone is the right choice for teams that want zero operational overhead.
  • Qdrant is the right choice for teams that want control and performance.
  • Weaviate is the right choice for teams that need built-in hybrid search.

The wrong choice: adopting a dedicated vector database before you need one, adding infrastructure complexity for a feature that might get 100 queries per day.

Part of the AI-Assisted Development Guide ... a comprehensive guide to building AI features that deliver real value.


The Decision Before the Decision

Before evaluating vector databases, answer two questions:

1. How many vectors will you store?

| Scale | Count | Recommendation |
| --- | --- | --- |
| Small | Under 100K | pgvector, no question |
| Medium | 100K - 5M | pgvector with HNSW indexes |
| Large | 5M - 50M | Dedicated vector database |
| Massive | 50M+ | Managed service or custom infrastructure |

2. What's your query latency requirement?

| Requirement | Latency | Recommendation |
| --- | --- | --- |
| Relaxed | < 500ms | pgvector handles this easily |
| Standard | < 100ms | pgvector with HNSW up to 5M vectors |
| Strict | < 10ms | Dedicated vector database with in-memory indexes |
| Real-time | < 5ms | Dedicated vector database, edge deployment |

If both answers point to pgvector, stop reading and use pgvector. Every additional infrastructure component you add is a liability.


Option 1: pgvector (Start Here)

pgvector is a PostgreSQL extension that adds vector data types, similarity search operators, and indexing. It runs on your existing database.

Setup

```sql
-- Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create a table with a vector column
CREATE TABLE document_embeddings (
  id BIGSERIAL PRIMARY KEY,
  document_id BIGINT REFERENCES documents(id),
  chunk_index INT NOT NULL,
  content TEXT NOT NULL,
  embedding vector(1536), -- matches OpenAI text-embedding-3-small dimensions
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create an HNSW index for fast approximate nearest neighbor search
CREATE INDEX ON document_embeddings
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 200);
```
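With the table in place, writes are ordinary parameterized INSERTs. A minimal sketch, assuming a node-postgres pool named `db`; `toVectorLiteral` is a hypothetical helper, not part of pgvector or pg (an explicit `'[...]'` text literal with a `::vector` cast keeps the serialization unambiguous):

```typescript
// Assumed to exist elsewhere: a node-postgres pool.
declare const db: { query(sql: string, params: unknown[]): Promise<unknown> };

function toVectorLiteral(embedding: number[]): string {
  // pgvector's text format: comma-separated floats in square brackets
  return `[${embedding.join(",")}]`;
}

async function insertChunk(
  documentId: number,
  chunkIndex: number,
  content: string,
  embedding: number[]
): Promise<void> {
  await db.query(
    `INSERT INTO document_embeddings (document_id, chunk_index, content, embedding)
     VALUES ($1, $2, $3, $4::vector)`,
    [documentId, chunkIndex, content, toVectorLiteral(embedding)]
  );
}
```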

Performance Characteristics

| Vector Count | Query Latency (HNSW) | Recall@10 | Memory Usage |
| --- | --- | --- | --- |
| 10K | 1-3ms | 99% | 50MB |
| 100K | 3-8ms | 98% | 500MB |
| 1M | 8-25ms | 97% | 5GB |
| 5M | 20-60ms | 95% | 25GB |
| 10M | 50-150ms | 93% | 50GB |

These benchmarks are on a 4 vCPU, 16GB RAM PostgreSQL instance with 1536-dimensional vectors and HNSW indexing (m=16, ef_construction=200).

The pgvector Sweet Spot

```typescript
// Full-stack search with pgvector ... no additional infrastructure
async function searchDocuments(
  queryEmbedding: number[],
  tenantId: string,
  topK: number = 10
): Promise<SearchResult[]> {
  const results = await db.query(
    `
    SELECT
      d.id,
      d.content,
      d.metadata,
      1 - (d.embedding <=> $1::vector) AS similarity
    FROM document_embeddings d
    WHERE d.metadata->>'tenant_id' = $2
    ORDER BY d.embedding <=> $1::vector
    LIMIT $3
    `,
    [JSON.stringify(queryEmbedding), tenantId, topK]
  );
  return results.rows;
}
```

The killer advantage: tenant filtering happens in the same query as vector search. No need to coordinate between a vector database and your application database. Multi-tenancy, access control, and vector search in a single SQL query.

pgvector Limitations

  1. No built-in hybrid search. You need a separate full-text search (PostgreSQL tsvector) and manual result merging.
  2. Memory pressure. HNSW indexes live in shared memory. At 10M+ vectors, the index competes with your application queries for RAM.
  3. No horizontal scaling. pgvector runs on a single PostgreSQL instance. You can't shard across nodes without a managed PostgreSQL service like Neon or Citus.
  4. Recall degrades at scale. Above 5M vectors, HNSW recall drops below 95% without tuning ef_search (which increases latency).
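The hybrid-search gap is bridgeable by hand: run a tsvector query and a vector query separately, then merge the two ranked lists. A minimal sketch of Reciprocal Rank Fusion (RRF), the usual merge strategy; `Ranked` and `rrfMerge` are illustrative names, not a pgvector API:

```typescript
interface Ranked {
  id: string;
  score?: number;
}

// RRF: score(d) = sum over result lists of 1 / (k + rank(d)).
// k = 60 is the conventional constant; it dampens the weight of top ranks.
function rrfMerge(vectorHits: Ranked[], textHits: Ranked[], k = 60): Ranked[] {
  const scores = new Map<string, number>();
  for (const [rank, hit] of vectorHits.entries()) {
    scores.set(hit.id, (scores.get(hit.id) ?? 0) + 1 / (k + rank + 1));
  }
  for (const [rank, hit] of textHits.entries()) {
    scores.set(hit.id, (scores.get(hit.id) ?? 0) + 1 / (k + rank + 1));
  }
  // Documents appearing high in both lists accumulate the largest scores
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}
```

RRF only needs ranks, not comparable scores, which is exactly why it works for merging cosine similarity with BM25-style text relevance.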

Option 2: Dedicated Vector Databases

When pgvector's limitations become real constraints... not hypothetical ones... consider a dedicated vector database.

Pinecone (Managed, Zero Ops)

| Attribute | Details |
| --- | --- |
| Hosting | Fully managed (serverless or dedicated pods) |
| Max vectors | Billions (serverless), limited by pod size (pods) |
| Query latency | 10-50ms (serverless), 5-20ms (pods) |
| Pricing | Read Units model: $16-24/M RUs + $0.33/GB storage/month ($50/mo min) |
| Metadata filtering | Yes, pre-filter before search |
| Hybrid search | Yes (sparse-dense vectors) |

Best for: Teams that want zero infrastructure management and are willing to pay a premium for it. Pinecone's serverless tier handles bursty workloads without capacity planning.

Pricing note: Pinecone migrated to a Read Unit (RU) model. A simple query against 1,000 vectors costs 1 RU. Metadata filtering increases cost to 5-10 RUs per query. At $16/M RUs (Standard plan), a simple 50K queries/day workload costs ~$24/month for reads alone. The old per-query pricing is no longer accurate... check Pinecone's cost calculator for your specific workload.
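The arithmetic behind that estimate, as a back-of-envelope sketch (the constants mirror the figures quoted in this post; treat Pinecone's own calculator as the source of truth):

```typescript
// Monthly read cost under the Read Unit model, assuming a 30-day month
// and the Standard-plan rate of $16 per million RUs quoted above.
function monthlyReadCostUSD(
  queriesPerDay: number,
  rusPerQuery: number,
  usdPerMillionRUs: number = 16
): number {
  const rusPerMonth = queriesPerDay * 30 * rusPerQuery;
  return (rusPerMonth / 1_000_000) * usdPerMillionRUs;
}

// 50K simple queries/day (1 RU each) ≈ $24/month for reads alone; the same
// workload with heavy metadata filtering at 5 RUs/query ≈ $120/month.
```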

```typescript
import { Pinecone } from "@pinecone-database/pinecone";

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pinecone.index("documents");

async function search(embedding: number[], tenantId: string) {
  const results = await index.namespace(tenantId).query({
    vector: embedding,
    topK: 10,
    includeMetadata: true,
    filter: { status: { $eq: "published" } },
  });
  return results.matches;
}
```

Pinecone's namespace feature maps cleanly to multi-tenant SaaS: one namespace per tenant, isolated vector spaces, no cross-tenant leakage.

Qdrant (Self-Hosted, High Performance)

| Attribute | Details |
| --- | --- |
| Hosting | Self-hosted (Docker/K8s) or Qdrant Cloud |
| Max vectors | Billions (with disk-based storage) |
| Query latency | 3-15ms (memory), 10-50ms (disk) |
| Pricing | Open source (self-hosted), or usage-based (Cloud, from $0.014/hr) |
| Metadata filtering | Advanced (nested, array, geo, full-text) |
| Hybrid search | Yes (sparse vectors + payload filtering) |

Best for: Teams that want control over their infrastructure and need advanced filtering capabilities. Qdrant's filtering is the most powerful among dedicated vector databases.
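To make the filtering claim concrete, here is a sketch of the kind of filter object a Qdrant search request accepts ... pure object construction, passed as the `filter` field of a query. The payload keys (`tenant_id`, `tags`) are illustrative assumptions, and the types below are simplified from Qdrant's full filter schema:

```typescript
interface QdrantCondition {
  key: string;
  match: { value?: string; any?: string[] };
}

interface QdrantFilter {
  must: QdrantCondition[];
}

// `must` clauses AND together; Qdrant also supports should (OR), must_not,
// nested keys, geo and full-text conditions -- depth pgvector's JSONB
// filtering can't index as efficiently.
function tenantFilter(tenantId: string, tags: string[]): QdrantFilter {
  return {
    must: [
      { key: "tenant_id", match: { value: tenantId } },
      { key: "tags", match: { any: tags } }, // MatchAny: any overlapping tag qualifies
    ],
  };
}
```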

Weaviate (Hybrid Search Native)

| Attribute | Details |
| --- | --- |
| Hosting | Self-hosted (Docker/K8s) or Weaviate Cloud |
| Max vectors | Billions |
| Query latency | 5-25ms |
| Pricing | Open source (self-hosted), or $25/1M vector-dims/month (Cloud Flex) |
| Metadata filtering | GraphQL-based, object-oriented |
| Hybrid search | Built-in BM25 + vector, single query |

Best for: Teams that need hybrid search (BM25 + vector) as a first-class feature without building the merge logic themselves.


Cost Comparison

For a SaaS with 1M vectors (1536-dim) and 50K queries/day:

| Solution | Monthly Cost | Includes |
| --- | --- | --- |
| pgvector (existing DB) | $0 incremental | Runs on existing PostgreSQL |
| pgvector (dedicated instance) | $100-200 | db.m7g.large on RDS |
| Pinecone Serverless | ~$75-150 | Read Units + $0.33/GB storage ($50/mo minimum) |
| Qdrant Cloud | ~$100-200 | Usage-based, depends on cluster configuration |
| Qdrant (self-hosted) | ~$40 | t3.medium instance |
| Weaviate Cloud (Flex) | ~$40-100 | $25/1M vector-dims/month + storage |
| Turbopuffer | ~$20-60 | Usage-based, S3-backed storage |

pgvector on your existing database costs literally nothing extra. This is why it's the default recommendation.

Worth watching: Turbopuffer and LanceDB emerged as serious contenders in 2025. Turbopuffer (used by Cursor, Notion, and Linear) handles 2.5T+ documents with an S3-backed architecture that separates compute from storage. LanceDB offers multimodal search (text, images, video) with a disk-native design. Both are production-ready and worth evaluating if you're starting fresh.


The Migration Path

Start with pgvector. If and when you hit its limits, migrate to a dedicated vector database. The migration is straightforward because the data model is simple: vectors + metadata.

```typescript
// Migration script: pgvector → Qdrant
async function migrateToQdrant() {
  const batchSize = 1000;
  let offset = 0;

  while (true) {
    const rows = await db.query(
      `
      SELECT id, content, embedding, metadata
      FROM document_embeddings
      ORDER BY id
      LIMIT $1 OFFSET $2
      `,
      [batchSize, offset]
    );
    if (rows.rowCount === 0) break;

    await qdrantClient.upsert("documents", {
      points: rows.rows.map((row) => ({
        id: row.id,
        // pgvector returns the vector as text ("[0.1,0.2,...]"); parse it
        // into the number[] Qdrant expects
        vector: JSON.parse(row.embedding),
        payload: {
          content: row.content,
          ...row.metadata,
        },
      })),
    });

    offset += batchSize;
    console.log(`Migrated ${offset} vectors`);
  }
}
```

The migration is a one-time data copy. Your application code changes are minimal... swap the search function implementation, keep the interface the same.
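One way to keep that swap cheap is to put a small interface in front of the engine from day one, so callers never see which database answers the query. `SearchProvider` and the class name below are illustrative, not an established library pattern; a fixed-response stub stands in for the pgvector- or Qdrant-backed implementation:

```typescript
interface SearchHit {
  id: string;
  similarity: number;
}

interface SearchProvider {
  search(embedding: number[], topK: number): Promise<SearchHit[]>;
}

// Before migration this would be a pgvector-backed class; after, a
// Qdrant-backed one. Only the wiring changes, never the call sites.
class StubProvider implements SearchProvider {
  async search(_embedding: number[], topK: number): Promise<SearchHit[]> {
    const all: SearchHit[] = [
      { id: "a", similarity: 0.91 },
      { id: "b", similarity: 0.88 },
      { id: "c", similarity: 0.4 },
    ];
    return all.slice(0, topK);
  }
}
```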


Decision Framework

```
Do you have > 5M vectors?
├── No  → Use pgvector
└── Yes → Do you need < 10ms latency?
    ├── No  → pgvector with tuned HNSW (ef_search=200+)
    └── Yes → Do you want zero ops?
        ├── Yes → Pinecone
        └── No  → Do you need advanced filtering?
            ├── Yes → Qdrant
            └── No  → Do you need native hybrid search?
                ├── Yes → Weaviate
                └── No  → Qdrant (best price/performance)
```

When to Apply This

  • You're building an AI feature that needs semantic search over your product's data
  • You have a knowledge base, documentation set, or customer data that users need to query
  • You need to store and search document embeddings for RAG, recommendation, or similarity features

When NOT to Apply This

  • You need exact keyword search only... use PostgreSQL full-text search or Elasticsearch
  • Your "AI feature" is a single prompt with no retrieval... you don't need a vector database
  • Your dataset fits in memory as a flat list... brute force search is fast enough under 10K vectors
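The last point deserves emphasis: under ~10K vectors, an exact scan over an in-memory array is fast enough that you need no index at all, let alone a database. A minimal sketch; `cosineSimilarity` and `bruteForceSearch` are illustrative names, not a library API:

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact (100% recall) nearest-neighbor search: score every document,
// sort, take the top K. O(n · dims) per query -- trivial at this scale.
function bruteForceSearch(
  query: number[],
  docs: { id: string; embedding: number[] }[],
  topK: number
): { id: string; similarity: number }[] {
  return docs
    .map((d) => ({ id: d.id, similarity: cosineSimilarity(query, d.embedding) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}
```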

Choosing a vector database for your SaaS AI feature? I help teams make this decision based on their actual data scale and query requirements... not vendor marketing.


Continue Reading

This post is part of the AI-Assisted Development Guide ... covering AI integration patterns, cost optimization, and building features users want.


Get insights like this weekly

Join The Architect's Brief — one actionable insight every Tuesday.

Need help with AI-assisted development?

Let's talk strategy