Semantic Search Made Simple: Storing OpenAI Embeddings in PostgreSQL with PGVector
Semantic search is the backbone of modern AI‑driven applications, turning raw text into meaningful vectors that capture context and intent. By combining OpenAI’s powerful embeddings with PostgreSQL’s reliability and the vector extension PGVector, you can build a fast, scalable relevance ranking system without leaving your existing SQL stack. In this guide we walk through every step, from setting up the environment to executing production‑ready similarity queries.
1. Why PostgreSQL + PGVector?
- Familiar SQL interface: Your developers already write queries; no new language to learn.
- Transactional guarantees: ACID compliance ensures data consistency even when mixing text and vectors.
- Rich ecosystem: Extensions for full‑text search, GIS, JSONB, and more can coexist with vectors.
- Cost‑effective scaling: Use existing PostgreSQL clusters or managed services like Amazon RDS, Azure Database for PostgreSQL, or Google Cloud SQL.
2. Prerequisites
Before you begin, make sure you have:
- Python 3.10+ (for the OpenAI SDK)
- PostgreSQL 14+ with the pgvector extension installed.
- An OpenAI API key.
- Basic knowledge of SQL and Python.
3. Setting Up the PostgreSQL Extension
First, enable pgvector in your database:
CREATE EXTENSION IF NOT EXISTS vector;
Next, create a table that will hold your documents and their embeddings. The vector column is declared with the dimensionality of the embeddings you will store (OpenAI’s text-embedding-ada-002 produces 1536‑dimensional vectors).
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    embedding VECTOR(1536) NOT NULL
);
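PGVector accepts vector values as bracketed text literals such as '[0.1,0.2,0.3]', which is handy when your database driver has no native adapter for the vector type. A minimal sketch of a helper for building such literals (the name to_vector_literal is ours, not part of pgvector):

```python
def to_vector_literal(values):
    """Format a sequence of floats as a pgvector text literal, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

# A 1536-dimensional embedding becomes one long literal string.
print(to_vector_literal([0.1, 0.2, 0.3]))  # [0.1,0.2,0.3]
```

Alternatively, the pgvector Python package ships a register_vector(conn) adapter for psycopg2 that lets you pass plain Python lists directly.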
4. Generating Embeddings with OpenAI
Using the openai Python package, you can fetch embeddings in bulk. Below is a concise script that reads a CSV of articles, generates embeddings, and inserts them into PostgreSQL.
import csv

import openai
import psycopg2

openai.api_key = "YOUR_OPENAI_API_KEY"

conn = psycopg2.connect(
    dbname="your_db",
    user="your_user",
    password="your_pass",
    host="localhost",
    port=5432
)
cur = conn.cursor()

with open('articles.csv', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        content = row['content']
        embedding = openai.Embedding.create(
            input=content,
            model="text-embedding-ada-002"
        )['data'][0]['embedding']
        # psycopg2 has no native adapter for the vector type, so pass the
        # embedding as a pgvector text literal such as '[0.1,0.2,...]'.
        vector_literal = '[' + ','.join(str(x) for x in embedding) + ']'
        cur.execute(
            "INSERT INTO documents (title, content, embedding) VALUES (%s, %s, %s)",
            (row['title'], content, vector_literal)
        )

conn.commit()
cur.close()
conn.close()
Tip: To keep costs down, batch requests (e.g., 100 at a time) and throttle the API if you hit rate limits.
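The embeddings endpoint accepts a list of inputs, so the batching tip above can be sketched as a chunking helper plus one API call per batch. This is a sketch under this guide's assumptions (openai SDK < 1.0, ada-002 model); the helper names chunked and embed_batch are ours:

```python
import time

def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def embed_batch(texts, model="text-embedding-ada-002", retries=3):
    """Embed a batch of texts, backing off briefly on rate-limit errors."""
    import openai  # deferred import so the pure helpers above need no SDK
    for attempt in range(retries):
        try:
            resp = openai.Embedding.create(input=texts, model=model)
            # Results come back in the same order as the inputs.
            return [item["embedding"] for item in resp["data"]]
        except openai.error.RateLimitError:
            time.sleep(2 ** attempt)
    raise RuntimeError("embedding request kept hitting the rate limit")

# Usage sketch: one request per 100 rows instead of one per row.
# for batch in chunked(all_contents, 100):
#     vectors = embed_batch(batch)
```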
5. Indexing for Speed
PGVector supports several index types, but the most common for semantic search is the ivfflat index, which balances speed and accuracy.
CREATE INDEX idx_documents_embedding ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
Key points:
- vector_cosine_ops builds the index for cosine distance, the de‑facto standard for semantic relevance.
- The lists parameter controls how many clusters the IVF (inverted file) index partitions the vectors into; 100–200 is a good starting point for 10k–100k rows.
- Create the index after loading your data: ivfflat derives its clusters from the rows present at build time, so an index built on an empty table will recall poorly. If the data distribution shifts significantly, drop and re‑create the index.
- At query time, raise SET ivfflat.probes = 10; (the default is 1) to trade a little speed for better recall.
6. Querying with Cosine Similarity
Once indexed, similarity queries become trivial. Suppose you want the top 5 documents most relevant to a user query. You first generate an embedding for the query and then compare it against the stored vectors.
import openai
import psycopg2

openai.api_key = "YOUR_OPENAI_API_KEY"

query_text = "What are the latest trends in AI ethics?"
query_embedding = openai.Embedding.create(
    input=query_text,
    model="text-embedding-ada-002"
)['data'][0]['embedding']
vector_literal = '[' + ','.join(str(x) for x in query_embedding) + ']'

conn = psycopg2.connect(...)
cur = conn.cursor()
cur.execute("""
    SELECT id, title, content,
           1 - (embedding <=> %s) AS similarity
    FROM documents
    ORDER BY embedding <=> %s
    LIMIT 5;
""", (vector_literal, vector_literal))

results = cur.fetchall()
for r in results:
    print(f"{r[1]} (score: {r[3]:.4f})")

cur.close()
conn.close()
Here, <=> is PGVector’s cosine distance operator; subtracting the distance from 1 turns it into a similarity score (higher is better). Note that the ORDER BY sorts on the raw distance expression, ascending, so the closest vectors come first and the IVF index can be used; ordering by the inverted similarity alias would both bypass the index and return the least similar rows first.
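As a sanity check on the scoring, here is the same arithmetic in pure Python: cosine distance is 1 minus cosine similarity, so 1 - (a <=> b) recovers the similarity. The vectors below are toy 3‑dimensional examples, not real embeddings:

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

a = [1.0, 0.0, 1.0]
b = [1.0, 1.0, 0.0]
sim = cosine_similarity(a, b)   # 0.5 for these vectors
dist = 1 - sim                  # what PGVector's <=> operator returns
print(round(1 - dist, 4))       # subtracting from 1 recovers the similarity: 0.5
```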
7. Real‑World Use Cases
- Document Search: Replace keyword matching with semantic relevance in knowledge bases, FAQs, or research libraries.
- Chatbot Context: Retrieve the most contextually relevant snippets to feed into a conversational AI.
- Product Recommendation: Match user reviews or product descriptions to recommend similar items.
- Data Deduplication: Find near‑duplicate entries across millions of records.
8. Scaling Tips
- Use PostgreSQL read replicas: Offload search queries to replicas while writes hit the primary.
- Batch embedding updates: Re‑embed documents during off‑peak hours to reduce API costs and latency.
- Tune the lists parameter: Higher values yield better accuracy at the expense of index build time and memory.
- Leverage table partitioning: Partition the documents table by topic or region if your data grows extremely large.
9. Common Pitfalls & Fixes
- Out‑of‑Memory Errors: When inserting millions of vectors, stream inserts and commit in batches.
- Slow Index Build: For huge datasets, temporarily disable foreign keys and triggers during index creation.
- Embedding Drift: If your OpenAI model updates, re‑embed all documents and rebuild the index to maintain consistency.
- Incorrect Distance Metric: Remember that vector_cosine_ops pairs with the <=> operator; using a different operator (<-> for L2 distance, <#> for inner product) against a cosine index yields meaningless scores.
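For the out‑of‑memory pitfall above, here is a minimal sketch of streaming inserts with per‑batch commits via psycopg2 (table layout as in this guide; the batch size of 1000 and the helper names are our own choices):

```python
def batches(rows, size=1000):
    """Group an iterable of rows into lists of at most `size` items."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def insert_documents(conn, rows, size=1000):
    """Insert (title, content, embedding_literal) tuples, committing per batch."""
    from psycopg2.extras import execute_values  # deferred: batches() needs no driver
    with conn.cursor() as cur:
        for batch in batches(rows, size):
            execute_values(
                cur,
                "INSERT INTO documents (title, content, embedding) VALUES %s",
                batch,
            )
            conn.commit()  # bounded memory: each batch is flushed before the next
```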
10. Final Thoughts
By marrying OpenAI embeddings with PostgreSQL’s robust storage and the lightning‑fast vector queries offered by PGVector, you can deliver semantic search at scale without reinventing your database stack. The learning curve is shallow, the integration is painless, and the performance is hard to beat—especially when you need transactional guarantees and tight control over data.
Ready to add semantic search to your next project? Start by installing PGVector, generating a few embeddings, and watching your search relevance soar.
