Integration: Supabase
Use Supabase as a Document Store for Haystack: pgvector for embedding search, PGroonga for full-text BM25 search, and Supabase Storage for file downloads
Overview
Supabase is an open-source Postgres platform. The supabase-haystack package provides three sets of components for building Haystack pipelines:
- pgvector: dense embedding and keyword retrieval via the pgvector extension (pre-installed on Supabase).
- PGroonga: full-text BM25 search via the pgroonga extension (no embeddings required).
- Supabase Storage: download files from a Supabase Storage bucket into ByteStream objects ready for indexing.
The pgvector components are a thin wrapper around
pgvector-haystack, inheriting all of its functionality: three vector similarity functions (cosine_similarity, inner_product, l2_distance), exact or HNSW search, metadata filtering, and keyword retrieval via PostgreSQL’s ts_rank_cd. The two Supabase-specific defaults are that the connection string is read from SUPABASE_DB_URL and that create_extension is False (Supabase enables pgvector for you).
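To make the three similarity functions concrete, here is a minimal pure-Python sketch of their textbook definitions. It is an illustration only: the helper names mirror the vector_function values above, but this is not the SQL that the document store issues, and the sample vectors are made up.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products; higher means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Inner product of the two vectors after length normalization; higher means more similar."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance; lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical 2-dimensional embeddings, chosen for easy mental arithmetic.
query = [1.0, 0.0]
doc_same_direction = [2.0, 0.0]
doc_orthogonal = [0.0, 3.0]

print(cosine_similarity(query, doc_same_direction))  # same direction, different length
print(inner_product(query, doc_orthogonal))          # orthogonal vectors
print(l2_distance(query, doc_same_direction))
```

Note that cosine similarity ignores vector length while inner product and L2 distance do not, which is why the choice of function should match how the embedding model was trained.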
Installation
pip install supabase-haystack
For the pgvector components, set the database connection string:
export SUPABASE_DB_URL="postgresql://postgres.[project-ref]:[password]@aws-0-[region].pooler.supabase.com:5432/postgres"
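Passwords that contain characters such as @ or / need to be percent-encoded before they go into a connection URL. The following sketch assembles a string in the shape of the template above; build_supabase_db_url is a hypothetical helper, not part of supabase-haystack, and the host pattern is taken from the example, so the actual host for a given project may differ.

```python
from urllib.parse import quote


def build_supabase_db_url(project_ref, password, region, port=5432, database="postgres"):
    """Hypothetical helper: fill in the connection-string template shown above."""
    user = f"postgres.{project_ref}"
    # Percent-encode every reserved character so the password cannot break URL parsing.
    safe_password = quote(password, safe="")
    # Host pattern copied from the template; treat it as an assumption.
    host = f"aws-0-{region}.pooler.supabase.com"
    return f"postgresql://{user}:{safe_password}@{host}:{port}/{database}"


# Placeholder values for illustration only.
print(build_supabase_db_url("abc123", "p@ss/word", "eu-central-1"))
```

The resulting string is what would be exported as SUPABASE_DB_URL.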
For the PGroonga and Storage components, set the service role key (the project URL is passed to each component as supabase_url, as in the examples below):
export SUPABASE_SERVICE_KEY="<your-service-role-key>"
For local development, the
docker-compose.yml in the repo spins up a pgvector Postgres on localhost:5432.
Usage
pgvector Components
These components use Supabase Postgres with the pgvector extension for embedding-based and keyword retrieval.
- SupabasePgvectorDocumentStore: stores Haystack Document objects (content, embedding, metadata, optional blob) in a Postgres table, and handles writes, filtering, and both sync and async retrieval.
- SupabasePgvectorEmbeddingRetriever: dense Retriever that compares a query embedding against stored embeddings using the configured vector_function (cosine_similarity, inner_product, or l2_distance).
- SupabasePgvectorKeywordRetriever: keyword Retriever that scores documents with PostgreSQL’s ts_rank_cd, considering term frequency, proximity, and section weight.
Indexing
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore
# all-MiniLM-L6-v2 produces 384-dimensional embeddings, so embedding_dimension must match.
document_store = SupabasePgvectorDocumentStore(
    table_name="haystack_documents",
    embedding_dimension=384,
    vector_function="cosine_similarity",
    recreate_table=True,  # recreates the table if it already exists
    search_strategy="hnsw",
)
documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(content="Elephants have been observed to recognize themselves in mirrors."),
    Document(content="Bioluminescent waves can be seen in the Maldives and Puerto Rico."),
]
# Embed the documents, then write them to the store.
indexing = Pipeline()
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"))
indexing.add_component("writer", DocumentWriter(
    document_store=document_store, policy=DuplicatePolicy.OVERWRITE))
indexing.connect("embedder", "writer")
indexing.run({"embedder": {"documents": documents}})
Retrieval
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack_integrations.components.retrievers.supabase import SupabasePgvectorEmbeddingRetriever
# Reuses the document_store created in the indexing example.
querying = Pipeline()
querying.add_component("text_embedder", SentenceTransformersTextEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"))
querying.add_component("retriever",
    SupabasePgvectorEmbeddingRetriever(document_store=document_store, top_k=3))
querying.connect("text_embedder.embedding", "retriever.query_embedding")
result = querying.run({"text_embedder": {"text": "How many languages are there?"}})
for doc in result["retriever"]["documents"]:
    print(doc.score, "-", doc.content)
For keyword or hybrid (dense + keyword) retrieval, swap in or combine SupabasePgvectorKeywordRetriever. It takes a query string directly and can be joined with the embedding retriever via DocumentJoiner using reciprocal rank fusion.
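As a rough illustration of what reciprocal rank fusion does when two result lists are merged, here is a minimal pure-Python sketch. The function name, the document IDs, and the constant k=60 (a commonly used default) are assumptions for illustration; Haystack's DocumentJoiner may differ in details such as weighting and score scaling.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several rankings: each document earns 1 / (k + rank) per list it appears in."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings from an embedding retriever and a keyword retriever.
dense_ranking = ["doc_languages", "doc_elephants", "doc_waves"]
keyword_ranking = ["doc_languages", "doc_waves"]

fused = reciprocal_rank_fusion([dense_ranking, keyword_ranking])
for doc_id, score in fused:
    print(f"{score:.4f}", doc_id)
```

Because only ranks are used, the fusion does not depend on the two retrievers producing scores on a comparable scale, which is what makes it a convenient choice for combining ts_rank_cd scores with vector similarities.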
PGroonga Components
These components use the PGroonga PostgreSQL extension for fast, multilingual full-text BM25 search. No embeddings are required: retrieval works on plain text queries.
Prerequisites: enable the PGroonga extension in your Supabase project:
CREATE EXTENSION IF NOT EXISTS pgroonga;
- SupabaseGroongaDocumentStore: stores Document objects in a Postgres table with a PGroonga index on the content column. Supports both sync and async operations. Authenticates via SUPABASE_SERVICE_KEY and a project URL rather than a raw connection string.
- SupabaseGroongaBM25Retriever: full-text Retriever backed by SupabaseGroongaDocumentStore. Accepts a plain text query and returns ranked documents using PGroonga BM25 scoring. Supports both run() (sync) and run_async() (async).
Indexing
from haystack import Document
from haystack.document_stores.types import DuplicatePolicy
from haystack.utils import Secret
from haystack_integrations.document_stores.supabase import SupabaseGroongaDocumentStore
document_store = SupabaseGroongaDocumentStore(
    supabase_url="https://<project-ref>.supabase.co",
    supabase_key=Secret.from_env_var("SUPABASE_SERVICE_KEY"),
    table_name="haystack_fts_documents",
    recreate_table=True,
)
document_store.warm_up()
documents = [
    Document(content="There are over 7,000 languages spoken around the world today."),
    Document(content="Elephants have been observed to recognize themselves in mirrors."),
    Document(content="Bioluminescent waves can be seen in the Maldives and Puerto Rico."),
]
document_store.write_documents(documents, policy=DuplicatePolicy.OVERWRITE)
Retrieval
from haystack_integrations.components.retrievers.supabase import SupabaseGroongaBM25Retriever
retriever = SupabaseGroongaBM25Retriever(document_store=document_store, top_k=3)
result = retriever.run(query="languages spoken around the world")
for doc in result["documents"]:
    print(doc.score, "-", doc.content)
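For readers unfamiliar with BM25, the sketch below scores the three sample documents with the classic Okapi BM25 formula in plain Python. It is only meant to convey the idea of term-frequency and document-length weighting: the tokenizer is a naive whitespace split, the parameters k1 and b are common defaults, and PGroonga's actual scoring inside Postgres is not described in this document and may differ.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with textbook Okapi BM25 (non-empty corpus assumed)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Number of documents that contain each term at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Long documents are penalized relative to the average length.
            length_norm = k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1.0) / (tf + length_norm)
        scores.append(score)
    return scores


sample_documents = [
    "There are over 7,000 languages spoken around the world today.",
    "Elephants have been observed to recognize themselves in mirrors.",
    "Bioluminescent waves can be seen in the Maldives and Puerto Rico.",
]
scores = bm25_scores("languages spoken", sample_documents)
best = max(range(len(scores)), key=scores.__getitem__)
print(sample_documents[best])
```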
Supabase Storage
SupabaseBucketDownloader: downloads files from a Supabase Storage bucket and returns them as ByteStream objects. Each stream carries meta["file_path"] and meta["bucket_name"]. Supports optional extension filtering (e.g. [".pdf", ".txt"]). Designed to feed directly into document converters in indexing pipelines.
from haystack import Pipeline
from haystack.components.converters import PyPDFToDocument
from haystack.components.writers import DocumentWriter
from haystack.utils import Secret
from haystack_integrations.components.downloaders.supabase import SupabaseBucketDownloader
from haystack_integrations.document_stores.supabase import SupabasePgvectorDocumentStore
document_store = SupabasePgvectorDocumentStore(
    table_name="haystack_documents",
    embedding_dimension=384,
)
indexing = Pipeline()
indexing.add_component("downloader", SupabaseBucketDownloader(
    supabase_url="https://<project-ref>.supabase.co",
    supabase_key=Secret.from_env_var("SUPABASE_SERVICE_KEY"),
    bucket_name="my-documents",
    file_extensions=[".pdf"],
))
indexing.add_component("converter", PyPDFToDocument())
indexing.add_component("writer", DocumentWriter(document_store=document_store))
indexing.connect("downloader.streams", "converter.sources")
indexing.connect("converter.documents", "writer.documents")
indexing.run({"downloader": {"sources": ["reports/q1.pdf", "reports/q2.pdf"]}})
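The extension filtering mentioned above can be pictured with a small standalone sketch. filter_by_extension is a hypothetical helper written for illustration, not the downloader's own code, and the case-insensitive matching is an assumption rather than documented behavior.

```python
from pathlib import PurePosixPath


def filter_by_extension(paths, file_extensions=None):
    """Hypothetical helper: keep only paths whose suffix is in file_extensions."""
    if not file_extensions:
        # No filter configured: keep everything.
        return list(paths)
    allowed = {ext.lower() for ext in file_extensions}
    # Assumption: extensions are compared case-insensitively.
    return [p for p in paths if PurePosixPath(p).suffix.lower() in allowed]


# Made-up bucket paths for illustration.
candidates = ["reports/q1.pdf", "reports/notes.txt", "reports/Q2.PDF"]
print(filter_by_extension(candidates, [".pdf"]))
```

Filtering before conversion keeps files that a converter such as PyPDFToDocument cannot parse out of the pipeline.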
License
supabase-haystack is distributed under the terms of the
Apache-2.0 license.
