Building a Smarter Search with Qdrant, BGE-M3 All-in-One Embedding Model, and Hybrid Reranking
🔗Source code with README: https://github.com/yuniko-software/bge-m3-qdrant-sample
Why Hybrid Search with Reranking?
Combining the BGE-M3 all-in-one embedding model with Qdrant and hybrid reranking offers significant advantages over traditional methods:
- Combines strengths of multiple approaches: integrates semantic understanding from dense vectors, keyword precision from sparse vectors, and token-level matching from ColBERT vectors
- Outperforms traditional full-text search: goes beyond basic keyword matching to understand meaning while preserving the keyword matching benefits of full-text search
- Two-stage efficiency: uses fast methods for initial retrieval followed by precise reranking, optimizing both speed and accuracy
This approach delivers more relevant results for complex queries while maintaining the strengths of traditional search methods.

Who Would Benefit from This Implementation?
This solution is particularly valuable for organizations that search large document repositories (for example, as the retrieval step of a RAG architecture), e-commerce platforms that need nuanced product discovery, and content-heavy applications where users expect high-quality search.
Key Components of the System
BGE-M3 Embedding Model
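BGE-M3 is a state-of-the-art embedding model from BAAI. BGE stands for BAAI General Embedding, and M3 refers to the model's multi-functionality, multi-linguality, and multi-granularity. It can simultaneously generate dense, sparse, and multi-vector (ColBERT) representations for text.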
Qdrant Vector Database
Qdrant is a vector database designed for efficient similarity search. It supports:
- Multiple vector types in a single collection
- Hybrid search combining dense and sparse vectors
- Multi-vector search for token-level matching (ColBERT)
Building a Sample Product Search System
1. Setting Up the Environment
First, we need to set up Qdrant and install the necessary packages:
# Pull and run Qdrant Docker image
!docker run -d --name qdrant-db -p 6333:6333 -p 6334:6334 qdrant/qdrant:latest
# Install required packages
%pip install -U transformers FlagEmbedding accelerate
%pip install pandas
%pip install qdrant_client
2. Loading the Product Dataset
We'll use a CSV file containing product information:
import pandas as pd

def load_products(file_path='products.csv'):
    """Load the pipe-delimited product CSV as a DataFrame and a list of dicts"""
    products_df = pd.read_csv(file_path, sep='|')
    products_json = products_df.to_dict(orient='records')
    return products_df, products_json

products_df, products_json = load_products()
3. Initializing the BGE-M3 Model
from FlagEmbedding import BGEM3FlagModel

def initialize_model():
    """Initialize the BGE-M3 embedding model"""
    return BGEM3FlagModel('BAAI/bge-m3', use_fp16=True)

model = initialize_model()
4. Formatting Products for Embedding
def create_product_text(product):
    """Format product information for embedding"""
    return f"Product: {product['Name']}\nDescription: {product['Description']}"
5. Generating Embeddings
The BGE-M3 model generates three types of embeddings in a single pass:
def generate_embeddings(text, model):
    """Generate all three types of embeddings for a text"""
    return model.encode(
        [text],
        return_dense=True,
        return_sparse=True,
        return_colbert_vecs=True
    )
Let's explore each type of embedding:
Dense Vectors (1024 dimensions)
These vectors capture semantic meaning, allowing the system to understand that "running footwear" and "athletic shoes" are conceptually similar.
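Dense vectors are compared by cosine similarity, which is also the distance configured for the collection later on. Below is a minimal sketch of that comparison. The 4-dimensional vectors are made up for illustration and stand in for real 1024-dimensional embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)

# Made-up toy vectors, not real BGE-M3 output.
running_footwear = [0.9, 0.1, 0.3, 0.0]
athletic_shoes = [0.8, 0.2, 0.4, 0.1]
usb_cable = [0.0, 0.9, 0.0, 0.7]

similar = cosine_similarity(running_footwear, athletic_shoes)
unrelated = cosine_similarity(running_footwear, usb_cable)
print(f"related texts:   {similar:.3f}")
print(f"unrelated texts: {unrelated:.3f}")
```

Texts with related meaning point in nearly the same direction and score close to 1, while unrelated texts score close to 0.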
Sparse Vectors (Lexical Weights)
These vectors maintain the connection to the original tokens, helping with exact keyword matching. For example, the token "Kinvara" in "Saucony Men's Kinvara 13 Running Shoe" gets a high weight.
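A sparse match can be pictured as a dot product restricted to the tokens that the query and the document share. The sketch below uses hypothetical weights keyed by readable tokens to keep the example legible. Real lexical weights are keyed by tokenizer ids, and the numbers here are invented.

```python
def lexical_score(query_weights, doc_weights):
    """Sum of weight products over tokens present in both texts."""
    return sum(
        weight * doc_weights[token]
        for token, weight in query_weights.items()
        if token in doc_weights
    )

# Hypothetical weights for illustration only.
query = {"kinvara": 0.31, "shoe": 0.18}
saucony = {"saucony": 0.27, "kinvara": 0.33, "running": 0.20, "shoe": 0.17}
adidas = {"adidas": 0.29, "racer": 0.25, "running": 0.21, "shoe": 0.16}

saucony_score = lexical_score(query, saucony)
adidas_score = lexical_score(query, adidas)
print(f"Saucony Kinvara: {saucony_score:.4f}")
print(f"adidas Racer:    {adidas_score:.4f}")
```

Only the Saucony product contains the rare, highly weighted token, so it scores several times higher than a product that shares just the generic word "shoe".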
ColBERT Vectors (Token-level embeddings)
These vectors represent each token in the text, allowing for fine-grained matching between query tokens and product description tokens.
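Token-level matching is scored with MaxSim, the comparator configured for the ColBERT vectors in the collection below: each query token is paired with its best-matching document token, and those best similarities are summed. A minimal sketch with made-up 2-dimensional token vectors:

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def max_sim(query_vectors, doc_vectors):
    """Late-interaction score: best document match per query token, summed."""
    return sum(
        max(dot(q, d) for d in doc_vectors)
        for q in query_vectors
    )

# Toy token embeddings, one inner list per token (not real model output).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
doc_b = [[0.7, 0.3], [0.6, 0.4]]

score_a = max_sim(query_tokens, doc_a)
score_b = max_sim(query_tokens, doc_b)
print(f"doc_a: {score_a:.2f}")
print(f"doc_b: {score_b:.2f}")
```

Because the score is a sum over query tokens, it is not bounded by 1, which is why reranked scores greater than 1 appear in search results.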
6. Creating the Qdrant Collection
We'll create a collection in Qdrant that can store and efficiently search all three vector types:
from qdrant_client import QdrantClient, models

def create_qdrant_collection(collection_name="products"):
    """Create a collection with named dense, ColBERT, and sparse vectors"""
    client = QdrantClient("localhost", port=6333)
    client.create_collection(
        collection_name=collection_name,
        vectors_config={
            # One 1024-dimensional vector per product
            "dense": models.VectorParams(
                size=1024,
                distance=models.Distance.COSINE
            ),
            # One 1024-dimensional vector per token, compared with MaxSim
            "colbert": models.VectorParams(
                size=1024,
                distance=models.Distance.COSINE,
                multivector_config=models.MultiVectorConfig(
                    comparator=models.MultiVectorComparator.MAX_SIM
                ),
            )
        },
        sparse_vectors_config={
            "sparse": models.SparseVectorParams(
                index=models.SparseIndexParams(
                    on_disk=True
                )
            )
        },
    )
    return client
7. Indexing Products in Qdrant
We'll generate embeddings for all products and insert them into the Qdrant collection:
def insert_products_to_qdrant(client, product_embeddings, collection_name="products"):
    """Upsert each product with its dense, ColBERT, and sparse vectors"""
    for embedding in product_embeddings:
        product = embedding["product"]
        dense_vector = embedding["dense_vector"]
        colbert_vectors = embedding["colbert_vectors"]
        sparse_data = embedding["sparse_weights"]
        # Convert sparse weights to Qdrant format
        qdrant_sparse = create_sparse_vector(sparse_data)
        # Insert into Qdrant
        client.upsert(
            collection_name=collection_name,
            points=[
                models.PointStruct(
                    id=product["Id"],
                    payload=product,
                    vector={
                        "dense": dense_vector,
                        "colbert": colbert_vectors,
                        "sparse": qdrant_sparse
                    }
                )
            ]
        )
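The `create_sparse_vector` helper called above is not listed here. Qdrant represents a sparse vector as two parallel lists, integer indices and float values, so the conversion mostly amounts to splitting a dictionary. The sketch below shows one way this might look, assuming the lexical weights arrive as a dict that maps token-id strings to weights. The function name `to_indices_and_values` is hypothetical, and the sample weights are invented.

```python
def to_indices_and_values(lexical_weights):
    """Split a {token_id: weight} mapping into parallel index and value lists."""
    indices = []
    values = []
    for token_id, weight in lexical_weights.items():
        weight = float(weight)
        if weight > 0.0:  # zero weights contribute nothing to a dot product
            indices.append(int(token_id))
            values.append(weight)
    return indices, values

# Hypothetical lexical weights for a short product title.
sample_weights = {"1204": 0.25, "87": 0.0, "5312": 0.31}
indices, values = to_indices_and_values(sample_weights)
print(indices, values)

# With qdrant-client installed, the two lists could be wrapped as:
#   models.SparseVector(indices=indices, values=values)
```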
8. Implementing the Search Function
The real power of this system comes from the hybrid search approach:
def search_products(client, model, search_query, limit=3, prefetch_limit=6, collection_name="products"):
    # Generate embeddings for the query
    query_outputs = model.encode(
        [search_query],
        return_dense=True,
        return_sparse=True,
        return_colbert_vecs=True
    )
    dense_vec = query_outputs["dense_vecs"][0]
    sparse_vec = query_outputs["lexical_weights"][0]
    colbert_vec = query_outputs["colbert_vecs"][0]
    # Convert sparse vector to Qdrant format
    qdrant_sparse = create_sparse_vector(sparse_vec)
    # Set up prefetch for hybrid search
    prefetch = [
        models.Prefetch(
            query=qdrant_sparse,
            using="sparse",
            limit=prefetch_limit),
        models.Prefetch(
            query=dense_vec,
            using="dense",
            limit=prefetch_limit)
    ]
    # Perform reranking with ColBERT
    results = client.query_points(
        collection_name,
        prefetch=prefetch,
        query=colbert_vec,
        using="colbert",
        with_payload=True,
        limit=limit,
    )
    return results
This search function works in three stages:
- Prefetching (pre-filtering) candidates with both the sparse and the dense vectors; each prefetch returns up to prefetch_limit points
- Reranking those candidates with ColBERT for precise token-level matching
- Returning the top limit products
The prefetch step is crucial because ColBERT reranking compares every query token against every document token, which would be prohibitively expensive to run on an entire collection. By first narrowing down candidates with the faster sparse and dense searches, we keep the query efficient while still applying token-level matching to the most promising results.
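The two-stage flow described above can be illustrated without a database. The sketch below is a toy simulation, not Qdrant's implementation: the first-stage scores are given as plain dictionaries, the token vectors are made up, and the function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def max_sim(query_vectors, doc_vectors):
    """Sum over query tokens of the best-matching document token."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)

def top_k(scores, k):
    """Ids of the k highest-scoring entries in a {doc_id: score} mapping."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

def two_stage_search(dense_scores, sparse_scores, query_tokens, doc_tokens,
                     prefetch_limit, limit):
    # Stage 1: cheap candidate generation, merged without duplicates.
    candidates = list(dict.fromkeys(
        top_k(dense_scores, prefetch_limit) + top_k(sparse_scores, prefetch_limit)
    ))
    # Stage 2: expensive MaxSim scoring, run on the candidates only.
    reranked = sorted(
        ((doc_id, max_sim(query_tokens, doc_tokens[doc_id])) for doc_id in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return candidates, reranked[:limit]

# Toy first-stage scores and token vectors for four documents.
dense_scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
sparse_scores = {"a": 0.2, "b": 0.1, "c": 0.7, "d": 0.05}
doc_tokens = {
    "a": [[0.9, 0.1], [0.2, 0.8]],
    "b": [[0.6, 0.4]],
    "c": [[1.0, 0.0], [0.0, 1.0]],
    "d": [[0.5, 0.5]],
}
query_tokens = [[1.0, 0.0], [0.0, 1.0]]

candidates, results = two_stage_search(
    dense_scores, sparse_scores, query_tokens, doc_tokens,
    prefetch_limit=2, limit=2,
)
print("candidates:", candidates)
print("reranked:", results)
```

Document "d" never reaches the reranker, which is the saving the candidate stage buys. Document "c" enters only through the sparse list yet ends up first after reranking, which shows why both retrievers feed the candidate set.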

Example 1: Running shoes for men
# Example 1: Running shoes for men
result = search_products(client, model, "running shoes for men")
display_search_results(result)
Found 3 matching products
----------------------------------------
1. adidas Men's Ultraboost Personal Best Running Shoe - Score: 3.76
Price: 626.1 USD
Men's shoes - low (non football).
2. adidas Men's Racer Tr21 Running Shoe - Score: 3.51
Price: 413.1 EUR
Everyday style with a running twist. These men's adidas sneakers have a Cloudfoam midsole for ste...
3. Under Armour Men's Charged Assert 9 Running Shoe - Score: 3.49
Price: 434.94 USD
These running shoes are built to help anyone go faster-Charged Cushioning® helps protect against ...
Example 2: Nintendo
# Example 2: Nintendo products
result = search_products(client, model, "nintendo")
display_search_results(result)
Found 3 matching products
----------------------------------------
1. Nintendo Switch OLED Model - Score: 2.41
Price: 349.99 USD
The Nintendo Switch OLED Model offers a vivid 7-inch OLED display and enhanced audio for portable...
2. NZND Case for At&t Motivate 3... - Score: 2.10
Price: 84.09 USD
3. Samsung 980 PRO 1TB NVMe M.2 SSD - Score: 1.97
Price: 149.99 USD
Example 3: Xbox gamepad
# Example 3: Xbox gamepad
result = search_products(client, model, "xbox gamepad")
display_search_results(result)
Found 3 matching products
----------------------------------------
1. PowerA Advantage Wired Controller for Xbox Series X|S... - Score: 3.36
Price: 473.73 EUR
Illuminate the possibilities with the PowerA Advantage Wired Controller for Xbox Series X|S with...
2. Microsoft Xbox Series X - Score: 2.79
Price: 499.99 USD
The Microsoft Xbox Series X offers powerful gaming performance with 4K resolution, lightning-fast...
3. Microsoft Xbox Series X - Score: 2.74
Price: 499.99 USD
The Xbox Series X is a powerful next-gen gaming console with 4K UHD resolution, fast loading spee...
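The `display_search_results` helper used in these examples is not listed here. The sketch below shows what such a helper might look like, modelled on the printed output above. It assumes the response exposes `.points`, each with a `.score` and a `.payload`, as `query_points` responses do. The payload keys `Price` and `Currency` are assumed column names, and `format_search_results` is a hypothetical name introduced for this sketch.

```python
from types import SimpleNamespace

def format_search_results(response, max_description=100):
    """Render a query response as text in the style of the example output."""
    lines = [f"Found {len(response.points)} matching products", "-" * 40]
    for rank, point in enumerate(response.points, start=1):
        payload = point.payload or {}
        lines.append(f"{rank}. {payload.get('Name', '?')} - Score: {point.score:.2f}")
        # 'Price' and 'Currency' are assumed payload keys.
        if "Price" in payload:
            lines.append(f"Price: {payload['Price']} {payload.get('Currency', '')}".rstrip())
        description = payload.get("Description")
        if not isinstance(description, str):
            description = ""  # e.g. NaN from an empty CSV cell
        if len(description) > max_description:
            description = description[: max_description - 3] + "..."
        if description:
            lines.append(description)
    return "\n".join(lines)

def display_search_results(response):
    print(format_search_results(response))

# Stand-in for a real response object, so the sketch runs without Qdrant.
fake_response = SimpleNamespace(points=[
    SimpleNamespace(
        score=2.41,
        payload={
            "Name": "Nintendo Switch OLED Model",
            "Price": 349.99,
            "Currency": "USD",
            "Description": "The Nintendo Switch OLED Model offers a vivid 7-inch OLED display.",
        },
    )
])
display_search_results(fake_response)
```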
Conclusion
The combination of BGE-M3's multi-vector embeddings and Qdrant's hybrid search capabilities creates a powerful search architecture that delivers superior results across multiple dimensions:
- Technical advantages: 🛠️
  - Balances semantic understanding with lexical precision
  - Leverages token-level matching for fine-grained relevance assessment
  - Optimizes computational resources through strategic pre-filtering
- Information retrieval improvements: 🔍
  - Addresses the vocabulary mismatch problem
  - Handles complex natural language queries effectively
  - Maintains high recall while improving precision
- Implementation benefits: 🚀
  - Adaptable to diverse domains and content types
  - Scalable to large document collections
  - Provides a modular approach that can be tuned for specific use cases
This architecture improves on both traditional keyword search and single-vector embedding search by combining their strengths, and it gives engineering teams a modular framework for building search applications.