Building a Smarter Search with Qdrant, BGE-M3 All-in-One Embedding Model, and Hybrid Reranking

🔗Source code with README: https://github.com/yuniko-software/bge-m3-qdrant-sample


Why Hybrid Search with Reranking?

The combination of BGE-M3's all-in-one embedding model, Qdrant, and hybrid reranking offers significant advantages over traditional methods:

  • Combines strengths of multiple approaches - Integrates semantic understanding from dense vectors, keyword precision from sparse vectors, and token-level matching from ColBERT vectors
  • Outperforms traditional full-text search - Goes beyond basic keyword matching to understand meaning while preserving the keyword matching benefits of full-text search
  • Two-stage efficiency - Uses fast methods for initial retrieval followed by precise reranking, optimizing both speed and accuracy

This approach delivers more relevant results for complex queries while maintaining the strengths of traditional search methods.


Figure: BGE-M3 embedding model benchmark (multilingual)

Who Would Benefit from This Implementation?

This solution is particularly valuable for organizations that run retrieval-augmented generation (RAG) over large document repositories, e-commerce platforms that need nuanced product discovery, and content-heavy applications where users expect a high-quality search experience.


Key Components of the System

BGE-M3 Embedding Model

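BGE-M3 is an embedding model from BAAI (Beijing Academy of Artificial Intelligence). "BGE" stands for BAAI General Embedding, and "M3" refers to the model's multi-functionality, multi-linguality, and multi-granularity. It can simultaneously generate dense, sparse, and multi-vector (ColBERT) representations for text.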

Qdrant Vector Database

Qdrant is a vector database designed for efficient similarity search. It supports:

  • Multiple vector types in a single collection
  • Hybrid search combining dense and sparse vectors
  • Multi-vector search for token-level matching (ColBERT)

Building a Sample Product Search System

1. Setting Up the Environment

First, we need to set up Qdrant and install the necessary packages:

# Pull and run Qdrant Docker image
!docker run -d --name qdrant-db -p 6333:6333 -p 6334:6334 qdrant/qdrant:latest

# Install required packages
%pip install -U transformers FlagEmbedding accelerate
%pip install pandas
%pip install qdrant_client

2. Loading the Product Dataset

We'll use a CSV file containing product information:

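import pandas as pd

def load_products(file_path='products.csv'):
    """Load the pipe-delimited product file as a DataFrame and a list of dicts"""
    products_df = pd.read_csv(file_path, sep='|')
    # One dict per row, keyed by column name (used later as the Qdrant payload)
    products_json = products_df.to_dict(orient='records')
    return products_df, products_json

products_df, products_json = load_products()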

3. Initializing the BGE-M3 Model

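from FlagEmbedding import BGEM3FlagModel

def initialize_model():
    """Initialize the BGE-M3 embedding model"""
    # use_fp16=True trades a little precision for faster inference
    return BGEM3FlagModel('BAAI/bge-m3', use_fp16=True)

model = initialize_model()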

4. Formatting Products for Embedding

def create_product_text(product):
    """Format product information for embedding"""
    return f"Product: {product['Name']}\nDescription: {product['Description']}"

5. Generating Embeddings

The BGE-M3 model generates three types of embeddings in a single pass:

def generate_embeddings(text, model):
    """Generate all three types of embeddings for a text"""
    return model.encode(
        [text], 
        return_dense=True,
        return_sparse=True,
        return_colbert_vecs=True
    )

Let's explore each type of embedding:

Dense Vectors (1024 dimensions)

These vectors capture semantic meaning, allowing the system to understand that "running footwear" and "athletic shoes" are conceptually similar.
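To make the idea of "conceptually similar" concrete, here is a minimal sketch of cosine similarity, the distance metric we configure for the dense vectors later on. The three-dimensional vectors are made-up stand-ins for real 1024-dimensional embeddings, and the variable names are only illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for real 1024-dimensional embeddings.
query = [0.9, 0.1, 0.0]
similar_product = [0.8, 0.2, 0.1]
unrelated_product = [0.0, 0.1, 0.9]

print(cosine_similarity(query, similar_product))    # close to 1
print(cosine_similarity(query, unrelated_product))  # close to 0
```

Texts with similar meaning should end up with vectors pointing in nearly the same direction, even when they share no words.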

Sparse Vectors (Lexical Weights)

These vectors maintain the connection to the original tokens, helping with exact keyword matching. For example, the token "Kinvara" in "Saucony Men's Kinvara 13 Running Shoe" gets a high weight.
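A sparse score can be sketched as a dot product over the tokens that two texts share. The weights below are invented for illustration, and they are keyed by readable tokens to keep the example legible; BGE-M3 keys its lexical weights by token ID instead.

```python
def lexical_matching_score(query_weights, doc_weights):
    """Sum the products of weights for tokens present in both texts."""
    return sum(
        weight * doc_weights[token]
        for token, weight in query_weights.items()
        if token in doc_weights
    )

# Hypothetical weights; real ones come from the model's "lexical_weights" output.
query = {"kinvara": 0.30, "shoe": 0.15}
saucony = {"saucony": 0.28, "kinvara": 0.33, "running": 0.20, "shoe": 0.18}
adidas = {"adidas": 0.27, "racer": 0.25, "running": 0.20, "shoe": 0.18}

print(lexical_matching_score(query, saucony))  # shares "kinvara" and "shoe"
print(lexical_matching_score(query, adidas))   # shares only "shoe"
```

Because a rare token such as "Kinvara" carries a high weight on both sides, an exact match on it dominates the score.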

ColBERT Vectors (Token-level embeddings)

These vectors represent each token in the text, allowing for fine-grained matching between query tokens and product description tokens.
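The token-level comparison is usually called late interaction, or MaxSim. The sketch below is a plain-Python illustration with made-up two-dimensional token vectors, assuming the score is the sum, over query tokens, of the best dot product against any document token. As far as we understand, this is how Qdrant's MAX_SIM comparator scores multi-vectors, which would explain why the scores in the examples later in this article exceed 1.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def max_sim(query_tokens, doc_tokens):
    """For each query token take its best match among document tokens, then sum."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Toy token embeddings: one row per token.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]  # has a good match for both query tokens
doc_b = [[0.9, 0.1], [0.7, 0.3]]              # matches only the first query token well

print(max_sim(query_tokens, doc_a))
print(max_sim(query_tokens, doc_b))
```

Every query token must find its own match, so a document that covers only part of the query scores lower than one that covers all of it.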

6. Creating the Qdrant Collection

We'll create a collection in Qdrant that can store and efficiently search all three vector types:

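from qdrant_client import QdrantClient, models

def create_qdrant_collection(collection_name="products"):
    """Create a collection with dense, ColBERT, and sparse vectors; return the client"""
    client = QdrantClient("localhost", port=6333)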
    
    client.create_collection(
        collection_name=collection_name,
        vectors_config={
            "dense": models.VectorParams(
                size=1024,
                distance=models.Distance.COSINE
            ),
            "colbert": models.VectorParams(
                size=1024,
                distance=models.Distance.COSINE,
                multivector_config=models.MultiVectorConfig(
                    comparator=models.MultiVectorComparator.MAX_SIM
                ),
            )
        },
        sparse_vectors_config={
            "sparse": models.SparseVectorParams(
                index=models.SparseIndexParams(
                    on_disk=True
                )
            )
        },
    )
    
    return client

7. Indexing Products in Qdrant

We'll generate embeddings for all products and insert them into the Qdrant collection:

def insert_products_to_qdrant(client, product_embeddings, collection_name="products"):
    for embedding in product_embeddings:
        product = embedding["product"]
        dense_vector = embedding["dense_vector"]
        colbert_vectors = embedding["colbert_vectors"]
        sparse_data = embedding["sparse_weights"]

        # Convert sparse weights to Qdrant format
        qdrant_sparse = create_sparse_vector(sparse_data)
        
        # Insert into Qdrant
        client.upsert(
            collection_name=collection_name,
            points=[
                models.PointStruct(
                    id=product["Id"],
                    payload=product,
                    vector={
                        "dense": dense_vector,
                        "colbert": colbert_vectors,
                        "sparse": qdrant_sparse
                    }
                )
            ]
        )
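The insert loop relies on two pieces that are not shown above: the `product_embeddings` list and the `create_sparse_vector` helper. The following is a sketch of how they could look, not the repository's actual code. `build_product_embeddings` and `sparse_weights_to_indices_values` are hypothetical names, and `fake_encode` is a stand-in so the example runs without downloading the model. We assume the lexical weights map token IDs (possibly as strings) to weights.

```python
def sparse_weights_to_indices_values(lexical_weights):
    """Split a {token_id: weight} mapping into parallel index and value lists."""
    indices, values = [], []
    for token_id, weight in lexical_weights.items():
        weight = float(weight)
        if weight > 0:  # zero weights contribute nothing to the score
            indices.append(int(token_id))
            values.append(weight)
    return indices, values


def build_product_embeddings(products, encode_fn, format_fn):
    """Pair each product with its three embeddings, in the shape the insert loop reads."""
    embeddings = []
    for product in products:
        outputs = encode_fn(format_fn(product))
        embeddings.append({
            "product": product,
            "dense_vector": outputs["dense_vecs"][0],
            "colbert_vectors": outputs["colbert_vecs"][0],
            "sparse_weights": outputs["lexical_weights"][0],
        })
    return embeddings


def fake_encode(text):
    # Stand-in for generate_embeddings(text, model); returns tiny made-up vectors.
    return {
        "dense_vecs": [[0.1, 0.2]],
        "colbert_vecs": [[[0.1, 0.2], [0.3, 0.4]]],
        "lexical_weights": [{"101": 0.25, "2057": 0.0, "7": 0.5}],
    }


products = [{"Id": 1, "Name": "Demo shoe", "Description": "A made-up product"}]
embeddings = build_product_embeddings(
    products,
    fake_encode,
    lambda p: f"Product: {p['Name']}\nDescription: {p['Description']}",
)
indices, values = sparse_weights_to_indices_values(embeddings[0]["sparse_weights"])
print(indices, values)
```

With the real model, `encode_fn` would be something like `lambda text: generate_embeddings(text, model)`, and `create_sparse_vector` could wrap the two lists as `models.SparseVector(indices=indices, values=values)`.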

8. Implementing the Search Function

The real power of this system comes from the hybrid search approach:

def search_products(client, model, search_query, limit=3, prefetch_limit=6, collection_name="products"):
    # Generate embeddings for the query
    query_outputs = model.encode(
        [search_query],
        return_dense=True,
        return_sparse=True,
        return_colbert_vecs=True
    )
    
    dense_vec = query_outputs["dense_vecs"][0]
    sparse_vec = query_outputs["lexical_weights"][0]
    colbert_vec = query_outputs["colbert_vecs"][0]
    
    # Convert sparse vector to Qdrant format
    qdrant_sparse = create_sparse_vector(sparse_vec)
    
    # Set up prefetch for hybrid search
    prefetch = [
        models.Prefetch(
            query=qdrant_sparse,
            using="sparse",
            limit=prefetch_limit),
        models.Prefetch(
            query=dense_vec,
            using="dense",
            limit=prefetch_limit)
    ]
    
    # Perform reranking with ColBERT
    results = client.query_points(
        collection_name,
        prefetch=prefetch,
        query=colbert_vec,
        using="colbert",
        with_payload=True,
        limit=limit,
    )
    
    return results

This search function works in three stages:

  1. Candidate retrieval (prefetch) with both sparse and dense vectors to build a candidate set
  2. Reranking with ColBERT for precise token-level matching
  3. Returning the most relevant products
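The three stages above can be illustrated without a database. In this toy sketch the scores are invented and `two_stage_search` is a hypothetical helper; it only mirrors, roughly, what the prefetch-then-rerank query does: take the union of each retriever's top candidates, then rerank only that union.

```python
def top_k(scores, k):
    """Return the k IDs with the highest score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def two_stage_search(sparse_scores, dense_scores, rerank_score, prefetch_limit, limit):
    # Stage 1: each fast retriever proposes its own top candidates.
    candidates = set(top_k(sparse_scores, prefetch_limit)) | set(top_k(dense_scores, prefetch_limit))
    # Stage 2: the expensive scorer runs only on the candidate set.
    reranked = sorted(candidates, key=rerank_score, reverse=True)
    # Stage 3: return the best few.
    return reranked[:limit]

# Made-up scores for four products.
sparse_scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.0}
dense_scores = {"a": 0.8, "b": 0.7, "c": 0.3, "d": 0.1}
colbert_scores = {"a": 1.5, "b": 2.5, "c": 3.0, "d": 9.9}

hits = two_stage_search(sparse_scores, dense_scores, colbert_scores.get,
                        prefetch_limit=2, limit=2)
print(hits)
```

Product "d" has the highest reranking score but is never returned, because neither retriever proposed it. This is the trade-off behind `prefetch_limit`: a larger value costs more reranking work but lowers the chance of missing a good result.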

This candidate-retrieval step (the prefetch) is crucial because ColBERT reranking compares every query token with every document token, which would be prohibitively expensive to run on an entire collection. By first narrowing down candidates with faster vector search methods, we stay efficient while still applying the precision of token-level matching to the most promising results.
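The examples that follow call a `display_search_results` helper that is not shown in this article. Here is a minimal sketch of what it might look like, assuming the response exposes `.points` with `.score` and `.payload` (as the result of `query_points` does) and that the payload uses `Name`, `Description`, `Price`, and `Currency` fields. `Price` and `Currency` are guessed field names, and `format_search_results` is a hypothetical helper.

```python
from types import SimpleNamespace

def format_search_results(results, max_description=100):
    """Build printable lines from an object with .points (each with .score and .payload)."""
    lines = [f"Found {len(results.points)} matching products", "-" * 40]
    for rank, point in enumerate(results.points, start=1):
        payload = point.payload or {}
        lines.append(f"{rank}. {payload.get('Name', 'Unknown')} - Score: {point.score:.2f}")
        # 'Price' and 'Currency' are assumed field names, not confirmed by the dataset.
        if "Price" in payload:
            lines.append(f"   Price: {payload['Price']} {payload.get('Currency', '')}".rstrip())
        description = payload.get("Description") or ""
        if len(description) > max_description:
            description = description[: max_description - 3] + "..."
        if description:
            lines.append(f"   {description}")
        lines.append("")
    return lines

def display_search_results(results):
    print("\n".join(format_search_results(results)))

# A fake response object so the sketch runs without a Qdrant server.
demo = SimpleNamespace(points=[
    SimpleNamespace(score=2.414, payload={
        "Name": "Demo console", "Price": 349.99, "Currency": "USD",
        "Description": "x" * 150,
    }),
])
lines = format_search_results(demo)
display_search_results(demo)
```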

Example 1: Running shoes for men

# Example 1: Running shoes for men
result = search_products(client, model, "running shoes for men")
display_search_results(result)
Found 3 matching products
----------------------------------------
1. adidas Men's Ultraboost Personal Best Running Shoe - Score: 3.76
   Price: 626.1 USD
   Men's shoes - low (non football).

2. adidas Men's Racer Tr21 Running Shoe - Score: 3.51
   Price: 413.1 EUR
   Everyday style with a running twist. These men's adidas sneakers have a Cloudfoam midsole for ste...

3. Under Armour Men's Charged Assert 9 Running Shoe - Score: 3.49
   Price: 434.94 USD
   These running shoes are built to help anyone go faster-Charged Cushioning® helps protect against ...

Example 2: Nintendo

# Example 2: Nintendo products
result = search_products(client, model, "nintendo")
display_search_results(result)
Found 3 matching products
----------------------------------------
1. Nintendo Switch OLED Model - Score: 2.41
   Price: 349.99 USD
   The Nintendo Switch OLED Model offers a vivid 7-inch OLED display and enhanced audio for portable...

2. NZND Case for At&t Motivate 3... - Score: 2.10
   Price: 84.09 USD

3. Samsung 980 PRO 1TB NVMe M.2 SSD - Score: 1.97
   Price: 149.99 USD

Example 3: Xbox gamepad

# Example 3: Xbox gamepad
result = search_products(client, model, "xbox gamepad")
display_search_results(result)
Found 3 matching products
----------------------------------------
1. PowerA Advantage Wired Controller for Xbox Series X|S... - Score: 3.36
   Price: 473.73 EUR
   Illuminate the possibilities with the PowerA Advantage Wired Controller for Xbox Series X|S with...

2. Microsoft Xbox Series X - Score: 2.79
   Price: 499.99 USD
   The Microsoft Xbox Series X offers powerful gaming performance with 4K resolution, lightning-fast...

3. Microsoft Xbox Series X - Score: 2.74
   Price: 499.99 USD
   The Xbox Series X is a powerful next-gen gaming console with 4K UHD resolution, fast loading spee...

Conclusion

Combining BGE-M3's dense, sparse, and ColBERT embeddings with Qdrant's hybrid search capabilities gives a search architecture with benefits in three areas:

  • Technical advantages: 🛠️
    • Balances semantic understanding with lexical precision
    • Leverages token-level matching for fine-grained relevance assessment
    • Optimizes computational resources through strategic pre-filtering
  • Information retrieval improvements: 🔍
    • Addresses the vocabulary mismatch problem
    • Handles complex natural language queries effectively
    • Maintains high recall while improving precision
  • Implementation benefits: 🚀
    • Adaptable to diverse domains and content types
    • Scalable to large document collections
    • Provides a modular approach that can be tuned for specific use cases

This architecture improves on both traditional keyword search and single-vector embedding approaches, giving engineering teams a practical framework for building search applications.