RAG in Go: A Practical Implementation Using Qdrant and OpenAI

⏱️ Estimated reading time: ~10 minutes

One of the key challenges for companies working with AI is enabling efficient access to internal knowledge. At first glance, it might seem easy: just feed your corporate documents into a large language model (LLM) and let it generate answers. In practice, such solutions often fall short on precision, speed, or cost-efficiency, and rarely deliver the expected quality.

Retrieval-Augmented Generation (RAG) is a method in which a language model (such as GPT) retrieves relevant external data before generating a response. This approach lets you build systems that give accurate, relevant answers grounded in your own data.

🔍 Why RAG Matters

Traditional LLMs are constrained by their training data. They cannot access your product catalog, internal documentation, or recent updates. RAG addresses this limitation by retrieving relevant documents first (in this project, through a vector search in a Qdrant database) and then passing them to the model as context for its response.
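
To make the idea of a vector search concrete, here is a minimal sketch of similarity ranking, the operation a vector database performs at scale. It is not code from the project: `cosineSimilarity`, `rankBySimilarity`, and the toy three-dimensional vectors are illustrative assumptions, and a real engine such as Qdrant relies on an index rather than this brute-force scan.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// doc pairs a text with its (toy) embedding vector.
type doc struct {
	Text   string
	Vector []float64
}

// cosineSimilarity returns the cosine of the angle between a and b.
// It returns 0 when the lengths differ or either vector has zero norm.
func cosineSimilarity(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// rankBySimilarity returns up to topK documents, most similar to query first.
func rankBySimilarity(query []float64, docs []doc, topK int) []doc {
	ranked := make([]doc, len(docs))
	copy(ranked, docs) // leave the caller's slice untouched
	sort.SliceStable(ranked, func(i, j int) bool {
		return cosineSimilarity(query, ranked[i].Vector) > cosineSimilarity(query, ranked[j].Vector)
	})
	if topK < 0 {
		topK = 0
	}
	if topK > len(ranked) {
		topK = len(ranked)
	}
	return ranked[:topK]
}

func main() {
	// Hand-written vectors stand in for real embeddings.
	docs := []doc{
		{"wireless headphones", []float64{0.1, 0.9, 0.2}},
		{"game console", []float64{0.9, 0.1, 0.3}},
		{"console controller", []float64{0.8, 0.2, 0.4}},
	}
	query := []float64{1.0, 0.0, 0.3}
	for _, d := range rankBySimilarity(query, docs, 2) {
		fmt.Println(d.Text)
	}
}
```

The documents whose vectors point in nearly the same direction as the query vector come out first; those are the ones a RAG system would hand to the model as context.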

This approach allows the system to adapt to your specific domain without retraining the LLM, which makes it well suited to support bots, internal assistants, and product search features.

🛠️ Project Structure

This Go-based RAG project is organized as follows:

go-qdrant-rag-sample/
├── .gitignore                   ← excludes sensitive files like .env
├── docker-compose.yml          ← spins up Qdrant
├── go.mod / go.sum             ← Go module and dependencies
├── README.md                   ← project overview
├── .vscode/
│   └── launch.json             ← VS Code debug config
├── cmd/
│   └── api/
│       └── main.go             ← entry point
├── data/
│   └── products.csv            ← product data to embed
├── env/
│   └── .env                    ← stores your OpenAI key (excluded from Git)
├── internal/
│   ├── api/
│   │   └── server.go           ← runs the HTTP server
│   ├── config/
│   │   └── env.go              ← loads environment variables
│   ├── models/
│   │   └── product.go          ← data structure for product
│   ├── qdrant/
│   │   ├── collection.go       ← handles collection creation
│   │   ├── embedder.go        ← generates text embeddings
│   │   ├── ingester.go        ← indexes data into Qdrant
│   │   ├── rag.go             ← core RAG logic: search + LLM
│   │   └── search.go          ← semantic search logic
│   └── utils/
│       └── csvreader.go       ← reads product data from CSV
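
As a rough illustration of what the models and utils packages are responsible for, the sketch below defines a product type and parses CSV rows into it. The `Product` fields, the three-column layout (name, description, price), and the `parseProductsCSV` helper are assumptions made for this example; the actual definitions in the repository may differ.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// Product is a hypothetical, trimmed-down product record.
type Product struct {
	Name        string  `json:"name"`
	Description string  `json:"description"`
	Price       float64 `json:"price"`
}

// parseProductsCSV parses CSV text whose first row is a header and whose
// first three columns are assumed to be name, description, and price.
func parseProductsCSV(data string) ([]Product, error) {
	rows, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, fmt.Errorf("read csv: %w", err)
	}
	if len(rows) < 2 {
		return nil, nil // empty input or header only
	}
	products := make([]Product, 0, len(rows)-1)
	for i, row := range rows[1:] {
		line := i + 2 // 1-based record number, counting the header
		if len(row) < 3 {
			return nil, fmt.Errorf("row %d: expected 3 columns, got %d", line, len(row))
		}
		price, err := strconv.ParseFloat(strings.TrimSpace(row[2]), 64)
		if err != nil {
			return nil, fmt.Errorf("row %d: invalid price %q: %w", line, row[2], err)
		}
		products = append(products, Product{
			Name:        row[0],
			Description: row[1],
			Price:       price,
		})
	}
	return products, nil
}

func main() {
	sample := "name,description,price\n" +
		"Sony PlayStation 5 Console,Next-gen console,499.99\n"
	products, err := parseProductsCSV(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range products {
		fmt.Printf("%s (%.2f)\n", p.Name, p.Price)
	}
}
```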

📌 Find the complete source code in our GitHub repository:

https://github.com/yuniko-software/go-qdrant-rag-sample

⚙️ How to Run the Project

1. Start Qdrant:

docker-compose up -d qdrant

2. Create a .env file in the env/ folder and set your OpenAI API key:

OPENAI_API_KEY=sk-your_key

⚠️ Do not share this key. The .env file is already in .gitignore.

3. Run the backend server:

go run ./cmd/api

Your API is now running at:

http://localhost:8080/rag 
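
The setup steps rely on the server picking up OPENAI_API_KEY from env/.env. As a rough illustration of what loading such a file involves, the sketch below parses KEY=VALUE lines using only the standard library. `parseDotEnv` and the `QDRANT_HOST` variable are hypothetical names, not taken from the repository, which may well use an existing dotenv package instead.

```go
package main

import (
	"fmt"
	"strings"
)

// parseDotEnv extracts KEY=VALUE pairs from .env-style content.
// Blank lines, lines starting with '#', and lines without a key are skipped.
func parseDotEnv(content string) map[string]string {
	vars := make(map[string]string)
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		eq := strings.IndexByte(line, '=')
		if eq <= 0 {
			continue // no '=' at all, or nothing before it
		}
		key := strings.TrimSpace(line[:eq])
		value := strings.TrimSpace(line[eq+1:])
		vars[key] = value
	}
	return vars
}

func main() {
	// QDRANT_HOST is an assumed variable name used for illustration only.
	sample := "# local settings\n" +
		"OPENAI_API_KEY=sk-your_key\n" +
		"QDRANT_HOST=http://localhost:6333\n"
	vars := parseDotEnv(sample)
	fmt.Println(vars["OPENAI_API_KEY"])
	fmt.Println(vars["QDRANT_HOST"])
}
```

Splitting on the first '=' only keeps values that themselves contain '=' intact.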

🧪 Example Request

You can test it via Postman or the RapidAPI extension for VS Code:

GET http://localhost:8080/rag?q=sonyPS5&top=10

Example response:

{
  "question": "sonyPS5",
  "total": 2,
  "answer": [
    {
      "description": "The Sony PlayStation 5 Console delivers ...",
      "name": "Sony PlayStation 5 Console",
      "price": 499.99
    },
    {
      "description": "PlayStation 5 Digital Edition ...",
      "name": "Sony PlayStation 5 Digital Edition",
      "price": 399.99
    }
  ]
}
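
The request and response above suggest what the HTTP layer has to do: read the q and top query parameters, then serialize a question, a total, and a list of answers. The following sketch shows one way to do that with the standard library. `parseRAGQuery`, `ragResponse`, and the default of 5 results are assumptions for illustration, not the project's actual server code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/url"
	"strconv"
)

// defaultTop is an assumed fallback when the "top" parameter is omitted.
const defaultTop = 5

// ragResponse mirrors the JSON shape of the example response.
type ragResponse struct {
	Question string                   `json:"question"`
	Total    int                      `json:"total"`
	Answer   []map[string]interface{} `json:"answer"`
}

// parseRAGQuery reads "q" and "top" from a raw query string such as
// "q=sonyPS5&top=10". "q" is required; "top" must be a positive integer.
func parseRAGQuery(rawQuery string) (string, int, error) {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", 0, fmt.Errorf("parse query: %w", err)
	}
	q := values.Get("q")
	if q == "" {
		return "", 0, errors.New("missing q parameter")
	}
	top := defaultTop
	if raw := values.Get("top"); raw != "" {
		n, err := strconv.Atoi(raw)
		if err != nil || n <= 0 {
			return "", 0, fmt.Errorf("invalid top parameter %q", raw)
		}
		top = n
	}
	return q, top, nil
}

func main() {
	q, top, err := parseRAGQuery("q=sonyPS5&top=10")
	if err != nil {
		fmt.Println("bad request:", err)
		return
	}
	fmt.Println("question:", q, "top:", top)

	// Build a response in the same shape as the example.
	resp := ragResponse{
		Question: q,
		Answer: []map[string]interface{}{
			{"name": "Sony PlayStation 5 Console", "price": 499.99},
		},
	}
	resp.Total = len(resp.Answer)
	out, err := json.MarshalIndent(resp, "", "  ")
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(string(out))
}
```

In a real handler the raw query would come from the request's URL, and validation failures would typically map to a 400 status.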

🗂️ RAG System Workflow

📌 What’s important:

  1. The user formulates a question, which is submitted to the system. The query is converted into a vector representation using an embedding model.
  2. The retrieval mechanism performs semantic analysis and searches the database (e.g., Qdrant) for text fragments that are most relevant to the query.
// SearchProducts embeds the query and returns the topK nearest products from Qdrant.
// When maxPrice is non-nil, only products priced below it are returned.
func SearchProducts(query string, topK int, maxPrice *float64) ([]SearchResult, error) {
	embedding, err := GetEmbedding(query)
	if err != nil {
		return nil, err
	}

	request := map[string]interface{}{
		"vector":       embedding,
		"top":          topK,
		"with_payload": true,
		"with_vector":  false,
	}

	// Optional payload filter: price < maxPrice.
	if maxPrice != nil {
		request["filter"] = map[string]interface{}{
			"must": []map[string]interface{}{
				{
					"key": "price",
					"range": map[string]interface{}{
						"lt": *maxPrice,
					},
				},
			},
		}
	}

	host := config.QdrantHost()

	client := resty.New()
	resp, err := client.R().
		SetHeader("Content-Type", "application/json").
		SetBody(request).
		Post(host + "/collections/products/points/search")
	if err != nil {
		return nil, err
	}
	// err only covers transport failures; HTTP 4xx/5xx responses are checked here.
	if resp.IsError() {
		return nil, fmt.Errorf("qdrant search failed: %s", resp.Status())
	}

	var result struct {
		Result []SearchResult `json:"result"`
	}

	if err := json.Unmarshal(resp.Body(), &result); err != nil {
		return nil, err
	}

	return result.Result, nil
}
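
The search function calls GetEmbedding, which is not shown here. As a hedged sketch of what such a function might look like, the code below posts the text to OpenAI's embeddings endpoint and extracts the first vector from the response. The names `fetchEmbedding` and `parseEmbeddingResponse`, and the choice of the text-embedding-3-small model, are assumptions for this example; the repository's embedder may use a different model or client.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"os"
)

const embeddingsURL = "https://api.openai.com/v1/embeddings"

// embeddingResponse keeps only the fields this sketch needs.
type embeddingResponse struct {
	Data []struct {
		Embedding []float32 `json:"embedding"`
	} `json:"data"`
}

// parseEmbeddingResponse returns the first embedding vector in the body.
func parseEmbeddingResponse(body []byte) ([]float32, error) {
	var parsed embeddingResponse
	if err := json.Unmarshal(body, &parsed); err != nil {
		return nil, fmt.Errorf("decode embedding response: %w", err)
	}
	if len(parsed.Data) == 0 || len(parsed.Data[0].Embedding) == 0 {
		return nil, errors.New("embedding response contained no vector")
	}
	return parsed.Data[0].Embedding, nil
}

// fetchEmbedding requests an embedding for text from the OpenAI API.
func fetchEmbedding(apiKey, text string) ([]float32, error) {
	payload, err := json.Marshal(map[string]interface{}{
		"model": "text-embedding-3-small", // assumed model choice
		"input": text,
	})
	if err != nil {
		return nil, fmt.Errorf("encode request: %w", err)
	}
	req, err := http.NewRequest(http.MethodPost, embeddingsURL, bytes.NewReader(payload))
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+apiKey)

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, fmt.Errorf("send request: %w", err)
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		return nil, fmt.Errorf("read response: %w", err)
	}
	if res.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("embeddings API returned %s: %s", res.Status, body)
	}
	return parseEmbeddingResponse(body)
}

func main() {
	apiKey := os.Getenv("OPENAI_API_KEY")
	if apiKey == "" {
		fmt.Println("OPENAI_API_KEY is not set; skipping the live call")
		return
	}
	vec, err := fetchEmbedding(apiKey, "sonyPS5")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("vector length:", len(vec))
}
```

Whatever model is used, the same one has to embed both the indexed products and the incoming queries, and the Qdrant collection's vector size has to match its output dimension.
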
  3. The retrieved fragments are combined into a unified context. The prompt also defines how the model should treat that context: whether it must rely strictly on the provided fragments or may supplement the answer with its own knowledge.
  4. The resulting prompt is sent to the large language model (LLM), which generates a response based on it.
// RunRAG retrieves the topK most relevant products for the question
// and asks the LLM to answer using them as context.
func RunRAG(question string, topK int) (RAGResponse, error) {
	results, err := SearchProducts(question, topK, nil)
	if err != nil {
		return RAGResponse{}, fmt.Errorf("retrieval error: %w", err)
	}

	// Build context from Qdrant results.
	// Only name and description are included in the context here.
	var productContext strings.Builder
	for _, r := range results {
		name := r.Payload["name"]
		desc := r.Payload["description"]
		fmt.Fprintf(&productContext, "- %v: %v\n", name, desc)
	}

	prompt := fmt.Sprintf(`
You are a helpful assistant. Given the product context below, respond with a valid JSON array of the best-matching product payloads.

Each payload must include:
- "name": string
- "description": string
- "minimum_order": integer
- "price": number
- "price_currency": string
- "supply_ability": integer

DO NOT include any "id" or "score" fields.
DO NOT wrap the response in triple backticks or Markdown formatting.

Context:
%s

Question: %s

Respond ONLY with the array of payloads.`, productContext.String(), question)

	apiKey := os.Getenv("OPENAI_API_KEY")
	if apiKey == "" {
		return RAGResponse{}, fmt.Errorf("missing OPENAI_API_KEY")
	}

	reqBody := map[string]interface{}{
		"model": "gpt-4o-2024-08-06",
		"messages": []map[string]string{
			{"role": "user", "content": prompt},
		},
	}
	encoded, err := json.Marshal(reqBody)
	if err != nil {
		return RAGResponse{}, fmt.Errorf("encode request: %w", err)
	}

	req, err := http.NewRequest("POST", "https://api.openai.com/v1/chat/completions", bytes.NewBuffer(encoded))
	if err != nil {
		return RAGResponse{}, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+apiKey)

	res, err := http.DefaultClient.Do(req)

	...
}
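
The listing stops once the request has been sent. Independently of how the project handles the reply, a Chat Completions response carries the generated text in choices[0].message.content, and the sketch below shows one way to pull it out. `chatCompletion` and `extractAnswer` are hypothetical names introduced for this example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// chatCompletion keeps only the response fields this sketch needs.
type chatCompletion struct {
	Choices []struct {
		Message struct {
			Content string `json:"content"`
		} `json:"message"`
	} `json:"choices"`
}

// extractAnswer returns the text of the first choice in a
// Chat Completions response body.
func extractAnswer(body []byte) (string, error) {
	var parsed chatCompletion
	if err := json.Unmarshal(body, &parsed); err != nil {
		return "", fmt.Errorf("decode chat completion: %w", err)
	}
	if len(parsed.Choices) == 0 {
		return "", errors.New("chat completion contained no choices")
	}
	return parsed.Choices[0].Message.Content, nil
}

func main() {
	// A hand-written sample body, not real API output.
	sample := []byte(`{"choices":[{"message":{"role":"assistant","content":"[{\"name\":\"Sony PlayStation 5 Console\"}]"}}]}`)
	answer, err := extractAnswer(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(answer)
}
```

Checking the HTTP status code before decoding is also worthwhile, because error responses use a different JSON shape.
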
  5. The response generated by the LLM may contain extra details or technical metadata.
  6. The response then undergoes postprocessing: unnecessary parts are removed, and the result is formatted into a readable and clean output.
  7. Finally, the system returns a polished, human-readable answer to the user.
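
The postprocessing described in the last steps can be sketched as follows, assuming the model was asked for a JSON array of product payloads. Models sometimes wrap JSON in a Markdown code fence despite instructions, so the sketch strips one if present, parses the array, and drops the "id" and "score" fields. `stripCodeFence` and `cleanModelOutput` are illustrative names, not the project's own functions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// fence is the Markdown code-fence marker (three backticks), built at
// run time so that this listing contains no literal fence.
var fence = strings.Repeat("`", 3)

// stripCodeFence removes a surrounding Markdown code fence, if any,
// including a language tag on the opening line.
func stripCodeFence(s string) string {
	s = strings.TrimSpace(s)
	if !strings.HasPrefix(s, fence) {
		return s
	}
	if nl := strings.IndexByte(s, '\n'); nl >= 0 {
		s = s[nl+1:] // drop the whole opening fence line
	} else {
		s = strings.TrimPrefix(s, fence)
	}
	s = strings.TrimSpace(s)
	s = strings.TrimSuffix(s, fence)
	return strings.TrimSpace(s)
}

// cleanModelOutput parses the model's text as a JSON array of objects
// and removes fields the API should not expose.
func cleanModelOutput(raw string) ([]map[string]interface{}, error) {
	var items []map[string]interface{}
	if err := json.Unmarshal([]byte(stripCodeFence(raw)), &items); err != nil {
		return nil, fmt.Errorf("model output is not a JSON array of objects: %w", err)
	}
	for _, item := range items {
		delete(item, "id")
		delete(item, "score")
	}
	return items, nil
}

func main() {
	// Hand-written sample of a fenced model reply.
	raw := fence + "json\n" +
		`[{"id": 7, "name": "Sony PlayStation 5 Console", "price": 499.99}]` +
		"\n" + fence
	items, err := cleanModelOutput(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, err := json.MarshalIndent(items, "", "  ")
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Parsing the output instead of forwarding it verbatim also means a malformed model reply surfaces as an error on the server rather than as broken JSON on the client.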

✅ Conclusion

This project demonstrates how developers can build a complete RAG pipeline in Go, from data ingestion and embedding to vector search and response generation. Working in Go gives you performance and simplicity, along with a clear, modular view of how a RAG system is structured and how it operates under the hood.