RAG in Go: A Practical Implementation Using Qdrant and OpenAI
⏱️ Estimated reading time: ~10 minutes
One of the key challenges for companies working with AI is enabling efficient access to internal knowledge. At first glance, it might seem easy: just feed your corporate documents into a large language model (LLM) and let it generate answers. In practice, such solutions often fall short on precision, speed, or cost-efficiency, and rarely deliver the expected quality.
Retrieval-Augmented Generation (RAG) is a method in which the system retrieves relevant external data and passes it to a language model (like GPT) before the model creates a response. This approach lets you build smart systems that provide accurate, relevant answers using your specific data.
🔍 Why RAG Matters
Traditional LLMs are constrained by their training data. They cannot access your product catalog, internal documentation, or recent updates. RAG addresses this limitation by first running a vector search (in this project, against a Qdrant database) to find relevant documents, which then serve as context for the model's response.
This approach allows the system to adapt to your specific domain without retraining the LLM, making it a good fit for support bots, internal assistants, and product search features.
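To make "vector search" a bit more concrete: documents and queries are turned into numeric vectors, and the database ranks documents by how close their vectors are to the query vector. The sketch below shows cosine similarity, one of the distance metrics Qdrant supports, on plain Go slices. It is only an illustration; the function name `cosineSimilarity` is ours, Qdrant does this ranking internally, and the article does not state which metric the sample collection uses.

```go
package main

import "math"

// cosineSimilarity returns the cosine of the angle between a and b.
// It returns 0 when the lengths differ or either vector has zero norm.
func cosineSimilarity(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}
```

Vectors pointing in the same direction score close to 1, and unrelated (orthogonal) vectors score 0, which is why semantically similar texts end up near the top of the results.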
🛠️ Project Structure
This Go-based RAG project is organized as follows:
go-qdrant-rag-sample/
├── .gitignore ← excludes sensitive files like .env
├── docker-compose.yml ← spins up Qdrant
├── go.mod / go.sum ← Go module and dependencies
├── README.md ← project overview
├── .vscode/
│ └── launch.json ← VS Code debug config
├── cmd/
│ └── api/
│ └── main.go ← entry point
├── data/
│ └── products.csv ← product data to embed
├── env/
│ └── .env ← stores your OpenAI key (excluded from Git)
├── internal/
│ ├── api/
│ │ └── server.go ← runs the HTTP server
│ ├── config/
│ │ └── env.go ← loads environment variables
│ ├── models/
│ │ └── product.go ← data structure for product
│ ├── qdrant/
│ │ ├── collection.go ← handles collection creation
│ │ ├── embedder.go ← generates text embeddings
│ │ ├── ingester.go ← indexes data into Qdrant
│ │ ├── rag.go ← core RAG logic: search + LLM
│ │ └── search.go ← semantic search logic
│ └── utils/
│ └── csvreader.go ← reads product data from CSV
📌 Find the complete source code in our GitHub repository:
https://github.com/yuniko-software/go-qdrant-rag-sample
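The tree lists collection.go and ingester.go, but their code is not shown in this article. As a rough idea of what those steps involve, here is a minimal sketch of the JSON bodies for Qdrant's REST endpoints `PUT /collections/products` (create a collection) and `PUT /collections/products/points` (upsert points). All type and function names below are hypothetical, and the vector size of 1536 and the `Cosine` distance are assumptions (1536 matches OpenAI's `text-embedding-3-small`); check the repository for the values the project actually uses.

```go
package main

import "encoding/json"

// vectorSize is an assumption: it must match the embedding model's output size.
const vectorSize = 1536

// vectorParams describes the vectors stored in a collection.
type vectorParams struct {
	Size     int    `json:"size"`
	Distance string `json:"distance"`
}

// createCollectionBody is the body for PUT /collections/products.
type createCollectionBody struct {
	Vectors vectorParams `json:"vectors"`
}

// point is one record to index: an ID, its embedding, and a free-form payload.
type point struct {
	ID      uint64                 `json:"id"`
	Vector  []float32              `json:"vector"`
	Payload map[string]interface{} `json:"payload"`
}

// upsertBody is the body for PUT /collections/products/points.
type upsertBody struct {
	Points []point `json:"points"`
}

// newCreateCollectionJSON builds the collection-creation request body.
func newCreateCollectionJSON() ([]byte, error) {
	return json.Marshal(createCollectionBody{
		Vectors: vectorParams{Size: vectorSize, Distance: "Cosine"},
	})
}

// newUpsertJSON builds the upsert request body for the given points.
func newUpsertJSON(points []point) ([]byte, error) {
	return json.Marshal(upsertBody{Points: points})
}
```

Each row of products.csv would become one `point`: the product text is embedded into `Vector`, and fields such as name, description, and price go into `Payload` so they can be returned and filtered at search time.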
⚙️ How to Run the Project
1. Start Qdrant:
docker-compose up -d qdrant
2. Create a .env file in the env/ folder and set your OpenAI API key:
OPENAI_API_KEY=sk-your_key
⚠️ Do not share this key. The .env file is already in .gitignore.
3. Run the backend server:
go run ./cmd/api
Your API is now running at:
http://localhost:8080/rag
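If you are curious what loading the .env file from the setup steps involves, the sketch below parses `KEY=VALUE` lines into a map. It is an illustration only: `parseEnv` is a hypothetical helper, the `QDRANT_HOST` key in the usage note is an assumed name, and the project's own env.go may well rely on a library instead.

```go
package main

import (
	"bufio"
	"strings"
)

// parseEnv reads KEY=VALUE lines, skipping blank lines, # comments,
// and lines without an equals sign. Surrounding double quotes are removed.
func parseEnv(content string) map[string]string {
	vars := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, found := strings.Cut(line, "=")
		if !found {
			continue
		}
		vars[strings.TrimSpace(key)] = strings.Trim(strings.TrimSpace(value), "\"")
	}
	return vars
}
```

Calling `parseEnv` on the contents of env/.env would give you a map with `OPENAI_API_KEY` (and, for example, a `QDRANT_HOST` entry if you add one), which the application can then read at startup.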
🧪 Example Request
You can test it via Postman or the RapidAPI extension for VS Code:
GET http://localhost:8080/rag?q=sonyPS5&top=10
Example response:
{
  "question": "sonyPS5",
  "total": 2,
  "answer": [
    {
      "description": "The Sony PlayStation 5 Console delivers ...",
      "name": "Sony PlayStation 5 Console",
      "price": 499.99
    },
    {
      "description": "PlayStation 5 Digital Edition ...",
      "name": "Sony PlayStation 5 Digital Edition",
      "price": 399.99
    }
  ]
}
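If you want to consume this endpoint from another Go service, the response above maps naturally onto a pair of structs. The sketch below is an assumption based only on the fields visible in the example; the type names `Product` and `RAGResult` and the helper `decodeRAGResult` are ours, and the real payloads may carry additional fields.

```go
package main

import "encoding/json"

// Product mirrors the fields shown in the example response.
type Product struct {
	Name        string  `json:"name"`
	Description string  `json:"description"`
	Price       float64 `json:"price"`
}

// RAGResult mirrors the top-level shape of the example response.
type RAGResult struct {
	Question string    `json:"question"`
	Total    int       `json:"total"`
	Answer   []Product `json:"answer"`
}

// decodeRAGResult parses the body of a /rag response.
func decodeRAGResult(body []byte) (RAGResult, error) {
	var out RAGResult
	err := json.Unmarshal(body, &out)
	return out, err
}
```

Unknown fields in the JSON are ignored by `json.Unmarshal`, so this client keeps working if the server later returns more attributes per product.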
🗂️ RAG System Workflow

📌 What’s important:
- The user formulates a question, which is submitted to the system. The query is converted into a vector representation using an embedding model.
- The retrieval mechanism performs semantic analysis and searches the database (e.g., Qdrant) for text fragments that are most relevant to the query.
func SearchProducts(query string, topK int, maxPrice *float64) ([]SearchResult, error) {
	// Convert the query text into an embedding vector.
	embedding, err := GetEmbedding(query)
	if err != nil {
		return nil, err
	}

	// Build the Qdrant search request: return payloads, but not the stored vectors.
	request := map[string]interface{}{
		"vector":       embedding,
		"top":          topK,
		"with_payload": true,
		"with_vector":  false,
	}

	// Optionally keep only products priced below maxPrice.
	if maxPrice != nil {
		request["filter"] = map[string]interface{}{
			"must": []map[string]interface{}{
				{
					"key": "price",
					"range": map[string]interface{}{
						"lt": *maxPrice,
					},
				},
			},
		}
	}

	host := config.QdrantHost()
	client := resty.New()
	resp, err := client.R().
		SetHeader("Content-Type", "application/json").
		SetBody(request).
		Post(host + "/collections/products/points/search")
	if err != nil {
		return nil, err
	}
	// Surface HTTP-level failures instead of trying to decode an error body.
	if resp.IsError() {
		return nil, fmt.Errorf("qdrant search failed: %s", resp.Status())
	}

	// Qdrant wraps the hits in a top-level "result" field.
	var result struct {
		Result []SearchResult `json:"result"`
	}
	if err := json.Unmarshal(resp.Body(), &result); err != nil {
		return nil, err
	}
	return result.Result, nil
}
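SearchProducts relies on GetEmbedding, which is not shown in this article. The sketch below is one way such a helper could look when calling OpenAI's embeddings endpoint (`POST https://api.openai.com/v1/embeddings`). It is an assumption, not the project's code: the names `fetchEmbedding` and `parseEmbedding` are hypothetical, and the model name `text-embedding-3-small` is our choice because the article does not say which embedding model is used.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
)

// embeddingModel is an assumed model name, not taken from the article.
const embeddingModel = "text-embedding-3-small"

type embeddingRequest struct {
	Model string `json:"model"`
	Input string `json:"input"`
}

type embeddingResponse struct {
	Data []struct {
		Embedding []float32 `json:"embedding"`
	} `json:"data"`
}

// parseEmbedding extracts the first vector from an embeddings response body.
func parseEmbedding(body []byte) ([]float32, error) {
	var parsed embeddingResponse
	if err := json.Unmarshal(body, &parsed); err != nil {
		return nil, fmt.Errorf("decode embedding response: %w", err)
	}
	if len(parsed.Data) == 0 {
		return nil, errors.New("embedding response contained no data")
	}
	return parsed.Data[0].Embedding, nil
}

// fetchEmbedding sends text to the embeddings endpoint and returns its vector.
func fetchEmbedding(client *http.Client, apiKey, text string) ([]float32, error) {
	payload, err := json.Marshal(embeddingRequest{Model: embeddingModel, Input: text})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, "https://api.openai.com/v1/embeddings", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+apiKey)

	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("embeddings API returned %s", resp.Status)
	}
	return parseEmbedding(body)
}
```

Whichever model you pick, the same one has to be used for both indexing and querying, and its output size must match the vector size configured for the Qdrant collection.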
- The retrieved fragments are combined into a unified context. The prompt also defines how the context should be interpreted: whether the model must rely strictly on the provided context or may supplement the answer.
- The resulting prompt is sent to the large language model (LLM), which generates a meaningful response based on it.
func RunRAG(question string, topK int) (RAGResponse, error) {
	results, err := SearchProducts(question, topK, nil)
	if err != nil {
		return RAGResponse{}, fmt.Errorf("retrieval error: %w", err)
	}

	// Build context from Qdrant results (name and description of each hit).
	context := ""
	for _, r := range results {
		name := r.Payload["name"]
		desc := r.Payload["description"]
		context += fmt.Sprintf("- %v: %v\n", name, desc)
	}

	// The prompt fixes the output format so the answer can be parsed as JSON.
	prompt := fmt.Sprintf(`
You are a helpful assistant. Given the product context below, respond with a valid JSON array of the best-matching product payloads.
Each payload must include:
- "name": string
- "description": string
- "minimum_order": integer
- "price": number
- "price_currency": string
- "supply_ability": integer
DO NOT include any "id" or "score" fields.
DO NOT wrap the response in triple backticks or Markdown formatting.
Context:
%s
Question: %s
Respond ONLY with the array of payloads.`, context, question)

	apiKey := os.Getenv("OPENAI_API_KEY")
	if apiKey == "" {
		return RAGResponse{}, fmt.Errorf("missing OPENAI_API_KEY")
	}

	reqBody := map[string]interface{}{
		"model": "gpt-4o-2024-08-06",
		"messages": []map[string]string{
			{"role": "user", "content": prompt},
		},
	}
	encoded, err := json.Marshal(reqBody)
	if err != nil {
		return RAGResponse{}, fmt.Errorf("encode request: %w", err)
	}
	req, err := http.NewRequest(http.MethodPost, "https://api.openai.com/v1/chat/completions", bytes.NewBuffer(encoded))
	if err != nil {
		return RAGResponse{}, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+apiKey)

	res, err := http.DefaultClient.Do(req)
	...
}
- The response generated by the LLM may contain extra details or technical metadata.
- The response then undergoes postprocessing: unnecessary parts are removed, and the result is formatted into a readable and clean output.
- Finally, the system returns a polished, human-readable answer to the user.
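The postprocessing step is elided in the snippet above, so here is a minimal sketch of what it could look like, assuming the model returns a JSON array that may still be wrapped in Markdown code fences despite the prompt's instructions. The function name `cleanLLMAnswer` is hypothetical; it strips optional fences, parses the array, and drops any "id" or "score" fields.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// cleanLLMAnswer strips optional Markdown code fences, parses the JSON array,
// and removes "id" and "score" keys from every item.
func cleanLLMAnswer(raw string) ([]map[string]any, error) {
	fence := strings.Repeat("`", 3)
	text := strings.TrimSpace(raw)
	if strings.HasPrefix(text, fence) {
		// Drop the opening fence line, which may carry a language tag such as "json".
		if nl := strings.Index(text, "\n"); nl >= 0 {
			text = text[nl+1:]
		} else {
			text = strings.TrimPrefix(text, fence)
		}
		// Drop the closing fence if present.
		text = strings.TrimSuffix(strings.TrimSpace(text), fence)
	}

	var items []map[string]any
	if err := json.Unmarshal([]byte(strings.TrimSpace(text)), &items); err != nil {
		return nil, fmt.Errorf("parse LLM answer: %w", err)
	}
	for _, item := range items {
		delete(item, "id")
		delete(item, "score")
	}
	return items, nil
}
```

Returning an error when the answer is not valid JSON lets the HTTP layer decide how to respond, for example by retrying the request or returning the raw retrieval results instead.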
✅ Conclusion
This project demonstrates how developers can build a complete RAG pipeline in Go, from data ingestion and embedding to vector search and response generation. Working in Go gives you performance and simplicity, along with a clear, modular view of how a RAG system is structured and operates under the hood.