Technical Deep Dive

Hana's RAG Pipeline

A multi-stage system that augments LLMs with knowledge retrieved from centralized memory to deliver enterprise-grade accuracy.

OpenAI GPT, Mistral AI, MongoDB Atlas, Google Kubernetes Engine, Nest.js, Perplexity.ai, Jina.ai, Azure AI

Key Components

A comprehensive architecture that handles everything from data ingestion to secure deployment

Ingestion Layer

Supports multiple memory types, including Google Docs, PDFs, Confluence, Jira, and audio transcripts.

Manual ingestion via /memory command

Automatic sync for scheduled updates

Jina.ai for embedding conversion

Unstructured.ai for data processing

Embedding & Vectorization

Content chunking and vector embeddings using OpenAI or Microsoft Azure AI Services.

Intelligent content chunking

OpenAI & Azure AI integration

MongoDB Atlas vector storage

Semantic search optimization
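The page doesn't describe how Hana's "intelligent" chunking decides where to split. As a baseline for comparison, the sketch below uses a fixed-size sliding window with overlap, which keeps each chunk within an embedding model's input limit while preserving context across chunk boundaries. Word counts stand in for tokens, and `chunk_text`, `max_words`, and `overlap` are hypothetical names, not part of Hana's code.

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping windows of words.

    Assumption: words approximate tokens; a real pipeline would count tokens
    with the embedding model's own tokenizer.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):  # this window reached the end
            break
    return chunks


print(chunk_text("a b c d e f g", max_words=3, overlap=1))
# ['a b c', 'c d e', 'e f g']
```

Each chunk would then be sent to the embedding model (OpenAI or Azure AI, per the list above) and stored alongside its source metadata.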

Storage & Indexing

Advanced vector storage with metadata optimization and relevance filtering.

Score threshold filtering (>0.5)

Metadata grouping optimization

Advanced indexing strategies

Token usage reduction
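One plausible reading of "score threshold filtering" plus "metadata grouping" is sketched below: drop retrieved chunks whose relevance score isn't above 0.5, then group the survivors by source so each source label appears once in the prompt rather than once per chunk, which is one way the token savings could arise. The `Hit` type, its `source` field, and both function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    text: str
    score: float
    source: str  # hypothetical metadata field, e.g. a document title


def filter_and_group(hits: list[Hit], threshold: float = 0.5) -> dict[str, list[str]]:
    """Drop hits at or below the threshold, then group surviving chunks by source."""
    grouped: dict[str, list[Hit]] = defaultdict(list)
    for hit in hits:
        if hit.score > threshold:  # strict ">", matching the ">0.5" rule
            grouped[hit.source].append(hit)
    # Within each source, list the highest-scoring chunks first.
    return {
        source: [h.text for h in sorted(group, key=lambda h: h.score, reverse=True)]
        for source, group in grouped.items()
    }


def render_context(grouped: dict[str, list[str]]) -> str:
    """Emit each source header once instead of repeating it per chunk."""
    return "\n\n".join(f"[{source}]\n" + "\n".join(texts) for source, texts in grouped.items())


hits = [
    Hit("Q3 goal: increase efficiency by 20%", 0.82, "Q3 Plan"),
    Hit("Q3 hiring: two engineers", 0.61, "Q3 Plan"),
    Hit("Office lunch menu", 0.31, "Wiki"),
]
print(render_context(filter_and_group(hits)))
# [Q3 Plan]
# Q3 goal: increase efficiency by 20%
# Q3 hiring: two engineers
```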

Retrieval Engine

High-performance similarity search with batch processing for datasets with 90K+ rows.

Semantic similarity search

Regex partial matching

Metadata filtering

Batch processing at scale
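The retrieval features listed above can be combined in a single pass. The sketch below pulls rows in fixed-size batches, so they can come from a streaming source such as a database cursor without loading all 90K+ rows into memory, skips rows that fail a metadata filter or a case-insensitive regex, and keeps only a running top-k heap instead of every score. The row layout (`text`, `vec`, `meta`) and all names are assumptions; in practice the similarity step would usually run inside the vector store or in vectorized NumPy code rather than pure Python.

```python
import heapq
import math
import re
from itertools import islice
from typing import Iterable, Iterator, Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def batched(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield lists of up to `size` rows from any iterable (e.g. a DB cursor)."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def search(
    rows: Iterable[dict],
    query_vec: list[float],
    *,
    k: int = 5,
    batch_size: int = 1000,
    metadata: Optional[dict] = None,
    pattern: Optional[str] = None,
) -> list[tuple[float, str]]:
    """Return the k best (score, text) pairs after metadata and regex filtering."""
    regex = re.compile(pattern, re.IGNORECASE) if pattern else None
    top: list[tuple[float, str]] = []  # min-heap holding the current best k
    for batch in batched(rows, batch_size):
        for row in batch:
            if metadata and any(row["meta"].get(key) != val for key, val in metadata.items()):
                continue  # metadata filter: every key/value must match exactly
            if regex and not regex.search(row["text"]):
                continue  # partial match: the pattern may hit anywhere in the text
            item = (cosine(query_vec, row["vec"]), row["text"])
            if len(top) < k:
                heapq.heappush(top, item)
            else:
                heapq.heappushpop(top, item)  # evicts the lowest score
    return sorted(top, reverse=True)
```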

LLM Generation

Context-aware response generation with OpenAI GPT and Mistral AI, enriched with web context from Perplexity.ai.

Multi-model support

Web context enrichment

Post-processing for accuracy

Citation and sourcing
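Citation and post-processing can be approached by numbering the retrieved chunks in the prompt and then checking that every citation the model returns points at a chunk it was actually given. The sketch below shows only that bookkeeping; the model call itself is omitted because it differs between OpenAI, Mistral AI, and Perplexity.ai, and the prompt wording, `build_prompt`, and `invalid_citations` are hypothetical.

```python
import re


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Number each retrieved chunk so the model can cite it as [1], [2], ..."""
    context = "\n".join(
        f"[{i}] ({c['source']}) {c['text']}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the context below and cite sources as [n]. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def invalid_citations(answer: str, num_chunks: int) -> list[int]:
    """Post-processing check: citation numbers that match no provided chunk."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in cited if not 1 <= n <= num_chunks)


chunks = [{"source": "Q3 Plan", "text": "Q3 goals: increase efficiency by 20%"}]
print(build_prompt("What are our Q3 goals?", chunks))
print(invalid_citations("Efficiency should rise 20% [1][2].", len(chunks)))  # [2]
```

A non-empty result from `invalid_citations` could trigger a retry or strip the unsupported claim before the answer reaches the user.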

Security & Orchestration

Enterprise-grade deployment on GKE with Nest.js, featuring HIPAA compliance.

Google Kubernetes Engine

JWT/OAuth authentication

HIPAA compliance via BAA

Auto-resync capabilities
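To make the JWT item concrete, the sketch below verifies an HS256-signed token with only the Python standard library: it recomputes the HMAC-SHA256 signature over `header.payload`, compares it in constant time, and rejects expired tokens. It is illustrative only; the page doesn't say which signing algorithm or library Hana uses, and a Nest.js service would normally rely on a maintained JWT library rather than hand-rolled verification. `sign_hs256` exists only to produce a test token.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Test helper: build a compact HS256 JWT."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the token's claims, or raise ValueError if it is invalid."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")  # refuse "none" and other algs
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```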

Workflow Example

See how Hana's RAG pipeline processes data from ingestion to accurate response generation

Step 1

Ingestion

User adds memory via dashboard or chat command

Example: @Hana /memory add "Q3 goals: Increase efficiency by 20%"
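The chat form of this command can be parsed with shell-style quoting so the quoted memory text stays a single argument. This is a sketch under assumptions: only the `@Hana /memory <action> "<content>"` shape shown in the example is handled, and `parse_memory_command` and `MemoryCommand` are hypothetical names.

```python
import shlex
from typing import NamedTuple, Optional


class MemoryCommand(NamedTuple):
    action: str
    content: str


def parse_memory_command(message: str, bot_name: str = "Hana") -> Optional[MemoryCommand]:
    """Parse '@<bot> /memory <action> "<content>"'; return None for other messages."""
    try:
        tokens = shlex.split(message)  # respects double and single quotes
    except ValueError:  # e.g. unbalanced quotes
        return None
    if len(tokens) < 4 or tokens[0] != f"@{bot_name}" or tokens[1] != "/memory":
        return None
    return MemoryCommand(action=tokens[2], content=" ".join(tokens[3:]))


print(parse_memory_command('@Hana /memory add "Q3 goals: Increase efficiency by 20%"'))
# MemoryCommand(action='add', content='Q3 goals: Increase efficiency by 20%')
print(parse_memory_command("@Hana What are our Q3 goals?"))  # None
```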

Step 2

Processing

Content is chunked, embedded, and stored in MongoDB Atlas

Example: Vector embeddings generated with metadata

Step 3

Query

User asks a question in natural language

Example: @Hana What are our Q3 goals?

Step 4

Retrieval

Similarity search retrieves relevant chunks

Example: Score threshold >0.5 for high relevance
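In MongoDB Atlas, this step maps naturally onto an aggregation pipeline that starts with a `$vectorSearch` stage. The sketch below only builds the pipeline, which would then be passed to PyMongo's `Collection.aggregate`. The index name, the `embedding`, `text`, and `metadata` field names, and the candidate counts are assumptions, and any `filter` fields must be declared as filter fields in the Atlas vector index. Atlas documents that cosine similarity is reported as a normalized score, `(1 + cos) / 2`, so whether Hana's `>0.5` cutoff applies to that score or to raw cosine similarity changes what it filters out.

```python
from typing import Optional


def build_vector_search_pipeline(
    query_vector: list[float],
    *,
    index: str = "memory_vector_index",  # hypothetical index name
    path: str = "embedding",             # hypothetical field holding the vectors
    limit: int = 5,
    num_candidates: int = 100,
    min_score: float = 0.5,
    metadata_filter: Optional[dict] = None,
) -> list[dict]:
    """Build an Atlas Vector Search pipeline with a relevance cutoff."""
    if num_candidates < limit:
        raise ValueError("num_candidates must be >= limit")
    stage: dict = {
        "index": index,
        "path": path,
        "queryVector": query_vector,
        "numCandidates": num_candidates,  # ANN candidates considered before limiting
        "limit": limit,
    }
    if metadata_filter:
        stage["filter"] = metadata_filter  # pre-filter on indexed filter fields
    return [
        {"$vectorSearch": stage},
        {"$project": {"text": 1, "metadata": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
        {"$match": {"score": {"$gt": min_score}}},  # keep only scores above the cutoff
    ]


# Usage with PyMongo (not executed here):
# results = collection.aggregate(build_vector_search_pipeline(query_embedding))
```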

Step 5

Generation

LLM generates contextualized response

Example: Based on memory: Q3 goals include 20% efficiency increase

Step 6

Output

Accurate, cited response delivered to user

Example: Response with source attribution

Unique Optimizations

Advanced techniques that make Hana's RAG pipeline reliable, efficient, and enterprise-ready

Multi-Stage RAG

Initial semantic search followed by re-ranking for better precision.
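The two stages can be sketched as follows: a cheap vector-similarity pass picks a generous candidate set, then a more discriminating scorer re-orders only those candidates. The page doesn't name Hana's re-ranker; production systems typically use a cross-encoder or a hosted re-ranking model, so the word-overlap scorer here is a stand-in that lets the example run offline. All names are hypothetical.

```python
import math
import re


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def overlap_score(query: str, text: str) -> float:
    """Stand-in re-ranker: share of query words that also appear in the text."""
    q = set(re.findall(r"\w+", query.lower()))
    t = set(re.findall(r"\w+", text.lower()))
    return len(q & t) / len(q) if q else 0.0


def two_stage_search(query: str, query_vec: list[float], docs: list[dict],
                     *, first_k: int = 20, final_k: int = 3) -> list[dict]:
    # Stage 1: recall-oriented vector search over all documents.
    candidates = sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)[:first_k]
    # Stage 2: precision-oriented re-ranking of the shortlist only.
    return sorted(candidates, key=lambda d: overlap_score(query, d["text"]), reverse=True)[:final_k]


docs = [
    {"text": "Q3 goals: increase efficiency by 20%", "vec": [0.8, 0.6]},
    {"text": "Q3 budget review notes", "vec": [1.0, 0.0]},
    {"text": "Holiday schedule", "vec": [0.0, 1.0]},
]
best = two_stage_search("What are our Q3 goals?", [1.0, 0.0], docs, first_k=2, final_k=1)
print(best[0]["text"])  # Q3 goals: increase efficiency by 20%
```

In this toy data the vector pass ranks the budget notes first; re-ranking promotes the chunk that actually states the goals.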

Auto-Resync

Daily automatic updates keep data fresh without manual intervention.
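A daily resync doesn't have to re-embed everything. One common approach, assumed here rather than taken from Hana's implementation, is to store a content hash per document, re-embed only documents whose hash changed (including new ones), and delete vectors for documents that disappeared from the source. The function names and dictionary shapes are hypothetical; the daily trigger could be a scheduler such as a Kubernetes CronJob.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_resync(current: dict[str, str], stored_hashes: dict[str, str]) -> dict[str, list[str]]:
    """Compare freshly fetched documents (id -> text) with stored hashes (id -> hash)."""
    # New documents have no stored hash, so .get() returns None and they count as changed.
    changed = [doc_id for doc_id, text in current.items()
               if stored_hashes.get(doc_id) != content_hash(text)]
    deleted = [doc_id for doc_id in stored_hashes if doc_id not in current]
    return {"reembed": sorted(changed), "delete": sorted(deleted)}


stored = {"a": content_hash("old"), "b": content_hash("same"), "c": content_hash("gone")}
current = {"a": "new", "b": "same", "d": "fresh"}
print(plan_resync(current, stored))  # {'reembed': ['a', 'd'], 'delete': ['c']}
```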

Deep Integrations

Real-time data from Google Workspace, Stability.ai, and more.

Efficiency Optimized

Batch processing, metadata grouping, and score thresholds reduce costs.

Enterprise Ready

CASA Tier-2 and HIPAA compliant, with no data used for external training.

Ready to get started?

Build your RAG pipeline

Let's discuss how we can implement a custom RAG solution that grounds your AI in real data, just like we did for Hana, which is trusted by 19,000+ users.

47K+ active users

800+ organizations

14+ months of RAG expertise