
RAG Knowledge Base: From Document Ingestion to Traceable Answers

Tech choices, RAG pipeline, and test results from the Cloud Knowledge Base graduation project.

June 18, 2026 · 2 min read

Tags: RAG, Spring Boot, Milvus, Graduation Project

My undergraduate capstone, Cloud Knowledge Base, is an end-to-end RAG Q&A system: users upload documents, the system parses and chunks them, embeds the chunks and writes the vectors to a vector store, and answers questions with cited sources.

Architecture overview

Document parsing — PDF / Word / Markdown unified parsing and semantic chunking
Embedding — Tongyi embeddings written to Milvus
Retrieval-augmented generation — Top-K similar chunks + prompt assembly
Traceable answers — Responses include referenced source passages
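To make the retrieval and prompt-assembly steps above concrete, here is a minimal sketch in plain Python. It is not the project's code: the real system embeds with Tongyi and searches in Milvus, while this version uses hand-written two-dimensional vectors and an in-memory list. The `Chunk` class, its fields, and the prompt wording are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_name: str          # hypothetical field: title of the source document
    text: str              # chunk content
    vector: list[float]    # embedding (hand-written here, model output in practice)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    """Return the k chunks most similar to the query, best first."""
    return sorted(chunks, key=lambda c: cosine(query_vec, c.vector), reverse=True)[:k]


def build_prompt(question: str, hits: list[Chunk]) -> str:
    """Number each retrieved chunk so the model can cite it as [1], [2], ..."""
    context = "\n".join(
        f"[{i}] ({c.doc_name}) {c.text}" for i, c in enumerate(hits, start=1)
    )
    return (
        "Answer using only the numbered sources below and cite them like [1].\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


chunks = [
    Chunk("handbook.pdf", "Refunds are processed within 7 days.", [1.0, 0.0]),
    Chunk("handbook.pdf", "Office hours are 9 to 5.", [0.0, 1.0]),
    Chunk("faq.md", "Refund requests need an order number.", [0.9, 0.1]),
]
hits = top_k([1.0, 0.0], chunks, k=2)
prompt = build_prompt("How long do refunds take?", hits)
print(prompt)
```

Numbering the sources in the prompt is what makes the answer traceable later: each [n] the model emits maps back to exactly one retrieved chunk.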

When deleting a document, also remove its vectors in Milvus to avoid "ghost retrieval" — an easy detail to miss in multi-user setups.
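The cascade can be sketched as follows, with an in-memory dictionary standing in for Milvus. `InMemoryIndex`, `DocumentService`, and every field name are hypothetical; a real deployment would issue the equivalent delete against the vector store. The ownership check is my assumption about how the multi-user case might be guarded.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryIndex:
    """Stand-in for the vector store: maps vector id -> (doc_id, text)."""
    vectors: dict[str, tuple[str, str]] = field(default_factory=dict)

    def add(self, vector_id: str, doc_id: str, text: str) -> None:
        self.vectors[vector_id] = (doc_id, text)

    def delete_by_doc(self, doc_id: str) -> int:
        """Remove every vector belonging to doc_id; return how many were removed."""
        stale = [vid for vid, (owner_doc, _) in self.vectors.items() if owner_doc == doc_id]
        for vid in stale:
            del self.vectors[vid]
        return len(stale)


@dataclass
class DocumentService:
    index: InMemoryIndex
    documents: dict[str, str] = field(default_factory=dict)  # doc_id -> owner user id

    def delete_document(self, doc_id: str, user_id: str) -> int:
        """Delete a document and its vectors, but only for the owning user."""
        if self.documents.get(doc_id) != user_id:
            raise PermissionError("document not found for this user")
        # Remove vectors first: if this step fails, the document record still
        # exists and the delete can be retried, so no orphan vectors remain.
        removed = self.index.delete_by_doc(doc_id)
        del self.documents[doc_id]
        return removed


index = InMemoryIndex()
service = DocumentService(index, {"doc-1": "alice", "doc-2": "bob"})
index.add("v1", "doc-1", "chunk a")
index.add("v2", "doc-1", "chunk b")
index.add("v3", "doc-2", "chunk c")
removed = service.delete_document("doc-1", "alice")
print(removed, sorted(index.vectors))
```

Deleting the vectors before the document record is a deliberate ordering: the failure mode becomes a document that can still be deleted again, not vectors that nothing points to anymore.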

Backend highlights

Spring Boot 3 + JWT for multi-user isolation
Delete document → delete vectors, keeping the index consistent
28 functional and security tests all passed

Frontend highlights

Vue 3 for document management, Q&A history, and resource library modules
Bookshelf / notes / media extensions decoupled from the core RAG pipeline

Lessons learned

Chunk granularity

Chunks that are too large retrieve poorly; chunks that are too small fragment the context. We used a hybrid strategy: split on paragraph boundaries, then enforce a maximum token count per chunk.
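A rough sketch of that hybrid strategy, assuming paragraphs are separated by blank lines and approximating tokens by whitespace-separated words. The function name `chunk_text` and the default limit are illustrative only, and a real pipeline would count tokens with the embedding model's tokenizer.

```python
def chunk_text(text: str, max_tokens: int = 200) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_tokens tokens.

    Tokens are approximated by whitespace-separated words (an assumption).
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    chunks: list[str] = []
    current: list[str] = []   # paragraphs in the chunk being built
    current_len = 0           # token count of `current`

    def flush() -> None:
        nonlocal current, current_len
        if current:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0

    for para in (p.strip() for p in text.split("\n\n")):
        words = para.split()
        if not words:
            continue
        if len(words) > max_tokens:
            # Oversized paragraph: close the current chunk, then hard-split it.
            flush()
            for i in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[i:i + max_tokens]))
            continue
        if current_len + len(words) > max_tokens:
            flush()
        current.append(para)
        current_len += len(words)
    flush()
    return chunks


sample = "a b c\n\nd e\n\nf g h i j k l"
print(chunk_text(sample, max_tokens=5))
```

With a limit of 5, the first two paragraphs fit together in one chunk, while the seven-word paragraph exceeds the limit on its own and is split at the token boundary.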

Citation UI

Users need to know where answers come from. Showing cited snippets alongside the answer noticeably improved trust.
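One way the backend might hand cited snippets to the UI is sketched below. It assumes the model was asked to cite sources with numeric markers such as [1]; the function name `attach_citations` and the dictionary keys are hypothetical, not the project's actual response format.

```python
import re


def attach_citations(answer: str, sources: list[dict]) -> dict:
    """Pair an answer with the snippets it cites.

    `sources` is the numbered list sent to the model (index 0 is source [1]).
    Markers pointing outside that list are ignored rather than shown.
    """
    cited: list[dict] = []
    seen: set[int] = set()
    for match in re.finditer(r"\[(\d+)\]", answer):
        n = int(match.group(1))
        if n in seen or not 1 <= n <= len(sources):
            continue
        seen.add(n)
        src = sources[n - 1]
        cited.append({"marker": n, "document": src["document"], "snippet": src["text"]})
    return {"answer": answer, "citations": cited}


sources = [
    {"document": "handbook.pdf", "text": "Refunds are processed within 7 days."},
    {"document": "faq.md", "text": "Refund requests need an order number."},
]
result = attach_citations("Refunds take up to 7 days [1]. See also [1] and [5].", sources)
print(result["citations"])
```

Dropping out-of-range markers keeps the UI from rendering a citation that has no snippet behind it, which would undercut the trust the citations are meant to build.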

Email without domain verification is fine for testing only — configure SPF / DKIM in production.
