Private AI chatbot (Thesis)

AI chatbot with GDPR/nLPD-safe document access

As part of my Bachelor’s thesis at Haaga-Helia University of Applied Sciences, I designed and developed a chatbot that answers questions based on a company’s internal documents. The solution runs entirely in-house, without relying on third-party AI providers, which keeps both the data and the outcomes under the company’s full control.

The project was done for Innovatim, a Swiss consulting firm whose clients work in industries where data privacy is non-negotiable. They needed something that could run fully on their own infrastructure, provide reliable answers to employees, and fit neatly into client websites or internal tools.

The project

The core idea was simple: help employees quickly find answers hidden in all kinds of internal documents, through a natural, conversational chat interface. Think of it as a private ChatGPT that answers queries based exclusively on a company’s internal documents.

To make this a reality, I had to overcome several practical challenges. These included processing a variety of file types such as PDFs and Word documents, ensuring the chatbot provided accurate answers while clearly showing their sources, and designing an interface that felt intuitive and user-friendly from the start.

From the beginning, I also had to consider several business requirements:

I approached the project with a lightweight, iterative methodology inspired by agile principles. Every feature, from the chatbot itself to the admin dashboard, was developed with a strong focus on user experience. My goal wasn’t just to build a tool that worked, but one that felt reliable, secure, and seamless for those who use it.

The solution

The chatbot uses Retrieval-Augmented Generation (RAG), which means it first searches through the company’s internal documents to find information relevant to the user’s question. Then, it generates answers based only on that retrieved content, ensuring responses are accurate and grounded in the company’s own data, without calling on any external AI services.
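
To make the retrieve-then-generate flow more concrete, here is a minimal sketch in Python. It is not the code from the thesis: it assumes a simple word-overlap (cosine) score in place of whatever retrieval method a real deployment would use, and the names `retrieve` and `build_prompt` are hypothetical. The resulting prompt would then be handed to a locally hosted language model, which is left out here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k document chunks most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(q, Counter(tokenize(chunk))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Build a prompt that restricts the model to the retrieved context."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you do not know.\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Made-up example chunks, standing in for indexed company documents.
    chunks = [
        "Vacation requests must be submitted two weeks in advance.",
        "The office printer is on the second floor.",
        "Expense reports are due at the end of each month.",
    ]
    question = "How do I submit a vacation request?"
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

The key design point is that the model only ever sees the retrieved chunks plus the question, which is what keeps answers tied to the company’s own documents.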

The user-facing frontend was designed to be clean and intuitive, while the admin panel offers essential tools for managing documents and switching models.

Users can upload PDFs, Word documents, Excel sheets, and plain text files directly through the admin interface.
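
Handling several file types usually starts with deciding which parser an upload should go to. The sketch below shows one way to do that in Python, purely as an illustration: the extension list and the `classify_upload` helper are assumptions of mine, not taken from the thesis code, and the actual text extraction for each format is left out.

```python
from pathlib import Path

# Assumed mapping from file extension to document kind (illustrative only).
SUPPORTED_TYPES = {
    ".pdf": "pdf",
    ".docx": "word",
    ".xlsx": "excel",
    ".txt": "text",
}


def classify_upload(filename: str) -> str:
    """Return the document kind for an uploaded file, or raise ValueError."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return SUPPORTED_TYPES[suffix]


if __name__ == "__main__":
    for name in ["handbook.PDF", "policy.docx", "budget.xlsx", "notes.txt"]:
        print(name, "->", classify_upload(name))
```

Rejecting unknown extensions up front keeps unsupported files out of the indexing step instead of failing later during parsing.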

The chatbot’s internal prompt is also editable from the same dashboard.

When a user asks a question, the chatbot searches for relevant documents and uses them to generate accurate responses.

Security is maintained through token-based access for the chatbot and simple authentication for the backend. Although multilingual support is currently limited, the system’s architecture allows easy expansion.
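
As a rough illustration of what token-based access can look like, the Python sketch below issues a random token, stores only its hash, and checks presented tokens with a constant-time comparison. This is one common pattern, not a description of the thesis implementation; the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import hmac
import secrets


def issue_token() -> str:
    """Generate a random, URL-safe access token to hand to a client."""
    return secrets.token_urlsafe(32)


def hash_token(token: str) -> str:
    """Hash a token so the server never has to store it in plain text."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def is_authorized(presented: str, stored_hashes: set[str]) -> bool:
    """Check a presented token against the stored hashes in constant time."""
    presented_hash = hash_token(presented)
    return any(
        hmac.compare_digest(presented_hash, stored) for stored in stored_hashes
    )


if __name__ == "__main__":
    token = issue_token()
    stored = {hash_token(token)}
    print(is_authorized(token, stored))
    print(is_authorized("not-a-valid-token", stored))
```

In a setup like this, each site or internal tool embedding the chatbot would send its token with every request, and revoking access means deleting the corresponding hash.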

The entire solution runs within Docker containers and is ready for deployment on private infrastructure. Some features, such as displaying file sources and enhancing language support, were outside the scope of the project, but are planned for future updates.

Here’s an overview of the system’s container-based architecture:

Diagram showing the architecture of the web chatbot project.

By the end of the thesis, the prototype successfully met most of the project’s original goals. It handles multiple document formats, supports several languages, runs entirely on local infrastructure, and delivers an experience that feels immediately familiar to users.

There are still areas for improvement, such as refining citation formatting and streamlining the deployment process, but the foundation is strong. Most importantly, the company I collaborated with is satisfied with the outcome and is actively exploring ways to extend and bring this solution to market.

This project showed that AI doesn’t have to be a black box hosted in the cloud. With the right approach, you can build AI systems that are clear, trustworthy, and completely under your control.

Read the thesis