Home / Projects / Sebastian

Sebastian: My Personal RAG Knowledge Assistant

A self-hosted AI assistant with access to calendar, email, maps, notes, and drive that learns how I live and work, keeping all data private and under my control.

2024
1 month
Personal AI Assistant
RAG FastAPI pgvector GPT-4o-mini Self-Hosted
RAG System Architecture

The Concept

Have you ever thought of an idea, opened Instagram, and the exact same thing shows up on your feed? That eerie moment when algorithms seem to know you better than you know yourself.

Now imagine having that kind of intelligence, but working for you, not against you.

No data harvesting, no ads, no cloud logging, just an assistant that actually remembers your world.

💡 Core Philosophy

"ChatGPT meets Jarvis, but all the data stays with me."

Sebastian isn't a chatbot. He's a personal memory system (part calendar manager, part research assistant, part life archivist)

He doesn't try to replace me; he observes how I work and makes sure I never lose track of what matters.

Why I Built This

Sebastian is built around one principle:

Context is everything.

He doesn't just respond to prompts, he uses context from my digital life to make sense of what I ask.

Real Examples:

  • "Remind me to reply to Youssef next week"
    → He knows Youssef is in my Gmail threads
  • "Find me time to finish the proposal"
    → He checks my Google Calendar
  • "Summarize my last meeting notes"
    → He pulls data from Notion
  • "Where did I park last Tuesday?"
    → He checks Google Maps location history

🔒 Privacy First: All of this works locally or through my own API keys, so my data never leaves my environment.

Personal Workspace

Sebastian integrates with my entire digital workspace

Architecture Overview

Sebastian runs as a local API layer built with FastAPI, powered by a RAG engine, and enhanced by several personal integrations.

┌──────────────────────────────┐
│        FastAPI Core          │
│   (Query Router + Auth)      │
└─────────────┬────────────────┘
              │
    ┌─────────┼──────────┐
    │         │          │
 Notion     Gmail     Calendar
  API        API        API
    │         │          │
    └─────────┼──────────┘
              ↓
       Context Aggregator
              ↓
   pgvector + OpenAI Embeddings
              ↓
         RAG Engine
              ↓
       GPT-4o-mini

🧰 Tech Stack

Backend & API

  • FastAPI (Python)
  • PostgreSQL + pgvector
  • LangChain
  • Docker

AI & Embeddings

  • GPT-4o-mini
  • OpenAI Embeddings
  • text-embedding-3-small
  • RAG Pipeline

Integrations

  • Google Calendar API
  • Gmail API
  • Google Drive API
  • Google Maps API

Tools & Auth

  • Notion API
  • n8n Automation
  • Auth0 (future)
  • Self-hosted deployment

🧠 Memory System

Sebastian's long-term memory uses RAG (Retrieval-Augmented Generation).
All my data, from meeting notes to location logs, is vectorized and stored in PostgreSQL with pgvector.

Database Schema

The memory system stores embeddings alongside metadata, enabling semantic search across all my digital information sources.

Example Memory Entries

📝 Notion Note

Tagged by project and topic

📅 Calendar Event

With participants and location

📧 Gmail Thread

Sender, subject, and content

This means Sebastian can answer:

  • "What did I discuss with Youssef last week?"
  • "Show me notes related to my next presentation."
  • "What's my busiest weekday this month?"

Example Workflow: "Summarize my week"

Let's say I ask Sebastian to summarize my week. Here's what happens internally:

1️⃣ Collect Context

Fetch events from Google Calendar, unread emails, and recent Notion notes.

2️⃣ Embed & Rank

Generate embeddings for all entries → use pgvector to find the most relevant ones.

3️⃣ Compose Prompt

Build a structured context summary (meetings, notes, tasks).

4️⃣ Generate Insight

Ask GPT-4o-mini to produce a weekly summary with actionable insights.

Sebastian's Response:

"You had 5 meetings (3 related to your AI projects), finished 2 Notion tasks, and got 4 unread follow-ups. You've been most active between 10-14h. Would you like me to schedule a review slot tomorrow?"

Hosting

Sebastian is self-hosted, meaning:

💾

You own your memory

🎛️

Full control

📤

Export anytime

🚀 Next Steps

  • Voice Interface: Add voice-to-text using Whisper + TTS for hands-free interaction
  • Home Automation: Connect to Home Assistant for environment control
  • Local Embeddings: Use nomic-embed-text for fully offline operation
  • Daily Journaling: Auto-generate daily journal from my data patterns
  • Notion Dashboard: Stream responses via WebSocket to live Notion dashboard
  • Proactive Insights: Sebastian suggests optimizations before I ask

Interested in building your own personal AI?

Let's discuss RAG systems, self-hosted AI, and privacy-first automation.

Let's Connect