
Two Products: Personal Conversational AI and Personal Tutorial AI

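Based on our discussion, your M1 Max (64GB RAM) is a capable "Server-Grade" environment for hosting both of these products on one machine. One caveat: two 70B-class models at 4-bit will not fit in memory at the same time, so expect to load the largest models one at a time.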

Here are the detailed technical development schemes for Product A (The Companion) and Product B (The Tutor).


Shared Infrastructure (The Foundation)

Since both products run on the same machine, they share the same backbone:

  • Hardware: MacBook Pro M1 Max (64GB Unified Memory).
  • Model Server: Ollama (for managing LLMs) + LM Studio (optional, for testing).
  • Orchestration: Python 3.11 + LangChain.
  • Privacy: All data stored in local JSON/Parquet/ChromaDB files. Zero cloud egress.


Product A: The Personal Conversational AI

  • Codename: The Witness
  • Core Philosophy: Latency, Empathy, and "Stateful" Continuity.
  • Primary Tech: Dual-LLM (Fast/Slow), Standard RAG (Real-time), GraphRAG (Nightly).

1. Technical Stack

  • The Ears: Whisper (Large-v3) (Runs on CoreML/Neural Engine).
  • The Mouth (TTS): Coqui XTTS_v2 (high quality, clonable voice) or Piper (faster).
  • The "Reflex" Brain (Fast): Llama-3.2-3B (or Qwen2.5-3B for Chinese).
  • The "Deep" Brain (Smart): Llama-3.1-70B-Instruct (4-bit quantization).
  • Memory Store: ChromaDB (Vector Store) + Local File System (.txt logs).
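A rough check of whether the two Companion models fit in 64GB can be done with simple arithmetic. The sketch below assumes weight size is roughly parameters × bits / 8. It ignores the KV cache, runtime overhead, and the fact that real 4-bit quantized files typically use somewhat more than 4 bits per weight. The 4-bit setting for the 3B model is also an assumption; the stack above only specifies 4-bit for the 70B model.

```python
# Back-of-the-envelope weight sizes; real quantized files are usually larger.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes): params * bits / 8."""
    return params_billion * bits_per_weight / 8


UNIFIED_MEMORY_GB = 64  # M1 Max configuration named in this document

# (model, parameters in billions, assumed bits per weight)
models = [
    ("Llama-3.1-70B-Instruct", 70, 4),
    ("Llama-3.2-3B", 3, 4),  # 4-bit here is an assumption
]

total = 0.0
for name, params_b, bits in models:
    size = weight_gb(params_b, bits)
    total += size
    print(f"{name}: ~{size:.1f} GB of weights")

print(f"Total: ~{total:.1f} GB of {UNIFIED_MEMORY_GB} GB")
```

Under these assumptions the pair needs roughly 36.5 GB for weights alone. That leaves headroom for Whisper, TTS, and context, but not for a second 70B-class model loaded at the same time.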

2. The Architecture: "The Latency Masking Pipeline"

This system is designed to prevent awkward silences while the 70B model thinks.

The Workflow (Step-by-Step):

  1. Input: User speaks. Whisper transcribes to text.
  2. Parallel Execution:
    • Thread A (Reflex): The 3B Model receives the text immediately.
      • Prompt: "User said X. Give a generic emotional acknowledgment (e.g., 'Oh really?', 'That's tough.'). Keep it under 5 words."
      • Action: TTS speaks this immediately (target latency: <0.5s).
    • Thread B (Deep Thought): The 70B Model receives the text + Standard RAG Context.
      • RAG Query: Search ChromaDB for "Relevant past conversations" (e.g., User mentions 'Steve', retrieve who Steve is).
      • Prompt: "User said X. Context: [Retrieved Memories]. Generate a thoughtful, 2-sentence response."
  3. The Handoff:
    • As the "Reflex" audio finishes playing, the "Deep Thought" audio is queued to play immediately after.
    • User hears: "Oh wow... (pause) ... Does that mean you're going to quit your job?" (Seamless flow).
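The workflow above can be prototyped in text-only form before any audio is wired in. The following is a minimal sketch, assuming two placeholder functions (`reflex_model` and `deep_model`, both hypothetical) that stand in for the 3B and 70B models. The `time.sleep` calls only simulate their different response times; a real version would call the model server and hand each string to TTS.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor


def reflex_model(text: str) -> str:
    """Hypothetical stand-in for the 3B model: a short, fast acknowledgment."""
    time.sleep(0.05)  # simulated fast inference
    return "Oh, really?"


def deep_model(text: str, memories: list[str]) -> str:
    """Hypothetical stand-in for the 70B model with retrieved context."""
    time.sleep(0.3)  # simulated slow inference
    return f"You mentioned {memories[0]}. How did that go?"


def respond(text: str, memories: list[str]) -> list[str]:
    """Start both models at once; queue the reflex reply first, then the deep one."""
    speech_queue = queue.Queue()
    with ThreadPoolExecutor(max_workers=2) as pool:
        reflex_future = pool.submit(reflex_model, text)          # Thread A
        deep_future = pool.submit(deep_model, text, memories)    # Thread B
        # The reflex result arrives while the deep model is still working,
        # so it can be spoken immediately.
        speech_queue.put(reflex_future.result())
        # The deep reply is queued behind it (the handoff).
        speech_queue.put(deep_future.result())

    spoken = []
    while not speech_queue.empty():
        spoken.append(speech_queue.get())
    return spoken


if __name__ == "__main__":
    for utterance in respond("I had a rough talk with Steve.", ["Steve"]):
        print(utterance)
```

Because both tasks start together, the total wait is close to the slower model's time rather than the sum of the two, which is the point of the masking.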

3. The Memory System (Day/Night Cycle)

  • Day Mode (Read/Write): Use Standard RAG. Every user message is embedded and saved to ChromaDB instantly.
  • Night Mode (The Dream State):
    • Trigger: 3:00 AM (or manual command).
    • Process: Run Microsoft GraphRAG on the day's transcript.txt + previous history.
    • Task 1 (Update Graph): Map new entities (e.g., "Steve is now an Ex-Boss").
    • Task 2 (Strategic Planning):
      • Prompt: "Based on the graph, what topics has the user avoided? What is unclear? Generate 3 questions for tomorrow."
    • Output: Saves a daily_briefing.json that the AI reads when it wakes up.
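The day/night cycle can be sketched with plain files before ChromaDB or GraphRAG are involved. In the example below, `generate_questions` is a hypothetical placeholder for the nightly GraphRAG + LLM step, and the JSON layout (`date`, `questions`) is an assumption, since the plan above does not define a schema for daily_briefing.json.

```python
import json
from datetime import date
from pathlib import Path


def log_message(transcript: Path, speaker: str, text: str) -> None:
    """Day mode: append one utterance to the plain-text transcript."""
    with transcript.open("a", encoding="utf-8") as f:
        f.write(f"{speaker}: {text}\n")


def generate_questions(transcript_text: str) -> list[str]:
    """Hypothetical placeholder for the nightly GraphRAG + LLM analysis.

    It simply turns up to three user lines into follow-up questions.
    """
    questions = []
    for line in transcript_text.splitlines():
        if line.startswith("user: "):
            said = line.removeprefix("user: ")
            questions.append(f"Yesterday you said: '{said}' Can you tell me more?")
    return questions[:3]


def run_night_job(transcript: Path, briefing: Path) -> dict:
    """Night mode: read the day's transcript and write the briefing file."""
    text = transcript.read_text(encoding="utf-8") if transcript.exists() else ""
    payload = {
        "date": date.today().isoformat(),
        "questions": generate_questions(text),
    }
    briefing.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return payload


if __name__ == "__main__":
    transcript = Path("transcript.txt")
    log_message(transcript, "user", "I might quit my job.")
    print(run_night_job(transcript, Path("daily_briefing.json")))
```

The 3:00 AM trigger could be handled by any scheduler (cron, launchd) that runs this script; the choice is left open here.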

Product B: The Tutorial AI

  • Codename: The Professor
  • Core Philosophy: Mastery, Synthesis, and Structure.
  • Primary Tech: GraphRAG (Heavy), Large Context LLM, Document Ingestion.

1. Technical Stack

  • The Interface: Text-First (Markdown supported). Use Streamlit or Gradio.
  • The Brain: Command R (35B) (Excellent for RAG/Citations) or Qwen2.5-72B (If studying STEM/Math).
  • The Knowledge Base: Microsoft GraphRAG (Strictly).
  • Ingestion Tools: PyMuPDF or Marker (to convert PDFs to clean Markdown).

2. The Architecture: "The Library Pipeline"

This product does not care about speed; it cares about holistic understanding.

Phase A: Ingestion (The "Study" Phase)

  • You drop a folder of 10 PDF books (e.g., "Permaculture Design").
  • Step 1: A script converts the PDFs to clean Markdown text.
  • Step 2: GraphRAG indexing runs.
    • Entity Extraction: It identifies terms like "Swale," "Zone 1," "Mulch."
    • Community Detection: It groups concepts (e.g., "Water Management Techniques").
  • Result: A graph network of the books, stored locally.

Phase B: Interaction (The "Classroom" Phase)

The UI offers two buttons for the user:

  • Button 1: "Fact Check" (Local Search)

    • User: "What is the definition of a Swale?"
    • Tech: GraphRAG Local Search. Looks at the "Swale" node and its immediate neighbors.
    • Latency: ~3-5 seconds.
    • Output: Precise definition with citations (Book A, Page 42).
  • Button 2: "Synthesize" (Global Search)

    • User: "Compare the water management strategies between Book A and Book B."
    • Tech: GraphRAG Global Search (Map-Reduce).
    • Process: It scans the "Water" communities across the entire graph, synthesizes the conflicts and agreements.
    • Latency: ~30-60 seconds.
    • Output: A mini-essay/tutorial in Markdown format.
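The difference between the two buttons can be illustrated on a toy graph. This is not the GraphRAG API: the edges, the community grouping, and the `summarize` callback are all made up for illustration, and the real system also relies on embeddings, source text, and LLM-written community reports. The sketch only shows the shape of the two query styles: neighborhood lookup versus map-reduce over communities.

```python
from collections import defaultdict

# Hypothetical toy data standing in for a GraphRAG index.
EDGES = [
    ("Swale", "Contour"),
    ("Swale", "Water Management"),
    ("Mulch", "Water Management"),
    ("Mulch", "Soil"),
]
COMMUNITIES = {
    "Water": ["Swale", "Contour", "Water Management"],
    "Soil": ["Mulch", "Soil"],
}


def build_adjacency(edges):
    """Turn an undirected edge list into entity -> set of neighbours."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def local_search(adjacency, entity):
    """'Fact Check': the entity itself plus its immediate neighbours."""
    return [entity] + sorted(adjacency.get(entity, set()))


def global_search(communities, summarize):
    """'Synthesize': summarize each community (map), then combine (reduce)."""
    partials = [summarize(name, members) for name, members in communities.items()]
    return "\n".join(partials)


if __name__ == "__main__":
    adjacency = build_adjacency(EDGES)
    print(local_search(adjacency, "Swale"))
    print(global_search(COMMUNITIES, lambda n, m: f"{n}: {', '.join(m)}"))
```

Local search touches a handful of nodes, while global search visits every community, which matches the latency gap between the two buttons.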

3. The "Socratic" Feature

Unlike the Companion AI, The Professor uses the graph to test you.

  • Feature: "Generate Quiz."
  • Logic: The AI traverses the Knowledge Graph, finds two connected nodes (e.g., "Nitrogen Fixation" and "Legumes"), and generates a question: "Explain the relationship between Legumes and Nitrogen Fixation based on Chapter 3."
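The quiz logic can be sketched as sampling edges from the graph and filling a question template. The edge format used here, `(entity_a, entity_b, source)`, is an assumption for illustration; a real version would read relationships from the GraphRAG index and could ask the LLM to phrase the question.

```python
import random


def generate_quiz(edges, n, seed=None):
    """Pick up to n connected node pairs and turn each into a question.

    edges: list of (entity_a, entity_b, source) tuples (hypothetical format).
    seed:  optional seed so a quiz can be reproduced.
    """
    rng = random.Random(seed)
    chosen = rng.sample(edges, k=min(n, len(edges)))
    return [
        f"Explain the relationship between {a} and {b} based on {source}."
        for a, b, source in chosen
    ]


if __name__ == "__main__":
    edges = [
        ("Legumes", "Nitrogen Fixation", "Chapter 3"),
        ("Swale", "Contour", "Chapter 5"),
    ]
    for question in generate_quiz(edges, n=2, seed=0):
        print(question)
```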


Summary Comparison Table

| Feature | Product A: The Companion | Product B: The Professor |
| --- | --- | --- |
| Primary Goal | Emotional Connection / Latency | Knowledge Mastery / Synthesis |
| LLM Model | Llama 3.1 70B (Personality) | Command R or Qwen2.5-72B (Logic) |
| RAG Type | Standard RAG (Instant) | GraphRAG (Local + Global) |
| GraphRAG Role | Offline Analysis (Nightly) | Primary Search Engine (Always) |
| Input | Audio (Whisper) | Text / PDF / Code |
| Output Style | Conversational, Short, Spoken | Structured, Long, Markdown |
| Latency Tolerance | Extremely Low (<1s) | High (30s+ acceptable) |

Development Roadmap (Where to Start)

  1. Week 1 (Infrastructure): Install Ollama, pull llama3.1:70b and nomic-embed-text. Verify your M1 Max memory usage.
  2. Week 2 (Build The Professor): It is easier to build. Set up GraphRAG, ingest 1 book, and test "Global Search" queries via command line.
  3. Week 3 (Build The Companion Logic): Write the Python script for the "Dual-Thread" (Reflex/Deep) logic using text only.
  4. Week 4 (Add Senses): Add Whisper (Ear) and Coqui (Mouth) to the Companion.
  5. Week 5 (The Bridge): Write the "Nightly Script" that uses GraphRAG to analyze the Companion's chat logs.