FalkorDB Updated PS and TS for The Professor

Here is the updated Product Statement and Technical Stack for your Tutorial AI, "The Professor," re-architected to leverage FalkorDB for dynamic curriculum management.

This version moves away from the static nature of Microsoft GraphRAG, replacing it with FalkorDB to create a "Living Syllabus" that restructures itself in real time as the student's performance changes.


1. Product Statement

For rigorous autodidacts and researchers who demand structural mastery over simple information retrieval, "The Professor" is a Local-First, Adaptive Pedagogical Engine that converts static libraries (PDFs/Textbooks) into a gamified, dependency-based knowledge graph. Unlike "Chat with PDF" tools that fetch isolated paragraphs with no sense of order, or static courses that force a linear path, "The Professor" utilizes FalkorDB to maintain a "Living Syllabus." It maps concepts as a dependency graph (Prerequisites -> Advanced Topics) and tracks the student's mastery of each node in real time. If the student fails a "Feynman Test," the graph automatically expands to insert remedial sub-nodes so that foundational gaps are closed before moving on. The goal is a university-level Socratic tutoring experience, fully offline and private on your M1 Max.


2. Core Pillars

  1. The Dependency Graph: We treat knowledge like a technology tree in a video game. You cannot unlock "Quantum Entanglement" until you have mastered "Wave Function." The application enforces this hierarchy through queries against FalkorDB, so the AI is never asked to teach an advanced topic before its prerequisites are established.
  2. The "Feynman" Feedback Loop: The system does not just explain; it demands you explain it back. The AI grades your explanation against the source text stored in the graph.
  3. Self-Healing Curriculum: Because FalkorDB is a read-write database rather than a static index, the graph can be updated in the middle of a session. If you struggle with a concept, the AI rewrites the graph structure, inserting new "Bridge Nodes" (remedial lessons) into your syllabus on the fly.
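
The Bridge Node idea from pillar 3 can be sketched in plain Python. This is a minimal in-memory stand-in for the graph, not FalkorDB code; the class name `Syllabus`, its methods, and the bridge concept "Superposition" are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class Syllabus:
    """In-memory stand-in for the dependency graph (illustration only)."""

    def __init__(self):
        # prereqs[c] is the set of concepts that must be mastered before c.
        self.prereqs = defaultdict(set)

    def add_prerequisite(self, before, after):
        """Record that `before` must be mastered before `after`."""
        self.prereqs[after].add(before)
        self.prereqs.setdefault(before, set())

    def insert_bridge(self, before, after, bridge):
        """Rewire before -> after into before -> bridge -> after."""
        if before not in self.prereqs[after]:
            raise ValueError(f"{before!r} is not a prerequisite of {after!r}")
        self.prereqs[after].discard(before)
        self.add_prerequisite(before, bridge)
        self.add_prerequisite(bridge, after)


syllabus = Syllabus()
syllabus.add_prerequisite("Wave Function", "Quantum Entanglement")
# Hypothetical remedial lesson inserted after a failed Feynman Test.
syllabus.insert_bridge("Wave Function", "Quantum Entanglement", "Superposition")
```

After the call, "Quantum Entanglement" depends on the bridge node only, so it stays locked until the remedial lesson is mastered.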

3. The Technical Stack (FalkorDB Edition)

Hardware Target: MacBook Pro M1 Max (64GB RAM)

Environment: Docker (FalkorDB), Python 3.11, Ollama, Streamlit.

A. The Database (The Living Syllabus)

  • Engine: FalkorDB (Docker Container).
  • Data Model (The Schema):
    • Static Nodes: SourceMaterial (The Book), Concept (The Topic), Chunk (Raw Text + Vector).
    • Dynamic Nodes: Student (User Profile), Session (Log).
    • Edges: (:Concept)-[:PREREQUISITE_TO]->(:Concept), (:Student)-[:MASTERED {score: 95}]->(:Concept).
  • Vector Indexing: Enabled on Chunk nodes for semantic retrieval.

B. The Intelligence Layer

  • 1. The Teacher (The Brain): Qwen2.5-72B-Instruct (4-bit).
    • Why: At the time of writing, it is among the strongest open-weight models for STEM, logic, and coding tasks. It generates the lecture scripts and grades the user's answers.
  • 2. The Architect (The Graph Builder): Qwen2.5-14B-Instruct.
    • Role: Used during ingestion. It reads the raw PDF text and structures it into Cypher queries to build the initial Dependency Graph.
  • 3. The Illustrator (The Blackboard): Qwen2.5-Coder-7B.
    • Role: Specialized in generating Mermaid.js diagrams and LaTeX formulas to visually explain the concepts.

C. The Interface Layer

  • UI: Streamlit (Split View).
    • Left: Chat/Voice Interface.
    • Right: The Blackboard (Renders Markdown/Mermaid) + The Map (PyVis visualization of the FalkorDB Syllabus graph).
  • I/O: Whisper (Input) + Coqui TTS (Output).
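
As a rough illustration of how the syllabus could be drawn with mastered nodes highlighted, the sketch below emits Mermaid flowchart text. This swaps in Mermaid for the PyVis map named above, purely so the example has no dependencies; the function name `syllabus_to_mermaid` and the gold hex colour are assumptions.

```python
def syllabus_to_mermaid(edges, mastered):
    """Render (prerequisite, dependent) pairs as a Mermaid flowchart.

    Mastered concepts get a gold fill. Concept names containing double
    quotes are not escaped in this sketch.
    """
    ids = {}

    def node_id(name):
        # Assign stable short ids (n0, n1, ...) in first-seen order.
        return ids.setdefault(name, f"n{len(ids)}")

    lines = ["graph TD"]
    for before, after in edges:
        a, b = node_id(before), node_id(after)
        lines.append(f'    {a}["{before}"] --> {b}["{after}"]')
    for name, nid in ids.items():
        if name in mastered:
            lines.append(f"    style {nid} fill:#ffd700")
    return "\n".join(lines)


diagram = syllabus_to_mermaid(
    [("Wave Function", "Quantum Entanglement")],
    mastered={"Wave Function"},
)
print(diagram)
```

The resulting string could be handed to whatever Mermaid renderer the Blackboard uses.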

4. The Mechanism (The Pedagogical Loop)

This workflow replaces the standard "Chat" loop with a "Teaching" loop.

Phase 1: Ingestion (The "Curriculum Builder")

  • One-time process when adding a book.
  • Input: PDF Textbook.
  • Qwen-14B: Scans Table of Contents and Chapter Summaries.
  • Action: Executes Cypher to build the tree:
    CREATE (c1:Concept {name: 'Newtonian Physics'})
    CREATE (c2:Concept {name: 'Lagrangian Mechanics'})
    CREATE (c1)-[:PREREQUISITE_TO]->(c2)
    
  • Embedder: Chunks the text, vectorizes it, and links chunks to the Concepts.
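
The step from the Architect's output to the graph can be sketched as follows. The JSON shape (`{"edges": [{"before": ..., "after": ...}]}`) is an assumption, since the source does not define it, and the helper names are hypothetical. The sketch uses MERGE rather than CREATE so that re-ingesting a book does not duplicate nodes, and it rejects cyclic hierarchies before anything is written.

```python
import json
from graphlib import CycleError, TopologicalSorter

# Parameterized Cypher; MERGE keeps the statement idempotent on re-runs.
MERGE_EDGE = (
    "MERGE (a:Concept {name: $before}) "
    "MERGE (b:Concept {name: $after}) "
    "MERGE (a)-[:PREREQUISITE_TO]->(b)"
)


def load_edges(raw_json):
    """Parse the (assumed) LLM output and reject cyclic hierarchies."""
    data = json.loads(raw_json)
    edges = [(e["before"], e["after"]) for e in data["edges"]]
    graph = {}
    for before, after in edges:
        graph.setdefault(after, set()).add(before)
    # prepare() raises CycleError if the prerequisites form a loop.
    TopologicalSorter(graph).prepare()
    return edges


def to_statements(edges):
    """Pair the Cypher text with one parameter dict per edge."""
    return [(MERGE_EDGE, {"before": b, "after": a}) for b, a in edges]


raw = '{"edges": [{"before": "Newtonian Physics", "after": "Lagrangian Mechanics"}]}'
statements = to_statements(load_edges(raw))
# Each (query, params) pair would then be sent to FalkorDB, for example
# through the Python client's graph.query(query, params).
```

Validating before writing matters here because an LLM can emit a circular dependency, which would leave every concept in the loop permanently locked.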

Phase 2: The Lesson (The Interaction)

  • State Check: Python script queries FalkorDB: "Find the first Concept where (:Student)-[:MASTERED]->(:Concept) does NOT exist, but all PREREQUISITES are mastered."
  • Retrieval: Fetches the text chunks + vectors associated with that specific Concept.
  • Generation (Teacher & Illustrator):
    • Teacher: Explains the concept via TTS.
    • Illustrator: Generates a Mermaid Flowchart shown on the Blackboard.
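
The State Check rule is easy to get subtly wrong, so here is a minimal pure-Python sketch of the "unlock" logic. In the real system this would be a Cypher query against FalkorDB; the function name `next_concept` and the `order` argument (the sequence in which candidate concepts are considered) are assumptions.

```python
def next_concept(prereqs, mastered, order):
    """Return the first concept in `order` that is not yet mastered
    but whose prerequisites are all mastered, or None if none qualifies.

    prereqs:  dict mapping concept -> set of prerequisite concepts
    mastered: set of concepts the student has already mastered
    order:    iterable of concept names, in syllabus order
    """
    for concept in order:
        if concept in mastered:
            continue
        # A concept with no prerequisites is always unlockable.
        if prereqs.get(concept, set()) <= mastered:
            return concept
    return None


prereqs = {"Math 101": set(), "Math 102": {"Math 101"}}
order = ["Math 101", "Math 102"]
print(next_concept(prereqs, set(), order))         # Math 101
print(next_concept(prereqs, {"Math 101"}, order))  # Math 102
```

A `None` result means either the syllabus is complete or every remaining concept is blocked, which the caller should distinguish.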

Phase 3: The Feynman Test (The Assessment)

  • AI: "Now, explain the relationship between [Concept A] and [Concept B] in your own words."
  • User: "Well, A causes B because..."
  • Grading: Qwen-72B compares the transcribed user answer against the Source Chunks.
    • Pass: MATCH the existing Student and Concept nodes, then run MERGE (s)-[:MASTERED]->(c) between them. The UI Map turns that node Gold.
    • Fail: The AI queries the graph for the sub-components of the concept and starts a Remedial Session.
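
One possible shape for the grading step is sketched below. Everything specific here is an assumption: the prompt wording, the `SCORE: <0-100>` reply convention, and the pass threshold of 80 are not defined in the source and would need tuning against real answers.

```python
import re

PASS_THRESHOLD = 80  # hypothetical cut-off; the source does not specify one


def build_grader_prompt(concept, source_chunks, answer):
    """Assemble the grader prompt from the source text and the transcript."""
    sources = "\n---\n".join(source_chunks)
    return (
        f"You are grading a student's explanation of '{concept}'.\n"
        f"Source text:\n{sources}\n\n"
        f"Student explanation:\n{answer}\n\n"
        "Reply with a line of the form 'SCORE: <0-100>' followed by feedback."
    )


def parse_score(reply):
    """Extract the numeric score, or None if the reply is malformed."""
    match = re.search(r"SCORE:\s*(\d{1,3})", reply)
    if match is None:
        return None
    return min(int(match.group(1)), 100)


def passed(reply, threshold=PASS_THRESHOLD):
    """A malformed reply counts as a fail rather than a pass."""
    score = parse_score(reply)
    return score is not None and score >= threshold
```

Treating an unparseable reply as a fail is the conservative choice: it avoids marking a concept as mastered because the model drifted from the expected format.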

5. Development Roadmap

  1. Week 1: The Graph Structure

    • Design the Cypher Schema for a "Syllabus."
    • Manually insert a few nodes (e.g., "Math 101" -> "Math 102") into FalkorDB to test the "Unlock" logic.
  2. Week 2: The Ingestion Pipeline

    • Use PyMuPDF to read a PDF.
    • Use Qwen-14B to output JSON representing the hierarchy.
    • Write the Python script to push this hierarchy into FalkorDB.
  3. Week 3: The Blackboard (Streamlit)

    • Build the Streamlit UI.
    • Get streamlit-mermaid working so you can render diagrams generated by the LLM.
  4. Week 4: The Loop

    • Connect Qwen-72B.
    • Write the prompt for the "Grader" (comparing User Input vs Database Truth).
    • Connect the "Pass" signal to the database update query.