Active Listener Agent

This is the most powerful application of Generative AI in this project. You are essentially building an "Active Listener" agent.

Instead of just recording what you say, the AI reviews a batch of your stories (e.g., after you finish talking about "Childhood") and says: "Wait, you mentioned X and Y, but they contradict each other. Can you explain that?"

Here is the implementation plan for a "Critical Review Mode" using Gemini 1.5 Pro’s large context window.

1. The Workflow: "The Sunday Review"

We don't want to bombard the user with questions while they are speaking. We want a dedicated Review Phase.

Selection: The user (or the app) selects a batch of memories (e.g., "All recordings from last week" or "All memories tagged 'High School'").
Ingestion: The Cloud Function pulls the full transcripts of these memories.
Analysis: Gemini 1.5 Pro reads all of them at once (leveraging its massive context window) to build a mental model of the narrative.
Generation: It outputs 3 "Critical Questions" based on synthesis, not just keywords.
Response: The user taps a question and records a new "Answer" memory.
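The Selection step could be sketched roughly as follows. This is a minimal illustration, not the app's actual data model: the `MemoryRecord` shape, the `tags` and `recordedAt` fields, and the `selectReviewBatch` helper are all assumptions introduced here. Only `transcript` appears in the Cloud Function in the next section.

```typescript
// Hypothetical shape of a stored memory. Only `transcript` is taken from the
// Cloud Function in this plan; the other fields are assumptions.
interface MemoryRecord {
  id: string;
  transcript: string;
  tags: string[];
  recordedAt: Date;
}

// Optional filters; a memory must satisfy every filter that is set.
interface BatchFilter {
  tag?: string; // e.g. "High School"
  since?: Date; // e.g. one week ago
}

// Return the IDs of the memories to send to the review function.
function selectReviewBatch(memories: MemoryRecord[], filter: BatchFilter): string[] {
  return memories
    .filter((m) => filter.tag === undefined || m.tags.includes(filter.tag))
    .filter(
      (m) => filter.since === undefined || m.recordedAt.getTime() >= filter.since.getTime()
    )
    .map((m) => m.id);
}
```

The resulting array is what the client would pass as `memoryIds` when it calls the review function.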

2. The Cloud Function (The Brain)

This function takes a list of memoryIds, retrieves their text, and asks Gemini to analyze the collection as a whole.

File: functions/src/index.ts

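import { onCall, HttpsError } from "firebase-functions/v2/https";
import * as admin from "firebase-admin";
import { VertexAI } from "@google-cloud/vertexai";

// The Admin SDK must be initialized before Firestore is used.
admin.initializeApp();
const db = admin.firestore();
const vertexAI = new VertexAI({ project: process.env.GCLOUD_PROJECT, location: "us-central1" });
// Use Gemini 1.5 Pro because we need high-level reasoning, not just speed.
const model = vertexAI.getGenerativeModel({ model: "gemini-1.5-pro-preview-0409" });

export const generateCriticalReview = onCall(async (request) => {
    // Reject unauthenticated calls; request.auth is undefined when no user is signed in.
    if (!request.auth) {
        throw new HttpsError("unauthenticated", "Sign in to request a review.");
    }
    const uid = request.auth.uid;
    const { memoryIds } = request.data;
    if (!Array.isArray(memoryIds) || memoryIds.length === 0) {
        throw new HttpsError("invalid-argument", "memoryIds must be a non-empty array.");
    }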

    // 1. Fetch the full transcripts for the selected memories
    let fullNarrative = "";
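    for (const id of memoryIds) {
        const doc = await db.doc(`users/${uid}/memories/${id}`).get();
        const data = doc.data();
        // Skip IDs that don't resolve to a stored memory or have no transcript yet
        if (!data?.transcript) continue;
        fullNarrative += `\n[Date: ${data.estimatedDate ?? "unknown"}] ${data.transcript}\n`;
    }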

    // 2. The "Investigative Journalist" Prompt
    const prompt = `
        You are a professional biographer and psychologist reviewing a draft of an autobiography.
        Here is the raw transcript text from the user's recent sessions:

        <TRANSCRIPT_START>
        ${fullNarrative}
        <TRANSCRIPT_END>

        Your Goal: Identify gaps, contradictions, or emotional avoidance.

        Task: Generate 3 "Critical Questions" to ask the user.

        Criteria for Questions:
        1. **Connect the Dots:** Link two events that seem unrelated. (e.g., "You mentioned being lonely in 1990, but also said you had a busy social life in college. Did you feel lonely *in the crowd*?")
        2. **Probe Avoidance:** Identify topics the user glossed over but didn't explore. (e.g., "You talked about your father's job, but never about how he felt about it. Was he happy?")
        3. **Challenge Logic:** Point out inconsistencies gently.

        Return only a JSON array with exactly 3 items in this format:
        [
            { "question": "...", "reasoning": "I asked this because..." },
            { "question": "...", "reasoning": "..." },
            { "question": "...", "reasoning": "..." }
        ]
    `;

    // 3. Generate
    const result = await model.generateContent(prompt);
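    // Candidates or parts can be missing (e.g., a blocked response), so read them defensively
    const responseText = result.response.candidates?.[0]?.content?.parts?.[0]?.text ?? "";

    // Strip any Markdown code fences the model adds, then parse the JSON array.
    // JSON.parse throws on malformed output, which surfaces to the client as an internal error.
    const cleanJson = responseText.replace(/```json|```/g, "").trim();
    const questions = JSON.parse(cleanJson);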

    // 4. Save to Firestore "Inbox"
    const batch = db.batch();
    questions.forEach((q: any) => {
        const ref = db.collection(`users/${uid}/pending_questions`).doc();
        batch.set(ref, {
            text: q.question,
            reasoning: q.reasoning,
            createdAt: admin.firestore.FieldValue.serverTimestamp(),
            status: "unanswered",
            sourceMemoryIds: memoryIds // Link back to original memories
        });
    });
    await batch.commit();

    return { success: true };
});
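Because the function above trusts whatever JSON the model returns, a stricter parsing step may be worth adding before the Firestore write. The sketch below is one possible approach; `parseCriticalQuestions` and the `CriticalQuestion` type are hypothetical names introduced here, and the rule of dropping malformed entries (rather than failing the whole request) is an assumption.

```typescript
// Shape the prompt asks the model to return for each question.
interface CriticalQuestion {
  question: string;
  reasoning: string;
}

// Parse the model's text reply into validated questions.
// Throws if the reply is not a JSON array; silently drops entries that lack
// a string `question` or `reasoning`; returns at most `max` questions.
function parseCriticalQuestions(raw: string, max = 3): CriticalQuestion[] {
  // Remove Markdown code-fence markers (three backticks, optionally followed by "json").
  const cleaned = raw.replace(/`{3}(?:json)?/gi, "").trim();
  const parsed: unknown = JSON.parse(cleaned);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array of questions");
  }

  const valid: CriticalQuestion[] = [];
  for (const item of parsed) {
    if (typeof item !== "object" || item === null) continue;
    const candidate = item as Record<string, unknown>;
    if (typeof candidate.question === "string" && typeof candidate.reasoning === "string") {
      valid.push({
        question: candidate.question.trim(),
        reasoning: candidate.reasoning.trim(),
      });
    }
  }
  return valid.slice(0, max);
}
```

In the Cloud Function, this could stand in for the direct `JSON.parse` call, so that only well-formed questions reach the `pending_questions` collection.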

3. The Prompt Strategy: "The 3 Levels of Criticism"

To make the questions "smart," the prompt above uses specific techniques. You can refine the prompt to instruct Gemini to look for these specific patterns:

The "Surface vs. Depth" Check:
- Instruction: "Look for moments where the user describes what happened but not how it felt. Ask them to revisit that moment focusing purely on emotion."
The "Character Arc" Check:
- Instruction: "Compare the user's behavior in early memories vs. later memories. Ask: 'You reacted with anger in story A, but with patience in story B. What changed in you between those years?'"
The "Missing Person" Check:
- Instruction: "Identify people who are mentioned frequently by name but have no described personality. Ask: 'You mention Uncle Joe often, but I don't know what he was like. Describe his voice or mannerisms.'"
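One way to make these checks switchable is to keep each instruction as a named string and assemble only the ones wanted for a given review. This is a sketch under assumptions: the keys (`surfaceVsDepth`, `characterArc`, `missingPerson`) and the `buildCheckInstructions` helper are illustrative names, not part of the original function.

```typescript
// The three checks from this section, keyed by hypothetical identifiers.
const REVIEW_CHECKS = {
  surfaceVsDepth:
    "Look for moments where the user describes what happened but not how it felt. " +
    "Ask them to revisit that moment focusing purely on emotion.",
  characterArc:
    "Compare the user's behavior in early memories vs. later memories. " +
    "Ask: 'You reacted with anger in story A, but with patience in story B. " +
    "What changed in you between those years?'",
  missingPerson:
    "Identify people who are mentioned frequently by name but have no described personality. " +
    "Ask: 'You mention Uncle Joe often, but I don't know what he was like. " +
    "Describe his voice or mannerisms.'",
} as const;

type ReviewCheck = keyof typeof REVIEW_CHECKS;

// Build a numbered instruction list that can be appended to the main prompt.
function buildCheckInstructions(checks: ReviewCheck[]): string {
  return checks.map((name, i) => `${i + 1}. ${REVIEW_CHECKS[name]}`).join("\n");
}
```

The returned string could be interpolated into the prompt template under the "Criteria for Questions" heading, letting the app vary which checks run per batch.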

4. The UI Implementation (iPhone)

How does the user see this?

A. The "Review" Button

In your memory list (timeline), allow the user to long-press and select multiple items.
- Action: "Analyze These Memories."
- UI: Show a loading spinner (a deep analysis with Gemini 1.5 Pro can take roughly 5-10 seconds).

B. The "Biographer's Inbox"

Create a new screen called "Pending Questions."
- Card UI:
  - Main Text: "You talked about your father's job, but never about how he felt about it. Was he happy?"
  - Subtext (The "Smart" part): "AI Reasoning: You mentioned his job 4 times in the last session but never described his mood."
  - Button: "Answer Now."

C. Answering

When the user taps "Answer Now," the app opens the standard Audio Recorder.
- Crucial Data Link: When saving this new recording, add a field responseToQuestionId: "xyz".
- Why? This lets the AI later recognize that this specific story was an answer to that specific prompt, creating a thread.
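The data link described above might look like the following sketch. The `responseToQuestionId` field and the "unanswered" status come from this plan; the "answered" status, the type names, and the `recordAnswer` helper are assumptions added for illustration. The function is pure, so the caller would still need to persist both returned objects.

```typescript
// A question from the "Biographer's Inbox". The "answered" value is an assumption;
// the Cloud Function in this plan only ever writes "unanswered".
interface PendingQuestion {
  id: string;
  text: string;
  status: "unanswered" | "answered";
}

// A new memory recorded in response to a question.
interface AnswerMemory {
  transcript: string;
  responseToQuestionId: string;
}

// Build the new memory and the updated question without mutating the input.
function recordAnswer(
  question: PendingQuestion,
  transcript: string
): { memory: AnswerMemory; question: PendingQuestion } {
  return {
    memory: { transcript, responseToQuestionId: question.id },
    question: { ...question, status: "answered" },
  };
}
```

Keeping the question ID on the answer, rather than the reverse, means a question can collect several follow-up recordings without changing its own document shape.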

5. Why Gemini 1.5 Pro is Critical Here

Standard "Chat with PDF" apps use RAG (retrieval-augmented generation), which chops text into small chunks and retrieves only the ones that look relevant. That is a poor fit for biography.
- RAG Limitation: If you chop a story into chunks, the AI loses the narrative arc. It might see "I failed math" in chunk A and "I became an engineer" in chunk B, but miss the connection.
- Gemini 1.5 Pro: You can feed 1 hour of transcript (approx. 10,000 words) into the context window whole. The AI sees the entire narrative arc at once, which allows it to make "Smart" connections that RAG would miss.
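Before sending a whole batch, it may help to check that it plausibly fits the model's context window. The sketch below uses a rough heuristic of about 1.3 tokens per English word, which is an assumption and varies by tokenizer; the context limit is passed in by the caller rather than hard-coded, and `estimateTokens` and `fitsInContext` are hypothetical helper names.

```typescript
// Assumed heuristic: roughly 1.3 tokens per English word. Real counts depend on the tokenizer.
const TOKENS_PER_WORD = 1.3;

// Estimate the token count of a text from its whitespace-separated word count.
function estimateTokens(text: string): number {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0).length;
  return Math.ceil(words * TOKENS_PER_WORD);
}

// Check whether all transcripts, plus room reserved for the instructions,
// stay within the caller-supplied context limit.
function fitsInContext(
  transcripts: string[],
  contextLimit: number,
  reservedForPrompt = 2000
): boolean {
  const total = transcripts.reduce((sum, t) => sum + estimateTokens(t), 0);
  return total + reservedForPrompt <= contextLimit;
}
```

Under this heuristic, a 10,000-word transcript comes to something on the order of 13,000 tokens, so a batch of several sessions would still be a small fraction of a long-context model's window.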

Cost Note

Using Gemini 1.5 Pro for this "Batch Review" is more expensive than using the "Flash" model.
- Strategy: Don't run it automatically after every recording.
- Strategy: Make it a specific user action: "Generate Interview Questions for this Chapter." This makes the cost deliberate and valuable to the user.
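The "deliberate action" rule could be enforced with a small guard like the one sketched here. Requiring an explicit user trigger follows the strategy above; the optional cooldown between reviews, the `ReviewRequest` shape, and the `shouldRunDeepReview` name are assumptions added for illustration.

```typescript
// Hypothetical description of a review request, used only by this guard.
interface ReviewRequest {
  triggeredByUser: boolean; // true only when the user tapped the review action
  lastReviewAt: Date | null; // null if no review has run yet
}

// Allow the expensive review only for explicit user actions, and (as an
// assumed extra safeguard) only after a minimum interval since the last one.
function shouldRunDeepReview(
  req: ReviewRequest,
  now: Date,
  minIntervalHours = 24
): boolean {
  if (!req.triggeredByUser) return false;
  if (req.lastReviewAt === null) return true;
  const elapsedHours = (now.getTime() - req.lastReviewAt.getTime()) / 3600000;
  return elapsedHours >= minIntervalHours;
}
```

Setting `minIntervalHours` to 0 disables the cooldown and leaves only the user-trigger requirement.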