
This is the revised Project Task Book optimized for the MacBook + iPhone ecosystem.

Using a Mac simplifies the hardware engineering significantly, allowing you to focus purely on the AI and app logic. This plan assumes you are using macOS Sequoia (or later) and Xcode 26 or later (Apple's year-based version line, current in 2026).


🍎 Project Task Book: The AI Autobiography Agent (iOS Native)

Project Goal: Create a voice-first, AI-driven iOS app that interviews the user, structures their life story, and autonomously writes a biography.

Tech Stack: Flutter (iOS), Firebase (Functions, Firestore, Storage), Vertex AI (Gemini hybrid strategy: Flash for fast, cheap extraction; Pro for deep analysis and writing).


🟢 Phase 0: The Apple Ecosystem Setup (Days 1-2)

Objective: Configure the Mac for professional iOS development and connect the Cloud backend.

0.1 Development Environment

  • Install Xcode: Download from App Store. Open it to install command-line tools.
  • Install Homebrew: Run the standard curl script in Terminal.
  • Install Flutter & Tools:
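    brew install --cask flutter      # Flutter SDK (bundles Dart)
    brew install cocoapods           # CocoaPods installs the iOS side of Flutter plugins
    sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer
    sudo xcodebuild -runFirstLaunch  # finishes Xcode's first-launch setup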
    
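  • Verify Setup: Run flutter doctor and fix anything it reports. Then open ios/Runner.xcworkspace in Xcode and select your Team under Signing & Capabilities so the app can be signed for your device.
  • Physical Device Setup: Connect the iPhone to the Mac by cable once and tap "Trust This Computer". Then enable "Developer Mode" on the iPhone (Settings > Privacy & Security > Developer Mode) and restart when prompted.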

0.2 Cloud Architecture (Firebase Console)

  • Create Project: Initialize a new Firebase project (Blaze Plan required).
  • Enable Google Cloud APIs:
    • Vertex AI API
    • Cloud Storage API
  • Database Setup: Initialize Firestore.
    • Composite Index: Collection memories -> Fields: userId (Asc) + estimatedDate (Asc).
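    • Vector Index: Collection memories -> Fields: userId (Asc) + embedding (Vector, 768 dims). Including userId lets nearest-neighbor queries be filtered to the current user's memories.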

🟡 Phase 1: The iOS Capture Engine (MVP) (Days 3-6)

Objective: Record high-fidelity audio on iPhone, upload to Cloud, and process with Gemini.

1.1 iOS Permissions & Configuration

  • Info.plist Configuration: Add keys for NSMicrophoneUsageDescription and UIBackgroundModes (audio).
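  • Runner Setup: In Xcode, open ios/Runner.xcworkspace, select the Runner target, then Signing & Capabilities -> Add Capability -> "Background Modes" -> Check "Audio, AirPlay, and Picture in Picture". This writes the same UIBackgroundModes (audio) entry as above, so one of the two steps is enough for that key.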

1.2 Flutter Recorder Logic

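  • Dependencies: flutter_sound and permission_handler. On iOS, permission_handler also needs the microphone permission enabled in the Podfile (PERMISSION_MICROPHONE=1).
  • Audio Session: Use the playAndRecord (or record) category. Set mode measurement to minimize iOS signal processing (no automatic gain control) if you want raw input, or voiceChat to let iOS apply its voice processing (noise suppression, echo cancellation).
  • Codec: Hardcode Codec.aacMP4 (AAC in an MP4/.m4a container, which iOS encodes natively).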
  • Upload Logic: Use firebase_storage with putFile and retry logic for resiliency.
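Retry logic for the upload can follow the usual exponential-backoff-with-jitter pattern. The sketch is in TypeScript, matching the Node.js runtime of the Cloud Functions; the same loop ports directly to Dart around putFile. withRetry, RetryOptions, and the delay values are hypothetical, not part of firebase_storage.

```typescript
// Hypothetical helper: retries an async operation with exponential backoff and full jitter.
interface RetryOptions {
  maxAttempts: number; // total tries, including the first one
  baseDelayMs: number; // delay cap for the first retry
  maxDelayMs: number;  // upper bound for any single delay
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  operation: (attempt: number) => Promise<T>,
  { maxAttempts, baseDelayMs, maxDelayMs }: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts) break; // out of attempts: rethrow below
      // Exponential cap: base * 2^(attempt - 1), bounded by maxDelayMs.
      const cap = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
      // Full jitter: wait a random time in [0, cap) so retries from many devices spread out.
      await sleep(Math.random() * cap);
    }
  }
  throw lastError;
}
```

For example, wrap the upload as withRetry(() => uploadRecording(localPath), { maxAttempts: 5, baseDelayMs: 500, maxDelayMs: 8000 }), where uploadRecording is your own (hypothetical) wrapper around putFile. Full jitter keeps a flaky connection from producing bursts of simultaneous retries.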

1.3 The "Fast" AI Layer (Gemini 2.0 Flash)

  • Cloud Function: onObjectFinalized trigger.
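  • Model: gemini-2.0-flash (low latency, low cost). Use a stable model ID; experimental "-exp" IDs are short-lived and not meant for production.
  • Prompt: "Transcribe verbatim. Extract: Summary, Sentiment, People, Date, Location. Return JSON." (Phase 2.2 embeds the summary.)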
  • Test: Record on iPhone -> Check Firestore for JSON.
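Even with a "Return JSON" instruction, the Cloud Function should validate what comes back before writing it to Firestore. A minimal sketch, assuming the field names below (illustrative, not a fixed Gemini format) and a prompt that also requests a one-paragraph summary for Phase 2.2. Setting the request's responseMimeType to application/json makes fenced output less likely, but the parser tolerates a Markdown fence anyway. parseExtraction and MemoryExtraction are hypothetical names.

```typescript
// Illustrative shape of the JSON the Phase 1.3 prompt asks for.
interface MemoryExtraction {
  transcript: string;
  summary: string;         // assumed extra field; Phase 2.2 embeds it
  sentiment: string;
  people: string[];
  date: string | null;     // as spoken ("summer of '98") or ISO; normalize later
  location: string | null;
}

// Three backticks, built at runtime: some models wrap JSON in a Markdown code fence.
const FENCE = "`".repeat(3);

function stripCodeFence(text: string): string {
  let t = text.trim();
  if (t.startsWith(FENCE)) {
    t = t.slice(FENCE.length);
    if (t.startsWith("json")) t = t.slice("json".length);
    if (t.endsWith(FENCE)) t = t.slice(0, -FENCE.length);
  }
  return t.trim();
}

// Non-empty trimmed string, or null for missing, blank, or non-string values.
function optionalString(value: unknown): string | null {
  return typeof value === "string" && value.trim() !== "" ? value.trim() : null;
}

function parseExtraction(raw: string): MemoryExtraction {
  // JSON.parse throws on malformed output; the caller decides whether to log or retry.
  const data = JSON.parse(stripCodeFence(raw)) as Record<string, unknown>;
  const transcript = optionalString(data.transcript);
  if (transcript === null) throw new Error("Gemini response has no transcript");
  const people = Array.isArray(data.people) ? data.people : [];
  return {
    transcript,
    summary: optionalString(data.summary) ?? "",
    sentiment: optionalString(data.sentiment) ?? "unknown",
    people: people.filter((p): p is string => typeof p === "string"),
    date: optionalString(data.date),
    location: optionalString(data.location),
  };
}
```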

🟠 Phase 2: The Visual Timeline & Search (Days 7-12)

Objective: A scrollable "Life Map" that identifies gaps in your history.

2.1 The Timeline UI

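  • Data Fetching: Query memories where userId equals the current user, ordered by estimatedDate (the composite index from 0.2 serves this query).
  • Visuals: Use a CustomPainter or the timelines_plus package. Draw a line connecting nodes.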
  • The "Fog of War":
    • Logic: if (dateB - dateA) > 2 years -> Render a blurry "Missing Era" block.
    • Action: Tapping the block triggers the "Gap Interviewer" (Phase 3).
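The "Fog of War" rule can be sketched as a scan over adjacent memories in date order. Assumptions: each memory has a parsed estimatedDate (the field from the Phase 0.2 index), "2 years" means calendar years in UTC, and TimelineMemory, MissingEra, and findMissingEras are hypothetical names.

```typescript
interface TimelineMemory {
  id: string;
  estimatedDate: Date;
}

interface MissingEra {
  afterId: string;  // last memory before the gap
  beforeId: string; // first memory after the gap
  from: Date;
  to: Date;
}

// Calendar-aware "date + N years" in UTC, so leap years need no special casing.
function addYears(date: Date, years: number): Date {
  const shifted = new Date(date.getTime());
  shifted.setUTCFullYear(shifted.getUTCFullYear() + years);
  return shifted;
}

function findMissingEras(memories: TimelineMemory[], gapYears = 2): MissingEra[] {
  // The Firestore query already orders by estimatedDate; sort defensively anyway.
  const sorted = [...memories].sort(
    (a, b) => a.estimatedDate.getTime() - b.estimatedDate.getTime(),
  );
  const eras: MissingEra[] = [];
  for (let i = 1; i < sorted.length; i++) {
    const prev = sorted[i - 1];
    const next = sorted[i];
    // Strictly more than gapYears between neighbours => render a "Missing Era" block.
    if (next.estimatedDate.getTime() > addYears(prev.estimatedDate, gapYears).getTime()) {
      eras.push({ afterId: prev.id, beforeId: next.id, from: prev.estimatedDate, to: next.estimatedDate });
    }
  }
  return eras;
}
```

Each returned MissingEra carries the dates on either side of the gap, which is what the Gap Interviewer needs to ask about that period.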

2.2 Vector Memory (The Brain)

  • Embedding Trigger: onDocumentCreated Firestore trigger on memories/{memoryId}.
  • Model: Vertex AI text-embedding-004.
  • Action: Convert the summary (not the whole transcript) into a vector. Store in embedding field.
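The embedding call can be made from the Cloud Function over Vertex AI's REST :predict endpoint. A minimal sketch of the request and response handling, assuming the text-embedding-004 REST shapes below (check them against the current Vertex AI reference) and an authenticated POST performed elsewhere; the helper names are hypothetical. Summaries are embedded with task type RETRIEVAL_DOCUMENT, while search text such as a Shake-to-Ask theme would use RETRIEVAL_QUERY.

```typescript
// Assumed request/response shapes for the text-embedding-004 :predict call;
// only the fields used here are modeled.
type EmbeddingTask = "RETRIEVAL_DOCUMENT" | "RETRIEVAL_QUERY";

interface PredictRequest {
  instances: { content: string; task_type: EmbeddingTask }[];
}

interface PredictResponse {
  predictions: { embeddings: { values: number[] } }[];
}

const EMBEDDING_DIMS = 768; // must match the vector index dimension from Phase 0.2

function embeddingEndpoint(project: string, location: string): string {
  return (
    `https://${location}-aiplatform.googleapis.com/v1/projects/${project}` +
    `/locations/${location}/publishers/google/models/text-embedding-004:predict`
  );
}

// Memory summaries are stored as documents; search themes are embedded as queries.
function buildEmbeddingRequest(text: string, task: EmbeddingTask): PredictRequest {
  return { instances: [{ content: text.trim(), task_type: task }] };
}

function extractEmbedding(response: PredictResponse): number[] {
  const values = response.predictions[0]?.embeddings.values;
  if (!values || values.length !== EMBEDDING_DIMS) {
    throw new Error(`Expected ${EMBEDDING_DIMS} dimensions, got ${values?.length ?? 0}`);
  }
  return values;
}
```

Write the returned array to the embedding field (the Node Admin SDK wraps such arrays with FieldValue.vector()). The dimension check catches a model change that no longer matches the 768-dim index.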

🔴 Phase 3: The "Active Listener" Agent (Days 13-19)

Objective: The App wakes up and asks YOU questions.

3.1 The "Shake to Ask" (Quick Question)

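  • Sensor Integration: Use sensors_plus. It provides raw accelerometer streams rather than a shake event, so detect the shake yourself by thresholding acceleration peaks.
  • Logic:
    1. Pick a random theme (e.g., "First Love").
    2. Vector Search: embed the theme and search the user's memories to answer "Has the user already talked about First Love?"
    3. Gemini 2.0 Flash generates a question based on the missing info.
  • TTS: Use an AI voice API (Google Cloud Text-to-Speech or ElevenLabs) to speak the question.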
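Two pieces of this flow can be sketched without any SDK: recognizing the shake and deciding whether a theme is already covered. Assumptions: samples come from a gravity-free accelerometer stream in m/s² (sensors_plus exposes user-accelerometer events for this); the threshold, peak count, window, and similarity cut-off are hypothetical tuning values; and coverage is judged by cosine similarity against stored memory embeddings. In production the search runs in Firestore's nearest-neighbor query with cosine distance; the in-memory loop here uses the same measure and suits tests or small collections.

```typescript
// --- Part 1: shake detection over gravity-free acceleration samples (m/s^2) ---
interface ShakeOptions {
  thresholdMs2: number; // magnitude that counts as a peak (hypothetical: 12 m/s^2 is about 1.2 g)
  minPeaks: number;     // peaks required inside the window
  windowMs: number;     // sliding window length
  cooldownMs: number;   // ignore new shakes for this long after one fires
}

const DEFAULT_SHAKE: ShakeOptions = { thresholdMs2: 12, minPeaks: 3, windowMs: 800, cooldownMs: 2000 };

class ShakeDetector {
  private peakTimes: number[] = [];
  private lastShakeMs = Number.NEGATIVE_INFINITY;
  private readonly opts: ShakeOptions;

  constructor(opts: ShakeOptions = DEFAULT_SHAKE) {
    this.opts = opts;
  }

  /** Feed one sample; returns true once per recognized shake. */
  onSample(x: number, y: number, z: number, timestampMs: number): boolean {
    const { thresholdMs2, minPeaks, windowMs, cooldownMs } = this.opts;
    if (Math.hypot(x, y, z) < thresholdMs2) return false;
    // Keep only the peaks still inside the sliding window, then add this one.
    this.peakTimes = this.peakTimes.filter((t) => timestampMs - t <= windowMs);
    this.peakTimes.push(timestampMs);
    if (this.peakTimes.length >= minPeaks && timestampMs - this.lastShakeMs >= cooldownMs) {
      this.lastShakeMs = timestampMs;
      this.peakTimes = [];
      return true;
    }
    return false;
  }
}

// --- Part 2: "Has the user already talked about this theme?" ---
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Hypothetical cut-off: any memory at or above it counts as covering the theme.
function themeIsCovered(themeVector: number[], memoryVectors: number[][], minSimilarity = 0.75): boolean {
  return memoryVectors.some((v) => cosineSimilarity(themeVector, v) >= minSimilarity);
}
```

If themeIsCovered returns false, step 3 asks Gemini for an opening question about the theme; if true, it can ask about a detail the matching memories leave out. Tune the cut-off on real data, since similarity scores depend on the embedding model.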

3.2 The "Deep Review" (Critical Analysis)

  • UI: "Review my Childhood" button.
  • Logic: Aggregate all transcripts from that era.
  • Model: Gemini 3 Pro (The "Reasoning" Model).
  • Prompt: "Analyze these 20 pages of transcript. Find the psychological contradictions. Where is the user lying to themselves? Generate 3 hard questions."
  • Output: Save to a "Pending Questions" inbox.
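Aggregating an era can be sketched as a pure function: filter the user's memories to the era's date range, sort them chronologically, label each transcript so Gemini can point to the memory a contradiction comes from, and stop before a token budget. TranscriptRecord and buildEraContext are hypothetical names, and the 4-characters-per-token estimate is only a rough heuristic; Vertex AI's countTokens method gives exact counts.

```typescript
interface TranscriptRecord {
  id: string;
  estimatedDate: Date;
  transcript: string;
}

// Rough heuristic (about 4 characters per token for English); not Gemini's tokenizer.
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function buildEraContext(
  records: TranscriptRecord[],
  eraStart: Date, // inclusive
  eraEnd: Date,   // exclusive
  maxTokens: number,
): { context: string; includedIds: string[] } {
  const start = eraStart.getTime();
  const end = eraEnd.getTime();
  const inEra = records
    .filter((r) => r.estimatedDate.getTime() >= start && r.estimatedDate.getTime() < end)
    .sort((a, b) => a.estimatedDate.getTime() - b.estimatedDate.getTime());

  const blocks: string[] = [];
  const includedIds: string[] = [];
  let usedTokens = 0;
  for (const r of inEra) {
    // Label each transcript so Gemini can say which memory a contradiction comes from.
    const day = r.estimatedDate.toISOString().slice(0, 10);
    const block = `[Memory ${r.id} | ${day}]\n${r.transcript.trim()}`;
    const cost = approxTokens(block);
    // Stop at the budget rather than cut a story in half; later memories are dropped.
    if (usedTokens + cost > maxTokens) break;
    blocks.push(block);
    includedIds.push(r.id);
    usedTokens += cost;
  }
  return { context: blocks.join("\n\n"), includedIds };
}
```

The returned context goes ahead of the Deep Review prompt, and includedIds records which memories the three questions can refer back to.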

🟣 Phase 4: The Book Writer (Days 20-25)

Objective: Transform oral history into written prose.

4.1 Narrative Synthesis

  • UI: "Generate Chapter 1".
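  • Context Caching: (Crucial Step) Store the user's "Character Sheet" and "Glossary" in a Vertex AI context cache to save money, together with the memory transcripts that every chapter call reuses. Explicit caches have a minimum size (a model-dependent token count), so a short Character Sheet and Glossary alone may be too small to cache.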
  • Model: Gemini 3 Pro.
  • Prompt: "Write a 3,000-word chapter based on these specific memories. Use a [User Selected Style] tone. Use the first person."

4.2 PDF Export

  • Layout: Use the pdf package to add a Title Page and Table of Contents.
  • Review: Allow user to edit the text in-app before finalizing (Simple Text Editor).

🏁 Phase 5: Polish & Deployment (Days 26-30)

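  • On-Device Testing: Install a release build (flutter run --release; iOS will not launch Flutter debug builds without the debugger attached) and run the app unplugged. Test walking, background recording, and interruption handling (phone calls).
  • Biometrics: Add Face ID (local_auth) to lock the diary. iOS also requires an NSFaceIDUsageDescription entry in Info.plist.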
  • TestFlight: Archive the build in Xcode and upload to App Store Connect (TestFlight) to send to close friends/family for beta testing.

💡 Critical "Missing Info" & Suggestions

Since you are using the Apple ecosystem, you have access to specific tools that can enhance this project. Here is what you should consider adding:

1. The "Photo Memory" Feature (Multimodal)

  • The Idea: Sometimes you can't describe a memory, but you have a photo of it.
  • iOS Implementation: Allow the user to upload a photo from their Camera Roll.
  • AI Action: Send the Image + Audio to Gemini 2.0 Flash.
    • Prompt: "The user is holding this photo and talking about it. Combine the visual details in the photo with their story."
    • Result: The AI writes: "I held the faded photo of the 1998 Ford Escort. You could see the rust on the bumper..." (It sees what you didn't say).
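Because the photo and the recording both live in Cloud Storage, the Cloud Function can reference them by gs:// URI instead of re-uploading the bytes. A minimal sketch of the request body, assuming the Vertex AI generateContent REST format with fileData parts; buildPhotoMemoryRequest is hypothetical, and the MIME types must match what the app actually uploads (for example image/jpeg and the AAC recording from Phase 1.2).

```typescript
// Assumed subset of the generateContent REST body: Cloud Storage file parts plus a text part.
interface Part {
  text?: string;
  fileData?: { mimeType: string; fileUri: string };
}

interface GenerateContentRequest {
  contents: { role: "user"; parts: Part[] }[];
}

const PHOTO_PROMPT =
  "The user is holding this photo and talking about it. " +
  "Combine the visual details in the photo with their story.";

function buildPhotoMemoryRequest(
  photoUri: string,
  photoMime: string,
  audioUri: string,
  audioMime: string,
): GenerateContentRequest {
  // Both files are expected to already be in Cloud Storage (uploaded by the app).
  for (const uri of [photoUri, audioUri]) {
    if (!uri.startsWith("gs://")) throw new Error(`Expected a Cloud Storage URI, got ${uri}`);
  }
  return {
    contents: [
      {
        role: "user",
        parts: [
          { fileData: { mimeType: photoMime, fileUri: photoUri } },
          { fileData: { mimeType: audioMime, fileUri: audioUri } },
          { text: PHOTO_PROMPT },
        ],
      },
    ],
  };
}
```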

2. Apple Watch Companion

  • The Idea: The best dictation tool is the one on your wrist.
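  • Implementation: A simple native watchOS app (SwiftUI, since Flutter does not target watchOS) with a big "Record" button.
  • Sync: It records locally on the watch, then sends the file to the iPhone with WatchConnectivity (WCSession.transferFile) when connected; the iPhone app then uploads it to the Cloud.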
  • Why: "Walking and talking" is the most natural way to do oral history.

3. Local "On-Device" AI (Privacy)

  • The Idea: Recent iPhones have a powerful Neural Engine (Apple's NPU) for on-device inference.
  • Implementation: You could use Google MediaPipe for LLMs (running a small Gemma 2 model on the phone).
  • Use Case: Use the on-device model to generate the "Quick Titles" or "Tags" for memories instantly, without waiting for the Cloud. This makes the app feel snappier.

4. "Glossary" Management

  • The Problem: Gemini will often misspell names, e.g. "Kaitlyn" vs. "Caitlin".
  • Solution: Add a settings page: "My People & Places."
    • Data: User adds: Mom = Sarah, Hometown = Poughkeepsie.
    • Injection: Pass this JSON map into the System Instructions of every Gemini call.
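A minimal sketch of the injection step, assuming the glossary is stored as a flat label-to-spelling map and sent as the systemInstruction field of each generateContent request (a Content object with text parts). Glossary and buildSystemInstruction are hypothetical names.

```typescript
// Hypothetical storage shape for "My People & Places": label -> canonical spelling.
type Glossary = Record<string, string>;

// Matches the Content shape of the systemInstruction field (text parts only).
interface SystemInstruction {
  parts: { text: string }[];
}

function buildSystemInstruction(baseInstruction: string, glossary: Glossary): SystemInstruction {
  // Drop blank rows and sort labels so the instruction text is identical on every call.
  const labels = Object.keys(glossary)
    .filter((label) => label.trim() !== "" && glossary[label].trim() !== "")
    .sort();
  const cleaned: Glossary = {};
  for (const label of labels) cleaned[label.trim()] = glossary[label].trim();
  const text =
    `${baseInstruction.trim()}\n\n` +
    "Glossary of the user's people and places (JSON). " +
    "Always use these exact spellings in transcripts and prose:\n" +
    JSON.stringify(cleaned);
  return { parts: [{ text }] };
}
```

Passing the same result to every call (Flash extraction, Pro review, chapter writing) keeps spellings consistent across the whole book.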

5. Background Audio "Keep-Alive"

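  • The Risk: iOS aggressively suspends background apps. With the audio background mode enabled, iOS keeps the app running as long as the recording session stays active, even through long silences; the real danger is an interruption (a phone call, Siri, another app taking the audio session) that stops the recorder and lets iOS suspend the app.
  • Solution: Listen for audio session interruption notifications and restart recording when the interruption ends. Avoid the "Silent Audio Loop" trick (playing a 0-volume file so iOS treats the app as a music player): App Review Guideline 2.5.4 limits background modes to their intended purposes, so apps that keep themselves alive this way risk rejection.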