This is the revised Project Task Book optimized for the MacBook + iPhone ecosystem.
Using a Mac simplifies the hardware engineering significantly, allowing you to focus purely on the AI and app logic. This plan assumes you are using macOS Sequoia (or later) and a current Xcode release (Xcode 16 or later; Apple's version numbering goes from 16 straight to 26).
🍎 Project Task Book: The AI Autobiography Agent (iOS Native)
Project Goal: Create a voice-first, AI-driven iOS app that interviews the user, structures their life story, and autonomously writes a biography.
Tech Stack: Flutter (iOS), Firebase (Functions, Firestore, Storage), Vertex AI (Gemini Hybrid Strategy).
🟢 Phase 0: The Apple Ecosystem Setup (Days 1-2)
Objective: Configure the Mac for professional iOS development and connect the Cloud backend.
0.1 Development Environment
- Install Xcode: Download from App Store. Open it to install command-line tools.
- Install Homebrew: Run the standard curl script in Terminal.
- Install Flutter & Tools:
- Verify Setup: Run `flutter doctor`. Fix any signing issues in Xcode.
- Physical Device Setup: Enable "Developer Mode" on your iPhone (Settings > Privacy & Security). Connect to Mac via cable once to trust the computer.
0.2 Cloud Architecture (Firebase Console)
- Create Project: Initialize a new Firebase project (Blaze Plan required).
- Enable Google Cloud APIs:
  - Vertex AI API
  - Cloud Storage API
- Database Setup: Initialize Firestore.
  - Composite Index: Collection `memories` -> Fields: `userId` (Asc) + `estimatedDate` (Asc).
  - Vector Index: Collection `memories` -> Field: `embedding` (Vector, 768 dims).
🟡 Phase 1: The iOS Capture Engine (MVP) (Days 3-6)
Objective: Record high-fidelity audio on iPhone, upload to Cloud, and process with Gemini.
1.1 iOS Permissions & Configuration
- Info.plist Configuration: Add keys for `NSMicrophoneUsageDescription` and `UIBackgroundModes` (audio).
- Runner Setup: In Xcode, go to Signing & Capabilities -> Add Capability -> "Background Modes" -> Check "Audio, AirPlay, and Picture in Picture".
1.2 Flutter Recorder Logic
- Dependency: `flutter_sound` and `permission_handler`.
- Audio Session: Configure the iOS audio session mode. Use `measurement` to minimize system signal processing (including automatic gain) if you want raw input, or `voiceChat` to let iOS apply its voice processing and clean up noise. (`spokenAudio` is intended for playback of spoken content, not recording.)
- Codec: Hardcode `Codec.aacMP4` (Native iOS format).
- Upload Logic: Use `firebase_storage` with `putFile` and retry logic for resiliency.
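The retry logic for uploads could follow an exponential-backoff pattern. The sketch below is in Python for brevity and would be ported to Dart around the `putFile` call in the app; `retry_with_backoff` and its parameters are illustrative names, not part of any Firebase SDK.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `max_attempts` is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay,
            # with jitter so many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
    raise ValueError("max_attempts must be at least 1")
```

A real implementation would probably catch only network-related errors rather than every exception, and keep the recording on disk until the upload is confirmed.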
1.3 The "Fast" AI Layer (Gemini 2.0 Flash)
- Cloud Function: `onObjectFinalized` trigger.
- Model: `gemini-2.0-flash-exp` (Low latency, low cost).
- Prompt: "Transcribe verbatim. Extract: Sentiment, People, Date, Location. Return JSON."
- Test: Record on iPhone -> Check Firestore for JSON.
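Because the prompt asks the model to "Return JSON", the Cloud Function should validate the reply before writing it to Firestore. Here is a minimal Python sketch of that check; the key names in `REQUIRED_KEYS` are assumptions based on the prompt, not a schema defined by this plan.

```python
import json
from typing import Any

# Hypothetical field names derived from the prompt's extraction list.
REQUIRED_KEYS = ("transcript", "sentiment", "people", "date", "location")
FENCE = "`" * 3  # Markdown code-fence marker


def parse_memory_response(raw: str) -> dict[str, Any]:
    """Parse the model reply and check that every expected key is present."""
    text = raw.strip()
    # Models sometimes wrap JSON in a Markdown fence; drop it if present.
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

Rejecting malformed replies early keeps half-parsed documents out of the `memories` collection, which the timeline and embedding steps later depend on.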
🟠 Phase 2: The Visual Timeline & Search (Days 7-12)
Objective: A scrollable "Life Map" that identifies gaps in your history.
2.1 The Timeline UI
- Data Fetching: Query `memories` ordered by date.
- Visuals: Use a `CustomPainter` or `timelines_plus`. Draw a line connecting nodes.
- The "Fog of War":
  - Logic: `if (dateB - dateA) > 2 years` -> Render a blurry "Missing Era" block.
  - Action: Tapping the block triggers the "Gap Interviewer" (Phase 3).
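The "Fog of War" rule amounts to comparing each pair of consecutive memory dates. This is a small Python sketch of the check (the app itself would do it in Dart); `find_missing_eras` is an illustrative name, and "2 years" is approximated as 730 days.

```python
from datetime import date, timedelta

GAP_THRESHOLD = timedelta(days=2 * 365)  # "more than 2 years", ignoring leap days


def find_missing_eras(dates: list[date]) -> list[tuple[date, date]]:
    """Return (start, end) pairs for consecutive memories more than two years apart."""
    ordered = sorted(dates)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > GAP_THRESHOLD
    ]
```

Each returned pair marks where a "Missing Era" block would be drawn between two timeline nodes.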
2.2 Vector Memory (The Brain)
- Embedding Trigger: `onCreate` function for Firestore.
- Model: Vertex AI `text-embedding-004`.
- Action: Convert the summary (not the whole transcript) into a vector. Store in the `embedding` field.
🔴 Phase 3: The "Active Listener" Agent (Days 13-19)
Objective: The App wakes up and asks YOU questions.
3.1 The "Shake to Ask" (Quick Question)
- Sensor Integration: Use `sensors_plus`. Detect a shake from the accelerometer readings.
- Logic:
  - Pick Random Theme (e.g., "First Love").
  - Vector Search: "Did user talk about First Love?"
  - Gemini 2.0 Flash generates a question based on the missing info.
- TTS: Use an AI Voice API (Google or ElevenLabs) to speak the question.
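The "pick a theme, then check whether it is already covered" step can be thought of as a similarity test between a theme embedding and the stored memory embeddings. The Python sketch below shows the idea with plain cosine similarity; the 0.75 threshold and the function names are assumptions, and in production the lookup would go through the Firestore vector index rather than a loop.

```python
import math
import random
from typing import Optional

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_covered(theme: Vector, memories: list[Vector], threshold: float = 0.75) -> bool:
    """True if any stored memory is close enough to the theme."""
    return any(cosine_similarity(theme, m) >= threshold for m in memories)


def pick_uncovered_theme(
    themes: dict[str, Vector],
    memories: list[Vector],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Try themes in random order and return the first one not yet discussed."""
    rng = rng or random.Random()
    names = list(themes)
    rng.shuffle(names)
    for name in names:
        if not is_covered(themes[name], memories):
            return name
    return None  # every theme already has a matching memory
```

The returned theme name is what would be handed to Gemini 2.0 Flash to generate the question.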
3.2 The "Deep Review" (Critical Analysis)
- UI: "Review my Childhood" button.
- Logic: Aggregate all transcripts from that era.
- Model: Gemini 3 Pro (The "Reasoning" Model).
- Prompt: "Analyze these 20 pages of transcript. Find the psychological contradictions. Where is the user lying to themselves? Generate 3 hard questions."
- Output: Save to a "Pending Questions" inbox.
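Aggregating an era's transcripts into one prompt might look like the following Python sketch. `estimatedDate` matches the indexed field from Phase 0, while the `transcript` field and the `build_review_prompt` helper are hypothetical.

```python
from datetime import date


def build_review_prompt(memories: list[dict], start: date, end: date) -> str:
    """Join all transcripts dated within [start, end] into one review prompt."""
    in_era = sorted(
        (m for m in memories if start <= m["estimatedDate"] <= end),
        key=lambda m: m["estimatedDate"],
    )
    if not in_era:
        raise ValueError("no memories recorded for this era")
    # Prefix each transcript with its date so the model can reason about order.
    transcripts = "\n\n".join(
        f"[{m['estimatedDate'].isoformat()}] {m['transcript']}" for m in in_era
    )
    return (
        "Analyze the transcripts below. Find the psychological contradictions. "
        "Where is the user lying to themselves? Generate 3 hard questions.\n\n"
        + transcripts
    )
```

Sorting chronologically before joining gives the reasoning model a coherent narrative to compare statements across.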
🟣 Phase 4: The Book Writer (Days 20-25)
Objective: Transform oral history into written prose.
4.1 Narrative Synthesis
- UI: "Generate Chapter 1".
- Context Caching: (Crucial Step) Upload the user's "Character Sheet" and "Glossary" to Vertex AI Context Cache to save money.
- Model: Gemini 3 Pro.
- Prompt: "Write a 3,000-word chapter based on these specific memories. Use a [User Selected Style] tone. Use the first person."
4.2 PDF Export
- Layout: Use the `pdf` package. Add Title Page, Table of Contents.
- Review: Allow user to edit the text in-app before finalizing (Simple Text Editor).
🏁 Phase 5: Polish & Deployment (Days 26-30)
- On-Device Testing: Run the app unplugged. Test walking, background recording, and interruption handling (phone calls).
- Biometrics: Add Face ID (`local_auth`) to lock the diary.
- TestFlight: Archive the build in Xcode and upload to App Store Connect (TestFlight) to send to close friends/family for beta testing.
💡 Critical "Missing Info" & Suggestions
Since you are using the Apple ecosystem, you have access to specific tools that can enhance this project. Here is what you should consider adding:
1. The "Photo Memory" Feature (Multimodal)
- The Idea: Sometimes you can't describe a memory, but you have a photo of it.
- iOS Implementation: Allow the user to upload a photo from their Camera Roll.
- AI Action: Send the Image + Audio to Gemini 2.0 Flash.
- Prompt: "The user is holding this photo and talking about it. Combine the visual details in the photo with their story."
- Result: The AI writes: "I held the faded photo of the 1998 Ford Escort. You could see the rust on the bumper..." (It sees what you didn't say).
2. Apple Watch Companion
- The Idea: The best dictation tool is the one on your wrist.
- Implementation: A simple watchOS app with a big "Record" button. Flutter does not target watchOS, so this companion would be written natively in Swift.
- Sync: It records locally on the watch, then transfers the file to the iPhone when connected, which then uploads to Cloud.
- Why: "Walking and talking" is the most natural way to do oral history.
3. Local "On-Device" AI (Privacy)
- The Idea: In 2026, iPhones have powerful NPUs (Neural Processing Units).
- Implementation: You could use Google MediaPipe for LLMs (running a small Gemma 2 model on the phone).
- Use Case: Use the on-device model to generate the "Quick Titles" or "Tags" for memories instantly, without waiting for the Cloud. This makes the app feel snappier.
4. "Glossary" Management
- The Problem: Gemini will misspell names. "Kaitlyn" vs "Caitlin".
- Solution: Add a settings page: "My People & Places."
  - Data: User adds: `Mom = Sarah`, `Hometown = Poughkeepsie`.
  - Injection: Pass this JSON map into the System Instructions of every Gemini call.
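Injecting the glossary could be as simple as appending the serialized map to a base instruction. The Python sketch below shows one way to do it; `BASE_INSTRUCTION` and `build_system_instruction` are hypothetical names, and the wording of the instruction is only an example.

```python
import json

# Hypothetical base instruction shared by every Gemini call.
BASE_INSTRUCTION = "You are helping the user record their life story."


def build_system_instruction(glossary: dict[str, str]) -> str:
    """Append the user's glossary as JSON so the model reuses exact spellings."""
    if not glossary:
        return BASE_INSTRUCTION
    return (
        BASE_INSTRUCTION
        + "\nUse these exact spellings for the user's people and places:\n"
        + json.dumps(glossary, ensure_ascii=False, sort_keys=True, indent=2)
    )
```

For example, `build_system_instruction({"Mom": "Sarah", "Hometown": "Poughkeepsie"})` yields the base instruction followed by the two entries as a JSON object.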
5. Background Audio "Keep-Alive"
- The Risk: iOS hates background processes. If you stop talking for 15 minutes but leave the recorder on, iOS might kill the app to save battery.
- Solution: Implement a "Silent Audio Loop." Play a 0-volume sound file on a loop while recording. This makes iOS treat the app as an active audio player, which helps keep it alive in the background. App Review may reject apps that play silent audio purely to stay alive, so rely on the "Audio" background mode from Phase 1 where it suffices.
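If you do try the silent loop, the sound file itself can be generated rather than recorded. This is a small Python sketch that builds a silent 16-bit mono WAV using only the standard library; `make_silent_wav` is an illustrative helper, and the output would be bundled as an app asset.

```python
import io
import wave


def make_silent_wav(seconds: float = 1.0, sample_rate: int = 44100) -> bytes:
    """Return the bytes of a mono 16-bit PCM WAV file containing only silence."""
    frames = int(seconds * sample_rate)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * frames)  # all-zero samples are silence
    return buffer.getvalue()
```

Writing the result to disk (for example `open("silence.wav", "wb").write(make_silent_wav())`) gives a one-second file that the player can loop.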