Execution Detail Plan

This is a strong architectural specification. The PDF correctly identifies that the hard part of this app is not transcription, which is largely a solved problem, but Contextual Continuity: the ability of the AI to "remember" that you talked about your high school crush in Session 1 when you are discussing your wedding in Session 10.

By choosing Gemini 1.5 Pro for direct audio ingestion (Multimodal), you skip a separate "Speech-to-Text" layer, so the model works from the audio itself rather than from a transcript that may already contain errors.

Here is your Executable Plan to build this on Windows 11 using WSL (Ubuntu). This setup allows you to keep your heavy Android tools on Windows while running your backend/AI logic in the fast Linux environment.

Part 1: The "Hybrid" Development Environment (Win11 + WSL)

Developing mobile apps in WSL has one specific pain point: USB debugging. By default, WSL cannot see an Android phone plugged into a Windows USB port. We will work around this by running adb over TCP instead of USB.

Step 1.1: Windows Side (Host)

Install Android Studio: Install it on Windows (not WSL). Install the Android SDK and Emulator.
Install Flutter: Install Flutter on Windows and add it to your path.
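Enable ADB over TCP:
- Plug in your Android phone with USB debugging enabled. (An emulator typically listens only on the Windows loopback interface, so the steps here assume a physical device.)
- Make sure the phone is on the same Wi-Fi network as your PC.
- Open PowerShell and run: adb tcpip 5555
- This restarts the phone's adb daemon in TCP mode, listening on port 5555 at the phone's own IP address, so WSL can reach the phone over the network.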

Step 1.2: WSL Side (The Brain)

Open your Ubuntu terminal in WSL.

Install Dependencies:

sudo apt update && sudo apt install -y curl git unzip xz-utils zip libglu1-mesa

Install Flutter (in WSL):
- Download the Flutter Linux tarball and extract it to ~/development/flutter.
- Add to .bashrc: export PATH="$PATH:$HOME/development/flutter/bin"

Install Node.js (for Firebase Functions):

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
source ~/.bashrc
nvm install 20

Install Firebase Tools:

npm install -g firebase-tools
firebase login --no-localhost

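Connect to the phone over TCP:
- Install adb in WSL if it is not already present, for example: sudo apt install -y adb
- In WSL, run: adb connect <phone-ip>:5555, where <phone-ip> is a placeholder for the phone's Wi-Fi address (usually listed in the phone's network or "About phone" settings). The nameserver in /etc/resolv.conf points at the Windows host, not the phone, so it is not the right target after adb tcpip.
- Run flutter devices. You should now see the phone inside Linux.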

Part 2: Project Initialization (The Scaffold)

We will follow the PDF's structure: Flutter Client + Firebase Backend.

Create the Project:

mkdir autobiography_app && cd autobiography_app
flutter create . --org com.yourname.biography

Initialize Firebase:

firebase init
# Select: Firestore, Functions, Storage, Emulators
# Language: TypeScript
# Project: Create a new project (or select existing)

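Add PDF-Recommended Dependencies: The simplest route is to run flutter pub add <package> for each package below, which writes the newest compatible version constraint for you. To edit pubspec.yaml by hand instead, start with any and replace it with a pinned constraint once the build works:

dependencies:
  firebase_core: any
  firebase_auth: any
  cloud_firestore: any
  firebase_storage: any
  cloud_functions: any
  flutter_sound: any  # For audio capture
  path_provider: any  # For local file storage
  permission_handler: any  # For the microphone permission prompt
  uuid: any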

Part 3: Phase 1 Implementation (The Capture Engine)

This is the "MVP" described in section 4 of your PDF.

Step 3.1: The Recorder (Flutter)

Location: lib/services/audio_recorder.dart

We need to record AAC/M4A as specified to save bandwidth while maintaining quality for the AI.

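import 'package:flutter_sound/flutter_sound.dart';
import 'package:path_provider/path_provider.dart';
import 'package:permission_handler/permission_handler.dart';

class AudioRecorderService {
  final FlutterSoundRecorder _recorder = FlutterSoundRecorder();

  Future<void> init() async {
    // The recorder cannot start without microphone access, so request it first.
    final status = await Permission.microphone.request();
    if (!status.isGranted) {
      throw Exception('Microphone permission not granted');
    }
    await _recorder.openRecorder();
  }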

  Future<String> startRecording() async {
    final dir = await getTemporaryDirectory();
    // Unique ID for filename
    String fileName = 'session_${DateTime.now().millisecondsSinceEpoch}.m4a';
    String path = '${dir.path}/$fileName';

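    // 64 kbps AAC as per the PDF spec. Codec.aacMP4 writes an MP4 container,
    // which matches the .m4a extension; Codec.aacADTS would produce a raw .aac stream.
    await _recorder.startRecorder(
      toFile: path,
      codec: Codec.aacMP4,
      bitRate: 64000,
      sampleRate: 16000,
    );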
    return path;
  }

  Future<String?> stopRecording() async {
    return await _recorder.stopRecorder();
  }
}
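
To put the 64 kbps setting in perspective, the arithmetic is simple: bitrate divided by 8 gives bytes per second. The TypeScript sketch below estimates the upload size of a session under that assumption. The function name estimateAudioBytes is hypothetical, and the estimate ignores container overhead, so real files will be slightly larger.

```typescript
// Rough size estimate for a constant-bitrate recording.
// Assumption: container overhead is ignored, so real files are a little larger.
function estimateAudioBytes(bitRateBps: number, durationSeconds: number): number {
  const bytesPerSecond = bitRateBps / 8;
  return bytesPerSecond * durationSeconds;
}

// One hour at 64 kbps: 8,000 bytes/s * 3,600 s.
const oneHourBytes = estimateAudioBytes(64_000, 60 * 60);
console.log(`${(oneHourBytes / 1_000_000).toFixed(1)} MB per hour`); // 28.8 MB per hour
```

At roughly 29 MB per hour, even long sessions stay small enough for a resumable mobile upload.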

Step 3.2: The Uploader (Flutter)

Location: lib/services/upload_service.dart

We upload to a specific path that the Cloud Function watches: users/{uid}/raw_audio/{file}.

import 'package:firebase_storage/firebase_storage.dart';
import 'package:firebase_auth/firebase_auth.dart';
import 'dart:io';

Future<void> uploadAudio(String filePath) async {
  File file = File(filePath);
  String uid = FirebaseAuth.instance.currentUser!.uid;
  String fileName = filePath.split('/').last;

  // Resumable upload for long sessions
  final ref = FirebaseStorage.instance.ref().child('users/$uid/raw_audio/$fileName');
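  // 'audio/mp4' is the registered MIME type for M4A. The Cloud Function forwards this
  // value to Gemini, so confirm it is on the model's list of supported audio types.
  UploadTask task = ref.putFile(file, SettableMetadata(contentType: 'audio/mp4'));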

  task.snapshotEvents.listen((event) {
    print('Progress: ${(event.bytesTransferred / event.totalBytes) * 100} %');
  });

  await task;
}
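
Because the backend keys on the users/{uid}/raw_audio/{file} convention, it can help to parse that path strictly on the server side rather than relying on a substring match. The following TypeScript sketch shows one way to do that; parseRawAudioPath and RawAudioPath are hypothetical names, not part of any Firebase API.

```typescript
// Hypothetical helper: strictly parse "users/{uid}/raw_audio/{file}".
interface RawAudioPath {
  uid: string;
  fileName: string;
}

function parseRawAudioPath(objectPath: string): RawAudioPath | null {
  const parts = objectPath.split("/");
  // Exactly four segments are expected: users / uid / raw_audio / file.
  if (parts.length !== 4) return null;
  const [root, uid, folder, fileName] = parts;
  if (root !== "users" || folder !== "raw_audio" || !uid || !fileName) {
    return null;
  }
  return { uid, fileName };
}

console.log(parseRawAudioPath("users/abc123/raw_audio/session_1.m4a"));
console.log(parseRawAudioPath("tmp/raw_audio/users/file.m4a")); // null
```

Returning null for anything off-pattern means a file uploaded to an unexpected location is skipped instead of being attributed to the wrong user.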

Step 3.3: The "Brain" (Cloud Functions + Vertex AI)

Location: functions/src/index.ts

This is the most critical code. It triggers when audio lands, sends it to Gemini 1.5 Pro, and saves the structured JSON to Firestore.

Note: You must enable the "Vertex AI API" in your Google Cloud Console for this to work.

import * as v2 from "firebase-functions/v2";
import * as admin from "firebase-admin";
import { VertexAI } from "@google-cloud/vertexai";

admin.initializeApp();
const db = admin.firestore();

// Initialize Vertex AI
const vertexAI = new VertexAI({ project: process.env.GCLOUD_PROJECT, location: "us-central1" });
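// Preview model IDs like this one are retired over time; substitute a Gemini
// model ID that is currently available in your project and region.
const model = vertexAI.getGenerativeModel({ model: "gemini-1.5-pro-preview-0409" });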

export const processAudio = v2.storage.onObjectFinalized(
  { timeoutSeconds: 540, memory: "2GiB" }, // 540 s (9 min) is the maximum for event-triggered 2nd gen functions
  async (event) => {
    const fileBucket = event.data.bucket;
    const filePath = event.data.name; 
    const contentType = event.data.contentType;

    // Only process audio in the raw_audio folder
    if (!filePath || !filePath.includes("raw_audio/") || !contentType?.startsWith("audio/")) {
        return;
    }

    const uid = filePath.split("/")[1]; // Extract UID from path structure

    // Construct the GCS URI (gs://...) required by Gemini
    const gcsUri = `gs://${fileBucket}/${filePath}`;

    const prompt = `
      You are an expert biographer. Listen to this audio file.
      1. Transcribe the audio verbatim.
      2. Identify the specific time period discussed (e.g., "High School", "1990s").
      3. Extract key entities (People, Places).
      4. Detect the emotional tone.
      5. Write a brief summary of the session.

      Return ONLY valid JSON in this format:
      {
        "transcript": "string",
        "summary": "string",
        "timePeriod": "string",
        "entities": ["string"],
        "emotion": "string"
      }
    `;

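    // Multimodal call: audio file URI + text prompt in a single user turn.
    // The Vertex AI Node SDK expects a request object with `contents`.
    const result = await model.generateContent({
      contents: [{
        role: "user",
        parts: [
          { fileData: { mimeType: contentType, fileUri: gcsUri } },
          { text: prompt },
        ],
      }],
    });

    const responseText = result.response.candidates?.[0]?.content?.parts?.[0]?.text ?? "";

    // Gemini often wraps JSON in a Markdown code fence, so strip that first.
    const cleanJson = responseText.replace(/```json|```/g, "").trim();
    let data: Record<string, unknown>;
    try {
      data = JSON.parse(cleanJson);
    } catch (err) {
      // Do not write a half-processed memory if the model returned invalid JSON.
      console.error("Could not parse model output for", filePath, err);
      return;
    }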

    // Write to Firestore "Memories" collection
    await db.collection(`users/${uid}/memories`).add({
      ...data,
      audioRef: gcsUri,
      createdAt: admin.firestore.FieldValue.serverTimestamp(),
      processed: true
    });
  }
);
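
The function above parses whatever JSON the model returns and stores it as-is. A slightly more defensive approach is to extract the outermost braces and check the fields against the shape requested in the prompt before writing to Firestore. The sketch below is one way to do that in plain TypeScript; parseMemoryJson and MemoryExtraction are hypothetical names, and the field list simply mirrors the prompt's JSON template.

```typescript
// Shape requested in the prompt's JSON template.
interface MemoryExtraction {
  transcript: string;
  summary: string;
  timePeriod: string;
  entities: string[];
  emotion: string;
}

// Hypothetical helper: returns null instead of throwing on malformed output.
function parseMemoryJson(raw: string): MemoryExtraction | null {
  // Keep only the outermost {...} so code fences or stray prose are ignored.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const obj = parsed as Record<string, unknown>;

  const transcript = obj["transcript"];
  const summary = obj["summary"];
  const timePeriod = obj["timePeriod"];
  const emotion = obj["emotion"];
  const entities = obj["entities"];
  if (
    typeof transcript !== "string" ||
    typeof summary !== "string" ||
    typeof timePeriod !== "string" ||
    typeof emotion !== "string" ||
    !Array.isArray(entities) ||
    !entities.every((e): e is string => typeof e === "string")
  ) {
    return null;
  }
  return { transcript, summary, timePeriod, entities, emotion };
}
```

A null result can be logged and the document marked as failed, which keeps malformed model output out of the memories collection.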

Part 4: The Data Structure (Firestore)

Your PDF (Page 9) outlines the schema perfectly. To support the Timeline View and Vector Search, you need to configure indexes in Google Cloud Console.

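Composite Index (for Timeline):
- Collection: memories
- Fields: userId (Ascending), estimatedDate (Ascending)
- Why? It allows where('userId', '==', me).orderBy('estimatedDate').
- Note: the function in Step 3.3 stores documents under users/{uid}/memories and does not write userId or estimatedDate, so those fields must be added to each document before this query returns anything.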

Vector Index (for RAG):

This is done via command line (gcloud) as shown in the PDF.

Run this in your WSL terminal:

gcloud firestore indexes composite create \
  --collection-group=memories \
  --query-scope=COLLECTION \
  --field-config field-path=embedding,vector-config='{"dimension":"768", "flat": "{}"}'
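
Conceptually, the vector index lets Firestore return the memories whose embeddings are closest to a query embedding. The sketch below shows that idea in plain TypeScript as a brute-force cosine-similarity ranking. It illustrates the retrieval step only and is not how Firestore implements it; cosineSimilarity, topKMemories, and EmbeddedMemory are hypothetical names.

```typescript
interface EmbeddedMemory {
  id: string;
  embedding: number[];
}

// Cosine similarity: 1 for identical directions, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Rank every memory against the query and keep the k most similar.
function topKMemories(query: number[], memories: EmbeddedMemory[], k: number): EmbeddedMemory[] {
  return memories
    .map((memory) => ({ memory, score: cosineSimilarity(query, memory.embedding) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((entry) => entry.memory);
}
```

In the real app the embeddings would have 768 dimensions to match the index definition, and the ranking would be done by Firestore rather than in application code.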

Part 5: Suggestion & Expansion on the "Writing" Process

The PDF covers Phase 2 (RAG) well, but here is a specific suggestion for the User Interface (UI) during the "Writing" phase to make it feel magical.

The "Gap Detector" UI

Don't just show a list of recordings. Create a visual Life Map.

The Logic: When the app loads, fetch all memories. Calculate the time range (e.g., 1980 - 2024). Divide the range into "Eras" (Childhood, Teens, 20s).
The Visual: If the user has 5 recordings from "Teens" but 0 from "20s", the app should render a "Fog of War" over the 20s section.
The Interaction: When the user taps the foggy "20s" section, the AI (using the RAG context of the previous era) prompts: "We know you graduated in 1998. What happened immediately after that? Did you move?"
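
The gap-detection logic described above can be sketched as a pure function, shown here in TypeScript for brevity even though the client is Flutter. This is a minimal sketch under stated assumptions: each memory has already been resolved to a single calendar year, eras are derived from a birth year, and the era boundaries (13 and 20, then decades) are illustrative choices. All names are hypothetical.

```typescript
interface Era {
  label: string;
  startYear: number; // inclusive
  endYear: number;   // exclusive
}

interface EraCoverage extends Era {
  count: number;
  foggy: boolean; // true when no memories fall in this era
}

// Assumption: eras are Childhood, Teens, then one era per decade of age.
function buildEras(birthYear: number, currentYear: number): Era[] {
  const eras: Era[] = [
    { label: "Childhood", startYear: birthYear, endYear: birthYear + 13 },
    { label: "Teens", startYear: birthYear + 13, endYear: birthYear + 20 },
  ];
  for (let age = 20; birthYear + age <= currentYear; age += 10) {
    eras.push({ label: `${age}s`, startYear: birthYear + age, endYear: birthYear + age + 10 });
  }
  return eras;
}

// Count memories per era and flag the empty ones for the "Fog of War" overlay.
function detectGaps(eras: Era[], memoryYears: number[]): EraCoverage[] {
  return eras.map((era) => {
    const count = memoryYears.filter((y) => y >= era.startYear && y < era.endYear).length;
    return { ...era, count, foggy: count === 0 };
  });
}

const coverage = detectGaps(buildEras(1980, 2024), [1995, 1996, 1997, 1998, 1999]);
console.log(coverage.filter((era) => era.foggy).map((era) => era.label));
// ["Childhood", "20s", "30s", "40s"]
```

The UI would render each foggy era as a tappable region, and the tap handler would pass the preceding non-foggy era to the prompt generator.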

Immediate Next Steps for You

Environment: Set up the WSL-to-Windows ADB bridge. This is the #1 blocker for this specific setup.
Authentication: Enable "Anonymous Auth" in Firebase Console so you can test recording without building a login screen first.
Deploy Functions: Run firebase deploy --only functions from WSL and check the Google Cloud logs to ensure the function starts correctly.
Test: Record a 30-second clip on the phone, upload it, and watch the Firestore database populate with the JSON extracted by Gemini.

This plan moves you from "PDF Theory" to "Working Prototype" in about 4 hours of work.