
2-Pass FalkorDB Verification

The best way to verify that your FalkorDB graph preserves the original meaning of the source text (and that the model is not just hallucinating generic facts) is to run a "Citational Fidelity Test."

You don't just want the LLM to answer the question; you want it to prove it read the specific text chunk from the database.

Here is the strategy and the Python script to run this test using your 35B/32B model (e.g., Qwen 32B or Command R).

The Strategy: "The Closed-Book Loop"

  1. Select a Random Truth: Pull a random (:Chunk) of raw text directly from FalkorDB. This is your "Ground Truth."
  2. Generate a Question: Use an LLM to generate a specific question based only on that chunk.
  3. The Test: Feed that question into "The Professor" (your RAG pipeline).
  4. The Comparison: Check if "The Professor" retrieves that exact chunk and uses its specific phrasing.
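The four steps above can be sketched without any database by passing each stage in as a function. This is only an illustrative skeleton: `closed_book_trial`, `make_question`, and `retrieve` are hypothetical names, and the in-memory `chunks` dict stands in for the (:Chunk) nodes.

```python
import random
from typing import Callable, Dict, Tuple


def closed_book_trial(
    chunks: Dict[int, str],
    make_question: Callable[[str], str],
    retrieve: Callable[[str], Tuple[int, str]],
    rng: random.Random,
) -> dict:
    """Run one closed-book trial and report whether retrieval hit the source chunk."""
    # 1. Select a random truth: pick one chunk ID and its raw text.
    chunk_id = rng.choice(sorted(chunks))
    ground_truth = chunks[chunk_id]

    # 2. Generate a question from that chunk only.
    question = make_question(ground_truth)

    # 3. The test: hand the question to the retrieval pipeline.
    retrieved_id, retrieved_text = retrieve(question)

    # 4. The comparison: did the pipeline return the chunk we started from?
    return {
        "chunk_id": chunk_id,
        "question": question,
        "retrieved_id": retrieved_id,
        "retrieved_text": retrieved_text,
        "hit": retrieved_id == chunk_id,
    }
```

In a real run, `make_question` would wrap the LLM call and `retrieve` would wrap the vector query; keeping them as parameters lets you test the loop itself with simple stubs first.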

The Python Test Script

You can ask Google Antigravity to generate this script for you. It automates the verification process.

File: test_fidelity.py

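import random
# Assumption: this import path and the dir= argument match your installed
# FalkorDBLite release; adjust them if your version differs.
from falkordblite import FalkorDB
from ollama import Client

client = Client()  # connects to the local Ollama server
db = FalkorDB(dir='./my_db_data')
graph = db.select_graph('mind_palace')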

def run_fidelity_test():
    print("--- STARTING FIDELITY TEST ---")

    # 1. GET GROUND TRUTH
    # Grab a random Chunk of text from the DB
    query = """
    MATCH (c:Chunk) 
    RETURN c.text, id(c) 
    ORDER BY rand() LIMIT 1
    """
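    result = graph.query(query)
    # graph.query() returns a QueryResult object; the rows live in .result_set
    if not result.result_set:
        print("No (:Chunk) nodes found. Nothing to test.")
        return
    ground_truth_text = result.result_set[0][0]
    chunk_id = result.result_set[0][1]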

    print(f"\n[GROUND TRUTH CHUNK]:\n{ground_truth_text[:200]}...")

    # 2. GENERATE QUESTION (The Probe)
    # We ask the LLM to create a question that can ONLY be answered by this text
    probe_prompt = f"""
    Read this text. Generate a specific, difficult question about a detail mentioned here.
    Do not ask a general question. Ask about a specific relationship or definition.
    TEXT: {ground_truth_text}
    """
    question = client.generate(model='qwen2.5:32b', prompt=probe_prompt)['response']
    print(f"\n[GENERATED QUESTION]: {question}")

    # 3. RUN THE PROFESSOR (The Retrieval)
    # This simulates your App's logic (Vector + Graph Search)

    # A. Generate Vector for the question
    embedding = client.embeddings(model='nomic-embed-text', prompt=question)['embedding']

    # B. Query FalkorDB for the nearest Chunk (Simulated here)
    # vecf32() casts the parameter list to the float32 vector type the index expects
    rag_query = """
    CALL db.idx.vector.queryNodes('Chunk', 'embedding', 1, vecf32($vec))
    YIELD node, score
    RETURN node.text, id(node), score
    """
    rag_result = graph.query(rag_query, params={'vec': embedding})

    retrieved_text = rag_result.result_set[0][0]
    retrieved_id = rag_result.result_set[0][1]
    similarity_score = rag_result.result_set[0][2]

    print(f"\n[RETRIEVED TEXT]:\n{retrieved_text[:200]}...")

    # 4. COMPARE (The Verdict)
    # Compare node IDs rather than text, so duplicate chunks cannot fake a match
    if retrieved_id == chunk_id:
        print("\n✅ SUCCESS: The System retrieved the EXACT source chunk.")
    else:
        print("\n❌ FAILURE: The System retrieved a different chunk.")

    # 5. GENERATE ANSWER
    answer_prompt = f"""
    Answer the question based ONLY on the Retrieved Text.
    Question: {question}
    Retrieved Text: {retrieved_text}
    """
    final_answer = client.generate(model='qwen2.5:32b', prompt=answer_prompt)['response']
    print(f"\n[PROFESSOR ANSWER]:\n{final_answer}")

if __name__ == "__main__":
    run_fidelity_test()

How to Analyze the Results

When you run this script, look for these three signs of quality:

1. The Retrieval Match (Binary Check)

  • Success: The [RETRIEVED TEXT] matches the [GROUND TRUTH CHUNK].
  • Fail: If it retrieved a different chunk, your Embedding Model (nomic-embed-text) might be struggling, or your chunks are too small or too similar to one another.
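One way to tell those two failure causes apart is to look beyond the top result. The sketch below assumes you have raised the `1` in the `queryNodes` call to a larger k and collected, for each trial, the source chunk ID plus the ranked list of retrieved IDs; `rank_of` and `recall_at_k` are hypothetical helper names.

```python
from typing import List, Optional, Sequence, Tuple


def rank_of(truth_id: int, ranked_ids: Sequence[int]) -> Optional[int]:
    """Return the 1-based rank of truth_id in ranked_ids, or None if it is absent."""
    for position, candidate in enumerate(ranked_ids, start=1):
        if candidate == truth_id:
            return position
    return None


def recall_at_k(trials: List[Tuple[int, Sequence[int]]], k: int) -> float:
    """Fraction of trials whose source chunk appears within the top k results."""
    if not trials:
        return 0.0
    hits = 0
    for truth_id, ranked_ids in trials:
        rank = rank_of(truth_id, ranked_ids)
        if rank is not None and rank <= k:
            hits += 1
    return hits / len(trials)
```

If the source chunk usually sits at rank 2 or 3, near-duplicate chunks are the likelier culprit; if it rarely appears in the top k at all, the embedding model or the generated questions deserve a closer look.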

2. The Vocabulary Check (Stylistic Check)

Compare the Professor Answer to the Ground Truth.

  • Original Book: "The phenomenon acts as a pervasive dampener on economic velocity."
  • Bad AI Answer: "It slows down the economy." (Genericized.)
  • Good AI Answer: "The author describes it as a 'pervasive dampener' on economic velocity." (Fidelity.)
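This comparison can be roughly automated by measuring how many of the answer's word pairs appear verbatim in the source chunk. The function below is a crude lexical heuristic, not a semantic measure, and `phrase_overlap` is a hypothetical name.

```python
import re
from typing import List, Set, Tuple


def _tokens(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def _ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of consecutive n-word sequences in tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def phrase_overlap(source: str, answer: str, n: int = 2) -> float:
    """Share of the answer's n-grams that occur verbatim in the source text."""
    answer_grams = _ngrams(_tokens(answer), n)
    if not answer_grams:
        return 0.0
    source_grams = _ngrams(_tokens(source), n)
    return len(answer_grams & source_grams) / len(answer_grams)


source = "The phenomenon acts as a pervasive dampener on economic velocity."
bad = "It slows down the economy."
good = "The author describes it as a 'pervasive dampener' on economic velocity."

print(phrase_overlap(source, bad))   # 0.0
print(phrase_overlap(source, good))  # 0.6
```

A low score only flags an answer for manual review; a faithful answer that legitimately summarizes will also score low.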

If the AI is genericizing, adjust your System Prompt in prompts.py.

  • Change: "Explain this concept."
  • To: "Explain this concept using the author's exact terminology. Do not paraphrase unique definitions."
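As a sketch of what that change might look like in code: the layout of prompts.py is not shown here, so the constant name `FIDELITY_SYSTEM_PROMPT` and the helper `build_answer_prompt` are assumptions for illustration only.

```python
# Hypothetical contents for prompts.py; the names are illustrative.
FIDELITY_SYSTEM_PROMPT = (
    "Explain this concept using the author's exact terminology. "
    "Do not paraphrase unique definitions."
)


def build_answer_prompt(question: str, retrieved_text: str) -> str:
    """Combine the fidelity instruction, the question, and the retrieved chunk."""
    return (
        f"{FIDELITY_SYSTEM_PROMPT}\n\n"
        f"Question: {question}\n"
        f"Retrieved Text: {retrieved_text}\n"
    )
```

Keeping the instruction in one constant means the test script and the app can share the same wording, so a fidelity regression points at the prompt rather than at a stray copy of it.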

3. The "Hallucination" Check

Does the AI bring in outside info?

  • Question: "What caused the crisis?"
  • Book: "Bad weather."
  • AI Answer: "Bad weather and poor government policy."
  • Verdict: Fail. The AI added "government policy" from its own training data. You must strictly prompt it: "Answer using ONLY the provided context."
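A first-pass version of this check can be done lexically: list the words in the answer that appear in neither the retrieved text nor the question. This is a rough heuristic that will also flag harmless paraphrases, so treat its output as a prompt for manual review. `unsupported_terms` and the small `STOPWORDS` set are hypothetical.

```python
import re
from typing import List

# A deliberately tiny stopword list; extend it for real use.
STOPWORDS = {
    "a", "an", "and", "the", "of", "to", "in", "on", "is", "was",
    "it", "as", "by", "or", "for", "with", "that", "this",
}


def _tokens(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unsupported_terms(answer: str, context: str, question: str = "") -> List[str]:
    """Return sorted answer words found in neither the context nor the question."""
    allowed = set(_tokens(context)) | set(_tokens(question)) | STOPWORDS
    return sorted({token for token in _tokens(answer) if token not in allowed})


print(unsupported_terms(
    answer="Bad weather and poor government policy.",
    context="Bad weather.",
    question="What caused the crisis?",
))  # ['government', 'policy', 'poor']
```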

Summary

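This test script checks whether your 2-Pass Pipeline (7B Miner -> 72B Manager) actually linked the concepts back to the raw text correctly. A single run covers only one random chunk, so repeat it across many chunks. If the Vector Search consistently hits the correct Chunk, and the answers keep the source wording without adding outside facts, your database is solid.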