2-Pass FalkorDB Verification
The best way to verify that your FalkorDB is preserving the original meaning (and not just hallucinating generic facts) is to run a "Citational Fidelity Test."
You don't just want the LLM to answer the question; you want it to prove it read the specific text chunk from the database.
Here is the strategy and the Python script to run this test using your 35B/32B model (e.g., Qwen 32B or Command R).
The Strategy: "The Closed-Book Loop"¶
- Select a Random Truth: Pull a random (:Chunk) of raw text directly from FalkorDB. This is your "Ground Truth."
- Generate a Question: Use an LLM to generate a specific question based only on that chunk.
- The Test: Feed that question into "The Professor" (your RAG pipeline).
- The Comparison: Check if "The Professor" retrieves that exact chunk and uses its specific phrasing.
The Python Test Script¶
You can ask Google Antigravity to generate this script for you. It automates the verification process.
File: test_fidelity.py
# The import path depends on the FalkorDB client package you have installed.
from falkordblite import FalkorDB
from ollama import Client

client = Client()
db = FalkorDB(dir='./my_db_data')
graph = db.select_graph('mind_palace')


def run_fidelity_test():
    print("--- STARTING FIDELITY TEST ---")

    # 1. GET GROUND TRUTH
    # Grab a random Chunk of text from the DB
    query = """
    MATCH (c:Chunk)
    RETURN c.text, id(c)
    ORDER BY rand() LIMIT 1
    """
    # Rows are read from the QueryResult's result_set (a list of row lists)
    rows = graph.query(query).result_set
    if not rows:
        print("No (:Chunk) nodes found; nothing to test.")
        return
    ground_truth_text, chunk_id = rows[0]
    print(f"\n[GROUND TRUTH CHUNK]:\n{ground_truth_text[:200]}...")

    # 2. GENERATE QUESTION (The Probe)
    # We ask the LLM to create a question that can ONLY be answered by this text
    probe_prompt = f"""
    Read this text. Generate a specific, difficult question about a detail mentioned here.
    Do not ask a general question. Ask about a specific relationship or definition.
    TEXT: {ground_truth_text}
    """
    question = client.generate(model='qwen2.5:32b', prompt=probe_prompt)['response']
    print(f"\n[GENERATED QUESTION]: {question}")

    # 3. RUN THE PROFESSOR (The Retrieval)
    # This simulates your App's logic (Vector + Graph Search)
    # A. Generate Vector for the question
    embedding = client.embeddings(model='nomic-embed-text', prompt=question)['embedding']

    # B. Query FalkorDB for the nearest Chunk (Simulated here)
    # The query vector is wrapped in vecf32() so the index receives a float32 vector
    rag_query = """
    CALL db.idx.vector.queryNodes('Chunk', 'embedding', 1, vecf32($vec))
    YIELD node, score
    RETURN node.text, id(node), score
    """
    rag_rows = graph.query(rag_query, params={'vec': embedding}).result_set
    if not rag_rows:
        print("\n❌ FAILURE: The vector index returned no results.")
        return
    retrieved_text, retrieved_id, score = rag_rows[0]
    print(f"\n[RETRIEVED TEXT] (score={score}):\n{retrieved_text[:200]}...")

    # 4. COMPARE (The Verdict)
    # Compare node IDs so two chunks with identical text are not confused
    if retrieved_id == chunk_id:
        print("\n✅ SUCCESS: The System retrieved the EXACT source chunk.")
    else:
        print("\n❌ FAILURE: The System retrieved a different chunk.")

    # 5. GENERATE ANSWER
    answer_prompt = f"""
    Answer the question based ONLY on the Retrieved Text.
    Question: {question}
    Retrieved Text: {retrieved_text}
    """
    final_answer = client.generate(model='qwen2.5:32b', prompt=answer_prompt)['response']
    print(f"\n[PROFESSOR ANSWER]:\n{final_answer}")


if __name__ == "__main__":
    run_fidelity_test()
How to Analyze the Results¶
When you run this script, look for these three signs of quality:
1. The Retrieval Match (Binary Check)¶
- Success: The [RETRIEVED TEXT] matches the [GROUND TRUTH CHUNK].
- Fail: If it retrieved a different chunk, your Embedding Model (nomic-embed-text) might be struggling, or your chunks are too small/similar.
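A single binary check covers only one random chunk, so it helps to repeat it and report a hit rate. The sketch below is one possible way to do that, not part of the original script. `retrieval_hit_rate` and its `retrieve` argument are hypothetical names: `retrieve` stands for any callable that takes a chunk ID, generates a probe question for that chunk, runs the vector search, and returns the IDs it retrieved.

```python
import random
from typing import Callable, Sequence


def retrieval_hit_rate(
    chunk_ids: Sequence[int],
    retrieve: Callable[[int], Sequence[int]],
    trials: int = 20,
    seed: int = 0,
) -> float:
    """Return the fraction of sampled chunks whose own ID is among the retrieved IDs."""
    ids = list(chunk_ids)
    if not ids:
        raise ValueError("chunk_ids must not be empty")
    rng = random.Random(seed)
    # Sample without replacement so no chunk is tested twice
    sample = rng.sample(ids, k=min(trials, len(ids)))
    hits = sum(1 for cid in sample if cid in retrieve(cid))
    return hits / len(sample)


# Stand-in retriever for demonstration only: it "misses" every fourth chunk.
def fake_retrieve(cid: int) -> list:
    return [cid] if cid % 4 else [cid + 1]


rate = retrieval_hit_rate(range(1, 9), fake_retrieve, trials=8)
print(f"hit rate: {rate:.2f}")
```

In a real run, `retrieve` would wrap steps 2 and 3 of the test script. A low hit rate points at the embedding model or the chunking, as described above.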
2. The Vocabulary Check (Stylistic Check)¶
Compare the Professor Answer to the Ground Truth.
* Original Book: "The phenomenon acts as a pervasive dampener on economic velocity."
* Bad AI Answer: "It slows down the economy." (Genericized).
* Good AI Answer: "The author describes it as a 'pervasive dampener' on economic velocity." (Fidelity).
If the AI is genericizing:
You need to adjust your System Prompt in prompts.py.
* Change: "Explain this concept."
* To: "Explain this concept using the author's exact terminology. Do not paraphrase unique definitions."
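The vocabulary check can be roughly automated by measuring how many of the source's word pairs reappear verbatim in the answer. This is a minimal sketch under that assumption. `phrase_reuse` and `word_ngrams` are hypothetical helpers, and bigram overlap is a crude proxy for "uses the author's terminology", not a definitive metric.

```python
import re


def word_ngrams(text: str, n: int) -> set:
    """Lowercase the text, keep alphanumeric words, and return its word n-grams."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def phrase_reuse(answer: str, source: str, n: int = 2) -> float:
    """Share of the source's word n-grams that reappear verbatim in the answer."""
    source_grams = word_ngrams(source, n)
    if not source_grams:
        return 0.0
    return len(source_grams & word_ngrams(answer, n)) / len(source_grams)


source = "The phenomenon acts as a pervasive dampener on economic velocity."
bad = "It slows down the economy."
good = "The author describes it as a 'pervasive dampener' on economic velocity."

print(f"bad:  {phrase_reuse(bad, source):.2f}")
print(f"good: {phrase_reuse(good, source):.2f}")
```

On the example sentences, the genericized answer shares no bigrams with the source, while the faithful answer reuses six of the nine. Any pass/fail threshold would need tuning on your own chunks.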
3. The "Hallucination" Check¶
Does the AI bring in outside info?
* Question: "What caused the crisis?"
* Book: "Bad weather."
* AI Answer: "Bad weather and poor government policy."
* Verdict: Fail. The AI added "Government Policy" from its own training data. You must strictly prompt it: "Answer using ONLY the provided context."
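A rough first pass at this check is to flag content words in the answer that never occur in the retrieved text. The sketch below assumes a purely lexical comparison. `unsupported_terms` and the small `STOPWORDS` set are hypothetical, and legitimate paraphrases will produce false positives, so treat flagged words as prompts for manual review rather than a verdict.

```python
import re

# Tiny illustrative stopword list; a real run would likely need a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "is", "was", "it", "by", "or"}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unsupported_terms(answer: str, context: str) -> list:
    """Return answer words (in order, deduplicated) that do not occur in the context."""
    context_words = set(tokenize(context))
    seen = set()
    flagged = []
    for word in tokenize(answer):
        if word in STOPWORDS or word in context_words or word in seen:
            continue
        seen.add(word)
        flagged.append(word)
    return flagged


flagged = unsupported_terms("Bad weather and poor government policy.", "Bad weather.")
print(flagged)
```

For the example above this flags `poor`, `government`, and `policy`. Passing the question together with the retrieved text as `context` avoids flagging words the answer merely echoes from the question.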
Summary¶
This test script checks whether your 2-Pass Pipeline (7B Miner -> 72B Manager) actually linked the concepts back to the raw text correctly. A single run samples only one chunk, so repeat it several times. If the Vector Search hits the correct Chunk consistently, your database is solid.