Using Microsoft GraphRAG locally by LLM

Yes, you can configure Microsoft GraphRAG to run entirely locally using a tool like Ollama (or LM Studio) to serve the LLM and embedding models.

The "trick" is to configure GraphRAG to think it is talking to OpenAI, but point it to your local server instead.

Prerequisites¶

Install Ollama: Download from ollama.com.

Pull Your Models: Run these commands in your terminal to get an LLM and an embedding model.

ollama pull llama3.2  # The LLM (Brain)
ollama pull nomic-embed-text  # The Embedding Model (Memory)

Install GraphRAG:
```
pip install graphrag
```

Step-by-Step Configuration¶

1. Initialize Your Project¶

Create a folder for your project and run the initialization command.

mkdir -p ./ragtest/input
# (Put your .txt files inside ./ragtest/input now)
python -m graphrag.index --init --root ./ragtest

This creates two critical files in ./ragtest: .env and settings.yaml.

2. Edit `settings.yaml`¶

Open ./ragtest/settings.yaml and modify the LLM and Embeddings sections to point to your local Ollama instance.

Important: Ollama's OpenAI-compatible endpoint is typically http://localhost:11434/v1.

encoding_model: cl100k_base
skip_workflows: []

llm:
  api_key: ollama  # Can be any string, but must be present
  type: openai_chat # GraphRAG thinks it's OpenAI
  model: llama3.2 # Must match your ollama model name exactly
  model_supports_json: true # Llama 3+ supports JSON mode
  # max_tokens: 4000 
  # request_timeout: 180.0
  api_base: http://localhost:11434/v1 # Point to local Ollama

parallelization:
  stagger: 0.3
  # num_threads: 50 # Reduce this if your PC lags (e.g., set to 4 or 8)

embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: ollama
    type: openai_embedding # GraphRAG thinks it's OpenAI
    model: nomic-embed-text # Must match your ollama model name
    api_base: http://localhost:11434/v1

3. Run the Indexing¶

This is the heavy lifting where GraphRAG reads your documents and builds the knowledge graph.

python -m graphrag.index --root ./ragtest

Note: This process is much slower than standard RAG. On a local machine, indexing even a small book can take 10-20 minutes depending on your GPU.

4. Query Your Data¶

Once indexing finishes (you'll see a success message), you can ask questions.

Global Search (High-level themes):

python -m graphrag.query \
--root ./ragtest \
--method global \
"What are the main themes in this text?"

Local Search (Specific details):

python -m graphrag.query \
--root ./ragtest \
--method local \
"Who is the character John and what did he do?"

Troubleshooting Common Issues¶

Context Window Errors: If you see errors about "context length," your local model might have a smaller context window than the default OpenAI settings. In settings.yaml, try reducing the chunks: size (default is 1200) to something like 1024 or 512.
JSON Errors: If the indexing fails during the "Entity Extraction" phase, it usually means the local model isn't outputting valid JSON. Ensure you are using a smart model (like llama3 or mistral) and model_supports_json: true is set.
Performance: GraphRAG makes thousands of API calls. If your computer freezes, reduce num_threads in settings.yaml to 4 or 2.