Using Microsoft GraphRAG locally by LLM
Yes, you can configure Microsoft GraphRAG to run entirely locally using a tool like Ollama (or LM Studio) to serve the LLM and embedding models.
The "trick" is to configure GraphRAG to think it is talking to OpenAI, but point it to your local server instead.
Prerequisites¶
- Install Ollama: Download from ollama.com.
- Pull Your Models: Run these commands in your terminal to get an LLM and an embedding model.
- Install GraphRAG:
Step-by-Step Configuration¶
1. Initialize Your Project¶
Create a folder for your project and run the initialization command.
mkdir -p ./ragtest/input
# (Put your .txt files inside ./ragtest/input now)
python -m graphrag.index --init --root ./ragtest
./ragtest: .env and settings.yaml.
2. Edit settings.yaml¶
Open ./ragtest/settings.yaml and modify the LLM and Embeddings sections to point to your local Ollama instance.
Important: Ollama's OpenAI-compatible endpoint is typically http://localhost:11434/v1.
encoding_model: cl100k_base
skip_workflows: []
llm:
api_key: ollama # Can be any string, but must be present
type: openai_chat # GraphRAG thinks it's OpenAI
model: llama3.2 # Must match your ollama model name exactly
model_supports_json: true # Llama 3+ supports JSON mode
# max_tokens: 4000
# request_timeout: 180.0
api_base: http://localhost:11434/v1 # Point to local Ollama
parallelization:
stagger: 0.3
# num_threads: 50 # Reduce this if your PC lags (e.g., set to 4 or 8)
embeddings:
async_mode: threaded # or asyncio
llm:
api_key: ollama
type: openai_embedding # GraphRAG thinks it's OpenAI
model: nomic-embed-text # Must match your ollama model name
api_base: http://localhost:11434/v1
3. Run the Indexing¶
This is the heavy lifting where GraphRAG reads your documents and builds the knowledge graph.
Note: This process is much slower than standard RAG. On a local machine, indexing even a small book can take 10-20 minutes depending on your GPU.4. Query Your Data¶
Once indexing finishes (you'll see a success message), you can ask questions.
Global Search (High-level themes):
python -m graphrag.query \
--root ./ragtest \
--method global \
"What are the main themes in this text?"
Local Search (Specific details):
python -m graphrag.query \
--root ./ragtest \
--method local \
"Who is the character John and what did he do?"
Troubleshooting Common Issues¶
- Context Window Errors: If you see errors about "context length," your local model might have a smaller context window than the default OpenAI settings. In
settings.yaml, try reducing thechunks: size(default is 1200) to something like1024or512. - JSON Errors: If the indexing fails during the "Entity Extraction" phase, it usually means the local model isn't outputting valid JSON. Ensure you are using a smart model (like
llama3ormistral) andmodel_supports_json: trueis set. - Performance: GraphRAG makes thousands of API calls. If your computer freezes, reduce
num_threadsinsettings.yamlto4or2.