Update QWen LLM to Speed up Teaching
Since your app relies on FalkorDB to provide the facts, the LLM does not need to "know" the physics or history itself; it just needs to be smart enough to read the context and follow instructions (teach/grade).
This lowers the intelligence requirement significantly. You do not need the massive 72B model.
The fastest viable model for your M1 Max (64GB) is:
The Winner: Qwen2.5-14B-Instruct¶
- Speed on M1 Max: ~40 to 50 tokens/second.
- Experience: This generates text faster than you can comfortably read. It feels "Instant."
- Size: ~9 GB (at 4-bit quantization).
- Why it works:
- RAG Proficiency: The 14B model is excellent at "Context Adherence." If FalkorDB gives it a chunk of text, it sticks to it without hallucinating.
- Instruction Following: It is smart enough to output the valid JSON needed for your UI (Mermaid diagrams, Grading scores), which smaller models (7B/3B) often mess up.
How to get it:
The "Speed Demon" Alternative: Qwen2.5-7B-Instruct¶
If you want "Video Game Speed" (where the text flies onto the screen), you can drop down to the 7B model.
- Speed on M1 Max: ~80 to 100+ tokens/second.
- Size: ~5 GB.
- The Risk: It is "chattier." It might sometimes ignore your instruction to "be concise" or it might make small syntax errors in the JSON grading.
- Use Case: Use this for the Lectures (Phase 2), but stick to the 14B for the Grading (Phase 3) where logic matters more than speed.
My Recommendation¶
Start with Qwen2.5-14B. It is the perfect balance. It is faster than human speech (great for TTS) but smart enough to act like a Professor.
Update your app.py or prompts.py to point to:
model="qwen2.5:14b"