Other Options for Local LLMs and Hermes

With your M5 Max and 128GB of Unified Memory, you are in a rare tier of hardware. You can run the absolute best open-weight models in the world.

If you want to experiment with models other than Qwen for this specific job (writing complex JSON schemas and executing API tools), here are the best candidate models you can easily run locally via Ollama:

  1. Hermes 3 (Llama 3.1 - 70B)

  2. Why it's perfect: This model was fine-tuned specifically by Nous Research—the exact same team that built your Hermes Agent. It is quite literally designed to be the native "brain" for the software you are using. It is uniquely trained on advanced function calling, strict JSON output, and multi-step agentic reasoning.

  3. Download Size: ~40 GB (Leaves you with a massive amount of memory for context window).
  4. How to run: ollama run hermes3:70b

  5. Command R+ (104B)

  6. Why it's perfect: Built by Cohere, this model was designed from the ground up specifically for enterprise tool use, API calling, and RAG (Retrieval-Augmented Generation). It is widely considered one of the most reliable open models for strictly following formatting rules (like n8n's complex JSON structure) without hallucinating.

  7. Download Size: ~60 GB.
  8. How to run: ollama run command-r-plus

  9. Mixtral 8x22B Instruct

  10. Why it's perfect: This is a massive "Mixture of Experts" (MoE) model by Mistral. It has 141 Billion total parameters, but like the Qwen MoE, it only uses a fraction of them at a time (39B). It is exceptionally fast on Apple Silicon and scores incredibly high on coding benchmarks, making it highly capable of writing n8n workflows.

  11. Download Size: ~80 GB (Will fit beautifully in your 128GB RAM).
  12. How to run: ollama run mixtral:8x22b

  13. DeepSeek Coder V2 (or equivalent coding models)

  14. Why it's perfect: While general-purpose models are great, specialized "Coder" models are trained almost exclusively on syntax, brackets, and API structures. If your agent is acting entirely as a backend n8n engineer, a large coding model will almost never make a JSON syntax error.

  15. How to run: ollama run deepseek-coder-v2

Which one should you choose? Because you are using the Hermes Agent framework, Hermes 3 (Llama 3.1 70B) is the most logical first choice. Its system prompts and tool-calling structures are perfectly aligned with the Docker container you just set up. If that model struggles with a specific n8n workflow, you can hot-swap to Command R+ or your Qwen model just by changing the name in your config file!