Ollama
A tool for running open-source LLMs locally — download, configure, and serve models on your own hardware.
Ollama makes it easy to run large language models on local hardware. It handles model downloading, quantization, and serving behind an API. You can run models like Llama, Mistral, Qwen, and DeepSeek without sending data to external services.
Ollama provides two API styles: a native API (/api/chat) with features like think mode control, and an OpenAI-compatible API (/v1/chat/completions) for drop-in compatibility with existing tools. The native API is preferred for models like Qwen that need specific parameter control.
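The difference between the two styles can be sketched as request payloads. This is a minimal sketch, assuming Ollama's default address of `localhost:11434` and a model name like `qwen3`; the `think` field is a native-API extra that is not part of the OpenAI schema:

```python
OLLAMA_HOST = "http://localhost:11434"  # Ollama's default address (assumption)

def native_chat_request(model: str, prompt: str, think: bool = False):
    """Build a request for Ollama's native /api/chat endpoint.

    The native API accepts Ollama-specific fields such as `think`,
    which toggles a model's reasoning mode where supported.
    """
    url = f"{OLLAMA_HOST}/api/chat"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": think,   # native-only field, absent from the OpenAI schema
        "stream": False,
    }
    return url, payload

def openai_chat_request(model: str, prompt: str):
    """Build the same call against the OpenAI-compatible endpoint."""
    url = f"{OLLAMA_HOST}/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, payload

# Either payload is sent with a single POST, e.g.:
#   requests.post(url, json=payload).json()
url, payload = native_chat_request("qwen3", "Why is the sky blue?", think=False)
print(url)
```

Existing OpenAI SDK clients can point their base URL at `/v1` unchanged, while the native endpoint is the place to reach for model-specific controls.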
In hybrid architectures, Ollama handles tasks that work well with local models (simple generation, classification) while cloud providers handle tasks needing frontier capabilities (complex reasoning, tool use). SUBCORP uses Ollama as the first model in its fallback chain.
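A fallback chain of this kind can be sketched as a list of backends tried in order. This is an illustrative sketch, not SUBCORP's actual implementation; the backend names and the `local_llama`/`cloud_model` stand-ins are hypothetical:

```python
from typing import Callable

def generate_with_fallback(
    prompt: str,
    backends: list[tuple[str, Callable[[str], str]]],
) -> str:
    """Try each backend in order; return the first successful response.

    Each entry pairs a label with a callable that either returns text
    or raises on failure (connection refused, timeout, etc.).
    """
    errors = []
    for name, call in backends:
        try:
            return call(prompt)
        except Exception as exc:  # in practice, catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))

# Hypothetical wiring: a local Ollama model first, a cloud provider second.
def local_llama(prompt: str) -> str:
    raise ConnectionError("ollama not running")  # stand-in for a real call

def cloud_model(prompt: str) -> str:
    return f"cloud answer to: {prompt}"

print(generate_with_fallback("classify this ticket", [
    ("ollama/llama3", local_llama),
    ("cloud/frontier", cloud_model),
]))
```

Putting the local model first keeps cheap, privacy-sensitive traffic on-device and only escalates to the cloud when the local backend fails or is unavailable.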