Ollama — User Guide

Local open models.

Strengths
  • Minimal installation; a single command runs a large model
  • Supports mainstream models such as Llama 3, Mistral, Qwen, and DeepSeek
  • Data is processed entirely on your own machine, so prompts and outputs stay local
  • Provides a REST API for easy integration into your own applications
  • Completely free, with no token limits and no network dependency once a model is downloaded
Best for
  • Running an AI assistant locally to protect data privacy
  • Developing and testing AI applications (local API service)
  • Using AI offline (no internet connection required after the model is downloaded)
  • Enterprise intranet deployment where data must stay on-premises and in-country
  • Learning about and researching open-source large models
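
The REST API listed under Strengths can be sketched roughly as follows. This is a minimal example, assuming the Ollama server is running on its default local port (11434) and that the llama3.2 model has already been pulled. The helper names `build_generate_payload` and `ask_ollama` and the `OLLAMA_URL` variable are hypothetical, introduced only for illustration.

```shell
# Assumption: Ollama is serving on its default address; override OLLAMA_URL if not.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Build the JSON body for one non-streaming completion request.
# Hypothetical helper: it does no escaping, so the prompt must not
# contain double quotes or backslashes.
build_generate_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

# POST the body to the /api/generate endpoint and print the raw JSON reply.
ask_ollama() {
  curl -s "$OLLAMA_URL/api/generate" -d "$(build_generate_payload "$1" "$2")"
}

# Usage (needs a running server and a pulled model):
#   ask_ollama llama3.2 "Why is the sky blue?"
```

With "stream" set to false the server returns a single JSON object instead of a stream of partial responses, which is usually easier to handle in a quick script.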

Installation and quick start

Ollama is extremely easy to install and you can run large models locally in minutes.

Scenario

Install Ollama and run your first model

Prompt example
# Step 1: Install Ollama
# macOS/Linux:
curl -fsSL https://ollama.com/install.sh | sh

# Windows:
# Visit https://ollama.com/download to download the installation package

# Step 2: Run the model (it is downloaded automatically on first use)
ollama run llama3.2

# Step 3: Start a conversation
# Once the model has loaded, type your question directly in the terminal
Output / what to expect
Ollama automatically downloads the Llama 3.2 model (approximately 2 GB). Once the download completes, it drops straight into chat mode, where you can converse in the terminal much as you would with ChatGPT. Type /bye to exit the conversation.
Tips

The first run has to download the model, so it takes longer than later runs. Choose a model size that suits your hardware (a 7B model needs about 8 GB of memory).
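
Before the first run, it can help to confirm that the install worked. The sketch below assumes a POSIX shell; `check_ollama` is a hypothetical helper name, and the one-off prompt in the trailing comment is only an illustration.

```shell
# Report whether the ollama CLI is on PATH, and print its version if so.
check_ollama() {
  if command -v ollama >/dev/null 2>&1; then
    ollama --version
  else
    echo "ollama not found on PATH"
  fi
}

check_ollama

# One-off, non-interactive prompt (downloads llama3.2 first if needed):
#   ollama run llama3.2 "Explain what a context window is in one sentence."
```

Passing the prompt as an argument makes `ollama run` print one answer and exit instead of opening the interactive chat, which is convenient in scripts.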

Scenario

View available models and choose the right one

Prompt example
# View installed models
ollama list

# Browse available models at ollama.com/library
# Commonly recommended models:

# General dialogue (Chinese and English):
ollama pull qwen2.5:7b # Alibaba's Qwen (Tongyi Qianwen), strong in Chinese
ollama pull deepseek-r1:7b # DeepSeek reasoning model

# Code generation:
ollama pull qwen2.5-coder:7b # Code-specific model
ollama pull codellama:7b # Meta's code model

# Lightweight (low-spec machines):
ollama pull llama3.2:1b # 1B parameters, runs on very modest hardware
Output / what to expect

Choose the appropriate model based on your hardware configuration:

  • 8 GB memory: choose a 7B-parameter model
  • 16 GB memory: can run a 13B-parameter model
  • 32 GB+ memory: can run a 34B-parameter model
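
The sizing rule above can be sketched as a small shell function. The thresholds restate the list; the 1B fallback for machines with less than 8 GB is an assumption based on the lightweight recommendation in this guide, and `suggest_model_size` is a hypothetical helper name.

```shell
# Map available memory (whole GB) to the largest model size suggested above.
# Assumption: anything below 8 GB falls back to a 1B model.
suggest_model_size() {
  if [ "$1" -ge 32 ]; then
    echo "34B"
  elif [ "$1" -ge 16 ]; then
    echo "13B"
  elif [ "$1" -ge 8 ]; then
    echo "7B"
  else
    echo "1B"
  fi
}

suggest_model_size 16   # prints 13B
```

Treat the result as a starting point only: quantization level and context length both change how much memory a model actually needs.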
Tips

For Chinese-language tasks, the Qwen series is recommended; for coding tasks, Qwen-Coder or DeepSeek-Coder is a good choice.
