Ollama: Local Large Language Model Execution
Ollama is a tool that allows users to run large language models (LLMs) locally on their machines without relying on cloud services. This ensures greater privacy, data control, and offline usage capabilities.
Key Features
- Local Model Execution: Install and run models such as Llama 3.3, DeepSeek-R1, Phi-4, Mistral, and Gemma 2 directly on your device.
- Cross-Platform Compatibility: Available for macOS, Linux, and Windows, making it accessible across multiple environments.
- Command Line Interface (CLI): Operates through the terminal or command prompt, offering efficient interaction with installed models.
- Privacy and Data Control: Since the tool runs locally, your data is not sent to external servers, ensuring enhanced security and privacy.
Installation and Basic Usage
1. Download and Install:
- Visit [Ollama Official Website](https://ollama.com/) and download the appropriate version for your operating system.
- Follow the installer instructions to complete the setup.
2. Using the Terminal:
- After installation, open your system's terminal or command prompt.
- Run models using simple commands. For example, to run the Mistral model, use:
~$ ollama run mistral
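If the installation succeeded, the CLI should now respond in the terminal. A minimal sanity check, assuming the ollama binary is on your PATH (mistral here is just one example from the model library):
# print the installed version
ollama --version
# download the model weights without starting a chat
ollama pull mistral
# start an interactive session with the downloaded model
ollama run mistral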
Supported Models
Ollama supports several popular large language models, including but not limited to:
- Llama (all versions)
- DeepSeek-R1
- Phi-4
- Mistral
- Gemma 2
Advantages of Ollama
- Offline Functionality: No internet connection is needed once models are installed.
- Data Security: Data remains on the local device, eliminating the risk of data breaches from cloud services.
- High Performance: Running models locally can offer faster responses depending on system specifications.
Model Library
Ollama supports a long list of models, available at [ollama.com/library](https://ollama.com/library).
Here are some example models that can be downloaded:
| Model | Parameters | Size | Download Command |
|---|---|---|---|
| DeepSeek-R1 | 7B | 4.7GB | `ollama run deepseek-r1` |
| DeepSeek-R1 | 671B | 404GB | `ollama run deepseek-r1:671b` |
| Llama 3.3 | 70B | 43GB | `ollama run llama3.3` |
| Llama 3.2 | 3B | 2.0GB | `ollama run llama3.2` |
| Llama 3.2 | 1B | 1.3GB | `ollama run llama3.2:1b` |
| Llama 3.2 Vision | 11B | 7.9GB | `ollama run llama3.2-vision` |
| Llama 3.2 Vision | 90B | 55GB | `ollama run llama3.2-vision:90b` |
| Llama 3.1 | 8B | 4.7GB | `ollama run llama3.1` |
| Llama 3.1 | 405B | 231GB | `ollama run llama3.1:405b` |
| Phi 4 | 14B | 9.1GB | `ollama run phi4` |
| Phi 3 Mini | 3.8B | 2.3GB | `ollama run phi3` |
| Gemma 2 | 2B | 1.6GB | `ollama run gemma2:2b` |
| Gemma 2 | 9B | 5.5GB | `ollama run gemma2` |
| Gemma 2 | 27B | 16GB | `ollama run gemma2:27b` |
| Mistral | 7B | 4.1GB | `ollama run mistral` |
| Moondream 2 | 1.4B | 829MB | `ollama run moondream` |
| Neural Chat | 7B | 4.1GB | `ollama run neural-chat` |
| Starling | 7B | 4.1GB | `ollama run starling-lm` |
| Code Llama | 7B | 3.8GB | `ollama run codellama` |
| Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored` |
| LLaVA | 7B | 4.5GB | `ollama run llava` |
| Solar | 10.7B | 6.1GB | `ollama run solar` |
Note
You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
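On machines close to the 8 GB minimum, the smaller tags from the table above are the safer starting point, for example (taken directly from the table):
# a ~1.6 GB model that fits comfortably within the 8 GB RAM guideline
ollama run gemma2:2b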
CLI Reference
Create a model
`ollama create` is used to create a model from a Modelfile.
ollama create mymodel -f ./Modelfile
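A Modelfile names a base model and optionally layers parameters and a system prompt on top of it. A minimal sketch (the base model, temperature value, and system prompt below are illustrative choices, not required values):
# ./Modelfile
FROM llama3.2
# override a sampling parameter of the base model
PARAMETER temperature 1
# bake a system prompt into the new model
SYSTEM "You are a concise assistant that answers in one or two sentences."
Once created with the command above, the new model can be started like any other: `ollama run mymodel`.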
Pull a model
ollama pull llama3.2
This command can also be used to update a local model. Only the diff will be pulled.
Remove a model
ollama rm llama3.2
Copy a model
ollama cp llama3.2 my-model
Multiline input
For multiline input, you can wrap text with """:
"""Hello,
... world!
... """
Output: I'm a basic program that prints the famous "Hello, world!" message to the console.
Multimodal models
ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png"
Output: The image features a yellow smiley face, which is likely the central focus of the picture.
Pass the prompt as an argument
ollama run llama3.2 "Summarize this file: $(cat README.md)"
Output: Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
Show model information
ollama show llama3.2
List models on your computer
ollama list
List which models are currently loaded
ollama ps
Stop a model which is currently running
ollama stop llama3.2
Start Ollama
`ollama serve` is used when you want to start Ollama without running the desktop application.
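Once the server is running, other programs can talk to it through Ollama's local HTTP API, which listens on port 11434 by default. A quick check from a second terminal (the model must already be pulled; llama3.2 is just an example):
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?"
}'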
Learn More
For more detailed information and tutorials, visit [Ollama's official website](https://ollama.com/) or check out this [video overview](https://www.youtube.com/watch?v=wxyDEqR4KxM).
