Ollama: Local Large Language Model Execution
Ollama is a tool that allows users to run large language models (LLMs) locally on their machines without relying on cloud services. This ensures greater privacy, data control, and offline usage capabilities.
Key Features
- Local Model Execution: Install and run models such as Llama 3.3, DeepSeek-R1, Phi-4, Mistral, and Gemma 2 directly on your device.
- Cross-Platform Compatibility: Available for macOS, Linux, and Windows, making it accessible across multiple environments.
- Command Line Interface (CLI): Operates through the terminal or command prompt, offering efficient interaction with installed models.
- Privacy and Data Control: Since the tool runs locally, your data is not sent to external servers, ensuring enhanced security and privacy.
Installation and Basic Usage
1. Download and Install:
- Visit [Ollama Official Website](https://ollama.com/) and download the appropriate version for your operating system.
- Follow the installer instructions to complete the setup.
2. Using the Terminal:
- After installation, open your system's terminal or command prompt.
- Run models using simple commands. For example, to run the Mistral model, use:
~$ ollama run mistral
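If the installation succeeded, the CLI should now respond in the terminal. A minimal sanity check, assuming the ollama binary is on your PATH (mistral here is just one example from the model library):
# print the installed version
ollama --version
# download the model weights without starting a chat
ollama pull mistral
# start an interactive session with the downloaded model
ollama run mistral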
Supported Models
Ollama supports several popular large language models, including but not limited to:
- Llama (all versions)
- DeepSeek-R1
- Phi-4
- Mistral
- Gemma 2
Advantages of Ollama
- Offline Functionality: No internet connection is needed once models are installed.
- Data Security: Data remains on the local device, eliminating the risk of data breaches from cloud services.
- High Performance: Running models locally can offer faster responses depending on system specifications.
Model Library
Ollama supports a long list of models, available at [ollama.com/library](https://ollama.com/library).
Here are some example models that can be downloaded:
| Model | Parameters | Size | Download Command |
|---|---|---|---|
| DeepSeek-R1 | 7B | 4.7GB | `ollama run deepseek-r1` |
| DeepSeek-R1 | 671B | 404GB | `ollama run deepseek-r1:671b` |
| Llama 3.3 | 70B | 43GB | `ollama run llama3.3` |
| Llama 3.2 | 3B | 2.0GB | `ollama run llama3.2` |
| Llama 3.2 | 1B | 1.3GB | `ollama run llama3.2:1b` |
| Llama 3.2 Vision | 11B | 7.9GB | `ollama run llama3.2-vision` |
| Llama 3.2 Vision | 90B | 55GB | `ollama run llama3.2-vision:90b` |
| Llama 3.1 | 8B | 4.7GB | `ollama run llama3.1` |
| Llama 3.1 | 405B | 231GB | `ollama run llama3.1:405b` |
| Phi 4 | 14B | 9.1GB | `ollama run phi4` |
| Phi 3 Mini | 3.8B | 2.3GB | `ollama run phi3` |
| Gemma 2 | 2B | 1.6GB | `ollama run gemma2:2b` |
| Gemma 2 | 9B | 5.5GB | `ollama run gemma2` |
| Gemma 2 | 27B | 16GB | `ollama run gemma2:27b` |
| Mistral | 7B | 4.1GB | `ollama run mistral` |
| Moondream 2 | 1.4B | 829MB | `ollama run moondream` |
| Neural Chat | 7B | 4.1GB | `ollama run neural-chat` |
| Starling | 7B | 4.1GB | `ollama run starling-lm` |
| Code Llama | 7B | 3.8GB | `ollama run codellama` |
| Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored` |
| LLaVA | 7B | 4.5GB | `ollama run llava` |
| Solar | 10.7B | 6.1GB | `ollama run solar` |
Note
You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
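On machines close to the 8 GB minimum, the smaller tags from the table above are the safer starting point, for example (taken directly from the table):
# a ~1.6 GB model that fits comfortably within the 8 GB RAM guideline
ollama run gemma2:2b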
CLI Reference
Create a model
`ollama create` is used to create a model from a Modelfile.
ollama create mymodel -f ./Modelfile
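A Modelfile names a base model and optionally layers parameters and a system prompt on top of it. A minimal sketch (the base model, temperature value, and system prompt below are illustrative choices, not required values):
# ./Modelfile
FROM llama3.2
# override a sampling parameter of the base model
PARAMETER temperature 1
# bake a system prompt into the new model
SYSTEM "You are a concise assistant that answers in one or two sentences."
Once created with the command above, the new model can be started like any other: `ollama run mymodel`.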
Pull a model
ollama pull llama3.2
This command can also be used to update a local model. Only the diff will be pulled.
Remove a model
ollama rm llama3.2
Copy a model
ollama cp llama3.2 my-model
Multiline input
For multiline input, you can wrap text with """:
"""Hello,
... world!
... """
Output: I'm a basic program that prints the famous "Hello, world!" message to the console.
Multimodal models
ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png"
Output: The image features a yellow smiley face, which is likely the central focus of the picture.
Pass the prompt as an argument
ollama run llama3.2 "Summarize this file: $(cat README.md)"
Output: Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
Show model information
ollama show llama3.2
List models on your computer
ollama list
List which models are currently loaded
ollama ps
Stop a model which is currently running
ollama stop llama3.2
Start Ollama
`ollama serve` is used when you want to start Ollama without running the desktop application.
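Once the server is running, other programs can talk to it through Ollama's local HTTP API, which listens on port 11434 by default. A quick check from a second terminal (the model must already be pulled; llama3.2 is just an example):
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?"
}'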
Learn More
For more detailed information and tutorials, visit [Ollama's official website](https://ollama.com/) or check out this [video overview](https://www.youtube.com/watch?v=wxyDEqR4KxM).
