Ollama is a tool that allows users to run large language models (LLMs) locally on their machines without relying on cloud services. This ensures greater privacy, data control, and offline usage capabilities.
1. Download and Install: Download the installer for macOS, Windows, or Linux from [ollama.com](https://ollama.com/) and follow the setup instructions.
2. Using the Terminal: Once installed, open a terminal and run a model directly. For example, to download and start chatting with Mistral:
~$ ollama run mistral
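On Linux, Ollama can also be installed from the terminal with its install script (check [ollama.com](https://ollama.com/) for the current command before piping anything into a shell). A typical install and first run looks like this:

```shell
# Download and run the official Linux install script
curl -fsSL https://ollama.com/install.sh | sh

# Then pull and chat with a model, e.g. Mistral
ollama run mistral
```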
Ollama supports a wide range of popular large language models; the full list is available at [ollama.com/library](https://ollama.com/library). Here are some example models that can be downloaded:
| Model | Parameters | Size | Download Command |
|---|---|---|---|
| DeepSeek-R1 | 7B | 4.7GB | `ollama run deepseek-r1` |
| DeepSeek-R1 | 671B | 404GB | `ollama run deepseek-r1:671b` |
| Llama 3.3 | 70B | 43GB | `ollama run llama3.3` |
| Llama 3.2 | 3B | 2.0GB | `ollama run llama3.2` |
| Llama 3.2 | 1B | 1.3GB | `ollama run llama3.2:1b` |
| Llama 3.2 Vision | 11B | 7.9GB | `ollama run llama3.2-vision` |
| Llama 3.2 Vision | 90B | 55GB | `ollama run llama3.2-vision:90b` |
| Llama 3.1 | 8B | 4.7GB | `ollama run llama3.1` |
| Llama 3.1 | 405B | 231GB | `ollama run llama3.1:405b` |
| Phi 4 | 14B | 9.1GB | `ollama run phi4` |
| Phi 3 Mini | 3.8B | 2.3GB | `ollama run phi3` |
| Gemma 2 | 2B | 1.6GB | `ollama run gemma2:2b` |
| Gemma 2 | 9B | 5.5GB | `ollama run gemma2` |
| Gemma 2 | 27B | 16GB | `ollama run gemma2:27b` |
| Mistral | 7B | 4.1GB | `ollama run mistral` |
| Moondream 2 | 1.4B | 829MB | `ollama run moondream` |
| Neural Chat | 7B | 4.1GB | `ollama run neural-chat` |
| Starling | 7B | 4.1GB | `ollama run starling-lm` |
| Code Llama | 7B | 3.8GB | `ollama run codellama` |
| Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored` |
| LLaVA | 7B | 4.5GB | `ollama run llava` |
| Solar | 10.7B | 6.1GB | `ollama run solar` |
You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
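These are general guidelines; if you want to check how much memory is free before pulling a larger model, an ordinary OS check (not an Ollama command) is enough:

```shell
# Linux: show total and available memory in human-readable units
free -h

# macOS: report total physical memory in bytes
sysctl hw.memsize
```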
Create a model
`ollama create` is used to create a model from a Modelfile.
ollama create mymodel -f ./Modelfile
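As a minimal sketch, the Modelfile below builds a custom model on top of llama3.2 with a lower temperature and a system prompt; the base model, parameter value, and system text are illustrative choices, not requirements:

```shell
# Write a minimal Modelfile, then build and run a custom model from it
cat > Modelfile <<'EOF'
FROM llama3.2
PARAMETER temperature 0.7
SYSTEM "You are a concise assistant that answers in one or two sentences."
EOF

ollama create mymodel -f ./Modelfile
ollama run mymodel
```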
Pull a model
ollama pull llama3.2
This command can also be used to update a local model. Only the diff will be pulled.
Remove a model
ollama rm llama3.2
Copy a model
ollama cp llama3.2 my-model
Multiline input
For multiline input, you can wrap text with `"""`:

    >>> """Hello,
    ... world!
    ... """
    I'm a basic program that prints the famous "Hello, world!" message to the console.
Multimodal models
ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png"
Output: The image features a yellow smiley face, which is likely the central focus of the picture.
Pass the prompt as an argument
ollama run llama3.2 "Summarize this file: $(cat README.md)"
Show model information
ollama show llama3.2
List models on your computer
ollama list
List which models are currently loaded
ollama ps
Stop a model which is currently running
ollama stop llama3.2
Start Ollama
`ollama serve` is used when you want to start Ollama without running the desktop application.
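Once the server is running, clients talk to it over a local REST API (by default on port 11434). As a minimal sketch, assuming llama3.2 has already been pulled, a one-off request with curl looks like this:

```shell
# In one terminal: start the Ollama server (default address is 127.0.0.1:11434)
ollama serve

# In another terminal: request a single completion from the local REST API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```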
Setting Environment Variables on Linux
If Ollama is run as a systemd service, environment variables should be set using systemctl:
1. Edit the Ollama Service File: Open the Ollama service configuration file with the following command:
sudo systemctl edit ollama.service
2. Add the Environment Variable: In the editor, add the following lines under the [Service] section:

    [Service]
    Environment="OLLAMA_HOST=0.0.0.0"
Note #1: Sometimes 0.0.0.0 does not work due to your environment setup. Instead, you can try setting it to your machine's local IP address (for example, 10.0.0.x) or local hostname (for example, xxx.local).
Note #2: Place these lines above the comment `### Lines below this comment will be discarded`. The override file should look something like this:
    ### Editing /etc/systemd/system/ollama.service.d/override.conf
    ### Anything between here and the comment below will become the new contents of the file

    [Service]
    Environment="OLLAMA_HOST=0.0.0.0"

    ### Lines below this comment will be discarded

    ### /etc/systemd/system/ollama.service
    # [Unit]
    # Description=Ollama Service
    # After=network-online.target
    #
    # [Service]
    # ExecStart=/usr/local/bin/ollama serve
    # User=ollama
    # Group=ollama
    # Restart=always
    # RestartSec=3
    # Environment="PATH=/home/kimi/.nvm/versions/node/v20.5.0/bin:/home/kimi/.local/share/pnpm:/usr/local/sbin:/usr/local/bin:/usr/s>
    #
    # [Install]
    # WantedBy=default.target
3. Restart the Service: After editing the file, reload the systemd daemon and restart the Ollama service:

sudo systemctl daemon-reload
sudo systemctl restart ollama
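To verify the change, you can check that the service picked up the override and that the API answers on the exposed address; `<server-ip>` below is a placeholder for whatever address you pointed OLLAMA_HOST at:

```shell
# Confirm the override is active for the service
systemctl show ollama --property=Environment

# Check that the API now responds on the exposed address (replace <server-ip>)
curl http://<server-ip>:11434/api/tags
```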
For more detailed information and tutorials, visit [Ollama's official website](https://ollama.com/) or check out this [video overview](https://www.youtube.com/watch?v=wxyDEqR4KxM).