The Goal

The only Microsoft Windows machine I still run is my gaming machine: an NZXT Player 3 with an Intel Core i7-13700KF, 32 GB of DDR5 RAM, and an NVIDIA GeForce RTX 4070 Ti Super. What follows is how I started using it to locally host Ollama generative AI workloads and the Open WebUI interface.

Get everything running and accessible locally

Install Podman and the NVIDIA Container Toolkit

Download and install Podman Desktop from https://podman-desktop.io/downloads. After Podman Desktop is installed and Podman Machine is running, we need to install the NVIDIA Container Toolkit in the Podman Machine. Open a terminal and execute the following:

$ podman machine ssh
Connecting to vm podman-machine-default. To close connection, use `~.` or `exit`
Last login: Sat May 18 15:38:17 2024 from ::1
$ curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
[nvidia-container-toolkit]
name=nvidia-container-toolkit
baseurl=https://nvidia.github.io/libnvidia-container/stable/rpm/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://nvidia.github.io/libnvidia-container/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
 
[nvidia-container-toolkit-experimental]
name=nvidia-container-toolkit-experimental
baseurl=https://nvidia.github.io/libnvidia-container/experimental/rpm/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=0
gpgkey=https://nvidia.github.io/libnvidia-container/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
$ sudo yum install -y nvidia-container-toolkit
Unable to read consumer identity
 
This system is not registered with an entitlement server. You can use subscription-manager to register.
 
Last metadata expiration check: 0:00:29 ago on Sat May 18 15:43:04 2024.
Dependencies resolved.
========================================================================================================================
 Package                                 Architecture     Version              Repository                          Size
========================================================================================================================
Installing:
 nvidia-container-toolkit                x86_64           1.15.0-1             nvidia-container-toolkit           1.0 M
Installing dependencies:
 libnvidia-container-tools               x86_64           1.15.0-1             nvidia-container-toolkit            39 k
 libnvidia-container1                    x86_64           1.15.0-1             nvidia-container-toolkit           1.0 M
 nvidia-container-toolkit-base           x86_64           1.15.0-1             nvidia-container-toolkit           3.6 M
 
Transaction Summary
========================================================================================================================
Install  4 Packages
 
Total download size: 5.6 M
Installed size: 17 M
Downloading Packages:
(1/4): libnvidia-container-tools-1.15.0-1.x86_64.rpm                                    369 kB/s |  39 kB     00:00
(2/4): libnvidia-container1-1.15.0-1.x86_64.rpm                                         2.7 MB/s | 1.0 MB     00:00
(3/4): nvidia-container-toolkit-1.15.0-1.x86_64.rpm                                     1.6 MB/s | 1.0 MB     00:00
(4/4): nvidia-container-toolkit-base-1.15.0-1.x86_64.rpm                                6.1 MB/s | 3.6 MB     00:00
------------------------------------------------------------------------------------------------------------------------
Total                                                                                   8.1 MB/s | 5.6 MB     00:00
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                                                1/1
  Installing       : nvidia-container-toolkit-base-1.15.0-1.x86_64                                                  1/4
  Installing       : libnvidia-container1-1.15.0-1.x86_64                                                           2/4
  Running scriptlet: libnvidia-container1-1.15.0-1.x86_64                                                           2/4
/sbin/ldconfig: /usr/lib/wsl/lib/libcuda.so.1 is not a symbolic link
 
 
  Installing       : libnvidia-container-tools-1.15.0-1.x86_64                                                      3/4
  Installing       : nvidia-container-toolkit-1.15.0-1.x86_64                                                       4/4
  Running scriptlet: nvidia-container-toolkit-1.15.0-1.x86_64                                                       4/4
/usr/sbin/ldconfig: /usr/lib/wsl/lib/libcuda.so.1 is not a symbolic link
 
Installed products updated.
 
Installed:
  libnvidia-container-tools-1.15.0-1.x86_64                libnvidia-container1-1.15.0-1.x86_64
  nvidia-container-toolkit-1.15.0-1.x86_64                 nvidia-container-toolkit-base-1.15.0-1.x86_64
 
Complete!
$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
INFO[0000] Auto-detected mode as "wsl"
INFO[0000] Selecting /dev/dxg as /dev/dxg
INFO[0000] Using WSL driver store paths: [/usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a]
INFO[0000] Selecting /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libcuda.so.1.1 as /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libcuda.so.1.1
INFO[0000] Selecting /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libcuda_loader.so as /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libcuda_loader.so
INFO[0000] Selecting /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libnvidia-ptxjitcompiler.so.1 as /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libnvidia-ptxjitcompiler.so.1
INFO[0000] Selecting /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libnvidia-ml.so.1 as /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libnvidia-ml.so.1
INFO[0000] Selecting /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libnvidia-ml_loader.so as /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libnvidia-ml_loader.so
INFO[0000] Selecting /usr/lib/wsl/lib/libdxcore.so as /usr/lib/wsl/lib/libdxcore.so
WARN[0000] Could not locate libnvdxgdmal.so.1: pattern libnvdxgdmal.so.1 not found
INFO[0000] Selecting /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/nvcubins.bin as /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/nvcubins.bin
INFO[0000] Selecting /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/nvidia-smi as /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/nvidia-smi
INFO[0000] Generated CDI spec with version 0.3.0

The NVIDIA Container Toolkit should now be installed, but let's test it.

$ nvidia-ctk cdi list
INFO[0000] Found 1 CDI devices
nvidia.com/gpu=all
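
As one more sanity check, the CDI device can be handed to a throwaway container. This is a variation of the sample test NVIDIA documents for Podman; the ubuntu image here is just a convenient, arbitrary choice:

$ podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable docker.io/library/ubuntu nvidia-smi -L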

This next command will list all of the NVIDIA GPUs found.

$ /usr/lib/wsl/lib/nvidia-smi
Sat May 18 15:47:49 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.76.01              Driver Version: 552.44         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 Ti     On  |   00000000:01:00.0  On |                  N/A |
|  0%   32C    P8              7W /  285W |    1377MiB /  12282MiB |     12%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
 
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Note

I am continuing to execute podman from the Podman Machine's shell so I can use Linux tools to dynamically calculate the maximum CPUs and RAM given to the pod. Otherwise, the following podman commands could be executed from a Windows terminal.

Create AI Pod and Containers

The AI Pod and Network

Change the following variables to match the maximum percentage of CPUs and RAM the pod is allowed to consume, as well as the blkio weight value. In my example I'm setting the CPU and RAM limits both to 75% and the blkio weight to 10.

PERCENT_CPU=75
PERCENT_RAM=75
BLKIO_WEIGHT=10
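
For reference, the pod-creation command below derives its limits from nproc and /proc/meminfo, so you can preview the raw inputs yourself (your values will obviously differ from mine):

$ nproc
$ grep MemTotal /proc/meminfo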

I am going to create an isolated network for this pod.

$ podman network create ai_pod
ai_pod

Create a pod with the CPU and RAM limits and the needed network configuration. The ID printed at the end will be unique to your system.

$ podman pod create \
  --blkio-weight ${BLKIO_WEIGHT} \
  --cpus "$(( $(nproc) * ${PERCENT_CPU} / 100 ))" \
  --memory "$(( $(grep MemTotal /proc/meminfo | grep --only-matching '[[:digit:]]*') * ${PERCENT_RAM} / 100 ))k" \
  --name ai_pod \
  --network=ai_pod \
  --publish 3000:8080 \
  --publish 3001:11434 \
  --replace
abf6d4b8225376d8f3df7e6c946f72ba6d073584ecea76fac69955e26589c146

The Ollama container

Create the Ollama container. This also creates a persistent volume to hold the AI models.

$ podman run \
  --cap-drop=ALL \
  --detach \
  --gpus=all \
  --name ollama-container \
  --pod ai_pod \
  --pull newer \
  --volume ollama-container:/root/.ollama \
  docker.io/ollama/ollama:latest
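
At this point you can optionally pull a model and smoke-test the API through the pod's published port. The llama3.2 model here is just an example; substitute whatever model you want:

$ podman exec -it ollama-container ollama pull llama3.2
$ curl http://localhost:3001/api/generate -d '{"model": "llama3.2", "prompt": "Why is the sky blue?"}'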

The Open WebUI container

Finally, create the Open WebUI container. This also creates another persistent volume to hold data used by Open WebUI. Since everything is only accessible locally, we are disabling authentication with the WEBUI_AUTH=false environment variable.

$ podman run \
  --cap-drop=ALL \
  --detach \
  --env OLLAMA_BASE_URL=http://localhost:11434 \
  --env WEBUI_AUTH=false \
  --name open-webui-container \
  --pod ai_pod \
  --pull newer \
  --volume open-webui-container:/app/backend/data \
  --replace \
  ghcr.io/open-webui/open-webui:main

Congratulations!! You should be able to access the Open WebUI at http://localhost:3000 and the Ollama API at http://localhost:3001/v1
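
A quick way to confirm the API mapping is to list the available models through the OpenAI-compatible endpoint:

$ curl http://localhost:3001/v1/models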

Get local LAN access to the Open WebUI

Reconfigure WSL

By default, to make containers running in Windows WSL accessible from the local LAN, you have to look up their IP address and run several netsh commands to set up NAT in Windows, as well as update the Windows Firewall to allow traffic to the ports. Which in my mind is just one of the many reasons Windows sucks for this shit!!

We are going to get around this by first creating a .wslconfig file in our %UserProfile% directory with the following contents. This tells WSL to mirror the Windows host's networking into the Podman Machine, and enables traffic from Windows to “loopback” to the containers.

[wsl2]
networkingMode=mirrored
 
[experimental]
hostAddressLoopback=true

After this, WSL will need to be restarted. Do this by launching a Windows terminal and executing the following.

$ wsl --shutdown
$ podman machine start
Starting machine "podman-machine-default"
 
This machine is currently configured in rootless mode. If your containers
require root permissions (e.g. ports < 1024), or if you run into compatibility
issues with non-podman clients, you can switch using the following command:
 
        podman machine set --rootful
 
API forwarding for Docker API clients is not available due to the following startup failures.
        could not start api proxy since expected pipe is not available: podman-machine-default
 
Podman clients are still able to connect.
Machine "podman-machine-default" started successfully

At this point we could just add Windows Firewall rules to allow ports 3000 and 3001 to Podman and call it a day. However, I will continue to make this more complicated.
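
For completeness, if you did want to stop here, a single netsh command along these lines, run from an elevated Windows terminal, should open the two ports (the rule name is arbitrary):

$ netsh advfirewall firewall add rule name="AI Pod" dir=in action=allow protocol=TCP localport=3000,3001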

Implement Traefik web router

Important

All of the instructions from this point forward are very specific to my home network, such as the home.lan DNS domain, the use of a private certificate authority, and the use of Keycloak for OpenID Connect authentication. You will need to make the appropriate changes to fit your environment.
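
For example, in my environment each of these hostnames is simply a DNS record in IDM pointing at the Windows machine. A sketch of the equivalent ipa commands, where gaming-pc.home.lan is a hypothetical stand-in for the Windows host's existing record:

$ ipa dnsrecord-add home.lan ai-traefik --cname-rec=gaming-pc.home.lan.
$ ipa dnsrecord-add home.lan ollama --cname-rec=gaming-pc.home.lan.
$ ipa dnsrecord-add home.lan ollama-api --cname-rec=gaming-pc.home.lan.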

Traefik container

The following will create the Traefik container and attach it to the ai_pod network. It also sets up a route in Traefik so that when it sees a request for the hostname ai-traefik.home.lan it will return the Traefik API dashboard. This is helpful when troubleshooting Traefik issues.

$ podman run \
  --detach \
  --name traefik-container \
  --label traefik.enable=true \
  --label traefik.http.routers.traefik-dashboard.rule="Host(\`ai-traefik.home.lan\`)" \
  --label traefik.http.routers.traefik-dashboard.entrypoints=http \
  --label traefik.http.routers.traefik-dashboard.service=api@internal \
  --pull newer \
  --network=ai_pod \
  --publish 80:80 \
  --volume /run/docker.sock:/var/run/docker.sock:ro \
  --replace \
  docker.io/library/traefik:v3.1 \
  --providers.docker=true \
  --providers.docker.network=ai_pod \
  --providers.docker.exposedbydefault=false \
  --api.dashboard=true \
  --api.insecure=true \
  --entryPoints.http.address=:80
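
Since the dashboard router is keyed off the Host header, you can verify it from the Podman Machine shell without touching DNS; this should return the router list as JSON:

$ curl -s -H "Host: ai-traefik.home.lan" http://localhost/api/http/routers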

Ollama container

Recreate the Ollama container with the needed labels to add a Traefik route for http://ollama-api.home.lan.

$ podman run \
  --cap-drop=ALL \
  --detach \
  --gpus=all \
  --label traefik.enable=true \
  --label traefik.http.routers.ollama-api.rule="Host(\`ollama-api.home.lan\`)" \
  --label traefik.http.routers.ollama-api.entrypoints=http \
  --label traefik.http.routers.ollama-api.service=ollama-api \
  --label traefik.http.services.ollama-api.loadbalancer.server.port=11434 \
  --name ollama-container \
  --pod ai_pod \
  --pull newer \
  --volume ollama-container:/root/.ollama \
  --replace \
  docker.io/ollama/ollama:latest
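
Assuming ollama-api.home.lan resolves to the Windows machine, the new route can be checked from any box on the LAN; /api/tags lists the installed models:

$ curl http://ollama-api.home.lan/api/tags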

Open WebUI container

Next recreate the Open WebUI container with the needed labels to add a Traefik route for http://ollama.home.lan.

$ podman run \
  --cap-drop=ALL \
  --detach \
  --env OLLAMA_BASE_URL=http://localhost:11434 \
  --env WEBUI_AUTH=false \
  --label traefik.enable=true \
  --label traefik.http.routers.ollama.rule="Host(\`ollama.home.lan\`)" \
  --label traefik.http.routers.ollama.entrypoints=http \
  --label traefik.http.routers.ollama.service=ollama \
  --label traefik.http.services.ollama.loadbalancer.server.port=8080 \
  --name open-webui-container \
  --pod ai_pod \
  --pull newer \
  --volume open-webui-container:/app/backend/data \
  --replace \
  ghcr.io/open-webui/open-webui:main

Enable HTTPS

Podman Machine Changes

I run Red Hat IDM for both DNS and certificate services on my local LAN, and I want all web-based services to use HTTPS. IDM's ACME service requires that the http-01 challenge happen over port 80. Because of this, we need to allow the rootless Traefik container to bind to port 80 and add the CA's certificate bundle to the Podman Machine's trusted CA store.

Following the tutorial at https://github.com/containers/podman/blob/main/docs/tutorials/podman-install-certificate-authority.md, I copied the CA's certificate from /etc/ipa/ca.crt on the IDM server to /etc/pki/ca-trust/source/anchors in the Podman Machine environment. Then, from a Windows terminal, execute:

$ podman machine ssh
$ echo "net.ipv4.ip_unprivileged_port_start=80" | sudo tee -a /etc/sysctl.conf
$ sudo sysctl net.ipv4.ip_unprivileged_port_start=80
net.ipv4.ip_unprivileged_port_start = 80
$ sudo update-ca-trust
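
Two quick checks before moving on: the sysctl should now report 80, and curl against the IDM server should succeed without certificate errors if the CA bundle was picked up:

$ cat /proc/sys/net/ipv4/ip_unprivileged_port_start
80
$ curl -s -o /dev/null -w '%{http_code}\n' https://idm1.home.lan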

Traefik container

Now recreate the Traefik container to access the ACME service and set up the https entryPoint.

$ podman run \
  --detach \
  --name traefik-container \
  --label traefik.enable=true \
  --label traefik.http.routers.traefik-dashboard.rule="Host(\`ai-traefik.home.lan\`)" \
  --label traefik.http.routers.traefik-dashboard.tls=true \
  --label traefik.http.routers.traefik-dashboard.tls.certresolver=idm1homelan \
  --label traefik.http.routers.traefik-dashboard.entrypoints=https \
  --label traefik.http.routers.traefik-dashboard.service=api@internal \
  --pull newer \
  --network=ai_pod \
  --publish 80:80 \
  --publish 443:443 \
  --volume /run/docker.sock:/var/run/docker.sock:ro \
  --volume /etc/ssl/certs/ca-certificates.crt:/etc/ssl/certs/ca-certificates.crt:ro \
  --replace \
  docker.io/library/traefik:v3.1 \
  --providers.docker=true \
  --providers.docker.network=ai_pod \
  --providers.docker.exposedbydefault=false \
  --api.dashboard=true \
  --api.insecure=true \
  --entryPoints.http.address=:80 \
  --entryPoints.http.http.redirections.entryPoint.to=https \
  --entryPoints.http.http.redirections.entryPoint.scheme=https \
  --entryPoints.https.address=:443 \
  --entryPoints.https.http.tls=true \
  --entryPoints.https.http.tls.certresolver=idm1homelan \
  --certificatesresolvers.idm1homelan.acme.caserver=https://idm1.home.lan/acme/directory \
  --certificatesresolvers.idm1homelan.acme.httpchallenge=true \
  --certificatesresolvers.idm1homelan.acme.httpchallenge.entrypoint=http 
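
Once the container is up and the ACME challenge has completed, you can inspect the certificate Traefik is serving; the issuer should be your IDM CA:

$ openssl s_client -connect localhost:443 -servername ai-traefik.home.lan </dev/null 2>/dev/null | openssl x509 -noout -issuer -dates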

Ollama container

Now we can redeploy the Ollama container, changing the entryPoint and enabling HTTPS.

$ podman run \
  --cap-drop=ALL \
  --detach \
  --gpus=all \
  --label traefik.enable=true \
  --label traefik.http.routers.ollama-api.rule="Host(\`ollama-api.home.lan\`)" \
  --label traefik.http.routers.ollama-api.entrypoints=https \
  --label traefik.http.routers.ollama-api.service=ollama-api \
  --label traefik.http.services.ollama-api.loadbalancer.server.port=11434 \
  --name ollama-container \
  --pod ai_pod \
  --pull newer \
  --volume ollama-container:/root/.ollama \
  --replace \
  docker.io/ollama/ollama:latest

Open WebUI container

Do the same for the Open WebUI container, moving its Traefik route to the https entryPoint.

$ podman run \
  --cap-drop=ALL \
  --detach \
  --env OLLAMA_BASE_URL=http://localhost:11434 \
  --env WEBUI_AUTH=false \
  --label traefik.enable=true \
  --label traefik.http.routers.ollama.rule="Host(\`ollama.home.lan\`)" \
  --label traefik.http.routers.ollama.entrypoints=https \
  --label traefik.http.routers.ollama.service=ollama \
  --label traefik.http.services.ollama.loadbalancer.server.port=8080 \
  --name open-webui-container \
  --pod ai_pod \
  --pull newer \
  --volume open-webui-container:/app/backend/data \
  --replace \
  ghcr.io/open-webui/open-webui:main

Enable authentication via Keycloak

Open WebUI container

Recreate the Open WebUI container one final time, this time re-enabling authentication and configuring OAuth against Keycloak. Replace the client secret placeholder with the value from your Keycloak client.

$ podman run \
  --cap-drop=ALL \
  --detach \
  --env OLLAMA_BASE_URL=http://localhost:11434 \
  --env REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt \
  --env SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt \
  --env WEBUI_AUTH=true \
  --env ENABLE_OAUTH_SIGNUP=true \
  --env OAUTH_MERGE_ACCOUNTS_BY_EMAIL=true \
  --env OAUTH_CLIENT_ID=ollama \
  --env OAUTH_CLIENT_SECRET="[**CLIENT SECRET FROM KEYCLOAK**]" \
  --env OPENID_PROVIDER_URL="https://auth.home.lan/realms/HOME.LAN/.well-known/openid-configuration" \
  --label traefik.enable=true \
  --label traefik.http.routers.ollama.rule="Host(\`ollama.home.lan\`)" \
  --label traefik.http.routers.ollama.entrypoints=https \
  --label traefik.http.routers.ollama.service=ollama \
  --label traefik.http.services.ollama.loadbalancer.server.port=8080 \
  --name open-webui-container \
  --pod ai_pod \
  --pull newer \
  --volume open-webui-container:/app/backend/data \
  --volume /etc/ssl/certs/ca-certificates.crt:/etc/ssl/certs/ca-certificates.crt:ro \
  --replace \
  ghcr.io/open-webui/open-webui:main
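
For reference, a matching Keycloak client could be created with kcadm.sh along these lines. The /oauth/oidc/callback path is the redirect URI Open WebUI uses for OIDC, and the server, realm, and credential details here are placeholders for my environment:

$ kcadm.sh config credentials --server https://auth.home.lan --realm master --user admin
$ kcadm.sh create clients -r HOME.LAN \
  -s clientId=ollama \
  -s publicClient=false \
  -s standardFlowEnabled=true \
  -s 'redirectUris=["https://ollama.home.lan/oauth/oidc/callback"]'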

Note

If you have been following along, you will need to run podman exec -it open-webui-container rm /app/backend/data/webui.db && podman restart open-webui-container to wipe the Open WebUI authentication database. Then be sure to use the Open WebUI authentication screen to sign up and create the initial administrator account before authenticating via Keycloak.

TO DO

  • Add authentication to ai-traefik.home.lan
  • Move the OAuth client configuration into a secret
  • Turn off the Open WebUI login window and forward directly to Keycloak