The Goal
The only Microsoft Windows machine I still run is my gaming machine, an NZXT Player 3 with an Intel Core i7-13700KF, 32 GB of DDR5 RAM, and an NVIDIA GeForce RTX 4070 Ti Super. The following is how I have started using it to locally host Ollama generative AI workloads and the Open WebUI interface.
Get everything running and accessible locally
Install Podman and the NVIDIA Container Toolkit
Download and install Podman Desktop from https://podman-desktop.io/downloads. After Podman Desktop is installed and the Podman Machine is running, we need to install the NVIDIA Container Toolkit inside the Podman Machine. Open a terminal and execute the following:
$ podman machine ssh
Connecting to vm podman-machine-default. To close connection, use `~.` or `exit`
Last login: Sat May 18 15:38:17 2024 from ::1
$ curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
[nvidia-container-toolkit]
name=nvidia-container-toolkit
baseurl=https://nvidia.github.io/libnvidia-container/stable/rpm/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://nvidia.github.io/libnvidia-container/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
[nvidia-container-toolkit-experimental]
name=nvidia-container-toolkit-experimental
baseurl=https://nvidia.github.io/libnvidia-container/experimental/rpm/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=0
gpgkey=https://nvidia.github.io/libnvidia-container/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
$ sudo yum install -y nvidia-container-toolkit
Unable to read consumer identity
This system is not registered with an entitlement server. You can use subscription-manager to register.
Last metadata expiration check: 0:00:29 ago on Sat May 18 15:43:04 2024.
Dependencies resolved.
========================================================================================================================
Package Architecture Version Repository Size
========================================================================================================================
Installing:
nvidia-container-toolkit x86_64 1.15.0-1 nvidia-container-toolkit 1.0 M
Installing dependencies:
libnvidia-container-tools x86_64 1.15.0-1 nvidia-container-toolkit 39 k
libnvidia-container1 x86_64 1.15.0-1 nvidia-container-toolkit 1.0 M
nvidia-container-toolkit-base x86_64 1.15.0-1 nvidia-container-toolkit 3.6 M
Transaction Summary
========================================================================================================================
Install 4 Packages
Total download size: 5.6 M
Installed size: 17 M
Downloading Packages:
(1/4): libnvidia-container-tools-1.15.0-1.x86_64.rpm 369 kB/s | 39 kB 00:00
(2/4): libnvidia-container1-1.15.0-1.x86_64.rpm 2.7 MB/s | 1.0 MB 00:00
(3/4): nvidia-container-toolkit-1.15.0-1.x86_64.rpm 1.6 MB/s | 1.0 MB 00:00
(4/4): nvidia-container-toolkit-base-1.15.0-1.x86_64.rpm 6.1 MB/s | 3.6 MB 00:00
------------------------------------------------------------------------------------------------------------------------
Total 8.1 MB/s | 5.6 MB 00:00
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : nvidia-container-toolkit-base-1.15.0-1.x86_64 1/4
Installing : libnvidia-container1-1.15.0-1.x86_64 2/4
Running scriptlet: libnvidia-container1-1.15.0-1.x86_64 2/4
/sbin/ldconfig: /usr/lib/wsl/lib/libcuda.so.1 is not a symbolic link
Installing : libnvidia-container-tools-1.15.0-1.x86_64 3/4
Installing : nvidia-container-toolkit-1.15.0-1.x86_64 4/4
Running scriptlet: nvidia-container-toolkit-1.15.0-1.x86_64 4/4
/usr/sbin/ldconfig: /usr/lib/wsl/lib/libcuda.so.1 is not a symbolic link
Installed products updated.
Installed:
libnvidia-container-tools-1.15.0-1.x86_64 libnvidia-container1-1.15.0-1.x86_64
nvidia-container-toolkit-1.15.0-1.x86_64 nvidia-container-toolkit-base-1.15.0-1.x86_64
Complete!
Next, generate a CDI specification so Podman can expose the GPU to containers:
$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
INFO[0000] Auto-detected mode as "wsl"
INFO[0000] Selecting /dev/dxg as /dev/dxg
INFO[0000] Using WSL driver store paths: [/usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a]
INFO[0000] Selecting /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libcuda.so.1.1 as /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libcuda.so.1.1
INFO[0000] Selecting /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libcuda_loader.so as /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libcuda_loader.so
INFO[0000] Selecting /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libnvidia-ptxjitcompiler.so.1 as /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libnvidia-ptxjitcompiler.so.1
INFO[0000] Selecting /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libnvidia-ml.so.1 as /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libnvidia-ml.so.1
INFO[0000] Selecting /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libnvidia-ml_loader.so as /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/libnvidia-ml_loader.so
INFO[0000] Selecting /usr/lib/wsl/lib/libdxcore.so as /usr/lib/wsl/lib/libdxcore.so
WARN[0000] Could not locate libnvdxgdmal.so.1: pattern libnvdxgdmal.so.1 not found
INFO[0000] Selecting /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/nvcubins.bin as /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/nvcubins.bin
INFO[0000] Selecting /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/nvidia-smi as /usr/lib/wsl/drivers/nv_dispi.inf_amd64_de8e1115ac61e38a/nvidia-smi
INFO[0000] Generated CDI spec with version 0.3.0
The NVIDIA Container Toolkit should be installed and running, but let’s test it.
$ nvidia-ctk cdi list
INFO[0000] Found 1 CDI devices
nvidia.com/gpu=all
This next command will list all of the NVIDIA GPUs found.
$ /usr/lib/wsl/lib/nvidia-smi
Sat May 18 15:47:49 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.76.01 Driver Version: 552.44 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4070 Ti On | 00000000:01:00.0 On | N/A |
| 0% 32C P8 7W / 285W | 1377MiB / 12282MiB | 12% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Note
I am continuing to execute podman from the Podman Machine’s shell so I can use the Linux tools to dynamically calculate the maximum CPUs and RAM given to the pod. Otherwise, the following podman commands can be executed from the Windows terminal.
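Before creating any pods, it is also worth confirming that a container can actually see the GPU through the CDI device. A quick sanity check, mirroring the sample from NVIDIA’s CDI documentation (the ubuntu image is just a convenient throwaway):
$ podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable docker.io/library/ubuntu nvidia-smi -L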
Create AI Pod and Containers
The AI Pod and Network
Change the following values to match the maximum percentage of CPUs and RAM the pod is allowed to consume, as well as the block IO weight. In this example I’m setting both the CPU and RAM limits to 75%.
PERCENT_CPU=75
PERCENT_RAM=75
BLKIO_WEIGHT=10
I am going to create an isolated network for this pod.
$ podman network create ai_pod
ai_pod
Create a pod with the CPU and RAM limits and the needed network configuration. The resulting ID will be unique.
$ podman pod create \
--blkio-weight ${BLKIO_WEIGHT} \
--cpus "$(echo $(( $(nproc)*${PERCENT_CPU}/100 )) )" \
--memory "$(echo $(( $(grep MemTotal /proc/meminfo | grep --only-matching '[[:digit:]]*')*${PERCENT_RAM}/100 )) )"k \
--name ai_pod \
--network=ai_pod \
--publish 3000:8080 \
--publish 3001:11434 \
--replace
abf6d4b8225376d8f3df7e6c946f72ba6d073584ecea76fac69955e26589c146
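If the limits look off, you can evaluate the same arithmetic the pod create command uses (assuming the variables from above are still set in your shell):
$ echo "cpus: $(( $(nproc) * PERCENT_CPU / 100 )), memory: $(( $(grep MemTotal /proc/meminfo | grep --only-matching '[[:digit:]]*') * PERCENT_RAM / 100 ))k"
The Ollama container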
Create the Ollama container. This also creates a persistent volume to hold the AI models.
$ podman run \
--cap-drop=ALL \
--detach \
--gpus=all \
--name ollama-container \
--pod ai_pod \
--pull newer \
--volume ollama-container:/root/.ollama \
docker.io/ollama/ollama:latest
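Ollama logs the compute devices it detects at startup, so the container logs are a quick way to confirm the GPU actually made it inside the container (the grep just cuts the noise):
$ podman logs ollama-container 2>&1 | grep -iE 'gpu|cuda'
The Open WebUI container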
Finally, create the Open WebUI container. This also creates another persistent volume to hold data used by Open WebUI. Since everything is only accessible locally, we are disabling authentication by setting WEBUI_AUTH=false.
$ podman run \
--cap-drop=ALL \
--detach \
--env OLLAMA_BASE_URL=http://localhost:11434 \
--env WEBUI_AUTH=false \
--name open-webui-container \
--pod ai_pod \
--pull newer \
--volume open-webui-container:/app/backend/data \
--replace \
ghcr.io/open-webui/open-webui:main
Congratulations!! You should be able to access the Open WebUI at http://localhost:3000 and the Ollama API at http://localhost:3001/v1
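Before moving on, a couple of quick smoke tests against the published ports are worthwhile. The model name below is just an example; pull whichever model you like from the Ollama library:
$ curl http://localhost:3001/api/tags
$ podman exec -it ollama-container ollama pull llama3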
Get local LAN access to the Open WebUI
Reconfigure WSL
By default, to make containers running in Windows WSL accessible from the local LAN, you have to look up their IP addresses and run several netsh commands to set up NAT in Windows, as well as update the Windows Firewall to allow traffic to the ports. Which, in my mind, is just one of the many reasons Windows sucks for this shit!!
We are going to get around this by first creating a .wslconfig file in our %UserProfile% directory with the following contents. This tells WSL to mirror the Windows IP to the Podman Machine and enables traffic from Windows to “loopback” to the container.
[wsl2]
networkingMode=mirrored
[experimental]
hostAddressLoopback=true
After this, WSL will need to be restarted. Do this by launching a Windows terminal and executing the following.
$ wsl --shutdown
$ podman machine start
Starting machine "podman-machine-default"
This machine is currently configured in rootless mode. If your containers
require root permissions (e.g. ports < 1024), or if you run into compatibility
issues with non-podman clients, you can switch using the following command:
podman machine set --rootful
API forwarding for Docker API clients is not available due to the following startup failures.
could not start api proxy since expected pipe is not available: podman-machine-default
Podman clients are still able to connect.
Machine "podman-machine-default" started successfullyAt this point we could just add Windows Firewall rules to allow ports 3000 and 3001 to Podman and call it a day. However, I will continue to make this more complicated.
Implement Traefik web router
Important
All of the instructions from this point forward are very specific to my home network, such as the home.lan DNS domain, the use of a private certificate authority, and the use of Keycloak for OpenID Connect authentication. You will need to make the appropriate changes to fit your environment.
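One change everyone will need: the home.lan hostnames used below must resolve to the Windows machine’s LAN address. IDM’s DNS handles this for me; if you do not run local DNS, a hosts file entry on each client is the quick and dirty alternative (the address here is a placeholder):
192.168.1.10 ai-traefik.home.lan ollama.home.lan ollama-api.home.lan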
Traefik container
The following will create the Traefik container and attach it to the ai_pod network. It also sets up a route in Traefik so that when it sees a request for the hostname ai-traefik.home.lan it will return the Traefik API dashboard. This is helpful when troubleshooting Traefik issues.
$ podman run \
--detach \
--name traefik-container \
--label traefik.enable=true \
--label traefik.http.routers.traefik-dashboard.rule="Host(\`ai-traefik.home.lan\`)" \
--label traefik.http.routers.traefik-dashboard.entrypoints=http \
--label traefik.http.routers.traefik-dashboard.service=api@internal \
--pull newer \
--network=ai_pod \
--publish 80:80 \
--volume /run/docker.sock:/var/run/docker.sock:ro \
--replace \
docker.io/library/traefik:v3.1 \
--providers.docker=true \
--providers.docker.network=ai_pod \
--providers.docker.exposedbydefault=false \
--api.dashboard=true \
--api.insecure=true \
--entryPoints.http.address=:80
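To confirm the router is wired up, you can hit the Traefik API through the new entry point; faking the Host header avoids needing DNS in place yet (/api/version is part of the same API that backs the dashboard):
$ curl -s -H 'Host: ai-traefik.home.lan' http://localhost/api/version
Ollama container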
Recreate the Ollama container with the needed labels to add a Traefik route for http://ollama-api.home.lan.
$ podman run \
--cap-drop=ALL \
--detach \
--gpus=all \
--label traefik.enable=true \
--label traefik.http.routers.ollama-api.rule="Host(\`ollama-api.home.lan\`)" \
--label traefik.http.routers.ollama-api.entrypoints=http \
--label traefik.http.routers.ollama-api.service=ollama-api \
--label traefik.http.services.ollama-api.loadbalancer.server.port=11434 \
--name ollama-container \
--pod ai_pod \
--pull newer \
--volume ollama-container:/root/.ollama \
--replace \
docker.io/ollama/ollama:latest
Open WebUI container
Next recreate the Open WebUI container with the needed labels to add a Traefik route for http://ollama.home.lan.
$ podman run \
--cap-drop=ALL \
--detach \
--env OLLAMA_BASE_URL=http://localhost:11434 \
--env WEBUI_AUTH=false \
--label traefik.enable=true \
--label traefik.http.routers.ollama.rule="Host(\`ollama.home.lan\`)" \
--label traefik.http.routers.ollama.entrypoints=http \
--label traefik.http.routers.ollama.service=ollama \
--label traefik.http.services.ollama.loadbalancer.server.port=8080 \
--name open-webui-container \
--pod ai_pod \
--pull newer \
--volume open-webui-container:/app/backend/data \
--replace \
ghcr.io/open-webui/open-webui:main
Enable HTTPS
Podman Machine Changes
I run Red Hat IDM for both DNS and certificate services on my local LAN, and I want all web-based services to use HTTPS. IDM’s ACME service requires that the http01 challenge happen over port 80. Because of this, we need to allow the rootless Traefik container to bind to port 80 and add the CA’s certificate bundle to the Podman Machine’s trusted CA store.
Following the tutorial described at https://github.com/containers/podman/blob/main/docs/tutorials/podman-install-certificate-authority.md I copied the CA’s certificate from /etc/ipa/ca.crt on the IDM server to /etc/pki/ca-trust/source/anchors in the Podman Machine environment. Then from a Windows terminal execute:
$ podman machine ssh
$ echo "net.ipv4.ip_unprivileged_port_start=80" | sudo tee -a /etc/sysctl.conf
$ sudo sysctl net.ipv4.ip_unprivileged_port_start=80
net.ipv4.ip_unprivileged_port_start = 80
$ sudo update-ca-trust
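Both changes are easy to verify before moving on (the trust list grep pattern matches my CA’s name; adjust it for yours):
$ cat /proc/sys/net/ipv4/ip_unprivileged_port_start
80
$ trust list | grep -i home.lan
Traefik container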
Now recreate the Traefik container to access the ACME service and set up the https entry point.
$ podman run \
--detach \
--name traefik-container \
--label traefik.enable=true \
--label traefik.http.routers.traefik-dashboard.rule="Host(\`ai-traefik.home.lan\`)" \
--label traefik.http.routers.traefik-dashboard.tls=true \
--label traefik.http.routers.traefik-dashboard.tls.certresolver=idm1homelan \
--label traefik.http.routers.traefik-dashboard.entrypoints=https \
--label traefik.http.routers.traefik-dashboard.service=api@internal \
--pull newer \
--network=ai_pod \
--publish 80:80 \
--publish 443:443 \
--volume /run/docker.sock:/var/run/docker.sock:ro \
--volume /etc/ssl/certs/ca-certificates.crt:/etc/ssl/certs/ca-certificates.crt:ro \
--replace \
docker.io/library/traefik:v3.1 \
--providers.docker=true \
--providers.docker.network=ai_pod \
--providers.docker.exposedbydefault=false \
--api.dashboard=true \
--api.insecure=true \
--entryPoints.http.address=:80 \
--entryPoints.http.http.redirections.entryPoint.to=https \
--entryPoints.http.http.redirections.entryPoint.scheme=https \
--entryPoints.https.address=:443 \
--entryPoints.https.http.tls=true \
--entryPoints.https.http.tls.certresolver=idm1homelan \
--certificatesresolvers.idm1homelan.acme.caserver=https://idm1.home.lan/acme/directory \
--certificatesresolvers.idm1homelan.acme.httpchallenge=true \
--certificatesresolvers.idm1homelan.acme.httpchallenge.entrypoint=http
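With the resolver in place, the dashboard makes a handy first test of the whole ACME flow. If DNS resolves and your client trusts the internal CA, this should come back with a certificate issued by IDM (add --verbose to inspect the certificate chain):
$ curl -sI https://ai-traefik.home.lan/dashboard/
Ollama container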
Now we can redeploy the Ollama container, changing its route to the https entry point.
$ podman run \
--cap-drop=ALL \
--detach \
--gpus=all \
--label traefik.enable=true \
--label traefik.http.routers.ollama-api.rule="Host(\`ollama-api.home.lan\`)" \
--label traefik.http.routers.ollama-api.entrypoints=https \
--label traefik.http.routers.ollama-api.service=ollama-api \
--label traefik.http.services.ollama-api.loadbalancer.server.port=11434 \
--name ollama-container \
--pod ai_pod \
--pull newer \
--volume ollama-container:/root/.ollama \
--replace \
docker.io/ollama/ollama:latest
Open WebUI container
Likewise, recreate the Open WebUI container on the https entry point.
$ podman run \
--cap-drop=ALL \
--detach \
--env OLLAMA_BASE_URL=http://localhost:11434 \
--env WEBUI_AUTH=false \
--label traefik.enable=true \
--label traefik.http.routers.ollama.rule="Host(\`ollama.home.lan\`)" \
--label traefik.http.routers.ollama.entrypoints=https \
--label traefik.http.routers.ollama.service=ollama \
--label traefik.http.services.ollama.loadbalancer.server.port=8080 \
--name open-webui-container \
--pod ai_pod \
--pull newer \
--volume open-webui-container:/app/backend/data \
--replace \
ghcr.io/open-webui/open-webui:main
Enable authentication via Keycloak
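On the Keycloak side, Open WebUI needs a confidential OIDC client whose client ID matches the OAUTH_CLIENT_ID used below. Here is a rough kcadm sketch of mine; the realm name, callback path, and every value shown are assumptions from my setup, so adjust to fit yours:
$ kcadm.sh create clients -r HOME.LAN \
    -s clientId=ollama \
    -s enabled=true \
    -s publicClient=false \
    -s standardFlowEnabled=true \
    -s 'redirectUris=["https://ollama.home.lan/oauth/oidc/callback"]'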
Open WebUI container
Recreate the Open WebUI container with authentication turned back on and the OAuth settings pointed at Keycloak.
$ podman run \
--cap-drop=ALL \
--detach \
--env OLLAMA_BASE_URL=http://localhost:11434 \
--env REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt \
--env SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt \
--env WEBUI_AUTH=true \
--env ENABLE_OAUTH_SIGNUP=true \
--env OAUTH_MERGE_ACCOUNTS_BY_EMAIL=true \
--env OAUTH_CLIENT_ID=ollama \
--env OAUTH_CLIENT_SECRET="[**CLIENT SECRET FROM KEYCLOAK**]" \
--env OPENID_PROVIDER_URL="https://auth.home.lan/realms/HOME.LAN/.well-known/openid-configuration" \
--label traefik.enable=true \
--label traefik.http.routers.ollama.rule="Host(\`ollama.home.lan\`)" \
--label traefik.http.routers.ollama.entrypoints=https \
--label traefik.http.routers.ollama.service=ollama \
--label traefik.http.services.ollama.loadbalancer.server.port=8080 \
--name open-webui-container \
--pod ai_pod \
--pull newer \
--volume open-webui-container:/app/backend/data \
--volume /etc/ssl/certs/ca-certificates.crt:/etc/ssl/certs/ca-certificates.crt:ro \
--replace \
ghcr.io/open-webui/open-webui:main
Note
If you have been following along, you will need to run
podman exec -it open-webui-container rm /app/backend/data/webui.db && podman restart open-webui-container
to wipe the Open WebUI authentication database. Then be sure to use the Open WebUI sign-up screen to create the initial administrator account before authenticating via Keycloak.
TO DO
- Add authentication to ai-traefik.home.lan
- Move the OAuth client configuration to a secret
- Turn off the Open WebUI login window and forward directly to Keycloak