The Thursday Night That Got Away From Me
I figured this would be a quick one. Install Ollama on the gaming laptop, spin up a chat UI in Docker, configure the network so my other machines can reach it. Done before dinner.
That was 6:30 PM. By the time the server was actually working from every device on my network, every restaurant within delivery range had closed. I ended up eating half a block of sharp cheddar over the kitchen sink at midnight while my laptop hummed in the other room, running a 30-billion parameter language model.
But the server works. It costs $0/month. And I'd do it again.
The project took longer than expected because I was juggling it alongside everything else that Thursday. Production pipelines were shipping 76 translated articles for ForopoulosNow. I had research papers open in another window. And I was trying to build out the core inference server for my platform on an old gaming laptop specifically because of its GPU and GDDR6 VRAM. Somewhere around 9 PM I grabbed a handful of almonds off my desk and called it dinner.
It was not dinner.
Here's the complete setup guide, written so you can do this in an actual 45 minutes if you configure your network before you start debugging why localhost works but nothing else does.
Hardware: What Actually Matters
You don't need to buy anything. I specifically like laptops for this kind of low-level inference work because of the energy footprint. A laptop GPU sips power compared to a full desktop rig, and for the kind of tasks I'm throwing at it (coding assistance, brainstorming, draft iteration), it doesn't need to be a datacenter. Here's what I'm running:
| Component | Spec | Why It Matters |
|---|---|---|
| GPU | NVIDIA RTX 3080 Mobile, 16 GB GDDR6 VRAM | VRAM is the bottleneck. 16 GB runs 30B parameter models comfortably. 8 GB is the minimum for useful models. |
| CPU | Intel Core i9 (12th gen) | Handles orchestration. The GPU does the real work. |
| RAM | 32 GB DDR5 | Enough for the OS, Docker, and model runtime simultaneously. |
| Storage | 1 TB NVMe SSD | Models are large. llama3.1:8b is 4.7 GB. Bigger ones hit 20+ GB. Fast reads matter. |
You Probably Already Own This
Any NVIDIA GPU with 8+ GB VRAM can run useful local models. That gaming laptop collecting dust in your closet, that old desktop with a GTX 1070 or better, that's your AI server. The most expensive part is hardware you've already paid for.
Installing Ollama
Ollama is the runtime. Think of it as Docker for AI models: you pull a model, run it, and Ollama handles VRAM allocation, inference, and the API layer. Installation is the easy part.
```shell
# On Windows: download from ollama.com and run the installer
# On Linux:
curl -fsSL https://ollama.com/install.sh | sh

# Pull your first model
ollama pull llama3.1:8b

# Test it locally
ollama run llama3.1:8b "Explain Docker in one sentence"
```

If this works, your GPU is doing inference. The model is running locally on your hardware. No API key needed, no cloud involved.
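If you want to confirm the model actually landed on the GPU rather than silently falling back to CPU, two quick checks help (assuming an NVIDIA card with drivers installed; output will vary by machine):

```shell
# Show loaded models and where they're running; the PROCESSOR column
# should read "100% GPU" while a model is loaded into VRAM
ollama ps

# Watch VRAM usage directly; an ollama process should appear here
# holding several GB while a model is resident
nvidia-smi
```

If `ollama ps` reports a CPU split, the model is too big for your VRAM and you'll feel it in response times.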
Setting Up Open WebUI
Open WebUI gives you a proper chat interface that talks to Ollama over its API. One Docker command:
```shell
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
```

Open localhost:3000 in your browser. Create an admin account. Select a model from the dropdown. Start chatting. This part genuinely takes about two minutes.
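If the page doesn't load, two standard Docker commands will tell you whether the container is actually up and why it isn't:

```shell
# Confirm the container is running (STATUS should say "Up")
docker ps --filter name=open-webui

# Read the last few log lines for startup errors
docker logs open-webui --tail 20
```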
The trap is thinking you're done here. You're not. Everything works on localhost. Nothing works from other devices yet. That's the next section, and it's where I lost my evening.
Network Configuration (Do This First)
This is the section that most local AI tutorials skip entirely, and it's the reason my 45-minute project turned into a six-hour ordeal. If you configure your network before testing from other devices, you'll avoid the frustrating loop of "why doesn't this work" followed by discovering each security layer one at a time.
Localhost Proves Nothing
Your model running on localhost confirms exactly one thing: the model works. It tells you nothing about whether other devices on your network can reach it. Configure all your network and firewall rules before you even try accessing from another machine.
There are four layers to get through. Do them all upfront.
Layer 1: Bind Ollama to Your Network
Ollama defaults to listening on 127.0.0.1 only. You need to change this to 0.0.0.0 so it accepts connections from other devices.
```shell
# Windows: System Properties -> Environment Variables
# Add new system variable: OLLAMA_HOST = 0.0.0.0
# Then restart Ollama

# Linux: edit the systemd service or export in .bashrc
export OLLAMA_HOST=0.0.0.0
systemctl restart ollama
```

Layer 2: Windows Defender Firewall Rules
Create inbound rules for both services:
```powershell
# Run PowerShell as Administrator
New-NetFirewallRule -DisplayName "Ollama API" `
  -Direction Inbound -Protocol TCP -LocalPort 11434 -Action Allow

New-NetFirewallRule -DisplayName "Open WebUI" `
  -Direction Inbound -Protocol TCP -LocalPort 3000 -Action Allow
```

Layer 3: Third-Party Security Tools
Audit Every Security Tool on the Machine
Portmaster, GlassWire, Little Snitch, Malwarebytes Web Protection, any VPN client with a kill switch. If it touches network traffic, it needs an exception for your AI server ports. Don't wait until you're debugging at 10 PM to discover that Portmaster has been quietly blocking every inbound connection. Open each tool's dashboard and add rules for ports 11434 and 3000 before you test anything.
I run Portmaster for network privacy. It does its job well, which means it blocks inbound connections by default. Adding exceptions for the Ollama and Open WebUI ports took thirty seconds. Knowing to do it upfront instead of discovering it four hours into debugging would have saved my entire evening.
Layer 4: Verify Each Layer
After configuring everything, test from another device on your network:
```shell
# From another machine, test the Ollama API directly
curl http://192.168.1.100:11434/api/tags

# If this returns a JSON list of models, your network config is correct
# If it times out, work backwards through the layers above
```

Securing the Setup
The Ollama API has zero authentication. Anyone on your network can hit it and generate whatever they want. Open WebUI has a login screen. Plan your security around that reality.
- Open WebUI (port 3000): Accessible from trusted devices. Built-in authentication. This is how users interact with the models.
- Ollama API (port 11434): Restrict to your local subnet only. No reason for it to be broadly accessible.
```powershell
# Drop the broad rule from Layer 2 first, or it will keep allowing all sources
Remove-NetFirewallRule -DisplayName "Ollama API"

# Restrict Ollama API to local network only
New-NetFirewallRule -DisplayName "Ollama API - Local Only" `
  -Direction Inbound -Protocol TCP -LocalPort 11434 `
  -RemoteAddress 192.168.1.0/24 -Action Allow
```

Right-Sized Security
This isn't enterprise infrastructure. It's a gaming laptop running AI in your office. The chat UI has auth. The raw API is scoped to your local subnet. That's the right level for a home lab. Over-engineering the security means you won't actually use the thing.
What You Get
Once everything is configured, the payoff is immediate:
- Chat from any device on your network through Open WebUI's clean interface.
- VS Code autocomplete powered by a local 30B model with sub-second response times.
- API access from scripts and development tools. Local inference at network speed.
- Zero monthly cost. The GPU was already paid for. The software is open source.
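That API access is plain HTTP. A sketch of hitting Ollama's /api/generate endpoint from any machine on the network (192.168.1.100 is the same placeholder IP used earlier; swap in your server's address):

```shell
# One-shot completion over the LAN; "stream": false returns a single
# JSON object instead of a token-by-token stream
curl -s http://192.168.1.100:11434/api/generate \
  -d '{"model": "llama3.1:8b", "prompt": "Explain Docker in one sentence", "stream": false}'
```

Any script or tool that can make an HTTP POST can use your server. That's the whole integration surface.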
The models aren't as sharp as Claude or GPT-4 for complex reasoning. I won't pretend otherwise. But for coding assistance, brainstorming, draft writing, and the kind of rapid-fire iteration where you ask 50 questions in an hour without thinking about cost? They're genuinely useful. And they're yours.
The Honest Trade-offs
What you gain:
- Complete privacy. Your prompts, your code, your data never leave your network. That's not a privacy policy. That's physics.
- No rate limits or quotas. Ask 500 questions in an hour. Nobody throttles you.
- Offline capability. Internet goes down? The AI doesn't care. It's three feet away from you.
- Iteration speed. This is the real value. When queries are free, you use AI differently. You experiment more. You iterate faster.
What you give up:
- Raw intelligence. Cloud models are still smarter for complex reasoning. Local models are closing the gap fast, but they're not there yet.
- Hardware requirements. 8+ GB VRAM minimum, 16+ recommended. Not everyone has this sitting around.
- Setup time. Budget an evening the first time. It gets easier after that.
- Power consumption. A laptop GPU is efficient for this workload, but it still pulls real watts under load. Budget $10-15/month extra on your electric bill. Still a fraction of cloud API costs.
- Manual updates. New model releases? You pull them yourself.
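The $10-15 figure is easy to sanity-check. A back-of-envelope sketch, assuming a 120 W average draw and a $0.15/kWh rate (both placeholders; plug in your own numbers):

```shell
watts=120    # assumed average draw under mixed idle/inference load
rate=0.15    # assumed electricity price in $/kWh
# kWh per month = watts * 24 h * 30 days / 1000
kwh_month=$(awk -v w="$watts" 'BEGIN { printf "%.1f", w * 24 * 30 / 1000 }')
cost=$(awk -v k="$kwh_month" -v r="$rate" 'BEGIN { printf "%.2f", k * r }')
echo "${kwh_month} kWh/month, about \$${cost}/month"
# prints: 86.4 kWh/month, about $12.96/month
```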
The Real Value
The killer feature of local AI isn't saving money on API bills. It's the iteration speed. When there's no cost per query, you stop rationing your questions and start treating the AI like a colleague at the next desk instead of a metered service. That shift changes how you work.
Best Practices (What I Wish I'd Done)
- Configure your network before testing. Audit every firewall and security tool on the machine. Make the rules first, test second. This one change would have saved me four hours.
- Use Open WebUI as the front door. Don't expose the raw Ollama API to your network unless you have a specific integration need. The built-in auth is worth it.
- Start with one model. Get good at prompting llama3.1:8b before you download five different models. VRAM isn't infinite and model swapping takes time.
- Name your machine. Saying "HERMES is being difficult" is more satisfying than "the laptop thing isn't working" when you're debugging at 11 PM.
- Order food before you open a terminal. Personal policy.
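For the "start with one model" advice, the housekeeping commands are short (the model name here is just an example):

```shell
# See what's downloaded and how much disk each model takes
ollama list

# Remove a model you're no longer using to reclaim space
ollama rm llama3.1:8b
```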
The Punchline
I may have had half a block of cheese for dinner, but at least I got an in-house LLM server out of it.
HERMES runs 24/7 now. I can talk to it from any device in my house. My code editor uses it for completions. My scripts hit its API for batch processing. It costs nothing. It sends nothing to the cloud. It works when the internet doesn't. And nobody can take it away with a pricing change or an API deprecation.
The cloud isn't going away. I still use Claude for the heavy lifting. But having your own AI running locally isn't a novelty. It's a tool. And if you've got a gaming GPU gathering dust, you already own the hard part.
"The best time to set up a local AI server was six months ago. The second best time is tonight. Just order dinner first."