Local LLM for My Home

I have a Windows 11 laptop I use as a home server and NAS (Network Attached Storage).

For my local LLM usage, I want:

  • Basic QA chat
  • Image to Text (image analysis) - my wife uses this a lot
  • Inline code completions
  • Basic code scripting
    • I do not expect Sonnet 4.5-level changes

Hardware

GPU: RTX 3060 6GB + Intel(R) UHD Graphics 128 MB (16GB shared)
CPU: Intel i7-10870H (8-core)
RAM: 32GB 2933 MT/s DDR4

Software

OS: Windows 11
Harnesses: pi (pi.dev), Open WebUI, Continue (Continue.dev)
Runner/Server: Ollama
Models: qwen3.5:4b, qwen3:8b, starcoder2:7b, qwen3.5:9b-q4_K_M

Getting Started

Ollama

Install Ollama

Ollama manages models: it downloads them, runs them, and serves an HTTP API for tools and harnesses to connect to.

Configure Ollama

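By default Ollama listens only on 127.0.0.1. To reach the server from other machines on the LAN, make it listen on all interfaces and allow cross-origin requests:

  1. Open the Windows “Environment Variables” editor
  2. Create a new user environment variable named OLLAMA_HOST with a value of 0.0.0.0
  3. Create a new user environment variable named OLLAMA_ORIGINS with a value of *
  4. Restart Ollama so it picks up the new variables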
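
To confirm the server is reachable from another machine, something like the following Python sketch may help. It assumes Ollama's /api/tags endpoint on the default port 11434; the helper names are hypothetical and the hostname in the usage comment is a placeholder.

```python
import json
import urllib.request


def tags_url(host: str, port: int = 11434) -> str:
    """Build the URL of Ollama's model-listing endpoint."""
    return f"http://{host}:{port}/api/tags"


def model_names(tags_response: dict) -> list[str]:
    """Extract the model names from a parsed /api/tags response."""
    return [model["name"] for model in tags_response.get("models", [])]


def list_remote_models(host: str, timeout: float = 5.0) -> list[str]:
    """Query the server; raises urllib.error.URLError if it is unreachable."""
    with urllib.request.urlopen(tags_url(host), timeout=timeout) as resp:
        return model_names(json.load(resp))


# Usage (needs a running server; replace the placeholder hostname):
#   print(list_remote_models("myserver.local"))
```

If the call times out from another machine but works on the server itself, the bind address or a firewall rule is the likely culprit.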

Serve Ollama

Start the Ollama server by opening a PowerShell terminal and running:

ollama serve

Download a Model

ollama pull qwen3.5:4b

First Ollama Prompt

When you run a prompt, Ollama first loads the model into memory, then passes the input through. Loading takes some time, like a cold start. By default, a loaded model stays in memory for 5 minutes after its last request and is then unloaded; the --keepalive flag changes this. Subsequent requests hit the hot path as long as the model is still loaded.

ollama run qwen3.5:4b "Write a short Python function to calculate the Fibonacci sequence." --verbose --keepalive 5m
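
The same request can be made over HTTP, which is what the harnesses do under the hood. This is a rough sketch, assuming Ollama's /api/generate endpoint on the default port 11434 and its keep_alive field; the function names are my own.

```python
import json
import urllib.request


def generate_payload(model: str, prompt: str, keep_alive: str = "5m") -> dict:
    """Build a non-streaming /api/generate request body."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,           # return one JSON object instead of chunks
        "keep_alive": keep_alive,  # how long the model stays loaded afterwards
    }


def generate(host: str, payload: dict, timeout: float = 300.0) -> dict:
    """POST the payload to the server and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"http://{host}:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Usage (needs a running server):
#   reply = generate("localhost", generate_payload("qwen3.5:4b", "Hello"))
#   print(reply["response"])
```

The reply also carries load_duration (in nanoseconds), so comparing it across two back-to-back calls should show the cold start versus the hot path.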

pi

Install pi CLI

Configure pi to Connect to Ollama Model

nano ~/.pi/agent/models.json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://<PC_HOSTNAME>.local:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "models": [
        { "id": "qwen3.5:4b" }
      ]
    }
  }
}

My modem only leases IP addresses for 12h, so I connect using the hostname. A fixed IPv4 address would also work.
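
A hand-edited models.json is easy to break with a stray or missing brace. The sketch below checks that the file parses and lists the model ids per provider. It assumes only the structure shown above (providers, then models, then id); the function name is hypothetical.

```python
import json


def provider_model_ids(config_text: str) -> dict[str, list[str]]:
    """Map each provider name to the model ids configured for it.

    Raises json.JSONDecodeError when the text is not valid JSON,
    for example when a closing brace is missing.
    """
    config = json.loads(config_text)
    return {
        name: [model["id"] for model in provider.get("models", [])]
        for name, provider in config.get("providers", {}).items()
    }


# Usage:
#   from pathlib import Path
#   path = Path.home() / ".pi" / "agent" / "models.json"
#   print(provider_model_ids(path.read_text(encoding="utf-8")))
```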

Open WebUI

Install Open WebUI

I run Open WebUI in Docker, so Docker Desktop is a prerequisite:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always -e OLLAMA_BASE_URL=http://host.docker.internal:11434 ghcr.io/open-webui/open-webui:main

Configure Open WebUI

Navigating to http://<PC_HOSTNAME>:3000 prompts you to create an admin account. Once created:

  1. Create any additional users under Admin Panel > Users
  2. Add a system prompt to improve model responses under Admin Panel > Settings > Models: click the edit icon next to any or all models and paste the following:

You are a direct, zero-fluff assistant. Jump straight to the answer.

CRITICAL CONSTRAINTS:
- Keep responses strictly under 5 sentences.
- Delete all introductory filler (e.g., "Sure, here is...") and concluding pleasantries.
- Use dense phrasing. If listing items, limit to 5 bullet points unless told otherwise.

Continue VSCode Extension

For inline autocomplete, I use the Continue VSCode extension with the starcoder2:7b model. I remove all the other models and options, because I am only interested in the inline suggestions.

Install and Configure Continue

nano ~/.continue/config.yaml
name: Main Config
version: 1.0.0
schema: v1
models:
  - name: Star Coder 7B
    provider: ollama
    model: starcoder2:7b
    apiBase: http://<PC_HOSTNAME>.local:11434
    roles:
      - autocomplete

Rudimentary Local Model Comparison

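Tokens per Second

I ran the same prompt against three models with --verbose, which prints timing statistics after each response:

PS C:\Users\shauh> ollama run qwen3:8b "Write a short Python function to calculate the Fibonacci sequence." --verbose --keepalive 5m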
Thinking...
Okay, I need to write a Python function that calculates the Fibonacci sequence. Let me think about how to approach this. The Fibonacci sequence starts with 0 and 1, and each subsequent number is
the sum of the previous two. So for example, the sequence goes 0, 1, 1, 2, 3, 5, 8, and so on.

First, I should decide whether the function will return a list of the sequence up to a certain number of terms or up to a specific value. The user didn't specify, so maybe I'll make it return the
sequence up to a given number of terms. Let's say the function takes an integer n, which is the number of elements to generate.

Wait, but sometimes people might want the sequence up to a certain value, not a certain number of terms. Hmm. The question says "calculate the Fibonacci sequence," which is a bit vague. Maybe the
function should generate the sequence up to a certain term count. Let me check standard implementations.

Alternatively, perhaps the function should generate the sequence up to a maximum value. But without more details, maybe the safest approach is to generate the first n terms. Let me go with that.

So, the function could be called something like fibonacci_sequence(n), which returns a list of the first n Fibonacci numbers.

Let me think about the base cases. If n is 0, return an empty list. If n is 1, return [0]. If n is 2, [0, 1]. For n >= 3, each next term is sum of previous two.

So, how to implement this. Let's start with initializing a list. Let's say we start with [0, 1], then for each subsequent term up to n, append the sum of the last two elements.

Wait, but if n is 1, we should return [0], and if n is 2, [0,1]. So maybe the initial list is [0], and then for each step from 2 to n, we add the next number.

Alternatively, here's a way to do it:

def fibonacci_sequence(n):
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    elif n == 2:
        return [0, 1]
    else:
        fib = [0, 1]
        for _ in range(2, n):
            next_num = fib[-1] + fib[-2]
            fib.append(next_num)
        return fib

Wait, but for n=3, the loop runs once (range(2,3) is just 2, so one iteration). Let's see: initial list is [0,1]. Then next_num is 0+1=1, append to get [0,1,1]. Which is correct. For n=4, the loop
runs for 2 and 3? Wait, range(2, n) when n is 4 is 2 and 3, so two iterations. Let me check:

Initial list [0,1]. First iteration (i=2): next_num is 1, append → [0,1,1]. Second iteration (i=3): next_num is 1+1=2, append → [0,1,1,2]. Which is correct for n=4. So that seems right.

But maybe there's a more efficient way. Alternatively, using a loop that runs n-2 times, starting from the third term.

Another approach is to use a loop that builds the sequence step by step. Let me think about edge cases. What if n is 0? The function returns an empty list. That's handled.

But maybe the user expects the function to generate the sequence up to a certain number, not a certain count. For example, generate all Fibonacci numbers less than or equal to a given number. But
since the question isn't specific, perhaps the first approach is better.

Alternatively, maybe the function should generate the sequence up to a certain term, like the nth term. For example, the 5th term is 3 (if starting from 0 as first term). But the user's question
is a bit ambiguous.

But given that the user says "calculate the Fibonacci sequence", which is a sequence, perhaps the function should generate the sequence up to a certain number of terms. So the function takes an
integer n and returns the first n terms.

So the code I wrote earlier should work. Let me test it with some examples.

Test case 1: n=1 → [0]. Correct.

Test case 2: n=2 → [0,1]. Correct.

Test case 3: n=5 → [0,1,1,2,3]. Let's see. The loop runs for 2,3,4 (since range(2,5) is 2,3,4 → 3 iterations). Initial list [0,1]. First iteration (2): append 1 → [0,1,1]. Second iteration (3):
append 2 → [0,1,1,2]. Third iteration (4): append 3 → [0,1,1,2,3]. Correct.

Another test case: n=6 → [0,1,1,2,3,5]. Yes.

So the function seems correct.

But perhaps the user wants the function to return the sequence as a list, and maybe handle n=0. So the code should handle that.

Alternatively, maybe the function should generate the sequence up to a certain value. For example, generate all Fibonacci numbers less than or equal to a given number. But since the question isn't
specific, I'll proceed with the first approach.

So the function is as I outlined. Let me write that in Python.

Wait, but maybe the function should be called fibonacci_sequence and take n as an argument. Let me code that.

Another thing: what if n is zero? The function returns an empty list. That's acceptable.

Another possible edge case: n=0 → return empty list. n=1 → [0], etc.

So the code seems okay. Let me write that as the answer.
...done thinking.

Here's a well-structured Python function that calculates the Fibonacci sequence up to a given number of terms:

```python
def fibonacci_sequence(n):
    """
    Generate the first n terms of the Fibonacci sequence.

    Parameters:
    n (int): The number of terms to generate. Must be a non-negative integer.

    Returns:
    list: A list containing the first n Fibonacci numbers.
    """
    if n <= 0:
        return []
    elif n == 1:
        return [0]
    elif n == 2:
        return [0, 1]
    else:
        fib = [0, 1]
        for _ in range(2, n):
            next_num = fib[-1] + fib[-2]
            fib.append(next_num)
        return fib
```

---

### 🔍 Explanation

- **Function Purpose**: This function generates the first `n` numbers in the Fibonacci sequence, where each number is the sum of the two preceding ones.
- **Edge Cases**:
  - If `n` is 0, it returns an empty list.
  - If `n` is 1, it returns `[0]`.
  - If `n` is 2, it returns `[0, 1]`.
- **Loop Logic**: For `n > 2`, the function starts with `[0, 1]` and iteratively appends the sum of the last two elements until the list has `n` elements.

---

### ✅ Example Usage

```python
print(fibonacci_sequence(5))  # Output: [0, 1, 1, 2, 3]
print(fibonacci_sequence(10)) # Output: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(fibonacci_sequence(0))  # Output: []
```

This implementation is both efficient and readable, handling all edge cases appropriately.

total duration:       3m59.6312421s
load duration:        211.7901ms
prompt eval count:    21 token(s)
prompt eval duration: 293.666ms
prompt eval rate:     71.51 tokens/s
eval count:           1701 token(s)
eval duration:        3m59.046391s
eval rate:            7.12 tokens/s
PS C:\Users\shauh> ollama run qwen3.5:9b-q4_K_M "Write a short Python function to calculate the Fibonacci sequence." --verbose --keepalive 5m
Thinking...
The user wants me to write a simple Python function that calculates the Fibonacci sequence. The Fibonacci sequence is defined as:
- F(0) = 0, F(1) = 1
- F(n) = F(n-1) + F(n-2) for n > 1

I should provide a clean implementation with examples showing how to use it. I'll make sure to handle edge cases like negative numbers and also include comments explaining the code.

Let me create multiple approaches or at least one good approach with explanations.
...done thinking.

Here's a Python function that calculates Fibonacci numbers up to a given index using an iterative method:

```python
def fibonacci(n):
    """
    Calculate the nth Fibonacci number iteratively.

    Parameters:
        n (int): The position in the Fibonacci sequence (0 or positive integer)

    Returns:
        int: The nth Fibonacci number

    Raises:
        ValueError: If n is negative
    """
    if n < 0:
        raise ValueError("Fibonacci index must be non-negative.")

    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b

    return a

# Example usage and demonstration
if __name__ == "__main__":
    print(f"F(0) = {fibonacci(0)}")      # Output: F(0) = 0
    print(f"F(1) = {fibonacci(1)}")      # Output: F(1) = 1
    print(f"F(10) = {fibonacci(10)}")    # Output: F(10) = 55

# Generate Fibonacci sequence up to a given number of terms (optional utility function)
def fibonacci_sequence(n_terms):
    """
    Return the first n_terms numbers in the Fibonacci sequence.

    Parameters:
        n_terms (int): Number of Fibonacci terms to generate

    Returns:
        list: List containing the first n_terms Fibonacci numbers
    """
    if n_terms < 0:
        return []

    seq = [0, 1]
    for i in range(2, min(n_terms + 2, len(seq) or 5)):
        seq.append(seq[-1] + seq[-2])

    return seq[:n_terms]

# Example call to generate the sequence up to a specific length
if __name__ == "__main__":
    print(f"Fibonacci sequence (first 10 terms): {fibonacci_sequence(10)}")
```

total duration:       1m16.0979649s
load duration:        14.5262016s
prompt eval count:    21 token(s)
prompt eval duration: 788.681ms
prompt eval rate:     26.63 tokens/s
eval count:           557 token(s)
eval duration:        1m0.780257s
eval rate:            9.16 tokens/s
PS C:\Users\shauh> ollama run qwen3.5:4b "Write a short Python function to calculate the Fibonacci sequence." --verbose
Thinking...
The user wants me to write a short Python function that calculates the Fibonacci sequence. There are several ways to do this:

1. A simple iterative approach using two variables
4. Using recursion (less efficient but simpler to code)
5. An array-based approach

I'll provide an efficient solution with both single number calculation and potentially show how it can generate multiple values. I should focus on clarity and give a concise, practical example
that handles the base cases correctly.

Let me write a clear function that:
1. Takes parameters (likely n for nth Fibonacci number or length)
2. Has proper input validation if needed
3. Uses an iterative approach for efficiency

I'll provide both versions - one that returns just the nth number, and maybe mention how to get multiple values. I should keep it simple and clean as requested ("short function").
...done thinking.

Here's a short Python function that calculates the Fibonacci sequence efficiently using iteration:

```python
def fibonacci(n):
    """Return the n-th Fibonacci number (1-indexed)"""
    if n <= 0:
        raise ValueError("n must be positive")

    if n == 1 or n == 2:
        return 1

    prev, curr = 1, 1
    for _ in range(3, n + 1):
        next_val = prev + curr
        prev, curr = curr, next_val
    return curr

# Example usage
print(fibonacci(10))  # Output: 55
```

If you prefer a version that returns the Fibonacci sequence up to `n` items instead of just the n-th number:

```python
def fibonacci_sequence(n):
    """Return list containing first n Fibonacci numbers"""
    if n <= 0:
        return []

    result = [1, 2] * (n >= 3)[:-1:] + [0] * max(0, -1)[::-1][:n] if False else []

    # Correct implementation below for clarity
```

Corrected and clearer version to generate the first `n` Fibonacci numbers:

```python
def fibonacci_sequence(n):
    """Return a list of the first n Fibonacci numbers"""
    result = [0, 1][2:] * (max(3,n) > min(len(result), max(3))) if not isinstance(FibonacciSequence := lambda i: sum([j for j in range(i)])) else []

# For simplicity and performance, here is the clean version that's always correct.
```

**Simplified & Correct Version:**

```python
def fibonacci_sequence(n):
    """Return a list containing the first n Fibonacci numbers"""
    result = [0, 1] if n >= 2 else []

    while len(result) < n:
        next_val = result[-1] + result[-2]
        result.append(next_val)

    return result

# Example usage
print(fibonacci_sequence(8))
# Output: [0, 1, 1, 2, 3, 5, 8, 13]
```

**Choose based on your needs:**
- Use `fibonacci(n)` to get the n-th Fibonacci number.
- Use `fibonacci_sequence(n)` if you need all values up to a certain count.

total duration:       11.3346009s
load duration:        394.8037ms
prompt eval count:    21 token(s)
prompt eval duration: 190.824ms
prompt eval rate:     110.05 tokens/s
eval count:           710 token(s)
eval duration:        10.587248s
eval rate:            67.06 tokens/s
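
The eval rate in these statistics is just the eval count divided by the eval duration. A small sketch that recomputes it from the printed values, assuming durations in the form shown above (such as 3m59.046391s or 293.666ms); the function names are my own.

```python
import re

# Seconds per unit for the duration suffixes seen in the verbose output.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3}


def parse_duration(text: str) -> float:
    """Convert a duration such as '3m59.046391s' or '293.666ms' to seconds."""
    # 'ms' is listed before 'm' and 's' so that it is matched as one unit.
    parts = re.findall(r"(\d+(?:\.\d+)?)(ms|h|m|s)", text)
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def tokens_per_second(token_count: int, duration: str) -> float:
    """Tokens generated (or evaluated) per second of wall-clock time."""
    return token_count / parse_duration(duration)


# Eval count and eval duration copied from the three runs above.
runs = {
    "qwen3:8b": (1701, "3m59.046391s"),
    "qwen3.5:9b-q4_K_M": (557, "1m0.780257s"),
    "qwen3.5:4b": (710, "10.587248s"),
}
for name, (count, duration) in runs.items():
    print(f"{name}: {tokens_per_second(count, duration):.2f} tokens/s")
```

Note that total duration is not a fair comparison on its own: qwen3:8b generated 1701 tokens, most of them thinking, while the other two generated 557 and 710.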

CPU/GPU Split

The PROCESSOR column of ollama ps shows how each loaded model is split between system RAM (CPU) and VRAM (GPU). 100% GPU is ideal.

PS C:\Users\shauh> ollama ps
NAME        ID              SIZE      PROCESSOR          CONTEXT    UNTIL
qwen3:8b    500a1f067a9f    7.8 GB    46%/54% CPU/GPU    16384      4 minutes from now
PS C:\Users\shauh> ollama ps
NAME                 ID              SIZE      PROCESSOR          CONTEXT    UNTIL
qwen3.5:9b-q4_K_M    6488c96fa5fa    6.7 GB    36%/64% CPU/GPU    16384      4 minutes from now
PS C:\Users\shauh> ollama ps
NAME             ID              SIZE      PROCESSOR          CONTEXT    UNTIL
starcoder2:7b    1550ab21b10d    4.9 GB    12%/88% CPU/GPU    8192       24 minutes from now
PS C:\Users\shauh> ollama ps
NAME          ID              SIZE      PROCESSOR    CONTEXT    UNTIL
qwen3.5:4b    2a654d98e6fb    3.6 GB    100% GPU     16384      4 minutes from now
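
To put the split next to the measured speeds, the PROCESSOR value can be turned into a number. This sketch assumes the column only takes the forms seen above ("46%/54% CPU/GPU", "100% GPU") plus "100% CPU"; the function name is hypothetical.

```python
import re


def gpu_fraction(processor: str) -> float:
    """Return the share of the model held on the GPU, from 0.0 to 1.0."""
    value = processor.strip()
    split = re.fullmatch(r"(\d+)%/(\d+)% CPU/GPU", value)
    if split:
        # The second percentage is the GPU share.
        return int(split.group(2)) / 100
    single = re.fullmatch(r"(\d+)% (CPU|GPU)", value)
    if single:
        share = int(single.group(1)) / 100
        return share if single.group(2) == "GPU" else 1.0 - share
    raise ValueError(f"unrecognized PROCESSOR value: {processor!r}")


# PROCESSOR values from the ollama ps output above, paired with the
# eval rates measured earlier (no rate was measured for starcoder2:7b).
observed = [
    ("qwen3:8b", "46%/54% CPU/GPU", 7.12),
    ("qwen3.5:9b-q4_K_M", "36%/64% CPU/GPU", 9.16),
    ("qwen3.5:4b", "100% GPU", 67.06),
]
for name, processor, rate in observed:
    print(f"{name}: {gpu_fraction(processor):.0%} on GPU, {rate} tokens/s")
```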

Miscellaneous Notes

  • Ollama's default model tags typically point to Q4 (4-bit quantized) builds (e.g. qwen3.5:9b-q4_K_M == qwen3.5:9b)
  • If the loaded model plus its context does not fit on the GPU, inference speed craters: the two models that spilled into system RAM ran at 7 to 9 tokens/s, versus 67 tokens/s for the one that fit entirely on the GPU
  • Not all models are equally capable: some can make tool calls, others cannot
  • qwen3.5:4b fits entirely on my GPU, so responses are fast, and it can analyze images
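
For image analysis outside a chat UI, my understanding is that Ollama's /api/generate endpoint accepts base64-encoded images in an images list for vision-capable models. A rough sketch under that assumption, using the default port 11434; the helper names and the example prompt are my own.

```python
import base64
import json
import urllib.request


def image_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a non-streaming /api/generate body with one attached image."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are sent as base64 strings, not raw bytes.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(host: str, path: str, model: str = "qwen3.5:4b") -> str:
    """Send an image file to the server and return the model's description."""
    with open(path, "rb") as f:
        payload = image_payload(model, "Describe this image.", f.read())
    req = urllib.request.Request(
        f"http://{host}:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]


# Usage (needs a running server and a local image file):
#   print(describe_image("localhost", "photo.jpg"))
```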


Nice setup. I am interested to see how everything works out for you with this setup.

Will you be posting your progress and pain points here?

Also, quick question. Why pi? Why not OpenCode or another harness?

Hopefully. As of today, I have only got the setup working well enough for my wife to use. Once we have had time to test it out, I will probably tweak the setup and then update here.

I have tried Gemini, Claude, OpenCode, Antigravity, and pi. For local model setup (connection), pi was the easiest, and it is the most consistent CLI I have used. It does not come with very much included: you need to add extensions/plugins to get it anywhere near the capability of something like Claude. Because I am using it with relatively basic models, I have not even set pi up with agents, plan modes, or MCP connections.
