Intella Assist: Using LiteLLM as an OpenAI-Compatible Proxy

Disclaimer

This guide is provided for demonstration purposes only. It is not production-ready: no security hardening, no service management, and no monitoring are included. If you plan to use this in production, you must review and implement proper security, performance, and reliability practices yourself.


What is LiteLLM?

LiteLLM is a lightweight proxy that exposes a uniform, OpenAI-compatible API for a wide range of Large Language Models (LLMs).

It allows you to:

  • Add features missing from some providers' OpenAI compatibility layers (e.g., response_format, retries, caching, logging).

  • Simplify integrations: change models without changing client code.

In short: LiteLLM is a universal translator between Intella Assist (which speaks “OpenAI API”) and the many different LLM providers available.


Why use LiteLLM with Intella Assist?

Intella Assist is designed around the OpenAI API. While many providers offer partial compatibility, differences often cause issues.


Example:

Anthropic Claude supports an OpenAI-style API but only partially. It does not support response_format, which Intella Assist requires for structured outputs.


👉 Proxying through LiteLLM fixes this — Claude then behaves as if it fully supports the OpenAI API.


Other benefits of LiteLLM:

  • Centralized logging & monitoring.

  • Support for rate-limiting, retries, and caching.

  • Ability to switch providers with a single config change (see the sketch below).
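
For example, switching providers can be handled in a proxy config file instead of in client code. The sketch below is illustrative only: the file path, the model alias, and the exact schema are assumptions and should be checked against the LiteLLM documentation.

# Illustrative sketch: write a minimal LiteLLM proxy config.
# The path ~/litellm-config.yaml and the alias "claude-sonnet" are assumptions.
cat > ~/litellm-config.yaml <<'EOF'
model_list:
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-7-sonnet-latest
      api_key: os.environ/ANTHROPIC_API_KEY
  # Switching providers is a one-line change to the "model" value, e.g.:
  # model: openai/gpt-4o
EOF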


Installing LiteLLM on Ubuntu

  1. Update your system

    sudo apt update && sudo apt upgrade -y
  2. Install Python and pip

    sudo apt install python3 python3-pip -y
  3. Create a virtual environment (recommended)

    python3 -m venv ~/litellm-env
    source ~/litellm-env/bin/activate
  4. Install LiteLLM with proxy support

    pip install "litellm[proxy]"
  5. Set provider API keys (example for Anthropic)

    export ANTHROPIC_API_KEY="YOUR_API_KEY"

    (Add to ~/.bashrc to make permanent.)

  6. Run LiteLLM with your chosen model
    Example: Anthropic Claude 3.7 Sonnet

    litellm --model anthropic/claude-3-7-sonnet-latest

    By default, LiteLLM runs at:

    http://0.0.0.0:4000/v1
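
If you want the API key to persist across sessions and prefer starting the proxy from a config file (such as the sketch shown earlier), something like the following should work; the --config, --host, and --port flags should be verified against litellm --help for your installed version.

# Persist the provider key and start the proxy on an explicit host/port.
# The config path and port are assumptions; adjust to your environment.
echo 'export ANTHROPIC_API_KEY="YOUR_API_KEY"' >> ~/.bashrc
source ~/.bashrc
litellm --config ~/litellm-config.yaml --host 0.0.0.0 --port 4000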

Configuring Intella Assist

  1. In Intella Connect/Investigator, go to:
    Admin Dashboard → Settings → Intella Assist

  2. Enter your LiteLLM endpoint:

    http://<your-server-ip>:4000/v1
  3. Provide an API key (any non-empty string, e.g. test-key).

    In this basic setup, LiteLLM itself does not enforce API keys, but Intella requires one.

  4. Click Test integration to confirm.
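
Before clicking Test integration, it can help to confirm that the endpoint is reachable from the Intella Connect/Investigator server. A quick check against LiteLLM's OpenAI-compatible model listing (the /v1/models path is assumed to be enabled in your setup):

# Run from the Intella server; replace <your-server-ip> with the LiteLLM host.
curl http://<your-server-ip>:4000/v1/models \
  -H "Authorization: Bearer test-key"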


Example Request

You can test LiteLLM directly:

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy-key" \
  -d '{
    "model": "anthropic/claude-3-7-sonnet-latest",
    "messages": [{"role": "user", "content": "Hello from Intella Assist!"}],
    "response_format": {"type": "json_object"}
  }'

LiteLLM forwards the request to Claude, applies response_format, and returns structured JSON — exactly what Intella Assist expects.
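
If you only want to see the model's JSON answer rather than the full completion object, the same request can be piped through jq (assuming jq is installed):

curl -s http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy-key" \
  -d '{
    "model": "anthropic/claude-3-7-sonnet-latest",
    "messages": [{"role": "user", "content": "Hello from Intella Assist!"}],
    "response_format": {"type": "json_object"}
  }' | jq -r '.choices[0].message.content'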


Summary

  • LiteLLM is a general-purpose OpenAI-compatible proxy.

  • In our example, Anthropic Claude works with response_format only when proxied through LiteLLM.

✅ You now have a conceptual overview of LiteLLM and a working Ubuntu example with Claude.