Merged

212 changes: 12 additions & 200 deletions src/content/docs/demos/04-llm-model.mdx
@@ -12,11 +12,10 @@ This guide demonstrates how to deploy a Large Language Model (LLM) in your Otter
Ensure you have the following:

- **Python 3.8+**: For running the test scripts
- **OpenAI API Key**: Obtain from [OpenAI Platform](https://platform.openai.com/api-keys)
- **Python Libraries**: `requests` and `openai`
- **Python Libraries**: `requests`

```bash
pip install requests openai
pip install requests
```

## Deploy LLM Model
@@ -60,45 +59,39 @@ The model deployment may take several minutes depending on the model size and av
Once your LLM model is deployed and ready, you can test it using Python with OpenAI API integration.

<Aside title="Prerequisites">
Ensure you have Python and required libraries installed:
Ensure you have Python and the `requests` library installed:

```bash
pip install requests openai
pip install requests
```

Obtain your OpenAI API key from [OpenAI Platform](https://platform.openai.com/api-keys).
</Aside>

### Connection Information

Before running the test scripts, you'll need:
- **OpenAI API Key**: Your API key from OpenAI
- **Model Name**: The model you created (e.g., `llm-demo`)
- **API Base URL**: Optional, if using a custom endpoint

<Tabs>

Before running the test scripts, you'll need to find the following information from the `<url>/scope/<scope-name>/models/llm` page:

Copilot AI Jan 7, 2026

The documentation mentions finding information from the page <url>/scope/<scope-name>/models/llm but doesn't explain what <url> and <scope-name> should be replaced with, or where users can find these values. This could leave users uncertain about where to navigate. Consider providing more context about what these placeholders represent or linking to documentation about URL structure and scope names.

Suggested change
Before running the test scripts, you'll need to find the following information from the `<url>/scope/<scope-name>/models/llm` page:
Before running the test scripts, you'll need to find the following information from the Models page in your OtterScale cluster. You can open it by going to your cluster URL (for example, `https://your-cluster.example.com`), then navigating to the appropriate scope (namespace) and opening the **Models** page for your LLM, which corresponds to the `<url>/scope/<scope-name>/models/llm` path (where `<url>` is your cluster URL and `<scope-name>` is the name of the scope/namespace you selected):

- **Service URL**: The URL information from the Service card
- **Name**: The `name` field in the model table
- **Model Name**: The `Model Name` field in the model table
Comment on lines +71 to +74

Copilot AI Jan 7, 2026

The naming convention for these configuration fields is confusing and inconsistent with the code example below. The documentation mentions three fields: "Service URL", "Name", and "Model Name", but the code example uses variables named SERVICE_URL, NAME, and MODEL_NAME. The relationship between these is unclear:

- "Name" maps to the NAME variable which is used in the "OtterScale-Model-Name" header
- "Model Name" maps to the MODEL_NAME variable which is used as the "model" in the payload

This creates ambiguity about which UI field corresponds to which variable. Consider using clearer naming that explicitly distinguishes between the model identifier used in the header versus the model identifier used in the API payload, or provide more explicit mapping between the UI fields and code variables.


<TabItem label="Simple Question">

```python
import requests
import json

# Configuration
SERVICE_URL = "<your_service_url>" # e.g., http://localhost:8000
MODEL_NAME = "<your_model_name>" # e.g., llm-demo
MODEL_ID = "RedHatAI/Llama-3.2-1B-Instruct-FP8"
NAME = "<your_name>" # e.g., llm-demo
MODEL_NAME = "<your_model_name>"

Severity: medium

To improve clarity for the user, consider adding an example value for MODEL_NAME, similar to how it's done for SERVICE_URL and NAME. Based on the text earlier in this document, a good example would be meta-llama/Llama-2-7b-chat.

MODEL_NAME = "<your_model_name>"    # e.g., meta-llama/Llama-2-7b-chat

Comment on lines +83 to +84

Copilot AI Jan 7, 2026

The variable naming and usage here is confusing. Based on the Connection Information section, there are two separate fields from the UI: "Name" and "Model Name". However, the variable names NAME and MODEL_NAME don't clearly indicate their purposes:

- NAME is used in the "OtterScale-Model-Name" header
- MODEL_NAME is used as the "model" field in the payload

Consider renaming these variables to be more descriptive of their purposes, such as OTTERSCALE_MODEL_NAME and MODEL_ID, or add inline comments explaining what each represents to help users correctly map the UI fields to the code variables.

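To make the mapping concrete, here is a minimal sketch of the configuration with the clearer names the reviewer suggests; the variable names `OTTERSCALE_MODEL_NAME` and `MODEL_ID` and the inline comments are illustrative additions, not part of the PR, while the endpoint, header, and payload fields mirror the example in this diff:

```python
import requests

# Values from the <url>/scope/<scope-name>/models/llm page (placeholders):
SERVICE_URL = "<your_service_url>"       # "Service URL" shown on the Service card
OTTERSCALE_MODEL_NAME = "<your_name>"    # "Name" field in the model table; sent in the request header
MODEL_ID = "<your_model_name>"           # "Model Name" field in the model table; sent in the request payload

def ask_question(question):
    """Send a prompt, making explicit which UI field feeds which part of the request."""
    headers = {
        "OtterScale-Model-Name": OTTERSCALE_MODEL_NAME,  # identifies the deployed model by its Name
        "Content-Type": "application/json",
    }
    payload = {
        "model": MODEL_ID,  # the model identifier from the Model Name field
        "prompt": question,
    }
    response = requests.post(f"{SERVICE_URL}/v1/chat", headers=headers, json=payload, timeout=30)
    response.raise_for_status()
    return response.json().get("response", response.text)
```

Behavior is unchanged from the example in the diff; only the names and comments differ.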

def ask_question(question):
"""Send a simple question to the LLM and get a response."""
headers = {
"OtterScale-Model-Name": MODEL_NAME,
"OtterScale-Model-Name": NAME,
"Content-Type": "application/json"
}

payload = {
"model": MODEL_ID,
"model": MODEL_NAME,
"prompt": question
}

@@ -120,184 +113,3 @@ answer = ask_question(question)
print(f"Q: {question}")
print(f"A: {answer}")
```

</TabItem>

<TabItem label="Conversation">

```python
import requests
import json

# Configuration
SERVICE_URL = "<your_service_url>" # e.g., http://localhost:8000
MODEL_NAME = "<your_model_name>" # e.g., llm-demo
MODEL_ID = "RedHatAI/Llama-3.2-1B-Instruct-FP8"

class LLMChat:
def __init__(self, service_url, model_name, model_id):
self.service_url = service_url
self.model_name = model_name
self.model_id = model_id
self.conversation = []

def add_message(self, role, content):
"""Add a message to the conversation history."""
self.conversation.append({"role": role, "content": content})

def send_message(self, user_message):
"""Send a user message and get a response."""
self.add_message("user", user_message)

headers = {
"OtterScale-Model-Name": self.model_name,
"Content-Type": "application/json"
}

# Build context from conversation history
context = "\n".join([f"{msg['role']}: {msg['content']}" for msg in self.conversation])

payload = {
"model": self.model_id,
"prompt": context
}

try:
response = requests.post(
f"{self.service_url}/v1/chat",
headers=headers,
json=payload
)
response.raise_for_status()
result = response.json()
assistant_response = result.get("response", str(result))
self.add_message("assistant", assistant_response)
return assistant_response
except Exception as e:
return f"✗ Error: {str(e)}"

def clear_history(self):
"""Clear conversation history."""
self.conversation = []

# Test
chat = LLMChat(service_url=SERVICE_URL, model_name=MODEL_NAME, model_id=MODEL_ID)

# First message
response1 = chat.send_message("What are the three main colors of the French flag?")
print(f"Q: What are the three main colors of the French flag?")
print(f"A: {response1}\n")

# Follow-up message (maintains context)
response2 = chat.send_message("Which one represents liberty?")
print(f"Q: Which one represents liberty?")
print(f"A: {response2}\n")
```

</TabItem>

<TabItem label="Complete Example">

```python
import requests
import json

# Configuration
SERVICE_URL = "<your_service_url>" # e.g., http://localhost:8000
MODEL_NAME = "<your_model_name>" # e.g., llm-demo
MODEL_ID = "RedHatAI/Llama-3.2-1B-Instruct-FP8"

class LLMDemo:
def __init__(self, service_url, model_name, model_id):
self.service_url = service_url
self.model_name = model_name
self.model_id = model_id
self.conversation = []

def test_connection(self):
"""Test if the service is accessible."""
try:
headers = {
"OtterScale-Model-Name": self.model_name,
"Content-Type": "application/json"
}
payload = {
"model": self.model_id,
"prompt": "Hello"
}
response = requests.post(
f"{self.service_url}/v1/chat",
headers=headers,
json=payload,
timeout=10
)
response.raise_for_status()
print("✓ Connection successful!")
return True
except Exception as e:
print(f"✗ Connection failed: {str(e)}")
return False

def send_message(self, user_message):
"""Send a message and get a response."""
self.conversation.append({"role": "user", "content": user_message})

headers = {
"OtterScale-Model-Name": self.model_name,
"Content-Type": "application/json"
}

# Build context from conversation history
context = "\n".join([f"{msg['role']}: {msg['content']}" for msg in self.conversation])

payload = {
"model": self.model_id,
"prompt": context
}

try:
response = requests.post(
f"{self.service_url}/v1/chat",
headers=headers,
json=payload
)
response.raise_for_status()
result = response.json()
assistant_response = result.get("response", str(result))
self.conversation.append({"role": "assistant", "content": assistant_response})
return assistant_response
except Exception as e:
return f"✗ Error: {str(e)}"

def clear_history(self):
"""Clear conversation history."""
self.conversation = []

# Usage
if __name__ == "__main__":
demo = LLMDemo(service_url=SERVICE_URL, model_name=MODEL_NAME, model_id=MODEL_ID)

# Test connection
demo.test_connection()

# Start conversation
response1 = demo.send_message("Tell me about artificial intelligence in 2 sentences.")
print(f"Q: Tell me about artificial intelligence in 2 sentences.")
print(f"A: {response1}\n")

# Follow-up questions
response2 = demo.send_message("What are the main applications?")
print(f"Q: What are the main applications?")
print(f"A: {response2}\n")

response3 = demo.send_message("How does machine learning fit into this?")
print(f"Q: How does machine learning fit into this?")
print(f"A: {response3}")
```

</TabItem>
</Tabs>

<Aside type="note">
Replace `<your_openai_api_key>` with your actual OpenAI API key and `<your_model_name>` with your deployed model name. The examples use the OpenAI API format, which is compatible with many LLM services.
</Aside>