Conversation

@danenania
Contributor

Implements a get_user tool that allows the AI to retrieve authenticated user information during chat conversations. This enables testing of tool-calling behavior and user context handling.

  • Add TOOLS array with get_user function definition
  • Implement executeToolCall function to handle tool execution
  • Add tool call loop with max 10 iterations
  • Return user JWT claims (userId, role, issuer, timestamps, scope)
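A rough sketch of how these pieces could fit together (the litellm client, the declared signatures, and the message shapes below are illustrative assumptions following the OpenAI-style tool-calling convention, not the exact implementation in src/routes/chat.ts):

// Sketch only: TOOLS, executeToolCall, and the 10-iteration loop as described
// above; the LiteLLM client and message shapes are assumed, not verbatim.
declare const litellm: any; // assumed LiteLLM-style chat client
declare function executeToolCall(name: string, req: any): string; // from this PR

const TOOLS = [
  {
    type: 'function',
    function: {
      name: 'get_user',
      description: 'Return JWT claims for the authenticated user',
      parameters: { type: 'object', properties: {} }
    }
  }
];

const MAX_TOOL_ITERATIONS = 10;

async function chatWithTools(messages: any[], req: any) {
  for (let i = 0; i < MAX_TOOL_ITERATIONS; i++) {
    // Assumed LiteLLM-compatible chat completion call
    const response = await litellm.completion({ messages, tools: TOOLS });
    const message = response.choices[0].message;
    if (!message.tool_calls?.length) {
      return message; // model produced a final answer
    }
    // Append the assistant turn, then one tool-result message per call
    messages.push(message);
    for (const call of message.tool_calls) {
      messages.push({
        role: 'tool',
        tool_call_id: call.id,
        content: executeToolCall(call.function.name, req)
      });
    }
  }
  throw new Error('Exceeded maximum tool-call iterations');
}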


@promptfoo-scanner (bot) left a comment


I reviewed the new get_user tool implementation for LLM function calling. The tool enables personalized chat by providing user context to the LLM. I found one medium-severity issue related to PII exposure to the external LLM service.

Minimum severity threshold for this scan: Medium

Comment on lines +28 to +36
return JSON.stringify({
  userId: req.user.sub,
  role: req.user.role || 'unknown',
  issuer: req.user.iss,
  audience: req.user.aud,
  issuedAt: req.user.iat ? new Date(req.user.iat * 1000).toISOString() : null,
  expiresAt: req.user.exp ? new Date(req.user.exp * 1000).toISOString() : null,
  scope: req.user.scope || null
});


🟡 Medium

The get_user tool sends user PII (user ID, role, token issuer, audience, timestamps, and scope) to the external LiteLLM service. While this appears to be a single-user context where users access only their own data, the token metadata fields (issuer, audience, issuedAt, expiresAt, scope) may be unnecessary for typical chat personalization and increase the PII footprint sent to the LLM provider.

💡 Suggested Fix

Apply data minimization by limiting the tool to essential fields only:

case 'get_user':
  if (req.user) {
    return JSON.stringify({
      userId: req.user.sub,
      role: req.user.role || 'unknown',
      // Removed: issuer, audience, issuedAt, expiresAt, scope
      // Token metadata typically not needed for chat personalization
    });
  } else {
    return JSON.stringify({
      error: 'No authenticated user',
      message: 'This endpoint requires authentication to retrieve user information'
    });
  }

Additionally, ensure your LiteLLM deployment has appropriate data protection agreements in place, and consider updating your privacy policy to disclose that user data may be sent to the LLM provider during chat interactions.

🤖 AI Agent Prompt

The get_user tool at src/routes/chat.ts:28-36 exposes user PII including JWT token metadata (issuer, audience, timestamps, scope) to an external LiteLLM service. Investigate whether all these fields are necessary for the chat personalization use case. Check if there are any privacy policy or user consent mechanisms that cover this data sharing. Consider whether data minimization would be appropriate - typically only userId and role are needed for personalization. Also investigate the LiteLLM deployment model (self-hosted vs. cloud) and whether data processing agreements are in place. If this is a self-hosted deployment, the risk may be lower. Look for any application-wide patterns for handling PII in LLM contexts.


@promptfoo-scanner (bot) left a comment


I reviewed this PR which adds a new get_user tool for AI function calling. The tool allows the LLM to retrieve information about the authenticated user. I found one security concern related to sending user authentication data (PII) to external LLM providers.

Minimum severity threshold for this scan: Medium

Comment on lines +28 to +36
return JSON.stringify({
  userId: req.user.sub,
  role: req.user.role || 'unknown',
  issuer: req.user.iss,
  audience: req.user.aud,
  issuedAt: req.user.iat ? new Date(req.user.iat * 1000).toISOString() : null,
  expiresAt: req.user.exp ? new Date(req.user.exp * 1000).toISOString() : null,
  scope: req.user.scope || null
});


🟡 Medium

The get_user tool sends user authentication data (user ID, role, JWT claims) to external LLM providers via LiteLLM. When the LLM calls this tool, the results are added to the message history and forwarded to third-party APIs (OpenAI, Anthropic, etc.). This exposes user identifiers and authentication metadata to external parties, which could raise compliance concerns under GDPR or CCPA if users haven't consented to this data sharing.
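For concreteness, the tool result that lands in the forwarded message history looks roughly like this (OpenAI-style tool message shape; all field values are illustrative examples, not real data):

// Illustrative tool-result message appended to the chat history; the entire
// `content` string, including the user identifier, is sent to the provider.
const toolResult = {
  role: 'tool',
  tool_call_id: 'call_abc123', // id echoed from the model's tool call (example value)
  content: JSON.stringify({
    userId: 'auth0|abc123',            // req.user.sub - a direct user identifier
    role: 'admin',
    issuer: 'https://id.example.com/',
    audience: 'api://example',
    issuedAt: '2025-11-20T10:00:00.000Z',
    expiresAt: '2025-11-20T11:00:00.000Z',
    scope: 'openid profile email'
  })
};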

💡 Suggested Fix

Consider anonymizing user data before sending it to the LLM. Instead of real user IDs, use session-scoped hashes and generalized role categories:

case 'get_user':
  if (req.user) {
    return JSON.stringify({
      // Use session-scoped anonymous ID instead of actual user ID
      sessionId: generateSessionHash(req.user.sub, req.sessionID),
      // Generalize role to broader categories
      roleCategory: categorizeRole(req.user.role),
      authenticated: true
    });
  } else {
    return JSON.stringify({
      error: 'No authenticated user',
      message: 'This endpoint requires authentication to retrieve user information'
    });
  }

// Helper functions
import { createHash } from 'crypto';

function generateSessionHash(userId: string, sessionId: string): string {
  return createHash('sha256')
    .update(`${userId}-${sessionId}`)
    .digest('hex')
    .substring(0, 16);
}

function categorizeRole(role: string | undefined): string {
  const adminRoles = ['admin', 'superadmin', 'owner'];
  const powerRoles = ['moderator', 'editor', 'contributor'];
  if (!role) return 'user';
  if (adminRoles.includes(role.toLowerCase())) return 'admin';
  if (powerRoles.includes(role.toLowerCase())) return 'power-user';
  return 'user';
}

Alternatively, if user context isn't essential for the chat functionality, consider removing the tool entirely to avoid any PII exposure.

🤖 AI Agent Prompt

The code at src/routes/chat.ts:28-36 sends user authentication data (user ID from req.user.sub, role, JWT issuer, audience, and timestamps) to external LLM providers. This happens when the LLM calls the get_user tool - the tool results are added to messages at lines 224-228 and then sent to LiteLLM at line 186, which forwards to external APIs.

Your task is to investigate and implement a privacy-preserving solution:

  1. First, determine if the user context provided by get_user is actually necessary for the application's chat functionality. Review how the LLM uses this information and whether it's essential for the user experience.

  2. If user context IS needed, implement anonymization:

    • Replace actual user IDs with session-scoped hashes that can't be used to identify users across sessions
    • Generalize sensitive fields like roles into broader categories
    • Remove JWT metadata that isn't needed by the LLM (issuer, audience, timestamps)
    • Consider what minimal context the LLM actually needs

  3. If user context is NOT needed, remove the tool entirely to eliminate PII exposure.

  4. Check if there are existing privacy utilities in the codebase that could be leveraged or extended for this purpose.

  5. Consider the application's architecture: Is LiteLLM configured to use self-hosted models or external cloud providers? If using cloud providers, check if data processing agreements are in place that cover this use case.

The goal is to minimize PII exposure to external parties while maintaining the functionality that users need.


@promptfoo-scanner (bot) left a comment


This PR adds a get_user tool for AI function calling that allows the LLM to retrieve authenticated user information. I reviewed the LLM security implications of this change and found one privacy consideration worth addressing.

Minimum severity threshold for this scan: Medium

Comment on lines +28 to +36
return JSON.stringify({
  userId: req.user.sub,
  role: req.user.role || 'unknown',
  issuer: req.user.iss,
  audience: req.user.aud,
  issuedAt: req.user.iat ? new Date(req.user.iat * 1000).toISOString() : null,
  expiresAt: req.user.exp ? new Date(req.user.exp * 1000).toISOString() : null,
  scope: req.user.scope || null
});


🟡 Medium

The get_user tool sends user identifiers (including the user ID from req.user.sub) to the external LLM provider. While this is in a same-user context (each user only sees their own data), sending PII to a third-party LLM service creates privacy exposure depending on the provider's data handling practices, logging, and whether they have data protection agreements in place.

💡 Suggested Fix

Consider minimizing the data sent to the LLM. If per-user personalization isn't required, send only the role instead of the user ID:

case 'get_user':
  if (req.user) {
    return JSON.stringify({
      role: req.user.role || 'unknown',
      authenticated: true
    });
  } else {
    return JSON.stringify({
      error: 'No authenticated user',
      message: 'This endpoint requires authentication to retrieve user information'
    });
  }

Alternatively, if you need per-user context, hash the user ID to reduce PII exposure:

import { createHash } from 'crypto';

function hashUserId(userId: string): string {
  return createHash('sha256').update(userId).digest('hex').substring(0, 16);
}

// In the get_user case:
return JSON.stringify({
  userId: hashUserId(req.user.sub),
  role: req.user.role || 'unknown',
  // ... other fields
});

🤖 AI Agent Prompt

The code at src/routes/chat.ts:28-36 sends user PII (specifically the user ID from JWT claims) to an external LLM provider via the get_user tool. This creates privacy exposure as the data is sent to a third party.

Investigate the following to determine the appropriate fix:

  1. Check if user ID is actually needed: Review how the LLM responses use the user context. Look at system prompts (referenced by getSystemPrompt imported at line 3) and any existing chat functionality to understand if the LLM needs the actual user ID for personalization, or if just knowing the user's role would be sufficient.

  2. Assess current usage patterns: Search for other places in the codebase where user context is handled. Look for patterns around privacy, consent mechanisms, or data minimization principles that might already be established.

  3. Evaluate LLM provider configuration: Check environment configuration and documentation to understand which LLM provider is being used (via LiteLLM), whether data protection agreements exist, and what the data retention/logging policies are.

  4. Determine fix approach:

    • If user ID isn't needed for functionality, remove it and send only role/authentication status
    • If per-user personalization is required, implement hashing/anonymization of the user ID
    • If raw user ID is truly necessary, add configuration controls to disable the feature in environments without appropriate privacy safeguards
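For that last option, a minimal sketch of such a gate (the ALLOW_USER_PII_TO_LLM flag and the fallback response shape are hypothetical, not part of this PR; the bare case is meant to slot into the executeToolCall switch shown in the suggested fixes above):

// Hypothetical config gate: expose raw claims only in environments where
// privacy safeguards (DPAs, self-hosted models) are confirmed to exist.
const allowPiiToLlm = process.env.ALLOW_USER_PII_TO_LLM === 'true';

case 'get_user':
  if (!req.user) {
    return JSON.stringify({ error: 'No authenticated user' });
  }
  if (!allowPiiToLlm) {
    // Minimal, non-identifying context for unsafeguarded environments
    return JSON.stringify({ role: req.user.role || 'unknown', authenticated: true });
  }
  return JSON.stringify({ userId: req.user.sub, role: req.user.role || 'unknown' });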

Start by examining the system prompt and chat context to understand what information the LLM actually needs to provide useful responses.

@danenania closed this Nov 20, 2025