feat: guest screening risk assessment #15
base: main
Conversation
Adds an LLM-generated risk assessment to the vacation rental app. This helps owners screen potential guests more easily. It's included in the email message sent to the owner when a guest makes an inquiry about a rental property, providing helpful context to the owner. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This PR introduces LLM-based guest screening functionality that sends complete guest application records to external LLM providers. The implementation includes highly sensitive PII (SSNs, bank account numbers, credit scores, criminal records) that should never be sent to LLMs. Four security vulnerabilities were identified ranging from medium to critical severity, primarily focused on excessive PII exposure and lack of application-level data filtering.
Minimum severity threshold for this scan: 🟡 Medium
GUEST APPLICATION DATA:
${JSON.stringify(guestApplication, null, 2)}
🔴 Critical
The entire guest application object is serialized and sent to external LLM providers, including highly sensitive PII that should never be sent to LLMs: Social Security Numbers (SSN field), bank account numbers (securityDepositAccountNumber), bank routing numbers (securityDepositRoutingNumber), driver's license numbers, credit scores, detailed criminal records, and date of birth. The guest application data file contains actual SSNs in format "542-88-7721" and full bank account numbers like "8847293012". This violates fundamental security principles as these data types should not be transmitted to third-party LLM services regardless of context. The LLM evaluation prompt focuses on booking history, platform reviews, and communication quality—none of which require SSNs or financial account numbers.
💡 Suggested Fix
Create a filtering function that sends only evaluation-relevant fields to the LLM:
function filterGuestApplicationForLLM(guestApplication: GuestApplication) {
  return {
    // Platform profile
    platformId: guestApplication.platformId,
    memberSince: guestApplication.memberSince,
    verified: guestApplication.verified,
    verifiedId: guestApplication.verifiedId,
    superguest: guestApplication.superguest,
    // Booking history
    totalReviews: guestApplication.totalReviews,
    averageRating: guestApplication.averageRating,
    responseRate: guestApplication.responseRate,
    previousBookings: guestApplication.previousBookings,
    cancellationRate: guestApplication.cancellationRate,
    // Current booking details
    checkIn: guestApplication.checkIn,
    checkOut: guestApplication.checkOut,
    numberOfGuests: guestApplication.numberOfGuests,
    bookingPurpose: guestApplication.bookingPurpose,
    guestMessage: guestApplication.guestMessage,
    houseRulesAcknowledged: guestApplication.houseRulesAcknowledged,
    reviews: guestApplication.reviews,
    // High-level verification status only (no sensitive details)
    idVerificationStatus: guestApplication.idVerificationStatus,
    backgroundCheckStatus: guestApplication.backgroundCheckStatus,
  };
}

// Then use: JSON.stringify(filterGuestApplicationForLLM(guestApplication), null, 2)

This removes SSN, bank accounts, credit details, criminal records, addresses, and other sensitive fields while preserving all data needed for risk assessment.
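For context, a minimal sketch of how the call site in src/routes/rental.ts could look after the change (the userPrompt variable name is an assumption, not the file's actual code):

// Sketch: only the filtered object is serialized into the prompt sent to the LLM provider.
const userPrompt = `
GUEST APPLICATION DATA:
${JSON.stringify(filterGuestApplicationForLLM(guestApplication), null, 2)}
`;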
🤖 AI Agent Prompt
The code at src/routes/rental.ts:92-93 serializes the complete guest application object and sends it to an external LLM provider. This object contains highly sensitive PII including SSNs, bank account numbers, credit scores, and criminal records (see fields defined in src/types/rental.ts and sample data in src/data/guest-applications.json).
The root issue is that the entire GuestApplication object is passed to JSON.stringify() without filtering. You need to implement application-level data filtering before the LLM call.
Key investigation steps:
- Review src/types/rental.ts to identify all fields in the GuestApplication interface
- Analyze the evaluation prompt (lines 40-113 in this file) to determine which fields are actually needed for the risk assessment
- Create a filtering function that extracts only the necessary fields (booking history, platform reviews, verification status) and excludes sensitive PII (SSN, bank accounts, credit details, criminal records, full addresses, income, etc.)
- Apply the filtering function before JSON.stringify() at line 93
The evaluation prompt focuses on communication quality, booking profile, and trust indicators—it does not require financial account numbers, SSNs, or detailed criminal records. A filtered object with ~25 fields (out of 100+) will provide everything the LLM needs for assessment.
Consider whether the application should even be collecting/storing some of these highly sensitive fields (SSN, bank accounts) if they're not needed for the core functionality.
@@ -1,2 +1,2 @@
- You are a vacation rental assistant. Help users find and book vacation rentals, answer questions about properties, and provide travel recommendations. Protect user privacy and never share personal booking information or payment details.
+ You are a vacation rental assistant. Help users find and book vacation rentals, answer questions about properties, provide travel recommendations, and assist property managers with evaluating guest applications for their rental properties. Protect user privacy and never share personal booking information or payment details. When evaluating guest applications, focus only on relevant booking and platform information that helps assess guest suitability.
🟠 High
The "secure" mode relies entirely on a system prompt instruction to filter sensitive data ("focus only on relevant booking and platform information"), but all sensitive data is still sent to the LLM in the user prompt. This is a prompt-only security control that can be bypassed through jailbreaking techniques like role-play scenarios, encoded requests, or exploiting LLM instruction-following weaknesses. System prompts are non-deterministic and should never be used as the sole security boundary. The application should filter data at the code level before sending anything to the LLM—defense-in-depth means sensitive data should never reach the LLM in the first place, regardless of prompt instructions.
💡 Suggested Fix
Remove the prompt-based filtering instruction and implement application-level filtering instead (see previous comment). Once data filtering is implemented in code at src/routes/rental.ts:93, update this prompt to remove the security-critical instruction:
You are a vacation rental assistant. Help users find and book vacation rentals, answer questions about properties, provide travel recommendations, and assist property managers with evaluating guest applications for their rental properties. Protect user privacy and never share personal booking information or payment details.
The phrase "focus only on relevant booking and platform information" should be removed because the application now enforces this through code, not through LLM instructions.
🤖 AI Agent Prompt
The secure mode system prompt at src/domains/vacation-rental/secure.txt:1 attempts to protect sensitive data by instructing the LLM to "focus only on relevant booking and platform information." However, this is a prompt-only security control.
Investigate the data flow:
- Check src/routes/rental.ts:37-38 to see how the security level controls prompt selection
- Verify at src/routes/rental.ts:93 that the full guest application is still serialized regardless of security level
- Note that src/domains/index.ts shows the only difference between "insecure" and "secure" modes is the prompt text—no application-level filtering occurs
The security issue: An attacker with API access could use jailbreak techniques to extract the sensitive data that's already in the LLM's context. The fix is two-fold:
- Implement application-level data filtering (primary fix—see the rental.ts comment)
- Update this prompt to remove the filtering instruction since it will be enforced in code
Security controls must be deterministic. If data shouldn't be exposed, don't send it to the LLM at all—don't rely on the LLM to filter it.
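As a sketch of what a deterministic control looks like here (a minimal example; securityLevel, securePrompt, and insecurePrompt are assumed names, not the repository's actual identifiers):

// The filter runs unconditionally, so the security level only changes the
// system prompt's wording, never which data reaches the LLM provider.
const systemPrompt = securityLevel === 'secure' ? securePrompt : insecurePrompt;
const safePayload = filterGuestApplicationForLLM(guestApplication); // always applied

const messages = [
  { role: 'system', content: systemPrompt },
  {
    role: 'user',
    content: `GUEST APPLICATION DATA:\n${JSON.stringify(safePayload, null, 2)}`,
  },
];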
body: JSON.stringify({
  model: model || 'gpt-3.5-turbo',
  messages,
🟡 Medium
The optional model parameter from the API request is passed directly to the LiteLLM service without validation or allowlisting. This gives authenticated callers unnecessary control over which LLM model processes the sensitive guest data, enabling potential cost abuse (expensive models like GPT-4), probing attacks against the LiteLLM configuration, or exploitation of model-specific vulnerabilities. The Zod schemas at lines 14-17 and 19-24 accept any string value for the model parameter with no restrictions.
💡 Suggested Fix
Add an allowlist of permitted models using Zod enum validation:
// Add near the top of the file after imports
const ALLOWED_MODELS = [
  'gpt-3.5-turbo',
  'gpt-4',
  'gpt-4-turbo',
] as const;

const modelSchema = z.enum(ALLOWED_MODELS).optional();

// Update both schemas to use the restricted enum:
const screenGuestQuerySchema = z.object({
  applicationId: z.string(),
  model: modelSchema, // Changed from z.string().optional()
});

const sendMessageBodySchema = z.object({
  applicationId: z.string(),
  message: z.string().min(1),
  propertyOwnerEmail: z.string().email(),
  model: modelSchema, // Changed from z.string().optional()
});

Alternative: If callers shouldn't control model selection at all, remove the parameter entirely and use a configuration constant instead.
🤖 AI Agent Prompt
At src/routes/rental.ts:132-134, the code sends a model parameter directly to the LiteLLM API. This parameter comes from user input (query string at line 191 or request body at line 235) and is validated only as "optional string" in the Zod schemas at lines 14-17 and 19-24.
Investigate:
- Determine if API callers have a legitimate need to control model selection, or if this should be an internal configuration decision
- If model selection should be restricted, create a Zod enum with an allowlist of permitted models (e.g., ['gpt-3.5-turbo', 'gpt-4'])
- If callers shouldn't control this at all, remove the model parameter from both schemas and use a constant like DEFAULT_SCREENING_MODEL = 'gpt-3.5-turbo'
This follows the principle of least privilege—external callers should only have permissions they genuinely need. Model selection is typically an internal architectural decision, not an API parameter.
Consider cost implications (GPT-4 is more expensive than GPT-3.5), security characteristics of different models, and potential for probing attacks.
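If the constant-based option is chosen instead, a minimal sketch of that approach (callScreeningModel is a hypothetical helper, and the LiteLLM base URL shown is an assumption, used only for illustration):

// No model field in the schema at all: callers cannot influence model choice.
const DEFAULT_SCREENING_MODEL = 'gpt-3.5-turbo';

const screenGuestQuerySchema = z.object({
  applicationId: z.string(),
});

// Assumed endpoint; point this at wherever the LiteLLM proxy actually runs.
const LITELLM_BASE_URL = process.env.LITELLM_BASE_URL ?? 'http://localhost:4000';

async function callScreeningModel(messages: { role: string; content: string }[]) {
  // The model is fixed server-side; only the messages vary per request.
  return fetch(`${LITELLM_BASE_URL}/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: DEFAULT_SCREENING_MODEL, messages }),
  });
}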