Fix .env filtering and update prompts #20
Conversation
📝 Walkthrough

Summary by CodeRabbit

The changes update the dependency list to include the `tiktoken-rs` crate.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI
    participant AI
    User->>CLI: Run update command
    CLI->>CLI: Collect and trim git diff (up to 200,000 chars)
    CLI->>CLI: Detect security files (improved logic)
    CLI->>CLI: Build AI prompt with enhanced instructions
    CLI->>CLI: Count tokens using tiktoken-rs
    CLI->>AI: Send chat completion request (max_tokens capped)
    AI-->>CLI: Return response
    CLI-->>User: Output AI response
```
Hello @arthrod, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello everyone, Gemini here with a summary of this pull request by @arthrod. This PR focuses on improving the handling of sensitive files, specifically .env files, updating some default parameters, refining the system prompt for the AI, and incorporating a token calculation library (tiktoken-rs) to better manage the input size for the AI model.
Highlights

- .env File Filtering: The logic for filtering out sensitive `.env` files has been updated. Previously, only a specific list of `.env` file names was blocked. Now, any file starting with `.env` will be blocked by default, with the exception of `.env.otherthings`, unless the `--ok-to-send-env` flag is used (which is not directly changed in this PR but is part of the overall functionality). A minimal sketch of this check follows the list.
- Increased Default Max Input Characters: The default value for the `--max-input-chars` command-line argument has been significantly increased from 50,000 to 200,000 characters. This allows larger diffs to be sent to the AI by default before truncation occurs.
- Refined System Prompt: The system prompt provided to the AI has been slightly adjusted to be more specific about detecting and mentioning real secrets or API keys, rather than just any potential API key pattern, and to mention the file containing the secret.
- Token Calculation Integration: The `tiktoken-rs` crate has been added as a dependency. This library is now used to calculate the maximum number of tokens allowed for the AI's response based on the specific model being used and the input messages, ensuring the request stays within the model's limits while respecting the user's `--max-tokens` setting.
- Test Updates: Integration tests have been updated to reflect the new default value for `--max-input-chars`.
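A minimal sketch of the filtering behavior described above. The `.env` predicate mirrors the change to `is_security_file` in this PR; the `ok_to_send_env` parameter and its wiring are assumptions added here for illustration only:

```rust
/// Sketch: should `filename` be withheld from the AI?
/// The predicate mirrors the PR; the flag wiring is hypothetical.
fn blocks_env_file(filename: &str, ok_to_send_env: bool) -> bool {
    if ok_to_send_env {
        return false; // user explicitly opted in via --ok-to-send-env
    }
    // Block anything starting with `.env`, except the one allowed name.
    filename.starts_with(".env") && filename != ".env.otherthings"
}

fn main() {
    assert!(blocks_env_file(".env", false));
    assert!(blocks_env_file(".env.production", false));
    assert!(!blocks_env_file(".env.otherthings", false));
    assert!(!blocks_env_file(".env", true)); // flag overrides the block
}
```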
Changelog

- Cargo.lock
  - Added `tiktoken-rs` and its dependencies (`anyhow`, `async-convert`, `async-trait`, `base64` 0.21.7, `bit-set`, `bit-vec`, `bitflags` 1.3.2, `bstr`, `darling` 0.14.4, `darling_core` 0.14.4, `darling_macro` 0.14.4, `derive_builder` 0.12.0, `derive_builder_core` 0.12.0, `derive_builder_macro` 0.12.0, `encoding_rs`, `equivalent`, `fancy-regex`, `h2`, `hashbrown`, `http` 0.2.12, `http-body` 0.4.6, `httpdate`, `hyper` 0.14.32, `hyper-rustls`, `indexmap`, `reqwest` 0.11.27, `reqwest-eventsource` 0.4.0, `ring`, `rustc-hash`, `rustls`, `rustls-native-certs`, `rustls-pemfile`, `rustls-webpki`, `sct`, `strsim` 0.10.0, `sync_wrapper` 0.1.2, `system-configuration`, `system-configuration-sys`, `tokio-rustls`, `untrusted`, `windows-sys` 0.48.0, `windows-targets` 0.48.5, `windows_aarch64_gnullvm` 0.48.5, `windows_aarch64_msvc` 0.48.5, `windows_i686_gnu` 0.48.5, `windows_i686_msvc` 0.48.5, `windows_x86_64_gnu` 0.48.5, `windows_x86_64_gnullvm` 0.48.5, `windows_x86_64_msvc` 0.48.5, `winreg`).
  - Updated versions and dependencies for existing crates (`base64`, `derive_builder`, `reqwest`, `reqwest-eventsource`, `rustix`, `schannel`, `strsim`, `sync_wrapper`, `tokio-util`, `tower-http`, `windows-sys`, `windows-targets`).
- Cargo.toml
  - Added the `tiktoken-rs` crate with version `0.7` and the `async-openai` feature enabled (L27).
- src/main.rs
  - Updated the default value for the `--max-input-chars` CLI argument from 50000 to 200000 (L84).
  - Modified the `is_security_file` function to block any file starting with `.env` (except `.env.otherthings`) instead of a fixed list of `.env` files (L221-225, L238).
  - Minor formatting adjustments (L430, L436, L584, L587-591).
  - Updated the system prompt text to refine the instruction regarding API key/secret detection (L643).
  - Added logic using `tiktoken-rs` to calculate the maximum allowed tokens for the AI response based on the model and input messages (L651-681); a sketch of this computation follows the changelog.
  - Used the calculated `allowed_max_tokens` (min of user input and calculated) when creating the chat completion request (L703).
- tests/integration_tests.rs
  - Updated the assertion in `test_cli_defaults` to check for the new default `max-input-chars` value of 200000 (L158).
  - Minor formatting adjustments in `test_max_input_chars_truncation` (L176, L180, L187, L194, L200, L206, L213, L219, L223, L230, L237, L247, L255, L261, L268).
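As a companion to the token-calculation entries above, here is a minimal sketch of the computation. `tiktoken_rs::get_chat_completion_max_tokens` and the min-with-user-limit step are named in the PR; the function name and parameters here are placeholders, not the PR's actual code:

```rust
use tiktoken_rs::{get_chat_completion_max_tokens, ChatCompletionRequestMessage};

/// Sketch: pick the response token cap for a chat completion request.
fn response_token_cap(
    model: &str,
    tk_messages: &[ChatCompletionRequestMessage],
    cli_max_tokens: u16,
) -> u16 {
    // Remaining context budget after the prompt, per tiktoken-rs; fall
    // back to the user's limit if the model is unknown to the library.
    let allowed_max_tokens =
        get_chat_completion_max_tokens(model, tk_messages).unwrap_or(cli_max_tokens as usize);
    // Respect the user's --max-tokens: take the smaller of the two.
    std::cmp::min(allowed_max_tokens, cli_max_tokens as usize) as u16
}
```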
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:

CI Feedback 🧐 (feedback updated until commit a1b877a)
A test triggered by this PR failed. Here is an AI-generated analysis of the failure:

PR Code Suggestions ✨
Latest suggestions up to a1b877a
Previous suggestions: ✅ suggestions up to commit 97d63df
Code Review
This pull request introduces significant improvements to noob-commit, including enhanced .env file filtering for better security, dynamic calculation of maximum response tokens using tiktoken-rs for more reliable API calls, an increased default for max-input-chars, and a more precise system prompt. The changes are well-implemented and the updates to tests are appropriate.
I have one suggestion regarding logging for the new token calculation logic, which I believe would improve debuggability.
Summary of Findings
- Error Logging for Token Calculation: When `tiktoken_rs::get_chat_completion_max_tokens` fails, the error is silently ignored, and the system falls back to the user-defined `max_tokens`. It's suggested to log this failure to aid in debugging and inform the user.
- Dependency Versioning (Cargo.lock): The project now includes two versions of the `async-openai` crate (0.14.3 via `tiktoken-rs` and 0.28.2 directly). This is handled by Cargo but is a minor point of observation regarding dependency management. (Severity: low, not commented due to review settings)
- Code Formatting: Minor whitespace and formatting changes were observed, likely due to `cargo fmt`. These are stylistic and do not impact functionality. (Severity: low, not commented due to review settings)
Merge Readiness
The pull request introduces valuable enhancements and is mostly in good shape. I recommend addressing the suggested logging improvement for the token calculation to enhance maintainability and user feedback. Once this medium-severity suggestion is considered, the PR should be in an excellent state for merging. As an AI, I am not authorized to approve pull requests; please ensure further review and approval from team members.
Actionable comments posted: 1
🧹 Nitpick comments (4)
src/main.rs (4)
81-86: 200,000-char default may exhaust memory for very large diffs
A single diff that large is rare but not impossible (e.g. vendored code). Consider a soft cap plus a warning rather than loading the full string into memory by default.
586-594: Log message before truncation hides the final size
After `truncate` the diff length is `cli.max_input_chars` plus the "…(diff truncated)" suffix, so the logged "to {} characters" value is slightly misleading. Either subtract the suffix length or log the actual final length.
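A sketch of the suggested fix, logging only after the suffix has been appended. The variable names follow the PR's surrounding code, while the suffix text and the use of `eprintln!` are assumptions:

```rust
fn main() {
    let max_input_chars = 200_000; // cli.max_input_chars in the PR
    let mut diff = String::from("...collected git diff...");
    if diff.len() > max_input_chars {
        diff.truncate(max_input_chars);
        diff.push_str("… (diff truncated)");
    }
    // Report the length *after* truncation so the logged size matches
    // what is actually sent, suffix included.
    eprintln!("Sending {} characters of diff", diff.len());
}
```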
651-678: Duplicate message construction – keep one source of truth
`messages` and `tk_messages` are manually kept in sync. A future edit risks diverging content/role order. Extract a small helper that returns both formats from the same data to avoid drift.
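One possible shape for such a helper, shown only for the tiktoken-rs side. It assumes the crate's own `ChatCompletionRequestMessage` struct (its field set may differ between versions); the async-openai request messages would be mapped from the same `pairs` slice, so the two views cannot drift:

```rust
use tiktoken_rs::ChatCompletionRequestMessage;

/// Sketch: derive the token-counting messages from one role/content
/// list. Building both message formats from the same `pairs` keeps
/// their content and role order in sync by construction.
fn tk_messages_from(pairs: &[(&str, String)]) -> Vec<ChatCompletionRequestMessage> {
    pairs
        .iter()
        .map(|(role, content)| ChatCompletionRequestMessage {
            role: role.to_string(),
            content: Some(content.clone()),
            name: None,
            function_call: None,
        })
        .collect()
}
```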
680-682: Silently falling back on unknown models hides configuration errors
When `get_chat_completion_max_tokens` errors you default to the CLI limit without signalling the user. Log a warning so mis-typed model names are obvious.

```diff
-let allowed_max_tokens = tiktoken_rs::get_chat_completion_max_tokens(&cli.model, &tk_messages)
-    .unwrap_or(cli.max_tokens as usize);
+let allowed_max_tokens = match tiktoken_rs::get_chat_completion_max_tokens(&cli.model, &tk_messages) {
+    Ok(v) => v,
+    Err(e) => {
+        log::warn!("tiktoken_rs: unknown model '{}': {}", cli.model, e);
+        cli.max_tokens as usize
+    }
+};
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)

- `Cargo.lock` is excluded by `!**/*.lock`

📒 Files selected for processing (3)

- `Cargo.toml` (1 hunks)
- `src/main.rs` (7 hunks)
- `tests/integration_tests.rs` (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Test (macos-latest, beta)
- GitHub Check: Test (ubuntu-latest, beta)
- GitHub Check: Test (ubuntu-latest, nightly)
- GitHub Check: Test (windows-latest, nightly)
- GitHub Check: Test (macos-latest, stable)
- GitHub Check: Test (windows-latest, beta)
- GitHub Check: Test (windows-latest, stable)
- GitHub Check: Test (macos-latest, nightly)
- GitHub Check: Code Coverage
🔇 Additional comments (3)
Cargo.toml (1)
27-27:
✅ Verification successful

Verify that the `async-openai` feature really exists on `tiktoken-rs` 0.7
Crate feature names occasionally drift between versions. If the feature flag is misspelled, the build will reject it with an "unrecognised feature" error later.
🏁 Script executed:

```bash
#!/bin/bash
# List the published features for the exact crate / version
cargo search tiktoken-rs --limit 1 | cut -d' ' -f1 | xargs cargo metadata --no-deps --format-version 1 | jq '.packages[] | select(.name=="tiktoken-rs") | .features'
```

Length of output: 660

🏁 Script executed:

```bash
#!/bin/bash
# Fetch the list of feature flags for tiktoken-rs v0.7.0 from crates.io
curl -s "https://crates.io/api/v1/crates/tiktoken-rs/0.7.0" | jq '.version.features | keys'
```

Length of output: 127

async-openai feature confirmed
Fetched the feature list for `tiktoken-rs` v0.7.0 from crates.io and verified that `"async-openai"` is published. No changes needed.
• File: Cargo.toml (line 27)
tests/integration_tests.rs (1)

155-159: Assertion update looks good – the test now matches the new default of 200,000 characters.

src/main.rs (1)

703-704: Potential precision loss converting `usize` → `u16`
`std::cmp::min` ensures the value is ≤ `cli.max_tokens` (a `u16`), so the cast is safe today. If the CLI flag is ever widened to `u32`, this silent cast will truncate. Prefer `u32` in the struct and here.
```rust
// `.env.otherthings` which is explicitly allowed.
if filename.starts_with(".env") && filename != ".env.otherthings" {
    return true;
```

Suggestion: Fix .env file detection

```suggestion
// `.env.otherthings` which is explicitly allowed.
if (filename == ".env" || filename.starts_with(".env.")) && filename != ".env.otherthings" {
    return true;
```
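Worth noting as a design point: the tightened predicate still blocks `.env` and every `.env.*` variant, but no longer matches other names that merely share the prefix, such as `.envrc` or `.environment`, which the original `starts_with(".env")` check would also have flagged.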
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
src/main.rs
Outdated

```rust
let allowed_max_tokens = tiktoken_rs::get_chat_completion_max_tokens(&cli.model, &tk_messages)
    .unwrap_or(cli.max_tokens as usize);
```

Suggestion: Fix token limit fallback

```suggestion
let allowed_max_tokens = tiktoken_rs::get_chat_completion_max_tokens(&cli.model, &tk_messages)
    .unwrap_or(4096); // Use a reasonable default token limit
```
Co-authored-by: qodo-merge-pro[bot] <151058649+qodo-merge-pro[bot]@users.noreply.github.com>
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like …
Feel free to include any additional details that might help me get this PR into a better state.
User description
Summary

- Block `.env` files unless `--ok-to-send-env` is specified, except for `.env.otherthings`
- Increase default `--max-input-chars` to 200000

Testing

- `cargo fmt --all`
- `cargo test`

https://chatgpt.com/codex/tasks/task_e_68430a28fcc4832bb2e009f0785bd421
PR Type
Enhancement, Bug fix, Tests
Description

- Use tiktoken to compute allowed max response tokens dynamically
- Update `.env` file filtering to block all except `.env.otherthings`
- Revise system prompt to clarify real secret detection
- Increase default `max-input-chars` to 200000
- Update integration tests for new defaults and behaviors
Changes walkthrough 📝
| File | Description |
|---|---|
| src/main.rs | Dynamic token calculation, .env filtering, and prompt improvements |
| Cargo.toml | Add tiktoken-rs dependency for token calculation |
| tests/integration_tests.rs | Update tests for new max-input-chars default |