| 1 | +# JGit-Based Repository Inspection Infrastructure |
| 2 | + |
| 3 | +This document describes the JGit-based infrastructure added to enable filter functionality similar to the Node.js git-proxy project. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +The infrastructure uses JGit to clone and inspect remote repositories locally, enabling filters to: |
| 8 | +- Extract complete commit information (not just the head commit) |
| 9 | +- Analyze commit ranges and diffs |
| 10 | +- Validate GPG signatures |
| 11 | +- Scan for secrets and sensitive information |
| 12 | +- Check commit messages and author emails |
| 13 | + |
| 14 | +## Components |
| 15 | + |
| 16 | +### LocalRepositoryCache |
| 17 | + |
| 18 | +Manages local bare clones of remote repositories: |
| 19 | + |
| 20 | +```java |
| 21 | +// Initialize cache (typically done once at application startup) |
| 22 | +LocalRepositoryCache cache = new LocalRepositoryCache(); |
| 23 | + |
| 24 | +// Get or clone a repository |
| 25 | +Repository repo = cache.getOrClone("https://github.com/owner/repo.git"); |
| 26 | + |
| 27 | +// Use JGit operations on the repository |
| 28 | +try (Git git = new Git(repo)) { |
| 29 | +    // ... perform git operations
| 30 | +} |
| 31 | +``` |
| 32 | + |
| 33 | +**Features:** |
| 34 | +- Caches repositories in temporary directories |
| 35 | +- Automatically fetches updates when accessing cached repos |
| 36 | +- Cleans up on JVM shutdown |
| 37 | +- Thread-safe concurrent access |
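The clone-once, thread-safe behaviour listed above can be sketched with the standard library alone. This is a minimal illustration, not the actual `LocalRepositoryCache`: the class name `RepoCacheSketch` and the `cloner` callback (a stand-in for a JGit bare clone) are hypothetical, and the fetch-on-access step is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

/** Hypothetical sketch of a clone-once cache; not the actual LocalRepositoryCache. */
final class RepoCacheSketch {
    private final Map<String, Path> clones = new ConcurrentHashMap<>();
    private final BiConsumer<String, Path> cloner; // assumed stand-in for a JGit bare clone

    RepoCacheSketch(BiConsumer<String, Path> cloner) {
        this.cloner = cloner;
    }

    /** Returns the cached directory for a URL, cloning on first access only. */
    Path getOrClone(String remoteUrl) {
        // computeIfAbsent is atomic per key, so concurrent callers
        // asking for the same URL share a single clone.
        return clones.computeIfAbsent(remoteUrl, url -> {
            try {
                Path dir = Files.createTempDirectory("repo-cache-");
                // Only removes empty directories at JVM exit; a real clone
                // would need a recursive delete in a shutdown hook.
                dir.toFile().deleteOnExit();
                cloner.accept(url, dir);
                return dir;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    public static void main(String[] args) {
        AtomicInteger cloneCount = new AtomicInteger();
        RepoCacheSketch cache = new RepoCacheSketch((url, dir) -> cloneCount.incrementAndGet());
        Path first = cache.getOrClone("https://github.com/owner/repo.git");
        Path second = cache.getOrClone("https://github.com/owner/repo.git");
        System.out.println(first.equals(second) + " clones=" + cloneCount.get()); // true clones=1
    }
}
```

Keying the map by remote URL keeps the slow clone inside the per-key critical section, so pushes to different repositories do not wait on the same clone.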
| 38 | + |
| 39 | +### CommitInspectionService |
| 40 | + |
| 41 | +Provides utilities for extracting commit information: |
| 42 | + |
| 43 | +```java |
| 44 | +// Get details for a specific commit |
| 45 | +Commit commit = CommitInspectionService.getCommitDetails(repository, "abc123"); |
| 46 | + |
| 47 | +// Get all commits in a range |
| 48 | +List<Commit> commits = CommitInspectionService.getCommitRange( |
| 49 | +    repository,
| 50 | +    "oldCommit",
| 51 | +    "newCommit"
| 52 | +);
| 53 | +
| 54 | +// Get diff between commits
| 55 | +List<DiffEntry> diff = CommitInspectionService.getDiff(
| 56 | +    repository,
| 57 | +    "oldCommit",
| 58 | +    "newCommit"
| 59 | +);
| 60 | +
| 61 | +// Get formatted diff as string
| 62 | +String diffText = CommitInspectionService.getFormattedDiff(
| 63 | +    repository,
| 64 | +    "oldCommit",
| 65 | +    "newCommit"
| 66 | +); |
| 67 | +``` |
| 68 | + |
| 69 | +### EnrichPushCommitsFilter |
| 70 | + |
| 71 | +A servlet filter that enriches push requests with full commit information: |
| 72 | + |
| 73 | +```java |
| 74 | +// Registered in the filter chain after ParseGitRequestFilter (wrapped in a Jetty FilterHolder)
| 75 | +var enrichFilter = new EnrichPushCommitsFilter(provider, repositoryCache);
| 76 | +context.addFilter(new FilterHolder(enrichFilter), urlPattern, EnumSet.of(DispatcherType.REQUEST));
| 77 | +``` |
| 78 | + |
| 79 | +**What it does:** |
| 80 | +1. Extracts basic commit info from the push packet |
| 81 | +2. Clones/fetches the remote repository locally |
| 82 | +3. Uses JGit to extract all commits in the push range |
| 83 | +4. Populates `GitRequestDetails.commits` with full commit information |
| 84 | +5. Extracts user email from commit author |
| 85 | + |
| 86 | +### TemporaryRepositoryResolver |
| 87 | + |
| 88 | +Integrates with LocalRepositoryCache to serve repositories for JGit operations: |
| 89 | + |
| 90 | +```java |
| 91 | +var resolver = new TemporaryRepositoryResolver(cache); |
| 92 | +// Used by JGit servlet handlers to resolve repository requests |
| 93 | +``` |
| 94 | + |
| 95 | +## Usage Example |
| 96 | + |
| 97 | +Here's how the infrastructure works in a typical push operation: |
| 98 | + |
| 99 | +1. **Git client pushes to proxy**: |
| 100 | +   ```
| 101 | +   git push http://proxy:8080/github.com/owner/repo.git
| 102 | +   ```
| 103 | + |
| 104 | +2. **ForceGitClientFilter** validates the client |
| 105 | + |
| 106 | +3. **ParseGitRequestFilter** parses the basic push information from the packet |
| 107 | + |
| 108 | +4. **EnrichPushCommitsFilter** (NEW): |
| 109 | +   - Clones/fetches `https://github.com/owner/repo.git` to temp directory
| 110 | +   - Uses JGit to extract all commits in the range
| 111 | +   - Populates full commit details in `GitRequestDetails`
| 112 | +
| 113 | +5. **Validation Filters** can now access complete commit information:
| 114 | +   ```java
| 115 | +   var commits = requestDetails.getCommits(); // All commits in push
| 116 | +   for (Commit commit : commits) {
| 117 | +       String email = commit.getAuthor().getEmail();
| 118 | +       String message = commit.getMessage();
| 119 | +       String signature = commit.getSignature();
| 120 | +       // ... validate
| 121 | +   }
| 122 | +   ```
| 123 | + |
| 124 | +6. **Proxy forwards the push** to the upstream repository if all filters pass
| 125 | + |
| 126 | +## Filter Examples |
| 127 | + |
| 128 | +### CheckAuthorEmailsFilter |
| 129 | + |
| 130 | +Validates commit author emails against configured patterns: |
| 131 | + |
| 132 | +```java |
| 133 | +var commitConfig = CommitConfig.builder() |
| 134 | +    .author(AuthorConfig.builder()
| 135 | +        .email(EmailConfig.builder()
| 136 | +            .domain(DomainConfig.builder()
| 137 | +                .allow(".*\\.company\\.com$")
| 138 | +                .build())
| 139 | +            .build())
| 140 | +        .build())
| 141 | + .build(); |
| 142 | + |
| 143 | +var filter = new CheckAuthorEmailsFilter(commitConfig); |
| 144 | +``` |
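To make the effect of the allow pattern concrete, here is a minimal sketch of a domain allow-list check using only `java.util.regex`. The class `EmailDomainCheckSketch` is hypothetical, and it assumes the pattern is applied with `find()` to the part of the address after the last `@`; the actual filter may match differently.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical sketch of an allow-list check on author email domains. */
final class EmailDomainCheckSketch {
    private final List<Pattern> allowedDomains;

    EmailDomainCheckSketch(List<String> allowPatterns) {
        this.allowedDomains = allowPatterns.stream()
                .map(Pattern::compile)
                .collect(Collectors.toList());
    }

    /** Returns true when the part after the last '@' matches any allow pattern. */
    boolean isAllowed(String email) {
        int at = email.lastIndexOf('@');
        if (at <= 0 || at == email.length() - 1) {
            return false; // missing local part or missing domain
        }
        String domain = email.substring(at + 1);
        // Assumption: patterns are searched within the domain, not the full address.
        return allowedDomains.stream().anyMatch(p -> p.matcher(domain).find());
    }

    public static void main(String[] args) {
        var check = new EmailDomainCheckSketch(List.of(".*\\.company\\.com$"));
        System.out.println(check.isAllowed("dev@eng.company.com")); // true
        System.out.println(check.isAllowed("dev@company.com"));     // false: nothing precedes ".company.com"
        System.out.println(check.isAllowed("dev@example.org"));     // false
    }
}
```

Under this assumption, the pattern `.*\\.company\\.com$` accepts subdomains only. A pattern such as `(^|\\.)company\\.com$` would also accept the bare `company.com` domain.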
| 145 | + |
| 146 | +### SecretScanningFilter |
| 147 | + |
| 148 | +Scans commits for potential secrets: |
| 149 | + |
| 150 | +```java |
| 151 | +var secretConfig = SecretScanningConfig.defaultConfig(); // Includes common patterns |
| 152 | +var filter = new SecretScanningFilter(secretConfig); |
| 153 | +``` |
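Pattern-based scanning of this kind can be sketched as follows. The two patterns are illustrative assumptions (the usual shape of an AWS access key ID and a PEM private-key header); the contents of `SecretScanningConfig.defaultConfig()` are not shown in this document, and `SecretScanSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of regex-based secret detection over a piece of text. */
final class SecretScanSketch {
    // Illustrative patterns only; not the project's actual default set.
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("aws-access-key-id", Pattern.compile("\\bAKIA[0-9A-Z]{16}\\b"));
        PATTERNS.put("private-key-header",
                Pattern.compile("-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"));
    }

    /** Returns the names of all patterns found in the text, in declaration order. */
    static List<String> scan(String text) {
        List<String> findings = new ArrayList<>();
        PATTERNS.forEach((name, pattern) -> {
            if (pattern.matcher(text).find()) {
                findings.add(name);
            }
        });
        return findings;
    }

    public static void main(String[] args) {
        System.out.println(scan("fix typo in README"));           // []
        System.out.println(scan("add key AKIAIOSFODNN7EXAMPLE")); // [aws-access-key-id]
    }
}
```

Returning pattern names rather than the matched text keeps the secret itself out of logs and rejection messages.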
| 154 | + |
| 155 | +### GpgSignatureFilter |
| 156 | + |
| 157 | +Validates GPG signatures on commits: |
| 158 | + |
| 159 | +```java |
| 160 | +var gpgConfig = GpgConfig.builder() |
| 161 | +    .enabled(true)
| 162 | +    .requireSignedCommits(true)
| 163 | +    .trustedKeysFile("/path/to/public-keys.asc")
| 164 | + .build(); |
| 165 | + |
| 166 | +var filter = new GpgSignatureFilter(gpgConfig); |
| 167 | +``` |
| 168 | + |
| 169 | +## Performance Considerations |
| 170 | + |
| 171 | +- **First Push**: Clones the repository, so it is the slowest case
| 172 | +- **Subsequent Pushes**: Reuse the cached clone and only fetch new objects, so they are faster
| 173 | +- **Disk Footprint**: Bare clones have no working tree, so they take less space than a full checkout
| 174 | +- **Disk Lifetime**: Clones live in a temp directory and are removed on JVM shutdown
| 175 | +- **Concurrency**: The cache is thread-safe and cloning is synchronized
| 176 | + |
| 177 | +## Comparison with Node.js git-proxy |
| 178 | + |
| 179 | +| Feature | Node.js git-proxy | Java jgit-proxy | |
| 180 | +|---------|-------------------|-----------------| |
| 181 | +| Repository Cloning | Child process `git clone` | JGit API | |
| 182 | +| Commit Inspection | Child process `git log`, `git show` | JGit RevWalk | |
| 183 | +| Diff Analysis | Child process `git diff` | JGit DiffFormatter | |
| 184 | +| GPG Verification | Child process `git verify-commit` | BouncyCastle PGP | |
| 185 | +| Pack Analysis | Child process `git verify-pack` | Not yet implemented | |
| 186 | + |
| 187 | +## Future Enhancements |
| 188 | + |
| 189 | +1. **Pack File Analysis**: Implement hidden commits check using JGit pack file APIs |
| 190 | +2. **Diff Content Scanning**: Extend SecretScanningFilter to scan actual file diffs |
| 191 | +3. **Repository Retention**: Add configurable cache expiry and size limits |
| 192 | +4. **Async Cloning**: Clone repositories asynchronously to avoid blocking requests |
| 193 | +5. **Mirror Mode**: Support local git mirrors instead of on-demand cloning |