Commit 5494cb3

Copilot and coopernetes committed
Add documentation for JGit infrastructure and filter usage
Co-authored-by: coopernetes <57812123+coopernetes@users.noreply.github.com>
1 parent 7ac1353 commit 5494cb3

1 file changed

Lines changed: 193 additions & 0 deletions

JGIT-INFRASTRUCTURE.md

@@ -0,0 +1,193 @@
# JGit-Based Repository Inspection Infrastructure

This document describes the JGit-based infrastructure added to enable filter functionality similar to the Node.js git-proxy project.

## Overview

The infrastructure uses JGit to clone and inspect remote repositories locally, enabling filters to:
- Extract complete commit information (not just the head commit)
- Analyze commit ranges and diffs
- Validate GPG signatures
- Scan for secrets and sensitive information
- Check commit messages and author emails

## Components

### LocalRepositoryCache

Manages local bare clones of remote repositories:

```java
// Initialize cache (typically done once at application startup)
LocalRepositoryCache cache = new LocalRepositoryCache();

// Get or clone a repository
Repository repo = cache.getOrClone("https://github.com/owner/repo.git");

// Use JGit operations on the repository
try (Git git = new Git(repo)) {
    // ... perform git operations
}
```

**Features:**
- Caches repositories in temporary directories
- Automatically fetches updates when accessing cached repos
- Cleans up on JVM shutdown
- Thread-safe concurrent access
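The get-or-clone behaviour listed above can be illustrated with a small standard-library sketch. This is not the project's `LocalRepositoryCache`: the class name `CloneCacheSketch`, the generic handle type `V` (standing in for a JGit `Repository`), and the injected `cloner` function are all assumptions made for illustration, and the fetch-on-access and shutdown cleanup steps are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical get-or-clone cache; V stands in for a repository handle. */
final class CloneCacheSketch<V> {
    private final Map<String, V> byUrl = new ConcurrentHashMap<>();
    private final Function<String, V> cloner;

    CloneCacheSketch(Function<String, V> cloner) {
        this.cloner = cloner;
    }

    /** Returns the cached handle for the URL, invoking the cloner at most once per URL. */
    V getOrClone(String remoteUrl) {
        // computeIfAbsent applies the function atomically per key, so concurrent
        // callers asking for the same URL share a single clone.
        return byUrl.computeIfAbsent(remoteUrl, cloner);
    }

    /** Number of distinct URLs currently cached. */
    int size() {
        return byUrl.size();
    }
}
```

Calling `getOrClone` twice with the same URL returns the same handle and runs the cloner once. A real cache would probably also refresh the clone on a cache hit, which this sketch does not attempt.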
38+
39+
### CommitInspectionService
40+
41+
Provides utilities for extracting commit information:
42+
43+
```java
44+
// Get details for a specific commit
45+
Commit commit = CommitInspectionService.getCommitDetails(repository, "abc123");
46+
47+
// Get all commits in a range
48+
List<Commit> commits = CommitInspectionService.getCommitRange(
49+
repository,
50+
"oldCommit",
51+
"newCommit"
52+
);
53+
54+
// Get diff between commits
55+
List<DiffEntry> diff = CommitInspectionService.getDiff(
56+
repository,
57+
"oldCommit",
58+
"newCommit"
59+
);
60+
61+
// Get formatted diff as string
62+
String diffText = CommitInspectionService.getFormattedDiff(
63+
repository,
64+
"oldCommit",
65+
"newCommit"
66+
);
67+
```

### EnrichPushCommitsFilter

A servlet filter that enriches push requests with full commit information:

```java
// Registered in filter chain after ParseGitRequestFilter
var enrichFilter = new EnrichPushCommitsFilter(provider, repositoryCache);
// Wrap the filter for registration (assumes Jetty's FilterHolder)
var enrichFilterHolder = new FilterHolder(enrichFilter);
context.addFilter(enrichFilterHolder, urlPattern, EnumSet.of(DispatcherType.REQUEST));
```

**What it does:**
1. Extracts basic commit info from the push packet
2. Clones/fetches the remote repository locally
3. Uses JGit to extract all commits in the push range
4. Populates `GitRequestDetails.commits` with full commit information
5. Extracts user email from commit author
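Step 1 above relies on the command list at the start of a `git-receive-pack` request body. As a rough illustration, the sketch below reads pkt-lines of the form `<old-id> <new-id> <refname>`, where the first line may carry a NUL byte followed by capabilities. The class `PushCommandSketch` is hypothetical and is not the project's parser; it covers only the classic (v0/v1) command list and ignores shallow lines, push certificates, and the pack data after the flush packet.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical type: one ref update command from a receive-pack request. */
final class PushCommandSketch {
    final String oldId;   // all zeros when the ref is being created
    final String newId;   // all zeros when the ref is being deleted
    final String refName;

    PushCommandSketch(String oldId, String newId, String refName) {
        this.oldId = oldId;
        this.newId = newId;
        this.refName = refName;
    }

    /** Frames a payload as a pkt-line: four hex digits of total length, then the payload. */
    static byte[] pktLine(String payload) {
        byte[] data = payload.getBytes(StandardCharsets.UTF_8);
        byte[] header = String.format("%04x", data.length + 4).getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[header.length + data.length];
        System.arraycopy(header, 0, out, 0, header.length);
        System.arraycopy(data, 0, out, header.length, data.length);
        return out;
    }

    /** Reads commands until the flush-pkt ("0000") or the end of the buffer. */
    static List<PushCommandSketch> parse(byte[] body) {
        List<PushCommandSketch> commands = new ArrayList<>();
        int pos = 0;
        while (pos + 4 <= body.length) {
            int len = Integer.parseInt(new String(body, pos, 4, StandardCharsets.US_ASCII), 16);
            if (len == 0) {
                break; // flush-pkt: the command list is complete
            }
            if (len < 4 || pos + len > body.length) {
                throw new IllegalArgumentException("malformed pkt-line at offset " + pos);
            }
            String payload = new String(body, pos + 4, len - 4, StandardCharsets.UTF_8);
            int nul = payload.indexOf('\0');
            if (nul >= 0) {
                payload = payload.substring(0, nul); // drop the capability list
            }
            String[] parts = payload.trim().split(" ", 3);
            if (parts.length == 3) {
                commands.add(new PushCommandSketch(parts[0], parts[1], parts[2]));
            }
            pos += len;
        }
        return commands;
    }
}
```

The `oldId` and `newId` of each command are the two endpoints that a later step would hand to the commit range lookup.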

### TemporaryRepositoryResolver

Integrates with LocalRepositoryCache to serve repositories for JGit operations:

```java
var resolver = new TemporaryRepositoryResolver(cache);
// Used by JGit servlet handlers to resolve repository requests
```

## Usage Example

Here's how the infrastructure works in a typical push operation:

1. **Git client pushes to proxy**:
   ```
   git push http://proxy:8080/github.com/owner/repo.git
   ```

2. **ForceGitClientFilter** validates the client

3. **ParseGitRequestFilter** parses the basic push information from the packet

4. **EnrichPushCommitsFilter** (NEW):
   - Clones/fetches `https://github.com/owner/repo.git` to temp directory
   - Uses JGit to extract all commits in the range
   - Populates full commit details in `GitRequestDetails`

5. **Validation Filters** can now access complete commit information:
   ```java
   var commits = requestDetails.getCommits(); // All commits in push
   for (Commit commit : commits) {
       String email = commit.getAuthor().getEmail();
       String message = commit.getMessage();
       String signature = commit.getSignature();
       // ... validate
   }
   ```

6. **Proxy completes** if all filters pass
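The walkthrough implies a mapping from the proxy path `/github.com/owner/repo.git` to the upstream URL `https://github.com/owner/repo.git`. A minimal sketch of that mapping follows. The class `UpstreamUrlSketch` is hypothetical, the choice of `https` is taken from the example URL in step 4, and the suffix handling only reflects the paths that Git's smart HTTP protocol appends (`/info/refs`, `/git-receive-pack`, `/git-upload-pack`); the project may resolve upstreams differently.

```java
import java.net.URI;
import java.util.List;

/** Hypothetical helper: derives the upstream repository URL from a proxy request path. */
final class UpstreamUrlSketch {
    // Suffixes that Git's smart HTTP protocol appends to the repository path.
    private static final List<String> SERVICE_SUFFIXES =
            List.of("/info/refs", "/git-receive-pack", "/git-upload-pack");

    static URI toUpstream(String proxyPath) {
        String path = proxyPath;
        for (String suffix : SERVICE_SUFFIXES) {
            if (path.endsWith(suffix)) {
                path = path.substring(0, path.length() - suffix.length());
                break;
            }
        }
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        int slash = trimmed.indexOf('/');
        // Require a host segment followed by a non-empty repository path.
        if (slash <= 0 || slash == trimmed.length() - 1) {
            throw new IllegalArgumentException(
                    "expected /<host>/<owner>/<repo>.git, got: " + proxyPath);
        }
        return URI.create("https://" + trimmed);
    }
}
```

A production proxy would likely also restrict the host segment to an allow-list of upstream providers, which this sketch does not do.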
125+
126+
## Filter Examples
127+
128+
### CheckAuthorEmailsFilter
129+
130+
Validates commit author emails against configured patterns:
131+
132+
```java
133+
var commitConfig = CommitConfig.builder()
134+
.author(AuthorConfig.builder()
135+
.email(EmailConfig.builder()
136+
.domain(DomainConfig.builder()
137+
.allow(".*\\.company\\.com$")
138+
.build())
139+
.build())
140+
.build())
141+
.build();
142+
143+
var filter = new CheckAuthorEmailsFilter(commitConfig);
144+
```

### SecretScanningFilter

Scans commits for potential secrets:

```java
var secretConfig = SecretScanningConfig.defaultConfig(); // Includes common patterns
var filter = new SecretScanningFilter(secretConfig);
```

### GpgSignatureFilter

Validates GPG signatures on commits:

```java
var gpgConfig = GpgConfig.builder()
    .enabled(true)
    .requireSignedCommits(true)
    .trustedKeysFile("/path/to/public-keys.asc")
    .build();

var filter = new GpgSignatureFilter(gpgConfig);
```

## Performance Considerations

- **First Push**: Clones repository (slower)
- **Subsequent Pushes**: Uses cached clone with fetch (faster)
- **Memory**: Bare repositories are compact (no working directory)
- **Disk**: Cached in temp directory, cleaned up on shutdown
- **Concurrency**: Thread-safe cache with synchronized cloning

## Comparison with Node.js git-proxy

| Feature | Node.js git-proxy | Java jgit-proxy |
|---------|-------------------|-----------------|
| Repository Cloning | Child process `git clone` | JGit API |
| Commit Inspection | Child process `git log`, `git show` | JGit RevWalk |
| Diff Analysis | Child process `git diff` | JGit DiffFormatter |
| GPG Verification | Child process `git verify-commit` | BouncyCastle PGP |
| Pack Analysis | Child process `git verify-pack` | Not yet implemented |

## Future Enhancements

1. **Pack File Analysis**: Implement hidden commits check using JGit pack file APIs
2. **Diff Content Scanning**: Extend SecretScanningFilter to scan actual file diffs
3. **Repository Retention**: Add configurable cache expiry and size limits
4. **Async Cloning**: Clone repositories asynchronously to avoid blocking requests
5. **Mirror Mode**: Support local git mirrors instead of on-demand cloning
