@@ -122,3 +122,299 @@ This produces a full-snapshot diff of the tagged commit, which is harmless for t
122 122
123 123 **`SecretScanningFilter`**: passes `commitFrom`/`commitTo` to `gitleaks git`.
124 124 Gitleaks calls native `git log`, which peels tags natively. No special handling needed.
125+
126+ ---
127+
128+ ## Branches and refs
129+
130+ ### What the proxy sees on the wire
131+
132+ Every `git push` sends one or more **packet lines** (pkt-lines) before the pack data.
133+ After its 4-hex-digit length prefix, each line has the format:
134+
135+ ```
136+ <oldOid> <newOid> <refName>\0<capabilities>
137+ ```
138+
139+ | Field | Meaning |
140+ | -------| ---------|
141+ | ` oldOid ` | The SHA the client believes the ref currently points to on the remote. All-zeros (` 0000… ` ) for a new ref (branch or tag). |
142+ | ` newOid ` | The SHA the client wants the ref to point to after the push. All-zeros for a ** ref deletion** . |
143+ | ` refName ` | Full ref path: ` refs/heads/main ` , ` refs/tags/v1.0 ` , etc. |
144+
145+ The null byte ` \0 ` separates the ref triple from the capability string (e.g. ` report-status side-band-64k ` ).
146+ Only the ** first** packet line carries capabilities; subsequent lines omit the ` \0… ` suffix.
147+
148+ In proxy mode, `GitReceivePackParser.parsePush()` splits this line and populates `PushInfo`.
149+ In S&F mode, JGit's `ReceiveCommand` carries the same triple.
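
To make the line format concrete, here is a minimal sketch of splitting one ref-update line into its parts. `RefUpdateLine` is a hypothetical class written for illustration; it is not the project's `PushInfo` or `GitReceivePackParser`, and it assumes the pkt-line length prefix has already been stripped.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for one ref-update line (illustration only). */
final class RefUpdateLine {
    final String oldOid;
    final String newOid;
    final String refName;
    final List<String> capabilities;

    private RefUpdateLine(String oldOid, String newOid, String refName, List<String> capabilities) {
        this.oldOid = oldOid;
        this.newOid = newOid;
        this.refName = refName;
        this.capabilities = capabilities;
    }

    /** Parses a payload of the form "<oldOid> <newOid> <refName>[\0<capabilities>]". */
    static RefUpdateLine parse(String payload) {
        // Drop the optional trailing newline.
        String line = payload.endsWith("\n") ? payload.substring(0, payload.length() - 1) : payload;

        // Everything after the NUL byte is the capability string (first line only).
        int nul = line.indexOf('\0');
        String triple = nul >= 0 ? line.substring(0, nul) : line;
        String capText = nul >= 0 ? line.substring(nul + 1).trim() : "";
        List<String> caps = capText.isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(capText.split(" +"));

        // The triple is separated by single spaces; the ref name is the third field.
        String[] parts = triple.split(" ", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected '<old> <new> <ref>': " + triple);
        }
        return new RefUpdateLine(parts[0], parts[1], parts[2], caps);
    }
}
```

Calling `RefUpdateLine.parse(...)` on the first line of a push yields a non-empty `capabilities` list; later lines yield an empty one.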
150+
151+ ### Determining the push type from the packet line
152+
153+ The packet line SHAs encode what kind of ref update is happening:
154+
155+ | ` oldOid ` | ` newOid ` | ` refName ` | Meaning |
156+ | ----------| ----------| -----------| ---------|
157+ | ` 000…0 ` | ` abc123 ` | ` refs/heads/feature ` | ** New branch** — create a branch pointing at ` abc123 ` |
158+ | ` abc123 ` | ` def456 ` | ` refs/heads/feature ` | ** Branch update** — fast-forward (or force push) from ` abc123 ` to ` def456 ` |
159+ | ` abc123 ` | ` 000…0 ` | ` refs/heads/feature ` | ** Branch deletion** — remove the ref entirely |
160+ | ` 000…0 ` | ` abc123 ` | ` refs/tags/v1.0 ` | ** New tag** — see the "Tag objects" section above |
161+
162+ In S&F mode, JGit's ` ReceiveCommand.Type ` enum maps these directly: ` CREATE ` , ` UPDATE ` ,
163+ ` UPDATE_NONFASTFORWARD ` , ` DELETE ` .
164+
165+ In proxy mode, ` GitRequestDetails ` exposes helper methods:
166+ - ` isRefDeletion() ` — ` commitTo ` is all-zeros
167+ - ` isTagPush() ` — ` branch ` starts with ` refs/tags/ `
168+
169+ There is no explicit ` isNewBranch() ` helper; filters check ` commitFrom.matches("^0+$") ` directly.
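
The table above can be expressed as a small classifier. This is a sketch under stated assumptions: `PushKinds` and its `Kind` enum are hypothetical names, not part of `GitRequestDetails` or JGit. It deliberately cannot tell a fast-forward from a force push, because that requires ancestry information the packet line does not carry.

```java
/** Hypothetical classifier for a ref update, based only on the packet-line triple. */
final class PushKinds {
    enum Kind { NEW_BRANCH, BRANCH_UPDATE, BRANCH_DELETION, NEW_TAG, OTHER }

    /** Same all-zeros test the document describes for commitFrom. */
    static boolean isZero(String oid) {
        return oid != null && oid.matches("^0+$");
    }

    static Kind classify(String oldOid, String newOid, String refName) {
        boolean create = isZero(oldOid);
        boolean delete = isZero(newOid);

        if (refName.startsWith("refs/tags/")) {
            // Only tag creation appears in the table; other tag updates are left unclassified.
            return (create && !delete) ? Kind.NEW_TAG : Kind.OTHER;
        }
        if (refName.startsWith("refs/heads/")) {
            if (delete) {
                return Kind.BRANCH_DELETION;
            }
            return create ? Kind.NEW_BRANCH : Kind.BRANCH_UPDATE;
        }
        return Kind.OTHER;
    }
}
```

Checking for deletion before creation keeps the degenerate "both zeros" case from being reported as a new branch.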
170+
171+ ### New branches — what makes them tricky
172+
173+ A new-branch push (` oldOid ` = zeros) doesn't tell you which commits are "new".
174+ The pack may contain many commits, but some of them may already exist on the remote
175+ under a different branch. Only the commits ** not reachable from any existing ref** are
176+ genuinely new in this push.
177+
178+ Both modes solve this the same way — via ` CommitInspectionService.getCommitRange() ` :
179+
180+ ```java
181+ // New branch path (fromId is null or zero):
182+ var logCmd = git.log().add(toId);
183+ for (Ref ref : repository.getRefDatabase().getRefsByPrefix("refs/heads/")) {
184+     if (ref.getObjectId() != null) logCmd.not(ref.getObjectId());
185+ }
186+ ```
187+
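188+ This walks backward from the pushed tip, excluding anything reachable from existing
189+ branch heads (`refs/heads/`). Commits reachable only from tags or other refs are not excluded by this snippet. The result is the set of commits treated as new in this push.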
190+
191+ **S&F mode**: JGit's `ReceivePack` has already unpacked the objects into its own
192+ repository, so `getCommitRange()` works against that repo directly.
193+
194+ **Proxy mode**: `EnrichPushCommitsFilter` must first clone/fetch the upstream and
195+ unpack the push's pack data into the local clone (see "Proxy mode: clone + unpack"
196+ below), then `getCommitRange()` can walk the combined object store.
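
The "walk from the tip, excluding what existing heads already reach" idea can be illustrated without JGit. The sketch below uses a plain in-memory parent map as a stand-in for the object store; `NewCommitWalk` and its parameters are hypothetical and only mirror the logic described above, not the actual `getCommitRange()` implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical commit walk over an in-memory graph (commit id -> parent ids). */
final class NewCommitWalk {

    /** Returns commits reachable from tip but not from any of excludedTips, nearest first. */
    static List<String> newCommits(String tip, Collection<String> excludedTips,
                                   Map<String, List<String>> parents) {
        // Pass 1: mark everything reachable from the excluded tips.
        Set<String> excluded = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(excludedTips);
        while (!pending.isEmpty()) {
            String id = pending.pop();
            if (excluded.add(id)) {
                pending.addAll(parentsOf(id, parents));
            }
        }

        // Pass 2: breadth-first walk from the pushed tip, stopping at excluded commits.
        List<String> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        pending.add(tip);
        while (!pending.isEmpty()) {
            String id = pending.poll();
            if (excluded.contains(id) || !seen.add(id)) {
                continue;
            }
            result.add(id);
            pending.addAll(parentsOf(id, parents));
        }
        return result;
    }

    private static List<String> parentsOf(String id, Map<String, List<String>> parents) {
        List<String> p = parents.get(id);
        return p == null ? Collections.<String>emptyList() : p;
    }
}
```

Passing the existing branch heads as `excludedTips` models the new-branch case; passing only the old tip models an `oldOid..newOid` range.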
197+
198+ ### Branch updates — the commit range
199+
200+ For an existing branch update (` oldOid ` is a real SHA), the commit range is
201+ straightforward:
202+
203+ ```
204+ git log oldOid..newOid
205+ ```
206+
207+ ` CommitInspectionService.getCommitRange() ` uses ` git.log().addRange(fromId, toId) ` ,
208+ which is JGit's equivalent. This returns exactly the commits introduced by this push.
209+
210+ ### Force pushes (non-fast-forward)
211+
212+ A force push rewrites history. ` oldOid ` is no longer an ancestor of ` newOid ` .
213+
214+ In S&F mode, JGit classifies this as ` ReceiveCommand.Type.UPDATE_NONFASTFORWARD ` .
215+ ` ForwardingPostReceiveHook.buildRefUpdates() ` sets ` force=true ` for these so the
216+ upstream accepts the rewrite.
217+
218+ In proxy mode, the request is forwarded as-is, and the upstream git server decides
219+ whether to accept the force push based on its own configuration. The proxy's filter
220+ chain still runs validation on the new commits, but the result of `getCommitRange()`
221+ can be surprising: `addRange(oldOid, newOid)` returns only commits reachable from `newOid`
222+ but not from `oldOid`. If history diverged, commits on the old branch that were
223+ dropped are **not** included. The range shows only what was added, not what was removed.
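
A small model of a diverged history shows why the range hides dropped commits. This sketch again uses an in-memory parent map rather than JGit; `ForcePushRange` and its method names are hypothetical and exist only to illustrate the reachability argument.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical reachability helpers over an in-memory graph (commit id -> parent ids). */
final class ForcePushRange {

    /** All commits reachable from tip, including tip itself. */
    static Set<String> reachable(String tip, Map<String, List<String>> parents) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(tip);
        while (!pending.isEmpty()) {
            String id = pending.pop();
            if (seen.add(id)) {
                List<String> p = parents.get(id);
                if (p != null) {
                    pending.addAll(p);
                }
            }
        }
        return seen;
    }

    /** A push is non-fast-forward when the old tip is not an ancestor of the new tip. */
    static boolean isNonFastForward(String oldOid, String newOid, Map<String, List<String>> parents) {
        return !reachable(newOid, parents).contains(oldOid);
    }

    /** Commits in oldOid..newOid: reachable from newOid but not from oldOid. */
    static Set<String> added(String oldOid, String newOid, Map<String, List<String>> parents) {
        Set<String> result = new LinkedHashSet<>(reachable(newOid, parents));
        result.removeAll(reachable(oldOid, parents));
        return result;
    }

    /** Commits the rewrite discards: the same range computed in the opposite direction. */
    static Set<String> dropped(String oldOid, String newOid, Map<String, List<String>> parents) {
        return added(newOid, oldOid, parents);
    }
}
```

For a history where `C` and `D` both have parent `B`, a force push from `C` to `D` has `added` equal to `{D}` and `dropped` equal to `{C}`; only the first set is visible to a filter that inspects the forward range.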
224+
225+ ### Ref deletions
226+
227+ When `newOid` is all-zeros, the client is deleting a ref. A delete-only push sends no
228+ pack data, so there are no commits to validate.
229+
230+ **S&F mode**: `ReceiveCommand.Type.DELETE`. Hooks that iterate commands skip `DELETE`
231+ types explicitly (e.g. `CheckEmptyBranchHook`, `CheckHiddenCommitsHook`, `DiffGenerationHook`).
232+ `ForwardingPostReceiveHook` handles deletion by creating a `RemoteRefUpdate` with
233+ a null source ref, which JGit translates to a delete on the upstream.
234+
235+ **Proxy mode**: `GitReceivePackParser.parsePush()` checks `newCommit.equals(ZERO_OID)`
236+ and skips pack parsing entirely (there's nothing to parse). `GitRequestDetails` will
237+ have `commitTo` = zeros, `commit` = null, `pushedCommits` = empty.
238+ `isRefDeletion()` returns true, and filters should check this early and skip.
239+
240+ ---
241+
242+ ## How the proxy gets commit data
243+
244+ The two proxy modes obtain commit metadata very differently.
245+
246+ ### S&F mode: JGit ReceivePack
247+
248+ JGit's ` ReceivePack ` handles the entire git protocol server-side. When the client
249+ pushes, JGit:
250+
251+ 1. Receives the pack data and unpacks objects into the local repository
252+ 2. Creates `ReceiveCommand` entries for each ref update
253+ 3. Calls the pre-receive hook chain with access to the full `Repository`
254+
255+ Hooks can call any JGit API — ` RevWalk ` , ` DiffFormatter ` , ` git.log() ` — because
256+ the objects are already in the local object store. No special setup required.
257+
258+ The repository is a bare repo managed by the S&F servlet, one per provider+repo
259+ combination.
260+
261+ ### Proxy mode: clone + unpack
262+
263+ Proxy-mode filters run as servlet filters on an HTTP request. They don't have a
264+ local repository by default — the request is just bytes on the wire being forwarded
265+ to the upstream.
266+
267+ ` EnrichPushCommitsFilter ` bridges this gap:
268+
269+ 1. **Clone/fetch**: `LocalRepositoryCache.getOrClone(remoteUrl)` maintains a bare
270+    clone of each upstream repository. The first push triggers a `git clone --bare --depth 100`;
271+    subsequent pushes do `git fetch --depth 100`. The cache is keyed by
272+    `owner_reponame` (derived from the URL).
273+
274+ 2. **Unpack push data**: The push's pack data (from the HTTP request body) is fed
275+    into JGit's `PackParser`, which inserts the objects into the local clone's object
276+    store. This is the equivalent of what `ReceivePack` does internally in S&F mode.
277+
278+ 3. **Walk commits**: With objects now in the local clone, `CommitInspectionService`
279+    can walk the commit range, generate diffs, etc.
280+
281+ The local clone is published on ` GitRequestDetails.localRepository ` so all downstream
282+ filters can use it.
283+
284+ #### Shallow clone implications
285+
286+ The default clone depth is 100 commits. This means:
287+
288+ - ` getCommitRange() ` for a new branch will only walk back 100 commits. Commits beyond
289+ that depth are not in the local clone and won't appear in the range.
290+ - ` getDiff() ` for a new branch uses ` findNewBranchBase() ` to diff against the parent
291+ of the oldest new commit. If the oldest new commit's parent is beyond the shallow
292+ boundary, ` resolve(parentSha + "^{tree}") ` returns null and the diff falls back to
293+ the empty tree (full-snapshot diff).
294+ - Secret scanning via gitleaks is passed ` commitFrom..commitTo ` and runs ` git log `
295+ natively — it respects the shallow boundary silently.
296+
297+ For most pushes this is fine. A push with more than 100 new commits on a new branch
298+ is unusual, and the shallow clone can be deepened via configuration (` cloneDepth ` ).
299+
300+ ---
301+
302+ ## Diff generation
303+
304+ ### Where diffs are generated
305+
306+ Diffs are generated in both modes but through different code paths:
307+
308+ | Mode | Component | When | What |
309+ | ------| -----------| ------| ------|
310+ | S&F | ` DiffGenerationHook ` (order 280) | Pre-receive, after validation hooks pass | Push diff + optional default-branch diff |
311+ | Proxy | ` ScanDiffFilter ` (order 300) | In the filter chain, after ` EnrichPushCommitsFilter ` | Push diff only |
312+
313+ Both ultimately call ` CommitInspectionService.getFormattedDiff(repo, fromCommit, toCommit) ` .
314+
315+ ### How diffs are computed
316+
317+ ` CommitInspectionService.getDiff() ` resolves both sides to tree objects, then runs
318+ JGit's ` DiffFormatter ` :
319+
320+ ```java
321+ ObjectId oldId = isNullCommit(fromCommit)
322+         ? findNewBranchBase(repository, toCommit)       // new branch: tree of the oldest new commit's first parent
323+         : repository.resolve(fromCommit + "^{tree}");   // existing branch: diff against old tip
324+ ObjectId newId = repository.resolve(toCommit + "^{tree}");
325+ ```
326+
327+ The ` ^{tree} ` peel works for both commits and annotated tags — it follows the chain
328+ down to the commit, then to its tree.
329+
330+ ### New branch diff base (` findNewBranchBase ` )
331+
332+ For a new-branch push, diffing against the empty tree would show the entire repo
333+ snapshot, which is useless for review and would trigger false-positive secret-scan findings
334+ on existing files.
335+
336+ Instead, ` findNewBranchBase() ` finds the oldest new commit (same "exclude existing refs"
337+ walk as ` getCommitRange() ` ), then returns the ** tree of that commit's first parent** .
338+ This means the diff shows only the changes introduced by the new commits, not the
339+ entire history they're built on.
340+
341+ If the oldest new commit is a root commit (no parent), the base is null, and the
342+ diff does fall back to the empty tree. This happens for genuinely new
343+ repositories, and would also apply to an orphan branch whose first commit has no parent.
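
The base-selection rule can be sketched as follows. `NewBranchBase` is a hypothetical helper that works on commit ids and an in-memory parent map; the real `findNewBranchBase()` returns a tree id and its handling of non-linear histories is not described here, so "oldest" is simply assumed to be the last entry of a newest-first list.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of choosing the diff base for a new branch. */
final class NewBranchBase {

    /**
     * @param newestFirst the new commits, newest first (for example the output of a range walk)
     * @param parents     commit id -> parent ids
     * @return the first parent of the oldest new commit, or null when the caller
     *         should fall back to the empty tree (no new commits, or a root commit)
     */
    static String baseCommit(List<String> newestFirst, Map<String, List<String>> parents) {
        if (newestFirst.isEmpty()) {
            return null;
        }
        String oldest = newestFirst.get(newestFirst.size() - 1);
        List<String> oldestParents = parents.get(oldest);
        if (oldestParents == null || oldestParents.isEmpty()) {
            return null; // root commit: nothing to diff against
        }
        return oldestParents.get(0); // first parent only, as the text describes
    }
}
```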
344+
345+ ### Default-branch diff (S&F only)
346+
347+ ` DiffGenerationHook ` generates a second diff when pushing to a non-default branch:
348+ the total diff of ` defaultBranch..commitTo ` . This helps reviewers see the full scope
349+ of a feature branch without having to check it out.
350+
351+ The default branch is resolved from ` HEAD ` (which in a bare clone is a symbolic ref
352+ to the remote's default branch), falling back to ` refs/heads/main ` or ` refs/heads/master ` .
353+
354+ This diff is stored as a separate ` PushStep ` with step name ` diff:default-branch ` and
355+ tagged as ` type: auto:default-branch ` so the dashboard UI can label it appropriately.
356+
357+ ### Hidden commits detection
358+
359+ The "hidden commits" check exists in both modes (` CheckHiddenCommitsHook ` / ` CheckHiddenCommitsFilter ` )
360+ and catches a subtle attack vector: a developer could create a branch from unapproved
361+ commits that haven't been pushed yet. Git's pack protocol bundles all objects needed
362+ by the receiving side, including ancestor commits that the remote doesn't have.
363+
364+ The algorithm is:
365+
366+ 1. **introduced** = commits from `getCommitRange(oldId, newId)`, the explicit push range
367+ 2. **allNew** = `RevWalk` from `newId`, marking all existing refs as uninteresting
368+ 3. **hidden** = `allNew` minus `introduced`
369+
370+ If hidden is non-empty, the push is rejected. The developer needs to get the hidden
371+ commits approved and pushed first, then retry.
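
Step 3 of the algorithm is a plain set difference. The sketch below assumes both commit sets have already been computed by the walks in steps 1 and 2; `HiddenCommits` is a hypothetical name and does not reflect how `CheckHiddenCommitsHook` or `CheckHiddenCommitsFilter` are actually written.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper for the final step of the hidden-commits check. */
final class HiddenCommits {

    /** Returns the commits in allNew that are not part of the explicit push range. */
    static Set<String> hidden(Collection<String> introduced, Collection<String> allNew) {
        Set<String> result = new LinkedHashSet<>(allNew); // keep walk order for readable error messages
        result.removeAll(introduced);
        return result;
    }

    /** The push is acceptable only when nothing is hidden. */
    static boolean shouldReject(Collection<String> introduced, Collection<String> allNew) {
        return !hidden(introduced, allNew).isEmpty();
    }
}
```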
372+
373+ ---
374+
375+ ## Pack data parsing
376+
377+ ### What ` GitReceivePackParser ` does (proxy mode only)
378+
379+ In proxy mode, ` ParseGitRequestFilter ` needs to extract commit metadata from the raw
380+ HTTP request body before JGit ever touches it. The request body contains:
381+
382+ 1. Packet lines (ref updates + capabilities)
383+ 2. A flush packet (`0000`)
384+ 3. Pack data (the `PACK` signature followed by pack objects)
385+
386+ ` GitReceivePackParser.parsePush() ` reads the packet line via JGit's ` PacketLineIn ` ,
387+ then parses the first object from the pack data manually:
388+
389+ - Scans for the ` PACK ` signature (4 bytes: ` P ` , ` A ` , ` C ` , ` K ` )
390+ - Skips the 12-byte pack header (signature + version + object count)
391+ - Reads the first pack entry's type+size header (variable-length encoding)
392+ - Inflates the zlib-compressed object data
393+ - If the type is ` OBJ_COMMIT ` (1), parses the raw commit content for author, committer,
394+ parent, message, and GPG signature
395+
396+ This is a ** best-effort parse of the first object only** . It handles the common case
397+ (a commit push where the tip commit is the first pack entry) but intentionally does
398+ not handle:
399+ - Delta objects (` OBJ_OFS_DELTA ` , ` OBJ_REF_DELTA ` ) — logged as a warning
400+ - Tag objects (` OBJ_TAG ` , type 4) — throws "No commit object found"
401+ - Packs where the commit is not the first entry
402+ - Empty packs (lightweight tag pointing to an existing commit)
403+
404+ These failures are caught by the ` try/catch ` in ` parsePush() ` , and
405+ ` PushInfo.commit ` is left null. ` EnrichPushCommitsFilter ` downstream recovers
406+ full commit data from the local clone anyway — the pack-parsed commit is just an
407+ early-availability optimization for ` ParseGitRequestFilter ` .
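
The manual parsing steps listed above can be sketched with only the Java standard library. `PackPeek` is a hypothetical class, not the project's `GitReceivePackParser`; it follows the same outline (find `PACK`, skip the 12-byte header, decode the type+size header, inflate) and, like the original, makes no attempt to resolve delta objects.

```java
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Hypothetical best-effort peek at the first object of a pack stream. */
final class PackPeek {
    static final int OBJ_COMMIT = 1;

    final int version;
    final long objectCount;
    final int firstType;
    final long firstSize;
    final byte[] firstContent; // null for delta objects (types 6 and 7)

    private PackPeek(int version, long objectCount, int firstType, long firstSize, byte[] firstContent) {
        this.version = version;
        this.objectCount = objectCount;
        this.firstType = firstType;
        this.firstSize = firstSize;
        this.firstContent = firstContent;
    }

    /** Scans for the 4-byte "PACK" signature; returns -1 if absent. */
    static int indexOfPack(byte[] body) {
        for (int i = 0; i + 3 < body.length; i++) {
            if (body[i] == 'P' && body[i + 1] == 'A' && body[i + 2] == 'C' && body[i + 3] == 'K') {
                return i;
            }
        }
        return -1;
    }

    static PackPeek peek(byte[] body) {
        int start = indexOfPack(body);
        if (start < 0 || body.length < start + 13) {
            throw new IllegalArgumentException("no pack data found");
        }
        // 12-byte header: signature, then big-endian version and object count.
        ByteBuffer header = ByteBuffer.wrap(body, start + 4, 8);
        int version = header.getInt();
        long count = header.getInt() & 0xFFFFFFFFL;

        // Entry header: bits 6-4 of the first byte are the type, bits 3-0 the low size bits;
        // while the high bit is set, each following byte adds 7 more size bits.
        int pos = start + 12;
        int b = body[pos++] & 0xFF;
        int type = (b >> 4) & 0x07;
        long size = b & 0x0F;
        int shift = 4;
        while ((b & 0x80) != 0) {
            b = body[pos++] & 0xFF;
            size |= (long) (b & 0x7F) << shift;
            shift += 7;
        }

        byte[] content = null;
        if (type >= 1 && type <= 4) { // commit, tree, blob, tag: zlib data follows directly
            content = new byte[(int) size];
            Inflater inflater = new Inflater();
            try {
                inflater.setInput(body, pos, body.length - pos);
                int n = 0;
                while (n < content.length && !inflater.finished()) {
                    int read = inflater.inflate(content, n, content.length - n);
                    if (read == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                        break;
                    }
                    n += read;
                }
                if (n != content.length) {
                    throw new IllegalArgumentException("truncated object data");
                }
            } catch (DataFormatException e) {
                throw new IllegalArgumentException("corrupt zlib stream", e);
            } finally {
                inflater.end();
            }
        }
        return new PackPeek(version, count, type, size, content);
    }
}
```

A caller would check `firstType == PackPeek.OBJ_COMMIT` before parsing `firstContent` as raw commit text, and treat any exception as "commit unavailable", matching the best-effort behavior described above.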
408+
409+ ### Why the pack parser exists alongside ` EnrichPushCommitsFilter `
410+
411+ ` ParseGitRequestFilter ` runs at order ` MIN_VALUE + 1 ` — it's the first filter.
412+ It needs to populate ` GitRequestDetails ` before any other filter runs.
413+ ` EnrichPushCommitsFilter ` runs at ` MIN_VALUE + 2 ` — immediately after — but requires
414+ a network clone/fetch which may fail.
415+
416+ The pack parser gives ` ParseGitRequestFilter ` a synchronous, no-network way to
417+ extract the head commit's metadata. If it succeeds, ` requestDetails.commit ` is
418+ available immediately. If it fails (tag push, delta-only pack, etc.), the commit
419+ is null and filters that need it wait for ` EnrichPushCommitsFilter ` to populate
420+ ` pushedCommits ` from the local clone.