Commit fed4a49

Copilot and dsyme authored
daily-file-diet: replace expensive find | wc -l with git ls-tree (#283)
* Initial plan

* daily-file-diet: replace expensive find | wc -l with git ls-tree

Co-authored-by: dsyme <7204669+dsyme@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: dsyme <7204669+dsyme@users.noreply.github.com>
1 parent c7e1542 commit fed4a49

2 files changed

Lines changed: 12 additions & 20 deletions

docs/daily-file-diet.md

Lines changed: 2 additions & 2 deletions
@@ -22,7 +22,7 @@ gh aw compile
 
 The Daily File Diet workflow runs on weekdays and:
 
-1. **Scans Source Files** - Finds all non-test source files in your repository, excluding generated directories like `node_modules`, `vendor`, `dist`, and `target`
+1. **Scans Source Files** - Finds all tracked non-test source files in your repository using `git ls-tree`, which automatically respects `.gitignore` and avoids scanning generated directories like `node_modules`, `vendor`, `dist`, and `target`
 2. **Identifies Oversized Files** - Detects files exceeding 500 lines (the healthy size threshold)
 3. **Analyzes Structure** - Examines what the file contains: functions, classes, modules, and their relationships
 4. **Creates Refactoring Issues** - Proposes concrete split strategies with specific file names, responsibilities, and implementation guidance
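
Step 2 in the list above amounts to a line-count threshold applied to `wc -l` output. The following is a minimal sketch of such a filter, not the workflow's actual implementation: the helper name `oversized` and the sample paths and counts are hypothetical. It assumes input rows of the form `<count> <path>` and drops the `total` row that `wc -l` prints when it is given several files.

```shell
# Hypothetical helper: keep rows whose line count exceeds a threshold
# (default 500) and drop the "total" summary row that `wc -l` appends.
oversized() {
  awk -v limit="${1:-500}" '$2 != "total" && $1 > limit { print $1, $2 }'
}

# Sample input shaped like `wc -l` output (made-up paths and counts).
printf '%s\n' \
  '  812 src/server.go' \
  '  120 src/util.go' \
  '  932 total' \
  | oversized 500
# prints: 812 src/server.go
```

Without the `total` check, that summary row would sort to the top of a `sort -rn | head -20` listing, and `xargs` may emit more than one such row if it splits a long file list into batches. Separately, `git ls-tree` lists what is committed at `HEAD` rather than reading `.gitignore`, so generated directories are skipped only insofar as they are untracked.
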
@@ -80,7 +80,7 @@ gh aw edit daily-file-diet
 
 Common customizations:
 - **Adjust the threshold** - Change the 500-line limit to suit your team's preferences
-- **Focus on specific languages** - Restrict `find` commands to your repository's primary language
+- **Focus on specific languages** - Restrict the `grep` pattern in the `git ls-tree` pipeline to your repository's primary language
 - **Add labels** - Apply team-specific labels to generated issues
 - **Change the schedule** - Run less frequently if your codebase changes slowly
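
As one way to apply the language customization above, the extension pattern can be narrowed to a single language. The sketch below assumes a Go repository; the helper name `only_go` and the sample paths are illustrative and do not come from the workflow.

```shell
# Hypothetical filter: keep Go sources and drop Go test files.
only_go() {
  grep -E '\.go$' | grep -vE '_test\.go$'
}

# In the workflow this would be fed by `git ls-tree -r --name-only HEAD`;
# a fixed list of paths stands in for it here.
printf '%s\n' cmd/main.go cmd/main_test.go web/app.ts README.md | only_go
# prints: cmd/main.go
```
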

workflows/daily-file-diet.md

Lines changed: 10 additions & 18 deletions
@@ -25,19 +25,13 @@ tools:
   github:
     toolsets: [default]
   bash:
-    - "find . -type f -not -path '*/.git/*' -not -path '*/node_modules/*' -not -path '*/vendor/*' -not -path '*/dist/*' -not -path '*/build/*' -not -path '*/.next/*' -not -path '*/target/*' -not -path '*/__pycache__/*' -not -path '*/coverage/*' -not -path '*/venv/*' -not -path '*/.tox/*' -not -path '*/.mypy_cache/*' -name '*' -exec wc -l {} \\; 2>/dev/null"
+    - "git ls-tree -r --name-only HEAD"
+    - "git ls-tree -r -l --full-name HEAD"
+    - "git ls-tree -r --name-only HEAD | grep -E * | grep -vE * | xargs wc -l 2>/dev/null"
+    - "git ls-tree -r --name-only HEAD | grep -E * | xargs wc -l 2>/dev/null"
     - "wc -l *"
     - "head -n * *"
     - "grep -n * *"
-    - "find . -type f -name '*.go' -not -path '*_test.go' -not -path '*/vendor/*'"
-    - "find . -type f -name '*.py' -not -path '*/__pycache__/*' -not -path '*/venv/*'"
-    - "find . -type f -name '*.ts' -not -path '*/node_modules/*' -not -path '*/dist/*'"
-    - "find . -type f -name '*.js' -not -path '*/node_modules/*' -not -path '*/dist/*'"
-    - "find . -type f -name '*.rb' -not -path '*/vendor/*'"
-    - "find . -type f -name '*.java' -not -path '*/target/*'"
-    - "find . -type f -name '*.rs' -not -path '*/target/*'"
-    - "find . -type f -name '*.cs'"
-    - "find . -type f \\( -name '*.go' -o -name '*.py' -o -name '*.ts' -o -name '*.js' -o -name '*.rb' -o -name '*.java' -o -name '*.rs' -o -name '*.cs' -o -name '*.cpp' -o -name '*.c' \\) -not -path '*/node_modules/*' -not -path '*/vendor/*' -not -path '*/dist/*' -not -path '*/build/*' -not -path '*/target/*' -not -path '*/__pycache__/*' -exec wc -l {} \\; 2>/dev/null"
     - "sort *"
     - "cat *"
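
One caveat with the `xargs wc -l` entries above: by default `xargs` splits its input on whitespace, so a tracked path containing a space reaches `wc` as two arguments, and `2>/dev/null` hides the resulting error. The following is a hedged sketch of a variant that tolerates spaces. `count_lines` is a hypothetical helper, not part of the workflow, and it assumes an `xargs` that supports `-0` (GNU and BSD both do) and paths without embedded newlines.

```shell
# Hypothetical helper: read newline-separated paths on stdin and run
# `wc -l` on them, keeping paths that contain spaces intact.
count_lines() {
  tr '\n' '\0' | xargs -0 wc -l 2>/dev/null
}

# Demo in a scratch directory with a file whose name contains a space.
demo_dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$demo_dir/big file.txt"
printf '%s\n' "$demo_dir/big file.txt" | count_lines
rm -rf "$demo_dir"
```

Adopting something like this in the workflow would also mean updating the allowed `bash` patterns, since the command string changes. `git ls-tree` has a `-z` flag that emits NUL-terminated paths directly, but the intermediate `grep` stages would then need NUL-aware handling as well (`grep -z` is a GNU extension).
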

@@ -67,14 +61,12 @@ First, determine the primary programming language(s) used in this repository. Th
 
 **For polyglot or unknown repos:**
 ```bash
-find . -type f \( -name "*.go" -o -name "*.py" -o -name "*.ts" -o -name "*.js" -o -name "*.rb" -o -name "*.java" -o -name "*.rs" -o -name "*.cs" -o -name "*.cpp" -o -name "*.c" \) \
-  -not -path "*/node_modules/*" \
-  -not -path "*/vendor/*" \
-  -not -path "*/dist/*" \
-  -not -path "*/build/*" \
-  -not -path "*/target/*" \
-  -not -path "*/__pycache__/*" \
-  -exec wc -l {} \; 2>/dev/null | sort -rn | head -20
+git ls-tree -r --name-only HEAD \
+  | grep -E '\.(go|py|ts|tsx|js|jsx|rb|java|rs|cs|cpp|c|h|hpp)$' \
+  | grep -vE '(_test\.go|\.test\.(ts|js)|\.spec\.(ts|js)|test_[^/]*\.py|[^/]*_test\.py)$' \
+  | xargs wc -l 2>/dev/null \
+  | sort -rn \
+  | head -20
 ```
 
Also skip test files (files ending in `_test.go`, `.test.ts`, `.spec.ts`, `.test.js`, `.spec.js`, `_test.py`, `test_*.py`, etc.) — focus on non-test production code.
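
The include and exclude patterns in the new pipeline can be checked against a few sample paths. In the sketch below, `is_production` is a hypothetical wrapper around the two `grep` stages copied from the diff, and the paths are made up. It also surfaces one gap: the include list admits `.tsx` and `.jsx`, but the exclude list only matches `.test.ts`, `.test.js`, `.spec.ts`, and `.spec.js`, so a file such as `Button.test.tsx` passes through as production code.

```shell
# Hypothetical wrapper around the include/exclude stages from the diff.
is_production() {
  grep -E '\.(go|py|ts|tsx|js|jsx|rb|java|rs|cs|cpp|c|h|hpp)$' \
    | grep -vE '(_test\.go|\.test\.(ts|js)|\.spec\.(ts|js)|test_[^/]*\.py|[^/]*_test\.py)$'
}

printf '%s\n' \
  src/app.ts \
  src/app.test.ts \
  pkg/io_test.go \
  tests/test_io.py \
  ui/Button.test.tsx \
  | is_production
# prints:
#   src/app.ts
#   ui/Button.test.tsx
```

A stricter variant might use `\.(test|spec)\.(ts|tsx|js|jsx)` for the JavaScript and TypeScript cases and anchor the Python prefix to a path boundary with `(^|/)test_`, since the unanchored `test_` also matches a name like `latest_results.py`. Both are suggestions, not something this commit does.
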
