daily-file-diet: replace expensive find | wc -l with git ls-tree #256

@lkraav

IMHO :copilot: is correct to flag as follows ⬇️

`find ... -exec wc -l {} \;` runs one `wc` process per file and scans every file type (`-name '*'`). On larger repos this can be very slow and risks hitting workflow timeouts. Prefer restricting to source extensions and using a single `wc` invocation (e.g., `-exec wc -l {} +` or `-print0 | xargs -0 wc -l`) while keeping existing path exclusions.

    - "find . -type f \\( -name '*.go' -o -name '*.py' -o -name '*.ts' -o -name '*.js' -o -name '*.rb' -o -name '*.java' -o -name '*.rs' -o -name '*.cs' -o -name '*.cpp' -o -name '*.c' \\) -not -path '*/.git/*' -not -path '*/node_modules/*' -not -path '*/vendor/*' -not -path '*/dist/*' -not -path '*/build/*' -not -path '*/.next/*' -not -path '*/target/*' -not -path '*/__pycache__/*' -not -path '*/coverage/*' -not -path '*/venv/*' -not -path '*/.tox/*' -not -path '*/.mypy_cache/*' -print0 | xargs -0 wc -l 2>/dev/null"
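
For comparison, here is a minimal sketch of the other batching form mentioned in the suggestion: `-exec wc -l {} +` hands many files to each `wc` invocation instead of spawning one process per file. The function name `count_lines_batched` and the shortened extension and exclusion lists are illustrative assumptions, not part of the actual workflow.

```shell
# Hypothetical helper: count lines of matching source files under a directory.
# `-exec ... {} +` appends as many paths as fit to each wc command line, so wc
# runs a handful of times rather than once per file.
count_lines_batched() {
  find "${1:-.}" -type f \( -name '*.js' -o -name '*.py' \) \
    -not -path '*/.git/*' -not -path '*/node_modules/*' \
    -exec wc -l {} + 2>/dev/null
}

# Usage: count_lines_batched path/to/repo
```

When `wc` receives more than one file it also prints a `total` line; on very large trees `find` may split the work into several batches, each with its own `total`.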

What about doing something like `git ls-tree -r -t -l --full-name HEAD | grep \.c\\?jsx\\?$ | sort -rn -k 4 | head -n 10` for a near-instant calculation even on larger repos?

Example output
    ± git ls-tree -r -t -l --full-name HEAD | grep \.c\\?jsx\\?$ | sort -rn -k 4 | head -n 10
    100644 blob 65d31932231ed13af4fc89e6d6a427f1355a5159   61617    apps/api/src/resources/payment/payment.service.js
    100644 blob dffccf9f746756dbd0126cb6320a94ea660c87b6   51076    apps/testers-portal-api/src/resources/test/get/validation-helpers.js
    100644 blob eb680232f5ba857d69d3edbdf525aaac80ed7719   50382    apps/web/src/client/pages/test-results/survey/survey.jsx
    100644 blob b99b9e9e942cf1d3691e57bbcb4b22e4385e79c1   48097    apps/admin/src/client/components/tests-table/tests-table.jsx
    100644 blob 95ed5702d9652b6e5b2a0a3b168d8af9c72f6e97   47869    apps/api/src/resources/test/test.service.js
    100644 blob 8c750d70131c032c6f54f6eb3a1d8a15c4891374   47060    apps/web/src/client/pages/test/steps/payment/payment.jsx
    100644 blob 4a445db81b6c76916039e16945acf9338e97143b   43841    packages/lib/client/components/logo/logo-type.jsx
    100644 blob 74fca97dcab4310b3bf9681792d14451a2af35b7   42685    apps/api/src/helpers/create-survey-results-csv.helper.spec.js
    100644 blob 09a19ca908cbcaa442e31adc7c191814b44ce4c5   41439    apps/web/src/client/components/common-payment/common-payment.jsx
    100644 blob 867a95cf3fb3423e90f4b6535d0a21145c46e43f   41114    apps/web/src/client/pages/test-results/standard-test/standard-test.jsx

The specifics of the grep pattern still need to be figured out, but because `git ls-tree` only lists files tracked in the repository, this also avoids manually ignorelisting a ton of potentially unrelated files in a general-purpose solution such as the one we're shipping here.
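
One way the pipeline above might be wrapped is sketched below, assuming the extension filter is passed in as an extended regex. The function name `largest_blobs`, its arguments, and the `LARGEST_BLOBS_RE` variable are hypothetical. One caveat: the fourth column of `git ls-tree -l` is the blob size in bytes, not a line count, so this ranks files by size as a proxy for length. The function reads `git ls-tree -r -l` output on stdin, so the filtering can also be tried on captured output.

```shell
# Hypothetical helper: read `git ls-tree -r -l` output on stdin and print the
# N largest blobs whose path matches an extended regex, as "<bytes>TAB<path>".
# Arguments: $1 = extended regex for paths, $2 = how many rows (default 10).
largest_blobs() {
  LARGEST_BLOBS_RE="$1" awk -F '\t' '
    {
      # Before the tab: mode, type, object id, size. After the tab: the path.
      split($1, meta, " ")
      if (meta[2] == "blob" && $2 ~ ENVIRON["LARGEST_BLOBS_RE"])
        print meta[4] "\t" $2
    }' | sort -rn | head -n "${2:-10}"
}

# Usage inside a repository:
#   git ls-tree -r -l --full-name HEAD | largest_blobs '\.(c?jsx?|tsx?)$' 10
```

Splitting on the tab keeps paths containing spaces intact, and checking the type column skips the tree entries that `-t` would add.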

Source: https://stackoverflow.com/questions/9456550/how-can-i-find-the-n-largest-files-in-a-git-repository
