⚡ perf: optimize migrate-mdx.js with async I/O #22
Conversation
Co-authored-by: sunnylqm <615282+sunnylqm@users.noreply.github.com>
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
📝 Walkthrough

The migration script was refactored from synchronous to asynchronous operations, converting from CommonJS to ES modules and replacing the manual directory traversal with a single recursive `fs.readdir` call.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@site/migrate-mdx.js`:
- Line 40: The script currently swallows migration failures by only logging
errors via main().catch(console.error), which can still yield a zero exit
status; update the catch handler to set process.exitCode = 1 (or call
process.exit(1)) after logging the error so CI/scripts detect failures—modify
the invocation around main() and its catch (the main() call) to log the error
and then set process.exitCode = 1 to indicate a non-zero exit on failure.
- Around line 10-35: The current code runs all file I/O in an unbounded
Promise.all (files.map(...)) using fs.readFile and fs.writeFile which can spike
descriptors; change to a bounded concurrency worker pool (e.g., use p-limit or
an async queue) and replace the Promise.all(files.map(...)) pattern with a
limited runner that processes N files at a time (configurable, e.g., 5-20),
invoking the same async handler that reads, transforms (Callout/Tabs import
logic, Tabs.Tab replacement) and writes each file; ensure errors are propagated
and awaited for all tasks before exit.
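The bounded-concurrency pool the prompt above asks for does not require a dependency. A minimal p-limit-style helper might look like the following sketch (`createLimiter` is a hypothetical name, not part of the PR or of p-limit's API):

```javascript
// Minimal p-limit-style helper (hypothetical sketch, not the PR's code):
// wraps async tasks so that at most `limit` of them run at the same time.
function createLimiter(limit) {
  let active = 0;
  const queue = [];
  const runNext = () => {
    if (active >= limit || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    task().then(resolve, reject).finally(() => {
      active--;
      runNext();
    });
  };
  return task =>
    new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      runNext();
    });
}

// Usage sketch: process at most 8 files at a time instead of all at once.
// const limit = createLimiter(8);
// await Promise.all(files.map(file => limit(() => processFile(file))));
```

The wrapper returns a promise per task, so errors still propagate through `Promise.all` exactly as in the unbounded version.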
```js
await Promise.all(files.map(async file => {
  let content = await fs.readFile(file, 'utf8');

  // replace <Callout type="warning"> ... </Callout> with :::warning ... :::
  content = content.replace(/<Callout[^>]*type="([^"]+)"[^>]*>([\s\S]*?)<\/Callout>/g, (_, type, body) => {
    return `:::${type}\n${body.trim()}\n:::`;
  });
  content = content.replace(/<Callout>([\s\S]*?)<\/Callout>/g, (_, body) => {
    return `:::info\n${body.trim()}\n:::`;
  });

  // Replace imports
  content = content.replace(/import\s*\{([^}]*)\}\s*from\s*"nextra\/components";?/g, (match, imports) => {
    const list = imports.split(',').map(i => i.trim()).filter(i => i !== 'Callout' && i !== '');
    if (list.length === 0) return '';
    if (list.includes('Tabs') && !list.includes('Tab')) {
      list.push('Tab');
    }
    return `import { ${list.join(', ')} } from "rspress/theme";`;
  });

  // Replace <Tabs.Tab> with <Tab>
  content = content.replace(/Tabs\.Tab/g, 'Tab');

  await fs.writeFile(file, content, 'utf8');
}));
```
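To sanity-check the first transform, the Callout regex from the snippet above can be exercised on a small sample string:

```javascript
// Exercise the Callout → admonition regex from the script above on a sample.
const sample = '<Callout type="warning">\nBe careful.\n</Callout>';
const out = sample.replace(
  /<Callout[^>]*type="([^"]+)"[^>]*>([\s\S]*?)<\/Callout>/g,
  (_, type, body) => `:::${type}\n${body.trim()}\n:::`
);
console.log(out);
// :::warning
// Be careful.
// :::
```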
Bound parallel I/O to avoid EMFILE/resource spikes at scale.
Running all file reads/writes in one Promise.all is unbounded; large trees can overwhelm file descriptors and memory. Use a fixed concurrency worker pool.
Suggested fix

```diff
 async function main() {
   const dirents = await fs.readdir('./pages', { recursive: true, withFileTypes: true });
   const files = dirents
     .filter(dirent => !dirent.isDirectory() && (dirent.name.endsWith('.mdx') || dirent.name.endsWith('.md')))
     .map(dirent => path.join(dirent.parentPath || dirent.path, dirent.name));
-  await Promise.all(files.map(async file => {
+  const CONCURRENCY = 32;
+  let idx = 0;
+
+  async function worker() {
+    while (idx < files.length) {
+      const file = files[idx++];
+      await processFile(file);
+    }
+  }
+
+  async function processFile(file) {
     let content = await fs.readFile(file, 'utf8');
     // replace <Callout type="warning"> ... </Callout> with :::warning ... :::
     content = content.replace(/<Callout[^>]*type="([^"]+)"[^>]*>([\s\S]*?)<\/Callout>/g, (_, type, body) => {
       return `:::${type}\n${body.trim()}\n:::`;
@@
     // Replace <Tabs.Tab> with <Tab>
     content = content.replace(/Tabs\.Tab/g, 'Tab');
     await fs.writeFile(file, content, 'utf8');
-  }));
+  }
+
+  await Promise.all(Array.from({ length: Math.min(CONCURRENCY, files.length) }, () => worker()));
```
```js
  fs.writeFileSync(file, content, 'utf8');
});
console.log('Done migrating MDX');
main().catch(console.error);
```
Return non-zero exit status on migration failure.
main().catch(console.error) logs the error but can still exit successfully. Set process.exitCode = 1 so CI/scripts fail correctly on partial/failed migrations.
Suggested fix

```diff
-main().catch(console.error);
+main().catch(err => {
+  console.error(err);
+  process.exitCode = 1;
+});
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```js
main().catch(err => {
  console.error(err);
  process.exitCode = 1;
});
```
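The behavioral difference is easy to demonstrate in isolation. The snippet below is a standalone sketch (not the migration script itself) that simulates a failing `main()`:

```javascript
// Standalone sketch: with only console.error in the catch handler the
// process would still exit 0; setting process.exitCode makes it exit 1.
async function main() {
  throw new Error('migration failed');
}

main().catch(err => {
  console.error(err.message);
  process.exitCode = 1; // without this line the process reports success
});
```

Using `process.exitCode` rather than `process.exit(1)` lets any pending I/O flush before the process terminates.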
💡 What:

- Converted `site/migrate-mdx.js` to use ES module imports (`node:fs/promises`).
- Replaced the manual recursive `walk` function with the highly optimized native `fs.readdir(..., { recursive: true, withFileTypes: true })`.
- Replaced synchronous `fs.readFileSync` and `fs.writeFileSync` in a `.forEach()` loop with asynchronous `fs.readFile` and `fs.writeFile`.
- Used `Promise.all` to execute the file reading and writing concurrently.

🎯 Why:

- Walking the tree with synchronous `fs.readdirSync` and generating arrays iteratively with `.concat` was computationally inefficient and memory-intensive, especially when scaling up to thousands of files.

📊 Measured Improvement:
In a benchmark with 500 generated dummy MDX files executed 50 times inside the provided node/bun container environment:
Node (v22)
Bun (Preferred Project Environment)
Additionally, memory footprints decreased by removing the recursive `.concat()` allocation for `walk()` in favor of iterating the single set of generated `Dirent` entities returned by `recursive: true`.

PR created automatically by Jules for task 6952802215060890542 started by @sunnylqm
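The new traversal pattern can be sketched as follows (a rough illustration assuming Node ≥ 20, where `Dirent.parentPath` is available; `listMarkdownFiles` is a hypothetical helper, not the script's own code):

```javascript
import fs from 'node:fs/promises';
import path from 'node:path';

// Collect .md/.mdx paths with a single recursive readdir call — the same
// pattern the refactored script uses instead of a hand-rolled walk().
async function listMarkdownFiles(root) {
  const dirents = await fs.readdir(root, { recursive: true, withFileTypes: true });
  return dirents
    .filter(d => d.isFile() && /\.(md|mdx)$/.test(d.name))
    .map(d => path.join(d.parentPath ?? d.path, d.name));
}
```

Each `Dirent` carries only its basename in `name`, so the directory prefix is rejoined from `parentPath` (falling back to the older `path` property).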