Add two-phase remote site import by adamziel · Pull Request #2939 · Automattic/studio

adamziel · 2026-03-27T18:00:39Z

Related issues

Related to Add site import command with streaming migration and SQLite db-apply #2770

How AI was used in this PR

AI helped port the original draft import command, wire the new importer child/runtime integration, and draft focused test coverage. I reviewed and validated the resulting code paths, build output, and test results in this workspace.

Proposed Changes

add a new studio site import command that stores raw imports in ~/.studio/imports/<key> and makes --name optional by inferring the local site name from the remote host
import essential files first, flatten the runtime into ~/Studio/<site>, apply the remote database to SQLite, generate a Playground runtime blueprint, create/start the Studio site, and then continue the skipped-file phase with progress output
persist imported-site metadata so later studio site start uses the generated runtime blueprint and studio site delete --files removes both the visible site directory and the technical import directory

Testing Instructions

npx eslint --fix apps/cli/commands/site/import.ts apps/cli/importer-child.ts apps/cli/lib/import/migration-client.ts apps/cli/commands/site/start.ts apps/cli/commands/site/delete.ts apps/cli/commands/site/tests/import.test.ts apps/cli/commands/site/tests/start.test.ts apps/cli/commands/site/tests/delete.test.ts apps/cli/index.ts apps/cli/vite.config.ts tools/common/lib/site-events.ts tools/common/logger-actions.ts
npm run cli:build
npm test -- apps/cli/commands/site/tests/import.test.ts apps/cli/commands/site/tests/start.test.ts apps/cli/commands/site/tests/delete.test.ts
npm run typecheck currently fails on a pre-existing repo issue in apps/cli/lib/archive.ts:18 (followSymlinks is not part of ArchiverOptions)

Pre-merge Checklist

Have you checked for TypeScript, React or other console errors?

wojtekn · 2026-03-30T09:19:22Z

@adamziel currently Studio uses "import" terminology for importing from a backup file or .sql file, and sync/push/pull for remote site integrations. Also, we are moving the current Import features from UI to the studio import command.

Could you please use another namespace for the new command, e.g., studio stream-sync, studio sync stream, or similar?

adamziel · 2026-03-31T08:56:49Z

Let me loop in @Poliuk – would you be able to map all these distinct features and find a useful naming strategy for them?

The import repair tests use spawnSync('sqlite3', ...) to create test databases. CI environments without sqlite3 (e.g. Windows agents) fail these tests. Skip them conditionally with it.skipIf rather than failing the entire test suite.

…rt-flow # Conflicts: # apps/cli/package.json # package-lock.json

…am-import-flow Add timeout to sqlite3 detection

- migration-client test: skip the PHP-dependent test when native PHP is unavailable instead of trying to mock spawnSync (the vi.mock for node:child_process didn't intercept reliably) - runtime-start-options test: use path.join for paths that the source code constructs via path.join, so the mock matches on Windows where path.join produces backslashes

The previous regex-based parser ignored the byte-length prefix in PHP's serialize format and used JavaScript's String.length (UTF-16 code units) instead of the UTF-8 byte count when re-serializing. This silently corrupts the active_plugins option when any plugin path contains multi-byte characters, causing PHP's unserialize() to fail and effectively deactivating every plugin on the imported site. Replace the hand-rolled parsePhpSerializedStringArray and serializePhpStringArray with a PHP WASM invocation that uses native serialize()/unserialize(). The two per-slug calls are batched into a single PHP WASM boot for efficiency.

parseRuntimePhpConstants used a regex to approximate PHP's define() syntax — it would silently miss constants with double-quoted values, concatenated expressions, or any construct beyond simple literals. Replace it with a PHP WASM invocation that includes the file in a fresh environment and reads back the user-defined constants via get_defined_constants(). This lets PHP's own parser handle the syntax.

downloadedFiles and downloadedBytes were overwritten directly from each progress record, so they dropped back to near-zero whenever the importer process restarted (exit code 2). bytesReceived already had cumulative logic but the other counters did not. Apply the same restart-detection pattern to downloadedFiles and downloadedBytes, and clamp totalFiles/totalBytes to never decrease.

Two problems prevented ?studio-auto-login from working on imported sites: 1. Studio MU-plugins (including the auto-login endpoint) were only mounted for sites without useExactMountLayout. Imported sites use exact mount layouts, so the endpoint didn't exist at all. Move the MU-plugin mounts outside the layout conditional so they're always present. 2. The auto-login MU-plugin looked up studio_admin_username (absent in imported databases) then fell back to 'admin' (also typically absent). Add a final fallback that finds the first administrator account via get_users(), which matches whatever admin the remote site had.

When the progress message wrapped across multiple terminal lines and then shrank on the next update, ora's line-clearing logic over-cleared and erased previous output above the spinner. Truncating the text to a single line prevents wrapping entirely.

When only totalFiles was known (before the first downloaded_files record), the display showed "2838 files". Once a progress record added downloadedFiles, it switched to "50/2838 files". The user saw the leading number jump from 2838 to 50 on each format change. Always use "X/Y files" once either count is known, defaulting the downloaded count to 0, so the leading number only ever increases.

When resuming an import with a stored secret, the previous flow always loaded all WP.com sites (slow, noisy "Loaded 414 sites" message) and sometimes rotated the secret eagerly. Now: - If a URL is provided and a stored secret exists, try it immediately without fetching the site list or rotating. - Only load sites and rotate the secret on demand if the preflight fails with the stored secret. - Replace the separate "Connecting to remote site" spinner with "Initiating the migration" and remove the chatty site-list and secret-rotation success messages.

The file count was only appearing in the message segment (after bytes) instead of the structured files segment (before bytes), which meant the cumulative tracking never applied and the number danced with each record. Three fixes: - readNumber now parses string values — PHP json_encode may emit numeric fields as strings depending on the serialization path. - Broader field name coverage: files_downloaded, files, bytes_downloaded, bytes are now recognized alongside the existing variants. - The free-text message is suppressed once structured file counts are available, avoiding a duplicate "[2838 files]" alongside "0/2838 files".

The streaming-site-migration importer emits files_imported (per-file running count), files_completed (per-request total), files_indexed (index size in lifecycle events), and bytes_processed. None of these matched the field names the progress tracker was looking for (downloaded_files, total_files, etc.), so the structured X/Y files display was never populated and the user saw a dancing number from the raw message text instead. Also handle string-typed numbers from PHP json_encode, extract files_indexed from lifecycle events that return early, and suppress the free-text message once structured file counts are available.

The previous URLs appended ?studio-auto-login as a query parameter to the site URL or wp-admin path. The auto-login MU-plugin intercepts requests to the /studio-auto-login path, not a query parameter on arbitrary pages. Use the correct /studio-auto-login endpoint and pass the target page via redirect_to, matching the pattern used everywhere else in the codebase.

files_completed (per-HTTP-request count from completion chunks) overlaps with files_imported (per-invocation running count from file_progress and heartbeat records). Mixing both into the same cumulative tracker caused the restart detector to misfire whenever a completion chunk reported a lower value than the latest file_progress record, inflating the displayed count far beyond the actual total (e.g. 17137/3495). Use only files_imported for the downloaded count. Same treatment for bytes_processed which is a per-request subset that conflicts with bytes_received. Also prioritize files_indexed over total_files since that is the field the importer actually emits.

Both db-sync and db-apply were running without progress callbacks, showing only a static spinner. The importer already emits structured progress for both: bytes_received during the SQL dump download, and statements_executed/statements_total during db-apply. Wire up onProgress callbacks and progressLabel for both commands. Map bytes_read (db-apply's file offset) to downloadedBytes so the byte progress shows during statement application. Suppress the raw message text when structured statement counts are already displayed.

A bare "2959 files" without a total is noise — it tells the user nothing about how much work remains. Only show the files segment once totalFiles is known, always in X/Y format. The total was also drifting upward as the importer discovered symlink targets and expanded the index. Lock totalFiles and totalBytes to the first value reported so the denominator stays stable throughout the progress display. Remove the premature files_indexed extraction from lifecycle events since those fire before the fetch phase when the index is still incomplete.

The importer's files_imported is a per-invocation running total that resets on exit-code-2 restarts. The importer re-scans already-handled files on resume, so accumulating across restarts counted the same files multiple times — producing counts like 12097/3482. Use the raw per-invocation value directly. It briefly resets to 0 on restart but is always accurate within an invocation. Keep accumulation only for bytes_received (HTTP-level counter that genuinely resets per request without re-scanning).

The importer now emits files_done (cumulative entries processed from the download list, stable across restarts) and files_total (total entries in the download list, fixed once the diff phase completes). Prefer these over files_imported/files_indexed which were per-invocation counters that reset on restart and could exceed the index size. Remove the totalFiles lock since files_total from the importer is already stable by construction.

files_done already includes files_imported (it's download_list_done + files_imported). Emitting both was redundant and forced the consumer to choose between them. Now the importer emits only files_done in progress and heartbeat records, and Studio only reads files_done.

files_done from the importer drops on exit-code-2 restarts because files_imported resets to 0 before the batch offset advances. Use Math.max against the previous snapshot so the displayed count never goes backward. Same treatment for downloadedBytes which can also drop when the importer restarts a batch.

Two bugs: 1. Heartbeats during indexing emitted files_done:0 without files_total. Studio set downloadedFiles=0, which tripped the message suppression (hasStructuredCounts was true), but totalFiles was undefined so no structured file segment rendered either. Total information blackout. Fix: only suppress the message when totalFiles is actually set. 2. The progress label said "Downloading essential files" during the index and diff phases. Now the importer's phase name overrides the generic label when it describes a non-download activity (e.g. "Indexing remote files", "Preparing download list").

Debug records are internal importer diagnostics, not user-facing progress. Stop surfacing them as the spinner message.

When the remaining-files download exited with code 2 (partial progress), the state became status=partial, stage=fetch-skipped. On the next run, prepareSkippedEarlierState overwrote this to status=complete, and the importer was invoked with --filter=skipped-earlier. But the importer's validation only accepts this flag from a complete state — it didn't handle resuming a partial skipped-earlier run. Now detect when the state already indicates an in-progress skipped- earlier run and skip both the state overwrite and the --filter flag. The importer resumes from the state file's stage field directly.

adamziel force-pushed the adamziel/stream-import-flow branch from f58d38f to f8006c2 Compare March 27, 2026 18:11

wojtekn assigned adamziel Mar 30, 2026

adamziel added 5 commits March 31, 2026 20:57

Add two-phase remote site import

985311f

Fix lint errors after rebase

4dfcb6e

Support wp.com imports without secrets

9f5d51d

Fix CLI dist module type

607cc00

Fix CLI build output for site import

3694f85

adamziel force-pushed the adamziel/stream-import-flow branch from 4e7bc64 to 3694f85 Compare March 31, 2026 18:59

adamziel added 20 commits March 31, 2026 21:16

Fix import URL normalization

abf6834

Handle importer JSON stream output

14a4a66

Fix filtered import progress display

71f8bd6

Stream importer progress and add abort

c588af3

Repair unresumable file index state

89eab7e

Improve importer progress streaming

403b2d0

Refine importer timer updates

7c0e761

Accumulate importer byte progress

c59091e

Fix importer state mounts

bd22fb0

Run importer with native PHP

11f90d3

Fix flatten source directory

6a0c22c

Restore files-sync state for skipped files

da4b2dd

Fix imported site SQLite startup

af8a4bb

Propagate JSPI flags through the CLI daemon

a27b066

Repair stale completed site imports

a7426dd

Fix import repair test typing

28b7bce

Fix import repair test lint

b728477

Lock db apply to SQLite rewrites

52b780b

Prefer importer new-site-url when available

cad783d

Use importer new-site-url for db apply

7e274c6

adamziel force-pushed the adamziel/stream-import-flow branch from 2052d3d to 943e346 Compare April 7, 2026 13:25

adamziel added 4 commits April 7, 2026 15:48

Merge remote-tracking branch 'origin/trunk' into adamziel/stream-impo…

f0cccaa

…rt-flow # Conflicts: # apps/cli/package.json # package-lock.json

Trigger CI re-run

6435ddd

fixup! Merge remote-tracking branch 'origin/trunk' into adamziel/stre…

a265e32

…am-import-flow Add timeout to sqlite3 detection

adamziel force-pushed the adamziel/stream-import-flow branch from b89abd4 to 989fcc9 Compare April 7, 2026 15:01

adamziel added 23 commits April 7, 2026 20:47

Refactor native importer rewrite test to avoid PHP dependency

d1248a9

Merge trunk into adamziel/stream-import-flow

472b23c

Suppress debug messages like 'Waiting for server response'

d2a1af0

Debug records are internal importer diagnostics, not user-facing progress. Stop surfacing them as the spinner message.

Merge trunk into adamziel/stream-import-flow

2d94ebb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add two-phase remote site import#2939

Add two-phase remote site import#2939
adamziel wants to merge 69 commits intotrunkfrom
adamziel/stream-import-flow

adamziel commented Mar 27, 2026

Uh oh!

wojtekn commented Mar 30, 2026

Uh oh!

adamziel commented Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

adamziel commented Mar 27, 2026

Related issues

How AI was used in this PR

Proposed Changes

Testing Instructions

Pre-merge Checklist

Uh oh!

wojtekn commented Mar 30, 2026

Uh oh!

adamziel commented Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants