cloud: revise Dedicated import docs for the new wizard UX #22699
alastori wants to merge 12 commits into pingcap:release-8.5 from
Conversation
Code Review
This pull request updates the TiDB Cloud documentation for CSV and Parquet file imports to align with recent UI changes, including new storage provider selections and destination mapping workflows. It also improves the troubleshooting guide for AWS IAM access by providing clearer instructions on obtaining environment-specific IDs. The review feedback suggests several refinements to enhance readability and professional tone, such as converting passive voice to active voice, correcting minor grammatical omissions, and standardizing UI terminology.
tidb-cloud/import-csv-files.md
Outdated
| > - To achieve better performance, it is recommended to limit the size of each compressed file to 100 MiB. | ||
| > - The Snappy compressed file must be in the [official Snappy format](https://github.com/google/snappy). Other variants of Snappy compression are not supported. | ||
| > - For uncompressed files, if you cannot update the CSV filenames according to the preceding rules in some cases (for example, the CSV file links are also used by your other programs), you can keep the filenames unchanged and use the **Mapping Settings** in [Step 4](#step-4-import-csv-files-to-tidb-cloud) to import your source data to a single target table. | ||
| > - For uncompressed files, if you cannot update the CSV filenames according to the preceding rules (for example, the CSV file links are also used by your other programs), you can keep the filenames unchanged and unselect **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** in the **Destination Mapping** step of [Step 4](#step-4-import-csv-files-to-tidb-cloud) to manually map your source files to a single target table. |
The term "deselect" is generally preferred over "unselect" in software documentation when referring to clearing a checkbox or radio button.
| > - For uncompressed files, if you cannot update the CSV filenames according to the preceding rules (for example, the CSV file links are also used by your other programs), you can keep the filenames unchanged and unselect **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** in the **Destination Mapping** step of [Step 4](#step-4-import-csv-files-to-tidb-cloud) to manually map your source files to a single target table. | |
| > - For uncompressed files, if you cannot update the CSV filenames according to the preceding rules (for example, the CSV file links are also used by your other programs), you can keep the filenames unchanged and deselect **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** in the **Destination Mapping** step of [Step 4](#step-4-import-csv-files-to-tidb-cloud) to manually map your source files to a single target table. |
References
- Clarity, simplicity, and readability are key aspects of the review. (link)
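To make the automatic-versus-manual mapping trade-off in the note above concrete, here is a minimal illustrative sketch of convention-based filename matching. It is not TiDB Cloud's implementation; the regex is an assumption loosely modeled on the documented `${db_name}.${table_name}` naming convention:

```python
import re

# Hypothetical sketch (an assumption, not TiDB Cloud's code): filenames
# shaped like "<db>.<table>[.<num>].csv" map automatically to a target
# table; anything else would need a manual mapping rule instead.
PATTERN = re.compile(r"^(?P<db>[^.]+)\.(?P<table>[^.]+)(\.\d+)?\.csv$")

def auto_map(filename):
    """Return (db, table) if the name follows the convention, else None."""
    m = PATTERN.match(filename)
    return (m.group("db"), m.group("table")) if m else None
```

Under this sketch, `auto_map("mydb.orders.01.csv")` resolves to `("mydb", "orders")`, while a name like `my-data10.csv` does not match and would fall back to manual mapping, which is exactly the situation the note addresses.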
tidb-cloud/import-csv-files.md
Outdated
| 5. In the **Destination Mapping** section, specify how source files are mapped to target tables. | ||
| When importing multiple files, you can use **Advanced Settings** > **Mapping Settings** to customize the mapping of individual target tables to their corresponding CSV files. For each target database and table: | ||
| When a directory is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is selected by default. |
Avoid using the passive voice to make the instructions more direct and clear.
| When a directory is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is selected by default. | |
| When you specify a directory in **Source Files URI**, TiDB Cloud selects the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option by default. |
References
- Avoid passive voice overuse. (link)
tidb-cloud/import-csv-files.md
Outdated
| - `s3://mybucket/myfolder/my-data*.csv`: all CSV files starting with `my-data` (such as `my-data10.csv` and `my-data100.csv`) in `myfolder` will be imported into the same target table. | ||
| > **Note:** | ||
| > | ||
| > When a single file is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is not displayed, and TiDB Cloud automatically populates the **Source** field with the file name. In this case, you only need to enter the target database and table for data import. |
Avoid using the passive voice to improve readability and directness.
| > When a single file is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is not displayed, and TiDB Cloud automatically populates the **Source** field with the file name. In this case, you only need to enter the target database and table for data import. | |
| When you specify a single file in **Source Files URI**, TiDB Cloud does not display the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option and automatically populates the **Source** field with the file name. In this case, you only need to enter the target database and table for data import. |
References
- Avoid passive voice overuse. (link)
tidb-cloud/import-csv-files.md
Outdated
| - To let TiDB Cloud automatically map all source files that follow the [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) to their corresponding tables, keep this option selected and select **CSV** as the data format. If your source folder includes schema files (such as `${db_name}-schema-create.sql` and `${db_name}.${table_name}-schema.sql`), TiDB Cloud uses them to create the target databases and tables when they do not already exist. | ||
| - To manually configure the mapping rules to associate your source CSV files with the target database and table, unselect this option, and then fill in the following fields: |
The term "deselect" is preferred over "unselect" for UI actions.
| - To manually configure the mapping rules to associate your source CSV files with the target database and table, unselect this option, and then fill in the following fields: | |
| - To manually configure the mapping rules to associate your source CSV files with the target database and table, deselect this option, and then fill in the following fields: |
References
- Clarity, simplicity, and readability. (link)
tidb-cloud/import-csv-files.md
Outdated
| - To manually configure the mapping rules to associate your source CSV files with the target database and table, unselect this option, and then fill in the following fields: | ||
| - **Source**: enter the file name pattern in the `[file_name].csv` format. For example, `TableName.01.csv`. You can also use wildcards to match multiple files. Only `*` and `?` wildcards are supported. |
Avoid passive voice to make the statement more direct.
| - **Source**: enter the file name pattern in the `[file_name].csv` format. For example, `TableName.01.csv`. You can also use wildcards to match multiple files. Only `*` and `?` wildcards are supported. | |
| - **Source**: enter the file name pattern in the `[file_name].csv` format. For example, `TableName.01.csv`. You can also use wildcards to match multiple files. TiDB Cloud only supports the `*` and `?` wildcards. |
References
- Avoid passive voice overuse. (link)
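Since the suggestion above concerns the `*` and `?` wildcards, a quick illustration of their glob semantics may help. This uses Python's standard `fnmatch` purely to demonstrate the matching behavior; it is not the wizard's implementation, and note that `fnmatch` additionally supports `[seq]`, which the doc says is unsupported:

```python
from fnmatch import fnmatch

# Illustrative only: glob-style matching like the Source field's
# wildcards, where `*` matches any run of characters and `?` matches
# exactly one character.
files = ["my-data10.csv", "my-data100.csv", "other.csv"]
matched = [f for f in files if fnmatch(f, "my-data*.csv")]
print(matched)  # ['my-data10.csv', 'my-data100.csv']
```

A `?` pattern such as `TableName.0?.csv` would match `TableName.01.csv` but not `TableName.012.csv`, since `?` consumes exactly one character.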
tidb-cloud/import-csv-files.md
Outdated
| If necessary, click **Edit CSV Configuration** to configure the options according to your CSV files. You can set the separator and delimiter characters, specify whether to use backslashes for escaped characters, and specify whether your files contain a header row. | ||
| 7. When the import progress shows **Completed**, check the imported tables. | ||
| 6. Click **Next**. TiDB Cloud scans the source files accordingly. |
The word "accordingly" is unnecessary here and can be removed for conciseness.
| 6. Click **Next**. TiDB Cloud scans the source files accordingly. | |
| 6. Click **Next**. TiDB Cloud scans the source files. |
References
- Avoid unnecessary words and repetition. (link)
| > For TiDB Cloud Starter or TiDB Cloud Essential, see [Import Apache Parquet Files from Cloud Storage into TiDB Cloud Starter or Essential](/tidb-cloud/import-parquet-files-serverless.md). | ||
| ## Limitations | ||
The note in Step 1 still reads:
If you cannot update the Parquet filenames according to the preceding rules in some cases (for example, the Parquet file links are also used by your other programs), you can keep the filenames unchanged and use the Mapping Settings in Step 4 to import your source data to a single target table.
The CSV doc's equivalent note was updated to reference the new Destination Mapping step. This Parquet note should receive the same update. Suggested replacement:
If you cannot update the Parquet filenames according to the preceding rules (for example, the Parquet file links are also used by your other programs), you can keep the filenames unchanged and deselect Use File naming conventions for automatic mapping in the Destination Mapping substep of Step 4 to manually map your source files to a single target table.
Thanks, fixed in the latest commit. The Step 1 Note now mirrors the CSV doc wording.
The Troubleshooting section in the Parquet doc was not updated, but the same section in import-csv-files.md was. Three stale references remain:
- "Resolve warnings during data import" — still says "After clicking **Start Import**" and "using **Advanced Settings** to make changes". Should match the CSV doc's updated wording: "If the Pre-check step shows a warning…returning to the Destination Mapping step and switching to manual mapping rules."
- "Zero rows in the imported tables" — still says "no data files matched the Bucket URI" and "using **Advanced Settings** to make changes". Should say "source URI" and reference the Destination Mapping step, matching the CSV doc.
- "After resolving these issues, you need to import the data again." — the CSV doc now says "return to the wizard and run the import again". The Parquet doc should match.
Good catch, all three references in the Parquet Troubleshooting section are now updated to match the CSV doc:
- "Resolve warnings during data import" now references the Pre-check step and switching to manual mapping rules on the Destination Mapping step.
- "Zero rows in the imported tables" now says "source URI" (not "Bucket URI") and also references the Destination Mapping fallback.
- "After resolving these issues" now says "return to the wizard and run the import again".
| - For CSV files, see **Advanced Settings** > **Mapping Settings** in [Step 4. Import CSV files to TiDB Cloud](/tidb-cloud/import-csv-files.md#step-4-import-csv-files-to-tidb-cloud) | ||
| - For Parquet files, see **Advanced Settings** > **Mapping Settings** in [Step 4. Import Parquet files to TiDB Cloud](/tidb-cloud/import-parquet-files.md#step-4-import-parquet-files-to-tidb-cloud) | ||
| In the import wizard, on the **Destination Mapping** step, unselect **Use File naming conventions for automatic mapping**, and then fill in the **Source**, **Target Database**, and **Target Table** fields. The **Source** field accepts a file name pattern that supports the `*` and `?` wildcards. |
| In the import wizard, on the **Destination Mapping** step, unselect **Use File naming conventions for automatic mapping**, and then fill in the **Source**, **Target Database**, and **Target Table** fields. The **Source** field accepts a file name pattern that supports the `*` and `?` wildcards. | |
| In the import wizard, on the **Destination Mapping** step, deselect **Use File naming conventions for automatic mapping**, and then fill in the **Source**, **Target Database**, and **Target Table** fields. The **Source** field accepts a file name pattern that supports the `*` and `?` wildcards. |
Applied. Also prefixed the label with "TiDB" (Use TiDB file naming conventions for automatic mapping) to match the literal wizard checkbox text. I swept the same fix through import-csv-files.md and import-parquet-files.md so the label reads consistently across all three files (19 occurrences total).
Update tidb-cloud/import-csv-files.md and tidb-cloud/import-parquet-files.md to match the new 3-step Dedicated import wizard (Connection -> Destination Mapping -> Pre-check -> Start Import) introduced in tidb-cloud-console#4617:

- Replace per-provider "Import Data from X" pages with the unified "Import Data from Cloud Storage" page that exposes a Storage Provider dropdown
- Rename "File URI" / "Folder URI" to "Source Files URI" and document both single-file and folder URI formats
- Rename "Bucket Access" to "Credentials"; for AWS, point users to the "Having trouble? Create Role ARN manually" expandable to fetch the TiDB Cloud Account ID and External ID for their cluster
- Replace "Connect" + "Destination" + "Start Import" steps with the new "Destination Mapping" step that supports automatic mapping via file naming conventions or manual mapping rules with wildcard patterns
- Document the new "Pre-check" step (separate from manual scan retries)
- Preserve the Azure Blob Storage Private Link content from pingcap#22427

Also update tidb-cloud/troubleshoot-import-access-denied-error.md to:

- Replace the literal sample Account ID and External ID values with <TiDB-Cloud-Account-ID> and <TiDB-Cloud-External-ID> placeholders (these values are environment-specific and per-cluster)
- Add a discovery procedure that points users at the wizard's "Add New Role ARN" -> "Having trouble? Create Role ARN manually" expandable to fetch the actual values from the console

Mirrors the Serverless rewrite from pingcap#21361 (Cloud: Import ux optimization). Related: DM-12710, DM-12798, DM-12799
Update two more files referenced from the import walkthroughs to match the new 3-step Dedicated import wizard:

- tidb-cloud/dedicated-external-storage.md: Update the discovery procedures for the TiDB Cloud Account ID, TiDB Cloud External ID, and Google Cloud Service Account ID so they point at the new "Import Data from Cloud Storage" page instead of the legacy per-provider pages, and use the new "Having trouble? Create Role ARN manually" expandable label.
- tidb-cloud/naming-conventions-for-data-import.md: Replace the legacy "Advanced Settings > Mapping Settings" reference in the file-pattern section with the new "Destination Mapping" step and the "Use File naming conventions for automatic mapping" toggle.

Related: DM-12710
The Dedicated import wizard uses the field label "Source URI" while the Premium and Serverless wizards use "Source Files URI". The doc reflects the actual Dedicated wizard label. The label divergence between tiers is tracked separately and the wizard will be aligned in a follow-up.
- Replace "unselect" with "deselect" for the auto-mapping toggle
- Convert passive "X is selected by default" / "is not displayed" to active "TiDB Cloud selects X by default" / "does not display X"
- Convert passive "Only X and Y are supported" to active "TiDB Cloud only supports X and Y"
- Drop unnecessary "accordingly" from "scans the source files"

Skipped Gemini's "create a new one" article suggestions because the actual wizard button text is "Click here to create new one with AWS CloudFormation" (no article); aligning the doc to the button text takes precedence over the grammar nit. The missing article is tracked as part of DM-12803.
Match the pattern in tidb-cloud/premium/import-csv-files-premium.md and tidb-cloud/import-csv-files-serverless.md so users landing on the Dedicated page can quickly jump to the correct doc for their tier.

- import-csv-files.md: link to both Starter/Essential and Premium CSV import docs (Premium has its own CSV import doc).
- import-parquet-files.md: link only to Starter/Essential (Premium has no separate Parquet import doc).
Apply Grace's review comments on pingcap#22699:

- import-parquet-files.md Step 1 note: reference the Destination Mapping step and use "deselect", matching the CSV doc.
- import-parquet-files.md Troubleshooting: update "Resolve warnings during data import" and "Zero rows in the imported tables" to match the CSV doc (Pre-check wording, source URI, return-to-wizard).
- naming-conventions-for-data-import.md: change "unselect" to "deselect".

Also prefix the checkbox label with "TiDB" across csv, parquet, and naming-conventions docs (19 occurrences) to match the literal wizard UI text "Use TiDB file naming conventions for automatic mapping".
5e3a385 to 983e275
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here. Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
| 3. Click **Import data from Cloud Storage**. | ||
| 4. Click **Show Google Cloud Server Account ID**, and then copy the Service Account ID for later use. | ||
| 4. On the **Import Data from Cloud Storage** page, set **Storage Provider** to **Google Cloud Storage**, and then copy the Google Cloud Service Account ID displayed under **Credentials** for later use. |
| 4. On the **Import Data from Cloud Storage** page, set **Storage Provider** to **Google Cloud Storage**, and then copy the Google Cloud Service Account ID displayed under **Credentials** for later use. | |
| 4. On the **Import Data from Cloud Storage** page, set **Storage Provider** to **Google Cloud Storage**, and then copy the Google Cloud Service Account ID displayed under **Credential** for later use. |
tidb-cloud/import-csv-files.md
Outdated
| - **Source URI**: | ||
| - When importing one file, enter the source file URI in the following format `gs://[bucket_name]/[data_source_folder]/[file_name].csv`. For example, `gs://mybucket/myfolder/TableName.01.csv`. | ||
| - When importing multiple files, enter the source folder URI in the following format `gs://[bucket_name]/[data_source_folder]/`. For example, `gs://mybucket/myfolder/`. | ||
| - **Credentials**: TiDB Cloud provides a unique Google Cloud Service Account ID on this page (such as `example-service-account@your-project.iam.gserviceaccount.com`). Grant this Service Account ID the necessary IAM permissions (such as `Storage Object Viewer`) on your GCS bucket within your Google Cloud project. For more information, see [Configure GCS access](/tidb-cloud/dedicated-external-storage.md#configure-gcs-access). |
| - **Credentials**: TiDB Cloud provides a unique Google Cloud Service Account ID on this page (such as `example-service-account@your-project.iam.gserviceaccount.com`). Grant this Service Account ID the necessary IAM permissions (such as `Storage Object Viewer`) on your GCS bucket within your Google Cloud project. For more information, see [Configure GCS access](/tidb-cloud/dedicated-external-storage.md#configure-gcs-access). | |
| - **Credential**: TiDB Cloud provides a unique Google Cloud Service Account ID on this page (such as `example-service-account@your-project.iam.gserviceaccount.com`). Grant this Service Account ID the necessary IAM permissions (such as `Storage Object Viewer`) on your GCS bucket within your Google Cloud project. For more information, see [Configure GCS access](/tidb-cloud/dedicated-external-storage.md#configure-gcs-access). |
| - **Source URI**: | ||
| - When importing one file, enter the source file URI in the following format `gs://[bucket_name]/[data_source_folder]/[file_name].parquet`. For example, `gs://mybucket/myfolder/TableName.01.parquet`. | ||
| - When importing multiple files, enter the source folder URI in the following format `gs://[bucket_name]/[data_source_folder]/`. For example, `gs://mybucket/myfolder/`. | ||
| - **Credentials**: TiDB Cloud provides a unique Google Cloud Service Account ID on this page (such as `example-service-account@your-project.iam.gserviceaccount.com`). Grant this Service Account ID the necessary IAM permissions (such as `Storage Object Viewer`) on your GCS bucket within your Google Cloud project. For more information, see [Configure GCS access](/tidb-cloud/dedicated-external-storage.md#configure-gcs-access). |
| - **Credentials**: TiDB Cloud provides a unique Google Cloud Service Account ID on this page (such as `example-service-account@your-project.iam.gserviceaccount.com`). Grant this Service Account ID the necessary IAM permissions (such as `Storage Object Viewer`) on your GCS bucket within your Google Cloud project. For more information, see [Configure GCS access](/tidb-cloud/dedicated-external-storage.md#configure-gcs-access). | |
| - **Credential**: TiDB Cloud provides a unique Google Cloud Service Account ID on this page (such as `example-service-account@your-project.iam.gserviceaccount.com`). Grant this Service Account ID the necessary IAM permissions (such as `Storage Object Viewer`) on your GCS bucket within your Google Cloud project. For more information, see [Configure GCS access](/tidb-cloud/dedicated-external-storage.md#configure-gcs-access). |
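For readers following the URI discussion in these threads, here is a small sketch of how a `gs://` or `s3://` source URI decomposes into bucket, object path, and the file-versus-folder distinction (the trailing `/`). It uses Python's standard `urllib.parse` purely as an illustration and is not TiDB Cloud's validator:

```python
from urllib.parse import urlparse

# Assumption for illustration (not TiDB Cloud's code): split a gs:// or
# s3:// source URI into its parts; a trailing "/" signals a folder import.
def parse_source_uri(uri):
    parts = urlparse(uri)
    return {
        "scheme": parts.scheme,          # "gs" or "s3"
        "bucket": parts.netloc,          # bucket name
        "path": parts.path.lstrip("/"),  # object key or folder prefix
        "is_folder": uri.endswith("/"),
    }
```

For example, `parse_source_uri("gs://mybucket/myfolder/")` yields a folder import against bucket `mybucket`, while `s3://mybucket/myfolder/TableName.01.csv` yields a single-file import.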
| - **Folder URI**: enter the Azure Blob Storage URI where your source files are located using the format `https://[account_name].blob.core.windows.net/[container_name]/[data_source_folder]/`. The path must end with a `/`. For example, `https://myaccount.blob.core.windows.net/mycontainer/data-ingestion/`. | ||
| - **SAS Token**: enter an account SAS token to allow TiDB Cloud to access the source files in your Azure Blob Storage container. If you don't have one yet, you can create it using the provided Azure ARM template by clicking **Click here to create a new one with Azure ARM template** and following the instructions on the screen. Alternatively, you can manually create an account SAS token. For more information, see [Configure Azure Blob Storage access](/tidb-cloud/dedicated-external-storage.md#configure-azure-blob-storage-access). | ||
| - **Credentials**: enter an account SAS token to allow TiDB Cloud to access the source files in your Azure Blob Storage container. If you do not have one yet, click **Click here to create a new one with Azure ARM template** and follow the instructions on the screen, or manually create an account SAS token. For more information, see [Configure Azure Blob Storage access](/tidb-cloud/dedicated-external-storage.md#configure-azure-blob-storage-access). |
| - **Credentials**: enter an account SAS token to allow TiDB Cloud to access the source files in your Azure Blob Storage container. If you do not have one yet, click **Click here to create a new one with Azure ARM template** and follow the instructions on the screen, or manually create an account SAS token. For more information, see [Configure Azure Blob Storage access](/tidb-cloud/dedicated-external-storage.md#configure-azure-blob-storage-access). | |
| - **SAS Token**: enter an account SAS token to allow TiDB Cloud to access the source files in your Azure Blob Storage container. If you do not have one yet, click **Click here to create a new one with Azure ARM template** and follow the instructions on the screen, or manually create an account SAS token. For more information, see [Configure Azure Blob Storage access](/tidb-cloud/dedicated-external-storage.md#configure-azure-blob-storage-access). |
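As context for the "SAS Token" versus "Credentials" label discussion above: an account SAS token is essentially a signed query string appended to the blob URL at request time. A hedged sketch with placeholder values (not a real token, and not the wizard's internals):

```python
# Assumption for illustration: reading a blob with an account SAS token
# amounts to appending the token's query string to the file URL.
base_url = (
    "https://myaccount.blob.core.windows.net"
    "/mycontainer/data-ingestion/TableName.01.csv"
)
sas_token = "sv=2022-11-02&ss=b&srt=co&sp=rl&sig=PLACEHOLDER"  # placeholder
signed_url = f"{base_url}?{sas_token}"
```

This is why a single account SAS token can authorize reads for every source file under the folder URI.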
@zoubingwu: adding LGTM is restricted to approvers and reviewers in OWNERS files. Details: In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Co-authored-by: xixirangrang <hfxsd@hotmail.com>
@alastori: The following test failed, say
Full PR test history. Your PR dashboard. Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Summary
Update the TiDB Cloud Dedicated import docs to match the new 3-step wizard introduced in tidb-cloud-console#4617. Mirrors the Serverless rewrite from #21361.
Files changed:
Key changes (per provider walkthrough):
Troubleshoot doc fix:
The page previously hard-coded `380838443567` as "the TiDB Cloud Account ID" and a literal hex External ID. These values are environment-specific and per-cluster, so this PR replaces them with `<TiDB-Cloud-Account-ID>` and `<TiDB-Cloud-External-ID>` placeholders and adds a short discovery procedure that points users at the wizard's Add New Role ARN -> Having trouble? Create Role ARN manually expandable to fetch the actual values from the console.
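For readers wondering where those placeholders plug in: a cross-account role for this kind of access typically carries the standard AWS trust-policy shape below. This is an assumption for illustration (the wizard's generated policy may differ); the placeholder strings are the ones named in this PR:

```python
import json

# Assumption for illustration: the standard AWS cross-account trust
# policy shape, with the environment-specific values left as the
# placeholders this PR introduces.
def trust_policy(account_id, external_id):
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }

print(json.dumps(
    trust_policy("<TiDB-Cloud-Account-ID>", "<TiDB-Cloud-External-ID>"),
    indent=2,
))
```

The `sts:ExternalId` condition is what makes the External ID per-cluster discovery step matter: a role whose trust policy pins the wrong External ID will reject the assume-role call.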
Background
Captured the new wizard end-to-end against three Dedicated clusters (AWS, Azure, GCP) on the staging deploy preview `deploy-preview-4617--staging-tidbcloud.netlify.app`. The full AS-IS report with screenshots, observations, and the doc-update plan is at https://pingcap.feishu.cn/wiki/I5lgwOQnSibElNka3U4cX899nud
Deferrals: Azure Blob and GCS provider walkthroughs were modeled after the Serverless rewrite (#21361) and the existing Azure Private Link content (#22427); they were not exercised end-to-end in this session and would benefit from a follow-up validation pass before the wizard ships to GA.
Test plan
Related