Skip to content

perf: move third party dataset to plugin#6303

Closed
newfish-cmyk wants to merge 16 commits intolabring:v4.14.7-devfrom
newfish-cmyk:plugin-dataset
Closed

perf: move third party dataset to plugin#6303
newfish-cmyk wants to merge 16 commits intolabring:v4.14.7-devfrom
newfish-cmyk:plugin-dataset

Conversation

@newfish-cmyk
Copy link
Copy Markdown
Contributor

No description provided.

@gru-agent
Copy link
Copy Markdown
Contributor

gru-agent Bot commented Jan 21, 2026

There is too much information in the pull request to test.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jan 21, 2026

Preview mcp_server Image:

registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-pr:fatsgpt_mcp_server_4bb3fabf90646857d152d59bb362c71b16daf1b2

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jan 21, 2026

Preview sandbox Image:

registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-pr:fatsgpt_sandbox_4bb3fabf90646857d152d59bb362c71b16daf1b2

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jan 26, 2026

Preview fastgpt Image:

registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-pr:fatsgpt_2d49b31df23b30d77b3202d3151d74bfbd86c6d0

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request refactors the third-party dataset integration from a hardcoded approach (with built-in support for Feishu and Yuque) to a flexible plugin-based architecture. The changes enable external dataset sources to be loaded dynamically from a plugin system, improving extensibility and maintainability.

Changes:

  • Replaced apiDatasetServer with pluginDatasetServer structure to support plugin-based dataset sources
  • Removed hardcoded Feishu and Yuque implementations, replacing them with a unified plugin request handler
  • Added dynamic form rendering for plugin dataset configurations
  • Introduced migration script (initv4147.ts) to migrate existing data to the new schema

Reviewed changes

Copilot reviewed 55 out of 65 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
packages/service/core/dataset/schema.ts Updated schema to use pluginDatasetServer instead of apiDatasetServer
packages/service/core/dataset/apiDataset/index.ts Refactored to route requests through plugin system
packages/service/core/dataset/apiDataset/plugin/api.ts New unified plugin dataset request handler
projects/app/src/pages/api/admin/initv4147.ts Data migration script for converting old dataset configs
projects/app/src/pageComponents/dataset/ApiDatasetForm/* New dynamic form components for plugin datasets
projects/app/src/pageComponents/config/tool/ThirdPartyDatasetConfig.tsx Admin UI for managing third-party dataset plugins
projects/app/src/web/common/system/useSystemStore.ts Added plugin dataset state management
projects/app/src/service/common/system/index.ts Added initialization for plugin dataset sources
packages/global/core/dataset/apiDataset/utils.ts Refactored security filtering for plugin configs
pnpm-lock.yaml Updated @fastgpt-sdk/plugin to 0.2.18-alpha1
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread projects/app/src/pages/api/core/dataset/update.ts
Comment on lines +90 to +94
useEffect(() => {
formFields
.filter((f) => f.type === 'tree-select' && pluginConfig[f.key] !== undefined)
.forEach((f) => updateFieldPathName(f.key));
}, [formFields, pluginConfig, updateFieldPathName]);
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Infinite loop potential: The useEffect hook that watches pluginConfig changes to update path names could trigger infinite re-renders if updateFieldPathName causes pluginConfig to change. While updateFieldPathName itself doesn't directly modify pluginConfig, the dependency array includes pluginConfig which will trigger the effect on every config change. Consider using a more specific dependency or adding a condition to prevent unnecessary path name fetches.

Copilot uses AI. Check for mistakes.
Comment thread projects/app/src/pages/api/common/system/getInitData.ts Outdated
Comment thread packages/global/core/dataset/apiDataset/utils.ts
Comment thread packages/service/core/dataset/apiDataset/plugin/api.ts Outdated
Comment thread packages/service/core/dataset/schema.ts
Comment thread projects/app/src/pages/api/admin/initv4147.ts Outdated
Comment thread projects/app/src/pageComponents/dataset/detail/Info/index.tsx Outdated
Comment thread pnpm-lock.yaml Outdated
Copy link
Copy Markdown
Contributor Author

@newfish-cmyk newfish-cmyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: perf: move third party dataset to plugin

📊 变更概览

  • PR 编号: #6303
  • 作者: @newfish-cmyk
  • 分支: main ← plugin-dataset
  • 变更统计: +1704 -1870 行
  • 涉及文件: 78 个文件

📋 变更总结

这是一个重要的架构重构 PR,将第三方知识库(飞书、语雀、自定义 API)的实现从硬编码迁移到插件系统:

  1. 移除硬编码的第三方数据源:删除 feishuyuqueapiDataset 类型,统一为 pluginDataset
  2. 新增插件数据源架构:通过 @fastgpt-sdk/plugin 提供动态数据源支持
  3. 新增管理配置界面:允许管理员启用/禁用插件数据源
  4. 类型系统重构ApiDatasetServerTypePluginDatasetServerType

✅ 优点

  1. 架构改进:将硬编码的第三方集成抽象为插件系统,提高了扩展性和可维护性
  2. 代码简化:删除了大量重复的飞书/语雀特定代码(~800 行),统一为插件 API 调用
  3. 类型安全:使用 Zod schema 进行运行时类型验证(APIFileItemSchemaPluginDatasetServerSchema 等)
  4. 敏感信息处理filterPluginDatasetServerPublicData 函数使用通用的敏感字段检测逻辑
  5. 国际化支持:完整的中英文翻译更新
  6. 组件拆分:将 ApiDatasetForm 拆分为 FolderTreeSelectModalFormFieldRendererPluginDatasetForm 等独立组件

⚠️ 问题汇总

🔴 严重问题 (2 个,必须修复)

1. 数据迁移脚本需要审查

  • 文件: projects/app/src/pages/api/admin/initv4147.ts
  • 这个迁移脚本将旧的 apiDatasetServer 转换为新的 pluginDatasetServer 格式
  • 需要确保:
    • 迁移脚本有幂等性保护
    • 旧数据格式完全覆盖(feishu、yuque、apiServer)
    • 生产环境执行前需要备份

2. SDK 版本是 alpha

  • 依赖 @fastgpt-sdk/plugin@0.2.18-alpha2
  • Alpha 版本在生产环境可能存在不稳定性
  • 建议:等待正式版本发布或确认 alpha 版本的稳定性

🟡 建议改进 (3 个)

1. 敏感字段检测可能遗漏

  • 文件: packages/global/core/dataset/apiDataset/utils.ts:L161-166
  • 当前只检测 ['token', 'secret', 'auth']
  • 可能遗漏: passwordkeycredential
// 建议扩展
const sensitiveFields = ['token', 'secret', 'auth', 'password', 'key', 'credential', 'apikey'];

2. 类型定义重复

  • packages/service/core/ai/type.d.ts 中的 PluginDatasetSourceTypepackages/global/core/dataset/apiDataset/type.ts 中的 PluginDatasetSourceConfig 结构相似
  • 建议统一为一个类型定义

3. 错误处理可改进

  • 文件: packages/service/core/dataset/apiDataset/plugin/api.ts
  • 多处错误处理使用相同模式,可以抽取为工具函数
// 当前模式
if (res.status !== 200) {
  const errorBody = res.body as { error?: string };
  return Promise.reject(errorBody?.error || 'Failed to ...');
}

🟢 可选优化 (2 个)

1. 全局缓存类型声明

  • PluginDatasetSourcesCache 在全局声明,建议添加初始化检查

2. 组件内存泄漏风险

  • useSystemStore 中的 pluginDatasets 状态更新需要确保组件卸载时不会导致内存泄漏

🧪 测试建议

  1. 功能测试

    • 创建新的插件知识库
    • 导入文件到插件知识库
    • 同步插件知识库内容
    • 管理员启用/禁用插件数据源
  2. 迁移测试

    • 执行 initv4147 迁移脚本
    • 验证旧飞书/语雀知识库正常工作
    • 验证旧 API 知识库正常工作
  3. 边界条件

    • 插件服务不可用时的错误处理
    • 敏感信息是否正确过滤

💬 总体评价

  • 代码质量: ⭐⭐⭐⭐☆ (4/5)
  • 安全性: ⭐⭐⭐⭐☆ (4/5)
  • 性能: ⭐⭐⭐⭐⭐ (5/5)
  • 可维护性: ⭐⭐⭐⭐⭐ (5/5)

🚀 审查结论

建议: 需要小修改后通过

主要关注点:

  1. 确认 SDK alpha 版本的稳定性
  2. 审查并测试数据迁移脚本
  3. 考虑扩展敏感字段检测列表

🤖 Generated with Claude Code

Copy link
Copy Markdown
Contributor Author

@newfish-cmyk newfish-cmyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: perf: move third party dataset to plugin

📊 变更概览

  • PR 编号: #6303
  • 作者: @newfish-cmyk
  • 分支: main ← plugin-dataset
  • 变更统计: +1710 -1863 行
  • 涉及文件: 79 个文件
  • Commits: 7 个

✅ 优点

1. 架构改进 🏗️

  • apiDatasetfeishuyuque 三种类型统一为 pluginDataset,架构更加清晰
  • 类型定义从 .d.ts 迁移到 .ts 并添加 Zod schema 验证,增强类型安全
  • 敏感信息过滤改为通用化实现(基于字段名匹配),更具扩展性

2. 代码简化 📉

  • 删除了三个独立的 API 实现文件(custom/api.tsfeishuDataset/api.tsyuqueDataset/api.ts,约 800 行)
  • 统一使用 pluginClient 调用插件服务,减少重复代码
  • 净删除 153 行代码

3. 类型安全 🔒

// 新增 Zod schema 验证
export const PluginDatasetServerSchema = z.object({
  pluginId: z.string(),
  config: z.record(z.string(), z.any())
});

4. 前端组件拆分合理 🧩

ApiDatasetForm.tsx(436 行)拆分为职责清晰的子组件:

  • index.tsx - 入口组件
  • PluginDatasetForm.tsx - 表单主体
  • FormFieldRenderer.tsx - 字段渲染器
  • FolderTreeSelectModal.tsx - 目录选择器

5. 迁移脚本设计良好 ✅

  • 幂等性保护:查询条件包含 pluginDatasetServer: { $exists: false }
  • 覆盖完整:feishu、yuque、apiServer 三种类型都有处理
  • feConfigs 状态正确迁移到 system_plugin_datasets

⚠️ 问题汇总

🟡 建议改进 (2 个)

1. 敏感信息过滤关键字可补充 (packages/global/core/dataset/apiDataset/utils.ts)

当前过滤列表:

const sensitiveFields = ['token', 'secret', 'auth'];

建议考虑补充:passwordkeycredential 等常见敏感字段名

2. 错误处理可抽取为工具函数 (packages/service/core/dataset/apiDataset/plugin/api.ts)

4 处使用相同的错误处理模式:

if (res.status !== 200) {
  const errorBody = res.body as { error?: string };
  return Promise.reject(errorBody?.error || 'Failed to ...');
}

可考虑抽取为:

const unwrapPluginResponse = <T>(res, fallbackError: string): T => {
  if (res.status !== 200) {
    throw new Error((res.body as { error?: string })?.error || fallbackError);
  }
  return res.body as T;
};

🟢 可选优化 (1 个)

迁移脚本日志增强

建议在迁移脚本中添加日志输出,便于生产环境调试:

console.log('[v4.14.7 Migration] Results:', JSON.stringify(results, null, 2));

🧪 测试建议

  1. 迁移测试:验证幂等性(多次执行不会重复迁移)
  2. 功能测试
    • 插件知识库创建流程
    • 敏感信息过滤效果
    • 插件状态开关功能
  3. 回归测试:确保已有知识库数据正常显示和同步

💬 总体评价

维度 评分 说明
代码质量 ⭐⭐⭐⭐⭐ 架构清晰,代码简化明显
安全性 ⭐⭐⭐⭐☆ 敏感信息过滤机制完善,关键字可补充
性能 ⭐⭐⭐⭐⭐ 无性能问题
可维护性 ⭐⭐⭐⭐⭐ 插件化架构易于扩展

🚀 审查结论

建议: ✅ 可以合并

这是一个高质量的重构 PR,将第三方知识库迁移到插件化架构,架构改进明显,代码质量优秀。上述建议改进项为锦上添花,不阻塞合并。


审查时间: 2026-01-28 | 审查者: Claude Code

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 71 out of 82 changed files in this pull request and generated 9 comments.

Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +83 to +101
const { rawText: parsedText } = await readFileRawTextByUrl({
teamId,
tmbId,
url: previewUrl,
relatedId: apiFileId,
datasetId,
customPdfParse,
getFormatText: true
});

// 存入缓存
getS3RawTextSource().addRawTextBuffer({
sourceId: previewUrl,
sourceName: title || '',
text: parsedText,
customPdfParse
});

return { title, rawText: parsedText };
Copy link

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing error handling for the readFileRawTextByUrl call. If this function throws an error (e.g., network failure, invalid URL, parsing error), it will propagate without proper context. Consider wrapping this in a try-catch block and providing a more specific error message that helps with debugging, such as:

try {
  const { rawText: parsedText } = await readFileRawTextByUrl({...});
  // ... rest of the code
} catch (error) {
  return Promise.reject(`Failed to parse file from URL: ${error.message}`);
}
Suggested change
const { rawText: parsedText } = await readFileRawTextByUrl({
teamId,
tmbId,
url: previewUrl,
relatedId: apiFileId,
datasetId,
customPdfParse,
getFormatText: true
});
// 存入缓存
getS3RawTextSource().addRawTextBuffer({
sourceId: previewUrl,
sourceName: title || '',
text: parsedText,
customPdfParse
});
return { title, rawText: parsedText };
try {
const { rawText: parsedText } = await readFileRawTextByUrl({
teamId,
tmbId,
url: previewUrl,
relatedId: apiFileId,
datasetId,
customPdfParse,
getFormatText: true
});
// 存入缓存
getS3RawTextSource().addRawTextBuffer({
sourceId: previewUrl,
sourceName: title || '',
text: parsedText,
customPdfParse
});
return { title, rawText: parsedText };
} catch (error) {
const message =
error instanceof Error ? error.message : String(error);
return Promise.reject(
`Failed to parse file from URL "${previewUrl}": ${message}`
);
}

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +22
status: {
type: Number,
default: 1
}
Copy link

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schema defines status as a generic number type without validation. This allows any numeric value to be stored, but only 0 and 1 are meaningful according to the comment. Consider adding validation to restrict the status field to only valid values:

status: {
  type: Number,
  default: 1,
  enum: [0, 1]
}

This would prevent data integrity issues from invalid status values being stored.

Copilot uses AI. Check for mistakes.
Comment on lines +22 to +58
const feishu = await MongoDataset.collection.updateMany(
{ 'apiDatasetServer.feishuServer': { $exists: true }, pluginDatasetServer: { $exists: false } },
[
{
$set: {
type: DatasetTypeEnum.pluginDataset,
pluginDatasetServer: { pluginId: 'feishu', config: '$apiDatasetServer.feishuServer' }
}
}
]
);

// 2. 迁移语雀知识库
const yuque = await MongoDataset.collection.updateMany(
{ 'apiDatasetServer.yuqueServer': { $exists: true }, pluginDatasetServer: { $exists: false } },
[
{
$set: {
type: DatasetTypeEnum.pluginDataset,
pluginDatasetServer: { pluginId: 'yuque', config: '$apiDatasetServer.yuqueServer' }
}
}
]
);

// 3. 迁移 API 文件库
const customApi = await MongoDataset.collection.updateMany(
{ 'apiDatasetServer.apiServer': { $exists: true }, pluginDatasetServer: { $exists: false } },
[
{
$set: {
type: DatasetTypeEnum.pluginDataset,
pluginDatasetServer: { pluginId: 'custom-api', config: '$apiDatasetServer.apiServer' }
}
}
]
);
Copy link

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The migration script doesn't remove or clean up the old apiDatasetServer field after migrating to pluginDatasetServer. This will leave stale data in the database. Consider adding logic to unset the old field after successful migration, for example:

$unset: { apiDatasetServer: "" }

This will help keep the database clean and prevent confusion about which field is authoritative.

Copilot uses AI. Check for mistakes.
icon: pluginDataset.iconOutline || pluginDataset.icon,
avatar: pluginDataset.icon,
label: parseI18nString(pluginDataset.name, language),
collectionLabel: language === 'en' ? 'File' : '文件',
Copy link

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hardcoded collectionLabel logic uses a simple language check that only handles 'en' vs Chinese. This approach doesn't properly use the i18n translation system and will fail for other languages. Consider using the translation function t() with a proper translation key instead:

collectionLabel: t('common:File')

This would be more consistent with the rest of the codebase and properly support all configured languages.

Copilot uses AI. Check for mistakes.
<FolderTreeSelectModal
selectId={pluginConfig[currentTreeSelectField] || 'root'}
server={(e) => {
const { basePath, folderToken, ...restConfig } = pluginDatasetServer.config;
Copy link

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The destructuring on line 164 extracts basePath and folderToken from config but these are hardcoded field names from the old system. Since this is now a generic plugin system, these specific field names might not exist in all plugins. This could cause runtime errors or unexpected behavior.

Consider either:

  1. Removing this logic if it's no longer needed for the plugin system
  2. Making it conditional based on the pluginId
  3. Documenting why these specific fields need to be removed

Copilot uses AI. Check for mistakes.

return global.PluginDatasetSourcesCache.map((source) => ({
...source,
status: configMap.get(source.sourceId) ?? 1
Copy link

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default status value of 1 (enabled) on line 42 means that if a plugin dataset source exists in the cache but has no corresponding database entry, it will be enabled by default. This could be a security concern if new plugin sources are added and automatically enabled without admin approval.

Consider defaulting to 0 (disabled) instead, requiring explicit admin action to enable new plugin dataset sources:

status: configMap.get(source.sourceId) ?? 0

Copilot uses AI. Check for mistakes.
return useYuqueDatasetRequest({ yuqueServer });
export const getApiDatasetRequest = async (pluginServer?: PluginDatasetServerType) => {
if (!pluginServer?.pluginId) {
return Promise.reject('Missing pluginDatasetServer');
Copy link

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message 'Missing pluginDatasetServer' is not very descriptive. It doesn't indicate whether the entire object is missing or just the pluginId field. Consider a more specific error message:

return Promise.reject('Missing pluginDatasetServer or pluginId');

This would help with debugging when this error occurs.

Suggested change
return Promise.reject('Missing pluginDatasetServer');
return Promise.reject('Missing pluginDatasetServer or pluginId');

Copilot uses AI. Check for mistakes.
Comment on lines +228 to +241
export async function initPluginDatasetSources(): Promise<void> {
try {
const res = await pluginClient.dataset.source.list();

if (res.status === 200) {
global.PluginDatasetSourcesCache = res.body as PluginDatasetSourceConfig[];
} else {
console.warn('Failed to load plugin dataset sources:', res.body);
global.PluginDatasetSourcesCache = [];
}
} catch (error) {
console.warn('Failed to load plugin dataset sources:', error);
global.PluginDatasetSourcesCache = [];
}
Copy link

Copilot AI Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function silently sets an empty array when plugin dataset sources fail to load (lines 236, 240). This could lead to a degraded user experience where plugin datasets appear unavailable without clear indication of why. Consider:

  1. Logging the error at a higher level (error instead of warn) since this is a critical feature
  2. Setting a flag to indicate the load failure so the UI can display an appropriate message
  3. Implementing retry logic with exponential backoff

This would help operators diagnose and fix the issue more quickly.

Copilot uses AI. Check for mistakes.
Comment thread projects/app/src/pageComponents/dataset/detail/CollectionCard/Header.tsx Outdated
Copy link
Copy Markdown
Collaborator

@c121914yu c121914yu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

缓存 pluginDatasets 配置逻辑优化:
客户端

  1. useInitApp 里获取数据,并且每 5 分钟轮询获取一次
    服务端
    封装一个获取 pluginDatasets 的方法,用 versionId 那个缓存逻辑(plugin 启动/admin 开关知识库时候,刷新 versionId)。不需要 init 系统时候预加载。

Comment thread packages/global/openapi/admin/core/dashboard/api.ts
Comment thread projects/app/src/global/core/config/api.ts
Comment thread projects/app/src/pages/api/core/config/updatePluginDatasetStatus.ts Outdated
Comment thread projects/app/src/pages/api/core/dataset/apiDataset/list.ts Outdated
Comment thread projects/app/src/web/core/config/api.ts
Comment thread packages/service/common/cache/type.ts Outdated
Comment thread packages/web/i18n/zh-Hant/common.json Outdated
Comment thread projects/app/src/pages/api/common/system/getInitData.ts Outdated
Comment thread projects/app/src/pages/api/core/dataset/pluginDataset/getCatalog.ts Outdated
Comment thread projects/app/src/web/common/system/staticData.ts Outdated
Comment thread packages/service/core/dataset/pluginDataset/controller.ts Outdated
@newfish-cmyk newfish-cmyk force-pushed the plugin-dataset branch 4 times, most recently from 2410a78 to d404105 Compare January 31, 2026 09:23
@c121914yu c121914yu changed the base branch from v4.14.6-dev to dataset-plugin-test February 2, 2026 08:28
@c121914yu c121914yu changed the base branch from dataset-plugin-test to v4.14.7-dev February 2, 2026 11:48
@c121914yu
Copy link
Copy Markdown
Collaborator

等 plugin 改完再合并这个 PR

@c121914yu c121914yu force-pushed the v4.14.7-dev branch 2 times, most recently from e0eb0fc to b49b74a Compare February 12, 2026 05:13
@c121914yu c121914yu deleted the branch labring:v4.14.7-dev February 12, 2026 08:38
@c121914yu c121914yu closed this Feb 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants