perf: move third party dataset to plugin #6303
newfish-cmyk wants to merge 16 commits into labring:v4.14.7-dev
Conversation
There is too much information in the pull request to test.
Preview mcp_server Image:
Preview sandbox Image:
Preview fastgpt Image:
Pull request overview
This pull request refactors the third-party dataset integration from a hardcoded approach (with built-in support for Feishu and Yuque) to a flexible plugin-based architecture. The changes enable external dataset sources to be loaded dynamically from a plugin system, improving extensibility and maintainability.
Changes:
- Replaced `apiDatasetServer` with a `pluginDatasetServer` structure to support plugin-based dataset sources
- Removed hardcoded Feishu and Yuque implementations, replacing them with a unified plugin request handler
- Added dynamic form rendering for plugin dataset configurations
- Introduced a migration script (`initv4147.ts`) to migrate existing data to the new schema
Reviewed changes
Copilot reviewed 55 out of 65 changed files in this pull request and generated 9 comments.
Summary per file:
| File | Description |
|---|---|
| packages/service/core/dataset/schema.ts | Updated schema to use pluginDatasetServer instead of apiDatasetServer |
| packages/service/core/dataset/apiDataset/index.ts | Refactored to route requests through plugin system |
| packages/service/core/dataset/apiDataset/plugin/api.ts | New unified plugin dataset request handler |
| projects/app/src/pages/api/admin/initv4147.ts | Data migration script for converting old dataset configs |
| projects/app/src/pageComponents/dataset/ApiDatasetForm/* | New dynamic form components for plugin datasets |
| projects/app/src/pageComponents/config/tool/ThirdPartyDatasetConfig.tsx | Admin UI for managing third-party dataset plugins |
| projects/app/src/web/common/system/useSystemStore.ts | Added plugin dataset state management |
| projects/app/src/service/common/system/index.ts | Added initialization for plugin dataset sources |
| packages/global/core/dataset/apiDataset/utils.ts | Refactored security filtering for plugin configs |
| pnpm-lock.yaml | Updated @fastgpt-sdk/plugin to 0.2.18-alpha1 |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
```ts
useEffect(() => {
  formFields
    .filter((f) => f.type === 'tree-select' && pluginConfig[f.key] !== undefined)
    .forEach((f) => updateFieldPathName(f.key));
}, [formFields, pluginConfig, updateFieldPathName]);
```
Infinite loop potential: The useEffect hook that watches pluginConfig changes to update path names could trigger infinite re-renders if updateFieldPathName causes pluginConfig to change. While updateFieldPathName itself doesn't directly modify pluginConfig, the dependency array includes pluginConfig which will trigger the effect on every config change. Consider using a more specific dependency or adding a condition to prevent unnecessary path name fetches.
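One way to make the effect safe is to separate "which keys still need a path-name fetch" into a pure helper and track already-fetched keys (e.g. in a `useRef`), so repeated effect runs become no-ops. The sketch below is illustrative only; the type and function names are not taken from the PR:

```typescript
// Hypothetical helper: given the form fields, the current plugin config, and
// the set of keys whose path names were already fetched, return only the keys
// that still need a lookup.
type FormField = { key: string; type: string };

const keysNeedingPathName = (
  formFields: FormField[],
  pluginConfig: Record<string, unknown>,
  alreadyFetched: Set<string>
): string[] =>
  formFields
    .filter((f) => f.type === 'tree-select' && pluginConfig[f.key] !== undefined)
    .map((f) => f.key)
    .filter((key) => !alreadyFetched.has(key));
```

Inside the effect, the component would call `updateFieldPathName` only for the returned keys and then add them to the ref'd set, so a `pluginConfig` change that introduces no new tree-select values no longer triggers a fetch.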
PR Review: perf: move third party dataset to plugin
📊 Change overview
- PR number: #6303
- Author: @newfish-cmyk
- Branches: main ← plugin-dataset
- Diff stats: +1704 −1870 lines
- Files changed: 78
📋 Summary of changes
This is a significant architectural refactor that moves the third-party dataset integrations (Feishu, Yuque, custom API) from hardcoded implementations to a plugin system:
- Removes the hardcoded third-party data sources: deletes the `feishu`, `yuque`, and `apiDataset` types, unifying them into `pluginDataset`
- Adds a plugin data-source architecture: dynamic data sources are provided via `@fastgpt-sdk/plugin`
- Adds an admin configuration UI: administrators can enable/disable plugin data sources
- Type system refactor: `ApiDatasetServerType` → `PluginDatasetServerType`
✅ Strengths
- Architectural improvement: abstracting the hardcoded third-party integrations into a plugin system improves extensibility and maintainability
- Code simplification: removes a large amount of duplicated Feishu/Yuque-specific code (~800 lines), unified into plugin API calls
- Type safety: runtime type validation with Zod schemas (`APIFileItemSchema`, `PluginDatasetServerSchema`, etc.)
- Sensitive-data handling: the `filterPluginDatasetServerPublicData` function uses generic sensitive-field detection logic
- i18n support: complete Chinese and English translation updates
- Component split: `ApiDatasetForm` is split into independent `FolderTreeSelectModal`, `FormFieldRenderer`, and `PluginDatasetForm` components
⚠️ Issues
🔴 Critical (2, must fix)
1. The data migration script needs review
- File: `projects/app/src/pages/api/admin/initv4147.ts`
- The script converts the old `apiDatasetServer` format into the new `pluginDatasetServer` format
- Make sure that:
  - the migration has an idempotency guard
  - all old data formats are covered (feishu, yuque, apiServer)
  - production data is backed up before the script is run
2. The SDK version is an alpha release
- Depends on `@fastgpt-sdk/plugin@0.2.18-alpha2`
- Alpha releases may be unstable in production
- Suggestion: wait for a stable release or confirm the stability of the alpha version
🟡 Suggested improvements (3)
1. Sensitive-field detection may miss some fields
- File: `packages/global/core/dataset/apiDataset/utils.ts:L161-166`
- Only `['token', 'secret', 'auth']` are currently detected
- Possibly missed: `password`, `key`, `credential`, etc.

```ts
// Suggested extension
const sensitiveFields = ['token', 'secret', 'auth', 'password', 'key', 'credential', 'apikey'];
```

2. Duplicated type definitions
- `PluginDatasetSourceType` in `packages/service/core/ai/type.d.ts` is structurally similar to `PluginDatasetSourceConfig` in `packages/global/core/dataset/apiDataset/type.ts`
- Consider unifying them into a single type definition
3. Error handling could be improved
- File: `packages/service/core/dataset/apiDataset/plugin/api.ts`
- The same error-handling pattern appears in several places and could be extracted into a utility function

```ts
// Current pattern
if (res.status !== 200) {
  const errorBody = res.body as { error?: string };
  return Promise.reject(errorBody?.error || 'Failed to ...');
}
```

🟢 Optional optimizations (2)
1. Global cache type declaration
- `PluginDatasetSourcesCache` is declared globally; consider adding an initialization check
2. Potential component memory leak
- `pluginDatasets` state updates in `useSystemStore` must not leak memory after component unmount
🧪 Testing suggestions
- Functional tests
  - Create a new plugin dataset
  - Import files into a plugin dataset
  - Sync plugin dataset content
  - Enable/disable plugin dataset sources as an admin
- Migration tests
  - Run the initv4147 migration script
  - Verify that existing Feishu/Yuque datasets still work
  - Verify that existing API datasets still work
- Edge cases
  - Error handling when the plugin service is unavailable
  - Sensitive information is correctly filtered
💬 Overall assessment
- Code quality: ⭐⭐⭐⭐☆ (4/5)
- Security: ⭐⭐⭐⭐☆ (4/5)
- Performance: ⭐⭐⭐⭐⭐ (5/5)
- Maintainability: ⭐⭐⭐⭐⭐ (5/5)
🚀 Review verdict
Recommendation: approve after minor changes
Main points of attention:
- Confirm the stability of the alpha SDK version
- Review and test the data migration script
- Consider extending the sensitive-field detection list
🤖 Generated with Claude Code
newfish-cmyk left a comment
PR Review: perf: move third party dataset to plugin
📊 Change overview
- PR number: #6303
- Author: @newfish-cmyk
- Branches: main ← plugin-dataset
- Diff stats: +1710 −1863 lines
- Files changed: 79
- Commits: 7
✅ Strengths
1. Architectural improvements 🏗️
- Unifies the `apiDataset`, `feishu`, and `yuque` types into `pluginDataset`, making the architecture cleaner
- Moves type definitions from `.d.ts` to `.ts` and adds Zod schema validation, strengthening type safety
- Sensitive-data filtering is now a generic implementation (based on field-name matching), which is more extensible
2. Code simplification 📉
- Removes three separate API implementation files (`custom/api.ts`, `feishuDataset/api.ts`, `yuqueDataset/api.ts`, ~800 lines)
- Calls the plugin service uniformly through `pluginClient`, reducing duplicated code
- Net removal of 153 lines of code
3. Type safety 🔒

```ts
// New Zod schema validation
export const PluginDatasetServerSchema = z.object({
  pluginId: z.string(),
  config: z.record(z.string(), z.any())
});
```

4. Sensible front-end component split 🧩
`ApiDatasetForm.tsx` (436 lines) is split into sub-components with clear responsibilities:
- `index.tsx` - entry component
- `PluginDatasetForm.tsx` - form body
- `FormFieldRenderer.tsx` - field renderer
- `FolderTreeSelectModal.tsx` - folder selector
5. Well-designed migration script ✅
- Idempotency guard: the query condition includes `pluginDatasetServer: { $exists: false }`
- Full coverage: the feishu, yuque, and apiServer types are all handled
- feConfigs state is correctly migrated into the `system_plugin_datasets` collection
⚠️ Issues
🟡 Suggested improvements (2)
1. Sensitive-field keywords could be extended (`packages/global/core/dataset/apiDataset/utils.ts`)
Current filter list:

```ts
const sensitiveFields = ['token', 'secret', 'auth'];
```

Consider adding common sensitive field names such as `password`, `key`, and `credential`.
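A minimal sketch of what the extended, case-insensitive substring match could look like. The field list and function name below are illustrative, not FastGPT's actual implementation:

```typescript
// Illustrative sketch: drop config entries whose key contains a sensitive
// substring, compared case-insensitively.
const sensitiveFields = ['token', 'secret', 'auth', 'password', 'key', 'credential'];

const filterSensitiveConfig = (config: Record<string, unknown>): Record<string, unknown> =>
  Object.fromEntries(
    Object.entries(config).filter(
      ([fieldName]) => !sensitiveFields.some((s) => fieldName.toLowerCase().includes(s))
    )
  );
```

Substring matching catches compound names like `appToken` or `apiKey` without enumerating every variant, at the cost of occasionally filtering a benign field (e.g. anything containing "key").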
2. Error handling could be extracted into a utility function (`packages/service/core/dataset/apiDataset/plugin/api.ts`)
The same error-handling pattern appears in 4 places:

```ts
if (res.status !== 200) {
  const errorBody = res.body as { error?: string };
  return Promise.reject(errorBody?.error || 'Failed to ...');
}
```

Consider extracting it as:

```ts
const unwrapPluginResponse = <T>(res, fallbackError: string): T => {
  if (res.status !== 200) {
    throw new Error((res.body as { error?: string })?.error || fallbackError);
  }
  return res.body as T;
};
```

🟢 Optional optimizations (1)
Migration-script logging
Consider adding log output to the migration script to aid debugging in production:

```ts
console.log('[v4.14.7 Migration] Results:', JSON.stringify(results, null, 2));
```

🧪 Testing suggestions
- Migration test: verify idempotency (running the script multiple times does not migrate data twice)
- Functional tests:
  - Plugin dataset creation flow
  - Sensitive-data filtering
  - Plugin status toggle
- Regression test: ensure existing dataset data still displays and syncs correctly
💬 Overall assessment
| Dimension | Score | Notes |
|---|---|---|
| Code quality | ⭐⭐⭐⭐⭐ | Clean architecture, significant simplification |
| Security | ⭐⭐⭐⭐☆ | Solid sensitive-data filtering; the keyword list could be extended |
| Performance | ⭐⭐⭐⭐⭐ | No performance concerns |
| Maintainability | ⭐⭐⭐⭐⭐ | The plugin architecture is easy to extend |
🚀 Review verdict
Recommendation: ✅ ready to merge
This is a high-quality refactor that moves third-party datasets to a plugin architecture. The architectural improvement is clear and the code quality is excellent. The suggestions above are nice-to-haves and do not block merging.
Reviewed: 2026-01-28 | Reviewer: Claude Code
Pull request overview
Copilot reviewed 71 out of 82 changed files in this pull request and generated 9 comments.
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
```ts
const { rawText: parsedText } = await readFileRawTextByUrl({
  teamId,
  tmbId,
  url: previewUrl,
  relatedId: apiFileId,
  datasetId,
  customPdfParse,
  getFormatText: true
});

// Store in cache
getS3RawTextSource().addRawTextBuffer({
  sourceId: previewUrl,
  sourceName: title || '',
  text: parsedText,
  customPdfParse
});

return { title, rawText: parsedText };
```
Missing error handling for the `readFileRawTextByUrl` call. If this function throws an error (e.g., network failure, invalid URL, parsing error), it will propagate without proper context. Consider wrapping this in a try-catch block and providing a more specific error message that helps with debugging, such as:

```ts
try {
  const { rawText: parsedText } = await readFileRawTextByUrl({...});
  // ... rest of the code
} catch (error) {
  return Promise.reject(`Failed to parse file from URL: ${error.message}`);
}
```

Suggested change:

```ts
try {
  const { rawText: parsedText } = await readFileRawTextByUrl({
    teamId,
    tmbId,
    url: previewUrl,
    relatedId: apiFileId,
    datasetId,
    customPdfParse,
    getFormatText: true
  });

  // Store in cache
  getS3RawTextSource().addRawTextBuffer({
    sourceId: previewUrl,
    sourceName: title || '',
    text: parsedText,
    customPdfParse
  });

  return { title, rawText: parsedText };
} catch (error) {
  const message = error instanceof Error ? error.message : String(error);
  return Promise.reject(`Failed to parse file from URL "${previewUrl}": ${message}`);
}
```
```ts
status: {
  type: Number,
  default: 1
}
```
The schema defines status as a generic number type without validation. This allows any numeric value to be stored, but only 0 and 1 are meaningful according to the comment. Consider adding validation to restrict the status field to only valid values:

```ts
status: {
  type: Number,
  default: 1,
  enum: [0, 1]
}
```

This would prevent data integrity issues from invalid status values being stored.
```ts
const feishu = await MongoDataset.collection.updateMany(
  { 'apiDatasetServer.feishuServer': { $exists: true }, pluginDatasetServer: { $exists: false } },
  [
    {
      $set: {
        type: DatasetTypeEnum.pluginDataset,
        pluginDatasetServer: { pluginId: 'feishu', config: '$apiDatasetServer.feishuServer' }
      }
    }
  ]
);

// 2. Migrate Yuque datasets
const yuque = await MongoDataset.collection.updateMany(
  { 'apiDatasetServer.yuqueServer': { $exists: true }, pluginDatasetServer: { $exists: false } },
  [
    {
      $set: {
        type: DatasetTypeEnum.pluginDataset,
        pluginDatasetServer: { pluginId: 'yuque', config: '$apiDatasetServer.yuqueServer' }
      }
    }
  ]
);

// 3. Migrate API file datasets
const customApi = await MongoDataset.collection.updateMany(
  { 'apiDatasetServer.apiServer': { $exists: true }, pluginDatasetServer: { $exists: false } },
  [
    {
      $set: {
        type: DatasetTypeEnum.pluginDataset,
        pluginDatasetServer: { pluginId: 'custom-api', config: '$apiDatasetServer.apiServer' }
      }
    }
  ]
);
```
The migration script doesn't remove or clean up the old `apiDatasetServer` field after migrating to `pluginDatasetServer`. This will leave stale data in the database. Consider adding logic to unset the old field after successful migration, for example:

```ts
$unset: { apiDatasetServer: '' }
```

This will help keep the database clean and prevent confusion about which field is authoritative.
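Since the migration already uses the aggregation-pipeline form of `updateMany`, the cleanup could be a second pipeline stage. Note that in pipeline form, `$unset` takes a field name (or an array of names) rather than the `{ field: '' }` object used in classic update documents. This is a sketch of the shape, not tested against the PR's schema:

```typescript
// Hypothetical pipeline combining the Feishu migration $set with an $unset
// cleanup stage. In MongoDB's aggregation-pipeline updates, $unset takes a
// field name string (or array of names).
const feishuMigrationPipeline = [
  {
    $set: {
      type: 'pluginDataset',
      pluginDatasetServer: { pluginId: 'feishu', config: '$apiDatasetServer.feishuServer' }
    }
  },
  { $unset: 'apiDatasetServer' }
];
```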
```ts
icon: pluginDataset.iconOutline || pluginDataset.icon,
avatar: pluginDataset.icon,
label: parseI18nString(pluginDataset.name, language),
collectionLabel: language === 'en' ? 'File' : '文件',
```
The hardcoded collectionLabel logic uses a simple language check that only handles 'en' vs Chinese. This approach doesn't properly use the i18n translation system and will fail for other languages. Consider using the translation function `t()` with a proper translation key instead:

```ts
collectionLabel: t('common:File')
```

This would be more consistent with the rest of the codebase and properly support all configured languages.
```tsx
<FolderTreeSelectModal
  selectId={pluginConfig[currentTreeSelectField] || 'root'}
  server={(e) => {
    const { basePath, folderToken, ...restConfig } = pluginDatasetServer.config;
```
The destructuring on line 164 extracts basePath and folderToken from config but these are hardcoded field names from the old system. Since this is now a generic plugin system, these specific field names might not exist in all plugins. This could cause runtime errors or unexpected behavior.
Consider either:
- Removing this logic if it's no longer needed for the plugin system
- Making it conditional based on the pluginId
- Documenting why these specific fields need to be removed
```ts
return global.PluginDatasetSourcesCache.map((source) => ({
  ...source,
  status: configMap.get(source.sourceId) ?? 1
```
The default status value of 1 (enabled) on line 42 means that if a plugin dataset source exists in the cache but has no corresponding database entry, it will be enabled by default. This could be a security concern if new plugin sources are added and automatically enabled without admin approval.
Consider defaulting to 0 (disabled) instead, requiring explicit admin action to enable new plugin dataset sources:

```ts
status: configMap.get(source.sourceId) ?? 0
```

```ts
return useYuqueDatasetRequest({ yuqueServer });
export const getApiDatasetRequest = async (pluginServer?: PluginDatasetServerType) => {
  if (!pluginServer?.pluginId) {
    return Promise.reject('Missing pluginDatasetServer');
```
There was a problem hiding this comment.
The error message 'Missing pluginDatasetServer' is not very descriptive. It doesn't indicate whether the entire object is missing or just the pluginId field. Consider a more specific error message:
return Promise.reject('Missing pluginDatasetServer or pluginId');This would help with debugging when this error occurs.
| return Promise.reject('Missing pluginDatasetServer'); | |
| return Promise.reject('Missing pluginDatasetServer or pluginId'); |
```ts
export async function initPluginDatasetSources(): Promise<void> {
  try {
    const res = await pluginClient.dataset.source.list();

    if (res.status === 200) {
      global.PluginDatasetSourcesCache = res.body as PluginDatasetSourceConfig[];
    } else {
      console.warn('Failed to load plugin dataset sources:', res.body);
      global.PluginDatasetSourcesCache = [];
    }
  } catch (error) {
    console.warn('Failed to load plugin dataset sources:', error);
    global.PluginDatasetSourcesCache = [];
  }
}
```
The function silently sets an empty array when plugin dataset sources fail to load (lines 236, 240). This could lead to a degraded user experience where plugin datasets appear unavailable without clear indication of why. Consider:
- Logging the error at a higher level (error instead of warn) since this is a critical feature
- Setting a flag to indicate the load failure so the UI can display an appropriate message
- Implementing retry logic with exponential backoff
This would help operators diagnose and fix the issue more quickly.
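A generic retry-with-exponential-backoff sketch that could wrap the `pluginClient` call. This is not FastGPT code; the helper name and default values are illustrative:

```typescript
// Illustrative helper: retry an async operation, doubling the delay after
// each failed attempt (baseDelayMs * 2^i), and rethrow the last error if all
// attempts fail.
const withRetry = async <T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500
): Promise<T> => {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
};
```

In `initPluginDatasetSources`, the `pluginClient.dataset.source.list()` call could then be wrapped as `await withRetry(() => pluginClient.dataset.source.list())` before falling back to the empty cache.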
c121914yu left a comment
Optimize the caching logic for the pluginDatasets configuration:
Client
- Fetch the data in useInitApp and poll for it every 5 minutes
Server
Wrap fetching pluginDatasets in a dedicated method using the versionId cache logic (refresh the versionId when the plugin starts or when an admin toggles a dataset source). There is no need to preload it during system init.
Merge this PR after the plugin changes are done.
No description provided.