[RHIDP-11602] [RHIDP-11600] AI Notebooks Developer Preview Backend #2499
JslYoon wants to merge 5 commits into redhat-developer:main
/**
 * Strip HTML tags and extract readable text from HTML content
 */
function stripHtmlTags(html: string): string {
This is very overwhelming. Could an existing HTML parser help with this?
Perhaps https://www.npmjs.com/package/htmlparser2?
For reference, this is the function @thepetk wrote to do the same in the techdocs MCP plugin: https://github.com/redhat-developer/rhdh-plugins/blob/main/workspaces/mcp-integrations/plugins/techdocs-mcp-extras/src/service.ts#L156-L172
...ces/lightspeed/plugins/lightspeed-backend/src/service/notebooks/documents/documentService.ts
const startTime = Date.now();
const pollIntervalMs = 1000;

while (Date.now() - startTime < this.fileProcessingTimeoutMs) {
I think this could block the backend for up to 30 seconds.
Can we make this synchronous file-upload process asynchronous?
For example, once a user uploads a document, immediately return a job ID:
response {
  status: 'processing',
  job_id: job_id,
  file_id: file.id,
};
Then introduce a new API to get the job status, GET /documents/job_id/status:
response {
  status: file.status, // 'in_progress', 'completed', 'failed'
  progress: file.bytes_processed / file.total_bytes * 100,
  error/message
};
and rely on the frontend to monitor and display the status.
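A minimal sketch of the "return a job ID immediately, then query status" idea proposed above. Everything here is an assumption for illustration: `UploadJob`, `JobRegistry`, `startUpload`, and `getStatus` are hypothetical names, the registry is in-memory only, and nothing is taken from the PR's actual code. A route handler for the upload could return the result of `startUpload`, and a status route could return `getStatus(jobId)`.

```typescript
// Hypothetical types and names; not part of the PR under review.
type JobStatus = 'processing' | 'completed' | 'failed';

interface UploadJob {
  jobId: string;
  fileId: string;
  status: JobStatus;
  error?: string;
}

class JobRegistry {
  private readonly jobs = new Map<string, UploadJob>();
  private nextId = 1;

  /**
   * Registers a job and returns right away. `work` (for example, the
   * upload plus vector store processing) runs in the background.
   */
  startUpload(fileId: string, work: () => Promise<void>): UploadJob {
    const job: UploadJob = {
      jobId: `job-${this.nextId++}`,
      fileId,
      status: 'processing',
    };
    this.jobs.set(job.jobId, job);

    // Deliberately not awaited: the caller gets its response immediately,
    // and the stored job record is updated when the work settles.
    Promise.resolve()
      .then(work)
      .then(
        () => {
          job.status = 'completed';
        },
        (err: unknown) => {
          job.status = 'failed';
          job.error = err instanceof Error ? err.message : String(err);
        },
      );

    // Return a snapshot so callers cannot mutate the stored record.
    return { ...job };
  }

  /** Returns a snapshot of the job, or undefined if the ID is unknown. */
  getStatus(jobId: string): UploadJob | undefined {
    const job = this.jobs.get(jobId);
    return job ? { ...job } : undefined;
  }
}
```

An in-memory map loses jobs on restart and is not shared across backend replicas, so a real implementation would likely need persistent storage or would derive status from the vector store file itself.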
Alternatively, handle the file asynchronously and use a stream to send status-update events that the frontend listens to. Either way, the idea is to avoid a while loop that blocks the backend.
For example:
const interval = setInterval(async () => {
  const file = await this.client.vectorStores.files.retrieve(...);
  // Send one status update per tick.
  res.write(JSON.stringify({
    status: file.status,
    progress: file.bytes_processed / file.total_bytes * 100,
    // message/error
  }));
  if (file.status === 'completed' || file.status === 'failed') {
    clearInterval(interval);
    res.end();
  }
}, 1000); // every second, matching the poll interval you defined
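The polling idea above can be sketched without tying it to a particular HTTP framework. This is an illustration under stated assumptions, not the PR's implementation: `FileSnapshot` mirrors the fields named in the comment (`status`, `bytes_processed`, `total_bytes`) and is not a confirmed API response shape, and `toStatusEvent` and `pollFile` are hypothetical helpers. The sketch swaps `setInterval` for chained `setTimeout` calls so a slow `retrieve` call cannot overlap with the next tick.

```typescript
// Shape assumed from the review comment; not a confirmed API response.
interface FileSnapshot {
  status: 'in_progress' | 'completed' | 'failed';
  bytes_processed: number;
  total_bytes: number;
}

interface StatusEvent {
  status: FileSnapshot['status'];
  progress: number; // percentage in the range 0..100
  done: boolean;
}

/** Converts a file snapshot into the event sent to the frontend. */
function toStatusEvent(file: FileSnapshot): StatusEvent {
  // Guard against division by zero when the total size is unknown.
  const progress =
    file.total_bytes > 0
      ? Math.min(100, (file.bytes_processed / file.total_bytes) * 100)
      : 0;
  const done = file.status === 'completed' || file.status === 'failed';
  return { status: file.status, progress, done };
}

/**
 * Polls `retrieve` without blocking the event loop and reports each result
 * through `onEvent`. Returns a function that stops the polling.
 */
function pollFile(
  retrieve: () => Promise<FileSnapshot>,
  onEvent: (event: StatusEvent) => void,
  intervalMs = 1000,
): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async (): Promise<void> => {
    let event: StatusEvent;
    try {
      event = toStatusEvent(await retrieve());
    } catch {
      // Treat a retrieval error as a terminal failure.
      event = { status: 'failed', progress: 0, done: true };
    }
    if (stopped) return;
    onEvent(event);
    // Schedule the next tick only after this one has finished.
    if (!event.done) timer = setTimeout(tick, intervalMs);
  };

  timer = setTimeout(tick, 0);
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

In a streaming handler, `onEvent` could write each event to the response and end it when `event.done` is true, and the returned stop function could be called when the client disconnects.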
I think an instant upload response plus a new route to check status makes more sense, and it has less logical complexity, @yangcao77. I would say blocking the backend is not necessary: since this file upload is async, it would not interfere with any other endpoints. Perhaps a "loading" button on the frontend while the document is uploading would help?
Perhaps
response {
  status: 'processing',
  job_id: job_id,
  file_id: file.id,
};
can be
response {
  status: 'processing',
  file_id: file.id,
};
since file_id is unique. Uploading a file with the same file_id will overwrite the file.
yangcao77
left a comment
Left a couple of comments.
I also noticed that no tests have been created. Please add unit tests for most of the functions and integration tests for the endpoints.
Signed-off-by: Lucas <lyoon@redhat.com>
// const credentials = await httpAuth.credentials(req);
// const user = await userInfo.getUserInfo(credentials);
// return user.userEntityRef;
return 'user:default/guest';
Yep, still making some final changes!


Hey, I just made a Pull Request!
Stories:
https://redhat.atlassian.net/browse/RHIDP-11606
https://redhat.atlassian.net/browse/RHIDP-11600
Implementing AI Notebooks backend functionalities for Session, Document, and LLM query
✔️ Checklist