例行检查 / Checklist
问题描述 / Bug Description
Hello. Thank you for your excellent work. I'd like to share a bug/issue I've encountered.
When using LibreChat with the standard Google endpoint, API requests are always made in streaming mode. This results in two responses instead of one, causing the model to repeat the same (or nearly the same) message in the chat. It seems the proxy may be sending the same request twice, without waiting for a response from the model when it is in `streamGenerateContent?alt=sse` mode (the key part being `alt=sse`, for Server-Sent Events).
The proxy does not wait for these events and repeats the request, which leads to a duplicated response: I can see two responses concatenated together in the response body, both in the proxy logs and in my chat interface.
It's not possible to disable streaming for the default Google endpoint in LibreChat. However, the repetition disappears when I use the OpenAI-compatible Google API endpoint, configure the proxy to send OpenAI-compatible commands (/chat/completions), and disable streaming in the interface.
None of the available delay settings in the proxy interface solve this problem.
Would it be possible to add a delay setting (or another mechanism) to make the proxy wait for the model's full response over the SSE stream? This would prevent it from resending the request or closing the connection prematurely, which seems to be what causes the LibreChat agents to duplicate the request.
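For illustration only (this is not LibreChat or the proxy's actual code, and the function name and simulated stream are my own): a minimal sketch of the behavior I am asking for. A consumer of a Gemini-style `alt=sse` stream should treat the response as complete only when the upstream closes the stream, i.e. when the event iterator is exhausted, rather than retrying after a fixed delay.

```python
import json

def collect_sse_text(lines):
    """Concatenate the text chunks from an alt=sse stream.

    `lines` is an iterable of decoded SSE lines; in the Gemini streaming
    format each event arrives as a single 'data: {...}' line.
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        payload = json.loads(line[len("data:"):])
        for cand in payload.get("candidates", []):
            for part in cand.get("content", {}).get("parts", []):
                if "text" in part:
                    parts.append(part["text"])
    # The stream is finished only when the iterable is exhausted
    # (upstream closed the connection); only at this point should a
    # proxy consider the request complete -- never on a timer.
    return "".join(parts)

# Simulated upstream stream delivering the answer in two chunks:
stream = [
    'data: {"candidates":[{"content":{"parts":[{"text":"Hello, "}]}}]}',
    '',
    'data: {"candidates":[{"content":{"parts":[{"text":"world!"}]}}]}',
]
print(collect_sse_text(stream))  # → Hello, world!
```

If the proxy instead gives up and re-sends the request while the first stream is still open, the client ends up with both streams' text concatenated, which matches the duplicated output I observe.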
复现步骤 / Steps to Reproduce
- Use LibreChat
- Set up the default Google endpoint
- Add an Agent/Assistant with Google model as the backend
预期结果 / Expected Behavior
With a fix, the proxy would not repeat the request before a specified delay has passed. The model would then have sufficient time to compose and deliver complex responses without the same query being answered multiple times.
相关截图 / Screenshots