I found a similar open issue regarding truncated responses in the Langchain-Chatchat repository. You can check the details here [1]; however, no solution has been posted in the comments yet.
Regarding your issue, the max_tokens parameter is indeed intended to limit the number of tokens generated by the language model during response generation. If the responses are still truncated despite setting a large max_tokens, there might be other factors at play, such as network issues or server-side handling errors that could cause incomplete data transmission [2][3].
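One way to rule max_tokens in or out as the cause is to inspect the finish_reason of a non-streaming call: a response cut off by the token limit ends with "length", while a complete answer ends with "stop". Below is a minimal sketch using an OpenAI-compatible client; the base_url, api_key, and model name are placeholders and should be replaced with whatever your deployment actually exposes.

```python
from openai import OpenAI

# Assumed local OpenAI-compatible endpoint; adjust to your deployment.
client = OpenAI(base_url="http://localhost:20000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="your-model-name",  # placeholder model name
    messages=[{"role": "user", "content": "Explain max_tokens in one paragraph."}],
    max_tokens=2048,
)

# "length" means the model hit max_tokens; "stop" means it finished normally.
# If the answer is short but finish_reason is "stop", max_tokens is not the culprit.
print(resp.choices[0].finish_reason)
print(resp.choices[0].message.content)
```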
Additionally, the streaming API handles chunked responses, and truncation could occur due to network connection issues or errors in server response handling. The fetchSSE method includes error handling to manage such scenarios, but specific causes of truncation would depend on the implementation details and network conditions [2][3].
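If you suspect the transport layer, it can help to consume the event stream directly and see whether it terminates cleanly or the connection drops mid-stream. The following is a rough sketch, assuming an OpenAI-style SSE format with a `[DONE]` terminator; the endpoint URL and request payload are placeholders, so adapt them to your actual API.

```python
import json
import requests

# Placeholder endpoint and payload; replace with the request your frontend sends.
url = "http://localhost:7861/chat/chat"
payload = {"query": "hello", "stream": True}

try:
    with requests.post(url, json=payload, stream=True, timeout=(10, 300)) as r:
        r.raise_for_status()
        for raw in r.iter_lines(decode_unicode=True):
            if not raw:
                continue  # blank SSE separator / keep-alive lines
            if raw.startswith("data:"):
                data = raw[len("data:"):].strip()
                if data == "[DONE]":
                    break  # stream ended cleanly
                print(json.loads(data), flush=True)
except (requests.exceptions.ChunkedEncodingError,
        requests.exceptions.ConnectionError,
        requests.exceptions.ReadTimeout) as e:
    # If the output stops here instead of reaching the terminator, the truncation
    # is happening at the transport layer (proxy buffering, idle timeouts, etc.),
    # not because of max_tokens.
    print(f"stream aborted early: {e!r}")
```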
You might want to check the network connection stability and ensure that the server-side implementation correctly handles streaming responses. Also, verify that the max_tokens parameter is correctly configured and passed to the LLM during instantiation [4].
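For the instantiation side, a quick sanity check is to confirm that the max_tokens you set in the UI actually reaches the model object. The sketch below uses LangChain's ChatOpenAI wrapper as an example; the model name, base_url, and api_key are assumptions for a local OpenAI-compatible endpoint and may differ from your setup.

```python
from langchain_openai import ChatOpenAI  # assumed wrapper; adjust to your backend

llm = ChatOpenAI(
    model="your-model-name",               # placeholder
    base_url="http://localhost:20000/v1",  # assumed local OpenAI-compatible endpoint
    api_key="EMPTY",
    max_tokens=4096,   # must be set here; a lower value here overrides the frontend request
    streaming=True,
)

print(llm.invoke("Write a long answer about tokenization.").content)
```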
问题描述 / Problem Description
The LLM's replies are sometimes truncated even though max_tokens is set to a very large value. Isn't this parameter supposed to specify the number of tokens the LLM can return? Why does the model still output only a small part of the content when the value is so large? Could someone please take a look? Thanks!
复现问题的步骤 / Steps to Reproduce
预期的结果 / Expected Result
The complete reply should be returned.
实际结果 / Actual Result
Only a small portion of the tokens was returned.
环境信息 / Environment Information
附加信息 / Additional Information
As shown in the figure, I built my own frontend and entered a prompt, but the content returned by the LLM was truncated.
The chunks returned by the server-sent event stream are also cut off:
It just stops here. This problem has been bothering me for a long time. Does anyone know how to solve it? Thanks!