Replies: 1 comment 2 replies
-
🤖 Hello @hafsalm, I'm Dosu, an AI bot here to assist you with questions and issues about langchainjs while you wait for a human maintainer. I can help solve bugs, answer questions, and guide you toward becoming a contributor. No question is too big or too small, so let's work on this together!

To implement streaming responses in LangChain when integrating it with node-llama-cpp, enable the `streaming` option on the LlamaCpp model and register a `handleLLMNewToken` callback:
```typescript
import { LlamaCpp } from "langchain/llms/llama_cpp";

const model = new LlamaCpp({
  modelPath: "/Replace/with/path/to/your/model/gguf-llama2-q4_0.bin",
  streaming: true,
  callbacks: [
    {
      handleLLMNewToken(token: string) {
        process.stdout.write(token);
      },
    },
  ],
});

await model.call("Where do Llamas come from?");
```

With `streaming: true`, the model emits tokens as they are generated and passes each one to `handleLLMNewToken`, where you can write it to stdout, a socket, or an HTTP response as it arrives. You can find more details and examples in the langchainjs documentation for the LlamaCpp integration.
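If you are on a newer langchainjs release, the LLM classes also implement the Runnable interface, so you can stream without wiring up callbacks by calling `.stream()` and iterating the returned chunks. This is a minimal sketch under that assumption; the import path and model path are copied from the example above and may differ in your setup (newer releases also ship the class from `@langchain/community/llms/llama_cpp`):

```typescript
// Sketch: token streaming via the Runnable .stream() API, assuming your
// installed langchainjs version exposes it on LLM classes.
import { LlamaCpp } from "langchain/llms/llama_cpp";

const model = new LlamaCpp({
  modelPath: "/Replace/with/path/to/your/model/gguf-llama2-q4_0.bin",
});

// .stream() returns an async iterable of text chunks; forward each piece
// to stdout, a WebSocket, or an HTTP response as it arrives.
const stream = await model.stream("Where do Llamas come from?");
for await (const chunk of stream) {
  process.stdout.write(chunk);
}
```

This avoids global callbacks and composes naturally if you later wrap the model in a chain.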
I hope this helps! If you have any other questions, feel free to ask.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
-
How do I implement streaming responses in LangChain when integrating it with node-llama-cpp? Are there specific features or configurations in node-llama-cpp or LangChain that support and facilitate streaming when the two are used together?
Thank you in advance for your help and support!
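On the second part of the question: node-llama-cpp exposes token-level streaming on its own, independent of LangChain, and that is what the LlamaCpp wrapper builds on. A minimal sketch, assuming the node-llama-cpp v2.x API (`LlamaModel`, `LlamaContext`, `LlamaChatSession`, and the `onToken` option on `prompt()`); the model path is a placeholder and other major versions expose a different interface:

```typescript
// Sketch: native streaming in node-llama-cpp, assuming the v2.x API.
import { LlamaModel, LlamaContext, LlamaChatSession } from "node-llama-cpp";

const model = new LlamaModel({
  modelPath: "/Replace/with/path/to/your/model/gguf-llama2-q4_0.bin",
});
const context = new LlamaContext({ model });
const session = new LlamaChatSession({ context });

// onToken receives batches of token ids as they are generated; decode them
// back to text with the context and write them out immediately.
const answer = await session.prompt("Where do Llamas come from?", {
  onToken(chunk) {
    process.stdout.write(context.decode(chunk));
  },
});
console.log("\nFull answer:", answer);
```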