How to decrease model inference time #22
Comments
As for machine configuration, that's about what you should use. Most configurations run at around the same speed.
Well, I was testing this with 4 A100 GPUs. I now have 8 H100 GPUs that I can use, but I have seen that the model has 28 attention heads, so 8 GPUs cannot be used? My concern is how to use it with 8, then. Also, I have noticed that this works with a specific log format, time and message; if the message contains multiple fields, it won't be able to process them. Any thoughts around this?
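To make that concrete, here is a minimal sketch of the preprocessing I have in mind, with hypothetical field names: extra fields get flattened into the message before the analyzer sees the record.

```python
# Hypothetical multi-field log record; the tool expects only (time, message).
record = {
    "time": "2024-05-01T12:00:00Z",
    "level": "ERROR",
    "service": "auth",
    "message": "login failed for user admin",
}

def flatten(rec: dict) -> tuple[str, str]:
    """Fold every field except the timestamp into one message string."""
    rec = dict(rec)  # don't mutate the caller's record
    time = rec.pop("time")
    message = " ".join(f"{k}={v}" for k, v in rec.items())
    return time, message

print(flatten(record))
# ('2024-05-01T12:00:00Z', 'level=ERROR service=auth message=login failed for user admin')
```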
Same thing. This is the
Can you clarify what you mean by "multiple fields" and "won't be able to process that"? Not sure what you mean exactly.
I tried increasing the parallelism, and below is the error I am getting:
As the error states, the number of attention heads (28) must be divisible by the tensor parallel size, so 8 will not work. Try 4 or 7 instead. As a side note, this is more of a discussion about using vLLM rather than Outlines, so I may need to refer you to another forum.
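For reference, a minimal sketch of how the tensor parallel size is passed to vLLM (the model name is a placeholder); any divisor of the head count, i.e. 1, 2, 4, 7, 14, or 28 here, is valid:

```python
from vllm import LLM

# The 28 attention heads must split evenly across GPUs, so the tensor
# parallel size has to be a divisor of 28; tensor_parallel_size=8 fails
# this check when the engine is constructed.
llm = LLM(
    model="your-org/your-model",  # placeholder for the deployed checkpoint
    tensor_parallel_size=4,       # 4 and 7 both divide 28; 8 does not
)
```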
Try 7. |
Hi Team,
I am trying to use this for my application logs by tweaking the security-prompt a bit.
A few observations/questions that I have:
- I tried switching to the 32B model, and performance was even slower.
- It takes quite a lot of time to run the analysis: I used 100 chunks and it took almost 5 minutes, and if we increase the file size to 5k the analysis takes almost 10 minutes.
- Is there a way to speed up inference so that the analysis runs more quickly (e.g., by batching requests; see the sketch after this list)?
- I am also running this on 4 A100 GPUs. Is there any other machine configuration that would be faster and more accurate?
- Any other suggestions you have to make it work in a faster, more optimized way?
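A minimal sketch of the kind of batching that usually helps here, assuming the tool drives vLLM directly (model name and prompt format are placeholders): submitting all chunk prompts in a single generate() call lets vLLM's continuous batching keep the GPUs busy instead of paying per-chunk overhead.

```python
from vllm import LLM, SamplingParams

# Placeholder model; in practice this is whatever checkpoint the tool loads.
llm = LLM(model="your-org/your-model", tensor_parallel_size=4)
params = SamplingParams(temperature=0.0, max_tokens=256)

# Hypothetical per-chunk prompts built from the log file.
chunks = ["<log chunk 1>", "<log chunk 2>", "<log chunk 3>"]
prompts = [f"Analyze the following logs:\n{chunk}" for chunk in chunks]

# One batched call instead of a loop of single-prompt calls.
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)
```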