Trace the end-to-end time spent on handling GMessages #945

Open
masih opened this issue Apr 16, 2025 · 3 comments

Comments

@masih
Member

masih commented Apr 16, 2025

Add detailed time metrics to trace exactly how much time is spent on each step, starting from sub.Next on the granite topic through to the point where the message is either buffered elsewhere for post-processing or handed to gpbft.

Context
In passive testing at 50% scale, we see "subscriber too slow" logs increase, with the quorum of senders in the QUALITY phase dropping to ~50%. When the buffer size was doubled to 256, the log rate decreased and the quorum of senders in the QUALITY phase increased to ~65%.

We need these metrics to understand what exactly is slow in processing.
We know that a fair chunk of time is spent on fetching the committee (though the tests were run with power override, which significantly reduces the time spent on fetching the committee).
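To illustrate why a larger buffer can reduce drops without making processing any faster, here is a toy model of a bounded buffer fed by bursty arrivals. It assumes that a "subscriber too slow" log corresponds to a message arriving while the buffer is full. The function `simulateDrops`, the burst size of 200 and the drain rate of 25 per tick are made-up values for illustration; only the buffer sizes 128 and 256 relate to the test described above.

```go
package main

import "fmt"

// simulateDrops models a bounded buffer in discrete ticks.
// On each tick, arrivals[t] messages arrive first; any message that
// arrives while the buffer already holds `capacity` messages is dropped.
// The consumer then removes up to drainPerTick messages.
// It returns the total number of dropped messages.
func simulateDrops(capacity int, arrivals []int, drainPerTick int) int {
	queued := 0
	dropped := 0
	for _, n := range arrivals {
		for m := 0; m < n; m++ {
			if queued == capacity {
				dropped++
				continue
			}
			queued++
		}
		if queued < drainPerTick {
			queued = 0
		} else {
			queued -= drainPerTick
		}
	}
	return dropped
}

func main() {
	// Two bursts of 200 messages, separated by quiet ticks (hypothetical load).
	arrivals := []int{200, 0, 0, 0, 0, 0, 0, 0, 200}
	for _, capacity := range []int{128, 256} {
		fmt.Printf("buffer=%d dropped=%d\n", capacity, simulateDrops(capacity, arrivals, 25))
	}
}
```

In this model the larger buffer absorbs the bursts, but a consumer that is slower than the sustained arrival rate would still drop messages at any buffer size, which is why the per-stage timing is needed to find what is actually slow.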

@BigLep
Member

BigLep commented Apr 22, 2025

@BigLep BigLep moved this from Todo to In progress in F3 Apr 22, 2025
@BigLep
Member

BigLep commented Apr 22, 2025

2025-04-22 conversation: the coding is done, but collecting the traces still requires operational work, which is probably another day of effort.

Our guess is this won't affect parameters for activation.

@BigLep BigLep moved this from In progress to Todo in F3 Apr 24, 2025
@BigLep
Member

BigLep commented May 2, 2025

2025-05-02 conversation: deferring because it's clear that most of the time is spent in committee and proposal fetch (sometimes upwards of 10 seconds), and we already have data on those. Optimizing them is Lotus work, and that is where we should spend time rather than on full end-to-end tracing.
