Discussion: Should multiple related announcement types be combined in a single batch schema? #298

Open
wesbiggs opened this issue Mar 6, 2025 · 2 comments

@wesbiggs
Member

wesbiggs commented Mar 6, 2025

To make it simpler for clients to construct threads from batches, we could define a single Parquet schema that covers all the content-related announcement types. Think of each Parquet row as a union of the fields in Broadcast, Reply, Reaction, Update, etc.; many of these fields overlap anyway. Rather than posting separate batches for each of those types, we would get more efficient use of blockspace. Clients would need to change the way they process a batch, but IMO this will make thread construction and maintenance more efficient rather than less.
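To make the "union of fields" idea concrete, here is a minimal sketch of what a combined row and a thread-building pass over a mixed batch might look like. The class name, field names, and type labels are illustrative assumptions, not the DSNP specification, and `in_reply_to` is simplified to hold the parent's content hash directly.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContentRow:
    """One row of a hypothetical combined batch: a union of per-type fields."""
    announcement_type: str                     # "broadcast", "reply", "reaction", "update"
    from_id: int                               # shared by every type
    content_hash: Optional[str] = None         # broadcast, reply, update
    url: Optional[str] = None                  # broadcast, reply, update
    in_reply_to: Optional[str] = None          # reply, reaction
    emoji: Optional[str] = None                # reaction only
    target_content_hash: Optional[str] = None  # update only


def build_threads(rows):
    """Group a mixed batch into threads keyed by the root post's content hash.

    Simplification (assumption): every reply or reaction targets a root
    broadcast, so nested replies are not followed.
    """
    threads = defaultdict(lambda: {"root": None, "replies": [], "reactions": []})
    for row in rows:
        if row.announcement_type == "broadcast":
            threads[row.content_hash]["root"] = row
        elif row.announcement_type == "reply":
            threads[row.in_reply_to]["replies"].append(row)
        elif row.announcement_type == "reaction":
            threads[row.in_reply_to]["reactions"].append(row)
    return dict(threads)


# A single mixed batch: one broadcast, one reply to it, one reaction to it.
batch = [
    ContentRow("broadcast", from_id=1, content_hash="0xaaa", url="https://example.org/a"),
    ContentRow("reply", from_id=2, content_hash="0xbbb", url="https://example.org/b",
               in_reply_to="0xaaa"),
    ContentRow("reaction", from_id=3, emoji="\U0001F44D", in_reply_to="0xaaa"),
]
threads = build_threads(batch)
print(len(threads["0xaaa"]["replies"]), len(threads["0xaaa"]["reactions"]))
```

With separate batches per type, the same grouping would need the client to fetch and join several files before it could assemble a thread; here one pass over one batch is enough. The cost is visible in the row type: most fields are optional.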

Pros:

  • For batch creators, fewer batches (and therefore fewer on-chain transactions)
  • For batch consumers, fewer batches to consume
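A rough way to reason about the batch-count saving is to compare one batch per type against one combined batch for the same publishing interval. The sketch below assumes a hypothetical per-batch row limit (`max_rows_per_batch`) and made-up row counts; neither comes from the spec or from measurements.

```python
import math


def batches_needed(rows_by_type, max_rows_per_batch, combined):
    """Count batches (one on-chain transaction each) for one publishing interval."""
    if combined:
        # All types share batches, so only the total row count matters.
        return math.ceil(sum(rows_by_type.values()) / max_rows_per_batch)
    # Each type with at least one row needs its own batch or batches.
    return sum(math.ceil(n / max_rows_per_batch)
               for n in rows_by_type.values() if n > 0)


low = {"broadcast": 40, "reply": 25, "reaction": 90, "update": 3}
high = {name: count * 1000 for name, count in low.items()}
cap = 1000  # hypothetical per-batch row limit

for label, volume in (("low", low), ("high", high)):
    separate = batches_needed(volume, cap, combined=False)
    merged = batches_needed(volume, cap, combined=True)
    print(label, separate, merged)
```

Under these assumptions the low-volume interval drops from four batches to one, while the high-volume interval needs the same number either way, because every type already fills whole batches on its own.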

Cons:

  • Somewhat more complex schema parsing
  • Arguably less expressive schemas (as many columns would be declared optional)
  • Small increase to bytes per row

Open questions:

  • Which Announcement Types would logically be grouped by this approach?
  • How should legacy data be handled?
@wesbiggs
Member Author

wesbiggs commented Mar 6, 2025

Notes from Community Call 2025-03-06:

  • Consider how Bloom filters would apply; size vs. false positives
  • The original intent was that not all applications would be interested in all announcement types. That applied more to the differences between content announcements, graph announcements, etc., and several of those types have since moved to become DSNP User Data instead.
  • Not all-or-nothing; the question is which types should logically be grouped together.
  • Polymorphic content increases computational complexity (but Parquet has some facilities for automatically handling polymorphism)
  • Benefits to batch producer decrease at high volumes of activity
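On the Bloom filter point, the size versus false-positive trade-off can be estimated with the textbook formulas for a classic Bloom filter. This is only an approximation: Parquet uses split block Bloom filters, whose real false-positive rates differ somewhat, and the item counts below are invented for illustration.

```python
import math


def bloom_size(n_items, false_positive_rate):
    """Return (bits, hash_count) for a classic Bloom filter sized optimally."""
    bits = math.ceil(-n_items * math.log(false_positive_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n_items * math.log(2)))
    return bits, hashes


def false_positive_rate(bits, n_items, hashes):
    """Approximate false-positive rate for a filter of fixed size."""
    return (1 - math.exp(-hashes * n_items / bits)) ** hashes


# Filter sized for 1,000 distinct values at a 1% target (illustrative numbers).
bits, hashes = bloom_size(1_000, 0.01)

# Same filter size, but the combined batch holds four times as many distinct values.
overloaded = false_positive_rate(bits, 4_000, hashes)

# Filter re-sized for the larger combined batch at the same 1% target.
bigger_bits, _ = bloom_size(4_000, 0.01)

print(bits, hashes, round(overloaded, 3), bigger_bits)
```

The takeaway is that a combined batch either needs a proportionally larger filter for the same false-positive target, or accepts a much higher false-positive rate at the old size.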

@enddynayn
Contributor

enddynayn commented Mar 6, 2025

My understanding is that grouping batches together will optimize blockspace usage. Is the current separation significantly impacting blockspace utilization?

2 participants