Add Conn.WriteMulti function to produce to multiple topics/partitions #1094

Open
wants to merge 4 commits into
base: main

stevevls
Contributor

This function removes the (arbitrary) restriction that a kafka.Conn can only write to a single topic+partition. It gives the caller a means to publish batches of messages that span topics and partitions, provided that the batch is sent to a broker that is the partition leader for every topic+partition in it.
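To make the leader constraint concrete, here is a minimal sketch of how a caller might bucket messages by partition leader before sending each bucket over a single connection. It does not use kafka-go or the WriteMulti signature from this PR; `Message`, `TopicPartition`, `GroupByLeader`, and the `leaders` map are hypothetical names for illustration, and the leader map is assumed to come from cluster metadata.

```go
package main

import (
	"fmt"
	"sort"
)

// Message is a hypothetical stand-in for a Kafka message that carries
// its own destination topic and partition.
type Message struct {
	Topic     string
	Partition int
	Value     []byte
}

// TopicPartition identifies one partition of one topic.
type TopicPartition struct {
	Topic     string
	Partition int
}

// GroupByLeader buckets messages by the ID of the broker that leads their
// topic+partition. The leaders map is an assumption: in practice it would
// be built from cluster metadata. Messages whose partition has no known
// leader are returned separately so the caller can decide how to handle them.
func GroupByLeader(msgs []Message, leaders map[TopicPartition]int) (map[int][]Message, []Message) {
	batches := make(map[int][]Message)
	var unrouted []Message
	for _, m := range msgs {
		id, ok := leaders[TopicPartition{Topic: m.Topic, Partition: m.Partition}]
		if !ok {
			unrouted = append(unrouted, m)
			continue
		}
		batches[id] = append(batches[id], m)
	}
	return batches, unrouted
}

func main() {
	// Hypothetical leadership layout: broker 1 leads events/0 and audit/0,
	// broker 2 leads events/1.
	leaders := map[TopicPartition]int{
		{Topic: "events", Partition: 0}: 1,
		{Topic: "events", Partition: 1}: 2,
		{Topic: "audit", Partition: 0}:  1,
	}
	msgs := []Message{
		{Topic: "events", Partition: 0, Value: []byte("a")},
		{Topic: "events", Partition: 1, Value: []byte("b")},
		{Topic: "audit", Partition: 0, Value: []byte("c")},
	}

	batches, unrouted := GroupByLeader(msgs, leaders)

	// Sort broker IDs so the printed order is stable across runs.
	ids := make([]int, 0, len(batches))
	for id := range batches {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	for _, id := range ids {
		fmt.Printf("broker %d: %d message(s) in one batch\n", id, len(batches[id]))
	}
	fmt.Printf("unrouted: %d\n", len(unrouted))
}
```

Each resulting bucket is the kind of batch a multi-topic write could send over one connection to that broker, instead of opening one connection per topic+partition.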

This is something that we've been wanting to leverage from an internal Segment library for a long time. As we've moved to topics with higher partition counts, the limitation of the conn means we need to open a ton of network connections, and we're unable to batch across partitions to reduce network overhead.

Totally open to feedback, esp. with naming. 😄

Also, LMK if the docs on WriteMulti can use more work.

@rhansen2 rhansen2 self-assigned this May 12, 2023
@rhansen2
Collaborator

Thanks for this submission!

We were wondering whether this could be implemented using the protocol package's Conn type, or possibly the RoundTrip functionality, so that we don't have to handle all the API-version-specific code for writing and reading the protocol messages.

@stevevls
Contributor Author

No problem! The short answer is I know I can't use Transport for the use case I was trying to enable. RoundTripper may work, but I'm not sure. This PR was something I coded up on my way out of Segment because I had always wanted our internal stream processing library to be able to do this multi-publish operation. Due to some unfortunate lack of coordination, the primitives in kafka-go 0.4 were designed without consulting the maintainers of said library. That lib interfaces with the Conn type directly in order to build reliable asynchronous delivery. I can't recall the details, but Transport added too much "magic" that got in the way. IIRC, we would have to rewrite our Kafka sink from the ground up, and even then I think there were some unsolvable issues.

I don't have access to the source anymore, but I wrote most of the library's code (I'm being careful not to name it since it's not public). Also on my way out I opened a PR in that project that uses this branch of kafka-go. If you want to try your hand at a non-Conn based multi-produce solution, it would be a pretty big performance win for Segment services.

I'm not really invested in this PR anymore since I've moved on to another company. It's a take it or leave it sort of thing, so no hard feelings if you close it out. I encouraged the team that took ownership of the other library to push the multi-produce feature through, even if it takes another form.

2 participants