Study accuracy and performance penalty in byte size calculation of Event/Batch size #17736

Open
@andsel

Description

In the parent issue #7417 we need to measure the size in bytes of the batch and hence also the size of each event. This poses a couple of problems:

  • what does the size of an event/object mean?
  • should we have a byte-exact value, or is an estimation acceptable?
  • how does the estimation influence accuracy and performance compared to a byte-exact value?

What does the size of an event/object mean?

In the JVM, as in any other language runtime, every object instance needs some space to be managed by the runtime. For example, a plain Object instance takes 16 bytes on a 64-bit CPU and 8 on a 32-bit one (https://shipilev.net/jvm/objects-inside-out/#_observation_32_bit_vms_improve_footprint). So, considering for example a string, what is its byte size? Is it the sequence of bytes that encode the string (call it the payload), or does it also include the memory occupied by the String object instance itself?
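To make the distinction concrete, here is a minimal sketch. The method `estimateShallowStringSize` and all of its constants are illustrative assumptions (a 64-bit HotSpot JVM with compressed oops and compact Latin-1 strings), not exact values for any particular JVM. It contrasts a string's payload size with a rough footprint of the String object plus its backing array:

```java
import java.nio.charset.StandardCharsets;

public class StringFootprint {
    // Hypothetical shallow-size estimate for a compact (Latin-1) String on a
    // 64-bit HotSpot JVM with compressed oops; real numbers vary by JVM
    // version and flags. The per-field costs below are assumptions.
    static long estimateShallowStringSize(String s) {
        long header = 12;             // object header with compressed oops
        long fields = 4 + 4 + 1 + 4;  // value ref, cached hash, coder, flags
        long stringObject = align8(header + fields);
        // backing byte[]: 16-byte array header plus one byte per Latin-1 char
        long backingArray = align8(16 + s.getBytes(StandardCharsets.ISO_8859_1).length);
        return stringObject + backingArray;
    }

    // HotSpot aligns object sizes to 8-byte boundaries
    static long align8(long n) { return (n + 7) & ~7L; }

    public static void main(String[] args) {
        String s = "hello";
        // the "payload": just the bytes that encode the text
        int payload = s.getBytes(StandardCharsets.UTF_8).length;
        System.out.println("payload=" + payload
                + " estimatedFootprint=" + estimateShallowStringSize(s));
    }
}
```

Even in this toy model, the 5-byte payload of "hello" is dominated by object headers, field slots, and alignment padding, which is exactly why the two notions of "size" diverge.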

Should we have a byte-exact value, or is an estimation acceptable?

If we decide to use an approximation of the size, how good is it compared to the byte-exact value? How much does it cost?
Computing the exact byte size of an object isn't always feasible. Even the JOL library, a tool that computes the memory consumption of a single instance/class on a HotSpot JVM, doesn't solve the problem completely: it doesn't calculate the full retained size of an instance, that is, the instance itself plus the full reference graph reachable from it. To do that we would need to navigate the graph, which isn't always easy or feasible (consider private fields in an instance).

Computing the size would require navigating the object graph of an event instance, which is mostly a maps-of-maps structure. Alternatively, as an estimation, we could serialise the event with CBOR and take the size of the resulting byte array as an approximation of the size. How accurate are these approximations?
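The graph-walk idea can be sketched as a recursive estimator over the maps-of-maps structure. All the per-type constants here are made-up placeholders, not measured HotSpot costs; the point is the shape of the traversal, not the numbers:

```java
import java.util.List;
import java.util.Map;

public class EventSizeEstimator {
    // Sketch: walk a maps-of-maps event and sum rough per-value costs.
    // The constants are illustrative assumptions, not HotSpot-exact numbers.
    static long estimate(Object value) {
        if (value == null) return 0;
        if (value instanceof String) return 40 + ((String) value).length();
        if (value instanceof Number || value instanceof Boolean) return 24;
        if (value instanceof Map) {
            long sum = 48; // rough fixed cost of the map itself
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                // per-entry node cost plus the key and value subtrees
                sum += 32 + estimate(e.getKey()) + estimate(e.getValue());
            }
            return sum;
        }
        if (value instanceof List) {
            long sum = 48;
            for (Object o : (List<?>) value) sum += estimate(o);
            return sum;
        }
        return 16; // unknown type: assume a bare object header
    }

    public static void main(String[] args) {
        Map<String, Object> event = Map.of(
                "message", "hello world",
                "meta", Map.of("seq", 42));
        System.out.println("estimated bytes: " + estimate(event));
    }
}
```

Note that this walk only sees the public Map/List structure; shared references, private fields, and container internals are invisible to it, which is precisely the accuracy gap the issue asks about.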

How does the estimation influence accuracy and performance vs byte-exact?

Given that the byte-exact size is difficult to obtain, even from HPROF heap dumps, how do the proposed estimations behave with respect to accuracy and performance?
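One way to start answering the cost question is a rough timing sketch like the one below. It compares a cheap length-based guess against the size of the JDK-serialized form, used here as a stand-in for a serialization-based estimate such as CBOR (which would need an external library). This is an assumption-laden illustration, not a rigorous benchmark (no warmup, no JMH):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

public class EstimationCost {
    // Size of the JDK-serialized form; a proxy for "serialize and measure"
    // approaches such as CBOR. Returns -1 if serialization fails.
    static int serializedSize(Serializable obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            return -1;
        }
        return bos.size();
    }

    public static void main(String[] args) {
        HashMap<String, Object> event = new HashMap<>();
        event.put("message", "hello world");

        long t0 = System.nanoTime();
        int cheap = ((String) event.get("message")).length(); // trivial guess
        long t1 = System.nanoTime();
        int serialized = serializedSize(event);
        long t2 = System.nanoTime();

        System.out.println("cheap=" + cheap + " bytes in " + (t1 - t0) + " ns");
        System.out.println("serialized=" + serialized + " bytes in " + (t2 - t1) + " ns");
    }
}
```

A proper comparison for this issue would run estimators like these on representative Logstash events under JMH, measuring both throughput cost and deviation from a reference size.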
