What can we help you with?
This is intended as a place to discuss what we should monitor to decide when to scale up a Kafka cluster.
Where would you expect to find this information?
This comes from our own experience of needing to scale up our Kafka clusters.
In the normal case, we would simply scale up the cluster when disk usage keeps growing and is about to hit 100%, or when CPU/memory are no longer sufficient to handle the traffic.
In our case, however, one or two brokers sometimes simply stop cleaning up their out-of-date segments. The local disks then fill up quickly with these uncleaned segments, and restarting the brokers "seems" to resolve the issue temporarily. However, this can happen every day and we can't keep relying on restarts, especially since, as traffic continues to grow, there will be more brokers whose local segments aren't cleaned up.
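For context, here is a rough sketch of the kind of check we have been thinking about to catch this before the disk fills up: walk a broker's log directory and flag partitions whose oldest segment is well past retention. The log directory path, the retention value, and the use of file mtime as a proxy for segment age are all assumptions (Kafka itself decides retention from record timestamps), so treat it as a heuristic only.

```python
#!/usr/bin/env python3
"""Heuristic check: flag partitions whose oldest .log segment looks much
older than retention, which may mean the broker stopped cleaning them up.
LOG_DIR and RETENTION_HOURS are assumptions -- adjust to your setup."""
import os
import time

LOG_DIR = "/var/lib/kafka/data"   # assumed log.dirs path on the broker
RETENTION_HOURS = 168             # assumed topic retention (7 days)
SLACK_HOURS = 12                  # grace period before calling it "stuck"

now = time.time()
threshold_secs = (RETENTION_HOURS + SLACK_HOURS) * 3600

for partition in sorted(os.listdir(LOG_DIR)):
    pdir = os.path.join(LOG_DIR, partition)
    if not os.path.isdir(pdir):
        continue
    segments = [f for f in os.listdir(pdir) if f.endswith(".log")]
    if not segments:
        continue
    # mtime is only a rough proxy; Kafka retention is based on record timestamps
    oldest_mtime = min(os.path.getmtime(os.path.join(pdir, s)) for s in segments)
    age_hours = (now - oldest_mtime) / 3600
    if now - oldest_mtime > threshold_secs:
        print(f"{partition}: oldest segment ~{age_hours:.0f}h old "
              f"(retention {RETENTION_HOURS}h) -- possibly not being cleaned up")
```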
Details
After trying various approaches, we found that scaling up the cluster (adding more brokers) greatly alleviates the issue (although we still see partitions with uncleaned segments). That naturally raises the question: when should we scale up? Which indicator can tell us it's time to scale up before we're forced to restart brokers? In our case, CPU and memory usage didn't change much after scaling up, but I/O wait dropped from about 35% to 25%, so that looks like an indicator we could use. Beyond that, we didn't find any other indicator that helped in this case.
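To make the I/O wait observation concrete, here is a minimal sketch of how we could sample it ourselves from /proc/stat on a broker host (in practice this would usually come from node_exporter, sar, or similar). The 30% threshold is just a guess sitting between the ~35% we saw before scaling and the ~25% after; it is not a recommended value.

```python
#!/usr/bin/env python3
"""Minimal sketch: sample system-wide iowait% from /proc/stat every 10s.
The warning threshold is an assumption based on our own before/after numbers."""
import time

IOWAIT_WARN_PCT = 30  # assumed threshold, between our observed 35% (bad) and 25% (ok)

def read_cpu_times():
    # First line of /proc/stat: "cpu  user nice system idle iowait irq softirq ..."
    with open("/proc/stat") as f:
        return [int(v) for v in f.readline().split()[1:]]

prev = read_cpu_times()
while True:
    time.sleep(10)
    cur = read_cpu_times()
    deltas = [c - p for c, p in zip(cur, prev)]
    prev = cur
    total = sum(deltas)
    iowait_pct = 100.0 * deltas[4] / total if total else 0.0  # index 4 = iowait
    flag = "WARN" if iowait_pct > IOWAIT_WARN_PCT else "ok"
    print(f"[{flag}] iowait over last 10s: {iowait_pct:.1f}%")
```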
Discussion
So, based on the description above: if I/O wait really is the indicator we should use, why does a broker sometimes fail to delete segments even when I/O wait is high? What is a normal range for I/O wait that keeps the system healthy? Are there other indicators that could be used here, such as throughput or IOPS? Any ideas are welcome for discussion. Thank you!
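In case IOPS turns out to be a useful signal, here is a similar sketch for deriving per-device IOPS from /proc/diskstats; the device name is an assumption, and in practice the numbers would normally come from iostat or node_exporter.

```python
#!/usr/bin/env python3
"""Sketch: derive read+write IOPS for one block device from /proc/diskstats.
DEVICE is an assumption -- use whatever disk backs the Kafka log dirs."""
import time

DEVICE = "sda"        # assumed data disk
INTERVAL_SECS = 10

def read_io_completions(device):
    with open("/proc/diskstats") as f:
        for line in f:
            parts = line.split()
            if parts[2] == device:
                # field 4 = reads completed, field 8 = writes completed
                return int(parts[3]), int(parts[7])
    raise ValueError(f"device {device!r} not found in /proc/diskstats")

prev_reads, prev_writes = read_io_completions(DEVICE)
while True:
    time.sleep(INTERVAL_SECS)
    reads, writes = read_io_completions(DEVICE)
    iops = ((reads - prev_reads) + (writes - prev_writes)) / INTERVAL_SECS
    prev_reads, prev_writes = reads, writes
    print(f"{DEVICE}: ~{iops:.0f} IOPS over last {INTERVAL_SECS}s")
```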