Can it rebalance partitions? And delete crap from topics? Every time we have to ...

kvakerok · on July 15, 2024

If you have to manually delete data from topics, you are doing Kafka wrong. The whole point of it is high-speed data throughput, so that something automated does it for you downstream.

sauljp · on July 15, 2024

Totally agree

otabdeveloper4 · on July 15, 2024

Seems like a ./kafka-topic shell script is their "something automated".

[Pro tip: kafka is a piece of shit architecture and actually doesn't provide you with anything better.]

cosmic_quanta · on July 15, 2024

> kafka is a piece of shit architecture

Hmmm interesting. I have only seen people rave about it, but haven't used it myself. Why is it shit architecture?

otabdeveloper4 · on July 15, 2024

A message queue has one (1) job: to keep track of the already-processed message pointer.

Which Kafka doesn't do. So you either store everything forever (lol) or you write some sort of broken half-baked solution for a message queue on top of Kafka. (Broken and half-baked because you're not going to achieve fault tolerance or consistency without re-implementing the storage layer.)

Now, you're just gonna say that "Kafka isn't a message queue". Well, I don't need half of a solution that isn't even a message queue. Nobody needs that.

p_l · on July 15, 2024

Then use a message queue, not a log?

Honestly, speaking as a bystander, a lot of Kafka woes probably come from people using it as a message queue when it's not one.

otabdeveloper4 · on July 17, 2024

There is no other use case for Kafka except using it as a message queue. That's by design. It's just an extremely poorly designed piece of software.

p_l · on July 17, 2024

Considering that the core behaviour of a message queue (like RabbitMQ, IBM MQ, etc.) and a streaming log system (like Kafka) is wildly different, I'd rather claim that the very idea of using Kafka as a message queue is the core problem.

You can use a distributed log to approximate a dedicated message router (which is honestly what "message queue" systems actually are: the queue is an artifact of limited capacity, not required behaviour), but such uses are going to be wrong 9 times out of 10.

OTOH, if you want multiple readers to observe the same event stream, including across the time dimension, rather than just receive messages, then message queue systems are going to be the wrong solution and systems like Kafka are going to be good options.

Both have their pros and cons, both have their uses, and both are shit solutions when you need the other.

The more important question is "which of those, if any, you actually need".
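A toy sketch of the queue-vs-log distinction above, assuming nothing about how RabbitMQ or Kafka are actually implemented: a queue hands each message to one receiver and forgets it, while a log keeps every record and lets each reader track its own position, including rewinding. The class and method names (`Queue`, `Log`, `poll`, `seek`) and the reader names are hypothetical, chosen only for this illustration.

```python
from collections import deque


class Queue:
    """Message-queue style: each message is handed to exactly one receiver."""

    def __init__(self):
        self._items = deque()

    def publish(self, msg):
        self._items.append(msg)

    def receive(self):
        # Receiving removes the message; no other reader will ever see it.
        return self._items.popleft() if self._items else None


class Log:
    """Log style: records are kept, and every reader tracks its own offset."""

    def __init__(self):
        self._records = []
        self._positions = {}  # reader name -> next offset to read

    def append(self, msg):
        self._records.append(msg)
        return len(self._records) - 1  # offset of the new record

    def poll(self, reader):
        pos = self._positions.get(reader, 0)
        if pos >= len(self._records):
            return None
        self._positions[reader] = pos + 1
        return self._records[pos]

    def seek(self, reader, offset):
        # Rewinding (or skipping ahead) changes only this reader's position.
        self._positions[reader] = offset


q = Queue()
q.publish("order-1")
q.publish("order-2")
first, second = q.receive(), q.receive()  # "order-1", then "order-2"

log = Log()
log.append("order-1")
log.append("order-2")
billing_first = log.poll("billing")   # "order-1"
audit_first = log.poll("audit")       # also "order-1": same stream, separate reader
log.seek("billing", 0)
billing_replay = log.poll("billing")  # "order-1" again, read "across time"
```

The queue version is the right shape when each message is work to be done once; the log version only pays off when several readers, or the same reader later, need to see the same history.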

sauljp · on July 15, 2024

nobleach · on July 15, 2024

Kafka isn't a message queue. If someone told you that it was, they were wrong. It's an event/stream processor.

sauljp · on July 15, 2024

I do not think it is a shit architecture. I think it is an architecture for some problems, but people insist on using it for use cases it was not built for.

kvakerok · on July 17, 2024

Kafka is a stream processor. If you want a messaging queue, use something like RabbitMQ.

sauljp · on July 15, 2024

Hi! kaskade is more like AKHQ than Cruise Control.

We use Strimzi by default, so we deploy Cruise Control with Kafka, and it takes care of rebalancing the data across the nodes. You can also deploy it without Strimzi.

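Deleting crap is more complicated; it's usually done with kafka-delete-records (this is kind of new, I think). The problem is the offsets: it only deletes records below a given offset, so you can't remove records from the middle of a partition. As a general rule, you should not delete data from topics.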
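For reference, `kafka-delete-records.sh` takes its targets as a JSON file passed via `--offset-json-file`. Below is a minimal Python sketch that builds that file, assuming the `{"partitions": [...], "version": 1}` layout shown in the tool's help text; the topic name, partitions, and offsets are made up. Each entry deletes the records *before* the given offset in that partition, which is why this can only trim a partition from the front.

```python
import json


def delete_records_spec(topic, offsets_by_partition):
    """Build the --offset-json-file payload for kafka-delete-records.sh.

    For each partition, records with offsets *below* the given offset are
    deleted; records at or after it are kept.
    """
    return {
        "partitions": [
            {"topic": topic, "partition": partition, "offset": offset}
            for partition, offset in sorted(offsets_by_partition.items())
        ],
        "version": 1,
    }


# Hypothetical topic and offsets, for illustration only.
spec = delete_records_spec("orders", {0: 1500, 1: 980})
payload = json.dumps(spec, indent=2)

# Save `payload` as delete-records.json, then run:
#   kafka-delete-records.sh --bootstrap-server localhost:9092 \
#       --offset-json-file delete-records.json
print(payload)
```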

rockwotj · on July 14, 2024

> delete crap from topics

You'd have to rewrite the whole topic to do this, right? (Or do some hacks with compaction if you have unique keys.)
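The compaction hack relies on tombstones: you produce a record with the same key and a null value, and compaction eventually drops the earlier records for that key. A small Python simulation of the *end state* only, with hypothetical keys and values; real compaction runs in the background, does not touch the active segment, and keeps tombstones around for `delete.retention.ms` before removing them.

```python
def compacted_view(records):
    """Return what survives compaction of (offset, key, value) records.

    Only the newest record per key is kept, and a value of None (a tombstone)
    removes the key entirely. Offsets are not renumbered, mirroring Kafka,
    where compaction leaves gaps in the offset sequence.
    """
    latest = {}
    for offset, key, value in records:
        latest[key] = (offset, key, value)  # later records overwrite earlier ones
    survivors = [r for r in latest.values() if r[2] is not None]
    return sorted(survivors, key=lambda r: r[0])


# Hypothetical data: "user-2" gets a tombstone to "delete" it.
records = [
    (0, "user-1", "v1"),
    (1, "user-2", "junk"),
    (2, "user-1", "v2"),
    (3, "user-2", None),
]
print(compacted_view(records))  # [(2, 'user-1', 'v2')]
```

This only works if the data you want gone has a key you can target; records without unique keys can't be removed this way.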

bink · on July 14, 2024

Or just advance the offsets for that partition beyond the problematic data, or adjust retention, or both.
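Skipping bad data this way only moves one consumer group's committed offset: the records stay in the partition and other groups still see them. A Python simulation of the idea, with made-up records and hypothetical helper names; the real operation is `kafka-consumer-groups.sh --reset-offsets`, sketched in a comment, and it requires the group to have no active members.

```python
def first_offset_after(records, is_bad, start):
    """Return the first offset at or after `start` whose record is not bad."""
    offset = start
    while offset < len(records) and is_bad(records[offset]):
        offset += 1
    return offset


def consume_from(records, committed_offset):
    """Yield (offset, record) pairs, resuming from a committed offset."""
    for offset in range(committed_offset, len(records)):
        yield offset, records[offset]


# One partition's records (hypothetical); offsets 2 and 3 are garbage.
records = ["ok-0", "ok-1", "garbage", "garbage", "ok-4"]
committed = 2  # the group is stuck on offset 2

new_committed = first_offset_after(records, lambda r: r == "garbage", committed)
print(new_committed)                               # 4
print(list(consume_from(records, new_committed)))  # [(4, 'ok-4')]

# The garbage is still in the partition; only this group skips it. The real
# equivalent (group must have no active members) is roughly:
#   kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
#       --group my-group --topic orders:0 --reset-offsets --to-offset 4 --execute
```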

nick0garvey · on July 14, 2024

There is also the deleteRecords API specifically for this. It's easier than the retention shrink -> increase dance, as it is a single API call and retention does not kick in immediately. The log segment must roll for retention to apply, either due to size or time.

https://kafka.apache.org/11/javadoc/org/apache/kafka/clients...
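A rough simulation of the difference described above, under simplifying assumptions: time-based retention removes only whole closed segments whose newest record is past the cutoff, while a deleteRecords-style call moves the partition's log start offset at once. In the Java client this corresponds to `Admin.deleteRecords` with `RecordsToDelete.beforeOffset(...)`; the `Segment` and `Partition` classes below are hypothetical stand-ins, not broker internals.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    base_offset: int
    timestamps: list = field(default_factory=list)  # one timestamp per record


@dataclass
class Partition:
    segments: list  # the last segment is the active (still-written) one
    log_start_offset: int = 0

    @property
    def log_end_offset(self):
        last = self.segments[-1]
        return last.base_offset + len(last.timestamps)

    def apply_time_retention(self, now_ms, retention_ms):
        # Simplified retention: drop whole closed segments from the front while
        # their newest record is older than the cutoff. The active segment is
        # never dropped here, however old its records are.
        cutoff = now_ms - retention_ms
        while (len(self.segments) > 1
               and max(self.segments[0].timestamps, default=now_ms) < cutoff):
            self.segments.pop(0)
        self.log_start_offset = max(self.log_start_offset, self.segments[0].base_offset)

    def delete_records(self, before_offset):
        # deleteRecords-style: move the log start offset right away. Records
        # below it become unreadable; file cleanup can happen later.
        if before_offset > self.log_end_offset:
            raise ValueError("offset is beyond the end of the log")
        self.log_start_offset = max(self.log_start_offset, before_offset)


# Offsets 0-1 live in a closed segment, offsets 2-3 in the active one.
p = Partition([Segment(0, [1_000, 1_100]), Segment(2, [1_200, 1_300])])

p.apply_time_retention(now_ms=10_000, retention_ms=5_000)
print(p.log_start_offset)  # 2: offsets 2-3 are expired but their segment has not rolled

p.delete_records(4)
print(p.log_start_offset)  # 4: one call, effective immediately
```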