Let’s face it—distributed streaming is an exciting technology that can be leveraged in many ways. Use cases include messaging, log aggregation, distributed tracing, and event sourcing, among others. Distributed streaming can result in significant benefits for companies that choose to use it, but, when not implemented correctly, it can initiate a frustrating technical debt cycle.
How do you know if you’re properly implementing Kafka in your environment? This article will examine a few of the common mistakes that people make when undertaking this process. Avoiding them will help you start off your Kafka implementation the right way.
Kafka Architecture
Kafka has five primary components: producers, brokers, consumers, topics, and ZooKeeper. Kafka producers push data to brokers, which receive it and store it in separate topics (more on this later) so that it can be retrieved by consumers. Consumers fetch the data and act upon it in a variety of ways. ZooKeeper is used to keep track of these components and their activities.
All Kafka messages (whether pushed or pulled) are organized into topics. Any message you want to write must be pushed to a topic, and any message you want to read must be pulled from one.
A producer publishes a message to a topic, and a consumer pulls messages from that topic. Naturally, topic management becomes a key part of deploying and maintaining your Kafka infrastructure. For a more detailed look at Kafka architecture, check out Kafka’s documentation.
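To see these components in action, here is a minimal sketch using the console clients that ship with Kafka. It assumes a small multi-broker cluster; the host addresses and the topic name “test” are hypothetical stand-ins for your own setup:

```sh
# Create a topic, publish to it, then read it back.
bin/kafka-topics.sh --zookeeper localhost:2181 --create \
  --topic test --partitions 3 --replication-factor 2

# Producer: each line typed on stdin is pushed to the topic.
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

# Consumer: pull every message in the topic from the beginning.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic test --from-beginning
```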
If you’re wondering if Kafka is right for you, take a look at your event sources. At present, you may be storing these events in some kind of data lake or data store, such as the Hadoop Distributed File System (HDFS).
Kafka Pitfalls
Here are five common mistakes to avoid when planning to implement Kafka:
1. Keeping Too Much Data
Depending on the requirements in your environment, the amount of data Kafka processes can be overwhelming. Without properly tuned data retention settings, data can be deleted before it is consumed, or pile up until it exhausts your brokers’ disks.
Data retention is particularly important in Kafka because messages remain in topics, taking up disk space on the brokers, until the configured size limit is reached or the retention period elapses. These messages remain even if they have already been consumed.
If the retention period or size limit is set too low, data may be removed from the broker before it is consumed. In this sense, Kafka operates differently from traditional message brokers, which typically delete messages as soon as they are acknowledged.
One family of settings—log.retention.*—manages this in Kafka. Examples include log.retention.bytes, which caps the size a Kafka partition may reach before old data is deleted, and log.retention.hours, log.retention.minutes, and log.retention.ms, which set how long data is kept before it is deleted.
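These broker-wide defaults can also be overridden per topic with the equivalent topic-level settings, retention.ms and retention.bytes. A minimal sketch, assuming a hypothetical topic named events and a ZooKeeper ensemble at localhost:2181 (newer Kafka releases use --bootstrap-server instead):

```sh
# Keep messages on "events" for 7 days, or until a partition reaches
# ~50 GiB, whichever limit is hit first.
bin/kafka-configs.sh --zookeeper localhost:2181 \
  --entity-type topics --entity-name events \
  --alter --add-config retention.ms=604800000,retention.bytes=53687091200
```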
2. Not Balancing Topics
Inevitably, you will find yourself needing to scale Kafka out to meet the demands of your data streams. At this point, managing your topics becomes a balancing act. It’s important to rebalance your topics to reduce resource bottlenecks and maintain storage efficiency.
There are two aspects of topic balancing that need to be addressed: 1) partition leadership balance and 2) partition spread across the Kafka cluster.
Kafka Partition Leadership Balance
On partition leadership: the data in every Kafka partition is replicated across one or more brokers, collectively called a replica set. Each replica set has one broker acting as the leader for that partition, while the other brokers in the set hold the replicas. The leader is responsible for communicating with producers, consumers, and the other replica brokers for data transfer; for this reason, the leader normally carries more load than the replicas.
When a broker goes down, one of the replica brokers assumes leadership for each partition the downed broker led. If this process is not properly managed, the loss of a broker can create a hotspot: one of the remaining brokers ends up leading more than its fair share of Kafka partitions. The same imbalance can appear when the downed broker comes back online.
You can have the Kafka cluster rebalance leadership automatically via the auto.leader.rebalance.enable setting, which periodically moves leadership back to each partition’s preferred replica after a broker failure and recovery. Be warned, though: in the past, this process has been known to cause rebalancing to fail entirely or to require a rolling restart of all brokers. Thankfully, the provided script, kafka-preferred-replica-election.sh, lets you trigger the election manually and avoid these issues.
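A minimal sketch of a manual preferred-replica election, assuming a ZooKeeper ensemble at localhost:2181; run with no JSON file, the script holds the election for every partition in the cluster:

```sh
# Elect the preferred replica as leader for all partitions.
bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181

# Or limit the election to specific partitions; partitions.json is a
# hypothetical file such as {"partitions":[{"topic":"events","partition":0}]}
bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181 \
  --path-to-json-file partitions.json
```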
Kafka Partition Spread across the Cluster
When adding nodes to your cluster, the cluster will not assume any workload automatically for existing topics—only for new ones. Therefore, it’s important to rebalance your existing topics using the kafka-reassign-partitions.sh script. This script generates new metadata that guides the Kafka brokers’ management of partition assignments. The same applies when scaling the cluster down: partitions need to be reassigned away from a broker before that broker is removed from the cluster.
With proper monitoring and execution, these scripts can help rebalance the load more evenly across the cluster.
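The script works in three phases: generate, execute, verify. Here is a sketch, assuming a hypothetical topics.json listing the topics to move and brokers 1 through 4 as the target set:

```sh
# topics.json (hypothetical): {"topics":[{"topic":"events"}],"version":1}
# 1. Generate a candidate assignment spreading partitions over brokers 1-4.
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --topics-to-move-json-file topics.json --broker-list "1,2,3,4" --generate

# 2. Save the proposed assignment to reassignment.json and apply it.
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file reassignment.json --execute

# 3. Poll until every partition reports the reassignment completed.
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file reassignment.json --verify
```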
3. Not Accounting for Long-Term Storage
You may need to store data in Kafka for a long period of time. Kafka stores persistent, checksummed, and replicated data. As a result, it can keep information indefinitely and reprocess it when the proper configuration is established. Keep in mind, however, that a partition’s data cannot be spread across brokers, so a single partition cannot grow beyond the storage capacity of one node. Determine in advance how you want to use the data, since reprocessing it will require a development effort.
Thankfully, long-term storage and data reprocessing aren’t uncommon use cases for Kafka. As mentioned earlier, you can pair Kafka with HDFS or blob storage for additional permanence.
Deploying a reusable reprocessing cluster can also mitigate some of these problems.
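If you do decide to keep a topic in Kafka indefinitely, you can disable time-based deletion outright. A sketch, assuming a hypothetical topic named archive:

```sh
# retention.ms=-1 turns off time-based deletion for this topic; leave
# retention.bytes at its default (-1) so size-based deletion stays off too.
bin/kafka-configs.sh --zookeeper localhost:2181 \
  --entity-type topics --entity-name archive \
  --alter --add-config retention.ms=-1
```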
4. No Disaster Recovery (DR) Plan
A disaster recovery plan is critical for all of your services, and Kafka is no exception. Kafka has a feature called MirrorMaker that lets you consume data from one cluster and copy it to another, retaining data for disaster recovery purposes. Using MirrorMaker, you can also manage the mirror’s data retention separately to reduce costs. With a mirror in place, Kafka consumers and producers can be switched to the mirrored cluster if the main Kafka cluster goes down. However, topic offsets are not replicated between Kafka clusters. You can avoid this issue by embedding unique keys in your messages; these keys point consumers to the last “checkpoint” from which they can resume processing.
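A minimal sketch of running MirrorMaker, assuming two hypothetical property files: source-consumer.properties pointing at the primary cluster and target-producer.properties pointing at the DR cluster:

```sh
# Mirror every topic (".*") from the primary cluster to the DR cluster.
bin/kafka-mirror-maker.sh \
  --consumer.config source-consumer.properties \
  --producer.config target-producer.properties \
  --whitelist ".*"
```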
5. No API Enforcement
No Kafka consumer or producer uses the exact same amount of system resources. In fact, certain applications will use significantly more of a Kafka cluster’s processing power than others.
Kafka has many ways to modify the balance of the system. One way of maintaining balance is by setting quotas for the Kafka producer and Kafka consumer APIs. Quotas ensure that one system doesn’t hog all of the available resources.
By configuring quota enforcement per client, you can ensure that no single client starves the others and that the messages piped through your Kafka cluster keep flowing.
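Quotas are set with kafka-configs.sh and enforced per broker. A sketch, assuming a hypothetical client with client.id reporting-app:

```sh
# Throttle "reporting-app" to ~1 MiB/s produced and ~2 MiB/s consumed,
# per broker, so one heavy client cannot hog the cluster.
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --add-config 'producer_byte_rate=1048576,consumer_byte_rate=2097152' \
  --entity-type clients --entity-name reporting-app
```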
Summary
There’s a lot of buzz around Kafka and what it can do for your organization’s data. When properly implemented, Kafka can be a powerful tool for sending and receiving data across your network.
Improper implementations, such as the ones examined in this article, will not only prevent you from processing data efficiently but may also set off a technical debt cycle. Whether you have Kafka in production now or you’re evaluating it for your next use case, keep these pitfalls in mind as you implement this potentially powerful tool.