Event-Driven Systems

Kafka

When I started working with Kafka, it looked like a message pub/sub system similar to the ones I had worked with before, like SQS and RabbitMQ. But after some reading and hands-on work I realised it is much more than that. Topics, partitions, consumers and consumer groups in Kafka give us more control and coordination over the storage and retrieval of data in real time.

 


Kafka can become a source of truth for all the data in an organisation.

With the use of microservices we are now creating customized views of data, derived from a larger data set and tailored to a particular service solving a particular business problem. Once data is transformed into these specific views, it becomes hard to go back in time and access the original form of the data when something goes wrong. This can lead to data loss and a bad user experience. Kafka can become the source of truth for historical data in these scenarios, and we can replay messages by rewinding the consumer offset, as sketched below.
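
As a rough sketch of what replaying can look like with sarama, the snippet below rewinds a consumer group's committed offset for one partition so that historical messages are delivered again the next time the consumer starts. The broker address, the group name ("orders-service"), the topic ("orders") and the target offset are all placeholder assumptions for illustration.

```go
package main

import (
	"log"

	"github.com/Shopify/sarama"
)

func main() {
	brokers := []string{"localhost:9092"} // assumed broker address
	cfg := sarama.NewConfig()

	client, err := sarama.NewClient(brokers, cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Manage committed offsets for a (hypothetical) consumer group.
	om, err := sarama.NewOffsetManagerFromClient("orders-service", client)
	if err != nil {
		log.Fatal(err)
	}
	defer om.Close()

	pom, err := om.ManagePartition("orders", 0)
	if err != nil {
		log.Fatal(err)
	}
	defer pom.Close()

	// Rewind the committed offset for partition 0 back to offset 0
	// (ResetOffset, unlike MarkOffset, moves the offset backwards),
	// so the group replays the partition from the beginning on restart.
	pom.ResetOffset(0, "replay")
}
```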

Kafka Integration in Microservices

The simplest way to integrate Kafka into your microservice is to use a Kafka client library. For Go, the most commonly used one is Shopify's sarama. We can easily create consumer and producer instances with it by passing configuration parameters like the broker address (URL to access), consumer group, consumer topic (which topic to fetch messages from) and producer topic (which topic to publish messages to).
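
A minimal sketch of that approach with sarama is shown below. The broker address (localhost:9092) and the topic name ("orders") are placeholder assumptions; a real service consuming as part of a consumer group would use sarama.NewConsumerGroup instead of the bare partition consumer used here for brevity.

```go
package main

import (
	"log"

	"github.com/Shopify/sarama"
)

func main() {
	brokers := []string{"localhost:9092"} // placeholder broker address
	cfg := sarama.NewConfig()
	cfg.Producer.Return.Successes = true // required by the sync producer

	// Producer: publish a message to the producer topic.
	producer, err := sarama.NewSyncProducer(brokers, cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	msg := &sarama.ProducerMessage{
		Topic: "orders", // placeholder topic
		Value: sarama.StringEncoder(`{"order_id": 1}`),
	}
	if _, _, err := producer.SendMessage(msg); err != nil {
		log.Fatal(err)
	}

	// Consumer: read messages from one partition of the consumer topic,
	// starting from the oldest retained offset.
	consumer, err := sarama.NewConsumer(brokers, cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer consumer.Close()

	pc, err := consumer.ConsumePartition("orders", 0, sarama.OffsetOldest)
	if err != nil {
		log.Fatal(err)
	}
	defer pc.Close()

	for m := range pc.Messages() {
		log.Printf("offset=%d value=%s", m.Offset, string(m.Value))
	}
}
```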

The above method works well if your architecture is simple and only one or two services consume from and publish to Kafka. But if you are using Kafka as a data pipeline into which many services publish messages and from which many services consume, it is better to use a sidecar container.

A tool that can automate the process of fetching data from the right topic in Kafka, calling a microservice and publishing the result to the relevant topic is Benthos. Benthos is the right tool for this problem, and by using it we move the dependency on Kafka out of the services and into it. Once Benthos is up and running as a sidecar container in the same Pod as the related service container, it listens for messages on the relevant Kafka topic and calls a REST/gRPC endpoint in the service as per its configuration. The service does all the processing needed, and might store some relevant data in its local DB. Once all this is done it can send a success or error message back to Benthos. The service can also return a processed payload, which Benthos publishes to Kafka for downstream services to consume. This forms a clean data processing pipeline that is decoupled and efficient; a sample configuration is sketched below.
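
As a rough illustration, a minimal Benthos sidecar configuration for this pattern might look like the sketch below. The broker address, topic names, consumer group and the service endpoint (http://localhost:8080/process) are all placeholders, and exact field names can vary between Benthos versions, so treat this as a starting point rather than a drop-in config.

```yaml
# Sketch of a Benthos sidecar config: consume from one Kafka topic,
# call the co-located service over HTTP, publish the result to another topic.
input:
  kafka:
    addresses: ["kafka:9092"]          # placeholder broker address
    topics: ["orders"]                 # topic the service consumes from
    consumer_group: "orders-service"   # placeholder consumer group

pipeline:
  processors:
    - http:
        url: http://localhost:8080/process   # placeholder service endpoint in the same Pod
        verb: POST

output:
  kafka:
    addresses: ["kafka:9092"]
    topic: "orders-processed"          # topic downstream services consume from
```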

 


With the above implementation, consuming messages from Kafka and publishing into it stays simple, even in a data pipeline architecture that involves multiple services.