In this post, I am going to talk about Apache Avro, an open-source data serialization system that is being used by tools like Spark, Kafka, and others for big data processing.

What is Apache Avro

According to Wikipedia:

"Avro is a row-oriented remote procedure call and data serialization framework developed within Apache’s Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format. Its primary use is in Apache Hadoop, where it can provide both a serialization format for persistent data, and a wire format for communication between Hadoop nodes, and from client programs to the Hadoop services."

Avro uses a schema…
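To make "JSON for defining data types" and "compact binary format" concrete, here is a minimal sketch that uses only the Python standard library. It does not use the official `avro` or `fastavro` packages. Instead, it hand-encodes a record following the binary encoding rules in the Avro specification: int and long values are zig-zag varints, strings are a length prefix followed by UTF-8 bytes, booleans take one byte, and record fields are written back to back in schema order. The `User` schema and its values are hypothetical, and only a few primitive types are handled.

```python
import json

# Hypothetical example schema. Avro schemas are written in JSON.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "active", "type": "boolean"}
  ]
}
"""


def zigzag_varint(n: int) -> bytes:
    """Encode an Avro int/long: zig-zag map to unsigned, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # small magnitudes (positive or negative) become small numbers
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_long(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a zig-zag varint starting at pos; return (value, next position)."""
    shift, z = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    return (z >> 1) ^ -(z & 1), pos


def encode_value(avro_type: str, value) -> bytes:
    if avro_type in ("int", "long"):
        return zigzag_varint(value)
    if avro_type == "string":
        data = value.encode("utf-8")
        return zigzag_varint(len(data)) + data  # length prefix, then UTF-8 bytes
    if avro_type == "boolean":
        return b"\x01" if value else b"\x00"
    raise NotImplementedError(f"type {avro_type!r} is not handled in this sketch")


def decode_value(avro_type: str, buf: bytes, pos: int):
    if avro_type in ("int", "long"):
        return read_long(buf, pos)
    if avro_type == "string":
        length, pos = read_long(buf, pos)
        return buf[pos:pos + length].decode("utf-8"), pos + length
    if avro_type == "boolean":
        return buf[pos] == 1, pos + 1
    raise NotImplementedError(f"type {avro_type!r} is not handled in this sketch")


def encode_record(schema: dict, record: dict) -> bytes:
    # No field names or type tags are written: only the values, in schema order.
    return b"".join(encode_value(f["type"], record[f["name"]]) for f in schema["fields"])


def decode_record(schema: dict, buf: bytes) -> dict:
    # The reader needs the schema to know which type comes next in the byte stream.
    result, pos = {}, 0
    for field in schema["fields"]:
        result[field["name"]], pos = decode_value(field["type"], buf, pos)
    return result


schema = json.loads(SCHEMA_JSON)
user = {"name": "Ana", "age": 30, "active": True}
encoded = encode_record(schema, user)
print(encoded.hex())  # 06416e613c01
print(len(encoded), "bytes vs", len(json.dumps(user)), "bytes as JSON")  # 6 bytes vs 42 bytes as JSON
print(decode_record(schema, encoded))
```

The payload is small because the schema, not the data, carries the field names and types. The flip side is that a reader cannot interpret the bytes without the schema they were written with.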
Getting started with Apache Avro and Python