I spoke at Big Data Analytics (London) in June this year (https://whitehallmedia.co.uk/bdajun2018/).
Here's a summary of my talk. The theme was old architecture existing alongside the new.
As the event was in Victoria, London, I showed the following digitally merged photographs of how Victoria has changed over the years. I found these here:
This is Victoria Street. The old photograph was taken around the year I was born, and the new one in 2018, just a couple of months ago. The development of SW1 blends old and new architecture, positioning new developments alongside existing sites such as the Neo-Byzantine Westminster Cathedral.
Westminster Cathedral continues to dominate Victoria’s famous skyline. Recent years have seen much-needed office space spring up in Victoria as part of a £2bn makeover.
Some sites haven’t changed much. Below is Lower Grosvenor Place, which backs onto Buckingham Palace Gardens.
…and some places have changed a lot.
Below is Cardinal Place. The Stag Brewery, here on the left, was demolished in 1959. In 2006, the property developer Landsec helped transform the site into Cardinal Place, a shopping and dining destination. It is much more functional than what stood before.
But why am I showing you these pictures? Well, I’m showing you how old ARCHITECTURE can exist alongside the new. And the same is true in BIG DATA architecture…
So, what’s the OLD vs. NEW in data architecture?
On the left, we have the old: Static Data systems, mostly focused on the passive storage of data. Phrases like “data warehouse”, “data lake”, and “data store” all evoke places data goes to sit.
Data tends to move in batches, and by its very nature batch processing can be slow. On the right, we have a Streaming Platform. We are turning the database on its side, or, as some say, inside out.
This view of data architecture is challenging old assumptions.
With big data, the assumption was: the more, the better. We can see here that the value of data is proportional to its volume.
...With Streaming Data, it’s about speed. More recent data can be more valuable, and as data ages, it becomes less valuable.
Thousands of companies are already employing streaming data, using Apache Kafka.
Streaming data architectures have become a central element at many Silicon Valley technology companies.
EVERY message in UBER, NETFLIX, YELP, and PayPal flows through Kafka.
A key point here is, a streaming platform doesn’t have to replace your data warehouse; in fact, quite the opposite: it feeds it data. It acts as a conduit for data to quickly flow into the warehouse environment for long-term retention, ad hoc analysis, and batch processing. The old can live alongside the new…
So, how is streaming used in the real-world?
Let’s return to Victoria. Streaming data can apply to almost any business...
If you look closely at this picture, you can see an Audi… This afternoon I’ll talk about how Audi is streaming data from hundreds of sensors in the car, in real time:
Identifying issues with cars sooner (preventing costly repairs)
Alerting drivers about obstacles in the road so they can save time and avoid accidents
Long-term: provide insight into how drivers are using the cars.
We can also see a cash machine just here, behind all these people… I’m going to talk about how banks are off-loading data from traditional mainframes and running events through an event-streaming platform in order to provide a better customer experience.
And we can see shops - I’ll talk about Retail. In the era of omni-channel retail, consumers are creating more data than ever before. For a retailer, real-time streaming data across the business can help drive insights into customer behaviors and improve business outcomes.
So, how does data streaming work, and why is it different from a traditional data store? (I’ll explain this a little more this afternoon, but in short.) Apache Kafka is an open-source stream-processing software platform, created by the founders of Confluent and now an Apache Software Foundation project.
Whilst Kafka is often categorized as a messaging system (as it serves a similar role), it is much more, and yet the concept behind Kafka is relatively simple: a commit log of updates.
A producer of data sends a stream of records that are appended to this log. Any number of consumers can continually stream these updates off the tail of the log with millisecond latency.
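The append-and-consume model can be sketched as a toy in-memory commit log. This is a deliberate simplification for illustration only (real Kafka persists the log durably across a broker cluster, and the class and method names here are invented for the sketch), but it shows the key idea: producers only ever append, and each consumer tracks its own offset, so any number of consumers can read the same log independently.

```python
class CommitLog:
    """Toy append-only log: producers append records, nothing is ever updated in place."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the newly written record


class Consumer:
    """Each consumer keeps its own position (offset), so many can read the
    same log independently without interfering with one another."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        # Read everything appended since the last poll, then advance the offset.
        batch = self.log.records[self.offset:]
        self.offset = len(self.log.records)
        return batch


log = CommitLog()
log.append({"sensor": "temp", "value": 21.5})
log.append({"sensor": "temp", "value": 21.7})

consumer = Consumer(log)
print(consumer.poll())  # both records so far
log.append({"sensor": "temp", "value": 21.9})
print(consumer.poll())  # only the record appended since the last poll
```

Because consumers pull from the log at their own pace, a slow consumer never blocks a fast one; that decoupling is what lets the same stream feed both real-time applications and a batch warehouse.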
Importantly, Kafka is built as a modern distributed system: it is fault-tolerant, high-throughput, and horizontally scalable, and it supports geographically distributed data streams and stream-processing applications. Data is replicated and partitioned over a cluster of machines that can grow and shrink transparently to the applications using the cluster. Consumers can be scaled out as well, and they automatically adapt to failures in the consuming processes.

Kafka also handles persistence well. A Kafka broker can store many TBs of data, allowing usage patterns that would be impossible in a traditional database. Kafka’s storage layer is essentially a “massively scalable pub/sub message queue architected as a distributed transaction log”.
Thank you… and in closing, I want to say our Vision is for Kafka and the Confluent Platform to act as the central nervous system of the modern company.
We think this technology is changing how data is put to use in companies. We are seeing that streaming data is redefining competition. Those that capitalize on it are creating a new, powerful customer experience, reducing costs, designing for regulatory uncertainty, and lowering risk in real-time.
We think the Confluent Platform is the best place to get started if you are thinking about putting streaming data to use in your organization, whether for a single app or at company-wide scale. Change takes time, as we can see here in Victoria… I will talk this afternoon about how to effect this change… it’s a journey. I’ll explain using the examples of IoT, Finance, and Retail.