Kafka Summit Talk San Francisco

Dec 4, 2018
16 min read

Updated: Sep 24, 2020

This is a rough transcript of the talk I gave at Kafka Summit in San Francisco Pier 27 in October 2018.

________________

Intro blurb... Hello, I’m Lyndon Hedderly, a Director of Customer Solutions at Confluent.

I’m going to talk about the VALUE of Kafka and how to quantify that value in your organization.

Why is this important for you?

I’m sure we’re all here for the love of Kafka - but because this group selected the business case track, I’m hoping you have an equal, or even greater, interest in hard cash...
But seriously, quantifying the value of Kafka, or any technology, is important for three reasons:

You have to understand value in order to extract more value.
This exercise can be useful to justify an investment (an ROI, TCO, Business Case) and
It can be useful for benefits realization, for expanding Kafka in an organisation, and for general good business sense…

A quick question - who has had to create a business case, ROI, or TCO to implement Kafka?

(about 10% of audience raised their hand)

And who has implemented Kafka without a business case?

(about 10% of the audience raised their hand)

_____

Quantifying the monetary value of Kafka isn’t easy... So, I’ll start with a quick story.

In 2006, a couple of guys created the Teehan+Lax investment fund. This was based on a hypothesis: Companies that focus on delivering great user experiences will see it reflected in their stock price.

They invested $50k across 10 companies which in their view offered a great User Experience, had a history of innovation and inspired loyalty in their customer base.

(details here: https://medium.com/habit-of-introspection/the-ux-fund-investing-50-000-in-10-companies-10-years-later-6fc65bd35e7a)

So, what happened? Overall, $50k in 2006 became $306k in 2016, 10 years later.

That’s a 503% gain. The S&P in the same period returned 50%.

$5k in Netflix became >$157k. That’s over a 3,000% gain.

The same data, cut to show percentage gain, shows the following for the UX fund:

- Outperformed the S&P 500 (similar for Nasdaq & Dow Jones). Total S&P Return over same period was c.50%

- Blackberry – lost money. Netflix provided the biggest gain.

What is interesting here is whilst this study was about Customer Experience - it could also be about Data… or using data to drive CX. You could argue Netflix was leading the pack in terms of being data-driven – segmenting users, recommending content based on algorithms that help understand users.

And - more interestingly - when I look at this study, I can see the top performers are all users of Kafka.

Netflix is obviously a digitally-native business. Or digital first. Everything is an event. It is using Kafka at massive scale:

They run 50+ Kafka clusters, with 4000+ brokers, processing an astonishing 2+ trillion messages every single day.

Can we use this to draw a correlation between Kafka and Value? Well… Netflix has been going for a while (since ‘97) a lot longer than Kafka (2011)... but, let’s take a closer look…

The Teehan & Lax study ran 2006 to 2016… When I last looked at Netflix, the share price was about $380. Had you invested $10k in Netflix shortly after Kafka was created (at about $8 a share, so 1,250 shares), it’d be worth almost half a million dollars now ($475k). That’s 47.5 times the original stake, or a 4,650% increase!

Since 2015, when Netflix widely published that it was using Kafka, it has grown nearly 8 times (from $49 to $390 per share).

I’m not sure that value increase and the use of Kafka is a coincidence :-)

Perhaps, just as Teehan & Lax identified CX as an indicator of superior performance, we can identify being data-driven, or event-driven for an investment fund? Maybe - just maybe - we should set-up an event-driven-architecture (EDA) investment fund?

And whilst we’re looking at value - why don’t we look at the most valuable company in the world; Apple.

They gave a keynote at Kafka Summit London - and the first slide said: KAFKA IS EVERYWHERE...

This is the first company to reach $1 trillion (except PetroChina - but let’s ignore that one).

I bought an old car for about $10k twenty years ago. Had I bought Apple stock, it’d be worth $2m now.

So again - using Kafka, and value - is this a coincidence?

OK - I admit, this is slightly tongue-in-cheek. I’ve used selection bias (Netflix & Apple) to prove a point - and common sense suggests there’s more to these company valuations than using Kafka. But I genuinely think there’s a link between Kafka and Value and I want to explore this, in order to identify the value of Kafka.

And why is identifying the value of Kafka important? Well… it’s useful to model value - as a business case to justify an investment, or as a benefits realization during a project or after one completes.

When I started my career, you tended to have to start with a comprehensive and detailed business case - before lifting a technical finger. Indeed, this is where I’ve spent a lot of my career - creating ROI, TCO and Business Case models. And understanding where value comes from can help us realise more value.

So, where to start? We need to get to a bit more detail before being so bullish about the value of Kafka… We can start with the value of DATA. I’m sure at this point, everyone in the room is thinking - this is obvious. We know data-driven companies perform better.

The phrase “Data is the new oil” was coined in 2006 by Clive Humby, a UK mathematician and architect of Tesco's Clubcard. That is the same year the Teehan & Lax study started.

The Teehan & Lax study set out to prove companies that focused on excellent Customer Experience outperform those that didn’t.

→ There’s now a lot of evidence to support this (we don’t have time to go into all that now).

Let’s take a quick look at data-driven companies to demonstrate this point.

This is a list of the top 5 publicly traded companies (by market cap).

We can see in 2001, 2006 and 2011 there was 1 tech company in the top 5.

If we fast forward to 2016, all of the top 5 companies by market cap are tech, or data driven, companies.

But how do these tech, or data-driven - or platform - companies - make their money? Or, how does data contribute to this value?

Well, this shows the revenue streams of the 5 big tech companies.

Apple, Amazon and Microsoft are Product companies

Apple is a hardware company - making money mostly from the iPhone - but is a heavy user of data - and increasingly so (we’ll come back to this).
Amazon is mostly an eCommerce business but again, uses data to great effect to maximize CUSTOMER EXPERIENCE (CX). It’s also expanding heavily with AWS and Media (spending second only to Netflix!) - using data.
Alphabet (Google) and Facebook are effectively online advertising companies. They’re data driven for personalised and contextualized targeted advertising. That’s why they’re dominating the ad market.
Microsoft is arguably the laggard in this group, making money from software products.

Can we dig further into their data and get an idea of how to value data? Well, let’s take three examples...

Facebook had a market cap of $438bn (when I last looked in 2018). It had 2.23 billion monthly active users. That works out at roughly $200 per user. We can see how many ads are sent to a user and quantify the data driving the ads.
Microsoft bought LinkedIn for $26bn in 2016: LinkedIn is also interesting - as it is here where Kafka was created. By acquiring the world's largest professional social network, Microsoft bought the data from more than 433 million LinkedIn members. This equates to $60 per user. Again, can we quantify the value of the data based on typical data of user profiles etc?
When Facebook bought WhatsApp for $18.4bn, it worked out at $42 per address book.
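The per-user figures above are simple divisions, and a minimal sketch makes the arithmetic explicit. It uses only the Facebook and LinkedIn numbers quoted in the talk; the WhatsApp case is left out because the talk does not state the user count behind the $42 figure. The function and variable names are illustrative only.

```python
# Price (USD) and user count as quoted in the talk; a 2018 snapshot, not live data.
deals = {
    "Facebook (market cap)": (438e9, 2.23e9),
    "LinkedIn (Microsoft acquisition)": (26e9, 433e6),
}


def value_per_user(price_usd, users):
    """Naive valuation: total price divided evenly across all users."""
    return price_usd / users


per_user = {name: value_per_user(price, users) for name, (price, users) in deals.items()}

for name, value in per_user.items():
    print(f"{name}: about ${value:,.0f} per user")
```

Facebook comes out at about $196 per user, which the talk rounds to $200, and LinkedIn at about $60. As the talk goes on to argue, this is only a starting point: it says nothing about which data, or which use of the data, creates the value.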

Can we use this information to place a dollar value on the data? Well... we know it’s not that easy… You can’t really extract the data (bits and bytes) - and easily draw a dollar value… We instinctively know there’s a lot more to the value of these companies - and their data.

But extracting value from data is notoriously difficult. A lot of companies look at the value from data as ANALYTICS. See my cross-post here about McKinsey. All their literature on the Value of data is really on the value of analytics of that data. But this is misleading in my opinion.

Let me explain with an example.

Let’s take an everyday item using data: Let’s say you have an Apple Watch - what is that worth? A few hundred dollars?

The new Apple Watch is packed with a slew of sensors.

Let’s say the fitness information encourages you to run 5k three times a week; Now what’s it worth?
Also, some health insurers are offering a better deal, based on your tracking data. Now what’s it worth? The difference in insurance premiums?
And the Apple Watch can administer a medically accurate electrocardiogram. Let’s say the data gives you an early warning for a serious health issue; Now the value of the watch - and the data - is literally life and death.

The bottom line: the value of data, and of how you work with data, is highly relative and situationally specific… and time-sensitive.

So, how do we disentangle the business and Technology - to understand the value of the data? And then Kafka?

The answer? You can’t.

Instead, business and technology are like the yin and yang - they are complementary, interconnected, and interdependent...

We come to realise it makes little sense to separate business & technology (/ data) in order to assess value.

And this is my point 2 (point 1 was that Kafka has a value, and I want to quantify it): it’s difficult to extricate this from business value. And this is what I think puts most people off this endeavour of quantifying the value of data, and of Kafka.

BUT - let’s not give up.

So, back to the “data is the new oil” statement.

When Clive Humby coined the term, he went on to say: “Like oil, data is valuable, but if unrefined it cannot really be used.”

And this brings me to point 3. This statement is interesting because - if we put a value on, say an oil pipeline, the pipeline starts to assume the value of the oil it is carrying - and where it is carrying it.

In other words, if a pipeline was transferring $1m of oil a day - and it goes down, you could say the pipeline is worth $1m / day.

Also, the oil price, or value, may vary depending on the pipeline. And to complicate further, there are external factors which will impact the value or price, such as the dependence of critical infrastructure. Valuing infrastructure - not just the oil, or data, is a complex business.

The ability to move the oil becomes almost as valuable as the oil itself. And it is not just the data that is important, it is how it is moved around - and how timely it is.

Let’s pause. For a recap. The three main points so far are:

High Value organizations use Kafka - is there a link? And can this link help us in valuing Kafka? We know it’s more complicated than that...
We can look at the Value of Data - in order to value Kafka… e.g. how does data help drive the top 5 companies by Market Cap? - but it’s still not that easy…
And we can see it’s not just the data itself that is important - it is how it is moved around - and how timely it is.

________

So for section 2, where to start if we’re trying to assess the value of Kafka?

What we’re going to speak about today… is how we can quantify the value of Kafka in this context… (transition point)

I guess half the audience will see a duck here - and half will see a rabbit.

I’m showing this to demonstrate the paradigm shift … Kuhn used the duck-rabbit optical illusion, made famous by Wittgenstein, to demonstrate the way in which a paradigm shift could cause one to see the same information in an entirely different way. Whilst the term paradigm shift is probably overused, it is relevant here.

When we talk Kafka, some people will think of a Messaging, or ETL type tool.
Others will think Big Data that’s fast, like fast Hadoop.
Others may think of data integration. Something like an ESB.

I want you to think in terms of Event Processing. Kafka is unique as it processes events in real-time.

The key here is to assess value like a controlled experiment. Think of Kafka as the experimental group; the control is no Kafka. Instead of relying on observation, as you would in a real controlled experiment, rely on estimates and assumptions… But start with the business use-case.

This event-driven-architecture enables use cases that might have been extremely difficult, or impossible before.

We know from the previous section it is hard to separate the business from the tech, so, let’s look at specific business use-cases - where Kafka plays an important role.

One of the challenges here, is Kafka can be used for so many different things… and this puts people off - but we can start to break this problem down…

The good news is - we can start to group these use-cases...

We can tie the use cases to → Strategic Objectives

And then tie the Strategic Objectives → to Key business drivers
And → we get back to Business Value.

Value here is shown as one of three things: either making money, saving money or protecting money.

Many of the use cases here - need to happen in real time; interacting with customers, feeding machine learning algorithms to make predictions, preventing fraud.

These all require data, but they also require timely AND reliable data.

We can see with Big Data value was proportional to VOLUME. The greater the volume, the higher the value.

But with Kafka use-cases we can also see fast data is more valuable; more recent data is more valuable.

If LinkedIn or Whatsapp had serious delays in their data - and customer experience - would they have had the same valuations? Or would they even have reached any valuation? If these applications were not operating in real-time with millisecond latency, would they have existed at all? How do we determine that?

So, let’s try this for three examples…

A Retailer - implementing Customer 360
A Bank - looking to increase operational efficiency - and decrease costs
And a payments provider looking to protect money by implementing a Fraud detection system.

I’m going to run through these quickly…

First, let’s assume we’re a Retail organisation for example - selling on-line and in-store, seeking to improve CX - with better Customer 360.

Let’s assume, we are able to offer real-time personalized and contextualized offers - both in store at point of sale, and on-line, enabled with a Kafka streaming platform.

There’s lots of data which suggests improved CX drives increased revenue.

This is just one example: an Accenture Forrester study, broken down by industry, which shows that a one-point improvement in CX index scores results in significant incremental revenue per customer. When multiplied by the number of customers, that equates to massive benefits, as we can see here.

So, let’s model as-is revenue - without the Personalized Offers at point of sale.

We can also look at on-going IT costs supporting the current system around customer 360.

And we can model the cumulative net revenue over 3 years - in this case $57m

If we implement a streaming platform (a Kafka based 360 - for personalised offers), then we can model the target state revenue + increase in IT system costs. This includes a set-up or project cost.

Note that we could get Reduced on-going System costs - through increased agility / ease of use of a Kafka platform - for integration into other Retail systems…

= This could show a double benefit - kicking in in say, H2 Yr 1.

We can then model the Target State cumulative net revenue (i.e. minus system costs) over 3 years - we can see here we’ve modeled about $70m.

When we compare the baseline (without Kafka) with the Target State (with Kafka), we can see the Target State costs more initially, over Set-up and Yr 1.

There is a break even end Yr 1.

Total benefit of the Kafka based solution = $13.5m. (incl. increased revenue & decreased costs)

Even after the project set-up costs (and taking into account the varying IT costs...)
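The as-is vs. target comparison can be sketched in a few lines. This is a minimal illustration, not the actual model behind the slides: the $57m baseline and $13.5m benefit are the figures quoted in the talk, while the target of $70.5m is an assumption implied by adding the two (the talk rounds it to about $70m). The `compare` helper is a hypothetical name.

```python
def compare(baseline, target):
    """Return (absolute difference, difference as a fraction of the baseline)."""
    difference = target - baseline
    return difference, difference / baseline


baseline_net_revenue_m = 57.0  # as-is, 3-year cumulative net revenue ($m), from the talk
total_benefit_m = 13.5         # total benefit quoted in the talk ($m)

# Assumption: the target state is the baseline plus the quoted benefit.
target_net_revenue_m = baseline_net_revenue_m + total_benefit_m

benefit_m, uplift = compare(baseline_net_revenue_m, target_net_revenue_m)
print(f"Benefit: ${benefit_m:.1f}m, uplift on baseline: {uplift:.0%}")
```

The uplift works out at roughly 24% of the baseline. In a real business case the baseline and target would each be built up period by period from revenue and IT cost estimates, rather than taken as single totals.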

The question is, what proportion of this ‘Value’ is related to Kafka vs. the entire Customer 360 solution?

That’s debatable. So, let’s come back to that…

Let’s take another example: Digital Re-platforming in a Bank… leading to Decreasing IT operational costs.

Replacing some middleware and off-loading data from the Mainframe… reducing MIPS costs.

Assume we can Off-load 50% of the read-only MIPS from the (DB2) mainframe, reducing MIPS costs by 30-40%

Reducing mainframe data / events (MIPS) could reduce costs by approx. €1.2 to 2 million / yr (rough estimate).

Here I’m using a real scenario I actually roughly modelled for a customer.

A bank looking at Mainframe off-loading… This time the business case is over 5 not 3 years.

We can see the cumulative total costs on same graph - based on the right hand second axis.

Comparing the two… We can see the Target State, whilst more expensive initially (to implement the solution), has a series of savings against the current state.

But overall break even is end Yr 1, start Yr 2, and we can see a $14m difference, or 22% saving, over 5 years with the Target State.
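As a rough sanity check, the two quoted figures ($14m and 22%) imply the size of the 5-year cost base being compared. The sketch below only back-solves from those two numbers; the implied totals are derived values, not figures stated in the talk, and the variable names are illustrative.

```python
saving_m = 14.0         # 5-year saving quoted in the talk ($m)
saving_fraction = 0.22  # the same saving as a fraction of as-is cost

# If saving = fraction * as-is cost, then as-is cost = saving / fraction.
implied_as_is_cost_m = saving_m / saving_fraction
implied_target_cost_m = implied_as_is_cost_m - saving_m

print(f"Implied as-is 5-year cost:  ${implied_as_is_cost_m:.1f}m")
print(f"Implied target 5-year cost: ${implied_target_cost_m:.1f}m")
```

So the comparison is between roughly $64m of cumulative cost in the current state and roughly $50m in the Target State.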

And we can drill down here to the sub-categories to see which actually decreased.

30% reduction in on-going Ops (see Nordea and RBC case studies)
20% saving in HW costs - Mainframe off-load (see RBC case study)
20% SW costs - removing reliance on existing ETL / Message Bus licensed technology (see Nordea & RBC case study)
23% incident risk costs

Finally, let’s look at a Risk Mitigation use case - implementing Fraud Detection use-case.

Here’s a quick joke - but online Fraud is a very serious issue.

We had a client from one of our banking customers talk about millions of dollars lost each month, through on-line fraud.

We might lose a certain amount through fraud each period; let’s say $1m / month, which is $12m / Yr, or $36m over 3 years.

And we can model the costs of implementing a Fraud prevention system - based on Kafka (real-time insights) - and we can see reduced loss through Fraud.

When we compare the two, we can see massive loss prevention.

Difference $18,000,000 or 50%
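The fraud figures follow directly from the monthly loss. The sketch below reproduces them under two stated assumptions: the 50% reduction quoted in the talk applies evenly from the first month, and the cost of the fraud prevention system is already netted out of that percentage. The `cumulative_loss` helper is a hypothetical name.

```python
def cumulative_loss(monthly_loss_m, months, reduction=0.0):
    """Total fraud loss ($m) over a period, after an assumed fractional reduction."""
    return monthly_loss_m * months * (1.0 - reduction)


months = 36  # 3 years

as_is_loss_m = cumulative_loss(1.0, months)                  # no fraud detection
target_loss_m = cumulative_loss(1.0, months, reduction=0.5)  # assumed 50% reduction
prevented_m = as_is_loss_m - target_loss_m

print(f"As-is loss: ${as_is_loss_m:.0f}m, target loss: ${target_loss_m:.0f}m, "
      f"prevented: ${prevented_m:.0f}m")
```

A fuller model would ramp the reduction up over time and subtract set-up and running costs explicitly, which would move the break-even point later.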

Wells Fargo also talked to us about 2 additional benefits beyond the reduced level of fraud (loss prevention): 1) customer retention improves and 2) insurance premiums go down.

These examples are meant to be illustrative only. I want to show that we can quantify a value of a business use case - which Kafka supports… and ideally we would not disentangle the business from the infrastructure. Based on this, we can infer a value of Kafka.

We can see the Customer 360 (Retail) - had a 3 Year net difference: $13.5m or 24% to revenue
The Bank saved $14m or 22% over 5 years
The payments provider prevented a net loss of $18m or 50% net saving

But clearly, there is more to the business benefits than the infrastructure layer alone… We cannot claim Kafka has a value of $13m, or $14m or $18m… - but we can say Kafka supports delivering this value.

And that’s the key point. But - we don’t need to stop there… We can go further. We can isolate specific elements of Kafka and claim value. In the same way.

Let me talk through a quick example of this, in the extreme. Let’s assume Kafka is already in use - but we want to assess the value of Kafka within the Confluent Platform, an Enterprise grade version of Kafka. Well, Forrester recently completed a Total Economic Impact (TEI) report on the core benefits of the Confluent Platform - isolating this difference, where Kafka is in both scenarios.

Forrester imagines an organization, based on a combination of some of our real customers, and suggests a $ number of the benefit of the Confluent Enterprise Platform (CEP).

Forrester states that this offers reduced developer and management costs and accelerated business enablement, over and above Core Kafka Open Source.

They state that CP provides reduced developer and management costs, amounting to $2.4m.

They also state that CP accelerated business enablement, to the tune of $3.8m.

When we take into account the costs, we can see a total significant benefit.

The point here is we can work out, not only the value of Kafka - but the value of CP - an Enterprise version of Kafka - by isolating the variables we want to test.

To help frame these value conversations / approaches to assessing value, I’ve created a 5 step process.

Let’s pause. For a recap. What are the key takeaways for this audience?

Whilst difficult to value Kafka exactly, we can quantify the value of a business use-case - by approaching valuation like a ‘controlled experiment’ - modelling as-is vs. target (with Kafka) - and then ‘infer’ the value of Kafka…
This is OK - but we have to accept there’s more to the business use-case than Kafka alone. So, Kafka can only take part of the credit…
We can also dig deeper - and quantify the operational efficiencies and agility, to derive value.
The example here is the Forrester TEI, which quantified the value of CP over and above Kafka.

But that’s not all… I want to close with some thoughts on the value of Kafka… and a few observations…

At the start of this presentation I talked about why identifying the value of Kafka is important.

→ it’s useful to model value - as a business case to justify an investment, or as a benefits realization during a project or after one completes.

But what I have seen with the advent of Digital is this is no longer the case… Organizations don’t need hefty business cases. They often start iteratively, playing around with tech. Kafka is Open Source - and that’s often the route to market - developers or Engineers play with it - and it enters an organisation in stage 1 of our adoption curve - developer interest. So, having spent a large part of my career on detailed business cases, I find they’re less and less significant. In other words, it is useful to quantify the value of Kafka - but not essential. It is building its own momentum - and creating an inherent value.

However… perhaps more interestingly.

I also opened up with a story of an investment fund… I wonder whether there’s a case for an EDA - Event Driven Architecture - investment fund.

Where we identify businesses that are innovative in their use of Events - and Streaming Platforms. Will they outperform more traditional businesses relying on legacy infrastructure?

And if we invest in these businesses - will we be able to see the value of Kafka in this way?

Perhaps if we invest wisely, we’ll be able to see returns like we saw with Apple and Netflix… Perhaps we could go and buy this beach house.

Maybe whilst Kafka is extremely valuable, to put a $ figure on this is missing the point. It’d be a bit like taking this beach house - and trying to put a separate value on the foundations.

They are essential to the house. They don’t have a separate value. Without them, the house would fail.
And I would argue Kafka will become a central foundation to the modern business.

Thank you.

I hope this was insightful in terms of the value of Kafka and aiming to quantify value in your organization.
