Everything you need to know about Reactive Streams and Lightbend

There are a lot of initiatives that aim to improve workflows for developers, but only some actually make it to a living and breathing concept. One of them is Reactive Streams.

Its goal is to provide a standard for asynchronous stream processing with non-blocking back pressure. Flow-wise, it allows the creation of many implementations, all of which preserve the benefits and characteristics of async programming across the whole processing graph of a stream application.

Let’s see what it means and how a huge project like this even gets started.

The Building Blocks of Reactive Streams

The concept was first ignited by Lightbend in 2013, and other companies soon joined in pursuit of a new interoperable standard that can work with any language. After a bit over a year of incubation as Reactive Streams, the interfaces/contracts have been included verbatim in the JDK and are scheduled to arrive in JDK 9, which is currently planned for July 2017. In the meantime, some third-party libraries already implement the Reactive Streams interfaces, so you can use them, with their exact semantics, before they ship in the JDK.

But what’s a Reactive Stream and how do you actually use it? To answer this question, we had a chat with Konrad Malawski, senior developer on Lightbend’s core Akka team. Konrad has contributed to various projects in Akka, including Streams, and implemented the technology compatibility kit (TCK) for Reactive Streams.

From Reactive Streams to Akka Streams

The idea behind Reactive Streams started when Lightbend wanted to set up an industry-wide collaboration for solving back-pressured asynchronous stream processing. Back pressure describes what happens when the incoming task rate is higher than the system’s ability to process it: unhandled data builds up in a buffer. In a back-pressured stream, the consumer signals how much data it can accept, so the producer slows down instead of overflowing it.
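
To make the idea concrete, here is a minimal sketch built only on the java.util.concurrent.Flow interfaces and the SubmissionPublisher class that JDK 9 provides. The names BackPressureSketch, OneAtATimeSubscriber and run are made up for this illustration and do not come from Lightbend or Akka. The subscriber asks for one element at a time, so the publisher never hands it more than it has requested.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

public class BackPressureSketch {

    /** Hypothetical consumer that signals demand for one element at a time. */
    static final class OneAtATimeSubscriber implements Flow.Subscriber<Integer> {
        final List<Integer> received = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);              // initial demand: "I can take one element"
        }

        @Override public void onNext(Integer item) {
            received.add(item);        // handle the element...
            subscription.request(1);   // ...and only then ask for the next one
        }

        @Override public void onError(Throwable t) { done.countDown(); }

        @Override public void onComplete() { done.countDown(); }
    }

    /** Publishes 1..count and returns what the subscriber received. */
    public static List<Integer> run(int count) {
        OneAtATimeSubscriber subscriber = new OneAtATimeSubscriber();
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            for (int i = 1; i <= count; i++) {
                publisher.submit(i);   // blocks when the subscriber's buffer is full
            }
        }                              // close() leads to onComplete after buffered items
        try {
            subscriber.done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return subscriber.received;
    }

    public static void main(String[] args) {
        System.out.println(run(5));    // prints [1, 2, 3, 4, 5]
    }
}
```

The important part is the pair of request(1) calls: demand travels upstream from the consumer, and data only travels downstream in response to it. A real subscriber would usually request larger batches to reduce signalling overhead.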

The company wanted to introduce modern reactive systems next to legacy systems, which often could not cope with the high throughput the greenfield projects would require.

For example, if an application that produces data streams were rewritten using reactive technology, it would become much faster than the applications consuming its data. That’s great, but the high volume of data might make the system unstable, which in turn might affect the data or the end users.

On a smaller scale the same thing happens within any asynchronous system as well, which is what most, if not all, reactive applications are.

Enter Reactive Streams. It gives developers a well-throttled (back-pressured) flow of data throughout such systems. It gained popularity, and since it became a standard, various independent libraries speak the same semantics and can seamlessly connect to each other.

In case you’re wondering what the driving forces behind Akka Streams are, you can read our interview with Dr. Roland Kuhn, former Akka Tech Lead at Lightbend.

Specification to Implementation

Konrad states that in Akka Streams, Lightbend chose to lay the foundations for future monitoring and debugging utilities up front. It’s important to think of the next steps when implementing Streams, because of the effort and “pain” of adding such tooling afterwards.

“This decision came from our experience with implementing Actor based architectures, where we learned that their performance (multiple millions of messages per second) is usually vastly superior to their non-reactive counterparts.”

“However, the actual difficulty of asynchronous distributed systems lies with understanding and operationalizing these systems. So while we’re still focused on the performance aspect, we’re about to work on superior ‘understandability’ tooling, that with other libraries would simply not be possible to “slap on” as an afterthought.”

Konrad has been involved in the standard as well as in the implementation of Akka Streams, which provides various operations (like filter, map, mapConcat, balance, merge, route) as well as a collection of connectors (codenamed Alpakka) to external systems such as Kafka, Cassandra, SQL databases, JMS message queues and more.
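
For readers who have not seen a stream operator before, the sketch below shows roughly what a map stage does. It is not Akka Streams code and does not use Akka’s API; it is a JDK-only approximation on top of java.util.concurrent.Flow, and the names MapStageSketch, MapStage, Collector and run are invented for this example. The stage subscribes to an upstream publisher, transforms each element, and re-publishes it to its own downstream subscribers.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

public class MapStageSketch {

    /** Hypothetical "map" stage: a Subscriber upstream, a Publisher downstream. */
    static final class MapStage<T, R> extends SubmissionPublisher<R>
            implements Flow.Processor<T, R> {
        private final Function<? super T, ? extends R> fn;
        private Flow.Subscription upstream;

        MapStage(Function<? super T, ? extends R> fn) { this.fn = fn; }

        @Override public void onSubscribe(Flow.Subscription s) {
            upstream = s;
            s.request(1);                  // pull the first element
        }

        @Override public void onNext(T item) {
            submit(fn.apply(item));        // transform and pass downstream
            upstream.request(1);           // then pull the next element
        }

        @Override public void onError(Throwable t) { closeExceptionally(t); }

        @Override public void onComplete() { close(); }
    }

    /** Terminal subscriber that simply collects everything it is given. */
    static final class Collector<T> implements Flow.Subscriber<T> {
        final List<T> items = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);

        @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
        @Override public void onNext(T item) { items.add(item); }
        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    /** Wires source -> map(x * 10) -> collector and returns the collected items. */
    public static List<Integer> run(List<Integer> input) {
        Collector<Integer> sink = new Collector<>();
        MapStage<Integer, Integer> timesTen = new MapStage<>(x -> x * 10);
        try (SubmissionPublisher<Integer> source = new SubmissionPublisher<>()) {
            timesTen.subscribe(sink);      // connect downstream first...
            source.subscribe(timesTen);    // ...then upstream, so nothing is dropped
            input.forEach(source::submit);
        }                                  // closing the source completes the chain
        try {
            sink.done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sink.items;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1, 2, 3)));   // prints [10, 20, 30]
    }
}
```

A filter stage would look the same, except that onNext would only call submit for elements that pass a predicate. Libraries such as Akka Streams package stages like this behind a fluent DSL, so application code rarely implements a Processor by hand.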

He adds that: “In a way you can look from the perspective of attempting to fill the gap that Camel has left open in the integration space – reactive, high-performance, asynchronous integration between systems. We also use them to implement our HTTP server and remoting infrastructure.”

The Future of Streams

Lightbend is now in a process of exponential growth, in which the awareness of message-driven architectures is increasing. Currently, the company is implementing HTTP/2 inside a fully Reactive HTTP server (Akka HTTP), and improving the performance of the remoting subsystem, pushing 700k messages per second on a typical EC2 environment.

According to Konrad, “This combination of leading HTTP support across our tools and insanely fast communication inside the cluster, allows us to build applications in ways that were not possible before.”

He also adds that “With high throughput and low latency of within-cluster messaging, we’re able to spend a few hops between servers, where otherwise you would not be able to afford that simply because of the network call overheads. This allows our systems to scale way more efficiently than in other all-the-way-JSON-and-HTTP architectures.”

To learn more about the principles and aspects of the Reactive initiative, download the “Why Reactive?” eBook by Konrad himself.

Final Thoughts

We at OverOps are always intrigued by new technologies and initiatives that aim to make developers’ lives easier. Reactive Streams was built as a community initiative in order to solve the issue of back-pressure, and now it’s about to become part of JDK 9. That’s pretty impressive, and we’re all for it.

I write about Java, Scala and everything in between. Lover of gadgets, apps, technology and tea.
  • gacl

    There are dozens of reactive stream or iteratee type libraries/frameworks in the JDK world. Among the big distributed heavyweights, Spark and Kafka have reactive stream type models. Then there are Storm, Samza and Flink. And of course Akka, featured here. Lightbend’s Play Framework has its own iteratee construct. For local in-process reactive streams there are RxJava and RxScala, Scala’s stdlib has a lazy stream, and then there is the more complex and full-featured fs2 (functional streams 2).

    • http://malaw.ski Konrad `ktoso` Malawski

      Hi there, it seems you’ve fallen victim to the popular misconception that anything with the word “stream” in it is really comparable with one another. Reactive Streams are a very specific thing, please read the blog post which outlines its inception https://medium.com/@viktorklang/reactive-streams-1-0-0-interview-faaca2c00bec and the spec which we worked on (and which has been incorporated into the upcoming JDK9): http://www.reactive-streams.org

      Having said that though, yes, the exact purpose of Reactive Streams is bringing together these various libraries such that we can communicate easily across them 🙂

      • gacl

        You and Lightbend make awesome products but you aren’t authorities on linguistic definitions of terms like “stream” or even “reactive”.

        By the wikipedia definition of reactive programming:
        https://en.wikipedia.org/wiki/Reactive_programming

        All of the following tools clearly count as reactive streams: Spark RDD/DataFrame/DStream, Kafka KStream, scala.collection.immutable.Stream, Scala fs2 streams, RxJava/RxScala, Play iteratee, and Akka.

        If not, why not?

        • http://malaw.ski Konrad `ktoso` Malawski

          “Reactive stream” is a specific standard, with a specification and technology compatibility kit ( http://www.reactive-streams.org/reactive-streams-tck-1.0.0-javadoc/ ), so it’s simple to check whether a given tech passes/supports it or not. It has received backing and recognition from various vendors, including Lightbend, Netflix, RedHat, Oracle, Pivotal and more. This becomes even more solidified by the fact that the JDK refers to the spec explicitly: http://download.java.net/java/jdk9/docs/api/java/util/concurrent/Flow.html (in fact, the JDK9 Flow interfaces are exactly 1:1 what was developed as reactive-streams, and then included with a slight rename).

          This is not to say that not being a reactive-stream is anything wrong, it’s perfectly fine and great – e.g. Spark etc. However for the sake of clarity, if referring to reactive-stream let’s stick to the well defined meaning. Also, you’ll notice that I’m rarely referring to reactive programming – rather we talk about reactive systems: explained in http://www.oreilly.com/programming/free/why-reactive.csp as well as the longer more in depth one: https://www.oreilly.com/ideas/reactive-programming-vs-reactive-systems which loops back all the way to 2013 in which the reactive manifesto was coined (way before reactive was as hyped as it is now :-)): http://www.reactivemanifesto.org/

          Hope you’ll have the time to read some of the links. At the same time, I don’t feel we’re really in disagreement in general; however, I’d like the wording to be correct when it comes to the reactive-streams spec in order to avoid confusion in the industry (e.g. someone thought x is reactive streams, but it’s not, so they would not be able to use it as intended).