Java 8 Functional Benchmark
How Java 8 lambdas and streams perform compared to longstanding implementations?

Lambda expressions and streams received an heartwarming welcome in Java 8. These are by far the most exciting features making their way to Java in a long long time. The new language features allow us to adopt a more functional style in our code and we had lots of fun playing around with them. So much fun that it should be illegal. Then we got suspicious, and decided to put them to the test.

We’ve taken a simple task of finding a max value in an ArrayList and tested longstanding implementations versus new methods that became available with Java 8. Honestly, the results were quite surprising.

Imperative vs Functional Style Programming in Java 8

We like getting straight down to the point, so let’s take a look at the results. For this benchmark, we’ve created an ArrayList, populated it with 100,000 random integers and implemented 7 different ways to go through all the values to find the maximum. The implementations are divided into 2 groups: Functional style with new language features introduced in Java 8 and an imperative style with longstanding Java methods.

Here’s how long each method took:

Java 8 Functional Benchmark

** The biggest error recorded was 0.042 on parallelStream, full results output is available at the bottom of this post

Takeaways

  1. Whoops! Implementing a solution with ANY of the new methods Java 8 offers caused around a 5x performance hit. Sometimes using a simple loop with an iterator is better than getting lambdas and streams into the mix. Even if it means writing a few more lines of code and skipping on that sweet syntactic sugar.
  2. Using iterators or a for-each loop is the most effective way to go over an ArrayList. Twice as better than a traditional for loop with an index int.
  3. Among the Java 8 methods, using parallel streams proved to be more effective. But watchout, in some cases it could actually slow you down.
  4. Lambas took their place in-between the stream and the parallelStream implementations. Which is kind of surprising since their implementation is based on the stream API.
  5. [EDIT] Things are not always as they seem: While we wanted to show how easy it is to introduce errors in lambdas and streams, we received lots of community feedback requesting to add more optimizations to the benchmark code and remove the boxing/unboxing of integers. The second set of results including the optimizations is available at the bottom of this post.

Find the Crap in Your Java App

Show me how >>

Fred

Wait, what exactly did we test here?

Let’s have a quick look on each of the methods, from the fastest to the slowest:

Imperative Style

iteratorMaxInteger() – Going over the list with an iterator:

forEachLoopMaxInteger() – Losing the Iterator and going over the list with a For-Each loop (not to be mistaken with Java 8 forEach):

forMaxInteger() – Going over the list with a simple for loop and an int index:

Functional Style

parallelStreamMaxInteger() – Going over the list using Java 8 stream, in parallel mode:

lambdaMaxInteger() – Using a lambda expression with a stream. Sweet one-liner:

forEachLambdaMaxInteger() – This one is a bit messy for our use case. Probably the most annoying thing with the new Java 8 forEach feature is that it can only use final variables, so we created a little workaround with a final wrapper class that accesses the max value we’re updating:

btw, if we’re already talking about forEach, check out this StackOverflow answer we ran into providing some interesting insights into some of its shortcomings.

streamMaxInteger() – Going over the list using Java 8 stream:

Find the Crap in Your Java App

Show me how >>

Fred

Optimized Benchmark

Following the feedback for this post, we’ve created another version of the benchmark. All the differences from the original code can be viewed right here. Here are the results:

Java 8 Functional Benchmark

TL;DR: Summary of the changes

  1. The list is no longer Volatile.
  2. New method forMax2 removes field access.
  3. The redundant helper function in forEachLambda is fixed. Now the lambda is also assigning a value. Less readable, but faster.
  4. Auto-boxing eliminated. If you turn on auto-boxing warnings for the project in Eclipse, the old code had 15 warnings.
  5. Fixed streams code by using mapToInt before reduce.

Thanks to Patrick Reinhart, Richard WarburtonYan BonnelSergey Kuksenko, Jeff MaxwellHenrik Gustafsson and everyone who commented and on twitter for your contribution!

Another optimization that came from our community was that parallelStream max is faster than using reduce, as you can see in James Pittendreigh‏’s tweet:

The groundwork

To run this benchmark we used JMH, the Java Microbenchmarking Harness. If you’d like to learn more about how to use it in your own projects, check out this post where we go through some of its main features with a hands-on example.

The benchmark configuration included 2 forks of the JVM, 5 warmup iterations and 5 measurement iterations. The tests were run on a c3.xlarge Amazon EC2 instance (4 vCPUs, 7.5 Mem (GiB), 2 x 40 GB SSD storage), using Java 8u66 with JMH 1.11.2. The full source code is available on GitHub, and you can view the raw results output right here.

With that said, a little disclaimer: Benchmarks tend to be pretty treacherous and it’s super hard to get it right. While we tried to run it in the most accurate way, it’s always recommended to take the results with a grain of salt.

Final Thoughts

The first thing to do when you get on Java 8 is to try lambda expressions and streams in action. But beware: It feels really nice and sweet so you might get addicted! We’ve seen that sticking to a more traditional Java programming style with iterators and for-each loops significantly outperforms new implementations made available by Java 8. Of course it’s not always the case, but in this pretty common example, it showed it can be around 5 times worse. Which can get pretty scary if it affects a core part of your system or creates a new bottleneck.

Java 8

Java 8 exceptions have never been so beautiful – Try Takipi for Java 8

email
Some kind of monster @ OverOps, GDG Haifa lead.
  • Jose Romero

    Good post! Thanks!

    • http://www.takipi.com/ Alex Zhitnitsky

      Thanks for the comment Jose 🙂

  • Henrik Gustafsson

    Interesting results! It certainly shows using streams in performance sensitive applications need careful thought.

    In this case, I believe the main reason for the performance difference lie in the ability of the JVM to optimize away much of the integer boxing/unboxing in the non-stream cases.

    With this hypothesis in mind, I tried out a few different variants.

    * A test that converts the String into an IntStream before calculating the max value, which improved things quite a bit.
    * A separate suite of tests that used an int[] and IntStream.of() to perform the same tests as previously, only they were done immediately using IntStreams, which caused the stream numbers to be almost identical to the original non-stream tests.

    My results (on my loaded laptop, so don’t trust the numbers) in the attached image.

    Sources here: https://github.com/gsson/loops-jmh-playground/tree/primitive-int

    • http://www.takipi.com/ Alex Zhitnitsky

      Hi Henrik, thanks for the comments and the extra tests! We’ll do another run and post some updates to the post. That’s probably the most important takeaway, how easily we can get a 5x performance hit without paying attention

      • Jeff Maxwell

        Here were my top results from adding the code below.

        LoopBenchmarkMain.parallelIntStreamMaxInteger
        avgt
        10
        0.105
        ± 0.007 ms/op

        LoopBenchmarkMain.forEachLoopMaxInteger
        avgt
        10
        0.115
        ± 0.004 ms/op

        Source: https://github.com/jmax01/loops-jmh-playground

        @Benchmark
        @BenchmarkMode(Mode.AverageTime)
        @OutputTimeUnit(TimeUnit.MILLISECONDS)
        @Fork(2)
        @Measurement(iterations = 5)
        @Warmup(iterations = 5)
        public int parallelIntStreamMaxInteger() {

        final List ints = this.integers;

        return ints.parallelStream()
        .mapToInt(e -> e)
        .max()
        .getAsInt();
        }

        • http://www.takipi.com/ Alex Zhitnitsky

          Thanks for the contribution Jeff! We’ve just posted an update, the new results are available at the bottom of this post

    • http://www.takipi.com/ Alex Zhitnitsky

      We’ve just posted an update, the new results are available at the bottom of this post. Thanks again for the contribution!

  • Sergey Kuksenko

    There are a couple benchmarking flaws.

    1. volatile List integers = null;

    Declaring this field as volatile leads to failure of range check elimination in the forMaxInteger benchmark. This confuses people and completely unnecessary here.

    2. All stream benchmarks suffer from Integer.max, which causes inadequate boxing. Yep, developer easily could meet this issue, but again – it doesn’t related to streams.

    Just replace “(a, b) -> Integer.max(a, b)” to “(a, b) -> a < b ? b : a" and check what you get. 🙂

    • http://www.takipi.com/ Alex Zhitnitsky

      Thanks for the comment Sergey! We’ll do another test run with the fixes and post the results. These issues also demonstrate how easily you can get into trouble with lambdas and streams

      • Sergey Kuksenko

        Nope. Both issues are not related neither lambdas nor streams.

        • http://www.takipi.com/ Alex Zhitnitsky

          Hi Sergey, we’ve just posted an update, check out the new results with optimizations at the bottom of this post. I think that issues like these can be pretty common and the post shows just how they affect performance.

  • yan bonnel

    I made a pull request with many fixes : https://github.com/takipi/loops-jmh-playground/pull/2

    • http://www.takipi.com/ Alex Zhitnitsky

      Thanks Yan! We’re reviewing and will post some updates soon

      • yan bonnel

        Looks better 🙂 For lambda, try change Integer.max by Math.max. I think you hit a limit of jit so it doesn’t inline code?

        • http://www.takipi.com/ Alex Zhitnitsky

          Thanks, perhaps. We’ll try that in a bit. It’s a pitfall that can be pretty common so it’s worth highlighting

    • http://www.takipi.com/ Alex Zhitnitsky

      Hi Yan, we’ve just posted an update, check out the new results at the bottom of this post. Thanks for the contribution!

  • Daniel

    I am failing to see how a forEach interator would be faster than the for iterator. Looking at the “get” vs “next” code, they should be at least equivalent. Are there any optimizations done by the compiler?

    public E get(int index) {
    rangeCheck(index);
    return elementData(index);
    }

    compared to

    public E next() {
    checkForComodification();
    int i = cursor;
    if (i >= size)
    throw new NoSuchElementException();
    Object[] elementData = ArrayList.this.elementData;
    if (i >= elementData.length)
    throw new ConcurrentModificationException();
    cursor = i + 1;
    return (E) elementData[lastRet = i];
    }

    • http://www.takipi.com/ Alex Zhitnitsky

      Hi Daniel, looks like most of the difference in performance has to do with integer unboxing. We’ll be adding an update to the post. However, it still shows how easy it is to introduce performance issues to the code

    • http://www.takipi.com/ Alex Zhitnitsky

      We’ve just posted an update, check out the new results at the bottom of this post

  • Massinissa Bouziad

    I would like to see the performance of a specialized Stream (IntStream). The IntStream should perform better as it avoid to inbox/outbox while the stream is getting consumed

    • Henrik Gustafsson

      See my comment (and code) below for precisely that 🙂

      • Massinissa Bouziad

        Oups ok sorry. My comment is useless.

    • http://www.takipi.com/ Alex Zhitnitsky

      Hi Massinissa, thanks for the comment! We’ve received quite a few feedbacks about it here and on twitter and will be adding an update to the post

    • http://www.takipi.com/ Alex Zhitnitsky

      Just posted an update, check out the new results at the bottom of this post

  • FuzzyFace

    My understanding was that there is a performance advantage to streams – but only if there is enough processing to warrant splitting it across multiple processors. What did you do to test that?

    • http://www.takipi.com/ Alex Zhitnitsky

      Hey, we’ve just published an update with new optimizations, check out the new results at the bottom of this post

  • kon901

    I found something interesting. I have 2 methods:

    private static int maxStream(List integers) {

    return integers.stream().mapToInt(Integer::intValue).

    reduce(Integer.MIN_VALUE, (a, b) -> a > b ? a : b);

    }

    private static int maxForEach(List integers) {

    int max = Integer.MIN_VALUE;

    for (Integer integer : integers) {

    max = Integer.max(max, integer.intValue());

    }

    return max;

    }

    And when I call this:

    long start = System.currentTimeMillis();
    int maxForEach = maxForEach(integers);
    long stop = System.currentTimeMillis();
    System.out.println(stop – start + ” ms for each”);
    System.out.println(maxForEach);

    //maxStream(integers);

    start = System.currentTimeMillis();
    int maxStream = maxStream(integers);
    stop = System.currentTimeMillis();
    System.out.println(stop – start + ” ms stream”);
    System.out.println(maxStream);

    I get something like this:

    13 ms for each

    999997

    78 ms stream

    999997

    But When I uncomment this line:

    //maxStream(integers);

    I get this:

    15 ms for each

    999991

    2 ms stream

    999991

    And each next run of maxStream() took about 2 ms.

    Does Java have something like cache for stream operations or I miss something?

  • Neeraj

    Why don’t you update the title based on the new benchmark results ? This article now seems not clear.

    • http://www.takipi.com/ Alex Zhitnitsky

      Hi Neeraj, thanks for the comment. We’ve just recently updated the title to “Misusing”.

      • Daniel Hajduk

        Please change the title of the article to : “We don’t know how to use new Java 8 features thus they are up to 5 times slower for us.”

  • Sergey Nechaev

    Good tests, but I believe, for the parallelism to kick in (and justify the overhead) the data set should fairly large and more complex – 100k of integers doesn’t really show its powers…

  • http://xenoterracide.com/ Caleb Cushing

    isn’t there also an int stream that’s optimized for unboxed integers? I realize that you’ve otherwise optimized but… it’s still interesting to know that the simple solution is sometimes slower, and that you should test your code. Though I feel this example is contrived and that in most cases you wouldn’t have enough ints for it to matter. I could be wrong, do you have this real world problem?

  • Alastair Rae

    I wouldn’t want to criticise the method or conclusions but I guess the key word is “misuse” in the title. I have a large body of old code which I know for certain would run much faster using streams even without parallels. And certainly Java 8 is fun.

  • Android Developer

    I think something is wrong in the code as it’s shown on the webpage. Does it suppose to have “& gt;” ?

    • Vladislav Kouchnarev

      It’s a formatting error on the blog.

      Optional & lt;
      Integer & gt;
      max = integers.stream().reduce(Integer::max);

      I presume is supposed to read as:

      Optional max = integers.stream().reduce(Integer::max);

  • Aram M

    Coming to this a little late, but I found it very odd that I got such a large error variation for streams.


    Benchmark Mode Cnt Score Error Units
    LoopBenchmarkMain.forEachLambdaMaxInteger avgt 10 0.592 ▒ 0.132 ms/op
    LoopBenchmarkMain.forEachLoopMaxInteger avgt 10 0.107 ▒ 0.008 ms/op
    LoopBenchmarkMain.forMaxInteger avgt 10 0.233 ▒ 0.004 ms/op
    LoopBenchmarkMain.iteratorMaxInteger avgt 10 0.106 ▒ 0.003 ms/op
    LoopBenchmarkMain.lambdaMaxInteger avgt 10 0.481 ▒ 0.008 ms/op
    LoopBenchmarkMain.parallelStreamMaxInteger avgt 10 0.183 ▒ 0.007 ms/op
    LoopBenchmarkMain.streamMaxInteger avgt 10 0.591 ▒ 0.034 ms/op

  • KevlarC

    Just re-ran the results then added
    – IntStream.of(int[])
    – for loop over the int[]
    – use of Stream.max()
    – and increased the list size to 10 million.

    The results are a little surprising where the parallel stream over the IntStream was about 8 times faster than a traditional for loop and the normal IntStream is marginally slower than the traditional for loop of the array.

    my fork and results are in:
    https://github.com/kevlarC/loops-jmh-playground.git
    with the pull request at:
    https://github.com/takipi/loops-jmh-playground/pull/5

  • Osama

    Question, I have ran the latest code from git and getting the same result for all of them, what am I missing?

    iteratorMaxInteger max is: 999992
    forEachLoopMaxInteger max is: 999992
    forEachLambdaMaxInteger max is: 999992
    forMaxInteger max is: 999992
    parallelStreamMaxInteger max is: 999992
    streamMaxInteger max is: 999992
    iteratorMaxInteger max is: 999992

  • Matthew Shapiro

    It looks like your code was abused and had turned into their html representations “& lt;” and “& gt;” (note the misplaced space, which is present in your code), which were then inserted back into your code. It’s causing odd line breaking and I believe masking content. Not on every blurb, but on most of them.

  • Johnny Phan

    Is there similar performance hit using LinQ in .NET?