A beginner’s crash course in Java Virtual Machine (JVM) architecture and Java bytecode 101

Java applications are all around us: they’re on our phones, on our tablets, and on our computers. In many programming languages, supporting all of these platforms means compiling the code separately for each operating system. For us as developers, maybe the coolest thing about Java is that it’s designed to be platform-independent (as the old saying goes, “Write once, run anywhere”), so we only need to write and compile our code once.

How is this possible? Let’s dig into the Java Virtual Machine (JVM) to find out.

The JVM Architecture

It may sound surprising, but the JVM itself knows nothing about the Java programming language. Instead, it knows how to execute its own instruction set, called Java bytecode, which is stored in binary class files. The javac command compiles Java source code into bytecode, which the JVM in turn translates into machine instructions at runtime, either by interpreting it or by compiling it just in time.

Threads

Java is designed to be concurrent, which means that different calculations can be performed at the same time by running several threads within the same process. When a new JVM process starts, a new thread (called the main thread) is created within the JVM. From this main thread, the code starts to run and other threads can be spawned. Real applications can have thousands of running threads that serve different purposes. Some serve user requests, others execute asynchronous backend tasks, etc.
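As a small illustration of spawning a thread from the thread that is already running, here is a minimal sketch. The class name ThreadDemo, the method runInWorker, and the thread name worker-1 are made up for this example.

```java
class ThreadDemo {
    // Starts one worker thread from the calling thread and waits for it to finish.
    // Returns the name of the thread that actually ran the task.
    static String runInWorker() {
        String[] ranOn = new String[1];
        Thread worker = new Thread(
                () -> ranOn[0] = Thread.currentThread().getName(), "worker-1");
        worker.start();
        try {
            worker.join(); // after join() returns, the worker's write is visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return ranOn[0];
    }

    public static void main(String[] args) {
        System.out.println("Started on: " + Thread.currentThread().getName());
        System.out.println("Task ran on: " + runInWorker());
    }
}
```

When the class is launched with the java command, the first line reports the main thread and the second reports the worker that was spawned from it.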

Stack and Frames

Each Java thread is created along with a frame stack, which holds method frames and controls method invocation and return. A method frame stores the data and partial calculations of the method to which it belongs. When the method returns, its frame is discarded, and its return value is passed back to the invoker’s frame, which can now use it to complete its own calculation.
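One way to watch frames pile up is to ask the JVM for the current thread's stack from inside a nested call. The sketch below uses the standard StackWalker API (Java 9 and later); the class and method names are made up for this example.

```java
import java.util.List;
import java.util.stream.Collectors;

class FrameStackDemo {
    static List<String> outer() {
        return inner(); // calling inner() pushes a new frame on top of outer()'s frame
    }

    static List<String> inner() {
        // Walk the current thread's stack from the top and keep the two newest frames.
        return StackWalker.getInstance().walk(frames ->
                frames.limit(2)
                      .map(StackWalker.StackFrame::getMethodName)
                      .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(outer()); // [inner, outer]
    }
}
```

The newest frame (inner) comes first, with its invoker (outer) directly beneath it. Once inner returns, its frame is gone and outer continues with the returned value.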

Figure: JVM process structure

The JVM playground for executing a method is the method frame. The frame consists of two main parts:

  1. Local Variables Array – where the method’s parameters and local variables are stored
  2. Operand Stack – where the method’s computations are performed

Figure: Frame structure

Almost every bytecode command manipulates at least one of these two. Let’s see how.
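To make the two parts concrete, here is a toy model of a frame that handles int values only. ToyFrame and its methods are hypothetical names chosen to mirror the bytecode mnemonics; a real JVM does not expose or lay out frames this way.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model of a method frame; not how a real JVM lays out memory.
class ToyFrame {
    final int[] locals;                                  // local variables array
    final Deque<Integer> operands = new ArrayDeque<>();  // operand stack

    ToyFrame(int maxLocals) {
        locals = new int[maxLocals];
    }

    void iconst(int value) { operands.push(value); }            // push a constant
    void istore(int index) { locals[index] = operands.pop(); }  // pop into a local
    void iload(int index)  { operands.push(locals[index]); }    // push a local's value

    public static void main(String[] args) {
        ToyFrame frame = new ToyFrame(2);
        frame.iconst(5);   // operand stack: [5]
        frame.istore(1);   // operand stack: [],  locals[1] = 5
        frame.iload(1);    // operand stack: [5], locals[1] = 5
        System.out.println(frame.operands.peek() + " " + frame.locals[1]); // 5 5
    }
}
```

Each of the three operations touches the operand stack, and two of them also touch the local variables array.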

How It Works

Let’s go over a simple example to understand how the different elements play together to run our program. Assume we have this simple program that calculates the value of 2+3 and prints the result:
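The original listing did not survive in this copy of the article. The reconstruction below is an assumption based on the walkthrough that follows: the class name SimpleExample, the static method add, and the variables args and result all come from the text, while the parameter names a and b are made up.

```java
public class SimpleExample {
    public static void main(String[] args) {
        int result = add(2, 3);
        System.out.println(result);
    }

    public static int add(int a, int b) {
        return a + b;
    }
}
```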

To compile this class we run javac SimpleExample.java, which results in the compiled file SimpleExample.class. We already know this is a binary file that contains bytecode. So how can we inspect the class bytecode? Using javap.

javap is a command line tool that comes with the JDK and can disassemble class files. Calling javap -c -p prints out the disassembled bytecode (-c) of the class, including private (-p) members and methods:
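The disassembly itself is missing from this copy of the article. As a stand-in, the sketch below annotates a reconstructed source (an assumption based on the names used in the walkthrough) with the bytecode commands the walkthrough attributes to each statement. Real javap output also prints byte offsets and constant pool indexes, which vary between compiler versions and are omitted here, and it lists the default constructor that the compiler generates.

```java
public class SimpleExample {
    public static void main(String[] args) {
        // iconst_2
        // iconst_3
        // invokestatic   (calls add)
        // istore_1       (stores into result)
        int result = add(2, 3);

        // getstatic      (pushes java/lang/System.out)
        // iload_1        (pushes result)
        // invokevirtual  (calls java/io/PrintStream.println)
        System.out.println(result);

        // return
    }

    public static int add(int a, int b) {
        // iload_0
        // iload_1
        // iadd
        // ireturn
        return a + b;
    }
}
```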

Now what happens inside the JVM at runtime? Running java SimpleExample starts a new JVM process and creates the main thread. A new frame is created for the main method and pushed onto the thread’s stack.

The main method has two variables: args and result. Both reside in the local variable array, at indexes 0 and 1 respectively. The first two bytecode commands of main, iconst_2 and iconst_3, push the constant values 2 and 3 (respectively) onto the operand stack. The next command, invokestatic, invokes the static method add. Since this method expects two integers as arguments, invokestatic pops two elements from the operand stack and passes them to the new frame that the JVM creates for add. main’s operand stack is empty at this point.

In the add frame, these arguments are stored in the local variable array. The first two bytecode commands, iload_0 and iload_1, push the 0th and 1st local variables onto the operand stack. Next, iadd pops the top two elements from the operand stack, adds them, and pushes the result back onto the stack. Finally, ireturn pops the top element and passes it to the calling frame as the method’s return value, and the add frame is discarded.

main’s stack now holds the return value of add. istore_1 pops it and stores it in the local variable at index 1, which is result. getstatic pushes a reference to the static field java/lang/System.out, of type java/io/PrintStream, onto the stack. iload_1 pushes the variable at index 1, result, which now equals 5. At this point the stack holds two values: the reference to out and, on top of it, the value 5. invokevirtual then invokes PrintStream.println. It pops both elements: the integer on top is the single argument println expects, and the reference beneath it is the object on which println is invoked. This is where main prints the result of add. Finally, the return command finishes the method. The main frame is discarded and, since no other non-daemon threads are running, the JVM process ends.
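The whole run can be replayed by hand with a toy model. The sketch below is an illustration, not a real interpreter: Frame, runAdd, and runMain are hypothetical names, only int values are modeled, and the getstatic step is skipped because the toy operand stack cannot hold object references.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BytecodeWalkthrough {
    // Toy frame for ints only; slot 0 of main (args) is left unused here.
    static final class Frame {
        final int[] locals = new int[2];
        final Deque<Integer> stack = new ArrayDeque<>();
    }

    // Replays add's bytecode: iload_0, iload_1, iadd, ireturn.
    static int runAdd(Frame caller) {
        Frame add = new Frame();
        // invokestatic pops the arguments from the caller; the last argument is on top.
        add.locals[1] = caller.stack.pop();
        add.locals[0] = caller.stack.pop();
        add.stack.push(add.locals[0]);                      // iload_0
        add.stack.push(add.locals[1]);                      // iload_1
        add.stack.push(add.stack.pop() + add.stack.pop());  // iadd
        return add.stack.pop();                             // ireturn
    }

    // Replays main's bytecode and returns the value handed to println.
    static int runMain() {
        Frame main = new Frame();
        main.stack.push(2);                 // iconst_2
        main.stack.push(3);                 // iconst_3
        main.stack.push(runAdd(main));      // invokestatic add; result lands on main's stack
        main.locals[1] = main.stack.pop();  // istore_1 (result)
        // getstatic would push the System.out reference here; this int-only toy skips it.
        main.stack.push(main.locals[1]);    // iload_1
        return main.stack.pop();            // invokevirtual println pops its argument
    }

    public static void main(String[] args) {
        System.out.println(runMain()); // 5
    }
}
```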

That’s it. All in all, not too complex.

“Write Once, Run Anywhere”

So what makes Java platform-independent? It all lies in the bytecode.

As we saw, any Java program compiles into standard Java bytecode. The JVM then translates it into the specific machine instructions at runtime. We no longer need to make sure our compiled code matches a particular machine. Instead, our application can run on any device equipped with a JVM, and the JVM takes care of producing instructions that machine understands. It’s the job of the JVM’s maintainers to provide different JVM builds for different machines and operating systems.

This architecture enables any Java program to run on any device that has a JVM installed. And so the magic happens.
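A quick way to see this for yourself is to compile a small class once and run the same class file on different machines. The sketch below (WhereAmI and describe are made-up names) reads standard system properties, so the unchanged bytecode reports whatever platform and JVM it happens to be running on.

```java
class WhereAmI {
    // Builds a one-line description from standard JVM system properties.
    static String describe() {
        return System.getProperty("os.name")
                + " / " + System.getProperty("os.arch")
                + " / Java " + System.getProperty("java.version");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The output differs from machine to machine, while the class file stays byte-for-byte the same.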

Final Thoughts

Java developers can write great applications without understanding how the JVM works. However, digging into the JVM architecture, learning its structure, and understanding how it interprets your code will help you become a better developer. It will also help you tackle really complex problems from time to time 🙂


Tzofia is Head of Application Development at OverOps, and the founder of the R&D Leaders Group in Tel Aviv. In her spare time, she likes to collect doodads from all over the world.
  • Prat Ambani

    Awesomely Explained.
    Well, can anybody suggest how I can observe (while debugging) which element resides in which part of the JVM’s memory spaces?

    • Tzofia Shiftan

      Thanks for the feedback Prat!
      The major profilers (VisualVM, JConsole, etc.) can help you analyze the memory footprint in general and see more info about every instance, but I’m afraid I’m not aware of a way to see which specific space a specific instance resides in. I’d be happy to find out as well 🙂

      • Pratik Ambani

        Fantastic!! I got what I was looking for. Thanks for sharing the details, Tzofia. Time to dig into VisualVM and JConsole.. 🙂

        • Tzofia Shiftan

          Yayy, I’m glad to hear that! Another cool tool to check out is a VisualVM plugin called Visual GC.

          • Pratik Ambani

            Added in my ToDo… 🙂
            More power to you!

  • Eduardo Augusto

    Thanks

  • Rafael Monteiro

    Great explanation. Thanks.