Translating Java Code

Steven Zeil

Last modified: Jan 1, 2024
Contents:

Programming languages are generally translated in one of two ways:

1 Translating Code

1.1 How Code is Compiled

 

Here is a rather typical structure for a compiled language (in this case, C++).

  1. Source code is compiled into object code (native machine code with some addresses in symbolic form)

  2. object code files are linked to form executable (replacing symbolic addresses with real ones)

  3. Machine code can then be executed, reading input data and producing output data

1.2 How Code is Interpreted

 

In a purely interpreted language (e.g., Python, Perl, LISP, early forms of Basic, command shells), there is no object code or executable in the native machine code. Instead, an interpreter decides what each statement of source code “means” and executes it immediately.

Pure interpretation can be convenient in terms of turn-around. If you want to make a quick change to a program, you can immediately execute it without needing to wait for it to compile.

On the other hand, pure interpretation can be agonizingly slow. If, for example, you have a statement inside a loop that repeats 10,000 times, not only is that statement performed 10,000 times but it also must be translated 10,000 times. Purely interpreted programs tend to execute much, much more slowly than their compiled counterparts.

This slowdown is so dramatic that there are actually relatively few “pure” interpreters in modern use. Most interpreted languages are run by first doing a translation from the source code text into a data structure that describes the program into a more easily processed form, and then that data structure is interpreted to simulate the execution of the program.

Most modern Python interpreters use this kind of hybrid translation/interpretation approach, but the early translation step is entirely hidden behind the scene.

Java does something similar, but the initial translation is obvious and goes further than in Python and other hybrid interpreters.

1.3 Translating Java

 

It’s common for people to refer to Java as an interpreted language, but Java actually uses a “hybrid” translation model:

This hybrid approach allows Java code to be run on any machine that has a JVM program, regardless of what machine was used to compile the code. At the same time, the time penalty of interpretation is greatly reduced because the JVM, although it is technically an interpreter, is interpreting a very simple, low-level language and is therefore adds relatively low overhead.

In recent years, a lot of effort has been put into reducing that overhead further by allowing the JVM to request “just in time” compilation of code so that portions of the program may be rendered into native code.

2 Project Structure

We have previously commented on the fact that:

  1. Source code for a public Java class must appear in a .java file whose name exactly matches the name of the class.
  2. Java classes are often grouped into nested packages, and each package name must be matched by a directory with the same name as the package.

When we compile Java code into .class files, the compiler will apply similar rules to the .class files that it produces:

  1. Compiled code for a public Java class will appear in a .class file whose name exactly matches the name of the class.
  2. If the source code was in a package, the .class files are also considered to be long to the same package, and each package name will be matched by a directory with the same name as the package.

2.1 Example

For example, suppose that we are working in a directory named myProject and that we had source code like this:

package Project.Utilities;

class Storage {
  ⋮

and

package Project;

import Project.Utilities.Storage;

class InventoryManager { 
  ⋮
  public static void main (java.lang.String[] args) {
    ⋮
  }
}

then we would need a directory/file structure like this:

-- myProject/
   |-- Project/
   |   |-- InventoryManager.java
   |   |-- Utilities
   |   |   |-- Storage.java

Inside the Project directory would be the InventoryManager.java file and another directory named “Utilities”. The Storage.java function goes inside that Utilities directory.

2.2 How does Java find your code?

The combinations of rules relating Java code to the files that it inhabits means that the Java compiler can always find related source code files, and the Java virtual machine can always find related compiled code files.

Again, if we have

package Project.Utilities;

class Storage {
  ⋮

and

package Project;

class InventoryManager { 
  ⋮
  public static void main (java.lang.String[] args) {
    Project.Utilities.Storage warehouse = new Project.Utilities.Storage();
    ⋮
  }
}

When the compiler encounters the class name Project.Utilities.Storage, it knows to look for the already compiled code for that class in the file Project/Utilities/Storage.class or, if it has not been compiled yet, to look for the source code in the file Project/Utilities/Storage.java.

This is very different from both C++ and Python, where we as programmers are required to tell the compiler what files to look for.

2.2.1 If you come from C++

In C++, we might store a Storage class declaration in utilities/storage.h, looking like this:

namespace Project::Utilities {
    
    class Storage {
      ⋮
    };

};

and then use it like this:

#include "utilities/storage.h"

int main (java.lang.String[] args) {
    Project::Utilities::Storage warehouse;
    ⋮
  }

The #include statement is our way of telling the compiler where to look for some source code that, we as programmer happen to know, will contain our Storage class.

In Java, we didn’t need to say any such thing. The Java compiler knows where to find the code because it can turn the same of the class into the path to the source code.

2.2.2 If you come from Python

Similarly, in Python we might choose to create a class in project/utilities/utils.py:

class Storage:
    def __init__(self):
        ⋮

and then use it like this:

import project.utilities.utils

warehouse = project.utilities.Storage()

The import statement is our way of telling the compiler where to look for some source code that, we as programmer happen to know, will contain our Storage class.

In Java, we didn’t need to say any such thing. The Java compiler knows where to find the code because it can turn the same of the class into the path to the source code.

2.3 Imports: Abbreviating Names

A name like Project.Utilities.Storage in Java or Python, or `Project::Utilities::Storage in C++, is called a fully qualified name, meaning that it contains the all of the information on where in the program structure we would look to find that name.

Typing everything as fully qualified names can be tedious. In Java, we can use only the final part of a name if we first import that name:

package Project;

import Project.Utilities.Storage;

class InventoryManager { 
  ⋮
  public static void main (java.lang.String[] args) {
    Storage warehouse = new Storage();  // don't need to say Project.Utilities.Storage
    ⋮
  }
}

It’s possible, though somewhat frowned upon, to import all names from a package:

package Project;

import Project.Utilities.*;

class InventoryManager { 
  ⋮
  public static void main (java.lang.String[] args) {
    Storage warehouse = new Storage();  // don't need to say Project.Utilities.Storage
    ⋮
  }
}

2.3.1 If you come from C++

Don’t make the common mistake of thinking that Java import statements are like C++ #includes. The #include will make something available, but it does not shorten the name.

The C++ equivalent of Java’s import is the using statement.

The equivalent of

import Project.Utilities.Storage;

in C++ would be

using Project::Utilities::Storage; // Can henceforth refer to 'Storage' all by itself.

The equivalent of

import Project.Utilities.*;

in C++ would be

using namespace Project::Utilities; 

You’ve probably seen this most commonly as

using namespace std;

2.3.2 If you come from Python

The equivalent of

import Project.Utilities.Storage;

in Python would be

from project.utilities.utils import Storage

which both loads the source code file and then says what name we want to use in its abbreviated form.

2.4 Source Directories and Binary Directories

Now, here’s the part that trips up many a new Java programmer:

  • When you compile and execute code in Java packages, you must always do so from the directory at the top of the package structure.
  • If you are configuring a Java project in an IDE, you must tell it that your source directory is that same directory.

For example, if we have this structure for our Java project:

-- myProject/
   |-- Project/
   |   |-- InventoryManager.java
   |   |-- Utilities
   |   |   |-- Storage.java

we do not cd into myProject/Project before issuing a Java compilation or execution command. Instead, we do all compilation and execution in myProject/.

There’s an important distinction here. Project and Utilities are package names, so those directories are actually part of the Java source code. myProject is not a package name – that directory is simply “infrastructure”, it’s a place to keep our project.

Two more things to watch for:

So the compilation and execution commands would be

cd MyProject  # if we aren't already in there
javac -g Project/Utilities/Storage.java
javac -g Project/InventoryManager.java
java Project.InventoryManager

Notice that the javac compilation commands use ‘/’ within file paths because, well, that’s how we always write file paths (unless you are in Windows, in which case you use ’'). But the javac execution command uses a period (.) because we aren’t giving a file path – we are naming a compiled class. The name of the Java class that started its life inside Project/InventoryManager.java is Project.InventoryManager, which means “the class named ‘InventoryManager’ inside the package named ‘Project’”.

The above commands will leave us with out Java source and the compiled .class files intermingled:

-- myProject/
   |-- Project/
   |   |-- InventoryManager.class
   |   |-- InventoryManager.java
   |   |-- Utilities
   |   |   |-- Storage.class
   |   |   |-- Storage.java

That’s often considered ugly. For example, if we wanted to clean up our project and get rid of the .class files, we have to hunt through multiple package directories to get at all the class files. In Linux, we might do that like this:

find myProject -name '*.class' | xargs rm

(If there’s a good Windows equivalent, I don’t know it.)

It’s often considered better to place our source code into a designated source directory, usually called src/, and to ask the compiler to place our compiled binaries into a designated binary directory, often called bin/.

So we would start with a project structure like this:

-- myProject/
   |-- src/
   |   |-- Project/
   |   |   |-- InventoryManager.java
   |   |   |-- Utilities
   |   |   |   |-- Storage.java

and arrange for our compilation to produce this:

-- myProject/
   |-- bin/
   |   |-- Project/
   |   |   |-- InventoryManager.class
   |   |   |-- Utilities
   |   |   |   |-- Storage.class
   |-- src/
   |   |-- Project/
   |   |   |-- InventoryManager.java
   |   |   |-- Utilities
   |   |   |   |-- Storage.java

Among other things, this allows us to clean up our project by simply deleting the bin/ directory entirely. And that’s a much simpler single command in any operating system.

When you configure your IDE for a Java project, it will likely recognize that all of your source code is inside src/, but that src itself is not a package name in any of your Java source code. It will know, therefore, that src is part of your project infrastructure and will adjust its compilation commands accordingly.

In advanced projects, you might actually wind up with more than one source code directory and more than one binary directory. For example, it is common to separate “production” code from “testing” code using separate directories.

3 The CLASSPATH

Java’s special rules about how to name the source code files and the directories were they live are aimed at one idea: any time our code mentions a class that isn’t in the file being compiled or executed, Java can use those rules to find the .java or .class files containing that class. So if our code makes use of a class named edu.odu.cs.Example, the compiler knows to look inside a directory edu/odu/cs/ for a file named wither Example.java or Example.class.

But edu/odu/cs is a relative path. Where does Java start it’s search from? In other words, where does it look for the edu/ directory?

By default, when we compile with a javac command or run programs java, the compiler searches for code in 1. Our current working directory (.), and, if can’t find it there, 2. inside the compiler’s own pre-built libraries.

You can control everything except the final step of searching through the pre-build system libraries by manipulating the Java CLASSPATH, the list of directories where the Java programs search for code. This is done in one of two ways.

  1. You can set an environment variable named CLASSPATH to the appropriate list. e.g.:

    export CLASSPATH=.:/home/myLoginName/libraries/myProject/src
    

    A CLASSPATH is a list of paths to directories, separated by ‘:’. So the above command would tell all subsequent javac and java commands to first look in my current working directory (.) and then to look in /home/myLoginName/libraries/myProject/src, where, presumably, I have placed some of my favorite Java classes for later use.

  2. You can supply the same path list to a javac or java command via a -cp option, e.g.,

    javac -g -cp .:/home/myLoginName/libraries/myProject/src src/edu/odu/cs/InventoryManager.java
    java -cp .:/home/myLoginName/libraries/myProject/src edu.odu.cs.InventoryManager
    

Luckily, if you are working in an IDE, you are unlikely to need to worry about your CLASSPATH. In most cases, the IDE will set that for you. But you may see it mentioned in error messages from the Java compiler or from your IDE if your project settings are messed up.

Alternatively, if you are working with either of the Java-friendly build manger tools, Gradle or Maven, these will issue your javac commands for you with the appropriate CLASSPATH settings.