Translating Java Code

Steven Zeil

Last modified: Jun 23, 2022

Contents:

1 Compiled Program Structure

2 Interpreted Program Structure

3 Translating Java

4 Related Program and File Structures

4.1 Class Names == File Names

4.2 Package Names == Directory Names

4.3 Project Structure

5 The CLASSPATH

Programming languages are generally translated in one of two ways:

Compilation: source code is compiled to create an executable in “native” machine code, which can then be executed
- Examples: FORTRAN, C, C++
Interpretation: execute source code “directly”, translating “on the fly”
- Examples: LISP, Basic, Perl, shells

1 Compiled Program Structure

Here is a rather typical structure for a compiled language (in this case, C++).

Source code is compiled into object code (native machine code with some addresses in symbolic form)
Object code files are linked to form an executable (replacing symbolic addresses with real ones)
Machine code can then be executed, reading input data and producing output data

2 Interpreted Program Structure

In a purely interpreted language (e.g., Python, Perl, LISP, early forms of Basic, command shells), there is no object code or executable in the native machine code. Instead, an interpreter decides what each statement of source code “means” and executes it immediately.

Pure interpretation can be convenient in terms of turn-around. If you want to make a quick change to a program, you can immediately execute it without needing to wait for it to compile.

On the other hand, pure interpretation can be agonizingly slow. If, for example, you have a statement inside a loop that repeats 10,000 times, not only is that statement performed 10,000 times but it also must be translated 10,000 times. Purely interpreted programs tend to execute much, much more slowly than their compiled counterparts.
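
To make that cost concrete, here is a toy sketch. The one-statement "language" (`add N`), the class name `TranslationCost`, and its methods are all invented for illustration; real interpreters are far more elaborate. It counts how many times the statement gets translated when it is re-translated on every loop iteration versus translated once up front.

```java
import java.util.function.IntUnaryOperator;

public class TranslationCost {
    // Counts how many times a statement has been translated.
    static int translations = 0;

    // "Translate" a statement of the toy form "add N" into an executable operation.
    static IntUnaryOperator translate(String statement) {
        translations++;
        int n = Integer.parseInt(statement.substring("add ".length()).trim());
        return x -> x + n;
    }

    // Pure interpretation: the statement is re-translated on every pass through the loop.
    static int runInterpreted(String statement, int repetitions) {
        int total = 0;
        for (int i = 0; i < repetitions; i++) {
            total = translate(statement).applyAsInt(total);
        }
        return total;
    }

    // Compile first: translate once, then execute the translated form repeatedly.
    static int runCompiled(String statement, int repetitions) {
        IntUnaryOperator operation = translate(statement);
        int total = 0;
        for (int i = 0; i < repetitions; i++) {
            total = operation.applyAsInt(total);
        }
        return total;
    }

    public static void main(String[] args) {
        translations = 0;
        int interpreted = runInterpreted("add 3", 10_000);
        System.out.println(interpreted + " after " + translations + " translations");

        translations = 0;
        int compiled = runCompiled("add 3", 10_000);
        System.out.println(compiled + " after " + translations + " translations");
    }
}
```

Both approaches compute the same total, but the first performs 10,000 translations and the second performs one.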

3 Translating Java

It’s common for people to refer to Java as an interpreted language, but Java actually uses a “hybrid” translation model:

Source code in .java files is compiled into object code for an imaginary CPU, the Java Virtual Machine (JVM)
- Stored in .class files
A simulator for that imaginary CPU is used to interpret the .class code
- Unlike “true” compiled code, Java .class files can be run on any CPU to which the simulator has been ported.

This hybrid approach allows Java code to be run on any machine that has a JVM program, regardless of what machine was used to compile the code. At the same time, the time penalty of interpretation is greatly reduced because the JVM, although it is technically an interpreter, is interpreting a very simple, low-level language and therefore adds relatively low overhead.

In recent years, a lot of effort has been put into reducing that overhead further by allowing the JVM to request “just in time” compilation of code so that portions of the program may be rendered into native code.
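
One way to see that a .class file holds code for the JVM rather than for any particular real CPU is to look at its first bytes: the class file format begins with the same four-byte marker (hexadecimal CAFEBABE) no matter what machine did the compiling. The sketch below checks for that marker. The class name `ClassFileCheck` and its method are my own invention for illustration, and the check only looks at the marker, not at the rest of the file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ClassFileCheck {
    // A .class file begins with the four bytes CA FE BA BE.
    static boolean looksLikeClassFile(byte[] contents) {
        return contents.length >= 4
            && (contents[0] & 0xFF) == 0xCA
            && (contents[1] & 0xFF) == 0xFE
            && (contents[2] & 0xFF) == 0xBA
            && (contents[3] & 0xFF) == 0xBE;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path to the file we want to inspect.
        if (args.length != 1) {
            System.err.println("usage: java ClassFileCheck <file>");
            return;
        }
        byte[] contents = Files.readAllBytes(Paths.get(args[0]));
        String verdict = looksLikeClassFile(contents) ? " looks like" : " does not look like";
        System.out.println(args[0] + verdict + " JVM bytecode");
    }
}
```

After compiling this with javac, running `java ClassFileCheck ClassFileCheck.class` should report that the file looks like JVM bytecode, on whatever machine you try it.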

4 Related Program and File Structures

One thing that will strike C++ programmers as odd is the link between program structure and file structure in Java. Both the JVM and the Java compiler enforce a general rule that file structure must mirror program structure.

4.1 Class Names == File Names

In C++, we choose file names related to the names of the classes that they contain. If I had a class Rectangle, I would probably store it in rectangle.h and rectangle.cpp. (Some programmers would capitalize the file names to increase the similarity to the class name.)

Keeping the file names related to the contents is a matter of courtesy in C++ — courtesy to other programmers who might later read your code and to yourself in those cases where you might return to your code some months later wanting to find a portion to re-use in a separate project. It’s not a requirement of C++. If you really wanted to, you could store the class Rectangle in circle.h and guessWho.cpp and no C++ compiler would complain at all about it.

Java is different. Class Foo must be stored in file Foo.java (and the capitalization must match). In part, that’s because Java does not use #include statements to stitch code together, but instead uses the relation between class names and file names to “automatically” locate the files containing source code and compiled code for classes that you mention in your programs.

When you compile your Java source code, the .class files that you produce will carry the same names. So, in Java, a class Foo will have its source code in Foo.java and its compiled code in Foo.class.

4.2 Package Names == Directory Names

Classes in Java are typically grouped into packages. Now, if your classes are represented by identically named files, and you group those classes into a package, then you might guess that the files will need to be grouped together as well. And how do we group files? Into directories (folders)!

So Java extends the principle that file structure mirrors program structure by requiring each package to be represented by an identically named directory. So if Java classes Foo and Bar are part of a package named Baz, then I would expect to find their source code in Baz/Foo.java and Baz/Bar.java and their compiled code in Baz/Foo.class and Baz/Bar.class.

Packages can contain other packages, and it’s also conventional in Java for packages to reflect the URL of the person or organization that wrote the code. Thus, for example, I happen to be the author of a class named Animation that is contained in package AlgAE that is contained in package zeil that is contained in package cs that is contained in package odu that is contained in package edu. The full name of the class is actually edu.odu.cs.zeil.AlgAE.Animation, and I would find its source code in edu/odu/cs/zeil/AlgAE/Animation.java.
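
The mapping from a full class name to file paths is mechanical enough to write down. Here is a minimal sketch; the class `PackagePaths` and its method names are hypothetical, and it handles only ordinary top-level classes.

```java
public class PackagePaths {
    // Each package level becomes a directory; the class name becomes the file name.
    static String sourcePathFor(String qualifiedClassName) {
        return qualifiedClassName.replace('.', '/') + ".java";
    }

    // The compiled code sits at the same relative location, with a .class extension.
    static String compiledPathFor(String qualifiedClassName) {
        return qualifiedClassName.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        String name = "edu.odu.cs.zeil.AlgAE.Animation";
        System.out.println(sourcePathFor(name));   // edu/odu/cs/zeil/AlgAE/Animation.java
        System.out.println(compiledPathFor(name)); // edu/odu/cs/zeil/AlgAE/Animation.class
    }
}
```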

It’s a common convention in the Java programming community that the first several levels of a package are simply the reversed form of the URL where you might go to find that code (or, at least, the people or organization who wrote it). For example, you might expect that anything I write would be stored at http://…cs.odu.edu/…, so my package list starts with edu.odu.cs. In theory, the edu package would contain all code written by members of educational institutions around the world, the package edu.odu would contain all code written at ODU, and the package edu.odu.cs would contain all code written by members of the ODU CS Dept.
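
The reversal itself can be sketched in a few lines. The class `ReverseDomain` and its method are made up for illustration, and the sketch assumes every part of the domain name is already a legal Java identifier (no hyphens, no leading digits).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ReverseDomain {
    // Turn a domain name such as "cs.odu.edu" into a package prefix such as "edu.odu.cs".
    static String packagePrefixFor(String domain) {
        List<String> parts = new ArrayList<>(Arrays.asList(domain.split("\\.")));
        Collections.reverse(parts);
        return String.join(".", parts);
    }

    public static void main(String[] args) {
        System.out.println(packagePrefixFor("cs.odu.edu")); // edu.odu.cs
    }
}
```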

4.3 Project Structure

For example, suppose that we are working in a directory named myProject and that we had source code like this:

package Project.Utilities;

public class Storage {
  ⋮

and

package Project;

import Project.Utilities.Storage;

class InventoryManager {
  ⋮
  public static void main(String[] args) {
    ⋮
  }
}

then we would need a directory/file structure like this:

-- myProject/
   |-- Project/
   |   |-- InventoryManager.java
   |   |-- Utilities/
   |   |   |-- Storage.java

Inside the Project directory would be the InventoryManager.java file and another directory named “Utilities”. The Storage.java file goes inside that Utilities directory.

Now, here’s the part that trips up many a Java programmer:

When you compile and execute code in Java packages, you must always do so from the directory at the top of the package structure. In our example, that would be myProject.

If you are configuring a Java project in an IDE, you must tell it that your source directory is that same directory.

Two more things to watch for:

The javac compilation command takes the path to a source code file.
The java execution command takes the name of the class containing the main function.

So the compilation and execution commands would be

cd myProject  # if we aren't already in there
javac -g Project/Utilities/Storage.java
javac -g Project/InventoryManager.java
java Project.InventoryManager

Notice that the javac compilation commands use ‘/’ within file paths because, well, that’s how we always write file paths. But the java execution command uses a period (.) because we aren’t giving a file path; we are naming a compiled class. The name of the Java class that started its life inside Project/InventoryManager.java is Project.InventoryManager, which means “the class named ‘InventoryManager’ inside the package named ‘Project’”.
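
If you find yourself mixing up the two forms, the conversion from the path you give javac to the name you give java can be sketched as follows. The class `RunName` and its method are hypothetical helpers, and the path is assumed to be relative to the top of the package structure.

```java
public class RunName {
    // Convert a source path such as "Project/InventoryManager.java"
    // into the class name "Project.InventoryManager".
    static String classNameFor(String sourcePath) {
        String suffix = ".java";
        String withoutExtension = sourcePath.endsWith(suffix)
            ? sourcePath.substring(0, sourcePath.length() - suffix.length())
            : sourcePath;
        return withoutExtension.replace('/', '.');
    }

    public static void main(String[] args) {
        System.out.println(classNameFor("Project/InventoryManager.java")); // Project.InventoryManager
    }
}
```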

5 The CLASSPATH

Java’s special rules about how to name the source code files and the directories where they live are aimed at one idea: any time our code mentions a class that isn’t in the file being compiled or executed, Java can use those rules to find the .java or .class files containing that class. So if our code makes use of a class named edu.odu.cs.Example, the compiler knows to look inside a directory edu/odu/cs/ for a file named either Example.java or Example.class.

But edu/odu/cs is a relative path. Where does Java start its search from? In other words, where does it look for the edu/ directory?

By default, when we compile with a javac command or run programs with java, Java searches for code in two places: first in our current working directory (.), and then, if it can’t find it there, inside its own pre-built libraries.

You can control everything except the final step of searching through the pre-built system libraries by manipulating the Java CLASSPATH, the list of directories where the Java programs search for code. This is done in one of two ways.

You can set an environment variable named CLASSPATH to the appropriate list. e.g.:
```
export CLASSPATH=.:/home/myLoginName/libraries/myJavaCode/
```
A CLASSPATH is a list of paths to directories, separated by ‘:’ (on Unix-like systems; Windows uses ‘;’ instead). So the above command would tell all subsequent javac and java commands to first look in my current working directory (.) and then to look in /home/myLoginName/libraries/myJavaCode/, where, presumably, I have placed some of my favorite Java classes for later use.

You can supply the same path list to a javac or java command via a -cp option, e.g.,

javac -g -cp .:/home/myLoginName/libraries/myJavaCode/ edu/odu/cs/InventoryManager.java
java -cp .:/home/myLoginName/libraries/myJavaCode/ edu.odu.cs.InventoryManager
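
To tie the pieces together, here is a rough sketch of how a CLASSPATH and a class name combine into the list of places where a compiled class might be found. The class `ClassPathSearch` and its method are invented for illustration. The sketch treats every entry as a directory, skips empty entries, and uses ‘/’ throughout, so it is a simplification of what the real tools do.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ClassPathSearch {
    // List, in search order, the files where the compiled class could live.
    static List<String> candidateFiles(String classPath, String separator,
                                       String qualifiedClassName) {
        String relative = qualifiedClassName.replace('.', '/') + ".class";
        List<String> candidates = new ArrayList<>();
        for (String entry : classPath.split(Pattern.quote(separator))) {
            if (entry.isEmpty()) {
                continue; // empty entries are ignored in this sketch
            }
            String directory = entry.endsWith("/") ? entry : entry + "/";
            candidates.add(directory + relative);
        }
        return candidates;
    }

    public static void main(String[] args) {
        // The running JVM reports the class path it was started with.
        String classPath = System.getProperty("java.class.path");
        for (String candidate :
                candidateFiles(classPath, File.pathSeparator, "edu.odu.cs.Example")) {
            System.out.println(candidate);
        }
    }
}
```

With the path list from the examples above, the candidates for edu.odu.cs.Example would be ./edu/odu/cs/Example.class followed by /home/myLoginName/libraries/myJavaCode/edu/odu/cs/Example.class.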