Translating Java Code
Steven Zeil
Programming languages are generally translated in one of two ways:
- Compilation: source code is compiled to create an executable in “native” machine code, which can then be executed
  - Examples: FORTRAN, C, C++
- Interpretation: execute source code “directly”, translating “on the fly”
  - Examples: LISP, Basic, Python, Perl, shells
1 Translating Code
1.1 How Code is Compiled
Here is a rather typical structure for a compiled language (in this case, C++).
- Source code is compiled into object code (native machine code with some addresses in symbolic form)
- Object code files are linked to form an executable (replacing symbolic addresses with real ones)
- Machine code can then be executed, reading input data and producing output data
1.2 How Code is Interpreted
In a purely interpreted language (e.g., Python, Perl, LISP, early forms of Basic, command shells), there is no object code or executable in the native machine code. Instead, an interpreter decides what each statement of source code “means” and executes it immediately.
Pure interpretation can be convenient in terms of turn-around. If you want to make a quick change to a program, you can immediately execute it without needing to wait for it to compile.
On the other hand, pure interpretation can be agonizingly slow. If, for example, you have a statement inside a loop that repeats 10,000 times, not only is that statement performed 10,000 times but it also must be translated 10,000 times. Purely interpreted programs tend to execute much, much more slowly than their compiled counterparts.
This slowdown is so dramatic that there are actually relatively few “pure” interpreters in modern use. Most interpreted languages are run by first translating the source code text into a data structure that describes the program in a more easily processed form. That data structure is then interpreted to simulate the execution of the program.
Most modern Python interpreters use this kind of hybrid translation/interpretation approach, but the early translation step is entirely hidden behind the scenes.
Java does something similar, but the initial translation is obvious and goes further than in Python and other hybrid interpreters.
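The translate-once, interpret-many idea can be sketched in a few lines of Java. This is only an illustration, not how any real Python or Java implementation works: the tiny postfix expression language, the class name TranslateOnce, and the Op interface are all invented for this example. The text of the expression is translated once into a list of operations, and that list is then executed 10,000 times without ever looking at the text again.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TranslateOnce {
    // One pre-translated instruction (hypothetical, for illustration only).
    interface Op { void run(Deque<Integer> stack, int x); }

    // Translation step: done once, turns text into ready-to-run operations.
    static List<Op> translate(String postfix) {
        List<Op> program = new ArrayList<>();
        for (String token : postfix.trim().split("\\s+")) {
            switch (token) {
                case "x": program.add((s, x) -> s.push(x)); break;
                case "+": program.add((s, x) -> s.push(s.pop() + s.pop())); break;
                case "*": program.add((s, x) -> s.push(s.pop() * s.pop())); break;
                default:
                    int value = Integer.parseInt(token);   // a numeric constant
                    program.add((s, x) -> s.push(value));
            }
        }
        return program;
    }

    // Interpretation step: repeated as often as needed, with no text processing.
    static int execute(List<Op> program, int x) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (Op op : program) {
            op.run(stack, x);
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        List<Op> program = translate("x 2 * 1 +");   // translated once: 2*x + 1
        long total = 0;
        for (int x = 0; x < 10_000; x++) {
            total += execute(program, x);            // executed 10,000 times
        }
        System.out.println(total);                   // prints 100000000
    }
}
```

A pure interpreter would, in effect, call translate inside the loop as well, repeating the same text processing on every iteration.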
1.3 Translating Java
It’s common for people to refer to Java as an interpreted language, but Java actually uses a “hybrid” translation model:
- Source code in .java files is compiled into object code for an imaginary CPU, the Java Virtual Machine (JVM)
  - Stored in .class files
- A simulator for that imaginary CPU is used to interpret the .class code
  - Unlike “true” compiled code, Java .class files can be run on any CPU to which the simulator has been ported.
This hybrid approach allows Java code to be run on any machine that has a JVM program, regardless of what machine was used to compile the code. At the same time, the time penalty of interpretation is greatly reduced because the JVM, although it is technically an interpreter, is interpreting a very simple, low-level language and therefore adds relatively little overhead.
In recent years, a lot of effort has been put into reducing that overhead further by allowing the JVM to request “just in time” compilation of code so that portions of the program may be rendered into native code.
2 Project Structure
We have previously commented on the fact that:
- Source code for a public Java class must appear in a .java file whose name exactly matches the name of the class.
- Java classes are often grouped into nested packages, and each package name must be matched by a directory with the same name as the package.
When we compile Java code into .class files, the compiler will apply similar rules to the .class files that it produces:
- Compiled code for a public Java class will appear in a .class file whose name exactly matches the name of the class.
- If the source code was in a package, the .class files are also considered to belong to the same package, and each package name will be matched by a directory with the same name as the package.
2.1 Example
For example, suppose that we are working in a directory named myProject and that we had source code like this:
package Project.Utilities;
public class Storage {
⋮
and
package Project;
import Project.Utilities.Storage;
class InventoryManager {
⋮
public static void main (java.lang.String[] args) {
⋮
}
}
then we would need a directory/file structure like this:
-- myProject/
|-- Project/
| |-- InventoryManager.java
| |-- Utilities
| | |-- Storage.java
Inside the Project directory would be the InventoryManager.java file and another directory named “Utilities”. The Storage.java file goes inside that Utilities directory.
2.2 How does Java find your code?
The combination of rules relating Java code to the files that it inhabits means that the Java compiler can always find related source code files, and the Java virtual machine can always find related compiled code files.
Again, if we have
package Project.Utilities;
public class Storage {
⋮
and
package Project;
class InventoryManager {
⋮
public static void main (java.lang.String[] args) {
Project.Utilities.Storage warehouse = new Project.Utilities.Storage();
⋮
}
}
When the compiler encounters the class name Project.Utilities.Storage, it knows to look for the already compiled code for that class in the file Project/Utilities/Storage.class or, if it has not been compiled yet, to look for the source code in the file Project/Utilities/Storage.java.
This is very different from both C++ and Python, where we as programmers are required to tell the compiler what files to look for.
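The mapping from class name to file path is mechanical enough to write down. The sketch below is a hypothetical helper (the class ClassLocator and its method are invented here, and a real compiler does considerably more), but it shows the essential rule: replace each period with a directory separator and add the file extension.

```java
public class ClassLocator {
    // Turn a fully qualified class name into the relative path of its file.
    // Hypothetical helper for illustration; not part of any Java tool.
    static String relativePath(String qualifiedName, String extension) {
        return qualifiedName.replace('.', '/') + extension;
    }

    public static void main(String[] args) {
        String name = "Project.Utilities.Storage";
        System.out.println(relativePath(name, ".class")); // Project/Utilities/Storage.class
        System.out.println(relativePath(name, ".java"));  // Project/Utilities/Storage.java
    }
}
```

Note that the result is a relative path. Where the search for it starts is a separate question.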
2.2.1 If you come from C++
In C++, we might store a Storage class declaration in utilities/storage.h, looking like this:
namespace Project::Utilities {
class Storage {
⋮
};
};
and then use it like this:
#include "utilities/storage.h"
int main (int argc, char** argv) {
Project::Utilities::Storage warehouse;
⋮
}
The #include statement is our way of telling the compiler where to look for some source code that we, as programmers, happen to know will contain our Storage class.
In Java, we didn’t need to say any such thing. The Java compiler knows where to find the code because it can turn the name of the class into the path to the source code.
2.2.2 If you come from Python
Similarly, in Python we might choose to create a class in project/utilities/utils.py:
class Storage:
    def __init__(self):
        ⋮
and then use it like this:
import project.utilities.utils
warehouse = project.utilities.utils.Storage()
The import statement is our way of telling the Python interpreter where to look for some source code that we, as programmers, happen to know will contain our Storage class.
In Java, we didn’t need to say any such thing. The Java compiler knows where to find the code because it can turn the name of the class into the path to the source code.
2.3 Imports: Abbreviating Names
A name like Project.Utilities.Storage in Java or Python, or Project::Utilities::Storage in C++, is called a fully qualified name, meaning that it contains all of the information on where in the program structure we would look to find that name.
Typing everything as fully qualified names can be tedious. In Java, we can use only the final part of a name if we first import that name:
package Project;
import Project.Utilities.Storage;
class InventoryManager {
⋮
public static void main (java.lang.String[] args) {
Storage warehouse = new Storage(); // don't need to say Project.Utilities.Storage
⋮
}
}
It’s possible, though somewhat frowned upon, to import all names from a package:
package Project;
import Project.Utilities.*;
class InventoryManager {
⋮
public static void main (java.lang.String[] args) {
Storage warehouse = new Storage(); // don't need to say Project.Utilities.Storage
⋮
}
}
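A quick way to convince yourself that an import only abbreviates a name is to use a standard library class both ways. In this small sketch (ImportDemo and its two helper methods are made up for illustration), one method spells out java.util.ArrayList in full and the other relies on the import. Both produce objects of exactly the same class.

```java
import java.util.ArrayList;

public class ImportDemo {
    // Uses the fully qualified name; this would compile even without the import.
    static java.util.List<String> viaQualifiedName() {
        return new java.util.ArrayList<>();
    }

    // Uses the short name, which is legal only because of the import above.
    static java.util.List<String> viaShortName() {
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        // Both objects are instances of the very same class.
        System.out.println(viaQualifiedName().getClass() == viaShortName().getClass()); // true
        System.out.println(viaShortName().getClass().getName()); // java.util.ArrayList
    }
}
```

The import changes what we are allowed to type, not which class is used or where Java looks for it.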
2.3.1 If you come from C++
Don’t make the common mistake of thinking that Java import statements are like C++ #includes. The #include will make something available, but it does not shorten the name.
The C++ equivalent of Java’s import is the using statement.
The equivalent of
import Project.Utilities.Storage;
in C++ would be
using Project::Utilities::Storage; // Can henceforth refer to 'Storage' all by itself.
The equivalent of
import Project.Utilities.*;
in C++ would be
using namespace Project::Utilities;
You’ve probably seen this most commonly as
using namespace std;
2.3.2 If you come from Python
The equivalent of
import Project.Utilities.Storage;
in Python would be
from project.utilities.utils import Storage
which both loads the source code file and then says what name we want to use in its abbreviated form.
2.4 Source Directories and Binary Directories
Now, here’s the part that trips up many a new Java programmer:
- When you compile and execute code in Java packages, you must always do so from the directory at the top of the package structure.
- If you are configuring a Java project in an IDE, you must tell it that your source directory is that same directory.
For example, if we have this structure for our Java project:
-- myProject/
|-- Project/
| |-- InventoryManager.java
| |-- Utilities
| | |-- Storage.java
we do not cd into myProject/Project before issuing a Java compilation or execution command. Instead, we do all compilation and execution in myProject/.
There’s an important distinction here. Project and Utilities are package names, so those directories are actually part of the Java source code. myProject is not a package name; that directory is simply “infrastructure”, a place to keep our project.
Two more things to watch for:
- The javac compilation command takes the path to the source code file.
- The java execution command takes the name of the class containing the main function.
So the compilation and execution commands would be
cd myProject # if we aren't already in there
javac -g Project/Utilities/Storage.java
javac -g Project/InventoryManager.java
java Project.InventoryManager
Notice that the javac compilation commands use ‘/’ within file paths because, well, that’s how we always write file paths (unless you are in Windows, in which case you use ‘\’). But the java execution command uses a period (.) because we aren’t giving a file path; we are naming a compiled class. The name of the Java class that started its life inside Project/InventoryManager.java is Project.InventoryManager, which means “the class named ‘InventoryManager’ inside the package named ‘Project’”.
The above commands will leave us with our Java source and the compiled .class files intermingled:
-- myProject/
|-- Project/
| |-- InventoryManager.class
| |-- InventoryManager.java
| |-- Utilities
| | |-- Storage.class
| | |-- Storage.java
That’s often considered ugly. For example, if we wanted to clean up our project and get rid of the .class files, we would have to hunt through multiple package directories to get at all the class files. In Linux, we might do that like this:
find myProject -name '*.class' | xargs rm
(If there’s a good Windows equivalent, I don’t know it.)
It’s often considered better to place our source code into a designated source directory, usually called src/, and to ask the compiler to place our compiled binaries into a designated binary directory, often called bin/.
So we would start with a project structure like this:
-- myProject/
|-- src/
| |-- Project/
| | |-- InventoryManager.java
| | |-- Utilities
| | | |-- Storage.java
and arrange for our compilation to produce this:
-- myProject/
|-- bin/
| |-- Project/
| | |-- InventoryManager.class
| | |-- Utilities
| | | |-- Storage.class
|-- src/
| |-- Project/
| | |-- InventoryManager.java
| | |-- Utilities
| | | |-- Storage.java
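Working in myProject/, a compilation command such as javac -g -d bin -sourcepath src src/Project/InventoryManager.java writes the .class files under bin/ (older JDKs require that bin/ already exist), and java -cp bin Project.InventoryManager then runs them.
Among other things, this allows us to clean up our project by simply deleting the bin/ directory entirely. And that’s a much simpler single command in any operating system.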
When you configure your IDE for a Java project, it will likely recognize that all of your source code is inside src/, but that src itself is not a package name in any of your Java source code. It will know, therefore, that src is part of your project infrastructure and will adjust its compilation commands accordingly.
- The major exception to this is when setting up a new project where you do not yet have any .java files. In that case, the IDE will not know what to make of your directory structure, and you may need to use the various project settings menus in the IDE to tell it that you want src to be a source code directory rather than a package directory.
In advanced projects, you might actually wind up with more than one source code directory and more than one binary directory. For example, it is common to separate “production” code from “testing” code using separate directories.
3 The CLASSPATH
Java’s special rules about how to name the source code files and the directories where they live are aimed at one idea: any time our code mentions a class that isn’t in the file being compiled or executed, Java can use those rules to find the .java or .class files containing that class. So if our code makes use of a class named edu.odu.cs.Example, the compiler knows to look inside a directory edu/odu/cs/ for a file named either Example.java or Example.class.
But edu/odu/cs is a relative path. Where does Java start its search from? In other words, where does it look for the edu/ directory?
By default, when we compile with a javac command or run programs with java, Java searches for code in
1. our current working directory (.), and, if it can’t find it there,
2. the compiler’s own pre-built libraries.
You can control everything except the final step of searching through the pre-built system libraries by manipulating the Java CLASSPATH, the list of directories where the Java programs search for code. This is done in one of two ways.
- You can set an environment variable named CLASSPATH to the appropriate list, e.g.:
  export CLASSPATH=.:/home/myLoginName/libraries/myProject/src
  A CLASSPATH is a list of paths to directories, separated by ‘:’ (‘;’ on Windows). So the above command would tell all subsequent javac and java commands to first look in my current working directory (.) and then to look in /home/myLoginName/libraries/myProject/src, where, presumably, I have placed some of my favorite Java classes for later use.
- You can supply the same path list to a javac or java command via a -cp option, e.g.,
  javac -g -cp .:/home/myLoginName/libraries/myProject/src src/edu/odu/cs/InventoryManager.java
  java -cp .:/home/myLoginName/libraries/myProject/src edu.odu.cs.InventoryManager
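The search over the CLASSPATH can be pictured with a small sketch. This is a simplified model, not the actual algorithm used by javac or java: the class ClasspathSearch and its find method are invented for illustration, and the sketch handles only plain directories, ignoring the pre-built system libraries and everything else a real JVM supports. It walks the directories in order and stops at the first one that contains the expected .class file.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class ClasspathSearch {
    // Look through each classpath directory, in order, for the class file.
    // Hypothetical helper: a simplified model of the search, directories only.
    static Optional<Path> find(String classpath, String qualifiedName) {
        String relative = qualifiedName.replace('.', '/') + ".class";
        // File.pathSeparator is ":" on Linux and macOS, ";" on Windows.
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(entry).resolve(relative);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);   // first match wins
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The JVM exposes the classpath it was started with as a system property.
        String classpath = System.getProperty("java.class.path");
        System.out.println("Searching: " + classpath);
        System.out.println(find(classpath, "edu.odu.cs.Example")
                .map(Path::toString)
                .orElse("not found in any classpath directory"));
    }
}
```

Because the first match wins, the order of the directories in the list matters when two of them contain a class with the same fully qualified name.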
Luckily, if you are working in an IDE, you are unlikely to need to worry about your CLASSPATH. In most cases, the IDE will set that for you. But you may see it mentioned in error messages from the Java compiler or from your IDE if your project settings are messed up.
Alternatively, if you are working with either of the Java-friendly build manager tools, Gradle or Maven, these will issue your javac commands for you with the appropriate CLASSPATH settings.