Basic I/O: Commentary

Steven Zeil

Last modified: Sep 21, 2016

We’re only going to look at the first part of this tutorial at this point in the course: I/O Streams.

1 I/O Streams

Java’s concept of an I/O stream is not so different from that of C++. The biggest difference is that,

1.1 Byte Streams

1.2 Character Streams

The statement made on this page that “All character stream classes are descended from Reader and Writer.” is not really true and points to a source of confusion within the java.io package.
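Here is one example of the kind of exception meant here (our own example, not one from the tutorial). System.out is a PrintStream, whose print and println methods write characters and Strings, yet PrintStream descends from OutputStream rather than Writer. The following small sketch checks that with reflection. The helper names isByteStream and isWriter are made up for illustration.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.Writer;

// Checks where PrintStream (the class of System.out) sits in the java.io hierarchy.
public class PrintStreamHierarchy {
    // True if PrintStream is a subclass of OutputStream (the byte-stream family).
    static boolean isByteStream() {
        return OutputStream.class.isAssignableFrom(PrintStream.class);
    }

    // True if PrintStream is a subclass of Writer (the character-stream family).
    static boolean isWriter() {
        return Writer.class.isAssignableFrom(PrintStream.class);
    }

    public static void main(String[] args) {
        // System.out happily prints characters and Strings...
        System.out.println("System.out is a " + System.out.getClass().getName());
        // ...yet its class descends from OutputStream, not Writer.
        System.out.println("extends OutputStream? " + isByteStream()); // true
        System.out.println("extends Writer?       " + isWriter());     // false
    }
}
```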

There are really two collections of I/O classes in this API package. The older classes all have “Stream” in their names. A newer I/O framework, based on Reader and Writer, was added later. These classes were probably intended to replace the older InputStream and OutputStream classes, but that never quite happened. In fact, it’s possible to interface between the two (the InputStreamReader and OutputStreamWriter classes exist for exactly that purpose), as some of the examples in this tutorial will eventually show.
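The following is a minimal sketch of that bridging. It is our own illustration and uses in-memory byte arrays so it runs without any files. InputStreamReader turns a byte-oriented InputStream into a character-oriented Reader, and OutputStreamWriter sends a Writer's characters out as bytes on an OutputStream. The method names firstLine and encode are made up for this example. The same idea is behind the common idiom new BufferedReader(new InputStreamReader(System.in)) for reading lines of keyboard input.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class StreamBridge {
    // Wraps a byte-oriented InputStream in a character-oriented Reader
    // and returns the first line of text.
    static String firstLine(InputStream bytesIn) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(bytesIn, StandardCharsets.UTF_8));
        return reader.readLine();
    }

    // Writes text through a Writer layered on top of a byte-oriented
    // OutputStream, returning the bytes that came out the other end.
    static byte[] encode(String text) throws IOException {
        ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(bytesOut, StandardCharsets.UTF_8);
        writer.write(text);
        writer.flush(); // push buffered characters down into the byte stream
        return bytesOut.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode("hello\nworld\n");
        System.out.println(bytes.length);                               // 12
        System.out.println(firstLine(new ByteArrayInputStream(bytes))); // hello
    }
}
```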

Most Java programmers use Reader/Writer rather than InputStream/OutputStream when all other things are equal, but seldom hesitate to fall back to the Streams when convenient.

1.3 Buffered Streams

1.4 Scanning and Formatting

1.4.1 Scanning

The Scanner class (added in Java 5) was a very late, but very welcome, addition to the Java API. For many years, it was very frustrating to try to read numbers and other formatted data from Java input streams. You usually had to read an entire line of input into a String and then handle all of the formatting within that line on your own.

The Scanner class makes that all much easier to do.
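To make the contrast concrete, here is a sketch that adds up all the integers in some text, first in the older read-a-line-and-split style and then with Scanner. The helper names sumByLines and sumWithScanner are hypothetical. A StringReader and a plain String stand in for a real input file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Scanner;

public class SumInts {
    // Pre-Scanner style: read whole lines, then split and convert each piece ourselves.
    static int sumByLines(String text) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader(text));
        int sum = 0;
        String line;
        while ((line = in.readLine()) != null) {
            for (String piece : line.trim().split("\\s+")) {
                if (!piece.isEmpty()) { // a blank line splits into one empty piece
                    sum += Integer.parseInt(piece);
                }
            }
        }
        return sum;
    }

    // Scanner style: blanks and line breaks are skipped for us.
    static int sumWithScanner(String text) {
        Scanner in = new Scanner(text);
        int sum = 0;
        while (in.hasNextInt()) {
            sum += in.nextInt();
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        String data = "3  4\n\n   5\n";
        System.out.println(sumByLines(data));     // 12
        System.out.println(sumWithScanner(data)); // 12
    }
}
```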

One thing that I find counter-intuitive about Scanner is that it first splits the input into tokens, dividing it according to where the delimiters (whitespace) are, and only then lets you use the various nextWhatever() functions to process those tokens.
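A quick way to see this tokenizing behavior is to ask a Scanner for every token in a line of the coordinate data discussed below. The following is a sketch, and tokens is a made-up helper name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ShowTokens {
    // Returns every token the Scanner produces with its default (whitespace) delimiters.
    static List<String> tokens(String text) {
        Scanner in = new Scanner(text);
        List<String> result = new ArrayList<>();
        while (in.hasNext()) {
            result.add(in.next());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [(14,, 6), (15,5)]
        System.out.println(tokens("(14, 6)  (15,5)"));
    }
}
```

Notice that the parentheses and commas stay glued to whatever non-blank characters touch them.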

This can lead to some problems when you want to process part of a token. For example, I recently wanted to read some data consisting of (x,y) coordinates where the data appeared rather like this:

(14, 6)  (15,5)
( 7, 12)
(21,14)  (15,22) (155,76)

The file contains each point inside parentheses, with the x and y values separated by a comma. The use of blank spaces and line breaks is arbitrary.

Now, Scanner is usually exactly what you want when the amount of whitespace (blanks, new lines, etc.) is arbitrary. The problem here is that Scanner treats everything between whitespace as a single token. You and I look at something like “(14,” and perceive three tokens: a left parenthesis, a number, and a comma. So it’s tempting to think that we could process this input file with Java code like this:

public void readPoints (Scanner in, ArrayList<Point> points) {
   while (in.hasNext()) {
	  String leftParen = in.next();
	  int x = in.nextInt();
	  String comma  = in.next();
	  int y = in.nextInt();
	  String rightParen = in.next();
	  points.add(new Point(x,y));
   }
}  

This won’t work, however, because the first assignment will actually grab the whole token “(14,” into leftParen. Then nextInt() will try to treat “6)” as an integer to be placed into x, but will throw an InputMismatchException because “6)”, as a whole, is not a valid integer.
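The failure is easy to reproduce. The following sketch performs just the first two reads of the loop above on the sample input. The name replay is hypothetical. When nextInt() fails, the Scanner does not consume the offending token, so a program could still retrieve it with next().

```java
import java.util.InputMismatchException;
import java.util.Scanner;

public class FirstAttemptFails {
    // Replays the first two reads of the naive readPoints loop and reports what happens.
    static String replay(String text) {
        Scanner in = new Scanner(text);
        String leftParen = in.next();     // grabs the whole token "(14,"
        try {
            int x = in.nextInt();         // next token is "6)", which is not an int
            return "read x = " + x;
        } catch (InputMismatchException e) {
            return "leftParen = " + leftParen + ", then nextInt() failed";
        }
    }

    public static void main(String[] args) {
        // prints: leftParen = (14,, then nextInt() failed
        System.out.println(replay("(14, 6)  (15,5)"));
    }
}
```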

So Scanner has problems when we want to take tokens apart, piece by piece. If you look at the Scanner API documentation, you will see that it’s possible to give it specific patterns to look for. So a more sophisticated attempt to read this input might look like:

public void readPoints (Scanner in, ArrayList<Point> points) {
   while (in.hasNext("\\(")) {
	  String leftParen = in.next("\\(");
	  int x = in.nextInt();
	  String comma  = in.next(",");
	  int y = in.nextInt();
	  String rightParen = in.next("\\)");
	  points.add(new Point(x,y));
   }
}  

The patterns that can be given to next() and hasNext() are actually regular expressions, which we will cover later. For now, just accept that the three patterns shown match the left parenthesis, the comma, and the right parenthesis. (The backslashes are required for the parentheses because parentheses normally have a special meaning in regular expressions, and the “\” is an instruction to suppress that special meaning and simply treat what follows as an ordinary character. The backslash is doubled in the code because “\” is also an escape character inside Java string literals, so the literal "\\(" actually contains the two characters \( that the regular expression needs.)
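You can check the effect of the escape directly with java.util.regex.Pattern. In this sketch, an unescaped "(" is rejected because it opens a group that is never closed, while "\\(" compiles and matches a literal parenthesis. The helper name compiles is made up for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class EscapeParens {
    // True if the regular expression compiles without a syntax error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("("));               // false: "(" starts a group
        System.out.println(compiles("\\("));             // true: escaped, a literal "("
        System.out.println(Pattern.matches("\\(", "(")); // true
        System.out.println(Pattern.matches(",", ","));   // true: a comma needs no escape
    }
}
```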

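This new version, however, fares no better than the first. Both hasNext("\\(") and next("\\(") require the entire next token to match the pattern, and the first token, “(14,”, does not consist exclusively of the desired left parenthesis but has other characters as well. So the loop test fails immediately and nothing is read at all. (Even if we did get inside the loop, the assignment to leftParen would fail with an InputMismatchException for the same reason.)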
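The whole-token rule is easy to observe. In this sketch, hasNext("\\(") is false for the first line of the sample data, whose first token is “(14,”. It is true for the second line, which happens to begin with a lone “(” followed by a space. The name startsWithLoneParen is hypothetical.

```java
import java.util.Scanner;

public class WholeTokenMatch {
    // Scanner.hasNext(pattern) asks whether the *entire* next token matches.
    static boolean startsWithLoneParen(String text) {
        return new Scanner(text).hasNext("\\(");
    }

    public static void main(String[] args) {
        System.out.println(startsWithLoneParen("(14, 6)"));  // false: token is "(14,"
        System.out.println(startsWithLoneParen("( 7, 12)")); // true: token is "("
    }
}
```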

It is possible to salvage this approach by using the findWithinHorizon functions provided by Scanner. These search for patterns without breaking the input into tokens first. (A horizon of 0 means the search is not limited to any particular number of characters.) So, this would work:

public void readPoints (Scanner in, ArrayList<Point> points) {
   while (in.hasNext()) {
	  String leftParen = in.findWithinHorizon("\\(", 0);
	  String xs = in.findWithinHorizon("\\d+", 0);
	  int x = Integer.parseInt(xs);
	  String comma  = in.findWithinHorizon(",", 0);
	  String ys = in.findWithinHorizon("\\d+", 0);
	  int y = Integer.parseInt(ys);
	  String rightParen = in.findWithinHorizon("\\)", 0);
	  points.add(new Point(x,y));
   }
}

But this is, without a doubt, rather ugly.

A more clever way to solve this problem is to change the delimiters that Scanner uses to break things up. Anything that Scanner considers to be a delimiter gets ignored except as a way of separating the “real” data into tokens. The normal pattern for delimiters could be written as "[\\s]+", where the square brackets mean “match any one thing inside these”, the \\s is a special code for “any whitespace character”, and the plus sign means “at least one, but maybe more”. So this entire pattern matches one or more adjacent whitespace characters. If we add the parentheses and comma characters to this pattern:

public void readPoints (Scanner in, ArrayList<Point> points) {
   in.useDelimiter("[\\s\\(\\),]+");
   while (in.hasNext()) {
	  int x = in.nextInt();
	  int y = in.nextInt();
	  points.add(new Point(x,y));
   }
}  

then the original input

(14, 6)  (15,5)
( 7, 12)
(21,14)  (15,22) (155,76)

gets treated just as if it had been

14 6 15 5
7 12
21 14 15 22 155 76

In effect, the parentheses and commas get treated as blanks.
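Putting the delimiter version into a complete program gives something like the following sketch. It rests on two assumptions. First, the original code does not say which Point class it uses, so a minimal stand-in is defined here (java.awt.Point would also do). Second, readPoints has been adapted to return a List instead of filling in a parameter. As an aside, the default delimiter a fresh Scanner reports is \p{javaWhitespace}+ rather than literally [\s]+. The two agree on blanks, tabs, and line breaks. The helper name defaultDelimiter is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ReadPointsDemo {
    // Minimal stand-in for the Point class used in the text (an assumption).
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public String toString() {
            return "(" + x + "," + y + ")";
        }
    }

    // The delimiter pattern a Scanner uses before useDelimiter() is called.
    static String defaultDelimiter() {
        return new Scanner("").delimiter().pattern();
    }

    // Same technique as readPoints in the text: parentheses and commas become delimiters.
    static List<Point> readPoints(Scanner in) {
        List<Point> points = new ArrayList<>();
        in.useDelimiter("[\\s\\(\\),]+");
        while (in.hasNext()) {
            int x = in.nextInt();
            int y = in.nextInt();
            points.add(new Point(x, y));
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(defaultDelimiter()); // \p{javaWhitespace}+
        String input = "(14, 6)  (15,5)\n( 7, 12)\n(21,14)  (15,22) (155,76)\n";
        // Prints [(14,6), (15,5), (7,12), (21,14), (15,22), (155,76)]
        System.out.println(readPoints(new Scanner(input)));
    }
}
```

Leading delimiters (the very first “(”) are skipped by Scanner, and the trailing “)” and newline are all delimiter characters, so hasNext() becomes false right after the last number.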

1.4.2 Formatting

1.5 I/O from the Command Line

1.6 Data Streams

1.7 Object Streams

This might not make a whole lot of sense yet, but it’s really very cool. In Java, it’s an easy matter to write out data that, in memory, consists of multiple distinct objects pointing to one another, and to later read them in and get a collection of new objects that is isomorphic to (has the same interconnecting pointer structure as) the original objects.

To do that in C++ requires a tremendous amount of code and/or a deep understanding of the pointer structures in the data. And the resulting I/O code is usually fragile: even small changes to the data structure may force you to rewrite your I/O algorithms entirely.
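Here is a minimal sketch of that kind of round trip. It is our own example, and Node and roundTrip are hypothetical names. Two nodes that point at each other are written with an ObjectOutputStream and read back with an ObjectInputStream. The copy is a brand-new object, but its links form the same cycle. The requirement on display here is that the class implement java.io.Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ObjectStreamDemo {
    // A hypothetical linked node; implementing Serializable is what allows it to be written.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        Node next;

        Node(String name) {
            this.name = name;
        }
    }

    // Writes an object graph to bytes and reads back a new copy of it.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a cycle: a -> b -> a.
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a;

        Node copy = (Node) roundTrip(a);
        System.out.println(copy != a);              // true: a new object
        System.out.println(copy.next.name);         // b
        System.out.println(copy.next.next == copy); // true: the cycle is preserved
    }
}
```

No special code was needed to handle the cycle: the object streams notice that a has already been written and record a back-reference instead of writing it again.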

2 File I/O (Featuring NIO.2)

Stop! We’re not doing this section yet.