Saturday, August 30, 2008

Image Loading in Potato

The Squeak Image Format
In order to understand the image loading process we first take a look at the image concept and the image format.
The concept of an image might sound strange for Java programmers, but has its advantages. Instead of having multiple class files containing algorithms and type description, everything resides in one file - the image. But the image is not just one mega class file, it is a snapshot of the memory of a running application. Thus it also contains all objects and data of the running application.

The picture shows the structure of a Squeak Image containing the objects of the application (blue) as well as meta data (green).



The objective of the image loading process is to read the bytes from the image file and reconstruct the objects they represent. The JSqueak approach to this involved a lot of manual byte handling. Potato now does this in a simpler (and more Java-like) way by using the tools available in the Java class library, e.g., the FileCacheImageInputStream class. (Despite the word "Image" in its name, this is a class from the Java library and has nothing to do with Squeak images.)

Handling Byte Order
When a Squeak image is saved to a file, it is written in either big-endian or little-endian format, depending on the architecture. Consequently, the byte order may have to be converted when the image file is loaded. Individual 32-bit words can be read with the help of the class FileCacheImageInputStream. It can automatically convert the byte order of the file to big-endian, which is Java's default byte order.

Here you can see the detection of the image byte order with the help of the Java library:
int MAGIC_NUMBER = 6502;

ByteOrder detectAndSetEndianess() {
    ByteOrder result = null;
    try {
        int firstWordinImage = super.readInt();

        if (firstWordinImage == MAGIC_NUMBER) {
            result = ByteOrder.BIG_ENDIAN; // Java default
        } else if (Integer.reverseBytes(firstWordinImage) == MAGIC_NUMBER) {
            result = ByteOrder.LITTLE_ENDIAN;
        }
        // ...

        super.seek(0); // reset stream
    } catch (IOException ex) { ... }
    return result;
}

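The first word in the image file is compared to a fixed value: 6502. If the word matches only after its bytes have been reversed, the image was written in little-endian order. In the original version of Potato, such words were assembled from four bytes manually, after which their byte order was reversed manually, too.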
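As a minimal, self-contained sketch of what can happen after detection, the following example applies the detected byte order to the stream so that subsequent reads are converted automatically. The class name ByteOrderDemo, the helper readFirstWord and the use of an in-memory MemoryCacheImageInputStream (instead of a file-based stream) are assumptions made for illustration; they are not taken from Potato.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteOrder;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

class ByteOrderDemo {
    static final int MAGIC_NUMBER = 6502;

    // Returns the byte order of the data, or null if the magic number is not found.
    static ByteOrder detectByteOrder(ImageInputStream in) throws IOException {
        in.seek(0);
        in.setByteOrder(ByteOrder.BIG_ENDIAN); // read the first word as Java would by default
        int firstWord = in.readInt();
        in.seek(0); // reset stream
        if (firstWord == MAGIC_NUMBER) {
            return ByteOrder.BIG_ENDIAN;
        }
        if (Integer.reverseBytes(firstWord) == MAGIC_NUMBER) {
            return ByteOrder.LITTLE_ENDIAN;
        }
        return null;
    }

    // Hypothetical helper: detects the byte order, applies it, and re-reads the first word.
    static int readFirstWord(byte[] imageBytes) {
        try (ImageInputStream in =
                 new MemoryCacheImageInputStream(new ByteArrayInputStream(imageBytes))) {
            ByteOrder order = detectByteOrder(in);
            if (order == null) {
                throw new IOException("magic number not found");
            }
            in.setByteOrder(order); // from here on, readInt() converts automatically
            return in.readInt();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        byte[] littleEndian = {0x66, 0x19, 0x00, 0x00}; // 6502 stored least significant byte first
        byte[] bigEndian = {0x00, 0x00, 0x19, 0x66};    // 6502 stored most significant byte first
        System.out.println(readFirstWord(littleEndian));
        System.out.println(readFirstWord(bigEndian));
    }
}
```

Both calls yield the same value, because the stream takes care of the conversion once its byte order has been set.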

In particular, when reading the image, one has to take into account that certain binary data, such as byte code and strings, are sequences of bytes rather than words, so their byte order must not be inverted during byte order conversion. Since the adjustment to the Java byte order is, for reasons of simplicity, initially applied to all content, the byte order of such data must be restored later. This is done in the method decodeBytes of the class SqueakObject. Whether this step is required for a particular object can be decided on the basis of the format field of the object header: it is required when the value of the field is greater than or equal to eight.
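The restoring step can be illustrated with a small sketch. It is not the actual decodeBytes method; the class ByteDataDemo and the helpers restoreBytes and needsByteRestore are hypothetical names, and the sketch simply writes the already converted words back in the byte order of the image to recover the original byte sequence.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class ByteDataDemo {
    // Objects whose format field is >= 8 hold raw bytes (e.g., strings, byte code).
    static boolean needsByteRestore(int format) {
        return format >= 8;
    }

    // Hypothetical helper: writes the converted words back in the image's byte order,
    // which yields the bytes exactly as they were stored in the file.
    static byte[] restoreBytes(int[] words, ByteOrder imageOrder) {
        ByteBuffer buffer = ByteBuffer.allocate(words.length * 4).order(imageOrder);
        for (int word : words) {
            buffer.putInt(word);
        }
        return buffer.array();
    }

    public static void main(String[] args) {
        // The file bytes 'S', 'q', 'u', 'e' after being read as one little-endian word.
        int[] words = {0x65757153};
        byte[] bytes = restoreBytes(words, ByteOrder.LITTLE_ENDIAN);
        System.out.println(new String(bytes, StandardCharsets.US_ASCII));
    }
}
```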

Extraction of Objects
While studying the procedure that extracts the individual elements from the Squeak image file, we revised and simplified the code. The classes SqueakImageInputStream, SqueakImageHeader and SqueakObjectHeader were introduced.


The class SqueakImageInputStream has methods for reading complex data structures from the image file. The method readImageHeader reads all values from the image header and the methods readSqueakObject and readSqueakObjectHeader read complete objects from the image.


Squeak Object Layout
A Squeak object is represented by a header describing the object and a body part containing the actual object data. The header may consist of one to three header words, where the rightmost (mandatory) header word contains information on the format of the object.
The header of a Squeak object thus consists of up to three fields. The first two fields are optional and hold the size and the class identification in case these values are too big to be stored in the small fields of the standard header. We implement this with a switch statement in which the case blocks for the optional headers do not end with a break statement. Execution therefore falls through, and the following header fields are read, too.

The processing of the different optional headers:
switch (headerType) {
    case HeaderTypeSizeAndClass:
        readSizeHeader(currentHeaderWord);
        currentHeaderWord = imageInputStream.readInt();
        // no break: fall through to read the class header
    case HeaderTypeClass:
        readClassHeader(currentHeaderWord);
        currentHeaderWord = imageInputStream.readInt();
        // no break: fall through to read the base header
    case HeaderTypeShort:
        readBaseHeader(currentHeaderWord);
        break;
    default:
        throw new RuntimeException("unknown header");
}
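To make the effect of the missing break statements visible, here is a small runnable sketch that records which headers would be read for each header type. The class HeaderFallThroughDemo, the method headersRead and the numeric constants are hypothetical; the values Potato actually uses are not shown here.

```java
import java.util.ArrayList;
import java.util.List;

class HeaderFallThroughDemo {
    // Hypothetical constant values, chosen only for this illustration.
    static final int HEADER_TYPE_SIZE_AND_CLASS = 0;
    static final int HEADER_TYPE_CLASS = 1;
    static final int HEADER_TYPE_SHORT = 3;

    // Returns the names of the header words that are read for the given type, in order.
    static List<String> headersRead(int headerType) {
        List<String> read = new ArrayList<>();
        switch (headerType) {
            case HEADER_TYPE_SIZE_AND_CLASS:
                read.add("size");
                // no break: fall through
            case HEADER_TYPE_CLASS:
                read.add("class");
                // no break: fall through
            case HEADER_TYPE_SHORT:
                read.add("base");
                break;
            default:
                throw new IllegalArgumentException("unknown header type " + headerType);
        }
        return read;
    }

    public static void main(String[] args) {
        System.out.println(headersRead(HEADER_TYPE_SIZE_AND_CLASS));
        System.out.println(headersRead(HEADER_TYPE_CLASS));
        System.out.println(headersRead(HEADER_TYPE_SHORT));
    }
}
```

Entering the switch at a later case skips the optional headers, while the mandatory base header is always read last.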

In order to deal with the complex object header structure, we introduced the class SqueakObjectHeader. The class contains fields for all header elements (e.g., object size and format) as well as utility methods for reading the header directly from a SqueakImageInputStream. After reading the object header, the rest of the object is read from the file and decoded into a SqueakObject instance according to the type indicated by the format field in the header.
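How such header elements might be extracted from the mandatory header word can be sketched with a few shifts and masks. The bit positions below are an assumption based on the commonly documented 32-bit Squeak object format (2 bits header type, 6 bits size in words, 4 bits format, 5 bits compact class index, 12 bits hash); they are not stated in this post, and the class BaseHeaderDemo is hypothetical rather than the actual SqueakObjectHeader.

```java
class BaseHeaderDemo {
    // Assumed layout of the base header word, lowest bits first.
    static int headerType(int word)        { return word & 0x3; }
    static int sizeInWords(int word)       { return (word >>> 2) & 0x3F; }
    static int format(int word)            { return (word >>> 8) & 0xF; }
    static int compactClassIndex(int word) { return (word >>> 12) & 0x1F; }
    static int hash(int word)              { return (word >>> 17) & 0xFFF; }

    public static void main(String[] args) {
        // Build a sample word from known field values, then decode it again.
        int word = (5 << 17) | (11 << 12) | (8 << 8) | (3 << 2) | 3;
        System.out.println("type=" + headerType(word)
                + " size=" + sizeInWords(word)
                + " format=" + format(word)
                + " class=" + compactClassIndex(word)
                + " hash=" + hash(word));
    }
}
```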