The RASSP Digest - Vol. 2, 2nd. Qtr. 1995


Timing Insensitive Binary-to-Binary Translation (TIBBIT)

By Bryce Cogswell and Zary Segall


1. Introduction

The TIBBIT project is developing novel methods of applying binary-to-binary translation (BBT) technology to real-time and embedded applications. While BBT has been used commercially for translating workstation-class applications by companies such as DEC, HP, Tandem, NonStop and others, the technology has yet to be applied to the domain of real-time and embedded systems in which applications can be very tightly tied to the underlying hardware in terms of both functionality and timing. Translation methods developed under TIBBIT allow multiple real-time and embedded applications to be migrated from dedicated processors to newer, faster multiprocessing systems while ensuring that the hardware/software interfaces, and the timing of I/O events generated or processed by the applications, is kept equivalent to that of the original application and platform. A retargetable framework is employed which supports a wide variety of architectures including digital signal processors.

In the pursuit of increased performance at reduced cost, real-time and embedded applications may resort to ad hoc methods of enforcing the timing of operations, and there is little guarantee that event timing will be regulated by hardware alarms or a timer-driven scheduler. The result is that migration to a different processor with different timing can disrupt the careful balance of timing incorporated in the original design, leading to subtle bugs or complete failure on the target platform. Figure 1 diagrams the problem of ensuring equivalence of both application and hardware interactions.

The goals of the TIBBIT methodology are as follows:

2. Approach

The type of code that is of concern when migrating to a faster processor is that which implicitly uses the processor performance to regulate program timing, such as:

Read_Port();
...compute...
Read_Port();
...compute...
Read_Port();
...compute...


The minimum delay between consecutive reads of the port, which is satisfied on the source processor by the time spent performing intermediate computations, may or may not be satisfied after migration to a faster target.

Our approach to the problem is to precisely track and mimic the timing behavior of the code as it was when executing on the source processor. During the translation from the source to the target, a timing code is inserted into each block of instructions that describes the amount of time required to execute that block on the source processor. This timing information is analyzed as the target processor executes each block in turn, and is used to compute the time at which the current block on the target would be executed on the source. This forms a virtual clock which tracks time as it passes on the source. We call this the source clock. It contrasts with the target clock, which is the real, or wall-clock, time. Figure 2 shows how the code is augmented with timing information such that even in the presence of loops or conditional execution, the time spent executing the block on the source processor is known.

At regular intervals during execution the target processor compares the values of the source clock and the target clock and determines the difference between its progress and the progress the application would have achieved on the source processor. The result is used as feedback to the scheduler that runs the various applications that have been migrated to the target processor.

Scheduling on the target can be done two ways: To maximize the degree of timing equivalence, a dedicated target processor can be used; the application can be scheduled on the target under rate monotonic scheduling. Using RMS allows multiple TIBBIT translated and native applications to be executed concurrently.

3. Analysis

Given an application that a user wishes to binary translate to a target platform with a specified degree of timing equivalence, we wish to determine whether the translated application will meet the user-specified timing-equivalence requirements under worse-case conditions. The modeling of translated tasks is performed by considering the maximum amount of time required on the target to execute a code fragment requiring a given amount of time on the source processor, and then adding in the overhead of performing the TIBBIT scheduling.

Table 1 provides a summary of the parameters impacting TIBBIT schedulability, while Table 2 specifies the greatest amount by which timing on the target processor can become out of sync with the source. The precondition column specifies the condition that must hold for the task to be schedulable under TIBBIT, while the max behind and max ahead columns bound the maximum timing error that can occur.

4. Results

The algorithms and models developed under TIBBIT have been validated by translating a variety of applications developed for the Motorola M68000 Education computer to both Unix and PC platforms. Most of the applications are drawn from a mix of C and assembly language programs given as lab assignments for the undergraduate Real-Time Systems class at Carnegie-Mellon University, and represent systems whose implementation and timing is unknown to the user of the translator.

The timing equivalence of translated code has been measured as accurate as 80 microseconds in the short-term and 0.1% in the long term, with an overhead of about 20% additional processor utilization due to timing instrumentation.

An example of the abilities of the system is an application that reads a data set from the serial port, performs a simulation based on the data, and returns a single byte indicating whether the simulation was successful or not. A host program executes on another machine, communicating via the serial line, and records the time recorded for the simulation to complete for various data sets. This test highlights the problem of modeling the timing behavior of an application whose I/O timing is completely data dependent. The time required for each simulation varies according to the contents of the data set, and it is essentially impossible to determine how long it will last before performing it.

Figure 3 shows the execution time recorded by the host for a particular data set, in which the RMS period the translated application is scheduled with is varied from 100 to 100000 microseconds. 10 trials are run at each period, and the graph shows the average and the actual measured values. The ‘V’ shows the bounds on timing that are predicted by the model, while the line in the center gives the average timing of the trials.

Increasing the period decreases the processor utilization required on the target, since context switching and scheduling overhead is reduced. At the same time, however, the accuracy of the time equivalence is diminished. The graph demonstrates that the bounds predicted by the models are accurate over a range of scheduling periods.

The largest application translated to date is a real-time M68000-based operating system, implemented in a combination of C and assembly code, which allows one to execute one of 6 different applications embedded in it, each application spawning from 1 to 8 tasks that interact with each other, the O.S. and the user. The operating system contains a flexible scheduler that can be configured to use any of the common scheduling algorithms and provides support for semaphores, priority control, interrupt driven or polled I/O, etc.

This application was assigned as the final project of the Real-Time Systems class, and the applications that execute on top of it are designed to stress the system and make evident weaknesses in the performance and fairness of the real-time scheduler. TIBBIT translation of this application similarly stresses the ability of the target platform to maintain those same timing constraints, and when the degree of timing equivalence is relaxed beyond about 5 millisecond, many of the sample applications fail with obvious regularity.

5. Conclusions

The TIBBIT project has developed a means of binary translating real-time and embedded applications from slower to faster processors while maintaining the timing characteristics of the original host. The translation is performed by augmenting the translated code with timing information from the source processor, and using that information at run time to ensure that the timing of events corresponds with the original timing. The algorithms have been modeled and validated under a real implementation, and results demonstrate that timing equivalence can be maintained to within 80 microseconds or 0.1%.

The RASSP Digest - Vol. 2, 2nd. Qtr. 1995

95q2/news_7.html