/ Tuneo Ikedo / Professor
/ A. Y. Kondratyev / Associate Professor
/ Yamin Li / Associate Professor
/ Omar Hammami / Assistant Professor
/ Wanming Chu / Research Associate
The Computer Architecture Laboratory is actively working on several projects and achieved the following results in 1993:
Some of the projects in the Computer Architecture Laboratory are conducted in collaboration with other laboratories in the university or with companies. In addition, some SCCP projects are conducted with students in order to introduce them to research.
Refereed Journal Papers
The GVIP (Geometric and TV Image Processor) graphics processor, which creates and synthesizes computer graphics and TV images and meets the requirements of multimedia systems, is described. The hardware modules that make up this graphics processor are: a 32-bit embedded RISC processor, a Phong and Gouraud shading processor, a texture mapping processor, a hidden surface removal processor, an HDTV video image processor, a BitBlt processor, an image processing module and an outline font fill generator. These hardware modules, fabricated using 0.8-micron CMOS standard cells, have been placed in three integrated circuit chips. The total number of gates used for one set of chips is approximately 350,000. Parallel link channels have been provided to manage I/O operations for multimedia devices with various data types and for three-dimensional image synthesis of computer graphics and TV pictures. The drawing performance achieved by one set of chips was 1.2 million polygons/sec when hidden surface removal, texture mapping and Phong-shaded polygon rendering were carried out simultaneously. Sets of chips may be interconnected to form large-scale MIMD structures. A MIMD system with a drawing performance of 10 million polygons/sec was built.
Problems of self-timed behavior specification and verification are considered on the basis of event models. A new model, Change Diagrams (CD), based on two types of causal relations (AND, OR), is suggested. The notion of CD correctness is introduced, and it is shown to be necessary and sufficient for the implementation to belong to the self-timed class. An original approach to CD correctness verification is considered. The main advantage of the method is a simple (polynomial) algorithm for CD analysis.
The object of this paper is the analysis of asynchronous circuits for speed-independence or delay-insensitivity. The circuits are specified as a netlist of logic functions describing the components. The analysis is based on deriving an event specification of the circuit behavior in the form of a Signal Graph. Signal Graphs can be viewed either as a formalization of timing diagrams or as a signal-interpreted version of Marked Graphs (a subclass of Petri Nets). The main advantage of this method is that state explosion is avoided. Restoring an event specification of a circuit also helps to solve the behavior identification problem, i.e. to compare the obtained specification with the desired specification. We illustrate the method by means of some examples.
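As a reading aid, the view of a Signal Graph as a signal-interpreted Marked Graph can be sketched with a small token game. This is only an illustrative sketch, not the analysis method of the paper: the two-signal handshake, the place names `p1` to `p4` and the helper functions `enabled` and `fire` are all assumptions made for this example.

```python
# Hypothetical example: a four-phase handshake a+ -> b+ -> a- -> b- -> a+
# modelled as a Marked Graph. Each transition is a signal change and each
# place (p1..p4) is a causal arc between two signal changes.
TRANSITIONS = {
    "a+": (["p4"], ["p1"]),  # (input places, output places)
    "b+": (["p1"], ["p2"]),
    "a-": (["p2"], ["p3"]),
    "b-": (["p3"], ["p4"]),
}
INITIAL = {"p1": 0, "p2": 0, "p3": 0, "p4": 1}


def enabled(marking, transitions):
    """Return the signal changes whose input places all hold a token."""
    return [t for t, (ins, _) in transitions.items()
            if all(marking[p] > 0 for p in ins)]


def fire(marking, transitions, t):
    """Fire transition t and return the new marking (input is not modified)."""
    ins, outs = transitions[t]
    if any(marking[p] == 0 for p in ins):
        raise ValueError(f"{t} is not enabled")
    new = dict(marking)
    for p in ins:
        new[p] -= 1
    for p in outs:
        new[p] += 1
    return new


if __name__ == "__main__":
    m = INITIAL
    for step in ["a+", "b+", "a-", "b-"]:
        print(step, "enabled set:", enabled(m, TRANSITIONS))
        m = fire(m, TRANSITIONS, step)
```

Playing the token game on the event graph follows the causal order of signal changes directly, which is the intuition behind working at the event level rather than enumerating every circuit state.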
Enhancing the processing speed of existing systems becomes more important as the requirements for system performance increase. A hardware/software architecture for this purpose is proposed and evaluated in the paper.
Refereed Proceedings Papers
In this paper, we propose to reduce execution time and to gain predictability by making use of a concurrent hardware/software scheme for memory hierarchies. Memory hierarchies reduce memory access time, while concurrency relaxes the memory bandwidth constraint. The software part of the scheme performs a static analysis of the real-time application and generates a file containing special controller instructions. These instructions are generated and scheduled using artificial intelligence optimization techniques so as to ensure an optimal concurrent management scheme for the memory hierarchy. The hardware part is composed of specially designed memory controllers connected to a dedicated bus, which allows access to all levels of the memory hierarchy. These controllers execute the instructions associated with the application concurrently with the execution of the application on the microprocessor. Bus contention between the microprocessor executing the real-time application and the controllers on the dedicated bus is avoided by the schedule generated at compile time.
The increasing power of processors sets a challenging problem for compiler designers. They have to optimize the use of the resources required by a program during its execution to avoid losing the benefit of this power to resource conflicts. These resources can be the registers, the functional units or the cache memory, which is a small memory that contains the most recently referenced data. The corresponding problems are commonly known as the register allocation problem, the instruction scheduling problem and the cache management problem. In this paper, we propose the use of the A* algorithm for the data cache management problem and propose an admissible heuristic. The algorithm deals with the basic problem but constitutes the core of a family of algorithms.
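To make the idea of A* search with an admissible heuristic concrete, here is a minimal sketch on a deliberately simplified cache model. It is not the algorithm or the heuristic of the paper. The assumptions are: a fully associative cache of `capacity` blocks, a reference string known in advance, a cost of one per miss, and a heuristic that counts the distinct future blocks not currently cached (each of them must miss at least once, so the estimate never overestimates). The function name `min_misses` is hypothetical.

```python
import heapq
from itertools import count


def min_misses(refs, capacity):
    """A* search for the minimum number of misses on a known reference string.

    State: (index of next reference, frozenset of cached blocks).
    Assumes capacity >= 1; returns None if no schedule exists.
    """
    def h(i, cache):
        # Admissible estimate: distinct future blocks that are not cached.
        return len(set(refs[i:]) - cache)

    start = (0, frozenset())
    tie = count()  # unique tie-breaker so heap entries never compare states
    frontier = [(h(*start), next(tie), 0, start)]
    best_g = {start: 0}
    while frontier:
        _, _, g, (i, cache) = heapq.heappop(frontier)
        if i == len(refs):
            return g  # all references served
        if g > best_g.get((i, cache), float("inf")):
            continue  # stale queue entry
        block = refs[i]
        if block in cache:
            successors = [(0, cache)]                      # hit
        elif len(cache) < capacity:
            successors = [(1, cache | {block})]            # miss, free slot
        else:
            successors = [(1, (cache - {victim}) | {block})
                          for victim in cache]             # miss, pick a victim
        for cost, next_cache in successors:
            nxt = (i + 1, next_cache)
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier,
                               (new_g + h(*nxt), next(tie), new_g, nxt))
    return None


if __name__ == "__main__":
    print(min_misses(["a", "b", "c", "a", "b", "d", "a"], 2))
```

In this toy model the search branches only on misses with a full cache, where a victim has to be chosen. The heuristic steers the search toward states that keep blocks still needed later.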
The design and control of memory hierarchies greatly affect the performance of microprocessors. Hardware schemes have been proposed that successfully enhance the hit rate of instruction and data caches in various architectures. However, the increasing clock frequency of microprocessors makes hardware schemes insufficient because of their poor look-ahead capability. Compile-time schemes use compile-time information and flow analysis of the program to manage data caches with special hardware support. In this paper, we propose a compile-time data cache management algorithm for uniprocessors and prove its optimality. The algorithm is a branch-and-bound-like algorithm that makes use of heuristics.
Existing methods for the synthesis of speed-independent circuits under the unbounded delay model have difficulty combining the generality of a formal approach with the practicality of the implementation architectures used at the logic level. This paper presents a characteristic property of the state graph specification, called the Monotonous Cover requirement, which implies its hazard-free implementation within the standard structure of two-level SOP logic and a row of latches. The overall synthesis procedure ensures that this condition is satisfied by applying the generalised state assignment approach.
Asynchronous circuits behave like concurrent programs implemented in hardware logic. The processes in such circuits are synchronised in accordance with the dynamic logical and causal conditions between switching events. The classical paradigm, easily represented in most process-oriented languages for concurrent systems modelling, is AND causality, which is often associated with a rendezvous synchronisation. In this paper we investigate a different, less known paradigm, called OR causality. We present a unified descriptive tool, called Causal Logic Net, which can equally handle both AND and OR-causality. A number of examples demonstrate the usefulness of this model in the synthesis of asynchronous control circuits.
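The difference between the two paradigms can be illustrated with a simple timing rule. This is only an illustrative sketch and not the Causal Logic Net formalism itself: the function `occurrence_time` and its fixed `delay` parameter are assumptions made for this example. Under AND causality an event waits for all of its causes (rendezvous), while under OR causality the first cause to arrive is enough.

```python
def occurrence_time(cause_times, delay, kind):
    """Earliest time at which an event can occur, given when its causes occurred.

    kind == "AND": the event needs every cause, so the latest one decides.
    kind == "OR":  any single cause suffices, so the earliest one decides.
    """
    if not cause_times:
        raise ValueError("an event needs at least one cause")
    if kind == "AND":
        return max(cause_times) + delay
    if kind == "OR":
        return min(cause_times) + delay
    raise ValueError(f"unknown causality kind: {kind}")


if __name__ == "__main__":
    causes = [3, 7]  # hypothetical occurrence times of two input events
    print("AND:", occurrence_time(causes, 1, "AND"))
    print("OR: ", occurrence_time(causes, 1, "OR"))
```

With the same two causes, the AND-caused event waits for the later input and the OR-caused event fires after the earlier one.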
A model generalizing Change Diagrams (CD) is presented. The advantages of CDs over other assembler-type languages suggest the idea of topping the CD-based foundation with a high-level language intended for the behavioral design of self-timed circuits. One of the main objectives of such a language is to offer a formal tool for the behavioral specification, analysis and design of open circuits.
A decomposition method for the synthesis of speed-independent circuits from Signal Transition Graphs (STG) is suggested. This method does not generate a state graph corresponding to the STG specification, i.e. state explosion is avoided. The resulting circuit is derived by assembling circuits that implement projections of the initial STG onto various groups of signals. Both the method and the resulting circuit are of linear complexity in the size of the initial STG. The main advantage of the decomposition method is that it automatically solves the Unique State Coding problem.
The paper proposes a formal model, named Change Diagram (CD), for the analysis and synthesis of very high speed, complex VLSI circuits and systems. The CD allows unambiguous expression of the concurrent activities not only of distributive processes but also of semi-modular ones. The equivalence of CDs and Petri Nets is also investigated. The possible use of CDs in the design of new-generation simulators is outlined.
Branch prediction is becoming more vital to delivering the potential performance of wide-issue, deeply pipelined architectures. Many prediction schemes have been proposed, and their performance is measured by simulation. In this paper, we propose an approach to branch consistency analysis, from which we construct dynamic prediction functions on the basis of average branch consistent length (ACL) to evaluate the performance of branch prediction schemes.
Books
Unpublished Textbooks
Technical Reports
Academic Activities
Attendance as a panelist at International TRON Research Associate Symposium.
Alex Yu. Kondratyev.
Participation in the University of Aizu Seminars on Self-Timing (together with Logic Design Lab., Computer Networks Lab. and Computer Arch. Lab.).
Alex Yu. Kondratyev.
Making presentations at Osaka and Kyoto universities.
Wanming Chu.
Software development,
April 1993.
Software development:
1. A cache simulator for a GVIP system (a graphics system)
2. A FIFO simulator for a new GVIP system
3. A texture mapping program for 3D perspective projection
4. A distorted texture mapping program for 2D and 3D objects
5. Bezier curve and surface filling
6. A performance simulator for a multithreaded RISC processor
Wanming Chu.
Hardware design,
April 1993.
Hardware design: datapath and control circuit design of an embedded RISC processor for a single-chip graphics processor.