
Computer Architecture Laboratory

/ Tsuneo Ikedo / Professor
/ A. Y. Kondratyev / Associate Professor
/ Yamin Li / Associate Professor
/ Omar Hammami / Assistant Professor
/ Wanming Chu / Research Associate

The computer architecture laboratory is actively working on several projects and has achieved the following results for the year 1993:

GVIP.
The development and testing phases of the hardware and software for this graphics processor project have been completed, and the technology has been transferred to the Pioneer company, which will soon announce the processor as a commercial product.
Single-chip GVIP.
This is joint research with five companies; software and hardware development is in progress. A prototype is expected at the end of 1994.
AIZU Supercomputer.
This project is well advanced on the hardware side: the PE board has already been designed and fabricated, and the project has now entered the evaluation and test phase for the hardware (PCB, etc.). Software development is undertaken by the Distributed Parallel Processing Laboratory.
Memory Hierarchy Management.
Several algorithms have been designed for the management of various memory hierarchy configurations with a special emphasis on multiport data cache. Trace-driven simulations have been conducted and good results have been obtained using several typical benchmark traces. The hardware design of a multi-application multiport data cache has started and is in progress.
CAD systems for Asynchronous Design.
The aim of this project is to elucidate the relationship between asynchronous hardware implementation and behavioural specification of concurrent systems. There is a trend towards VLSI systems of ever increasing complexity, including safety-critical applications. The long-term goal of our basic research is to provide tools that will increase the confidence and productivity of digital logic designers in building such systems. Necessary and sufficient conditions for the absence of hazards in speed-independent circuits have been obtained for the first time. Based on these conditions, a synthesis method for hazard-free circuits was proposed. Quasi-polynomial algorithms for verifying the speed-independence of asynchronous circuits were developed.
PMSP.
Parallel Multithreaded Superscalar Processor (PMSP) architecture attempts to exploit both coarse-grain parallelism and fine-grain parallelism in a single processor environment. The simulation shows that a 2*2 PMSP (2-thread-slot, 2-instruction/slot/cycle) with eight functional units is capable of achieving up to a factor of four speed-up over a conventional RISC processor. We have proposed an instruction scheduling strategy, developed a performance prediction model, and finished the architecture description. Based on an embedded RISC processor which we have designed for a graphics system, we are now designing the detailed circuits of 2*2 PMSP.
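
The trace-driven simulation mentioned under Memory Hierarchy Management can be illustrated with a minimal sketch. It is not the laboratory's simulator: it models a plain single-port, set-associative cache with LRU replacement (the multiport aspect is not modelled), and the function name, cache geometry and address trace are all assumptions chosen for illustration.

```python
from collections import OrderedDict

def simulate_cache(trace, num_sets=4, ways=2, line_size=16):
    """Replay a list of byte addresses through a set-associative LRU cache.

    Returns (hits, misses). All parameters are illustrative assumptions.
    """
    # One OrderedDict per set; key order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = misses = 0
    for addr in trace:
        line = addr // line_size           # cache-line number
        index = line % num_sets            # set selected by the line number
        tag = line // num_sets             # remaining bits identify the line
        lines = sets[index]
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)         # mark as most recently used
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return hits, misses

# Hypothetical trace: two passes over a 128-byte array with a 4-byte stride.
trace = [a for _ in range(2) for a in range(0, 128, 4)]
hits, misses = simulate_cache(trace)
print(f"hits={hits} misses={misses} hit rate={hits / len(trace):.2f}")
```

Replaying the same trace with different values of `ways` or `num_sets` is the usual way such a simulator is used to compare cache configurations.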

Some of the projects in the Computer Architecture Laboratory are conducted in collaboration with other laboratories in the university or with companies. In addition, some SCCP projects are conducted with students in order to introduce them to research.

Refereed Journal Papers

Tsuneo Ikedo. A scalable high performance graphics processor GVIP. Visual Computer, 1994.
The GVIP (Geometric and TV Image Processor) graphics processor, which creates and synthesizes computer graphics and TV images and meets the requirements of multimedia systems, is described. The hardware modules which make up this graphics processor are: a 32-bit embedded RISC processor, a Phong and Gouraud shading processor, a texture mapping processor, a hidden surface removal processor, an HDTV video image processor, a BitBlt processor, an image processing module and an outline font fill generator. These hardware modules, fabricated using 0.8 micron CMOS standard cells, have been placed in three integrated circuit chips. The total number of gates used for one set of chips is approximately 350,000. Parallel link channels have been provided to manage I/O operations for multimedia devices with various data types and for three-dimensional image synthesis of computer graphics and TV pictures. The drawing performance achieved by one set of chips was 1.2 million polygons/sec. when hidden surface removal, texture mapping and Phong-shaded polygon rendering were carried out simultaneously. Sets of chips may be interconnected to form large scale MIMD structures. A MIMD system with a drawing performance of 10 million polygons/sec. was built.
M. Kishinevsky, A. Kondratyev, and A. Taubin. Specification and analysis of self-timed circuits. Journal of VLSI Signal Processing, 7(1):117-135, 1994.
Problems of self-timed behavior specification and verification are considered on the basis of event models. A new model, Change Diagrams (CD), based on two types of causal relations (AND, OR), is suggested. The notion of CD correctness is introduced, and the necessity and sufficiency of this notion for the implementation to be in the self-timed class is shown. An original approach to CD correctness verification is considered. The main advantage of the method is a simple (polynomial) algorithm of CD analysis.
M. Kishinevsky, A. Kondratyev, A. Taubin, and V. Varshavsky. Analysis and identification of speed-independent circuits on an event model. Formal Methods in System Design, 4(1):33-75, 1994.
The object of this paper is the analysis of asynchronous circuits for speed-independence or delay-insensitivity. The circuits are specified as a netlist of logic functions describing the components. The analysis is based on a derivation of an event specification of the circuit behavior in the form of a Signal Graph. Signal Graphs can be viewed either as a formalization of timing diagrams, or as a signal-interpreted version of Marked Graphs (a subclass of Petri Nets). The main advantage of this method is that a state explosion is avoided. Restoring an event specification of a circuit also helps to solve the behavior identification problem, i.e. to compare the obtained specification with the desired specification. We illustrate the method by means of some examples.
Yamin Li. An accelerating technique for computer systems. Microcomputer and applications, (3), 1993.
Enhancing the processing speed of existing systems becomes more important as the requirements for system performance increase. A hardware/software architecture for this purpose is proposed and evaluated in the paper.
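
The Marked Graph view of Signal Graphs mentioned in the Kishinevsky et al. abstract above can be illustrated with a short sketch of the standard marked-graph firing rule. This is not the analysis method of the paper: the function names and the four-phase handshake between two signals `a` and `b` are assumptions made for illustration. Because every place of a marked graph has exactly one input and one output transition, places are represented here simply as arcs between signal transitions.

```python
def enabled(marking, arcs, t):
    """A transition is enabled when every arc entering it holds a token."""
    return all(marking[(src, dst)] > 0 for (src, dst) in arcs if dst == t)

def fire(marking, arcs, t):
    """Fire t: take one token from each input arc, put one on each output arc."""
    if not enabled(marking, arcs, t):
        raise ValueError(f"{t} is not enabled")
    new = dict(marking)
    for (src, dst) in arcs:
        if dst == t:
            new[(src, dst)] -= 1
        if src == t:
            new[(src, dst)] += 1
    return new

# Hypothetical handshake cycle: a rises, b rises, a falls, b falls.
arcs = [("a+", "b+"), ("b+", "a-"), ("a-", "b-"), ("b-", "a+")]
marking = {arc: 0 for arc in arcs}
marking[("b-", "a+")] = 1  # initial token: a+ is the first enabled event

sequence = []
for _ in range(8):
    # Pick the first enabled transition (exactly one is enabled in this cycle).
    t = next(t for t in ["a+", "b+", "a-", "b-"] if enabled(marking, arcs, t))
    marking = fire(marking, arcs, t)
    sequence.append(t)
print(" ".join(sequence))
```

The event sequence is generated by moving tokens along arcs, without enumerating the binary states of the signals.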

Refereed Proceeding Papers

Omar Hammami. A concurrent hardware software management scheme for memory hierarchies. In IEEE, editor, IEEE International Symposium on Industrial Electronics, pages 368-373, New York, USA, May 1994. IEEE, IEEE Society.
In this paper, we propose to reduce execution time and to gain predictability by making use of a concurrent hardware software scheme for memory hierarchies. Making use of memory hierarchies will allow reducing memory access time, while concurrency will relax the memory bandwidth resource constraint. The software part of the scheme makes a static analysis of the real time application and generates a file containing special controller instructions. These instructions are generated and scheduled using artificial intelligence optimization techniques so as to assure an optimal concurrent management scheme of the memory hierarchy. The hardware part is composed of specially designed memory controllers connected to a dedicated bus which allows access to all the memory hierarchy levels. These controllers will execute the instructions associated with the application concurrently with the execution of the application on the microprocessor. Bus contention between the microprocessor executing the real time application and the controllers on the dedicated bus is avoided thanks to the schedule generated at compile time.
Omar Hammami. A novel cache management using the A* algorithm. In IEA/AIE, editor, The 7th International Conference on Industrial & Engineering Applications of Artificial Intelligence & Expert Systems, pages 533-539, USA, June 1994. IEA/AIE, Gordon and Breach Science Pub.
The increasing power of processors sets a challenging problem for compiler designers. They have to optimize the use of the resources required by a program during its execution to avoid losing the benefit of this power due to resource conflicts. These resources can be the registers, the functional units or the cache memory, which is a small memory that contains the most recently referenced data. The corresponding problems are known as the register allocation problem, the instruction scheduling problem and the cache management problem. In this paper, we propose the use of the A* algorithm for the data cache management problem and propose an admissible heuristic. The algorithm deals with the basic problem but constitutes the core of a family of algorithms.
Omar Hammami. A compile time data cache management algorithm. In IEEE, editor, IEEE TENCON'94, New York, USA, Aug. 1994. IEEE, IEEE Society.
The design and control of memory hierarchies greatly affect the performance of microprocessors. Hardware schemes have been proposed to enhance successfully the hit rate of instruction and data caches in various architectures. However, the increasing frequency of microprocessors makes hardware schemes insufficient due to their poor look ahead capability. Compile time schemes make use of the compile time information and of the flow analysis of the program to manage data caches with special hardware support. In this paper, we propose a compile time data cache management algorithm for uniprocessors and prove its optimality. This algorithm is a branch and bound like algorithm making use of heuristics.
A. Kondratyev, M. Kishinevsky, B. Lin, P. Vanbekbergen, and A. Yakovlev. Basic gate implementation of speed-independent circuits. In Proc. of the 31st Design Automation Conference, pages 56-62, June 1994.
Existing methods for synthesis of speed-independent circuits under unbounded delay model have difficulties in combining the generality of formal approach with the practicality of the implementation architectures used at the logic level. This paper presents a characteristic property of the state graph specification, called the Monotonous Cover requirement, implying its hazard-free implementation within the standard structure of a two-level SOP logic and a row of latches. The overall synthesis procedure ensures satisfiability of this condition by applying the generalised state assignment approach.
A. Yakovlev, M. Kishinevsky, A. Kondratyev, and L. Lavagno. OR causality: Modelling and hardware implementation. In Proc. of the 15th International Conference on Application and Theory of Petri Nets, June 1994.
Asynchronous circuits behave like concurrent programs implemented in hardware logic. The processes in such circuits are synchronised in accordance with the dynamic logical and causal conditions between switching events. The classical paradigm, easily represented in most process-oriented languages for concurrent systems modelling, is AND causality, which is often associated with a rendezvous synchronisation. In this paper we investigate a different, less known paradigm, called OR causality. We present a unified descriptive tool, called Causal Logic Net, which can equally handle both AND and OR-causality. A number of examples demonstrate the usefulness of this model in the synthesis of asynchronous control circuits.
A. Kondratyev, A. Taubin, and M. Kishinevsky. Self-timed formal design based on generalized behavioral specification. In Workshop on Design Automation, Moscow, Russia, July 1993.
A model generalizing the model of Change Diagrams (CD) is presented. The advantages of CDs over other assembler-type languages suggest the idea of topping the CD-based foundation with a high-level language intended for the behavioral design of self-timed circuits. One of the main objectives of such a language is to offer a formal tool for the behavioral specification, analysis and design of open circuits.
M. Kishinevsky, A. Kondratyev, and A. Taubin. Synthesis method in self-timed design. Decompositional approach. In ICVC'93 - 1993 International conference on VLSI and CAD, pages 324-327, September 1993.
A decomposition method for the synthesis of speed-independent circuits from Signal Transition Graphs (STG) is suggested. This method does not generate a state graph corresponding to the STG specification, i.e. a state explosion is avoided. The resulting circuit is derived by assembling circuits implementing projections of the initial STG on various groups of signals. Both the method and the resulting circuit are of linear complexity in the size of the initial STG. The main advantage of the decomposition method is that it automatically solves the Unique State Coding problem.
A. Kondratyev, A. Taubin, V. Varshavsky, M. Kishinevsky, and E. Pissaloux. Change diagram: A behavioral model for very high speed VLSI circuits / highly parallel systems. In Euromicro Workshop on Parallel and Distributed Processing. Proceedings, pages 220-226, Malaga, January 1994.
The paper proposes a formal model, named change diagram (CD), for the analysis and synthesis of very high speed complex VLSI circuits/systems. The CD allows for unambiguous expression of the concurrent activities not only of distributive processes, but also of semi-modular ones. The equivalence of CDs and Petri Nets is investigated too. Possible usage of CDs in the design of new generation simulators is outlined.
Xianrong Ma and Yamin Li. Branch consistency analysis of dynamic branch prediction. In Proceedings of Third International Conference for Young Computer Scientists, Beijing, July 1993.
Branch prediction is becoming more vital to delivering the potential performance of a wide-issue deep-pipelined architecture. Many prediction schemes have been proposed and their performance measured by simulation. In this paper, we propose an approach of branch consistency analysis from which we construct dynamic prediction functions on the basis of average branch consistent length (ACL) to evaluate the performance of branch prediction schemes.
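
The distinction between AND causality and OR causality drawn in the Yakovlev et al. abstract above can be made concrete with a small timing sketch. It is not the Causal Logic Net model of the paper: the function names, the fixed delay and the cause times are assumptions for illustration. Under AND causality an event waits for all of its causes; under OR causality the earliest cause is enough, as with the rising output of an OR gate when either input rises.

```python
def and_time(cause_times, delay=1.0):
    """AND causality: the event occurs only after all causes have occurred."""
    return max(cause_times) + delay

def or_time(cause_times, delay=1.0):
    """OR causality: the event is triggered by the earliest cause."""
    return min(cause_times) + delay

# Hypothetical arrival times of two causing events, in arbitrary time units.
causes = [3.0, 7.0]
t_and = and_time(causes)  # waits for the later cause, then adds the delay
t_or = or_time(causes)    # reacts to the earlier cause, then adds the delay
print(f"AND-caused event at t={t_and}, OR-caused event at t={t_or}")
```

With a single cause the two rules coincide, which is why the difference only shows up where several concurrent events converge on one.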

Books

M. Kishinevsky, A. Kondratyev, A. Taubin, and V. Varshavsky. Concurrent Hardware. The Theory and Practice of Self-Timed Design. J. Wiley & Sons, Chichester, New York, 1994.

Unpublished Textbooks

Sanli Li and Yamin Li. RISC Architecture and Instruction Level Parallel Processing. Tsinghua University Press, ISBN 7-302-01383-7, Beijing, 1993.

Technical Reports

O. Hammami. A compile time data cache management algorithm. 93-2-005, University of Aizu, December 1993.
O. Hammami and S. Ten. Instruction scheduling and cache management at the basic block level - algorithm and hardware support. 93-2-007, University of Aizu, December 1993.
T. Ikedo and N. Mirenkov. Aizu Supercomputer: A reality engine. 93-2/1-016, University of Aizu, November 1993.
Yamin Li, Tsuneo Ikedo, and Wanming Chu. Hardware organization of PMSP. 93-2-003, University of Aizu, December 1993.

Academic Activities

Tsuneo Ikedo, December 1993.
Attendance as a panelist at International TRON Research Associate Symposium.
Alex Yu. Kondratyev.
Participation in the University of Aizu Seminars on Self-Timing (together with Logic Design Lab., Computer Networks Lab. and Computer Arch. Lab.).
Alex Yu. Kondratyev.
Making presentations at Osaka and Kyoto universities.

Wanming Chu. Software development, April 1993.
1. A cache simulator for a GVIP system (a graphics system).
2. A FIFO simulator for a new GVIP system.
3. A texture mapping program for 3D perspective projection.
4. A distorted texture mapping program for 2D and 3D objects.
5. Bezier curve and surface filling.
6. A performance simulator for a multithreaded RISC processor.

Wanming Chu. Hardware design, April 1993.
Datapath and control circuit design of an embedded RISC processor for a single-chip graphics processor.



a-fujitu@edumng1.u-aizu.ac.jp
Fri Feb 10 09:19:38 JST 1995