Next: Computer Solid State Up: Department of Computer Previous: Department of Computer

Computer Architecture Laboratory


/ Tuneo Ikedo / Professor
/ Michael Kishinevsky / Professor
/ Li San Li / Visiting Professor
/ Robert H. Fujii / Associate Professor
/ A. Y. Kondratyev / Associate Professor
/ Yamin Li / Associate Professor
/ Wanming Chu / Research Associate

Computer Architectrue Laboratory is organized with 7 faculty members and one visiting researcher. The followings re the summary of each members.

  1. Prof. T. Ikedo:

    Multimedia Processor Architecture: Multimedia processor has been developed since 1995, which implements font, graphics, audio, video processing in a single chip of 10 million gates. we focused on the architecture design of computer graphics renderer in fine grain processing in 1996 and 30Gflop reconfigurable multimedia accelerator (coarse grain processing). The development of reconfigurable accelerator is a succession of Aizu supercomputer project. 4 graduate students and 8 undergraduate students join in this research.

  2. Profs. Y. Li and W. Chu:

    1. Research: (1) Parallel Multithreaded Architecture We have examined various design issues of the parallel multithreaded architecture. A prototype of parallel multithreaded processor was designed. We are working on the performance evaluation in order to enhance the the computation capabilities of the processor. (2) Computer Arithmetic Algorithms and Implementations We have examined latency, throughput and complexity for various computer arithmetic algorithms and developed two new algorithms. The hardware designs are also finished and prototyped with FPGAs.

    2. Teaching: We taught Computer Architecture (Li and Chu), Computer Organization I and Organization II (Li), and the laboratory of Logic Circuit Design (Chu). We also developed and updated two processor design projects: Sim2 processor design for the laboratory of Computer Architecture and Aizup pipeline processor design and implementation on FPGA for the laboratory of Computer Organization I. All of the teaching notes and laboratories handouts are prepared and available at Li's homepage, http://www.u-aizu.ac.jp/~yamin/.

  3. Profs. M. Kishinevsky and A. Kondratyev:

    Design Automation of Concurrenct and Asynchronous Systems: We have reached further progress in research in automation of asynchronous design, embedded reactive systems and models of concurrency. In a few research directions we continued close cooperation with Prof. A. Taubin (Computer Education Lab.) and with research groups in Cadence Berkeley Labs. (USA), Politecnico di Torino (Italy), University Politecnica de Catalunya (Barcelona, Spain) and University of Newcastle upon Tyne (England).

    Research results: 1. Embedded systems and models of concurrency: (1). We have developed a new model, called Place Chart Nets (PCN). It allows the modelling of both asynchronicity and exception handling (preemption). PCNs specify a system behavior using partial orders. PCNs have a notion of hierarchy, which is determined by preemption. PCNs is a non-trivial generalization of classical PNs, in the sense that (1) for the finite (bounded) case modeling a PCN may require an exponentially larger PN, (2) for the infinite (unbounded) case a class of PCN languages properly includes a class of PN languages and (3) k-boundedness of PCNs is decidable. (2). We developed a method for synthesis of PCNs starting from labeled transition systems. We considered applications of PCNs for design of embedded reactive systems in hardware/software codesign framework. 2. Technology mapping of speed-independent circuits: We have reached significant progress in solving a problem of technology mapping for speed-independent circuits based on two different techniques: (1). Algebraic factorization: the proposed method performs both combinational (inserting new gates) and sequential (inserting new memory elements) decomposition of complex gates in a given standard cell library, while preserving original behaviour and speed-independence. (2). Boolean decomposition: the proposed method iteratively performs Boolean decomposition of each complex gate using Boolean relations, as opposed to the less powerful algebraic factorization approach used in previous methods. After logic decomposition, the overall library matching and optimization is carried out. 3. Testing of path delay faults: We provided effective procedures to solve the initialization and the test pattern generation problems for the path delay fault testing of asynchronous circuits. Experimental results shows that a high level of path delay fault testability can be achieved with partial scan.

    We have presented a few Invited tutorials at the Summer School, International Conference, and Fujitsu Labs.

    Teaching: we taught logic design, computer architecture in the undegradute school and "Synthesis and Optimization of Digital Circuits" in the graduate school. We have lead graduation reserch projects.

  4. Prof. Robert H. Fujii:

    This year's activities have included the design of a microcontroller controlled motorized wheelchair prototype, the design of a fuzzy logic controller using Verilog, and the design of a high speed pipeline control unit using Verilog. Implementation of the motorized wheelchair prototype into a real system which can be ridden by people with disabilities is being carried out. Analog circuit designs of various modules will be forthcoming.

  5. Prof. Jianhua Ma:

    My researches in 1996 are devoted to improving algorithms of the Truga001 graphics chip and modeling multimedia hyperworld. A new circuit combined Phong shading with bump mapping has been developed. Several bump-mapped shading pictures have been produced to test the developed algorithm and circuit. The interpolation method has been proposed to improve bump-mapped shading effects. Texture mapping and video mapping performance of the Truga001 has been simulated and evaluated. The hyperworld is an integration of various interaction worlds in different time, space and reality. Our research starts from basic features of the hyperworld, composition and models of an one-to-many interaction system. We are now focusing on developing the prototype of an educational hyperworld system, called Cheer.


Refereed Journal Papers

  1. A. Kondratyev and M. Kishinevsky and A. Taubin and J. Cortadella and L. Lavagno, The Use of Petri nets for the design and verification of asynchronous circuits and systems. Journal of Circuits, Systems, and Computers, vol.8, no.1, p.67--118, 1998.

    Petri nets are a powerful formalism for modeling concurrent systems. They are capable of implicitly describing a vast state space by a succinct representation which gracefully captures the notions of causality, concurrency and conflict between events. Petri nets have also been chosen by many authors as a formalism to describe the behavior of asynchronous circuits by interpreting the events as signal transitions, thus coining the term Signal Transition Graph. A design framework for asynchronous systems involves three main aspects: formal specification, verification and synthesis. In this paper we review the main techniques we have used to cover these aspects in recent years, with a special focus on asynchronous circuits.

  2. A. Taubin, M. Kishinevsky, and A. Kondratyev., Deadlock prevention using petri nets and their unfoldings. International Journal of Advanced Manufacturing Technology, vol.14, no.10, p.750--759, 1998.

    Unfoldings of Petri Nets (PN) provide a method for analysis of concurrent systems without restoring the state space of a system. This allows one to overcome the ``state explosion'' problem while dealing with Petri Nets. Many properties of the initial PN (boundedness, safety, persistency and hazards) can be checked on-the-fly by constructing the unfolding. This paper presents techniques for deadlock prevention based on unfoldings of PN, refereed.

  3. A. Kondratyev, M. Kishinevsky, A. Taubin, and S. Ten., Analysis of Petri nets by ordering relations in reduced unfoldings. Formal Methods in System Design, vol.12, no.1, p.5--38, 1998.

    This paper suggests a way for Petri Net analysis by checking the ordering relations between places and transitions. The method is based on unfolding the original net into an equivalent acyclic description. In an unfolding the ordering relations can be determined directly by the structure of an underlying graph. We improved on the previously known cutoff criterion for truncating the unfolding. No restrictions are imposed on the class of general PNs. The new criterion significantly reduces the size of unfolding obtained by PN.

  4. Alex Kondratyev and Michael Kishinevsky and Alexandre Yakovlev, Hazard-free Implementation of Speed-Independent Circuits. p.749--771, IEEE Transactions on Computer-Aided Design, vol.17, no.9, 1998.

    This paper develops a theoretical framework for the hazard-free gate-level implementation of speed-independent circuits specified by event-based models, such as Signal Transition Graphs. It presents sufficient conditions, called the generalized Monotonous Cover requirements, for a hazard-free circuit to be built within a standard implementation structure. This structure consists of two-level simple-gate combinational logic and a row of latches, either a C-element or a \RS/-latch. A set of semantic-preserving transformations is defined that can be applied to an original behavioural description of the circuit, so as to produce its specification in the form that satisfies the Monotonous Cover requirement. The main result of the paper is therefore twofold: (1) the proof that any speed-independent behavior can be implemented at the gate level without hazards, and (2) an efficient method for constructing such an implementation.

  5. A. Kondratyev and J. Cortadella and M. Kishinevsky and L. Lavagno and A. Yakovlev, Logic Decomposition of Speed-Independent Circuits. Proceedings of the IEEE, vol.87, no.2, 1999.

    Logic decomposition is a well-known problem in logic synthesis, but it poses new challenges when targeted to speed-independent circuits. The decomposition of a gate into smaller gates not only must preserve the functional correctness of a circuit but also speed-independence, i.e. hazard-freedom under unbounded gate delays. This paper presents a new method for logic decomposition of speed-independent circuits that solves the problem in two major steps: 1) logic decomposition of complex gates and 2) insertion of new state signals that preserve hazard-freedom. The method is shown to be more general than previous approaches and its effectiveness is evaluated by experiments on a set of benchmarks.

  6. Sanli Li and Yamin Li, New Processor Architecture with Extremely Large Scale Integration(ELSI). Mini-Micro Systems, vol.18, no.12, p.1--7, 1997.

    Nowadays, the Computer Architecture designers are discussing about the next generation processor architecture and the future computer technology. With the rapid development of Micro-electronics, it is possible to integrate One Billions transistors on one chip. Such Extremely Large Scale Integration(ELSI) extends a wide area for computer architecture designers to construct the wholly new kinds of processor architecture. In this paper, it surveys four kinds of so-called ``One Billion Transistors on Chip'', i.e. High Performance Uniprocessor(UNIPROC), Simultaneous Multi-Threaded Processor(SMT), Multiprocessor on one chip(CMP) and Intelligent RAM(IRAM). These four kinds of processor architecture fully utilize the Instruction Level Parallelism ILP and Thread Level Parallelism, fully utilize the high bandwidth of data transfer on chip, or fully utilize the high bandwidth between Memory/Processor on chip. These new processor architectures would bring the peak performance reaching 16GFLOPs per chip, it must have the deep influences on the future computer technology. Processor designers should take sufficient consideration of some special issues brought by the micro-electronics implementations for the extremely large scale integration chip.

  7. Yamin Li, Parallel Multithreaded Architecture - A New Processor Architecture. Mini-Micro Systems, vol.18, no.12, p.8--13, 1997.

    This paper introduces a new processor architecture, called ``Parallel Multithreaded Architecture (PMA)''. A PMA processor consists of multiple ``logical processors''. Multiple pipelined functional units are provided and shared by the logical processors. The PMA processor tries to issue multiple instructions from multiple instruction threads in every clock cycle. Compared to Single-Chip Multiprocessors, the PMA processor can exploit both the Instruction Level Parallelism (ILP) and Thread Level Parallelism (TLP), and can efficiently use the large amount of functional units. A possible PMA processor implementation and the issues of software supporting are also discussed.

  8. Sanli Li, Yamin Li, Heng Liao and Wanming Chu, THAZ-Net: An Interconnection Network for Large Scale Cluster Computing. Australian Computer Science Communications, vol.20, no.4, p.123-132, 1998.

    An Interconnection Network (IN), called THAZ-Net, for Networked Parallel Computing (NPC) systems is fully described: the network media, network routing and network interface adapter are considered. The design of THAZ-Net employs circulating priority scheduling, cut-through and data partitioning techniques similar to the FLIT of Worm-Hole Routing used in MPPs. The design of the Network Interface Adaptor demonstrates the feasibility of constructing a NPC system. The basic idea of THAZ-Net is to combine the IN technology of NPCs and MPPs, providing an IN for building an NPC with a considerable proportion of the computing nodes commercially available. We term this ``NPC with moderate communication distance'' (NPC/MOCOD). An NPC/MOCOD offers the high peak performance of MPP and the flexibility which enables easy construction and use for NPC.

Refereed Proceeding Papers

  1. S. Fujino and R. H. Fujii, VLSI Design of a RISC Processor Using Parthenon and Verilog Hardware Description Languages. Tenth Parthenon Research Conference, p.59 - 62, Parthenon Research Association, Parthenon Research Association, Kyoto, Japan, July 1997.

    The design of a 32-bit RISC processor carried out using Verilog-Synergy and PARTHENON is described. Semantic, procedural, and optimization differences are analyzed and utilized to take advantages of the strengths of the two hardware description languages.

  2. R. H. Fujii, Microelectronic Systems Design Educational Challenge. 1997 IEEE International Conference on Microelectronic Systems Education, p.91 - 92, IEEE, Arlington, Virginia, U.S. A. July 1997.

    Microelectronic systems design education at Japanese Universitites and the support provided by industry and government are described.

  3. R. H. Fujii and Y. Honda, VLSI Computer System for Motorized Wheelchair. 12th Japanese Conference on Advancement of Rehabilitation Technology, Japan Rehabilitation Engineering Planning Committee, p.61 - 64, Japan Rehabilitation Engineering Society, Japan Rehabilitation Engineering Society, Sasebo, Japan. August 1997.

    The design of a prototype motorized wheelchair controlled by a VLSI microcontroller and various sensors is described. Its performance on various terrains and courses is decsribed.

  4. L. M. Schmitt, C. L. Nehaniv, and R. H. Fujii, Linear Analysis of Genetic Algorithms. International Workshop on Mathematical and Computational Biology, Computational Morphogenesis, Hierarchical Complexity, and Digital Evolution, p.39, University of Aizu, Aizu- Wakamatsu, Fukushima, Japan. October 1997.

    A survey of recent theoretical results on convergence of genetic algorithms, based upon spectral analysis of Markov matrices for the distinct phases of a genetic algorithm is presented.

  5. M.Kishinevsky, J. Cortadella, A. Kondratyev, L.Lavagno, A.Taubin, A. Yakovlev, Coupling Asynchrony and interrupts: Place Chart Nets and their Synthesis. Proc. of International Conference on Application and Theory of Petri Nets, p.328--347, Springer, Toulouse, June 1997.

    A model, called Place Chart Nets (PCN), is presented. It allows the modelling of both asynchronicity and exception handling (preemption). Contrary to State Charts and other reactive models, which are inherently synchronous, PCNs specify a system behavior using partial orders. Contrary to Petri nets, PCNs have a notion of hierarchy. Contrary to other hierarchical models based on Petri net extensions, the hierarchy in PCNs is determined by preemption. We present a method for synthesis of safe PCNs from transition systems, which generalizes the theory of regions previously developed for PCNs.

  6. M. Kishinevsky and A. Kondratyev and L. Lavagno and A. Saldanha and A. Taubin, Delay Fault Testing of Asynchronous Sequential Circuits. Proc. of International Workshop on Logic Synthesis, Granlibakken Resort, California, May 1997.

    Hazard-free robust path delay fault testing (HFRPDFT) is a critical problem in asynchronous circuits that can insure correct and hazard-free operation only in the absence of delay faults (isocronic forks, negligible wire delays, bounded gate and wire delays, etc.). Asynchronous nets are circuits without clock signals composed from combinational gates and asynchronous latches (in this paper we consider only C-elements) such that feedbacks are allowed only inside latches. We reduced the problem of HFRPDFT for asynchronous nets to the classical problem of single stuck fault test generation for combinational logic. This yields an efficient algorithm for robust fault test generation of asynchronous circuits. We consider two testing schemes: a full output scan technique, simplified in comparison with the previously published results, for which each testable path can be tested with two vectors, and a partial scan -- for which applying testing pair must be preceded with an initialization phase. In the latter case a test can be a longer sequence of testing vectors. Experimental results illustrates the method.

  7. M. Kishinevsky and A. Kondratyev and L. Lavagno and A. Saldanha and A. Taubin, Partial scan delay fault testing of asynchronous circuits. Proceedings of the International Conference on Computer-Aided Design, p.728--735, IEEE, San-Jose, California, November 1997.

    Asynchronous circuits operate correctly only under timing assumptions. Hence testing those circuits for delay faults is crucial. Previous work has shown that full-scan delay-fault testing of asynchronous circuits is feasible. In this work we tackle the problem of partial scan testing, that requires test pattern generation on a sequential circuit. We show how this problem can be effectively reduced to a classical problem of stuck-at test pattern generation for a related combinational circuit. The technique is complete, automated, and requires only partial scan of some memory element outputs.

  8. A. Taubin, A. Kondratyev, and M. Kishinevsky. Applications of petri nets unfoldings to asynchronous design. Proc. of 1997 IEEE International Conference on Systems, Man, and Cybernetics, p.4279--4284, IEEE, Orlando, Florida, October 1997.

    An unfolding is a finite acyclic prefix of a Petri Net behavior, which preserves all essential properties of the original Petri net, in particular all reachable markings of the net. An unfolding allows one to analyze partial orders between instances of places and events of the original net in a much simpler form due to absence of cycles. Cutoff criteria for truncating an infinite occurrence net into finite unfoldings are reviewed. We then show how unfoldings can be used for analysis of different properties of Petri Nets: boundedness, safety, persistency, deadlocks, etc. Signal Transition Graphs are interpreted Petri nets widely used for specification and design of asynchronous control circuits. We show how unfoldings can be used at different stages of the design cycle.

  9. A. Kondratyev, J. Cortadella, M. Kishinevsky, L. Lavagno, A. Taubin, and A. Yakovlev, Identifying state coding conflicts in asynchronous system specifications using petri net unfoldings. Proc. of the International Conference on Application of Concurrency to System Design (CSD'98), p.152--163, IEEE, Fukushima, Japan. March 1998.

    State coding conflict detection is a fundamental part of synthesis of asynchronous concurrent systems from their specifications as Signal Transition Graphs (STGs), which are a special kind of labelled Petri nets. The paper develops a method for identifying state coding conflicts in STGs that is intended to work within a new synthesis framework based on Petri net unfolding. The latter offers potential advantages due to a partial order representation of highly concurrent behaviour as opposed to the more traditional construction of a state graph, known to suffer from combinatorial explosion. We develop a necessary condition for coding conflicts to exist, by using an approximate state covering approach. Being computationally easy, yet conservative, such a solution may produce fake conflicts. A technique for refining the latter, with extra computational cost, is provided.

  10. Alex Kondratyev, Jordi Cortadella, Michael Kishinevsky, Luciano Lavagno, Alexander Taubin, and Alex Yakovlev. Lazy transition systems: application to timing optimization of asynchronous circuits. Proceedings of the International Conference on Computer-Aided Design, p.324--331, IEEE, San-Jose, California, November 1998.

    This paper introduces Lazy Transitions Systems (LzTSs). The notion of laziness explicitly distinguishes between the enabling and the firing of an event in a transition system. LzTSs can be effectively used to model the behavior of asynchronous circuits in which relative timing assumptions can be made on the occurrence of events. These assumptions can be derived from the information known a priori about the delay of the environment and the timing characteristics of the gates that will implement the circuit. The paper presents necessary conditions to synthesize circuits with a correct behavior under the given timing assumptions. Preliminary results show that significant area and performance improvements can be obtained by exploiting the extra ``don't care'' space implicitly provided by the laziness of the events.

  11. Yamin Li and Wanming Chu, Implementation of Single Precision Floating Point Square Root on FPGAs. IEEE Symposium on FPGAs for Custom Computing Machines (FCCM97), K. L. Pocek and J. Arnold, p.226--232, IEEE, IEEE Computer Society Press, Apr. 1997.

    Square root operation is hard to implement on FPGAs because of the complexity of the algorithms. In this paper, we present a non-restoring square root algorithm and two very simple single precision floating point square root implementations based on the algorithm on FPGAs. One is low-cost iterative implementation that uses a traditional adder/subtractor. The operation latency is 25 clock cycles and the issue rate is 24 clock cycles. The other is high-throughput pipelined implementation that uses multiple adder/subtractors. The operation latency is 15 clock cycles and the issue rate is one clock cycle. It means that the pipelined implementation is capable of accepting a square root instruction on every clock cycle.

  12. Yamin Li and Wanming Chu, Parallel-Array Implementations of A Non-Restoring Square Root Algorithm. Int'l Conf. on Computer Design -- VLSI in Computers and Processors (ICCD97), p.690--695, IEEE Computer Society Press, Oct. 1997.

    In this paper, we present a parallel-array implementation of a new non-restoring square root algorithm (PASQRT). The carry-save adder (CSA) is used in the parallel array. The PASQRT has several features unlike other implementations. First, it does not use redundant representation for square root result. Second, each iteration generates an exact resulting value. Next, it does not require any conversion on the inputs of the CSA. And last, a precise remainder can be obtained immediately. Furthermore, we present an improved version --- a root-select parallel-array implementation (RS-PASQRT) for fast result value generation. The RS-PASQRT is capable of achieving up to about 150\% speedup ratio over the PASQRT. The simplicity of the implementations indicates that the proposed approach is an alternative to consider when designing a fully pipelined square root unit.

  13. Yamin Li and Wanming Chu, Implementation of Single Precision Floating Point Square Root on FPGAs. IEEE Symposium on FPGAs for Custom Computing Machines (FCCM97), K. L. Pocek and J. Arnold, p.226--232, IEEE, IEEE Computer Society Press, Apr. 1997.

    Square root operation is hard to implement on FPGAs because of the complexity of the algorithms. In this paper, we present a non-restoring square root algorithm and two very simple single precision floating point square root implementations based on the algorithm on FPGAs. One is low-cost iterative implementation that uses a traditional adder/subtractor. The operation latency is 25 clock cycles and the issue rate is 24 clock cycles. The other is high-throughput pipelined implementation that uses multiple adder/subtractors. The operation latency is 15 clock cycles and the issue rate is one clock cycle. It means that the pipelined implementation is capable of accepting a square root instruction on every clock cycle.

  14. Yamin Li and Wanming Chu, Parallel-Array Implementations of A Non-Restoring Square Root Algorithm. Int'l Conf. on Computer Design -- VLSI in Computers and Processors (ICCD97), p.690--695, IEEE Computer Society Press, Oct. 1997.

    In this paper, we present a parallel-array implementation of a new non-restoring square root algorithm (PASQRT). The carry-save adder (CSA) is used in the parallel array. The PASQRT has several features unlike other implementations. First, it does not use redundant representation for square root result. Second, each iteration generates an exact resulting value. Next, it does not require any conversion on the inputs of the CSA. And last, a precise remainder can be obtained immediately. Furthermore, we present an improved version --- a root-select parallel-array implementation (RS-PASQRT) for fast result value generation. The RS-PASQRT is capable of achieving up to about 150\% speedup ratio over the PASQRT. The simplicity of the implementations indicates that the proposed approach is an alternative to consider when designing a fully pipelined square root unit.

Academic Activities

  1. Alex Yu. Kondratyev, Chair: International Conference on Application of Concurrency to System Design (CSD'98).

Others

  1. Honda Yoshihiko, Microcontroller Based Robot Control. The University of Aizu, 1998. Thesis Advisor: R. H. Fujii.

  2. Kubo Keisuke, Subsethood Defuzzification for Fuzzy Logic Controller. The University of Aizu, 1998. Thesis Advisor: R. H. Fujii.

  3. Kikuta Natsuka, A Microcontroller Selection Steps. The University of Aizu, 1998. Thesis Advisor: R. H. Fujii.

  4. Kanada, Fractal Algorithm for Parallel Multithreaded Architecture. The Univ. of Aizu, 1997. Thesis Advisor: Yamin Li.

  5. Takayama, Instruction Scheduling for Parallel Multithreaded Architecture. The Univ. of Aizu, 1997. Thesis Advisor: Yamin Li.

  6. Matsumoto, Performance Evaluation of MIS-MEP Multithreaded Architecture. The Univ. of Aizu, 1997. Thesis Advisor: Yamin Li.

  7. Wakebe, Cache Design for Multiple Instruction Streams Multiple Execution Pipelines Processor. The Univ. of Aizu, 1997. Thesis Advisor: Yamin Li.



Next: Computer Solid State Up: Department of Computer Previous: Department of Computer


www@u-aizu.ac.jp
December 1998