
Computer Architecture Laboratory

/ Tsuneo Ikedo / Professor
/ Michael Kishinevsky / Professor
/ Robert H. Fujii / Associate Professor
/ A. Y. Kondratyev / Associate Professor
/ Yamin Li / Associate Professor
/ Omar Hammami / Assistant Professor
/ Jianhua Ma / Visiting Researcher
/ Wanming Chu / Research Associate

The Computer Architecture Laboratory consists of 7 faculty members and one visiting researcher. The following is a summary of each member's activities.

Prof. Tsuneo Ikedo, research on:

(1) Aizu Supercomputer Project: The Aizu supercomputer has been under development since 1993. It is a massively parallel processing system built around a specialized ASIC designed by our laboratory. The ASIC consists of 5 PEs (processing elements) and 5 sets of floating-point matrix multiplier, divider, and square-root arithmetic processors. Its hardware performance is 5.6 GFlops. The chips are interconnected in a complete-graph network with optical links. The system will be used for virtual reality and multimedia supercomputing applications, where diverse data types and processing are required. Twenty students participate in this research.
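In a complete-graph interconnect every chip has a dedicated link to every other chip, so the link count grows quadratically with the number of chips. The chip count is not stated here, so N below is a free parameter, and each link is assumed to be one bidirectional point-to-point optical link:

```latex
L(N) = \binom{N}{2} = \frac{N(N-1)}{2}, \qquad \text{links per chip} = N - 1
```

For example, N = 8 chips would need 28 links, with 7 optical ports on each chip.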

(2) Graphics Processor: A graphics processor that renders 1.2 million polygons/s (100-pixel triangles) while simultaneously performing texture mapping, Phong and bump-mapped shading, and hidden surface removal is under development. Its scalability is theoretically unlimited. Embedding Phong and bump-mapped shading, a gaseous object renderer, and sinc resampling filtering for texture and antialiasing in a single ASIC chip may be a first for a production chip. The project is being carried out as joint university/industry research, and 12 students have participated since 1994. The chip will be used as a fine-grain processor in a virtual reality system.

Profs. M. Kishinevsky and A. Kondratyev:

Design Automation of Concurrent Systems: Our goal was to make further progress in developing theory, methods, and tools for the design automation of concurrent systems. In several research directions we worked in close cooperation with Prof. Taubin (Computer Education Lab) and with professors from Politecnico di Torino, Italy (L. Lavagno), Universitat Politecnica de Catalunya, Spain (J. Cortadella), and the University of Newcastle upon Tyne, England (A. Yakovlev). Prof. Ten was involved in the software development of the unfolding tools.

Research results:

Models of concurrency: methods for synthesizing concurrent specifications (general and safe Petri nets) from state-based specifications (labeled transition systems) were developed. We also presented techniques for the optimization and transformation of Petri nets based on the theory of regions.
Verification of concurrent systems: methods for checking properties of concurrent specifications based on reduced unfoldings of Petri nets were developed.
Synthesis of asynchronous systems: new methods for the state encoding of asynchronous specifications and for hazard-free technology mapping of asynchronous circuits were developed. Experiments show that these methods are much more efficient than previously published ones.
Testing of sequential circuits: We developed an efficient algorithm for robust path-delay fault test generation for sequential asynchronous circuits. The technique reduces the problem of robust path-delay fault testing of sequential circuits to the classical problem of single stuck-at fault test generation for combinational logic. Both partial and full scan of latches can be used.
Deadlock prevention in concurrent systems: methods for deadlock detection and deadlock prevention in manufacturing systems based on net unfoldings were developed.
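The notions of deadlock detection used above can be illustrated on a small Petri net. This is a sketch that deliberately swaps in a simpler technique: it enumerates the reachability graph breadth-first, which is exactly the explicit state exploration that unfolding-based methods are designed to avoid on large, highly concurrent nets. The net encoding (place indices, weighted arc dictionaries) and the two-process example are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Marking = Tuple[int, ...]           # token count per place, indexed 0..n-1
Arcs = Dict[int, int]               # place index -> arc weight
Net = Dict[str, Tuple[Arcs, Arcs]]  # transition name -> (input arcs, output arcs)


def enabled(m: Marking, pre: Arcs) -> bool:
    # A transition is enabled when each input place holds enough tokens.
    return all(m[p] >= w for p, w in pre.items())


def fire(m: Marking, pre: Arcs, post: Arcs) -> Marking:
    # Firing consumes tokens from input places and produces tokens in output places.
    tokens = list(m)
    for p, w in pre.items():
        tokens[p] -= w
    for p, w in post.items():
        tokens[p] += w
    return tuple(tokens)


def dead_markings(m0: Marking, net: Net, limit: int = 100_000) -> List[Marking]:
    """Explore the reachability graph breadth-first and return the
    reachable markings in which no transition is enabled (deadlocks)."""
    seen = {m0}
    queue = deque([m0])
    dead: List[Marking] = []
    while queue:
        m = queue.popleft()
        successors = [fire(m, pre, post) for pre, post in net.values() if enabled(m, pre)]
        if not successors:
            dead.append(m)
        for s in successors:
            if s not in seen:
                if len(seen) >= limit:
                    raise RuntimeError("state space larger than limit")
                seen.add(s)
                queue.append(s)
    return dead


# Hypothetical example: two processes acquire resources A and B in opposite order.
# Places: 0 P1 idle, 1 P1 holds A, 2 P2 idle, 3 P2 holds B, 4 A free, 5 B free.
NET: Net = {
    "p1_take_A": ({0: 1, 4: 1}, {1: 1}),
    "p1_take_B_release": ({1: 1, 5: 1}, {0: 1, 4: 1, 5: 1}),
    "p2_take_B": ({2: 1, 5: 1}, {3: 1}),
    "p2_take_A_release": ({3: 1, 4: 1}, {2: 1, 4: 1, 5: 1}),
}
M0: Marking = (1, 0, 1, 0, 1, 1)

if __name__ == "__main__":
    print(dead_markings(M0, NET))  # [(0, 1, 0, 1, 0, 0)]
```

The single dead marking is the one in which each process holds one resource and waits for the other. A prevention step, as in the lab's work, would then transform the net so that this marking is no longer reachable.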

Teaching: We taught the Logic Design, Computer Architecture, and Computer Organization I classes, and developed a project for the computer architecture classes (design of the Sim machine).

Profs. Y. Li and W. Chu.

1. Research: Parallel Multithreaded Superscalar Processor (PMSP)

The PMSP is a multiple-instruction-stream, multiple-execution-pipeline processor capable of improving processor throughput by exploiting instruction-level parallelism (ILP). The goals of this project are to design the processor, develop an instruction scheduling strategy for exploiting the ILP, and evaluate the architecture. We have finished the datapath design, which includes the ALU/shifter, floating-point adder/subtracter, multiplier, divider, square rooter, and floating-point/integer converter. Some parts have been verified with FPGAs. We are now working on the control unit design, Verilog simulation, and performance evaluation.

2. Education: Aizup--A Pipeline Processor Design and Implementation on FPGA Chip

A pipeline processor model (named Aizup) has been developed for the exercise course of "Computer Organization I". The Aizup was designed in the Cadence environment and implemented on a Xilinx XC4006PC84 FPGA chip. Students design the processor, perform functional simulations, implement the design on the chip, and measure the chip with a logic analyzer. The exercise course helps students understand the operation of pipelined processors and master design methodologies and the use of measuring instruments. It was presented at the Second Workshop on Computer Architecture Education, San Jose, California, February 1996, and at the IEEE Symposium on FPGAs for Custom Computing Machines, Napa, California, April 1996.

Prof. O. Hammami.

(1) Research has been conducted on several subjects:

Performance evaluation tools for computer architects: we evaluated and analyzed the potential of wavelet theory for compressing the computer traces used in trace-driven simulation.
Instruction scheduling for superscalar processors: the multiple hazards that occur during superscalar execution make precise instruction scheduling difficult, so we proposed a fuzzy approach that is better suited to this problem.
Cache memory architectures: we proposed a new cache memory hardware design with a hit rate similar to that of a fully associative cache of the same size, but with a lower hit time.
Memory hierarchy management: a fuzzy inference system for cache management was specified and simulated. A programmable pipelined fuzzy processor was designed to support this fuzzy inference system and provide real-time cache management. A 32-processor SIMD SOM (self-organizing map) neural network was specified and is currently being simulated for global memory hierarchy management.

(2) We also lectured and took part in the laboratories of the Logic Design, Computer Architecture, and Computer Architecture I courses.

Prof. R. Fujii.

Research was carried out in the following areas: 1) analysis of genetic algorithms; 2) simplified symbolic circuit function generation for analog circuits; 3) determination of worst-case circuit sensitivity to circuit element manufacturing tolerance; and 4) microcomputer control of devices that aid people with disabilities. Educational activities included preparation of tutorials and laboratory exercises for the VLSI-II class and preparation of new laboratory exercises for the logic design class.

Visiting Professor Jianhua Ma, overview of research in 1995:

My research in 1995 was devoted to designing algorithms for the Truga001 graphics chip and to multimedia modelling for multimedia synchronization and interfaces. Professor T. Ikedo and I developed a set of new algorithms for Phong shading and bump mapping, which are embedded in the Truga001 graphics chip. The performance of the Truga001 was evaluated by simulation under conditions of burst page mode, mapping, HDTV and NTSC video input, and fonts. An essential model of distributed multimedia synchronization based on multimedia distortion and fidelity was proposed, and many other models may be regarded as special cases of it. Direct mapping techniques in a hyperworld and hyperinterface problems were also studied.

Refereed Journal Papers

Ikedo T. Design and performance evaluation of a pixel cache implemented within application-specific integrated circuits. Visual Computer, 12(5):215--233, 1996.
Application-specific integrated circuit design and the performance of a graphics processor that uses a pipelined cache with FIFO memory to transfer a 3D pixel array and its z values to the frame buffer in one cycle are described. The digital differential analyzer and the size of the pixel cache relative to the frame buffer bandwidth were selected for good overall performance. A rendering speed of 8 ns/pixel, or 1.2 million polygons/s (100-pixel triangles), was achieved with single-port DRAM of 60 ns access time.
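The two speed figures quoted in this abstract are consistent with each other, taking 100 pixels per triangle:

```latex
8\,\text{ns/pixel} \times 100\,\text{pixels/polygon} = 800\,\text{ns/polygon}
\quad\Longrightarrow\quad
\frac{1}{800\times 10^{-9}\,\text{s}} = 1.25\times 10^{6}\ \text{polygons/s}
```

This corresponds to 125 million pixels/s, which the abstract quotes as 1.2 million polygons/s.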
Tosiyasu L. Kunii, Jianhua Ma, Runhe Huang, and Takao Maeda. Computer graphics research activities in Japan. SIGGRAPH Quarterly, Computer Graphics, 30(2):28--31, 1996.
Computer graphics has been a very active research area in Japan. Universities and companies have allocated a relatively large share of their research and development budgets to computer graphics. With the development of CAD/CAM, visualization, and entertainment applications, and especially with recent advances in multimedia and virtual reality technologies, computer graphics research and its applications have become even hotter topics. Universities carry out unique, purely academic research, while research and development at companies has traditionally emphasized engineering and applied research. This article reviews the state of the art of the computer graphics industry and research activities in Japan.
A. Yakovlev, M. Kishinevsky, A. Kondratyev, L. Lavagno, and M. Pietkiewicz-Koutny. On the models for asynchronous circuit behaviour with OR causality (accepted for publication). Formal Methods in System Design, 1996.
Asynchronous circuits behave like concurrent programs implemented in hardware logic. The processes in such circuits are synchronised in accordance with the dynamic logical and causal conditions between switching events. The classical paradigm, easily represented in most process-oriented languages for concurrent systems modelling, is AND causality, which is often associated with a rendez-vous synchronisation. In this paper we investigate a different, less well-known paradigm, called OR causality. This paradigm is, however, different from the classical MERGE paradigm, which is based on mutually exclusive events. Petri nets and Change Diagrams provide adequate modelling and circuit synthesis tools for the various OR causality types, yet they do not always bring the specifier to a unique decision about which modelling construct must be used for which type. We present a unified descriptive tool, called Causal Logic Net, which is graphically based on Petri nets but has an explicit logic causality annotation for transitions. It is intended as the least possible generalisation of Petri nets and Change Diagrams. The signal-transition interpretation of this tool is analogous to, but more powerful than, the well-known Signal Transition Graph. A number of examples demonstrate the usefulness of this model in the synthesis of asynchronous control circuits.
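The difference between AND and OR causality can be seen in a simplified timing view: an AND-caused event waits for the latest of its causes, while an OR-caused event is triggered by the earliest one. The sketch below assumes fixed switching delays and an acyclic causal graph; it is only an illustration of the two paradigms, not the Causal Logic Net formalism of the paper, and the signal names `a+`, `b+`, `c+` are hypothetical.

```python
from typing import Dict, List


def occurrence_times(causes: Dict[str, List[str]], mode: Dict[str, str],
                     delay: Dict[str, float], t0: float = 0.0) -> Dict[str, float]:
    """Time at which each event occurs in an acyclic causal graph with fixed delays.

    An AND-caused event waits for the latest of its causes; an OR-caused
    event is triggered by the earliest one. Events with no causes start at t0.
    """
    times: Dict[str, float] = {}

    def time_of(e: str) -> float:
        if e not in times:
            cs = causes[e]
            if not cs:
                start = t0
            elif mode[e] == "AND":
                start = max(time_of(c) for c in cs)
            elif mode[e] == "OR":
                start = min(time_of(c) for c in cs)
            else:
                raise ValueError(f"unknown causality mode {mode[e]!r}")
            times[e] = start + delay[e]
        return times[e]

    for e in causes:
        time_of(e)
    return times


causes = {"a+": [], "b+": [], "c+": ["a+", "b+"]}
delay = {"a+": 1.0, "b+": 5.0, "c+": 2.0}
print(occurrence_times(causes, {"c+": "AND"}, delay)["c+"])  # 7.0
print(occurrence_times(causes, {"c+": "OR"}, delay)["c+"])   # 3.0
```

With AND causality `c+` must wait for the slow `b+`; with OR causality the fast `a+` alone is enough to trigger it.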

Refereed Proceeding Papers

Omar Hammami. Experiencing compression of computer address traces using wavelets. In SPIE International Symposium on Aerospace/Defense Sensing and Dual Use Photonics, Wavelets applications for Dual Use, pages 1142--1153, USA, April 1995. SPIE.
In this paper we present very simple experiments on compressing, with wavelets, the computer address traces that are heavily used in computer architecture simulation. Trace-driven simulation faces a major obstacle in the size of the traces used during simulation: these traces require an increasing amount of storage and an increasing amount of simulation time. We report some preliminary experiments using wavelet theory as a tool to reduce this size.
Omar Hammami. Wavelets, compression and trace driven simulation. In PERMEAN95, pages 92--99. IPSJ-ACM, August 1995.
Wavelet theory has been of tremendous help in reducing the storage size of different objects in different fields, such as signal processing and image processing. Although wavelet-based compression is classified as a lossy technique, an attractive aspect of the multiresolution approach of the wavelet transform is the possibility of defining different levels of lossiness. In this paper, we further analyze the effectiveness of wavelet theory as a tool for compressing and representing computer address traces. We conducted several experiments on a standard suite of benchmarks used for trace-driven simulation, using several filters and thresholding policies. Apart from excessive computation time requirements, this approach suffers from unstable behavior, with performance varying from poor to excellent. In addition, cache simulations using some of the transformed traces showed poor locality in the inverse-transformed traces.
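The filters and thresholding policies used in these papers are not specified here, so the following sketch assumes the simplest choices: a multi-level orthonormal Haar transform with hard thresholding, applied to a hypothetical address trace. Zeroing small coefficients is what makes the scheme lossy, and the perturbed addresses it produces are one way the locality of an inverse-transformed trace can change. A real compressor would also need a sparse encoding of the surviving coefficients, which is omitted.

```python
import math
from typing import List

SQRT2 = math.sqrt(2.0)


def haar_forward(x: List[float]) -> List[float]:
    """Multi-level orthonormal Haar transform; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("trace length must be a power of two")
    c = [float(v) for v in x]
    length = n
    while length > 1:
        half = length // 2
        approx = [(c[2 * i] + c[2 * i + 1]) / SQRT2 for i in range(half)]
        detail = [(c[2 * i] - c[2 * i + 1]) / SQRT2 for i in range(half)]
        c[:length] = approx + detail
        length = half
    return c


def haar_inverse(c: List[float]) -> List[float]:
    """Invert haar_forward level by level, coarsest level first."""
    x = list(c)
    length = 1
    while length < len(x):
        merged: List[float] = []
        for a, d in zip(x[:length], x[length:2 * length]):
            merged.extend(((a + d) / SQRT2, (a - d) / SQRT2))
        x[:2 * length] = merged
        length *= 2
    return x


def compress(trace: List[int], tau: float) -> List[float]:
    """Lossy step: zero every coefficient whose magnitude is below tau."""
    return [v if abs(v) >= tau else 0.0 for v in haar_forward(trace)]


# Hypothetical address trace (length must be a power of two).
trace = [0x1000, 0x1004, 0x1008, 0x100C, 0x2000, 0x2004, 0x1010, 0x1014]
kept = compress(trace, tau=4.0)
approx = [round(v) for v in haar_inverse(kept)]
print(sum(v != 0.0 for v in kept), "of", len(kept), "coefficients kept")
print([hex(a) for a in approx])
```

Raising `tau` discards more detail coefficients and increases compression at the cost of larger address errors; `tau = 0` reproduces the trace exactly.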
Omar Hammami. Fuzzy scheduling in compiler optimizations. In Proc. of ISUMA-NAFIPS, pages 543--548, USA, September 1995. IEEE.
The compile-time cache management approach makes use of specialized cache management instructions to generate optimal management. This is done by generating an optimal schedule of these specialized instructions for the program being compiled. Up to now, conservative approaches have been used to tackle this issue despite the occurrence of unpredictable real-time events and the fact that many variables are imprecise. This explains the unstable performance of these algorithms, which varies from excellent to very poor. We propose a fuzzy scheduling approach to deal with the problem.
Omar Hammami. Adaptive cache coherence schemes in shared memory multiprocessors. In Proc. of the 1995 Annual Conference of Japanese Neural Network Society, pages 223--224, Japan, October 1995. JNNS.
The paper presents how to apply the clustering and pattern recognition capacity of neural networks to the problem of cache coherence in shared memory multiprocessors.
Omar Hammami. Real time aspects of cluster based caches. In RTCSA'95, pages 16--20, USA, October 1995. IEEE.
This paper contributes to the real-time aspects of cache systems by proposing a new cache organization with the potential to reduce hit time and increase hit ratio. Although predictability remains a concern, these caches offer better potential because they eliminate conflict misses entirely. This proposal is simply a starting point for further studies.
Omar Hammami. Towards self organizing cache memories using neural networks. In ICNN95, pages 917--922, USA, November 1995. IEEE.
In this paper we investigate the possibility of using neural networks to implement a new cache block placement strategy in cache memories. Based on the principle of spatial locality, we propose a cluster-based cache block placement strategy better adapted to program behavior. The capacity of neural networks to produce such clusters is evaluated and cross-validated with standard statistical techniques. Positive results and the prospect of a simple hardware implementation favor this choice.
Omar Hammami. Cluster based cache memories using neural networks. In ICECS'95, pages 8--14, USA, December 1995. IEEE.
Based on the observation that memory references can be clustered due to locality of reference, we propose in this paper a new cache design. Our main concern is how to design a cache that dynamically matches the different locality patterns present during program execution. Neural networks provide a solution to both the clustering problem and the dynamic reconfigurability of caches. The emphasis is on hardware implementation alternatives.
Ikedo T. and N. Mirenkov. Aizu supercomputer project. In M. Hamza, editor, In: Proceedings of the Seventh IASTED/ISMM International Conference on Parallel and Distributed Computing and Systems, pages 443--438, Washington DC, Oct. 1995. IASTED/ISMM, IASTED-ACTA.
The paper describes a project for a massively parallel system related to Aizu's multimedia center. The system employs a highly parallel MIMD architecture using a conflict-free communication system, as well as special-purpose units for graphics and sound processing and image buffering that support high-speed input/output operations. The idea behind the project software is to use animation films as communication units for computer-human dialog. Each film is related to a series of frames and reflects some knowledge about data processing.
Ikedo T. A multimedia VR system. In S. Stevens, editor, In: Proceedings of IEEE Multimedia System'96, International Conference on Multimedia Computing and Systems, pages 4--11, Hiroshima, June 1996. IEEE MultiMedia, IEEE Computer Society, Multimedia Computing.
The paper first presents the technical limitations of the Aizu multimedia center system owing to its construction from a combination of commercially available products, and then introduces a new architecture for computing machines, computer graphics processors, and sound generators, which has been developed at the University of Aizu in response to the demand for second-generation multimedia systems.
Runhe Huang and Jianhua Ma. Synchronization modeling of distributed multimedia systems. In Proceedings of the IASTED International Conference on Applied Informatics, pages 444--447. IASTED/ISMM, January 1996.
This paper focuses on developing an essential model of multimedia synchronization in a distributed multimedia system. The information of a multimedia object is decomposed into content information and time information. An essential model of multimedia synchronization is proposed, and the relations of the essential model with some other models are discussed. Multimedia distortion and fidelity are introduced, taking into account the perception of multimedia objects. It is emphasized that multimedia fidelity should become one of the basic criteria of multimedia synchronization. For the first time, transmission error of time information and transformation distortion are identified as two kinds of time distortion sources. Some new forms of time distortion are given and discussed from different viewpoints.
Tosiyasu L. Kunii, Jianhua Ma, and Runhe Huang. Hyperworld modelling. In Proceedings of VISUAL96 Information Systems, pages 1--8, Melbourne, Australia, February 1996. Victoria University of Technology.
By direct mapping, multimedia information worlds and real worlds are closely connected, and their combination in fact forms a new world: a hyperworld. In this hyperworld we can not only receive multimedia information passively but also sense and control real worlds directly and actively. The basic characteristic of a hyperworld is the direct mapping between multimedia information worlds and real worlds. Direct mapping means that what we or computers think is realized directly. A drastic increase in efficiency and exactness is the primary feature of direct mapping in general. How to model the direct mapping between multimedia information worlds and real worlds effectively is the key to hyperworld modeling. Two cases of direct mapping modeling are presented to explain the drastic increase in efficiency and exactness in related application areas.
Jianhua Ma and Runhe Huang. Improving human interaction with a hyperworld. In David Du and Olivia R. Liu Sheng, editors, 1996 Pacific Workshop on Distributed Multimedia Systems, pages 46--50, June 1996.
A person interacts with various worlds in two ways: one-to-one interaction with a single world and one-to-many interaction with multiple worlds. Most current research on improving human interaction with the world is limited to one-to-one interaction, i.e., the interaction of a person with each individual world. Since the relations among worlds in the multi-worlds are nonlinear and can be expressed by a set of links, such multi-worlds taken as a whole are called a hyperworld. This paper outlines a future hyperworld as a system, presents some problems in developing such a system, and proposes potential solutions.
Jianhua Ma and Runhe Huang. Multimedia synchronization based on fidelity requirements. In Tat Seng Chua, Hung Keng Pung, and Tosiyasu L. Kunii, editors, IEEE Singapore International Conference on Networks and International Conference on Information Engineering'95, pages 488--492. IEEE and IEEE Singapore, World Scientific, July 1995.
This paper focuses on definitions and models of multimedia distortion and fidelity and their relations with multimedia synchronization. The basic idea is that the information of a multimedia object can be decomposed into content information (or non-time information) and time information. Time information can be classified either into centralized and decentralized time information or into steady and dynamic time information. Based on the burstiness, diffusion, and accumulation of distortion, several new forms of distortion are defined. A logic model of multimedia synchronization based on synchronization controllers and transmissions is proposed. The main tasks of the controllers are the distribution and mapping of distortion on the basis of fidelity. The paper also addresses how to map distortion tolerance to the QoS (quality of service) of logic channels for time or non-time information and of physical channels.
Runhe Huang, Jianhua Ma, T. L. Kunii, and Eiju Tsuboi. An optimized parallel algorithm for extracting ridges and ravines. In Proc. of International Symposium on Parallel and Distributed Supercomputing, pages 253--260, September 1995.
To reduce the computational complexity of extracting ridges and ravines, two approaches, called explicit arithmetic formulae (EAF) and explicit arithmetic formulae with local memory (EAFWLM), are proposed and compared with the approach that uses the derivative formulae. An even-strip parallel algorithm, combined with each of the complexity-reduction approaches, was implemented on a GCel-1/64 transputer-based parallel machine. The results show that the EAFWLM approach can greatly reduce the computational complexity of the extraction procedure and significantly ease the load balancing problem. The even-strip parallel algorithm combined with the EAFWLM approach, as an optimized parallel algorithm, showed much better performance than the other two approaches.
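The even-strip decomposition is not spelled out in the abstract. A minimal sketch, assuming the height field is split by rows into contiguous strips whose sizes differ by at most one, one strip per processor, looks like this. A stencil-based extraction would also need boundary rows exchanged between neighboring strips, which this sketch omits, and the function name `even_strips` is hypothetical.

```python
from typing import List, Tuple


def even_strips(n_rows: int, n_procs: int) -> List[Tuple[int, int]]:
    """Split rows [0, n_rows) into n_procs contiguous half-open strips
    whose sizes differ by at most one row."""
    if n_procs <= 0:
        raise ValueError("need at least one processor")
    base, extra = divmod(n_rows, n_procs)
    strips = []
    start = 0
    for p in range(n_procs):
        # The first `extra` processors take one additional row each.
        size = base + (1 if p < extra else 0)
        strips.append((start, start + size))
        start += size
    return strips


print(even_strips(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

On a 64-node machine such as the GCel-1/64, `even_strips(n_rows, 64)` gives each node a strip of `n_rows // 64` or `n_rows // 64 + 1` rows.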
Jianhua Ma, Runhe Huang, and Tosiyasu L. Kunii. Fidelity and distortion in multimedia synchronization modeling. In Tat Seng Chua, Hung Keng Pung, and Tosiyasu L. Kunii, editors, Multimedia Modeling Towards Information Superhighway, pages 203--217, Singapore, November 1995. World Scientific.
Both good measures and essential models are important prerequisites for evaluating models and mechanisms of distributed multimedia synchronization. Multimedia distortion and fidelity are introduced as the basis for measuring the performance of a reproduced or synthetic multimedia object. Time fidelity, as one important kind of construction information fidelity, is classified into intra-media, inter-media, and inter-destination fidelity. The essential model of distributed multimedia synchronization is proposed, and many other models may be regarded as special cases of it. Transmission error and transformation distortion, as two new sources of time distortion, are addressed in this paper, and the detailed forms of time distortion are presented and discussed from different viewpoints.
A. Kondratyev, M. Kishinevsky, A. Taubin, and S. Ten. Verification of asynchronous systems based on Petri net unfoldings. In Proc. of IEICE Concurrent Systems Technology Conference, CST-96, pages 17--23, Aizu, May 1996.
We review different methods of unfolding (McMillan's, Esparza's, and ours) and show how to use unfoldings in the verification of asynchronous systems.
A. Taubin, A. Kondratyev, and M. Kishinevsky. Deadlock prevention by Petri net transformations. In Proc. of IEICE Concurrent Systems Technology Conference, CST-96, pages 25--32, Aizu, May 1996.
A deadlock prevention procedure first detects deadlocks using the unfolding, then prevents deadlocks by transforming the initial PN into a deadlock-free net. This transformation is based on the ordering relations between places and transitions in the PN and inserts special "switch" places and transitions to control choices that can lead to a deadlock. The final cyclic deadlock-free net is equivalent to the initial net in the following weak sense: complete cyclic behaviors retain their "most important" representatives; however, concurrency can be reduced and some live and non-live behaviors can be removed.
Alex Kondratyev, Michael Kishinevsky, and Alex Yakovlev. On hazard-free implementation of speed-independent circuits. In Proceedings of the Asia and South Pacific Design Automation Conference, pages 241--248, Chiba, September 1995. IEEE/IEICE.
We investigate the problem of hazard-free gate-level implementation of speed-independent circuits specified by event-based models, such as Signal Transition Graph (for processes with AND causality and input choice) or its extension, called Change Diagram (which allows OR-causality). The main result of the paper is twofold: (1) the proof that any speed-independent behavior can be implemented at the gate-level without hazards, and (2) an efficient method for such an implementation. This method is based on transformations of the specification to the form satisfying the generalized Monotonous Cover requirement.
J. Cortadella, M. Kishinevsky, L. Lavagno, and A. Yakovlev. Synthesizing Petri nets from state-based models. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, pages 164--171, San Jose, USA, November 1995. IEEE Computer Society Press.
This paper presents a method to synthesize labeled Petri nets from state-based models. Although state-based models (such as finite state machines) are a powerful formalism for describing the behavior of sequential systems, they cannot explicitly express the notions of concurrency, causality, and conflict. Petri nets can naturally capture these notions. The proposed method is based on deriving an Elementary Transition System (ETS) from a specification model. Previous work has shown that for any ETS there exists a Petri net with minimum transition count (one transition for each label) whose reachability graph is isomorphic to the original ETS. This paper presents the first known approach to obtain an ETS from a non-elementary TS and derive a place-irredundant Petri net. Furthermore, by imposing constraints on the synthesis method, different classes of Petri nets (pure, free choice, unique choice) can be derived from the same reachability graph. This method has been implemented and efficiently applied in different frameworks: Petri net composition, synthesis of Petri nets from asynchronous circuits, resynthesis of Petri nets, modelling of parallel processes in Ada programs, and extraction of hierarchy for hardware/software specifications.
J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and A. Yakovlev. Complete state encoding based on the theory of regions. In Proc. of Second International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 36--47, Aizu-Wakamatsu, March 1996. IEEE Society Press.
Synthesis of asynchronous circuits from Signal Transition Graphs (STGs) and/or State Graphs (SGs) involves solving state coding problems. A well-known example of such problems is that of Complete State Coding (CSC), which happens when a pair of different states in an SG has the same binary encoding. This paper aims at presenting a general framework for solving such problems, based on two fundamental concepts. One is a region of states in an abstract labelled SG (called a Transition System). Regions correspond to places in the associated STG. The second concept is a speed-independence preserving set, which is strongly related to the implementability of the model in logic. Regions and their intersections offer "nice" structural properties that make them efficient "construction blocks" for event insertion. The application of our theory, through the software tool petrify, to state graphs of large size has proved successful.
J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and A. Yakovlev. Methodology and tools for state encoding in asynchronous circuit synthesis. In Proc. of the ACM/IEEE Design Automation Conference (DAC-33) Conference, pages 63--66, Las Vegas, June 1996. ACM.
State encoding is one of the problems that still need a satisfactory solution to make asynchronous circuit synthesis practical. This paper proposes a method that improves on existing approaches by coupling generality, optimality, and efficiency. A region in a Transition System is a set of states that "behave uniformly" with respect to a given transition (a value change of an observable signal), and is analogous to a place in a Petri net. Regions play a crucial role because they are tightly connected with a set of properties that we want to preserve across the state encoding process. The algorithms have been implemented in a software tool called petrify. The efficiency of the method is demonstrated on a number of "difficult" examples.
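The notion of a region used in the abstracts above can be made concrete with a brute-force check. This sketch assumes the standard definition: a set of states r is a region if, for every event label, all transitions with that label either enter r, exit r, or do not cross its border. It only tests a given candidate set; it is not the algorithm implemented in petrify, and the transition system below is a hypothetical example.

```python
from typing import Dict, Iterable, Set, Tuple

Arc = Tuple[str, str, str]  # (source state, event label, target state)


def is_region(r: Set[str], arcs: Iterable[Arc]) -> bool:
    """True if every event's arcs relate to the state set r uniformly:
    all enter r, all exit r, or none crosses its border."""
    relation: Dict[str, str] = {}
    for src, event, dst in arcs:
        if src not in r and dst in r:
            kind = "enter"
        elif src in r and dst not in r:
            kind = "exit"
        else:
            kind = "no-cross"
        # The first arc seen for an event fixes the required relation.
        if relation.setdefault(event, kind) != kind:
            return False
    return True


# Events a and b are concurrent: both interleavings reach s3.
TS = [("s0", "a", "s1"), ("s0", "b", "s2"), ("s1", "b", "s3"), ("s2", "a", "s3")]
print(is_region({"s0", "s2"}, TS))  # True: a place marked until a fires
print(is_region({"s0"}, TS))        # False: s0 -a-> s1 exits {s0}, s2 -a-> s3 does not cross it
```

The region {s0, s2} behaves like the input place of `a`, and {s1, s3} like its output place, which is the correspondence between regions and Petri net places that the synthesis methods exploit.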
A. Kondratyev, M. Kishinevsky, A. Taubin, and S. Ten. A structural approach for the analysis of Petri nets by reduced unfoldings. In Proc. of International Conference on Application and Theory of Petri Nets (Lecture Notes in Computer Science, vol. 1091), pages 346--365, Osaka, June 1996. Springer.
This paper suggests a way of analyzing Petri nets by checking the ordering relations between places and transitions. The method is based on unfolding the original net into an equivalent acyclic description. In an unfolding, the ordering relations can be determined directly from the structure of the underlying graph. Various PN properties can be analyzed: boundedness, safety, persistency, etc. A practical example of the approach is given for asynchronous design. The circuit behavior is specified by an interpreted Petri net, called a Signal Transition Graph (STG), which is then analyzed for implementability by an asynchronous hazard-free circuit. The experimental results show that for highly parallel STGs, checking implementability by unfolding is one to two orders of magnitude less time-consuming than checking it by symbolic BDD traversal of the corresponding State Graph.
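How ordering relations can be read directly off the structure of an unfolding can be sketched on a small occurrence net (an acyclic net of the kind an unfolding produces). The sketch assumes the net is already given as event-to-condition maps (a hypothetical encoding); it does not construct the unfolding or apply cut-off rules. It uses the standard definitions: x precedes y if there is a path from x to y; x and y are in conflict if they causally depend on two distinct events that consume the same condition; otherwise two distinct unordered nodes are concurrent.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def ordering_relations(pre: Dict[str, Set[str]], post: Dict[str, Set[str]]):
    """Derive causality, conflict, and concurrency for an occurrence net.

    pre[e] / post[e]: input and output conditions of event e.
    Returns (past, conflict, concurrent) where past[x] is the set of
    nodes that causally precede or equal x.
    """
    preds: Dict[str, Set[str]] = {}
    for e in pre:
        preds.setdefault(e, set()).update(pre[e])
        for c in pre[e]:
            preds.setdefault(c, set())
        for c in post.get(e, set()):
            preds.setdefault(c, set()).add(e)

    past: Dict[str, Set[str]] = {}

    def past_of(x: str) -> Set[str]:
        if x not in past:
            acc = {x}
            for p in preds[x]:
                acc |= past_of(p)
            past[x] = acc
        return past[x]

    for x in preds:
        past_of(x)

    # Direct conflict: two distinct events consume the same condition.
    consumers: Dict[str, List[str]] = {}
    for e, cs in pre.items():
        for c in cs:
            consumers.setdefault(c, []).append(e)
    direct: Set[Tuple[str, str]] = set()
    for events in consumers.values():
        for e1, e2 in combinations(events, 2):
            direct |= {(e1, e2), (e2, e1)}

    def conflict(x: str, y: str) -> bool:
        # Conflict is inherited along causality.
        return any((e1, e2) in direct for e1 in past[x] for e2 in past[y])

    def concurrent(x: str, y: str) -> bool:
        return (x != y and x not in past[y] and y not in past[x]
                and not conflict(x, y))

    return past, conflict, concurrent


# Hypothetical net: a and b are independent; d and x compete for condition c2.
PRE = {"a": {"c0"}, "b": {"c1"}, "d": {"c2"}, "x": {"c2"}}
POST = {"a": {"c2"}, "b": {"c3"}, "d": {"c4"}, "x": {"c5"}}
past, conflict, co = ordering_relations(PRE, POST)
print(co("a", "b"), conflict("d", "x"), conflict("c4", "c5"), co("d", "b"))
# True True True True
```

Because these relations come from graph structure alone, properties such as persistency can be checked without enumerating the markings of the original net.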
A. Taubin, A. Kondratyev, and M. Kishinevsky. Deadlock prevention using Petri net unfoldings. In Proc. of Computational Engineering in Systems Application (CESA'96) Conference, Lille, France, July 1996 (to appear).
A deadlock prevention procedure first detects deadlocks using the unfolding, then reduces the unfolding to a deadlock-free sub-unfolding, and finally folds the deadlock-free acyclic net into a cyclic net. The method is implemented as a subroutine in the SIS tool and is available by FTP. To optimize the size and properties of the transformed deadlock-free net, we use a Petri net re-synthesis algorithm available in the tool petrify.
Michael Kishinevsky, Jordi Cortadella, Alex Kondratyev, Luciano Lavagno, and Alex Yakovlev. Synthesis of general Petri nets. In Proc. of IEICE Concurrent Systems Technology Conference, CST-96, pages 33--39, Aizu, May 1996.
This paper presents a novel method to derive a general Petri net from state-based representations. The method is based on the theory of general regions for Transition Systems (TS). Transition systems are state graphs with labeled arcs. General regions are multisets of states preserving entry and exit conditions for each event of a TS. We have shown that if the excitation closure condition holds for a TS, then a general PN can be derived that obeys two properties: (1) each transition of the PN is uniquely labeled with an event of the TS, and (2) the PN is bisimilar to the original TS. The excitation closure condition presented here generalizes our condition for safe PNs. The notion of an irredundant region set is exploited to minimize the number of places in the net without affecting its behaviour.
E. Pastor, J. Cortadella, A. Kondratyev, and O. Roig. Cover approximations for the synthesis of speed-independent circuits. In Proc. of IFIP Workshop on Logic and Architecture Synthesis, pages 150-159, Grenoble, France, December 1995.
The monotonous cover approach is applied within the STG-based synthesis framework. A free-choice STG is represented as a complete set of state machines, and considering each state machine yields an approximation of the cover cubes based on concurrency relations.
E. Pastor, J. Cortadella, A. Kondratyev, and O. Roig. Structural methods for the synthesis of speed-independent circuits. In Proc. of EDTC-96 European Conference, pages 340-347, Paris, France, March 1996. IEEE Society Press.
This paper presents novel methods for the synthesis of asynchronous circuits from Signal Transition Graphs (STGs). Unlike existing tools, the proposed techniques do not require derivation of the reachability graph to calculate the logic equations for the circuit outputs. Speed-independent circuits can be synthesized by analyzing only the structure of the underlying Petri net with polynomial algorithms. The techniques can be applied to any live and safe STG that can be covered by state machines and, in particular, to all live and safe free-choice STGs. They provide significant improvements over previous state-based and structural methods. An experimental synthesis tool based on the proposed techniques has synthesized STGs (some of them non-free-choice) with over 10^27 markings.
Yamin Li and Wanming Chu. Design and implementation of a multiple-instruction-stream multiple-execution-pipeline architecture. In M. H. Hamza, editor, Proc. of the Seventh IASTED International Conference on Parallel and Distributed Computing and Systems, pages 477-480, Washington DC, USA, Oct. 1995. IASTED/ISMM, IASTED ACTA Press.
This paper describes a single-chip Multiple-Instruction-Stream Multiple-Execution-Pipeline (MIS-MEP) processor, which we designed and implemented using the Toshiba ASIC library. The MIS-MEP processor consists of four instruction dispatch units, four instruction cache modules, two ALUs, a load/store unit, a floating-point adder, a floating-point multiplier, an instruction scheduling unit, sixteen communication registers, and four communication queues. A set of special instructions was developed for managing multiple instruction streams. To analyze the performance potential of the MIS-MEP architecture, several small benchmarks were hand-compiled and optimized. Two examples were chosen to show how the synchronization registers and queues are used and to illustrate the capabilities of the MIS-MEP architecture. The first example is taken from the Lawrence Livermore Loops (LLL); we used the C source version of LLL #3 for this analysis. Three new instruction streams were created using the extended instructions, giving four in total. In the loop body, the four instruction streams executed fully in parallel, and the MIS-MEP processor can issue 3.945 instructions per cycle in this example. The second example is the strcpy() function. Compared with the first example, this function has little parallelism because of the dependency on the loop control variable. Instruction stream 4 performs the destination address calculation, instruction stream 3 performs the source address calculation, instruction stream 2 loads the characters of the source string, and instruction stream 1 is responsible for loop control and for storing the characters. All the streams communicate with each other via the communication queues. We obtained an overall CPI of 0.613, i.e., 1.633 instructions were issued per clock cycle. Achieving a CPI of about 0.6 for this example is an excellent result.
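The four-stream decomposition of strcpy() can be mimicked in software to show how the queues carry data between streams. The sketch below is a hypothetical behavioural model, not MIS-MEP code: Python threads stand in for instruction streams, `queue.Queue` objects stand in for the hardware communication queues (the depth of 4 is an arbitrary assumption), and a `bytearray` stands in for memory. It shows only the dataflow between streams and says nothing about the issue rate of the real processor.

```python
import queue
import threading


def _put(q, item, done):
    # Blocking put that gives up once the copy loop has signalled completion.
    while not done.is_set():
        try:
            q.put(item, timeout=0.01)
            return True
        except queue.Full:
            pass
    return False


def _get(q, done):
    # Blocking get that gives up (returns None) once completion is signalled.
    while not done.is_set():
        try:
            return q.get(timeout=0.01)
        except queue.Empty:
            pass
    return None


def mis_strcpy(mem, dst, src, depth=4):
    """Copy the NUL-terminated string at mem[src] to mem[dst] with four streams.

    Hypothetical model: assumes the source string is terminated inside mem.
    """
    dst_q, src_q, char_q = (queue.Queue(maxsize=depth) for _ in range(3))
    done = threading.Event()

    def stream4():  # destination address calculation
        addr = dst
        while _put(dst_q, addr, done):
            addr += 1

    def stream3():  # source address calculation
        addr = src
        while _put(src_q, addr, done):
            addr += 1

    def stream2():  # load characters of the source string
        while True:
            addr = _get(src_q, done)
            if addr is None:
                return
            ch = mem[addr]
            if not _put(char_q, ch, done) or ch == 0:
                return

    workers = [threading.Thread(target=f) for f in (stream4, stream3, stream2)]
    for w in workers:
        w.start()
    try:
        while True:  # stream 1: loop control and character store
            ch = char_q.get()
            mem[dst_q.get()] = ch
            if ch == 0:
                break
    finally:
        done.set()  # lets the address streams stop producing
        for w in workers:
            w.join()
    return dst


mem = bytearray(b"hello\x00" + bytes(10))
mis_strcpy(mem, 8, 0)
print(mem[8:14])  # bytearray(b'hello\x00')
```

The address streams run ahead of the load and store streams, limited only by the queue depth; the loop-control dependency shows up as stream 1 being the only place where termination is decided.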
Yamin Li and Wanming Chu. Design and performance analysis of a multiple-threaded multiple-pipelined architecture. In S. Sahni, V. K. Prasanna, and V. P. Bhatkar, editors, Proceedings of the Second International Conference on High Performance Computing, pages 483-490, New Delhi, Dec. 1995. IEEE TCPP, IEEE TCCA, Tata McGraw-Hill Publishing Company Limited.
According to the number of execution pipelines and the number of instruction threads, we divide pipelined architectures into four types: STSP (Single-Threaded Single-Pipelined), MTSP (Multiple-Threaded Single-Pipelined), STMP (Single-Threaded Multiple-Pipelined), and MTMP (Multiple-Threaded Multiple-Pipelined). We investigated and compared the four architectures and discussed the design issues of the MTMP architecture. These issues include the number of resident threads in the processor, the relationship between the number of resident threads and the number of hardware contexts, and the effective arrangement of execution pipelines with respect to the performance/cost ratio. Several different processor configurations were examined. To evaluate the architecture on a real application, we chose a texture-mapping program that maps a texture pattern onto the perspective projection of a 3D object in screen space. For this kind of application, floating-point computation is the key performance factor. The execution pipelines are arranged as follows: an integer unit performs integer arithmetic and logic operations, a load/store unit performs memory accesses, a floating-point adder performs floating-point addition, subtraction, and comparison, a floating-point multiplier performs floating-point multiplication, a floating-point divider performs floating-point division, and a floating-point convert unit performs floating-point/integer data type conversions. When the MTMP processor has four hardware contexts, we can expect speedups of 1.9 and 2.7 with four and eight threads, respectively.
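To see why floating-point pipelines dominate such a benchmark, consider a generic perspective-correct texture-mapping inner loop. The sketch below is an illustration under stated assumptions, not the program used in the paper: the function and parameter names are hypothetical, the texture is assumed to wrap around at its edges, and each operation is commented with the kind of execution pipeline from the configuration described in the abstract that would handle it.

```python
def texture_span(texture, n, uw, vw, winv, duw, dvw, dwinv):
    """Return n texels along one screen-space span (hypothetical helper).

    uw, vw, winv    -- u/w, v/w and 1/w at the first pixel; under perspective
                       projection these vary linearly in screen space
    duw, dvw, dwinv -- their per-pixel increments
    """
    rows, cols = len(texture), len(texture[0])
    out = []
    for _ in range(n):
        z = 1.0 / winv       # floating-point divider: recover w from 1/w
        u = uw * z           # floating-point multiplier
        v = vw * z
        i = int(v) % rows    # convert unit (float -> int), then integer unit (wrap)
        j = int(u) % cols
        out.append(texture[i][j])  # load/store unit: texel fetch
        uw += duw            # floating-point adder: step to the next pixel
        vw += dvw
        winv += dwinv
    return out


texture = [[10, 11, 12, 13], [20, 21, 22, 23]]
print(texture_span(texture, 4, 0.0, 0.5, 0.5, 0.5, 0.0, 0.0))  # [20, 21, 22, 23]
```

In this formulation the per-pixel division cannot be replaced by incremental additions the way the u/w, v/w and 1/w updates can, which is one reason a dedicated divider pipeline is attractive for this workload.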
Yamin Li and Wanming Chu. Using computer architecture/organization at the University of Aizu. In Proceedings of the Second Workshop on Computer Architecture Education, San Jose, California, USA, Feb. 1996. IEEE Computer Society, IEEE Computer Society Press.
Computer science, technology, and engineering have been changing rapidly. We must consider how to teach Computer Architecture/Organization courses amid such rapid change, and what content these courses should cover. In this paper, we introduce curricula for the Computer Architecture Course, the Computer Organization Course, and the Parallel Computer Organization and Architecture Course. We also describe the teaching strategy, lecture design, and laboratory development for these courses.

Technical Reports

Robert H. Fujii. Determination of the worst case circuit sensitivity to circuit element manufacturing tolerance. Technical Report, 95-2-004, Sept. 28, The University of Aizu, Aizu-Wakamatsu, Japan, 1995.
Robert H. Fujii. Generation of a simplified symbolic circuit function and analysis of simplification errors. Technical Report, 95-2-005, Sept. 28, The University of Aizu, Aizu-Wakamatsu, Japan, 1995.
Jordi Cortadella, Michael Kishinevsky, Alex Yu. Kondratyev, Luciano Lavagno, and Alex Yakovlev. A region-based theory for state assignment in asynchronous circuits. Technical Report, 95-2-006, October 16, 32pgs, The University of Aizu, Aizu-Wakamatsu, Japan, 1995.
Robert H. Fujii, Chrystopher L. Nehaniv, and Lothar M. Schmitt. On genetic algorithms. Technical Report, 95-1-024, June 8, 29pgs, The University of Aizu, Aizu-Wakamatsu, Japan, 1995.
Runhe Huang and Jianhua Ma. A general model of multimedia synchronization and multimedia fidelity and distortion. Technical Report, 95-1-013, March 28, 15pgs, The University of Aizu, Aizu-Wakamatsu, Japan, 1995.
Jianhua Ma, Runhe Huang, and Tosiyasu L. Kunii. Fidelity and distortion in multimedia synchronization modeling. Technical Report, 95-1-018, May 22, 14pgs, The University of Aizu, Aizu-Wakamatsu, Japan, 1995.
Tosiyasu L. Kunii, Jianhua Ma, and Runhe Huang. Hyperworld modeling. Technical Report, 95-1-040, December 8, 8pgs, The University of Aizu, Aizu-Wakamatsu, Japan, 1995.
Alex Kondratyev, Michael Kishinevsky, Alexander Taubin, and Sergei Ten. Analysis of Petri nets by ordering relations in reduced unfoldings. Technical Report, 95-2-003, June 21, 24pgs, The University of Aizu, Aizu-Wakamatsu, Japan, 1995.
Michael Kishinevsky, Alex Kondratyev, Luciano Lavagno, Alex Saldanha, and Alexander R. Taubin. Hazard free robust path delay fault testing of asynchronous nets. Technical Report, 96-2-001, March 12, 13pgs, The University of Aizu, Aizu-Wakamatsu, Japan, 1995.

Patents

Tuneo Ikedo, 1996. Patent: H7-201250 (Japan), Title: Gaseous Object Generator. A computer graphics technique for rendering gaseous objects such as fog, steam, or cloud in hardware is claimed. A Gaussian random number generator, a spline-curve circuit shading processor, a synthesizer that combines surface-defined objects with gaseous objects, and a transparency control circuit are combined to render gaseous objects.
Tuneo Ikedo, 1995. Patent: H7-201251 (Japan), Title: Multi-screen System. A large screen built from multiple projectors requires a different coordinate transformation for each projector. This patent claims a new architecture, consisting of a graphics array processor, to cope with such systems.
Tuneo Ikedo, 1996. Patent: PCT/JP96/00726 (USA, CA, GB, FR, IT, DE, SE, DK), Title: Computer Graphics Circuit. A hardware architecture for Phong and bump-mapped shading is claimed. It is achieved by angular interpolation, which allows the circuit to be built from a small RAM, ROM, multiplier, and adder.

Academic Activities

Jianhua Ma, Feb. 1996. Section Chairman for the International Conference on Visual Information Systems.
Jianhua Ma, 1995. Referee for the Journal of Visual Computer.
Jianhua Ma, 1995. Assisted in evaluating the 1995 CG&A Industry Excellence Awards.
Michael Kishinevsky, March 1996. Tutorial/CAD chair of the Second International Symposium on Advanced Research in Asynchronous Circuits and Systems.
Michael Kishinevsky, November 1995. A full-day tutorial, "Systematic Design of Asynchronous Circuits", at the IEEE/ACM ICCAD'95 Conference.
Michael Kishinevsky, 1995/1996. Refereeing papers for: Journal of VLSI Signal Processing, ICCAD-95,96, Async96, ASP-DAC, IEE Transactions, Second Working Conference on Asynchronous Design Methodologies-95, etc.
Michael Kishinevsky, 1996. Serving on the editorial board of the Special Issue on Asynchronous Circuit and System Design of the IEICE Information and Systems Society journals (Japanese and English).
Alex Yu. Kondratyev, March 1996. Co-chair of the Second International Symposium on Advanced Research in Asynchronous Circuits and Systems.
Alex Yu. Kondratyev, 1995-1996. Refereeing papers for: IEICE Transactions on Fundamentals of Electronics, ICCAD-95,96, Async96, Second Working Conference on Asynchronous Design Methodologies-95, etc.
Yamin Li, 1996. Referee for the International Symposium on Parallel Architectures, Algorithms, and Networks.


www@u-aizu.ac.jp
October 1996