Next: Computer Networks Laboratory Up: Software Department Previous: Language Processing Systems

Distributed Parallel Processing Laboratory

/ N. N. Mirenkov / Professor
/ S. G. Sedukhin / Professor
/ Shietung Peng / Associate Professor
/ Alexander Vazhenin / Visiting Associate Professor

Our general program is high-performance computing and advanced information infrastructure. We are also thinking about a planetary machine and active knowledge being developed by humanity. We are undertaking research efforts in parallel and distributed algorithms/architectures, visual (multimedia) languages and tools. We lead joint research projects ``Communication-efficient parallel computing" and ``A new paradigm for clustered computing". In the first research project, we focus on the design of communication-efficient parallel algorithms for Bulk-synchronous parallel computers. The key is to develop a router that can perform efficient point-to-point and collective communication among processors. The second research project will focus on modeling, developing software packages, investigating fault-tolerance and performance analysis for the clustered computing.

We are studying the problem of parallel program portability, skeleton-based techniques and developing the supporting tools. Our theoretical work is related to efficient algorithms for parallel numerical computation including linear algebra and Fourier transformation as well as for fault tolerant routing on certain interconnection networks, like hypercubes, star graphs, and torus. Information about our activities can be found on the laboratory Home Page: http://www.u-aizu.ac.jp/labs/sw-dpp/, which contains links to three experimental laboratory WWW servers. These servers hold the High-Performance Computing and Communication topics, on-line CD-ROM Digital Library and serve as mirror sites of, for instance, the ``Computational Science Educational Project".

As one of our results we would like to mention the S4CAD system. S4CAD (System of Systolic Structures Synthesis) is a high level interactive software tool for designing, analysis and modeling of systolic VLSI-oriented array processors. Starting with a localized data dependence graph of the algorithm S4CAD automatically produces all permissible array processors, among which the designer may chose the optimal one. World Wide Web page of the S4CAD is available at http://www-dpp.u-aizu.ac.jp/HPCC/S4CAD/.

The important question of our research is parallel program transparency and accessibility of massively parallel computers to large user populations. We are developing a multimedia technology for the interactive specification of application algorithms. This technology is related to visualization, animation and sonification of data processing methods. In fact, it is a `filmification' of application algorithms. Each film is based on a rather big peace of knowledge of a class of the algorithms. Our approach is close to the idea of algorithmic skeletons (higher order functions). However, skeletons leave a big gap between reality and a human (a multi-channel being). So, we use algorithmic multimedia skeletons based on both mathematical and physical abstractions.

Each film is a series of frames (computational steps) displaying one or more parameterized sets of nodes and/or moving objects in multi-dimensional space-time. Each frame highlights a subset of these nodes and/or moving objects. The films have their own sets of figures, sounds, colors and their own ``shape'' of the movement. Users recognize them due to a combination of all these features. Each film defines a partial order of scanning of the nodes or objects (and possibly, colors and sounds). As a rule, computation specified on different nodes (objects) of a frame is considered to be performed in parallel. Computation specified in different frames is performed sequentially. So, it is possible to say: the shorter film the better.

The user defines the specification by creating his new film. The corresponding program (sequential or parallel) is generated automatically. Each film itself is considered as a special-purpose application for which a parallel program is developed in advance as an item of the system library. The specification of a problem is considered as input data for such a parallel program. However, this data can introduce a composition of a few films, an irregularity of computation, special sizes of data, etc. It means that to reach efficiency it is impossible to have only one variant of the film implementation. So, each film is supported at least by a few parallel programs from the system library. Different programs are related to different conditions being defined by the specification. In other words, a film has a special space of decision-making. This space takes into account levels of problem granularity, irregularity, dynamics of the data decomposition, and some others related to algorithm features.

In fact, we are developing film machines where data, knowledge and algorithms (as well as results) are specified by films. Distributed film databases can be considered as active common knowledge/software being developed by humanity.

Refereed Journal Papers

Mirenkov, N., Filmification of Methods: Programming Technology for the 21-st Century. Nonlinear Analysis: Theory, Methods and Applications, vol.30, no.2, p.779-790, 1997.
Ikedo, T. and Mirenkov, N., Aizu Supercomputer Project. International Journal of Computers and Applications, vol.20, no.1, p.40--48, 1998.
S. Peng, S. Sedukhin, and I. Sedukhin, Designs of Arrays Processors for 2D Discrete Fourier Transform. IEICE Trans. on Information and Systems, vol.E80-D, no.4, p.455-465, 1997.
In this paper the design of systolic array processors for computing 2-dimensional Discrete Fourier Transform (2-D DFT) is considered. We investigated three different computational schemes for designing systolic array processors using systematic approach. The systematic approach guarantees to find optimal systolic array processors from a large solution space in terms of the number of processing elements and I/O channels, the processing time, topology, pipeline period, etc. The optimal systolic array processors are scalable, modular and suitable for VLSI implementation. An application of the designed systolic array processors to the prime-factor DFT is also presented.
S. Peng and S. Sedukhin, and I. Sedukhin, Designs of optimal systolic arrays for 2D discrete Fourier transform. IEICE Trans. on Information and Systems, vol.E80-D, no.4, p.455--465, 1997.
In this paper the design of systolic array processors for computing 2-dimensional Discrete Fourier Transform (2-D DFT) is considered. We investigated three different computational schemes for designing systolic array processors using systematic approach. The systematic approach guarantees to find optimal systolic array processors from a large solution space in terms of the number of processing elements and I/O channels, the processing time, topology, pipeline period, etc. The optimal systolic array processors are scalable, modular and suitable for VLSI implementation. An application of the designed systolic array processors to the prime-factor DFT is also presented.
Q.P. Gu and S. Peng, Node-to-set disjoint paths with optimal length in star graphs. IEICE Trans. on Information and Systems, vol.E80-D, no.4, p.425--433, 1997.
Finding disjoint paths for certain collective communication in a parallel computer is an important issue for fault-tolerance and traffic control. In this paper, we consider finding disjoint paths for node-to-set routing problem in star graphs. Given a node $s$ and a set $T=\{t_1,...,t_k\}$ of $k$ nodes in a $k$-connected graph $G$, node-to-set routing is to find $k$ node disjoint paths $s \rightarrow t_i,~ i = 1\leq i \leq k$. In this paper, we improve the previous node-to-set routing algorithm for $n$-dimensional star graphs $G_n$. The new algorithm finds $n-1$ node disjoint paths of length at most $d(G_n)+1$ if $n \neq 4, 6$, where $d(G_n)$ is the diameter of $G_n$. Our result on the length of the disjoint paths is optimal.
Q.P. Gu and S. Peng, Node-to-set disjoint paths problem in star graphs. Information Processing Letters, vol.62, no.5, p.201--207, 1997.
In this paper, an efficient algorithm for node-to-set routing in star graphs is presented. Our algorithm improves the previous results on the maximal length of the paths found, from $5n$ to $1.5n+2$. More specifically, for node-to-set routing in $n$-dimensional star graph $G_n$, our algorithm finds $n-1$ node disjoint paths of length at most $d(G_n)+2$ in optimal time, where $d(G_n)$ is the diameter of $G_n$. The algorithm use cycle forms of permutations to derive important formula that allow us to find disjoint paths with shorter length.
Q.P. Gu and S. Peng, k-Pairwise cluster fault tolerant routing in hypercubes. IEEE Transactions on Computers, vol.46, no.9, p.1042--1049, 1997.
In this paper, we introduce a general fault tolerant routing problem, called cluster fault tolerant (CFT) routing, which is a natural extension of the node fault tolerant routing problem. As a case study, we investigate the following $k$-pairwise CFT routing in $n$-dimensional hypercube $H_n$: Given a set of faulty clusters and $k$ distinct nonfaulty node pairs, find $k$ fault-free disjoint paths connecting the $k$ node pairs. We show that $H_n$ can tolerate $n-2k+1$ faulty clusters of diameter one for the $k$-pairwise CFT routing ($k>1$). We also present an $O(kn \log n)$ time algorithm which finds the $k$ fault-free disjoint paths.

Refereed Proceeding Papers

Mirenkov, N. and Mirenkova, T., Filmification of methods and film machines. Proceedings of the International Symposium on High-performance Computing, p.329--334, publisher SCS, Apr. 1998.
Mirenkov, N. and Vazhenin, A., Filmification of Methods: Computation on Matrices. Proceedings of the International Symposium on Software Engineering for Parallel and Distributed Systems (PDSE-98), p.176--185, IEEE Press, Apr. 1998.
Mirenkov, N., Monakhov, O., Chunikhin, O., A multimedia system for the investigation of mapping algorithms. Proceedings of the International Conference on High-Performance Computing and Networking, LNCS, vol.1401, p.737--746, Springer Verlag, Apr. 1998.
Mirenkov, N. and Hirotomi, T., Filmification of Methods: Computation on Depth Search Machines. Proc. of the Int. Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA-98), p.1297--1304, CSREA, July 1998.
Mirenkov, N. and Ebihara, T., Filmification of Methods: Computation on Pyramids. Proc. of the Int. Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA-98), p.1353--1356, CSREA, July 1998.
S. Peng, S. Sedukhin, and I. Sedukhin, Array Processors Based on Gaussian Fraction-free Method. Proc. of the 1st JAERI-Kansai International Workshop on Ultrashort-pulse Ultrahigh-power Lasers and Simulation for Laser-Plasma Interactions, p.112--117, JAERI, Kansai Research Establishment, JAERI, March 1998.
The design of algorithmic array processors for solving linear systems of equations using fraction-free Guassian method is presented. The design is based on a formal approach which constructs a family of planar array processors systematically. These array processors are synthesized and analyzed. It is shown that two of them are optimal under the framework of linear allocation of computations and in the terms of computing time and the number of processing elements.
S. Peng, and S. Sedukhin, Parallel Algorithm and Architectures for Two-step Division-free Gaussian Elimination. Proc. of the IEEE Third International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP'97), A.Goscinski, M.Hobbs and W. Zhou, Editor, p.489--502, IEEE Victorian Section, World Scientific, Melbourne, Australia, Dec. 1997.
The design of optimal array processors for solving linear systems is considered in this paper. We present a new parallel algorithm/architecture based on the two-step division-free Gaussian elimination method. The new design circumvents the one-step method in terms of numerical stability. In spite of the rather complicated computations needed at each iteration of the two-step method, we develop an innovative parallel algorithm whose data dependency graph meets the requirements for regularity and locality. Then we derive two-dimensional array processors by adopting a systematic approach to investigate the set of all admissible solutions. Finally, we obtain the optimal array processors in terms of the number of processors under linear time-space scheduling.
S. Sedukhin and I. Sedukhin, An Interactive Graphic Tool for Systematic Design and Analysis of VLSI Array Processors. Proc. of the International Conference on Parallel and Distributed Techniques and Applications (PDPTA'97), H.R. Arabnia, editor, p.41-49, CSREA Press, Las Vegas, Nevada, USA. June, 1997.
A systematic approach and software tool for synthesis and analysis of a set of systolic-like array processors are presented. The S4CAD tool allow us to obtain and examine the set of admissible array processors. The software tool uses advaaanced graphical media technologies that gets more comfort for the user to evaluate and choose an optimal solution, observing the requirements on the computing time, number of processing elements, topology of the connection, data flow formats, etc. As an example, the design of systolic array processors for the transitive closure algorithm is shown.
S. Peng and S. Sedukhin, Parallel algorithm and architecture for two-step division-free Gaussian elimination. The International Conference on Algorithms and Architectures for Parallel Processing, p.489--502, University of Singapore, IEEE Computer Society, Singapore, Dec. 1997.
The design of optimal array processors for solving linear systems is considered in this paper. We present a new parallel algorithm/architecture based on the two-step division-free Gaussian elimination method. The new design circumvents the one-step method in terms of numerical stability. In spite of the rather complicated computations needed at each iteration of the two-step method, we develop an innovative parallel algorithm whose data dependency graph meets the requirements for regularity and locality. Then we derive two-dimensional array processors by adopting a systematic approach to investigate the set of all admissible solutions. Finally, we obtain the optimal array processors in terms of the number of processors under linear time-space scheduling.
Q.P. Gu and S. Peng, Node-to-node cluster fault tolerant routing in hypercubes. The International Symposium on Parallel Architectures, Algorithms and Networks, p.404--409, National University of Taiwan, IEEE Computer Society, Taipei, Dec. 1997.
In this paper, we investigate the node-to-node fault tolerant routing problem in $n$-dimensional hypercubes $H_n$ based on a generalized cluster fault tolerance model: an improvement over the previous cluster fault tolerance model of limited diameters. For a graph $G$, a faulty cluster is a connected subgraph of $G$ such that all its nodes are faulty. In cluster fault-tolerant routing problems, how many faulty clusters and how large of those clusters can be tolerated are studied. It was previously known that for node-to-node routing, $H_n$ can tolerate as many as $n-1$ faulty clusters of diameter at most 1 with at most $2n-3$ faulty nodes in total. In this paper, we extend the above results to include faulty clusters of arbitrary diameter. We also give efficient routing algorithms which, given a tolerable set of faulty clusters and non-faulty nodes $s$ and $t$ in $H_n$, finds a fault-free path $s \rightarrow t$ of length $d(s,t)+O(\log n)$ in $O(n+|F|)$ optimal time, where $|F|$ is the total number of faulty nodes.
S. Peng, S. Sedukhin, and I Sedukhin, Array processors based on Gaussian fraction-free method. The first Jaeri-Kansai International Workshop on Ultrashort-pulse Ultrahigh-power Lasers and Simulation for Laser-Plasma Interactions, p.112--117, Japan Atomic Energy Research Institue, Japan Atomic Energy Research Institue, Kyoto, March 1998.
The design of algorithmic array processors for solving linear systems of equations using fraction-free Guassian method is presented. The design is based on a formal approach which constructs a family of planar array processors systematically. These array processors are synthesized and analyzed. It is shown that two of them are optimal under the framework of linear allocation of computations and in the terms of computing time and the number of processing elements. (invited talk).

Books

S. Peng and S. Sedukhin, Problems of High-Performance Computers. Polish Academy of Sciences. Szczecin, 1997.
This volume is one of the Applied Informatics Series published by Polish Academy of Sciences. Chapter one of this volume (pages 5--46) was written by Peng and Sedukhin. This chapter describes a new approach for designing optimal systolic array processors. We show how to apply the proposed systematic approach for solving linear systems of equations. More specifically, in this chapter, the systematic design of systolic array processors for solving linear systems of equations using division-free and fraction-free Gaussian elimination method is presented. The design is a new systematic approach which constructs array processors systematically by three stages: first constructing the 3-D data dependency graphs of the algorithms, and then generating all planar systolic array processors by projecting the data dependency graph along certain feasible directions, and finally, selecting optimal solutions based on the given criteria. The key for designing optimal array processors is to develop regular algorithms and then map them into 2-D systolic arrays to meet advanced VLSI requirements.

Academic Activities

Nikolay N. Mirenkov, Member of the Program Committee of the ISPAN-97, Korea.
Nikolay N. Mirenkov, Member of the Program Committee of the ISPAN-99, Australia.
Nikolay N. Mirenkov, Chair of the Aizu Int. Student Contest on Multimedia, AIScontest-98, Japan.
Nikolay N. Mirenkov, Member of the Program Committee of the SMI-99, Japan.
Nikolay N. Mirenkov, Member of the Program Committee of the PaCT99, Russia.
Nikolay N. Mirenkov, Member of Editorial Board of the Journal "Scientific programming", John Wiley Publisher, USA. 1997.
Nikolay N. Mirenkov, Member of the IFIP Working Group 10.3 (Concurrent Systems).
Nikolay N. Mirenkov, Member of ACM, IEEE and The New York Academy of Sciences.
Nikolay N. Mirenkov, Member of the Program Committee of the International Conferences on Parallel and Distributed Processing Techniques and Applications (PDPTA'98), Las Vegas, Nevada, USA.
Stanislav G. Sedukhin, Member of the Program Committee of the 2nd International Conference on Parallel Processing and Applied Mathematics (PPAM'97), September 1997, Zakopane, Poland.
Stanislav G. Sedukhin, Member of the Program Committee of the International Conferences on Parallel and Distributed Processing Techniques and Applications: (1) (PDPTA'97), June 30-July 3, 1997, Las Vegas, Nevada, USA. (2) (PDPTA'98), July 13-16, 1998, Las Vegas, Nevada, USA.
Stanislav G. Sedukhin, Member of the (1) IEEE and IEEE Computer Society, (1994 - ), (2) ACM (1994 - ), (3) SIAM (1995 - ), (4) AMS (1996 - ).
Stanislav G. Sedukhin, Invited talk at the Research & Development Center for Gaia Simulator, Scientific and Technology Agency,Tokyo, December 16, 1997.
Stanislav G. Sedukhin, Member of the Editorial Board of the International Journal "Parallel Processing Letters" (World Scientific Publ.) (1991.9- ).

Others

Asano, Y., Filmification of Methods: Computation on Pixels with Local Substitutions. The Univ. of Aizu, 1997. Thesis Advisor: N. Mirenkov.
Ebihara, Y., Filmification of Methods: Computation on Pyramids. The Univ. of Aizu, 1997. Thesis Advisor: N. Mirenkov.
Watanabe, Y., Filmification of Methods: Computation on Pixels with Global Substitutions. The Univ. of Aizu, 1997. Thesis Advisor: N. Mirenkov.
Yamamoto, T., Digital Library (CD-ROM jike box) supported by Java. The Univ. of Aizu, 1998. Thesis Advisor: S. Sedukhin.
Matsukura, T., Java Interface to the Computing and Communication WWW Glossary. The Univ. of Aizu, 1998. Thesis Advisor: S. Sedukhin.
Morimoto, N., Reduced Array Processor for 2-dimensional Discrete Fourier Transform. The Univ. of Aizu, 1998. Thesis Advisor: S. Sedukhin.
Nagai, M., Design of Array Processors for Matrix Inversion. The Univ. of Aizu, 1998. Thesis Advisor: S. Sedukhin.
Takigahira, T., Division-free Algorithms and Array Processors for Solving Linear System of Equations. The Univ. of Aizu, 1998. Thesis Advisor: S. Sedukhin.
Nagata, H., Computer Interface for Red-Black Trees Using Java. The Univ. of Aizu, 1997. Thesis Advisor: S. Peng.
Takayanagi, K., Comuter Animation for Client-Server Model. The Univ. of Aizu, 1997. Thesis Advisor: S. Peng.
Naruse K., Modeling hte DNA Structure Using Java. The Univ. of Aizu, 1997. Thesis Advisor: S. Peng.
Suzuki, T., Modeling the DNA double helix Using Java and HTML. The Univ. of Aizu, 1997. Thesis Advisor: S. Peng.
Takahashi, M., Computer Animation Using Java. The Univ. of Aizu, 1997. Thesis Advisor: S. Peng.

Next: Computer Networks Laboratory Up: Software Department Previous: Language Processing Systems

www@u-aizu.ac.jp
December 1998