Kung, H. T.
Overview
Most widely held works by H. T. Kung
Traffic management for high-speed networks
by H. T. Kung
8 editions published in 1997 in English and held by 1,514 WorldCat member libraries worldwide
VLSI systems and computations
by H. T. Kung (Book)
13 editions published in 1981 in English and Italian and held by 332 WorldCat member libraries worldwide
An optimality theory of concurrency control for databases
by H. T. Kung (Book)
8 editions published between 1979 and 1980 in English and Undetermined and held by 17 WorldCat member libraries worldwide
A concurrency control mechanism (or a scheduler) is the component of a database system that safeguards the consistency of the database in the presence of interleaved accesses and update requests. We formally show that the performance of a scheduler, i.e., the amount of parallelism that it supports, depends explicitly upon the amount of information that is available to the scheduler. We point out that most previous work on concurrency control is simply concerned with specific points of the basic tradeoff between performance and information. In fact, several of these approaches are shown to be optimal for the amount of information that they use. (Author)
Sorting on a mesh-connected parallel computer
by C. D. Thompson (Book)
2 editions published in 1976 in English and held by 10 WorldCat member libraries worldwide
Two algorithms are presented for sorting $n^2$ elements on an $n \times n$ mesh-connected processor array that require $O(n)$ routing and comparison steps. The best previous algorithm takes time $O(n \log n)$. The algorithms of this paper are shown to be optimal in time within small constant factors. Extensions to higher-dimensional mesh-connected processor arrays are also given. (Author)
Why systolic architectures?
by H. T. Kung (Book)
5 editions published between 1981 and 1982 in English and Undetermined and held by 10 WorldCat member libraries worldwide
A systolic algorithm for integer GCD computation
by R. P. Brent (Book)
6 editions published in 1984 in English and Undetermined and held by 9 WorldCat member libraries worldwide
Abstract: "We show that the greatest common divisor of two n-bit integers (given in the usual binary representation) can be computed in time O(n) on a linear array of O(n) identical systolic cells, each of which is a finite-state machine with connections to its nearest neighbours."
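For orientation, the sketch below shows the classic sequential binary GCD, which works on the binary representation using only parity tests, shifts and subtraction. It is not the systolic algorithm described in the abstract, only a plain single-processor illustration of bit-level GCD computation; the function name `binary_gcd` is a hypothetical choice for this example.

```python
def binary_gcd(a: int, b: int) -> int:
    """Classic sequential binary GCD for non-negative integers.

    Illustrative only: this is NOT the systolic-array algorithm.
    """
    if a == 0:
        return b
    if b == 0:
        return a
    # Factor out the power of two shared by both operands.
    shift = 0
    while (a | b) & 1 == 0:
        a >>= 1
        b >>= 1
        shift += 1
    # Make a odd; it stays odd for the rest of the computation.
    while a & 1 == 0:
        a >>= 1
    while b != 0:
        # Remaining factors of two in b cannot be part of the GCD.
        while b & 1 == 0:
            b >>= 1
        # Keep a <= b, then replace b by the (even) difference.
        if a > b:
            a, b = b, a
        b -= a
    # Restore the shared power of two.
    return a << shift
```

Calling `binary_gcd(48, 18)` walks through one shared halving, reduces the operands to odd values, and finishes by subtraction.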
An efficient parallel garbage collection system and its correctness proof
by H. T. Kung (Book)
3 editions published in 1977 in English and Undetermined and held by 9 WorldCat member libraries worldwide
An efficient system to perform garbage collection in parallel with list operations is proposed and its correctness is proven. The system consists of two independent processes sharing a common memory. One process is performed by the list processor (LP) for list processing and the other by the garbage collector (GC) for marking active nodes and collecting garbage nodes. The system is derived using both correctness and efficiency arguments. Assuming that memory references are indivisible, the system satisfies the following properties: no critical sections are needed in the entire system; the time to perform the marking phase by the GC is independent of the size of memory and depends only on the number of active nodes; nodes on the free list need not be marked during the marking phase by the GC; minimum overheads are introduced to the LP; and only two extra bits for encoding four colors are needed for each node. Efficiency results show that the parallel system is usually significantly more efficient in terms of storage and time than the sequential stack algorithm. (Author)
Systolic algorithms for the CMU Warp processor
by H. T. Kung (Book)
3 editions published in 1984 in English and Undetermined and held by 8 WorldCat member libraries worldwide
The prototype has 10 cells, each of which is capable of performing 10 million floating-point operations per second (10 MFLOPS) and is built on a single board using only off-the-shelf components. This 10-cell processor can, for example, process 1024-point complex FFTs at a rate of one FFT every 600 µs. Under program control, the same processor can perform many other primitive computations in signal, image and vision processing, including two-dimensional convolution and complex matrix multiplication, at a rate of 100 MFLOPS. Together with another processor capable of performing divisions and square roots, the processor can also efficiently carry out a number of difficult matrix operations such as solving covariant linear systems, a crucial computation in real-time adaptive signal processing. This paper outlines the architecture of the Warp processor and describes how the signal processing tasks are implemented on the processor.
Fault-tolerance and two-level pipelining in VLSI systolic arrays
by H. T. Kung (Book)
3 editions published in 1983 in English and Undetermined and held by 7 WorldCat member libraries worldwide
This paper addresses two important issues in systolic array designs: fault-tolerance and two-level pipelining. The proposed 'systolic' fault-tolerant scheme maintains the original data flow pattern by bypassing defective cells with a few registers. As a result, many of the desirable properties of systolic arrays (such as local and regular communication between cells) are preserved. Two-level pipelining refers to the use of pipelined functional units in the implementation of systolic cells. This paper addresses the problem of efficiently utilizing pipelined units to increase the overall system throughput. We show that both of these problems can be reduced to the same mathematical problem of incorporating extra delays on certain data paths in originally correct systolic designs. We introduce the mathematical notion of a cut which enables us to handle this problem effectively. The results obtained by applying the techniques described in this paper are encouraging. When applied to systolic arrays without feedback cycles, the arrays can tolerate large numbers of failures (with the addition of very little hardware) while maintaining the original throughput. Furthermore, all of the pipeline stages in the cells can be kept fully utilized through the addition of a small number of delay registers. However, adding delays to systolic arrays with cycles typically induces a significant decrease in throughput. In response to this, we have derived a new class of systolic algorithms in which the data cycle around a ring of processing cells.
Synchronized and asynchronous parallel algorithms for multiprocessors
by H. T. Kung (Book)
1 edition published in 1976 in English and held by 7 WorldCat member libraries worldwide
All algebraic functions can be computed fast
by H. T. Kung (Book)
3 editions published in 1976 in English and Undetermined and held by 7 WorldCat member libraries worldwide
The expansions of algebraic functions can be computed 'fast' using the Newton Polygon Process and any 'normal' iteration. Let $M(j)$ be the number of operations sufficient to multiply two $j$th-degree polynomials. It is shown that the first $N$ terms of an expansion of any algebraic function defined by an $n$th-degree polynomial can be computed in $O(nM(N))$ operations, while the classical method needs $O(N^n)$ operations. Among the numerous applications of algebraic functions are symbolic mathematics and combinatorial analysis. Reversion, reciprocation, and $n$th root of a polynomial are all special cases of algebraic functions.
The area-time complexity of binary multiplication
by R. P. Brent (Book)
3 editions published in 1979 in English and Undetermined and held by 7 WorldCat member libraries worldwide
We consider the problem of performing multiplication of $n$-bit binary numbers on a chip. Let $A$ denote the chip area, and $T$ the time required to perform multiplication. Using a model of computation which is a realistic approximation to current and anticipated VLSI technology, we show that $(A/A_0)(T/T_0)^{2\alpha} \geq n^{1+\alpha}$ for all $\alpha \in (0, 1)$, where $A_0$ and $T_0$ are positive constants which depend on the technology but are independent of $n$. The exponent $1 + \alpha$ is the best possible. A consequence is that binary multiplication is 'harder' than binary addition if $AT^{2\alpha}$ is used as a complexity measure for any $\alpha \geq 0$. (Author)
Numerically stable solution of dense systems of linear equations using mesh-connected processors
by A. Bojanczyk (Book)
2 editions published in 1981 in English and held by 6 WorldCat member libraries worldwide
Systolic arrays (for VLSI)
by H. T. Kung (Book)
4 editions published between 1978 and 1979 in English and Undetermined and held by 6 WorldCat member libraries worldwide
A systolic system is a network of processors which rhythmically compute and pass data through the system. Physiologists use the word 'systole' to refer to the rhythmically recurrent contraction of the heart and arteries which pulses blood through the body. In a systolic computing system, the function of a processor is analogous to that of the heart. Every processor regularly pumps data in and out, each time performing some short computation, so that a regular flow of data is kept up in the network. Many basic matrix computations can be pipelined elegantly and efficiently on systolic networks having an array structure. As an example, hexagonally connected processors can optimally perform matrix multiplication. Surprisingly, a similar systolic array can compute the LU-decomposition of a matrix. These systolic arrays enjoy simple and regular communication paths, and almost all processors used in the networks are identical. As a result, special-purpose hardware devices based on systolic arrays can be built inexpensively using VLSI technology. (Author)
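The "pump data in and out, each time performing some short computation" idea can be illustrated with a small software simulation. The sketch below is an assumption-laden toy, not the hexagonal array from the paper: it models a linear array in which cell i keeps a running sum for output y[i], the entries of x move one cell to the right per cycle, and the matching matrix entry is assumed to arrive at each cell at the right cycle. The name `systolic_matvec` is hypothetical.

```python
def systolic_matvec(a, x):
    """Simulate a linear systolic array computing y = a * x.

    Toy model (an assumption of this sketch): cell i accumulates y[i],
    x values shift one cell to the right each cycle, and a[i][j] is
    available at cell i exactly when x[j] passes through it.
    """
    n_rows, n_cols = len(a), len(x)
    y = [0] * n_rows              # partial sum held in each cell
    x_reg = [None] * n_rows       # x value currently inside each cell
    for t in range(n_rows + n_cols - 1):
        # One clock tick: every x value moves to its right-hand neighbour,
        # and the next x value (if any) enters cell 0.
        incoming = x[t] if t < n_cols else None
        x_reg = [incoming] + x_reg[:-1]
        # Every cell that holds data performs one multiply-accumulate.
        for i in range(n_rows):
            if x_reg[i] is not None:
                j = t - i         # index of the x value now in cell i
                y[i] += a[i][j] * x_reg[i]
    return y
```

For an array with `n_rows` cells and a vector of length `n_cols`, the simulation runs for `n_rows + n_cols - 1` cycles, which is the time the last x value needs to reach the last cell.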
A systolic 2D convolution chip
by H. T. Kung (Book)
2 editions published in 1981 in English and held by 6 WorldCat member libraries worldwide
This paper describes a chip for performing the 2D (two-dimensional) convolution in signal and image processing. The chip, based on a systolic design, consists of essentially only one type of simple cell; the cells are mesh-interconnected in a regular and modular way, and the chip achieves high performance through extensive concurrent and pipelined use of these cells. Denoting by $u$ the cycle time of the basic cell, the chip allows convolving a $k \times k$ window with an $n \times n$ image in $O(n^2 u/k)$ time, using a total of $k^3$ basic cells. The total number of cells is optimal in the sense that the usual sequential algorithm takes $O(n^2 k^2 u)$ time. Furthermore, because of the modularity of the design, the number of cells used by the chip can be easily adjusted to achieve any desirable balance between I/O and computation speeds. (Author)
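The sequential baseline mentioned in the abstract can be sketched as a direct nested-loop convolution that also counts its multiply-accumulate steps. This is an illustrative single-processor version, not the chip's systolic design. It assumes a square image and a square window with k no larger than n, and it computes only the fully overlapping region; the name `convolve2d_direct` is hypothetical.

```python
def convolve2d_direct(image, window):
    """Direct 2-D convolution over the fully overlapping region.

    Assumes a square n x n image and a square k x k window with k <= n.
    Returns the output grid and the number of multiply-accumulate steps.
    """
    n, k = len(image), len(window)
    size = n - k + 1                       # output is size x size
    out = [[0] * size for _ in range(size)]
    macs = 0
    for r in range(size):
        for c in range(size):
            acc = 0
            for i in range(k):
                for j in range(k):
                    # Index the image in reverse so the window is flipped
                    # in both axes, which makes this a true convolution.
                    acc += window[i][j] * image[r + k - 1 - i][c + k - 1 - j]
                    macs += 1
            out[r][c] = acc
    return out, macs
```

Each of the (n - k + 1)^2 output points costs k^2 multiply-accumulate steps, so the total work grows on the order of n^2 k^2 when k is small relative to n.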
Systolic (VLSI) arrays for relational database operations
by H. T. Kung (Book)
1 edition published in 1980 in English and held by 6 WorldCat member libraries worldwide
MISE, Machine for In-System Evaluation of custom VLSI chips
(Book)
3 editions published in 1982 in English and held by 6 WorldCat member libraries worldwide
This paper identifies some of the key research problems that one encounters in specifying, designing, testing and demonstrating a custom chip in relation to the application system in which it will be used, and proposes a system called MISE (Machine for In-System Evaluation) as a solution to the issues raised.
Deadlock avoidance for systolic communication
by H. T. Kung (Book)
3 editions published in 1987 in English and Undetermined and held by 6 WorldCat member libraries worldwide
Virtual channels for fault-tolerant programmable two-dimensional processor arrays
by H. T. Kung (Book)
3 editions published in 1986 in English and Undetermined and held by 6 WorldCat member libraries worldwide
Let's design algorithms for VLSI systems
by H. T. Kung (Book)
2 editions published in 1979 in English and held by 6 WorldCat member libraries worldwide
Associated Subjects
Algebraic functions; Algorithms; Binary system (Mathematics); Computer architecture; Computer programming--Management; Computers; Computers--Access control; Computers--Circuits; Database management; Data transmission systems; Debugging in computer science; Digital electronics; Electronics; Engineering; Fault-tolerant computing; Image processing; Integrated circuits; Integrated circuits--Large scale integration; Integrated circuits--Very large scale integration; Linear programming; Microprocessors; Multiprocessors; Parallel processing (Electronic computers); Parallel programming (Computer science); Polynomials; Signal theory (Telecommunication); Sorting (Electronic computers); Systolic array circuits; Telecommunication--Traffic--Management

