Song, S. W.
Overview
Works:  9 works in 13 publications in 1 language and 22 library holdings 

Roles:  Author 
Classifications:  QA76.9.D3, 510.7808 
Publication Timeline
Most widely held works by S. W. Song
An efficient parallel garbage collection system and its correctness proof by H. T. Kung (Book)
3 editions published in 1977 in English and Undetermined and held by 9 WorldCat member libraries worldwide
An efficient system to perform garbage collection in parallel with list operations is proposed and its correctness is proven. The system consists of two independent processes sharing a common memory. One process is performed by the list processor (LP) for list processing and the other by the garbage collector (GC) for marking active nodes and collecting garbage nodes. The system is derived by using both correctness and efficiency arguments. Assuming that memory references are indivisible, the system satisfies the following properties: no critical sections are needed in the entire system; the time to perform the marking phase by the GC is independent of the size of memory, but depends only on the number of active nodes; nodes on the free list need not be marked during the marking phase by the GC; minimal overhead is introduced to the LP; and only two extra bits for encoding four colors are needed for each node. Efficiency results show that the parallel system is usually significantly more efficient in terms of storage and time than the sequential stack algorithm. (Author)
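The abstract's claim that marking cost depends only on the number of active nodes, with two bits encoding four colors per node, can be illustrated with a minimal sequential sketch. This is not the Kung-Song concurrent protocol (whose contribution is the interleaving of LP and GC without critical sections); it only shows, with assumed names, how a color-driven mark phase visits reachable nodes and leaves free-list nodes untouched:

```python
# Four colours fit in two bits per node; OFF_FREE is reserved for nodes on
# the free list, which the marking phase never has to visit.
WHITE, GRAY, BLACK, OFF_FREE = 0, 1, 2, 3

class Node:
    def __init__(self):
        self.color = WHITE
        self.left = None
        self.right = None

def mark(roots):
    """Mark every node reachable from the roots.

    Work is proportional to the number of active (reachable) nodes,
    not to the size of the whole heap."""
    stack = [r for r in roots if r is not None]
    for r in stack:
        r.color = GRAY          # gray = seen but children not yet scanned
    while stack:
        node = stack.pop()
        node.color = BLACK      # black = node and its references scanned
        for child in (node.left, node.right):
            if child is not None and child.color == WHITE:
                child.color = GRAY
                stack.append(child)
```

In the paper the GC runs this traversal concurrently with the LP's mutations; the sketch above only captures the invariant that a node is black once it and its outgoing references have been processed.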
A systolic 2D convolution chip by H. T. Kung (Book)
3 editions published in 1981 in English and held by 6 WorldCat member libraries worldwide
This paper describes a chip for performing 2D (two-dimensional) convolution in signal and image processing. The chip, based on a systolic design, consists of essentially only one type of simple cell, mesh-interconnected in a regular and modular way, and achieves high performance through extensive concurrent and pipelined use of these cells. Denoting by u the cycle time of the basic cell, the chip allows convolving a k×k window with an n×n image in O(n²u/k) time, using a total of k³ basic cells. The total number of cells is optimal in the sense that the usual sequential algorithm takes O(n²k²u) time. Furthermore, because of the modularity of the design, the number of cells used by the chip can be easily adjusted to achieve any desirable balance between I/O and computation speeds. (Author)
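The O(n²k²u) sequential baseline the abstract compares against is the straightforward nested-loop convolution, roughly n²k² multiply-adds. A plain sketch (not the systolic design itself, which distributes this work over k³ pipelined cells):

```python
def convolve2d(image, window):
    """Naive sequential 2-D convolution of a k x k window over an n x n
    image: about (n-k+1)^2 * k^2 multiply-adds, i.e. the O(n^2 k^2 u)
    cost the systolic chip is measured against."""
    n, k = len(image), len(window)
    out_size = n - k + 1
    out = [[0] * out_size for _ in range(out_size)]
    for i in range(out_size):
        for j in range(out_size):
            s = 0
            for a in range(k):
                for b in range(k):
                    s += window[a][b] * image[i + a][j + b]
            out[i][j] = s
    return out
```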
On a high-performance VLSI solution to database problems by Siang Wun Song (Book)
1 edition published in 1981 in English and held by 1 WorldCat member library worldwide
This thesis explores the design and use of custom-made VLSI hardware in the area of database problems. Our effort differs from most previous ones in that we search for structures and algorithms, directly implementable on silicon, for the solution of computation-intensive database problems. The types of target database systems include general database management systems and design database systems. The thesis deals mainly with database systems of the relational model. One common view concerning special-purpose hardware is that it performs a specific task. The proposed device is not a hardware solution to a specific problem, but provides a number of useful data structures and basic operations. It can be used to improve the performance of any sequential algorithm which makes extensive use of such data structures and basic operations. The design is based on a few basic cells, interconnected in the form of a complete binary tree. The proposed device can handle all the basic relational operations: select, join, project, union, and intersection. With a special-purpose device of limited size attached to a host, the overall performance may ultimately be dictated by the I/O between the two sites. The ideal special-purpose device design is one that achieves a balance between computation and I/O. We propose a model to study the I/O complexity of sorting n numbers with any special-purpose hardware device of size s, and show a lower bound of Ω(n log n / log s). We present an optimal design achieving this bound. An important finding is that for practical ranges on the quantity of data to be sorted, systolic sorting devices of small size can beat fast sequential sorting algorithms.
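The Ω(n log n / log s) bound says the required I/O shrinks only logarithmically as the device size s grows. A small arithmetic sketch (constant factor taken as 1, purely illustrative) makes the trade-off concrete:

```python
import math

def io_lower_bound(n, s):
    """Evaluate n * log2(n) / log2(s), the shape of the Omega(n log n / log s)
    I/O lower bound for sorting n numbers with a device of size s.
    Constant factors are ignored; this only illustrates how the bound
    scales with s."""
    return n * math.log2(n) / math.log2(s)
```

For example, growing the device from s = 64 to s = 4096 only halves the bound (log2 s goes from 6 to 12), which is why modestly sized systolic sorters can already be competitive.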
Achieving optimality for gate matrix layout and PLA folding: a graph theoretic approach by Afonso Ferreira (Book)
1 edition published in 1992 in English and held by 1 WorldCat member library worldwide
A parallel algorithm for transitive closure by E. N. Cáceres (Book)
1 edition published in 2002 in English and held by 1 WorldCat member library worldwide
Abstract: "We present a parallel algorithm for the problem of computing the transitive closure of an acyclic digraph D with n vertices and m edges. We use the BSP/CGM model of parallel computing. Our algorithm uses O(log p) communication rounds with p processors, where p ≤ n, and each processor has O(mn/p) local memory. The local computation of each processor is equal to the product of the number of edges and vertices of D that are stored in p."
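For reference, the sequential problem the BSP/CGM algorithm parallelizes can be stated as Warshall's classic O(n³) transitive closure; this sketch is the baseline only, not the paper's round-based parallel algorithm:

```python
def transitive_closure(adj):
    """Warshall's sequential transitive closure on an n x n 0/1 adjacency
    matrix: reach[i][j] == 1 iff there is a directed path from i to j.
    O(n^3) time -- the baseline that BSP/CGM algorithms distribute."""
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach
```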
Revisiting cycle shrinking by Y. Robert (Book)
1 edition published in 1991 in English and held by 1 WorldCat member library worldwide
A highly configurable architecture for systolic arrays of powerful processors by Universidade de São Paulo (Book)
1 edition published in 1990 in English and held by 1 WorldCat member library worldwide
Sequential and parallel algorithms for the all-substrings longest common subsequence problem by Carlos Eduardo Rodrigues Alves (Book)
1 edition published in 2003 in English and held by 1 WorldCat member library worldwide
Abstract: "Given two strings A and B of lengths n_a and n_b, respectively, the All-substrings Longest Common Subsequence (ALCS) problem obtains, for any substring B' of B, the length of the longest string that is a subsequence of both A and B'. The sequential algorithm takes O(n_a n_b) time and O(n_b) space. We present a parallel algorithm for the ALCS on the Coarse Grained Multicomputer (BSP/CGM) model with p < √m processors that takes O(n_a n_b / p) time and O(n_b √n_a) space per processor, with O(log p) communication rounds. The proposed algorithm also solves the basic Longest Common Subsequence (LCS) problem, which finds the longest string (and not only its length) that is a subsequence of both A and B. To our knowledge, this is the best BSP/CGM algorithm for the LCS and ALCS problems in the literature."
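The sequential O(n_a n_b)-time, O(n_b)-space baseline the abstract cites is the textbook LCS dynamic program with a rolling row; a minimal sketch (length only, not the ALCS or BSP/CGM versions):

```python
def lcs_length(A, B):
    """Length of the longest common subsequence of A and B.
    Classic DP: O(len(A) * len(B)) time, O(len(B)) space, since only
    the previous DP row is kept."""
    prev = [0] * (len(B) + 1)
    for a in A:
        cur = [0]
        for j, b in enumerate(B, 1):
            if a == b:
                cur.append(prev[j - 1] + 1)   # extend a common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```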
Revisiting Hamiltonian decomposition of the hypercube by Kunio Okuda (Book)
1 edition published in 1998 in English and held by 1 WorldCat member library worldwide
Abstract: "The Hamiltonian decomposition of a hypercube or binary n-cube is the partitioning of its edge set into Hamiltonian cycles. It is known that there are ⌊n/2⌋ disjoint Hamiltonian cycles on a binary n-cube. The proof of this result, however, does not give rise to any simple construction algorithm for such cycles. In a previous work Song presents ideas towards a simple and interesting method for this problem. Two phases are involved: first decompose the binary n-cube into cycles of length 16, C₁₆, and then apply a merge operator to join the C₁₆ cycles into larger Hamiltonian cycles. The case of dimension n=6 (a 64-node hypercube) is illustrated. He conjectures the method can be generalized for any even n. In this paper, we generalize the first phase of that method for any even n and prove its correctness. We also show four possible merge operators for the case of n=8 (a 256-node hypercube). This result can be viewed as a step toward the general merge operator, thus proving the conjecture."
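Finding one Hamiltonian cycle on the binary n-cube is easy via the reflected binary Gray code, since consecutive codewords (including the wrap-around) differ in exactly one bit; the hard problem the paper addresses is finding ⌊n/2⌋ edge-disjoint such cycles. A sketch of the easy part:

```python
def gray_cycle(n):
    """Vertices of the binary n-cube in reflected-Gray-code order.
    Consecutive entries, including last-to-first, differ in exactly one
    bit, so the sequence traces a single Hamiltonian cycle. A full
    Hamiltonian decomposition needs floor(n/2) edge-disjoint cycles,
    which this construction does not provide."""
    return [i ^ (i >> 1) for i in range(1 << n)]
```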
Related Identities
 Kung, H. T. Author
 Carnegie-Mellon University Computer Science Department
 Carnegie-Mellon Univ., Pittsburgh, PA, Dept. of Computer Science
 Universidade de São Paulo Instituto de Matemática e Estatística Departamento de Ciencia da Computacao
 Robert, Y. Author
 Menzilcioglu, O.
 Okuda, Kunio Author
 Cáceres, Edson Norberto
 Alves, Carlos Eduardo Rodrigues Author
 Szwarcfiter, Jayme L.
Associated Subjects
Acyclic models  Computer programming--Management  Database management  Debugging in computer science  Decomposition (Mathematics)  Directed graphs  Hamiltonian graph theory  Image processing  Integrated circuits--Large scale integration  Parallel algorithms  Sequences (Mathematics)  Signal theory (Telecommunication)
Languages