Della Pietra, Stephen
Overview
Works:  4 works in 6 publications in 1 language and 15 library holdings 

Roles:  Author 
Most widely held works by Stephen Della Pietra
Inducing features of random fields by Stephen Della Pietra (Book)
1 edition published in 1995 in English and held by 5 WorldCat member libraries worldwide
Abstract: "We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field and an iterative scaling algorithm is used to estimate the optimal values of the weights. The random field models and techniques introduced in this paper differ from those common to much of the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated. Relations to other learning approaches, including decision trees and Boltzmann machines, are given. As a demonstration of the method, we describe its application to the problem of automatic word classification in natural language processing."
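The abstract couples greedy feature induction with iterative scaling of the weights. As a rough illustration of the weight-estimation step only, here is a minimal generalized iterative scaling (GIS) loop on a toy four-point sample space; the features `f0` and `f1` and the empirical distribution are invented for this sketch and are not taken from the paper:

```python
import math

# Toy sample space with two hypothetical binary features (an illustrative
# setup, not the paper's word-classification experiment).
X = [0, 1, 2, 3]
f0 = lambda x: 1.0 if x % 2 == 0 else 0.0   # x is even
f1 = lambda x: 1.0 if x >= 2 else 0.0       # x >= 2

# GIS assumes feature values sum to a constant C; add a slack feature to pad.
C = max(f0(x) + f1(x) for x in X)
slack = lambda x: C - f0(x) - f1(x)
features = [f0, f1, slack]

# Made-up empirical distribution over X.
empirical = [0.1, 0.2, 0.3, 0.4]

def model_probs(lam):
    """p(x) proportional to exp(sum_i lam_i * f_i(x))."""
    w = [math.exp(sum(l * f(x) for l, f in zip(lam, features))) for x in X]
    z = sum(w)
    return [wi / z for wi in w]

def expectations(p):
    """Expected value of each feature under the distribution p."""
    return [sum(pi * f(x) for pi, x in zip(p, X)) for f in features]

# Each GIS step nudges every weight so the model's feature expectation
# moves toward the empirical one.
emp = expectations(empirical)
lam = [0.0, 0.0, 0.0]
for _ in range(5000):
    mod = expectations(model_probs(lam))
    lam = [l + (1.0 / C) * math.log(e / m) for l, e, m in zip(lam, emp, mod)]
```

After the loop, the model's feature expectations match the empirical ones, which is the fixed point iterative scaling converges to.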
A Sort-once parallel method for the EM algorithm by S Della Pietra (Book)
1 edition published in 1995 in English and held by 5 WorldCat member libraries worldwide
Abstract: "We present a method for implementing the well-known Expectation-Maximization, or EM, algorithm that works in parallel, requires only a moderate amount of primary memory, and requires just a single sort of each of a small number of moderate-sized files. Our method addresses the problems of efficiently merging a large sparse table of expected counts, where this merge is performed across distinct parallel threads of the computation, and of efficiently redistributing the re-estimated model parameters. We have implemented this method and used it to build a link grammar language model."
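The sort-and-merge idea in the abstract can be illustrated with a toy sketch: suppose each parallel worker emits its sparse table of expected counts as a sorted list of (parameter, count) pairs; a single k-way merge of the pre-sorted tables then suffices to sum counts per parameter before the M-step renormalization. The worker tables and parameter names below are invented for illustration:

```python
import heapq
from itertools import groupby

# Hypothetical sparse expected-count tables, one per parallel worker,
# each already sorted by parameter id (the output of that worker's E-step).
worker_tables = [
    [("p:a", 1.5), ("p:c", 0.5)],
    [("p:a", 0.5), ("p:b", 2.0)],
    [("p:b", 1.0), ("p:c", 1.5)],
]

# k-way merge of the sorted tables, then sum counts per parameter.
merged = heapq.merge(*worker_tables)  # globally sorted by parameter id
totals = {k: sum(c for _, c in grp)
          for k, grp in groupby(merged, key=lambda kv: kv[0])}

# M-step: renormalize the summed expected counts into updated parameters.
z = sum(totals.values())
params = {k: v / z for k, v in totals.items()}
```

The merge touches each record once and never holds a full table in memory, which is the point of the sort-once organization.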
Duality and auxiliary functions for Bregman distances by Stephen Della Pietra (Book)
2 editions published between 2001 and 2002 in English and held by 3 WorldCat member libraries worldwide
Abstract: "We formulate and prove a convex duality theorem for Bregman distances and present a technique based on auxiliary functions for deriving and proving convergence of iterative algorithms to minimize Bregman distance subject to linear constraints."
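For readers unfamiliar with Bregman distances: taking the generating convex function to be negative entropy recovers the Kullback-Leibler divergence, the case most relevant to the authors' earlier work. A small numerical check of this standard fact, with `p` and `q` as arbitrary example distributions:

```python
import math

def bregman(F, gradF, p, q):
    """Bregman distance B_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    return F(p) - F(q) - sum(g * (pi - qi)
                             for g, pi, qi in zip(gradF(q), p, q))

# Negative entropy as the generating convex function.
F = lambda v: sum(x * math.log(x) for x in v)
gradF = lambda v: [math.log(x) + 1.0 for x in v]

# Two example probability vectors (made up for the check).
p = [0.2, 0.3, 0.5]
q = [0.25, 0.25, 0.5]

kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
b = bregman(F, gradF, p, q)  # equals kl when p and q both sum to 1
```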
Duality and Auxiliary Functions for Bregman Distances (revised)
2 editions published between 2001 and 2002 in English and held by 0 WorldCat member libraries worldwide
In this paper, the authors formulate and prove a convex duality theorem for minimizing a general class of Bregman distances subject to linear constraints. The duality result is then used to derive iterative algorithms for solving the associated optimization problem. Their presentation is motivated by the recent work of Collins, Schapire, and Singer (2001), who showed how certain boosting algorithms and maximum likelihood logistic regression can be unified within the framework of Bregman distances. In particular, specific instances of the results given here are used by Collins et al. (2001) to show the convergence of a family of iterative algorithms for minimizing the exponential or logistic loss. Following an introduction, Section 2 recalls the standard definitions from convex analysis that will be required, and presents the technical assumptions made on the class of Bregman distances that the authors work with. They also introduce some new terminology, using the terms Legendre-Bregman conjugate and Legendre-Bregman projection to extend the classical notion of the Legendre conjugate and transform to Bregman distances. Section 3 contains the statement and proof of the duality theorem that connects the primal problem with its dual, showing that the solution is characterized in geometrical terms by a Pythagorean equality. Section 4 defines the notion of an auxiliary function, which is used to construct iterative algorithms for solving constrained optimization problems. This section shows how convexity can be used to derive an auxiliary function for Bregman distances based on separable functions. The last section summarizes the main results of the paper.
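The Pythagorean equality mentioned in the summary can be checked numerically in the KL special case: the projection of `q` onto a linear constraint set {p : E_p[f] = c} is an exponential tilt of `q`, and any other distribution `r` in the same constraint set satisfies KL(r, q) = KL(r, p*) + KL(p*, q). The setup below (`q`, `f`, `c`, `r`) is a made-up example, with the tilt parameter found by bisection:

```python
import math

# Hypothetical setup: project q onto {p : E_p[f] = c} in KL divergence.
q = [1 / 3, 1 / 3, 1 / 3]
f = [1.0, 0.0, 2.0]
c = 1.2

def tilt(lam):
    """Exponential tilt: p(x) proportional to q(x) * exp(lam * f(x))."""
    w = [qi * math.exp(lam * fi) for qi, fi in zip(q, f)]
    z = sum(w)
    return [x / z for x in w]

def moment(p):
    return sum(pi * fi for pi, fi in zip(p, f))

# E_p[f] is increasing in lam, so bisect for the constraint-satisfying tilt.
lo, hi = -10.0, 10.0
for _ in range(200):
    mid = (lo + hi) / 2
    if moment(tilt(mid)) < c:
        lo = mid
    else:
        hi = mid
p_star = tilt((lo + hi) / 2)  # the I-projection of q onto the constraint set

def kl(a, b):
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))

# Any r satisfying the same constraint obeys the Pythagorean equality:
# kl(r, q) == kl(r, p_star) + kl(p_star, q).
r = [0.6, 0.1, 0.3]  # E_r[f] = 0.6 + 0.6 = 1.2 = c
```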