WorldCat Identities

Leskovec, Jurij

Overview
Works: 28 works in 40 publications in 2 languages and 270 library holdings
Roles: Author, Thesis advisor
Classifications: QA76.9.D343, 006.312
Most widely held works by Jurij Leskovec
Mining of massive datasets by Anand Rajaraman (Book)

13 editions published between 2011 and 2014 in English and held by 109 WorldCat member libraries worldwide

This book focuses on practical algorithms that have been used to solve key problems in data mining and can be applied successfully to even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream-processing algorithms for mining data that arrives too fast for exhaustive processing. Other chapters cover the PageRank idea and related tricks for organizing the Web, the problem of finding frequent itemsets, and clustering. This second edition adds new and extended coverage of social networks, machine learning, and dimensionality reduction, and includes over 150 challenging exercises.
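The locality-sensitive hashing trick mentioned in the description can be illustrated with a minimal MinHash sketch. This is a toy Python example, not code from the book; the hash family, seed, and sets below are invented for illustration:

```python
import random

def make_hash_family(n, prime=2_147_483_647, seed=0):
    """Create n random linear hash functions x -> (a*x + b) mod prime."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, prime), rng.randrange(prime)) for _ in range(n)]
    return [lambda x, a=a, b=b: (a * x + b) % prime for a, b in params]

def minhash_signature(items, hash_family):
    """Signature = element-wise minimum of each hash over the set."""
    return [min(h(x) for x in items) for h in hash_family]

def estimated_jaccard(sig1, sig2):
    """Fraction of matching signature positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig1, sig2)) / len(sig1)

hashes = make_hash_family(300)
set_a = set(range(0, 100))    # {0..99}
set_b = set(range(50, 150))   # {50..149}; true Jaccard = 50/150 = 1/3
est = estimated_jaccard(minhash_signature(set_a, hashes),
                        minhash_signature(set_b, hashes))
```

With 300 hash functions the estimate typically lands within a few percentage points of the true similarity of 1/3; in a full LSH pipeline, as the book describes, signatures would then be banded so that only likely-similar pairs are compared exactly.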
Dynamics of large networks by Jurij Leskovec (Book)

1 edition published in 2008 in English and held by 2 WorldCat member libraries worldwide

Abstract: "A basic premise behind the study of large networks is that interaction leads to complex collective behavior. In our work we found very interesting and counterintuitive patterns for time evolving networks, which change some of the basic assumptions that were made in the past. We then develop models that explain processes which govern the network evolution, fit such models to real networks, and use them to generate realistic graphs or give formal explanations about their properties. In addition, our work has a wide range of applications: it can help us spot anomalous graphs and outliers, forecast future graph structure and run simulations of network evolution. Another important aspect of our research is the study of 'local' patterns and structures of propagation in networks. We aim to identify building blocks of the networks and find the patterns of influence that these blocks have on information or virus propagation over the network. Our recent work included the study of the spread of influence in a large person-to-person product recommendation network and its effect on purchases. We also model the propagation of information on the blogosphere, and propose algorithms to efficiently find influential nodes in the network. A central topic of our thesis is also the analysis of large datasets as certain network properties only emerge and thus become visible when dealing with lots of data. We analyze the world's largest social and communication network of Microsoft Instant Messenger with 240 million people and 255 billion conversations. We also made interesting and counterintuitive observations about network community structure that suggest that only small network clusters exist, and that they merge and vanish as they grow."
Data association for topic intensity tracking by Andreas Krause (Book)

1 edition published in 2006 in English and held by 1 WorldCat member library worldwide

Abstract: "We present a unified model of what was traditionally viewed as two separate tasks: data association and intensity tracking of multiple topics over time. In the data association part, the task is to assign a topic (a class) to each data point, and the intensity tracking part models the bursts and changes in intensities of topics over time. Our approach to this problem combines an extension of Factorial Hidden Markov models for topic intensity tracking with exponential order statistics for implicit data association. Our approach is general in the sense that it can be combined with a variety of learning techniques; we demonstrate this flexibility by applying it in a supervised, an unsupervised and a semi-supervised (active learning) setting. Experiments on text and email datasets show that the interplay of classification and topic intensity tracking improves the accuracy of both classification and intensity tracking. Even a little noise in topic assignments can mislead the traditional algorithms. However, our approach detects correct topic intensities even with 30% topic noise."
Razpoznavanje obrazov z nevronskim omrežjem : diplomsko delo [Face recognition with a neural network : diploma thesis] by Jurij Leskovec (Book)

1 edition published in 2007 in Slovenian and held by 1 WorldCat member library worldwide

Reputation and incentives in online social systems by Ashton Anderson

1 edition published in 2016 in English and held by 1 WorldCat member library worldwide

Online social systems are quickly becoming fundamental to a large part of the modern world, serving as the platforms for an ever-increasing diversity of aspects of daily life. This thesis aims to develop principled foundations for the reputation and incentive mechanisms that underpin online social systems. These social mechanisms are studied at various levels of resolution: at a microscale, where the atomic units of online social behavior exist; at a mesoscale, where these atomic units coalesce into collective phenomena that affect groups of people; and at a macroscale, where patterns of social interaction in entire communities are guided by the social mechanisms we design. The thesis begins with an examination of the user-to-user evaluations that form the basis of reputation systems across different domains, and shows how relative similarity and relative status play critical roles in shaping these evaluations. We apply this understanding to successfully predict how communities synthesize the many evaluations of a single person into a collective opinion of their reputation. We also investigate how the structure of feedback from social mechanisms can be used to identify content of long-term value in the important domain of question-answering sites. We then introduce a framework for understanding the incentive structures introduced by badge systems, develop a model for reasoning about user behavior in the presence of badges, and validate its predictions on real-world data. We find that badges can influence and steer user behavior on a site, leading both to increased participation and to changes in the mix of activities a user pursues. Several robust design principles emerge from our framework that could potentially aid in the design of incentives for a broad range of sites. Finally, we discuss our deployment of a large-scale badge system to 100,000 students on an online education platform, which produced a fivefold increase in forum engagement. Overall, we find that studying social mechanisms at all levels of resolution leads to principled foundations that can improve the design of reputation and incentive systems.
Human navigation of information networks by Robert West

1 edition published in 2016 in English and held by 1 WorldCat member library worldwide

Network navigation constitutes a fundamental human behavior: in order to make use of the information and resources around us, we constantly explore, disentangle, and browse networks such as the Web, social networks, academic paper collections, and encyclopedias, among others. Studying the navigation patterns humans employ is important because it lets us better understand how humans reason about complex networks and lets us build more intuitively navigable and human-friendly information systems. In this dissertation, we study how humans navigate information networks by analyzing tens of thousands of navigation traces harvested from the human-computation game Wikispeedia, where participants are asked to navigate between two given Wikipedia articles in as few clicks as possible. We first shed light on human navigation strategies by describing the anatomy of typical human navigation traces. We then build on these results to develop models and tools for predicting the targets of human paths from only the first few clicks, learning to navigate automatically, and recommending the insertion of important missing hyperlinks. These are useful building blocks for designing more intuitively navigable information spaces and tools to help people find information. The navigation traces collected through the Wikispeedia game have the unique property of being labeled with users' explicit navigation targets. In general, however, humans need not have a precise target in mind when navigating the Web. Records of such navigation traces are abundant in the logs kept by any web server software. We demonstrate the value of passively collected web server logs by presenting an algorithm that leverages such raw logs in order to improve website hyperlink structure. The resulting system is deployed on Wikipedia's full server logs at terabyte scale, producing links that are clicked 12 times as frequently as the average link added by human Wikipedia editors
Large scale graph completion by Reza Bosagh Zadeh

1 edition published in 2014 in English and held by 1 WorldCat member library worldwide

We present a framework for completing missing edges in a large graph. We focus on each component of the framework separately, provide algorithms, prove efficiency guarantees, and run experiments. The system described is partially in production at the Twitter web service. In the first chapter, we describe a method to compute similar nodes in the graph, given a sparsity assumption. In the second chapter, we generalize the first chapter's method to compute singular values of a very tall and skinny matrix; such matrices are so large that they cannot even be streamed through a single machine. In the final chapter, we develop a novel machine learning algorithm that learns weights on a random walk while also modeling edge removals.
Cost-effective outbreak detection in networks (Book)

1 edition published in 2007 in English and held by 1 WorldCat member library worldwide

Abstract: "Given a water distribution network, where should we place sensors to quickly detect contaminants? Or, which blogs should we read to avoid missing important stories? These seemingly different problems share common structure: Outbreak detection can be modeled as selecting nodes (sensor locations, blogs) in a network, in order to detect the spreading of a virus or information as quickly as possible. We present a general methodology for near optimal sensor placement in these and related problems. We demonstrate that many realistic outbreak detection objectives (e.g., detection likelihood, population affected) exhibit the property of 'submodularity.' We exploit submodularity to develop an efficient algorithm that scales to large problems, achieving near optimal placements, while being 700 times faster than a simple greedy algorithm. We also derive online bounds on the quality of the placements obtained by any algorithm. Our algorithms and bounds also handle cases where nodes (sensor locations, blogs) have different costs. We evaluate our approach on several large real-world problems, including a model of a water distribution network from the EPA, and real blog data. The obtained sensor placements are provably near optimal, providing a constant fraction of the optimal solution. We show that the approach scales, achieving speedups and savings in storage of several orders of magnitude. We also show how the approach leads to deeper insights in both applications, answering multicriteria trade-off, cost-sensitivity and generalization questions."
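The speedup described in the abstract comes from exploiting submodularity: marginal gains can only shrink as the placement grows, so most re-evaluations can be skipped via lazy evaluation. A toy sketch of this lazy-greedy idea on an invented coverage instance (this is an illustration of the general technique, not the paper's implementation):

```python
import heapq

def lazy_greedy(coverage_sets, budget):
    """Select `budget` sensors maximizing covered nodes, using lazy evaluation:
    since marginal gains only shrink (submodularity), a stale heap bound that
    is still the maximum after re-evaluation must be the best choice."""
    selected, covered = [], set()
    # Heap entries: (-upper_bound_on_gain, sensor, round_when_evaluated).
    heap = [(-len(nodes), s, -1) for s, nodes in coverage_sets.items()]
    heapq.heapify(heap)
    for rnd in range(budget):
        while True:
            neg_gain, sensor, evaluated = heapq.heappop(heap)
            if evaluated == rnd:          # gain is fresh this round: take it
                selected.append(sensor)
                covered |= coverage_sets[sensor]
                break
            fresh = len(coverage_sets[sensor] - covered)  # lazy re-evaluation
            heapq.heappush(heap, (-fresh, sensor, rnd))
    return selected, covered

# Hypothetical sensors and the nodes each one would cover.
sensors = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
picks, covered = lazy_greedy(sensors, budget=2)
```

In the worst case this does as much work as plain greedy, but in practice most candidates never need re-evaluation, which is where speedups of the magnitude the abstract reports (700x) come from.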
Voting and historical games by Michael Andrew Munie

1 edition published in 2010 in English and held by 1 WorldCat member library worldwide

For a group of agents to make a good decision, we must be able to choose a good decision-making method. In this thesis, we first study the problem of choosing decision-making methods by taking social choice functions and providing quantitative measures of how well they might perform against a population distribution. Then we study a common, possibly the most common, decision-making method online: voting to rate products. In this domain, we provide a new class of models and prove that certain convergence results hold. Finally, we prove a connection between this new model and traditional voting methods. In the first half of the thesis, we develop a means of comparing various social choice functions with regard to a desired axiom by quantifying how often the axiom is violated. To this end, we offer a new framework for measuring the quality of social choice functions that builds on and provides a unifying framework for previous research. This framework takes the form of what we call a "violation graph." Graph properties have natural interpretations as metrics for comparing social choice functions. Using the violation graph we present new metrics, such as the minimal domain restriction, for assessing social choice functions and provide exact and probabilistic results for voting rules including plurality, Borda, and Copeland. Motivated by the empirical results, we also prove asymptotic results for scoring voting rules. These results suggest that voting rules based on pairwise comparison (e.g., Copeland) are better than scoring rules (e.g., Borda count). They also suggest that although we can never fulfill our desired set of axioms, the frequency of violation is so small that with even a modest number of voters we can expect to never violate our axioms. In the second half of the thesis, we define a new class of games called historical influence games (HIGs). HIGs are infinite games in which agents take turns round-robin style choosing a value for a single-dimensional variable. The payoff at each stage to each agent is a monotonically decreasing function of the distance between two quantities: a weighted average of past values chosen by all agents, and some fixed ideal value personal to that agent. The overall payoff to the agent is the limit average of the stage payoffs. We show that myopic strategies form a subgame perfect Nash equilibrium in HIGs. We then introduce certain smoothness constraints on how the impact of a given action changes over time, constraints which define the class of valid HIGs. We prove that for valid HIGs, under myopic play the limit average value converges to what we call the central value, which is the median of the agents' ideal values jointly with certain societal focal points. As a side effect, we show a polarization theorem: after a finite period, almost all agents settle on one of the extreme values. Finally, we show a tight connection between valid HIGs and the class of Moulin strategy-proof voting rules in single-peaked domains.
Structure and dynamic processes in complex networks by Chunyan Wang

1 edition published in 2013 in English and held by 1 WorldCat member library worldwide

The emergence of cyberspace gave rise to detailed traces of human behavior online, resulting in an unprecedented opportunity to better understand the dynamics of social activities. Despite its diverse nature, online behavior displays a number of strong regularities which can be understood by drawing on methods from statistical physics. This thesis first discusses statistical properties of a special kind of information network formed online, conversation threads, and, more importantly, develops a dynamical model which explains discrepancies in existing studies. It is also demonstrated that human interaction patterns are predictable, by estimating the mutual information of activity sequences. Additionally, properties of human behavior as a group, such as group purchasing and gathering, are scrutinized and modeled. Finally, diversity patterns of competing opinions or viruses diffusing on networks are modeled by investigating the formation of Turing patterns on large scale-free networks.
Skill and billiards by Christopher James Archibald

1 edition published in 2011 in English and held by 1 WorldCat member library worldwide

Computational pool is a relatively recent entrant into the group of games played by computer agents. It features a novel combination of properties that distinguish it from other such games, including continuous action and state spaces, uncertainty in execution, a unique turn-taking structure, and of course an adversarial nature. This combination leads to new challenges, both in modeling and reasoning about the game and in designing agents for effective play. We address the modeling challenges by presenting a model of generalized billiards games and showing that an equilibrium exists within this model. To address the practical challenges of designing an agent, we discuss CueCard, our agent which won the 2008 computational pool tournament, with a special focus on the new advancements that made this agent successful. The second portion of the dissertation focuses on a topic inspired by the computational pool domain: execution skill. In many AI settings an agent comprises both action-planning and action-execution components. We first present experimental work in which we examine the relationship between the precision of the execution component, the intelligence of the planning component, and the overall success of the agent within our computational pool framework. Our motivation lies in determining whether higher execution skill rewards more strategic play. We then present a method for modeling imperfect execution skill in normal-form games and examine the effect that changing execution skill levels can have in these games. We also study games in which players have imperfect execution skill and one player's true skill is not common knowledge. In these settings the possibility arises of a player "hustling", or pretending to have lower execution skill than they actually have. Focusing on repeated zero-sum games, we provide a hustle-proof strategy; this strategy guarantees a player the same payoff without knowledge of the opponent's execution skill level as could be guaranteed with knowledge of the opponent's true execution skill level.
Social influence in online environments: models and analysis by Simla Ceyhan

1 edition published in 2011 in English and held by 1 WorldCat member library worldwide

In this thesis we study the effects of social influence on the decisions of individuals in online environments, focusing on two applications: social media and peer-to-peer lending. We initially propose a model for the evolution of market share in the presence of social influence. We study a simple market in which individuals arrive sequentially and choose one of the available products. Their decision of which product to choose is a stochastic function of the inherent quality of the product and its market share. Using techniques from stochastic approximation theory, we show that market shares converge to an equilibrium. We also derive the market shares at equilibrium in terms of the level of social influence and the inherent fitness of the products. In a special case, when the choice model is a multinomial logit model, we show that inequality in the market increases with social influence and that, with strong enough social influence, monopoly occurs. These results support the observations made by Salganik et al. [SDW06] in their experimental study of cultural markets. Next, we consider the effects of social influence in online peer-to-peer (P2P) lending services, a new type of social platform that enables individuals to borrow and lend money directly to each other. In this part of the thesis, we study the dynamics of bidder behavior in a P2P loan auction website, prosper.com. We investigate the change of various attributes of loan request listings over time, such as the interest rate and the number of bids. We observe the effects of social influence during bidding: for most listings, the rate of bids peaks at very similar time points. We explain these phenomena by showing that there are economic and social factors that lenders take into account when deciding to bid on a listing. We also observe that the profits lenders make are tied to their bidding preferences. Finally, we build a model, based on the temporal progression of the bidding, that reliably predicts the success of a loan request listing, as well as whether a loan will be paid back or not.
Green-Marl: a domain-specific language for graph analysis by Sungpack Hong

1 edition published in 2013 in English and held by 1 WorldCat member library worldwide

A graph is a fundamental data structure that captures arbitrary relationships between data entities. In recent years, the importance of efficient and scalable processing of large graph instances has been growing due to emerging applications such as social network services and computational biology, where large graph datasets are heavily used. Fortunately, the recent proliferation of parallel (e.g., multi-core CPUs and GPUs) and distributed (e.g., Amazon's EC2) computing environments has provided ways to exploit the inherent parallelism in large graph data processing. However, it is still burdensome for a single programmer to implement graph algorithms correctly and efficiently while exploiting parallelism in a different way for each parallel or distributed environment. This thesis presents how this burden can be lightened by means of a Domain-Specific Language (DSL). First, we introduce Green-Marl, a domain-specific language which allows users to program their graph algorithms in an intuitive manner. The language is also designed in such a way that the underlying data parallelism in a given graph analysis program is easily exposed to the Green-Marl compiler. We then explain how the compiler can exploit such high-level semantic information to optimize the user's algorithm and produce an efficient parallel implementation from it. Experimental results show that the compiler-generated code is as efficient as hand-coded versions in parallel graph libraries. Next, we explain how the Green-Marl compiler can produce an implementation for a distributed environment from the same Green-Marl program; here the compiler uses its high-level semantic knowledge to transform the given Green-Marl program into one based on a completely different programming model. Again, the performance of the compiler-generated code closely matches that of a hand-coded version. In summary, we show that the Green-Marl DSL can provide the benefits of productivity, performance, and portability to users in the domain of large graph analysis.
Functional map networks for the joint analysis of image and shape collections by Fan Wang

1 edition published in 2015 in English and held by 1 WorldCat member library worldwide

In many machine learning and computer vision problems, data are often easy and cheap to acquire but very expensive to label. For example, it is effortless to collect a large number of images of the same object using image search engines, but annotating where the objects are requires far more effort from a skilled human. If we can build relationships between data items, abundant information can be transported from labeled items to unlabeled ones. In our work, we use a novel representation of the relationships between images or shapes, called functional maps. Unlike point-based maps, functional maps build correspondences between functions over images or shapes based on their local properties. We also propose a functional map network to jointly analyze collections of images or shapes: each image or shape is a node in the network, and each edge connecting two images or shapes is associated with the map between them. Finally, we propose three applications of the map network. First, given a collection of images sharing one similar object, segmentations are transferred among the images via functional maps, and the segmentation of the common object emerges if the map network is cycle-consistent. Second, we extend functional map networks to handle multi-class joint image segmentation, which is more challenging and requires a partial-similarity constraint because each common object may appear in only a subset of images. Third, we build a functional map network among 3D shapes for joint shape segmentation; maps between shapes are regularized and improved by partial cycle-consistency, and the shared structure across the collection is discovered, corresponding to meaningful shape parts. For all applications, experimental results on various benchmark data sets are provided to demonstrate the effectiveness of the proposed approaches.
Community structure of large networks by Jaewon Yang

1 edition published in 2014 in English and held by 1 WorldCat member library worldwide

One of the main organizing principles in real-world networks is that of network communities, which are sets of nodes that share common properties, functions, or roles. Communities in networks often overlap as nodes can belong to multiple communities at once. Identifying such overlapping network communities is crucial for an understanding of social, technological, and biological networks. In this thesis, we develop a family of accurate and scalable community detection methods and apply them to large networks. We begin by challenging the conventional view that defines network communities as densely connected clusters of nodes. We show that the conventional view leads to an unrealistic structure of community overlaps. We present a new conceptual model of network communities, which reliably captures the overall structure of a network as well as accurately models community overlaps. Based on our model, we develop accurate algorithms for detecting overlapping communities that scale to networks an order of magnitude larger than what was possible before. Our approach leads to novel insights that unify two fundamental organizing principles of networks: modular communities and the commonly observed core-periphery structure. In particular, our results show that dense network cores stem from the overlaps between many communities. As the final part of the thesis, we present several extensions of our models such that we can detect communities with a bipartite connectivity structure and we combine the node attributes and the network structure for community detection
Structure and Dynamics of Diffusion Networks by Manuel Gomez Rodriguez

1 edition published in 2013 in English and held by 1 WorldCat member library worldwide

Diffusion of information, ideas, behaviors, and diseases is ubiquitous in nature and modern society. One of the main goals of this dissertation is to shed light on the hidden underlying structure of diffusion. To this aim, we developed flexible probabilistic models and inference algorithms that make minimal assumptions about the physical, biological, or cognitive mechanisms responsible for diffusion. We avoid modeling the mechanisms underlying individual activations, and instead develop a data-driven approach which uses only the visible temporal traces diffusion generates. We first developed two algorithms, NetInf and MultiTree, that infer the network structure or skeleton over which diffusion takes place. However, both algorithms assume networks to be static and diffusion to occur at equal rates across different edges. We then developed NetRate, an algorithm that allows for static and dynamic networks with different rates across different edges; NetRate infers not only the network structure but also the rate of every edge. Finally, we develop a general theoretical framework of diffusion based on survival theory. Our models and algorithms provide computational lenses for understanding the structure and temporal dynamics that govern diffusion, and may help towards forecasting, influencing, and retarding diffusion, broadly construed. As an application, we study information propagation in the online media space. We find that the information network of media sites and blogs tends to have a core-periphery structure, with a small set of core media sites that diffuse information to the rest of the Web. These sites tend to have stable circles of influence, with more general news media sites acting as connectors between them. Information pathways for general recurrent topics are more stable across time than for ongoing news events, while clusters of news media sites and blogs often emerge and vanish in a matter of days for ongoing news events. Major social movements and events involving the civilian population, such as the Libyan civil war or the Syrian uprising, lead to an increased number of information pathways among blogs, as well as an overall increase in the network centrality of blogs and social media sites. Additionally, we apply our probabilistic framework of diffusion to the influence maximization problem and develop the algorithm MaxInf. Experiments on synthetic and real diffusion networks show that our algorithm outperforms other state-of-the-art algorithms by considering the temporal dynamics of diffusion.
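The survival-theory framing can be made concrete with a drastically simplified, single-edge version of the rate-inference problem (this toy is for illustration only and is not the NetRate algorithm): if infection delays along an edge are exponential with unknown rate and cascades are observed only up to a horizon T, the uninfected cases enter the likelihood through the survival term exp(-rate*T), giving a closed-form maximum-likelihood estimate.

```python
import random

def mle_transmission_rate(delays, horizon):
    """MLE of an exponential transmission rate from censored cascade data.
    `delays` holds observed infection delays; entries that are None mean the
    target was never infected by the horizon (censored observation).
      log L = sum over infections (log r - r*d)  -  r*horizon*n_censored
      =>  r_hat = n_infected / (sum of delays + horizon * n_censored)
    """
    observed = [d for d in delays if d is not None]
    n_censored = sum(1 for d in delays if d is None)
    return len(observed) / (sum(observed) + horizon * n_censored)

# Simulate 2000 cascades over one edge with true rate 0.5 and horizon 5.
rng = random.Random(42)
TRUE_RATE, HORIZON = 0.5, 5.0
observations = []
for _ in range(2000):
    d = rng.expovariate(TRUE_RATE)
    observations.append(d if d <= HORIZON else None)  # None = censored
rate_hat = mle_transmission_rate(observations, HORIZON)
```

The estimate recovers the true rate closely; NetRate solves the much harder joint problem of inferring such rates for every edge of an unknown network from cascade timestamps alone.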
Tagging and other microtasks by Paul Brian Heymann

1 edition published in 2011 in English and held by 1 WorldCat member library worldwide

Over the past decade, the web has become increasingly participatory. Many web sites would be non-functional without the contribution of many tiny units of work by users and workers around the world. We call such tiny units of work microtasks. Microtasks usually represent less than five minutes of someone's time; however, they can produce massive effects when pooled together. Examples of microtasks include tagging a photo with a descriptive keyword, rating a movie, or categorizing a product. This thesis explores tagging systems, one of the first places where unpaid microtasks became common. Tagging systems allow regular users to attach keywords ("tags") to objects like URLs, photos, and videos. We begin by looking at social bookmarking systems, tagging systems where users tag URLs. We consider whether social bookmarking tags are useful for web search, finding that they often mirror other available metadata. We also show that social bookmarking tags can be predicted to varying degrees with two techniques: support vector machines and market basket data mining. To expand our understanding of tags, we look at social cataloging systems, tagging systems where users tag books. Social cataloging systems allow us to compare user-generated tags and expert library terms that were created in parallel. We find that tags have important features like consistency, quality, and completeness in common with expert library terms. We also find that paid tagging can be an effective supplement to a tagging system. Finally, our work expands to all microtasks, rather than tagging alone. We propose a framework called Human Processing for programming with and studying paid and unpaid microtasks. We then develop a tool called HPROC for programming within this framework, primarily on top of a paid microtask marketplace called Amazon Mechanical Turk (AMT). Lastly, we describe Turkalytics, a system for monitoring workers completing paid microtasks on AMT. Overall, we cover tagging from web search, machine learning, and library science perspectives, and work extensively with both the paid and unpaid microtasks that are becoming a fixture of the modern web.
Efficient algorithms for Personalized PageRank by Peter Andrew Lofgren

1 edition published in 2015 in English and held by 1 WorldCat member library worldwide

We present new, more efficient algorithms for estimating random walk scores such as Personalized PageRank from a given source node to one or several target nodes. These scores are useful for personalized search and recommendations on networks including social networks, user-item networks, and the web. Past work has proposed using Monte Carlo methods or linear algebra to estimate scores from a single source to every target, making those approaches inefficient for a single source-target pair. Our contribution is a new bidirectional algorithm which combines linear algebra and Monte Carlo to achieve significant speed improvements. On a diverse set of six graphs, our algorithm is 70x faster than past state-of-the-art algorithms. We also present theoretical analysis: while past algorithms require Omega(n) time to estimate a random walk score of typical size 1/n on an n-node graph to a given constant accuracy, our algorithm requires only O(m) expected time for an average target, where m is the number of edges, and is provably accurate. In addition to our core bidirectional estimator for Personalized PageRank, we present an alternative algorithm for undirected graphs, a generalization to arbitrary walk lengths and Markov chains, an algorithm for personalized search ranking, and an algorithm for sampling random paths from a given source to a given set of targets. We expect our bidirectional methods can be extended in other ways and will be useful subroutines in other graph analysis problems.
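A compact sketch of the bidirectional idea, combining a backward push from the target (the linear-algebra half) with forward random walks from the source (the Monte Carlo half). The graph, teleport probability, and thresholds below are illustrative, and this is a simplified rendering of the general technique rather than the thesis's exact algorithm:

```python
import random
from collections import defaultdict

def reverse_push(in_nbrs, out_deg, target, alpha, eps):
    """Backward push from `target`: maintain estimates p and residuals r with
    the invariant  ppr(s, target) = p[s] + sum_v ppr(s, v) * r[v]."""
    p, r = defaultdict(float), defaultdict(float)
    r[target] = 1.0
    frontier = [target]
    while frontier:
        v = frontier.pop()
        if r[v] <= eps:
            continue
        mass, r[v] = r[v], 0.0
        p[v] += alpha * mass
        for u in in_nbrs[v]:
            r[u] += (1 - alpha) * mass / out_deg[u]
            if r[u] > eps:
                frontier.append(u)
    return p, r

def bidirectional_ppr(out_nbrs, in_nbrs, source, target,
                      alpha=0.2, eps=1e-4, walks=10_000, seed=7):
    """Estimate Personalized PageRank from source to target."""
    out_deg = {v: len(ns) for v, ns in out_nbrs.items()}
    p, r = reverse_push(in_nbrs, out_deg, target, alpha, eps)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(walks):
        v = source
        while rng.random() > alpha:      # walk stops w.p. alpha per step
            v = rng.choice(out_nbrs[v])
        total += r[v]                    # endpoint distributed ~ ppr(source, .)
    return p[source] + total / walks

# Tiny hypothetical directed graph.
out_nbrs = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
in_nbrs = {0: [2, 3], 1: [0], 2: [0, 1, 3], 3: []}
est = bidirectional_ppr(out_nbrs, in_nbrs, source=0, target=2)
```

For this graph with alpha = 0.2, solving the PageRank equations by hand gives ppr(0, 2) = 0.72 * 0.2/0.424, about 0.3396, which the estimator matches closely: the push step does most of the work, and the cheap random walks only need to estimate the small residual correction.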
Data analytics integration and privacy by Steven Euijong Whang( )

1 edition published in 2012 in English and held by 1 WorldCat member library worldwide

Data analytics has become extremely important and challenging in disciplines like computer science, biology, medicine, finance, and homeland security. As massive amounts of data become available for analysis, scalable integration techniques become important. At the same time, new privacy issues arise, as one's sensitive information can easily be inferred from the large amounts of data. In this thesis, we first cover the problem of entity resolution (ER), which identifies database records that refer to the same real-world entity. The recent explosion of data has made ER a challenging problem in a wide range of applications. We propose scalable ER techniques and new ER functionalities that have not been studied in the past. We also view ER as a black-box operation and provide general techniques that can be used across applications. Next, we introduce the problem of managing information leakage, where one must try to prevent important bits of information from being resolved by ER, to guard against loss of data privacy. As more of our sensitive data is exposed to a variety of merchants, health care providers, employers, social sites, and so on, there is a higher chance that an adversary can "connect the dots" and piece together our information, leading to even more loss of privacy. We propose a measure for quantifying information leakage and use "disinformation" as a tool for containing it.
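Treating ER as a black box over a pairwise match function can be sketched with a generic merge step: match decisions are merged transitively with union-find. This is a textbook illustration, not the thesis's algorithms; the records and match predicate are made up for the example.

```python
def resolve(records, match):
    """Group records that refer to the same real-world entity.
    `match` is a pairwise boolean predicate treated as a black box;
    matching records are merged transitively via union-find."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Naive O(n^2) comparison; scalable ER would prune pairs
    # with blocking/indexing before comparing.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if match(records[i], records[j]):
                union(i, j)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(records[i])
    return list(clusters.values())
```

Note the transitivity: if record A matches B on email and B matches C on name, all three end up in one entity even though A and C never matched directly. That same transitive chaining is what lets an adversary "connect the dots" across datasets.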
Modeling information flow in networks competition, evolution, and external influence by Seth A Myers( )

1 edition published in 2016 in English and held by 1 WorldCat member library worldwide

In online social networks such as Twitter and Facebook, users are constantly sharing information with people they are connected to, as well as re-sharing information posted by others. Through this process, a single piece of information called a contagion can spread from user to user over the connections until it has reached a large portion of the network. In this thesis, we develop a series of probabilistic methods for modeling the spread of contagions in social networks in order to better understand the factors that affect the process. Our work examines several different phenomena that affect information flows through social networks. One such phenomenon is unobserved sources of information influencing members of the network. We present a model that not only quantifies these hidden information sources but also provides a more accurate view of information spread. We find that as much as 29% of all information spreading through social networks like Twitter originates from sources outside the network. Next, we examine how different contagions spreading through a network can interact with each other. We observe and model competition (when one contagion can decrease the spread of another contagion) and cooperation (when one contagion increases the spread of another). We find that contagion interaction can increase or decrease the probability of contagion spread by more than 70% on average. We also explore the dynamic nature of social network structure, and how these dynamics are affected by the spread of information. As social network users are exposed to new contagions, they are constantly forming new connections with other users as well as deleting connections. We find that the spread of contagions can cause sudden "bursts" in both the creation of new connections and the deletion of old connections. 
We also find that contagions can change the network structure on a global scale by moving like-minded members closer to each other as well as pushing less similar users farther away. Additionally, we consider the problem of inferring the structure of a hidden network when only patterns of information spread are known. Given only the timing of when each user adopted each piece of information, our method accurately predicts which users are connected to which other users by converting the maximum likelihood estimation of the network into a series of independent convex optimization problems that can be solved efficiently. Taken together, the results presented in this thesis contribute to understanding how information flows through networks, and they provide insights into how people exchange information.
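The baseline internal-spread process underlying work like this is commonly formalized as the independent cascade model, which the external-influence and interaction models above extend. This sketch simulates one cascade under that standard model; the graph, seed set, and infection probability are purely illustrative.

```python
import random

def independent_cascade(graph, seeds, p, rng=None):
    """Simulate one contagion spread under the independent cascade
    model: each newly infected node gets exactly one chance to infect
    each of its uninfected neighbors, succeeding with probability p.
    Returns the set of all infected nodes."""
    rng = rng or random.Random(0)
    infected = set(seeds)
    frontier = list(seeds)           # nodes infected in the previous round
    while frontier:
        newly_infected = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in infected and rng.random() < p:
                    infected.add(v)
                    newly_infected.append(v)
        frontier = newly_infected
    return infected
```

Running many such simulations and comparing the predicted adoptions against observed ones is one way to see the gap that the thesis attributes to external influence: adoptions by users none of whose in-network neighbors were infected cannot be produced by this internal process at all.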
 
Audience Level
Audience level: 0.67 (from 0.52 for Data assoc ... to 0.97 for Razpoznava ...)

Alternative Names
Leskovec, Jure

Languages