WorldCat Identities

Akbarinia, Reza

Works: 14 works in 32 publications in 2 languages and 529 library holdings
Roles: Editor, Author, Other, Contributor, Opponent
Most widely held works by Reza Akbarinia
Transactions on large-scale data- and knowledge-centered systems XXXIII

12 editions published in 2017 in English and German and held by 333 WorldCat member libraries worldwide

The LNCS journal Transactions on Large-Scale Data- and Knowledge-Centered Systems focuses on data management, knowledge discovery, and knowledge processing, which are core and hot topics in computer science. Since the 1990s, the Internet has become the main driving force behind application development in all domains. An increase in the demand for resource sharing across different sites connected through networks has led to an evolution of data- and knowledge-management systems from centralized systems to decentralized systems enabling large-scale distributed applications providing high scalability. Current decentralized systems still focus on data and knowledge as their main resource. The feasibility of these systems relies largely on P2P (peer-to-peer) techniques and on the support of agent systems with scaling and decentralized control. Synergy between grids, P2P systems, and agent technologies is the key to data- and knowledge-centered systems in large-scale environments. This, the 33rd issue of Transactions on Large-Scale Data- and Knowledge-Centered Systems, contains five revised selected regular papers. Topics covered include distributed massive data streams, storage systems, scientific workflow scheduling, cost optimization of data flows, and fusion strategies.
P2P techniques for decentralized applications by Esther Pacitti

7 editions published in 2012 in English and held by 176 WorldCat member libraries worldwide

As an alternative to traditional client-server systems, Peer-to-Peer (P2P) systems provide major advantages in terms of scalability, autonomy and dynamic behavior of peers, and decentralization of control. Thus, they are well suited for large-scale data sharing in distributed environments. Most of the existing P2P approaches for data sharing rely on either structured networks (e.g., DHTs) for efficient indexing, or unstructured networks for ease of deployment, or some combination. However, these approaches have limitations, such as lack of freedom for data placement in DHTs, and high latency and high network traffic in unstructured networks. To address these limitations, gossip protocols, which are easy to deploy and scale well, can be exploited. In this book, we give an overview of these different P2P techniques and architectures, discuss their trade-offs, and illustrate their use for decentralizing several large-scale data sharing applications.
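The gossip idea mentioned in the abstract can be illustrated with a toy simulation (not taken from the book; the peer count, fanout, and push-only dissemination style are assumptions chosen for illustration):

```python
import random

def gossip_rounds(num_peers, fanout, seed=0):
    """Simulate push-style gossip: each informed peer forwards a message
    to `fanout` random peers per round, until every peer is informed.
    Returns the number of rounds taken."""
    rng = random.Random(seed)
    informed = {0}  # peer 0 injects the message
    rounds = 0
    while len(informed) < num_peers:
        newly = set()
        for _ in informed:
            # each informed peer pushes to `fanout` random targets
            newly.update(rng.sample(range(num_peers), fanout))
        informed |= newly
        rounds += 1
    return rounds
```

Because each round roughly multiplies the informed set, full dissemination takes a number of rounds logarithmic in the network size, which is why gossip scales well.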
Techniques d'accès aux données dans des systèmes pair-à-pair [Data access techniques in peer-to-peer systems] by Reza Akbarinia( Book )

2 editions published in 2007 in English and held by 2 WorldCat member libraries worldwide

The goal of this thesis is to contribute to the development of new data access techniques for query processing services in P2P environments. We focus on novel techniques for two important kinds of queries: queries with currency guarantees and top-k queries. To improve data availability, most P2P systems rely on data replication, but without currency guarantees. However, for many applications that could take advantage of a P2P system (e.g. agenda management), the ability to get current data is very important. To support these applications, the query processing service must be able to efficiently detect and retrieve a current, i.e. up-to-date, replica in response to a user's request for a data item. The second problem we address is supporting top-k queries, which are very useful in large-scale P2P systems, e.g. they can reduce network traffic significantly. However, efficient execution of these queries is very difficult in P2P systems, in particular in DHTs, because of their special characteristics. In this thesis, we first survey the techniques that have been proposed for query processing in P2P systems. We give an overview of the existing P2P networks and compare their properties from the perspective of query processing. Second, we propose a complete solution to the problem of current data retrieval in DHTs: a service called Update Management Service (UMS) which deals with updating replicated data and efficiently retrieving current replicas based on timestamping. Third, we propose novel solutions for top-k query processing in structured (i.e. DHT-based) and unstructured P2P systems. We also propose new algorithms for top-k query processing over sorted lists, which is a general model for top-k queries in many centralized, distributed, and P2P systems, especially in super-peer networks. We validated our solutions through a combination of implementation and simulation, and the results show very good performance in terms of communication and response time.
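As a rough illustration of top-k processing over sorted lists, the sketch below implements the classic threshold-style algorithm (a textbook baseline, not necessarily the thesis's own algorithms); it assumes nonnegative per-list scores aggregated by summation:

```python
import heapq

def threshold_topk(lists, k):
    """Threshold-style top-k over m lists, each sorted by score descending;
    the aggregate score of an item is the sum of its per-list scores.
    `lists` is a list of [(item, score), ...] lists."""
    index = [dict(lst) for lst in lists]   # random-access lookup per list
    seen = {}                              # item -> aggregate score
    top = []
    max_depth = max(len(lst) for lst in lists)
    for depth in range(max_depth):
        last = []                          # last score seen in each list
        for lst in lists:
            if depth < len(lst):
                item, score = lst[depth]
                last.append(score)
                if item not in seen:       # complete the score by random access
                    seen[item] = sum(idx.get(item, 0.0) for idx in index)
            else:
                last.append(0.0)           # exhausted list contributes nothing
        threshold = sum(last)              # upper bound on any unseen item
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        # stop once k items are at least as good as any item not yet seen
        if len(top) == k and top[-1][1] >= threshold:
            break
    return top
```

The early-stopping condition is what makes this attractive in distributed settings: deep (and expensive) accesses into the lists are avoided as soon as the threshold guarantees correctness.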
A highly scalable parallel algorithm for maximally informative k-itemset mining by Saber Salah

1 edition published in 2016 in English and held by 2 WorldCat member libraries worldwide

Entity resolution for distributed probabilistic data by Naser Ayat

1 edition published in 2013 in English and held by 2 WorldCat member libraries worldwide

Data placement in massively distributed environments for fast parallel mining of frequent itemsets by Saber Salah

1 edition published in 2017 in English and held by 2 WorldCat member libraries worldwide

ParCorr: efficient parallel methods to identify similar time series pairs across sliding windows by Djamel Edine Yagoubi

1 edition published in 2018 in English and held by 2 WorldCat member libraries worldwide

P2P Techniques for Decentralized Applications by Cynthia Jean Stallman-Pacitti

1 edition published in 2012 in English and held by 2 WorldCat member libraries worldwide

Simulation and modeling of microfluidic systems by Reza Akbarinia( Book )

1 edition published in 2013 in English and held by 2 WorldCat member libraries worldwide

Data Access in Dynamic Distributed Systems: Basics, Concepts and Techniques of P2P Query Processing by Reza Akbarinia

1 edition published in 2009 in German and held by 2 WorldCat member libraries worldwide

Efficient techniques for large-scale Web data management by Jesus Camacho Rodriguez

1 edition published in 2014 in English and held by 1 WorldCat member library worldwide

The recent development of commercial cloud computing environments has strongly impacted research and development in distributed software platforms. Cloud providers offer a distributed, shared-nothing infrastructure that may be used for data storage and processing. In parallel with the development of cloud platforms, programming models that seamlessly parallelize the execution of data-intensive tasks over large clusters of commodity machines have received significant attention, starting with the by-now well-known MapReduce model and continuing with other, more expressive frameworks. As these models are increasingly used to express analytical-style data processing tasks, the need arises for higher-level languages that ease the burden of writing complex queries for these systems.

This thesis investigates the efficient management of Web data on large-scale infrastructures. In particular, we study the performance and cost of exploiting cloud services to build Web data warehouses, and the parallelization and optimization of query languages tailored towards querying Web data declaratively.

First, we present AMADA, an architecture for warehousing large-scale Web data in commercial cloud platforms. AMADA operates in a Software as a Service (SaaS) approach, allowing users to upload, store, and query large volumes of Web data. Since cloud users incur monetary costs directly tied to their resource consumption, our focus is not only on query performance from an execution-time perspective, but also on the monetary costs associated with this processing. In particular, we study the applicability of several content indexing strategies, and show that they reduce not only query evaluation time but also, importantly, the monetary costs associated with exploiting the cloud-based warehouse.

Second, we consider the efficient parallelization of the execution of complex queries over XML documents, implemented within our system PAXQuery. We provide novel algorithms showing how to translate such queries into plans expressed in the PArallelization ConTracts (PACT) programming model. These plans are then optimized and executed in parallel by the Stratosphere system. We demonstrate the efficiency and scalability of our approach through experiments on hundreds of GB of XML data.

Finally, we present a novel approach for identifying and reusing common subexpressions occurring in Pig Latin scripts. In particular, we lay the foundation of our reuse-based algorithms by formalizing the semantics of the Pig Latin query language with an extended nested relational algebra for bags. Our algorithm, named PigReuse, operates on the algebraic representations of Pig Latin scripts, identifies subexpression merging opportunities, selects the best ones to execute based on a cost function, and merges other equivalent expressions to share their results. We bring several extensions to the algorithm to improve its performance. Our experimental results demonstrate the efficiency and effectiveness of our reuse-based algorithms and optimization strategies.
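The subexpression-reuse idea behind PigReuse can be sketched, very loosely, as detecting repeated subtrees across algebraic expression trees (the tuple-based tree encoding and helper names below are invented for illustration, not PigReuse's representation):

```python
def canonical(node):
    """Return a hashable canonical form of an expression tree.
    A node is either a leaf (a string) or a tuple (op, child, ...)."""
    if isinstance(node, str):
        return node
    op, *children = node
    return (op,) + tuple(canonical(c) for c in children)

def shared_subexpressions(exprs):
    """Count occurrences of every subexpression across a set of expression
    trees and report those appearing more than once -- candidates whose
    result could be computed once and reused by all consumers."""
    counts = {}
    def visit(node):
        key = canonical(node)
        counts[key] = counts.get(key, 0) + 1
        if not isinstance(node, str):
            for child in node[1:]:
                visit(child)
    for e in exprs:
        visit(e)
    # keep only non-leaf subexpressions that occur in more than one place
    return {k: v for k, v in counts.items() if v > 1 and not isinstance(k, str)}
```

A real optimizer such as PigReuse would then weigh these merge opportunities with a cost function; here the output is simply the set of reusable subtrees with their occurrence counts.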
Interrogation des bases de données XML probabilistes [Querying probabilistic XML databases] by Asma Souihli

1 edition published in 2012 in English and held by 1 WorldCat member library worldwide

Probabilistic XML is a probabilistic model for uncertain tree-structured data, with applications to data integration, information extraction, and uncertain version control. In this dissertation we explore efficient algorithms for evaluating tree-pattern queries with joins over probabilistic XML or, more specifically, for approximating the probability of each item of a query result. The approach relies on, first, extracting the query lineage over the probabilistic XML document and, second, looking for an optimal strategy to approximate the probability of the propositional lineage formula. ProApproX is the probabilistic query manager for probabilistic XML presented in this thesis. The system allows users to query uncertain tree-structured data in the form of probabilistic XML documents. It integrates a query engine that searches for an optimal strategy to evaluate the probability of the query lineage. ProApproX relies on a query-optimizer-like approach: exploring different evaluation plans for different parts of the formula and predicting the cost of each plan, using a cost model for the various evaluation algorithms. We demonstrate the efficiency of this approach on datasets used in several of the most popular previous works on probabilistic XML querying, as well as on synthetic data. An early version of the system was demonstrated at the ACM SIGMOD 2011 conference. First steps towards the new query solution were discussed in an EDBT/ICDT PhD Workshop paper (2011). A fully redesigned version that implements the techniques and studies presented in this thesis was published as a demonstration at CIKM 2012. Our contributions are also part of an IEEE ICDE
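One of the baseline strategies such a cost model might choose from is naive Monte Carlo estimation of the probability of a propositional lineage formula in DNF, with independent variables (the function and encoding below are illustrative assumptions, not ProApproX's code):

```python
import random

def approx_prob(clauses, probs, samples=20000, seed=42):
    """Naive Monte Carlo estimate of the probability that a DNF lineage
    formula is true. `clauses` is a list of clauses, each a list of
    variable ids (a conjunction); `probs` maps each variable id to its
    marginal probability, variables being assumed independent."""
    rng = random.Random(seed)
    hits = 0
    vars_ = list(probs)
    for _ in range(samples):
        # sample one possible world by flipping each variable independently
        assign = {v: rng.random() < probs[v] for v in vars_}
        # the DNF formula holds if any clause is fully satisfied
        if any(all(assign[v] for v in clause) for clause in clauses):
            hits += 1
    return hits / samples
```

Exact evaluation of such formulas is #P-hard in general, which is why a query engine may prefer sampling for some parts of the formula and exact methods for others, guided by cost estimates.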
FP-Hadoop: Efficient processing of skewed MapReduce jobs

1 edition published in 2016 in English and held by 1 WorldCat member library worldwide

Abstract: Nowadays we are witnessing the fast production of very large amounts of data, particularly by the users of online systems on the Web. Processing this big data is challenging, since both space and computational requirements are hard to satisfy. One solution is to take advantage of parallel frameworks, such as MapReduce or Spark, that make it possible to assemble powerful computing and storage units on top of ordinary machines. Although these key-based frameworks have been praised for their high scalability and fault tolerance, they show poor performance in the case of data skew: there are important cases where a high percentage of the processing in the reduce side ends up being done by only one node. In this paper we present FP-Hadoop, a Hadoop-based system that makes the reduce side of MapReduce more parallel by efficiently tackling the problem of reduce-side data skew. FP-Hadoop introduces a new phase, called intermediate reduce (IR), in which blocks of intermediate values are processed by intermediate reduce workers in parallel. With this approach, even when all intermediate values are associated with the same key, the main part of the reducing work can be performed in parallel, taking advantage of the computing power of all available workers. We implemented a prototype of FP-Hadoop and conducted extensive experiments over synthetic and real datasets, achieving excellent performance gains compared to native Hadoop, e.g. more than 10 times in reduce time and 5 times in total execution time.

Highlights:
- A novel approach for dealing with data skew in the reduce side of MapReduce.
- Parallel reducing of each key, using multiple reduce workers.
- Hierarchical execution of MapReduce jobs.
- Non-overwhelming reducing of intermediate data.
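The intermediate-reduce idea can be mimicked in a few lines: split one skewed key's value list into blocks, reduce the blocks in parallel, then fold the per-block partials into the final result (a sketch assuming an associative combine function, not FP-Hadoop's actual implementation):

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def parallel_reduce(values, combine, workers=4, block_size=1000):
    """Reduce one skewed key's values in parallel: split `values` into
    blocks, reduce each block independently (the 'intermediate reduce'
    step), then fold the partials into the final result.
    `combine` must be associative for the result to be correct."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # each block is reduced by a separate worker
        partials = list(pool.map(lambda b: reduce(combine, b), blocks))
    # final reduce over the (much smaller) list of partial results
    return reduce(combine, partials)
```

This is exactly why associativity matters in the IR phase: the block partials can be combined in any grouping without changing the final answer.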
Optimization of User-Defined Aggregate Functions: Parallelization and Sharing by Chao Zhang

1 edition published in 2019 in English and held by 1 WorldCat member library worldwide

Aggregations for summarizing information have significant applications in many domains. The aggregate functions built into systems by default are not sufficient to satisfy the needs that emerge with advances in data analytics. UDAFs (User-Defined Aggregate Functions) are becoming one of the fundamental operators of advanced data analytics. However, the UDAF mechanism provided by most modern systems suffers from at least two shortcomings: defining a UDAF requires hard-coding the routine that computes the aggregate function, and the semantics of UDAFs are totally or partially unknown to query processors, preventing their optimization. This thesis presents SUDAF (Sharing User-Defined Aggregate Functions), a declarative framework that allows users to formulate UDAFs as mathematical expressions and use them in SQL statements. SUDAF can automatically generate efficient parallel implementations from users' UDAFs, and it supports dynamic caching and reuse of partial aggregates. Our experiments show that the proposed sharing technique yields gains of one to two orders of magnitude in query execution times.
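The partial-aggregate sharing that SUDAF exploits rests on decomposing an aggregate into mergeable partial states; a minimal sketch (illustrative only, not SUDAF's API) expresses the average as a (sum, count) pair whose partials can be computed per partition, cached, and merged:

```python
class Mean:
    """Average expressed as a mergeable partial aggregate (sum, count):
    partials can be computed on separate partitions in parallel, cached,
    and reused by other aggregates built from the same partial states."""
    def __init__(self):
        self.s, self.n = 0.0, 0

    def add(self, x):
        """Fold one value into this partial state."""
        self.s += x
        self.n += 1

    def merge(self, other):
        """Combine two partial states (e.g. from two partitions)."""
        self.s += other.s
        self.n += other.n
        return self

    def result(self):
        """Finalize: turn the partial state into the aggregate value."""
        return self.s / self.n if self.n else None
```

The same (sum, count) partials also serve other aggregates (e.g. sum or count alone), which is the kind of reuse a caching layer over partial aggregates can exploit.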
Audience Level
Audience level: 0.61 (from 0.59 for Data Acces ... to 0.99 for Efficient ...)

English (30)

German (2)