WorldCat Identities

Ecole doctorale Informatique de Paris-Sud

Overview
Works: 188 works in 189 publications in 2 languages and 189 library holdings
Roles: Other, 996
Publication Timeline
Most widely held works by Ecole doctorale Informatique de Paris-Sud
Optimized broadcasting in wireless ad-hoc networks using network coding by Nour Kadi( Book )

2 editions published in 2010 in English and held by 2 WorldCat member libraries worldwide

Network coding is a technique that has attracted research interest since its emergence in 2000. It was shown that network coding, combined with wireless broadcasting, can potentially improve performance in terms of throughput, energy efficiency and bandwidth utilization. Our study begins with integrating network coding with the multipoint relay (MPR) technique. MPR is an efficient broadcast mechanism which has been used in many wireless protocols. We show how combining the two techniques can reduce the number of transmitted packets and increase the throughput. We further reduce the complexity by proposing an opportunistic coding scheme which performs coding operations over the binary field. Instead of linearly combining packets, we sum them modulo 2, which simply corresponds to XORing the corresponding bits of each packet. These operations are computationally cheap. Using neighbor state information, a node in our scheme chooses the packets to encode and transmit at each transmission, trying to deliver a maximum number of packets. Therefore, an exchange of reception information between neighbors is required. To reduce the overhead of the required feedback, we propose a new coding scheme. It uses LT codes (a type of fountain code) to eliminate the need for perfect feedback among neighbors. This scheme performs encoding and decoding with logarithmic complexity. We optimize the LT code to speed up the decoding process. The optimization is achieved by proposing a new degree distribution to be used during the encoding process. This distribution allows intermediate nodes to decode more symbols even when few encoded packets have been received.
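
As a minimal illustration of the modulo-2 (XOR) packet combination described above, here is a toy Python sketch; the packet contents and the two-neighbor scenario are invented for illustration, not taken from the thesis:

# Toy sketch of binary-field network coding: packets are combined by XORing
# corresponding bits (addition modulo 2). Data and scenario are illustrative.
def xor_packets(*packets: bytes) -> bytes:
    assert len({len(p) for p in packets}) == 1, "packets must have equal length"
    out = bytearray(len(packets[0]))
    for p in packets:
        for i, b in enumerate(p):
            out[i] ^= b
    return bytes(out)

# A relay knowing that neighbor A already holds p1 and neighbor B holds p2
# can broadcast p1 XOR p2 once instead of transmitting p1 and p2 separately.
p1, p2 = b"hello-12", b"world-34"
coded = xor_packets(p1, p2)
assert xor_packets(coded, p2) == p1  # A recovers p1 using its copy of p2
assert xor_packets(coded, p1) == p2  # B recovers p2 using its copy of p1
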
Techniques d'optimisation pour des données semi-structurées du web sémantique by Julien Leblay( )

1 edition published in 2013 in English and held by 1 WorldCat member library worldwide

Since the beginning of the Semantic Web, RDF and SPARQL have become the standard data model and query language to describe resources on the Web. Large amounts of RDF data are now available either as stand-alone datasets or as metadata over semi-structured documents, typically XML. The ability to apply RDF annotations over XML data emphasizes the need to represent and query data and metadata simultaneously. While significant efforts have been invested into producing and publishing annotations manually or automatically, little attention has been devoted to exploiting such data. This thesis aims at setting database foundations for the management of hybrid XML-RDF data. We present a data model capturing the structural aspects of XML data and the semantics of RDF. Our model is general enough to describe pure XML or RDF datasets, as well as RDF-annotated XML data, where any XML node can act as a resource. We also introduce the XRQ query language that combines features of both XQuery and SPARQL. XRQ not only allows querying the structure of documents and the semantics of their annotations, but also producing annotated semi-structured data on-the-fly. We introduce the problem of query composition in XRQ, and exhaustively study query evaluation techniques for XR data to demonstrate the feasibility of this data management setting. We have developed an XR platform on top of well-known data management systems for XML and RDF. The platform features several query processing algorithms, whose performance is experimentally compared. We present an application built on top of the XR platform. The application provides manual and automatic annotation tools, and an interface to query annotated Web pages and publicly available XML and RDF datasets concurrently. As generalizations of RDF and SPARQL, XR and XRQ enable RDFS-style query answering. In this respect, we present a technique to support RDFS entailments in RDF (and, by extension, XR) data management systems.
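
To make the hybrid XML-RDF idea concrete, here is a small hedged sketch in Python: any XML node gets a (hypothetical) URI so that RDF triples can annotate it, and a hybrid query must combine structural navigation with triple matching. The URIs and predicates are invented, and this is not the actual XR model or XRQ syntax:

# Toy illustration: XML nodes act as RDF resources via assigned URIs.
import xml.etree.ElementTree as ET

doc = ET.fromstring("<article><title>Views</title><author>Leblay</author></article>")

# Assign a (hypothetical) URI to each XML node so RDF triples can reference it.
node_uri = {}
for i, node in enumerate(doc.iter()):
    node_uri[node] = f"http://example.org/doc1#node{i}"

# RDF annotations over XML nodes, as (subject, predicate, object) triples.
triples = [
    (node_uri[doc.find("author")], "rdf:type", "foaf:Person"),
    (node_uri[doc.find("title")], "dc:subject", "materialized views"),
]

# A hybrid query traverses XML structure *and* matches triples, e.g.
# "titles of articles whose author node is annotated as a foaf:Person".
persons = {s for s, p, o in triples if p == "rdf:type" and o == "foaf:Person"}
if node_uri[doc.find("author")] in persons:
    print(doc.find("title").text)  # -> Views
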
Un système de recherche d'information personnalisée basé sur la modélisation multidimensionnelle de l'utilisateur by Myriam Hadjouni Krir( )

1 edition published in 2012 in French and held by 1 WorldCat member library worldwide

The web explosion has extended Information Retrieval (IR) and led to the emergence of web search engines. Conventional IR methods, usually intended for simple textual searches, now face new document types and rich, evolving content. Facing these evolutions, users demand higher-quality results from IR systems. In this context, the main objective of personalization is to improve the results returned to the end user based on his or her perception, interests and preferences. This thesis is concerned with these different aspects. Its main objective is to propose new and effective solutions to the personalization problem. To achieve this goal, a spatial and semantic web personalization system integrating implicit user modeling is proposed. This system has two components: 1) user modeling; 2) implicit collaboration between users through the construction of a network of user models. A system prototype was developed for evaluation purposes, covering: a) user model quality; b) information retrieval quality; c) information retrieval quality with the spatial user model data; d) information retrieval quality with the whole user model data and the network of user models. Experiments showed an improvement in personalized search results compared to a baseline web search.
Synchronization and Fault-tolerance in Distributed Algorithms by Peva Blanchard( )

1 edition published in 2014 in English and held by 1 WorldCat member library worldwide

In the first part of this thesis, we focus on a recent model, called population protocols, which describes large networks of tiny wireless mobile anonymous agents with very limited resources. The harsh constraints of the original model make most of the classical problems of distributed algorithmics, such as data collection, consensus and leader election, either difficult to analyze or impossible to solve. We first study the data collection problem, which mainly consists in transferring some values to a base station. By using a fairness assumption, known as cover times, we compute tight bounds on the convergence time of concrete protocols. Next, we focus on the problems of consensus and leader election. It is shown that these problems are impossible in the original model. To circumvent these issues, we augment the original model with oracles and study their relative power. Along the way, we develop a formal framework general enough to encompass various sorts of oracles, as well as their relations. In the second part of the thesis, we study the problem of state-machine replication in the more classical model of asynchronous message-passing communication. The Paxos algorithm is a famous (partial) solution to the state-machine replication problem which tolerates crash failures. Our contribution is the enhancement of Paxos in order to tolerate transient faults as well. In doing so, we define the notion of a practically self-stabilizing replicated state machine.
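
For intuition, here is a minimal Python sketch of a classic population-protocol rule (leader election by pairwise interactions); this is a textbook example of the model, not one of the thesis's specific constructions:

# Population protocol sketch: anonymous agents interact in random pairs.
# Classic leader election: when two leaders meet, one is demoted, so the
# population converges to exactly one leader under a fair scheduler.
import random

def leader_election(n_agents: int, steps: int = 10_000) -> int:
    state = ["L"] * n_agents  # every agent starts as a candidate leader
    for _ in range(steps):
        i, j = random.sample(range(n_agents), 2)  # random pairwise interaction
        if state[i] == "L" and state[j] == "L":
            state[j] = "F"  # transition rule: (L, L) -> (L, F)
    return state.count("L")

print(leader_election(50))  # 1 with high probability after enough interactions
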
Composition des modèles de lignes de produits logiciels by Takoua Ben Rhouma( )

1 edition published in 2012 in French and held by 1 WorldCat member library worldwide

Software Product Line (SPL) engineering aims at modeling and developing a set of software systems with similarities rather than individual software systems. The modeling task can, however, be tedious or even infeasible for large-scale and complex SPLs. To address this problem, the modeling task is distributed among different stakeholders. In the end, the separately developed models have to be composed in order to obtain the global SPL model. Composing SPL models is not a trivial task; the variability information of model elements has to be handled during the composition, as well as the variability constraints. Similarly, the model structure and the composition semantics are key points that have to be considered during the composition. This thesis aims at providing specific mechanisms to compose SPL models. We therefore propose two composition mechanisms: merge and aggregation. The merge mechanism combines models with structural similarities. The aggregation mechanism, in contrast, composes models without any structural similarity but with possible constraints across their structural elements. We focus on UML composite structures of SPLs and use specific annotations to identify variable elements. Our composition mechanisms deal with the variability information of structural elements, the variability constraints associated with the variable elements, and the structures of the manipulated models. We also specify a set of semantic properties that have to be considered during the composition process and show how to preserve them. Finally, we have carried out an assessment of the proposals and shown their ability to compose SPL models in a reasonable time. We have also shown how model consolidation helps reduce the number of products with incomplete structures.
Déploiement multiplateforme d'applications multitâche par la modélisation by Wassim El Hajj Chehade( )

1 edition published in 2011 in French and held by 1 WorldCat member library worldwide

Given the complexity of multitasking software, linked to very pressing economic and competitive contexts, application portability and deployment-process reusability have become major issues. Model-driven engineering is an approach that aims to meet these needs by separating the functional concerns of multitasking systems from their technical concerns, while maintaining the relationship between them. In practice, this takes the form of model transformations that specialize models for target platforms. Currently, concerns specific to these platforms are described implicitly in the transformations themselves. Consequently, these transformations are not reusable and do not meet the heterogeneous, evolving needs that characterize multitasking systems. Our objective is then to apply the principle of separation of concerns at the level of the transformations themselves, an approach that guarantees the portability and reusability of model transformation processes. To do this, this study first provides a detailed behavioral model of software execution platforms. This modeling makes it possible to extract platform-specific concerns from the model transformations and to capture them in a detailed platform model that is independent and reusable. In a second step, based on these models, it presents a generic process for developing concurrent systems. The originality of this approach is a true separation of concerns between three actors: the developer of the transformation tool, who specifies a generic model transformation; the platform providers, who supply detailed models of their platforms; and the multitasking system designer, who models the system. At the end of this study, an evaluation of this approach shows a reduction in the cost of deploying applications on multiple platforms without incurring an additional performance cost.
Computations on Massive Data Sets : Streaming Algorithms and Two-party Communication by Christian Konrad( )

1 edition published in 2013 in English and held by 1 WorldCat member library worldwide

Analyse acoustique de la voix émotionnelle de locuteurs lors d'une interaction humain-robot by Marie Tahon( )

1 edition published in 2012 in French and held by 1 WorldCat member library worldwide

This thesis deals with emotional voices during human-robot interaction. In a natural interaction, we define at least four kinds of variability: the environment (room, microphone); the speaker, with his or her physical characteristics (gender, age, voice type) and personality; the emotional state; and finally the kind of interaction (game scenario, emergency, everyday life). From audio signals collected in different conditions, we tried to characterise, with acoustic features, both the speaker and his or her emotional state, taking these variabilities into account. Finding which features are essential and which are to be avoided is a hard challenge, because it requires working with a large number of variabilities and thus having rich and diverse data at our disposal. The main results concern the collection and annotation of natural emotional corpora recorded with different kinds of speakers (children, adults, elderly people) in various environments, and the study of how reliable acoustic features are across the four kinds of variability. This analysis led to two interesting outcomes: the audio characterisation of a corpus and the drawing up of a blacklist of features that vary considerably. Emotions are just one part of the paralinguistic information carried by the audio channel; other paralinguistic features, such as personality and stress in the voice, have also been studied. We have also built automatic emotion recognition and speaker characterisation modules that we tested during realistic interactions. An ethical discussion of our work has also been conducted.
Phonemic variability and confusability in pronunciation modeling for automatic speech recognition by Panagiota Karanasou( )

1 edition published in 2013 in English and held by 1 WorldCat member library worldwide

This thesis addresses the problems of phonemic variability and confusability from the pronunciation modeling perspective for an automatic speech recognition (ASR) system. In particular, several research directions are investigated. First, automatic grapheme-to-phoneme (g2p) and phoneme-to-phoneme (p2p) converters are developed that generate alternative pronunciations for in-vocabulary as well as out-of-vocabulary (OOV) terms. Since the addition of alternative pronunciations may introduce homophones (or close homophones), it increases the confusability of the system. A novel measure of this confusability is proposed to analyze it and study its relation to ASR performance. This pronunciation confusability is higher if pronunciation probabilities are not provided, and can severely degrade ASR performance. It should thus be taken into account during pronunciation generation. Discriminative training approaches are then investigated to train the weights of a phoneme confusion model that allows alternative ways of pronouncing a term while counterbalancing the phonemic confusability problem. The objective function to optimize is chosen to correspond to the performance measure of the particular task. In this thesis, two tasks are investigated: the ASR task and the Keyword Spotting (KWS) task. For ASR, an objective that minimizes the phoneme error rate is adopted. For the experiments conducted on KWS, the Figure of Merit (FOM), a KWS performance measure, is directly maximized.
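
As a rough illustration of p2p-style pronunciation expansion, the following Python sketch generates pronunciation variants from a weighted phoneme confusion model; the confusion pairs, penalties and budget are invented for illustration and are not the thesis's trained model:

# Toy sketch: expand a pronunciation with single-phoneme substitutions drawn
# from a (hypothetical) weighted confusion model, within a penalty budget.
CONFUSIONS = {  # phoneme -> [(substitute, penalty)], invented values
    "t": [("d", 0.3)],
    "s": [("z", 0.4)],
}

def variants(pron: list[str], max_penalty: float = 0.5):
    """Return (pronunciation, penalty) pairs within the confusability budget."""
    results = [(pron, 0.0)]
    for i, ph in enumerate(pron):
        for sub, pen in CONFUSIONS.get(ph, []):
            if pen <= max_penalty:
                results.append((pron[:i] + [sub] + pron[i + 1:], pen))
    return results

for p, w in variants(["t", "e", "s", "t"]):
    print(" ".join(p), w)  # e.g. "d e s t 0.3", "t e z t 0.4"
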
High Performance by Exploiting Information Locality through Reverse Computing by Mouad Bahi( )

1 edition published in 2011 in English and held by 1 WorldCat member library worldwide

The main resources for computation are time, space and energy. Reducing them is the main challenge in the field of processor performance. In this thesis, we are interested in a fourth factor: information. Information has an important and direct impact on the three other resources, and we show how it contributes to performance optimization. Landauer suggested that, independently of the hardware on which a computation runs, information erasure dissipates energy; this is a fundamental result of thermodynamics. Therefore, under this hypothesis, only reversible computations, where no information is ever lost, are likely to be thermodynamically adiabatic and to dissipate no power. Reversibility means that data can always be retrieved from any point of the program. Information may be carried not only by the data but also by the process and the input data that generate it. When a computation is reversible, information can also be retrieved from other already-computed data by reverse computation. Hence reversible computing improves information locality. This thesis develops these ideas in two directions. In the first part, we address the issue of making a computation DAG (directed acyclic graph) reversible in terms of spatial complexity. We define energetic garbage as the additional number of registers needed for the reversible computation with respect to the original computation. We propose a reversible register allocator and show empirically that the garbage size is never more than 50% of the DAG size. In the second part, we apply this approach to the trade-off between recomputing (direct or reverse) and storage in the context of supercomputers, such as recent vector and parallel coprocessors, graphics processing units (GPUs), the IBM Cell processor, etc., where the gap between processor cycle time and memory access time is increasing. We show that recomputing in general, and reverse computing in particular, helps reduce register requirements and memory pressure. This approach of reverse rematerialization also contributes to increasing instruction-level parallelism (Cell) and thread-level parallelism in multicore processors with a shared register file or memory (GPU). On the latter architecture, the number of registers required by a kernel limits the number of running threads and affects performance. Reverse rematerialization generates additional instructions, but their cost can be hidden by the parallelism gain. Experiments on the highly memory-demanding Lattice QCD simulation code on an Nvidia GPU show a performance gain of up to 11%.
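
The recompute-versus-store trade-off behind reverse rematerialization can be illustrated with a tiny Python sketch, assuming an invertible update; the variable names and operation are hypothetical:

# Toy sketch: instead of keeping an old value alive in a register (storage),
# recover it later by inverting the operation that overwrote it (reverse
# computation), freeing the register in between.
def forward(a: int, b: int) -> int:
    return a + b  # a is overwritten by an invertible update; old a not stored

def reverse(a_new: int, b: int) -> int:
    return a_new - b  # reverse computation: recover the old value on demand

a, b = 7, 5
a_new = forward(a, b)
assert reverse(a_new, b) == 7  # old value retrieved without extra storage
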
Approches supervisées et faiblement supervisées pour l'extraction d'événements et le peuplement de bases de connaissances by Ludovic Jean-Louis( )

1 edition published in 2011 in French and held by 1 WorldCat member library worldwide

Most of the information freely available on the Web is textual, i.e., unstructured. In a context such as technology watch, it is very useful to be able to present the information contained in texts in a structured form, focusing on the information judged relevant to the domain of interest. However, when such information has to be processed systematically, manual methods are not feasible given the large volume of data involved. Information extraction addresses the automation of this kind of task by identifying, in texts, information about facts (or events) in order to store it in predefined data structures. These structures, called templates (or forms), aggregate the characteristic information of an event or of a domain of interest, represented as named entities (place names, etc.). In this context, the work of this thesis addresses two main problems: identifying the information related to an event when this information is scattered across a text in the presence of several occurrences of events of the same type; and reducing the dependence on annotated corpora for building an information extraction system. Concerning the first problem, we propose an original approach based on two steps. The first is an event-based segmentation that identifies, within a document, the text zones referring to the same type of event, relying on temporal information. This segmentation thus determines the zones on which the extraction process should focus. The second step selects, within the segments identified as relevant, the entities associated with the events. To do so, it combines the extraction of relations between entities at a local level with a global fusion process leading to an entity graph. A disambiguation process is finally applied to this graph to identify the entity playing a given role with respect to an event when several candidates are possible. The second problem is addressed in the context of populating knowledge bases from large document collections (several million documents), considering a large number (around forty) of types of binary relations between named entities. Given the effort required to annotate a corpus for a given relation type and the number of relation types considered, the objective here is to avoid such annotation as much as possible while retaining a learning-based approach. This objective is achieved through a distant supervision approach, which takes as its starting point examples of relations from a knowledge base and performs an unsupervised annotation of corpora according to these relations, in order to build a set of annotated relations for training a model. This approach was evaluated at large scale on the data of the TAC-KBP 2010 campaign.
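
A minimal sketch of the distant supervision principle described above, with an invented knowledge-base fact and sentences: sentences mentioning a known entity pair are automatically labeled with that pair's relation, producing (noisy) training examples without manual annotation:

# Toy distant supervision: auto-label sentences using known KB facts.
KB = {("Paris", "France"): "capital_of"}  # invented knowledge-base fact

sentences = [
    "Paris is the capital of France.",
    "Paris welcomed millions of visitors.",
]

training_examples = []
for sent in sentences:
    for (subj, obj), relation in KB.items():
        if subj in sent and obj in sent:
            # noisy positive example for the 'capital_of' relation
            training_examples.append((sent, subj, obj, relation))

print(training_examples)
# [('Paris is the capital of France.', 'Paris', 'France', 'capital_of')]
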
Modélisation et conception d'une plateforme pour l'interaction multimodale distribuée en intelligence ambiante by Gaëtan Pruvost( )

1 edition published in 2013 in French and held by 1 WorldCat member library worldwide

This thesis deals with ambient intelligence and the design of Human-Computer Interaction (HCI). It studies the automatic generation of user interfaces that are adapted to the interaction context in ambient environments. This problem raises design issues that are specific to ambient HCI, particularly in the reuse of multimodal and multi-device interaction techniques. The present work falls into three parts. The first part is an analysis of state-of-the-art software architectures designed to solve those issues. This analysis outlines the limits of current approaches and enables us to propose, in the second part, a new approach for the design of ambient HCI called DAME. This approach relies on the automatic and dynamic association of software components that build a user interface. We propose and define two complementary models that allow the description of ergonomic and architectural properties of the software components. The design of such components is organized in a layered architecture that identifies reusable levels of abstraction of an interaction language. A third model, called the behavioural model, allows the specification of recommendations about the runtime instantiation of components. We propose an algorithm that generates context-adapted user interfaces and evaluates their quality according to the recommendations issued from the behavioural model. In the third part, we detail a platform that implements the DAME approach. This implementation is used in a qualitative experiment involving end users. Encouraging preliminary results have been obtained and open new perspectives on multi-device and multimodal HCI in ambient computing.
Scalable view-based techniques for web data : algorithms and systems by Asterios Katsifodimos( )

1 edition published in 2013 in English and held by 1 WorldCat member library worldwide

XML was recommended by the W3C in 1998 as a markup language for device- and system-independent methods of representing information. XML is nowadays used as a data model for storing and querying large volumes of data in database systems. In spite of significant research and systems development, many performance problems are raised by processing very large amounts of XML data. Materialized views have long been used in databases to speed up queries. Materialized views can be seen as precomputed query results that can be re-used to evaluate (part of) another query, and have been a topic of intensive research, in particular in the context of relational data warehousing. This thesis investigates the applicability of materialized view techniques to optimize the performance of Web data management tools, in particular in distributed settings, considering XML data and queries. We make three contributions. We first consider the problem of choosing the best views to materialize within a given space budget in order to improve the performance of a query workload. Our work is the first to address the view selection problem for a rich subset of XQuery. The challenges we face stem from the expressive power and features of both the query and view languages, and from the size of the search space of candidate views to materialize. While the general problem has prohibitive complexity, we propose and study a heuristic algorithm and demonstrate its superior performance compared to the state of the art. Second, we consider the management of large XML corpora in peer-to-peer networks based on distributed hash tables (DHTs, for short). We consider a platform leveraging distributed materialized XML views, defined by arbitrary XML queries, filled in with data published anywhere in the network, and exploited to efficiently answer queries issued by any network peer. This thesis has contributed important scalability-oriented optimizations, as well as a comprehensive set of experiments deployed in a country-wide WAN. These experiments outgrow similar competitor systems by orders of magnitude in terms of data volumes and data dissemination throughput; they are thus the most advanced in understanding the performance behavior of DHT-based XML content management in real settings. Finally, we present a novel approach for scalable content-based publish/subscribe (pub/sub, for short) in the presence of constraints on the available computational resources of data publishers. We achieve scalability by off-loading subscriptions from the publisher, and leveraging view-based query rewriting to feed these subscriptions from the data accumulated in others. Our main contribution is a novel algorithm for organizing subscriptions in a multi-level dissemination network in order to serve large numbers of subscriptions, respect capacity constraints, and minimize latency. The efficiency and effectiveness of our algorithm are confirmed through extensive experiments and a large deployment in a WAN.
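
As a toy illustration of view-based query answering (far simpler than the XQuery setting of the thesis), a materialized view stores a precomputed result that a later, more selective query can filter instead of re-evaluating over base data; the data and query shapes are invented:

# Toy sketch: a materialized view as a precomputed, reusable query result.
BASE = [("person", "Alice"), ("person", "Bob"), ("item", "i1"), ("item", "i2")]

# Materialized view: all person names, precomputed once.
view_persons = [name for kind, name in BASE if kind == "person"]

def persons_starting_with(prefix: str):
    # Rewritten to use the view: filter the precomputed result rather than
    # scanning the base data again.
    return [n for n in view_persons if n.startswith(prefix)]

print(persons_starting_with("A"))  # ['Alice']
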
Interactions simultanées de plusieurs utilisateurs avec une table interactive by Jonathan Chaboissier( )

1 edition published in 2011 in French and held by 1 WorldCat member library worldwide

This thesis presents our work in human-computer interaction (HCI). The subject concerns the use of a new kind of computer: the interactive tabletop display, a table whose top is both a screen and a multi-touch detection device. Interactive tabletops open up new uses of computer applications by allowing several co-located users to work or play together on the same system. Tabletop users naturally want to interact simultaneously on the shared surface. This simultaneity is difficult to observe in a collaborative, non-artificial environment, and existing studies have neither sufficiently analyzed the problems nor investigated how the system can help manage concurrency. Our approach was to explore simultaneous interactions by studying original situations where the system puts pressure on users. We explain how we used a video game as an exploration and experimentation tool. This thesis traces the design and development of RealTimeChess, a game for 2-4 players, a real-time version of chess adapted to tabletops. We report the results of experiments on groups of 2 to 4 participants in situations of cooperation and competition, which helped to highlight problems of physical discomfort when accessing remote objects, awareness in dynamic contexts, and control of the pace of interaction. This thesis also presents lessons learned about simultaneous interactions of multiple users, territoriality aspects and collaborative behavior, and finally gives guidelines for tabletop game design.
Scalable algorithms for cloud-based Semantic Web data management by Stamatis Zampetakis( )

1 edition published in 2015 in English and held by 1 WorldCat member library worldwide

In order to build smart systems, where machines are able to reason much as humans do, data with semantics is a major requirement. This need led to the advent of the Semantic Web, which proposes standard ways of representing and querying data with semantics. RDF is the prevalent data model used to describe web resources, and SPARQL is the query language that allows expressing queries over RDF data. Being able to store and query data with semantics triggered the development of many RDF data management systems. The rapid evolution of the Semantic Web provoked the shift from centralized data management systems to distributed ones. The first systems to appear relied on P2P and client-server architectures, while recently the focus has moved to cloud computing. Cloud computing environments have strongly impacted research and development in distributed software platforms. Cloud providers offer distributed, shared-nothing infrastructures that may be used for data storage and processing. The main features of cloud computing involve scalability, fault-tolerance, and elastic allocation of computing and storage resources following the needs of the users. This thesis investigates the design and implementation of scalable algorithms and systems for cloud-based Semantic Web data management. In particular, we study the performance and cost of exploiting commercial cloud infrastructures to build Semantic Web data repositories, and the optimization of SPARQL queries for massively parallel frameworks. First, we introduce the basic concepts of the Semantic Web and the main components and frameworks interacting in massively parallel cloud-based systems. In addition, we provide an extended overview of existing RDF data management systems in centralized and distributed settings, emphasizing the critical concepts of storage, indexing, query optimization, and infrastructure. Second, we present AMADA, an architecture for RDF data management using public cloud infrastructures. We follow the Software as a Service (SaaS) model, where the complete platform runs in the cloud and appropriate APIs are provided to end users for storing and retrieving RDF data. We explore various storage and querying strategies, revealing pros and cons with respect to performance and also to monetary cost, which is an important new dimension to consider in public cloud services. Finally, we present CliqueSquare, a distributed RDF data management system built on top of Hadoop, incorporating a novel optimization algorithm that is able to produce massively parallel plans for SPARQL queries. We present a family of optimization algorithms, relying on n-ary (star) equality joins to build flat plans, and compare their ability to find the flattest plans possible. Inspired by existing partitioning and indexing techniques, we present a generic storage strategy suitable for storing RDF data in HDFS (Hadoop's Distributed File System). Our experimental results validate the efficiency and effectiveness of the optimization algorithm, demonstrating also the overall performance of the system.
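
The flat-plan intuition can be sketched as follows: triple patterns sharing a join variable are grouped so that each group becomes one n-ary (star) equality join. This is a toy rendering of the idea, not CliqueSquare's actual algorithm:

# Toy sketch: group SPARQL triple patterns by shared subject variable so each
# group forms one n-ary star join, keeping the query plan shallow (flat).
from collections import defaultdict

triple_patterns = [  # (subject, predicate, object); ?x, ?y, ?n, ?m are variables
    ("?x", "rdf:type", "Person"),
    ("?x", "foaf:name", "?n"),
    ("?x", "foaf:knows", "?y"),
    ("?y", "foaf:name", "?m"),
]

stars = defaultdict(list)  # join variable -> patterns forming a star
for s, p, o in triple_patterns:
    stars[s].append((s, p, o))

for var, group in stars.items():
    print(f"n-ary star join on {var}: {group}")
# Two stars (?x with 3 patterns, ?y with 1) are joined at the next level,
# yielding a flatter plan than a chain of binary joins.
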
Flot de conception système sur puce pour radio logicielle by Guangye Tian( )

1 edition published in 2011 in English and held by 1 WorldCat member library worldwide

Software Defined Radio (SDR) is a reconfigurable radio whose functionality is controlled by software, which greatly enhances the reusability and flexibility of waveform applications. System updates are also easily achievable through software updates instead of hardware replacement. The Software Communication Architecture (SCA), on the other hand, is an open architecture framework which specifies an Operating Environment (OE) in which waveform applications are executed. An SCA-compliant SDR greatly improves the portability, reusability and interoperability of waveform applications between different SDR implementations. The multiprocessor system-on-chip (MPSoC), consisting of large, heterogeneous sets of embedded processors, reconfigurable hardware and network-on-chip (NoC) interconnection, is emerging as a potential answer to the continued increase in data-processing bandwidth requirements, as well as to the manufacturing and design expenses of nanoscale systems-on-chip (SoC), in the face of continued time-to-market pressures. We studied the challenges of efficiently deploying an SCA-compliant platform on an MPSoC. We conclude that to efficiently realize an SDR system with high data-bandwidth requirements, a design flow with systematic design-space exploration and optimization, together with an efficient programming model, is necessary. We propose a hybrid programming model combining a distributed client/server model and a parallel shared-memory model. A design flow is proposed which also integrates a NoC topology synthesis engine for applications that are to be accelerated with parallel programming and multiple processing elements (PEs). We prototyped an integrated SW/HW development environment in which a CORBA-based distributed system relies on the network-on-chip for protocol/packet routing; software components are deployed with a unified interface despite the underlying heterogeneous architectures and OSes, while hardware components (processors, IPs, etc.) are integrated through interfaces conforming to the Open Core Protocol (OCP).
Direct Manipulation for Information Visualization by Charles Perin( )

1 edition published in 2014 in English and held by 1 WorldCat member library worldwide

There is a tremendous effort from the information visualization (Infovis) community to design novel, more efficient or more specialized desktop visualization techniques. While visual representations and interactions are combined to create these visualizations, less effort is invested in the design of new interaction techniques for Infovis. In this thesis, I focus on interaction for Infovis and explore how to improve existing visualization techniques through efficient yet simple interactions. To become more efficient, the interaction techniques should reach beyond the standard widgets and Window/Icon/Menu/Pointer (WIMP) user interfaces. In this thesis, I argue that the design of novel interactions for visualization should be based on the direct manipulation paradigm, instrumental interaction, and take inspiration from advanced interactions investigated in HCI research but not well exploited yet in Infovis. I take examples from multiple projects I have designed to illustrate how opportunistic interactions can empower visualizations and I explore design implications raised by novel interaction techniques, such as the tradeoff between cognitive congruence and versatility, the problem of engaging interaction, and the benefits of seamless, fluid interaction. Finally, I provide guidelines and perspectives, addressing the grand challenge of building or consolidating the theory of interaction for Infovis
Sequential prediction for budgeted learning : Application to trigger design by Djalel Benbouzid( )

1 edition published in 2014 in English and held by 1 WorldCat member library worldwide

This thesis addresses the classification problem in machine learning from a new angle by adding a sequential dimension to the classification process. In particular, we are interested in budgeted learning, where the objective is to design a classifier that, while making correct predictions, must manage a computational budget consumed as the different features are acquired or evaluated. Features may have different acquisition costs, and it is often the case that the most discriminative features are the most expensive. Medical diagnosis and web page ranking are typical applications of budgeted learning: for the former, the objective is to limit the number of medical tests the patient must undergo; for the latter, the ranking must be computed quickly enough not to drive the user away. In this thesis, we focused on atypical budget constraints, which trigger design motivated us to investigate. Triggers are a type of fast, real-time, cost-sensitive classifier whose purpose is to filter the massive data produced by particle accelerators and to retain the events most likely to contain the studied phenomenon, so that they can be recorded for later analysis. Trigger design imposes strict computational constraints on classification but, above all, exhibits complex patterns in the cost of each feature. Some features depend on other features and require computing those first, which increases the cost of classification. Moreover, the cost of a feature can depend directly on its concrete value; this occurs when feature extractors improve the quality of their output over time but can always provide preliminary results. Finally, observations are grouped into bags and, within the same bag, some observations share the computation of a subset of features. All these constraints led us to formalize classification from a sequential perspective. First, we propose a new framework for fast classification by converting the initial classification problem into a decision-making problem. This reformulation makes it possible, on the one hand, to address sequentiality explicitly, which has the advantage of easily incorporating the various constraints found in real applications, and, on the other hand, to have at our disposal a whole range of reinforcement learning algorithms to solve the new problem. In a second part, we apply our sequential classification model to a concrete budgeted learning problem, demonstrating the benefits of our approach on data simulated (from simplified distributions) from the LHCb experiment (CERN).
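
A hedged sketch of classification as sequential decision making under a budget: at each step the classifier either pays to acquire the next feature or stops and predicts. The feature costs, confidence threshold and toy scoring rule are invented; the thesis solves this kind of problem with reinforcement learning rather than a fixed policy:

# Toy sequential budgeted classifier: acquire features one by one, tracking
# the budget spent, and stop early once confident enough (or out of budget).
FEATURES = [("cheap_feature", 1.0), ("costly_feature", 5.0)]  # (name, cost)

def classify_sequentially(example: dict, budget: float) -> str:
    acquired, spent, score = {}, 0.0, 0.5  # 0.5 = uninformed prior
    for name, cost in FEATURES:
        if spent + cost > budget:
            break  # budget exhausted: stop acquiring
        acquired[name] = example[name]
        spent += cost
        score = sum(acquired.values()) / len(acquired)  # toy scoring rule
        if abs(score - 0.5) > 0.3:
            break  # confident enough: stop early and save the remaining budget
    return "signal" if score > 0.5 else "background"

print(classify_sequentially({"cheap_feature": 0.9, "costly_feature": 0.8}, budget=2.0))
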
Contributions to Multi-Armed Bandits : Risk-Awareness and Sub-Sampling for Linear Contextual Bandits by Nicolas Galichet( )

1 edition published in 2015 in English and held by 1 WorldCat member library worldwide

This thesis focuses on sequential decision making in unknown environments, and more particularly on the Multi-Armed Bandit (MAB) setting, introduced by Robbins in the 1950s and later analyzed by Lai and Robbins. During the last decade, many theoretical and algorithmic studies have been aimed at the exploration vs. exploitation tradeoff at the core of MABs, where exploitation is biased toward the best options visited so far while exploration is biased toward options rarely visited, to enforce the discovery of the true best choices. MAB applications range from medicine (the elicitation of the best prescriptions) to e-commerce (recommendations, advertisements) and optimal policies (e.g., in the energy domain). The contributions presented in this dissertation tackle the exploration vs. exploitation dilemma from two angles. The first contribution is centered on risk avoidance. Exploration in unknown environments often has adverse effects: for instance, exploratory trajectories of a robot can entail physical damage to the robot or its environment. We thus define the exploration vs. exploitation vs. safety (EES) tradeoff, and propose three new algorithms addressing the EES dilemma. Firstly, and under strong assumptions, the MIN algorithm provides a robust behavior with guarantees of logarithmic regret, matching the state of the art with high robustness w.r.t. hyper-parameter setting (as opposed to, e.g., UCB (Auer 2002)). Secondly, the MARAB algorithm aims at optimizing the cumulative 'Conditional Value at Risk' (CVaR) of the rewards, a notion originating from the economics domain, with excellent empirical performance compared to (Sani et al. 2012), though without any theoretical guarantees. Finally, the MARABOUT algorithm modifies the CVaR estimation and yields both theoretical guarantees and good empirical behavior. The second contribution concerns the contextual bandit setting, where additional information is provided to support decision making, such as user details in the content recommendation domain, or the patient's history in the medical domain. The study focuses on how to make a choice between two arms with different numbers of samples. Traditionally, a confidence region is derived for each arm based on the associated samples, and the 'optimism in front of the unknown' principle implements the choice of the arm with the maximal upper confidence bound. An alternative, pioneered by (Baransi et al. 2014) and called BESA, proceeds instead by sub-sampling the larger sample set without replacement. In this framework, we designed a contextual bandit algorithm based on sub-sampling without replacement, relaxing the (unrealistic) assumption that all arm reward distributions rely on the same parameter. The CL-BESA algorithm yields both theoretical guarantees of logarithmic regret and good empirical behavior.
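
The BESA-style sub-sampling comparison can be sketched as follows (a toy rendering of the idea from (Baransi et al. 2014), not the CL-BESA algorithm itself): to compare two arms with unequal histories, sub-sample the larger history without replacement down to the smaller one's size, then pick the arm with the higher sub-sampled mean. The reward histories below are invented:

# Toy BESA-style arm comparison by sub-sampling without replacement.
import random

def besa_choice(rewards_a: list[float], rewards_b: list[float]) -> str:
    n = min(len(rewards_a), len(rewards_b))
    sub_a = random.sample(rewards_a, n)  # sub-sample without replacement
    sub_b = random.sample(rewards_b, n)
    mean = lambda xs: sum(xs) / len(xs)
    return "a" if mean(sub_a) >= mean(sub_b) else "b"

history_a = [0.9, 0.8, 0.7, 0.95, 0.85]  # well-explored arm
history_b = [0.6, 0.99]                  # under-explored arm
print(besa_choice(history_a, history_b))
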
Strengthening the heart of an SMT-solver : Design and implementation of efficient decision procedures by Mohamed Iguernelala( )

1 edition published in 2013 in English and held by 1 WorldCat member library worldwide

This thesis tackles the problem of automatically proving the validity of mathematical formulas generated by program verification tools. In particular, it focuses on Satisfiability Modulo Theories (SMT), a young research topic that has seen great advances during the last decade. The solvers of this family have various applications in hardware design, program verification, model checking, etc. SMT solvers offer a good compromise between expressiveness and efficiency. They rely on a tight cooperation between a SAT solver and a combination of decision procedures for specific theories, such as the free theory of equality with uninterpreted symbols, linear arithmetic over integers and rationals, or the theory of arrays. This thesis aims at improving the efficiency and the expressiveness of the Alt-Ergo SMT solver. To that end, we designed a new decision procedure for the theory of linear integer arithmetic. This procedure is inspired by Fourier-Motzkin's method, but in practice it uses a rational simplex to perform computations. We have also designed a new combination framework, capable of reasoning in the union of the free theory of equality, the AC theory of associative and commutative symbols, and an arbitrary signature-disjoint Shostak theory. This framework is a modular and non-intrusive extension of the ground AC completion procedure with the given Shostak theory. In addition, we have extended Alt-Ergo with existing decision procedures to integrate additional interesting theories, such as the theory of enumerated data types and the theory of arrays. Finally, we have explored preprocessing techniques for formula simplification as well as enhancements of Alt-Ergo's SAT solver.
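
For intuition, here is a small Python sketch of Fourier-Motzkin variable elimination, the method that inspired the integer-arithmetic procedure above (Alt-Ergo's actual procedure performs the computations with a rational simplex); the constraint encoding is invented for illustration:

# Fourier-Motzkin elimination over rationals. Each constraint is
# (coeffs, bound), meaning sum(coeffs[v] * v) <= bound.
from fractions import Fraction
from itertools import product

def eliminate(constraints, var):
    """Return an equivalent constraint set with `var` eliminated."""
    lower, upper, rest = [], [], []
    for coeffs, bound in constraints:
        c = coeffs.get(var, Fraction(0))
        if c > 0:
            upper.append((coeffs, bound, c))   # gives an upper bound on var
        elif c < 0:
            lower.append((coeffs, bound, c))   # gives a lower bound on var
        else:
            rest.append((coeffs, bound))
    # Combine every lower bound with every upper bound to cancel `var`.
    for (lc, lb, lcoef), (uc, ub, ucoef) in product(lower, upper):
        new = {}
        for v in set(lc) | set(uc):
            if v == var:
                continue
            new[v] = lc.get(v, Fraction(0)) * ucoef - uc.get(v, Fraction(0)) * lcoef
        rest.append((new, lb * ucoef - ub * lcoef))
    return rest

# Example: from x <= 4 and y - x <= 0 (i.e. y <= x), eliminating x gives y <= 4.
cons = [({"x": Fraction(1)}, Fraction(4)),
        ({"x": Fraction(-1), "y": Fraction(1)}, Fraction(0))]
print(eliminate(cons, "x"))  # [({'y': Fraction(1, 1)}, Fraction(4, 1))]
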
 
Audience level: 0.98 (from 0.92 for Optimized ... to 0.99 for Optimized ...)

Alternative Names
Ecole doctorale d'informatique de Paris-Sud

ED 427

ED427

EDIPS (Orsay, Essonne)

Université Paris 11. Ecole doctorale d'informatique de Paris-Sud

Université Paris-Sud. Ecole doctorale d'informatique de Paris-Sud

Languages
English (14)

French (7)