
Past Events

  • Tue

    Query Driven Crowd Mining

    2:00 pm, Amphi MJK

    by Susan Davidson

    Harnessing the crowd to collect massive amounts of data (crowdsourcing) has become increasingly popular. Examples in our culture include Wikipedia, social tagging systems for images, traffic information aggregators like Waze, and hotel and movie rating sites like TripAdvisor and IMDb. In this talk, I give an overview of the challenges inherent in providing declarative, database-style platforms for supporting crowdsourcing. I then discuss how to go one step further and enable users to pose general questions to mine the crowd for potentially relevant data, and to receive concise, relevant answers that represent frequent, significant data patterns. I close by discussing the challenges that crowd mining poses for provenance.

    Dr. Davidson’s research interests include database and crowdsourced systems, bioinformatics, and scientific workflow systems. Within bioinformatics she is best known for her work in data integration strategies, with XML as a data exchange and integration strategy, and with provenance for scientific workflows.

  • Fri

    Thesis defense: Christiane KAMDEM KENGNE, "Abstraction and Comparison of Execution Traces for the Analysis of Embedded Multimedia Applications"

    4:00 pm, UFR IM2AG, amphi F022

    Keywords: optimization techniques, anomaly detection, sequence mining, execution traces, dissimilarity measures, multimedia applications

    Abstract:
    Nowadays, due to the growing complexity of applications and hardware, it is difficult to understand what happens during the execution of these applications. Tracing techniques are commonly used to collect and provide information about an application in the form of execution traces. Execution traces, which are sequences of events, can be very large (they easily reach millions of events) and hard to understand, and therefore require dedicated analysis tools. A critical case is the analysis of applications for embedded systems such as set-top boxes or smartphones, in particular to understand bugs in multimedia applications. In this thesis, we propose two new techniques tailored to multimedia applications on embedded systems. The first method reduces the size of the trace given to analysts. It requires grouping sets of related events. We propose an approach based on optimization and pattern-mining techniques to automatically extract a set of sub-sequences from a trace. Our experiments showed that this method scales to large volumes of data, and at the same time demonstrated the practical value of the approach. The second contribution is a diagnosis method based on comparing execution traces with reference traces. This method is implemented in TED, our trace diagnosis tool. Experiments on concrete use cases of multimedia execution traces validated that TED scales and adds value to trace analysis. We also show that the tool can be applied to traces of reduced size to further improve scalability.
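
    The trace-reduction idea in the abstract can be caricatured in a few lines: grouping a recurring sub-sequence of events into a single symbol shrinks the trace handed to the analyst. The toy sketch below is illustrative only, not the optimization- and pattern-mining-based method of the thesis, and all names in it are made up.

```python
# Toy illustration of shrinking an execution trace by grouping recurring
# sub-sequences: find the most frequent adjacent event pair and replace
# each of its occurrences with a single composite symbol.
from collections import Counter

def collapse_most_frequent_pair(trace):
    pairs = Counter(zip(trace, trace[1:]))
    if not pairs:
        return trace
    (a, b), count = pairs.most_common(1)[0]
    if count < 2:
        return trace  # nothing recurs; keep the trace unchanged
    merged, i = [], 0
    while i < len(trace):
        if i + 1 < len(trace) and trace[i] == a and trace[i + 1] == b:
            merged.append(f"<{a}+{b}>")  # one symbol for the whole pair
            i += 2
        else:
            merged.append(trace[i])
            i += 1
    return merged

# A hypothetical multimedia trace: the (read, decode) pair recurs.
trace = ["read", "decode", "read", "decode", "render", "read", "decode"]
print(collapse_most_frequent_pair(trace))  # 7 events collapse to 4 symbols
```

    Applying the collapse repeatedly (until no pair recurs) is the same idea pushed further; the thesis instead selects the sub-sequences to group via optimization and pattern mining.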

    Proposed jury composition:
    Mme Marie Christine ROUSSET, Université de Grenoble (Thesis advisor)
    M. Maurice TCHUENTE, Université de Yaoundé I (Thesis co-advisor)
    Mme Noha IBRAHIM, Université de Grenoble (Co-supervisor)
    M. Pascal PONCELET, Université de Montpellier 2 (Reviewer)
    M. Laks V.S. LAKSHMANAN, University of British Columbia (Reviewer)
    M. Eric GAUSSIER, Université de Grenoble (Examiner)
    M. Alexandre TERMIER, Université de Rennes 1 (Examiner)
    Mme Celine ROBARDET, INSA Lyon, Université de Lyon (Examiner)

  • Mon

    Thesis defense: Mustafa AL-BAKRI, "Uncertainty-sensitive Reasoning over the Web of Data"

    11:00 am, Amphi MJK

    In this thesis we investigate several approaches that help users find useful and trustworthy information in the Web of Data using Semantic Web technologies. For this purpose, we tackle two research issues: data linkage in Linked Data, and trust in semantic P2P networks.
    We model the problem of data linkage in Linked Data as a reasoning problem over possibly decentralized data. We describe a novel Import-by-Query algorithm that alternates steps of sub-query rewriting and of tailored querying of the Linked Data cloud in order to import data as specific as possible for inferring or contradicting given target same-as facts. Experiments conducted on real-world datasets have demonstrated the feasibility of this approach and its usefulness in practice for data linkage and disambiguation. Furthermore, we propose an adaptation of this approach that takes possibly uncertain data and knowledge into account, resulting in the inference of same-as and different-from links carrying weights. In this adaptation we model uncertainty as probability values. Our experiments showed that the adapted approach scales to large datasets and produces meaningful probabilistic weights.
    Concerning trust, we introduce a trust mechanism for guiding the query-answering process in semantic P2P networks. Peers in such networks organize their information using separate ontologies and rely on alignments between their ontologies for translating queries. Trust in such a setting is subjective and estimates the probability that a peer will provide satisfactory answers for specific queries in future interactions. To compute trust, the mechanism exploits the information provided by alignments, along with the information that comes from peers' experiences.
    The calculated trust values are refined over time using Bayesian inference as more queries are sent and answers received. To evaluate our mechanism, we built a semantic P2P bookmarking system (TrustMe) in which we can vary different quantitative and qualitative parameters. The results show the convergence of trust, and highlight the gain in the quality of peers' answers, measured with precision and recall, when the process of query answering is guided by our trust mechanism.
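
    The kind of Bayesian refinement described in the abstract can be sketched as follows; this is a generic Beta-Bernoulli model under our own assumptions, not necessarily the estimator implemented in TrustMe, and the class and method names are hypothetical.

```python
# Sketch of Bayesian trust refinement: a peer's trust is the probability
# of a satisfactory answer, modeled as a Beta distribution whose
# parameters are updated after every observed interaction.
class PeerTrust:
    def __init__(self, alpha=1.0, beta=1.0):
        # Beta(alpha, beta) prior; (1, 1) is the uniform (uninformed) prior.
        self.alpha = alpha
        self.beta = beta

    def record_answer(self, satisfactory):
        # Each answer is a Bernoulli observation: success bumps alpha,
        # failure bumps beta (conjugate Bayesian update).
        if satisfactory:
            self.alpha += 1
        else:
            self.beta += 1

    def expected_trust(self):
        # Posterior mean of the Beta distribution.
        return self.alpha / (self.alpha + self.beta)

trust = PeerTrust()
for outcome in [True, True, False, True]:  # 3 satisfactory answers out of 4
    trust.record_answer(outcome)
print(trust.expected_trust())  # posterior mean (1+3)/(2+4)
```

    As more queries are answered, the posterior concentrates, which is the convergence of trust the evaluation measures.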

    Mme Marie-Christine Rousset, Université de Grenoble-Alpes (Thesis advisor)
    M. Manuel Atencia, Université de Grenoble-Alpes (Thesis co-advisor)
    M. Mohand-Said Hacid, Université Claude Bernard Lyon 1 (Reviewer)
    M. Andrea Tettamanzi, Université Nice Sophia Antipolis (Reviewer)
    M. Jérôme Euzenat, INRIA Grenoble Rhône-Alpes (Examiner)
    Mme. Marie-Laure Mugnier, Université de Montpellier 2 (Examiner)

  • Thu

    Citizen as smart sensor – measuring the city

    10:00 am, Room D102, Bat D, LIG

    by Francois Charoy

    Contextual crowdsourcing, and in particular spatial crowdsourcing, is increasingly used to produce, annotate, update, or make recommendations on geographic databases. We propose to use this approach to conduct measurement and assessment campaigns on cities using the intelligence of the crowd. In this talk, we present the current state of this work. At this stage, we seek to validate by simulation different approaches (patterns) for having citizens collect and analyze data. This simulation raises several problems that we will present.

  • Mon

    !CANCELED! Pushing the Limits of Instance Matching Systems: A Semantics-Aware Benchmark for Linked Data

    1:30 pm, Room D102, Bat D, LIG

    This talk has been canceled and postponed to a date yet to be determined.

    by Irini Fundulaki
    (Institute of Computer Science, Greece)

    In this talk we are going to present the Semantic Publishing Instance Matching Benchmark (SPIMBENCH) developed in the context of the Linked Data Benchmark Council EU project. SPIMBENCH allows the benchmarking of instance matching systems against not only structure-based and value-based test cases, but also against semantics-aware test cases based on OWL axioms. The benchmark features a scalable data generator and a weighted gold standard that can be used for debugging instance matching systems and for reporting how well they perform in various matching tasks.

  • Tue

    Workshop on Traces: "Ethics and Property: How to guarantee data reproducibility and data analysis?"

    14:00 – 17:00, Amphi MJK

    Reproducibility of experiments and analyses by others is one of the pillars of modern science. Yet the description of experimental protocols (particularly in computer science articles) is often incomplete and rarely allows a study to be reproduced. Such inaccuracies may not have been too problematic 20 years ago, when hardware and operating systems were not too complex. Nowadays, however, systems are made of a huge number of heterogeneous components and rely on a software stack (OS, compiler, libraries, virtual machines, …) that is so complex that it can no longer be perfectly controlled. As a consequence, some observations have become more and more difficult to reproduce and to explain by other researchers, and sometimes even by the original researchers themselves.

    In the last decade there has been an increasing number of article withdrawals, even in prestigious journals, and a realization by both the scientific community and the general public that many research results and studies were actually flawed or wrong.

    Open science is the umbrella term for the movement to make scientific research, data, and dissemination accessible to all levels of an inquiring society. In particular, it encompasses practices such as open laboratory notebooks and reproducible research, which refers to the idea that the ultimate product of academic research is the paper along with the full computational environment used to produce its results (code, data, etc.), so that others can reproduce the results and build new work on the research.

    Program of the Workshop.

  • Mon

    Using Social Media for Health Studies, by Ingmar Weber

    11:00 am, IMAG, room 406


    In this presentation, I’ll present research done at the Qatar Computing Research Institute on using social media for health studies. Most of my work in this domain is related to understanding lifestyle diseases such as obesity and diabetes and has a two-fold goal: (i) monitoring population health and health-related lifestyles, and (ii) combining social media and quantified self data for a more complete and holistic patient view.
    Concerning population-level studies, I’ll present results on using food mentions on Twitter (CHI’15) and Instagram images (CHI’16) for modeling regional variations in obesity and diabetes rates. I’ll also discuss a feasibility study on crowdsourcing “does this person look overweight” labels (DigitalHealth’16).
    On the combination of quantified self and social media data, I’ll show how tweets from smart scales create an interesting dataset linking the two domains (DigitalHealth’16), how data from sleep-tracking apps shared on social media can be used for sleep studies (ICHI’16), and how public food diaries can help us predict dieting success or failure (CSCW’17, forthcoming).
    If there’s time, then I’ll also discuss the aspect of behavioral change and the importance of receiving social feedback on continued participation in a weight loss subreddit (DigitalHealth’16).


    Ingmar Weber is a senior scientist in the Social Computing group at the Qatar Computing Research Institute (QCRI). His interdisciplinary research uses large amounts of online data from social media and other sources to study human behavior at scale. Particular topics of interest include studying lifestyle diseases and population health, quantifying international migration using digital methods, and looking at political polarization and extremism. He has published over 100 peer-reviewed articles and his work is frequently featured in the popular press. Since 2016 he has been an ACM Distinguished Speaker.
    As an undergraduate Dr. Weber studied mathematics at Cambridge University (1999-2003), before pursuing a PhD at the Max-Planck Institute for Computer Science (2003-2007). He subsequently held positions at the Ecole Polytechnique Federale de Lausanne (2007-2009) and Yahoo Research Barcelona (2009-2012), as well as a visiting researcher position at Microsoft Research Cambridge (summer 2008). He serves on a number of program committees for top-tier conferences in the domain of web data mining and social media analysis including ICWSM, KDD, WWW, ACL, SDM, VLDB and WebSci, as well as on the editorial board for the Journal of Web Science.

  • Mon

    Ph.D. Defense, Sofia Kleisarchaki

    2:00 pmIMAG, Auditorium

    Difference Analysis in Big Data: Exploration, Explanation, Evolution

    Variability in Big Data refers to data whose meaning changes continuously. For instance, data derived from social platforms and from monitoring applications exhibits great variability. This variability is essentially the result of changes in the underlying data distributions of attributes of interest, such as user opinions/ratings, computer network measurements, etc. Difference Analysis aims to study variability in Big Data. To achieve that goal, data scientists need: (a) measures to compare data in various dimensions such as age for users or topic for network traffic, and (b) efficient algorithms to detect changes in massive data. In this thesis, we identify and study three novel analytical tasks to capture data variability: Difference Exploration, Difference Explanation and Difference Evolution.

    Difference Exploration is concerned with extracting the opinion of different user segments (e.g., on a movie rating website). We propose appropriate measures for comparing user opinions in the form of rating distributions, and efficient algorithms that, given an opinion of interest in the form of a rating histogram, discover agreeing and disagreeing populations. Difference Explanation tackles the question of providing a succinct explanation of differences between two datasets of interest (e.g., buying habits of two sets of customers). We propose scoring functions designed to rank explanations, and algorithms that guarantee explanation conciseness and informativeness. Finally, Difference Evolution tracks change in an input dataset over time and summarizes change at multiple time granularities. We propose a query-based approach that uses similarity measures to compare consecutive clusters over time. Our indexes and algorithms for Difference Evolution are designed to capture different data arrival rates (e.g., low, high) and different types of change (e.g., sudden, incremental). The utility and scalability of all our algorithms relies on hierarchies inherent in data (e.g., time, demographic).
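
    One way to compare opinions expressed as rating histograms, sketched here purely for illustration (it is one possible measure, not necessarily the one proposed in the thesis), is the 1-D earth mover's distance: for normalized histograms it equals the L1 distance between their cumulative distributions.

```python
# Earth mover's distance between two 1-D rating histograms, computed as
# the sum of absolute differences of their cumulative distributions.
def emd_1d(hist_a, hist_b):
    assert len(hist_a) == len(hist_b)
    total_a, total_b = sum(hist_a), sum(hist_b)  # normalize counts
    cum_diff, distance = 0.0, 0.0
    for a, b in zip(hist_a, hist_b):
        cum_diff += a / total_a - b / total_b  # running CDF difference
        distance += abs(cum_diff)
    return distance

# Hypothetical counts of ratings 1..5 from two user segments.
younger = [10, 20, 30, 25, 15]
older = [5, 10, 20, 35, 30]
print(emd_1d(younger, older))    # small positive value: mild disagreement
print(emd_1d(younger, younger))  # identical opinions -> 0.0
```

    A distance like this respects the ordering of the rating scale (shifting mass from 4 to 5 costs less than from 1 to 5), which bin-wise measures such as L1 on the raw histograms do not.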

    We run extensive experiments on real and synthetic datasets to validate the usefulness of the three analytical tasks and the scalability of our algorithms. We show that Difference Exploration guides end-users and data scientists in uncovering the opinion of different user segments in a scalable way. Difference Explanation reveals the need to parsimoniously summarize differences between two datasets and shows that parsimony can be achieved by exploiting hierarchy in data. Finally, our study on Difference Evolution provides strong evidence that a query-based approach is well-suited to tracking change in datasets with varying arrival rates and at multiple time granularities. Similarly, we show that different clustering approaches can be used to capture different types of change.

    The committee will be composed of:
    – Dr. INGMAR WEBER, QCRI – QATAR, Reviewer