
Past Events

  • Thu
    18
    Sep
    2014

    Reading Group talk by Martin

    11:00 am, Room D102, Bat D, LIG

    Title: Uncovering Locally Characterizing Regions within Geotagged Data
    from VLDB 2014

  • Thu
    02
    Oct
    2014

    Reading Group talk by Sofia

    11:30 am, D102

    Online detection of Geo-Correlated Information Trends in Social Networks, VLDB 2014

  • Tue
    14
    Oct
    2014
    Fri
    17
    Oct
    2014

    Conference BDA 2014

    Grenoble – Autrans, at L'Escandille

    The 30th edition of the BDA conference (Gestion de Données – Principes, Technologies et Applications; Data Management: Principles, Technologies and Applications) is held in Autrans, a few kilometers from its birthplace, Saint Pierre de Chartreuse (BDA 1985), from 14 to 17 October 2014.
    This edition will highlight even more the radical changes under way in our field (hyper-massification of data, diversification of computing platforms, renewal of usages, etc.). BDA 2014 is organized around presentations of original papers, demonstration sessions, and young-researcher sessions. Distinguished guests will join us in keynotes and workshops to sketch the data management systems of the future (Divesh Srivastava, C. Mohan, Luc de Raedt, Serge Abiteboul, Cyril Labbé).

     

  • Thu
    16
    Oct
    2014

    Reading Group Talk by Serge

    11:30 am, D102

    A Framework for Periodic Outlier Pattern Detection in Time-Series Sequences by Rasheed, F. and Alhajj, R.
    IEEE Transactions on Cybernetics (Volume 44, Issue 5), 2014

  • Thu
    30
    Oct
    2014

    Reading Group Talk by Christiane

    11:30 am, D102

    Activity ranking in LinkedIn feed by Agarwal, D. et al.
    KDD 2014

  • Tue
    04
    Nov
    2014
    Wed
    05
    Nov
    2014

    Visitor: Susan Davidson

    Susan B. Davidson received the B.A. degree in Mathematics from Cornell University in 1978, and the M.A. and Ph.D. degrees in Electrical Engineering and Computer Science from Princeton University. Dr. Davidson joined the University of Pennsylvania in 1982, and is now the Weiss Professor of Computer and Information Science (CIS). She is an ACM Fellow, a Fulbright scholar, and formerly served as Department Chair of CIS and Deputy Dean of the School of Engineering and Applied Science. She was also a founding co-Director of the Center for Bioinformatics at UPenn (PCBI). The PCBI is a multi-school center spanning Medicine, Engineering and Applied Sciences, and Arts and Sciences, and is known for its pioneering work in database integration, genomic schema development, visualization tools, and annotation systems.

    Dr. Davidson’s research interests include database and crowdsourced systems, bioinformatics, and scientific workflow systems. Within bioinformatics, she is best known for her work on data integration strategies, on the use of XML as a data exchange and integration strategy, and on provenance for scientific workflows.

  • Tue
    04
    Nov
    2014

    Query Driven Crowd Mining

    2:00 pm, Amphi MJK

    by Susan Davidson

    Harnessing the crowd to collect massive amounts of data (crowdsourcing) has become increasingly popular. Examples in our culture include Wikipedia, social tagging systems for images, traffic information aggregators like Waze, and hotel and movie rating sites like TripAdvisor and IMDb. In this talk, I give an overview of the challenges inherent in providing declarative, database-style platforms for supporting crowdsourcing. I then discuss how to go one step further and enable users to pose general questions to mine the crowd for potentially relevant data, and to receive concise, relevant answers that represent frequent, significant data patterns. I close by discussing the challenges that crowd mining poses for provenance.

  • Thu
    13
    Nov
    2014

    Reading Group Talk by Oleg

    11:30 am, D102

    Experiences with Mining Temporal Event Sequences from Electronic Medical Records: Initial Successes and Some Challenges by Patnaik et al.
    KDD 2011 industrial track

  • Thu
    27
    Nov
    2014

    Reading Group Talk by Behrooz

    11:30 am, D102

    Demographics, Weather and Online Reviews: a Study of Restaurant Recommendations by Bakhshi et al.
    WWW 2014

  • Fri
    05
    Dec
    2014

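    Thesis defense: Christiane KAMDEM KENGNE, "Abstraction et comparaison de traces d'exécution pour l'analyse d'applications multimédias embarquées" (Abstraction and comparison of execution traces for the analysis of embedded multimedia applications)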

    4:00 pm, UFR IM2AG, Amphi F022

    Keywords: optimization techniques, anomaly detection, sequence mining, execution traces, dissimilarity measures, multimedia applications

    Abstract:
    Nowadays, due to the increasing complexity of both applications and hardware, it is difficult to understand what happens during the execution of these applications. Tracing techniques are commonly used to collect information about the application and provide it in the form of execution traces. Execution traces, which are sequences of events, can be very large (easily reaching millions of events) and hard to understand, and therefore require dedicated analysis tools. A critical case is the analysis of applications for embedded systems such as set-top boxes or smartphones, in particular to understand bugs in multimedia applications. In this thesis, we propose two new techniques suited to multimedia applications on embedded systems. The first method reduces the size of the trace given to analysts. This method requires grouping together sets of related events. We propose an approach based on optimization and pattern mining techniques to automatically extract a set of subsequences from a trace. Our experiments showed that this method scales to large volumes of data and, at the same time, highlighted the practical value of the approach. The second contribution is a diagnosis method based on comparing execution traces with reference traces. This method is implemented in TED, our trace diagnosis tool. Experiments on concrete use cases of multimedia execution traces confirmed that TED scales and adds value to trace analysis. We also show that the tool can be applied to reduced-size traces to further improve scalability.

    Proposed jury:
    Ms. Marie Christine ROUSSET, Université de Grenoble (Thesis advisor)
    Mr. Maurice TCHUENTE, Université de Yaoundé I (Thesis co-advisor)
    Ms. Noha IBRAHIM, Université de Grenoble (Thesis co-supervisor)
    Mr. Pascal PONCELET, Université de Montpellier 2 (Reviewer)
    Mr. Laks V.S. LAKSHMANAN, University of British Columbia (Reviewer)
    Mr. Eric GAUSSIER, Université de Grenoble (Examiner)
    Mr. Alexandre TERMIER, Université de Rennes 1 (Examiner)
    Ms. Celine ROBARDET, INSA Lyon, Université de Lyon (Examiner)

  • Thu
    11
    Dec
    2014

    Reading Group Talk by Julien

    11:30 am, D102

    Overlap Interval Partition Join by Dignös et al.
    SIGMOD 2014

  • Mon
    15
    Dec
    2014

    Thesis defense: Mustafa AL-BAKRI, "Uncertainty-sensitive Reasoning over the Web of Data"

    11:00 am, Amphi MJK

    Abstract
    In this thesis, we investigate several approaches that help users find useful and trustworthy information in the Web of Data using Semantic Web technologies. To this end, we tackle two research issues: Data Linkage in Linked Data and Trust in Semantic P2P Networks.
    We model the problem of data linkage in Linked Data as a reasoning problem on possibly decentralized data. We describe a novel Import-by-Query algorithm that alternates steps of sub-query rewriting and of tailored querying of the Linked Data cloud in order to import data as specific as possible for inferring or contradicting given target same-as facts. Experiments conducted on real-world datasets have demonstrated the feasibility of this approach and its usefulness in practice for data linkage and disambiguation. Furthermore, we propose an adaptation of this approach that takes into account possibly uncertain data and knowledge, resulting in the inference of same-as and different-from links with associated weights. In this adaptation, we model uncertainty as probability values. Our experiments have shown that the adapted approach scales to large data sets and produces meaningful probabilistic weights.
    Concerning trust, we introduce a trust mechanism for guiding the query-answering process in Semantic P2P Networks. Peers in Semantic P2P Networks organize their information using separate ontologies and rely on alignments between their ontologies to translate queries. Trust in such a setting is subjective and estimates the probability that a peer will provide satisfactory answers for specific queries in future interactions. To compute trust, the mechanism exploits the information provided by alignments, along with the information that comes from the peer’s experiences.
    The calculated trust values are refined over time using Bayesian inference as more queries are sent and answers received. To evaluate our mechanism, we built a semantic P2P bookmarking system (TrustMe) in which we can vary different quantitative and qualitative parameters. The results show the convergence of trust and highlight the gain in the quality of peers’ answers (measured with precision and recall) when the process of query answering is guided by our trust mechanism.

    Jury:
    Ms. Marie-Christine Rousset, Université de Grenoble-Alpes (Thesis advisor)
    Mr. Manuel Atencia, Université de Grenoble-Alpes (Thesis co-advisor)
    Mr. Mohand-Said Hacid, Université Claude Bernard Lyon 1 (Reviewer)
    Mr. Andrea Tettamanzi, Université Nice Sophia Antipolis (Reviewer)
    Mr. Jérôme Euzenat, INRIA Grenoble Rhône-Alpes (Examiner)
    Ms. Marie-Laure Mugnier, Université de Montpellier 2 (Examiner)

  • Thu
    15
    Jan
    2015

    Reading Group Talk by Shashwat

    11:00 am, D102

    Beyond Itemsets: Mining Frequent Featuresets over Structured Items by Thirumuruganathan et al.
    VLDB 2015 (to appear)

  • Thu
    29
    Jan
    2015

    Reading Group talk by Rafik

    11:30 am, Room D102, Bat D, LIG

    Title: ClusterJoin: A Similarity Joins Framework using Map-Reduce, VLDB 2014

  • Thu
    26
    Feb
    2015

    Citizen as smart sensor – measuring the city

    10:00 am, Room D102, Bat D, LIG

    by François Charoy
    (LORIA)

    Contextual crowdsourcing, and spatial crowdsourcing in particular, is increasingly used to produce, annotate, update, or make recommendations on geographic databases. We propose to use this approach to conduct measurement and assessment campaigns on cities by harnessing the intelligence of the crowd. In this presentation, we describe the current state of this work. For now, we are seeking to validate through simulation different approaches (patterns) for having citizens collect and analyze data. This simulation raises several problems, which we will present.

  • Thu
    05
    Mar
    2015

    Reading Group talk by Rafik

    11:30 am, Room D102, Bat D, LIG

    Title: Query-Aware Compression of Join Results, EDBT 2013

  • Mon
    16
    Mar
    2015

    CANCELED: Pushing the Limits of Instance Matching Systems: A Semantics-Aware Benchmark for Linked Data

    1:30 pm, Room D102, Bat D, LIG

    This talk is canceled and postponed to a date yet to be determined.

    by Irini Fundulaki
    (Institute of Computer Science, Greece)

    In this talk, we present the Semantic Publishing Instance Matching Benchmark (SPIMBENCH), developed in the context of the Linked Data Benchmark Council EU project. SPIMBENCH allows instance matching systems to be benchmarked not only against structure-based and value-based test cases, but also against semantics-aware test cases based on OWL axioms. The benchmark features a scalable data generator and a weighted gold standard that can be used for debugging instance matching systems and for reporting how well they perform in various matching tasks.

  • Thu
    19
    Mar
    2015

    Reading Group talk by Behrooz

    11:30 am, Room D102, Bat D, LIG

    Title: Group Recommendation with Temporal Affinities, EDBT 2015

  • Tue
    24
    Mar
    2015

    Workshop on Traces: "Ethics and Property: How to guarantee data reproducibility and data analysis?"

    14:00 – 17:00, Amphi MJK

    Reproducibility of experiments and analyses by others is one of the pillars of modern science. Yet the description of experimental protocols (particularly in computer science articles) is often incomplete and rarely allows a study to be reproduced. Such inaccuracies may not have been too problematic 20 years ago, when hardware and operating systems were not too complex. Nowadays, however, computer systems are made of a huge number of heterogeneous components and rely on a software stack (OS, compiler, libraries, virtual machines, …) so complex that they can no longer be perfectly controlled. As a consequence, some observations have become more and more difficult for other researchers, and sometimes even for the original researchers themselves, to reproduce and explain.

    In the last decade, there has been an increasing number of article retractions, even in prestigious journals, and a growing realization by both the scientific community and the general public that many research results and studies were actually flawed or wrong.

    Open science is the umbrella term for the movement to make scientific research, data, and dissemination accessible to all levels of an inquiring society. In particular, it encompasses practices such as the use of open laboratory notebooks and reproducible research. Reproducible research refers to the idea that the ultimate product of academic research is the paper along with the full computational environment used to produce its results (code, data, etc.), which can be used to reproduce those results and to create new work based on the research.


  • Tue
    31
    Mar
    2015
    Thu
    30
    Apr
    2015

    LIG's PhD Day: Sofia wins the 2-minute madness presentation

    Congratulations, Sofia!
    She won the 2-minute madness presentation for second-year PhD students at LIG’s PhD Day on 2015/03/26.
    LIG’s PhD Day is a great opportunity for graduate students to get together and present their work. This year's program had 25 graduate students giving a “2-minute madness” presentation, followed by a poster session.

  • Thu
    02
    Apr
    2015

    Reading Group talk by Fiston

    11:30 am, Room D102, Bat D, LIG

  • Thu
    30
    Apr
    2015

    Reading Group talk by Shashwat

    11:30 am, Room D102, Bat D, LIG

  • Thu
    21
    May
    2015

    Reading Group talk by Martin

    11:30 am, Room D102, Bat D, LIG

    Title: Inferring Continuous Dynamic Social Influence and Personal Preference for Temporal Behavior Prediction by Zhang et al. from VLDB 2015

  • Thu
    04
    Jun
    2015

    Reading Group talk by Sofia

    11:00 am, D102

  • Thu
    09
    Jul
    2015

    Reading Group talk by Julien

    11:30 am, D327

    Joins for Hybrid Warehouses: Exploiting Massive Parallelism in Hadoop and Enterprise Data Warehouses, EDBT 2015

  • Thu
    23
    Jul
    2015

    Reading Group Talk by Oleg

    11:30 am, D102

    A few useful things to know about machine learning by P. Domingos.
    CACM 2012.

  • Thu
    24
    Sep
    2015

    Reading Group

    11:30 am, D102

    Title: Skypatterns

    Speaker: Willy Ugarte-Rojas

  • Mon
    28
    Nov
    2016

    Using Social Media for Health Studies, by Ingmar Weber

    11:00 am, IMAG, Room 406

    Abstract

    In this talk, I’ll present research done at the Qatar Computing Research Institute on using social media for health studies. Most of my work in this domain concerns understanding lifestyle diseases such as obesity and diabetes and has a two-fold goal: (i) monitoring population health and health-related lifestyles, and (ii) combining social media and quantified self data for a more complete and holistic patient view.
     
    Concerning population-level studies, I’ll present results on using food mentions on Twitter (CHI’15) and Instagram images (CHI’16) for modeling regional variations in obesity and diabetes rates. I’ll also discuss a feasibility study on crowdsourcing “does this person look overweight” labels (DigitalHealth’16).
     
    On the combination of quantified self and social media data, I’ll show how tweets from smart scales create an interesting data source linking the two domains (DigitalHealth’16), how data from sleep tracking apps on social media can be used for sleep studies (ICHI’16), and how public food diaries can help us predict dieting success or failure (CSCW’17, forthcoming).
     
    If time permits, I’ll also discuss behavioral change and the importance of receiving social feedback for continued participation in a weight-loss subreddit (DigitalHealth’16).

    Bio

    Ingmar Weber is a senior scientist in the Social Computing group at the Qatar Computing Research Institute (QCRI). His interdisciplinary research uses large amounts of online data from social media and other sources to study human behavior at scale. Particular topics of interest include studying lifestyle diseases and population health, quantifying international migration using digital methods, and looking at political polarization and extremism. He has published over 100 peer-reviewed articles, and his work is frequently featured in the popular press. Since 2016, he has been an ACM Distinguished Speaker.
    As an undergraduate, Dr. Weber studied mathematics at Cambridge University (1999-2003), before pursuing a PhD at the Max-Planck Institute for Computer Science (2003-2007). He subsequently held positions at the École Polytechnique Fédérale de Lausanne (2007-2009) and Yahoo Research Barcelona (2009-2012), as well as a visiting researcher position at Microsoft Research Cambridge (summer 2008). He serves on a number of program committees for top-tier conferences in the domain of web data mining and social media analysis, including ICWSM, KDD, WWW, ACL, SDM, VLDB and WebSci, as well as on the editorial board of the Journal of Web Science.

  • Mon
    28
    Nov
    2016

    Ph.D. Defense, Sofia Kleisarchaki

    2:00 pm, IMAG, Auditorium

    Difference Analysis in Big Data: Exploration, Explanation, Evolution

    Abstract:
    Variability in Big Data refers to data whose meaning changes continuously. For instance, data derived from social platforms and from monitoring applications exhibit great variability. This variability is essentially the result of changes in the underlying data distributions of attributes of interest, such as user opinions/ratings, computer network measurements, etc. Difference Analysis aims to study variability in Big Data. To achieve that goal, data scientists need: (a) measures to compare data in various dimensions, such as age for users or topic for network traffic, and (b) efficient algorithms to detect changes in massive data. In this thesis, we identify and study three novel analytical tasks to capture data variability: Difference Exploration, Difference Explanation and Difference Evolution.

    Difference Exploration is concerned with extracting the opinion of different user segments (e.g., on a movie rating website). We propose appropriate measures for comparing user opinions in the form of rating distributions, and efficient algorithms that, given an opinion of interest in the form of a rating histogram, discover agreeing and disagreeing populations. Difference Explanation tackles the question of providing a succinct explanation of differences between two datasets of interest (e.g., buying habits of two sets of customers). We propose scoring functions designed to rank explanations, and algorithms that guarantee explanation conciseness and informativeness. Finally, Difference Evolution tracks change in an input dataset over time and summarizes change at multiple time granularities. We propose a query-based approach that uses similarity measures to compare consecutive clusters over time. Our indexes and algorithms for Difference Evolution are designed to capture different data arrival rates (e.g., low, high) and different types of change (e.g., sudden, incremental). The utility and scalability of all our algorithms rely on hierarchies inherent in data (e.g., time, demographics).

    We run extensive experiments on real and synthetic datasets to validate the usefulness of the three analytical tasks and the scalability of our algorithms. We show that Difference Exploration guides end-users and data scientists in uncovering the opinion of different user segments in a scalable way. Difference Explanation reveals the need to parsimoniously summarize differences between two datasets and shows that parsimony can be achieved by exploiting hierarchy in data. Finally, our study on Difference Evolution provides strong evidence that a query-based approach is well-suited to tracking change in datasets with varying arrival rates and at multiple time granularities. Similarly, we show that different clustering approaches can be used to capture different types of change.

    The committee will be composed of:
    – Prof. CLAUDIA RONCANCIO, GRENOBLE INP, President
    – Prof. ALBERT BIFET, TELECOM PARISTECH, Reviewer
    – Dr. INGMAR WEBER, QCRI, QATAR, Reviewer
    – Prof. ANGELA BONIFATI, UNIVERSITE DE LYON, Examiner
    – Prof. ANNE LAURENT, UNIVERSITE DE MONTPELLIER, Examiner
    – Prof. IOANNIS TSAMARDINOS, UNIVERSITY OF CRETE, GREECE, Examiner
    – Dr. SIHEM AMER-YAHIA, CNRS DELEGATION ALPES, Advisor
    – Prof. VASSILIS CHRISTOPHIDES, UNIVERSITY OF CRETE, GREECE, Advisor