AusDM 2015 will be sharing Keynote Speeches with ICDS 2015 (http://ic-datascience.org/icds2015/).
Prof Huan Liu
Arizona State University
Title: Discovering Negative Links on Social Networking Sites.
Abstract: Social networking sites make it easy for users to connect with, follow, or "like" each other. Such a mechanism promotes positive connections and helps a social networking site to grow without direct belligerent or negative encounters. This type of one-way connection makes no distinction between indifference and dislike; in other words, two users have, by default, only positive connections. However, it is apparent that as one's network grows, some users might not be benevolent toward each other, or negative links could form, though not explicitly stated. In this talk, we assess the need for discovering such hidden negative links, explore ways of finding negative links, and show the significance of negative links in social media applications like data classification and clustering, recommendation systems, link prediction, and tie-strength estimation. *This presentation is based on Dr. Jiliang Tang's Doctoral Dissertation at ASU.
Prof Ramamohanarao Kotagiri
University of Melbourne
Title: Large Scale Metric Learning using Locality Sensitive Hashing.
Abstract: Metric learning tries to discover a mapping of features such that objects belonging to a particular class are close to each other in the new space. However, the current methods of discovering such metric mappings are computationally infeasible when the data set is huge and has a large number of features. My talk will describe the state-of-the-art algorithms for metric learning. I will present our recent work on an efficient approach for discovering metric learning based mappings using Locality Sensitive Hashing (LSH). Our generic approach can accelerate state-of-the-art metric learning while achieving competitive classification accuracy, expanding feasibility by an order of magnitude. Our approach can accelerate Large Margin Nearest Neighbour (LMNN) to learn metrics on 1,000,000 samples in 3.6 minutes, down from 5.8 hours.
Prof Jian Pei
Simon Fraser University
Title: Big Data for Everyone
Abstract: Big Data poses grand opportunities and challenges for egocentric analytics. In this talk, I will discuss several interesting problems centered on egocentric queries and analysis on Big Data. We want to answer a series of natural questions imperative in several killer applications, such as "How is this patient similar to or different from the other Type II diabetes patients in the database?", "How is University X distinct from the other universities?", and "How is this residential property distinct from the others available in the market?" To answer such questions on Big Data, we have to search data of high dimensionality and high volume, and possibly of high dynamics as well. I will present some preliminary research results and some application case studies we obtained recently, as well as more challenges we identified.
Prof Yong Shi
Chinese Academy of Sciences
Title: Big Data Mining and Data Science.
Abstract: Big Data has become a reality that no one can ignore. Big Data is our environment whenever we need to make a decision. Big Data is a buzzword that makes everyone understand how important it is. Big Data presents a big opportunity for academia, industry and government. Big Data is also a big challenge for all parties. This talk will discuss some fundamental issues of Big Data problems, such as data heterogeneity vs. decision heterogeneity, data stream research and data-driven decision management. Furthermore, this talk will provide a number of real-life Big Data applications. In conclusion, the talk suggests a number of open research problems in Data Science, which is a growing field beyond Big Data.
Prof Geoff Webb
Title: Deep Broad Learning - Big Models for Big Data.
Abstract: Deep learning has demonstrated the power of detailed modeling of complex high-order (multivariate) interactions in data. For some learning tasks there is value in learning models that are not only Deep but also Broad. By Broad, we mean models that incorporate evidence from large numbers of features. This is particularly valuable in applications where many different features and combinations of features all carry small amounts of information about the class. The most accurate models will integrate all that information. In this talk, we propose an algorithm for Deep Broad Learning (DBL). The proposed algorithm has only one tunable parameter, n, which specifies the depth of the model. We demonstrate that DBL learns models on several big datasets that are highly competitive with those learned by state-of-the-art alternatives.
Prof Wei Wang
University of California, Los Angeles
Title: Algorithm acceleration for high throughput biology
Abstract: High throughput sequencing has proven to be a revolutionary technique for modern biology because it provides deep coverage and base pair-level resolution. It produces vast amounts of data, which pose new computational challenges because subsequent analyses often rely on a sequence alignment step that re-establishes the origin of each read, a process that is both time consuming and error prone. In this talk, we will present our latest algorithmic advances, which dramatically accelerate the analysis by removing the necessity of sequence alignment. We will demonstrate this through a concrete example of RNASeq quantification, in which we are able to achieve a speedup of two orders of magnitude and deliver competitive accuracy.
Prof Hui Xiong
Rutgers, the State University of New Jersey
Title: Big Data Analytics in Business Environments.
Abstract: Recent years have witnessed the big data movement throughout all business sectors. As a result, awareness of the importance of data mining for business is becoming widespread. However, big data are usually immense, fine-grained, diversified, dynamic, and sufficiently information-rich in nature, and thus demand a radical change in the philosophy of data analytics. In this talk, we introduce a set of scenarios for understanding and mining business data in various business sectors. In particular, we will discuss the technical and domain challenges of big data analytics in business environments. The themes to be covered include (1) the data mining problem formulation in different business applications; (2) the challenging issues of data pre-processing and post-processing in business analytics; (3) how the underlying computational models can be adapted for managing the uncertainties of big data processes in a huge, nebulous business environment. Finally, we will also show some promising research directions.
Prof Philip Yu
University of Illinois at Chicago
Title: On Mining Heterogeneous Information Networks
Abstract: The problem of big data has become increasingly important in recent years. On the one hand, big data is an asset that can potentially offer tremendous value or reward to the data owner. On the other hand, it poses tremendous challenges to distil the value out of big data. The very nature of big data poses challenges not only due to its volume and the velocity at which it is generated, but also its variety, where variety means the data can be collected from various sources with different formats, from structured data to text to network/graph data, etc. In this talk, we focus on the variety issue and discuss recent developments in the mining of heterogeneous information networks, which can be applied to multiple disciplines, including social network analysis, World-Wide Web, database systems, data mining, machine learning, and networked communication and information systems. We will examine the problem of integrating multiple data sources using heterogeneous information network models. Fusion of multiple social networks will also be considered.
Prof Albert Zomaya
University of Sydney
Title: Resource Management in Cloud Computing Systems
Abstract: The cloud is well known for its elasticity, achieved by leveraging abundant resources. Cloud data centres easily host thousands or even millions of multicore servers. Further, these servers are increasingly virtualized for the sake of data centre efficiency. However, the reality is that these resources are often relentlessly exploited, particularly to improve application performance. Although the elasticity facilitates achieving cost efficiency (or the performance to cost ratio), the ultimate efficiency in resource usage (or, more broadly, in data centres) lies in scheduling and resource allocation strategies that explicitly take into account actual resource consumption. The optimization of resource efficiency in clouds is of great practical importance considering its numerous benefits for economic and environmental sustainability. In this talk, we will discuss resource efficiency in cloud data centres with an example of large-scale distributed processing applications including scientific workflows and MapReduce jobs.
A/Prof Yixin Chen
Title: Big Data Algorithms and Clinical Applications
Abstract: In the era of big data, we need novel algorithms on top of the supporting platform. In this talk, I will first discuss some key aspects of big data algorithms in general. Then, I will talk about our recent medical big data project as a case study. Early detection of clinical deterioration is essential to improving clinical outcomes. In this project, we develop new algorithms for clinical early warning by mining massive clinical records in hospital databases. The research focuses on the large population of patients in the general hospital wards, who are not in the intensive care units and suffer from infrequent monitoring. I will discuss the challenges this big data application poses to traditional machine learning and data mining algorithms, our recent progress, and the lessons we learnt. Promising results from a formal clinical trial at the Barnes-Jewish Hospital, the teaching hospital of the Washington University School of Medicine, will be discussed.
Prof Yangyong Zhu
Title: Defining Data Science.
Abstract: In the age of big data, data science has become a hot occupation, supplanting traditional information science and big data engineering. This may indicate that data science has become its own branch of research. The term “data science” first appeared in CODATA Data Science Journal in 1990. So far, it has had several different interpretations. This talk aims to address what goals data science should seek to meet, and what data science itself is. We will present two key connotations of data science. The first is the study of data itself; its goal is to explore datanature and scientific issues related to datanature. The second is the study of the rules of the natural world as reflected by data, i.e., the study of the natural world performed through the study of data.
Prof Zhi-Hua Zhou
Title: Learning with Big Data by Incremental Optimization of Performance Measures
Abstract: A popular approach to achieving a strong learning system is to take the performance measure that will be used for evaluation as an optimization target, and then accomplish the learning task by an optimization procedure. Many performance measures in machine learning, however, are unfortunately non-linear, non-smooth and non-convex, leading to difficult optimization problems. With big data, the optimization becomes even more challenging because of concerns about computational, storage, and communication costs. In particular, it becomes almost impossible to collect all the data first and then perform optimization, so it is desirable to be able to optimize performance measures incrementally, without accessing the whole data. In this talk we will introduce some studies along this direction.