Tutorial 1: Deep Learning for Biomedical Discovery and Data Mining

Author: Truyen Tran

This tutorial has two goals: to provide the general PAKDD audience with knowledge and materials about a promising venture for KDD research, the intersection between deep learning and biomedicine; and to provide the deep learning community with relatively new, high-impact research problems within biomedicine.
The tutorial introduces the state of the field for deep learning and argues that biomedicine is an ideal data-intensive domain. It gives a brief review of deep learning, covering classic neural architectures, including feedforward, recurrent and convolutional nets, and more advanced topics, including CapsNet, powerful memory-augmented neural nets (MANN), and models for graph data.
Two major subtopics of genomics are covered: nanopore sequencing (converting electrical signals into DNA character sequences) and genomics modeling (making sense of the DNA sequences for multiple biological processes).
The biomedical imaging section briefly covers imaging modalities, including vision-based, sound-based and EEG/ECG-based technologies. It then shows how deep learning technologies are being adapted to these modalities, sometimes achieving human-level accuracy.
For healthcare, the coverage is on data mining of Electronic Medical Records. Two main problems are considered: modeling time series of physiological measurements, and predicting mid-term health trajectories.
The generative biomedicine section presents recent advances in few-shot learning and deep generative models (DBN/DBM, VAE and GAN). It describes how to apply these advances to drug design, and gives an outlook over a five-year horizon and beyond on the joint venture of deep learning and biomedicine.
Prerequisite Knowledge: the tutorial does not require detailed prior knowledge of biomedicine or deep learning, but basic familiarity with machine learning is assumed.


Truyen Tran is a Senior Lecturer at Deakin University, where he leads a research team on deep learning and its applications to accelerating sciences, biomedicine and software analytics at the Centre for Pattern Recognition and Data Analytics. He publishes regularly at top AI/ML/KDD venues such as CVPR, NIPS, UAI, AAAI, KDD and ICML. Tran has received multiple recognitions, awards and prizes, including Best Paper Runner Up at UAI (2009), Geelong Tech Award (2013), CRESP Best Paper of the Year (2014), Third Prize on Kaggle Galaxy-Zoo Challenge (2014), Title of Kaggle Master (2014), Best Student Papers Runner Up at PAKDD (2015) and ADMA (2016), Distinguished Paper at ACM SIGSOFT (2015), and Deakin Thought Leader (2016). He obtained a Bachelor of Science from the University of Melbourne and a PhD in Computer Science from Curtin University in 2001 and 2008, respectively.

Tutorial 2: Non-IID Recommender Systems in Practice with Modern AI Techniques

Authors: Liang Hu, Longbing Cao, Songlei Jian

The renaissance of artificial intelligence (AI) has attracted huge attention from every corner of the world. Specifically, machine learning approaches have become deeply involved in AI research in almost all areas, e.g., natural language processing (NLP) and computer vision (CV). In particular, recommender systems (RSs), probably among the most widely used AI systems, have been integrated into every part of our daily life. In this AI age, state-of-the-art machine learning approaches, e.g., deep learning, have become the primary choice for modeling advanced RSs.
Classic RSs are built on the assumption that the relevant data, e.g., ratings, contents and/or social relations, are independent and identically distributed (IID). Intuitively, this assumption is inconsistent with real-life data characteristics and cannot represent the heterogeneity and coupling relationships in the relevant data. Therefore, we employ modern machine learning approaches to enhance RSs with Comprehensive, Complementary, and Contextual (3C) information by coupling relevant heterogeneous data.
This tutorial will analyze data, challenges, and business needs in advanced recommendation problems, and take a non-IID perspective to introduce recent advances in machine learning for modeling 3C-based RSs. This includes an overview of RS evolution and non-IIDness in recommendation; advanced machine learning for cross-domain RSs, social RSs, multimodal RSs, multi-criteria RSs, context-aware RSs, and group-based RSs; and their integration in building real-life RSs.
This tutorial aims to equip both academic and practitioner audiences with a comprehensive understanding of, and relevant techniques for, applying state-of-the-art machine learning approaches to build more sensible next-generation RSs in contexts with various heterogeneous data and complex relations. In this tutorial, we will present a systematic review and applications of recent advanced machine learning techniques for building real-life intelligent RSs.
Prerequisite Knowledge: a rudimentary knowledge of RSs and some machine learning methods will be helpful, including (1) recommender systems, (2) latent factor models, and (3) deep learning models.


Liang Hu received his first Ph.D. degree in computer application technology from the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China, in 2015. The title of his dissertation is Research on Modeling Approaches to Recommender Systems by Exploiting Multi-Information. Currently, he is a Ph.D. candidate majoring in Analytics at the Advanced Analytics Institute, University of Technology Sydney, Australia. His research interests include recommender systems, data mining, machine learning and general artificial intelligence. He has published a number of papers in top-ranked international conferences and journals in the area of recommender systems, including WWW, IJCAI, AAAI, ICDM, ICWS, TOIS, and JWSR. He serves as a program committee member for more than 10 top conferences, including IJCAI, AAAI, ICDM, and CIKM.

Longbing Cao is a professor of information technology at the University of Technology Sydney (UTS), Australia. He is the Founding Director of the Advanced Analytics Institute at UTS. He is the Chair of the ACM SIGKDD Australia and New Zealand Chapter, the IEEE Task Force on Data Science and Advanced Analytics, and the IEEE Task Force on Behavioral, Economic and Socio-cultural Computing. He has served as conference co-chair of KDD2015, PAKDD13 and ADMA13, as program co-chair or vice-chair of PAKDD17, PAKDD11, ICDM10, etc., and as area chair or (senior) program committee member for around 100 conferences, including KDD, AAAI, IJCAI, ICDM and AAMAS. His primary research interests include data science and mining, machine learning, behavior informatics, agent mining, multi-agent systems, and open complex intelligent systems. He is currently dedicated to research on non-IID learning in big data and behavior informatics, which involve very wide enterprise applications. He has successfully delivered 11 tutorials, including at IJCAI and CIKM, and dozens of invited talks at major conferences and workshops and at public seminars for industry and government.

Songlei Jian is a joint Ph.D. student with the Advanced Analytics Institute, University of Technology Sydney (UTS) and the National University of Defense Technology (NUDT). Her research interests include machine learning, recommender systems, network modeling, and representation learning. She has published a number of papers in top-ranked international conferences and journals in the areas of data mining, machine learning, and recommender systems, including IJCAI, AAAI, and TKDE. She has served the community as a program committee member or reviewer for AAAI, KDD, ICDM, and TKDE.

Tutorial 3: Relevant Structure Search in Graph Databases: Methods and Applications

Authors: Yuanyuan Zhu, Xin Huang

Graphs are a powerful tool for modeling structural relationships between data objects in many application domains, such as social networks, collaboration networks, chemical compound structures, and protein-protein interaction networks. In many real-world applications, finding relevant structures in graph databases is an essential graph processing task for uncovering complex relationships within a graph database.
In this tutorial, we consider two typical scenarios of graph databases: a large collection of small graphs (e.g., a chemical compound structure database or region adjacency graphs derived from an image database) and a single large graph (e.g., a social network or a protein-protein interaction network). For the first scenario, finding structures similar to a query graph in the graph database is an important way to analyze the complex relationships among massive numbers of graphs. For the second scenario, finding cohesive subgraphs containing a set of query nodes is an essential way to understand the organization of many real-world graph data. We will survey the state-of-the-art methods for finding relevant structures in these two scenarios and discuss their typical applications. For the scenario of finding similar structures, we will introduce two typical similarity measures, graph edit distance and maximum common subgraph, and graph querying algorithms, including subgraph similarity query, supergraph similarity query, and graph similarity query. For the scenario of finding cohesive subgraphs, we will introduce recent cohesive graph models, including quasi-clique, k-core, k-truss, and k-edge connected component, and survey the state-of-the-art decomposition and search algorithms for these models. In addition, we show the performance of some of the above algorithms in real-world applications such as chemical compounds, protein-protein interaction networks and social networks, and discuss potential research directions.
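As a rough illustration of one of the cohesive subgraph models named above, the sketch below computes a k-core (the maximal subgraph in which every node has at least k neighbors) by repeatedly peeling off a minimum-degree node. It is a minimal sketch and not the tutorial's own algorithm: the function names `core_numbers` and `k_core` and the toy edge list are assumptions made for illustration, and the simple minimum search is quadratic rather than the linear-time bucket-based peeling used in practice.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected simple graph.

    Hypothetical helper for illustration; `edges` is an iterable of
    (u, v) pairs. Self-loops are ignored.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a remaining node of minimum current degree.
        # (Linear scan for clarity; real implementations use bucket queues.)
        node = min(remaining, key=degree.__getitem__)
        # The core number never decreases along the peeling order.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def k_core(edges, k):
    """Return the set of nodes whose core number is at least k."""
    return {node for node, c in core_numbers(edges).items() if c >= k}


# Toy graph (assumed for illustration): a triangle a-b-c with a pendant node d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(sorted(k_core(edges, 2)))  # ['a', 'b', 'c']
```

Here the pendant node d has only one neighbor, so it drops out of the 2-core, while the triangle survives because each of its nodes keeps two neighbors inside the subgraph.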
This tutorial aims to give the general PAKDD audience a comprehensive understanding of newly developed models and algorithms for relevant structure search in graph databases and of their applications in related fields.
Prerequisite Knowledge: the tutorial does not require detailed prior knowledge of graph processing technology, but basic familiarity with general graph concepts will be helpful.


Yuanyuan Zhu is currently an associate professor in the School of Computer Science at Wuhan University in China. She received the BS and MS degrees in computer science from Harbin Institute of Technology in 2007 and 2009, respectively, and the PhD degree in computer science from the Chinese University of Hong Kong in 2013. Her research interests include graph mining, graph database querying, and big graph processing. She has published a number of papers in top-tier conferences and journals, including VLDBJ, PVLDB, ICDE, and CIKM. She serves as an invited reviewer for journals including TKDE and WWW Journal, and as a program committee member for conferences including CIKM, DASFAA, and PAKDD.

Xin Huang is currently an assistant professor in the Department of Computer Science at Hong Kong Baptist University. He received his BEng degree in computer science from Xiamen University in 2010, and his PhD degree in systems engineering and engineering management from the Chinese University of Hong Kong in 2014. During 2015-2016, he worked as a postdoctoral research fellow at the University of British Columbia. His research interests mainly focus on graph data management and mining. He serves as an invited reviewer for journals including VLDBJ and TKDE, and on program committees for conferences including VLDB, KDD, ICDE, WWW, EDBT, AAAI, and SDM.