Large-scale statistical learning

 
Introduction

Large-scale statistical learning aims to develop advanced statistical methods for complex machine learning problems involving large, sparse, and multi-source data with complex relations and dynamics. Such methods are critical for real-life applications such as collaborative filtering and recommender systems, network analysis, text and document analysis, count data analysis, data mining, and natural language processing.

Classic statistical models face challenges in modeling large, sparse, and multi-source data: computing over the entire (mostly empty) data is mathematically intensive and inefficient, and multi-source relations and dynamics are poorly represented. Large-scale statistical learning instead performs the computation only on the non-missing data and combines it with efficient Bayesian inference methods.
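To make the non-missing-data computation concrete, the sketch below evaluates a matrix-factorization objective only on the observed entries of a sparse matrix. It is a minimal illustration, not one of the specific models in the references below; the matrix sizes, rank, and Gamma-distributed factors are assumptions made for the example.

  # Minimal sketch: evaluate a factorization objective on observed entries only.
  import numpy as np
  from scipy import sparse

  rng = np.random.default_rng(0)
  n_users, n_items, rank = 1_000, 2_000, 10

  # Sparse count/rating matrix: only ~0.5% of entries are observed.
  R = sparse.random(n_users, n_items, density=0.005, format="coo",
                    random_state=0,
                    data_rvs=lambda n: rng.integers(1, 6, n))

  # Latent user/item factors (e.g., Gamma-distributed, as in Poisson factorization).
  U = rng.gamma(1.0, 1.0, size=(n_users, rank))
  V = rng.gamma(1.0, 1.0, size=(n_items, rank))

  # Predictions and reconstruction error computed only for observed entries,
  # so the cost is O(nnz * rank) rather than O(n_users * n_items * rank).
  pred = np.einsum("ij,ij->i", U[R.row], V[R.col])
  sse_observed = np.sum((R.data - pred) ** 2)
  print(f"{R.nnz} observed entries, SSE on observed entries: {sse_observed:.2f}")

This indexing-by-nonzeros pattern is the basic device that keeps sparse factorization models tractable when the full matrix has millions of cells but comparatively few observations.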

 
Research Topics

There are many interesting problems and topics to be explored in large-scale statistical learning, including:

  • Large-scale Bayesian inference: developing inference techniques, such as Markov chain Monte Carlo (MCMC) and variational inference (VI), for Bayesian probabilistic models on large and sparse data;
  • Bayesian nonparametrics: reducing the computational cost of Bayesian nonparametric models, such as Dirichlet processes, Gaussian processes, and latent feature models, on large datasets;
  • Modeling count data: developing statistical models for count data with sparsity;
  • Modeling sparse data: developing statistical models for sparse data, e.g., Poisson matrix factorization (a minimal sketch follows this list);
  • Modeling multi-source data: developing statistical techniques to model relations in heterogeneous data;
  • Modeling dynamic data: developing statistical models for both static and dynamic data with temporal transitions and dynamics;
  • Modeling networking behaviors: developing statistical models for interactive networks and networking behaviors;
  • Modeling large-scale recommendation: modeling rating, item, user, and user-item relations for collaborative filtering and other large, sparse, and multi-source recommendation problems;
  • Deep Bayesian networks: developing deep/hierarchical Bayesian networks with deep learning mechanisms, e.g., attention and dropout.
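
Several of these topics meet in Gamma-Poisson matrix factorization. The sketch below is a minimal Gibbs sampler for that model, written only to show how conjugate Gamma-Poisson updates touch just the non-zero entries; the function name, priors, and hyperparameters are illustrative assumptions, and it is not an implementation of the specific models in the references below.

  # Minimal sketch of Gibbs sampling for Gamma-Poisson matrix factorization,
  # assuming y_ui ~ Poisson(theta_u . beta_i) with Gamma priors on the factors.
  import numpy as np

  def gibbs_poisson_mf(Y, K=5, a=0.3, b=0.3, c=0.3, d=0.3, n_iters=50, seed=1):
      """Gibbs sampler for a sparse count matrix Y (users x items).

      Per-iteration cost is O(nnz * K): the multinomial allocation step only
      visits the non-zero counts, while zero entries enter the Gamma updates
      through precomputed column sums of the factors.
      """
      rng = np.random.default_rng(seed)
      Y = Y.tocoo()
      n_users, n_items = Y.shape
      rows, cols, vals = Y.row, Y.col, Y.data.astype(np.int64)

      theta = rng.gamma(a, 1.0 / b, size=(n_users, K))  # user factors
      beta = rng.gamma(c, 1.0 / d, size=(n_items, K))   # item factors

      for _ in range(n_iters):
          # 1) Allocate each observed count over the K components
          #    (multinomial probabilities proportional to theta_uk * beta_ik).
          rates = theta[rows] * beta[cols]                      # (nnz, K)
          probs = rates / rates.sum(axis=1, keepdims=True)
          alloc = np.stack([rng.multinomial(v, p) for v, p in zip(vals, probs)])

          # 2) Scatter-add the allocations per user and per item.
          s_user = np.zeros((n_users, K)); np.add.at(s_user, rows, alloc)
          s_item = np.zeros((n_items, K)); np.add.at(s_item, cols, alloc)

          # 3) Conjugate Gamma updates (shape, scale = 1/rate); the rate terms
          #    use column sums of the other factor, so zeros are never enumerated.
          theta = rng.gamma(a + s_user, 1.0 / (b + beta.sum(axis=0)))
          beta = rng.gamma(c + s_item, 1.0 / (d + theta.sum(axis=0)))

      return theta, beta

For example, theta, beta = gibbs_poisson_mf(R, K=10) can be run directly on the sparse matrix R from the earlier sketch; replacing the per-entry multinomial loop with a vectorised or variational update is the usual next step when the number of observed entries grows large.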

 
Tutorials

  • Trong Dinh Thac Do, Longbing Cao and Jinjin Guo. Statistical Machine Learning: Big, Multi-source and Sparse Data with Complex Relations and Dynamics, AAAI2020.
  • Trong Dinh Thac Do and Longbing Cao. Statistical Machine Learning of Large, Sparse, and Multi-source Data, PAKDD2019 (tutorial slides).

 
References

Qing Liu, Trong Dinh Thac Do and Longbing Cao. Answer Keyword Generation for Community Question Answering by Multi-aspect Gamma-Poisson Matrix Completion, IEEE Intelligent Systems.
Trong Dinh Thac Do and Longbing Cao. Gamma-Poisson Dynamic Matrix Factorization Embedded with Metadata Influence, NIPS2018.
Trong Dinh Thac Do and Longbing Cao. Metadata-dependent Infinite Poisson Factorization for Efficiently Modelling Sparse and Large Matrices in Recommendation, IJCAI2018.
Trong Dinh Thac Do and Longbing Cao. Coupled Poisson Factorization Integrated with User/Item Metadata for Modeling Popular and Sparse Ratings in Scalable Recommendation, AAAI2018.
Xuhui Fan, Richard Yi Da Xu, Longbing Cao and Yin Song. Learning Nonparametric Relational Models by Conjugately Incorporating Node Information in a Network, IEEE Transactions on Cybernetics, DOI: 10.1109/TCYB.2016.2521376, 2016.
Xuhui Fan, Richard Yi Da Xu and Longbing Cao. Copula Mixed-Membership Stochastic Blockmodel, IJCAI2016.
Xuhui Fan, Longbing Cao and Richard Yi Da Xu. Dynamic Infinite Mixed-Membership Stochastic Blockmodel, IEEE Transactions on Neural Networks and Learning Systems, 26(9): 2072-2085, 2015.
