Data Science Lab

Hire: Postdoc, PhD and visiting student/scholar opportunities

The Data Science Lab is hiring!

We are interested in candidates at all levels, from Ph.D. and Master by Research (MRes) students and post-doctoral fellows to visiting Ph.D. students and visiting scholars, with a demonstrated strong track record in AI and data science research and enterprise innovation. We welcome candidates who are motivated to produce high-quality research on original AI and data science problems or high-impact innovation for enterprise AI and data science.

New: Two Research Fellow/Associate Positions

Successful candidates will work with Prof Cao on Australian Research Council Future Fellow, Discovery and LIEF grants:

Postdoctoral Research Associates – Data Science, apply at SEEK.

PhD Scholarships: AI/Data Science/Machine Learning Frontiers

Ph.D. scholarships and postdoctoral fellowships are available for talented and well-motivated students and fellows to study high-quality and original AI, data science and machine learning problems. You are expected to work in the Data Science Lab, which has a strong research culture and demonstrated impact, to address foundational issues in deeply and completely understanding complex data, behaviors, and systems, and to create cutting-edge and significant theories, algorithms, models and tools for advanced AI, data science, advanced analytics, shallow and deep machine learning, and applied statistics.

Research topics of interest include but are not limited to:
• Applied statistics and modeling
• Advanced optimization theories
• Shallow-to-deep non-IID learning
• Shallow-to-deep representation and learning of complex data, behaviors and systems
• Deep Bayesian learning and deep variational learning
• Representation and learning of poor-quality data, behaviors and systems
• Representation and learning of ultrahigh dimensional data, behaviors and systems
• Representation and learning of complex relations, couplings and interactions
• Representation and learning of hierarchical and dynamic heterogeneities
• Learning dynamic and high-dimensional uncertainties and dependencies
• Deep modeling of complex occurring and non-occurring behaviors
• Learning semantic and syntactic complexities in natural language, text and documents
• Large multimodal models
• Humanoid AI and AI robots

PhD Scholarships: Enterprise AI/Data Science/Machine Learning Innovation

Ph.D. scholarships are available for talented and well-motivated students who aim for high business, economic, social and environmental impact on innovative AI, data science and machine learning applications. You are expected to work in the Data Science Lab with a strong research culture and demonstrated impact to address practical issues in deeply and completely understanding real-world data, behaviors, and problems. Your Ph.D. or MRes will involve a certain proportion of onsite work and hands-on practice in major industry and government organizations to create novel and actionable algorithms, tools and systems for enterprise innovation in advanced AI, data science, advanced analytics, machine learning, and behavior informatics.

Research topics of interest include but are not limited to:
• Business, social, economic analytics and informatics
• Behavioral AI, analytics, computing and informatics
• Enterprise data representation and learning
• Cloud analytics of enterprise complex data
• Learning low and poor-quality enterprise data
• Multi-source and multimodal data analytics
• Analyzing large-scale, dynamic, high-dimensional and sparse data
• Modeling and analytics of complex behaviors
• Actionable and explainable machine learning for enterprise
• Analyzing, querying and recommending text and documents
• Long/short text-based semantic representation for query-answering and chatbot
• Detecting complex anomalies, frauds, exceptions, and risks
• Risk and compliance analytics of enterprise problems
• AI in FinTech and financial data analytics
• AI in digital health and care
• Enterprise data science in the public sector
• Enterprise data science in private sectors

Intake, Scholarship, and Inquiries

These positions are subject to funding availability and candidate qualifications, and are sponsored by Australian government research funding, the university, or major industry sponsors. Excellent domestic students are encouraged to apply. The Ph.D. scholarship is tax-free at more than $32k p.a. for three years, with one possible extension of six months. Students who work on industry-sponsored projects may receive an additional top-up scholarship, depending on their performance. The postdoctoral fellowship may be at Level A or Level B, depending on your track record.

Interested candidates should have a solid background in statistics, AI, machine learning, pattern recognition, and data science and analytics, together with programming capabilities, a high GPA, and research capabilities demonstrated by publications in major conferences and/or journals in the relevant areas. You are welcome to send us the following materials for consideration:

• Your full resume
• Your research plan (start/end dates, research significance, issues, tasks, methods, expected outcomes, etc.)
• Your publications (including three selected papers with your major contributions, if applicable)
• The connection between your research plan and the relevant work in the Lab

More information about scholarships and applications can be found at https://www.mq.edu.au/research/phd-and-research-degrees/how-to-apply/scholarship-opportunities.

Your inquiries and initial application materials can be sent to contacts@datasciences.org.

ScholarGPS: Prof. Cao as Highly Ranked Scholar – Top 0.05%

ScholarGPS listed Prof. Cao as a Highly Ranked Scholar in Data Mining in 2022.

ScholarGPS ranked Prof. Longbing Cao #34 among Highly Ranked Scholars – Lifetime in Data Mining, which makes him No. 1 in Australia.

ScholarGPS selects the top 0.05% of scholars in each discipline or specialty as its Highly Ranked Scholars. Its ranking integrates the Productivity, Quality, and Impact of 30M scholars across all disciplines.

ARC LP2023: Ethical Enterprise Representations for Personalised Sustainable Finance

2023 ARC Linkage Project: LP230201022
Ethical Enterprise Representations for Personalised Sustainable Finance
Professor Longbing Cao, Associate Professor Yin Liao, Associate Professor Di Bu, Professor Dr Vito Mollica, Dr Xuhui Fan, Professor Alberto Rossi (PI), Ms Jing Sun (PI).

The rapidly evolving field of sustainable finance requires responsible services that satisfy environmental, social and governance (ESG) criteria. This requires disruptive FinTech innovations – ethical enterprise learning from whole-of-business financial data; however, the corresponding valid theories and industrial solutions are unavailable. We aim to develop forward-looking ESG-integrated enterprise learning theories and tools to represent and analyse entire businesses and data and develop novel ESG ratings and ESG-efficient investment solutions. These will advance knowledge and capabilities in enterprise AI and sustainable finance, transform financial services, and enhance Australia’s leadership in FinTech research and innovation.

Access the relevant information at the ARC grant outcome announcement webpage.

Book: Global COVID-19 Research and Modeling

Longbing Cao. Global COVID-19 Research and Modeling: A Historical Record, 1-409, ISBN: 978-981-99-9914-9, Springer, 2024.

To answer big questions like ‘how have global scientists responded to tackling COVID-19?’ and ‘how has COVID-19 been quantified?’, our team explored 1M publications in English affiliated with 194 countries and 2M authors across 26 subjects, and conducted a series of studies on COVID-19 modeling over the past 3.5 years.

This book provides answers to fundamental and challenging questions regarding the global response to COVID-19. It creates a historical record of COVID-19 research conducted over the four years of the pandemic, with a focus on how researchers have responded to, quantified, and modeled COVID-19 problems. Since mid-2021, we have diligently monitored and analyzed global scientific efforts in tackling COVID-19. Our comprehensive global endeavor involves collecting, processing, and analyzing COVID-19 related scientific literature published in English since January 2020, and discovering insights from it. This provides insights into how scientists across disciplines and in almost every country and region have fought against COVID-19. Additionally, we explore the quantification of COVID-19 problems and impacts through mathematics, AI, machine learning, data science, epidemiology, and domain knowledge. The book reports findings on publication quantities, impacts, collaborations, and correlations with the economy and infections globally, regionally, and country-wide. These results represent the first and only holistic and systematic studies aimed at scientifically understanding, quantifying, and containing the pandemic. We hope this comprehensive analysis will contribute to better preparedness, response, and management of future emergencies and inspire further research in infectious diseases. The book also serves as a valuable resource for researchers, policy makers, and research policy and funding bodies involved in infectious disease management, public health, and emergency resilience.

Humanoid AI: A new era of AI and robotics & Ameca

AI and robotics are ushering in a new era – humanoid AI.

Humanoid AI is emerging as the next major advancement in humanlike and humanlevel AI, paving the way for both artificial general intelligence and artificial narrow intelligence.

Longbing Cao. AI Robots and Humanoid AI: Review, Perspectives and Directions, 1-37, 19 March, 2024.

AI-powered humanoids synergize the advancements in large language models (LLMs), large multimodal models (LMMs), generative AI, and human-level AI with humanoid robotics, omniverse, and decentralized AI, transitioning from human-looking to humane humanoids and fostering human-like robotics, a new area of AI: humanoid AI.

Humanoid AI has emerged into a human-AI-robotics-web-integrative ecosystem, revolutionizing the landscape of the intelligent digital economy, societies, and cultures. While only a limited number of humanoids are currently empowered by LLMs or driven by generative AI, humanoid AI is emerging and driving fast-paced development of real-time, interactive, and humane humanoids, with revolutionary advancements and possibilities:

• Synergizing generative to humanlevel AI into humanoids
• Evolving paradigm shift from humanlooking to humane and humanlike humanoids
• Interacting between AI and robotics and between human systems and intelligence systems
• Enabling humane and humanlevel humanoids

Our humanoid AI Ameca – a real-time, interactive and multimodal humanoid robot driven by generative AI, LLMs and LMMs

https://youtu.be/OUDPcn_7pts

Many techniques are required to enable humane and humanlevel robots, such as mechanical, material, biomedical, electrical and anthropomorphic designs. Intelligent techniques to enable humane and humanlevel robots include: (1) humanizing robots toward humane and humanlevel features, structures, functions and moral traits; (2) digitizing human features in robotics; and (3) intelligentizing robots with human intelligence in complex decentralized, distributed, or even virtualized applications and environments. Essential studies include:

• building mind-to-action mindful and actionable humanoids
• learning general humanoid intelligence
• supporting omnimodal perception-to-behavior humanoid learning
• advancing humanlike humanoids with humanlevel AI
• hybridizing humanoids with humanoid animation, imitation, digital twins, metaverse and mixed reality
• enabling humanoids with decentralized AI for decentralized humanoids: on-humanoid, edge and cloud humanoid systems
• developing humanoid AI hardware, software and applications

Fig. 3. Decentralized humanoids: On-humanoid, edge and cloud humanoid AI framework, synergizing humans, humanoids, edge and cloud devices, algorithms and services including LLMs.

ARC DP24: Data Complexity and Uncertainty-Resilient Deep Variational Learning

2024 ARC Discovery Project DP240102050
Data Complexity and Uncertainty-Resilient Deep Variational Learning
Professor Longbing Cao and Professor Joao Gama (Partner Investigator).

Enterprise data present increasingly significant characteristics and complexities, such as multi-aspect, heterogeneous and hierarchical features and interactions, and evolving dependencies and multi-distributions. They continue to significantly challenge the state-of-the-art probabilistic and neural learning systems with limited to insufficient capabilities and capacity. This research aims to develop a theory of flexible deep variational learning transforming new deep probabilistic models with flexible variational neural mechanisms for analytically explainable, complexity-resilient analytics of real-life data. The outcomes are expected to fill important knowledge gaps and lift critical innovation competencies in wide domains.

Access the relevant information at the ARC grant outcome announcement webpage.

ARC LIEF24: Federated Omniverse Facilities for Smart Digital Futures

2024 ARC Linkage Infrastructure, Equipment and Facilities (LIEF) Project: LE240100131
Federated Omniverse Facilities for Smart Digital Futures
Professor Longbing Cao; Professor Patricia Davidson; Professor Vijay Varadharajan; Professor Jinman Kim; Professor Ping Yu; Professor Amin Beheshti; Associate Professor Quang Vinh Nguyen; Dr Sankalp Khanna (PI).

A world-first trans-disciplinary, -domain, and -institutional smart 3D omniverse R&D ecosystem AuVerse will be built in NSW, affiliated with Queensland, and accessible to academia and industry. AuVerse will support cloud-based, reality-virtuality-fused, immersive, interactive and secure future-oriented digital design, development, training and society. In the new era of digital innovation and paradigm shift, AuVerse will substantially boost Australia’s pivotal research leadership and business competitiveness in nurturing new-generation, collaborative and transformative digital R&D and talent pipeline. It will enable large-scale strategic business innovation and transformation including smart manufacturing and Industry 4.0.

Access the relevant information at the ARC grant outcome announcement webpage.

TNNLS: Explicit and Implicit Pattern Relation Analysis for Discovering Actionable Negative Sequences

Explicit and Implicit Pattern Relation Analysis for Discovering Actionable Negative Sequences.
Wei Wang, Longbing Cao. IEEE Trans Neural Netw Learn Syst, vol. 35, no. 4, pp. 5183-5197, 2024.
Access the paper at the TNNLS website.

Real-life events, behaviors, and interactions produce sequential data. An important but rarely explored problem is to analyze those nonoccurring (also called negative) yet important sequences, forming negative sequence analysis (NSA). A typical NSA area is to discover negative sequential patterns (NSPs) consisting of important nonoccurring and occurring elements and patterns. The limited existing work on NSP mining relies on frequentist and downward closure property-based pattern selection, producing large and highly redundant NSPs, nonactionable for business decision-making. This work makes the first attempt at actionable NSP discovery. It builds an NSP graph representation, quantifies both explicit occurrence and implicit nonoccurrence-based element and pattern relations, and then discovers significant, diverse, and informative NSPs in the NSP graph to represent the entire NSP set for discovering actionable NSPs. A DPP-based NSP representation and actionable NSP discovery method, EINSP, introduces novel and significant contributions to NSA and sequence analysis: 1) it represents NSPs by a determinantal point process (DPP)-based graph; 2) it quantifies actionable NSPs in terms of their statistical significance, diversity, and strength of explicit/implicit element/pattern relations; and 3) it models and measures both explicit and implicit element/pattern relations in the DPP-based NSP graph to represent direct and indirect couplings between NSP items, elements, and patterns. We substantially analyze the effectiveness of EINSP in terms of various theoretical and empirical aspects, including complexity, item/pattern coverage, pattern size and diversity, implicit pattern relation strength, and data factors.
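
To give a feel for why a determinantal point process favors a small, diverse subset over a large, redundant one, here is a minimal sketch of generic greedy DPP selection. It is not the EINSP method from the paper: the quality scores, the similarity matrix, and the function names `build_kernel` and `greedy_dpp` are all hypothetical, chosen only for illustration. The kernel is the standard L-ensemble form, quality times similarity times quality, and a subset is scored by the determinant of its sub-kernel.

```python
from typing import List


def determinant(matrix: List[List[float]]) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def build_kernel(quality: List[float], similarity: List[List[float]]) -> List[List[float]]:
    """L-ensemble kernel: L[i][j] = quality[i] * similarity[i][j] * quality[j]."""
    n = len(quality)
    return [[quality[i] * similarity[i][j] * quality[j] for j in range(n)]
            for i in range(n)]


def greedy_dpp(kernel: List[List[float]], k: int) -> List[int]:
    """Greedily add the item that maximizes det of the selected sub-kernel."""
    selected: List[int] = []
    while len(selected) < k:
        best_item, best_det = None, 0.0
        for i in range(len(kernel)):
            if i in selected:
                continue
            subset = selected + [i]
            sub = [[kernel[r][c] for c in subset] for r in subset]
            d = determinant(sub)
            if d > best_det + 1e-12:
                best_item, best_det = i, d
        if best_item is None:
            break  # no remaining item adds positive volume
        selected.append(best_item)
    return selected


# Hypothetical toy data: patterns 0 and 1 are near-duplicates, pattern 2 differs.
quality = [1.0, 0.9, 0.8]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
print(greedy_dpp(build_kernel(quality, similarity), 2))  # prints [0, 2]
```

In this toy example the second pick is pattern 2 rather than the higher-quality pattern 1, because pattern 1 is almost identical to the already selected pattern 0 and so adds little to the determinant.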

TNNLS: eVAE: Evolutionary variational autoencoder

eVAE: Evolutionary variational autoencoder
Zhangkai Wu, Longbing Cao and Lei Qi. IEEE Trans Neural Netw Learn Syst, 2024.
Access the paper at the arXiv website.

The surrogate loss of variational autoencoders (VAEs) poses various challenges to their training, inducing the imbalance between task fitting and representation inference. To avert this, the existing strategies for VAEs focus on adjusting the tradeoff by introducing hyperparameters, deriving a tighter bound under some mild assumptions, or decomposing the loss components per certain neural settings. VAEs still suffer from uncertain tradeoff learning. We propose a novel evolutionary variational autoencoder (eVAE) building on the variational information bottleneck (VIB) theory and integrative evolutionary neural learning. eVAE integrates a variational genetic algorithm into VAE with variational evolutionary operators including variational mutation, crossover, and evolution. Its inner-outer-joint training mechanism synergistically and dynamically generates and updates the uncertain tradeoff learning in the evidence lower bound (ELBO) without additional constraints. Apart from learning a lossy compression and representation of data under the VIB assumption, eVAE presents an evolutionary paradigm to tune critical factors of VAEs and deep neural networks and addresses the premature convergence and random search problem by integrating evolutionary optimization into deep learning. Experiments show that eVAE addresses the KL-vanishing problem for text generation with low reconstruction loss, generates all disentangled factors with sharp images, and improves the image generation quality, respectively. eVAE achieves better reconstruction loss, disentanglement, and generation-inference balance than its competitors.
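
The tradeoff mentioned in the abstract is between the reconstruction term and the KL term of the ELBO. As a rough illustration of the fixed-hyperparameter baseline that the abstract contrasts itself with, the sketch below weights the KL term by a constant `beta`, as in beta-VAE style objectives. It does not implement eVAE's evolutionary operators, and the function names and input values are hypothetical. It assumes a diagonal Gaussian posterior and a standard normal prior, for which the KL divergence has a closed form.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def weighted_negative_elbo(recon_loss: float,
                           mu: Sequence[float],
                           log_var: Sequence[float],
                           beta: float = 1.0) -> float:
    """Reconstruction loss plus beta times the KL term.

    beta = 1 gives the usual negative ELBO; other values shift the
    balance between fitting the data and regularizing the posterior.
    """
    return recon_loss + beta * gaussian_kl(mu, log_var)


# Hypothetical values for one data point with a 2-dimensional latent code.
mu, log_var = [1.0, 0.0], [0.0, 0.0]
print(gaussian_kl(mu, log_var))                            # prints 0.5
print(weighted_negative_elbo(2.0, mu, log_var, beta=4.0))  # prints 4.0
```

With a hand-set `beta`, the balance is fixed for the whole training run, which is the kind of static tradeoff the abstract says eVAE replaces with dynamically updated learning.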

AAAI24: Frequency Spectrum is More Effective for Multimodal Representation and Fusion

Frequency Spectrum is More Effective for Multimodal Representation and Fusion: A Multimodal Spectrum Rumor Detector
An Lao, Qi Zhang, Chongyang Shi, Longbing Cao, Kun Yi, Liang Hu, Duoqian Miao. AAAI 2024.
Access the paper at the arXiv website.

Multimodal content, such as mixing text with images, presents significant challenges to rumor detection in social media. Existing multimodal rumor detection has focused on mixing tokens among spatial and sequential locations for unimodal representation or fusing clues of rumor veracity across modalities. However, they suffer from less discriminative unimodal representation and are vulnerable to intricate location dependencies in the time-consuming fusion of spatial and sequential tokens. This work makes the first attempt at multimodal rumor detection in the frequency domain, which efficiently transforms spatial features into the frequency spectrum and obtains highly discriminative spectrum features for multimodal representation and fusion. A novel Frequency Spectrum Representation and fUsion network (FSRU) with dual contrastive learning reveals the frequency spectrum is more effective for multimodal representation and fusion, extracting the informative components for rumor detection. FSRU involves three novel mechanisms: utilizing the Fourier transform to convert features in the spatial domain to the frequency domain, the unimodal spectrum compression, and the cross-modal spectrum co-selection module in the frequency domain. Substantial experiments show that FSRU achieves satisfactory multimodal rumor detection performance.
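
The first step named in the abstract, converting features to the frequency domain with the Fourier transform, can be sketched with a plain discrete Fourier transform. The example below is a generic illustration under simplifying assumptions, not the FSRU network: it uses a one-dimensional feature vector, and the `keep_top_k` function is a hypothetical stand-in that keeps the largest-magnitude components, whereas the paper's spectrum compression and co-selection modules are learned.

```python
import cmath
from typing import List, Sequence


def dft(signal: Sequence[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def keep_top_k(spectrum: Sequence[complex], k: int) -> List[complex]:
    """Zero out all but the k largest-magnitude frequency components."""
    order = sorted(range(len(spectrum)),
                   key=lambda i: abs(spectrum[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0j for i, c in enumerate(spectrum)]


# Hypothetical feature vector: a constant signal has all its energy at frequency 0.
features = [1.0, 1.0, 1.0, 1.0]
spectrum = dft(features)
print([round(abs(c), 6) for c in spectrum])  # prints [4.0, 0.0, 0.0, 0.0]
print(keep_top_k(spectrum, 1)[0].real)       # prints 4.0
```

Here a four-value feature vector is summarized by a single nonzero frequency component, which hints at why a spectrum can be a compact representation when the informative content is concentrated in a few frequencies.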