Data Science Lab

NeurIPS24: Revealing Distribution Discrepancy by Sampling Transfer

Revealing Distribution Discrepancy by Sampling Transfer in Unlabeled Data
Zhilin Zhao, Longbing Cao, Xuhui Fan, Wei-Shi Zheng. NeurIPS, 2024.

There are increasing cases where the class labels of test samples are unavailable, creating a significant need and challenge in measuring the discrepancy between training and test distributions. This distribution discrepancy complicates the assessment of whether the hypothesis selected by an algorithm on training samples remains applicable to test samples. We present a novel approach called Importance Divergence (I-Div) to address the challenge of test label unavailability, enabling distribution discrepancy evaluation using only training samples. I-Div transfers the sampling patterns from the test distribution to the training distribution by estimating density and likelihood ratios. Specifically, the density ratio, informed by the selected hypothesis, is obtained by minimizing the Kullback-Leibler divergence between the actual and estimated input distributions. Simultaneously, the likelihood ratio is adjusted according to the density ratio by reducing the generalization error of the distribution discrepancy as transformed through the two ratios. Experimentally, I-Div accurately quantifies the distribution discrepancy, as evidenced by a wide range of complex data scenarios and tasks.
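The idea of transferring sampling patterns through a density ratio can be illustrated with plain importance weighting. The sketch below is not the I-Div estimator: it assumes both input distributions are known one-dimensional Gaussians (so the ratio is available in closed form rather than estimated), and the squared-value "loss" is a hypothetical stand-in for a hypothesis's per-sample loss.

```python
import math
import random

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def density_ratio(x, train=(0.0, 1.0), test=(0.5, 1.0)):
    """w(x) = p_test(x) / p_train(x); both densities are assumed known here."""
    return gaussian_pdf(x, *test) / gaussian_pdf(x, *train)

def importance_weighted_mean(values, samples, ratio):
    """Self-normalized estimate of a test-distribution average from training samples."""
    weights = [ratio(x) for x in samples]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

rng = random.Random(0)
train_samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]
losses = [x * x for x in train_samples]  # hypothetical per-sample loss

plain = sum(losses) / len(losses)  # average under the training distribution
transferred = importance_weighted_mean(losses, train_samples, density_ratio)
print(f"training average: {plain:.3f}, reweighted toward test: {transferred:.3f}")
```

Under these assumptions the unweighted average approaches 1.0 (the training-distribution value) and the reweighted one approaches 1.25 (the test-distribution value), even though no test samples are drawn.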

ScholarGPS: Prof. Cao as lifetime Highly Ranked Scholar – Top 0.05%

ScholarGPS listed Prof. Cao as a lifetime Highly Ranked Scholar in Data Mining in 2022

ScholarGPS ranked Prof. Longbing Cao #34 among its Highly Ranked Scholars – Lifetime in Data Mining, which makes him No. 1 in Australia.

ScholarGPS selects the top 0.05% of scholars in each discipline or specialty as its Highly Ranked Scholars. The ranking integrates the Productivity, Quality, and Impact of 30M scholars across all disciplines.

ARC LP2023: Ethical Enterprise Representations for Personalised Sustainable Finance

2023 ARC Linkage Project: LP230201022
Ethical Enterprise Representations for Personalised Sustainable Finance
Professor Longbing Cao, Associate Professor Yin Liao, Associate Professor Di Bu, Professor Dr Vito Mollica, Dr Xuhui Fan, Professor Alberto Rossi (PI), Ms Jing Sun (PI).

The rapidly evolving field of sustainable finance requires responsible services that satisfy environmental, social and governance (ESG) criteria. This requires disruptive FinTech innovations – ethical enterprise learning from whole-of-business financial data; however, the corresponding valid theories and industrial solutions are unavailable. We aim to develop forward-looking ESG-integrated enterprise learning theories and tools to represent and analyse entire businesses and their data, and to develop novel ESG ratings and ESG-efficient investment solutions. These will advance knowledge and capabilities in enterprise AI and sustainable finance, transform financial services, and enhance Australia’s leadership in FinTech research and innovation.

Access the relevant information at the ARC grant outcome announcement webpage.

Book: Global COVID-19 Research and Modeling

Longbing Cao. Global COVID-19 Research and Modeling: A Historical Record, 1-409, ISBN: 978-981-99-9914-9, Springer, 2024.

To answer big questions such as ‘how have global scientists responded to tackling COVID-19?’ and ‘how has COVID-19 been quantified?’, our team explored 1M publications in English affiliated with 194 countries and 2M authors across 26 subjects and conducted a series of studies on COVID-19 modeling over the past 3.5 years.

This book provides answers to fundamental and challenging questions regarding the global response to COVID-19. It creates a historical record of COVID-19 research conducted over the four years of the pandemic, with a focus on how researchers have responded to, quantified, and modeled COVID-19 problems. Since mid-2021, we have diligently monitored and analyzed global scientific efforts in tackling COVID-19. Our comprehensive global endeavor involves collecting, processing, analyzing, and discovering COVID-19-related scientific literature published in English since January 2020. This provides insights into how scientists across disciplines and in almost every country and region have fought against COVID-19. Additionally, we explore the quantification of COVID-19 problems and impacts through mathematics, AI, machine learning, data science, epidemiology, and domain knowledge. The book reports findings on publication quantities, impacts, collaborations, and correlations with the economy and infections globally, regionally, and country-wide. These results represent the first and only holistic and systematic studies aimed at scientifically understanding, quantifying, and containing the pandemic. We hope this comprehensive analysis will contribute to better preparedness, response, and management of future emergencies and inspire further research in infectious diseases. The book also serves as a valuable resource for researchers, policy makers, and research policy and funding bodies involved in infectious disease management, public health, and emergency resilience.

Humanoid AI: A new era of AI and robotics & Ameca

AI and robotics are ushering in a new era – humanoid AI.

Humanoid AI is emerging as the next major advancement in humanlike and humanlevel AI, paving the way for both artificial general intelligence and artificial narrow intelligence.

Longbing Cao. AI Robots and Humanoid AI: Review, Perspectives and Directions, 1-37, 19 March, 2024.

AI-powered humanoids synergize the advancements in large language models (LLMs), large multimodal models (LMMs), generative AI, and human-level AI with humanoid robotics, omniverse, and decentralized AI, transitioning from human-looking to humane humanoids and fostering human-like robotics, a new area of AI: humanoid AI.

Humanoid AI has emerged into a human-AI-robotics-web-integrative ecosystem, revolutionizing the landscape of the intelligent digital economy, societies, and cultures. While only a limited number of humanoids are currently empowered by LLMs or driven by generative AI, humanoid AI is emerging and driving fast-paced development of real-time, interactive, and humane humanoids, with revolutionary advancements and possibilities:

Synergizing generative to humanlevel AI into humanoids
Evolving paradigm shift from humanlooking to humane and humanlike humanoids
Interacting between AI and robotics and between human systems and intelligence systems
Enabling humane and humanlevel humanoids

Our humanoid AI Ameca – a real-time, interactive and multimodal humanoid robot driven by generative AI, LLMs and LMMs

https://youtu.be/OUDPcn_7pts

Many techniques are required to enable humane and humanlevel robots, such as mechanical, material, biomedical, electrical and anthropomorphic designs. Intelligent techniques to enable humane and humanlevel robots include: (1) humanizing robots toward humane and humanlevel features, structures, functions and moral traits; (2) digitizing human features in robotics; and (3) intelligentizing robots with human intelligence in complex decentralized, distributed, or even virtualized applications and environments. Essential studies include:

building mind-to-action mindful and actionable humanoids
learning general humanoid intelligence
supporting omnimodal perception-to-behavior humanoid learning
advancing humanlike humanoids with humanlevel AI
hybridizing humanoids with humanoid animation, imitation, digital twins, metaverse and mixed reality
enabling humanoids with decentralized AI for decentralized humanoids: on-humanoid, edge and cloud humanoid systems
developing humanoid AI hardware, software and applications

Fig. 3. Decentralized humanoids: On-humanoid, edge and cloud humanoid AI framework, synergizing humans, humanoids, edge and cloud devices, algorithms and services including LLMs.

ARC DP24: Data Complexity and Uncertainty-Resilient Deep Variational Learning

2024 ARC Discovery Project DP240102050
Data Complexity and Uncertainty-Resilient Deep Variational Learning
Professor Longbing Cao and Professor Joao Gama (Partner Investigator).

Enterprise data present increasingly significant characteristics and complexities, such as multi-aspect, heterogeneous and hierarchical features and interactions, and evolving dependencies and multi-distributions. They continue to significantly challenge the state-of-the-art probabilistic and neural learning systems with limited to insufficient capabilities and capacity. This research aims to develop a theory of flexible deep variational learning transforming new deep probabilistic models with flexible variational neural mechanisms for analytically explainable, complexity-resilient analytics of real-life data. The outcomes are expected to fill important knowledge gaps and lift critical innovation competencies in wide domains.

Access the relevant information at the ARC grant outcome announcement webpage.

ARC LIEF24: Federated Omniverse Facilities for Smart Digital Futures

2024 ARC Linkage Infrastructure, Equipment and Facilities (LIEF) Project: LE240100131
Federated Omniverse Facilities for Smart Digital Futures
Professor Longbing Cao; Professor Patricia Davidson; Professor Vijay Varadharajan; Professor Jinman Kim; Professor Ping Yu; Professor Amin Beheshti; Associate Professor Quang Vinh Nguyen; Dr Sankalp Khanna (PI).

A world-first trans-disciplinary, -domain, and -institutional smart 3D omniverse R&D ecosystem AuVerse will be built in NSW, affiliated with Queensland, and accessible to academia and industry. AuVerse will support cloud-based, reality-virtuality-fused, immersive, interactive and secure future-oriented digital design, development, training and society. In the new era of digital innovation and paradigm shift, AuVerse will substantially boost Australia’s pivotal research leadership and business competitiveness in nurturing new-generation, collaborative and transformative digital R&D and talent pipeline. It will enable large-scale strategic business innovation and transformation including smart manufacturing and Industry 4.0.

Access the relevant information at the ARC grant outcome announcement webpage.

TNNLS: Explicit and Implicit Pattern Relation Analysis for Discovering Actionable Negative Sequences

Explicit and Implicit Pattern Relation Analysis for Discovering Actionable Negative Sequences.
Wei Wang, Longbing Cao. IEEE Trans Neural Netw Learn Syst, vol. 35, no. 4, pp. 5183-5197, 2024.
Access the paper at the TNNLS website.

Real-life events, behaviors, and interactions produce sequential data. An important but rarely explored problem is to analyze those nonoccurring (also called negative) yet important sequences, forming negative sequence analysis (NSA). A typical NSA area is to discover negative sequential patterns (NSPs) consisting of important nonoccurring and occurring elements and patterns. The limited existing work on NSP mining relies on frequentist and downward closure property-based pattern selection, producing large and highly redundant NSPs, nonactionable for business decision-making. This work makes the first attempt for actionable NSP discovery. It builds an NSP graph representation, quantifies both explicit occurrence and implicit nonoccurrence-based element and pattern relations, and then discovers significant, diverse, and informative NSPs in the NSP graph to represent the entire NSP set for discovering actionable NSPs. A DPP-based NSP representation and actionable NSP discovery method, EINSP, introduces novel and significant contributions to NSA and sequence analysis: 1) it represents NSPs by a determinantal point process (DPP)-based graph; 2) it quantifies actionable NSPs in terms of their statistical significance, diversity, and strength of explicit/implicit element/pattern relations; and 3) it models and measures both explicit and implicit element/pattern relations in the DPP-based NSP graph to represent direct and indirect couplings between NSP items, elements, and patterns. We substantially analyze the effectiveness of EINSP in terms of various theoretical and empirical aspects, including complexity, item/pattern coverage, pattern size and diversity, implicit pattern relation strength, and data factors.
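To make the role of the determinantal point process concrete, the sketch below scores subsets of patterns with an unnormalized L-ensemble DPP, where a subset's score is the determinant of its kernel submatrix. It is only an illustration of why DPPs favour diverse selections, not the EINSP method: the three patterns, their quality values, and the similarity matrix are hypothetical.

```python
def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def dpp_score(quality, similarity, subset):
    """Unnormalized DPP score det(L_S), with L_ij = q_i * S_ij * q_j."""
    sub = [[quality[i] * similarity[i][j] * quality[j] for j in subset]
           for i in subset]
    return determinant(sub)

# Hypothetical data: patterns 0 and 1 are near-duplicates, pattern 2 is distinct.
quality = [1.0, 1.0, 1.0]
similarity = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]

redundant_pair = dpp_score(quality, similarity, [0, 1])
diverse_pair = dpp_score(quality, similarity, [0, 2])
print(redundant_pair, diverse_pair)
```

With equal quality, the pair of dissimilar patterns receives the higher score, which is the property that lets a DPP-based representation suppress redundant patterns.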

TNNLS: eVAE: Evolutionary variational autoencoder

eVAE: Evolutionary variational autoencoder
Zhangkai Wu, Longbing Cao and Lei Qi. IEEE Trans Neural Netw Learn Syst, 2024.
Access the paper at the arXiv website.

The surrogate loss of variational autoencoders (VAEs) poses various challenges to their training, inducing an imbalance between task fitting and representation inference. To avert this, the existing strategies for VAEs focus on adjusting the tradeoff by introducing hyperparameters, deriving a tighter bound under some mild assumptions, or decomposing the loss components per certain neural settings. VAEs still suffer from uncertain tradeoff learning. We propose a novel evolutionary variational autoencoder (eVAE) building on the variational information bottleneck (VIB) theory and integrative evolutionary neural learning. eVAE integrates a variational genetic algorithm into VAE with variational evolutionary operators including variational mutation, crossover, and evolution. Its inner-outer-joint training mechanism synergistically and dynamically generates and updates the uncertain tradeoff learning in the evidence lower bound (ELBO) without additional constraints. Apart from learning a lossy compression and representation of data under the VIB assumption, eVAE presents an evolutionary paradigm to tune critical factors of VAEs and deep neural networks and addresses the premature convergence and random search problem by integrating evolutionary optimization into deep learning. Experiments show that eVAE addresses the KL-vanishing problem for text generation with low reconstruction loss, generates all disentangled factors with sharp images, and improves the image generation quality. eVAE achieves better reconstruction loss, disentanglement, and generation-inference balance than its competitors.
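The tradeoff discussed above can be pictured as a weight on the KL term of the ELBO, tuned by mutation, crossover, and selection. The following is a toy sketch under stated assumptions, not the eVAE algorithm: the weight name `beta`, the operators in `evolve_betas`, and `toy_fitness` (which simply pretends that `beta = 1` is best) are all hypothetical, and no neural network is trained.

```python
import math
import random

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def weighted_negative_elbo(reconstruction_loss, mu, log_var, beta):
    """Reconstruction term plus a beta-weighted KL term (beta = 1 is the usual ELBO)."""
    return reconstruction_loss + beta * gaussian_kl(mu, log_var)

def evolve_betas(population, fitness, rng, sigma=0.2):
    """One generation: mutate, cross over, then keep the fittest (elitist selection)."""
    mutated = [b * math.exp(rng.gauss(0.0, sigma)) for b in population]
    crossed = [0.5 * (a + b) for a, b in zip(population, reversed(population))]
    candidates = population + mutated + crossed
    candidates.sort(key=fitness, reverse=True)
    return candidates[: len(population)]

def toy_fitness(beta):
    """Hypothetical fitness; a real one would come from validation metrics."""
    return -abs(math.log(beta))

rng = random.Random(0)
betas = [0.1, 0.5, 4.0, 10.0]
for _ in range(20):
    betas = evolve_betas(betas, toy_fitness, rng)
print("best tradeoff weight after 20 generations:", betas[0])
```

Because the parents stay in the candidate pool, the best weight never gets worse from one generation to the next; in practice the fitness would be computed from the trained model rather than from a fixed formula.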

AAAI24: Frequency Spectrum is More Effective for Multimodal Representation and Fusion

Frequency Spectrum is More Effective for Multimodal Representation and Fusion: A Multimodal Spectrum Rumor Detector
An Lao, Qi Zhang, Chongyang Shi, Longbing Cao, Kun Yi, Liang Hu, Duoqian Miao. AAAI 2024.
Access the paper at the arXiv website.

Multimodal content, such as mixing text with images, presents significant challenges to rumor detection in social media. Existing multimodal rumor detection has focused on mixing tokens among spatial and sequential locations for unimodal representation or fusing clues of rumor veracity across modalities. However, they suffer from less discriminative unimodal representation and are vulnerable to intricate location dependencies in the time-consuming fusion of spatial and sequential tokens. This work makes the first attempt at multimodal rumor detection in the frequency domain, which efficiently transforms spatial features into the frequency spectrum and obtains highly discriminative spectrum features for multimodal representation and fusion. A novel Frequency Spectrum Representation and fUsion network (FSRU) with dual contrastive learning reveals the frequency spectrum is more effective for multimodal representation and fusion, extracting the informative components for rumor detection. FSRU involves three novel mechanisms: utilizing the Fourier transform to convert features in the spatial domain to the frequency domain, the unimodal spectrum compression, and the cross-modal spectrum co-selection module in the frequency domain. Substantial experiments show that FSRU achieves satisfactory multimodal rumor detection performance.
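The first step described above, moving features into the frequency domain, can be sketched with a plain discrete Fourier transform. This is an illustration only: FSRU's compression and co-selection modules are learned, whereas `top_k_spectrum` below is a hypothetical fixed filter, and the 16-value feature sequence is synthetic.

```python
import cmath
import math

def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]

def top_k_spectrum(spectrum, k):
    """Zero all but the k largest-magnitude coefficients (a crude compression)."""
    order = sorted(range(len(spectrum)), key=lambda i: abs(spectrum[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0j for i, c in enumerate(spectrum)]

n = 16
# Synthetic features: a strong component at frequency 2 and a weak one at frequency 5.
features = [
    math.sin(2 * math.pi * 2 * t / n) + 0.1 * math.cos(2 * math.pi * 5 * t / n)
    for t in range(n)
]

spectrum = dft(features)
magnitudes = [abs(c) for c in spectrum]
compressed = top_k_spectrum(spectrum, 2)
kept_bins = [i for i, c in enumerate(compressed) if c != 0]
print("kept frequency bins:", kept_bins)
```

The strong component concentrates in two bins (frequency 2 and its mirror, bin 14), so keeping only the two largest coefficients retains it and discards the weak one, which is the intuition behind selecting informative spectrum components.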