Zhilin Zhao. Statistical Methods for Out-of-distribution Detection, April 2023, UTS
For a network trained on in-distribution (ID) samples, test samples may be out-of-distribution (OOD), i.e., drawn from distributions different from that of the ID samples. Accordingly, OOD detection aims to identify OOD samples in the test phase. The main challenge is that a network can provide high-confidence predictions for OOD samples, which indicates that it cannot distinguish ID from OOD samples. The main causes of this high-confidence issue are the limited number of ID samples and the unavailability of OOD samples during training. One strategy to enhance the detection performance of a network is to make its outputs more sensitive to OOD samples, i.e., the network tends to provide high-confidence predictions for ID samples and low-confidence predictions for OOD samples.
Improving the OOD sensitivity of a network requires addressing a series of important problems and challenges: (1) Penalizing OOD samples with high-confidence predictions can improve the OOD sensitivity. Accordingly, how can specific OOD samples be generated for a network? (2) If some OOD samples are observed, how can they be involved in the retraining process to balance ID generalization and OOD detection? (3) If OOD samples are unavailable, how can a network be fine-tuned with augmented ID samples to improve the OOD sensitivity? (4) If modifying the network is not allowed, how can an auxiliary network be learned to capture the OOD-sensitive information for the network?
This thesis systematically studies how to effectively solve the aforementioned issues with experimental and theoretical support. Due to the significant difference between ID and OOD samples, it is essential to consider the data characteristics and data correlations that statistical methods can model. Accordingly, this thesis attempts to incorporate statistical methods into deep neural networks to improve the OOD sensitivity. Specifically, this thesis proposes four novel methods to address these issues. The main ideas include inferring an implicit generator based on the Shannon entropy to generate high-confidence OOD samples, constructing adaptive supervision information for OOD samples to minimize the disruption to learning to classify ID samples, exploring the data space around ID samples to construct the vicinity distributions for OOD samples, and utilizing an auxiliary network to explore the discarded OOD-sensitive information in ID samples according to information bottleneck theory.
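As a rough illustration of what "confidence" and "Shannon entropy" mean for a classifier's output, the sketch below scores a prediction by its maximum softmax probability and by the entropy of its softmax distribution, and flags a sample as OOD when the entropy exceeds a threshold. This is a generic baseline under stated assumptions, not any of the four methods proposed in the thesis; the function names, the example logits, and the threshold value are all hypothetical.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def max_confidence(probs):
    """Confidence of a prediction: the largest class probability."""
    return max(probs)

def shannon_entropy(probs):
    """Shannon entropy in nats; higher means a less confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def is_ood(logits, entropy_threshold):
    """Flag a sample as OOD when its predictive entropy exceeds the threshold.

    The threshold is a hypothetical tuning parameter, e.g. chosen on held-out
    ID data.
    """
    return shannon_entropy(softmax(logits)) > entropy_threshold

# Hypothetical logits: a peaked (confident) and a flat (uncertain) prediction.
peaked = [8.0, 0.5, 0.2]
flat = [1.0, 1.1, 0.9]

for name, logits in [("peaked", peaked), ("flat", flat)]:
    probs = softmax(logits)
    print(name, round(max_confidence(probs), 4), round(shannon_entropy(probs), 4),
          is_ood(logits, entropy_threshold=0.5))
```

Such a detector only works if the network actually produces flat outputs on OOD inputs. The high-confidence issue described above is precisely the case where an OOD sample yields a peaked output and slips under the threshold.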
Qi Zhang. Non-IID Learning for Recommendation, Time Series and Hashing, May 2023, UTS
With the prevalence of information technology, a huge amount of data emerges every day from various domains and pervades applications in daily living, studying, working, and entertainment. In this big data era, data learning, which plays a major role in transforming the thinking of data science, has come to dominate research communities and business applications. Meanwhile, the increasing complexities of real-world data, e.g., heterogeneity and coupling relationships, severely challenge existing data learning methodologies and techniques and may seriously limit their applicability and feasibility.
For several decades, the independent and identically distributed (IID) assumption has laid the foundation of data learning, simplifying real-world data’s intricate nature to achieve approximate, tractable, and asymptotic problem-solving. Unfortunately, real-world scenarios generally go beyond the IID assumption and require specific knowledge and capability to address practical problems and challenges, where IID may show significant limitations and gaps. Broad-reaching non-IID thinking explores and exploits the intrinsic heterogeneities and couplings of real-world data, and it has become increasingly attractive and prevalent in data learning research and applications. However, non-IIDness shows diversified properties in different data scenarios, for example, heterogeneities in data types, attributes, and sources, and couplings within and between structures, distributions, and variables. The field is far from reaching a unified non-IID learning paradigm for addressing various real-world heterogeneities and couplings. More importantly, it is also extremely challenging to exhaustively tailor non-IID data learning methodologies to specific scenarios and applications.
In this thesis, I explore non-IID learning in terms of different applications, specifically recommender systems, multivariate time series (MTS) analysis, and learning to hash, to inform non-IID methodologies and techniques. The carefully chosen applications and scenarios pervade our daily living, studying, working, and entertainment activities, and cover various tasks of classification, ranking, representation, and retrieval. Accordingly, the main research objectives include modeling and learning the non-IIDness in recommender systems, MTS analysis, and learning to hash, respectively, and delivering non-IID models that effectively handle scenarios with both IID and non-IID data.
(1) To build non-IID recommender systems, we approach the problem from two aspects: 1) learning user/item/context feature couplings, and 2) modeling rating distribution heterogeneity. First, we analyze the user/item/context coupling relationships and their influence on user actions, and then build a neural time-aware recommendation model with a specified feature interaction network to factorize the pairwise couplings between users, items, and temporal context. Second, we analyze the potential rating generation process, which intrinsically determines the rating distribution heterogeneity. Accordingly, we propose a tripartite collaborative filtering framework and instantiate a tripartite probabilistic matrix factorization to model the rating generation and eliminate the distribution bias, thereby debiasing rating estimation.
(2) To perform non-IID MTS forecasting, we jointly model inter- and intra-series coupling relationships and inter-series heterogeneities. Specifically, we first propose a non-IID MTS forecasting model integrating spectral clustering and Transformer. The model introduces a spectral clustering network that adaptively learns to segregate heterogeneous time series and a clusterwise forecasting network with multi-channel Transformers to model intra- and inter-series couplings. In addition, we revisit the coupling relationships from the perspective of mutual information and propose a deep coupling network that introduces a coupling module to explicitly model variable relationships and a coupling representation module to encode high-order coupling patterns.
(3) To model non-IID learning to hash, we aim to 1) preserve the couplings between inputs and hash codes, and 2) address the issues of high dimensionality and heterogeneity. First, we study the impracticality of conventional code balance constraints and then introduce probabilistic code constraints to improve hash quality by guaranteeing the mutual informativeness between inputs and their hash codes. Second, we apply deep supervised hashing to high-dimensional and heterogeneous data and propose a deep hashing network to learn similarity-preserving hash codes for efficient case retrieval and deliver deep-hashing-enabled case-based reasoning. The network introduces position embedding to represent heterogeneous features and utilizes a multilinear interaction layer to effectively filter out zero-valued features, addressing the sparsity issues and capturing feature couplings.
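For readers unfamiliar with learning to hash, the sketch below shows the two generic notions that item (3) builds on: retrieval by Hamming distance between binary hash codes, and the conventional code balance idea that each bit should be on for about half of the items. It does not implement the probabilistic code constraints or the deep hashing network described above; the 4-bit codes, the item keys, and the function names are hypothetical.

```python
def hamming(a, b):
    """Hamming distance between two hash codes stored as integers."""
    return bin(a ^ b).count("1")

def retrieve(query, database, k):
    """Return the keys of the k items whose codes are closest to the query.

    Ties are broken by key so that the ordering is deterministic.
    """
    ranked = sorted(database.items(),
                    key=lambda kv: (hamming(query, kv[1]), kv[0]))
    return [key for key, _ in ranked[:k]]

def bit_balance(codes, n_bits):
    """Fraction of codes with each bit set (bit 0 = least significant).

    A conventional balance constraint asks each fraction to be close to 0.5.
    """
    n = len(codes)
    return [sum((c >> i) & 1 for c in codes) / n for i in range(n_bits)]

# Hypothetical 4-bit codes, as if produced by some learned hash function.
database = {"a": 0b1010, "b": 0b1011, "c": 0b0101, "d": 0b0110}

print(retrieve(0b1010, database, k=2))
print(bit_balance(list(database.values()), n_bits=4))
```

Because comparing codes reduces to an XOR and a bit count, retrieval stays cheap even for large databases, which is the usual motivation for hashing high-dimensional inputs in the first place.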
Thorough empirical evaluations have been conducted on real-world datasets to compare our proposed methods with state-of-the-art approaches. The results show that our non-IID modeling methods effectively address real-world coupling and heterogeneity issues in various complex data and significantly benefit the corresponding applications.