The Data Science Lab
since 2005
Anomaly and outlier detection

 
Introduction
Anomaly detection, also called outlier detection, refers to techniques for identifying abnormality, outlierness, irregularity, exceptions, inconsistencies, change (deviation, drift, shift), distortion or variation in an object (which may be an individual or a compound object) or in its behavior, dynamics or associated effects.

Outlier detection has a long history and a wide range of applications. Its theoretical foundations include, but are not limited to, mathematical and statistical analysis, representation learning, simulation and modeling, signal processing, machine learning, data mining and knowledge discovery, event analysis, behavior informatics (including non-occurring behavior analysis and sequence analysis), language processing (including negative feedback and emotion/sentiment analysis), risk analytics, and deep learning.
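The following is a minimal illustration of the statistical foundations listed above. It flags univariate outliers with the modified z-score of Iglewicz and Hoaglin. That score uses the median and the median absolute deviation (MAD) instead of the mean and standard deviation, so a single extreme value cannot inflate the scale and mask itself. The 3.5 cutoff is their commonly quoted rule of thumb, not a universal setting. The readings are made-up example data.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-scores based on the median and the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most values equal the median, so MAD gives no usable scale.
        return [0.0 for _ in values]
    # 0.6745 rescales MAD to be comparable to a standard deviation under normality.
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, threshold=3.5):
    """Return the indices whose absolute modified z-score exceeds `threshold`."""
    return [i for i, z in enumerate(robust_z_scores(values)) if abs(z) > threshold]


readings = [10, 11, 9, 10, 12, 10, 11, 95]
print(flag_outliers(readings))  # [7]
```

A plain mean/standard-deviation z-score is easier to explain but less robust. The outlier itself pulls the mean and widens the standard deviation, so it can hide in small samples.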

Outlier detection has been applied in almost every discipline and business domain, such as space and astronomical exploration and discovery, fraud detection, credit/loan risk analysis, customer risk and care analytics, health and medical diagnosis, disease and drug discovery, gene sequence analysis, product quality control, market surveillance, government regulation, business compliance, border protection and antiterrorism, and cybersecurity.

 
Research Topics

The research topics include, but are not limited to, the following areas:

  • Analyzing abnormal values, value ranges, value distributions, distributional changes, out-of-distribution data, irregular distributions, features, and objects
  • Analyzing abnormal value/feature/object couplings, interactions, relations such as dependencies, and networking
  • Analyzing abnormal actions, activities, behaviors, events and their developments
  • Analyzing negative feedback, sentiment and emotion
  • Analyzing negative intention, goals and objectives
  • Analyzing abnormal effects, consequences, impacts and risks
  • Detecting unknowns, open worlds, novelties, cold-starts, fraud, flaw, inconsistency, irregularity, change, drift, and exceptions
  • Algorithms and methods for detecting abnormal points (objects), groups and collections (collective behaviors or groups), contexts, and (evolutionary) processes
  • Algorithms and methods for detecting dynamic, transformative, networked, distributed, clustered, hierarchical anomalies and outliers
  • Non-IID anomaly detection: detecting anomalies and outliers in data that are not independent and identically distributed, i.e., data with abnormal couplings and heterogeneities
  • Deep anomaly detection: applying deep learning methods for representing and detecting anomalies and outliers
  • Non-occurring behavior analysis and negative sequence analysis: analyzing rare but important behaviors or behavior sequences that should have happened but did not
  • Anomaly detection in complex settings, data and context: e.g., high-dimensional, noisy, imbalanced, multi-sourced, multimodal, cross-domain, cross-market, nonstationary, evolving, unsupervised, semi-supervised, weakly-supervised, interactive data and contexts
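The non-IID topic above can be illustrated with a small categorical example. It compares a frequency-only score, which treats features as independent, with a simple coupling score: the average over feature pairs of log(P(a)P(b)/P(a,b)). That term is positive when two values co-occur less often than independence would predict. This is a toy sketch under those assumptions. It is not the coupling-learning algorithms in the publications listed below. The data and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def marginal_rarity(rows):
    """Frequency-only baseline: mean of -log P(value) over features (ignores couplings)."""
    n, m = len(rows), len(rows[0])
    freq = [Counter(r[j] for r in rows) for j in range(m)]
    return [sum(-log(freq[j][r[j]] / n) for j in range(m)) / m for r in rows]


def coupling_score(rows):
    """Mean over feature pairs of log(P(a) * P(b) / P(a, b)).

    The term is positive when a value pair co-occurs less often than the
    independence assumption predicts. Requires at least two features.
    """
    n, m = len(rows), len(rows[0])
    freq = [Counter(r[j] for r in rows) for j in range(m)]
    pairs = list(combinations(range(m), 2))
    joint = {(a, b): Counter((r[a], r[b]) for r in rows) for a, b in pairs}
    scores = []
    for r in rows:
        total = 0.0
        for a, b in pairs:
            p_a = freq[a][r[a]] / n
            p_b = freq[b][r[b]] / n
            p_ab = joint[(a, b)][(r[a], r[b])] / n
            total += log(p_a * p_b / p_ab)
        scores.append(total / len(pairs))
    return scores


# Hypothetical data: each value is common on its own, but "yellow apple" is a rare combination.
rows = [("red", "apple")] * 4 + [("yellow", "banana")] * 4 + [("yellow", "apple")]
coupled = coupling_score(rows)
marginal = marginal_rarity(rows)
print(max(range(len(rows)), key=coupled.__getitem__))   # 8: top-ranked by coupling score
print(min(range(len(rows)), key=marginal.__getitem__))  # 8: least rare by frequency alone
```

The last object is the only one with an unusual value combination. The coupling score ranks it first, whereas the frequency-only baseline ranks it as the least unusual.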

Figure 1 shows a taxonomy of deep anomaly detection [1].

Figure 1. A taxonomy of deep anomaly detection.
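One family of methods covered in the review [1] learns a representation of normal data and scores anomalies by reconstruction error; autoencoders are the usual example. The sketch below is deliberately not deep. It fits a single principal axis to assumed-normal data by power iteration, then scores points by their distance to that axis. Under squared loss, that axis spans in principle the same subspace that a one-unit linear autoencoder would learn. A deep autoencoder replaces this linear projection with learned nonlinear encoder and decoder networks. The data, iteration count and function names are illustrative assumptions.

```python
from math import sqrt


def fit_principal_axis(points, iters=100):
    """Return (mean, unit vector) of the leading principal component via power iteration."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    v = [1.0] * d  # assumes this start vector is not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def reconstruction_errors(points, mean, axis):
    """Distance from each point to its projection onto the fitted 1-D subspace."""
    errors = []
    for p in points:
        c = [p[j] - mean[j] for j in range(len(p))]
        t = sum(cj * aj for cj, aj in zip(c, axis))  # coordinate in the 1-D "code"
        residual = [cj - t * aj for cj, aj in zip(c, axis)]
        errors.append(sqrt(sum(r * r for r in residual)))
    return errors


# Hypothetical "normal" training data lying close to the line y = 2x.
normal = [(0, 0.1), (1, 2.0), (2, 3.9), (3, 6.1), (4, 8.0), (5, 9.9)]
mean, axis = fit_principal_axis(normal)
print(reconstruction_errors([(4.5, 9.0), (2.5, -3.0)], mean, axis))
```

The first test point follows the normal pattern and is reconstructed almost exactly. The second lies far from the learned axis and receives a much larger error, which serves as its anomaly score.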

 
Activities

  • KDD 2021 workshop on Anomaly and Novelty Detection, Explanation and Accommodation (ANDEA 2021)
  • IJCAI 2021 workshop on Artificial Intelligence for Anomalies and Novelties (AI4AN 2021)
  • IJCAI 2020 workshop on Artificial Intelligence for Anomalies and Novelties (AI4AN 2020)
  • Special Issue on Non-i.i.d. Anomaly Detection, IEEE Intelligent Systems
  • KDD 2021 tutorial on Deep Anomaly Detection and Explanation
  • WSDM 2021 tutorial on Deep Anomaly Detection and Explanation
  • Special Issue on Deep Anomaly Detection and Explanation, IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
  • KDD 2017 tutorial on Non-IID Learning in Big Data

 
Our Experience and References

Below, we briefly illustrate some of our research, projects, and publications on anomaly and outlier detection.

  • Research on anomaly/outlier detection and risk analytics

    • Non-IID outlier detection: techniques and algorithms for modeling abnormal couplings and heterogeneities between values/value groups, features/feature sets, objects/object clusters, etc.

    • Abnormal and contrastive group behaviors: techniques and algorithms for modeling abnormal and contrastive collective behaviors of associated individuals or groups

    • Cross-aspect anomaly detection: anomalies across groups, markets, domains, modalities, media, data sources, etc.

    • Abnormal impact modeling: modeling the effects and impact of abnormal objects, behaviors, actions, events, etc.

    • Risk analytics: analyzing and predicting risk factors, patterns, exceptions, risk severity, etc.

    • Non-occurring behavior analysis and negative sequence analysis: analyzing rare but important behaviors or behavior sequences that should have happened but did not

    • Anomaly detection in complex settings, data and context: e.g., high-dimensional, noisy, imbalanced, multi-sourced, multimodal, cross-domain, cross-market, nonstationary, evolving, unsupervised, semi-supervised, weakly-supervised, interactive data and contexts

  • Real-world project experience on anomaly/outlier detection and risk analytics

    • Capital markets: market surveillance (insider trading and market manipulation, IPO, etc.), exceptional algorithmic trading, cross-market manipulation, etc.

    • Financial crisis: abnormal cross-market interactions, couplings, crisis contagion, etc.

    • Banking: detecting fraud, customer churn, online phishing/malware/ID takeover, over/under credit, loan risk, etc.

    • Insurance: fraud, over/under services, 'hospital shopping', mispricing, overclaims, risk factors, etc.

    • Government: integrity, government debt, fraud, tax overclaim, incorrect income declaration, overpayment, border control risk, procurement risk, financial service risk, etc.

    • Marketing and customer relationship management: supply-demand imbalance, marketing failure, customer churn/lapse, new product/service campaign, new customer care, etc.

    • Education: student attrition, course failure, under/over-performing behavior analysis, etc.

    • Health/medical services: fraud, disease diagnosis from biochemical sample tests, medical imaging, over/under-services, inappropriate discharges/readmissions, etc.

    • Behavior: abnormal individual/collective/multi-party behavior, behavior sequence, behavior impact, behavior utility, etc.

    • Document/text/language: rule/clause/policy violation, abnormal reporting, exceptional events/statements, etc.

    • Image: image manipulation, abnormal object/region detection, etc.

  • Relevant publications

    • [1] Guansong Pang, Chunhua Shen, Longbing Cao and Anton Van Den Hengel. Deep Learning for Anomaly Detection: A Review. ACM Computing Surveys, 54(2), Article 38: 1-38, 2021. https://doi.org/10.1145/3439950

    • Guansong Pang, Longbing Cao and Ling Chen. Homophily Outlier Detection in Non-IID Categorical Data. Data Mining and Knowledge Discovery, 2021. https://doi.org/10.1007/s10618-021-00750-y

    • Guansong Pang and Longbing Cao. Heterogeneous Univariate Outlier Ensembles in Multidimensional Data. ACM Transactions on Knowledge Discovery from Data, 14(6): 1-27, 2020.

    • Guansong Pang, Longbing Cao, Ling Chen and Huan Liu. Learning Representations of Ultrahigh-dimensional Data for Random Distance-based Outlier Detection. KDD 2018.

    • Guansong Pang, Longbing Cao, Ling Chen, Defu Lian and Huan Liu. Sparse Modeling-based Sequential Ensemble Learning for Effective Outlier Detection in High-dimensional Numeric Data. AAAI 2018.

    • Guansong Pang, Hongzuo Xu, Longbing Cao and Wentao Zhao. Selective Value Coupling Learning for Detecting Outliers in High-Dimensional Categorical Data. CIKM 2017.

    • Guansong Pang, Longbing Cao, Ling Chen and Huan Liu. Learning Homophily Couplings from Non-IID Data for Joint Feature Selection and Noise-Resilient Outlier Detection. IJCAI 2017.

    • Guansong Pang, Longbing Cao and Ling Chen. Outlier Detection in Complex Categorical Data by Modelling the Feature Value Couplings. IJCAI 2016: 1902-1908.

    • Guansong Pang, Longbing Cao, Ling Chen and Huan Liu. Unsupervised Feature Selection for Outlier Detection by Modelling Hierarchical Value-Feature Couplings. ICDM 2016.

    • Longbing Cao, Philip S. Yu and Vipin Kumar. Nonoccurring Behavior Analytics: A New Area. IEEE Intelligent Systems, 30(6): 4-11, 2015.

    • Wei Cao and Longbing Cao. Financial Crisis Forecasting via Coupled Market State Analysis. IEEE Intelligent Systems, 30(2): 18-25, 2015.

    • Wei Cao, Yves Demazeau, Longbing Cao and Weidong Zhu. Financial Crisis and Global Market Couplings. DSAA 2015 (Research Track): 1-10.

    • Bo Liu, Yanshan Xiao, Philip S. Yu, Zhifeng Hao and Longbing Cao. An Efficient Approach for Outlier Detection with Imperfect Data Labels. IEEE Transactions on Knowledge and Data Engineering, 26(7): 1602-1616, 2014.

    • Yin Song, Longbing Cao, Yin Junfu and Wang Cheng. Extracting Discriminative Features for Identifying Abnormal Sequences in One-class Mode. IJCNN 2013.

    • Wei Cao, Longbing Cao and Yin Song. Coupled Market Behavior Based Financial Crisis Detection. IJCNN 2013.

    • Longbing Cao, Yuming Ou and Philip S. Yu. Coupled Behavior Analysis with Applications (extension of the KDD 2010 paper). IEEE Transactions on Knowledge and Data Engineering, 24(8): 1378-1392, 2012.

    • Yin Song, Longbing Cao, et al. Coupled Behavior Analysis for Capturing Coupling Relationships in Group-based Market Manipulation. KDD 2012.

    • Yin Song and Longbing Cao. Graph-based Coupled Behavior Analysis: A Case Study on Detecting Collaborative Manipulations in Stock Markets. IJCNN 2012.

    • Wanqi Yang, Yang Gao and Longbing Cao. TRASMIL: A Local Anomaly Detection Framework Based on Trajectory Segmentation and Multi-instance Learning. Computer Vision and Image Understanding, 117(10): 1273-1286, 2013.

    • Bo Liu, Yanshan Xiao and Longbing Cao. SVDD-Based Outlier Detection on Uncertain Data. Knowledge and Information Systems.

    • Wei Wei, Jinjiu Li, Longbing Cao, Yuming Ou and Jiahang Chen. Effective Detection of Sophisticated Online Banking Fraud on Extremely Imbalanced Data. World Wide Web, 2012.

    • Huaifeng Zhang, Yanchang Zhao, Longbing Cao, Chengqi Zhang and Hans Bohlscheid. Customer Activity Sequence Classification for Debt Prevention in Social Security. Journal of Computer Science and Technology, 24(6): 1000-1009, 2009.

    • Bo Liu, Jie Yin, Yanshan Xiao, Longbing Cao and Philip S. Yu. Exploiting Local Data Uncertainty to Boost Global Outlier Detection. ICDM 2010.

    • Longbing Cao, Yuming Ou, Philip S. Yu and Gang Wei. Detecting Abnormal Coupled Sequences and Sequence Changes in Group-based Manipulative Trading Behaviors. KDD 2010: 85-94.

    • Huaifeng Zhang, Yanchang Zhao, Longbing Cao, Chengqi Zhang and Hans Bohlscheid. Rare Class Association Rule Mining with Multiple Imbalanced Attributes. In: Rare Association Rule Mining and Knowledge Discovery: Technologies for Infrequent and Critical Event Detection. Information Science Reference, 2009.

    • Yanshan Xiao, Bo Liu, Longbing Cao, Xindong Wu, Chengqi Zhang, Zhifeng Hao, Fengzhao Yang and Jie Cao. Multi-sphere Support Vector Data Description for Outliers Detection on Multi-distribution Data. ICDM Workshops on Domain Driven Data Mining 2009: 82-87.

    • Yanchang Zhao, Huaifeng Zhang, Shanshan Wu, Jian Pei, Longbing Cao, Chengqi Zhang and Hans Bohlscheid. Debt Detection in Social Security by Sequence Classification Using Both Positive and Negative Patterns. ECML/PKDD 2009: 648-663.

    • Yanchang Zhao, Huaifeng Zhang, Longbing Cao, Chengqi Zhang and Hans Bohlscheid. Efficient Mining of Event-Oriented Negative Sequential Rules. WI 2008: 336-342.

    • Chao Luo, Yanchang Zhao, Longbing Cao, Yuming Ou and Li Liu. Outlier Mining on Multiple Time Series Data in Stock Market. PRICAI 2008: 1010-1015.

About us
School of Computing, Faculty of Science and Engineering, Macquarie University, Australia
Level 3, 4 Research Park Drive, Macquarie University, NSW 2109, Australia
Tel: +61-2-9850 9583
Staff: firstname.surname(a)mq.edu.au
Students: firstname.surname(a)student.mq.edu.au
Contacts@datasciences.org