The Data Science Lab
since 2005
NeurIPS23: R-divergence for Estimating Model-oriented Distribution Discrepancy

Zhilin Zhao, Longbing Cao. NeurIPS, 2023.

Real-life data are non-IID owing to complex distributions and interactions, and learning models differ in how sensitive they are to the distribution of their samples. Accordingly, a fundamental question for a supervised or unsupervised model is whether the probability distributions of two given datasets can be treated as identical. To address this question, we propose R-divergence, a measure of model-oriented distribution discrepancy. The insight is that two distributions are likely identical if their optimal hypothesis yields the same expected risk on each of them. To estimate the discrepancy between two given datasets, R-divergence therefore learns a minimum-risk hypothesis on their mixture and then evaluates the difference in empirical risk between the two datasets. We evaluate the test power of R-divergence on a range of unsupervised and supervised tasks, where it achieves state-of-the-art performance. We also apply R-divergence to train robust neural networks on samples with noisy labels.
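The recipe in the abstract — fit a single minimum-risk hypothesis on the mixture of the two datasets, then take the difference of its empirical risks on each dataset — can be sketched in a few lines. The linear least-squares hypothesis and squared-error risk below are illustrative assumptions for the sketch, not the models or losses used in the paper:

```python
import numpy as np

def r_divergence(X1, y1, X2, y2):
    """Minimal sketch of the R-divergence recipe: learn one hypothesis on
    the mixture of two datasets, then return the absolute difference of
    its empirical risks on each dataset.

    Illustrative choices (not prescribed by the paper): a linear
    least-squares hypothesis and mean-squared-error risk."""
    # 1. Pool the two datasets into one mixture sample.
    X = np.vstack([X1, X2])
    y = np.concatenate([y1, y2])
    # Append a bias column so the linear hypothesis has an intercept.
    A = np.hstack([X, np.ones((len(X), 1))])
    # 2. Learn the minimum-empirical-risk hypothesis on the mixture.
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    # 3. Empirical risk (MSE) of that single hypothesis on each dataset.
    def risk(Xi, yi):
        Ai = np.hstack([Xi, np.ones((len(Xi), 1))])
        return float(np.mean((Ai @ w - yi) ** 2))
    # 4. R-divergence estimate: the absolute risk difference.
    return abs(risk(X1, y1) - risk(X2, y2))
```

When the two datasets are drawn from the same distribution, the shared hypothesis incurs similar risk on both and the estimate is near zero; under a distribution shift that the model is sensitive to, the risks diverge and the estimate grows.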

About us
School of Computing, Faculty of Science and Engineering, Macquarie University, Australia
Level 3, 4 Research Park Drive, Macquarie University, NSW 2109, Australia
Tel: +61-2-9850 9583
Staff: firstname.surname(a)mq.edu.au
Students: firstname.surname(a)student.mq.edu.au
Contacts@datasciences.org