A new method for weighting the importance of data samples could help deep learning systems better adapt to changes in data distribution.
In many practical applications of deep learning, the training data is assumed to be independent and identically distributed (i.i.d.). However, this assumption often fails in practice, e.g., when there is a domain shift between the training and test data. A recent line of work proposes reweighting the importance of individual examples during training to mitigate the effects of distribution shift. While these methods can improve performance on standard benchmarks, they are often heuristic and rely on careful hyperparameter tuning. We propose an importance-weighting method based on an information-theoretic principle, which we call Maximum Mean Discrepancy Information Bottleneck (MMD-IB). Our method tunes its hyperparameters automatically and requires no knowledge of the shift in distributions. We show that, on several standard benchmarks for distribution shift, our method outperforms state-of-the-art importance-weighting methods with little or no tuning required.
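The MMD underlying the proposed MMD-IB principle is a kernel-based distance between two samples. The paper's full objective is not reproduced here, but as background, a minimal empirical estimate of the squared MMD with an RBF kernel (the bandwidth `gamma` is a hypothetical choice) can be sketched as:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise RBF kernel matrix: k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def mmd2(x, y, gamma=1.0):
    # Biased empirical estimate of squared MMD between samples x and y:
    # mean k(x, x') + mean k(y, y') - 2 mean k(x, y)
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

# Identical samples give MMD^2 = 0; shifted samples give a clearly positive value
x = np.array([[0.0], [1.0], [2.0]])
y = x + 5.0
print(mmd2(x, x))  # ~0.0
print(mmd2(x, y))  # clearly positive
```

Larger MMD between training and test samples indicates a larger distribution shift, which is the signal such methods exploit.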
What is distribution shift?
Distribution shift is a phenomenon in machine learning whereby the training and test data distributions differ. This can happen for a variety of reasons, including changes in the source of the data, pre-processing steps, or the way in which the data is sampled. Distribution shift is a major challenge for machine learning models, as it can lead to significant performance degradation.
There are two main types of distribution shift: covariate shift and concept drift. Covariate shift occurs when the marginal distribution of the inputs differs between training and test data, but the conditional distribution of the labels given the inputs is unchanged. Concept drift occurs when that conditional distribution, i.e., the input-output relationship itself, changes over time.
Covariate shift is usually easier to deal with than concept drift, as it is often possible to reweight the training data so that it matches the test data distribution. However, concept drift is more difficult to deal with, as it may be impossible to know in advance when or how the distribution will change.
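To make the covariate-shift case concrete, here is a minimal sketch of density-ratio reweighting under an artificially simple, assumed setting where both input distributions are known unit-variance Gaussians, so the ratio has a closed form (in practice it must be estimated from data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setting: training inputs ~ N(0, 1), test inputs ~ N(1, 1)
x_train = rng.normal(0.0, 1.0, size=10_000)
shift = 1.0  # mean of the test input distribution

def density_ratio(x):
    # w(x) = p_test(x) / p_train(x) for the two unit-variance Gaussians above;
    # known in closed form here, but estimated in realistic settings
    return np.exp(shift * x - shift ** 2 / 2.0)

y = 2.0 * x_train  # a quantity whose test-time mean we want: E_test[2x] = 2

w = density_ratio(x_train)
unweighted = y.mean()                  # estimates E_train[2x] = 0
reweighted = (w * y).sum() / w.sum()   # self-normalized estimate of E_test[2x]
```

The unweighted average estimates the training-distribution mean (about 0), while the reweighted average recovers the test-distribution mean (about 2) using only training samples.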
A number of methods have been proposed for dealing with distribution shift, including transfer learning, domain adaptation, and reweighting. Recently, there has been considerable interest in using deep learning to deal with distribution shift; deep learning models have proven particularly effective at transfer learning and domain adaptation.
In this paper, we propose a new importance-weighting method for dealing with distribution shift in deep learning. Importance weighting corrects for differences in distributions by up-weighting or down-weighting data points according to their importance. We show that our method can outperform existing methods on a variety of tasks and datasets.
The current approach to importance weighting
In standard importance weighting, the transition probabilities of the policies are weighted due to the discrepancy in the distribution of states visited by each behavior policy. The weights are usually obtained by importance sampling and are used to re-weight the returns from each policy. The expected return of a given target policy can then be estimated from a single set of samples, which is more efficient than running several independent replicas of the target policy. However, this standard approach does not account for the error in estimating the transition probabilities, which can lead to significant bias in the estimated return if the state distribution shifts during training.
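The off-policy estimate described above can be sketched in a toy one-step setting (a hypothetical two-action bandit invented for illustration, not the paper's setup): returns collected under the behavior policy are re-weighted by the ratio of target to behavior action probabilities.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical one-step setting: behavior policy favors action 0,
# target policy favors action 1; reward is 1.0 for action 0, 0.0 for action 1
behavior = np.array([0.8, 0.2])
target = np.array([0.2, 0.8])
rewards = np.array([1.0, 0.0])

actions = rng.choice(2, size=50_000, p=behavior)  # sampled under behavior policy
r = rewards[actions]
w = target[actions] / behavior[actions]           # importance weights

ois = (w * r).mean()  # importance-sampling estimate of E_target[reward]
# true value under the target policy: 0.2 * 1.0 = 0.2
```

A single batch of behavior-policy samples thus estimates the target policy's expected return, at the cost of higher variance when the two policies differ greatly.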
Why is the current approach to importance weighting insufficient?
There are a few reasons why the current approach to importance weighting is insufficient. First, it does not fully account for distribution shift: the weights may be inaccurate if the data you’re predicting on differs from the data used to train the model. Second, it can be difficult to estimate the importance weights accurately, which can lead to biased results. Finally, importance weighting can be computationally expensive, which can make it impractical for large datasets.
A new approach to importance weighting
In recent years, deep learning has become the state-of-the-art approach for many supervised learning tasks. While deep learning models often achieve impressive performance on held-out test data, their performance can degrade significantly when applied to data from a different distribution, i.e., when there is distribution shift. A popular approach to dealing with distribution shift is importance weighting, which adjusts the model’s predictions by reweighting each training example according to its estimated importance. In this paper, we develop a new method for importance weighting that is more robust to misspecification of the model class and to outliers in the training data. We show empirically that our method outperforms existing importance weighting methods on a range of benchmark datasets.
How does the new approach to importance weighting work?
We present a new approach to importance weighting for deep learning that is more effective under distribution shift. Our approach re-weights the training data according to a softmax function of the importance weights; the re-weighted data is then used to train a deep neural network with standard methods. We show that this approach can significantly improve the performance of deep learning under distribution shift, particularly when data is limited.
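As a rough illustration of the softmax reweighting step (the scores below are hypothetical stand-ins for estimated importance weights, and the losses for per-example training losses):

```python
import numpy as np

def softmax(z):
    z = z - z.max()           # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical per-example importance scores (e.g., estimated log density ratios)
scores = np.array([0.0, 1.0, 2.0])
weights = softmax(scores)     # sums to 1; a larger score gives a larger weight

# Fold per-example losses into a single weighted training objective
losses = np.array([0.5, 0.2, 0.9])
weighted_loss = (weights * losses).sum()
```

The softmax keeps all weights positive and normalized, so examples believed to be more representative of the test distribution dominate the training objective without any weight blowing up.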
Why is the new approach to importance weighting effective?
In deep learning, importance weighting is a method of correcting for distributional shift, whereby the training and test data differ in their distribution. This can occur for a variety of reasons, such as concept drift (a change in the real-world phenomenon that the data is modeling) or non-stationarity (a change in the data-generating process itself).
Traditional importance weighting methods are effective at correcting for distributional shift when the shift is small. However, when the shift is large, these methods tend to break down.
The new approach to importance weighting, which was proposed by Athey et al. (2017), is effective at correcting for large distributional shifts. The key idea is to use a flexible model to estimate the importance weights, rather than using a fixed model as in traditional methods. This allows the estimator to adapt to different types of shifts, and results in improved performance.
In this paper, we revisited the importance weighting method for deep learning under a distribution shift scenario. We theoretically and experimentally showed that the traditional importance weighting method is suboptimal, and proposed a new method called self-normalized importance weighting. Our proposed method estimates the model by solving a novel learning problem, which is to minimize the expected Kullback-Leibler divergence between the model output distribution and the ground truth output distribution under the changed data distribution. We experimentally compared our proposed method with state-of-the-art methods on both synthetic and real datasets, and showed that our proposed method outperforms all existing methods.
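A minimal sketch of a self-normalized weighted objective of this kind, assuming per-example log-likelihoods and unnormalized importance weights are already available (an illustration of the general idea, not the paper's exact estimator):

```python
import numpy as np

def self_normalized_weighted_nll(log_probs, raw_weights):
    # Importance-weighted negative log-likelihood with self-normalized weights.
    # Minimizing this approximates minimizing the expected KL divergence between
    # the model's output distribution and the ground truth under the shifted
    # data distribution (up to a constant entropy term).
    w = raw_weights / raw_weights.sum()
    return -(w * log_probs).sum()

# Hypothetical per-example model log-likelihoods and unnormalized weights
log_probs = np.log(np.array([0.9, 0.6, 0.3]))
raw_w = np.array([1.0, 2.0, 1.0])
loss = self_normalized_weighted_nll(log_probs, raw_w)
```

Self-normalization (dividing by the sum of the raw weights) makes the objective insensitive to the overall scale of the estimated weights, which is one reason such estimators behave better under large shifts.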
One interesting direction for future work is to investigate the role of different importance weighting schemes in the context of distribution shift. For example, the naïve importance weighting scheme described in Section 3.1 is a very simple and elegant method, but it does not always perform well in practice. It would be interesting to see if more sophisticated importance weighting schemes, such as those proposed by Bengio et al. (2015) and Sung et al. (2018), can improve performance on tasks with distribution shift.
There is a growing body of work on importance weighting for deep learning under distribution shift, which is a common problem when deploying machine learning models in the real world. This paper reviews recent advances in this area and proposes a new approach to importance weighting that is more effective than existing methods.
The paper begins by reviewing some of the most common ways to measure distribution shift, including distributional distance measures and moment matching. It then reviews existing importance weighting methods, including the reweighting method proposed by Zemel et al. (2013) and the importance score method proposed by Shakhnarovich et al. (2006).
The paper then proposes a new method for importance weighting that is based on a generalization of the Gini index. The new method is evaluated on two standard benchmark datasets, MNIST and CIFAR-10, and is shown to outperform existing methods by a significant margin.
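The paper's generalization of the Gini index is not spelled out here, but for reference, the classic Gini index (impurity) of a discrete distribution is 1 minus the sum of squared probabilities:

```python
import numpy as np

def gini_index(p):
    # Classic Gini impurity of a discrete distribution p: 1 - sum_i p_i^2.
    # 0 when all mass is on one outcome; maximal when the distribution is uniform.
    p = np.asarray(p, dtype=float)
    return 1.0 - (p ** 2).sum()

print(gini_index([1.0, 0.0]))   # 0.0: all mass on one class
print(gini_index([0.5, 0.5]))   # 0.5: maximally mixed for two classes
```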
This post is based on the paper “Rethinking Importance Weighting for Deep Learning Under Distribution Shift.”