Importance Weighted TL for regression

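I first read this paper by TOM a year ago, but even after several readings I could not understand it. It now looks as if this may be the most effective method, so I decided to work through it carefully and try to understand it properly. Unfortunately, when I asked the author for the code, he said he could no longer find it, which makes things difficult. The main idea of the paper is to give each sample in the auxiliary (secondary) dataset a weight that reflects how much it contributes to the target model. Based on this idea, the paper proposes two methods: KITL and DITL.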

Garcke J, Vanck T. Importance weighted inductive transfer learning for regression[C]//Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Berlin, Heidelberg, 2014: 466-481.
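To make the core idea concrete, here is a minimal sketch of importance-weighted regression in Python with NumPy. Primal samples get weight 1, each secondary sample gets its own weight, and a weighted ridge regression is fitted on the pooled data. The helper `fit_weighted_ridge`, the synthetic data and the "oracle" 0/1 weights are hypothetical illustrations. This is not KITL or DITL: the purpose of those methods is to estimate such weights from data rather than assume them.

```python
import numpy as np

def fit_weighted_ridge(X, y, w, lam=1e-3):
    """Hypothetical helper: minimise sum_i w_i * (y_i - [1, x_i] @ beta)^2 + lam * ||beta||^2."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    Xw = Xb * w[:, None]                        # scale each row by its sample weight
    A = Xb.T @ Xw + lam * np.eye(Xb.shape[1])   # X^T W X + lam * I
    return np.linalg.solve(A, Xw.T @ y)         # beta = [intercept, slope]

rng = np.random.default_rng(0)

# Synthetic primal data (assumption): 5 samples from y = 2x + 1
x_p = rng.uniform(0, 1, 5)
y_p = 2 * x_p + 1 + 0.05 * rng.normal(size=5)

# Synthetic secondary data (assumption): the first 20 samples share the primal
# mapping, the last 20 follow an unrelated one
x_s = rng.uniform(0, 1, 40)
related = np.arange(40) < 20
y_s = np.where(related, 2 * x_s + 1, -3 * x_s + 4) + 0.05 * rng.normal(size=40)

X = np.concatenate([x_p, x_s])
y = np.concatenate([y_p, y_s])

w_uniform = np.ones(45)                                        # plain pooling
w_oracle = np.concatenate([np.ones(5), related.astype(float)])  # hypothetical ideal weights

beta_uniform = fit_weighted_ridge(X, y, w_uniform)
beta_oracle = fit_weighted_ridge(X, y, w_oracle)
print("uniform weights:", beta_uniform)
print("oracle weights: ", beta_oracle)
```

With uniform weights, the unrelated secondary samples pull the fitted slope far away from the primal value. With weights that switch those samples off, the fit recovers a slope close to 2 and an intercept close to 1.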

Basic Definitions

Primal data: $\left(\mathcal{X}^{P}, \mathcal{Y}^{P}\right)$
Primal data distribution: $p^{P}(x, y)$
Secondary data: $\left(\mathcal{X}^{S}, \mathcal{Y}^{S}\right)$
Secondary data distribution: $p^{S}(x, y)$
Dataset shift: $p^{P}(x, y) \neq p^{S}(x, y)$
Covariate shift: $p^{P}(y | x)=p^{S}(y | x)$ and $p^{P}(x) \neq p^{S}(x)$
Prior probability shift: $p^{P}(x | y)=p^{S}(x | y)$ and $p^{P}(y) \neq p^{S}(y)$
This paper addresses the general case $p^{P}(x, y) \neq p^{S}(x, y)$, i.e. it is not restricted to either of the two special cases above.
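The difference between covariate shift and the more general dataset shift can be illustrated with a small synthetic sampler. The mapping, the Gaussian input distributions and the extra term `0.5 * x` are assumptions chosen for illustration and do not come from the paper. Under covariate shift only the input distribution moves. Under the general shift the conditional $p(y | x)$ changes as well. The mean residual against the primal mapping is used only as a crude check.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    """Primal mapping (illustrative assumption)."""
    return np.sin(3 * x)

def sample(n, x_mean, mapping, noise=0.1):
    """Draw x ~ N(x_mean, 0.5^2) and y = mapping(x) + Gaussian noise."""
    x = rng.normal(x_mean, 0.5, n)
    return x, mapping(x) + noise * rng.normal(size=n)

x_p, y_p = sample(2000, 0.0, f)                            # primal data
x_cs, y_cs = sample(2000, 1.0, f)                          # covariate shift: only p(x) moves
x_gs, y_gs = sample(2000, 1.0, lambda x: f(x) + 0.5 * x)   # general shift: p(y|x) changes too

# Mean residual w.r.t. the primal mapping: about 0 under covariate shift, clearly non-zero otherwise
res_cs = np.mean(y_cs - f(x_cs))
res_gs = np.mean(y_gs - f(x_gs))
print(f"shift in mean of x: {x_cs.mean() - x_p.mean():.2f}")
print(f"residual (covariate shift): {res_cs:.3f}, residual (general shift): {res_gs:.3f}")
```

In both secondary sets $p(x)$ differs from the primal one. Only in the second set would a model of the primal mapping also be systematically wrong on the secondary labels.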

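Many real-world cases show that this label-shift setting is realistic, for example earthquake prediction and flight-delay prediction. Different locations and different time periods produce label distributions with different trends, while the mapping from inputs to labels remains similar. Using Californian earthquake data to help build a model for Japanese earthquake data is one example of inductive transfer learning.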

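Assumption: the secondary (source) data contain some samples $(x, y)$ for which $p^{P}(x, y) \approx p^{S}(x, y)$. The task of the weighting is therefore to give these samples higher weights.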
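Under this assumption, a natural weight for a secondary sample is the density ratio $w(x, y) = p^{P}(x, y) / p^{S}(x, y)$, which is large where the secondary data look like primal data. The sketch below estimates that ratio with a naive plug-in approach: two Gaussian kernel density estimates on the joint $(x, y)$ space, divided pointwise. This plug-in estimator is my own stand-in for illustration, not the KITL or DITL procedure from the paper. The function names, the bandwidth and the synthetic data are all hypothetical.

```python
import numpy as np

def gaussian_kde(points, query, bandwidth):
    """Isotropic Gaussian KDE fitted on `points`, evaluated at `query` (both shape (n, d))."""
    d = points.shape[1]
    sq_dist = ((query[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    norm = (2 * np.pi * bandwidth ** 2) ** (d / 2)
    return np.exp(-sq_dist / (2 * bandwidth ** 2)).mean(axis=1) / norm

def joint_importance_weights(Z_p, Z_s, bandwidth=0.2, eps=1e-12):
    """Hypothetical helper: plug-in estimate of p^P(x, y) / p^S(x, y) at each secondary sample."""
    p_primal = gaussian_kde(Z_p, Z_s, bandwidth)
    p_secondary = gaussian_kde(Z_s, Z_s, bandwidth)
    w = p_primal / (p_secondary + eps)
    return w / w.mean()  # rescale so the weights average to 1

rng = np.random.default_rng(2)

# Synthetic primal data (assumption): y = 2x + 1
x_p = rng.uniform(0, 1, 30)
Z_p = np.column_stack([x_p, 2 * x_p + 1 + 0.05 * rng.normal(size=30)])

# Synthetic secondary data (assumption): half share the primal mapping, half do not
x_s = rng.uniform(0, 1, 200)
related = np.arange(200) < 100
y_s = np.where(related, 2 * x_s + 1, -3 * x_s + 4) + 0.05 * rng.normal(size=200)
Z_s = np.column_stack([x_s, y_s])

w = joint_importance_weights(Z_p, Z_s)
print(f"mean weight, related samples:   {w[related].mean():.2f}")
print(f"mean weight, unrelated samples: {w[~related].mean():.2f}")
```

Secondary samples drawn from the primal mapping receive a clearly higher average weight. Unrelated samples near $x = 0.6$, where the two lines cross, still get moderate weight, because the two joint densities genuinely overlap there. Plug-in KDE ratios become unreliable as the dimension of $(x, y)$ grows, which is one reason dedicated density-ratio estimators are used in practice.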