label\prediction | pos. | neg. |
---|---|---|
pos. | tp | fn |
neg. | fp | tn |
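For reference, a tiny sketch of how these counts might be tallied from binary labels and predictions (the function name and the 1/0 encoding of pos./neg. are my own):

```python
import numpy as np

def confusion_counts(labels, preds):
    """Tally tp/fn/fp/tn for binary labels/predictions encoded as 1 (pos.) / 0 (neg.)."""
    labels, preds = np.asarray(labels), np.asarray(preds)
    tp = int(np.sum((labels == 1) & (preds == 1)))
    fn = int(np.sum((labels == 1) & (preds == 0)))
    fp = int(np.sum((labels == 0) & (preds == 1)))
    tn = int(np.sum((labels == 0) & (preds == 0)))
    return tp, fn, fp, tn
```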
David told me about another interesting problem: discovering an alignment between proteins in different species. Each species has a different set of proteins, but proteins can often be linked because the species are relatively close in the evolutionary tree and their functions have remained somewhat similar. Such proteins are called "homologous". We have PPI information for each species. The question is: can we use this information to determine which proteins are homologous? Assume that interactions are symmetric. Then if we learn a matrix factorization of some underlying interaction parameter, the two factors are identical, e.g. X = UUᵀ. Imagine that we put all the interactions into a single matrix, where the first block of rows/columns corresponds to the proteins of species #1 and the second block to the proteins of species #2. We have no data concerning inter-species interactions. But a trace norm penalty on the underlying parameter matrix would encourage the model to find similarities in the protein representations. We could then run a bipartite graph matching algorithm using Euclidean distances between the representations (or some such). But this doesn't seem that much better than the algorithms David told me about... That's not completely true. At least the similarity measure is not completely heuristic, but rather learned in a semi-intelligent way.
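A minimal sketch of this idea (all names, the rank, and the plain gradient-descent optimizer are my own choices, not anything David or I have implemented). It relies on the fact that for a symmetric PSD factorization X = UUᵀ the trace norm of X equals ‖U‖_F², so a Frobenius penalty on U plays the role of the trace-norm penalty; the matching step runs the Hungarian algorithm on Euclidean distances between the learned rows of U.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fit_shared_factors(A1, A2, rank=10, lam=0.1, lr=0.01, n_iters=2000, seed=0):
    """Fit X ≈ U Uᵀ to the combined PPI matrix, observing only the two
    intra-species blocks A1, A2 (symmetric 0/1 arrays)."""
    rng = np.random.default_rng(seed)
    n1, n2 = A1.shape[0], A2.shape[0]
    A = np.zeros((n1 + n2, n1 + n2))
    A[:n1, :n1] = A1
    A[n1:, n1:] = A2
    mask = np.zeros_like(A, dtype=bool)       # which entries are observed
    mask[:n1, :n1] = True
    mask[n1:, n1:] = True                     # no inter-species data
    U = 0.1 * rng.standard_normal((n1 + n2, rank))
    for _ in range(n_iters):
        R = (U @ U.T - A) * mask              # residual on observed entries only
        # Gradient of the masked squared error plus (lam/2)*||U||_F^2,
        # the latter standing in for the trace-norm penalty on U Uᵀ.
        U -= lr * (4 * R @ U + lam * U)
    return U[:n1], U[n1:]

def match_proteins(U1, U2):
    """Bipartite matching of species-1 vs. species-2 proteins by Euclidean
    distance between their learned representations."""
    dists = np.linalg.norm(U1[:, None, :] - U2[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(dists)  # Hungarian-style assignment
    return list(zip(rows, cols))
```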
f function). It is interesting to note that they are effectively producing a ranking by a weighted combination of expert ratings (since S is a (discrete) totally ordered set). There is a separate f for each expert, which defines his/her binning/rating of the examples. Downsides of their approach are that they do exponential weight updates, effectively trying to focus on the single best expert, and that their loss function is classification error, which may not generalize well to new data.
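To illustrate the criticism, here is a generic Hedge-style exponential weight update (not their exact algorithm; the learning rate eta and the 0/1 loss encoding are placeholders). With such updates, the weight mass quickly concentrates on the single best expert:

```python
import numpy as np

def hedge_update(weights, expert_losses, eta=1.0):
    """Multiplicative update: experts with lower loss (e.g. 0/1 classification
    error on the latest example) receive exponentially more weight."""
    w = weights * np.exp(-eta * np.asarray(expert_losses, dtype=float))
    return w / w.sum()

# Example: expert index 1 is consistently right, so its weight quickly dominates.
w = np.ones(3) / 3
for losses in ([1, 0, 1], [1, 0, 0], [0, 0, 1]):
    w = hedge_update(w, losses)
print(w)   # weight mass concentrates on expert index 1
```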
kernel. The idea: θ ∼ N(0, I); the observation is f(x_k) = θᵀφ(x_k) → y_k; then Cov[f(x_i), f(x_j)] = φ(x_i)ᵀφ(x_j), which is a kernel: K(x_i, x_j). Their model is similar to an immediate-thresholds model, but the loss function is not defined as a sum of two MPFs. In fact, their way of constructing the loss function is somewhat reversed from the method I use. They define an "ideal" likelihood which is unity in the correct segment (defined by the thresholds), but they assume a Gaussian distribution over the mapping of the example to the real number line. The result is something very similar to immediate thresholds with the modified least squares loss function (note the middle plot of Figure 1: the derivative is linear). Tommi thinks that if you replace the Gaussian distribution over the example's mapping to the real line (presumably with a logistic distribution), you get the Proportional Odds model. Chu/Ghahramani achieve regularization via the Gaussian prior on f. Both of their learning methods try to maximize the posterior, which is comparable to how we learn parameters. Tommi notes that immediate thresholds plus the Logistic Regression MPF does not correspond to a probability distribution: when we sum the LR MPFs, we would need to renormalize.
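A small sketch of the likelihood construction described above, assuming the ideal segment indicator is smoothed by Gaussian noise on f(x) (the threshold values, noise scale, and function name are placeholders): the probability of rating r is the Gaussian measure of the segment (b_{r-1}, b_r].

```python
import numpy as np
from scipy.stats import norm

def ordinal_likelihood(f, r, thresholds, sigma=1.0):
    """P(y = r | f): ideal indicator of the segment (b_{r-1}, b_r] convolved
    with N(0, sigma^2) noise on the latent value f(x).
    thresholds = [b_1, ..., b_{R-1}]; b_0 = -inf and b_R = +inf are implicit."""
    b = np.concatenate(([-np.inf], thresholds, [np.inf]))
    return norm.cdf((b[r] - f) / sigma) - norm.cdf((b[r - 1] - f) / sigma)

# Example with 3 ratings and thresholds at -1 and 1:
print(ordinal_likelihood(f=0.3, r=2, thresholds=[-1.0, 1.0]))
```

Taking the negative log of this quantity gives a loss that grows roughly quadratically once f(x) leaves the correct segment, so its derivative is asymptotically linear, which matches the "derivative is linear" observation about the middle plot of Figure 1 above.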