||X||_Σ = max_{Y: ||Y|| ≤ 1} tr(Y^T X),
where ||Y|| is the spectral norm (maximum singular value) of Y and tr() denotes the trace. The maximization is over matrices whose maximal singular value is one or less. However, it is equivalent to maximize over orthogonal matrices, which have all singular values equal to one. The columns y_j of such a Y have unit length, so tr(Y^T X) = Σ_j y_j^T x_j ≤ Σ_j ||x_j|| by the Cauchy-Schwarz inequality. Thus, the trace norm is less than or equal to the sum of the lengths of the columns (or rows) of X.
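As a quick numerical sanity check of the statements above, here is a minimal NumPy sketch. The test matrix, its shape, and the variable names are my own choices for illustration. It builds Y = U V^T from the SVD of X, which has all singular values equal to one, and compares tr(Y^T X) with the sum of singular values and with the column and row length bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))  # arbitrary test matrix (an assumption)

# Trace norm = sum of the singular values of X.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
trace_norm = s.sum()

# Y = U V^T has all singular values equal to one and attains the maximum.
Y = U @ Vt
attained = np.trace(Y.T @ X)

# Upper bounds: sum of column lengths and sum of row lengths of X.
col_bound = np.linalg.norm(X, axis=0).sum()
row_bound = np.linalg.norm(X, axis=1).sum()

print(trace_norm, attained, col_bound, row_bound)
```

The first two printed values should agree, and both bounds should be at least as large.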
dA^+ = A^+ (A^+)' (dA' - (dA' A + A' dA) A^+).
We haven't been able to show convexity for arbitrary A. Some terms in the second derivative are negative and do not appear to cancel with other terms. Useful in the square A case is what is known as a commutation matrix, which is defined on page 16 of Thomas Minka's Matrix Algebra for Statistics paper. Note that the commutation matrix has orthonormal columns.
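The differential of the pseudoinverse given above can be checked against a finite difference. This is only a sketch: it assumes A is tall with full column rank, so that A^+ = (A'A)^{-1} A', which is the case in which I can verify the formula. The sizes and the perturbation scale are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))          # tall; full column rank with probability one
dA = 1e-6 * rng.standard_normal((6, 3))  # small perturbation

Ap = np.linalg.pinv(A)

# First-order prediction: dA+ = A+ A+' (dA' - (dA' A + A' dA) A+)
predicted = Ap @ Ap.T @ (dA.T - (dA.T @ A + A.T @ dA) @ Ap)

# Actual change in the pseudoinverse.
actual = np.linalg.pinv(A + dA) - Ap

rel_err = np.linalg.norm(predicted - actual) / np.linalg.norm(actual)
print(rel_err)  # should be small, on the order of the perturbation size
```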
∇f(x)^T (y - x) ≥ 0 ⇒ f(y) ≥ f(x).
Every differentiable convex function is pseudoconvex. A pseudoconvex function is quasiconvex.
f(λx + (1-λ)y) ≤ max{f(x), f(y)}.
If f(x) is convex, then f(x) is quasiconvex. For each α ∈ ℜ, define the α-level set of f(x) as
S_α = {x ∈ X | f(x) ≤ α}.
A function f(x) is quasiconvex iff S_α is a convex set ∀ α ∈ ℜ.
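To make the gap between convex and quasiconvex concrete, here is a small sketch using f(x) = sqrt(|x|). This example function is my own choice and is not taken from the notes. Its level sets are the intervals [-α², α²], which are convex, yet the function is not convex. The code samples random points and tests both inequalities.

```python
import numpy as np

def f(x):
    # Example function (an assumption): quasiconvex but not convex.
    return np.sqrt(np.abs(x))

rng = np.random.default_rng(2)
x = rng.uniform(-4, 4, 10000)
y = rng.uniform(-4, 4, 10000)
lam = rng.uniform(0, 1, 10000)
z = lam * x + (1 - lam) * y

# Quasiconvexity: f(z) <= max{f(x), f(y)} on every sample.
quasi_ok = bool(np.all(f(z) <= np.maximum(f(x), f(y)) + 1e-12))

# Convexity: f(z) <= lam f(x) + (1 - lam) f(y); this fails on some samples.
convex_ok = bool(np.all(f(z) <= lam * f(x) + (1 - lam) * f(y) + 1e-12))

print(quasi_ok, convex_ok)
```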
dY = (I⊗X + X^T⊗I) dX
Note that in the calculation of the determinant, ∏_i (λ_i + λ_i) = 2^n (det X). Similar logic is used to obtain the expression for general powers, though neither John nor I completely understand the manipulations. And neither of us has a good handle on the formula for general functions...
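The expression for dY can be checked numerically. The sketch below assumes that Y = X², that dX and dY stand for the column-stacked vec of the differentials, and that X is a general (not necessarily symmetric) n-by-n matrix. These are my readings of the formula rather than something stated outright. It also compares the determinant of the n²-by-n² matrix with the product of λ_i + λ_j over all pairs (i, j), which includes the i = j terms 2λ_i.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
X = rng.standard_normal((n, n))
dX = 1e-6 * rng.standard_normal((n, n))
I = np.eye(n)

def vec(M):
    # Stack the columns of M into one long vector.
    return M.reshape(-1, order="F")

# Jacobian of vec(X @ X) with respect to vec(X).
J = np.kron(I, X) + np.kron(X.T, I)

predicted = J @ vec(dX)
actual = vec((X + dX) @ (X + dX) - X @ X)
rel_err = np.linalg.norm(predicted - actual) / np.linalg.norm(actual)

# The eigenvalues of J are the pairwise sums lambda_i + lambda_j.
lam = np.linalg.eigvals(X)
det_from_eigs = np.prod(lam[:, None] + lam[None, :]).real
det_J = np.linalg.det(J)

print(rel_err, det_J, det_from_eigs)
```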
det J = ∏_{i≤j} λ_i λ_j = (det A)^(n+1)
Note that in the product, each λ_i appears n+1 times. One way to see this is that there are n(n+1)/2 terms λ_i λ_j, each of which has two lambdas. There are n different λ_i, so each λ_i must occur n+1 times. Another way to see this is to consider λ_1. It appears in the following terms: λ_1λ_1, λ_1λ_2, λ_1λ_3, ... It appears in n terms, but appears twice in one of those terms, for a total of n+1 occurrences.
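The counting argument is easy to confirm by brute force. This sketch simply enumerates the index pairs i ≤ j and tallies how often each index occurs. The helper name `lambda_counts` is hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement

def lambda_counts(n):
    """Count how many times each index appears among the pairs (i, j) with i <= j."""
    counts = Counter()
    for i, j in combinations_with_replacement(range(1, n + 1), 2):
        counts[i] += 1
        counts[j] += 1  # when i == j this adds the second occurrence
    return counts

n = 5
counts = lambda_counts(n)
print(counts)  # every index should appear n + 1 times
```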
∏_{1≤i≤n, 1≤j≤m} μ_i λ_j = (det A)^m (det B)^n
Note that each λ_j appears in the product once for every μ_i, that is, n times, and each μ_i appears m times. This is why (det A) is taken to the power of the dimension of B and vice versa.
An alternate proof is to note that A⊗B = (A⊗I)(I⊗B) and that the determinant of a product is the product of determinants. Note that the first I is m-by-m; the second is n-by-n.
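Both the determinant identity and the factorization used in the alternate proof can be checked with NumPy's `np.kron`. The sizes n = 3 and m = 4 and the random matrices are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))

K = np.kron(A, B)

# A⊗B = (A⊗I_m)(I_n⊗B): the first identity is m-by-m, the second n-by-n.
factored = np.kron(A, np.eye(m)) @ np.kron(np.eye(n), B)

det_K = np.linalg.det(K)
det_formula = np.linalg.det(A) ** m * np.linalg.det(B) ** n

print(np.allclose(K, factored), det_K, det_formula)
```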