Convex Optimization 2026, 27 - Nonparametric Distribution Estimation
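A random variable $X$ taking values in a finite set $\{\alpha_1, \ldots, \alpha_n\} \subseteq \mathbb{R}$ has a distribution characterized by $p \in \mathbb{R}^n$ with $\mathbf{prob}(X = \alpha_k) = p_k$, where $p \succeq 0$ and $\mathbf{1}^T p = 1$. Conversely, every $p$ satisfying these two conditions defines a distribution of $X$ on $\{\alpha_1, \ldots, \alpha_n\}$. The set of such vectors is the probability simplex $\{p \in \mathbb{R}^n \mid p \succeq 0,\ \mathbf{1}^T p = 1\}$ (p. 359). Many types of prior information about $p$ can be written as linear equality or inequality constraints, because for any function $f$ the expected value $\mathbf{E} f(X) = \sum_{i=1}^n p_i f(\alpha_i)$ is a linear function of $p$. As a special case, for a set $C \subseteq \mathbb{R}$, taking $f$ to be the indicator function of $C$ shows that $\mathbf{prob}(X \in C) = c^T p$, with $c_i = 1$ if $\alpha_i \in C$ and $c_i = 0$ otherwise, is also linear in $p$. Known expected values of certain functions (for example, known moments) can therefore be incorporated as linear equality constraints on $p$, and bounds on expected values or probabilities as linear inequalities (p. 360).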
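A minimal Python sketch of these definitions, assuming illustrative support points and probabilities chosen only for the example; the helper names `in_simplex`, `expectation`, and `probability` are hypothetical. It checks membership in the probability simplex and evaluates $\mathbf{E} f(X)$ and $\mathbf{prob}(X \in C)$ as inner products $c^T p$, which is what makes prior information on them linear in $p$.

```python
# Sketch: a discrete distribution p on support points alpha_1..alpha_n, with
# expectations and probabilities evaluated as linear functions c^T p.
from typing import Callable, Sequence


def in_simplex(p: Sequence[float], tol: float = 1e-9) -> bool:
    """Check p >= 0 and 1^T p = 1, up to a floating-point tolerance."""
    return all(pk >= -tol for pk in p) and abs(sum(p) - 1.0) <= tol


def expectation(alpha: Sequence[float], p: Sequence[float],
                f: Callable[[float], float]) -> float:
    """E f(X) = sum_i p_i f(alpha_i): linear in p with coefficients c_i = f(alpha_i)."""
    return sum(pk * f(a) for a, pk in zip(alpha, p))


def probability(alpha: Sequence[float], p: Sequence[float],
                in_c: Callable[[float], bool]) -> float:
    """prob(X in C) = c^T p with c_i = 1 if alpha_i is in C, else 0."""
    return expectation(alpha, p, lambda a: 1.0 if in_c(a) else 0.0)


# Hypothetical example data.
alpha = [0.0, 1.0, 2.0, 3.0]
p = [0.1, 0.2, 0.3, 0.4]
print(in_simplex(p))                            # True
print(expectation(alpha, p, lambda x: x))       # E X
print(probability(alpha, p, lambda x: x >= 2))  # prob(X >= 2)
```

A known value such as $\mathbf{E} X = 2$ then becomes the linear equality $\sum_i \alpha_i p_i = 2$ on $p$.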
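Bounds on the expected value of a function can be derived from this prior information: minimizing $\mathbf{E} f(X) = \sum_{i=1}^n f(\alpha_i) p_i$ over all $p$ in the simplex that satisfy the prior constraints is a convex problem (a linear program when the prior constraints are linear) whose optimal value is a lower bound on $\mathbf{E} f(X)$; maximizing instead gives an upper bound (p. 361). $p$ can also be estimated from observed samples by maximum likelihood: if $k_i$ of the samples equal $\alpha_i$, the log-likelihood is $l(p) = \sum_{i=1}^n k_i \log p_i$, a concave function of $p$ that is maximized subject to the prior constraints (p. 361). The maximum entropy distribution consistent with the prior information is found by minimizing the negative entropy $\sum_{i=1}^n p_i \log p_i$ over the feasible set (p. 362). Similarly, given a prior distribution $q$, the distribution with minimum Kullback-Leibler divergence from $q$ is found by minimizing $\sum_{i=1}^n p_i \log(p_i/q_i)$ over the feasible set (p. 362).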
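The following Python sketch illustrates these four problems under a simplifying assumption that is not in the text: the only prior information besides $p$ lying in the simplex is a single known mean $\mathbf{E} X = \mu$. That restriction lets each problem be solved without a general convex solver. The bounding LP is solved by enumerating its vertices, each of which has at most two nonzero entries; maximum likelihood with only the simplex constraint reduces to the empirical frequencies $k_i / \sum_j k_j$; and the maximum entropy and minimum KL problems are solved through their optimality conditions, which give $p_i \propto q_i e^{\lambda \alpha_i}$ with $\lambda$ found by bisection. All function names, the bisection bracket $[-50, 50]$, and the sample data are hypothetical; with more general prior constraints a convex optimization solver would be used instead.

```python
# Sketch: the estimation problems above for a finite support alpha_1..alpha_n.
# Assumption (not from the text): the prior information is only p in the
# simplex plus one known mean E X = mu, so no general convex solver is needed.
import math
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def bound_expectation(alpha: Sequence[float], f_vals: Sequence[float],
                      mu: float) -> Tuple[float, float]:
    """Min and max of E f(X) = sum_i f(alpha_i) p_i over
    {p >= 0, 1^T p = 1, alpha^T p = mu}. This LP attains its optimum at a
    vertex, and each vertex has at most two nonzero entries, so enumerate them."""
    values = []
    n = len(alpha)
    for i in range(n):                       # one-point supports
        if alpha[i] == mu:
            values.append(f_vals[i])
    for i, j in combinations(range(n), 2):   # two-point supports
        if alpha[i] == alpha[j]:
            continue
        w = (mu - alpha[j]) / (alpha[i] - alpha[j])  # weight on alpha_i
        if 0.0 <= w <= 1.0:
            values.append(w * f_vals[i] + (1.0 - w) * f_vals[j])
    if not values:
        raise ValueError("infeasible: mu outside [min(alpha), max(alpha)]")
    return min(values), max(values)


def ml_estimate(counts: Sequence[int]) -> List[float]:
    """Maximizing l(p) = sum_i k_i log p_i over the simplex alone gives
    the empirical frequencies p_i = k_i / sum_j k_j."""
    total = sum(counts)
    return [k / total for k in counts]


def log_likelihood(counts: Sequence[int], p: Sequence[float]) -> float:
    # Terms with k_i = 0 are dropped (0 * log p_i is taken as 0).
    return sum(k * math.log(pk) for k, pk in zip(counts, p) if k > 0)


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """D(p || q) = sum_i p_i log(p_i / q_i), with 0 log 0 taken as 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def min_kl(alpha: Sequence[float], mu: float,
           q: Optional[Sequence[float]] = None, iters: int = 200) -> List[float]:
    """Minimize sum_i p_i log(p_i / q_i) s.t. 1^T p = 1, alpha^T p = mu.
    With q uniform (the default) this is the maximum entropy problem. The
    optimality conditions give p_i proportional to q_i exp(lam * alpha_i); the
    mean is nondecreasing in lam, so lam is found by bisection. Assumes q_i > 0,
    min(alpha) < mu < max(alpha), and that lam lies in [-50, 50]."""
    n = len(alpha)
    q = list(q) if q is not None else [1.0 / n] * n

    def dist(lam: float) -> List[float]:
        z = [math.log(qk) + lam * a for qk, a in zip(q, alpha)]
        zmax = max(z)                        # shift exponents for stability
        w = [math.exp(zk - zmax) for zk in z]
        s = sum(w)
        return [wk / s for wk in w]

    lo, hi = -50.0, 50.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(a * pk for a, pk in zip(alpha, dist(mid))) < mu:
            lo = mid
        else:
            hi = mid
    return dist(0.5 * (lo + hi))


# Hypothetical example data.
alpha = [0.0, 1.0, 2.0, 3.0]
print(bound_expectation(alpha, [a * a for a in alpha], 1.5))  # bounds on E X^2
print(ml_estimate([2, 3, 5]))
print(min_kl(alpha, 1.0))                          # maximum entropy, E X = 1
print(min_kl(alpha, 1.5, q=[0.4, 0.3, 0.2, 0.1]))  # minimum KL to prior q
```

Leaving `q` as `None` uses the uniform prior $q_i = 1/n$, for which the two objectives differ only by a constant, since $\sum_i p_i \log(p_i/q_i) = \sum_i p_i \log p_i + \log n$; minimizing KL divergence then yields the maximum entropy distribution.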