What is a CRF?

Graphical Model

G = (V, E)

\mathbb{Y} = (\mathbb{Y}_v)_{v \in V}

Markov Property :

P(\mathbb{Y}_v | \mathbb{X}, \mathbb{Y}_w, w \neq v) = P(\mathbb{Y}_v | \mathbb{X}, \mathbb{Y}_w, w \sim v)

where w \sim v means that w and v are neighbours in the graph G

For sequences, the graph G is a chain (more generally, a tree)

Fundamental Theorem of Random Fields

p_{\theta}(\mathbb{y} | \mathbb{x}) \propto \exp \left( \sum_{e\in E, k} \lambda_k f_k(e,\mathbb{y}|_e, \mathbb{x}) + \sum_{v\in V, k} \mu_k g_k (v, \mathbb{y}|_v, \mathbb{x}) \right)

- 𝕩 : data (observation) sequence
- 𝕪 : label sequence
- |Y| : dictionary of possible states
- f_k, g_k : boolean features; f_k is associated to a pair/edge (transition), g_k to a point/vertex (state)
- 𝕪|_S : set of components of 𝕪 with vertices in the subgraph S

f_{y',y}(\langle u, v \rangle, \mathbb{y}|_{\langle u, v \rangle}, \mathbb{x}) = \delta(\mathbb{y}_u, y')\,\delta(\mathbb{y}_v, y)

g_{y,x}(v, \mathbb{y}|_v, \mathbb{x}) = \delta(\mathbb{y}_v, y)\,\delta(\mathbb{x}_v, x)
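A minimal sketch of these boolean indicator features in code. The POS-style labels ("N", "V") and the word "dog" are made-up examples, not from the source:

```python
# Kronecker delta: 1 when its arguments are equal, 0 otherwise.
def delta(a, b):
    return 1.0 if a == b else 0.0

def f_edge(y_prev_target, y_target):
    # Transition feature f_{y',y}: fires when label y' is followed by y.
    def f(y_prev, y_cur):
        return delta(y_prev, y_prev_target) * delta(y_cur, y_target)
    return f

def g_vertex(y_target, x_target):
    # State feature g_{y,x}: fires when label y sits on observation x.
    def g(y_cur, x_cur):
        return delta(y_cur, y_target) * delta(x_cur, x_target)
    return g

f_NV = f_edge("N", "V")      # fires on the transition N -> V
g_Ndog = g_vertex("N", "dog")  # fires on label N over the word "dog"
```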

Calculating CRF

For a sequence, set 𝕐_0 = start and 𝕐_{n+1} = stop

M_i(𝕩) = [M_i(y′, y|𝕩)] is a |Y| × |Y| matrix, where i is the position of the observation in the sequence 𝕩

M_i(y', y|\mathbb{x}) = \exp\left( \Lambda_i(y', y|\mathbb{x}) \right) \quad \text{where} \quad \Lambda_i(y', y|\mathbb{x}) = \sum_k \lambda_k f_k(e_i, \mathbb{y}|_{e_i} = (y', y), \mathbb{x}) + \sum_k \mu_k g_k(v_i, \mathbb{y}|_{v_i} = y, \mathbb{x})
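A sketch of building one such matrix from weighted boolean features. The two-state set, the toy features, and their weights are assumptions for illustration only:

```python
import math

# Toy state set; indices double as matrix coordinates.
STATES = ["A", "B"]

def build_M_i(x_i, edge_feats, vertex_feats):
    """M_i(y', y | x) = exp( sum_k lam_k f_k(y', y) + sum_k mu_k g_k(y, x_i) )."""
    n = len(STATES)
    M = [[0.0] * n for _ in range(n)]
    for a, y_prev in enumerate(STATES):
        for b, y in enumerate(STATES):
            lam = sum(w * f(y_prev, y) for w, f in edge_feats)
            mu = sum(w * g(y, x_i) for w, g in vertex_feats)
            M[a][b] = math.exp(lam + mu)
    return M

# One edge feature (fires on A -> B, weight 0.5) and one vertex feature
# (fires on label B when the observation is "obs", weight 1.0).
edge_feats = [(0.5, lambda yp, y: 1.0 if (yp, y) == ("A", "B") else 0.0)]
vertex_feats = [(1.0, lambda y, x: 1.0 if (y, x) == ("B", "obs") else 0.0)]
M = build_M_i("obs", edge_feats, vertex_feats)
```

Each entry is the exponential of the weighted feature sum for that (y′, y) pair, so entries with no active feature are exactly 1.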

Normalisation constant :

Z_{\theta}(\mathbb{x}) = \left( M_1(\mathbb{x}) M_2(\mathbb{x}) \cdots M_{n+1}(\mathbb{x}) \right)_{\text{start}, \text{stop}}
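The normalisation constant is just one entry of a matrix product, so it can be computed without enumerating label sequences. A minimal sketch, assuming `Ms = [M_1(x), ..., M_{n+1}(x)]` with the start and stop states included as extra row/column indices:

```python
import numpy as np

def partition_function(Ms, start, stop):
    # Z_theta(x) = (M_1(x) M_2(x) ... M_{n+1}(x))_{start, stop}
    prod = Ms[0]
    for M in Ms[1:]:
        prod = prod @ M
    return prod[start, stop]
```

The (start, stop) entry of the product accumulates the sum, over every interior label path, of the product of matrix entries along that path.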

Giving :

p_{\theta}(\mathbb{y}| \mathbb{x}) = \frac{\prod_{i=1}^{n+1} M_i(\mathbb{y}_{i-1}, \mathbb{y}_i|\mathbb{x})}{\left(\prod_{i=1}^{n+1} M_i(\mathbb{x})\right)_{\text{start}, \text{stop}}}

with y_0 = start and y_{n+1} = stop
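The full formula can be sketched the same way: the numerator multiplies the matrix entries along the path (start, y_1, ..., y_n, stop), and the denominator is the matrix-product normaliser. Function and argument names here are assumptions:

```python
import numpy as np

def sequence_probability(Ms, y, start, stop):
    # Numerator: product of M_i entries along the boundary-padded label path.
    path = [start] + list(y) + [stop]
    numerator = 1.0
    for M, a, b in zip(Ms, path, path[1:]):
        numerator *= M[a, b]
    # Denominator: Z_theta(x) = (M_1 ... M_{n+1})_{start, stop}.
    Z = Ms[0]
    for M in Ms[1:]:
        Z = Z @ M
    return numerator / Z[start, stop]
```

A quick sanity check of the construction is that the probabilities of all labellings sum to 1.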

Training CRF

IIS: Improved Iterative Scaling
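As a sketch of the update (following the improved iterative scaling derivation used for CRFs in Lafferty, McCallum & Pereira, 2001; the exact form below is reconstructed from memory and should be checked against that paper), each increment \delta\lambda_k is chosen so that the model expectation, tilted by the step, matches the empirical expectation of the feature:

```latex
\tilde{E}[f_k] \;=\; \sum_{\mathbb{x},\mathbb{y}} \tilde{p}(\mathbb{x})\, p_{\theta}(\mathbb{y}|\mathbb{x})\, f_k(\mathbb{x},\mathbb{y})\, e^{\delta\lambda_k\, T(\mathbb{x},\mathbb{y})}
```

where T(\mathbb{x},\mathbb{y}) = \sum_k f_k + \sum_k g_k is the total feature count of the pair, \tilde{E}[f_k] is the empirical expectation of f_k over the training data, and the weights are then updated as \lambda_k \leftarrow \lambda_k + \delta\lambda_k (and analogously \mu_k for the state features g_k).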