Cost for MDP

Discrete MDP

(S, A, C, π)

According to the Bellman equation:

$$C_{tot}(s_i) = C(s_i) + \gamma \sum_{j \in S} A(i, j)\, C_{tot}(s_j)$$

with γ ∈ (0, 1] the discount factor.

Continuous-time MDP

γ is replaced by a continuous exponential decay, akin to a Poisson process: γ(t) = exp(−λt).
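As a sketch, the discrete equation can be solved by fixed-point iteration (value iteration). The 3-state chain below is made up for illustration, with A(i, j) read as the transition probability from state i to state j under the policy:

```python
import numpy as np

# Fixed-point iteration on C_tot = C + gamma * A @ C_tot
# (hypothetical 3-state chain; rows of A sum to 1)
A = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.5, 0.5],
              [0.2, 0.0, 0.8]])
C = np.array([1.0, 0.0, 2.0])       # immediate cost per state
gamma = 0.9

C_tot = np.zeros_like(C)
for _ in range(1000):
    C_tot = C + gamma * A @ C_tot   # Bellman backup

# cross-check against the closed form (I - gamma A)^-1 C
print(C_tot)
print(np.linalg.solve(np.eye(3) - gamma * A, C))
```

For γ < 1 the backup is a contraction, so the iteration converges to the unique fixed point.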

Cohen's Kappa

Agreement between observers' grades when labelling data.


$$\kappa = \frac{Pr(a) - Pr(e)}{1 - Pr(e)}$$

  • Pr(a): relative agreement observed between the observers
  • Pr(e): probability of random agreement (chance consensus)

  • κ = 1: total agreement
  • κ = 0: no agreement beyond chance
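A minimal computation from two raters' label sequences (the sequences below are made up):

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa between two raters' label sequences."""
    n = len(r1)
    pr_a = sum(a == b for a, b in zip(r1, r2)) / n   # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    pr_e = sum(c1[l] * c2[l] for l in set(c1) | set(c2)) / n ** 2  # chance
    return (pr_a - pr_e) / (1 - pr_e)

print(cohens_kappa("AABBA", "AABBB"))  # (0.8 - 0.48) / 0.52 ≈ 0.615
```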

Chebyshev

In an n-dimensional space, the maximum distance among the 1-D (coordinate) projections:


$$d(A, B) = \max\{|a_1 - b_1|, \dots, |a_n - b_n|\}$$

DICE coefficient

Similarity between two sets

For strings: the number of bigrams / trigrams shared by the two strings.

DSC

Dice similarity coefficient


$$DSC = \frac{2\,|U \cap V|}{|U|+|V|}$$
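A sketch of the bigram variant for strings; the `bigrams` helper is illustrative:

```python
def dice(u, v):
    """DSC = 2|U ∩ V| / (|U| + |V|) between two sets."""
    u, v = set(u), set(v)
    return 2 * len(u & v) / (len(u) + len(v))

# classic example: "night" vs "nacht" share only the bigram "ht"
bigrams = lambda s: {s[i:i + 2] for i in range(len(s) - 1)}
print(dice(bigrams("night"), bigrams("nacht")))  # 2*1 / (4+4) = 0.25
```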

Euclidean

In an n-dimensional space, with $\mathbf{x}_j = (a_{1, j}, \dots, a_{n, j})$:


$$d(\mathbf{x}_1, \mathbf{x}_2) = \sqrt{\sum^n_{i=1} (a_{i, 1} - a_{i,2})^2}$$

EPE: Expected Prediction Error


$$EPE = E\{Y - \hat f(x)\}^2 = E\{Y - f(x)\}^2 + \{E(\hat f(x)) - f(x)\}^2 + E\{\hat f(x) - E(\hat f(x))\}^2$$

$$EPE = Var(Y) + Bias^2 + Var(\hat f(x))$$

  • Var(Y): error of the measure (present even if the model is right)
  • Bias: misspecifying the model
  • Var(f̂(x)): error from using a finite sample (lack of continuity and of coverage of the space)
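A Monte Carlo sketch of the decomposition, under made-up assumptions: true f(x) = sin(x), noise σ = 0.3, and a deliberately misspecified linear fit, all evaluated at a single point x0 = 1:

```python
import numpy as np

rng = np.random.default_rng(0)
f, sigma, x0 = np.sin, 0.3, 1.0

preds = []
for _ in range(2000):
    x = rng.uniform(0, np.pi, 30)             # fresh training sample
    y = f(x) + rng.normal(0, sigma, x.size)   # noisy observations
    coeffs = np.polyfit(x, y, deg=1)          # misspecified linear model
    preds.append(np.polyval(coeffs, x0))      # prediction f_hat(x0)

preds = np.array(preds)
bias2 = (preds.mean() - f(x0)) ** 2           # {E(f_hat(x)) - f(x)}^2
var = preds.var()                             # Var(f_hat(x))
print(f"Var(Y)={sigma**2:.3f}  Bias^2={bias2:.3f}  Var(f_hat)={var:.3f}")
```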

Geodesic

A geodesic is a curve that follows the minimal-distance path.

γ : I → M, where I is the unit interval and M is the metric space.

γ is a geodesic if there exists v ≥ 0 such that every t ∈ I has a neighbourhood J ⊆ I where, for all (t1, t2) ∈ J:

d(γ(t1),γ(t2)) = v|t1 − t2|

Gini impurity

Distance from the predicted classification to the effective classification, for a set of J items whose classification value (before rounding) is fi. If fi is close to 0 or 1, the error related to item i is minimal; if fi is close to 0.5, i.e. chosen at random, the error is maximal.


$$I_G(f) = \sum_{i=1}^J f_i(1 - f_i)$$
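A direct translation of the formula over the class fractions fi of a node:

```python
import numpy as np

def gini_impurity(f):
    """I_G(f) = sum_i f_i * (1 - f_i) over the class fractions f_i."""
    f = np.asarray(f, dtype=float)
    return float(np.sum(f * (1 - f)))

print(gini_impurity([1.0, 0.0]))   # 0.0: pure node, minimal error
print(gini_impurity([0.5, 0.5]))   # 0.5: coin flip, maximal error
```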

Hamming

Distance in bits between two strings.

If they don't have the same size, the extra bits of the longest can be considered as 1.


$$d(x_1, x_2) = \sum_i XOR(x_{1,i}, x_{2,i})$$

XOR table:

XOR 0 1
  0 0 1
  1 1 0
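For equal-length bit strings, the distance is the popcount of the XOR; a minimal sketch:

```python
def hamming(x1, x2):
    """Number of differing bits between two equal-length bit strings."""
    return bin(int(x1, 2) ^ int(x2, 2)).count("1")

print(hamming("10110", "10011"))  # 2 (the XOR is 00101)
```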

Haversine

Distance between two points on a sphere.

First:


$$hav(\theta) = \sin^2 \left(\frac{\theta}{2}\right) = \frac{1 - \cos(\theta)}{2}$$


$$hav^{-1}(h) = 2 \arcsin(\sqrt{h})$$

Next:

  • d: great-circle distance (the unknown)
  • r : radius of the sphere
  • ϕ : latitude
  • λ: longitude


$$hav \left(\frac{d}{r}\right) = hav (\phi_2 - \phi_1) + \cos(\phi_1) \cos(\phi_2) hav(\lambda_2 - \lambda_1)$$
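A sketch in Python, with the Earth's mean radius as an assumed default; the Paris/London coordinates are rounded for illustration:

```python
from math import asin, cos, radians, sin, sqrt

def haversine(lat1, lon1, lat2, lon2, r=6371.0):
    """Great-circle distance between two (lat, lon) points, in km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    h = (sin(radians(lat2 - lat1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return r * 2 * asin(sqrt(h))   # d = r * hav^-1(h)

print(haversine(48.85, 2.35, 51.51, -0.13))  # Paris -> London, ~344 km
```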

Jordan center

For a set of points U,

$$JC(U) = \underset{u_s \in U}{\arg\min}\; \max_{u_i \in U} Dist(u_s, u_i)$$

It is the most central point, i.e., the one whose maximal distance to the other points is the lowest.

Question:

How to find it efficiently?

  • Is this center near the mean of the cluster? No: if too many points are concentrated away from the middle, the mean is pulled towards them and no longer matches the center.

A heuristic: find the maximal distance between all points, then take the point nearest the middle of this longest axis.
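The brute-force answer is O(n²): compute all pairwise distances, take each point's eccentricity (its maximal distance to the others), and keep the point minimising it. A sketch with Euclidean distances:

```python
import numpy as np

def jordan_center(points):
    """Point whose maximal distance to the other points is lowest."""
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    ecc = d.max(axis=1)            # eccentricity of each point
    return pts[ecc.argmin()]

print(jordan_center([(0, 0), (1, 0), (0, 1), (5, 5)]))  # [1. 0.]
```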

Kolmogorov complexity

Length of the shortest computer program, in a given language, that can generate the "string" / "byte string" from scratch.

Kullback-Leibler Divergence

Dissimilarity between two probability distributions.

Dissimilarity of Q according to P (note: it is not symmetric).


$$D_{KL}(P||Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$$


$$D_{KL}(P||Q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} dx$$
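A sketch for the discrete case (in nats); it assumes q > 0 wherever p > 0, and terms with p = 0 contribute 0:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) for discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # ≈ 0.511
print(kl_divergence([0.9, 0.1], [0.5, 0.5]))  # ≈ 0.368: not symmetric
```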

Levenshtein

Edit distance between strings: the number of characters to add, remove or change to transform one string into the other.

Example, with the operation codes:

  • I: insert
  • E: exchange
  • R: remove

Distance(alice, claie) = 3, via three exchanges:

  • a l i c e
  • c l a i e
  • * l * * e

(the starred positions are exchanged; l and e are kept)

To compare strings, there are three common approaches:

  • the Levenshtein algorithm, for short strings
  • the Jaccard algorithm
  • the TF-IDF algorithm

Levenshtein algorithm

TODO

Complexity

Uses a matrix of size (n + 1) × (m + 1).
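Until the TODO is filled in, a minimal sketch of the classic dynamic programme (Wagner-Fischer); it fills the matrix row by row, keeping only two rows in memory:

```python
def levenshtein(s, t):
    """Edit distance via the (len(s)+1) x (len(t)+1) DP matrix."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                # remove
                            curr[j - 1] + 1,            # insert
                            prev[j - 1] + (cs != ct)))  # exchange
        prev = curr
    return prev[-1]

print(levenshtein("alice", "claie"))  # 3
```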

Jaccard algorithm

TODO
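In the meantime, a sketch of the Jaccard similarity |U ∩ V| / |U ∪ V|, applied to character bigrams (the `bigrams` helper is the same illustrative one as for DICE above):

```python
def jaccard(u, v):
    """|U ∩ V| / |U ∪ V| between two sets."""
    u, v = set(u), set(v)
    return len(u & v) / len(u | v) if u | v else 1.0

bigrams = lambda s: {s[i:i + 2] for i in range(len(s) - 1)}
print(jaccard(bigrams("night"), bigrams("nacht")))  # 1/7 ≈ 0.14
```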

Manhattan


$$d(x_1, x_2) = \sum^n_{i=1} |a_{i, 1} - a_{i,2}|$$
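Manhattan, Euclidean and Chebyshev (above) are the p = 1, p = 2 and p = ∞ cases of the Minkowski distance; a quick check on made-up vectors:

```python
import numpy as np

x1, x2 = np.array([1.0, 2.0, 3.0]), np.array([4.0, 0.0, 3.0])
diff = np.abs(x1 - x2)                 # |a_{i,1} - a_{i,2}| = [3, 2, 0]

print(diff.sum())                      # Manhattan (p = 1):   5.0
print(np.sqrt((diff ** 2).sum()))      # Euclidean (p = 2):   ~3.61
print(diff.max())                      # Chebyshev (p = inf): 3.0
```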

MDS: Multidimensional scaling

Compares the similarity between two datasets of N-dimensional objects, trying to preserve the pairwise distances between objects in the embedding space.

  • e.g. for N = 2:

$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$

Stress / Strain: the loss function

$$Stress = \sqrt{\frac{\sum (f(x) - d)^2}{\sum d^2}}$$
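The stress-based variants are fitted iteratively; as a self-contained sketch, classical (Torgerson) MDS recovers coordinates in closed form from a distance matrix, by double-centering and eigendecomposition:

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed points in k dimensions from a pairwise-distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]      # top-k eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# three collinear points with distances 1, 1, 2
D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
print(classical_mds(D, k=1))              # ~[-1, 0, 1] up to sign
```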

t-Distributed Stochastic Neighbor Embedding (t-SNE)

Visualisation of x without any labels y: pairwise similarities are turned into probability distributions, and the embedding minimises the KL divergence between the high-dimensional and low-dimensional distributions.
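A minimal usage sketch with scikit-learn's implementation (the random data and parameter values are arbitrary):

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.rand(100, 50)   # 100 samples, 50 features, no labels y
emb = TSNE(n_components=2, perplexity=30).fit_transform(X)
print(emb.shape)              # (100, 2): ready to scatter-plot
```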

Spherical distance / great circle / orthodromic distance

Distance on the sphere

Φ, λ: latitude, longitude

Δσ: central angle between the two points, seen from the centre of the sphere

Δσ = arccos(sin Φ1 ⋅ sin Φ2 + cos Φ1 ⋅ cos Φ2 ⋅ cos Δλ)

d = rΔσ

Neighbourhood

Von Neumann

  • 1D: 2 adjacent pixels

X A X

  • 2D: 4 adjacent pixels

  X
X A X
  X

  • 3D: 6 adjacent voxels

On a hexagonal 2D grid: 6 neighbours; the same as the Moore neighbourhood in that case.

Moore

  • 2D: Chebyshev distance = 1 (8 adjacent pixels)

NW N NE
 W A E
SW S SE
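Both neighbourhoods as small helpers on an unbounded 2D grid (boundary handling is left out):

```python
def von_neumann(x, y):
    """4 cells at Manhattan distance 1."""
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def moore(x, y):
    """8 cells at Chebyshev distance 1."""
    return [(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)]

print(len(von_neumann(0, 0)), len(moore(0, 0)))  # 4 8
```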

Normalized compression distance

Paturi: clustering malware.

p, the shortest program to compute x from y or y from x.


|p|=max{K(x|y),K(y|x)}

K(x|y): Algorithmic information of x given y

To express a similarity, normalize by the larger of the two complexities, giving the Normalized Information Distance:


$$NID(x,y) = \frac{max\{K(x|y),K(y|x)\}}{max\{K(x),K(y)\}}$$

As K is not computable, approximate it with a compressor Z: Z(x) is the bit length of x compressed using a specific algorithm, such as gzip, PPMZ or others.


$$NCD_Z (x,y) = \frac{Z(xy) - min\{Z(x),Z(y)\}}{max\{Z(x),Z(y)\}}$$

NCD is robust to noise and parameter-free (apart from the choice of the compressor Z).
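A sketch with zlib as the compressor Z (any other compressor plugs in the same way):

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance with Z = zlib."""
    zx, zy = len(zlib.compress(x)), len(zlib.compress(y))
    zxy = len(zlib.compress(x + y))
    return (zxy - min(zx, zy)) / max(zx, zy)

print(ncd(b"abcabc" * 30, b"abcabc" * 30))     # near 0: identical content
print(ncd(b"abcabc" * 30, bytes(range(180))))  # much higher: dissimilar
```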

Normalized Google distance

Gives the semantic similarity of two words x and y:


$$NGD(x,y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x,y)}{\log N - \min\{\log f(x), \log f(y)\}}$$

Uses page hit counts:

  • f(x): number of pages containing x
  • f(x, y): number of pages containing both x and y
  • N: number of pages indexed
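The formula itself is easy to script once the hit counts are known; the counts below are invented for illustration, real values would come from a search engine:

```python
from math import log

def ngd(fx, fy, fxy, n_pages):
    """NGD from hit counts f(x), f(y), f(x, y) and index size N."""
    lx, ly, lxy = log(fx), log(fy), log(fxy)
    return (max(lx, ly) - lxy) / (log(n_pages) - min(lx, ly))

print(ngd(fx=1e8, fy=5e7, fxy=3e7, n_pages=1e12))  # ≈ 0.12
```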