# Cost for MDP

## Discrete MDP

(S, A, C, π)

According to Bellman equation :

C(si)tot = C(si)+∑j ∈ SγA(i, j)C(sj)

γin(0, 1]

## Continuous-(time) MDP

γ is replaced by a kind of Poisson process expλt = γ(t)

# Cohen's Kappa

Distance between grade of observator when labelling data

$$\kappa = \frac{Pr(a) - Pr(e)}{1 - Pr(e)}$$

Pr(e) Random consensus Pr(a) Relative consensus between observator

κ = 1 Total accord κ = 0 No agreement

# Chebyshev

For a n space, max 1D-projection distance

max(A, B)=max{|a1 − bi|,...,|an − bn|}

# DICE coefficient

Similarity between two sets

number of bigram / trigram similar in two strings

# DSC

Dice similarity coefficient

$$DSC = \frac{|U \cap V|}{|U|+|V|}$$

# Euclidian

On a n space :

With xj = (a1, j, ..., an, j)

$$d(\mathbf{x}_1, \mathbf{x}_2) = \sqrt{\sum^n_i (a_{i, 1} - a_{i,2})^2}$$

# EPE: Expected Predicted Error

EPE = E{Y − (x)}2 = E{Y − f(x)}2 + {E((x)) − f(x)}2 + E{(x)−E((x))}2

EPE = Var(Y)+Bias2 + Var((x))

• Error of the measure (even if the model is right)
• Bias : Mispefying the model
• Error using a sample (lack of continuity and covering the space)

# Geodesic

A geodesic is a curve which follow the minimal distance path.

γ : I → M where I is the unit interval space and M is the metric space.

γ is a geodesic if there exist v ≥ 0∀t ∈ IJ|∀(t1, t2)∈J

d(γ(t1),γ(t2)) = v|t1 − t2|

# Giny impurity

Distance from predicted classification to effective classification, for a set of items J items, which classification value (before rounding) is fi. If fi is close to 0 or 1, the error related to item i is minimal. If fi is close to 0.5, ie randomly choosen, the error is maximal

$$I_G(f) = \sum_{i=1}^J f_i(1 - f_i)$$

# Hamming

Distance in bits between two strings.

If they don't have the same size, the extra bits of the longest can be considered as 1.

d(x1, x2)=∑XOR(x1, x2)

XOR table :

0 1
0 0 1
1 1 0

# Haversine

Distance of two points on a sphere.

First :

$$hav(\theta) = \sin^2 \left(\frac{\theta}{2}\right) = \frac{1 - \cos(\theta)}{2}$$

$$hav^{-1} (h) = 2 \arcsin(\sqrt(h))$$

Next :

• d : greatest distance (unknown)
• r : radius of the sphere
• ϕ : latitude
• λ: longitude

$$hav \left(\frac{d}{r}\right) = hav (\phi_2 - \phi_1) + \cos(\phi_1) \cos(\phi_2) hav(\lambda_2 - \lambda_1)$$

# Jordan center :

For a set of points U,

= JC(U)=argminus ∈ Umaxui ∈ UDist(us, ui)

It is the point which is the most central, ie., the maximal distance to the other point is the lowest.

Question :

How to find it efficiently ?

• is this center nearby the mean of the cluster ?? ==> No, as if too much point are concentrated elsewhere than in the middle, it does not work.

Find the maximal distance between all points, and find the point nearby the middle of this biggest axis

# Kolmogorov complexity

Length of the shortest computer programm that is able to generate the "string" / "byte string" from scratch in a given language.

# Kullback-Leibler Divergence

Dissimilarity between two probability distribution.

Dissimilarity of Q according to P

$$D_{KL}(P||Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$$

$$D_{KL}(P||Q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} dx$$

# Levenshtein

Distance in strings counting the number of characters between two words. Number of character to add/remove/change between two strings.

Exemple, with the code :

• I : insert
• exchange
• R remove

Distance(Alice, Claie) = 3

• a l i c e
• c l a i e
• l * * e

In order to calculate the distance, there are three common algorithms :

• The Levenshtein algorithm, for short length strings
• The Jaccard algorithm
• TF-IDF algorithm

## Levenshtein algorithm

TODO

### Complexity

Use a matrice of size (n + 1)×(m + 1)

TODO

# Manhattan

$$d(x_1, x_2) = \sum^n_i |a_{i, 1} - a_{i,2}|$$

# MDS: Multidimensional scaling

Compare the similarity between 2 datasets of N-Dimensional object. Try to preserve the distance between two objects in a distance space

• N = 2

$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$

Stress / Strain : Loss function

$Stress =\sqrt{\frac{\sum (f(x) - d)^2}{\sum d^2}}$

# t-Distributed Stochastic Neighbor Embedding (t-SNE)

Visualisation of x without any y by using similarity KL divergence, ...

# Sperical distance / Great circle / orthorombic distance

Distance on the sphere

Φ, λ: latitude, longitude

Δσ: angle at the middle of the sphere between two points

Δσ = 2arccos(sin Φ1 ⋅ sin Φ2 + cos Φ1 ⋅ cos Φ2 ⋅ cos Δλ)

d = rΔσ

# Neighbourhood

## Von Newman

• 1D : 2 adjacent pixels

X A Y

• 2D : 4 adjacent pixels
X
X A X
X
• 3D : 6 Adjacents pixels

And for voxels : 2D : Hexagonal, so 6 neighbours. Same distance than Moore in that case

## Moore

• 2D : Manhattant distance = 1
NW N NE
w A E
SW S SE

# Normalized compression distance

Paturi : clustering malware

p, the shortest program to compute x from y or y from x.

|p|=max{K(x|y),K(y|x)}

K(x|y): Algorithmic information of x given y

To express similarity, normalize by average, and obtain the Normalized Information Distance:

$$NID(x,y) = \frac{max\{K(x|y),K(y|x)\}}{max\{K(x),K(y)\}}$$

As the distance is not always computable, given a compressor Z. Z(x) is the bitlength of x compressed using a specific algorithm, such as gzip, PPMZ or others.

$$NCD_Z (x,y) = \frac{Z(xy) - min\{Z(x),Z(y)\}}{max\{Z(x),Z(y)\}}$$

NCD is robust to noise, and free of parameter (just the compressor Z).

$$NGD (x,y) = \frac{max\{log(f(x)), log(f(y))} - log(f(x,y))\}{log N - min\{log f(x), log f(y)\}}$$