(S, A, C, π)
According to Bellman equation :
C(si)tot = C(si)+∑j ∈ SγA(i, j)C(sj)
γin(0, 1]
γ is replaced by a kind of Poisson process exp−λt = γ(t)
Distance between grade of observator when labelling data
$$\kappa = \frac{Pr(a) - Pr(e)}{1 - Pr(e)}$$
Pr(e) Random consensus Pr(a) Relative consensus between observator
κ = 1 Total accord κ = 0 No agreement
For a ℝn space, max 1D-projection distance
max(A, B)=max{|a1 − bi|,...,|an − bn|}
Similarity between two sets
number of bigram / trigram similar in two strings
Dice similarity coefficient
$$DSC = \frac{|U \cap V|}{|U|+|V|}$$
On a ℝn space :
With xj = (a1, j, ..., an, j)
$$d(\mathbf{x}_1, \mathbf{x}_2) = \sqrt{\sum^n_i (a_{i, 1} - a_{i,2})^2}$$
EPE = E{Y − f̂(x)}2 = E{Y − f(x)}2 + {E(f̂(x)) − f(x)}2 + E{f̂(x)−E(f̂(x))}2
EPE = Var(Y)+Bias2 + Var(f̂(x))
A geodesic is a curve which follow the minimal distance path.
γ : I → M where I is the unit interval space and M is the metric space.
γ is a geodesic if there exist v ≥ 0∀t ∈ I∃J|∀(t1, t2)∈J
d(γ(t1),γ(t2)) = v|t1 − t2|
Distance from predicted classification to effective classification, for a set of items J items, which classification value (before rounding) is fi. If fi is close to 0 or 1, the error related to item i is minimal. If fi is close to 0.5, ie randomly choosen, the error is maximal
$$I_G(f) = \sum_{i=1}^J f_i(1 - f_i)$$
Distance in bits between two strings.
If they don't have the same size, the extra bits of the longest can be considered as 1.
d(x1, x2)=∑XOR(x1, x2)
XOR table :
0 | 1 | |
---|---|---|
0 | 0 | 1 |
1 | 1 | 0 |
Distance of two points on a sphere.
First :
$$hav(\theta) = \sin^2 \left(\frac{\theta}{2}\right) = \frac{1 - \cos(\theta)}{2}$$
$$hav^{-1} (h) = 2 \arcsin(\sqrt(h))$$
Next :
$$hav \left(\frac{d}{r}\right) = hav (\phi_2 - \phi_1) + \cos(\phi_1) \cos(\phi_2) hav(\lambda_2 - \lambda_1)$$
For a set of points U,
û = JC(U)=argminus ∈ Umaxui ∈ UDist(us, ui)
It is the point which is the most central, ie., the maximal distance to the other point is the lowest.
Question :
How to find it efficiently ?
Find the maximal distance between all points, and find the point nearby the middle of this biggest axis
Length of the shortest computer programm that is able to generate the "string" / "byte string" from scratch in a given language.
Dissimilarity between two probability distribution.
Dissimilarity of Q according to P
$$D_{KL}(P||Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$$
$$D_{KL}(P||Q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} dx$$
Distance in strings counting the number of characters between two words. Number of character to add/remove/change between two strings.
Exemple, with the code :
Distance(Alice, Claie) = 3
In order to calculate the distance, there are three common algorithms :
TODO
Use a matrice of size (n + 1)×(m + 1)
TODO
$$d(x_1, x_2) = \sum^n_i |a_{i, 1} - a_{i,2}|$$
Compare the similarity between 2 datasets of N-Dimensional object. Try to preserve the distance between two objects in a distance space
$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$
Stress / Strain : Loss function
$Stress =\sqrt{\frac{\sum (f(x) - d)^2}{\sum d^2}}$
Visualisation of x without any y by using similarity KL divergence, ...
Distance on the sphere
Φ, λ: latitude, longitude
Δσ: angle at the middle of the sphere between two points
Δσ = 2arccos(sin Φ1 ⋅ sin Φ2 + cos Φ1 ⋅ cos Φ2 ⋅ cos Δλ)
d = rΔσ
X A Y
X | ||
---|---|---|
X | A | X |
X |
And for voxels : 2D : Hexagonal, so 6 neighbours. Same distance than Moore in that case
NW | N | NE |
---|---|---|
w | A | E |
SW | S | SE |
Paturi : clustering malware
p, the shortest program to compute x from y or y from x.
|p|=max{K(x|y),K(y|x)}
K(x|y): Algorithmic information of x given y
To express similarity, normalize by average, and obtain the Normalized Information Distance:
$$NID(x,y) = \frac{max\{K(x|y),K(y|x)\}}{max\{K(x),K(y)\}}$$
As the distance is not always computable, given a compressor Z. Z(x) is the bitlength of x compressed using a specific algorithm, such as gzip, PPMZ or others.
$$NCD_Z (x,y) = \frac{Z(xy) - min\{Z(x),Z(y)\}}{max\{Z(x),Z(y)\}}$$
NCD is robust to noise, and free of parameter (just the compressor Z).
Give the semantic similarity of two words x and y:
$$NGD (x,y) = \frac{max\{log(f(x)), log(f(y))} - log(f(x,y))\}{log N - min\{log f(x), log f(y)\}}$$
Use the page hit count f(x) : number of page containing x f(x, y): number of page containing x AND y N : number of pages indexed