In the previous post, I connected the dots between the different time series, which allowed me to recover the full time order and to get a unique identifier for each asset.
In this part, I will use external data to disambiguate the assets and understand the magic quantities.
The data and the details of the challenge are still available here.
The first thing we can do to learn what the two quantities md and bc are is to look at their distributions and their evolution over time.
Neither quantity looks like a usual distribution (Normal, Poisson, Cauchy, …). At least we know the range of the values.
For bc, the values lie mainly in [0, 1], but part of the distribution leaks below 0, so this might not be a rate in %.
Also, the values are oddly capped at 1, which doesn’t look natural given that negative values are possible.
bc might be a correlation coefficient; however, we don’t know how it is calculated.
For md, the first hypothesis was that it is the log of the true asset price relative to Bitcoin. Because no crypto-asset is more expensive than Bitcoin, the price ratio would be in \((0, 1]\), leading to log values in \((-\infty, 0]\).
As we know the relative time of each asset, we can see how these two quantities evolve over time.
For md, we get:
And for bc:
We can see that major events affect the md and bc time series globally.
This is very clear for md, where the lines seem parallel to each other, with a jump near week 20 and a progressive drop followed by a recovery between weeks 160 and 180.
The behavior of bc is clearly different.
It is less stable, with short-term events of large amplitude.
The dates of the events in md and bc don’t seem to be related.
Looking at the history of each asset helps to understand the long-term variations more precisely.
For md, because the top items are so close to each other, we expect them to belong to a single asset.
Showing the lineage confirms this hypothesis:
We can study bc the same way:
The results are noisier and more difficult to understand.
The value of bc is less stable over time.
To keep the plot legible, we only show assets with a history of at least 150 weeks.
We can see that even though the series are noisy, the “top” series remains the top one.
Selecting one asset at random, we get the following reconstructed signal:
The corresponding md values are:
You can see that the relationship between the asset value and the md value is not straightforward.
Given that the asset variations are quite large, one way to study them is to move to the log scale.
Superposing the two curves (with different scalings), we get the following:
Here, we can see how similar the two curves are. The shape is almost the same, and short-term events occur at the same time. Some differences remain visible over the long term, as the two curves do not match exactly.
Looking at other assets, we get similar results:
The hypothesis that md represents the mean price relative to Bitcoin is highly probable.
The observed difference may be due to the missing 24th point of each day.
Here, we used the natural log to convert the asset time series. As a reminder, given the previous results, the value of an asset at time \(t\) is:
\[X_t = X_0 \prod_{i=1}^t (1 + R_i)\]
where \(X_0\) is the value of the asset at time \(t_0\), and \(R_i\) is the return at time \(t_i\), i.e. \(R_i = \frac{X_i}{X_{i-1}} - 1\). Because we don’t know \(X_0\), we set it to \(1\) to study the behavior of \(\mathbf{X}\).
Moving to the log makes everything simpler:
\[\log(X_t) = \log(X_0) + \sum_{i=1}^t \log(1 + R_i)\]
To convert a log from one base to another, we have a simple formula:
\[\log_b(a) = \frac{\log(a)}{\log(b)}\]
The log of a value in base \(b\) is simply its natural log divided by the natural log of the base.
Here, we observed that the natural log gives the best match: \(md \approx A + \sum_{i=1}^t \log(1 + R_i)\), where \(A\) is a constant offset.
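To make the two formulas above concrete, here is a minimal sketch that rebuilds a price series from returns both directly and in the log domain, then rescales it to another base. The returns are synthetic (drawn at random for illustration, not taken from the challenge data), and the variable names are my own.

```python
import math
import random

random.seed(0)
# Hypothetical hourly returns R_i (synthetic, not the challenge data).
returns = [random.gauss(0.0, 0.01) for _ in range(500)]

# Direct reconstruction: X_t = X_0 * prod(1 + R_i), with X_0 set to 1.
x = 1.0
prices = []
for r in returns:
    x *= 1.0 + r
    prices.append(x)

# Log-domain reconstruction: log(X_t) = log(X_0) + sum(log(1 + R_i)).
log_x = 0.0  # log(X_0) = log(1) = 0
log_prices = []
for r in returns:
    log_x += math.log1p(r)
    log_prices.append(log_x)

# Largest gap between the two routes (floating-point error only).
max_gap = max(abs(math.log(p) - lp) for p, lp in zip(prices, log_prices))

# Changing the base only rescales the curve: log_b(a) = log(a) / log(b).
log10_prices = [lp / math.log(10) for lp in log_prices]
```

The rescaling in the last line changes the amplitude of the curve but not its shape, which is why the log base can be checked by comparing amplitudes.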
Centering the asset log series on the md mean, we can see that the scale is similar:
Because cryptocurrencies are “open systems”, prices are well recorded and freely available in multiple databases.
To confirm our hypothesis about the interpretation of md, we wanted to compare its values to true historical asset prices.
We found Coinmetrics, which provides a day-by-day history for many assets.
For each asset, we have its price relative to the Bitcoin price, and its price relative to USD or Euro.
There is much more information available, but we won’t exploit it here.
The only issue with this dataset is the sampling: Napoleon’s dataset is hourly, while the Coinmetrics database gives daily prices. We need to adjust for this by averaging our returns over one day.
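One possible way to bring an hourly series down to a daily one is sketched below. It is only an illustration under assumptions of mine: the returns are first compounded into prices, each day is assumed to hold a fixed number of points (`points_per_day`, set to 24 here), and the function name `daily_mean_prices` is hypothetical.

```python
from statistics import fmean


def daily_mean_prices(hourly_returns, points_per_day=24, x0=1.0):
    """Compound hourly returns into prices, then average them day by day."""
    prices = []
    x = x0
    for r in hourly_returns:
        x *= 1.0 + r
        prices.append(x)
    # Keep complete days only; a trailing partial day is dropped.
    n_days = len(prices) // points_per_day
    return [
        fmean(prices[d * points_per_day:(d + 1) * points_per_day])
        for d in range(n_days)
    ]


# Toy input: one flat day, then a 10 % jump on the first hour of day two.
hourly = [0.0] * 24 + [0.1] + [0.0] * 23
daily = daily_mean_prices(hourly)
```

Averaging prices rather than returns is a choice made for this sketch, because daily prices are what the external database provides.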
With this dataset in hand, we can quickly see whether our hypothesis is valid. The most valuable crypto-asset after Bitcoin is Ethereum. If we look at its price relative to Bitcoin, we get:
And if we move to the log scale, we get this signal.
Averaging over one week’s chunk and moving to the log scale, we get:
where the signal exactly overlaps the md values of the best asset.
We can now put a date on it:
We are quite happy with this result, but we needed a small adjustment.
We had to shift the time series up by 1.8 points, and we have no clear explanation for it.
Using the wrong log base would change the amplitude of the time-series movements. However, the amplitude was correct and didn’t need to be adjusted.
It means that \(MD(t) = A + \log(\text{ETH/BTC price}(t)) = \log(\exp(A) \times \text{ETH/BTC price}(t))\).
We do not clearly understand why this factor is there.
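The distinction between an offset and a wrong log base can be checked numerically. The sketch below uses made-up price ratios and builds an md-like series with an offset of 1.8; it then compares the amplitude against the natural log and the base-10 log, and estimates the offset as the mean gap. All values and names are illustrative assumptions.

```python
import math
from statistics import fmean, pstdev

# Hypothetical weekly price ratios (asset / BTC); synthetic values.
prices = [0.050, 0.055, 0.048, 0.060, 0.072, 0.065]
offset_true = 1.8  # same size as the shift reported for the top asset
md_like = [offset_true + math.log(p) for p in prices]

log_e = [math.log(p) for p in prices]
log_10 = [math.log10(p) for p in prices]

# Amplitude check: the ratio of standard deviations is 1 for the right base.
amp_ratio_e = pstdev(md_like) / pstdev(log_e)
# With base 10 the ratio equals ln(10), about 2.30, so the mismatch is visible.
amp_ratio_10 = pstdev(md_like) / pstdev(log_10)

# Offset estimate: mean gap between the md-like series and the natural-log price.
offset_hat = fmean([m - l for m, l in zip(md_like, log_e)])
```

An additive shift leaves the standard deviation untouched, so an amplitude ratio of 1 together with a non-zero mean gap points to an offset rather than a base problem.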
Additionally, the factor is not the same for all crypto-assets.
For XLM, we needed to readjust by 7 points, and for DOGE by 8.8 points.
The factor doesn’t seem to be related to the initial asset value.
Fortunately, we found that Ethereum is the top one, but Dash, for instance, which is pricey, has a negative coefficient of -0.69.
We did not find a general law to explain these coefficients.
It seems that, for each asset, a random factor was selected to transform the true price, so we cannot recover it trivially using the dataset alone.
To identify which dates best fit our dataset, we studied the cross-correlation.
To find the recording date of md, we tested averaging the asset’s values over one, two and three weeks.
We tested different starting days of the week, because day 1 is not necessarily a Monday.
Additionally, in some countries the first day of the week is Sunday, so we checked for that too.
The best averaging window was 1 week, and the best starting date was 2017-08-02 (i.e. averaging between this date and 2017-08-09).
The 2nd of August 2017 is a Wednesday. This date matches md for the earliest record.
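The search described above can be sketched as a small grid scan. This is not the original analysis code: the daily prices are a synthetic random walk, the md-like target is built from a hidden start day and a 1-week window, and the helper names `pearson` and `weekly_log_mean` are hypothetical. The scan then tries each start day and each window length and keeps the best correlation.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def weekly_log_mean(daily_prices, start, n_weeks, window_days=7):
    """Log of the mean price over a window starting every 7 days."""
    out = []
    for w in range(n_weeks):
        first = start + 7 * w
        chunk = daily_prices[first:first + window_days]
        out.append(math.log(sum(chunk) / len(chunk)))
    return out


random.seed(1)
# Synthetic daily prices following a random walk in log space.
log_p, daily_prices = 0.0, []
for _ in range(400):
    log_p += random.gauss(0.0, 0.05)
    daily_prices.append(math.exp(log_p))

# md-like target built with a hidden start day, a 1-week window and an offset.
true_start, n_weeks = 3, 50
md_like = [1.8 + v for v in weekly_log_mean(daily_prices, true_start, n_weeks)]

# Scan every start day and window length (1, 2 and 3 weeks).
scores = {
    (s, win): pearson(md_like, weekly_log_mean(daily_prices, s, n_weeks, win))
    for s in range(7)
    for win in (7, 14, 21)
}
best = max(scores, key=scores.get)
```

Because the correlation ignores additive offsets, the unknown shift between md and the log price does not disturb the search.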
To find the recording date of the asset value, we integrated our return series and sub-sampled it once per day:
(1 + R).cumprod()[::24]
We did not pay attention to the possible hour lag, as it is of limited interest.
By searching for the best cross-correlation over different chunks, we found that the best starting day is 2017-07-19.
Knowing that 216 weeks are recorded, the recording ends on 2021-09-07.
In other words, since 2017-08-02 falls exactly two weeks after 2017-07-19, md is the log of the asset’s average value over the 3rd week of a cluster.
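The date arithmetic behind these statements can be checked with the standard library. The sketch only uses the dates quoted in the text; the variable names are mine.

```python
from datetime import date, timedelta

start = date(2017, 7, 19)  # best starting day found for the asset series
n_weeks = 216              # number of recorded weeks

# Last recorded day: 216 full weeks after the start, minus one day.
last_day = start + timedelta(weeks=n_weeks) - timedelta(days=1)

# First day of the 3rd week, i.e. two weeks after the start.
third_week_start = start + timedelta(weeks=2)

# weekday() returns 0 for Monday, so 2 means Wednesday.
weekday_index = third_week_start.weekday()
```

Here `last_day` is 2021-09-07 and `third_week_start` is 2017-08-02, a Wednesday, which is the starting date found for md.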
Since we had found that md was the mean log price, we tried to fill in the missing 24 hours of each day by adjusting for the difference between the log returns and the md values. However, this approach was unsuccessful.
The gap between the returns and md is too large to be compensated using 5% of the dataset.
We obtained a strongly modified distribution.
Therefore, we did not try any submission with this information.