Using Bitcoin's 2021 double top as an example: what is "future data leakage"?
When we backtest a strategy or evaluate a set of indicators, are we really standing in that moment? This article starts from two traps, "future data leakage" and "overfitting", and examines the severe test that the 2021 double top posed for on-chain analysis.

(Synopsis: Continuation of life market: analysis of the weirdest double top in 2021 with on-chain data)
(Background: On-Chain Data Academy (1): Do you know what is the average cost of BTC in the whole market?)

Key Points:
The concept of "look-ahead bias"
A major pitfall in trading: overfitting
The 2021 double top: the biggest test for "carving the boat to find the sword"
A review of the performance of three indicators and one model

Look-ahead bias

Imagine a scenario: "Suppose I develop a trading strategy, rigorously backtest it over the past 50 years (1975 ~ 2024), and get a very good backtest result. I therefore decide to launch the strategy in 2025."

Dear readers, can anyone see what is wrong with this description?

If you only backtest over 1975 ~ 2024, it is easy to fall into the trap of "future data leakage". Because we used all the data available "so far" both to tune the strategy and to evaluate it, the parameters we arrive at have very likely already "seen" the data they are being judged on.

A more rigorous approach is, for example, to train the strategy on 1975 ~ 2023 and then backtest its performance over the whole of 2024 (assuming that we are in 2024). Of course, you can also train the strategy on 1975 ~ 2022 and then backtest the trained strategy on 2023 and 2024.

More precisely, we can avoid "future data leakage" as far as possible through "rolling backtesting" or "walk-forward backtesting". The advantage is that it "ensures that the trained strategy can withstand the test of the future."
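The train-then-test idea described above can be sketched as a small split generator. This is only an illustration under my own assumptions: the function name `walk_forward_splits` and the parameters `min_train` and `test_size` are hypothetical, and it only produces the year ranges, not a strategy or its performance.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    years: List[int], min_train: int, test_size: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_years, test_years) pairs for an expanding-window backtest.

    Every test year comes strictly after every training year, so parameters
    fitted on the training window never see the data they are judged on.
    """
    start = min_train
    while start + test_size <= len(years):
        yield years[:start], years[start:start + test_size]
        start += test_size


years = list(range(1975, 2025))  # 1975 ~ 2024, as in the example above
splits = list(walk_forward_splits(years, min_train=48, test_size=1))

for train, test in splits:
    print(f"train {train[0]} ~ {train[-1]} -> test {test[0]} ~ {test[-1]}")
```

With these hypothetical settings the generator produces the two splits mentioned in the text: train on 1975 ~ 2022 and test on 2023, then train on 1975 ~ 2023 and test on 2024.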
Overfitting: the deadliest poison

Anyone with a basic grounding in quantitative strategy development knows how serious overfitting can be. Overfitting, in layman's terms, is "carving the boat to find the sword": it makes the backtest look good (low training error) but leaves the strategy hard to apply in live trading (high test error).

Here I will use a mathematical concept to explain (readers who get a headache from mathematics can skip straight to the next paragraph for the conclusion):

Suppose there is a sequence of numbers: "1, 2, 4, 8, 16, ?"

Readers with a little number sense will think the next number is 32, because the first five terms are obviously powers of 2. In fact, we cannot predict what the next number will be. From a mathematical point of view, we can use Lagrange interpolation to construct a higher-order polynomial that reproduces the first five terms exactly but whose sixth term is not 32.

This means: "Predictions extrapolated solely from a finite set of data points are unreliable."

The second top of 2021: the biggest test for most indicators

With the dry theory out of the way, let's turn to practice. Below, I take three on-chain indicators and a model that I developed myself and walk through them for readers:

1. MVRV

I believe readers who have studied on-chain data analysis even a little have heard of MVRV, and a previous article of mine covers this indicator in detail (On-Chain Data Academy (1): Do you know what is the average cost of BTC in the whole market?).

The chart above shows MVRV's historical data. The 1, 2, 3, and 4 marked on the chart mark the tops of the 2013, 2017, and 2021 cycles. We can clearly see that the MVRV high at the top of each cycle is "decreasing".
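Returning briefly to the sequence example from the overfitting discussion, the Lagrange interpolation argument can be checked numerically. The sketch below is my own illustration (the helper name `lagrange_eval` is hypothetical); it uses exact fractions so no rounding is involved.

```python
from fractions import Fraction


def lagrange_eval(points, x):
    """Evaluate the lowest-degree polynomial passing through `points` at x."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


# The five observed terms, indexed 1..5.
observed = [(1, 1), (2, 2), (3, 4), (4, 8), (5, 16)]

# The degree-4 polynomial through these points does not continue with 32.
next_value = lagrange_eval(observed, 6)
print(next_value)  # 31

# Any sixth value can be forced: add it as a sixth point and the resulting
# polynomial still reproduces all five observed terms exactly.
forced = observed + [(6, 100)]
fits_all = all(lagrange_eval(forced, x) == y for x, y in forced)
print(fits_all)  # True
```

Both polynomials fit the "training data" perfectly, yet they disagree with each other and with the powers-of-2 rule about the sixth term, which is the point of the overfitting warning.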
I have heard many people deal with the phenomenon of diminishing highs like this: "I know the highs are decreasing, so when judging the top I will pick a more conservative, lower threshold as the warning signal."

Now the question arises:
How do you set a more conservative threshold?
If we go back to April 2021 and can see only the historical data of 2013 and 2017, would the conservatively set threshold have been triggered in April 2021?
Would a threshold set this way have been triggered at the second top in 2021?
If you did not think April 2021 was the top, then you were even less likely to exit at the second top in 2021, right?

What I want to say is this: because the sample size of BTC historical data is too small, simply looking at the previous cycles is likely to lead into the trap of "future data leakage".

A person in April 2021 (the first top) would not know that the MVRV value at that time was in fact the highest point of that cycle, because he could only see the data of 2013 and 2017. Similarly, when the second top appeared in 2021, MVRV was at a very low level; anyone who had not exited at the first top could not have exited on the basis of the data at the second top, and so would have missed the best opportunity to exit in 2021.

2. AVIV

AVIV can be regarded as a corrected, more carefully constructed MVRV, and it shows more pronounced "mean reversion" than MVRV. Even so, the phenomenon of "diminishing peaks (highs)" is still obvious: the 1, 2, 3, and 4 marked in the figure mark the tops of the 2013, 2017, and 2021 cycles.

For the same questions, I will simply copy the text above for readers to consider:
How do you set a more conservative threshold?
If we go back to April 2021 and can see only the historical data of 2013 and 2017, would the conservatively set threshold have been triggered in April 2021?
Would a threshold set this way have been triggered at the second top in 2021?
If you did not think April 2021 was the top, then you were even less likely to exit at the second top in 2021, right?

3. RUP (Relative Unrealized Profit)

I have also introduced the RUP on-chain data in detail before; interested readers can refer to the following two articles:
On-Chain Data Academy (9): Market Barometer RUPL(I) - Data Introduction > Bottom Reading Application
On-Chain Data Academy (10): Market Barometer RUPL(II) - Strongest Top Signal & Historical Cycle Top Detailed Analysis

A reader once asked: "I can understand the logic of RUP divergence, but should we also consider the all-time highs that RUP has reached?"

The figure above shows the historical chart of RUP, and the 1, 2, 3, and 4 marked in the figure mark the tops of the 2013, 2017, and 2021 cycles. It can be seen that even though RUP has been normalized by market capitalization, the phenomenon of diminishing peaks remains.

One more round of soul-searching questions:
How do you set a more conservative threshold?
If we go back to April 2021 and can see only the historical data of 2013 and 2017, and the conservatively set threshold can be set at 21 ...