Real-time anomaly detection for crypto transactions.
The run on deposits and subsequent liquidity crunch at FTX has sent shock waves through the crypto market. In one day, FTX’s token price dropped more than 80%, and FTX CEO Sam Bankman-Fried lost an estimated $14.6 billion – nearly 94% of his total wealth – according to the Bloomberg Billionaires Index.
Disclosure: I am not a crypto trader; I am a software engineer. But this rapid deterioration of a well-capitalized crypto platform got me thinking: how could streaming technology have been applied to help crypto traders see signals in data instantly, so as to potentially avoid catastrophic losses (or realize massive gains)? This is a classic streaming analytics use case: automating an action or alert to help make sense of massive amounts of data in real time. And while Binance CEO Changpeng Zhao’s tweet on November 6 (announcing plans to sell his company's roughly US$530 million holding of FTT, the native token of FTX) was clearly the initial move that triggered customer withdrawals, I think knowing the scale and speed of customer movements early would have been hugely valuable.
Fortunately I do streaming analytics as my day job. The product we’ve built at Timeplus helps users set up ultra-low-latency analytics quickly with a simple SQL query. So, I thought I’d try our solution for this trading scenario, as we’ve helped financial service institutions leverage our platform in similar applications (real-time pricing) in the past.
To get started, I connected the ETH transaction data into Timeplus to find out what happened and how real-time streaming analysis could have helped detect these critical signals and anomalies. Here is the definition of the ETH transaction stream:
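The original definition isn't reproduced here, but based on the fields the queries below rely on, a minimal sketch might look like this (the column types are assumptions, not the exact production schema):

```sql
-- Sketch of the ETH transaction stream definition (types are assumed).
CREATE STREAM eth_transactions
(
  `from` string,   -- sender address
  `to` string,     -- receiver address
  `value` string   -- transfer amount in wei, kept as a string
);
-- Timeplus adds the event-time column _tp_time automatically.
```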
Then I ran a simple query to monitor all the transactions in real time:
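The monitoring query itself isn't shown above; a minimal version, converting the wei-denominated value to ether on the fly, could be as simple as:

```sql
-- Stream each transaction as it arrives, converting wei to ether.
SELECT
  _tp_time,
  from AS sender,
  to AS receiver,
  cast(value, 'decimal(24, 0)') / 1000000000000000000 AS ether
FROM eth_transactions
```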
To build an anomaly detection algorithm, I thought I’d start with understanding the baseline pattern of the data. We can use simple queries to run an analysis on the historical data of this stream to establish the baseline, which will inform our understanding of what an anomaly will look like.
Historical Data Analysis
As FTX is a crypto trading platform, the typical things I want to analyze are how many users are buying or selling coins at a given point in time, and what the transaction amounts are.
Top transactions
The first thing I want to know is: what are the top transactions?
SELECT
  from AS sender,
  count_distinct(to) AS nreceivers,
  sum(cast(value, 'decimal(24, 0)') / 1000000000000000000) AS ether,
  window_start
FROM tumble(table(eth_transactions), 1d)
GROUP BY sender, window_start
ORDER BY ether DESC
LIMIT 10
The above query uses a one-day tumble window to aggregate all the transactions and returns the top 10 senders by total transaction value.
FTX, with the sender address `0xc098b2a3aa256d2140208c3de6543aaef5cd3a94`, ranked as the 5th largest sender by daily transaction value, which seems normal. Looking at hourly aggregations, FTX appears in the list as well.
But if we look at the number of receivers, something stands out.
With 7,637 distinct transaction receivers, FTX ranked 1st in the daily view, and in the hourly view FTX occupied the top 6 positions. The data shows that a large number of users were moving funds out of FTX.
Next, let’s take a look at the transaction trend of FTX. We can run the following query, which shows hourly transaction values using a one-hour tumble window.
SELECT
  from AS sender,
  count_distinct(to) AS nreceivers,
  to_float(sum(cast(value, 'decimal(24, 0)') / 1000000000000000000)) AS ether,
  window_start
FROM tumble(table(eth_transactions), 60m)
WHERE sender = '0xc098b2a3aa256d2140208c3de6543aaef5cd3a94'
GROUP BY sender, window_start
ORDER BY window_start ASC
According to this data, a sudden increase in transactions appears to start around 2:00 am on Nov 7, 2022. The trend line shows that spike.
Real-time Anomaly Detection
Historical analysis is good for retrospective investigation, finding the cause of issues after the fact. But when massive transaction data arrives at scale, real-time responsiveness is critical. That’s where Timeplus can help. Here’s how I built a real-time alerting engine using streaming SQL on our platform.
Detect anomalies based on a sudden increase or drop
The following streaming query calculates the difference between the aggregated transaction value of the current hour and that of the previous hour. If the difference is significant, a result is emitted and we can trigger an alert so the user can take action. In this case, we set the threshold to 3, which means the alert fires when the transaction value grows by more than 300% compared to the previous period.
SELECT
  ether, lag(ether) AS prev, (ether - prev) / prev AS increase
FROM
(
  SELECT
    from AS sender,
    count_distinct(to) AS nreceivers,
    to_float(sum(cast(value, 'decimal(24, 0)') / 1000000000000000000)) AS ether,
    window_start
  FROM tumble(eth_transactions, 1h)
  WHERE (sender = '0xc098b2a3aa256d2140208c3de6543aaef5cd3a94') AND (window_start > (now() - 3d))
  GROUP BY sender, window_start
  ORDER BY window_start ASC
)
WHERE (prev > 0) AND (increase > 3)
If we run the above query (the following result is based on a backfill replay of historical data), two alerts would have been triggered for FTX transactions on Nov 7. We captured those two sudden increases in our previous trend visualization:
Monitor millions of users at the same time
In the previous sample, the streaming query monitors just one specific trader, FTX. But a user could monitor all traders concurrently. This is challenging because there are about 3 million users in total and more than 1 million daily active users.
SELECT count_distinct(from) FROM table(eth_transactions) WHERE _tp_time > (now() - 1d)
The above query returns the number of active users in the past day: 1,211,487.
Here, using OVER (PARTITION BY sender), the query monitors all users at the same time, applying the same rule to trigger an alert when there is a sudden increase in transaction amount.
SELECT
  sender, ether, prev, (ether - prev) / prev AS increase
FROM
(
  SELECT
    window_start AS _tp_time, sender, ether,
    lag(ether) OVER (PARTITION BY sender) AS prev
  FROM
  (
    SELECT
      from AS sender,
      count_distinct(to) AS nreceivers,
      to_float(sum(cast(value, 'decimal(24, 0)') / 1000000000000000000)) AS ether,
      window_start
    FROM tumble(eth_transactions, 10s)
    GROUP BY sender, window_start
    ORDER BY window_start ASC
  )
)
WHERE (prev > 0) AND (increase > 3)
This query surfaces, in real time, every transaction with a significant value increase, across all the millions of traders.
Detect anomalies compared to historical reference points
In the previous queries, an anomaly is defined as a sudden change relative to the immediately preceding value. In time series data, however, seasonality is an important feature to consider: the presence of variations that recur at regular intervals shorter than a year, such as weekly, monthly, or quarterly. Seasonality may be caused by factors such as weather, vacations, and holidays, and consists of periodic, repetitive, and generally predictable patterns in the levels of a time series. So it is better to compare against values at seasonal intervals, such as the same hour of the previous day.
With lag functions, getting that data is pretty simple. Assuming the transactions have a daily seasonality, we use lag(ether, 24) over one-hour windows as the baseline to compare. The query is changed to:
SELECT
  sender, ether, prev, (ether - prev) / prev AS increase
FROM
(
  SELECT
    window_start AS _tp_time, sender, ether,
    lag(ether, 24) OVER (PARTITION BY sender) AS prev
  FROM
  (
    SELECT
      from AS sender,
      count_distinct(to) AS nreceivers,
      to_float(sum(cast(value, 'decimal(24, 0)') / 1000000000000000000)) AS ether,
      window_start
    FROM tumble(eth_transactions, 1h)  -- 1h windows, so lag(ether, 24) reaches back one day
    GROUP BY sender, window_start
    ORDER BY window_start ASC
  )
)
WHERE (prev > 0) AND (increase > 3)
This query triggers once the current transaction value has increased by more than 300% compared to the same hour yesterday. You can change the lag interval to 30 * 24 to use the same hour last month as the reference, or to 365 * 24 if the seasonality is yearly.
SELECT
  window_start, ether, lag(ether, 24) AS prev,
  (ether - prev) / prev AS increase,
  array_join([(ether, 'realtime'), (prev, 'history')]) AS v,
  v.1 AS value, v.2 AS tag
FROM
(
  SELECT
    from AS sender,
    count_distinct(to) AS nreceivers,
    to_float(sum(cast(value, 'decimal(24, 0)') / 1000000000000000000)) AS ether,
    window_start
  FROM tumble(table(eth_transactions), 1h)
  WHERE sender = '0xc098b2a3aa256d2140208c3de6543aaef5cd3a94'
  GROUP BY sender, window_start
  ORDER BY window_start ASC
)
Using an array function, the above query puts the real-time and baseline data into the same column so we can visualize both trends, making comparison easier.
In addition, I set up a notification monitoring system with a few clicks on our Timeplus platform. This allows me to send results to Slack or other notification channels right away, so action can be taken immediately. If a user had been running the streaming query I mentioned above, an alert email would have been triggered on the morning of Nov 7th. Action could clearly have been taken to avoid losses.
Another interesting topic would be analyzing the relationships among those transactions; we can talk about that in the future.
Summary
From start to finish, it took me about 1 hour to build this query. And while I am no trader, I think the results produced by this query indeed would have been a useful signal to trigger portfolio changes quickly, thereby potentially saving huge amounts of capital. I would love to hear your feedback about how this kind of streaming data analytics could be improved.
Please feel free to contact me to discuss, or visit our website to learn more about our approach to streaming analytics at Timeplus. Happy streaming!