The cryptocurrency ecosystem generates vast amounts of high-velocity, mutable data. From Ethereum’s 2.7 billion transactions since 2015 to Solana’s 500TB of data and sub-second block times, the challenges of ingesting, processing, and querying this data are immense. Fragmented tooling, latency in analytics, and the need for cost efficiency plague organizations trying to derive real-time insights.
In this case study, we explore how Zyre, a leading cryptocurrency data platform, leveraged Timeplus Enterprise to overcome these challenges. While the use case centers on blockchain data, the lessons learned apply broadly to industries dealing with high-cardinality, mutable data and real-time analytics demands.
“We tried countless tools and rewrote our pipeline multiple times. Timeplus was the first solution that let us index the full Ethereum chain—and now, any chain—in real-time. It’s transformed how we deliver data to our customers.”
— Mathew Haji, Founder at Zyre

Why Crypto Data Engineering Is Hard
If you're a data engineer, you’re probably comfortable handling high-velocity data, managing complex ETL pipelines, and scaling infrastructure to meet demand. But crypto data? That’s a whole different beast. Unlike traditional financial data, blockchain data is decentralized, high-cardinality, constantly mutating, and often needs to be analyzed in real-time. If you’ve ever worked with a high-throughput transactional system, so-called “web scale”, imagine that—but at "chain scale."
To put things in perspective, the Ethereum Mainnet alone has processed over 2.7 billion transactions since 2015, with blocks being added every 12 seconds. Faster chains like Arbitrum (250ms block times) and upcoming networks like MegaETH (10ms block times) generate even more data. The sheer volume, combined with the complexity of blockchain state changes, presents unique technical challenges:
1. High Cardinality and Scale
Public blockchains generate an immense amount of structured event data. Every transaction, smart contract execution, and token transfer gets recorded permanently.
Consider this:
Ethereum Mainnet: 2.7 billion transactions, 2 billion accounts, 2.5 billion transfers.
Solana: 500TB of core data with sub-second block times.
Base & Arbitrum: Blocks produced as fast as 250ms, leading to terabytes of data growth.
Storing, indexing, and querying this data efficiently without breaking the bank is a significant challenge. A single chain can require petabytes of storage, and maintaining full historical datasets means ongoing infrastructure costs in the range of $4,000–$10,000 per month per chain on cloud providers (source).
2. Fast Data Mutation
Blockchain data isn’t just large—it’s complex. It contains high-cardinality attributes such as wallet addresses, transaction hashes, and smart contract interactions, making indexing and querying expensive. On top of that, the data isn’t static:
Frequent Reorganizations (Reorgs): Blocks can be invalidated and replaced, requiring real-time corrections to previously written data.
Backfills of Massive Datasets: Loading 8 billion+ rows (e.g., Ethereum’s historical data) in a reasonable timeframe (e.g., 12 hours) is non-trivial.
Fast Mutations & Event-Driven Updates: Blockchains are constantly updating. Handling 100ms block times means ingesting, mutating, and making data queryable instantly.
Traditional OLAP databases struggle with this level of real-time mutation, making incremental processing essential.
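At its core, the reorg problem above is an upsert problem. The following Python sketch is purely illustrative (it models the concept, not Timeplus internals): keying block records by height lets a reorg overwrite previously written data in place, instead of appending a conflicting duplicate.

```python
# Sketch: correcting a chain reorg with keyed upserts (illustrative only;
# the record fields here are hypothetical, not a Timeplus API).
blocks = {}  # primary key: block number -> latest block record

def ingest_block(number: int, block_hash: str, parent_hash: str, tx_count: int):
    """Upsert by block number: a reorg re-emits the same height with a new hash."""
    blocks[number] = {"hash": block_hash, "parent": parent_hash, "tx_count": tx_count}

# Normal ingestion
ingest_block(100, "0xaaa", "0x099", 150)
ingest_block(101, "0xbbb", "0xaaa", 200)

# Reorg: block 101 is replaced by a competing block with different contents
ingest_block(101, "0xccc", "0xaaa", 180)

assert blocks[101]["hash"] == "0xccc"   # old data corrected in place
assert blocks[101]["tx_count"] == 180
```

An append-only store would need a separate deduplication or compaction pass to achieve the same result, which is exactly the cost that mutable, keyed storage avoids.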
3. Real-Time Incremental Analytics, Without Full Recomputations
Unlike batch ETL systems where data is transformed in predefined intervals, crypto data requires continuous, incremental updates. Some critical use cases include:
Tracking Real-Time Asset Balances: A user’s wallet balance must be recalculated every time a transaction occurs—potentially millions of times per day.
Aggregating Market Data (OHLC): Crypto trading platforms need real-time price updates based on high-frequency swaps.
Detecting Anomalies & Fraud: Identifying suspicious transactions requires immediate insights across billions of historical records.
Running full table recomputations every few minutes isn't feasible, which is why a streaming-first approach is required.
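The incremental idea behind these use cases can be shown in a few lines of Python. This is a conceptual sketch, not Timeplus code: each incoming trade updates a per-minute OHLC candle in constant time, with no rescan of historical trades.

```python
# Sketch: incremental OHLC aggregation per (asset, minute) bucket.
# Each trade updates its candle in O(1) -- no recomputation over history.
candles = {}  # (asset, minute) -> dict(open, high, low, close, volume)

def on_trade(asset: str, ts: int, price: float, qty: float):
    key = (asset, ts // 60)  # 1-minute event-time bucket
    c = candles.get(key)
    if c is None:
        candles[key] = {"open": price, "high": price, "low": price,
                        "close": price, "volume": qty}
    else:
        c["high"] = max(c["high"], price)
        c["low"] = min(c["low"], price)
        c["close"] = price       # trades assumed to arrive in event-time order
        c["volume"] += qty

on_trade("ETH", 120, 2000.0, 1.0)
on_trade("ETH", 130, 2010.0, 0.5)
on_trade("ETH", 150, 1995.0, 2.0)

assert candles[("ETH", 2)]["high"] == 2010.0
assert candles[("ETH", 2)]["volume"] == 3.5
```

A streaming database applies the same pattern declaratively: the aggregation state lives in a materialized view and is patched per event rather than rebuilt per batch interval.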
4. High-Performance Queries, with Great Flexibility
Crypto applications like Zyre rely on both point queries and large-scale analytical queries. Some common ones include:
Fetching specific transactions or NFT balances instantly (e.g., user 0x4838…5f97).
Aggregating trading volumes for specific assets over different timeframes (e.g. over the last 7 days).
Analyzing wallet activity trends over time.
With high-cardinality datasets and constant updates, these queries must be optimized for performance—without requiring full table scans.
Why Existing Solutions Fell Short
Solving these challenges requires a specialized approach to database architecture.
Zyre evaluated several tools before choosing Timeplus and found limitations:
Apache Flink
Batch-oriented workflows couldn’t support real-time updates
Large footprint and high operational overhead
RisingWave
Using S3 as the primary storage is much slower than NVMe SSD
Because S3 was both too slow and too expensive, the team had to deploy its own MinIO cluster to reach higher throughput (100K EPS), which added further cost and maintenance effort
RisingWave was hours behind the message queues during backfills
No support for real-time queries with customizable SQL for alerting
ClickHouse
Struggled with fast point queries
Struggled with fast range queries
Inefficient data mutation and update
Lack of streaming computation and real-time emit
Timeplus: A Unified Real-Time Platform
After exploring various solutions, the Zyre team found that existing options couldn't fully address the unique challenges of crypto data engineering. Through close collaboration with the Timeplus team, they built an innovative solution that effectively tackles the four major challenges outlined above.
1. Mutable Streams for High-Cardinality, Fast-Mutating Data
Since EVM and other modern chains are general-purpose computing platforms, they support many different applications, each with its own event structure. (To learn more, check out this episode of our Streaming Caffeine podcast, Streaming Caffeine E3: With Yaroslav (Goldsky) and Jove (Timeplus), featuring insights from Yaroslav, then Principal Software Engineer at Goldsky, a leading crypto data platform.)
These applications require custom indexers to extract meaningful insights from raw blockchain data. Once indexed, events need to be streamed into Timeplus, where downstream tables are updated seamlessly.
The following diagram illustrates the data flow of the real-time pipelines:

Timeplus allows real-time pipelines to be reused for historical data backfills and reflows, making it easier to handle indexing errors or schema changes. Each Materialized View acts as a real-time pipeline, with a Mutable Stream as the pipeline output.
Supporting multiple chains is a significant challenge, especially when full-chain indexing can take weeks. Timeplus accelerates this process, enabling support for new chains in hours instead of weeks.
2. Secondary Indexes on Mutable Streams for High-Performance and Flexible Queries
Crypto data needs to be queried from multiple angles without duplicating data. Each mutable stream has one or more columns as the primary key, serving as the primary index. To improve query performance, secondary indexes can be defined on other columns, optimizing lookups for high-cardinality attributes, such as:
Fetching all actions for a specific contract or asset over the last 7 days
Retrieving all actions from a specific user within a given timeframe
Looking up actions by unique ID
Querying actions that occurred in a specific transaction
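The mechanics of a secondary index can be sketched in Python. This is an illustrative model, not how Timeplus implements indexes internally: a primary map serves point lookups by key, while secondary maps serve lookups on non-key attributes without a full scan, and both must stay consistent as rows mutate.

```python
# Sketch: primary index plus secondary indexes over mutable rows
# (illustrative only; names and fields are hypothetical).
from collections import defaultdict

actions = {}                     # primary index: action_id -> record
by_contract = defaultdict(set)   # secondary index: contract -> {action_id}
by_user = defaultdict(set)       # secondary index: user -> {action_id}

def upsert(action_id: str, contract: str, user: str, amount: int):
    old = actions.get(action_id)
    if old:  # keep secondary indexes consistent when a row mutates
        by_contract[old["contract"]].discard(action_id)
        by_user[old["user"]].discard(action_id)
    actions[action_id] = {"contract": contract, "user": user, "amount": amount}
    by_contract[contract].add(action_id)
    by_user[user].add(action_id)

upsert("a1", "0xdead", "0x123", 10)
upsert("a2", "0xdead", "0x456", 5)
upsert("a2", "0xbeef", "0x456", 5)   # mutation moves a2 between index entries

assert by_contract["0xdead"] == {"a1"}      # no scan over all actions needed
assert actions["a2"]["contract"] == "0xbeef"
```

The update path shows why secondary indexes are expensive on fast-mutating data: every mutation touches every index, which is the overhead a database must make cheap.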
3. Changelog-based Incremental Aggregation
Full table recomputations are inefficient for mutable blockchain data. Timeplus’ materialized views enable incremental aggregation, updating results as new events arrive instead of recomputing from scratch. When data changes (e.g., due to chain reorgs), all dependent tables update automatically, ensuring consistency without expensive recomputation.
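The changelog mechanics can be illustrated with a toy Python example (conceptual only, not Timeplus code): a mutation emits a retraction of the old row and an addition of the new row, so a running aggregate is patched in place rather than recomputed from all of history.

```python
# Sketch: changelog-driven incremental aggregation. A mutation produces a
# retraction (-1) of the old row and an addition (+1) of the new row.
balances = {}  # address -> running sum of transfer amounts

def apply_change(address: str, amount: int, sign: int):
    """sign = +1 adds a row's contribution, -1 retracts it."""
    balances[address] = balances.get(address, 0) + sign * amount

apply_change("0xabc", 100, +1)   # transfer observed
apply_change("0xabc", 40, +1)    # another transfer

# Reorg drops the 40-unit transfer and replaces it with a 25-unit one:
apply_change("0xabc", 40, -1)    # retract the invalidated row
apply_change("0xabc", 25, +1)    # add the replacement row

assert balances["0xabc"] == 125  # corrected without rescanning history
```

The same retract/add pairs propagate through chained materialized views, which is how downstream tables stay consistent after a reorg.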
4. Hybrid Aggregation Powered by Hybrid Hash Tables
High-cardinality data often requires significant memory overhead. Timeplus’ hybrid hash table architecture flushes aggregations to disk as needed, reducing memory usage by over 60% and making backfilling more efficient.
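The spill-to-disk idea can be sketched as follows. This toy Python class models the concept only (Timeplus’ actual hybrid hash table is far more sophisticated): when the in-memory aggregation state exceeds a budget, partial results are flushed to disk and merged at read time, trading some read cost for a bounded memory footprint.

```python
# Sketch: a hash aggregation that spills partial results to disk when an
# in-memory key budget is exceeded, then merges all partials on read.
import json
import os
import tempfile

class HybridSumTable:
    def __init__(self, max_keys_in_memory: int = 2):
        self.mem = {}
        self.max_keys = max_keys_in_memory
        self.spill_files = []

    def add(self, key: str, value: int):
        self.mem[key] = self.mem.get(key, 0) + value
        if len(self.mem) > self.max_keys:   # over budget: flush to disk
            fd, path = tempfile.mkstemp(suffix=".json")
            with os.fdopen(fd, "w") as f:
                json.dump(self.mem, f)
            self.spill_files.append(path)
            self.mem = {}

    def result(self):
        total = dict(self.mem)
        for path in self.spill_files:       # merge spilled partials
            with open(path) as f:
                for k, v in json.load(f).items():
                    total[k] = total.get(k, 0) + v
        return total

t = HybridSumTable(max_keys_in_memory=2)
for key, v in [("a", 1), ("b", 2), ("c", 3), ("a", 4), ("d", 5)]:
    t.add(key, v)

assert t.result() == {"a": 5, "b": 2, "c": 3, "d": 5}
```

Because sums merge associatively, the partial states can be combined in any order, which is what makes this kind of spilling safe for aggregation workloads and backfills alike.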

10x Cost Efficiency, Onboard New Chains in Hours vs Weeks
For the first time, the company successfully indexed the entire Ethereum chain, something previous solutions failed to achieve. The team can now onboard new blockchains like Base and Arbitrum in hours instead of weeks. Peak indexing throughput reached 700K EPS, seven times the 100K EPS achieved with RisingWave. Query performance has improved dramatically, with sub-second point queries and range aggregations completing in under 0.1 seconds. Additionally, infrastructure costs have been cut by 50%: the S3 request penalties are gone, and the deployment uses half the server resources of the previous RisingWave setup.
It’s Not Just for Crypto
While this case study focuses on crypto, the lessons apply to any domain with high-velocity, mutable data:
Real-Time Reactivity: Handle data mutations (e.g., IoT sensor corrections, fraud detection rollbacks) without batch delays.
Unified Analytics: Replace fragmented OLAP + key-value systems with a single platform.
Cost Efficiency: Optimized storage and compute reduce TCO for petabyte-scale datasets.
Zyre's success underscores Timeplus Enterprise’s ability to unify real-time ingestion, mutable data handling, and flexible querying at scale. As data volumes grow and latency tolerances shrink, streaming databases are no longer a luxury—they’re a necessity.
Ready to try Timeplus Enterprise? Try it free for 30 days.
Join our Timeplus Community! Connect with other users or get support in our Slack community.