WHY TIMEPLUS?

Timeplus vs. Apache Flink

See why developers are choosing Timeplus over Apache Flink.

WHAT IS APACHE FLINK?

Apache Flink is a distributed stream processing engine that supports stateful transformations on bounded and unbounded sources of data, such as Apache Kafka and AWS Kinesis. Results of computations can be written out to external sinks such as MongoDB and Elasticsearch.

WHO IS APACHE FLINK FOR?

Apache Flink primarily serves users who want a distributed computation layer that reads data from streaming or batch sources, transforms it as it arrives, and pushes the results to other data systems downstream.

Flink originally offered Java/Scala APIs, so it catered to software developers more than data engineers. More recently, Flink SQL was added to let data engineers define streaming jobs in SQL.

HIGH LEVEL DESIGN

Flink is designed as a distributed computation engine: jobs written in Java/Scala or SQL are compiled into tasks that are scheduled across multiple machines to perform a series of data transformations. Flink itself does not store queryable data, although it keeps internal computational state in a configurable state backend such as RocksDB.

CHALLENGES WITH APACHE FLINK

Inefficiency of JOIN Operations

Since Flink does not store data locally, JOINs require reading and broadcasting data to different tasks, which can be inefficient compared to local lookups.
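As a rough illustration of why moving data between tasks costs more than a local lookup, here is a minimal Python sketch of a hash-partitioned (shuffle) join over simulated tasks. It is not Flink or Timeplus code: the task count, tuple layout, and function names are hypothetical, and it models shuffling rather than broadcasting (broadcasting instead copies one side to every task). Records whose key is owned by a different task are counted as network transfers; a join against a locally stored index moves nothing.

```python
from collections import defaultdict


def partition_of(key, num_tasks):
    # Route an integer key to the task that owns it.
    return hash(key) % num_tasks


def shuffle_join(left, right, num_tasks):
    """Hash-partitioned join over simulated tasks.

    `left` and `right` are lists of (source_task, key, value) tuples. Every
    record is routed to the task owning its key; a record whose owner differs
    from its source task counts as one network transfer.
    Returns (joined_rows, records_moved).
    """
    moved = 0
    # Per task: an index built from the right side, plus the left-side probes.
    tasks = [(defaultdict(list), []) for _ in range(num_tasks)]
    for source_task, key, value in right:
        owner = partition_of(key, num_tasks)
        if owner != source_task:
            moved += 1
        tasks[owner][0][key].append(value)
    for source_task, key, value in left:
        owner = partition_of(key, num_tasks)
        if owner != source_task:
            moved += 1
        tasks[owner][1].append((key, value))
    joined = []
    for index, probes in tasks:
        for key, lvalue in probes:
            for rvalue in index.get(key, []):
                joined.append((key, lvalue, rvalue))
    return joined, moved


def local_lookup_join(left, right_index):
    """Join against an index that is already stored locally: nothing moves."""
    joined = [(k, lv, rv) for _, k, lv in left for rv in right_index.get(k, [])]
    return joined, 0


# Hypothetical data: (source_task, customer_id, payload)
orders = [(0, 1, "order-a"), (0, 2, "order-b"), (1, 3, "order-c")]
customers = [(0, 1, "alice"), (1, 2, "bob"), (0, 3, "carol")]
joined, moved = shuffle_join(orders, customers, num_tasks=2)
print(moved)  # 4 records crossed task boundaries
```

Both joins return the same rows; only the shuffle variant pays for moving records between tasks.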

Lack of External Queries for Derived Data

Flink does not support querying derived data externally without recomputing it, unlike streaming databases that allow querying from pre-computed results.
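The difference between recomputing and querying pre-computed results can be sketched in a few lines of Python. This is a hypothetical, in-memory illustration (the class and function names are invented here, not a Flink or Timeplus API): the view updates a per-key count as each event arrives, so a query is a lookup, while the alternative rescans every source event for every query.

```python
from collections import Counter


class IncrementalCountView:
    """Hypothetical materialized aggregate: event count per key,
    maintained as events arrive so queries read stored results."""

    def __init__(self):
        self._counts = Counter()

    def on_event(self, key):
        # Constant-time update for each arriving event.
        self._counts[key] += 1

    def query(self, key):
        # Served from the pre-computed state; missing keys count as 0.
        return self._counts[key]


def recompute_count(events, key):
    # Without stored derived data, each query rescans all source events.
    return sum(1 for event in events if event == key)


events = ["click", "view", "click", "buy", "click"]
view = IncrementalCountView()
for event in events:
    view.on_event(event)

print(view.query("click"), recompute_count(events, "click"))  # 3 3
```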

Need to Write Computed Results to Sinks

Aggregations and computations in Flink are created within Flink itself but need to be written to external sinks (like relational databases or search engines) for further querying and aggregation.

Enter Timeplus

Timeplus is built from the ground up in C++ on database technology (ClickHouse), extended for stream processing. Under the hood, it uses ClickHouse libraries and data structures for extremely fast database operations such as filtering, projection, and aggregation.

For stream processing, Timeplus has a native stream data structure as a first-class citizen, which does not need to be coupled with Apache Kafka, although it can integrate with Kafka if required. Timeplus also has its own internal data format, which avoids serializing and deserializing data at multiple steps of the processing pipeline, unlike general-purpose systems such as Apache Flink. The result is a simpler, more performant system for data ingestion, processing, and analytics, all in a single binary. Data products created within Timeplus can be queried ad hoc in Timeplus itself or streamed out to external systems, so Timeplus integrates easily with the wider ecosystem of Apache Kafka and database/BI tools.
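To make the serialization point concrete, the following Python sketch runs the same three stages two ways: once encoding to bytes and decoding again at every stage boundary, as happens when stages run in separate tasks or systems, and once chaining the stages on in-memory data. JSON stands in for whatever wire format a real system uses, and the stage functions are hypothetical; the sketch does not model Flink's or Timeplus's actual serializers.

```python
import json


def stage_filter(rows):
    return [r for r in rows if r["amount"] > 10]


def stage_project(rows):
    return [{"user": r["user"], "amount": r["amount"]} for r in rows]


def stage_total(rows):
    totals = {}
    for r in rows:
        totals[r["user"]] = totals.get(r["user"], 0) + r["amount"]
    return [{"user": u, "total": t} for u, t in sorted(totals.items())]


STAGES = [stage_filter, stage_project, stage_total]


def run_with_serde(rows):
    """Encode and decode at every stage boundary.
    Returns (result, total_bytes_encoded)."""
    encoded_bytes = 0
    for stage in STAGES:
        payload = json.dumps(rows).encode("utf-8")  # serialize for the hop
        encoded_bytes += len(payload)
        rows = stage(json.loads(payload))           # deserialize, then process
    return rows, encoded_bytes


def run_in_process(rows):
    """Same stages chained on in-memory data: no encoding between them."""
    for stage in STAGES:
        rows = stage(rows)
    return rows


rows = [
    {"user": "a", "amount": 5, "note": "x"},
    {"user": "a", "amount": 20, "note": "y"},
    {"user": "b", "amount": 15, "note": "z"},
    {"user": "a", "amount": 30, "note": "w"},
]
print(run_in_process(rows))  # [{'user': 'a', 'total': 50}, {'user': 'b', 'total': 15}]
```

Both paths produce the same result; the serializing path additionally pays to encode and decode every intermediate result.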

WHAT PAIN POINTS OF APACHE FLINK CAN WE ADDRESS?

Efficient Query and Aggregation Capabilities

Timeplus supports rich ad-hoc queries, including multi-way JOINs, and allows for quick lookups by any column. It handles queries and aggregations using historical stores optimized for row-based or column-based queries, significantly reducing the need for full recomputation and delivering query results in milliseconds or seconds, compared to the minutes or hours required by Flink.

Enhanced Performance and Resource Efficiency

Timeplus operates independently of Apache Kafka and uses performance optimizations to process larger volumes of data with fewer hardware resources. By grouping and querying data more effectively, Timeplus avoids the latency added by the multiple stages of a Flink pipeline, while supporting more SQL functionality and simplifying enterprise adoption.

Local Management and Optimizations

Timeplus eliminates the need for external coordination services such as Apache ZooKeeper by managing all backup and replication internally within its cluster. Because it builds on technologies like ClickHouse, Timeplus also benefits from SQL-based query optimizers and avoids the overhead of writing results to downstream systems.

HOW DOES APACHE FLINK COMPARE WITH TIMEPLUS?

Apache Flink offers Java, Scala, and Python as its main languages, plus a more recently added SQL API, along with integration with Apache Kafka, stateful processing, scalability, and strong security features. However, it has limitations: setup and maintenance are complex, and it needs much more hardware and resources to reach the throughput Timeplus achieves.

The design of its computational framework requires data to be shuffled (often over the network), which adds lag to computing results. Timeplus’s design is much simpler: it avoids unnecessary serialization and deserialization, as well as unnecessary shuffling of data over the network, when producing the results of each streaming “job”.

Is it simple?

APACHE FLINK

Requiring multiple systems for ingestion as well as downstream querying

Apache Flink requires at least an upstream ingestion system such as Apache Kafka and at least one downstream system to output computed results so that they can be further queried or analyzed.

TIMEPLUS

Simple and flexible with native and external streams

Timeplus has native streams, which can be populated from SDKs, data integration frameworks, or files, so one can start with Timeplus alone. Timeplus also supports Kafka topics as external streams, which can serve as sources or as targets that derived streams write to. When an external Kafka stream is used as a source, its data does not need to be stored in Timeplus unless a derived materialized view is created. Timeplus is not tied to a single Kafka cluster; it can read from and write to different Kafka clusters in the same way.

Is it efficient?

APACHE FLINK

Heavy Resource Consumption

Every job in Flink (whether written in Java/Scala or Flink SQL) creates a number of tasks that are distributed across many machines, and the data is pushed to these tasks. This is inefficient and adds latency compared to Timeplus, which operates on data that is queried together from local stores. In addition, Apache Flink checkpoints the state of computational jobs to durable external storage such as S3 (with working state kept in a state backend such as RocksDB), which can take minutes or even hours in some cases, depending on the size of the computational state.

TIMEPLUS

Lightweight and Efficient

Timeplus is lightweight, written in C++, and uses highly performant ClickHouse libraries and data structures to power historical queries alongside streaming queries.

Timeplus uses a dedicated internal format optimized for SIMD (single instruction, multiple data) CPU instructions and can process over 1 million records per second on a commodity computer. In Timeplus, checkpoints happen locally and are very cheap, because all computed data is already stored in efficient, queryable state stores.
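A minimal Python sketch of the columnar idea, assuming a hypothetical three-field sensor record (this is not Timeplus's internal format): rows are pivoted into one contiguous typed buffer per column, so an aggregate over one field reads only that field's buffer. Python itself will not emit SIMD instructions; the point is the layout, which is what lets a native engine vectorize such loops.

```python
from array import array

# Row-oriented: each record keeps all of its fields together.
readings = [
    ("sensor-1", 1700000000, 21.5),
    ("sensor-2", 1700000001, 19.0),
    ("sensor-1", 1700000002, 22.5),
]


def to_columns(rows):
    """Pivot rows into one contiguous typed buffer per column."""
    ids, timestamps, values = zip(*rows)
    return {
        "id": list(ids),
        "ts": array("q", timestamps),  # contiguous 64-bit signed integers
        "value": array("d", values),   # contiguous 64-bit floats
    }


def avg_row_layout(rows):
    # Walks record by record; in a row layout the other fields sit
    # between consecutive values of this column.
    return sum(r[2] for r in rows) / len(rows)


def avg_column_layout(columns):
    # Reads only the contiguous 'value' buffer; in a native engine this is
    # the kind of tight loop a compiler can vectorize with SIMD.
    col = columns["value"]
    return sum(col) / len(col)


columns = to_columns(readings)
print(avg_row_layout(readings), avg_column_layout(columns))  # 21.0 21.0
```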

Is it for analytic workloads?

APACHE FLINK

Not Designed for Performance

While data can be queried ad hoc with aggregations using Flink SQL, the underlying engine does not use database technology and has to recompute all results from source data. As a result, some queries take seconds to minutes, versus milliseconds in Timeplus. In the worst case, ad-hoc queries cannot be answered in a reasonable time for data analysts or dashboards.

TIMEPLUS

Supports Rich Row-Based and Analytical Queries

Timeplus supports queries that filter on both primary-key and non-primary-key columns. It gives end users flexibility, with data structures that further speed up row retrieval via indices and column families, as well as columnar formats for blazing-fast analytics. As such, it is a perfect fit for data analysts who use SQL query browsers or BI/dashboard tools to consume analytical results in milliseconds or seconds and keep fresh views of the business.
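How a primary-key map plus a secondary index lets lookups on a non-primary-key column avoid a full scan can be sketched in Python. This is a generic, hypothetical illustration (the `IndexedTable` class and its methods are invented here), not how Timeplus implements its indices or column families.

```python
from collections import defaultdict


class IndexedTable:
    """Hypothetical table with a primary-key map and one secondary index."""

    def __init__(self, secondary_column):
        self.secondary_column = secondary_column
        self.by_pk = {}                  # primary key -> row
        self.secondary = defaultdict(set)  # column value -> primary keys

    def upsert(self, pk, row):
        old = self.by_pk.get(pk)
        if old is not None:
            # Keep the secondary index consistent when a row is replaced.
            self.secondary[old[self.secondary_column]].discard(pk)
        self.by_pk[pk] = row
        self.secondary[row[self.secondary_column]].add(pk)

    def get(self, pk):
        # Primary-key lookup: a single dictionary access.
        return self.by_pk.get(pk)

    def find_by(self, value):
        # Non-primary-key lookup served from the secondary index, no scan.
        return [self.by_pk[pk] for pk in sorted(self.secondary.get(value, ()))]


table = IndexedTable("city")
table.upsert(1, {"name": "ann", "city": "Oslo"})
table.upsert(2, {"name": "bo", "city": "Lima"})
table.upsert(3, {"name": "cy", "city": "Oslo"})
table.upsert(2, {"name": "bo", "city": "Oslo"})  # update moves row 2 to Oslo
print([r["name"] for r in table.find_by("Oslo")])  # ['ann', 'bo', 'cy']
```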

Does it support UDFs?

APACHE FLINK

Java/Python UDFs

Apache Flink can be programmed in Java/Scala for flexible computations, and UDFs/UDAFs can be written in Java and Python. Deploying them involves uploading files to the workers rather than defining them via SQL.

TIMEPLUS

SQL/JavaScript/Python-Based UDFs

Timeplus supports three different ways to create UDFs and UDAFs, each leveraging different functionality and libraries. They are easy to deploy and let data engineers bring their existing code to perform complex functions, in addition to the hundreds of built-in functions from ClickHouse.

Does it support integrations with applications or dashboards?

APACHE FLINK

Limited Integrations

Although Apache Flink does have integrations such as JDBC connector/gateway, its usage is complex and the results are often not queryable within reasonable time for operational applications or dashboards. As such, the recommended practice is to use the Apache Flink connectors to write the data to Apache Kafka, dedicated databases or warehouses first before being consumed by other systems.

TIMEPLUS

Grafana Plugins and Various Drivers/SDKs

Timeplus includes a Grafana plugin that refreshes automatically and is configured with SQL queries. JDBC drivers make it easy for microservices written in Java or existing BI tools to use the data in Timeplus. SDKs for JavaScript, Go, and Python are also available for operational applications that want to integrate with refined data in Timeplus, and WebSocket APIs make Timeplus particularly powerful for developing live data applications.

Does it integrate with external databases?

APACHE FLINK

Connectors to External Systems

There are a number of connectors to external systems that support Flink's transactional model. The best practice is generally to write the data to Apache Kafka, from which hundreds of connectors can push it to external databases.

TIMEPLUS

Timeplus Connectors with Native Streams

Timeplus Connectors is a component that integrates various systems (even message queues) with Timeplus streams. These include Pulsar, NATS, and WebSockets, and it can be extended to hundreds of other systems if required. Data is pushed or pulled directly from Timeplus, so fewer hops are required and duplication is minimized.

Feature | Apache Flink | Timeplus
--- | --- | ---
License | Apache 2.0 | Timeplus Proton: Apache 2.0. An enterprise license with advanced features is also available.
Language | Java/Scala | C++
Resource consumption | High | Low
Stateful stream processing | Yes | Yes
Bounded, pull-based query | Yes | Yes
Unbounded, continuous push-based query | Yes | Yes
Materialized view and table concept | No (can be done with another system) | Yes (on top of ClickHouse)
Kafka connection | Yes (source and sink) | Yes (external stream)
Support for other messaging systems | Kafka, Kinesis, RabbitMQ, Pulsar | Pulsar, Kinesis, and more (supported, but not a dependency)
Cluster and HA | Yes | Yes
User-defined functions | Java, Python | SQL, JavaScript, Python
Throughput | Good, but requires many machines to achieve high throughput | Millions of events ingested and processed per second
Latency | 100 milliseconds, seconds, or minutes | Milliseconds
Security | Yes, role-based access control | Yes, Timeplusd authentication/authorization based on ClickHouse users and roles
Expertise required to run | High | Low
Expertise required to develop | High (Java, Scala, Python) | Low/medium (SQL)
Key enterprise features | - | Clustering/replication based on Raft; mutable streams; integration with sources (WebSocket, Pulsar, NATS, etc.)

Summary

Disclaimer: This comparison involves products not owned by Timeplus. The information provided is based on public sources and personal research. We do not endorse or guarantee the accuracy of the product details. Please do your own research before making any decisions.
