WHAT IS ksqlDB?
ksqlDB is a stream processing engine designed specifically to read data from Apache Kafka topics, apply stateless or stateful transformations, and write the results back to Apache Kafka. Data then has to be landed in other dedicated downstream systems for rich query capability. It was renamed from KSQL to ksqlDB when it gained a limited ability to query some of the state derived from stream processing functions. These “ad-hoc” queries are limited to quick lookups via primary key equality or range queries.
Where analytical queries are required, the best practice is to do the transformations in ksqlDB and use Kafka Connect to push the data to more dedicated systems such as search engines or data warehouses.
CHALLENGES WITH ksqlDB
Limited Query and Join Capabilities
The main challenge with ksqlDB is that it cannot answer arbitrary ad-hoc queries the way a database or data warehouse would. It supports only certain key-based lookups and range lookups. Joins are similarly restricted to a few forms based on primary key lookups.
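The access pattern described above can be illustrated with a toy key-ordered store. This is a minimal sketch, not ksqlDB's API: the class `KeyedStateStore` and its methods are hypothetical names chosen for illustration. It shows why point and range lookups on the key are cheap while a filter on any other column has to visit every record.

```python
import bisect


class KeyedStateStore:
    """Toy stand-in for a key-ordered state store (hypothetical, not ksqlDB's API)."""

    def __init__(self, rows):
        # rows: dict mapping primary key -> record dict
        self._rows = dict(rows)
        self._keys = sorted(self._rows)

    def get(self, key):
        # Point lookup by primary key: a single dictionary probe.
        return self._rows.get(key)

    def range(self, lo, hi):
        # Range lookup over the sorted keys, inclusive on both ends.
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        return [self._rows[k] for k in self._keys[start:end]]

    def scan_where(self, predicate):
        # A filter on a non-key column has no index to use in this model,
        # so every record is visited.
        return [r for r in self._rows.values() if predicate(r)]


store = KeyedStateStore({
    "u1": {"user": "u1", "country": "DE", "orders": 3},
    "u2": {"user": "u2", "country": "US", "orders": 7},
    "u3": {"user": "u3", "country": "DE", "orders": 1},
})

by_key = store.get("u2")                                       # key equality
by_range = store.range("u1", "u2")                             # key range
german_users = store.scan_where(lambda r: r["country"] == "DE")  # full scan
```

In this model the first two calls touch only the requested keys, while the third walks the whole store, which is the kind of query that is usually handed to a database or data warehouse.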
Performance Overhead from Serialization and Deserialization
The performance of ksqlDB can be impacted by the time needed to serialize and deserialize data between Apache Kafka and RocksDB. The strong coupling to Kafka also has a big impact on performance: frequent data publication and retrieval from Kafka can increase latency and costs.
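To make the serialization cost concrete, the sketch below counts how often a record is converted to and from bytes while a single stateful counter is updated. It is a simplified model under stated assumptions: JSON stands in for the real wire format, a Python dict of bytes stands in for a byte-oriented local store such as RocksDB, and the helper names (`serialize`, `deserialize`, `process`) are hypothetical. Real deployments may cache or batch some of these steps.

```python
import json

# Counters for how many conversions happen (illustrative only).
serde_calls = {"serialize": 0, "deserialize": 0}


def serialize(obj):
    serde_calls["serialize"] += 1
    return json.dumps(obj).encode("utf-8")


def deserialize(data):
    serde_calls["deserialize"] += 1
    return json.loads(data.decode("utf-8"))


# Stand-in for a local store that only holds bytes.
byte_store = {}


def process(raw_record):
    event = deserialize(raw_record)                        # read from the input topic
    key = event["page"]
    stored = byte_store.get(key)
    count = deserialize(stored)["count"] if stored else 0  # read existing state
    updated = {"page": key, "count": count + 1}
    byte_store[key] = serialize(updated)                   # write state back
    return serialize(updated)                              # write to the output topic


inputs = [json.dumps({"page": p}).encode("utf-8") for p in ["home", "home", "cart"]]
outputs = [process(r) for r in inputs]
```

Three input records lead to ten conversions in this model (six serializations and four deserializations), which hints at how the overhead grows with every stateful step in a pipeline.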
Challenges with State Store Management
ksqlDB state stores are notoriously difficult to maintain due to limitations in TTL management and other storage configurations. State stores are backed by Apache Kafka and thus require considerably more storage and network bandwidth overall for high availability and resilience.
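The storage cost of a Kafka-backed state store can be sketched as follows. This is a toy model, assuming every state update is written both to a local store and to a changelog that is replayed on recovery. The class `ChangelogBackedStore` is a hypothetical name, and real changelog topics are typically compacted, which this sketch does not model.

```python
class ChangelogBackedStore:
    """Toy model of a local store whose updates are mirrored to a changelog."""

    def __init__(self):
        self.local = {}      # stands in for the local store on the processing node
        self.changelog = []  # stands in for the changelog topic in Kafka

    def put(self, key, value):
        # Every update is written twice: locally and to the changelog.
        self.local[key] = value
        self.changelog.append((key, value))

    @classmethod
    def restore(cls, changelog):
        # Recovery replays the whole changelog to rebuild local state.
        store = cls()
        for key, value in changelog:
            store.local[key] = value
        store.changelog = list(changelog)
        return store


store = ChangelogBackedStore()
for page in ["home", "cart", "home", "home"]:
    store.put(page, store.local.get(page, 0) + 1)

recovered = ChangelogBackedStore.restore(store.changelog)
```

Here two keys of live state are backed by four changelog entries, and recovery has to read all four over the network before the store is usable again.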
Enter Timeplus
Timeplus is designed from the ground up in C++ on top of database technology (ClickHouse in this case) and extended for stream processing. It uses ClickHouse libraries and data structures under the hood for extremely fast database operations such as filtering, projection, and aggregation.
For stream processing, it provides a native stream data structure as a first-class citizen, which does not require any coupling with Apache Kafka, although it can integrate with Kafka if required. This allows for a much simpler and more performant system for data ingestion, processing, and analytics, all in one single binary. Data products created within Timeplus can be pushed out to external systems via streaming or consumed via ad-hoc querying. As such, it can easily integrate into the wider ecosystem of systems that integrate with Apache Kafka or with database/BI tools.
WHAT PAIN POINTS OF ksqlDB CAN WE ADDRESS?
Rich Ad-Hoc Query and Join Capabilities
Timeplus enables complex ad-hoc queries and multi-way joins, supporting lookups by any column. It provides seamless data access through various SDKs (Python, Go, JavaScript, Java) for developers and integrates easily with Database/BI tools for analysts.
Enhanced Performance and Reduced Resource Usage
Capable of handling large volumes of data with ease, Timeplus ensures seamless scalability as your data grows. Its high-performance architecture ensures quick processing and analysis of data streams, mitigating performance bottlenecks.
Lower Total Cost of Ownership
Timeplus reduces the total cost of implementing streaming analytics by streamlining state store management and lowering operational overhead. Its efficient resource usage and automatic replication contribute to cost savings and ease of maintenance.
HOW DOES ksqlDB COMPARE WITH TIMEPLUS?
ksqlDB offers a SQL interface, integration with Apache Kafka, stateful processing, scalability, and great security features. However, it has its limitations, including deep coupling with Kafka, heavy resource consumption, and a design that is not specifically aimed at analytics.
Along with shared features, Timeplus Proton offers additional benefits compared to ksqlDB. Let's look at six reasons why developers are choosing Timeplus as an alternative to ksqlDB.
Is it flexible?
ksqlDB
Deep Coupling with Kafka
ksqlDB is tightly coupled with Kafka at the deployment level. Each ksqlDB server is bound to a single Kafka cluster, and ksqlDB uses Kafka as storage for much of its internal state. There is no way to process streams from different clusters unless you route the data from those clusters into the same Kafka cluster. Additionally, a running ksqlDB instance puts load on the Kafka cluster by creating more internal topics, with the extra reads and writes that come with them.
TIMEPLUS
High Flexibility with Native and External Streams
Timeplus has native streams, which can be written to from SDKs, data integration frameworks, or files. Timeplus also supports Kafka topics as external streams, which can be read from or written to by other derived streams. When using an external Kafka stream as a source, data does not need to be stored in Timeplus unless a derived materialized view is created. Timeplus is not coupled to a single cluster and can read from and write to different Kafka clusters in the same way.
Is it efficient?
ksqlDB
Heavy Resource Consumption
Every SQL query run on ksqlDB is a Kafka Streams application, which creates its own worker threads, adding overhead to every query.
ksqlDB uses Kafka topics to store state changelogs and uses RocksDB to materialize these changelogs into tables, which means more resource consumption for the state.
TIMEPLUS
Lightweight and Efficient
Timeplus is lightweight, written in C++, and uses highly performant ClickHouse libraries and data structures to power historical queries alongside streaming queries.
Timeplus uses a dedicated internal format optimized for SIMD (single instruction, multiple data) CPU instructions, and can process over 1 million records per second on a commodity computer.
Is it for analytic workloads?
ksqlDB
Not Designed for Rich Queries
With Kafka, ksqlDB can support stream processing, but historical queries can only be filtered via primary key or range query. Analytical queries doing large aggregations also do not perform well compared to Timeplus' columnar data structures.
TIMEPLUS
Supports Rich Row-Based and Analytical Queries
Timeplus supports queries filtering on both primary key and non-primary key columns. Timeplus provides flexibility to the end user and contains data structures to further speed up row-retrieval via indices and column families as well as columnar formats for blazing fast analytics.
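The difference between row-oriented and column-oriented access can be sketched in a few lines. This is a generic illustration of the two layouts and of a secondary index on a non-key column, not Timeplus' internal format. All names and data are made up for the example.

```python
# Row-oriented layout: one record per entry, good for fetching whole rows.
rows = [
    {"id": 1, "region": "eu", "amount": 10.0},
    {"id": 2, "region": "us", "amount": 5.5},
    {"id": 3, "region": "eu", "amount": 4.5},
]

# Column-oriented layout: one list per column, good for aggregations.
columns = {name: [r[name] for r in rows] for name in rows[0]}

# Aggregating over rows touches every field of every record.
total_from_rows = sum(r["amount"] for r in rows)

# Aggregating over a column touches only the values that are needed.
total_from_columns = sum(columns["amount"])

# A secondary index on a non-key column maps values to row positions,
# so a filter on "region" does not need a full scan.
index_by_region = {}
for position, region in enumerate(columns["region"]):
    index_by_region.setdefault(region, []).append(position)

eu_ids = [columns["id"][p] for p in index_by_region["eu"]]
```

Both totals are identical; the point is how much data each approach has to read, which matters once tables hold millions of rows and dozens of columns.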
Does it support UDFs?
ksqlDB
Java UDF
ksqlDB uses Java-based UDFs. Compared to JavaScript UDFs, they are not as easy to use, and there is the added complexity of managing JVM, Kafka, and dependency versions.
TIMEPLUS
SQL/JavaScript/Python-Based UDFs
Timeplus supports 3 different syntaxes for creating UDFs and UDAFs, each giving access to different functionality and libraries. They are easy to deploy and let data engineers bring their existing code to perform complex functions, in addition to the hundreds of built-in functions from ClickHouse.
Does it support integrations with applications or dashboards?
ksqlDB
Limited Integrations
ksqlDB has SDKs written in Java and supports pull queries over HTTP, but integrating its analytics with visualizations requires additional development.
TIMEPLUS
Grafana Plugins and Various Drivers/SDKs
Timeplus includes a Grafana plugin that refreshes itself based on the SQL queries it is configured with. JDBC drivers make it easy for existing BI tools to use the data in Timeplus. SDKs for JavaScript, Go, and Python also exist for operational applications that want to integrate with refined data in Timeplus. WebSocket APIs make it particularly powerful for developing live data applications.
Does it integrate with external databases?
ksqlDB
Kafka Connect
ksqlDB has an embedded Kafka Connect engine that brings data from various systems into Kafka, where ksqlDB streams can use it. Data must be written to Kafka first and then on to the other systems, which duplicates effort and adds maintenance complexity.
TIMEPLUS
Timeplus Connectors with Native Streams
Timeplus Connectors is a component that can be used to integrate various systems (even message queues) with Timeplus streams. These include Pulsar, NATS, and WebSocket sources, and the component can be extended to cover hundreds of other systems if required. Data is pushed to or pulled from Timeplus directly, which requires fewer hops and minimizes duplication.
WHO IS ksqlDB FOR?
ksqlDB primarily serves those users who want a data transformation layer for use in other applications such as microservices or databases. The incoming data has to come via Kafka topics and the majority of the data is mostly being served via other Kafka topics to downstream systems.
This is useful in environments that make heavy use of Apache Kafka and also have dedicated databases or data warehouses from which data will be served. As such, it requires a certain pre-existing investment in Kafka infrastructure and skills, as well as in other database infrastructure.
HIGH LEVEL DESIGN
ksqlDB is designed as a distributed stream processing cluster with SQL as its primary API. The end user uses SQL as a domain specific language (DSL) to create data flow pipelines reading from and writing to Kafka at each step of the way. The SQL statements each get executed as a Kafka Streams Java App internally, which parses the data and transforms it based on the query definition to write it out to Apache Kafka. ksqlDB has local intermediate state stores within each Kafka Streams application using RocksDB which can be queried using limited query predicates.
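The data flow described above can be modeled with a toy pipeline in which every step reads from one topic and writes to another. This is a conceptual sketch only: plain Python lists stand in for Kafka topics, a dict stands in for the local state store, and the function names (`run_stateless`, `run_count_by`) are hypothetical rather than anything provided by ksqlDB or Kafka Streams.

```python
from collections import defaultdict

# topic name -> list of records (toy stand-in for Kafka topics)
topics = defaultdict(list)


def run_stateless(source, sink, transform):
    # One "query": read every record from the source topic, transform it,
    # and write the result to the sink topic. Returning None drops the record.
    for record in topics[source]:
        out = transform(record)
        if out is not None:
            topics[sink].append(out)


def run_count_by(source, sink, key_field):
    # A stateful "query": keeps a local count per key and emits the
    # updated count to the sink topic after every input record.
    state = {}
    for record in topics[source]:
        key = record[key_field]
        state[key] = state.get(key, 0) + 1
        topics[sink].append({key_field: key, "count": state[key]})
    return state  # the local state that limited lookups would be served from


topics["pageviews"] = [
    {"user": "a", "page": "home"},
    {"user": None, "page": "home"},
    {"user": "b", "page": "cart"},
    {"user": "a", "page": "home"},
]

# Step 1: drop records without a user, writing to an intermediate topic.
run_stateless("pageviews", "pageviews_clean", lambda r: r if r["user"] else None)

# Step 2: count views per page, writing each update to another topic.
counts = run_count_by("pageviews_clean", "views_per_page", "page")
```

Each step materializes its output in a topic before the next step can read it, which is the coupling to Kafka that the rest of this comparison refers to.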
Feature | ksqlDB | Timeplus |
---|---|---|
License | Confluent Community License (CCL) | Timeplus Proton: Apache 2.0. An enterprise license with advanced features is also available. |
Language | Java (on Kafka Streams) | C++ |
Resource consumption | High | Low |
Stateful stream processing | Yes | Yes |
Bounded, pull-based query | Yes | Yes |
Unbounded, continuous push-based query | Yes | Yes |
Materialized view and table concept | Yes (on top of RocksDB) | Yes (on top of ClickHouse) |
Kafka Connection | Yes (source and sink) | Yes (external stream) |
Support for other messaging systems | Kafka only (dependency) | Pulsar, Kinesis, and more (supports but not dependent) |
Cluster and HA | Yes | Yes |
User-Defined Function | Java | JavaScript |
Performance | Good | Millions of events ingested and processed per second |
Security | Yes, role-based access control | Yes, Timeplusd Authn/Authz based on ClickHouse users and roles |
Key Enterprise features | - | Clustering/replication based on Raft; Mutable Streams; Integration with sources (WebSocket, Pulsar, NATS, etc.) |
Summary
Disclaimer: This comparison involves products not owned by Timeplus. The information provided is based on public sources and personal research. We do not endorse or guarantee the accuracy of the product details. Please do your own research before making any decisions.