
Swarm Intelligence and the Power of Stream Data Virtualization

Gang Tao

In an increasingly interconnected world, the ability to process and analyze streaming data in real time has become essential. Stream data virtualization is a paradigm that enables organizations to access, integrate, and process data from multiple real-time sources without necessarily requiring physical movement or replication of all the data. This approach aligns with the principles of distributed intelligence, a concept that Kevin Kelly explores in Out of Control, which is one of my favorite books, particularly in his discussion of swarms.


Cover of Kevin Kelly's 1994 book Out of Control

 

The Swarm as a Model for Stream Data Virtualization


In Out of Control, Kevin Kelly presents the idea of swarm intelligence—where decentralized, autonomous agents collaborate to produce emergent behavior. Nature provides many examples of this, from ant colonies optimizing food collection to flocking birds synchronizing their movement without centralized control. In the same way, stream data virtualization leverages a decentralized model where multiple data streams interact dynamically, allowing systems to self-organize and adapt without a rigid, predefined structure.


Traditional data architectures rely on centralization—data warehouses, batch processing, and scheduled ETL jobs. However, in a modern streaming architecture, data remains distributed across various sources while virtualization technology provides a unified, real-time view. This model mirrors the decentralized and cooperative nature of swarm intelligence.



Key Principles of Swarm Intelligence in Stream Data Virtualization


  • Decentralization: Just as no single ant dictates the actions of a colony, no single system component controls the entirety of a streaming data platform. Instead, data flows from multiple sources, and virtualization platforms dynamically adjust to changing data patterns.


  • Adaptability: Swarm-based systems are resilient to change. Likewise, stream data virtualization enables organizations to integrate new data sources without re-architecting their entire data ecosystem.


  • Parallelism: Swarm intelligence thrives on parallel processing—many simple agents working simultaneously. Similarly, stream data platforms leverage distributed processing engines like Apache Flink or Kafka Streams to handle high-velocity data in parallel.


  • Emergent Insights: In natural swarms, intelligence emerges from the interactions of individual agents. In streaming data virtualization, insights emerge from the fusion of multiple live data streams, allowing for real-time decision-making.


 

How Timeplus Stream Virtualization Works


Timeplus, a high-performance stream processing engine, is widely used to process data at the edge. Customers typically run multiple Timeplus instances close to where the data is generated, analyze that data locally to produce quick, real-time results, and take action accordingly.


Timeplus provides a feature called external streams. A Kafka external stream, for example, is not a physical stream: it stores no data locally in Timeplus, yet users can query it directly, without moving the data. That is the defining trait of data virtualization: users can analyze data without moving it.
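As a minimal sketch of what this looks like, the following creates a Kafka external stream; the broker address, topic name, and single raw-string column are illustrative assumptions, not part of the deployment described in this post:

```sql
-- Hypothetical Kafka external stream: reads the topic in place and
-- stores nothing in Timeplus. Broker and topic names are assumptions.
CREATE EXTERNAL STREAM kafka_events
(
 'raw' string  -- each Kafka message arrives as one raw string
)
SETTINGS
   type = 'kafka',
   brokers = 'kafka:9092',
   topic = 'events';

-- Query it like any other stream; the data itself stays in Kafka.
SELECT raw FROM kafka_events;
```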


Timeplus has another type of external stream, the Timeplus external stream, which lets a user query a stream hosted on another Timeplus instance without moving it.


To see an example of how to leverage Timeplus external streams, refer to this code repo.



The above diagram shows the overall deployment. In this example there are three Timeplus instances: Timeplus edge 1 and Timeplus edge 2 are edge nodes, and Timeplus central is a third instance on which we create external streams pointing at edge 1 and edge 2 so we can directly query the streams on those two instances.


First, we create a simulated stream on two edge nodes:

CREATE RANDOM STREAM network_flow_source
(
 'time' string default to_string(now()),
 'source' ipv4,
 'destination' ipv4,
 'protocol' string default if(rand()%2=0,'TCP','UDP'),
 'length' int64 default rand()%1000
) SETTINGS eps=100;


CREATE STREAM network_flow
(
 'time' string,
 'source' ipv4,
 'destination' ipv4,
 'protocol' string,
 'length' int64
);


CREATE MATERIALIZED VIEW mv_network INTO network_flow
AS
SELECT
 *
FROM
 network_flow_source;

The above SQL creates a random stream called network_flow_source to simulate network flow data, then uses a materialized view mv_network to persist the source into a stream called network_flow.


After running the above SQL, each edge node (edge 1 and edge 2) runs these two streams to simulate its local network flow.
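To sanity-check the simulation on an edge node, a quick bounded query against the persisted stream (a sketch, not taken from the repo) should return rows immediately:

```sql
-- table() reads the historical (persisted) data of the stream, which
-- confirms the materialized view is writing; LIMIT keeps the query bounded.
SELECT * FROM table(network_flow) LIMIT 5;
```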


On the central Timeplus, we can run the following SQL to directly query the two streams on the edge instances.

CREATE EXTERNAL STREAM IF NOT EXISTS network_edge_1
SETTINGS
   type = 'timeplus',
   hosts = 'timeplus_edge_1',
   db = 'default',
   user = 'proton',
   password = 'timeplus@t+',
   stream = 'network_flow';


CREATE EXTERNAL STREAM IF NOT EXISTS network_edge_2
SETTINGS
   type = 'timeplus',
   hosts = 'timeplus_edge_2',
   db = 'default',
   user = 'proton',
   password = 'timeplus@t+',
   stream = 'network_flow';


CREATE STREAM IF NOT EXISTS network_flow
(
 'time' string,
 'source' ipv4,
 'destination' ipv4,
 'protocol' string,
 'length' int64,
 'edge_name' string
);

CREATE MATERIALIZED VIEW mv_network INTO network_flow
AS
SELECT time, source, destination, protocol, length, 'edge1' AS edge_name FROM network_edge_1
UNION
SELECT time, source, destination, protocol, length, 'edge2' AS edge_name FROM network_edge_2;

The external streams network_edge_1 and network_edge_2 are the virtualized remote streams: users on the central Timeplus can query the edge-node streams directly. Where required, users can create a materialized view, as in this sample, to persist analysis results locally. In many cases, users run higher-level aggregations on the central node and leave fine-grained analysis to the edge nodes, which greatly reduces the cost of moving data from the edge to the central node.
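For instance, a higher-level aggregation on the central node could summarize traffic per edge and protocol over short tumbling windows. This is a sketch using Timeplus's tumble window function; the 5-second window size is an arbitrary choice:

```sql
-- Per-window traffic summary across both edges: only these small
-- aggregates, not the raw flow records, need to live centrally.
SELECT
 window_start,
 edge_name,
 protocol,
 count() AS flows,
 sum(length) AS total_bytes
FROM tumble(network_flow, 5s)
GROUP BY window_start, edge_name, protocol;
```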


 

Conclusion


Stream data virtualization applies to use cases such as IoT and edge computing, where sensor networks in smart cities or industrial automation use decentralized data flows to optimize operations, and cybersecurity, where distributed monitoring of network traffic enables rapid identification and mitigation of threats.


The future of data management is shifting toward models that embrace distributed, autonomous, and self-organizing principles. Inspired by nature and articulated in Kevin Kelly’s Out of Control, swarm intelligence provides a compelling framework for understanding and advancing stream data virtualization. By moving beyond centralized control and embracing decentralized, emergent systems, organizations can unlock the full potential of real-time data.



