Real-Time Video Analytics and Monitoring
- Gang Tao
Timeplus + Vision Model Inference
A machine learning vision model is a type of AI system trained to understand and interpret visual data, such as images or videos, much like how humans use their eyes and brain to see and recognize things. In simple terms, it’s a computer program that "learns" from visual examples and can then:
Identify what’s in a picture (e.g., cats vs. dogs)
Detect objects (like people, cars, or faces)
Understand scenes (e.g., "a person walking on a rainy street")
Generate new images or describe what's happening in a video
Vision models can be used to analyze video footage in real time and trigger actions in response to security or safety threats, with applications in surveillance, industrial safety, and healthcare.

Timeplus, a high-performance stream processing platform, is a great complementary component for building video analytics and monitoring applications in combination with vision models. Here is an application I built that analyzes objects and violent behavior in any recorded video or online video from YouTube. You can find the code in our GitHub repo: https://github.com/timeplus-io/examples/tree/main/realtime_video_analytics
The diagram below shows how the application is implemented:

There are two components:
Python: Inference server
The inference server leverages OpenCV to extract frames from the video and uses two vision models to process the images:
ultralytics/yolov8n is used to detect objects in the video
jaranohaal/vit-base-violence-detection is used to detect violent behavior in the video
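The per-frame work of the inference server boils down to: run both models on the frame, then package the results into one JSON event. Below is a minimal, model-free sketch of the packaging step; the actual YOLO and ViT calls are stubbed out, and the tuple shape for detections is an assumption for illustration, not the repo's exact code:

```python
import json
import time

def build_event(violence_label, violence_conf, detections):
    """Assemble one JSON event in the shape shown below.

    `detections` is a list of (name, class_id, confidence, (x1, y1, x2, y2))
    tuples; this shape is an assumption for the sketch, not the repo's code.
    """
    return {
        "timestamp": time.time(),
        "violence": {"class": violence_label, "confidence": violence_conf},
        "detected_objects": [
            {"name": name, "class": cls, "confidence": conf,
             "box": {"x1": x1, "y1": y1, "x2": x2, "y2": y2}}
            for name, cls, conf, (x1, y1, x2, y2) in detections
        ],
    }

# In the real server these values come from the two models above.
event = build_event("LABEL_0", 0.566, [("bed", 59, 0.571, (0, 3.6, 824.5, 1053.6))])
print(json.dumps(event, indent=2))
```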
Timeplus: Analytics server
All the video metrics are stored in a Timeplus stream and can be processed in real time. We built a dashboard that shows these key metrics in real time, so users can see what's happening in the video the moment something happens.
Here is one of the sample events the inference server generated from the video:
{
  "timestamp": 1744650965.2081842,
  "violence": {
    "class": "LABEL_0",
    "confidence": 0.5662581324577332
  },
  "detected_objects": [
    {
      "name": "bed",
      "class": 59,
      "confidence": 0.57137,
      "box": {
        "x1": 0,
        "y1": 3.57532,
        "x2": 824.53455,
        "y2": 1053.63733
      }
    }
  ]
}
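To make the event shape concrete, here is a small Python sketch that parses such an event and extracts the two things the analytic queries care about: the detected object names, and a violence score that is zero when the model reports LABEL_0 (the class the violence-rate query treats as non-violent):

```python
import json

# The same event shape as the sample above.
raw = '''{
  "timestamp": 1744650965.2081842,
  "violence": {"class": "LABEL_0", "confidence": 0.5662581324577332},
  "detected_objects": [
    {"name": "bed", "class": 59, "confidence": 0.57137,
     "box": {"x1": 0, "y1": 3.57532, "x2": 824.53455, "y2": 1053.63733}}
  ]
}'''

event = json.loads(raw)
names = [obj["name"] for obj in event["detected_objects"]]
# LABEL_0 is treated as non-violent, so its confidence contributes nothing.
vscore = 0.0 if event["violence"]["class"] == "LABEL_0" else event["violence"]["confidence"]
print(names, vscore)  # ['bed'] 0.0
```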
There are two analytic queries that monitor:
The total object count detected in the last five seconds
WITH obj AS
(
    SELECT
        _tp_time AS time,
        array_join(json_extract_array(raw, 'detected_objects')) AS detected_objects,
        detected_objects:name AS name
    FROM
        video_stream_log
)
SELECT
    count(*) AS count, name, window_start
FROM
    hop(obj, time, 1s, 5s)
GROUP BY
    window_start, name
ORDER BY
    count DESC
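For intuition on what hop(obj, time, 1s, 5s) computes: a hopping window of size 5s that advances every 1s, so each event can fall into up to five overlapping windows. A pure-Python sketch of that counting (illustrative only; Timeplus evaluates this incrementally inside the engine):

```python
from collections import Counter

# events: (timestamp in seconds, detected object name) -- made-up sample data
events = [(0.5, "bed"), (1.2, "person"), (3.8, "person"), (6.1, "bed")]

def hop_counts(events, hop=1, size=5, horizon=8):
    """Count object names per hopping window [start, start + size)."""
    windows = {}
    for start in range(0, horizon, hop):
        c = Counter(name for t, name in events if start <= t < start + size)
        if c:  # only emit non-empty windows, like the streaming query
            windows[start] = dict(c)
    return windows

print(hop_counts(events))
```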
The violence score in the last five seconds
WITH vio AS
(
    SELECT
        _tp_time, raw:violence:class AS flag,
        if(flag != 'LABEL_0', cast(raw:violence:confidence, 'float64'), 0) AS vscore
    FROM
        video_stream_log
    WHERE
        _tp_time > now() - 1h
)
SELECT
    window_start, count(*) AS count, sum(vscore) AS svscore, svscore / count AS violence_rate
FROM
    hop(vio, 1s, 5s)
GROUP BY
    window_start
Summary
In today’s blog, I demonstrated how to build a real-time video stream analytics tool with Timeplus and machine learning vision models. Here are the benefits of this approach:
Real-Time Analytics with Streaming SQL: Run low-latency, stateful analytics using familiar SQL on continuous video event streams—no need to learn new frameworks
Seamless Integration of Vision Models: Ingest unstructured video inference data (e.g., object & violence detection) directly into Timeplus using unstructured text formats
Powerful JSON querying and processing: Process and extract information from raw JSON text with maximum flexibility and no schema limitations
Try it yourself! Download Timeplus and join our Slack Channel to connect with our team and other users.