Real-Time Video Analytics and Monitoring
- Gang Tao
Timeplus + Vision Model Inference
A machine learning vision model is a type of AI system trained to understand and interpret visual data, such as images or videos, much like how humans use their eyes and brain to see and recognize things. In simple terms, it’s a computer program that "learns" from visual examples and can then:
Identify what’s in a picture (e.g., cats vs. dogs)
Detect objects (like people, cars, or faces)
Understand scenes (e.g., "a person walking on a rainy street")
Generate new images or describe what's happening in a video
Vision models can be used to analyze video footage in real time and trigger actions in response to security or safety threats, with applications in surveillance, industrial safety, and healthcare.

Timeplus, a high-performance stream processing platform, is a great complementary component for building video analytics and monitoring applications in combination with vision models. Here is an application I built that analyzes objects and violent behavior in any recorded video or online video from YouTube. You can find the code in our GitHub repo: https://github.com/timeplus-io/examples/tree/main/realtime_video_analytics
The diagram below shows how the application is implemented:

There are two components:
Python: Inference server
The inference server leverages OpenCV to extract frames from the video and uses two vision models to process the images:
ultralytics/yolov8n is used to detect objects in the video
jaranohaal/vit-base-violence-detection is used to detect violent behavior in the video
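The per-frame work of the inference server boils down to: run both models on the frame, then package the results into one JSON event. Below is a minimal, model-free sketch of the packaging step; the actual YOLO and ViT calls are stubbed out, and the tuple shape for detections is an assumption for illustration, not the repo's exact code:

```python
import json
import time

def build_event(violence_label, violence_conf, detections):
    """Assemble one JSON event in the shape shown below.

    `detections` is a list of (name, class_id, confidence, (x1, y1, x2, y2))
    tuples; this shape is an assumption for the sketch, not the repo's code.
    """
    return {
        "timestamp": time.time(),
        "violence": {"class": violence_label, "confidence": violence_conf},
        "detected_objects": [
            {"name": name, "class": cls, "confidence": conf,
             "box": {"x1": x1, "y1": y1, "x2": x2, "y2": y2}}
            for name, cls, conf, (x1, y1, x2, y2) in detections
        ],
    }

# In the real server these values come from the two models above.
event = build_event("LABEL_0", 0.566, [("bed", 59, 0.571, (0, 3.6, 824.5, 1053.6))])
print(json.dumps(event, indent=2))
```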
Timeplus: Analytics server
All the video metrics are stored in a Timeplus stream and can be processed in real time. We built a dashboard that shows these key metrics in real time, so users can see what's happening in the video the moment something happens.
Here is one of the sample events the inference server generated from the video:
{
  "timestamp": 1744650965.2081842,
  "violence": {
    "class": "LABEL_0",
    "confidence": 0.5662581324577332
  },
  "detected_objects": [
    {
      "name": "bed",
      "class": 59,
      "confidence": 0.57137,
      "box": {
        "x1": 0,
        "y1": 3.57532,
        "x2": 824.53455,
        "y2": 1053.63733
      }
    }
  ]
}
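To make the event shape concrete, here is a small Python sketch that parses such an event and extracts the two things the analytic queries care about: the detected object names, and a violence score that is zero when the model reports LABEL_0 (the class the violence-rate query treats as non-violent):

```python
import json

# The same event shape as the sample above.
raw = '''{
  "timestamp": 1744650965.2081842,
  "violence": {"class": "LABEL_0", "confidence": 0.5662581324577332},
  "detected_objects": [
    {"name": "bed", "class": 59, "confidence": 0.57137,
     "box": {"x1": 0, "y1": 3.57532, "x2": 824.53455, "y2": 1053.63733}}
  ]
}'''

event = json.loads(raw)
names = [obj["name"] for obj in event["detected_objects"]]
# LABEL_0 is treated as non-violent, so its confidence contributes nothing.
vscore = 0.0 if event["violence"]["class"] == "LABEL_0" else event["violence"]["confidence"]
print(names, vscore)  # ['bed'] 0.0
```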
There are two analytic queries that monitor:
The total object count detected in the last five seconds
WITH obj AS
(
    SELECT
        _tp_time AS time,
        array_join(json_extract_array(raw, 'detected_objects')) AS detected_objects,
        detected_objects:name AS name
    FROM
        video_stream_log
)
SELECT
    count(*) AS count, name, window_start
FROM
    hop(obj, time, 1s, 5s)
GROUP BY
    window_start, name
ORDER BY
    count DESC
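For intuition on what hop(obj, time, 1s, 5s) computes: a hopping window of size 5s that advances every 1s, so each event can fall into up to five overlapping windows. A pure-Python sketch of that counting (illustrative only; Timeplus evaluates this incrementally inside the engine):

```python
from collections import Counter

# events: (timestamp in seconds, detected object name) -- made-up sample data
events = [(0.5, "bed"), (1.2, "person"), (3.8, "person"), (6.1, "bed")]

def hop_counts(events, hop=1, size=5, horizon=8):
    """Count object names per hopping window [start, start + size)."""
    windows = {}
    for start in range(0, horizon, hop):
        c = Counter(name for t, name in events if start <= t < start + size)
        if c:  # only emit non-empty windows, like the streaming query
            windows[start] = dict(c)
    return windows

print(hop_counts(events))
```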
The violence score in the last five seconds
WITH vio AS
(
    SELECT
        _tp_time, raw:violence:class AS flag,
        if(flag != 'LABEL_0', cast(raw:violence:confidence, 'float64'), 0) AS vscore
    FROM
        video_stream_log
    WHERE
        _tp_time > now() - 1h
)
SELECT
    window_start, count(*) AS count, sum(vscore) AS svscore, svscore / count AS violence_rate
FROM
    hop(vio, 1s, 5s)
GROUP BY
    window_start
Summary
In today’s blog, I demonstrated how to build a real-time video stream analytics tool with Timeplus and machine learning vision models. Here are the benefits of this approach:
Real-Time Analytics with Streaming SQL: Run low-latency, stateful analytics using familiar SQL on continuous video event streams—no need to learn new frameworks
Seamless Integration of Vision Models: Ingest unstructured video inference data (e.g., object & violence detection) directly into Timeplus using unstructured text formats
Powerful JSON querying and processing: Process and extract information from raw JSON text with maximum flexibility and no schema limitations
Try it yourself! Download Timeplus and join our Slack Channel to connect with our team and other users.