The detect action is the primary detection action in RES. It analyzes camera streams using YOLO-based models to detect objects, track them across frames with stable identities, and monitor how long they remain in specific zones.

What it does

The detect action provides a complete detection and tracking pipeline:
  1. Captures frames from one or more RTSP camera streams using the optimized StreamManager
  2. Runs inference using configured YOLO models (TensorRT or PyTorch)
  3. Filters detections based on allowed classes, detection zones, and exclusion zones
  4. Tracks objects across frames using BoTSORT with re-identification support
  5. Monitors zone dwell time, calculating how long each tracked object stays in a zone
  6. Saves evidence including detection crops, annotated frames, and video clips

Requirements

The action fails if these requirements aren’t met. Configure them in the action detail before running.
| Requirement | Description |
| --- | --- |
| Stream | At least one valid RTSP stream URL configured in action_detail.streams |
| Model labels | At least one detection model with configured labels |
| Confidence threshold | Value between 0.0 and 1.0 (default: 0.5) |
| Detection zones | At least one detection zone per stream (streams without zones are skipped) |

Optional settings

| Setting | Description | Default |
| --- | --- | --- |
| max_runtime_seconds | Maximum seconds to run before timing out | 60 (0 = unlimited for tracking mode) |
| processing_fps | Frame rate for analysis | Stream native |
| enable_tracking_and_zone_dwell | Enable object tracking and dwell time monitoring | false |
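A hedged configuration sketch combining the requirements and optional settings above. The field names come from this page; the surrounding structure (a flat Python dict with streams, model_labels, and detection_zones keys) is an assumption, not the exact RES schema.

```python
# Illustrative action_detail configuration (Python dict form).
# Field names follow this page; the overall structure is an assumption,
# not the exact RES schema.
action_detail = {
    "streams": [
        {"stream_id": 1, "url": "rtsp://camera.local:554/stream1"},  # hypothetical URL
    ],
    "model_labels": [
        {"model_id": 1, "name": "person", "model_label_id_reference": 0},
    ],
    "confidence_threshold": 0.5,             # must be between 0.0 and 1.0
    "max_runtime_seconds": 60,               # 0 = unlimited in tracking mode
    "processing_fps": 5,                     # omit to use the stream's native rate
    "enable_tracking_and_zone_dwell": False,
    "detection_zones": [
        # Zone coordinates are percentages (0-100) of the frame dimensions.
        {"stream_id": 1, "points": [(10, 10), (90, 10), (90, 90), (10, 90)]},
    ],
}
```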

Operating modes

The detect action operates in two distinct modes depending on your configuration.

Standard detection mode

When enable_tracking_and_zone_dwell is set to false, the action runs in standard detection mode:
  • Processes frames until a detection is found or timeout is reached
  • Returns immediately when objects matching your criteria are detected
  • Saves a single detection image and video clip
  • Best for: Alert-based rules where you need to know if something appeared

Continuous tracking mode

When enable_tracking_and_zone_dwell is set to true, the action runs continuously:
  • Processes frames indefinitely (set max_runtime_seconds to 0)
  • Tracks objects across frames with stable identities
  • Calculates zone dwell time for each tracked object
  • Yields results continuously for downstream edge conditions
  • Best for: Monitoring scenarios where you need to track presence duration

Multi-stream support

The detect action can process multiple camera streams simultaneously:
| Feature | Single stream | Multi-stream |
| --- | --- | --- |
| Processing | Sequential frame capture | Batch capture across all streams |
| Zones | Zones apply to the single stream | Each stream has its own zone configuration |
| Tracking | One tracker instance | Separate tracker per stream with unique IDs |
| Output | Single set of attachments | Attachments labeled by stream ID |
When using multi-stream mode, each stream must have at least one detection or exclusion zone configured. Streams without zones are automatically released and skipped.

Detection zones

Zones define regions of interest in each camera frame. Coordinates are specified as percentages (0-100) of the frame dimensions.

Zone types

| Type | Behavior |
| --- | --- |
| Detection | Only detections inside these zones are considered valid |
| Exclusion | Detections inside these zones are ignored |
| ROI | Region of interest for display purposes only |

Zone filtering logic

  1. If a detection falls inside any exclusion zone, it’s discarded
  2. If detection zones are configured, the detection must be inside at least one detection zone
  3. If no detection zones are configured, all non-excluded detections are valid
Keep zones simple with fewer points. Complex polygons with many vertices increase filtering overhead on every frame.
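A minimal sketch of this filtering logic, assuming each zone is stored as a polygon of (x, y) percentage coordinates and that membership is tested on the detection's center point with a simple ray-casting check. The real implementation may test membership differently.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting point-in-polygon test; polygon is a list of (x, y) vertices."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

def is_valid_detection(center_pct, detection_zones, exclusion_zones):
    """Apply the zone filtering rules: exclusion zones first, then detection zones."""
    x, y = center_pct  # detection center as percentages of frame width/height
    # 1. Discard anything inside an exclusion zone.
    if any(point_in_polygon(x, y, zone) for zone in exclusion_zones):
        return False
    # 2. If detection zones exist, the point must fall inside at least one.
    if detection_zones:
        return any(point_in_polygon(x, y, zone) for zone in detection_zones)
    # 3. No detection zones configured: all non-excluded detections are valid.
    return True
```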

Object tracking deep dive

When tracking is enabled, the detect action uses BoTSORT (Bag of Tricks for SORT) with re-identification to maintain stable object identities across frames.

How tracking works

1. Detection to tracker

Each frame’s detections are passed to the TrackerWrapper, which manages the BoTSORT tracker instance. The tracker receives bounding boxes, confidence scores, and class IDs.
2. Track association

BoTSORT uses a combination of motion prediction (Kalman filter) and appearance features (ReID) to associate new detections with existing tracks. This allows objects to be re-identified even after brief occlusions.
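The association itself happens inside BoTSORT; the simplified sketch below only illustrates the idea of blending a motion cost (IoU against the Kalman-predicted box) with an appearance cost (cosine distance between ReID features). It is not the actual matching algorithm.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def association_cost(track, detection, appearance_weight=0.5):
    """Blend motion cost (1 - IoU with the track's predicted box) with
    appearance cost (cosine distance between ReID feature vectors)."""
    motion_cost = 1.0 - iou(track["predicted_box"], detection["box"])
    a, b = track["feature"], detection["feature"]
    cosine_sim = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    appearance_cost = 1.0 - cosine_sim
    return (1 - appearance_weight) * motion_cost + appearance_weight * appearance_cost
```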
3. Unique ID assignment

Each tracked object receives a globally unique track ID formatted as: {stream_id:02d}{model_id:03d}{tracker_id:06d}. This ensures IDs are unique across streams and models.
4. Track persistence

Track IDs persist across frames as long as the object is visible. The system maintains mappings between tracker-assigned IDs and globally unique IDs, with automatic cleanup of stale tracks after 120 seconds.

Track ID format

Track IDs are constructed to be globally unique:
| Component | Format | Example |
| --- | --- | --- |
| Stream ID | 2 digits | 01 |
| Model ID | 3 digits | 001 |
| Tracker ID | 6 digits | 000042 |
| Full ID | 11 digits | 01001000042 |
This format ensures that even when processing multiple streams with multiple models, every tracked object has a unique identifier.
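The composite ID can be reproduced directly from the format string shown earlier:

```python
def make_global_track_id(stream_id: int, model_id: int, tracker_id: int) -> str:
    """Build the 11-digit globally unique track ID: {stream_id:02d}{model_id:03d}{tracker_id:06d}."""
    return f"{stream_id:02d}{model_id:03d}{tracker_id:06d}"

# Example from the table: stream 1, model 1, tracker-assigned ID 42
assert make_global_track_id(1, 1, 42) == "01001000042"
```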

Re-identification (ReID)

BoTSORT includes appearance-based re-identification:
  • Extracts visual features from each detected object
  • Compares features when associating detections to tracks
  • Helps recover tracks after occlusions or brief disappearances
  • Particularly effective for person tracking
ReID requires the tracker configuration to have with_reid: true. The default BoTSORT configuration in RES has this enabled.

Track lifecycle

| State | Description |
| --- | --- |
| New | Detection doesn't match any existing track, new track created |
| Active | Track successfully associated with detection this frame |
| Lost | Track not matched for several frames, using motion prediction |
| Removed | Track lost for too long, removed from tracker |

Zone dwell time

When tracking is enabled, the ZoneDwellEngineStable calculates how long each tracked object remains in detection zones.

How dwell time works

1. Zone entry detection

When a tracked object’s bounding box enters a detection zone, the engine records the entry timestamp and zone index.
2. Continuous monitoring

Each frame, the engine updates the position of all tracked objects and checks zone membership. Objects can move between zones, and the engine tracks entry/exit events.
3. Dwell time calculation

The dwell time is calculated as the difference between the current time and the zone entry time. This is displayed in the format MM:SS on the detection label.
4. Label enrichment

Detection labels are enriched with the stable ID and dwell time: {class_name} ID:{stable_id} {dwell_time}. For example: person ID:A1 02:34.
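A minimal sketch of the dwell-time formatting and label enrichment described in these steps; the function names are illustrative, not the RES internals.

```python
import time
from typing import Optional

def format_dwell_time(entry_timestamp: float, now: Optional[float] = None) -> str:
    """Return the time elapsed since zone entry as MM:SS."""
    elapsed = int((now if now is not None else time.time()) - entry_timestamp)
    minutes, seconds = divmod(max(elapsed, 0), 60)
    return f"{minutes:02d}:{seconds:02d}"

def enrich_label(class_name: str, stable_id: str, entry_timestamp: float) -> str:
    """Build the enriched label: {class_name} ID:{stable_id} {dwell_time}."""
    return f"{class_name} ID:{stable_id} {format_dwell_time(entry_timestamp)}"

# An object tracked as A1 that entered a zone 154 seconds ago yields
# a label like "person ID:A1 02:34".
```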

Stable IDs

The zone dwell engine assigns human-readable stable IDs to tracked objects:
| Format | Example | Description |
| --- | --- | --- |
| Letter + Number | A1, B2, C3 | Sequential assignment within each stream |
| Root ID tracking | Maintained across zone transitions | Same object keeps same ID when moving between zones |
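One possible way to produce this letter-plus-number sequence, shown only as an illustration; the actual assignment logic inside ZoneDwellEngineStable may differ.

```python
import itertools
import string

def stable_id_sequence():
    """Yield human-readable stable IDs: A1, B2, C3, ... (letters wrap after Z)."""
    letters = itertools.cycle(string.ascii_uppercase)
    for number, letter in enumerate(letters, start=1):
        yield f"{letter}{number}"

# Per-stream usage: each stream keeps its own generator, so sequences restart per stream.
gen = stable_id_sequence()
print(next(gen), next(gen), next(gen))  # A1 B2 C3
```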

Zone events

The engine logs zone entry and exit events to the database:
| Event | Data recorded |
| --- | --- |
| Entry | Track ID, zone ID, entry timestamp, stream ID |
| Exit | Track ID, zone ID, exit timestamp, dwell duration |

Image-from-previous-actions mode

The detect action can also run detection on images from previous workflow actions instead of live streams.

When this mode activates

  • No streams are configured in the action detail, AND
  • Previous actions have attachments in their results

Supported attachment formats

| Format | Source |
| --- | --- |
| attachments list | New unified ResultAttachment schema |
| cropped_object_frames | Legacy base64 image list |
| records[].attachments | From zone_dwell_reader action |
This mode is useful for cascading detection workflows where you want to run a secondary model on crops from an initial detection.

Result format

During processing (tracking mode)

While the action is running in tracking mode, it yields results with:
| Field | Value |
| --- | --- |
| feature_result | detection_in_progress |
| is_success | true |
| extras.frame_count | Number of frames processed |
| extras.detection_count | Total detections this frame |
| extras.all_detections | List of detection objects with tracking info |
| extras.stream_results | Per-stream detection data |
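An illustrative in-progress payload shaped after the fields above; the contents of the detection entries and the stream keys are hypothetical examples.

```python
# Hypothetical example of a result yielded while tracking mode is running.
in_progress_result = {
    "feature_result": "detection_in_progress",
    "is_success": True,
    "extras": {
        "frame_count": 1250,
        "detection_count": 2,
        "all_detections": [
            # Entry fields below (label, track_id, confidence) are illustrative.
            {"label": "person ID:A1 02:34", "track_id": "01001000042", "confidence": 0.87},
        ],
        "stream_results": {
            "1": {"detection_count": 2},
        },
    },
}
```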

On completion

| Outcome | feature_result | Description |
| --- | --- | --- |
| Objects detected | objects_detected | At least one valid detection found |
| No objects | no_objects_detected | No objects matched criteria within timeout |
| Timeout | max_runtime_exceeded | Max runtime reached (saves last frame) |
| Edge condition | stopped_by_edge_condition | Workflow edge condition triggered stop |
| Operation window | operation_window_exceeded | Outside configured operation hours |
| Stream error | stream_error | Failed to connect to camera |

Possible errors

Error message: No model labels configured for this action detail
What happened: The action detail doesn't have any model labels assigned.
How to fix:
  • Open the action detail configuration
  • Add at least one model label with a valid model_id, name, and model_label_id_reference
  • The model_label_id_reference must be a valid class ID from the detection model
Error message: Failed to load any detection models
What happened: None of the configured models could be loaded from disk.
How to fix:
  • Verify model files exist at the configured paths
  • Check file permissions allow reading the model files
  • For TensorRT engines, ensure they were compiled for the correct GPU architecture
  • Try using the .pt model as a fallback if .engine fails
Error message: Failed to acquire any streams
What happened: Could not connect to any of the configured camera streams.
How to fix:
  • Verify the RTSP URL is correct and accessible
  • Check network connectivity between RES and the camera
  • Ensure the camera is powered on and accepting connections
  • Check for firewall rules blocking RTSP traffic (port 554)
  • Verify stream credentials if authentication is required
Error message: No streams have matching zones configured
What happened: All configured streams were skipped because none had detection or exclusion zones.
How to fix:
  • Add at least one detection zone to each stream you want to process
  • Zones must be associated with the correct stream_id
  • Check that zone coordinates are valid percentages (0-100)
Error message: Stream capture failed - status changed to 'failed_to_capture'
What happened: The stream was connected but frames stopped arriving.
How to fix:
  • The StreamManager will automatically attempt recovery
  • If the issue persists, check camera health and network bandwidth
  • The stream may have timed out due to network issues
  • Camera may be restarting or experiencing hardware issues

Performance considerations

Model selection

Use TensorRT engines for best performance. They’re 2-5x faster than PyTorch models on NVIDIA GPUs. Ensure engines are compiled for your specific GPU architecture.
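If models are loaded through the Ultralytics YOLO API (an assumption; this page doesn't specify the loader), preferring the engine with a PyTorch fallback might look like:

```python
from ultralytics import YOLO

def load_detection_model(engine_path: str, pt_path: str):
    """Prefer the TensorRT engine; fall back to the PyTorch weights if loading fails."""
    try:
        return YOLO(engine_path)   # e.g. "model.engine", compiled for this GPU
    except Exception as exc:
        print(f"Engine load failed ({exc}); falling back to PyTorch weights")
        return YOLO(pt_path)       # e.g. "model.pt"
```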

Stream resolution

The action automatically optimizes stream resolution based on model input size. Lower resolutions reduce bandwidth and processing time.

Zone complexity

Keep zones simple with fewer vertices. Complex polygons increase filtering overhead. Use rectangular zones when possible.

Multi-stream batching

When using multiple streams, frames are captured and processed in batches for efficiency. This reduces per-frame overhead.

Memory management

The detect action implements several optimizations to manage memory during long-running tracking sessions:
| Optimization | Interval | Description |
| --- | --- | --- |
| Track pruning | 120 seconds | Stale track IDs removed after not being seen |
| Track mapping limit | 500 entries | Oldest mappings removed when limit exceeded |
| Garbage collection | Every 3000 frames | Python GC triggered to free memory |
| CUDA cache clearing | With GC | GPU memory cache emptied periodically |
| Memory threshold | 2000 MB | Heavy cleanup triggered when exceeded |
For very long-running tracking sessions (hours), monitor memory usage. The automatic cleanup should prevent issues, but edge cases may require action restarts.
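A rough sketch of this cleanup cycle using the intervals from the table; the helper and its arguments are hypothetical, and the actual bookkeeping inside the detect action is internal to RES.

```python
import gc
import torch

GC_FRAME_INTERVAL = 3000      # run Python GC every 3000 frames
TRACK_STALE_SECONDS = 120     # prune track IDs not seen for 120 seconds
TRACK_MAPPING_LIMIT = 500     # cap on tracker-ID -> global-ID mappings

def periodic_cleanup(frame_count: int, last_seen: dict, now: float) -> None:
    """Prune stale track mappings and periodically release Python and CUDA memory."""
    # Drop track IDs that have not been seen recently.
    for track_id in [t for t, ts in last_seen.items() if now - ts > TRACK_STALE_SECONDS]:
        del last_seen[track_id]
    # Trim the mapping table if it grows past the limit (oldest entries first).
    while len(last_seen) > TRACK_MAPPING_LIMIT:
        del last_seen[min(last_seen, key=last_seen.get)]
    # Trigger GC and empty the CUDA cache on the configured frame interval.
    if frame_count % GC_FRAME_INTERVAL == 0:
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
```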

Practical examples

Example 1: Detecting unauthorized access

Scenario: You want to detect when people enter a restricted area during business hours.

Configuration:
  • Set up one detection zone covering the restricted area
  • Configure a person detection model with appropriate confidence threshold
  • Set max_runtime_seconds to 60 for quick detection
  • Leave tracking disabled since you only need to know if someone entered
Expected behavior: The action captures frames and returns as soon as a person is detected in the zone. A detection image and video clip are saved for review.

Example 2: Monitoring queue wait times

Scenario: You want to track how long customers wait in a queue area.

Configuration:
  • Draw a detection zone covering the queue area
  • Enable enable_tracking_and_zone_dwell
  • Set max_runtime_seconds to 0 for continuous operation
  • Configure edge conditions to trigger when dwell time exceeds threshold
Expected behavior: The action continuously tracks people in the queue, assigning stable IDs and calculating dwell time. Detection labels show person ID:A1 05:23, indicating that person A1 has been waiting 5 minutes and 23 seconds. Edge conditions can trigger violations when wait times exceed limits.
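A hedged configuration sketch for this scenario, using the same illustrative dict structure as the earlier example (not the exact RES schema):

```python
# Illustrative queue-monitoring configuration; field names follow this page,
# the overall structure is an assumption.
queue_monitoring_detail = {
    "streams": [{"stream_id": 1, "url": "rtsp://camera.local:554/queue_cam"}],  # hypothetical URL
    "model_labels": [{"model_id": 1, "name": "person", "model_label_id_reference": 0}],
    "confidence_threshold": 0.5,
    "enable_tracking_and_zone_dwell": True,   # continuous tracking mode
    "max_runtime_seconds": 0,                 # 0 = run until stopped
    "detection_zones": [
        # Queue area as percentages of the frame (x, y pairs).
        {"stream_id": 1, "points": [(10, 40), (90, 40), (90, 95), (10, 95)]},
    ],
}
```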

Example 3: Multi-camera perimeter monitoring

Scenario: You want to monitor multiple entry points simultaneously.

Configuration:
  • Add multiple streams to the action detail
  • Configure detection zones for each stream’s entry point
  • Enable tracking to maintain identities across frames
  • Set appropriate confidence thresholds for outdoor conditions
Expected behavior: All streams are processed in parallel. Each stream has independent tracking with globally unique IDs. Detections from any stream can trigger the workflow’s edge conditions.