Object identification in Python
Object detection is crucial in autonomy and manufacturing for enhancing efficiency, safety, and precision. In autonomous systems like vehicles or robots, it enables real-time navigation, obstacle avoidance, and interaction with dynamic environments. In manufacturing, it lets machines recognize, sort, and manipulate parts, improving quality control and productivity while reducing human error.
Nominal has first-class support for video ingestion, analysis, time synchronization across sensor channels, and automated checks that signal when a video feature is out-of-spec.
Connect to Nominal
Get your Nominal API token from your User settings page.
See the Quickstart for more details on connecting to Nominal from Python.
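A minimal connection sketch is below, assuming the set_token() helper shown in the Quickstart; the base URL is the standard one and may differ for your deployment, and NOMINAL_TOKEN is an environment variable name of our choosing.

```python
import os

import nominal as nm

# Paste your API token from the "User settings" page, or export it as an
# environment variable (NOMINAL_TOKEN is our name, not one the client requires).
nm.set_token("https://api.gov.nominal.io/api", os.environ["NOMINAL_TOKEN"])
```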
Download video files
For convenience, Nominal hosts sample test data on Hugging Face. To download the sample data for this guide, copy-paste the snippet below.
(Make sure to first install huggingface_hub with pip3 install huggingface_hub.)
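A sketch of the download using huggingface_hub's hf_hub_download(); the repo and file names below are placeholders, so substitute the actual sample-data paths from Nominal's Hugging Face page.

```python
from huggingface_hub import hf_hub_download

# Placeholder repo_id and filename; substitute the paths from the guide.
video_path = hf_hub_download(
    repo_id="nominal-io/sample-data",
    filename="sample_video.mp4",
    repo_type="dataset",
)
print(video_path)
```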
Display video
If you’re working in a Jupyter notebook, here’s a shortcut to display the video inline.
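For example, with IPython's Video display helper:

```python
from IPython.display import Video

# embed=True inlines the video bytes into the notebook, so prefer a short
# clip for large files.
Video(video_path, embed=True, width=640)
```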
(For faster loading, only 20s of the full 225 MB video is shown above).
Inspect video metadata
After downloading the video, you can inspect its properties with OpenCV.
Install OpenCV with pip3 install opencv-python.
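For example, using OpenCV's standard capture properties:

```python
import cv2

cap = cv2.VideoCapture(video_path)
fps = cap.get(cv2.CAP_PROP_FPS)
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
cap.release()

print(f"{width}x{height} @ {fps:.2f} fps, {frame_count} frames, "
      f"{frame_count / fps:.1f} s")
```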
Extracted CV data
For convenience, the computer vision (“CV”) features extracted from this video are available on Nominal’s Hugging Face page.
RT-DETR, a pre-trained ML model, was used to generate this data. If you’re interested in how to extract computer vision data for your own video, please see the Appendix.
Let’s download and inspect this data.
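A sketch of the download and load (the repo and file names are placeholders; substitute the paths from Nominal's Hugging Face page):

```python
import polars as pl
from huggingface_hub import hf_hub_download

# Placeholder repo_id and filename; substitute the actual paths.
csv_path = hf_hub_download(
    repo_id="nominal-io/sample-data",
    filename="rt_detr_features.csv",
    repo_type="dataset",
)
df = pl.read_csv(csv_path)
print(df.head())
```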
| frame | object | score | x_min | y_min | x_max | y_max |
|---|---|---|---|---|---|---|
| i64 | str | f64 | f64 | f64 | f64 | f64 |
| 5 | "car" | 0.444338 | 973.15 | 350.76 | 1172.47 | 369.04 |
| 5 | "car" | 0.403373 | 433.44 | 351.53 | 531.85 | 368.34 |
| 5 | "motorbike" | 0.351543 | 298.63 | 350.76 | 438.73 | 369.57 |
| 6 | "motorbike" | 0.40321 | 308.95 | 348.02 | 437.47 | 369.22 |
| 6 | "car" | 0.400091 | 432.22 | 349.05 | 533.11 | 367.95 |
| timestamps | total_object_count | motorbike_count | car_count | person_count |
|---|---|---|---|---|
| str | i64 | i64 | i64 | i64 |
| "2011-11-11T11:11:11.208333" | 3 | 1 | 2 | 0 |
| "2011-11-11T11:11:11.208333" | 3 | 1 | 2 | 0 |
| "2011-11-11T11:11:11.208333" | 3 | 1 | 2 | 0 |
| "2011-11-11T11:11:11.250000" | 6 | 2 | 3 | 1 |
| "2011-11-11T11:11:11.250000" | 6 | 2 | 3 | 1 |
As you can see, each row of this dataframe represents an object that the ML model (RT-DETR) identified.
Specifically, each row includes a label for the identified object, the video frame number, the object’s bounding-box position within the frame, the frame’s timestamp, and counts of each object type identified in the same frame.
Let’s upload this data and video to Nominal for data review and automated check authoring.
Upload to Nominal
We’ll upload both the annotated video and extracted features dataset, then group them together as a Run.
In Nominal, Runs are containers of multimodal test data, including Datasets, Videos, Logs, and database connections.
To see your organization’s latest Runs, head over to the Runs page.
Upload dataset
Since the extracted features CSV is already in a Polars dataframe, we can conveniently upload it to Nominal with the upload_polars() function.
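A sketch of the call, assuming the client is already authenticated. The parameter names here (name, timestamp_column, timestamp_type) are assumptions, so check the upload_polars() signature in the nominal docs for your client version.

```python
import nominal as nm

# Sketch only: parameter names are assumptions; see the nominal docs.
dataset = nm.upload_polars(
    df,
    name="RT-DETR extracted features",
    timestamp_column="timestamps",
    timestamp_type="iso_8601",
)
```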
Upload video
Similarly, we can upload the video file with the upload_video() convenience function.
Video upload requires a start time. If the start time of your video capture is not important, you can choose an arbitrary time like datetime.now() or 2011-11-11 11:11:11.
Since Nominal uses timestamps to cross-correlate between datasets, make sure that whichever start time you choose makes sense for the other datasets in the Run.
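A sketch of the call, using an arbitrary but fixed start time that matches the timestamps in the features dataset; the parameter names are assumptions, so check the upload_video() signature in the nominal docs.

```python
from datetime import datetime

import nominal as nm

# Sketch only: parameter names are assumptions; see the nominal docs.
video = nm.upload_video(
    video_path,
    name="RT-DETR annotated video",
    start=datetime(2011, 11, 11, 11, 11, 11),
)
```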
Create an empty Run
Set the Run start and end times with the minimum and maximum values from the timestamps column.
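A sketch, assuming a create_run() helper; the parameter names are assumptions, so check the nominal docs.

```python
import nominal as nm

# Sketch only: bound the Run by the dataset's timestamp range.
# ISO-8601 strings sort chronologically, so min()/max() work directly;
# parse to datetime first if your client version requires datetimes.
run = nm.create_run(
    name="RT-DETR model analysis",
    start=df["timestamps"].min(),
    end=df["timestamps"].max(),
)
```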
Add dataset & video to Run
Add the video file and feature dataset to the Run with Run.add_datasets().
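A sketch, assuming Run.add_datasets() accepts a mapping from ref names (our choice of keys) to the uploaded assets; check the nominal docs for the exact argument shape.

```python
# Sketch only: the mapping keys are ref names we chose.
run.add_datasets({
    "features": dataset,
    "video": video,
})
```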
On the Nominal Runs page, click on “RT-DETR model analysis” (login required). If you go to the “Data sources” tab of the Run, you’ll now see the Video and CSV file associated with this Run:
Create a workbook
Now that your data is organized in a Run, it’s easy to create a Workbook for ad-hoc analysis on the Nominal platform.
This Workbook synchronizes the extracted feature data with the playback of the annotated video. Feature data like object count and ML model confidence score can be inspected frame-by-frame. Checks that signal anomalous behavior can also be defined and applied to future video ingests.
Workbook link (Login required)
Appendix
This section outlines the general steps for applying a pre-trained ML model to a video. The model chosen is RT-DETR, an object detection model. Other types of ML image models can be applied as well (such as depth estimation or thermal analysis). Choose an ML model or video analysis technique that best serves your hardware testing goals. Please contact our team if you’d like to discuss!
For automating the ingestion of computer vision artifacts in Nominal, please see the previous section.
Identify objects per frame
The function below takes a PIL image and returns a Polars dataframe with every object identified in the image.
We’ll use this function to step through the video frame-by-frame and identify each object.
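One possible implementation using the Hugging Face transformers RT-DETR classes is sketched below. We also pass the frame number so each row matches the schema shown earlier; the checkpoint name is one public RT-DETR release, and the 0.3 confidence threshold is a tunable choice.

```python
import polars as pl
import torch
from PIL import Image
from transformers import RTDetrForObjectDetection, RTDetrImageProcessor

# One public RT-DETR checkpoint; any RT-DETR release should work.
CHECKPOINT = "PekingU/rtdetr_r50vd"
processor = RTDetrImageProcessor.from_pretrained(CHECKPOINT)
model = RTDetrForObjectDetection.from_pretrained(CHECKPOINT)


def get_objects_from_pil_image(image: Image.Image, frame: int) -> pl.DataFrame:
    """Run RT-DETR on one frame and return one row per detected object."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Keep detections above a (tunable) confidence threshold; target_sizes
    # expects (height, width), so reverse PIL's (width, height).
    results = processor.post_process_object_detection(
        outputs,
        target_sizes=torch.tensor([image.size[::-1]]),
        threshold=0.3,
    )[0]
    rows = [
        {
            "frame": frame,
            "object": model.config.id2label[label.item()],
            "score": score.item(),
            "x_min": box[0].item(),
            "y_min": box[1].item(),
            "x_max": box[2].item(),
            "y_max": box[3].item(),
        }
        for score, label, box in zip(
            results["scores"], results["labels"], results["boxes"]
        )
    ]
    return pl.DataFrame(rows)
```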
Step through video frames
The script below steps through each frame in the video and uses get_objects_from_pil_image() (see above) to identify each object. Each identified object is added as a row to the Polars dataframe df_video.
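A sketch of that loop, assuming video_path and get_objects_from_pil_image() from above:

```python
import cv2
import polars as pl
from PIL import Image

cap = cv2.VideoCapture(video_path)
frames: list[pl.DataFrame] = []
frame = 0
while True:
    ok, bgr = cap.read()
    if not ok:
        break
    # OpenCV decodes frames as BGR; PIL (and the model) expect RGB.
    image = Image.fromarray(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
    detections = get_objects_from_pil_image(image, frame)
    if detections.height > 0:
        frames.append(detections)
    frame += 1
cap.release()

df_video = pl.concat(frames)
print(df_video.height, "objects identified")
```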
In less than 5 minutes of video, the RT-DETR model identified almost 300k objects!
Enrich metadata
The scripts below add timestamp and object count columns to df_video.
Timestamp column
df_video only has a frame number column. This script adds a timestamps column and assigns each frame an absolute time (starting with ‘2011-11-11 11:11:11’ for the first frame).
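A sketch using Polars expressions, assuming a 24 fps video; the sample timestamps above (e.g., frame 5 at +0.208333 s) are consistent with 24 fps, but read the real rate from the video metadata.

```python
from datetime import datetime

import polars as pl

FPS = 24.0                                  # from the metadata (CAP_PROP_FPS)
START = datetime(2011, 11, 11, 11, 11, 11)  # arbitrary absolute start time

# timestamp = START + frame / FPS, computed as an integer-microsecond offset.
df_video = df_video.with_columns(
    (
        pl.lit(START)
        + pl.duration(microseconds=(pl.col("frame") * 1_000_000 / FPS).cast(pl.Int64))
    ).alias("timestamps")
)
```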
Video start times are used to align playback with other time-domain data in your Run. Whichever absolute start time you choose for your video (for example, 2011-11-11 11:11:11), make sure that it aligns with the other start times in your Run’s data sources.
Object count
The script below adds columns that count each object per video frame.
For example, if the boat_count column is 6, then 6 boats were identified in that frame.
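One way to do this with Polars window expressions; the labels list below is illustrative, so extend it with whatever labels RT-DETR produced for your video.

```python
import polars as pl

# Illustrative subset of the labels RT-DETR can emit.
labels = ["car", "motorbike", "person", "boat"]

df_video = df_video.with_columns(
    # Total detections in each frame...
    [pl.len().over("frame").alias("total_object_count")]
    # ...plus a per-frame count for each label of interest.
    + [
        (pl.col("object") == label).sum().over("frame").alias(f"{label}_count")
        for label in labels
    ]
)
```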
Annotate video
Finally, the script below adds a color-coded bounding box and label to each object identified in each frame. The result is a fully annotated video.
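A sketch with OpenCV's drawing primitives; the label-to-color mapping and output file name are our own choices.

```python
import cv2
import polars as pl

# One BGR color per label; labels not listed fall back to white.
COLORS = {"car": (0, 255, 0), "motorbike": (255, 0, 0), "person": (0, 0, 255)}

cap = cv2.VideoCapture(video_path)
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter(
    "annotated.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height)
)

frame = 0
while True:
    ok, bgr = cap.read()
    if not ok:
        break
    # Draw every detection recorded for this frame.
    for row in df_video.filter(pl.col("frame") == frame).iter_rows(named=True):
        color = COLORS.get(row["object"], (255, 255, 255))
        p1 = (int(row["x_min"]), int(row["y_min"]))
        p2 = (int(row["x_max"]), int(row["y_max"]))
        cv2.rectangle(bgr, p1, p2, color, 2)
        cv2.putText(
            bgr,
            f'{row["object"]} {row["score"]:.2f}',
            (p1[0], p1[1] - 5),
            cv2.FONT_HERSHEY_SIMPLEX,
            0.5,
            color,
            1,
        )
    out.write(bgr)
    frame += 1

cap.release()
out.release()
```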
(For faster loading, only 20s of the full 225 MB video is shown above).