Data Integration Tutorial for Python
The quickstart covered the mechanics of setting up your Nominal Python client and uploading simple datasets. This tutorial takes a broader look at the steps involved in data capture, how those steps map onto the Nominal Data Model, and how to automate uploads using Python.
Working with assets
In Nominal, an Asset is the digital representation of a physical device operated during tests. For example, if you are handling data for a fleet of airplanes, then each of those airplanes could be represented as an Asset within Nominal.
While it is simple to describe an Asset as a 1:1 relation between physical assets, it should be noted that an Asset may also refer to a shared concept. For example, an Asset in Nominal could be used to group simulation runs for a planned future aircraft that doesn’t exist physically yet.
Assets have (amongst other things):
- A human-readable name and a description.
- Properties which help programmatically distinguish one asset from another. Some examples:
  - For airplanes: a model and tail number are commonly used to map between a physical plane and the asset in Nominal.
  - For self-driving food delivery cars: a license plate number, make / model, sensors fitted (`has_lidar`, `has_radar`, `has_night_vision_cameras`, etc.).
  - For vibration stands: a serial number, type, and warehouse number.
Other associated metadata may include labels, URLs to resources associated with the asset, file attachments, etc. As we’ll see later, an asset can be found by name, or by searching metadata.
Being able to uniquely identify an Asset in Nominal via a combination of properties enables easy lookup and search based on conventions contextual to your organization.
Case Study: Electric Gliders
Imagine that we are working at a company that produces electric gliders.
We have two generations of our product, codenamed `shimmer` and `sidewinder`.
Each of our gliders has a uniquely identifying tail number, `sn-001` through `sn-999`.
While operating these gliders, we generate the following types of data:
- a variety of files containing timeseries data, ranging from `.csv` files to proprietary binary file formats;
- an `.mcap` recording containing timeseries data and camera data;
- several `.mp4` videos from several cameras; and
- log files from `systemd` (read using `journalctl`).
We now have a test event coming up, and need to ingest flight artifacts for analysis in Nominal afterwards.
Creating an Asset
To start, we create an `Asset` that we can upload data to, one per glider.
If we had only a few assets, this could be done via the frontend.
However, we have thousands of gliders, so we will automate the process using Python.
This is something you'll only ever have to do once per asset; future test events will upload to the same asset that we create initially.
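As a sketch of that automation, we can derive each asset's name and identifying properties from its platform and serial number. The helper below is plain Python; the commented-out `nm.create_asset` call is an assumption about the client API, so verify the exact function name against your client's reference:

```python
# Build identifying metadata for each glider asset. The property names
# (platform, serial_num) come from this tutorial; the commented-out client
# call is a hypothetical sketch, not a confirmed API.

def glider_asset_spec(platform: str, serial_num: str) -> dict:
    """Name and identifying properties for a single glider."""
    return {
        "name": f"{platform}-{serial_num}",
        "properties": {"platform": platform, "serial_num": serial_num},
    }

specs = [glider_asset_spec("shimmer", f"sn-{n:03d}") for n in range(1, 4)]

# import nominal as nm                                  # hypothetical usage
# for spec in specs:
#     nm.create_asset(spec["name"], properties=spec["properties"])
```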
Once you’ve created an asset, it’s useful to write a function that can look up an asset for uploading data to later.
Here's an example lookup that uses the `platform` and `serial_num` properties from our example:
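One way such a helper could look; `search_assets` is an assumed method name for the client's property-based search, so substitute whatever search call your nominal client actually provides:

```python
# Sketch of a lookup helper: find exactly one Asset by its identifying
# properties. `client.search_assets` is an assumed API; check the client
# reference for the real search call.

def find_glider_asset(client, platform: str, serial_num: str):
    """Return the unique Asset matching the given identifying properties."""
    matches = list(client.search_assets(
        properties={"platform": platform, "serial_num": serial_num},
    ))
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one asset for {platform}/{serial_num}, "
            f"found {len(matches)}"
        )
    return matches[0]
```

Raising when zero or multiple assets match keeps ingestion scripts from silently uploading to the wrong asset.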
Later, once we have data from our test event, we can look up the correct asset to append data to using this helper function.
See the Asset function reference for more information on what you can do with a `nominal.Asset`.
Normalizing the data to a Nominal supported format
The Nominal platform requires uploaded data to be in one of several supported file formats. After the flight, we therefore need to post-process our data to convert our proprietary format to a known format, and otherwise normalize our data to be compliant.
See below for how we could set up processing for each data modality in our flight data:
Tabular data files
Mcap Files
Video Files
Log Files
Tabular data comes in a large variety of formats, ranging from simple `.csv` files to complex proprietary binary formats.
Prior to uploading data, it should be converted into a format supported by Nominal.
Today, the most commonly used intermediary file formats are CSV and Parquet.
Parquet is generally preferred, since it produces smaller files that can be ingested faster than CSV files.
The platform imposes a few additional requirements:
- Only floating point and string columns are supported. We are considering including other data types, such as vectors, uint64, etc. Please contact your Nominal representative if this is of interest.
- Each file of data must have one known column with timestamps. We support a wide array of timestamp types, with some of the most popular including:
  - Absolute floating point {hours, minutes, seconds, milliseconds, microseconds, nanoseconds} since the Unix epoch
  - Relative floating point {hours, minutes, seconds, milliseconds, microseconds, nanoseconds} since a provided absolute timestamp
  - ISO 8601 timestamps
  - Custom string-based timestamp formats with a provided jodatime format string
- The platform supports viewing channels in a hierarchical manner; however, data must be flattened during ingest. You can preserve the hierarchical structure of nested data by naming columns appropriately: for example, a field nested as `engine` → `rpm` becomes a single column named `engine.rpm`. When creating the dataset using the Python client, you must specify a `prefix_delimiter` of `"."` for the columns to be interpreted hierarchically.

As you flatten the data, don't feel compelled to pack all of the flattened data into a single file to upload to Nominal. Depending on your format, it can be easier to produce a folder of `.csv` or `.parquet` files to upload in a for-loop, avoiding joins on different parts of the data (`.mat` files in particular benefit from this). You can upload as many files as you want to a dataset, and they will all be combined into a unified source for you to work with.
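Flattening nested records into dot-delimited column names can be done in a few lines of plain Python; the record shape here is illustrative:

```python
# Recursively flatten nested dicts into {"a.b.c": value} pairs, so the
# resulting columns display hierarchically when the dataset is created
# with prefix_delimiter=".".

def flatten(record: dict, parent: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts; leaf values keep their dotted path as the key."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat

row = {"timestamp": 1700000000.0, "engine": {"rpm": 1200.0, "temp": {"c": 85.5}}}
flatten(row)
# {"timestamp": 1700000000.0, "engine.rpm": 1200.0, "engine.temp.c": 85.5}
```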
Multi-file uploads can be used both to concatenate and to join additional data across files: new columns will result in additional channels being created, and new timestamps for existing channels will add additional data to those existing channels.
Uploading data from multiple files to the same channels with duplicate timestamps will overwrite existing data.
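The folder-of-files approach mentioned above can be a simple loop. The `dataset.add_tabular_data` call below is an assumed method name, shown commented out; use whichever append call your nominal client provides:

```python
# Sketch: upload a whole folder of normalized Parquet files to one dataset.
# The actual upload call is a hypothetical placeholder (left as a comment);
# the file discovery and ordering logic is plain Python.
from pathlib import Path

def upload_folder(dataset, folder: str) -> list[str]:
    """Upload every .parquet file in `folder`; returns the names uploaded."""
    uploaded = []
    for path in sorted(Path(folder).glob("*.parquet")):
        # dataset.add_tabular_data(path)  # hypothetical client call
        uploaded.append(path.name)
    return uploaded
```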
Uploading data to Nominal
Once your data has been transformed into a neutral / normalized format, uploading and ingesting the data into Nominal is straightforward.
The first time data is uploaded to an asset, we create new datasets and video datasources for the asset. On subsequent uploads, we append new files directly to the existing datasources (datasets, videos, etc.).
See below for instructions on uploading data to an asset in each case:
In the following examples, we associate datasources with an asset using a “refname”. Refnames are a mechanism for performing two common tasks within Nominal:
- Looking up a datasource on an Asset to later edit / append data to, and
- Comparing like datasources across different assets in multi-asset workflows.
It is a good practice to give a descriptive, yet terse, refname when associating a datasource with an Asset.
For example, if our gliders communicate data over mavlink, a common refname for data associated with that connection could be `"mavlink_data"`.
If we are working with camera data and video files, it is typical to use a refname based on the context of the camera, such as `"front_center_camera"` or `"night_vision_camera"`.
Tabular data files
Mcap Files
Video Files
Log Files
When uploading tabular data, there are a few common formats we support, as well as some more specialized formats. These include (but are not limited to):
- `.csv` files (and gzipped CSV files, `.csv.gz`)
- `.parquet` files
- `.bin` ArduPilot DataFlash files
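A sketch of the first-time ingest for a test event. The method names `create_tabular_dataset` and `add_dataset`, the refname, and the parameter names are all assumptions about the client API; verify them against your nominal client's reference before use:

```python
# Sketch: first-time ingest -- create a dataset from a normalized file and
# attach it to the glider's asset under a refname. All method and parameter
# names here are hypothetical, not a confirmed API.

def ingest_first_flight(client, asset, path: str):
    """Create a dataset from `path` and attach it to `asset` as 'telemetry'."""
    dataset = client.create_tabular_dataset(
        path,
        name="Glider Telemetry",
        timestamp_column="timestamp",    # the one required timestamp column
        timestamp_type="epoch_seconds",  # absolute seconds since Unix epoch
    )
    asset.add_dataset("telemetry", dataset)
    return dataset
```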
For subsequent flight test events, we can skip creating the dataset in lieu of searching for the dataset we already created:
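One way this could look; `get_dataset` (lookup by refname) and `add_csv` (append) are assumed method names, so substitute the lookup and append calls your nominal client actually provides:

```python
# Sketch: on subsequent test events, look up the dataset we already created
# via its refname on the asset and append the new file to it. Both method
# names below are hypothetical stand-ins for the real client API.

def append_flight_data(asset, refname: str, csv_path: str):
    """Find the dataset attached under `refname` and append one file to it."""
    dataset = asset.get_dataset(refname)  # assumed lookup-by-refname call
    dataset.add_csv(csv_path)             # assumed append call
    return dataset
```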
Want to ingest Ardupilot Dataflash `.bin` files? When creating a dataset, you can use `client.create_ardupilot_dataflash_dataset`. To add this data to an existing dataset, simply use `dataset.add_ardupilot_dataflash_to_dataset`.
Creating runs in Nominal
When working with assets, users generally work with data from many distinct test events. As shown in other sections of the tutorial, this is generally accomplished by repeatedly uploading data to a set of datasources within an Asset. For example, if every test event generates some CSV, Parquet, log, and video files, these will be uploaded to the same set of datasources for each test event. However, as useful as it is to see all of an Asset's data in a single place, you will frequently want to investigate a single test event and all of the data associated with it.
This is where Runs come into play! A run is defined by a start and end time on an existing asset, and acts as a view over all of the files and datasources attached to that asset within that window. When creating workbooks, running checklists, or doing other validation on your data, it is useful to be able to perform these tasks on a single flight test. To do this, we must create these runs ourselves when uploading data to Nominal.
Determining the correct start / end bounds for a `nominal.Run` can be challenging when you have a large number of data files being ingested.
Typically, a good practice is to create the `nominal.Run` early in the data ingestion script and update the bounds as you go.
An example of doing this would look like:
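A sketch of that pattern: the bounds bookkeeping below is plain Python, while the commented-out `create_run` / `update` calls are hypothetical stand-ins for the real client API:

```python
# Create the Run early, then widen its bounds as each file is ingested.
# The nominal client calls are left as comments because their exact names
# are assumptions -- check your client's run reference.

def widen_bounds(bounds, file_start: float, file_end: float):
    """Grow a (start, end) interval so it covers [file_start, file_end]."""
    start, end = bounds
    return (min(start, file_start), max(end, file_end))

# run = nm.create_run(...)  # hypothetical: create the run up front
bounds = (float("inf"), float("-inf"))
for file_start, file_end in [(100.0, 200.0), (50.0, 120.0), (180.0, 260.0)]:
    # ... ingest the file here, then keep the run covering every file ...
    bounds = widen_bounds(bounds, file_start, file_end)
    # run.update(start=bounds[0], end=bounds[1])  # hypothetical
bounds
# (50.0, 260.0)
```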
However you add bounds to your Run (either ahead of time or as you go), the bounds must be correct for data to visualize properly in the website when viewing workbooks on runs.