Data Integration Tutorial for Python

The quickstart explained the mechanics of setting up your Nominal Python client and uploading simple datasets. In this tutorial, we take a broader look at the steps involved in data capture, how those steps map onto the Nominal Data Model, and how to automate uploads using Python.

Working with assets

In Nominal, an Asset is the digital representation of a physical device operated during tests. For example, if you are handling data for a fleet of airplanes, then each of those airplanes could be represented as an Asset within Nominal.

While it is simplest to describe an Asset as a 1:1 mapping to a physical device, an Asset may also represent a shared concept. For example, an Asset in Nominal could be used to group simulation runs for a planned future aircraft that doesn’t exist physically yet.

Assets have (amongst other things):

  • A human readable name and a description.
  • Properties that help to programmatically distinguish one asset from another. Some examples:
    • For airplanes: a model and tail number are commonly used to map between a physical plane and the asset in Nominal.
    • For self-driving food delivery cars: a license plate number, make / model, sensors fitted (has_lidar, has_radar, has_night_vision_cameras, etc.)
    • For vibration stands: a serial number, type, and warehouse number.

Other associated metadata may include labels, URLs to resources associated with the asset, file attachments, etc. As we’ll see later, an asset can be found by name, or by searching metadata.

Being able to uniquely identify an Asset in Nominal via a combination of properties enables easy lookup and search based on conventions contextual to your organization.

Case Study: Electric Gliders

Imagine that we are working at a company that produces electric gliders. We have two generations of our product, codenamed shimmer and sidewinder. Each of our gliders has a uniquely identifying serial number, sn-001 through sn-999.

While operating these gliders, we generate the following types of data:

  • a variety of files containing timeseries data, ranging from .csv files to proprietary binary file formats;
  • an .mcap recording containing timeseries data and camera data;
  • several .mp4 videos from several cameras; and
  • log files from systemd (read using journalctl).

We now have a test event coming up, and need to ingest flight artifacts for analysis in Nominal afterwards.

Creating an Asset

To start, we create an Asset that we can upload data to, one per glider. If we had only a few assets, this could be done via the frontend. However, we have thousands of gliders, so we will be automating the process using Python.

from nominal.core import NominalClient

# Get a client to interact with Nominal.
# Assumes authentication has already been performed.
# Replace "default" with your profile name.
client = NominalClient.from_profile("default")

asset = client.create_asset(
    # Human readable name for the asset
    name="Shimmer SN-001",
    # Optional description, useful if you have notes about this glider.
    description="Our very first shimmer glider produced",
    # Properties which can help us find this asset later.
    # Ideally, you should be able to uniquely identify any physical asset
    # using some combination of these properties.
    properties={
        "platform": "shimmer",
        "serial_num": "SN-001",
    },
)

This is something you’ll only ever have to do once per asset. Future test events will be uploading to the same asset that we create initially.

Once you’ve created an asset, it’s useful to write a function that can look the asset up again when uploading data later. Here’s an example that searches by the platform and serial_num properties used above:

from nominal.core import Asset, NominalClient


def retrieve_asset(
    client: NominalClient,
    platform: str,
    serial_num: str,
) -> Asset:
    # Look up the unique asset matching the given platform and serial number.
    properties = {"platform": platform, "serial_num": serial_num}
    existing_assets = client.search_assets(properties=properties)
    if len(existing_assets) > 1:
        raise RuntimeError(
            f"Too many assets ({len(existing_assets)}) found with properties {properties}"
        )
    elif len(existing_assets) == 0:
        raise RuntimeError(f"No such asset found with properties {properties}")
    else:
        return existing_assets[0]

Later, once we have data from our test event, we can look up the correct asset to append data to using this helper function.

See the Asset function reference for more information on what you can do with a nominal.Asset.

Normalizing the data to a Nominal supported format

The Nominal platform requires uploaded data to be in one of several supported file formats. After the flight, we therefore need to post-process our data to convert our proprietary format to a known format, and otherwise normalize our data to be compliant.

See below for how we could set up processing for each of the data modalities in our flight data:

Tabular data comes in a large variety of formats, ranging from simple .csv files to complex proprietary binary formats. Prior to uploading data, it should be converted into a format supported by Nominal. Today, the most commonly used intermediary file formats are CSV and Parquet. Parquet is generally preferred, since it produces smaller files that can be ingested faster than CSV files.
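For example, here is a minimal sketch of converting a raw CSV file to Parquet with pandas (assuming a Parquet engine such as pyarrow is installed). The file paths are illustrative, and the cast to float anticipates the column-type requirement described below.

import pandas as pd

# Hypothetical paths for one flight's telemetry.
raw_csv = "raw/flight_telemetry.csv"
normalized_parquet = "normalized/flight_telemetry.parquet"

df = pd.read_csv(raw_csv)

# Cast integer and boolean columns to float, since the platform ingests
# floating point and string columns (see the requirements below).
for column in df.columns:
    if pd.api.types.is_integer_dtype(df[column]) or pd.api.types.is_bool_dtype(df[column]):
        df[column] = df[column].astype(float)

# Parquet is generally preferred over CSV: smaller files, faster ingest.
df.to_parquet(normalized_parquet, index=False)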

The platform imposes a few additional requirements:

  • Only floating point and string columns are supported.

    We are considering including other data types, such as vectors, uint64, etc. Please contact your Nominal representative if this is of interest.

  • Each file of data must have one known column with timestamps. We support a wide array of timestamp types, with some of the most popular including:

    • Absolute floating point {hours, minutes, seconds, milliseconds, microseconds, nanoseconds} since unix epoch
    • Relative floating point {hours, minutes, seconds, milliseconds, microseconds, nanoseconds} since a provided absolute timestamp
    • ISO8601 Timestamps
    • Custom string-based timestamp formats with a provided jodatime format string
  • The platform supports viewing channels in a hierarchical manner. However, data must be flattened during ingest.

    Consider the following example data, in JSON:

    {
        "timestamp": 12345,
        "gps": {
            "lat": 12.1,
            "lon": 12.3,
            "height": 10000
        },
        "imu": {
            "roll": 1.23,
            "pitch": -0.2,
            "yaw": 0.0
        }
    }

    You can preserve the hierarchical structure of this data by naming columns appropriately:

    timestamp  gps.lat  gps.lon  gps.height  imu.roll  imu.pitch  imu.yaw
    12345      12.1     12.3     10000       1.23      -0.2       0.0

    When creating the dataset using the Python client, you must specify a prefix_delimiter of "." for the columns to be interpreted hierarchically. A sketch of this flattening step appears after this list.

    When flattening data, do not feel compelled to jam-pack it all into a single file to upload to Nominal. In many cases, it is easier to produce a folder of .csv or .parquet files and to upload those in a for-loop. You can upload as many files as you want to a dataset, and they will all be combined into a unified source.

    This method can be used both to concatenate and to join additional data across files—new columns will result in additional channels being created, and new timestamps for existing channels will add additional data to those existing channels.

    Uploading data from multiple files to the same channels with duplicate timestamps will overwrite existing data.
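
As referenced above, below is a minimal sketch of flattening nested records into dot-delimited column names and writing them out as Parquet for ingest. The input file name and helper function are illustrative assumptions; only the "." naming convention comes from the platform requirements above.

import json

import pandas as pd


def flatten(record: dict, delimiter: str = ".", prefix: str = "") -> dict:
    # Recursively flatten nested dictionaries into delimiter-joined column names.
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{delimiter}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, delimiter, name))
        else:
            flat[name] = value
    return flat


# Hypothetical file of newline-delimited JSON records shaped like the example above.
with open("raw/telemetry.jsonl") as f:
    rows = [flatten(json.loads(line)) for line in f]

# Columns come out as "timestamp", "gps.lat", "gps.lon", "gps.height",
# "imu.roll", "imu.pitch", and "imu.yaw", ready to ingest with prefix_delimiter=".".
pd.DataFrame(rows).to_parquet("normalized/telemetry.parquet", index=False)

Each output file produced this way can then be uploaded with Dataset.add_tabular_data, as shown in the next section.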

Uploading data to Nominal

Once your data has been transformed into a neutral / normalized format, uploading and ingesting the data into Nominal is straightforward.

The first time that data is uploaded to an asset, we create new datasets and video datasources for the asset. On subsequent uploads, we append new files directly to existing datasources (datasets, videos, etc.).

See below for instructions on uploading data to an asset in each case:

In the following examples, we associate datasources with an asset using a “refname”. Refnames are a mechanism for performing two common tasks within Nominal:

  • Looking up a datasource on an Asset to later edit / append data to, and
  • Comparing like-for-like datasources across different assets in multi-asset workflows.

It is a good practice to give a descriptive, yet terse, refname when associating a datasource with an Asset. For example, if our gliders communicate data over mavlink, a common refname for data associated with that connection could be "mavlink_data". If we are working with camera data and video files, it is typical to use a refname based on the context of the camera, such as "front_center_camera" or "night_vision_camera".

When uploading tabular data, there are a few common formats we support, as well as some more specialized formats. These include (but are not limited to):

  • .csv files (and gzipped CSV files, .csv.gz)
  • .parquet files
  • .bin Ardupilot Dataflash files
from nominal.core import NominalClient

# Assumes authentication has already been performed.
# Replace "default" with your profile name.
client = NominalClient.from_profile("default")

# Using utility from previous section on creating assets.
# Hardcoding platform and serial number for this example, but typically this would be
# computed dynamically based on metadata about the data files, such as the filepath.
asset = retrieve_asset(client, platform="shimmer", serial_num="SN-001")

# Create dataset (only required for the first upload for a given data source)
dataset = client.create_dataset(
    # Human readable name for the dataset
    name="Flight Data",
    # Key-value properties that are useful for looking up and finding this dataset
    properties={
        "platform": "shimmer",
        "serial_num": "SN-001",
    },
    # Optional description for the dataset, useful for storing notes for future readers
    description="Flight data from onboard telemetry platform",
)
dataset.add_tabular_data(
    path="path/to/data.parquet",
    # Column within the parquet file containing absolute or relative timestamp information
    # for all other columns within the file
    timestamp_column="unix_timestamps",
    # Type of data contained within the timestamp column.
    # Using absolute floating point seconds from unix epoch for this example, but a wide
    # variety of formats are supported, such as other resolutions of time, relative
    # timestamps, or even custom-formatted string timestamps (e.g. ISO8601).
    timestamp_type="epoch_seconds",
)

# Attach the dataset to the asset
asset.add_dataset(
    # Reference name for this dataset within the asset.
    # This can be used later to retrieve a reference to this dataset via the same
    # reference name for future flight tests.
    "flight_data",
    dataset,
)

For subsequent flight test events, we can skip creating the dataset and instead look up the dataset we already created:

# NOTE: refname needs to match the refname used to add the dataset to the asset
dataset = asset.get_dataset("flight_data")

dataset.add_tabular_data(
    "path/to/other/data.parquet",
    # Column containing timestamp information for this data file.
    # NOTE: Need not be the same as other files in this dataset.
    timestamp_column="unix_timestamps",
    # Type of timestamps stored in this data file.
    # NOTE: Need not be the same as other files in this dataset.
    timestamp_type="epoch_seconds",
)
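
If a flight produces a folder of normalized files rather than a single one, they can be appended in a loop, as suggested in the normalization section. A minimal sketch, assuming a directory of Parquet files that all share the same timestamp column:

from pathlib import Path

# Hypothetical directory of normalized Parquet files produced after the flight.
normalized_dir = Path("normalized")

dataset = asset.get_dataset("flight_data")
for parquet_file in sorted(normalized_dir.glob("*.parquet")):
    dataset.add_tabular_data(
        str(parquet_file),
        timestamp_column="unix_timestamps",
        timestamp_type="epoch_seconds",
    )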

Want to ingest Ardupilot Dataflash .bin files? When adding data to the dataset, you can use Dataset.add_ardupilot_dataflash.
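A minimal sketch, assuming the method accepts a file path like add_tabular_data does (consult the Dataset function reference for the exact signature):

# Hypothetical path to an Ardupilot Dataflash log; the signature is assumed here.
dataset.add_ardupilot_dataflash("path/to/flight_log.bin")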

Creating runs in Nominal

When working with assets, users generally work with data from many distinct test events. As shown in other sections of the tutorial, this is typically accomplished by repeatedly uploading data to a set of datasources within an Asset. For example, if every test event generates some CSV, Parquet, log, and video files, these will be uploaded to the same set of datasources for each test event. However, as useful as it is to see all of the data for an Asset in a single place, it is often necessary to investigate a single test event and all of the data associated with it.

This is where Runs come into play! A Run is defined by a start and end time on an existing asset, and provides a view into all of the files and datasources attached to that asset during that window. When creating workbooks, running checklists, or doing other validation on your data, it is useful to be able to perform these tasks on a single flight test. To do this, we must create these runs ourselves when uploading data to Nominal.

import datetime

from nominal.core import NominalClient

# Assumes authentication has already been performed.
# Replace "default" with your profile name.
client = NominalClient.from_profile("default")

# Using utility from previous section on creating assets.
# Hardcoding platform and serial number for this example, but typically this would be
# computed dynamically based on metadata about the data files, such as the filepath.
asset = retrieve_asset(client, platform="shimmer", serial_num="SN-001")

# Create the run in Nominal and associate it with the Asset producing the data
run = client.create_run(
    name="Test Flight #2 10/09/2024",
    start=datetime.datetime(
        year=2024,
        month=10,
        day=9,
        hour=13,
        minute=3,
        second=5,
        tzinfo=datetime.timezone.utc,
    ),
    end=datetime.datetime(
        year=2024,
        month=10,
        day=9,
        hour=14,
        minute=2,
        second=17,
        tzinfo=datetime.timezone.utc,
    ),
    # Useful for looking up runs later on by asset properties
    properties={
        "platform": "shimmer",
        "serial_num": "SN-001",
    },
    # Link back to the asset containing data for this run
    asset=asset,
    # Optional human description of the run, useful for describing what maneuvers we tested
    description="Testing unpowered glide",
)

Determining the correct start / end bounds for a nominal.Run can be challenging when you have a large number of data files being ingested. A good practice is to create the nominal.Run early on in the data ingestion script and to update the bounds as you go.

An example of doing this would look like:

run = client.create_run(
    ...,
    # Initialize start time to now (UTC), as it will always be greater than data
    # we are ingesting from a flight test that happened in the past
    start=datetime.datetime.now(tz=datetime.timezone.utc),
)

# ... normalize data from flight test ...

# Derived directly from the data uploaded to Nominal:
# look at the first and last values of the provided timestamp_column
data_start_time = datetime.datetime(...)
data_end_time = datetime.datetime(...)

# ... upload data to nominal ...

# Update the bounds of the run to span the earliest start time and latest end time
# seen across the existing run and the newly uploaded data
run.update(
    start=min(run.start, data_start_time),
    end=data_end_time if run.end is None else max(run.end, data_end_time),
)
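
One way to derive data_start_time and data_end_time is to read the timestamp column from each normalized file before uploading it. A minimal sketch, assuming pandas and the epoch-seconds Parquet files from the earlier examples:

import datetime

import pandas as pd

# Hypothetical normalized file; reuses the timestamp column from the upload examples.
timestamps = pd.read_parquet("path/to/data.parquet", columns=["unix_timestamps"])["unix_timestamps"]
data_start_time = datetime.datetime.fromtimestamp(timestamps.min(), tz=datetime.timezone.utc)
data_end_time = datetime.datetime.fromtimestamp(timestamps.max(), tz=datetime.timezone.utc)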

Whether you set Run bounds ahead of time or update them as you go, it is important to set them correctly, since they influence data visualization on the website (e.g. when viewing workbooks on runs).