Data Integration Tutorial for Python

The quickstart explained the mechanics of setting up your Nominal Python client and uploading simple datasets. In this tutorial, we think more broadly about the steps involved in data capture, how those steps correspond to the Nominal Data Model, and how to automate uploads using Python.

Working with assets

In Nominal, an Asset is the digital representation of a physical device operated during tests. For example, if you are handling data for a fleet of airplanes, then each of those airplanes could be represented as an Asset within Nominal.

While it is simplest to describe an Asset as a 1:1 representation of a physical device, an Asset may also refer to a shared concept. For example, an Asset in Nominal could be used to group simulation runs for a planned future aircraft that doesn’t exist physically yet.

Assets have (amongst other things):

  • A human readable name and a description.
  • Properties, which help programmatically distinguish one asset from another. Some examples:
    • For airplanes: a model and tail number are commonly used to map between a physical plane and the asset in Nominal.
    • For self-driving food delivery cars: a license plate number, make / model, sensors fitted (has_lidar, has_radar, has_night_vision_cameras, etc.)
    • For vibration stands: a serial number, type, and warehouse number.

Other associated metadata may include labels, URLs to resources associated with the asset, file attachments, etc. As we’ll see later, an asset can be found by name, or by searching metadata.

Being able to uniquely identify an Asset in Nominal via a combination of properties enables easy lookup and search based on conventions contextual to your organization.

Case Study: Electric Gliders

Imagine that we are working at a company that produces electric gliders. We have two generations of our product, codenamed shimmer and sidewinder. Each of our gliders has a uniquely identifying tail number, SN-001 through SN-999.

While operating these gliders, we generate the following types of data:

  • a variety of files containing timeseries data, ranging from .csv files to proprietary binary file formats;
  • an .mcap recording containing timeseries data and camera data;
  • several .mp4 videos from several cameras; and
  • log files from systemd (read using journalctl).

We now have a test event coming up, and need to ingest flight artifacts for analysis in Nominal afterwards.


Creating an Asset

To start, we create an Asset that we can upload data to, one per glider. If we had only a few assets, this could be done via the frontend. However, we have thousands of gliders, so we will automate the process using Python.

import nominal

# get a client to interact with Nominal
client = nominal.get_default_client()

asset = client.create_asset(
    # Human readable name for the asset
    name="Shimmer SN-001",
    # Optional description, useful if you have notes about this glider.
    description="Our very first shimmer glider produced",
    # Properties which can help us find this asset later.
    # Ideally, you should be able to uniquely identify any physical asset
    # using some combination of these properties.
    properties={
        "platform": "shimmer",
        "serial_num": "SN-001",
    },
)

This is something you’ll only ever have to do once per asset. Future test events will upload data to the same asset that we create initially.

Once you’ve created an asset, it’s useful to write a function that looks up an asset to upload data to later. Here’s an example based on the platform and serial_num properties used above:

def retrieve_asset(
    client: nominal.NominalClient,
    platform: str,
    serial_num: str,
) -> nominal.Asset:
    properties = {"platform": platform, "serial_num": serial_num}
    existing_assets = client.search_assets(
        properties=properties
    )
    if len(existing_assets) > 1:
        raise RuntimeError(
            f"Too many assets ({len(existing_assets)}) found with properties {properties}"
        )
    elif len(existing_assets) == 0:
        raise RuntimeError(
            f"No such asset found with properties {properties}"
        )
    else:
        return existing_assets[0]

Later, once we have data from our test event, we can look up the correct asset to append data to using this helper function.

See the Asset function reference for more information on what you can do with a nominal.Asset.


Normalizing the data to a Nominal supported format

The Nominal platform requires uploaded data to be in one of several supported file formats. After the flight, we therefore need to post-process our data: convert our proprietary formats to a supported format, and otherwise normalize the data to be compliant.

See below for how we could set up processing for each data modality in our flight data:

Tabular data comes in a large variety of formats, ranging from simple .csv files to complex proprietary binary formats. Prior to uploading data, it should be converted into a format supported by Nominal. Today, the most commonly used intermediary file formats are CSV and Parquet. Parquet is generally preferred, since it produces smaller files that can be ingested faster than CSV files.
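
For example, here is a minimal sketch of a CSV-to-Parquet conversion using pandas. This is our choice of tooling rather than a Nominal requirement, the filenames are hypothetical, and df.to_parquet needs a Parquet engine such as pyarrow installed:

import pandas as pd

# Read the raw CSV produced during the flight (hypothetical filename)
df = pd.read_csv("flight_001_telemetry.csv")

# Coerce columns to floats where possible, and strings otherwise,
# to satisfy the column type requirements listed below
for column in df.columns:
    if not pd.api.types.is_float_dtype(df[column]):
        try:
            df[column] = df[column].astype(float)
        except (TypeError, ValueError):
            df[column] = df[column].astype(str)

# Write a Parquet file for smaller, faster-ingesting uploads
df.to_parquet("flight_001_telemetry.parquet", index=False)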

The platform imposes a few additional requirements:

  • Only floating point and string columns are supported.

    We are considering including other data types, such as vectors, uint64, etc. Please contact your Nominal representative if this is of interest.

  • Each file of data must have one known column with timestamps. We support a wide array of timestamp types, with some of the most popular including:

    • Absolute floating point {hours, minutes, seconds, milliseconds, microseconds, nanoseconds} since unix epoch
    • Relative floating point {hours, minutes, seconds, milliseconds, microseconds, nanoseconds} since a provided absolute timestamp
    • ISO8601 Timestamps
    • Custom string-based timestamp formats with a provided jodatime format string
  • The platform supports viewing channels in a hierarchical manner. However, data must be flattened during ingest.

    Consider the following example data, in JSON:

    {
      "timestamp": 12345,
      "gps": {
        "lat": 12.1,
        "lon": 12.3,
        "height": 10000
      },
      "imu": {
        "roll": 1.23,
        "pitch": -0.2,
        "yaw": 0.0
      }
    }

    You can preserve the hierarchical structure of this data by naming columns appropriately:

    timestamp | gps.lat | gps.lon | gps.height | imu.roll | imu.pitch | imu.yaw
    12345     | 12.1    | 12.3    | 10000      | 1.23     | -0.2      | 0.0

    When creating the dataset using the Python client, you must specify a prefix_delimiter of "." for the columns to be interpreted hierarchically.

    As you flatten the data, you do not need to pack everything into a single file before uploading to Nominal. In many cases, it is easier to produce a folder of .csv or .parquet files and upload them in a loop rather than performing joins across different parts of the data, depending on your format (.mat files in particular benefit from this); see the flattening sketch below. You can upload as many files as you want to a dataset, and they will all be combined into a unified source for you to work with.

    This can be used both to concatenate and to join additional data across files: new columns will result in additional channels being created, and new timestamps for existing channels will add additional data to those existing channels.

    Uploading data from multiple files to the same channels with duplicate timestamps will overwrite existing data.
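
    To make the flattening concrete, here is a minimal sketch using pandas (our choice, not a Nominal requirement; any tool that can flatten nested records and write a supported format works). The record values mirror the JSON example above, the output filename is hypothetical, and json_normalize produces the "."-delimited column names expected when prefix_delimiter="." is used at ingest:

    import pandas as pd

    # Hypothetical nested records, e.g. parsed out of a proprietary log format
    records = [
        {
            "timestamp": 12345,
            "gps": {"lat": 12.1, "lon": 12.3, "height": 10000},
            "imu": {"roll": 1.23, "pitch": -0.2, "yaw": 0.0},
        },
    ]

    # Flatten nested fields into "."-delimited column names (e.g. "gps.lat"),
    # matching the prefix_delimiter="." used when creating the dataset
    flat = pd.json_normalize(records, sep=".")

    # One file per source; upload each file to the same dataset
    flat.to_parquet("flight_001_nested_telemetry.parquet", index=False)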


Uploading data to Nominal

Once your data has been transformed into a neutral / normalized format, uploading and ingesting the data into Nominal is straightforward.

The first time that data is uploaded to an asset, we create new datasets and video datasources for the asset. On subsequent uploads, we append new files directly to existing datasources (datasets, videos, etc.).

See below for instructions on uploading data to an asset in each case:

In the following examples, we associate datasources with an asset using a “refname”. Refnames are a mechanism for performing two common tasks within Nominal:

  • Looking up a datasource on an Asset to later edit / append data to, and
  • Comparing corresponding datasources across different assets in multi-asset workflows.

It is a good practice to give a descriptive, yet terse, refname when associating a datasource with an Asset. For example, if our gliders communicate data over mavlink, a common refname for data associated with that connection could be "mavlink_data". If we are working with camera data and video files, it is typical to use a refname based on the context of the camera, such as "front_center_camera" or "night_vision_camera".

When uploading tabular data, there are a few common formats we support, as well as some more specialized formats. These include (but are not limited to):

  • .csv files (and gzip-compressed CSV files, .csv.gz)
  • .parquet files
  • .bin ardupilot dataflash files

import nominal

# Assumes authentication has already been performed
client = nominal.get_default_client()

# Using utility from previous section on creating assets.
# Hardcoding platform and serial number for this example, but typically this would be
# computed dynamically based on metadata about the data files, such as the filepath.
asset = retrieve_asset(client, platform="shimmer", serial_num="SN-001")

# Create dataset (only required for the first upload for a given data source)
dataset = client.create_tabular_dataset(
    path="path/to/data.parquet",
    # Human readable name for the dataset
    name="Flight Data",
    # Column within the parquet file containing absolute or relative timestamp information
    # for all other columns within the file
    timestamp_column="unix_timestamps",
    # Type of data contained within the timestamp column.
    # Using absolute floating point seconds from unix epoch for this example, but a wide variety
    # of formats are supported, such as other resolutions of time, relative timestamps, or even
    # custom-formatted string timestamps (e.g. ISO8601).
    timestamp_type="epoch_seconds",
    # Key-value properties that are useful for looking up and finding this dataset
    properties={
        "platform": "shimmer",
        "serial_num": "SN-001",
    },
    # Optional description for the dataset, useful for storing notes for future readers
    description="Flight data from onboard telemetry platform",
)

# Attach the dataset to the asset
asset.add_dataset(
    # Reference name for this dataset within the asset.
    # This can be used later to retrieve a reference to this dataset via the same
    # reference name for future flight tests.
    "flight_data",
    dataset,
)

For subsequent flight test events, we can skip creating the dataset and instead look up the dataset we already created:

# NOTE: refname needs to match the refname used to add the dataset to the asset
dataset = asset.get_dataset("flight_data")

dataset.add_data_to_dataset(
    "path/to/other/data.parquet",
    # Column containing timestamp information for this data file.
    # NOTE: Need not be the same as other files in this dataset.
    timestamp_column="unix_timestamps",
    # Type of timestamps stored in this data file.
    # NOTE: Need not be the same as other files in this dataset.
    timestamp_type="epoch_seconds",
)
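
If a single test event produces a whole folder of normalized files (as discussed in the normalization section), the same call can be made in a loop. This sketch assumes a hypothetical directory layout and that every file shares the same timestamp column and type:

import pathlib

# Upload every normalized Parquet file from this flight (hypothetical directory)
for parquet_file in sorted(pathlib.Path("path/to/flight_artifacts").glob("*.parquet")):
    dataset.add_data_to_dataset(
        str(parquet_file),
        timestamp_column="unix_timestamps",
        timestamp_type="epoch_seconds",
    )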

Want to ingest Ardupilot Dataflash .bin files? When creating a dataset, you can use client.create_ardupilot_dataflash_dataset. To add this data to an existing dataset, simply use dataset.add_ardupilot_dataflash_to_dataset.
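
As a rough sketch only (the parameters shown for these dataflash helpers beyond the file path are assumptions on our part; consult the function reference for their true signatures), the flow mirrors the tabular examples above:

# First upload: create a dataflash-backed dataset and attach it to the asset.
# NOTE: parameter names beyond the file path are assumptions, not the confirmed API.
dataflash_dataset = client.create_ardupilot_dataflash_dataset(
    "path/to/flight.bin",
    name="Dataflash Logs",
)
asset.add_dataset("dataflash_logs", dataflash_dataset)

# Subsequent uploads: append to the existing dataset via its refname
existing = asset.get_dataset("dataflash_logs")
existing.add_ardupilot_dataflash_to_dataset("path/to/next_flight.bin")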


Creating runs in Nominal

When working with assets, users generally handle data from many distinct test events. As shown in other sections of this tutorial, this is accomplished by repeatedly uploading data to a set of datasources within an Asset. For example, if every test event generates some CSV, Parquet, log, and video files, these are uploaded to the same set of datasources each time. However, as useful as it is to see all of an Asset's data in a single place, it is often just as useful to investigate a single test event and all of the data associated with it.

This is where Runs come into play! A Run is defined by a start and end time on an existing asset, and acts as a view over all of the files and datasources attached to that asset during that window. When creating workbooks, running checklists, or doing other validation on your data, it is useful to be able to perform these tasks on a single flight test. To do this, we create these runs ourselves when uploading data to Nominal.

import datetime
import nominal

# Assumes authentication has already been performed
client = nominal.get_default_client()

# Using utility from previous section on creating assets.
# Hardcoding platform and serial number for this example, but typically this would be
# computed dynamically based on metadata about the data files, such as the filepath.
asset = retrieve_asset(client, platform="shimmer", serial_num="SN-001")

# Create the run in Nominal and associate it with the Asset producing the data
run = client.create_run(
    name="Test Flight #2 10/09/2024",
    start=datetime.datetime(
        year=2024,
        month=10,
        day=9,
        hour=13,
        minute=3,
        second=5,
        tzinfo=datetime.timezone.utc,
    ),
    end=datetime.datetime(
        year=2024,
        month=10,
        day=9,
        hour=14,
        minute=2,
        second=17,
        tzinfo=datetime.timezone.utc,
    ),
    # Useful for looking up runs later on by asset properties
    properties={
        "platform": "shimmer",
        "serial_num": "SN-001",
    },
    # Link back to the asset containing data for this run
    asset=asset,
    # Optional human description of the run, useful for describing what maneuvers we tested
    description="Testing unpowered glide",
)

Determining the correct start / end bounds for a nominal.Run can be challenging when you have a large number of data files being ingested. A good practice is typically to create the nominal.Run as early as possible in the data ingestion script and update the bounds as you go.

An example of doing this would look like:

run = client.create_run(
    ...,
    # Initialize start time to now (timezone-aware, matching the bounds set above),
    # as it will always be greater than data we are ingesting from a flight test
    # that happened in the past
    start=datetime.datetime.now(datetime.timezone.utc),
)

# ... normalize data from flight test ...

# Derived directly from the data uploaded to Nominal, e.g. by looking at
# the first and last values of the provided timestamp_column
data_start_time = datetime.datetime(...)
data_end_time = datetime.datetime(...)

# ... upload data to nominal ...

# Update the bounds of the run, taking the earliest start time and the latest
# end time across the existing run bounds and this batch of data
run.update(
    start=min(run.start, data_start_time),
    end=data_end_time if run.end is None else max(run.end, data_end_time),
)
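
One way to derive data_start_time and data_end_time is to read the timestamp column back with pandas. This sketch assumes the column holds absolute epoch seconds and reuses the hypothetical path and column name from the upload examples above:

import datetime

import pandas as pd

df = pd.read_parquet("path/to/other/data.parquet")

# Bounds of this file's data, converted to timezone-aware datetimes
data_start_time = datetime.datetime.fromtimestamp(
    df["unix_timestamps"].min(), tz=datetime.timezone.utc
)
data_end_time = datetime.datetime.fromtimestamp(
    df["unix_timestamps"].max(), tz=datetime.timezone.utc
)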

However you intend to set the bounds of your Run (either ahead of time or as you go), the bounds must be correct in order for data to visualize correctly in the website when viewing workbooks on runs.