Datasets in Nominal with Python

To use this guide, install the Nominal Python library with pip3 install nominal.

See Quickstart for more details.

Please contact us if you’re not sure whether your organization has access to Nominal.

Datasets are Nominal’s primitive for ingesting and working with tabular data files. They must have at least one timestamp column.

Once uploaded to Nominal, Datasets can be organized into Runs with other data sources - including video files, database connections, and log files.

Datasets are the file representation of Nominal’s Data Source primitive. Most often, Datasets are tabular files with at least one time dimension. Datasets can also be video files.

Head over to the Datasets page to see your organization’s most recently uploaded Datasets.

This guide details common patterns for working with Nominal Datasets in Python.

Connect to Nominal

To upload a Dataset, we’ll first have to connect to your Nominal platform tenant.

Concepts
  • Base URL: The URL through which the Nominal API is accessed (typically https://api.gov.nominal.io/api; shown under Settings → API keys).
  • Workspace: A mechanism for isolating datasets; each user has one or more workspaces, and data in one workspace cannot be seen from another. Note that a token / API key is attached to a user and may access multiple workspaces.
  • Profile: A combination of base URL, API key, and workspace.

There are two primary ways of authenticating the Nominal Client. The first is to use a profile stored on disk, and the second is to use a token directly.

Run the following in a terminal and follow on-screen prompts to set up a connection profile:

$ nom config profile add default

# Alternatively, if `nom` is missing from the path:
$ python -m nominal config profile add default

Here, “default” can be any name chosen to represent this profile (reminder: a profile represents a base URL, API key, and workspace).

The profile will be stored in ~/.config/nominal/config.yml, and can then be used to create a client:

from nominal.core import NominalClient

client = NominalClient.from_profile("default")

# Get details about the currently logged-in user to validate authentication
# Will display an object like: `User(display_name='your_email@your_company.com', ...)`
print(client.get_user())

If you have previously used nom to store credentials, prior to the availability of profiles, you will need to migrate your old configuration file (~/.nominal.yml) to the new format (~/.config/nominal/config.yml).

You can do this with the following command:

$ nom config migrate

# Or, if `nom` is missing from your path:
$ python -m nominal config migrate
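
Before running the migration, it may help to check which of the two configuration files is present. The following is a minimal sketch using only the standard library. The helper name needs_migration is hypothetical and not part of the nominal package; the only facts taken from this guide are the two file locations.

```python
from pathlib import Path

# File locations taken from this guide; everything else is illustrative.
LEGACY_CONFIG = Path(".nominal.yml")
PROFILE_CONFIG = Path(".config") / "nominal" / "config.yml"


def needs_migration(home: Path) -> bool:
    """Return True if only the legacy config file exists under `home`.

    Hypothetical helper, not part of the nominal package.
    """
    legacy = home / LEGACY_CONFIG
    current = home / PROFILE_CONFIG
    return legacy.is_file() and not current.is_file()


if __name__ == "__main__":
    if needs_migration(Path.home()):
        print("Legacy config found; consider running `nom config migrate`.")
    else:
        print("No legacy-only configuration detected.")
```

The check only looks at whether the files exist; it does not read or validate their contents.
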

To authenticate with a token directly instead of a stored profile:

from nominal.core import NominalClient

# Get an instance of the client using provided credentials
client = NominalClient.from_token("<insert api key>")

# Get details about the currently logged-in user to validate authentication
# Will display an object like: `User(display_name='your_email@your_company.com', ...)`
print(client.get_user())

NOTE: you should never share your Nominal API key with anyone. We therefore recommend that you not save it in your code and/or scripts.

  • If you trust the computer you are on, use nom to store the credential to disk.
  • Otherwise, use a password manager such as 1Password or Bitwarden to keep your token safe.

If you’re not sure whether your company has a Nominal tenant, please reach out to us.
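
One way to follow the note above and keep the API key out of your scripts is to read it from an environment variable at run time. This is a sketch under assumptions: the variable name NOMINAL_API_KEY and the helper load_api_key are hypothetical choices, not something the nominal library defines. Only NominalClient.from_token, shown earlier, comes from this guide.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; pick whatever fits your environment.
API_KEY_ENV_VAR = "NOMINAL_API_KEY"


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the API key from the environment instead of hard-coding it.

    Hypothetical helper, not part of the nominal package.
    """
    source = os.environ if env is None else env
    token = source.get(API_KEY_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"Set {API_KEY_ENV_VAR} before running this script.")
    return token


# Usage with the client shown earlier in this guide (requires `nominal`):
#   from nominal.core import NominalClient
#   client = NominalClient.from_token(load_api_key())
```

Failing loudly when the variable is unset avoids passing an empty token to the client by accident.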

Upload a Dataset

from nominal.core import NominalClient

client = NominalClient.from_profile("default")  # replace with your profile name

dataset = client.create_dataset('Frosty Flight')
dataset.add_tabular_data_to_dataset(
    'frosty_flight_1k_rows.csv',
    timestamp_column='source_time',
    timestamp_type='iso_8601',
)

print('Uploaded dataset:', dataset.rid)

Replace frosty_flight_1k_rows.csv with a path to a CSV file on your own computer.

If you don’t have a CSV file handy, you can download ‘frosty_flight_1k_rows.csv’ by copy-pasting the four lines below into your Python terminal.

import polars as pl
sample_csv = 'hf://datasets/nominal-io/frosty-flight/frosty_flight_1k_rows.csv'
df = pl.read_csv(sample_csv)
df.write_csv('frosty_flight_1k_rows.csv')

Head over to the CSV files page for more options for CSV file upload.
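
Before uploading your own file, it can be useful to confirm that the column you plan to pass as timestamp_column exists and holds parseable values. The sketch below uses only the standard library and assumes ISO 8601 timestamps. check_timestamp_column is a hypothetical helper, not part of the nominal library, and the sample rows are made up for illustration.

```python
import csv
import io
from datetime import datetime


def check_timestamp_column(csv_text: str, column: str) -> int:
    """Return the number of rows whose `column` parses as ISO 8601.

    Raises ValueError if the column is missing or a value does not parse.
    Hypothetical helper, not part of the nominal package.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if column not in (reader.fieldnames or []):
        raise ValueError(f"column {column!r} not found in {reader.fieldnames}")
    count = 0
    for row in reader:
        # fromisoformat() only accepts a trailing 'Z' on Python 3.11+,
        # so normalize it to an explicit UTC offset first.
        datetime.fromisoformat(row[column].replace("Z", "+00:00"))
        count += 1
    return count


# Made-up sample data, for illustration only.
sample = (
    "source_time,altitude\n"
    "2024-06-08T05:58:42.000Z,1200.5\n"
    "2024-06-08T05:58:43.000Z,1201.0\n"
)
print(check_timestamp_column(sample, "source_time"))  # prints 2
```

For a small file on disk, the text can be read with pathlib.Path(...).read_text() and passed in the same way.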

Relative timestamps

In the above example, we uploaded a Dataset with an absolute timestamp column (source_time). In layperson’s terms, an absolute time is the calendar date and clock time at which a measurement was recorded. Absolute time is usually expressed in reference to a global standard (such as UTC) that normalizes the timestamp’s time zone.

Sometimes, test measurements are recorded in relative time, where 0 represents the start of the measurement and 0 + (i * time unit) represents the timestamp of the i-th subsequent measurement.

For example, say that you’re measuring the pressure in an engine chamber once per second. In relative time, your first measurement would be at time 0 and your 1000th measurement would be at 999 seconds.
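
To make the relationship between the two kinds of time concrete, the sketch below converts relative offsets into absolute timestamps by adding them to a known start time. It uses only the standard library. The start time and the helper name relative_to_absolute are made up for illustration and are not part of the nominal library.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


def relative_to_absolute(
    start: datetime, offsets_seconds: Iterable[float]
) -> List[datetime]:
    """Map relative offsets (in seconds) onto absolute timestamps.

    Hypothetical helper, not part of the nominal package.
    """
    return [start + timedelta(seconds=s) for s in offsets_seconds]


# Made-up start time in UTC; one sample per second.
start = datetime(2024, 6, 8, 5, 58, 42, tzinfo=timezone.utc)
stamps = relative_to_absolute(start, [0, 1, 2])
for ts in stamps:
    print(ts.isoformat())
```

The relative values carry no calendar information on their own; an absolute time only appears once a start time is supplied.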

Nominal has first-class support for measurements in either relative or absolute time, down to picosecond resolution.

To upload a dataset clocked in relative time, set the timestamp_type parameter to relative_seconds (replace “seconds” with your measurement’s resolution - for example relative_hours, relative_microseconds, relative_milliseconds, relative_minutes, or relative_nanoseconds).

Let’s look at this jet engine simulation from NASA. The relative timestamp column is cycle, which represents one operational cycle of a jet engine. (Nominal doesn’t have a relative_cycle timestamp type, so we’ll use relative_hours as a proxy.)

First, let’s download and inspect this CSV:

import polars as pl

dataset_name = 'NASA_turbofan_train_engine_1_FD001.csv'
link_to_csv = f'hf://datasets/nominal-io/nasa-turbofan-degradation/{dataset_name}'

df_engine1 = pl.read_csv(link_to_csv)
df_engine1.write_csv(dataset_name)

# Show the first five rows, limited to the first eight columns
df_engine1.head().select(df_engine1.columns[:8])
engine | cycle | setting_1 | setting_2 | setting_3 | (Fan inlet temperature) (◦R) | (LPC outlet temperature) (◦R) | (HPC outlet temperature) (◦R)
i64    | i64   | f64       | f64       | f64       | f64                          | f64                           | f64
1      | 1     | -0.0007   | -0.0004   | 100.0     | 518.67                       | 641.82                        | 1589.7
1      | 2     | 0.0019    | -0.0003   | 100.0     | 518.67                       | 642.15                        | 1591.82
1      | 3     | -0.0043   | 0.0003    | 100.0     | 518.67                       | 642.35                        | 1587.99
1      | 4     | 0.0007    | 0.0       | 100.0     | 518.67                       | 642.35                        | 1582.79
1      | 5     | -0.0019   | -0.0002   | 100.0     | 518.67                       | 642.37                        | 1582.85

The column that we’ll use for relative time, cycle, spans from 1 to 192.

Let’s look at all of the column names:

df_engine1.columns
['engine',
'cycle',
'setting_1',
'setting_2',
'setting_3',
'(Fan inlet temperature) (◦R)',
'(LPC outlet temperature) (◦R)',
'(HPC outlet temperature) (◦R)',
'(LPT outlet temperature) (◦R)',
'(Fan inlet Pressure) (psia)',
'(bypass-duct pressure) (psia)',
'(HPC outlet pressure) (psia)',
'(Physical fan speed) (rpm)',
'(Physical core speed) (rpm)',
'(Engine pressure ratio(P50/P2)',
'(HPC outlet Static pressure) (psia)',
'(Ratio of fuel flow to Ps30) (pps/psia)',
'(Corrected fan speed) (rpm)',
'(Corrected core speed) (rpm)',
'(Bypass Ratio) ',
'(Burner fuel-air ratio)',
'(Bleed Enthalpy)',
'(Required fan speed)',
'(Required fan conversion speed)',
'(High-pressure turbines Cool air flow)',
'(Low-pressure turbines Cool air flow)']

Finally, let’s upload this dataset to Nominal with timestamp_type set to relative_hours:

from nominal.core import NominalClient

client = NominalClient.from_profile("default")  # replace with your profile name

dataset = client.create_dataset(dataset_name)
dataset.add_tabular_data_to_dataset(
    dataset_name,
    timestamp_column='cycle',
    timestamp_type='relative_hours',
)

Again, since Nominal doesn’t have a relative_cycle timestamp type (representing an operational cycle of a jet engine), we’ve used relative_hours as a proxy.

If you navigate to your organization’s Datasets page, you’ll see this dataset at the top:

[Screenshot: relative-dataset-list]

If you click on the dataset and inspect its metadata, you’ll see that the timestamp type is set to “relative”:

[Screenshot: relative-dataset-list]

Acceptable values for timestamp_type include:

iso_8601, epoch_days, epoch_hours, epoch_minutes, epoch_seconds, epoch_milliseconds, epoch_microseconds, epoch_nanoseconds, relative_days, relative_hours, relative_minutes, relative_seconds, relative_milliseconds, relative_microseconds, relative_nanoseconds
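
Because timestamp_type is passed as a plain string in the examples above, a typo would only surface at upload time. The following sketch checks a value against the list above before any network call is made. The helper validate_timestamp_type is hypothetical, and the set simply mirrors the values listed in this guide, so it may lag behind what the library actually accepts.

```python
# Units and prefixes copied from the list in this guide.
_UNITS = [
    "days", "hours", "minutes", "seconds",
    "milliseconds", "microseconds", "nanoseconds",
]
ACCEPTED_TIMESTAMP_TYPES = (
    {"iso_8601"}
    | {f"epoch_{unit}" for unit in _UNITS}
    | {f"relative_{unit}" for unit in _UNITS}
)


def validate_timestamp_type(value: str) -> str:
    """Return `value` unchanged if it is in the documented list.

    Hypothetical helper, not part of the nominal package.
    """
    if value not in ACCEPTED_TIMESTAMP_TYPES:
        raise ValueError(
            f"unknown timestamp_type {value!r}; "
            f"expected one of {sorted(ACCEPTED_TIMESTAMP_TYPES)}"
        )
    return value


print(validate_timestamp_type("relative_hours"))
```

Passing relative_cycle to this helper raises a ValueError, which matches the note earlier that no such type exists.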

Timestamps in the form 2024-06-08T05:58:42.000Z will have a timestamp_type of iso_8601.

Timestamps in the form 1581933170999989 will most likely be epoch_microseconds.

The epoch_ timestamp types refer to Unix timestamps, that is, the time elapsed since 1970-01-01T00:00:00Z expressed in the given unit.
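
To see what the two example values above represent, this sketch converts each into a timezone-aware datetime using only the standard library. It assumes the integer is a count of microseconds since the Unix epoch, as the text suggests. The helper names are illustrative and not part of the nominal library.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_microseconds(value: int) -> datetime:
    """Interpret `value` as microseconds since 1970-01-01T00:00:00Z."""
    # Integer timedelta arithmetic avoids float rounding of the last digits.
    return UNIX_EPOCH + timedelta(microseconds=value)


def from_iso_8601(value: str) -> datetime:
    """Parse an ISO 8601 string; 'Z' is normalized for Python < 3.11."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


print(from_epoch_microseconds(1581933170999989).isoformat())
print(from_iso_8601("2024-06-08T05:58:42.000Z").isoformat())
```

If the same integer were read as epoch_milliseconds or epoch_nanoseconds, the resulting date would be implausibly far in the future or close to 1970, which is why the magnitude of the value is a useful hint.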

For more information about Nominal timestamps in Python, see the nominal.ts docs page.

Retrieve a Dataset

To retrieve a Dataset from the Nominal platform, you’ll first need to obtain the Dataset’s resource identifier (RID).

If you click on a Dataset from the Datasets page, you can copy/paste its RID from the metadata drawer:

[Screenshot: rid-copy-paste]

Now that we have the Dataset RID, we can retrieve the Dataset in Python with client.get_dataset(rid):

from nominal.core import NominalClient

client = NominalClient.from_profile("default")  # replace with your profile name

# Named `rid` rather than `id` to avoid shadowing Python's built-in id()
rid = 'ri.catalog.cerulean-staging.dataset.e5ede17b-05f9-404d-aaf5-ba85c99761a2'

ds = client.get_dataset(rid)

ds is a Dataset object that contains the dataset’s metadata (name, description, labels, etc.).

With the ds object that client.get_dataset(rid) returns, you can inspect the Dataset’s metadata, but not its actual data content. Access to the data content will be included in an upcoming release.

All Nominal primitives (e.g., Datasets, Runs, Workbooks, and Checks) have a unique identifier called a “Resource ID”. Resource IDs may be referred to as “RIDs”, or simply “IDs”, throughout the platform. They can be obtained from a primitive’s detail page (or URL) and have a format that looks like ri.catalog.cerulean-staging.dataset.e5ede17b-05f9-404d-aaf5-ba85c99761a2.
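
Judging from the example above, an RID is a dot-separated string with five parts. The sketch below splits one into named fields, which can help catch a truncated copy/paste. The field names (service, instance, resource_type, locator) are an assumption made for illustration and are not confirmed by this guide; parse_rid is a hypothetical helper, not part of the nominal library.

```python
from typing import NamedTuple


class ParsedRid(NamedTuple):
    # Field names are assumed for illustration only.
    service: str
    instance: str
    resource_type: str
    locator: str


def parse_rid(rid: str) -> ParsedRid:
    """Split an RID of the form ri.<a>.<b>.<c>.<d> into its four parts."""
    parts = rid.split(".", 4)
    if len(parts) != 5 or parts[0] != "ri" or not all(parts):
        raise ValueError(f"not a recognizable RID: {rid!r}")
    return ParsedRid(*parts[1:])


example = "ri.catalog.cerulean-staging.dataset.e5ede17b-05f9-404d-aaf5-ba85c99761a2"
print(parse_rid(example))
```

The check is purely structural; it does not tell you whether the resource exists in your tenant.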

Retrieve a Channel

🚧 This will be included in an upcoming release. Check back soon!

Set labels and properties

Let’s inspect the labels and properties of this dataset:

ds.properties

{'Fault Modes': 'HPC Degradation'}

ds.labels

('Simulation', 'NASA', 'Training data')

If we visit this dataset’s detail page, we’ll see these same labels and properties:

[Screenshot: dataset-labels-properties]

You can set a dataset’s labels and properties through the Dataset.update() function.

For example, to remove a dataset’s labels and properties:

ds.update(
    labels=[],
    properties={},
)

To set the dataset’s labels and properties back to their original values:

ds.update(
    labels=['Simulation', 'NASA', 'Training data'],
    properties={'Fault Modes': 'HPC Degradation'},
)

If you’re appending to a dataset’s labels or properties, read the existing values first and include them in the update so that they aren’t overwritten:

existing_labels = list(ds.labels)
new_labels = ['Sea Level', 'Fan Failure']
ds.update(labels=existing_labels + new_labels)
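
If the new labels might overlap with the existing ones, de-duplicating before the update avoids repeats. The following is a minimal sketch in plain Python. merge_labels and merge_properties are hypothetical helpers, the 'Altitude' property is made up for illustration, and whether Nominal de-duplicates labels on its side is not stated in this guide.

```python
from typing import Dict, Iterable, List


def merge_labels(existing: Iterable[str], new: Iterable[str]) -> List[str]:
    """Combine labels, keeping first-seen order and dropping duplicates."""
    return list(dict.fromkeys([*existing, *new]))


def merge_properties(
    existing: Dict[str, str], new: Dict[str, str]
) -> Dict[str, str]:
    """Combine properties; values in `new` win on key collisions."""
    return {**existing, **new}


labels = merge_labels(
    ("Simulation", "NASA", "Training data"), ["NASA", "Sea Level"]
)
props = merge_properties(
    {"Fault Modes": "HPC Degradation"}, {"Altitude": "Sea Level"}
)
print(labels)
print(props)

# With a Dataset object `ds` as used in this guide:
#   ds.update(labels=merge_labels(ds.labels, ["Sea Level"]))
```

dict.fromkeys is used here because it preserves insertion order, unlike a plain set.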

Append to a Dataset

You can append to an existing dataset with a CSV that has the same columns.

For example, let’s download 1000 rows of this flight test data. We’ll split it into two DataFrames of 500 rows each and create a Nominal dataset with the first one:

import polars as pl
from nominal.core import NominalClient
from nominal.thirdparty.pandas import upload_dataframe

client = NominalClient.from_profile("default")  # replace with your profile name

df_1000_rows = pl.read_csv('hf://datasets/nominal-io/frosty-flight/frosty_flight_1k_rows.csv')

# polars' slice takes (offset, length), not (start, end)
df_first_chunk = df_1000_rows.slice(0, 500)  # First 500 rows
df_second_chunk = df_1000_rows.slice(500, 500)  # Next 500 rows
df_second_chunk.write_csv('frosty_flight_2nd_chunk.csv')

dataset = upload_dataframe(
    client,
    df_first_chunk.to_pandas(),
    name='Frosty Flight: First 500 rows',
    timestamp_column='source_time',
    timestamp_type='iso_8601',
)

Add the second 500 rows with Dataset.add_csv_to_dataset().

dataset.add_csv_to_dataset(
    path='frosty_flight_2nd_chunk.csv',
    timestamp_column='source_time',
    timestamp_type='iso_8601',
)

Note that Dataset.add_csv_to_dataset() only works for datasets with absolute (not relative) timestamps.

Update the dataset name as well:

dataset.update(name='Frosty Flight: Full 1000 rows')
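
Since an append expects a CSV with the same columns as the existing dataset, comparing headers up front can catch a mismatch before anything is uploaded. The sketch below uses only the standard library. same_columns is a hypothetical helper, the sample data is made up, and comparing names regardless of order is an assumption, because this guide does not say whether column order matters.

```python
import csv
import io
from typing import List


def read_header(csv_text: str) -> List[str]:
    """Return the first row of a CSV as a list of column names."""
    return next(csv.reader(io.StringIO(csv_text)), [])


def same_columns(first_csv: str, second_csv: str) -> bool:
    """True if both CSVs have the same column names, ignoring order.

    Hypothetical helper, not part of the nominal package.
    """
    return sorted(read_header(first_csv)) == sorted(read_header(second_csv))


# Made-up sample data, for illustration only.
first = "source_time,altitude\n2024-06-08T05:58:42.000Z,1200.5\n"
second = "source_time,altitude\n2024-06-08T05:58:43.000Z,1201.0\n"
other = "source_time,airspeed\n2024-06-08T05:58:43.000Z,88.0\n"

print(same_columns(first, second))  # prints True
print(same_columns(first, other))   # prints False
```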

You’re now familiar with the most common ways of interacting with Nominal’s Dataset primitive in Python. For all methods available in the Dataset class, please see the Function Reference.