Datasets in Nominal with Python

To use this guide, install the Nominal Python library with pip3 install nominal.

See Quickstart for more details.

Please contact us if you’re not sure whether your organization has access to Nominal.

Datasets are Nominal’s primitive for ingesting and working with tabular data files. They must have at least one timestamp column.

Once uploaded to Nominal, Datasets can be organized into Runs with other data sources - including video files, database connections, and log files.

Datasets are the file representation of Nominal’s Data Source primitive. Most often, Datasets are tabular files with at least one time dimension. Datasets can also be video files.

Head over to the Datasets page to see your organization’s most recently uploaded Datasets.

This guide details common patterns for working with Nominal Datasets in Python.

Connect to Nominal

To upload a Dataset, we’ll first have to connect to your Nominal platform tenant.

Get your Nominal API token from your User settings page.

See the Quickstart for more details on connecting to Nominal from Python.

import nominal.nominal as nm

nm._config.set_token(
    url = 'https://api.gov.nominal.io/api',
    token = '* * *' # Replace with your Access Token from
    # https://app.gov.nominal.io/settings/user?tab=tokens
)

If you’re not sure whether your company has a Nominal tenant, please reach out to us.
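
To keep the access token out of your source files, one option is to read it from an environment variable before connecting. The sketch below is only an illustration: the variable names NOMINAL_TOKEN and NOMINAL_BASE_URL and the helper load_nominal_credentials are hypothetical and not part of the Nominal library.

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup.
TOKEN_ENV_VAR = "NOMINAL_TOKEN"
URL_ENV_VAR = "NOMINAL_BASE_URL"
DEFAULT_URL = "https://api.gov.nominal.io/api"  # URL used in the example above


def load_nominal_credentials(env=None):
    """Return (url, token) read from environment variables.

    Raises RuntimeError if the token variable is unset or empty.
    """
    env = os.environ if env is None else env
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"Set {TOKEN_ENV_VAR} to your Nominal access token")
    url = env.get(URL_ENV_VAR, DEFAULT_URL)
    return url, token


# Usage, with the same call shown above:
#   url, token = load_nominal_credentials()
#   nm._config.set_token(url=url, token=token)
```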

Upload a Dataset

To upload a Dataset to Nominal, use the nm.upload_csv() command.

import nominal.nominal as nm

dataset = nm.upload_csv(
    'frosty_flight_1k_rows.csv',
    name='Frosty Flight',
    timestamp_column='source_time',
    timestamp_type='iso_8601',
)

print('Uploaded dataset:', dataset.rid)

Replace frosty_flight_1k_rows.csv with a path to a CSV file on your own computer.

If you don’t have a CSV file handy, you can download frosty_flight_1k_rows.csv by copy-pasting the four lines below into your Python terminal.

import polars as pl
sample_csv = 'hf://datasets/nominal-io/frosty-flight/frosty_flight_1k_rows.csv'
df = pl.read_csv(sample_csv)
df.write_csv('frosty_flight_1k_rows.csv')

Head over to the CSV files page for more options for CSV file upload.

Relative timestamps

In the above example, we uploaded a Dataset with an absolute time series column (source_time). In layperson’s terms, absolute time refers to the calendar date and clock time at which a measurement was recorded. Absolute time is usually expressed in reference to a global standard (such as UTC) that normalizes the timestamp timezone.

Sometimes, test measurements are recorded in relative time, where 0 represents the start of the measurement and 0 + (i * time unit) represents the timestamp of each subsequent measurement.

For example, say that you’re measuring the pressure in an engine chamber once per second. In relative time, your first measurement would be at time 0 and your 1000th measurement would be at 999 seconds.
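
To make the relationship between the two concrete, here is a minimal sketch that converts a few absolute ISO 8601 timestamps into relative seconds measured from the first sample. It uses only the Python standard library; the helper name to_relative_seconds is illustrative and not part of the Nominal library.

```python
from datetime import datetime


def to_relative_seconds(iso_timestamps):
    """Convert ISO 8601 strings to seconds elapsed since the first one."""
    # Replace a trailing "Z" so fromisoformat() also works on Python < 3.11.
    parsed = [
        datetime.fromisoformat(ts.replace("Z", "+00:00"))
        for ts in iso_timestamps
    ]
    start = parsed[0]
    return [(t - start).total_seconds() for t in parsed]


absolute = [
    "2024-06-08T05:58:42.000Z",
    "2024-06-08T05:58:43.000Z",
    "2024-06-08T05:58:44.000Z",
]
relative = to_relative_seconds(absolute)
print(relative)  # [0.0, 1.0, 2.0]
```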

Nominal has first-class support for measurements in either relative or absolute time, down to picosecond resolution.

To upload a dataset clocked in relative time, set the timestamp_type parameter to relative_seconds (replace “seconds” with your measurement’s resolution - for example relative_hours, relative_microseconds, relative_milliseconds, relative_minutes, or relative_nanoseconds).

Let’s look at this jet engine simulation from NASA. The relative timestamp column is cycle, which represents one operational cycle of a jet engine. (Nominal doesn’t have a relative_cycle timestamp type, so we’ll proxy with relative_hours).

First, let’s download and inspect this CSV:

import polars as pl

dataset_name = 'NASA_turbofan_train_engine_1_FD001.csv'
link_to_csv = f'hf://datasets/nominal-io/nasa-turbofan-degradation/{dataset_name}'

# Download the CSV and save a local copy for upload
df_engine1 = pl.read_csv(link_to_csv)
df_engine1.write_csv(dataset_name)

# Preview the first 5 rows and first 8 columns
df_engine1.head().select(df_engine1.columns[:8])

engine | cycle | setting_1 | setting_2 | setting_3 | (Fan inlet temperature) (◦R) | (LPC outlet temperature) (◦R) | (HPC outlet temperature) (◦R)
i64    | i64   | f64       | f64       | f64       | f64                          | f64                           | f64
1      | 1     | -0.0007   | -0.0004   | 100.0     | 518.67                       | 641.82                        | 1589.7
1      | 2     | 0.0019    | -0.0003   | 100.0     | 518.67                       | 642.15                        | 1591.82
1      | 3     | -0.0043   | 0.0003    | 100.0     | 518.67                       | 642.35                        | 1587.99
1      | 4     | 0.0007    | 0.0       | 100.0     | 518.67                       | 642.35                        | 1582.79
1      | 5     | -0.0019   | -0.0002   | 100.0     | 518.67                       | 642.37                        | 1582.85

The column that we’ll use for relative time, cycle, spans from 1 to 192.

Let’s look at all of the column names:

df_engine1.columns # Already a Python list of column names
['engine',
'cycle',
'setting_1',
'setting_2',
'setting_3',
'(Fan inlet temperature) (◦R)',
'(LPC outlet temperature) (◦R)',
'(HPC outlet temperature) (◦R)',
'(LPT outlet temperature) (◦R)',
'(Fan inlet Pressure) (psia)',
'(bypass-duct pressure) (psia)',
'(HPC outlet pressure) (psia)',
'(Physical fan speed) (rpm)',
'(Physical core speed) (rpm)',
'(Engine pressure ratio(P50/P2)',
'(HPC outlet Static pressure) (psia)',
'(Ratio of fuel flow to Ps30) (pps/psia)',
'(Corrected fan speed) (rpm)',
'(Corrected core speed) (rpm)',
'(Bypass Ratio) ',
'(Burner fuel-air ratio)',
'(Bleed Enthalpy)',
'(Required fan speed)',
'(Required fan conversion speed)',
'(High-pressure turbines Cool air flow)',
'(Low-pressure turbines Cool air flow)']

Finally, let’s upload this dataset to Nominal with timestamp_type set to relative_hours:

import nominal.nominal as nm

dataset = nm.upload_csv(
    dataset_name,
    name = dataset_name,
    timestamp_column = 'cycle',
    timestamp_type = 'relative_hours',
)

Again, since Nominal doesn’t have a relative_cycle timestamp type (representing an operational cycle of a jet engine), we’ve used relative_hours as a proxy.

If you navigate to your organization’s Datasets page, you’ll see this dataset at the top:

[Screenshot: the Datasets page with the uploaded dataset at the top of the list]

If you click on the dataset and inspect its metadata, you’ll see that the timestamp type is set to “relative”:

[Screenshot: the dataset’s metadata showing the relative timestamp type]

Acceptable values for timestamp_type include:

iso_8601, epoch_days, epoch_hours, epoch_minutes, epoch_seconds, epoch_milliseconds, epoch_microseconds, epoch_nanoseconds, relative_days, relative_hours, relative_minutes, relative_seconds, relative_milliseconds, relative_microseconds, relative_nanoseconds

Timestamps in the form 2024-06-08T05:58:42.000Z will have a timestamp_type of iso_8601.

Timestamps in the form 1581933170999989 will most likely be epoch_microseconds.

The epoch_ timestamp types refer to timestamps in Unix format, counted from the Unix epoch (1970-01-01T00:00:00Z) in the named unit.
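
If you are unsure which value fits a column, a rough heuristic on one sample value can help. The sketch below is an assumption-laden illustration, not part of the Nominal library: guess_timestamp_type is a hypothetical helper, and its digit-count rule only holds for epoch values dated roughly between 2001 and 2286. It cannot detect relative_ types, since those look like plain numbers.

```python
from datetime import datetime

# Digit counts for integer Unix timestamps dated roughly 2001 to 2286.
EPOCH_TYPE_BY_DIGITS = {
    10: "epoch_seconds",
    13: "epoch_milliseconds",
    16: "epoch_microseconds",
    19: "epoch_nanoseconds",
}


def guess_timestamp_type(sample: str) -> str:
    """Guess a timestamp_type from one sample value (heuristic only)."""
    text = sample.strip()
    if text.isdigit():
        try:
            return EPOCH_TYPE_BY_DIGITS[len(text)]
        except KeyError:
            raise ValueError(f"Cannot guess an epoch unit for {text!r}") from None
    # Otherwise require the value to parse as ISO 8601.
    # Replace a trailing "Z" so fromisoformat() also works on Python < 3.11.
    datetime.fromisoformat(text.replace("Z", "+00:00"))  # ValueError if not ISO
    return "iso_8601"


print(guess_timestamp_type("2024-06-08T05:58:42.000Z"))  # iso_8601
print(guess_timestamp_type("1581933170999989"))          # epoch_microseconds
```

Treat the result as a starting point and confirm it against how the data was actually recorded.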

For more information about Nominal timestamps in Python, see the nominal.ts docs page.

Retrieve a Dataset

To download a Dataset on the Nominal platform, you’ll first need to obtain the Dataset’s unique identifier.

If you click on a Dataset from the Datasets page, you can copy/paste its ID from the metadata drawer:

[Screenshot: copying the Dataset ID from the metadata drawer]

Now that we have the Dataset ID, we can download it in Python with nm.get_dataset(id):

import nominal.nominal as nm

id = 'ri.catalog.cerulean-staging.dataset.e5ede17b-05f9-404d-aaf5-ba85c99761a2'

ds = nm.get_dataset(id)

ds is a Dataset object that contains the dataset’s metadata (name, description, labels, etc).

With the ds object that nm.get_dataset(id) returns, you can inspect the Dataset’s metadata, but not its actual data content. This capability will be included in an upcoming release.

All Nominal primitives (e.g., Datasets, Runs, Workbooks, and Checks) have a unique identifier called a “Resource ID”. Resource IDs may be referred to as “RID” or simply “ID” throughout the platform. They can be obtained from a primitive’s detail page (or URL) and have a format that looks like ri.catalog.cerulean-staging.dataset.e5ede17b-05f9-404d-aaf5-ba85c99761a2.
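
When handling many Resource IDs in a script, it can be handy to sanity-check a pasted value before using it. The sketch below assumes, based only on the example above, that a RID is the literal prefix ri followed by four dot-separated parts. The field names (service, instance, resource_type, locator) and the parse_rid helper are hypothetical labels chosen for illustration, not documented Nominal terminology.

```python
from typing import NamedTuple


class ParsedRid(NamedTuple):
    # Field names are illustrative guesses, not official terminology.
    service: str
    instance: str
    resource_type: str
    locator: str


def parse_rid(rid: str) -> ParsedRid:
    """Split a RID of the assumed form ri.<a>.<b>.<c>.<d> into its parts."""
    parts = rid.strip().split(".", 4)
    if len(parts) != 5 or parts[0] != "ri" or not all(parts):
        raise ValueError(f"Not a recognizable resource ID: {rid!r}")
    return ParsedRid(*parts[1:])


parsed = parse_rid(
    "ri.catalog.cerulean-staging.dataset.e5ede17b-05f9-404d-aaf5-ba85c99761a2"
)
print(parsed.resource_type)  # dataset
```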

Retrieve a Channel

🚧 This will be included in an upcoming release. Check back soon!

Set labels and properties

Let’s inspect the labels and properties of this dataset:

ds.properties

{'Fault Modes': 'HPC Degradation'}

ds.labels

('Simulation', 'NASA', 'Training data')

If we visit this dataset’s detail page, we’ll see these same labels and properties:

[Screenshot: the dataset detail page showing its labels and properties]

You can set a dataset’s labels and properties through the Dataset.update() function.

For example, to remove a dataset’s labels and properties:

ds.update(
    labels = [],
    properties = {}
)

To set the dataset’s labels and properties back to their original values:

ds.update(
    labels = ['Simulation', 'NASA', 'Training data'],
    properties = {'Fault Modes': 'HPC Degradation'}
)

If you’re appending to a dataset’s labels or properties, you’ll want to cache the originals to not overwrite them:

existing_labels = list(ds.labels)
new_labels = ['Sea Level', 'Fan Failure']
ds.update(labels = existing_labels + new_labels)
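
Plain concatenation repeats a label that is already present. A minimal sketch of an order-preserving merge that skips duplicates is shown below; merge_labels is a hypothetical helper written in plain Python, not a method of the Nominal library.

```python
def merge_labels(existing, new):
    """Return existing labels followed by any new ones, without duplicates."""
    # dict.fromkeys keeps first-seen order and drops repeats.
    return list(dict.fromkeys([*existing, *new]))


existing_labels = ("Simulation", "NASA", "Training data")
merged = merge_labels(existing_labels, ["NASA", "Sea Level", "Fan Failure"])
print(merged)
# ['Simulation', 'NASA', 'Training data', 'Sea Level', 'Fan Failure']
```

The merged list can then be passed to the same call as before, for example ds.update(labels = merged).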

Append to a Dataset

You can append to an existing dataset with a CSV that has the same columns.

For example, let’s download 1000 rows of this flight test data. We’ll split it into 2 dataframes that are 500 rows each and create a Nominal dataset with the first one:

import polars as pl
import nominal.nominal as nm

df_1000_rows = pl.read_csv('hf://datasets/nominal-io/frosty-flight/frosty_flight_1k_rows.csv')

# slice(offset, length): the second argument is a row count, not an end index
df_first_chunk = df_1000_rows.slice(0, 500) # First 500 rows
df_second_chunk = df_1000_rows.slice(500, 500) # Next 500 rows
df_second_chunk.write_csv('frosty_flight_2nd_chunk.csv')

dataset = nm.upload_polars(
    df_first_chunk,
    name = 'Frosty Flight: First 500 rows',
    timestamp_column = 'source_time',
    timestamp_type = 'iso_8601'
)

Add the 2nd 500 rows with Dataset.add_csv_to_dataset().

dataset.add_csv_to_dataset(
    path = 'frosty_flight_2nd_chunk.csv',
    timestamp_column='source_time',
    timestamp_type='iso_8601'
)

Note that Dataset.add_csv_to_dataset() only works for datasets with absolute (not relative) timestamps.
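
Because an appended CSV needs the same columns as the existing dataset, it can be worth comparing the file's header against the expected column names before uploading. The sketch below uses only the standard library; column_mismatch is a hypothetical helper, and the column names in the demo are made up for illustration.

```python
import csv
import os
import tempfile


def column_mismatch(csv_path, expected_columns):
    """Return (missing, unexpected) column names for the CSV at csv_path."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])
    missing = sorted(set(expected_columns) - set(header))
    unexpected = sorted(set(header) - set(expected_columns))
    return missing, unexpected


# Small demo with a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chunk.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(
            [["source_time", "altitude"], ["2024-06-08T05:58:42Z", 100]]
        )
    ok = column_mismatch(path, ["source_time", "altitude"])
    bad = column_mismatch(path, ["source_time", "airspeed"])

print(ok)   # ([], [])
print(bad)  # (['airspeed'], ['altitude'])
```

In the example above, the expected names could come from df_first_chunk.columns, and an empty result for both lists suggests the file is safe to append.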

Update the dataset name as well:

1dataset.update(name = 'Frosty Flight: Full 1000 rows')

You’re now familiar with the most common ways of interacting with Nominal’s Dataset primitive in Python. For all methods available in the Dataset class, please see the Function Reference.
