Retrieve decimated data points in Python

When exploring large datasets, it is often useful to decimate the data so that it is faster to download and visualize. The Nominal platform has a fast decimation service that retains details in the signal. You can see this in action on the Nominal platform when viewing a timeseries chart in a workbook. The chart shows a decimated signal that still retains detail, and it retrieves higher-resolution data points as you zoom in.

This guide demonstrates how to retrieve decimated data points from the Nominal platform in a Jupyter notebook using Python and display them in an interactive time series chart. The chart will update to show higher-resolution data as you zoom in, similar to the behavior in the Nominal workbook.

Prerequisites

Make sure you have the following Python packages installed:

  • nominal
  • jupyterlab
  • pandas
  • plotly
  • ipywidgets

You can install them all using:

pip3 install nominal jupyterlab pandas plotly ipywidgets

Generate a sample dataset

For this guide, we’ll generate a sample dataset with 720,000 rows of random data.

from dateutil import parser
import datetime
import pandas as pd
import numpy as np

dataset_name = "1h_5ms"
start = parser.isoparse("2024-10-22T00:00:00Z")
end = parser.isoparse("2024-10-22T01:00:00Z")
resolution = datetime.timedelta(milliseconds=5)

# Number of samples: one hour at 5 ms spacing = 720,000
n = (end - start) // resolution

np.random.seed(2)
cols = ["value"]

# Generate a fake signal (a random walk starting near 50)
signal = np.random.normal(0, 0.3, size=n).cumsum() + 50

# Generate noisy samples around the signal
def noise(std, bias, n):
    return np.random.normal(bias, std, n)

data = {c: signal + noise(1, 10 * (np.random.random() - 0.5), n) for c in cols}

# Pick a few samples and really blow them out to create outliers
locs = np.random.choice(n, 10)
data["value"][locs] *= 2

data["Time"] = [(start + resolution * i).isoformat() for i in range(n)]

df_gen = pd.DataFrame(data)
df_gen

This should display a Pandas DataFrame with a Time column and a value column with 720,000 rows.
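
As a quick sanity check on the row count, the sketch below recomputes it from the time span and the sampling interval using only the standard library. It reuses the start, end, and resolution values from the guide. The name last_timestamp is introduced here for illustration only.

```python
import datetime

# Same one-hour window and 5 ms spacing as the sample dataset
start = datetime.datetime(2024, 10, 22, 0, 0, tzinfo=datetime.timezone.utc)
end = datetime.datetime(2024, 10, 22, 1, 0, tzinfo=datetime.timezone.utc)
resolution = datetime.timedelta(milliseconds=5)

# Floor division of two timedeltas yields an integer sample count
n = (end - start) // resolution

# Samples are generated at start + i * resolution for i in range(n),
# so the final sample falls one interval before `end`
last_timestamp = start + resolution * (n - 1)

print(n)
print(last_timestamp.isoformat())
```

The range is half-open: the first sample sits exactly at the start time, and the end time itself is not included.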

Upload the dataset to Nominal

Before uploading the data, ensure you’re connected to Nominal.

Get your Nominal API token from your User settings page.

See the Quickstart for more details on connecting to Nominal from Python.

import nominal.nominal as nm

nm.set_token(
    base_url="https://api.gov.nominal.io/api",
    # Replace with your Access Token from
    # https://app.gov.nominal.io/settings/user?tab=tokens
    token="* * *",
)

If you’re not sure whether your company has a Nominal tenant, please reach out to us.

Upload the dataset to Nominal using the upload_pandas() method:

dataset = nm.upload_pandas(df_gen, dataset_name, "Time", timestamp_type="iso_8601")
dataset

This should display the dataset metadata.

You can access this dataset later by its rid. You can find the rid in the output of the code above, or look it up on the Nominal platform in the “Datasources” section, and then pass it to the nm.get_dataset() function.

Retrieve decimated data points

Now that the dataset is uploaded, we can retrieve decimated data points using the get_decimated() method of Channel.

channel = dataset.get_channel("value")
df_decimated = channel.get_decimated(start, end, buckets=2000)
df_decimated

This should display a Pandas DataFrame with the 2000 decimated data points.

Display the data in a time series chart

We can display the data in a time series chart using Plotly.

import plotly.graph_objects as go

if "value" in df_decimated.columns:
    # Range had fewer than 1000 points
    fig = go.Figure([
        go.Scatter(
            x=df_decimated.index,
            y=df_decimated["value"],
            mode="lines",
            # line=dict(width=0),
            showlegend=False,
            # line_shape="vh"
        ),
    ])
else:
    fig = go.Figure([
        go.Scatter(
            x=df_decimated.index,
            y=df_decimated["max"],
            mode="lines",
            line=dict(width=0),
            showlegend=False,
            line_shape="vh"
        ),
        go.Scatter(
            x=df_decimated.index,
            y=df_decimated["min"],
            mode="lines",
            fill="tonexty",
            fillcolor="blue",
            line=dict(width=0),
            showlegend=False,
            line_shape="vh"
        )
    ])

fig.update_layout(
    xaxis_title="Time",
    yaxis_title="Value"
)

fig.show()

This will display a time series chart with the decimated data points. Note the outliers in the signal that are preserved.
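
One way to see why outliers can survive decimation is a min/max envelope: each bucket keeps its smallest and largest value, so a single extreme sample still shows up in the result. The sketch below is a local, pure-Python illustration of that idea. It is an assumption based on the min and max columns plotted above, not a description of how Nominal's decimation service is implemented, and decimate_min_max is a hypothetical helper name.

```python
import random

def decimate_min_max(values, buckets):
    """Split values into roughly equal buckets and keep each bucket's (min, max)."""
    n = len(values)
    if n <= buckets:
        # Nothing to decimate: every point becomes its own bucket
        return [(v, v) for v in values]
    envelope = []
    for b in range(buckets):
        lo = b * n // buckets
        hi = (b + 1) * n // buckets
        chunk = values[lo:hi]
        envelope.append((min(chunk), max(chunk)))
    return envelope

random.seed(0)
signal = [random.gauss(50, 1) for _ in range(100_000)]
signal[12_345] = 500.0  # inject a single large outlier

envelope = decimate_min_max(signal, buckets=200)
overall_max = max(mx for _, mx in envelope)

print(len(envelope))
print(overall_max)
```

Averaging each bucket instead would smooth the outlier away, which is why an envelope of minimum and maximum values is a common choice for plotting decimated signals.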

Increase resolution when zooming in

The chart above is static: when you zoom in, the resolution of the data points does not increase. In this section, we add an event handler that requests higher-resolution data points whenever the visible x-axis range changes.

def on_range_change(layout, range_x):
    start = parser.isoparse(range_x[0] + "Z")
    end = parser.isoparse(range_x[1] + "Z")

    df = channel.get_decimated(start, end, buckets=2000)

    # Create a new figure with the new data
    fig = make_fig(df)
    fig.update_layout(xaxis={"range": range_x})

    # Update the existing figure widget with the new figure
    fig_widget.layout = fig.layout
    fig_widget.layout.on_change(on_range_change, "xaxis.range")
    fig_widget.data = []
    fig_widget.add_traces(fig.data)

def make_fig(df):
    if "value" in df.columns:
        # Range had fewer than 1000 points
        fig = go.Figure([
            go.Scatter(
                x=df.index,
                y=df["value"],
                mode="lines",
                showlegend=False,
            ),
        ])
    else:
        fig = go.Figure([
            go.Scatter(
                x=df.index,
                y=df["max"],
                mode="lines",
                line=dict(width=0),
                showlegend=False,
                line_shape="vh"
            ),
            go.Scatter(
                x=df.index,
                y=df["min"],
                mode="lines",
                fill="tonexty",
                fillcolor="blue",
                line=dict(width=0),
                showlegend=False,
                line_shape="vh"
            )
        ])

    fig.update_layout(
        xaxis_title="Time",
        yaxis_title="Value",
    )
    return fig

fig_range = make_fig(channel.get_decimated(start, end, buckets=2000))
fig_widget = go.FigureWidget(fig_range)

fig_widget.layout.on_change(on_range_change, "xaxis.range")

fig_widget

This will display a time series chart with the decimated data points. When you zoom in, the resolution of the data increases.
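
To get a feel for why zooming helps, the sketch below estimates how much time each bucket covers when the same 2000 buckets are spread over a shorter window. It assumes buckets divide the requested range evenly, which this guide does not confirm. bucket_width is a hypothetical helper, not part of the nominal package.

```python
import datetime

def bucket_width(start, end, buckets=2000):
    """Time span covered by one bucket, assuming the range is split evenly."""
    return (end - start) / buckets

resolution = datetime.timedelta(milliseconds=5)  # sample spacing of the dataset
t0 = datetime.datetime(2024, 10, 22, 0, 0, tzinfo=datetime.timezone.utc)

# Full one-hour view versus a one-minute zoomed view
full = bucket_width(t0, t0 + datetime.timedelta(hours=1))
zoomed = bucket_width(t0, t0 + datetime.timedelta(minutes=1))

# Dividing two timedeltas gives the number of raw samples per bucket
print(full, full / resolution)
print(zoomed, zoomed / resolution)
```

Under that assumption, each bucket in the full view summarizes 360 raw samples, while a one-minute window brings that down to 6 samples per bucket.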